Warning: Can't synchronize with repository "(default)" (Unsupported version control system "svn": No module named svn). Look in the Trac log for more information.

Ticket #611 (closed defect: fixed)

Opened 12 years ago

Last modified 10 years ago

Schema Validation is not working properly on CompoundWidgets

Reported by: alberto Owned by: anonymous
Priority: normal Milestone: 0.9
Component: TG Widgets Version:
Severity: normal Keywords: widgets schema nested


I'll take a look later in more detail, meanwhile I'll attach a patch with a failing test... Got to go and can't give in more details right now... test speaks better :)


P.S. I'm beginning to think we *might* need full qualification of form names (so schemas can be attached to forms) and recursive update of error dicts (as the schemas error dict could be nested so it would need to get merged with the nested dict produced by _validate_children)


nested_schema_failing.patch Download (4.1 KB) - added by alberto 12 years ago.
fix-for-schema.patch Download (1.8 KB) - added by michele 12 years ago.
The fix, now _retrieve_dict_from_parent always returns the right dict for the actual parent (a dict if the widget is a compound) the outermost dict that contain our key if the widget is simple, hope the logic is implemented in the right way
nested_schemas.patch Download (6.8 KB) - added by alberto 12 years ago.
the patch. Should apply cleanly as I've removed ronald's hunk
nested_schemas.2.patch Download (7.5 KB) - added by alberto 12 years ago.
last patch was broken, this is the good one
nested_schemas.3.patch Download (8.8 KB) - added by alberto 12 years ago.
optimized version
nested_schemas.4.patch Download (9.2 KB) - added by alberto 12 years ago.
v3 with stricter tests to check that errors from one level don't polute other levels
michele-nested-schemas-1.patch Download (11.2 KB) - added by michele 12 years ago.
nested_schemas.5.patch Download (9.5 KB) - added by alberto 12 years ago.
how about this?

Change History

Changed 12 years ago by alberto

comment:1 Changed 12 years ago by michele

I will take a quick look now but unitl Friday I won't have time (exam coming) anyway I think we can get along without fully qualified names, keep in mynd that we already have them but the form one is skipped. :P

comment:2 Changed 12 years ago by michele

Now I have to go sleep, anyway the problem is just in the _retrieve_dict_from_parent method the errors dict is already created in the right way and recursively, I have some small fixes here but they aren't ready yet I will look into it on Friday afternoon probably.

The problem is that we should always return the right dict from the _retrieve_dict_from_parent method, but this is not happening for a fieldset since the last chain is avoided, we should instead walk the entire tree and return the right dict if present and otherwise the outermost dict that contains the name key for simple fields.

This should make all work.

comment:3 Changed 12 years ago by michele

I think I've fixed the problem by fixing (as I said) retrieve_dict_from_parent, the browser test show that all is working right using your form of the test.

You should fix your test that's not passing since you're asserting not errors, but keep in mynd that you're passing a dict that contains only a key, this will give a missing value error with the Schema validator so errors is not empty.

I'm going to attach the patch, give me any feedback.

Changed 12 years ago by michele

The fix, now _retrieve_dict_from_parent always returns the right dict for the actual parent (a dict if the widget is a compound) the outermost dict that contain our key if the widget is simple, hope the logic is implemented in the right way

comment:4 Changed 12 years ago by alberto

  • Cc michele removed

Great! I've committed you last patch at r824 and updated the tests. It seems to be working fine except for the case of mixed validators which I haven't tested yet. Specially when a Schema validatior has nested Schema validators (not the same as nested widgets having validators). I've got a pile of work so I might not be able to get them today.

Hey! Forget about all this until you finish your exam! ;)

Many thanks! Alberto

P.S. Keeping this open until I get to test the nested schema stuff

comment:5 Changed 12 years ago by anonymous

  • Priority changed from high to normal

comment:6 Changed 12 years ago by alberto

BTW, I forgot to comment that the test I sent was broken, it would alway fail as it was implemented (corrected at the commit) because the Schema needed to have ignore_missing=True so it wouldn't bark when some fields weren't set. Sorry for any time you might have wasted with this :(

Thanks again, Alberto

comment:7 Changed 12 years ago by michele

Great, yep after a while I noticed the missing thing as I said, anyway no problem. :-)

Nested schema? yep, I also wondered if they will work, and I think (and hope) they will work, anyway only your test will say it for sure! :D

comment:8 Changed 12 years ago by michele

Just one thing, in r824 I noticed you added this:

for var in name? + self.class.template_vars]))

in particular name?, what's the intended behavior? just to be sure... ;-)

comment:9 Changed 12 years ago by alberto

Mmmm I'm not so sure that nested schemas will work :( (but shouldn't be talking without a test, though) because a nested schema validator can return a tree aswell, lets say we have:





phone = PhoneNumber?() age = no validator here

fieldset2 fieldset3

and a schema validator attached to form which is:


fieldset1 = Schema:

fieldset1.1 = Schema

fieldset1.1.1 = Schema

phone = validated by it's own widget age = Int()

As you can see, the schema validates the deep buried age and the phone number is validated by it's own widget. The schema validator will only run when the form is validated AFTER the child widgets are validated and will produce a nested dict with the Invalid exception at one of it's leafs. What I don't think will work as expected is that both of this error dicts (the one produced by validate_children and the one produced by the schema) get merged so we see both errors: on phone and on age.

But, as i've said, I'm just speculating... :) I *do* hope it works too, or else we'll have to resort to the "turtlish" recursive_update() of one of my patches at #577 :(

Regarding [name]: it's just so the widget's repr prints something like TableForm?(name="form",...) and it's easier to spot which widget is being processed when printing self while debugging. It used to do this before you removed "name" from the Widget's template_vars so I added it back here.

Regards, Alberto

comment:10 Changed 12 years ago by alberto

Oh my god! that's what happends when you don't preview what you blabber...:

should be:

              phone = PhoneNumber() 
              age = no validator here


and a schema validator attached to form which is:

   fieldset1 = Schema:
      fieldset1.1 = Schema
         fieldset1.1.1 = Schema
             age = Int()
             phone = no validator, already validated by it's widget

comment:11 Changed 12 years ago by michele

Hi Alberto,

I never used nested Schema, anyway we do want both errors to be merged on the same error dict, I will wait for your test and take a look tomorrow but I'm pretty confident that if it doesn't work we can easily fix it.

Regarding name, all ok didn't notice this was under repr, sorry. ;-)

comment:12 Changed 12 years ago by alberto

Hi Michele,

There's a failing test at r843 :( Sorry but I couldn't post before, I'll try to look at it later too.. Regards, Alberto

comment:13 Changed 12 years ago by alberto

I think we could solve this without merging both dicts at each node by merging them at the end of the tree walk. We could accomplish this by changing the "parent" paramteter that goes into each validate by a "state" object (+or- like formencode's) which stored the parent (as it's done now) and a parallel schema errors dict. These two dicts can be recursively merged only once at the end of the walk. What do you think?

comment:14 Changed 12 years ago by michele

Fixed... the test. :P

I tried the same thing yesterday and it worked on the FF test.

The problem: you're asserting an error on the age field, but you haven't defined a validator for it not by using a schema or a validator directly.

  • widgets/tests/test_nested_widgets.py

    315315                    widgets.FieldSet( 
    316316                        name = "sub2",  
    317317                        fields = [ 
    318                             widgets.TextField("name", validator=int_validator), 
    319                             widgets.TextField("age"), 
     318                            widgets.TextField("name"), 
     319                            widgets.TextField("age", validator=int_validator), 
    320320                            widgets.TextField("number", validator=int_validator), 
    321321                        ] 
    322322                    ), 
    357357        assert errors.pop('number', False) 
    359359        errors = errors['sub2'] 
    360         #XXX: The next assertion fails, the widget validartor's errors 
    361         #     from this inner dict are getting lost. 
    362360        assert errors.pop('age', False) 
    363361        assert errors.pop('number', False) 
    364362        assert not errors 

my nosetests output:

michele@ionic:~/Progetti/TurboGears/svn/turbogears/widgets$ nosetests
Ran 59 tests in 2.523s


that was easy. :P But probably I'm missing something?

comment:15 Changed 12 years ago by michele

The reason why errors are not getting lost is that we are merging them node by node, doing it at the end will lost everything because if some keys are nested dict they will endup to not being updated (like a dict with it's update method) but just replaced.

comment:16 Changed 12 years ago by alberto

Maybe I should write some tests for the tests too :D Why the fk do I want to validate a name as an Int()??

Anyway, I'm committing the fix plus a test that tests what I really wanted to test (test, test, test ;). That is, nested Schemas (not nested widgets with mixed validators, which isn't bad to test anyway....).

Haven't looked at the cause yet but seems like the error dict from the schema is not passed along but a Invalid instance is, should be easy to fix.

Regarding the merging without losing keys it's what I implemented with recursive_update() at a patch back at #577, I was doing it at every node which is a waste as if two parallel dicts are built we can merge the only once and at the end. im not sure this is needed yet as this problem hasn't arised yet, as I've said, there's something wrong besides this...

BTW, maybe I'm pushing this too far... do really want to support nested schemas?? Maybe from a consistency POV yes, but from a real use-case POV? (I have none, personally)

comment:17 Changed 12 years ago by simon

Nested schemas can be usefull from reusability stand point. One can define some elemental schemas and than composes everything from that.

comment:18 Changed 12 years ago by alberto

I've committed at r847 some progress on this. It seems that nested error dicts from the widgets are effectively not merged with the nested dict from the schema... I had to build an adapter function to translate the error dict formencode.Schema returns from nested schemas to the one we're used to.

I've also seen that value dicts should be merged too.

Now the question is, how can we implement this in the most effective way? It would be a waste to recursively update at every node on the way back to the root as the the lower levels will be already updated. But we need to take into account too that nested schemas can be arbitrariliy nested.

The best way that comes to my head is to build 4 separate dicts (widget values, widget errors, schema values and schema_errors) and merge them at the end into values and errors. To do this we'll need to do something like the "state" variable I mentioned.

Anyway, I hope you can come up with something easier like you've been doing in the previous patches, this 4 dict thing sounds scary... ;) I might give a try later and write some POC...

Regards, Alberto

Simon: I also have a feeling that nested schemas should be supported... but from a consistency POV, hadn't thought of the reusability stand point but sure makes sense.

comment:19 Changed 12 years ago by alberto

Since r847 both dicts seem to be merged fine. It's done at every node so not very computationally efficient. Should be improved somehow. I'm going to write some tests just to be sure...

comment:20 Changed 12 years ago by alberto

Added another test at r849 for this. It seems values don't need to be recursively updated so I've fixed that.

comment:21 Changed 12 years ago by Ooops

The post where it starts "since r847" should read r848

comment:22 Changed 12 years ago by michele

Well, I tested yesterday nested shema and noticed they will not work, since they error dict we get back from formencode is wrong and not nested, IMHO we shouldn't workaround it in TG but ask Ian and fix it in formencode IMHO.

But thinking a bit about it yesterday and now, I do think we are pushing this thing a bit too much forward, it doesn't make sense supporting nested schema because that's not DRY, at this point we can just trash away the widget own validator and provide it only for compound widgets by supporting a schema, why would someone want to do this thing:

class ContactSchema(Schema):
    name = String()
    age = Int()
    email = Email()

class ContactFields(WidgetsDeclaration):
    name = TextField()
    age = TextField()
    email = TextField()

8 lines

instead of:

class ContactFields(WidgetsDeclaration):
    name = TextField(validator=String())
    age = TextField(validator=Int())
    email = TextField(validator=Email())

4 lines

Schema are good, but just for things like checking two password are corresponding, in the widget case, IMHO widgets should provide one good (and always working right) way of doing this thing and not fifty thousand different way that we need to support doing, I don't even want to encourage something like this, moreover as you can see nested schema are a pain for us to support (for value dict and errors dict) and can have bad impacts on performance ATM, and I even noticed that Schema are already slow them self (try them on the python shell).

Sorry but this is really an unneeded complication IMHO ( 4 dict, a state value, recursive update). I would like to hear kevin opinion on this and in case work on it on a ticket for finding the best solution before submitting anything to svn.

comment:23 Changed 12 years ago by michele

Anyway I will give a try to find a simpler solution (if any exists) just for fun... :D

I had in mind something yesterday if the error dict was right, I could try something using your adapter function, but we should push this fix on formencode because is really wrong to get back a flat dict...

But not before monday probably (I don't think I can come up with something good in 20minutes...). :/

comment:24 Changed 12 years ago by michele

For the moment I think we should revert any change on svn for handling nested schema, just because we should not have failing tests on svn and to discuss the best eventual solution on this ticket before submitting it on svn.

comment:25 Changed 12 years ago by michele

Ok, I still don't like this thing :P anyway I have the solution sitting here in my mind.

At 99,9% I'm sure it will work with just 4/5 lines of code added (and the formencode dict adaption function until we fix this in formencode where it belongs).

I will try to post a patch tomorrow probably later than sooner as I will be away.

Have a nice weekend.

comment:26 Changed 12 years ago by alberto

r850 reverts the changes, I'm attaching a patch for them which probably wont apply clean since I haven're reverted the changes at r844 made by ronald, should be alright if those hunks get removed.

All tests were passing though, but you're are right in that it should have been discussed further. Just though that as 0.9a1 is out the trunk wouldn't be so touchy, I could have been wrong...

Anyway. The 4 dicts proposal was an optimization idea, to postpone the merging of the dicts until the last possible moment. I do think that this should be supported if we're going to support Schemas altogether. just because schemas *could* have nested schemas and it would be disturbing to see that they don't work as expected, or even worse, than they hide validation errors from other widgets

Schemas are good for other things besides chained validators in any semi-complex form, for example, you're able to conditonally validate certain fields (password & passford_confirm) depending on the value of other fields (a change_password flag) thanks to the state variable. This is a real use-case.

As I've said, my strongest argument is for consistency's sake, so Schema validators behave as schema validators (that is, they can contain arbitrarily nested schemas).

Anyway, here's the patch... feel free to improve in any way you like as you're more knowledgeable of the widget internals ;) it should be Kevin's decision wether this gets in or not. All the tests pass, just needs some optimization at the merging of dicts, most of the bulk are tests so don't panic ;), the CompundWidget? is only touched in two lines and two extra utility functions.

Regarding formencode's patching I'm not so sure about that, I *think* this is FormEncode's behaviour and might be for a good reason (don't know who it integrates in other framework so cannot say for sure). I think the correct thing to do is to adapt it to our nested error dicts

Have a nice weekend too ;) Alberto

Changed 12 years ago by alberto

the patch. Should apply cleanly as I've removed ronald's hunk

Changed 12 years ago by alberto

last patch was broken, this is the good one

comment:27 Changed 12 years ago by simon

I think recursion is the clearest way. We can always optimise later if performance becomes an issue (if need be even in C).

comment:28 Changed 12 years ago by alberto

Hi, I'm attaching an optimized version. This one only incurs in the overhead of recursively updating dicts if we need to merge a Schema (possibly) nested error dict. It adds a flag parameter to _set_errors() and _update_error_dict_from_parent(). This means that if no Schema validators are used the overhead is minimal (only an extra if clause).

Regarding the optimized C version, I was surprised I didn't find any built in function or external library that implemented this kind of functionality (I haven't looked really so no wonder ;). I guess that recursively updating nested dicts should be a fairly common necessity... If anyone knows about one please tell me :)


Changed 12 years ago by alberto

optimized version

Changed 12 years ago by alberto

v3 with stricter tests to check that errors from one level don't polute other levels

comment:29 Changed 12 years ago by michele

Hi all,

First of all, excuse me if I sounded a bit harsh yesterday but I was coming back from a nervous football match, I'm really sorry! :-(

Alberto, I also think as you and Simon said that for doing this thing the only solution is using a recursive function, your last patch is very similar to what I had in mind and I will build my patch upon it since I was going to also use a flag as you have done (what about modifying the adapter function to return a flag if the error dict is nested? that's the only case we should really cover, right?).

And also sorry for the svn/test thing, I hadn't tried the latest version yet when commenting (I was at 846) and didn't notice test were passing.

Finally, I do think we should support also this type of validation schema since people have different opinion and someone will need it for sure at some point, it doesn't matter if I like or not it. :D

Sorry again, I will try my approach ASAP.


Changed 12 years ago by michele


comment:30 Changed 12 years ago by michele

Ok, I've posted my patch.

I also renamed some functions and implemented the nested "signal" by the adapt function.

The trick I used is to recursively pass to every update_dict_by_parent a computed parent and it's associated dict, at the end of every recursion the dict is no more nested and hence treated as any other error dict by our update_dict_by_parent function.

In other words I'm just simulating a children validation by iterating the nested error dict and taking advantage of the logic we already implemented in the update_dict_by_parent function.

What do you think guys? any suggestion for the keys_to_remove list? I use it since you can't delete a dict key while you're iterating over it.

comment:31 Changed 12 years ago by alberto

Hi Michele,

No worry, we all know how excited Italians get over Calcio ;) ... I've looked at your patch and looks very similar to mine. You've moved the nested-detection flag to adapt_schema_error_dict, which will optimize the most common case (I think) which is when schema error dicts are not nested, and you've removed the tail recursion from recursive_update by unrolling that logic at the end of set_errors.

However, this time I like my patch better :) I think recursive_update is more clear, as most recursive algorithms are, and, being a separate function, easier to optimize if the need arises. However, it's normal that I see it clearer as I've wrote it :)

I propose to use your adapt_schema_error dict and my recursive_update (probably unbinding it from the widget for easier reusal). Whatever we do I think should get done ASAP so we can move to something else as we're spending too much time on this :)

I'll post a patch in 3 seconds ;)

Regards, Alberto

Changed 12 years ago by alberto

how about this?

comment:32 Changed 12 years ago by simon

What the hell, I'll jump into the fray as well ;)

Optimisation (haven't actually timed it, at the very least it's less code ;)) of Michele's code:

def _update_dict_by_parent(self, orig_dict, new_dict, parent,
        # Avoid any computation if new_dict is empty
        if new_dict:
            temp_dict = orig_dict
            for name, skip in itertools.ifilterfalse(itemgetter(1), parent):
                if not temp_dict.has_key(name):
                    temp_dict[name] = {}
                temp_dict = temp_dict[name]
            if nested:
                for (key, value) in filter(lambda (key, value):
                            isinstance(value, dict), new_dict.iteritems()):
                    parent.append((key, False))
                    self._update_dict_by_parent(orig_dict, value, parent, True)
                    del new_dict[key]

itemgetter is a Py2.4 thing, so it would have to be manually implemented for 2.3, but that's trivial.

comment:33 Changed 12 years ago by simon

For the record, I'm still in favour of a straight-forward recursion.

comment:34 Changed 12 years ago by michele

Hi all,

yes we (Italians) get too much excited with Calcio probably, I shouldn't hack right after a football game, sorry again. :D

Well back to this issue, your is cleaner for sure, I don't have any problem with it.

I was just wondering if it cover all use case, the reason I used the update_dict_by_parent is that in this way I'm sure that while updating recursively we don't loose any other nested dict already present (by leveraing the same logic we use for every other update). As I said my way just simulates a sequential children validation by flattering backward the nested dict and providing the right parent to use, if just feels more coherent to what we are already doing, but as you said they are quite similar.

Anyway I'm not sure if we can experience this corner case...

I can also avoid the keys_to_remove list, I can post an updated patch in 1 hour.

Anyway do what you think is the best, whitin 1 hour I can probably test if the problem I mentioned can arise (calling recursively_update on a dict that's not empty).

comment:35 Changed 12 years ago by alberto

Hi all,

The recursive_update should handle empty dicts properly, it's just a regular (slow) update():

for k, v in d.iteritems(from):
   to[k] = v

+ a recursive call on itself to handle the case when we're updating a dict with another dict.

It can even be used with any two dicts that need to be recursively updated, no particular dependency on error dicts.

But of course, you're free to test it just to make sure (though I'm pretty confident as, surprsingly enough, I found an almost identical version when looking for something to get this job done faster  http://mail.python.org/pipermail/python-list/2001-December/077009.html).

Anyway, I really don't care which version gets in as long as it passes the tests and it's the most efficient and clearer solution, which, in this case, i think mine is ;) (at least clearer, and efficency-wise more or less the same). But I have no problem conceding if there is a better choice, which in other cases, you're solution was obviously simpler and more elegant.

Simon, you're always welcome to the fray, the more people involved the better :)

Let's please settle this down and close this ticket and move to something else, this is getting boring :)

Regards, Alberto

comment:36 Changed 12 years ago by michele

Ok, I checked the corner case I was talking about and it's being handled in the right way, feel free to commit your version.

Just for the sake of completeness, the corner case I'm talking about is this (not a empty dict but a not empty dict):

There are situation where update_errors_dict_from_parent is called with a base_error_dict that already contains a series of nested errors generated from children validation (for example you can check this by removing if_empty=None from int_validator), my solution is to call update_errors_dict_from_parent for every key like this and passing a constructed parent, this case is then handled by just using the python built-in update function of a dict (temp_dict.update(errors_dict)) and the initial logic that you can see in the update_errors_dict_from_parent function.

Your solution also works since you're always iterating every single key of the dict and every nested dict recursively and replacing it, you're just using a customized update function for a dict.

At this point the only difference that I can see could reside in the two update functions being used, yours is in python, the python one (that I'm using) is implemented in python or CPython and hence already optimized? Anyway that's not a big deal...

I would still like to rename the two functions as I've done in my first patch tough:

_retrieve_dict_from_parent -> _get_dict_by_parent _update_errors_dict_from_parent -> _update_dict_by_parent (with renaming of base_errors_dict to orig_dict and errors_dict to new_dict)

What do you think?

Sorry again.


comment:37 Changed 12 years ago by alberto

Ok,I'll rename those funcs and variables and commit it sometime this afternoon, now I'm going to have lunch :)

I'll keep the recursive_update as I really find it much clearer, and can also be used in other dicts if needed. I think this solution also respects you're original algorithm structure better (was one of my design goals), it really only applies recursive update in case we're expecting a nested_error dict from the schema.

So that said, if this solution is ok with you (both) just leave me unreplied and I'll close this later when I get back.

Thanks both for the help! Alberto

comment:38 Changed 12 years ago by alberto

  • Status changed from new to closed
  • Resolution set to fixed

I've committed 5 at r856 with the variable and function renames. Closing this at last :)

comment:39 Changed 12 years ago by simon

Why not move recursive_update to util, as it's completely generic and may be of use elsewhere?

comment:40 Changed 12 years ago by alberto

good idea, I've done it at r858 thanks

comment:41 Changed 12 years ago by michele

Great work Alberto, I think that now widgets support any schema/validator combination! ;)

comment:42 Changed 12 years ago by alberto

Im still not sure... Maybe we should reopen this? :D

Note: See TracTickets for help on using tickets.