Warning: Can't synchronize with repository "(default)" (Unsupported version control system "svn": No module named svn). Look in the Trac log for more information.

Ticket #1879 (closed enhancement: wontfix)

Opened 11 years ago

Last modified 10 years ago

Identity should use a sql query to get the user's permissions

Reported by: Janzert Owned by: anonymous
Priority: normal Milestone: 1.1.x bugfix
Component: Identity Version: 1.1 HEAD
Severity: minor Keywords:
Cc:

Description

Currently Identity iterates through all of a user's groups and individually pulling all the permissions for each group and then adding them to a set before returning them.

It should instead use an appropriate query that will return the unique permissions from all the user's groups.

Christoph Zwerschke suggested two SA methods for doing this:

def permissions(self):

return set(Permission.query.distinct().join(['groups',

'users']).filter_by(user_id=self.user_id))

def permissions(self):

return set(Permission.query.filter(Permission.groups.any(

Group.users.any(User.user_id==self.user_id))).all())

The disadvantage of the first is that the join will gather all the duplicate permissions and then strip them out with the distinct clause. This is probably slower than the second method that uses an exist clause.

The problem with the second is that it is not compatible with SA 0.3.x.

Attachments

dbtest.zip Download (100.1 KB) - added by Janzert 11 years ago.
quickstarted project with scripts to test query speeds.

Change History

comment:1 Changed 11 years ago by Janzert

  • Type changed from defect to enhancement

comment:3 Changed 11 years ago by Janzert

Before working on trying to get a SA 0.3 compatible statement I decided to check the speed difference of the different options. To my surprise the current method of iterating through the groups in python was much faster.

python -m timeit -s"import perm_test;perm_test.iter()" "perm_test.iter()" 10000 loops, best of 3: 94.9 usec per loop

python -m timeit -s"import perm_test;perm_test.join()" "perm_test.join()" 100 loops, best of 3: 13.1 msec per loop

python -m timeit -s"import perm_test;perm_test.exist()" "perm_test.exist()" 100 loops, best of 3: 14.7 msec per loop

These were done using SA 0.5 and postgresql 8.3.1. I also tested with sqlite and the results were very similiar.

In order to test I initialized the database with one user having 20 groups each group having 11 out of 60 permissions. I will attach the project with the scripts I used to test. To initialize the database first do "tg-admin sql create" then "python init.py". Then the timings can be tested using the commandlines above. You can also vary the number of groups, permissions and permissions per group in the init.py script.

Unless someone else shows why my timing method is wrong or comes out different for them I'll go ahead and close this in a few days.

Changed 11 years ago by Janzert

quickstarted project with scripts to test query speeds.

comment:4 Changed 11 years ago by chrisz

I can reproduce the numbers with your test, but as I wrote in the thread above, the test is not realistic because it queries the permissions in a loop in the same database session, where the group.permissions are all kept in memory. In reality, the requests run in different sessions and need to requery the database. If you simulate this using turbogears.database.session.expire_all(), then the results reverse - the old method is then 2-3 times slower. As suggested in the thread above, we may consider some caching to cover the case that permissions are queried multiple times in the same database session.

I can also reproduce that the second method using exists is somewhat slower than the method using join (no matter whether I use SQLite or Postgres 8.3, or whether I add additional indices and ANALYZE the tables), contrary to what we expected. Maybe this is because the database cannot optimize the chained exists clause for using indices.

comment:5 Changed 11 years ago by chrisz

This is the SQL query used in the first method:

SELECT DISTINCT permission_name FROM permission
JOIN group_permission USING (permission_id)
JOIN user_group USING (group_id)
WHERE user_id = $1

And this is the SQL query used in the second method:

SELECT permission_name FROM permission
WHERE EXISTS (SELECT 1 FROM group_permission
WHERE permission_id = permission.permission_id
AND EXISTS (SELECT 1 FROM user_group
WHERE group_id = group_permission.group_id
AND user_id = $1))

(The actual queries issued by SQLAlchemy are a bit more complicated because tg_user and tg_group are joined which is actually not necessary).

I tested these with PostgreSQL 8.3 and even larger numbers of groups and permissions, and it turned out that the second query is indeed much slower (of course I did an analyze on the database so that it can do an index scan).

I then did another test with the following third SQL query:

SELECT permission_name FROM permission
WHERE permission_id IN (SELECT permission_id FROM group_permission
WHERE group_id IN (SELECT group_id FROM user_group
WHERE user_id = $1))

It turned out that this query is even faster than the others, contrary to all we learned in school about how bad "in" is compared with "exists".

Does anybody have a good explanation for this and an idea how to elegantly express the last query with SQLAlchemy?

comment:6 Changed 11 years ago by Janzert

It's certainly not anything close to elegant, but at least to get started here is a method that will use the "in" query to get the permissions.

    @property
    def in_perms(self):
        from sqlalchemy import sql
        g_ids = sql.select([user_group_table.c.group_id],
                user_group_table.c.user_id == self.user_id)
        p_ids = sql.select([group_permission_table.c.permission_id],
                group_permission_table.c.group_id.in_(g_ids))
        return set(Permission.query.filter(
            Permission.permission_id.in_(p_ids)))

Also only tested with 0.5 and guessing that it needs >=0.4.

It does slightly beat out the join query (~14ms to ~15.5ms).

Also as a side note, explicitly caching the permissions results in it being about 10,000 times faster for subsequent checks. I'd be a bit worried about ending up with stale results if someone hangs onto a User object long term though.

comment:7 Changed 11 years ago by chrisz

That doesn't look so bad. I find it a bit more readable if you rewrite it as below (and this also works with SA 0.3.10):

from sqlalchemy.sql import select

...

    @property
    def permissions(self):
        perm = permissions_table.c
        group_perm = group_permission_table.c
        user_group = user_group_table.c
        return set(Permission.query.filter(perm.permission_id.in_(
            select([group_perm.permission_id], group_perm.group_id.in_(
                select([user_group.group_id],
                    user_group.user_id == self.user_id))))))

The database sessions usually last only one request, so I wouldn't worry about stale results. It may be even an advantage if permissions cannot change during one request.

comment:8 Changed 11 years ago by Janzert

Yes, your formulation is certainly better.

I also would like to change to returning a frozenset to emphasize that this is a read-only view of the permissions.

Changing that and adding caching it ends up as:

    @property
    def permissions(self):
        """
        Get the User's permissions.
        This just returns a static view of the permissions and cannot be used to
        change them.
        """
        try:
            return self._permissions
        except AttributeError:
            perm = permissions_table.c
            group_perm = group_permission_table.c
            user_group = user_group_table.c
            p = frozenset(Permission.query.filter(perm.permission_id.in_(
                select([group_perm.permission_id], group_perm.group_id.in_(
                    select([user_group.group_id],
                        user_group.user_id == self.user_id))))))
            self._permissions = p
            return p

If that looks good to everyone I'll work on getting a patch together for the quickstart templates. I'll be out of town for the latter half of the week though so it'll probably be next week before I have anything.

comment:9 Changed 11 years ago by chrisz

Returning a frozenset makes sense (the identity methods already do that).

We'll probably not want to patch this in TG 1.0 (only critical bugfixes there), so you should base your patch on TG 1.1/1.5.

Btw, we should also make a similar change to the SQLObject model.

comment:10 Changed 11 years ago by Janzert

Yep, I'm working with a checkout of the 1.1 branch.

I can't find any way to get access to the intermediate tables in order to use sqlobject's sqlbuilder. Raw sql can be used, which gives the following:

    _get_permissions(self):
        try:
            return self._permissions
        except AttributeError:
            p = frozenset(Permission.select("""permission.id IN (
                SELECT group_permission.permission_id FROM group_permission
                WHERE group_permission.group_id IN (
                    SELECT user_group.group_id FROM user_group, tg_user
                    WHERE user_id = tg_user.id))"""))
            self._permissions = p
            return p

This seems like fairly straight up sql but I wonder if it will run into database compatibility problems?

I also still need to take a look at an Elixir version, but hopefully that can be based off the sqlalchemy one.

comment:11 Changed 11 years ago by Janzert

The following works for Elixir but is quite ugly to get a hold of the intermediary tables.

    @property
    def permissions(self):
        perm = Permission.c
        user_group = self._descriptor.properties['groups'].secondary.c
        group_perm = Permission._descriptor.properties['groups'].secondary.c
        p = frozenset(Permission.query.filter(perm.permission_id.in_(
            select([group_perm.permission_permission_id],
                group_perm.tg_group_group_id.in_(
                    select([user_group.tg_group_group_id],
                        user_group.tg_user_user_id == self.user_id))))))
        self._permissions = p
        return p

I'm thinking at this point that unless someone can find a cleaner way to do this for sqlobject and elixir, maybe this should only be changed in sqlalchemy.

comment:12 Changed 11 years ago by faide

  • Milestone changed from 1.5 to 1.1

comment:13 Changed 11 years ago by faide

  • Milestone changed from 1.1 to 1.1 maintenance

Is that worth the effort? Did someone bench the real implementation to get an idea?

comment:14 Changed 10 years ago by Chris Arndt

I'd say if nobody comes up with an implementation soon, then close this ticket. It's working as it is after all. And if somebody really needs more speed, it's always possible to write a custom identity provider.

comment:15 Changed 10 years ago by Janzert

I basically gave up on this because there doesn't appear to be a clean way to make the queries in Elixir or SQLObject. I could make a patch for just SA if wanted, otherwise this should just be closed.

comment:16 Changed 10 years ago by Janzert

  • Status changed from new to closed
  • Resolution set to wontfix

Marking wontfix as no one has enough interest to make the needed changes and it's likely a very small effect in most applications anyway.

Note: See TracTickets for help on using tickets.