Efficient way to store relation values in NDB

Posted 2019-03-03 09:59

Question:

I have this data model (I designed it myself, so if there's a better way to do it, please let me know). Basically, a Club can have many Courses. Now I want to know all the members and instructors of a Club. Members and instructors are stored in the Course model, and the Club holds references to its Courses. See the code:

from google.appengine.ext import ndb

class Course(ndb.Model):
    ...
    instructor_keys = ndb.KeyProperty(kind="User", repeated=True)
    member_keys = ndb.KeyProperty(kind="User", repeated=True)

    @property
    def instructors(self):
        return ndb.get_multi(self.instructor_keys)

    @property
    def members(self):
        return filter(lambda x: x.is_active, ndb.get_multi(self.member_keys))

    def add_instructor(self, instructor):
        if instructor.key not in self.instructor_keys:
            self.instructor_keys.append(instructor.key)
            self.put()

    def rm_instructor(self, instructor):
        if instructor.key in self.instructor_keys:
            self.instructor_keys.remove(instructor.key)
            self.put()

    def add_member(self, member):
        if member.key not in self.member_keys:
            self.member_keys.append(member.key)
            self.put()

    def rm_member(self, member):
        if member.key in self.member_keys:
            self.member_keys.remove(member.key)
            self.put()

and

class Club(ndb.Model):
    ...
    owners_keys = ndb.KeyProperty(kind="User", repeated=True)
    course_keys = ndb.KeyProperty(kind="Course", repeated=True)


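    @property
    def members(self):
        # TODO: is this correct? efficient?
        # Note: this issues one get_multi per course, and a user enrolled
        # in several courses appears more than once in the result.
        l_members = []
        for course in self.courses:
            l_members.extend(course.members)
        return l_members

    @property
    def trainers(self):
        # Course exposes its trainers as `instructors`, not `trainers`.
        l_trainers = []
        for course in self.courses:
            l_trainers.extend(course.instructors)
        return l_trainers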

    @property
    def owners(self):
        return ndb.get_multi(self.owners_keys)

    @property
    def courses(self):
        return filter(lambda x: x.is_active, ndb.get_multi(self.course_keys))

    def add_owner(self, owner):
        if owner.key not in self.owners_keys:
            self.owners_keys.append(owner.key)
            self.put()

    def rm_owner(self, owner):
        if owner.key in self.owners_keys:
            self.owners_keys.remove(owner.key)
            self.put()

    def add_course(self, course):
        if course.key not in self.course_keys:
            self.course_keys.append(course.key)
            self.put()

    def rm_course(self, course):
        if course.key in self.course_keys:
            self.course_keys.remove(course.key)
            self.put()

    def membership_type(self, user):
        if user.key in self.owners_keys:
            return "OWNER"
        elif user in self.members:
            return "MEMBER"
        elif user in self.trainers:
            return "TRAINER"
        else:
            raise Exception("The user %s is not in the club %s" % (user.id, self.id))

Now, the @property methods on Course seem OK to me (am I right?), but the ones on Club seem very inefficient: every time, I have to iterate over all the Courses to compute the members and trainers. On top of that, these properties are not cached but recomputed on every access, so this is very costly.

Any suggestions? I was thinking about also keeping instructors and members as lists of keys on Club, and updating the Club every time I add someone to a Course, but I'm not sure that's correct.

PS: Is there also a better way to filter the result of ndb.get_multi?

Answer 1:

I'd try to normalize your model instead of going for a de-normalized one:

class CourseInscription(ndb.Model):
    member = ndb.KeyProperty(kind='User', required=True)
    course = ndb.KeyProperty(kind='Course', required=True)
    is_active = ndb.BooleanProperty(default=True)

Then, you can just add something like

class Course(ndb.Model):
    # all the stuff already there

    @property
    def members(self):
        return CourseInscription.query(CourseInscription.course == self.key)

In general, I prefer to just return the query and let the caller decide whether to run it directly or add more filtering/sorting, instead of calling ndb.get_multi inside the property.

Another nice touch is to build the id of the relational entity from the keys of the two entities it links, so I can check for existence with a get by id instead of having to query:

class CourseInscription(ndb.Model):
    # all the other stuff

    @staticmethod
    def build_id(user_key, course_key):
        return '%s/%s' % (user_key.urlsafe(), course_key.urlsafe())

# somewhere I can create an inscription like

CourseInscription(
    id=CourseInscription.build_id(user_key, course_key),
    member=user_key, course=course_key, is_active=True
).put()

# somewhere else I can check if a User is in a Course by just getting... no queries needed
if ndb.Key(CourseInscription, CourseInscription.build_id(user_key, course_key)).get():
    pass  # then the user is in the course!
else:
    pass  # then they are not
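
To see why the composite id avoids a query, here is a minimal sketch that runs without App Engine. The `store` dict is only a stand-in for the datastore, and `enroll` and `is_enrolled` are hypothetical helper names that do not come from the answer; only the `'%s/%s'` id scheme is taken from `build_id` above.

```python
# In-memory stand-in for the datastore: maps entity id -> entity data.
# (Assumption for illustration; in NDB the lookup would be a get by key.)
store = {}


def build_id(user_id, course_id):
    """Compose a deterministic id from the two linked ids."""
    return '%s/%s' % (user_id, course_id)


def enroll(user_id, course_id):
    """Create (or overwrite) the inscription stored under its composite id."""
    store[build_id(user_id, course_id)] = {
        'member': user_id, 'course': course_id, 'is_active': True}


def is_enrolled(user_id, course_id):
    """Existence check is a single lookup by id, not a scan or a query."""
    return build_id(user_id, course_id) in store


enroll('alice', 'yoga-101')
enroll('alice', 'yoga-101')  # same id, so this overwrites instead of duplicating
```

Because the id is derived from the pair, writing the same pair twice cannot create a duplicate row. The scheme relies on the separator never appearing inside either part, which should hold for urlsafe key strings.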


Answer 2:

It makes perfect sense for the members and instructors to belong to the club in addition to being referenced by the courses. Thinking of the use case, a member would need to join the club before joining a course offered by the club (though both may happen at the same time, of course). It is also significantly more efficient to have the members/instructors referenced directly in the club rather than having to traverse the courses each time. The trade-off is additional code to maintain, but that seems justified in this case.
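
The extra maintenance mostly comes down to keeping the two lists in sync. The sketch below uses plain Python classes as stand-ins for the NDB models, with no datastore calls, and it assumes a user should leave the club-level list once no course lists them any more; that rule is an assumption, not something stated above. In real NDB code the two writes would presumably need to happen in one transaction.

```python
class Course(object):
    """Stand-in for the Course entity; only the member key list is modelled."""

    def __init__(self):
        self.member_keys = []


class Club(object):
    """Stand-in for the Club entity with a denormalized member key list."""

    def __init__(self, courses):
        self.courses = courses
        self.member_keys = []  # union of the member keys of all courses

    def add_member(self, course, member_key):
        if member_key not in course.member_keys:
            course.member_keys.append(member_key)
        if member_key not in self.member_keys:
            self.member_keys.append(member_key)

    def rm_member(self, course, member_key):
        if member_key in course.member_keys:
            course.member_keys.remove(member_key)
        # Only drop the club-level reference once no course lists the member.
        still_enrolled = any(member_key in c.member_keys for c in self.courses)
        if not still_enrolled and member_key in self.member_keys:
            self.member_keys.remove(member_key)


yoga, judo = Course(), Course()
club = Club([yoga, judo])
club.add_member(yoga, 'alice')
club.add_member(judo, 'alice')
club.rm_member(yoga, 'alice')
after_first_removal = list(club.member_keys)   # alice is still in judo
club.rm_member(judo, 'alice')
after_second_removal = list(club.member_keys)  # alice has left every course
```

Reading the club's members is then a single list access, while the cost moves to the write path, where removal has to check the remaining courses.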

Regarding the active course filter, you may wish to consider an 'active' course list and an 'inactive' course list for the club: once a course is no longer offered, move it to the 'inactive' list (and remove it from the 'active' list).
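
A minimal sketch of the two-list idea, again with a plain Python class standing in for the NDB model. The attribute names `active_course_keys` and `inactive_course_keys` and the method `retire_course` are hypothetical.

```python
class Club(object):
    """Stand-in for the Club entity with separate active/inactive course lists."""

    def __init__(self):
        self.active_course_keys = []
        self.inactive_course_keys = []

    def add_course(self, course_key):
        if course_key not in self.active_course_keys:
            self.active_course_keys.append(course_key)

    def retire_course(self, course_key):
        # Move the key from the active list to the inactive list.
        if course_key in self.active_course_keys:
            self.active_course_keys.remove(course_key)
            self.inactive_course_keys.append(course_key)


club = Club()
club.add_course('yoga-101')
club.add_course('judo-201')
club.retire_course('yoga-101')
```

With this layout, listing the active courses no longer requires fetching every course and filtering on `is_active` afterwards.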