Between query equivalent on App Engine datastore?

2019-03-29 18:24发布

I have a model containing ranges of IP addresses, similar to this:

class Country(db.Model):
  begin_ipnum = db.IntegerProperty()
  end_ipnum = db.IntegerProperty()

On a SQL database, I would be able to find rows which contained an IP in a certain range like this:

SELECT * FROM Country WHERE ipnum BETWEEN begin_ipnum AND end_ipnum

or this:

SELECT * FROM Country WHERE begin_ipnum < ipnum AND end_ipnum > ipnum

Sadly, GQL only allows inequality filters on one property, and doesn't support the BETWEEN syntax. How can I work around this and construct a query equivalent to these on App Engine?

Also, can a ListProperty be 'live' or does it have to be computed when the record is created?

question updated with a first stab at a solution:

So based on David's answer below and articles such as these:

http://appengine-cookbook.appspot.com/recipe/custom-model-properties-are-cute/

I'm trying to add a custom field to my model like so:

class IpRangeProperty(db.Property):
  def __init__(self, begin=None, end=None, **kwargs):
    if not isinstance(begin, db.IntegerProperty) or not isinstance(end, db.IntegerProperty):
        raise TypeError('Begin and End must be Integers.')
    self.begin = begin
    self.end = end
    super(IpRangeProperty, self).__init__(self.begin, self.end, **kwargs)

  def get_value_for_datastore(self, model_instance):
    begin = self.begin.get_value_for_datastore(model_instance)
    end = self.end.get_value_for_datastore(model_instance)
    if begin is not None and end is not None:
      return range(begin, end)

class Country(db.Model):
  begin_ipnum = db.IntegerProperty()
  end_ipnum = db.IntegerProperty()
  ip_range = IpRangeProperty(begin=begin_ipnum, end=end_ipnum)

The thinking is that after i add the custom property i can just import my dataset as is and then run queries on based on the ListProperty like so:

q = Country.gql('WHERE ip_range = :1', my_num_ipaddress)

When i try to insert new Country objects this fails though, complaning about not being able to create the name:

...
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/db/__init__.py", line 619, in _attr_name
return '_' + self.name
TypeError: cannot concatenate 'str' and 'IntegerProperty' objects

I tried defining an attr_name method for the new property or just setting self.name but that does not seem to help. Hopelessly stuck or heading in the right direction?

3条回答
戒情不戒烟
2楼-- · 2019-03-29 18:51

This is odd, because I'm trying to figure out the same basic premise and running into all kinds of problems (and based on the 130K record number, assume you're using MaxMind Geo Country file). Although David's answer would work easy enough, it doesn't work in practice. This from Google:

First, if a query has multiple inequality filters on a given property, an entity will only match the query if it has an individual value for that property that matches all of the inequality filters. For example, if an entity has the values [1, 2] for property x, it will not match the query WHERE x > 1 AND x < 2. Each filter does match one of x's values, but there's no single value that matches both filters.

I batch uploaded the entire data set creating the listproperty field to attempt to no avail.

So any additional help would be greatly appreciated!

查看更多
小情绪 Triste *
3楼-- · 2019-03-29 18:54

Short answer: Between queries aren't really supported at the moment. However, if you know a priori that your range is going to be relatively small, then you can fake it: just store a list on the entity with every number in the range. Then you can use a simple equality filter to get entities whose ranges contain a particular value. Obviously this won't work if your range is large. But here's how it would work:

class M(db.Model):
    r = db.ListProperty(int)

# create an instance of M which has a range from `begin` to `end` (inclusive)
M(r=range(begin, end+1)).put()

# query to find instances of M which contain a value `v`
q = M.gql('WHERE r = :1', v)

The better solution (eventually - for now the following only works on the development server due to a bug (see issue 798). In theory, you can work around the limitations you mentioned and perform a range query by taking advantage of how db.ListProperty is queried. The idea is to store both the start and end of your range in a list (in your case, integers representing IP addresses). Then to get entities whose ranges contain some value v (i.e., between the two values in your list), you simply perform a query with two inequality filters on the list - one to ensure that v is at least as big as the smallest element in the list, and one to ensure that v is at least as small as the biggest element in the list.

Here's a simple example of how to implement this technique:

class M(db.Model):
    r = db.ListProperty(int)

# create an instance of M which has a rnage from `begin` to `end` (inclusive)
M(r=[begin, end]).put()

# query to find instances of M which contain a value `v`
q = M.gql('WHERE r >= :1 AND r <= :1', v)
查看更多
倾城 Initia
4楼-- · 2019-03-29 18:54

My solution doesn't follow the pattern you have requested, but I think it would work well on app engine. I'm using a list of strings of CIDR ranges to define the IP blocks instead of specific begin and end numbers.

from google.appengine.ext import db    
class Country(db.Model):
    subnets = db.StringListProperty()
    country_code = db.StringProperty()

c = Country()
c.subnets = ['1.2.3.0/24', '1.2.0.0/16', '1.3.4.0/24']
c.country_code = 'US'
c.put()

c = Country()
c.subnets = ['2.2.3.0/24', '2.2.0.0/16', '2.3.4.0/24']
c.country_code = 'CA'
c.put()

# Search for 1.2.4.5 starting with most specific block and then expanding until found    
result = Country.all().filter('subnets =', '1.2.4.5/32').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.4/31').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.4/30').fetch(1)
result = Country.all().filter('subnets =', '1.2.4.0/29').fetch(1)
# ... repeat until found
# optimize by starting with the largest routing prefix actually found in your data (probably not 32)
查看更多
登录 后发表回答