Google Cloud Datastore: storing only unique entities

Posted 2019-07-03 09:39

Question:

I am trying to learn NoSQL with Google Datastore but I am running into a problem with uniqueness.

Consider an ecommerce store that has categories and products.

You do not want two products with the same SKU in the database.

So I insert an entity with JSON:

{"sku": 1234, "product_name": "Test product"}

And it shows up with two fields. But then I can do that again and I have two or more identical products.

How do you avoid this? Can you make the sku field unique?

Do I need to do a query before insert?

The same issue arises with categories. Should I just use one entity for ALL my categories and structure it in my JSON?

What is a good common practice here?

Answer 1:

Create a new kind called 'sku'. When you create a new product, you'll want to do a transactional insert of both the product entity and the sku entity.

For example, let's say you want to add a new product as an entity of kind product with the id abc:

  • "product/abc" = {"sku": 1234, "product_name": "Test product"}

To ensure uniqueness on the property "sku", you'll always want to insert an entity with the kind name sku and the id that equals the property's value:

  • "sku/1234" = {"created": "2017-05-11"}

The above example entity has a property for created date - just something optional I threw in as part of the example.

Now, as long as you insert both of these as part of the same transaction, you will be ensuring that the "sku" property has a unique value. This works because:

  • An insert (as opposed to an upsert) fails if the sku entity for that number already exists.
  • The transaction makes writing the product entity (with the sku value) and the sku entity atomic, so if the sku isn't unique, writing the sku entity fails, which causes the product entity write to fail as well.
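
As a rough illustration of the two points above, here is a toy in-memory model in Python. It is not the Datastore client API: ToyStore, insert_all, add_product and AlreadyExists are hypothetical names made up for this sketch. The only assumptions are that an insert fails on an existing key and that a transaction applies all of its writes or none of them.

```python
class AlreadyExists(Exception):
    """Raised when an insert targets a key that is already present."""


class ToyStore:
    """Hypothetical in-memory stand-in for a keyed entity store."""

    def __init__(self):
        self.entities = {}  # maps (kind, id) -> dict of properties

    def insert_all(self, writes):
        """Insert every (key, properties) pair, or none of them.

        Models a transaction made only of inserts: if any key already
        exists, nothing is written and AlreadyExists is raised.
        """
        keys = [key for key, _ in writes]
        if len(set(keys)) != len(keys):
            raise AlreadyExists("duplicate key within the same batch")
        for key in keys:
            if key in self.entities:
                raise AlreadyExists(key)
        # All checks passed, so it is safe to apply every write.
        for key, props in writes:
            self.entities[key] = dict(props)


def add_product(store, product_id, sku, product_name):
    """Write the product and its sku marker entity together."""
    store.insert_all([
        (("product", product_id), {"sku": sku, "product_name": product_name}),
        (("sku", sku), {"created": "2017-05-11"}),
    ])


store = ToyStore()
add_product(store, "abc", 1234, "Test product")

try:
    add_product(store, "xyz", 1234, "Another product")  # same SKU again
    duplicate_rejected = False
except AlreadyExists:
    duplicate_rejected = True

print(duplicate_rejected)                    # True
print(("product", "xyz") in store.entities)  # False
```

The second product never reaches the store because its sku marker collides with the existing one, and the whole batch is dropped with it.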


Answer 2:

You can use "sku" as an "id" (if it's a number) or "name" (if it's a string) for your entity, instead of storing "sku" as a property. Then it's guaranteed to be unique as it becomes part of the unique entity key.
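
A small sketch of this idea, using a plain Python dict in place of Datastore (product_key, upsert, insert and the entities dict are hypothetical names for illustration only). One caveat worth keeping in mind: a write that overwrites by key will silently replace an existing product, so rejecting a duplicate still needs an insert-style write or a check inside a transaction.

```python
def product_key(sku):
    """Build a (kind, identifier) key from the SKU.

    An int plays the role of a numeric id, a str the role of a name.
    """
    if isinstance(sku, bool) or not isinstance(sku, (int, str)):
        raise TypeError("sku must be an int or a str")
    return ("product", sku)


entities = {}  # maps key -> dict of properties


def upsert(key, props):
    """Write the entity, overwriting whatever is stored under the key."""
    entities[key] = dict(props)


def insert(key, props):
    """Write the entity only if the key is not taken yet."""
    if key in entities:
        raise KeyError("entity already exists: %r" % (key,))
    entities[key] = dict(props)


# Two upserts with the same SKU leave a single entity behind.
upsert(product_key(1234), {"product_name": "Test product"})
upsert(product_key(1234), {"product_name": "Renamed product"})
print(len(entities))  # 1

# An insert with the same SKU is rejected instead of overwriting.
try:
    insert(product_key(1234), {"product_name": "Duplicate product"})
    rejected = False
except KeyError:
    rejected = True
print(rejected)  # True
```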



Answer 3:

Data modeling is a big subject, but in my opinion there are two approaches you can choose from. This is more fundamental than specific to your question, but it should give you some ideas.

The first approach – storing a reference as a property

This is the same as thinking of a product that contains product variants ...

This approach is much like the RDBMS world. You create products separately, and each product variant holds a reference to its product, similar to how foreign keys work in relational databases. So the product variant entities get a new property that contains a reference to the product they belong to. That product property actually holds the key of an entity of the Product kind. If this sounds confusing, here is how you can break it down. I will use Python as an example:

# Legacy App Engine (Python 2) NDB import
from google.appengine.ext import ndb

# product model
class Product(ndb.Model):
    name = ndb.StringProperty()

# product variant model
class ProductVariant(ndb.Model):
    name = ndb.StringProperty()
    price = ndb.IntegerProperty()
    # product key.
    product = ndb.KeyProperty(kind=Product)

# Both products are of kind Product; only the key name differs.
hugoboss = Product(name="Hugo Boss", key=ndb.Key(Product, 'hugoboss'))
gap = Product(name="Gap", key=ndb.Key(Product, 'gap'))

pants1 = ProductVariant(name="Black pants", price=300, product=hugoboss.key)
pants2 = ProductVariant(name="Grey pants", price=200, product=hugoboss.key)
tshirt = ProductVariant(name="White graphic tshirt", price=10, product=gap.key)

pants1.put()
pants2.put()
tshirt.put()

# For example, fetch the variants whose product is hugoboss
for pants in ProductVariant.query(ProductVariant.product == hugoboss.key).fetch(10):
    print(pants.name)

# You should get something like:
# Black pants
# Grey pants

The second approach – a product within the key

To take full advantage of this approach, you need to know how Bigtable (Datastore is built on top of Bigtable) sorts row keys and how data is organized around them. If you want a deep dive, there is a great paper: Bigtable: A Distributed Storage System for Structured Data.

# Legacy App Engine (Python 2) NDB import
from google.appengine.ext import ndb

# product model
class Product(ndb.Model):
    name = ndb.StringProperty()

# product variant model
class ProductVariant(ndb.Model):
    name = ndb.StringProperty()
    price = ndb.IntegerProperty()

hugoboss = ndb.Key(Product, 'hugoboss')
gap = ndb.Key(Product, 'gap')

Product(name="Hugo Boss", key=hugoboss).put()
Product(name="Gap", key=gap).put()

pants1 = ProductVariant(name="Black pants", price=300, parent=hugoboss)
pants2 = ProductVariant(name="Grey pants", price=200, parent=hugoboss)
tshirt = ProductVariant(name="White graphic tshirt", price=10, parent=gap)

pants1.put()
pants2.put()
tshirt.put()

# For example, fetch the variants whose parent is hugoboss
for pants in ProductVariant.query(ancestor=hugoboss).fetch(10):
    print(pants.name)

# You should get something like:
# Black pants
# Grey pants
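
To give an idea of why putting the parent in the key helps, here is a toy model in plain Python. It is not how Datastore is implemented internally; the key paths, the names dict and the descendants function are made up for this sketch. It only assumes what the answer states, that keys are kept in sorted order, so all children of one parent sit next to each other and can be read with a single range scan.

```python
import bisect

# Hypothetical key paths: a tuple of (kind, id) pairs, parent first.
hugoboss = (("Product", "hugoboss"),)
gap = (("Product", "gap"),)

names = {
    hugoboss: "Hugo Boss",
    gap: "Gap",
    hugoboss + (("ProductVariant", 1),): "Black pants",
    hugoboss + (("ProductVariant", 2),): "Grey pants",
    gap + (("ProductVariant", 3),): "White graphic tshirt",
}

# Keep the keys in sorted order, parents directly before their children.
sorted_keys = sorted(names)


def descendants(keys, ancestor):
    """Return the keys strictly below ancestor using one range scan."""
    start = bisect.bisect_right(keys, ancestor)
    found = []
    for key in keys[start:]:
        if key[:len(ancestor)] != ancestor:
            break  # left the contiguous block of descendants
        found.append(key)
    return found


hugoboss_variants = [names[k] for k in descendants(sorted_keys, hugoboss)]
print(hugoboss_variants)  # ['Black pants', 'Grey pants']
```

The scan stops as soon as it sees a key that no longer starts with the ancestor path, so it never touches the variants of other products.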

The second approach is very powerful! I hope this helps.