I am thinking about using a NoSQL database to scale database reads. Please see the relational database structure below:
CREATE TABLE Person(
ID uniqueidentifier not null,
Name varchar(100),
DateOfBirth datetime)
CREATE TABLE Sport (
ID uniqueidentifier not null,
Description varchar(50)) -- e.g. Football; Tennis; Badminton etc
CREATE TABLE PersonPlaysSport (
PersonID uniqueidentifier FOREIGN KEY REFERENCES Person(ID),
SportID uniqueidentifier FOREIGN KEY REFERENCE Sport (ID),
primary key (PersonID, SportID)
In the example above a Person Plays many Sports. In my real application; I have many-to-many relationships like this that do not perform well.
How would these be stored in a NoSQL document database (DynamoDB)?
Disclaimer - I'm not familiar with DynamoDb
, but have used several other NoSql databases
The common approach is to choose the most important subject entity as the root
of the document (in your case, I would say this is Person
)
A document is then created for each person, and will include the "person centric" view of all associated entities (i.e. linked sports):
Joe (Person, Keyed on a natural, or surrogate id).
+ Fields of Joe (Date of Birth, etc)
+ SportsPlayed: (Collection)
--> Golf (Sport)
--> Tennis (Sport)
If it becomes important to view the relationship from a Sport centric approach (e.g. you need to know which persons are 'subscribed' to which Sport):
You could attempt a secondary index on Person.Sport
, if the NoSql database allows this. This would allow for queries like "Who plays Golf?", although this approach is often frowned upon in NoSql terms.
Alternatively, and preferably, create a second collection of documents, this time keyed by Sport
:
Golf (Sport)
- Joe
- Jim
...
etc. Obviously there's extra work to be done in keeping both sets of documents up to date when a change is made to a Person
, a Sport
, or the relationship between them, however the benefit is high performance on the read side - only a single document needs to be retrieved to pull the entire entity graph - In SQL terms, this would have required a Query joining 3 distinct tables.