-->

How to model Student/Classes with DynamoDB (NoSQL)

2019-03-14 16:43发布

问题:

I'm trying to get my way with DynamoDB and NoSQL.

What is the best (right?) approach for modeling a student table and class tables with respect to the fact that I need to have a student-is-in-class relationship. I'm taking into account that there is no second-index available in DynamoDB.

The model needs to answer the following questions:

Which students are in a specific class?

Which classes a student take?

Thanks

回答1:

A very simple suggestion (without range keys) would be to have two tables: One per query type. This is not unusual in NoSQL databases.

In your case we'd have:

  • A table Student with attribute StudentId as (hash type) primary key. Each item might then have an attribute named Attends, the value of which was a list of Ids on classes.
  • A table Class with attribute ClassId as (hash type) primary key. Each item might then have an attribute named AttendedBy, the value of which was a list of Ids on students.

Performing your queries would be simple. Updating the database with one "attends"-relationship between a student and a class requires two separate writes, one to each table.

Another design would have one table Attends with a hash and range primary key. Each record would represent the attendance of one student to one class. The hash attribute could be the Id of the class and the range key could be the Id of the student. Supplementary data on the class and the student would reside in other tables, then.



回答2:

To join two Amazon DynamoDB tables

The following example maps two Hive tables to data stored in Amazon DynamoDB. It then calls a join across those two tables. The join is computed on the cluster and returned. The join does not take place in Amazon DynamoDB. This example returns a list of customers and their purchases for customers that have placed more than two orders.

CREATE EXTERNAL TABLE hive_purchases(customerId bigint, total_cost double, items_purchased array<String>) 
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Purchases",
"dynamodb.column.mapping" = "customerId:CustomerId,total_cost:Cost,items_purchased:Items");

CREATE EXTERNAL TABLE hive_customers(customerId bigint, customerName string, customerAddress array<String>) 
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES ("dynamodb.table.name" = "Customers",
"dynamodb.column.mapping" = "customerId:CustomerId,customerName:Name,customerAddress:Address");

Select c.customerId, c.customerName, count(*) as count from hive_customers c 
JOIN hive_purchases p ON c.customerId=p.customerId 
GROUP BY c.customerId, c.customerName HAVING count > 2;