NoSQL: Getting the latest values from tables Dynam

Posted 2019-01-17 01:18

I have a little problem that needs some suggestions:

  • Let's say we have a few hundred data tables with a few dozen million rows each.
  • Data tables are timestamp(key) - value
  • Data tables are written once every second

The latest entry of each table should be quickly obtainable and will most likely be queried the most (sorta like "follow data in real time"). Lacking a 'Last()' or similar, I was thinking of creating another table, "LatestValues", where the latest entry of each data table is kept up to date for faster retrieval. This, however, would add an extra update for each write operation. Also, most of the traffic would be concentrated on this table (good/bad?). Is there a better solution for this, or am I missing something?

Also, let's say we want to query for the values in data tables. Since scanning is obviously out of the question, is the only option left to create a secondary index by duplicating the data, effectively doubling the storage requirements and the number of write operations? Any other solutions?

I'm primarily looking at DynamoDB and Azure Table Storage, but I'm also curious how BigTable handles this.

2 answers
三岁会撩人
#2 · 2019-01-17 01:59

I just published an article today with some common "recipes" for DynamoDB. One of them is "Storing article revisions, getting always the latest". I think it might interest you :)

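In a nutshell, you can get the latest item using Query(hash_key=..., ScanIndexForward=False, Limit=1). With ScanIndexForward=False the results come back in descending range-key order, so the first item is the newest.

But this assumes you have a range_key defined.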

With Scan, there is no parameter like ScanIndexForward=false, and in any case you cannot rely on the order, because the data is spread over partitions and the Scan request is load balanced across them.

To achieve your goal with DynamoDB, you may "split" your timestamp this way:

  1. hash_key: date
  2. range_key: time or full timestamp, as you prefer

Then, you can use the 'trick' of Query + Limit=1 + ScanIndexForward=false
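As a rough illustration of that trick, here is a minimal sketch that builds the parameters for DynamoDB's low-level Query operation. The table name `sensor_42` and the attribute name `date` are made up for the example; the hash key follows the date/time split suggested above. Because `date` is a DynamoDB reserved word, the sketch refers to it through the placeholder `#d`.

```python
def latest_item_query(table_name, day):
    """Build Query parameters that return only the newest item for one day.

    Assumes a table whose hash key is a string attribute named "date"
    and whose range key is the time or full timestamp (hypothetical schema).
    """
    return {
        "TableName": table_name,
        # "#d" stands in for the reserved word "date".
        "KeyConditionExpression": "#d = :day",
        "ExpressionAttributeNames": {"#d": "date"},
        "ExpressionAttributeValues": {":day": {"S": day}},
        # Descending range-key order, so the newest item comes first...
        "ScanIndexForward": False,
        # ...and only that first item is read.
        "Limit": 1,
    }


params = latest_item_query("sensor_42", "2019-01-17")

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
#   latest = response["Items"][0] if response["Items"] else None
```

One thing to keep in mind with a per-day hash key: right after midnight the current day's partition may still be empty, so a caller would probably need to fall back to the previous day.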

干净又极端
#3 · 2019-01-17 01:59

In general, you probably just want to reverse the timestamp, so it decreases over time, leaving the newest row on top.

Here's a blog post of mine outlining how to do this with Windows Azure storage: http://blog.smarx.com/posts/using-numbers-as-keys-in-windows-azure.
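The reversed-timestamp idea can be sketched roughly as follows. This is not the code from the blog post, just a minimal illustration that assumes keys are compared as strings and that timestamps are epoch milliseconds below an arbitrarily chosen 13-digit maximum (`MAX_MILLIS` and the function names are made up here). Zero-padding to a fixed width matters, since otherwise string order and numeric order diverge.

```python
# Assumed upper bound for epoch milliseconds (13 digits); hypothetical choice.
MAX_MILLIS = 10**13 - 1


def reversed_key(epoch_millis):
    """Return a fixed-width string key that gets smaller as time moves forward."""
    return f"{MAX_MILLIS - epoch_millis:013d}"


def original_millis(key):
    """Recover the original timestamp from a reversed key."""
    return MAX_MILLIS - int(key)


# Three writes, one second apart.
timestamps = [1547687880000, 1547687881000, 1547687882000]

# A store that sorts keys in ascending string order puts the newest row first.
keys = sorted(reversed_key(t) for t in timestamps)
newest = original_millis(keys[0])
```

With keys built this way, reading the first row of a table (or partition) returns the latest entry without needing a reverse scan.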

UPDATE

I use DynamoDB for one project, but in a very simplistic way, so I don't have much experience. That said, http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html suggests to me that you can just specify ScanIndexForward=false and Limit=1 to get the last item.
