How to store aggregated catalog tree search result

Posted 2019-02-21 06:59

I have a big product catalog tree which currently contains ~36 thousand categories and ~1 million products (i.e. the leaves). It is structured like this (maximum depth is 5):

Cat1
|_Cat11
| |_Cat111
| | |_Cat1111
| | | |_Product1
| | | |_...
| | |_Cat1112
| | | |_Product1
| | | |_...
| | |_Cat1113
| | | |_Product1
| | | |_...
| |_Cat112
|   |_Cat1121
|   | |_Product1
|   | |_...
|   |_Cat1122
|   | |_Product1
|   | |_...
|   |_Cat1123
|     |_Product1
|     |_...
|_Cat12
| |_Cat121
| | |_Cat1211
| | |_Cat1212
| | |_Cat1213
| |_Cat122
|   |_Cat1221
|   |_Cat1222
|   |_Cat1223
|_...
Cat2
|...

When searching this catalog (using SQL Server FREETEXT search) I get a bunch of categories and products back very fast. Some searches hit a large number of products. I want the result aggregated and presented with a total hit count for each currently expanded category, like this (two examples at different levels):

**Ex1 (first level)**
Cat1(563)
|
Cat2(332)
|
Cat8(2)

**Ex2 (second level)**
Cat1
|_Cat12(102)
|_Cat14(201)
|_...

What I have tried so far is to store all parent/child relations in Redis (as sets). To get the aggregated result I then traverse from each product in the search result up through its parents to the currently expanded category (or rather its immediate children), to find which categories to present and to count the number of matching products below each of them. With about 5000 products in the search result this takes about 20 seconds. Way too long.
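For illustration, the traversal described above can be sketched in Python with a plain dictionary standing in for the Redis sets (the names `parent_of` and `aggregate_hits` are made up for this example). The key point is that each matched product costs up to one parent lookup per tree level, which in Redis means a round trip (or pipeline entry) per step:

```python
# In-memory sketch of the per-product upward traversal.
# parent_of maps each node to its parent (None at the root).
# With ~5000 matched products and up to 5 levels, this is tens of
# thousands of lookups, which is why the Redis version is slow.

def aggregate_hits(parent_of, matched_products, expanded_category):
    """Count matched products under each immediate child of expanded_category."""
    counts = {}
    for product in matched_products:
        node = product
        # Walk up until we sit directly below the expanded category.
        while parent_of.get(node) is not None and parent_of[node] != expanded_category:
            node = parent_of[node]
        if parent_of.get(node) == expanded_category:
            counts[node] = counts.get(node, 0) + 1
    return counts
```

Products whose path does not pass through the expanded category are simply skipped, matching the behavior described above.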

What would be a better way to accomplish this? One option would be to have all 1 million products pre-aggregated on each category, but that would require 36 million keys and probably too much RAM. I already use 500 MB as it is.

Tags: redis

1 answer

地球回转人心会变 · 2019-02-21 07:59

If you want speed, you should prepare as much as possible when storing the structure (or 'cache') in Redis. If you store the products in a HSET and keep the category counters (one per category) alongside your 'product data' member in that HSET, you can use HINCRBY to increment/decrement the counters.
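A minimal in-memory sketch of that counter idea, assuming each product record carries its full list of ancestor categories (the class and method names are illustrative, and the dictionary stands in for a Redis hash with HINCRBY semantics). Counters are paid for at write time so that reads never traverse the tree:

```python
# Sketch of per-category counters maintained with HINCRBY-style
# create-on-first-use increments, simulated with a plain dict.

class CategoryCounters:
    def __init__(self):
        self.fields = {}        # stands in for a Redis hash (HSET)

    def hincrby(self, field, amount):
        # Same semantics as Redis HINCRBY: missing fields start at 0.
        self.fields[field] = self.fields.get(field, 0) + amount
        return self.fields[field]

    def add_product(self, ancestor_categories):
        # Increment every ancestor once when a product is stored.
        for cat in ancestor_categories:
            self.hincrby(cat, 1)

    def remove_product(self, ancestor_categories):
        # Mirror operation when a product is deleted.
        for cat in ancestor_categories:
            self.hincrby(cat, -1)
```

Against a real Redis server the two loops would become a single pipelined batch of HINCRBY calls, one per ancestor.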

In general (designing a Redis cache for your needs): you should try to prevent retrieving any data that you don't need.

I recommend using a Lua script for storing (/updating/deleting) as well as retrieving your aggregated report. Lua scripts are executed on the Redis server. ServiceStack supports them (SCRIPT LOAD + EVALSHA, or simply EVAL), and you could also try the BookSleeve C# client (which we use, and which is a bit faster; that said, good Redis data design comes first, of course). BookSleeve focuses on multithreaded Redis pipelining, which is probably what you want when dealing with large datasets. Pipelining should be possible with ServiceStack as well.

If the categories and products have an integer ID, you can also combine this with a ZSET, using the ID as the score. With ZRANGEBYSCORE you can then fetch the 'record' directly. This technique is safe as long as your IDs use 15 digits or less and you don't use the decimal part of the score, i.e. the ID stays in the range -999999999999999 to 999999999999999. Note: these limits exist because sorted-set scores are double-precision floats, which can only represent integers exactly up to 2^53.
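The 15-digit limit can be checked directly, since ZSET scores behave like IEEE-754 doubles: integers round-trip exactly only up to 2^53 (about 9.007e15), so 15-digit IDs are safely inside the exact range while values past 2^53 start to collide:

```python
# Double-precision floats (the type behind ZSET scores) represent
# integers exactly up to 2^53; beyond that, neighbors collide.
EXACT_LIMIT = 2 ** 53           # 9007199254740992

max_15_digit_id = 999_999_999_999_999
round_trips = int(float(max_15_digit_id)) == max_15_digit_id   # True: safe as a score

# One past the limit is no longer distinguishable from the limit itself.
collides = float(EXACT_LIMIT) == float(EXACT_LIMIT + 1)        # True: unsafe
```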

Hope this helps, TW
