Postgresql fastest way of getting the set of uniqu

Posted 2019-07-05 06:05

I have a constantly growing, potentially very large table in a PostgreSQL database that contains "data" from different "channels" for different "devices", e.g.:

Table data:
  id (PK)
  device_id (FK -> device)
  channel_id (FK -> channel)
  timestamp (TIMESTAMP)
  value (Float)

I am using partitioning to split the table into multiple subtables, one per device, since I never need data for different devices in the same query. Since not all devices supply all channels, I would like to get a list of all available channels for a single device. A simple SQL query that solves my problem is:

SELECT DISTINCT channel_id FROM data WHERE device_id = 1;

This works, but takes quite long for devices with many entries in the data table. Partitioning sped things up considerably, since the database only has to scan the entries for one device, but the query still takes quite long for some devices.

In my application, new channels are rarely added for a device. Mostly the channels are added shortly after the device itself is added to the database, and no new channels are added afterwards. However, data for the existing channels is added quite frequently. I also need the list of available channels per device quite often, to display it to the user, and would therefore like to speed up this lookup. I already have an index on channel_id, in the hope that it would speed up getting the desired list, but retrieving the list can still take quite a long time.

Currently, I can think of a couple of ways to help my problem:

  1. Use some "magic" PostgreSQL function that does exactly what I want but that I know nothing about.
  2. Use triggers to maintain an extra table that contains only one entry per "device" / "channel" combination and therefore allows quick access to the available channels. Since new data is inserted quite often, however, I would expect the trigger to create quite a lot of overhead while only rarely detecting a new channel.
  3. Rethink my database design. Maybe add another table that holds the link between "channel" and "device", and then only have an FK to this table in "data". The query on the intermediate table should be quick; however, this adds complexity to the part of my application that inserts the data into the database.
  4. Maybe views can be of help here?
  5. Use caching in my frontend so that the query is only executed periodically, since the available channels do not change often.

Options 2 and 3 seem to me like they would add a lot of unnecessary overhead, which I would like to avoid. Since I did not find anything on this topic, I am currently leaning toward option 5. However, I was wondering whether anybody has a better, possibly database-side, solution to my problem.

Thank you for your help.

1 answer
看我几分像从前
#2 · 2019-07-05 06:43

For option 1, what you want to do is a "loose index scan", or a "skip scan".

It would be nice if PostgreSQL did those automatically when they are beneficial, but as of now it does not. You can trick it into one, though, by emulating the scan with a recursive query (WITH RECURSIVE ... UNION ALL). I've never tried this on a partitioned table, but I think it would be a simple matter of adding a suitable WHERE clause to each branch of the UNION ALL.
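
The trick is usually written as a recursive CTE that hops from one distinct value to the next instead of reading every row. Here is a minimal sketch, assuming the data table from the question, device_id = 1 as the device of interest, and a composite index on (device_id, channel_id). The index name is made up for the example, and with older inheritance-based partitioning the index would have to be created on each child table instead:

```sql
-- Assumed index so that each step below is a single index probe.
-- The index name is hypothetical.
CREATE INDEX IF NOT EXISTS data_device_channel_idx
    ON data (device_id, channel_id);

WITH RECURSIVE t(channel_id) AS (
    -- Anchor branch: the smallest channel_id for the device.
    SELECT min(channel_id)
    FROM data
    WHERE device_id = 1
  UNION ALL
    -- Recursive branch: the next channel_id above the previous one.
    -- The device_id filter is repeated here, one WHERE per branch.
    SELECT (SELECT min(d.channel_id)
            FROM data d
            WHERE d.device_id = 1
              AND d.channel_id > t.channel_id)
    FROM t
    WHERE t.channel_id IS NOT NULL
)
-- The last step finds no larger channel_id and yields NULL; drop it.
SELECT channel_id
FROM t
WHERE channel_id IS NOT NULL;
```

Each step should cost roughly one index lookup, so the total work scales with the number of distinct channels rather than with the number of rows for the device. Checking the plan with EXPLAIN ANALYZE on a real partition would confirm whether the index is actually used.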

Maybe you want option 3 as well, or instead; there is not enough info for me to say.
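
If you do go toward option 3, a lighter variant than re-pointing the foreign key in data is a small link table that the inserting code keeps up to date. The following is only a sketch: the table name device_channel is made up, the device and channel tables are assumed to have an integer primary key called id, the channel value 42 is an arbitrary example, and ON CONFLICT needs PostgreSQL 9.5 or later:

```sql
-- Hypothetical link table: one row per (device, channel) pair that has data.
CREATE TABLE device_channel (
    device_id  integer NOT NULL REFERENCES device (id),
    channel_id integer NOT NULL REFERENCES channel (id),
    PRIMARY KEY (device_id, channel_id)
);

-- One-off backfill from the existing rows (as slow as the original query, once).
INSERT INTO device_channel (device_id, channel_id)
SELECT DISTINCT device_id, channel_id
FROM data
ON CONFLICT DO NOTHING;

-- On the write path: register the pair; a no-op if it is already known.
INSERT INTO device_channel (device_id, channel_id)
VALUES (1, 42)
ON CONFLICT DO NOTHING;

-- The lookup becomes a short range scan on the primary key.
SELECT channel_id
FROM device_channel
WHERE device_id = 1;
```

The extra INSERT still costs an index probe per call, so the application may want to remember which pairs it has already registered and skip the statement for those.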
