postgres: get random entries from table - too slow

2019-07-25 07:51发布

In my postgres database, I have the following relationships (simplified for the sake of this question):

Objects (currently has about 250,000 records)
-------
n_id
n_store_object_id (references store.n_id, 1-to-1 relationship, some objects don't have store records)
n_media_id (references media.n_id, 1-to-1 relationship, some objects don't have media records)

Store (currently has about 100,000 records)
-----
n_id
t_name,
t_description,
n_status,
t_tag

Media
-----
n_id
t_media_path

So far, so good. When I need to query the data, I run this (note the limit 2 at the end, as part of the requirement):

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
limit
    2

This works fine and gives me two entries back, as expected. The execution time on this is about 20 ms - just fine.

Now I need to get 2 random entries every time the query runs. I thought I'd add order by random(), like so:

select
    o.n_id,
    s.t_name,
    s.t_description,
    me.t_media_path
from
    objects o
    join store s on (o.n_store_object_id = s.n_id and s.n_status > 0 and s.t_tag is not null)
    join media me on o.n_media_id = me.n_id
order by
    random()
limit
    2

While this gives the right results, the execution time is now about 2,500 ms (over 2 seconds). This is clearly not acceptable, as it's one of a number of queries to be run to get data for a page in a web app.

So, the question is: how can I get random entries, as above, but still keep the execution time within some reasonable amount of time (i.e. under 100 ms is acceptable for my purpose)?

4条回答
Viruses.
2楼-- · 2019-07-25 07:57

I'm thinking you'll be better off selecting random objects first, then performing the join to those objects after they're selected. I.e., query once to select random objects, then query again to join just those objects that were selected.

查看更多
祖国的老花朵
3楼-- · 2019-07-25 08:01

Of course it needs to sort the whole thing according to random criteria before getting first rows. Maybe you can work around by using random() in offset instead?

查看更多
Viruses.
4楼-- · 2019-07-25 08:16

Here's some previous work done on the topic which may prove helpful:

http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/

查看更多
劳资没心,怎么记你
5楼-- · 2019-07-25 08:17

It seems like your problem is this: You have a table with 250,000 rows and need two random rows. Thus, you have to generate 250,000 random numbers and then sort the rows by their numbers. Two seconds to do this seems pretty fast to me.

The only real way to speed up the selection is not have to come up with 250,000 random numbers, but instead lookup rows through an index.

I think you'd have to change the table schema to optimize for this case. How about something like:

  • 1) Create a new column with a sequence starting at 1.
  • 2) Every row will then have a number.
  • 3) Create an index on: number % 1000
  • 4) Query for rows where number % 1000 is equal to a random number between 0 and 999 (this should hit the index and load a random portion of your database)
  • 5) You can probably then add on RANDOM() to your ORDER BY clause and it will then just sort that chunk of your database and be 1,000x faster.
  • 6) Then select the first two of those rows.

If this still isn't random enough (since rows will always be paired having the same "hash"), you could probably do a union of two random rows, or have an OR clause in the query and generate two random keys.

Hopefully something along these lines could be very fast and decently random.

查看更多
登录 后发表回答