可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am trying to get 25 random samples of 15,000 IDs from a table. Instead of manually pressing run every time, I'm trying to do a loop. Which I fully understand is not the optimum use of Postgres, but it is the tool I have. This is what I have so far:
for i in 1..25 LOOP
insert into playtime.meta_random_sample
select i, ID
from tbl
order by random() limit 15000
end loop
回答1:
Procedural elements like loops are not part of the SQL language and can only be used inside the body of a procedural language function, procedure (Postgres 11 or later) or a DO
statement, where such additional elements are defined by the respective procedural language. The default is PL/pgSQL, but there are others.
Example with plpgsql:
DO
$do$
BEGIN
FOR i IN 1..25 LOOP
INSERT INTO playtime.meta_random_sample
(col_i, col_id) -- declare target columns!
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000;
END LOOP;
END
$do$;
For many tasks that can be solved with a loop, there is a shorter and faster set-based solution around the corner. Pure SQL equivalent for your example:
INSERT INTO playtime.meta_random_sample (col_i, col_id)
SELECT t.*
FROM generate_series(1,25) i
CROSS JOIN LATERAL (
SELECT i, id
FROM tbl
ORDER BY random()
LIMIT 15000
) t;
About generate_series()
:
- What is the expected behaviour for multiple set-returning functions in select clause?
About optimizing performance of random selections:
- Best way to select random rows PostgreSQL
回答2:
I just ran into this question and, while it is old, I figured I'd add an answer for the archives. The OP asked about for loops, but their goal was to gather a random sample of rows from the table. For that task, Postgres 9.5+ offers the TABLESAMPLE clause on WHERE. Here's a good rundown:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
I tend to use Bernoulli as it's row-based rather than page-based, but the original question is about a specific row count. For that, there's a built-in extension:
https://www.postgresql.org/docs/current/tsm-system-rows.html
CREATE EXTENSION tsm_system_rows;
Then you can grab whatever number of rows you want:
select * from playtime tablesample system_rows (15);
回答3:
Below is example you can use:
create temp table test2 (
id1 numeric,
id2 numeric,
id3 numeric,
id4 numeric,
id5 numeric,
id6 numeric,
id7 numeric,
id8 numeric,
id9 numeric,
id10 numeric)
with (oids = false);
do
$do$
declare
i int;
begin
for i in 1..100000
loop
insert into test2 values (random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random(), i * random(), i / random(), i + random());
end loop;
end;
$do$;
回答4:
I find it more convenient to make a connection using a procedural programming language (like Python) and do these types of queries.
import psycopg2
connection_psql = psycopg2.connect( user="admin_user"
, password="***"
, port="5432"
, database="myDB"
, host="[ENDPOINT]")
cursor_psql = connection_psql.cursor()
myList = [...]
for item in myList:
cursor_psql.execute('''
-- The query goes here
''')
connection_psql.commit()
cursor_psql.close()