Locate popular strings with PostgreSQL

I have a bunch of text rows in a PostgreSQL table and I am trying to find common strings.

For example, let's say I have a basic table like:

CREATE TABLE a (id serial, value text);
INSERT INTO a (value) VALUES
    ('I go to the movie theater'), 
    ('New movie theater releases'), 
    ('Coming out this week at your local movie theater'),
    ('New exposition about learning disabilities at the children museum'),
    ('The genius found in learning disabilities')
;

I am trying to locate popular strings like movie theater and learning disabilities across all the rows (the goal is to show a list of "trending" strings, kind of like Twitter "Trends").

I use full text search and I have tried to use ts_stat combined with ts_headline but the results are quite disappointing.

Any thoughts? Thanks!

Tags: sql postgresql full-text-search postgresql-9.6 tsvector

2 Answers

Animai°情兽

Reply #2 · 2020-06-29 02:24

There is no ready-to-use Postgres text search feature for finding the most popular phrases. For two-word phrases, you can use ts_stat() to find the most popular words, eliminate particles, prepositions, etc., and cross join these words to find the most popular pairs.
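To see what the first step produces on its own, a minimal sketch like the one below lists the candidate words with their counts. Note that to_tsvector('simple', value) is an assumption made for this sketch: it lowercases the words and copes with punctuation that a plain ::tsvector cast can reject. It is not what the answer's query uses.

```sql
-- Step 1 only: words that occur more than once across all rows of table a.
-- ts_stat() returns one row per word with:
--   ndoc   = number of rows containing the word
--   nentry = total number of occurrences
-- Assumption: the 'simple' configuration is used so that no stemming or
-- stop-word removal happens; stop words are filtered by hand in a later step.
select word, ndoc, nentry
from ts_stat($$ select to_tsvector('simple', value) from a $$)
where nentry > 1          -- same threshold idea as the "parameter" lines
order by nentry desc, word;
```

Words such as "the" and "at" are expected to show up in this list, which is why they have to be excluded explicitly before pairing the words.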

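For actual data you would want to change the values marked as --> parameter in the query below. The query may be quite expensive on a larger dataset, because it cross joins every pair of popular words with every row of the table.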

with popular_words as (
    select word
    from ts_stat('select value::tsvector from a')
    where nentry > 1                                --> parameter
    and not word in ('to', 'the', 'at', 'in', 'a')  --> parameter
)
select concat_ws(' ', a1.word, a2.word) phrase, count(*) 
from popular_words as a1
cross join popular_words as a2
cross join a
where value ilike format('%%%s %s%%', a1.word, a2.word)
group by 1
having count(*) > 1                                 --> parameter
order by 2 desc;


        phrase         | count 
-----------------------+-------
 movie theater         |     3
 learning disabilities |     2
(2 rows)
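If the cross join becomes too expensive, one possible alternative is to split each row into words and count adjacent pairs directly. The following is only a sketch under some assumptions: words are separated by whitespace, case is ignored via lower(), and the same hand-picked stop-word list is reused. The CTE names (words, pairs) are made up for illustration.

```sql
-- Count adjacent word pairs (bigrams) per row, then rank them
-- by the number of rows they appear in.
with words as (
    -- One output row per word, with its position inside the source row.
    select a.id, w.word, w.pos
    from a
    cross join lateral regexp_split_to_table(lower(a.value), '\s+')
        with ordinality as w(word, pos)
),
pairs as (
    -- Pair every word with the word that follows it in the same row.
    select id,
           word as w1,
           lead(word) over (partition by id order by pos) as w2
    from words
)
select w1 || ' ' || w2 as phrase, count(distinct id) as row_count
from pairs
where w2 is not null                               -- last word has no successor
  and w1 not in ('to', 'the', 'at', 'in', 'a')     --> parameter
  and w2 not in ('to', 'the', 'at', 'in', 'a')     --> parameter
group by 1
having count(distinct id) > 1                      --> parameter
order by 2 desc, 1;
```

This reads each row once instead of testing every word pair against every row with ILIKE. Punctuation attached to a word (for example "theater,") is not stripped here, so real data would probably need an extra cleanup step.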


冷血范

Reply #3 · 2020-06-29 02:35

How about something like: SELECT * FROM a WHERE value LIKE '%movie theater%';

This would find rows which match the pattern 'movie theater' somewhere in the value column (and could include any number of characters before or after it).
