I have a bunch of text rows in a PostgreSQL table and I am trying to find common strings.
For example, let's say I have a basic table like:
CREATE TABLE a (id serial, value text);
INSERT INTO a (value) VALUES
('I go to the movie theater'),
('New movie theater releases'),
('Coming out this week at your local movie theater'),
('New exposition about learning disabilities at the children museum'),
('The genius found in learning disabilities')
;
I am trying to locate popular strings like movie theater
and learning disabilities
across all the rows (the goal is to show a list of "trending" strings king of like Twitter "Trends")
I use full text search and I have tried to use ts_stat
combined with ts_headline
but the results are quite disappointing.
Any thoughts? thanks!
There is no ready-to-use Posgres text search feature to find most popular phrases. For two-words phrases you can use
ts_stat()
to find most popular words, eliminate particles, prepositions etc, and cross join these words to find most popular pairs.For an actual data you would want to change values marked as
--> parameter.
The query may be quite expensive on a larger dataset.How about something like:
SELECT * FROM a WHERE value LIKE '%movie theater%';
This would find rows which match the pattern 'movie theater' somewhere in the value column (and could include any number of characters before or after it).