I have the following table:
CREATE TABLE recipemetadata
(
--Lots of columns
diet_glutenfree boolean NOT NULL,
);
Most every row will be set to FALSE
unless someone comes up with some crazy new gluten free diet that sweeps the country.
I need to be able to very quickly query for rows where this value is true. I've created the index:
CREATE INDEX IDX_RecipeMetadata_GlutenFree ON RecipeMetadata(diet_glutenfree) WHERE diet_glutenfree;
It appears to work, however I can't figure out how to tell if indeed it's only indexing rows where the value is true. I want to make sure it's not doing something silly like indexing any rows with any value at all.
Should I add an operator to the WHERE
clause, or is this syntax perfectly valid? Hopefully this isn't one of those super easy RTFM questions that will get downvoted 30 times.
UPDATE:
I've gone ahead and added 10,000 rows to RecipeMetadata with random values. I then did an ANALYZE on the table and a REINDEX just to be sure. When I run the query:
select recipeid from RecipeMetadata where diet_glutenfree;
I get:
'Seq Scan on recipemetadata (cost=0.00..214.26 rows=5010 width=16)'
' Filter: diet_glutenfree'
So, it appears to be doing a sequential scan on the table even though only about half the rows have this flag. The index is being ignored.
If I do:
select recipeid from RecipeMetadata where not diet_glutenfree;
I get:
'Seq Scan on recipemetadata (cost=0.00..214.26 rows=5016 width=16)'
' Filter: (NOT diet_glutenfree)'
So no matter what, this index is not being used.
I've confirmed the index works as expected.
I re-created the random data, only this time set diet_glutenfree
to random() > 0.9
so there's only a 10% chance of an on
bit.
I then re-created the indexes and tried the query again.
SELECT RecipeId from RecipeMetadata where diet_glutenfree;
Returns:
'Index Scan using idx_recipemetadata_glutenfree on recipemetadata (cost=0.00..135.15 rows=1030 width=16)'
' Index Cond: (diet_glutenfree = true)'
And:
SELECT RecipeId from RecipeMetadata where NOT diet_glutenfree;
Returns:
'Seq Scan on recipemetadata (cost=0.00..214.26 rows=8996 width=16)'
' Filter: (NOT diet_glutenfree)'
It seems my first attempt was polluted since PG estimates it's faster to scan the whole table rather than hit the index if it has to load over half the rows anyway.
However, I think I would get these exact results on a full index of the column. Is there a way to verify the number of rows indexed in a partial index?
UPDATE
The index is around 40k. I created a full index of the same column and it's over 200k, so it looks like it's definitely partial.
An index on a one-bit field makes no sense. For understanding the decisions made by the planner, you must think in terms of pages, not in terms of rows.
For 8K pages and an (estinated) rowsize of 80, there are 100 rows on every page. Assuming a random distribution, the chance that a page consist of only rows with a true
value is neglectable, pow (0.5, 100)
, about 1e-33, IICC. (and the same for 'false' of course) Thus for a query on gluten_free == true
, every page has to be fetched anyway, and filtered afterwards. Using an index would only cause more pages (:the index) to be fetched.