I was trying to see the kind of performance gains column-store indexes can provide on a table. The table has about a 3.7 million rows, 11 columns and is stored as a heap (i.e without a primary key). I create a column-store index on the table and run the following query:
SELECT
[Area], [Family],
AVG([Global Sales Value]) AS [Average GlobalSalesValue],
COUNT([Projected Sales])
FROM
dbo.copy_Global_Previous5FullYearSales
WHERE
[Year] > 2012
GROUP BY
[Area], [Family]
The create table statement is as follows:
CREATE TABLE [dbo].[copy_Global_Previous5FullYearSales]
(
[SBU] [NVARCHAR](10) NULL,
[Year] [INT] NULL,
[Global Sales Value] [MONEY] NULL,
[Area] [NVARCHAR](50) NULL,
[Sub Area] [NVARCHAR](50) NULL,
[Projected Sales] [MONEY] NULL,
[Family] [NVARCHAR](50) NULL,
[Sub Family 1] [NVARCHAR](50) NULL,
[Sub Family 2] [NVARCHAR](50) NULL,
[Manufacturer] [NVARCHAR](40) NULL,
[rowguid] [UNIQUEIDENTIFIER] NOT NULL,
[ID] [INT] IDENTITY(1,1) NOT NULL,
PRIMARY KEY CLUSTERED ([ID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The performance gains I get from column-store index in this case is negligible. The query with the column-store index runs nearly as slow as the original query without an index, in some cases, even slower, even though batch mode is processing is used too.
Surprisingly, when I create an ever increasing primary key - ID on the existing table and rebuild the column-store index, I get a 15X improvement on the CPU time and a 3X improvement on the elapsed time.
I don't understand how the addition of a primary key could affect query performance for column-store indexes which store the data in a compressed format anyway. Also primary keys only change the ordering of the pages which in this case, would be none.
Below is the execution plan
The presence of a a key changes how the columnstore is built. Because the builder gets its input in order the resulted segments are better candidates for segment elimination. Read more at Ensuring Your Data is Sorted or Nearly Sorted by Date to Benefit from Date Range Elimination:
Your order is by
ID
but I'm pretty sure that causes functional dependencies side-effects.