I was trying to see the kind of performance gains column-store indexes can provide on a table. The table has about a 3.7 million rows, 11 columns and is stored as a heap (i.e without a primary key). I create a column-store index on the table and run the following query:
SELECT
[Area], [Family],
AVG([Global Sales Value]) AS [Average GlobalSalesValue],
COUNT([Projected Sales])
FROM
dbo.copy_Global_Previous5FullYearSales
WHERE
[Year] > 2012
GROUP BY
[Area], [Family]
The create table statement is as follows:
CREATE TABLE [dbo].[copy_Global_Previous5FullYearSales]
(
[SBU] [NVARCHAR](10) NULL,
[Year] [INT] NULL,
[Global Sales Value] [MONEY] NULL,
[Area] [NVARCHAR](50) NULL,
[Sub Area] [NVARCHAR](50) NULL,
[Projected Sales] [MONEY] NULL,
[Family] [NVARCHAR](50) NULL,
[Sub Family 1] [NVARCHAR](50) NULL,
[Sub Family 2] [NVARCHAR](50) NULL,
[Manufacturer] [NVARCHAR](40) NULL,
[rowguid] [UNIQUEIDENTIFIER] NOT NULL,
[ID] [INT] IDENTITY(1,1) NOT NULL,
PRIMARY KEY CLUSTERED ([ID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The performance gains I get from column-store index in this case is negligible. The query with the column-store index runs nearly as slow as the original query without an index, in some cases, even slower, even though batch mode is processing is used too.
Surprisingly, when I create an ever increasing primary key - ID on the existing table and rebuild the column-store index, I get a 15X improvement on the CPU time and a 3X improvement on the elapsed time.
I don't understand how the addition of a primary key could affect query performance for column-store indexes which store the data in a compressed format anyway. Also primary keys only change the ordering of the pages which in this case, would be none.
Below is the execution plan