Search
  • Kamil Kosno

Partitioning clustered columnstore index

Updated: May 2

Partitioning the clustered columnstore index is performed slightly differently than the rowstore one. This post will help avoid problems we may face when creating a partitioned columstore index.

Prerequisites


I will be working with Microsoft SQL Server 2019(RTM), developer's edition. For this exercise I will use testdb database. My table is a list of orders, and the column I want to partition on is order_time.


Let's create the table:


use testdb;
go 
drop table if exists dbo.orders_cs_partition_test;
create table dbo.orders_cs_partition_test 
( order_id bigint not null, 
order_time datetime2(0) not null, 
customer_id bigint not null, 
amount decimal(24,8) not null ); 
go

We also need partition function and scheme:


declare @pf_name varchar(128) = 'pf_orders_datetime2';
declare @ps_name varchar(128) = 'ps_orders_datetime2';

if exists(select * from sys.partition_schemes where [name] = @ps_name)
 drop partition scheme ps_orders_datetime2;
if exists(select * from sys.partition_functions where [name]= @pf_name)
 drop partition function pf_orders_datetime2;

create partition function pf_orders_datetime2(datetime2(0)) as range right for values ('20210401','20210501','20210601');
create partition scheme ps_orders_datetime2 as partition pf_orders_datetime2 all to ([primary]);

For my tests, I chose to partition by month, and created four ranges. Range right means that each range's end is marked by the range value, but the value itself is excluded.


For instance, in case of value '20210401' the range extends records starting with the previous range value - in this case previous range value is not specified so all records with order_time earlier than 1st of April 2021 will belong to that range - and ending just before midnight of 2021-04-01. 1st of April 2021 marks the starting point of the following range.

Partitioning an empty table


In our first experiment we will try to create a columnstore index on a table, using the partition scheme we created earlier. As a reminder, if our clustered index was a rowstore index, we could create it like so:


drop table if exists dbo.orders_cs_partition_test;
create table dbo.orders_cs_partition_test
(
 order_id bigint not null,
 order_time datetime2(0) not null, 
 customer_id bigint not null,
 amount decimal(24,8) not null
);
create clustered index idx_cl_order_time on dbo.orders_cs_partition_test(order_time, order_id) on ps_orders_datetime2(order_time); 

No problems there. Now, let's try the same approach with the clustered columnstore index:


drop table if exists dbo.orders_cs_partition_test;
create table dbo.orders_cs_partition_test
(
 order_id bigint not null,
 order_time datetime2(0) not null, 
 customer_id bigint not null,
 amount decimal(24,8) not null
);
create clustered columnstore index idx_cci on dbo.orders_cs_partition_test on ps_orders_datetime2(order_time);

and this time, SQL Server complains:

Msg 35316, Level 16, State 1, Line 25
The statement failed because a columnstore index must be partition-aligned with the base table. Create the columnstore index using the same partition function and same (or equivalent) partition scheme as the base table. If the base table is not partitioned, create a nonpartitioned columnstore index.

What is it trying to tell us?


In order to create a partitioned columnstore index on a table, the base table has to be partitioned already, using the same partition function and the same (or equivalent) partition scheme. Otherwise, we can only create a nonpartitioned columnstore index.


Let's create our table again, now as suggested, using the same partition scheme:


drop table if exists dbo.orders_cs_partition_test;
create table dbo.orders_cs_partition_test
(
 order_id bigint not null,
 order_time datetime2(0) not null, 
 customer_id bigint not null,
 amount decimal(24,8) not null
) on ps_orders_datetime2(order_time);
go

It worked! If we try to create a columnstore index now, we succeed:


create clustered columnstore index idx_cci on dbo.orders_cs_partition_test on ps_orders_datetime2(order_time);

To conclude this section, introducing a partitioned clustered columnstore index requires that the base table is partition-aligned before the index can be added to the table. It is quite simple if we are dealing with a new table, we can recreate it using a given partition scheme and then add an index.

Partitioning an existing table


By now we know that if we need to create a partitioned clustered columnstore index, it has to be aligned with partitioning of the base table. If the table already exists and contains records we can either create a new, partitioned, table and copy the data or we may choose to add an index to the existing table.


The former case is very similar to what we have already covered in the previous section.


When we really have no choice but to apply clustered columnstore index to an existing table, we need to take a slightly different approach.


Using the same dbo.orders_cs_partition_test table, let's assume we have a rowstore clustered index in place like so:


drop table if exists dbo.orders_cs_partition_test;
create table dbo.orders_cs_partition_test
(
 order_id bigint not null,
 order_time datetime2(0) not null, 
 customer_id bigint not null,
 amount decimal(24,8) not null
);
create clustered index idx_cl_ordertime on dbo.orders_cs_partition_test(order_time);
insert into dbo.orders_cs_partition_test
values
(1,'20210408 10:18',1,25.68),
(2,'20210410 12:34',2,155.60),
(3,'20210429 16:25',2,10.99);

In order to add a clustered columnstore index, we need to drop the current index first:


drop index idx_cl_ordertime on dbo.orders_cs_partition_test;

We cannot simply add a new columnstore index, if we try we will encounter the same error as before:


create clustered columnstore index idx_cci on dbo.orders_cs_partition_test on ps_orders_datetime2(order_time);
 
Msg 35316, Level 16, State 1, Line 75
The statement failed because a columnstore index must be partition-aligned with the base table. Create the columnstore index using the same partition function and same (or equivalent) partition scheme as the base table. If the base table is not partitioned, create a nonpartitioned columnstore index.

A possible workaround to apply here:

  1. re-create clustered rowstore index on the partition scheme

  2. drop the rowstore index

  3. create clustered columnstore index on the partition scheme


--1.create partitioned clustered rowstore index
create clustered index idx_cl_ordertime on dbo.orders_cs_partition_test(order_time) with(drop_existing=ON) on ps_orders_datetime2(order_time);
--2.drop the rowstore index
drop index idx_cl_ordertime on dbo.orders_cs_partition_test;
--3.add partitioned columnstore index
create clustered columnstore index idx_cci on dbo.orders_cs_partition_test on ps_orders_datetime2(order_time);

Partition switch


Partition switch operation should not pose a big challenge, as long as we remember to align the partitions between the tables. Assuming we have a staging table defined as follows:


drop table if exists dbo.stg_orders_cs_partition_test;
create table dbo.stg_orders_cs_partition_test
(
 order_id bigint not null,
 order_time datetime2(0) not null, 
 customer_id bigint not null,
 amount decimal(24,8) not null
);
create clustered columnstore index idx_cci on dbo.stg_orders_cs_partition_test 

we can switch "out" a partition from the base table:


alter table dbo.orders_cs_partition_test switch partition 2 to dbo.stg_orders_cs_partition_test

However, if we attempt to switch it back we will encounter an error:

alter table dbo.stg_orders_cs_partition_test switch to dbo.orders_cs_partition_test partition 2

Msg 4982, Level 16, State 1, Line 109
ALTER TABLE SWITCH statement failed. Check constraints of source table 'testdb.dbo.stg_orders_cs_partition_test' allow values that are not allowed by range defined by partition 2 on target table 'testdb.dbo.orders_cs_partition_test'.

As suggested by the error message, we can introduce a check constraint on the order_time like so:


alter table dbo.stg_orders_cs_partition_test add constraint chk_time check(order_time >= '20210401' and order_time < '20210501')

and then the partition switch will be allowed.


The cleaner way (in my opinion) to go about it is to create staging table on the same partition scheme as the base table instead:


drop table if exists dbo.stg_orders_cs_partition_test;
create table dbo.stg_orders_cs_partition_test
(
 order_id bigint not null,
 order_time datetime2(0) not null, 
 customer_id bigint not null,
 amount decimal(24,8) not null
) on ps_orders_datetime2(order_time);
create clustered columnstore index idx_cci on dbo.stg_orders_cs_partition_test on ps_orders_datetime2(order_time);

Both tables' partitions are now perfectly aligned, we just need to make sure to add partition number after referring to the source tables in the switch statements:


alter table dbo.orders_cs_partition_test switch partition 2 to dbo.stg_orders_cs_partition_test partition 2;

and:


alter table dbo.stg_orders_cs_partition_test switch partition 2 to dbo.orders_cs_partition_test partition 2

Summary


Table partitioning is a useful feature, and in some scenarios we may want to apply partitioning together with the clustered columnstore index. Columnstore index is slightly more restrictive, when it comes to partitioning, than the rowstore index. It requires the base table to be partition-aligned before partitioned columnstore index can be created.


6 views0 comments