Console app that shards a local xDB and migrates contacts/interactions to a new shard DB

For the Experience Database (xDB), Sitecore uses sharding to distribute and manage contacts across multiple databases. By default, a local XP0 setup has two shard DBs, and contacts (or any other mapped entity, such as interactions or device profiles) go into one of these DBs based on the shard key range specified in the global shard map manager, which is itself a DB. Sitecore supports custom splitting/sharding in an Azure setup, where more DBs can be added to the existing setup so that contacts are distributed and the load on the existing shards is reduced. This sharding capability is currently unavailable for a local setup. So, out of curiosity, I decided to implement sharding in a local setup.

To understand the concept better, start with shard maps: contacts, for instance, are a shard map. Then there is the shard mapping, which allocates the map to a shard. Thankfully, all of this is part of the Sitecore DB table design, and Sitecore has enough documentation.

This tool is tested on Sitecore 10.4.0 and 10.4.1 instances.

Before using this tool, back up your shard DBs and store the backups safely so that you can restore them later. This tool has scope for improvement. The idea here is just to do a PoC and then improve it as and when there is a real need.

The first prerequisite for running the sharding console app is to set up these params in the app.config:

Most params above are self-explanatory; replace them with the correct values for your local xConnect setup. Most importantly, the thumbprint in certConnString must match the certificate thumbprint configured in the xConnect web root:

Then, smmConnString must be picked from xconn web root connectionstrings.config as highlighted below:

The next prerequisite is to allow ad-hoc queries like OPENROWSET / OPENDATASOURCE:

So, execute this in the backend/SQL Server:

*****

*****

The console app I built has two basic shard options:

Option 1 - Implement shard splitting in a fresh setup with no contacts: splits the shard key range, which runs from 0x00 to 0xFF, equally between multiple DBs. Since Sitecore stores the shard key as a single-byte array, here is a list of how each byte value maps to its decimal value - Byte-1 and decimal numeric value relation. For an idea, here is a snapshot of the list that runs from 0 to 255:
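To make the equal split concrete, here is a minimal Python sketch, not the tool's actual code. It assumes the shard key is a single byte (0 to 255), that each range is inclusive at the low end and exclusive at the high end, and that the last range is open-ended (NULL), as in the default two-shard layout. The names `equal_shard_ranges` and `fmt` are made up for illustration.

```python
from typing import List, Optional, Tuple

KEY_SPACE = 256  # single-byte shard keys run from 0x00 to 0xFF


def equal_shard_ranges(shard_count: int) -> List[Tuple[int, Optional[int]]]:
    """Split the 0x00-0xFF key space into contiguous, near-equal ranges.

    Each range is (low, high): low is inclusive, high is exclusive.
    The last range has high=None, standing in for the NULL upper bound.
    """
    if not 1 <= shard_count <= KEY_SPACE:
        raise ValueError("shard_count must be between 1 and 256")
    lows = [i * KEY_SPACE // shard_count for i in range(shard_count)]
    highs: List[Optional[int]] = lows[1:] + [None]
    return list(zip(lows, highs))


def fmt(bound: Optional[int]) -> str:
    """Render a bound as a hex byte, or NULL for the open upper end."""
    return "NULL" if bound is None else f"0x{bound:02X}"


if __name__ == "__main__":
    # Hypothetical four-shard layout
    for index, (low, high) in enumerate(equal_shard_ranges(4)):
        print(f"Shard{index}: {fmt(low)} - {fmt(high)}")
```

With two shards this gives 0x00 - 0x80 and 0x80 - NULL, which matches the default local layout.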

Option 2 - Split existing shard data: gets the source shard DB and target shard DB details, then gets the range to split from the source and moves that range into the target DB.

Understandably, this option needs existing contacts. There is a separate console app, built based on Sitecore documentation, to bulk insert contacts, and Sitecore takes care of sending these contacts to the appropriate shard based on the shard key:

In the backend, you should see contacts split as follows:

Shard0 (with shardkey column): 0x0 - 0x80

Shard1 (with shardkey column): 0x80 - null or 0xFF

Note that sharding splits in a contiguous manner. For instance, the first part of the split goes to one DB while the second part goes to the next DB. In other words, you CANNOT take a range from the middle and expect the first and third parts to stay in the first DB while the middle part goes to the new DB. A split always produces two contiguous ranges.

Since we are currently dealing with option 2, that is, splitting existing map data, provide the server name, source DB and target DB.

Note that, as of now, the source and target DBs must be on the same machine/server.

The utility then creates the new (target) shard DB and deploys the related schema:

Then, provide the new range for source followed by target db shard range:

In the backend, at this point, the new shard DB will have all the required tables and procedures, along with a DB collection user, which is very important for the functionality:

For instance, in the default setup, 0x80 to NULL is stored in Shard1. The new setup could then be 0x80 to 0xC0 (a center split), where the first half stays in Shard1 and the second half goes to the new shard DB. So, the input for the same is 0x80 to 0xFF.
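The contiguity rule and the center split can be sketched as follows. This is a Python illustration only, not the tool's implementation. It assumes single-byte keys, ranges that are inclusive at the low end and exclusive at the high end, and `None` standing in for the NULL upper bound. `split_range` is a hypothetical helper name.

```python
from typing import Optional, Tuple

Range = Tuple[int, Optional[int]]  # (low inclusive, high exclusive or None)


def split_range(source: Range, split_point: int) -> Tuple[Range, Range]:
    """Cut one shard range into two contiguous ranges at split_point.

    The first range stays in the source shard; the second one is the
    range that moves to the new (target) shard.
    """
    low, high = source
    upper = 256 if high is None else high  # NULL means "up to 0xFF"
    if not low < split_point < upper:
        raise ValueError("split point must fall strictly inside the source range")
    return (low, split_point), (split_point, high)


if __name__ == "__main__":
    stays, moves = split_range((0x80, None), 0xC0)
    print("source keeps:", stays)  # (128, 192), i.e. 0x80 - 0xC0
    print("target gets:", moves)   # (192, None), i.e. 0xC0 - NULL
```

Because the function only ever returns two adjacent ranges, a "middle slice" split cannot be expressed with it, which mirrors the restriction described earlier.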

Now, the tool deletes the old mappings for the specified range from the global shard map manager as well as from the local shard. Note that the tool shards all maps:

DeviceProfileIdShardMap

ContactIdentifiersIndexShardMap

ContactIdShardMap

In other words, any table that has a shard key column will be sharded. Interactions, Contacts and DeviceProfiles are the primary entities.

As part of sharding, the new range gets inserted into the corresponding tables. This step decides which contact/relevant map goes to which shard from here on.

Finally, based on the new range, the source DB keeps only the contacts/relevant maps that fall within its range, while the rest are migrated to the new shard DB.
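How a range mapping decides the destination shard can be illustrated with a small lookup. This is a Python sketch under assumptions: single-byte keys, ranges inclusive at the low end and exclusive at the high end, `None` for a NULL upper bound, and a made-up third shard named `Shard2`. It does not reflect how Sitecore resolves shards internally.

```python
from typing import List, Optional, Tuple

Mapping = Tuple[int, Optional[int], str]  # (low, high or None, shard name)

# Hypothetical mappings after splitting Shard1 at 0xC0
MAPPINGS: List[Mapping] = [
    (0x00, 0x80, "Shard0"),
    (0x80, 0xC0, "Shard1"),
    (0xC0, None, "Shard2"),  # "Shard2" is an illustrative name
]


def find_shard(mappings: List[Mapping], key_byte: int) -> str:
    """Return the shard whose range contains the given shard key byte."""
    if not 0 <= key_byte <= 0xFF:
        raise ValueError("key_byte must be a single byte (0-255)")
    for low, high, shard in mappings:
        if low <= key_byte and (high is None or key_byte < high):
            return shard
    raise LookupError(f"no mapping covers key 0x{key_byte:02X}")


if __name__ == "__main__":
    for key in (0x10, 0x80, 0xBF, 0xC0, 0xFF):
        print(f"0x{key:02X} -> {find_shard(MAPPINGS, key)}")
```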

Now, restart the xConnect services related to the website:

1. Marketing Automation

2. Index worker

3. Processing Engine

4. Also, restart the xConnect website instance

Now run the contacts console app to see the contacts getting inserted as per the new mapping.

The following screenshot is taken from one of the new shards added as part of testing, based on the new shard key range of 0xC0 to 0xFF.

Github repo

Shard existing contacts demo:


All-in-one demo:
