Content migration strategy for XM Cloud

One of the most common scenarios in any Sitecore migration project is content migration, and it is very much relevant when migrating to an XM Cloud environment. Although there are multiple ways to migrate content, including the Items as Resources CLI plugin, this blog post uses Razl as the primary content migration tool. I also aim to write a separate blog post comparing the different migration tools/approaches, with pros, cons and benchmarks, presuming my next work assignment involves Sitecore XM Cloud access!

Anyway, for this post, the volume discussed is about 100K items for one specific large item bucket, since each bucket can be migrated as a separate scheduled script or task. Personally speaking, item buckets are one of the most complex scenarios involved in content migration. Apart from the content migration strategy discussed, I've also blogged some of the scripts and tools that came in handy for such a migration. First things first: as with any project, here are the important phases. Since real-time content is involved, it is good to follow an iterative approach right from the PoC stage, which helps in deciding the final approach, scripts and so on:


Note that media item migration to Content Hub is not covered here.

1. Razl is a good tool for such content migrations, and sample scripts are available as part of the installation under C:\Program Files (x86)\Sitecore\Razl\Samples. The most useful ones are:

CopyTreeUnderHome.ps1: The most effective approach for mass content migration, especially item buckets

*****

*****

DeepCompare.ps1: Useful as a post-run scheduled task to compare results between source and target

*****

*****

CopyChildrenofHome.ps1: Useful for scenarios where you have to push missing items row by row from a CSV

*****

*****

2. Create a separate branch for code build/deployment for the XM Cloud environment: This is highly advantageous since you don't want to disturb the existing setup, and it improves the velocity of development, build and deployment for the XM Cloud stream, especially when deploying ad-hoc environment-variable patch config settings.

3. Two broad classifications with respect to content migration:

3.1. Content Serialization setup: Unicorn to Sitecore CLI migration: A tool like this could be useful. 

3.2. Content migration/sync:

3.2.1 Two types of content migration:

- One-off or ad-hoc content migration: For the initial content setup. Although these are one-off PS scripts, it is better to run them via a scheduler, since you can log information and leave them to run overnight without intervention

- Scheduled content migration, also called sync: track daily changes and migrate them on a daily basis

4. When you set up a Razl package connection, you get an access GUID, and this must be deployed across all the involved environments in order to establish the connection

5. Have all the involved XM Cloud environment identifiers handy

6. For the scheduled daily sync:

- Set up a virtual machine that will handle such tasks

- The scheduled task should run in a quiet window after business hours

- If need be, split tasks across different tree nodes based on the time taken

Post scheduled task run:

- On a daily basis, run a separate compare task after the scheduled run, log the missing-item results to a CSV file and take corrective action based on any failures (a sketch of the logging step follows below). Note that when using Razl deep compare against XM Cloud with item buckets around the 30K-item mark, the compare failed and threw the connector error described further down.
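Purely as an illustration of that logging step (not the actual compare script), here is a minimal sketch; $missingItems is a hypothetical collection standing in for whatever the compare step reports as missing:

# Hypothetical sketch: only the CSV-logging/corrective-action part is shown.
# $missingItems represents whatever the deep-compare step reports as missing on the target.
$missingItems = @(
    [pscustomobject]@{ ItemId = '{00000000-0000-0000-0000-000000000000}'; Path = '/sitecore/content/...' }
)

$logFile = "C:\MigrationLogs\missing-items-$(Get-Date -Format 'yyyyMMdd').csv"
$missingItems | Export-Csv -Path $logFile -NoTypeInformation

if ($missingItems.Count -gt 0) {
    Write-Warning "$($missingItems.Count) items missing on target - see $logFile"
    # Corrective action could be a re-run of the copy script for just these items
}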

7. While the Razl sample scripts are the actual reference, here are a couple of script templates that could be useful; just plug in the tree nodes involved in the migration:

Copy Item tree template:

*****

*****

Deep Compare template:

*****

*****

Lightning mode is useful to skip existing items.

8. Windows scheduled task details:
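The exact task configuration will differ per environment, but as a rough sketch (task name, script path and trigger time below are placeholders, not from my actual setup), a daily task can be registered like this:

# Sketch only: register a daily task that runs the sync script after business hours.
$action  = New-ScheduledTaskAction -Execute "powershell.exe" `
            -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\MigrationScripts\DailySync.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 10pm

Register-ScheduledTask -TaskName "XMCloud Daily Content Sync" `
    -Action $action -Trigger $trigger -RunLevel Highest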

9. While you migrate content, there will be a need to clean up data and start from scratch. Unfortunately, I couldn't use Razl's Remove-RazlItem method, which would actually have been the most effective option for high volumes, so I had to look for other options. Of those, the Sitecore PowerShell ISE and this PowerShell script seemed the most effective:

Get-Item -Path master:'/sitecore/content/xyz/Shared/Shared/Data/ProductData' | Remove-Item -Recurse

Although it took a lot of retries and many hours to delete about 100K items, PowerShell at least did the job compared with the other options!

This one was handy to find the count of items in a bucket:

$count=(Get-ChildItem -Path master:'/sitecore/content/xyz/Shared/Shared/Data/ProductData/Products' | Measure-Object).Count

Write-Host($count)

The other options I tried for clean-up, without much success, are as follows:

a. dbbrowser.aspx: Since this runs in the HTTP context, it always resulted in a 500 error in XM Cloud

b. Sitecore CLI: this is one area I still need to investigate, especially where item buckets are involved

c. Delete items using Delete descendants in the Sitecore ribbon: ineffective

d. Create an item package and install it as specified here: since this doesn't run in the background, it wasn't effective for the volume I dealt with and resulted in a 500 error, similar to the other options

10. Practically speaking, since migration to XM Cloud happens over a period of time, some content tree nodes will keep changing in the source, so those changes have to be migrated to XM Cloud on a day-to-day basis.

Next, as part of tree node comparison, one of the Razl errors that ate up a lot of my time is the one below. Although I've added a screenshot of the Razl UI, the same error occurred with the script too, and my logs had the same message:

WARNING: CopyItems: CopyAll encountered error Can't install connector. The web root can't be found copying item 

/sitecore/content/xyz/Shared/Shared/Data/ProductData.

WARNING: CopyAll encountered error Can't install connector. The web root can't be found copying item 

Unfortunately, the solution provided here didn't match what I encountered.


Razl log console:

******

ERROR Error 'Can't install connector. The web root can't be found' getting Sitecore items. System.IO.FileNotFoundException: Can't install connector. The web root can't be found
   at HedgehogDevelopment.RazlClient.Impl.SitecoreRazlService.InstallConnector(ICancelTracker cancelTracker)
   at HedgehogDevelopment.Razl.DomainImpl.Sitecore.SitecoreConnectorImpl.InstallConnector()
   at HedgehogDevelopment.RazlClient.Impl.SitecoreRazlService.CallServiceAsyn(Func`2 beginInvoke, Action`2 endInvoke)
   at HedgehogDevelopment.RazlClient.Impl.SitecoreRazlService.GetMultipleItemProperties(String databaseName, Guid[] itemIds)
   at HedgehogDevelopment.Razl.DomainImpl.Sitecore.SitecoreConnectorImpl.GetMultipleItemProperties(String databaseName, Guid[] itemIds)
   at HedgehogDevelopment.Razl.Controllers.RazlController.GetOtherSideChildren(ISitecoreConnector connection, String databaseName, IEnumerable`1 childItemIds)
   at HedgehogDevelopment.Razl.Controllers.RazlController.PopulateTree(SitecoreItemPairView parentItem, ObservableCollection`1 parentItems, Guid parentId, Side side, Action`1 callback)
   at HedgehogDevelopment.Razl.Controllers.RazlController.<>c__DisplayClass206_0.<PopulateChildren>b__0()

******

Surprisingly, this occurred for only one item bucket, but it was the one with the highest volume of items. So I finally cleaned up the bucket using SPE and started the Razl tree item copy script (same as the Copy Item tree template above), and it worked fine.

I also noticed that this error occurred while migrating content from one XM Cloud environment to another (using the Copy Item tree template script) when the number of items was high. So I migrated one node at a time rather than multiple nodes together. For instance, if the brand data, brand categories and product data buckets were sent together as part of the script, the script errored out, so I then tried each separately and the migration was successful.

Now, the above approach of cleaning up the affected content tree node and reloading the items would be fine if the item count were small, but since there were about 100K items, the load time (about 12 hours) seemed too costly, especially on a day-to-day basis.

Due to the above issue, I wrote a script that finds the changes within a specific duration in the source and applies just the changed items onto the target:

Copy changes using history:

#######
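The actual script isn't reproduced here. Purely to illustrate the "find what changed in a window" idea, here is a minimal SPE-style sketch that reads the history engine on the source master database. This is an assumption on my part (it relies on the history engine being enabled on the source) and is not the Usehistorytocopyitems.ps1 script itself; the changed item IDs would then be fed to whatever copy mechanism the sync uses.

# Sketch only (not the original Usehistorytocopyitems.ps1): collect IDs of items
# changed in the last 24 hours from the source master database's history engine.
# Assumes the history engine is enabled and the script runs in SPE on the source.
$db   = [Sitecore.Configuration.Factory]::GetDatabase("master")
$from = (Get-Date).AddDays(-1)
$to   = Get-Date

$entries = $db.Engines.HistoryEngine.GetHistory($from, $to)

# De-duplicate, since one item may have several history entries in the window
$changedIds = @($entries | Select-Object -ExpandProperty ItemId -Unique)

$changedIds | ForEach-Object { $_.ToString() } |
    Out-File "C:\MigrationLogs\changed-items.txt"

Write-Host "$($changedIds.Count) changed items found between $from and $to"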

CPU Processors Introduction:

From here on, I'm going to discuss the two processors tested for content migration:

Processor 1: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00 GHz



Processor 2: Intel Xeon

The actual issue kicked in with the execution of the above Usehistorytocopyitems.ps1 script on Processor 1 (Intel Core). The daily volume was in the 30K range, but the script took about 3 hours (precisely 02:59:48.7536762 for 9741 items, executed on a low-traffic Sunday morning between 10 am and 1 pm) on a 32 GB RAM Windows 11 Pro machine with the Intel Core processor, and that is when I decided to tweak the script.

Note: Ironically, the same 9741 items took 122 minutes to sync on the Xeon processor without threads, running on a Sunday night starting at 8 pm.

Multi-threading concept:

In order to improve the daily sync script, I started thinking along the lines of multi-threading with PowerShell scripting. Based on my analysis and observations, the machine's processor plays a major role in the sync process, particularly once multi-threading comes into the picture. Here is one article that highlights the differences between Xeon and Intel Core processors. Since my development PC ran an Intel Core processor while the AWS VM where the task was scheduled ran on a Xeon processor, this was a significant difference in setup.

Although PowerShell 7 offers ForEach-Object -Parallel, the HedgehogDevelopment.RazlClient.dll used in the sync scripts appeared to be compatible with .NET Framework rather than .NET Core, so I had to settle for Windows PowerShell 5, where there were a couple of options available to speed up the sync process using threads:

a. Jobs

Copy history with Jobs script:

******

******

Note that out of the 9741 items, 1700 items are allocated to each job, so there will be 6 jobs in total for this task.
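As a rough illustration of that chunking (not the actual sync script; the per-item copy call and the input file are hypothetical placeholders):

# Sketch only: split the changed-item IDs into chunks of 1700 and start one
# background job per chunk. Copy-OneItem is a hypothetical placeholder for
# whatever per-item copy call the real sync script makes.
$itemIds   = Get-Content "C:\MigrationLogs\changed-items.txt"
$chunkSize = 1700

for ($i = 0; $i -lt $itemIds.Count; $i += $chunkSize) {
    $chunk = $itemIds[$i..([Math]::Min($i + $chunkSize, $itemIds.Count) - 1)]

    Start-Job -Name "Sync_$i" -ArgumentList (,$chunk) -ScriptBlock {
        param($ids)
        foreach ($id in $ids) {
            # Copy-OneItem $id   # placeholder for the actual copy logic
        }
    } | Out-Null
}

# Wait for all jobs to finish and collect their output
Get-Job -Name "Sync_*" | Wait-Job | Receive-Job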

Since parallel jobs are covered here only as an option, I'm explaining the broken-down state of the jobs for this concept.

Parallel jobs in action (with execution of above script):


FYI, Get-Job always provides the status of the running jobs.

To get the job statuses, a script like this will do:

######++++
######++++
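As a minimal sketch of such a status check (not the exact script used), something like this lists each job's state and pulls any output produced so far:

# Sketch: show each sync job's state, then pull any output produced so far.
# -Keep leaves the output in the job so it can be read again later.
Get-Job -Name "Sync_*" |
    Select-Object Id, Name, State |
    Format-Table -AutoSize

Get-Job -Name "Sync_*" | Receive-Job -Keep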

Receive-Job -Id <id from job list> returns the output of that job's commands, and when you check the XMC log, you should see the same rows there.

For example, with the above status script that invokes Receive-Job, you should be able to see the progress of the sync at any point in time:

In the following scenario, the job as a whole is completed since every step's state is Completed, and Receive-Job has already returned all the output, so the job steps no longer show any data:


The execution time for 9741 rows in this case was approx 45 mins. 

b. Runspace pool

Copy history with Runspace pool script:
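Again as a rough sketch rather than the exact script (the input file, throttle value and per-item copy call are placeholders), the runspace pool version looks roughly like this:

# Sketch only: process the changed items through a runspace pool.
# The throttle (max runspaces) and the per-item copy call are placeholders.
$itemIds = Get-Content "C:\MigrationLogs\changed-items.txt"

$pool = [runspacefactory]::CreateRunspacePool(1, 100)   # min/max runspaces
$pool.Open()
$work = @()

foreach ($id in $itemIds) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({
        param($itemId)
        # Copy-OneItem $itemId   # placeholder for the actual copy logic
    }).AddArgument($id)

    $work += [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

# Wait for everything to complete and clean up
foreach ($w in $work) {
    $w.PowerShell.EndInvoke($w.Handle) | Out-Null
    $w.PowerShell.Dispose()
}
$pool.Close()
$pool.Dispose()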

With jobs, the script execution time improved to about 45 minutes. With runspaces, it improved further to about 27 minutes for the same 9741 items, and to about 75 minutes for roughly 30000 items.

Final tally for 9741 rows:

The sync script written with the runspace pool is the winner!

Sample XM Cloud log entries:

Next, the runspace script was executed as a scheduled task on a VM server with 4 GB RAM and a Xeon processor (note that all scripts were executed as scheduled tasks on both machines). The execution time was about 4 hours for roughly 30K items, compared with about 75 minutes on the 32 GB RAM Windows 11 Pro machine with the Intel Core processor. Even when the Xeon machine's RAM was increased to 32 GB, the execution time stayed at around 4 hours for about 30K items (irrespective of the RAM capacity), since the Xeon box could handle only about 100 threads, and an attempt to increase this to 1000 threads resulted in an OutOfMemory exception during script execution.

So the bright side is that the duration drops to roughly a third when the Intel Core processor is used in place of the Xeon processor.

Pictorial representation of stats:

Runspace pool performance, actual data and chart (Intel Xeon processor): total items (x-axis) vs sync duration in mins (y-axis).

Runspace pool performance, actual data and chart (Intel Core processor): total items (x-axis) vs sync duration in mins (y-axis).

Personal verdict: if you run Razl with a runspace pool via a PowerShell script on an Intel Core processor, you gain the most benefit when the thread count is set to about 1/2 to 1/3 of the total item count; for migrating 30000 items, the sync duration then comes down to 60-75 mins.

Although the majority of this blog post works with history data, the idea of the post and the stats is to show that using parallelism while migrating content should reduce the overall duration.

Note that no testing has been done to confirm that the migrated content is stable and consistent. Based on random item checks in the buckets, everything looked good to the naked eye!

11. Lessons learnt:

- Plan suitable timing; it makes a big difference

- Keep an eye on the volume for stats/benchmarking

- Metrics are always important

- Be ready to be flexible and iterative

- Content migration is "more than" a full-time job

- SPE/PowerShell is very handy so elevate permissions for your XMC environments

- Disable the item cloning setting before the sync process and re-enable it afterwards
