Content migration strategy for XM Cloud

One of the most common scenarios in any Sitecore migration project is content migration, and it is very much relevant when migrating to an XM Cloud environment. Although there are multiple ways to migrate content, including the Items-as-Resources CLI plugin, this blog post uses Razl as the primary content migration tool. I also aim to write a separate blog post comparing the possible migration tools and approaches, with pros, cons, and benchmarks, presuming my next work assignment involves Sitecore XM Cloud access!

With respect to this post, the volume discussed is around 100K items for one specific large item bucket, since each bucket can be migrated as a separate scheduled script or task. Personally speaking, item buckets are one of the most complex scenarios in content migration. Apart from the migration strategy itself, I've also included some of the scripts and tools that proved handy. First things first: as with any project, plan the important phases up front, and since real content is involved, it's good to follow an iterative approach right from the PoC stage, which helps decide the final approach and scripts.


Note that media item migration to Content Hub is not covered here.

1. Razl is a good tool for such content migrations, and sample scripts ship with the installation under C:\Program Files (x86)\Sitecore\Razl\Samples. The most useful ones are:

CopyTreeUnderHome.ps1: The most effective approach for mass content migration, especially item buckets

*****

# Setup the error preferences
$ErrorActionPreference = "Stop"
$InformationPreference = "Continue"
# Create the source and target connections
$source = Get-RazlConnection -SitecoreWebUrl [server] -DatabaseName [Database] -AccessGuid [Access Guid] -verbose -Name [Source Name]
$target = Get-RazlConnection -SitecoreWebUrl [server] -DatabaseName [Database] -AccessGuid [Access Guid] -verbose -Name [Target Name]
# Copy the tree from source to target.
# Lightning mode is specified to make the copy go quicker.
# The description shows up in the copy status messages created by PowerShell, which helps the user follow the status messages
Copy-RazlItemTree -source $source -target $target -ItemId "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}" -overwrite -LightningMode -Description "Copy Items Under Home"

*****

DeepCompare.ps1: Useful as a post-run scheduled task to compare results between source and target

*****

$ErrorActionPreference = "Stop"
$InformationPreference = "Continue"
# Create the source and target connections
$source = Get-RazlConnection -SitecoreWebUrl [Server] -DatabaseName [Database] -AccessGuid [Access Guid] -verbose -Name [Source Name]
$target = Get-RazlConnection -SitecoreWebUrl [Server] -DatabaseName [Database] -AccessGuid [Access Guid] -verbose -Name [Target Name]
# Deep compare returns an array of objects that show the differences under a root item. In this case, we are using the Home item.
Get-RazlDeepCompareResults -source $source -target $target -RootItemId "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}"

*****

CopyChildrenofHome.ps1: Useful for scenarios where you have to push missing items row by row from a CSV (see the sketch after this sample)

*****

# This demo script shows how PS pipelines can be used to easily copy items. It is more efficient to use the Copy-RazlItemTree function, but if the user needs to do some processing
# or filtering of the items, these techniques allow for virtually unlimited possibilities.
#
# Get the source and target connections
$source = Get-RazlConnection -SitecoreWebUrl [Server] -DatabaseName [Database] -AccessGuid [Access Guid] -verbose -Name [Source Name]
$target = Get-RazlConnection -SitecoreWebUrl [Server] -DatabaseName [Database] -AccessGuid [Access Guid] -verbose -Name [Target Name]
# Get-RazlChildItems returns an array of ItemProperty objects that contain basic information about each item under the ParentItemID. In this example, the
# array is converted into an array of IDs by piping it to the select statement.
$homeItemChildIds = Get-RazlChildItems -Connection $source -ParentItemID "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}" -verbose | select -ExpandProperty Id
# The Get-RazlItem cmdlet can accept an array of IDs or a single ID as a parameter to -ItemID. It will return an array of objects containing all fields for all versions and
# languages of an item. This array of objects can be piped directly to Set-RazlItem, or processed by additional pipeline steps.
Get-RazlItem -Connection $source -ItemID $homeItemChildIds -verbose | Set-RazlItem -Connection $target -verbose

*****
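
For the CSV scenario, here is a minimal sketch, assuming a hypothetical missing-items.csv with an ItemId column (for example, one produced by a compare run); the file name and column are my assumptions, not part of the Razl samples:

*****

# Hedged sketch: re-push items listed in a CSV, one row at a time
# The CSV file name and its ItemId column are assumptions for illustration
$source = Get-RazlConnection -SitecoreWebUrl [Server] -DatabaseName [Database] -AccessGuid [Access Guid] -Name [Source Name]
$target = Get-RazlConnection -SitecoreWebUrl [Server] -DatabaseName [Database] -AccessGuid [Access Guid] -Name [Target Name]
Import-Csv ".\missing-items.csv" | ForEach-Object {
    Get-RazlItem -Connection $source -ItemID $_.ItemId | Set-RazlItem -Connection $target -Verbose
}

*****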

2. Create a separate branch for code build/deployment to the XM Cloud environment: This is highly advantageous since you don't want to disturb the existing setup, and it improves the velocity of development, build, and deployment for the XM Cloud stream, especially when deploying ad-hoc environment variable patch config settings.

3. Two broad classifications with respect to content migration:

3.1. Content serialization setup (Unicorn to Sitecore CLI migration): A tool like this could be useful; a CLI sketch follows this list.

3.2. Content migration/sync:

3.2.1. There are two types of content migration:

- One-off or ad-hoc content migration: for the initial content load. Although these are one-off PS scripts, it's better to run them via a scheduler, since you can log information and leave them to run overnight without intervention

- Scheduled content migration (also called sync): track daily changes and migrate them on a daily basis
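
For 3.1, once the Unicorn includes are converted to a Sitecore CLI serialization module, the pull/push flow is driven by the standard serialization commands. A minimal sketch, assuming the module includes are already defined and the CLI is connected to both instances:

*****

# Hedged sketch: pull serialized items from the source instance,
# then push them to the connected XM Cloud environment
dotnet sitecore ser pull
dotnet sitecore ser push

*****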

4. When you set up a Razl package connection, you get an access GUID, and the connector package carrying it must be deployed across all the involved environments in order to establish the connection

5. Have all the involved XM Cloud environment identifiers handy
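
If helpful, the identifiers can be listed via the CLI cloud plugin (a hedged example; flags may vary by CLI version, and the project ID is a placeholder):

*****

dotnet sitecore cloud environment list --project-id <your-xmc-project-id>

*****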

6. For the scheduled daily sync (a registration sketch follows these bullets, and item 8 below shows the full task XML):

- Set up a virtual machine that will handle such tasks

- The scheduled task should run at a quiet time after business hours

- If need be, split tasks across different tree nodes based on the time taken
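
A minimal registration sketch using the built-in ScheduledTasks cmdlets (the script path, task name, and 11 pm trigger are example values):

*****

# Hedged sketch: register the nightly sync script as a Windows scheduled task
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File C:\razl-scripts\xm-cloud\sync.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 11pm
Register-ScheduledTask -TaskName "XMC Nightly Content Sync" -Action $action -Trigger $trigger

*****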

Post scheduled task run:

- On a daily basis, run a separate compare task after the scheduled run, log the missing-item results to a CSV file, and take corrective action on failures (a sketch of this follows). Note that when using Razl deep compare with item buckets around the 30K-item mark, the compare failed against XM Cloud and threw the connector error described further below.
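
A minimal sketch of that compare-and-log step, reusing the connections from the DeepCompare sample above (the CSV path is an example):

*****

# Hedged sketch: log deep-compare differences to a CSV for daily review
$differences = Get-RazlDeepCompareResults -source $source -target $target -RootItemId "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}"
$differences | Export-Csv -Path ".\compare-results.csv" -NoTypeInformation

*****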

7. While the Razl sample scripts are the actual reference, here is a script template that could be useful; just plug in the tree nodes involved in the migration:

Copy Item tree template:

*****

[CmdletBinding()]
Param
(
    [string] $sourceName = "src name here",
    [string] $destName = "dest xmcloud here",
    [string] $sourceEnvAccessGuid = "razl access guid",
    [string] $destEnvAccessGuid = "razl access guid",
    [string] $sourceUrl = "src cms url",
    [string] $destUrl = "xmc dest url",
    [string] $environmentId = "xmc dest env id",
    [string] $envVarName = "ITEM_CLONING_ENABLED",
    [switch] $local
)
function SetCloneEnvVariable
{
    param(
        $envId,
        $envVarName,
        $envVarValue
    )
    Write-Host "Setting $envVarValue for $envVarName"
    dotnet sitecore cloud environment variable upsert --name $envVarName --value $envVarValue --target CM --environment-id $envId
}
function syncContent
{
    param(
        $sourceName,
        $destName,
        $srcUrl,
        $destUrl,
        $srcAccessGuid,
        $destAccessGuid
    )
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
    # Import Razl powershell commands. This may need to be updated if the script is moved to another computer.
    Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
    # Create connections
    $TargetEnv = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destAccessGuid
    $SourceEnv = Get-RazlConnection -SitecoreWebUrl $srcUrl -DatabaseName master -Name $sourceName -AccessGuid $srcAccessGuid -ReadOnly
    #/sitecore/content/xyz/Shared/Shared/Data/BrandData
    Copy-RazlItemTree -source $SourceEnv -target $TargetEnv -ItemId "{xxxxxxxx-cdfd-drft-9A04-yyyyyyyyyyyy}" -LightningMode -ContinueOnError -Verbose -Overwrite
    #/sitecore/content/xyz/Shared/Shared/Data/BrandCategoriesData
    Copy-RazlItemTree -source $SourceEnv -target $TargetEnv -ItemId "{xxxxxxxx-cdfg-4f5y-AE8E-yyyyyyyyyyyy}" -LightningMode -ContinueOnError -Verbose -Overwrite
    #/sitecore/content/xyz/Shared/Shared/Data/BrandSitesData
    Copy-RazlItemTree -source $SourceEnv -target $TargetEnv -ItemId "{xxxxxxxx-cdfg-3455-8481-yyyyyyyyyyyy}" -LightningMode -ContinueOnError -Verbose -Overwrite
    #/sitecore/content/xyz/Shared/Shared/Data/CategoryData
    Copy-RazlItemTree -source $SourceEnv -target $TargetEnv -ItemId "{xxxxxxxx-dcrt-dfdf-99F8-yyyyyyyyyyyy}" -LightningMode -ContinueOnError -Verbose -Overwrite
    #/sitecore/content/xyz/Shared/Shared/Data/ProductData
    #Copy-RazlItemTree -source $SourceEnv -target $TargetEnv -ItemId "{xxxxxxxx-34fr-cvcc-A250-yyyyyyyyyyyy}" -LightningMode -ContinueOnError -Verbose -Overwrite
    #/sitecore/content/xyz/Shared/Shared/Data/StoreData
    Copy-RazlItemTree -source $SourceEnv -target $TargetEnv -ItemId "{xxxxxxxx-dr45-dfr4-837C-yyyyyyyyyyyy}" -LightningMode -ContinueOnError -Verbose -Overwrite
}
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$watch.Start() # Timer start
$time = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
Write-Host("Start Date/Time - $time")
dotnet tool install Sitecore.CLI --add-source https://sitecore.myget.org/F/sc-packages/api/v3/index.json
dotnet tool restore
dotnet sitecore cloud login --authority https://auth.sitecorecloud.io/ --audience https://api.sitecorecloud.io --allow-write false --client-credentials true --client-id zcxxzczczzczcczcxczczc --client-secret zzvvxzvxvxvxvxvxvxvxvxvxc_aQLN-zxzccxzczxczczczczczc
if (!$local)
{
    SetCloneEnvVariable -envId $environmentId -envVarName $envVarName -envVarValue "false" # Disable item cloning on the cloud env before the sync
    Write-Host("Restarting target environment")
    dotnet sitecore cloud environment restart --environment-id $environmentId # Restart env
    Write-Host("Restarted target environment")
    Write-Host("Starting target environment sync now")
    syncContent -sourceName $sourceName -destName $destName -srcUrl $sourceUrl -destUrl $destUrl -srcAccessGuid $sourceEnvAccessGuid -destAccessGuid $destEnvAccessGuid
    Write-Host("Completed target environment sync")
    Write-Host("Enabling target environment item clone setting")
    SetCloneEnvVariable -envId $environmentId -envVarName $envVarName -envVarValue "true" # Re-enable item cloning after the sync
    Write-Host("Enabled target environment item clone setting")
    dotnet sitecore cloud environment restart --environment-id $environmentId # Restart env
    Write-Host("Restarted target environment - All complete!")
}
else
{
    # Local run
    #.\down.ps1
    #.\clean.ps1
    #.\init.ps1 -initEnv -EnableClone $false
    # Do the above steps manually before running the following script
    #.\publish.ps1 -all -msBuildExeFilePath "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe" -buildConfiguration "debug"
    #.\up.ps1
    # Since a running instance is needed, the syncContent step below can only be kicked off after the publish and up steps above, because a valid connection is required
    syncContent -sourceName $sourceName -destName $destName -srcUrl $sourceUrl -destUrl $destUrl -srcAccessGuid $sourceEnvAccessGuid -destAccessGuid $destEnvAccessGuid
    #.\down.ps1
    #.\clean.ps1
    #.\init.ps1 -initEnv
    #.\publish.ps1 -all -msBuildExeFilePath "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe" -buildConfiguration "debug"
    #.\up.ps1
}
$watch.Stop() # Stopping the timer
Write-Host "Execution time - " $watch.Elapsed # Print script execution time

*****
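
A typical invocation of the template as a scheduled run might look like this (script name and all parameter values are hypothetical):

*****

.\CopyItemTreeTemplate.ps1 -sourceName "QA-CM" -destName "XMC-DEV" -sourceUrl "https://qa-cm.example.com" -destUrl "https://xmc-cm-dev.sitecorecloud.io" -sourceEnvAccessGuid "<guid>" -destEnvAccessGuid "<guid>" -environmentId "<xmc-env-id>"

*****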

Deep Compare template:

*****

[CmdletBinding()]
Param
(
    [string] $sourceName = "src name here",
    [string] $destName = "XMCloud dest name here",
    [string] $sourceEnvAccessGuid = "razl access guid",
    [string] $destEnvAccessGuid = "razl access guid (same as above)",
    [string] $sourceUrl = "src url here",
    [string] $destUrl = "dest xmc url here"
)
$ErrorActionPreference = "Stop"
$InformationPreference = "Continue"
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$watch.Start() # Timer start
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Import Razl powershell commands. This may need to be updated if the script is moved to another computer.
Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
# Get the source and target connections
$target = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destEnvAccessGuid
$source = Get-RazlConnection -SitecoreWebUrl $sourceUrl -DatabaseName master -Name $sourceName -AccessGuid $sourceEnvAccessGuid -ReadOnly
# Deep compare returns an array of objects that show the differences under a root item.
Get-RazlDeepCompareResults -source $source -target $target -RootItemId "{czczcxzczxcc-zcxz-zczcz-zcxz-zczcxzczczcz}"
$watch.Stop() # Stopping the timer
Write-Host "Product Data comparison done: Execution time - " $watch.Elapsed # Print script execution time

*****

Lightning mode is useful to skip existing items.

8. Windows scheduled task details:

<?xml version="1.0" encoding="UTF-16"?>
<Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <RegistrationInfo>
    <Date>2024-03-26T19:35:26.3964794</Date>
    <Author>domain\xyz.admin</Author>
    <URI>\Copy Prod Product Data</URI>
  </RegistrationInfo>
  <Triggers />
  <Principals>
    <Principal id="Author">
      <UserId>S-1-5-21-2424244242-3222242-244424242-242224</UserId>
      <LogonType>Password</LogonType>
      <RunLevel>HighestAvailable</RunLevel>
    </Principal>
  </Principals>
  <Settings>
    <MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>
    <DisallowStartIfOnBatteries>true</DisallowStartIfOnBatteries>
    <StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
    <AllowHardTerminate>true</AllowHardTerminate>
    <StartWhenAvailable>false</StartWhenAvailable>
    <RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
    <IdleSettings>
      <StopOnIdleEnd>true</StopOnIdleEnd>
      <RestartOnIdle>false</RestartOnIdle>
    </IdleSettings>
    <AllowStartOnDemand>true</AllowStartOnDemand>
    <Enabled>true</Enabled>
    <Hidden>false</Hidden>
    <RunOnlyIfIdle>false</RunOnlyIfIdle>
    <WakeToRun>false</WakeToRun>
    <ExecutionTimeLimit>PT72H</ExecutionTimeLimit>
    <Priority>7</Priority>
  </Settings>
  <Actions Context="Author">
    <Exec>
      <Command>c:\windows\system32\cmd.exe</Command>
      <Arguments>/c "powershell.exe c:\razl-scripts\xm-cloud\Copy-PRODProductDatawithRunspace.ps1" *&gt; c:\razl-scripts\xm-cloud\Copy-PRODProductDatawithRunspace.log</Arguments>
      <WorkingDirectory>c:\razl-scripts\xm-cloud</WorkingDirectory>
    </Exec>
  </Actions>
</Task>

9. While you migrate content, there will be a need to clean up data and start from scratch. Unfortunately, I couldn't use Razl's Remove-RazlItem method, which would actually have been the most effective option for high volumes, so I was left looking for other options. Of those, the Sitecore PowerShell ISE and this PowerShell script proved the most effective:

Get-Item -Path master:'/sitecore/content/xyz/Shared/Shared/Data/ProductData' | Remove-Item -Recurse

Although it took me many retries and many hours to delete about 100K items, PowerShell at least did the job compared with the other options!
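If a single Remove-Item on the bucket root keeps timing out, a hedged variant is to delete the bucket's children in a loop so each deletion is a smaller unit of work (an assumption on my part: run from the SPE console with sufficient permissions):

*****

# Hedged sketch: delete bucket children one at a time instead of the whole root
Get-ChildItem -Path master:'/sitecore/content/xyz/Shared/Shared/Data/ProductData' | ForEach-Object {
    Write-Host "Removing $($_.Paths.FullPath)"
    $_ | Remove-Item -Recurse
}

*****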

This one was handy to find the count of items in a bucket:

$count=(Get-ChildItem -Path master:'/sitecore/content/xyz/Shared/Shared/Data/ProductData/Products' | Measure-Object).Count

Write-Host($count)

The other options I tried for clean-up, without much success, are as follows:

a. dbbrowser.aspx: since this runs in the HTTP context, it always resulted in a 500 error in XM Cloud

b. Sitecore CLI: this is one area I would need to investigate further, especially where item buckets are involved

c. Deleting items using "Delete descendants" in the Sitecore ribbon: ineffective

d. Creating an item package and installing it as specified here: since this doesn't work in the background, it wasn't effective for the volume I dealt with and resulted in a 500 error similar to the other options

10. Practically speaking, since migration to XM Cloud happens over a period of time, some content tree nodes will change in the source, and those changes have to be migrated to XM Cloud on a day-to-day basis.

Next, as part of tree node comparison, one of the Razl errors that hogged a lot of my time is the one below. The same error occurred in the Razl UI, with the script, and in my logs:

WARNING: CopyItems: CopyAll encountered error Can't install connector. The web root can't be found copying item 

/sitecore/content/xyz/Shared/Shared/Data/ProductData.

WARNING: CopyAll encountered error Can't install connector. The web root can't be found copying item 

Unfortunately, the solution provided here didn't match what I encountered.


Razl log console:

******

ERROR Error 'Can't install connector. The web root can't be found' getting Sitecore items. System.IO.FileNotFoundException: Can't install connector. The web root can't be found    at HedgehogDevelopment.RazlClient.Impl.SitecoreRazlService.InstallConnector(ICancelTracker cancelTracker)   at HedgehogDevelopment.Razl.DomainImpl.Sitecore.SitecoreConnectorImpl.InstallConnector()   at HedgehogDevelopment.RazlClient.Impl.SitecoreRazlService.CallServiceAsyn(Func`2 beginInvoke, Action`2 endInvoke)  at HedgehogDevelopment.RazlClient.Impl.SitecoreRazlService.GetMultipleItemProperties(String databaseName, Guid[] itemIds)  at HedgehogDevelopment.Razl.DomainImpl.Sitecore.SitecoreConnectorImpl.GetMultipleItemProperties(String databaseName, Guid[] itemIds)  at HedgehogDevelopment.Razl.Controllers.RazlController.GetOtherSideChildren(ISitecoreConnector connection, String databaseName, IEnumerable`1 childItemIds)   at HedgehogDevelopment.Razl.Controllers.RazlController.PopulateTree(SitecoreItemPairView parentItem, ObservableCollection`1 parentItems, Guid parentId, Side side, Action`1 callback)   at HedgehogDevelopment.Razl.Controllers.RazlController.<>c__DisplayClass206_0.<PopulateChildren>b__0()

******

Surprisingly, this occurred for only one item bucket, but that was the one with the highest volume of items. So I finally cleaned up the bucket using SPE and started the Razl tree item copy script (same as the Copy Item tree template above), and it worked fine.

I also noticed that while migrating content from one XMC environment to another (using the Copy Item tree template script), this error occurred when the number of items was high. So I migrated one node at a time rather than multiple nodes together. For instance, when the brand data, brand categories, and product data buckets were sent together as part of the script, the script errored; when I tried each separately, the migration was successful.

Now, the above approach of cleaning up the concerned content tree node and reloading items would be fine for a small number of items, but since there were about 100K items, the load time (about 12 hours) was too costly, especially on a day-to-day basis.

Due to the above issue, I wrote a script that finds the changes within a specific duration in the source and applies just the changed items to the target:

Copy changes using history:

******

[CmdletBinding()]
Param
(
    [string] $sourceName = "XM src name",
    [string] $destName = "XMCloud dest name",
    [string] $sourceEnvAccessGuid = "razl access guid here",
    [string] $destEnvAccessGuid = "razl access guid here",
    [string] $sourceUrl = "xm src url",
    [string] $destUrl = "xmc dest url"
)
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$watch.Start() # Timer start
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Import Razl powershell commands. This may need to be updated if the script is moved to another computer.
Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
# Get the source and target connections
$target = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destEnvAccessGuid
$source = Get-RazlConnection -SitecoreWebUrl $sourceUrl -DatabaseName master -Name $sourceName -AccessGuid $sourceEnvAccessGuid -ReadOnly
$referenceTime = (Get-Date).AddDays(-1)
#Write-Host($referenceTime)
$count = 0
Get-RazlHistory -connection $source -from $referenceTime | % {
    $item = $_
    if ($item.Path.StartsWith("/sitecore/content/xyz/Shared/Shared/Data/ProductData/Products")) {
        $count += 1
        Write-Host($item.Id)
        if ($item.Action -eq "Deleted") # TODO: check if "Moved" must be included here
        {
            Remove-RazlItem -connection $target -ItemId $item.Id -Verbose
        }
        else
        {
            Get-RazlItem -Connection $source -ItemID $item.Id | Set-RazlItem -Connection $target -verbose
        }
    }
}
Write-Host($count)
$watch.Stop() # Stopping the timer
Write-Host "Product Data Copy tree done: Execution time - " $watch.Elapsed # Print script execution time

******

CPU Processors Introduction:

From here on, I'm going to discuss the two processors tested for content migration:

Processor 1: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz

Processor 2: Intel Xeon

The actual issue kicked in with the execution of the above Usehistorytocopyitems.ps1 script on Processor 1 (Intel Core). The daily volume was in the 30K range, and the script took about 3 hrs (precisely 02:59:48.7536762 for 9,741 items, executed on a low-traffic Sunday morning between 10 am and 1 pm) on a 32 GB RAM Windows 11 Pro PC with the Intel Core processor. That is when I decided to tweak the script.

Note: Ironically, the same 9,741 items running on the Xeon processor without threads took 122 mins to sync, on a Sunday night starting at 8 pm.

Multi-threading concept:

In order to improve the daily sync script, I started thinking along the lines of multi-threading with PowerShell scripting. Based on my analysis and observation, the machine's processor plays a major role in the sync process, particularly once multi-threading comes into the picture. Here is one article that highlights the differences between Xeon and Core processors. Since my development PC ran an Intel Core processor while the AWS VM where the task was scheduled ran a Xeon processor, this was a significant difference in setup.

Although PowerShell 7 offers ForEach-Object -Parallel, the HedgehogDevelopment.RazlClient.dll used in the sync scripts seemed compatible with .NET Framework rather than .NET Core, so I had to settle for Windows PowerShell 5. That left a couple of options for speeding up the sync process using threads, covered below.
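For reference, had the Razl client worked on .NET Core, the PowerShell 7 form would have looked roughly like this hypothetical sketch (not viable in my setup, shown only for comparison; $arr is an item ID array like the one built in the scripts below):

*****

# Hedged sketch: PowerShell 7 parallel pipeline (not usable here due to the .NET Framework DLL)
$arr | ForEach-Object -Parallel {
    Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
    # Connections would be created per thread, as in the runspace script further below
} -ThrottleLimit 10

*****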

a. Jobs

Copy history with Jobs script:

******

[CmdletBinding()]
Param
(
    [string] $sourceName = "XM src name",
    [string] $destName = "XMCloud dest name",
    [string] $sourceEnvAccessGuid = "access guid here",
    [string] $destEnvAccessGuid = "access guid here",
    [string] $sourceUrl = "src url",
    [string] $destUrl = "xmc dest url"
)
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$watch.Start() # Timer start
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Import Razl powershell commands. This may need to be updated if the script is moved to another computer.
Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
# Get the source and target connections
$target = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destEnvAccessGuid
$source = Get-RazlConnection -SitecoreWebUrl $sourceUrl -DatabaseName master -Name $sourceName -AccessGuid $sourceEnvAccessGuid -ReadOnly
$referenceTime = (Get-Date).AddDays(-1)
#Write-Host($referenceTime)
$arr = @()
Get-RazlHistory -connection $source -from $referenceTime | % {
    $item = $_
    if ($item.Path.StartsWith("/sitecore/content/xyz/Shared/Shared/Data/ProductData/Products")) {
        #Write-Host($item.Id)
        if ($item.Action -ne "Deleted") # TODO: check if "Moved" must be included here
        {
            $arr += $item.ID
        }
    }
}
Write-Host($arr.Count)
Write-Host(Get-Date)
$JobRows = 1700 ## The number of rows to process in each job
$NumJobs = [math]::Ceiling($arr.Count / $JobRows)
for ($i = 0; $i -lt $NumJobs; $i++)
{
    [int]$StartRow = ($i * $JobRows)
    [int]$EndRow = (($i + 1) * $JobRows - 1)
    Write-Host ("Rows {0} to {1}" -f $StartRow.ToString(), $EndRow.ToString())
    Start-Job -ArgumentList (,$arr[$StartRow..$EndRow]) -ScriptBlock {
        PARAM ($arrRows)
        $k = 0 # Per-job progress counter
        foreach ($row in $arrRows)
        {
            [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
            $k++
            Write-Host($k)
            Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
            $sourceName = "XM src name"
            $destName = "XMCloud dest name"
            $sourceEnvAccessGuid = "razl guid here"
            $destEnvAccessGuid = "razl guid here"
            $sourceUrl = "src url"
            $destUrl = "xmc dest url"
            # Get the source and target connections
            $target = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destEnvAccessGuid
            $source = Get-RazlConnection -SitecoreWebUrl $sourceUrl -DatabaseName master -Name $sourceName -AccessGuid $sourceEnvAccessGuid -ReadOnly
            Get-RazlItem -Connection $source -ItemID $row | Set-RazlItem -Connection $target -verbose
        }
    }
}
$watch.Stop() # Stopping the timer
Write-Host "Product Data Copy allocated to jobs: Execution time - " $watch.Elapsed # Print script execution time

******

Note that out of the 9,741 items, 1,700 items are allocated to each job, so there will be [math]::Ceiling(9741 / 1700) = 6 jobs in total for this task.

Since parallel jobs are covered here as just one option, I'll walk through the broken-down execution of this concept.

Parallel jobs in action (with execution of above script):


FYI, Get-Job always provides the status of the running jobs.

To get the job statuses, a script like this will do:

******

#Reference: https://www.get-blog.com/?p=22
ForEach ($Job in Get-Job) {
    "$($Job.Name)"
    "****************************************"
    Receive-Job $Job
    " "
}

******

Receive-Job -Id <id from job list> returns the output of that job's commands, and when you check the XMC log, you should see the same rows there.
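
For example, Receive-Job -Id 3 -Keep (the job ID here is hypothetical) returns the job's output while the -Keep switch leaves it buffered, so it can be read again later.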

For example, with the above status script that invokes Receive-Job, you can see the progress of the sync at any point in time:

In the following scenario, the job as a whole is completed since every step's state is Completed, and because the Receive-Job calls have already drained the output, the job steps no longer hold any data:


The execution time for the 9,741 rows in this case was approximately 45 mins.

b. Runspace pool

Copy history with Runspace pool script:

******

#Reference: https://github.com/GordonVi/ip_scan/blob/main/ip_scan_JSON.ps1
[CmdletBinding()]
Param
(
    [string] $sourceName = "XM src name",
    [string] $destName = "XMCloud dest name",
    [string] $sourceEnvAccessGuid = "razl access guid here",
    [string] $destEnvAccessGuid = "razl access guid here",
    [string] $sourceUrl = "src url here",
    [string] $destUrl = "dest xmc url here"
)
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$watch.Start() # Timer start
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
# Import Razl powershell commands. This may need to be updated if the script is moved to another computer.
Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
# Get the source and target connections
$target = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destEnvAccessGuid
$source = Get-RazlConnection -SitecoreWebUrl $sourceUrl -DatabaseName master -Name $sourceName -AccessGuid $sourceEnvAccessGuid -ReadOnly
$referenceTime = (Get-Date).AddDays(-1)
Write-Host($referenceTime)
$arr = @()
Get-RazlHistory -connection $source -from $referenceTime | % {
    $item = $_
    if ($item.Path.StartsWith("/sitecore/content/xyz/Shared/Shared/Data/ProductData/Products")) {
        #Write-Host($item.Id)
        if ($item.Action -ne "Deleted") # TODO: check if "Moved" must be included here
        {
            $arr += $item.ID
        }
    }
}
$TimeStart = $(Get-Date -UFormat "%H:%M:%S")
#$threads = [math]::Ceiling($arr.Count/2) # Effective on an Intel Core CPU; takes just 75 mins for 30K items
$threads = 100 # For a Xeon processor with 30K items, takes about 4 hrs
$list = $arr
Write-Host($list.Count)
# --------------------------------------------------
clear
""
Write-Host " Threads: " -NoNewline -ForegroundColor yellow
$threads
" Actual Pool: "
" Drain Pool: "
" ---------------------"
Write-Host " Total Items: $($list.count)"
# BLOCK 1: Create and open runspace pool, setup runspaces array with min and max threads
#$pool = [RunspaceFactory]::CreateRunspacePool(1, [int]$env:NUMBER_OF_PROCESSORS + 1) # This definitely works on a Xeon processor but takes 4 hrs
$pool = [RunspaceFactory]::CreateRunspacePool(1, $threads)
$pool.ApartmentState = "MTA"
$pool.Open()
$runspaces = $results = @()
# --------------------------------------------------
# BLOCK 2: Create reusable scriptblock. This is the workhorse of the runspace. Think of it as a function.
$scriptblock = {
    Param (
        [string]$itemId
    )
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
    Import-Module "C:\Program Files (x86)\Sitecore\Razl\HedgehogDevelopment.RazlClient.dll"
    $sourceName = "XM src name here"
    $destName = "XMCloud env name here"
    $sourceEnvAccessGuid = "razl access guid here"
    $destEnvAccessGuid = "razl access guid here (same as above)"
    $sourceUrl = "src url here"
    $destUrl = "dest xmc url here"
    # Get the source and target connections
    $target = Get-RazlConnection -SitecoreWebUrl $destUrl -DatabaseName master -Name $destName -AccessGuid $destEnvAccessGuid
    $source = Get-RazlConnection -SitecoreWebUrl $sourceUrl -DatabaseName master -Name $sourceName -AccessGuid $sourceEnvAccessGuid -ReadOnly
    $itemDetails = Get-RazlItem -Connection $source -ItemID $itemId
    Set-RazlItem -ItemDetails $itemDetails -Connection $target
    # Return whatever you want, or don't.
    return [pscustomobject][ordered]@{
        itemId   = $itemId
        itemName = $itemDetails.Properties.Name
        Path     = $itemDetails.Properties.Path
    }
}
# --------------------------------------------------
# BLOCK 3: Create runspace and add to runspace pool
$counter = 0
foreach ($i in $list) {
    $runspace = [PowerShell]::Create()
    $null = $runspace.AddScript($scriptblock)
    $null = $runspace.AddArgument($i)
    $runspace.RunspacePool = $pool
    # BLOCK 4: Add runspace to runspaces collection and "start" it
    # Asynchronously runs the commands of the PowerShell object pipeline
    $runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 16, 2
    $counter++
    Write-Host "$counter " -NoNewline
}
# --------------------------------------------------
# BLOCK 5: Wait for runspaces to finish
<#
do {
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 5, 9
    $cnt = ($runspaces | Where {$_.Result.IsCompleted -ne $true}).Count
    Write-Host "$cnt "
} while ($cnt -gt 0)
#>
# --------------------------------------------------
$total = $counter
$counter = 0
# BLOCK 6: Clean up
foreach ($runspace in $runspaces) {
    # EndInvoke method retrieves the results of the asynchronous call
    $results += $runspace.Pipe.EndInvoke($runspace.Status)
    $runspace.Pipe.Dispose()
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 16, 3
    $counter++
    Write-Host "$($total - $counter) " -NoNewline
}
$pool.Close()
$pool.Dispose()
# --------------------------------------------------
# Bonus block 7
# Look at $results to see any errors or whatever was returned from the runspaces
# Use this to output to JSON. CSV works too since it's simple data.
$total = $results.count
$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0, 5
Write-Host " Total Items: " -NoNewline -ForegroundColor cyan
$total
Write-Host(Get-Date)
" ---------------------"
$nodes = @() # Collect the returned item details
foreach ($item in $($results | select itemId, itemName, Path | sort itemName)) {
    $nodes += [PsCustomObject]@{
        itemId   = $item.itemId
        itemName = $item.itemName
        Path     = $item.Path
    }
}
$jsonobject = [pscustomobject]@{
    Name      = "Item Copy"
    Date      = $(Get-Date -UFormat "%Y-%m-%d")
    TimeStart = $TimeStart
    TimeEnd   = $(Get-Date -UFormat "%H:%M:%S")
    Threads   = $threads
    Total     = $total
    Nodes     = $nodes
}
$filename = "item-copy-results.json" # Output file name (not defined in the original script; adjust as needed)
$jsonobject | ConvertTo-Json | Out-File -FilePath ".\$filename"

******

With the usage of jobs, the script execution time improved to about 45 minutes. With runspaces, it improved further to 27 mins for the same 9,741 items, and to 75 minutes for about 30,000 items.

Final tally for 9741 rows:

The sync script written with the runspace pool is the winner!

Sample XM Cloud log entries:

Next, the runspace script was executed as a scheduled task on a VM server with 4 GB RAM and a Xeon processor (note that all scripts were executed as scheduled tasks on both machines). The execution time was about 4 hrs for about 30K items, compared with about 75 mins on the 32 GB RAM Windows 11 Pro machine with the Intel Core processor. Note that even when the Xeon machine's RAM was increased to 32 GB, the execution time stayed at 4 hrs for about 30K items (irrespective of the RAM capacity), since the Xeon setup could handle only about 100 threads; an attempt to increase to 1,000 threads resulted in an OutOfMemory exception during script execution.

So the bright side is that the sync duration drops to roughly one-third when an Intel Core processor is used in place of the Xeon processor.

Pictorial representation of stats:

Runspace pool performance, actual data and chart (Intel Xeon processor): total items (x-axis) vs sync duration in mins (y-axis)

Runspace pool performance, actual data and chart (Intel Core processor): total items (x-axis) vs sync duration in mins (y-axis)

Personal verdict: if you run Razl with a runspace pool via PowerShell script on an Intel Core processor, you gain the most benefit by setting the thread count to 1/2 to 1/3 of the total item count when migrating 30,000 items. In that case, the sync duration is 60-75 mins.

Although the majority of this blog post works with history data, the idea of the post and the stats is to show that using parallelism while migrating content should reduce the time involved.

Note that no formal testing has been done to confirm the migrated content is stable and consistent; based on random item checks in the buckets, everything looked good to my naked eye!

11. Lessons learnt:

- Plan suitable timing; it makes a big difference

- Keep an eye on the volume for stats/benchmarking

- Metrics are always important

- Be ready to be flexible and iterative

- Content migration is "more than" a full-time job

- SPE/PowerShell is very handy, so elevate permissions for your XMC environments

- Disable the item cloning setting before the sync process and enable it back after the process
