How to generate sitemap.xml for an ASP.NET Core rendering host serving Sitecore headless content

Disclaimer: This is a bespoke, GraphQL-based solution that is NOT recommended by Sitecore. I attempted it for the Sitecore MVP site because that site doesn't have a sitemap.xml and is not built on headless SXA. Note that for XM Cloud, the sitemap should ideally be built using the out-of-the-box (OOTB) capability.

After writing the original post, I realised that this solution could be tweaked further. Since this is new terrain, I'm writing my thoughts in the Further Update(s) section, with the original post below it. Also, based on general analysis, it doesn't matter whether you generate sitemap.xml on the fly or serve stored content, as long as the XML is up to date and doesn't suffer from load-time issues.

Further Update(s):

Ideally, any Sitemap.xml generation process would be effective/efficient with this approach:


While the original post below covers generation and retrieval in a one-step approach (with a caching option, since the MVP site is small), it would be good to have a two-step approach, specifically for bigger sites on XM Cloud:

1. Generate the XML and store it in the Sitecore content tree beforehand, for example through the GraphQL uploadMedia mutation or by inserting the XML into a Sitecore sitemap item.

2. During the actual HTTP request, retrieve the generated XML and render it as sitemap.xml.
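The two steps above can be sketched as a small holder class. This is only an illustration under assumptions: StoredSitemap is a hypothetical name, and an in-memory field stands in for the Sitecore content tree or media item where the XML would really be stored.

```csharp
using System;

// Hypothetical holder for a pre-generated sitemap (not part of the MVP site code).
// The in-memory field is a stand-in for a Sitecore sitemap item or media item.
public sealed class StoredSitemap
{
    private readonly object _gate = new object();
    private string _xml = string.Empty;
    private DateTimeOffset _generatedAt = DateTimeOffset.MinValue;

    // Step 1: called ahead of time, for example by a scheduled job.
    public void Store(string xml, DateTimeOffset generatedAt)
    {
        if (string.IsNullOrWhiteSpace(xml))
            throw new ArgumentException("Sitemap XML must not be empty.", nameof(xml));

        lock (_gate)
        {
            _xml = xml;
            _generatedAt = generatedAt;
        }
    }

    // Step 2: called during the HTTP request; it only reads what step 1 stored.
    public bool TryGet(out string xml, out DateTimeOffset generatedAt)
    {
        lock (_gate)
        {
            xml = _xml;
            generatedAt = _generatedAt;
            return xml.Length > 0;
        }
    }
}
```

The request path never triggers generation here, so the response time does not depend on the size of the site. The stored timestamp lets the caller decide whether the XML is still fresh enough to serve.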

Original Post:

The use of sitemap.xml for a website is very well known. In short, it lists all the pages of your site in one place so that search engines like Google can discover and index your content, which in turn supports SEO. For more details, refer to the Google documentation.

This is one of my long-pending issues with regard to the Sitecore MVP site. I'm glad that I have probably hit on a feasible solution this time. If not, this blog post could be a starting point for the next attempt, or for anyone trying to build sitemap.xml for a dotnet rendering host that consumes Sitecore headless content using GraphQL. Without further ado, here is a high-level depiction of the programmatic flow:

1. User requests sitemap.xml

2. Request goes to the controller method and an asynchronous call to generate the sitemap is made

3. GraphQL client is configured for invocation

4. GraphQL query is sent, the JSON result is deserialised, and the XML sitemap is generated and sent back to the user

The key difference compared with a monolithic Sitecore instance is that the content in this case is fetched using a GraphQL query. Since the MVP site is already running on an ASP.NET rendering host and lacks a sitemap at the time of this writing, I have used the MVP site as the starter project.

Pre-requisites

Based on the XM Cloud readme, set up the MVP site local instance running on Docker.

GraphQL playground for Sitecore MVP local instance:

Once the Docker instance is up and running, you should be able to access the GraphQL playground from this URL:

https://xmcloudcm.localhost/sitecore/api/graph/edge/ide

Add the following sc_apikey to http headers:

{"sc_apikey":"{E2F3D43E-B1FD-495E-B4B1-84579892422A}"}

Note that the above Experience Edge token is picked from appSettings.development.json under the Mvpsite\rendering folder in the file system.

GraphQL(GQL) query:

*****************************************************************

*****************************************************************

So, this is what the GraphQL IDE looks like after a successfully executed GraphQL query with sitemap data from mvp-site. I love the Prettify option in the IDE.


_Sitemap base template for Page and Homepage base templates:

Understandably, sitemap.xml has a fixed structure: a urlset root element that contains one url entry per page, each with loc, lastmod, changefreq and priority child elements.

Note that all the main fields (loc, lastmod, changefreq and priority) are returned by the GraphQL query.

Also, in order to utilise the existing sitemap fields like changefrequency and priority, the Sitecore page template(s) must inherit from the _Sitemap base template, which exists under:
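For reference, here is a minimal sketch of that structure built with System.Xml.Linq. The namespace is the one defined by the sitemaps.org protocol. The class and method names are hypothetical, and this is not the code used in the MVP site.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper that shows the shape of a sitemap.xml document.
public static class SitemapStructureSketch
{
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // One <url> entry with the four fields discussed in this post.
    public static XElement Url(string loc, DateTime lastModUtc, string changeFreq, decimal priority) =>
        new XElement(Ns + "url",
            new XElement(Ns + "loc", loc),
            new XElement(Ns + "lastmod", lastModUtc.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)),
            new XElement(Ns + "changefreq", changeFreq),
            new XElement(Ns + "priority", priority.ToString("0.0", CultureInfo.InvariantCulture)));

    // Wraps the entries in the <urlset> root and prepends the XML declaration,
    // because XDocument.ToString() leaves the declaration out.
    public static string Build(params XElement[] urls)
    {
        var declaration = new XDeclaration("1.0", "utf-8", null);
        var root = new XElement(Ns + "urlset", urls);
        return declaration + Environment.NewLine + root;
    }
}
```

Calling Build with a single Url(...) entry yields a urlset element whose children are loc, lastmod, changefreq and priority, all in the sitemap namespace.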

/sitecore/templates/Foundation/Experience Accelerator/SiteMetadata/Sitemap/_Sitemap

But then, I also felt the need to add an IncludeinSitemap checkbox field, since such a checkbox is easy for content editors to use on individual pages. There is another option: if the change frequency is "do not include" or the priority is not set, leave the page out of the sitemap. I felt that approach is not intuitive and hence decided against it. So, I created a separate template in the following location that inherits the above _Sitemap base template:

/sitecore/templates/Feature/Experience Accelerator/SiteMetadata/_SitemapData

The page templates, in turn, inherit the above _SitemapData template.

Altogether, the Sitecore data template inheritance is as follows: the page templates inherit _SitemapData, which inherits _Sitemap.


Note that a module.json must be added under the feature/sitemap folder so that the new template can be serialised to the file system:

Sitemap.module.json

Since this is a new feature folder, serialisation has to be set up before running the pull command:

************************

************************

Sitemap field setup for a page:

SitemapExtensions.cs

Helps set up dependency injection for the necessary classes/interfaces.

************************

************************

Startup.cs that sets up necessary service configurations:


The most important sitemap.xml endpoint setup in Startup.cs:


Default Controller Sitemap Action method:

This is the action method that is invoked when a user requests the sitemap. It gets the memory stream, which in turn is returned in XML format.

**********************************

**********************************

One of the things that could help with sitemaps is setting the cache duration. This is something that has yet to be tested.


SitemapBuilder.cs

This method is responsible for turning the C# object into stream data before it is sent back to the controller method.

**********************************

**********************************
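As an illustration of the object-to-stream step, here is a minimal sketch that uses the built-in XmlSerializer rather than the DotnetSitemapGenerator package. UrlSet, UrlEntry and SitemapXml are hypothetical names, and the actual SitemapBuilder may work differently.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical model classes mapped onto the sitemap.xml element names.
[XmlRoot("urlset", Namespace = SitemapXml.Namespace)]
public class UrlSet
{
    [XmlElement("url")]
    public List<UrlEntry> Urls { get; set; } = new List<UrlEntry>();
}

public class UrlEntry
{
    [XmlElement("loc")] public string Loc { get; set; } = "";
    [XmlElement("lastmod")] public string LastMod { get; set; } = "";
    [XmlElement("changefreq")] public string ChangeFreq { get; set; } = "";
    [XmlElement("priority")] public string Priority { get; set; } = "";
}

public static class SitemapXml
{
    public const string Namespace = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Serialises the object graph into a rewound MemoryStream.
    public static MemoryStream ToStream(UrlSet urlSet)
    {
        var stream = new MemoryStream();
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            Indent = true
        };

        // Disposing the writer flushes it; the stream stays open (CloseOutput is false by default).
        using (var writer = XmlWriter.Create(stream, settings))
        {
            // Declare only the default namespace so xsi/xsd are not emitted.
            var namespaces = new XmlSerializerNamespaces();
            namespaces.Add(string.Empty, Namespace);
            new XmlSerializer(typeof(UrlSet)).Serialize(writer, urlSet, namespaces);
        }

        stream.Position = 0; // rewind so the caller can read from the start
        return stream;
    }
}
```

Rewinding the stream before returning it matters: a controller that reads from the current position of a freshly written stream would otherwise send an empty body.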

MvpSitemapUrlProvider.cs

This class implements an interface and, in this particular case, is responsible for building the target list of C# objects for the XML from the deserialised GraphQL data.

*********************************

*********************************
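To make the mapping step concrete, here is a minimal sketch using System.Text.Json. Everything about the JSON shape is an assumption for illustration: the data/pages/results path and the path, updated, changeFrequency, priority and includeInSitemap property names are hypothetical and will differ from the real query result. The sketch also assumes that Sitecore returns dates in its raw yyyyMMddTHHmmssZ form and a ticked checkbox as "1".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical target model; the actual project uses SitemapNode from DotnetSitemapGenerator.
public record SitemapEntry(string Loc, DateTime LastModUtc, string ChangeFreq, string Priority);

public static class GraphQlSitemapMapper
{
    // Parses a raw date such as "20230512T101530Z" into a UTC DateTime.
    // Throws FormatException if the value is empty or in another format.
    public static DateTime ParseSitecoreDate(string raw) =>
        DateTime.ParseExact(raw, "yyyyMMdd'T'HHmmss'Z'", CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

    // Maps the (assumed) GraphQL response onto sitemap entries,
    // skipping pages whose IncludeinSitemap checkbox is not ticked.
    public static List<SitemapEntry> Map(string json, string baseUrl)
    {
        var entries = new List<SitemapEntry>();
        using var doc = JsonDocument.Parse(json);
        var results = doc.RootElement.GetProperty("data").GetProperty("pages").GetProperty("results");

        foreach (var page in results.EnumerateArray())
        {
            if (Text(page, "includeInSitemap") != "1") continue;

            entries.Add(new SitemapEntry(
                baseUrl.TrimEnd('/') + Text(page, "path"),
                ParseSitecoreDate(Text(page, "updated")),
                Text(page, "changeFrequency"),
                Text(page, "priority")));
        }

        return entries;
    }

    // Reads a string property, returning "" when it is missing or null.
    private static string Text(JsonElement page, string name) =>
        page.TryGetProperty(name, out var value) ? value.GetString() ?? "" : "";
}
```

Filtering on the checkbox before parsing the date means a page that is excluded from the sitemap cannot break generation with a missing or malformed date.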

CustomGraphQLQueryHandler.cs

The core class responsible for building the GraphQL client and making the GraphQL query call. 

*********************************

*********************************

Note that although the root item of a site can be picked up using a GQL query, I prefer picking it from appSettings.json just for ease of configuration.
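For readers without the project at hand, a GraphQL call of this kind can be sketched with plain HttpClient: a POST with a JSON body containing query and variables, plus the sc_apikey header mentioned earlier. This is a hedged illustration and not the actual CustomGraphQLQueryHandler. The class and method names are hypothetical, and the endpoint, API key, query text and variables are all supplied by the caller.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper; the real handler may use a dedicated GraphQL client library instead.
public static class GraphQlRequestSketch
{
    // Builds the POST request without sending it, which keeps this part easy to test.
    public static HttpRequestMessage BuildRequest(Uri endpoint, string apiKey, string query, object variables)
    {
        // Standard GraphQL-over-HTTP body: { "query": "...", "variables": { ... } }
        var payload = JsonSerializer.Serialize(new { query, variables });

        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(payload, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("sc_apikey", apiKey); // same header as used in the GraphQL playground
        return request;
    }

    // Sends the request and returns the raw JSON response for deserialisation.
    public static async Task<string> SendAsync(HttpClient client, HttpRequestMessage request)
    {
        using var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The root item read from appSettings.json would typically be passed in through the variables object, so that the query text itself stays free of environment-specific values.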

C# object model for GQL response data deserialisation:

Snapshot of serialisation object model:

**********************************

**********************************

DotnetSitemapGenerator nuget package:

The DotnetSitemapGenerator NuGet package is useful on the C# side for converting C# objects to the sitemap XML format. In particular, the SitemapNode class, part of this package, is the C# model that is automatically serialised to XML from the list of nodes and finally rendered as an IResult.

Snapshot of object model:

SitemapNode for reference:


End-result:


Feature branch

References:

https://khalidabuhakmeh.com/generate-sitemaps-for-all-of-aspnet-core

https://sitecore.stackexchange.com/questions/30357/how-to-get-the-modified-date-and-time-for-item-versions-in-sitecore-via-graphql

https://doc.sitecore.com/xp/en/developers/sxa/103/sitecore-experience-accelerator/configure-a-sitemap.html

https://stackoverflow.com/questions/30465792/how-to-convert-string-with-a-t-and-z-to-datetime
