Learnings from a year of implementing Sitecore Publishing Service

Sitecore Publishing Service (SPS) is a replacement for the built-in publish function. It’s built on dotnet core and runs as a separate micro service instead of the built-in publisher that runs in-process.

I’ve been using it, or rather tried to use it, for about a year now on a large Sitecore 9.0.1 solution. It was everything but a smooth ride, so I thought it would be worth sharing my experience and what I learned during the process.

SPS has its clear advantages regarding the speed it publishes content. It’s not as “lightning fast” as Sitecore claims it to be, but still a lot faster than the built-in one. The greatest advantage, in my opinion, is that it runs outside the Sitecore Content Management (CM) worker process. So an ongoing publish processes doesn’t break due to an IIS application pool recycle. Those two reasons were also why we tried moving to SPS.

Note: This post contains my experiences while working with SPS 3.0 to 3.1.3. Some of the issues have been fixed in later versions. Some issues may also remain in SPS 4 as it was released before 3.1.3. Many of the issues turned out to exist in 2.x as well.

Continue reading

Indexing and OCR scanning PDF documents in Sitecore

PDF documents in Sitecore media library can be indexed using IFilters, but it has faced its limitations regarding Azure support etc and isn’t very effective from a performance point of view. The way the extracted content is indexed also makes it harder to use in multi-language solutions.

I’ve taken a different approach on indexing PDF documents, making it more accurate and increase the performance at the same time. The IFilter approach is a generic approach, supporting multiple file formats. I’ve focused on PDF documents in this post, as it’s a common format. Similar principles can be applied to other file formats as well.

In this post:

  • Avoiding heavy computation during index time
  • Extracting document content through PDF libraries
  • OCR scanning of image/non-text based PDF documents
  • Indexing documents with language stemming
Continue reading

Inherited and non-inherited fields to Sitecore clone items

When an item is cloned in Sitecore, the clones inherits its values from the source item. This is represented by a null value in each field, meaning that it inherits its value from the clone source item. When a value is written to a field in a clone, that value is used instead, hence breaking the inheritance. This works great in most cases.

In some scenarios you might not want to inherit all the fields. You might want to exclude some of them, enforcing a local value in each field for such clones. By default a few fields are not inherited. Those are __Created, __Created by__Updated, __Updated by, __Revision, __Source, __Source item, __Workflow, __Workflow state and __Lock. It’s quite natural that those fields are not inherited to clones, since each item, the source and the clone, should keep their own values of those fields.

You can add your own fields to this list by modifying the ItemCloning.NonInheritedFields setting. It’s a string setting where you can provide a pipe (|) separated list list of field ID’s or field keys. The drawback of the setting being a pipe separated list, is that it’s hard to add additional fields through config patch files. I hope Sitecore will change this in the future.

Continue reading

Defragment the SQL Server heap on Sitecore databases

I discovered that the heap gets very fragmented in SQL Server in some of our solutions. Large tables, such as Items, Shared-, Versioned- and Unversioned-fields, Blobs, Descendants and Links tables, that easily occupies a few GB on disk, also suffered from great fragmentation. More than 90% fragmentation was common.

From what I’ve found, the only way to fix SQL Server Heap fragmentation (the heap is where all the table data is stored), is to have a clustered index on each table.

However, I noticed that no tables in the Sitecore databases have any clustered indexes. All indexes are non-clustered in the common master/core/web databases. Sitecore used to have clustered indexes back in 5.2, but over the course of multiple Sitecore versions, the database schema has changed to non-clustered indexes.

A clustered index means that the table rows as stored in the index order physically on disk. That’s also why there can be only one clustered index per table. With a non-clustered index, there is a second list that has pointers to the physical rows. It’s generally faster to read from a clustered index, but it may be slower to write to it as there may be a need to rearrange the table data.

Continue reading

Correcting ambiguous Sitecore field scopes

As you probably know, all fields in Sitecore can have one of three field scopes: Versioned (aka Normal), Unversioned and Shared. Versioned fields have individual version numbers for each language. Unversioned fields have individual values for each language in the same way as versioned fields, but there can be only one value per language. Shared fields are just a single value regardless of language and item version. There are no such thing as a “versioned shared” field type.

This is configured using two check boxes on a field level: Shared and Unversioned. If none are checked, the field becomes a versioned field. As you see, there’s an ambiguous “invalid” state where both check boxes are checked. In this case, Shared has precedence.

Continue reading

Sorting with Sitecore Content Search/Solr

Sorting search results are rather straight forward at first glance, but there are some pitfalls to be aware of. When using Sitecore Content Search, the Linq provider supports the OrderBy method and it get serialized into a sort statement in a Solr query. Example:

var result = searchContext.GetQueryable<MyModel>()
   .Where(...)
   .OrderBy(x => x.DisplayName)
   .GetResults()

will be serialized into a Solr query like

?q=...&fq=...&sort=_displayname ASC

This usually works quite well, but consider the following list of item display names:

Continue reading

Sitecore MVP 2019

Sitecore MVP Technology 2019

Thank you Sitecore for awarding me “Most Valuable Professional” (MVP) again! Seven years in row!

The Sitecore MVP Award celebrates the most active Sitecore community members from around the world who provide valuable online and offline expertise that enriches the community and makes a difference

My contribution to Sitecore and the community over the last year have, besides the nine posts on this blog, have been mostly focused on improving the product by having a dialog with various Sitecore staff. During 2018 I filed over 50 confirmed bugs, mostly related to Sitecore Publish Service and Content Search and a handful of accepted product enhancements.

Optimize Sitecore Solr queries

I’ve written a few posts on Sitecore Content Search and Solr already, but there seems to be an infinite amount of things to discover and learn in this area. Previously I’ve pointed out the importance of configuring the Solr index correctly and the benefit of picking the fields to index, i.e. not indexing all fields as default (<indexAllFields>false</indexAllFields>). This will vastly improve the performance of Content Search operations and reduce the index size in large solutions.

Recently I’ve been investigating a performance issue with one of our Sitecore solutions. This one is running Sitecore 9 with quite a lot of data in it. It’s been performing quite well, but as the client were loading more data into it, it got a lot slower. Our metrics also showed the response time (P95) in the data center that got quite high. It measured around 500 ms instead of the normal 100 ms.

Continue reading

An easy way to create Sitecore config files

Some people find it a bit tricky to write Sitecore config files. It can sometimes be a bit tricky or time consuming to get the element structure correct. Ever found yourself debugging an issue where it turned out the config file wasn’t applied properly due to an element structure mistake?

The XPath Tools plugin, by Uli Weltersbach, for Visual Studio is a great help for creating those config patch files. Here’s a way to create those in a fast and simple way:

Continue reading

Improving Editing Performance when using Sitecore Publish Service

The Sitecore Publish Service vastly improves the publish performance in Sitecore. For me it was really hard to get it working properly and I’ve blogged about some of the issues before. I received a lot of good help from Sitecore Support and now it seems like I’ve got into a quite stable state.

However, there is a backside of the Publish Service that may affect the editing performance. Publish Service doesn’t use the PublishQueue table for knowing what to publish. Instead it has an event mechanism for detecting what needs to be published. As an item is saved, Sitecore emits events to the Publish Service so that it knows what pages should be put into the publish manifest.

Note: The solution in this post may not suit every project. Address this only if you’re experiencing the performance decade described and make sure you test everything well. Make sure you fully understand this approach before dropping it into your project.

As part of the Publish Service package, a item:saved event handler is added to do some post processing. When a unversioned field is changed on an item, the event handler loops over all versions of that language and updates the __Revision field. When a shared field is changed on an item, the event handler loops over all versions on all languages and updates the __Revision field. Thereby the Publish Service gets a notification that the content of the item has been changed.
Continue reading

Sitecore X-Forwarded-For handling

A Sitecore solution is typically behind one or several reverse proxies, such as load balancers, content delivery networks etc. From a Content Delivery server perspective, the remote address, i.e. “the visible client IP” is the closes proxy instead of the IP of the connecting client. To solve this, the chain of proxies adds a http header with the IP address it’s communicating with. This header is typically called X-Forwarded-For or X-Real-IP.

Below is an example of such setup. Each proxy adds the IP they’re receiving the connection from:


Continue reading

Sitecore Publish Service 3.1 update-1

After having tons of problems and several filed tickets on the initial release of Sitecore Publish Service 3.1, I was happy to find that Sitecore have addressed many of the problems of the previous versions. This update contains 12 fixes and I found my customer support ticket number listed six times.

Sitecore Publish Service 3.1 update 1 release notesUnfortunately the update didn’t solve these issues properly, so while I’m waiting for new patches I thought I’d share a UI fix that wasn’t included in the release. When working with multiple languages, the language list isn’t very user friendly in the Publish Service UI. It’s essentially just becomes a small letterbox with unsorted languages and a large area for displaying the targets.

This is the layout provided as default when having multiple languages:

Default Publish Service dialog
Continue reading

Memory hungry Sitecore indexing

While investigating stability issues, I’ve found a few things that may need addressing.

Sitecore updates indexes in batches. This is good in general, but it turned out it may be very memory hungry. There are essentially two config parameters you can control the batch size with:

<setting name="ContentSearch.ParallelIndexing.Enabled" value="true" />
<setting name="ContentSearch.IndexUpdate.BatchSize" value="300" />

The default config above, essentially means Sitecore will start multiple threads processing 300 indexable objects each. This might not be an issue at all, but when combined with a multi-language setup, media indexing and crazy authors, this may become a real problem.
Continue reading

Sitecore Language Fallback caching issue

Language Fallback is a powerful feature in Sitecore. It’s been around for years as a module and since Sitecore 8.1 it is part of the core product. In short, it allow values to be inherited between item language versions. This allows you to show default content when translation is missing. You may have dialects of a languages, such as US English vs British English, and you can use Language Fallback to avoid translating content that is the same for the two dialects etc.

TL;DR
Increase Caching.SmallCacheSize to about 10MB if you’re using Language Fallback in Sitecore.

Continue reading

Things to test when using Sitecore Content Search and Publish Service

This is partly a follow-up post of my previous post on Workign with Solr and Sitecore Content Search in Sitecore 9. In that post I raised a few issues that needs to be dealt with, and I’ve found some more. Most of what’s in this post I’ve found on Sitecore 9.0 update-1 and/or Sitecore 8.2 update-5, and it seems like most of the things applies to many more versions too. So I’ve focused on how you can verify if your solution needs patching too.

This is essentially just a brief list of some issues I’ve found over the last few weeks, while working against the clock towards the releases of two large Sitecore projects. Big thank you to all my great colleagues that have put an enormous effort into getting things to work.
Continue reading

Working with Content Search and Solr in Sitecore 9

During an upgrade project to Sitecore 9, I got some insights worth sharing. Some findings in this post applies to multiple Sitecore versions and some are specific to Sitecore 9. I’ve been using SolrCloud 6.6, but some of it applies to other versions as well. It be came a long, yet very abbreviated, post covering many areas.

In this post:

  • Solr Managed schemas in Sitecore 9
  • Tune and extend Solr managed schema to fit your needs
  • How to fix Sitecore config for correct Solr indexing and stemming
  • How to make switching index work with Solr Cloud
  • How to reduce index sizes and gain speed using opt-in
  • How to make opt-in work with Sitecore (bug workaround)
  • Why (myfield == Guid.Empty) won’t give you the result you’re expecting

Continue reading

Sitecore MVP 2018

itecore MVP Technology 2018Thank you Sitecore for awarding me “Technology Most Valuable Professional” (MVP) again! Six years in a row!

The Sitecore MVP Award celebrates the most active Sitcore community members from around the world who provide valuable online and offline expertise that enriches the community experience and makes a difference.

My contribution to Sitecore and the community over the last year have, besides this blog, have been a continuous dialog with Sitecore staff on how to improve the product and I’ve filed around thirty confirmed bugs. I’ve also held a few talks on Sitecore User Group Gothenburg (SUGGOT) and a few modules are shared on GitHub.

Improved Sitecore delete item access rights

Sitecore has a quite advanced access right management system. However, I’ve found a few quite common requirements that, as far as I know, isn’t supported out of the box. One is to allow content authors to remove individual item versions without allowing them to remove the entire item. This is especially useful for multi language sites. Another requirement is to allow authors to delete items they have created themselves, but no other items.

I’ve seen people work around these kind of issues by playing around in the core database, modifying ribbon buttons etc. Personally I don’t like that approach. That would just hide the button and if the user could initiate the command in any other way, Sitecore will gladly perform the delete action. It’s easy to forget a command action in a context menu or something like that.

Instead I created a very small Sitecore module to solve these issues. All the source is available on Github.
Continue reading