Dealing with Solr Managed Schema through Sitecore config files

When working with Content Search and Solr in Sitecore, it’s quite common that a field needs to be managed in both Sitecore and Solr. It could, for example, be a computed field, or a basic field that needs specific field processing, where the Solr schema has to be changed slightly. Having the configuration for such a scenario split across two places can be quite annoying.

A while ago, I wrote a small module that lets you put your Solr Managed Schema configuration into the standard Sitecore configuration. This gives a better overview: configuration that plays together, stays together.

The module adds a new <solrManagedSchema> section in the contentSearch/indexConfiguration-section. The content of this section is almost identical to what you’d put in the Solr schema. The only difference is that fields, field types, dynamic fields and copy-fields are grouped separately with a Sitecore “hint” attribute.
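For reference, this is the kind of plain Solr schema markup meant here. The sketch below declares a multi-valued float field type and a dynamic field that uses it, as you might need for the float-list example that follows. It is plain Solr syntax, not the module’s own syntax; the `*_fm` suffix is a hypothetical naming convention, and `solr.FloatPointField` assumes Solr 7 or later.

```xml
<!-- Field type holding multiple float values (assumes Solr 7+ point fields) -->
<fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true" />

<!-- Hypothetical dynamic field: any field name ending in _fm holds a list of floats -->
<dynamicField name="*_fm" type="pfloats" indexed="true" stored="true" />
```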

Let’s describe the benefits of this with an example. Let’s say you need a computed field returning a set of decimal numbers, such as a set of variant specs. You’d obviously need the computed field class itself, but you’d also need a type match (because lists of floats aren’t part of the default mapping), and you’d need to modify the schema to support fields with multiple floats. The configuration could end up something like this:

Continue reading

Sitecore Azure Blob Storage module findings

A few years ago, I wrote about storing Sitecore binaries in an external blob storage service instead of having them in the database. You can read more about it here and the code is available on GitHub. It has several benefits and it works great! I’ve used this implementation in production on large scale solutions for many years.

In Sitecore 9.3, Sitecore introduced its own Azure Blob Storage module, which uses the same principles. Sitecore also slightly changed how databases are configured, so in its current state my old module only works up to 9.2.

Since Sitecore supports its own module, it makes sense to use that one instead of running a custom one. However, in true Sitecore spirit, the module was released without testing, so beware of the findings below before using it:

Update: Sitecore have issued a cumulative update to Sitecore 9.3 that addresses the two major faults described in this post. Make sure you have SC Hotfix 415404-1 or later installed.

Continue reading

Prevent Sitecore editors playing around with rendering cache options

Caching of renderings is a vital part of well-performing Sitecore websites. Getting all the settings right isn’t trivial, and really only the developers of the components know the details of how each component can be cached. Therefore caching options are typically managed on the rendering items and kept serialized together with the solution source.

However, rendering cache options can also be set on a rendering in the layouts field when it’s used on a page. This is done in the “Control Properties” dialog. Personally I’ve never found a use case where this is needed, but the option is there.

Whether a user has access to the rendering parameters in the “Control Properties” dialog is controlled by “write” access to the standard values of the Standard Rendering Parameters template. Sigh!

Continue reading

Faster indexing in Sitecore

Sitecore indexes are very powerful for retrieving items fast, especially when the items are located in various places. There are also some pitfalls that one needs to be aware of. This post covers methods worth considering in some scenarios, and describes how I reduced indexing time from around three hours down to two minutes in a specific scenario.

Continue reading

Installing Content Hub CMP connector to Sitecore CM having Publish Service installed

There is a DLL version conflict for Polly.dll between the Sitecore CMP connector and Sitecore Publishing Service 4.1.0. The Publishing Service module ships with Polly.dll version 5.9.0, while Sitecore Connect for CMP 1.0.0 ships with Polly 6.0.1. This causes Sitecore CM to stop working during the CMP installation if the SPS module is already installed.

Update: As Sitecore 9.3 and SPS 4.2.0 were released just a few hours after I wrote this post, I checked and noticed this applies to those versions as well. The SPS module for 9.3 also ships with Polly 5.9.0.

Update 2: After some help from Sitecore Support and some additional adjustments, I got the following solution to work on my machine:

Keep your 5.9.0 version of Polly.dll in the bin folder. Create a cmp subfolder (bin/cmp) and put the 6.0.1 version of Polly.dll in that folder. Then add the following to the assembly binding section of your web.config:

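   <dependentAssembly>
     <assemblyIdentity name="Polly" publicKeyToken="c8a3ffc3f8f825cc" />
     <!-- Each codeBase needs its version attribute set to the assembly version of the Polly.dll its href points to -->
     <codeBase version="" href="bin\Polly.dll" />
     <codeBase version="" href="bin\cmp\Polly.dll" />
   </dependentAssembly>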

I don’t think the binding is really needed, but I found in my logs that the CMP connector was trying to load both and

Update: It turned out I tricked myself. I thought I had everything working, but the CMP connector throws exceptions in the log while importing content from Content Hub.

My original easy fix was to add an assembly redirect to the web.config file before installing the connector, and keep the old file:

  <dependentAssembly>
    <assemblyIdentity name="Polly" /><!-- publicKeyToken="c8a3ffc3f8f825cc" -->
    <bindingRedirect oldVersion="" newVersion=""/>
  </dependentAssembly>

As described in this Stack Overflow post, it’s not possible to redirect between assemblies with different public keys. Polly 5.9 doesn’t have a public key (i.e. it’s null), so I at least couldn’t make a binding redirect to the newer 6.0 version work.

As of writing this, I don’t have a solution to this problem. I’m currently awaiting an answer from Sitecore on whether CMP and SPS can live together or not.

Clean up Sitecore database and avoid corrupt published content

We’ve discovered a rare issue in Sitecore Publish Service (SPS) where it may publish incorrect content to some fields. Even though I think SPS handles this wrong, the root cause was inconsistent data in the master database. It turned out such inconsistencies exist in most databases, even in a clean Sitecore install.

Continue reading

Improving Sitecore code quality with ReSharper External Annotations

I guess most of us Sitecore developers are familiar with the JetBrains ReSharper plugin for Visual Studio. This tool is actually what made me accept moving from the Java/IntelliJ IDEA world to the .NET/Visual Studio world many, many years ago, and I’m still using the IntelliJ IDEA keyboard shortcut scheme.

Besides all the nice refactoring tools, code hints etc. that come with ReSharper, it also comes with a framework for annotating code with attributes. One can argue whether this should be used in your own code, but it opens up a really nice way of improving code quality when working with external libraries.

As Sitecore has grown over the years, the API has become larger and larger, and there are sometimes multiple ways of achieving the same thing. Parts of the API can be ambiguous to new developers, and some operations should be avoided from a performance perspective. With ReSharper External Annotations, we can give developers code hints and feedback directly in Visual Studio when they use the Sitecore API in a way that may not be intended.
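To illustrate the file format (this is not the annotation set from the full post), the sketch below marks `Database.GetItem(string)` in Sitecore.Kernel as possibly returning null, which it does when no item exists at the given path. ReSharper can then warn when the result is dereferenced without a null check. The choice of member is an assumption made for the example, and ReSharper’s documentation describes where such files need to be placed to be picked up.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- External annotations for members of the Sitecore.Kernel assembly -->
<assembly name="Sitecore.Kernel">
  <!-- Database.GetItem(string path) returns null when no item is found -->
  <member name="M:Sitecore.Data.Database.GetItem(System.String)">
    <attribute ctor="M:JetBrains.Annotations.CanBeNullAttribute.#ctor" />
  </member>
</assembly>
```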

Continue reading

Learnings from a year of implementing Sitecore Publishing Service

Sitecore Publishing Service (SPS) is a replacement for the built-in publish function. It’s built on .NET Core and runs as a separate microservice, unlike the built-in publisher, which runs in-process.

I’ve been using it, or rather tried to use it, for about a year now on a large Sitecore 9.0.1 solution. It was anything but a smooth ride, so I thought it would be worth sharing my experience and what I learned during the process.

SPS has clear advantages regarding publishing speed. It’s not as “lightning fast” as Sitecore claims, but still a lot faster than the built-in publisher. The greatest advantage, in my opinion, is that it runs outside the Sitecore Content Management (CM) worker process, so an ongoing publish process doesn’t break due to an IIS application pool recycle. Those two reasons were also why we tried moving to SPS.

Note: This post contains my experiences while working with SPS 3.0 to 3.1.3. Some of the issues have been fixed in later versions. Some issues may also remain in SPS 4 as it was released before 3.1.3. Many of the issues turned out to exist in 2.x as well.

Continue reading

Indexing and OCR scanning PDF documents in Sitecore

PDF documents in the Sitecore media library can be indexed using IFilters, but that approach has limitations, for example regarding Azure support, and isn’t very efficient from a performance point of view. The way the extracted content is indexed also makes it harder to use in multi-language solutions.

I’ve taken a different approach to indexing PDF documents, making it more accurate while improving performance at the same time. The IFilter approach is generic and supports multiple file formats. I’ve focused on PDF documents in this post, as it’s a common format, but similar principles can be applied to other file formats as well.

In this post:

  • Avoiding heavy computation during index time
  • Extracting document content through PDF libraries
  • OCR scanning of image/non-text based PDF documents
  • Indexing documents with language stemming
Continue reading

Inherited and non-inherited fields to Sitecore clone items

When an item is cloned in Sitecore, the clones inherit their values from the source item. This is represented by a null value in each field, meaning that the field inherits its value from the clone’s source item. When a value is written to a field in a clone, that value is used instead, which breaks the inheritance. This works great in most cases.

In some scenarios you might not want to inherit all fields, but exclude some of them and enforce a local value in each such field on the clones. By default, a few fields are not inherited: __Created, __Created by, __Updated, __Updated by, __Revision, __Source, __Source item, __Workflow, __Workflow state and __Lock. It’s quite natural that these fields are not inherited by clones, since each item, the source and the clone, should keep its own values for them.

You can add your own fields to this list by modifying the ItemCloning.NonInheritedFields setting. It’s a string setting where you can provide a pipe (|) separated list of field IDs or field keys. The drawback of the setting being a pipe separated list is that it’s hard to add additional fields through config patch files. I hope Sitecore will change this in the future.
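The sketch below shows why. A patch file can only replace the setting’s `value` attribute as a whole, so every field you want excluded, including anything another patch file already added, has to be repeated in one string. The field key and the field ID are hypothetical placeholders.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ItemCloning.NonInheritedFields">
        <!-- Replaces the entire value; both entries below are hypothetical -->
        <patch:attribute name="value">my local field|{11111111-2222-3333-4444-555555555555}</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```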

Continue reading

Defragment the SQL Server heap on Sitecore databases

I discovered that the heap gets very fragmented in SQL Server in some of our solutions. Large tables such as Items, SharedFields, VersionedFields, UnversionedFields, Blobs, Descendants and Links, which easily occupy a few GB on disk, suffered from heavy fragmentation. More than 90% fragmentation was common.

From what I’ve found, the only way to fix SQL Server heap fragmentation is to have a clustered index on each table (a heap is a table without a clustered index, where the data rows are stored in no particular order).

However, I noticed that no tables in the Sitecore databases have any clustered indexes. All indexes are non-clustered in the common master/core/web databases. Sitecore used to have clustered indexes back in 5.2, but over the course of multiple Sitecore versions, the database schema has changed to non-clustered indexes.

A clustered index means that the table rows are stored physically on disk in index order. That’s also why there can be only one clustered index per table. With a non-clustered index, there is a separate list with pointers to the physical rows. It’s generally faster to read from a clustered index, but it may be slower to write to it, as the table data may need to be rearranged.

Continue reading

Correcting ambiguous Sitecore field scopes

As you probably know, every field in Sitecore has one of three field scopes: Versioned (aka Normal), Unversioned and Shared. Versioned fields have a separate value for each version in each language. Unversioned fields have individual values for each language in the same way as versioned fields, but there can be only one value per language. Shared fields hold a single value regardless of language and item version. There is no such thing as a “versioned shared” field type.

This is configured using two checkboxes on the field level: Shared and Unversioned. If neither is checked, the field becomes a versioned field. As you can see, there’s an ambiguous “invalid” state where both checkboxes are checked. In this case, Shared takes precedence.

Continue reading

Sorting with Sitecore Content Search/Solr

Sorting search results is rather straightforward at first glance, but there are some pitfalls to be aware of. When using Sitecore Content Search, the LINQ provider supports the OrderBy method, and it gets serialized into a sort statement in a Solr query. Example:

var result = searchContext.GetQueryable<MyModel>()
   .OrderBy(x => x.DisplayName);

will be serialized into a Solr query like

?q=...&fq=...&sort=_displayname ASC

This usually works quite well, but consider the following list of item display names:

Continue reading

Sitecore MVP 2019

Sitecore MVP Technology 2019

Thank you Sitecore for awarding me “Most Valuable Professional” (MVP) again! Seven years in a row!

The Sitecore MVP Award celebrates the most active Sitecore community members from around the world who provide valuable online and offline expertise that enriches the community and makes a difference.

Besides the nine posts on this blog, my contribution to Sitecore and the community over the last year has mostly focused on improving the product through dialog with various Sitecore staff. During 2018 I filed over 50 confirmed bugs, mostly related to Sitecore Publish Service and Content Search, as well as a handful of accepted product enhancements.

Optimize Sitecore Solr queries

I’ve written a few posts on Sitecore Content Search and Solr already, but there seems to be an infinite amount of things to discover and learn in this area. Previously I’ve pointed out the importance of configuring the Solr index correctly and the benefit of picking the fields to index, i.e. not indexing all fields by default (<indexAllFields>false</indexAllFields>). This will vastly improve the performance of Content Search operations and reduce the index size in large solutions.

Recently I’ve been investigating a performance issue in one of our Sitecore solutions. This one is running Sitecore 9 with quite a lot of data in it. It had been performing quite well, but as the client loaded more data into it, it got a lot slower. Our metrics also showed that the response time (P95) in the data center got quite high: around 500 ms instead of the normal 100 ms.
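A patch for that could look like the sketch below. It assumes the Sitecore 9 layout where the Solr indexes share `defaultSolrIndexConfiguration`, and the field ID is a hypothetical placeholder. The `include` list names the item fields that should still be indexed once `indexAllFields` is off.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration>
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <!-- Stop indexing every field by default -->
            <indexAllFields>false</indexAllFields>
            <include hint="list:AddIncludedField">
              <!-- Hypothetical field ID; list each field you actually query on -->
              <title>{11111111-2222-3333-4444-555555555555}</title>
            </include>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```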

Continue reading

An easy way to create Sitecore config files

Some people find Sitecore config files a bit tricky to write, as getting the element structure right can be fiddly or time-consuming. Ever found yourself debugging an issue where it turned out the config file wasn’t applied properly due to a mistake in the element structure?

The XPath Tools plugin for Visual Studio, by Uli Weltersbach, is a great help when creating those config patch files. Here’s a fast and simple way to create them:
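For reference, a patch file has to mirror the full element path from `<configuration>` down to the node it changes, and positional patches locate their target with XPath. The sketch below adds a hypothetical `MyProject.Pipelines.MyProcessor` right after Sitecore’s `ItemResolver` in the `httpRequestBegin` pipeline; the processor and assembly names are made up, and it assumes the `ItemResolver` type attribute is written exactly like this in your Sitecore version.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <!-- Hypothetical processor, inserted right after the built-in ItemResolver -->
        <processor type="MyProject.Pipelines.MyProcessor, MyProject"
                   patch:after="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']" />
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
```

A single misspelled element on the way down, such as `<pipeline>` instead of `<pipelines>`, doesn’t modify the intended node at all, which is exactly the kind of mistake that is hard to spot afterwards.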

Continue reading

Improving Editing Performance when using Sitecore Publish Service

The Sitecore Publish Service vastly improves publishing performance in Sitecore. For me it was really hard to get it working properly, and I’ve blogged about some of the issues before. I received a lot of good help from Sitecore Support, and now it seems to be in a fairly stable state.

However, the Publish Service has a downside that may affect editing performance. It doesn’t use the PublishQueue table to know what to publish. Instead, it has an event mechanism for detecting what needs to be published: as an item is saved, Sitecore emits events to the Publish Service so that it knows what should be put into the publish manifest.

Note: The solution in this post may not suit every project. Only address this if you’re experiencing the performance degradation described, and make sure you test everything well. Make sure you fully understand this approach before dropping it into your project.

As part of the Publish Service package, an item:saved event handler is added to do some post-processing. When an unversioned field is changed on an item, the event handler loops over all versions in that language and updates the __Revision field. When a shared field is changed, the event handler loops over all versions in all languages and updates the __Revision field. This way, the Publish Service gets notified that the content of the item has changed.
Continue reading

Sitecore X-Forwarded-For handling

A Sitecore solution is typically behind one or several reverse proxies, such as load balancers and content delivery networks. From a Content Delivery server’s perspective, the remote address, i.e. “the visible client IP”, is that of the closest proxy instead of the connecting client. To solve this, each proxy in the chain adds an HTTP header with the IP address it’s communicating with. This header is typically called X-Forwarded-For or X-Real-IP.

Below is an example of such a proxy setup. Each proxy adds the IP address it receives the connection from:
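On the analytics side, Sitecore reads the visitor IP from a configurable header through the `Analytics.ForwardedRequestHttpHeader` setting. A patch could look like the sketch below; it assumes an xDB-enabled Sitecore version where this setting exists, and that `X-Forwarded-For` is the header your proxies add.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Tell Sitecore analytics which header carries the client IP -->
      <setting name="Analytics.ForwardedRequestHttpHeader">
        <patch:attribute name="value">X-Forwarded-For</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

Keep in mind that X-Forwarded-For can hold a comma-separated list with one address per proxy hop, so which entry can be trusted depends on your proxy chain.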

Continue reading

Sitecore Publish Service 3.1 update-1

After having tons of problems and several filed tickets on the initial release of Sitecore Publish Service 3.1, I was happy to find that Sitecore have addressed many of the problems of the previous versions. This update contains 12 fixes and I found my customer support ticket number listed six times.

Sitecore Publish Service 3.1 update 1 release notes

Unfortunately, the update didn’t solve these issues properly, so while I’m waiting for new patches I thought I’d share a UI fix that wasn’t included in the release. When working with multiple languages, the language list in the Publish Service UI isn’t very user friendly. It essentially becomes a small letterbox with unsorted languages, next to a large area for displaying the targets.

This is the layout provided as default when having multiple languages:

Default Publish Service dialog
Continue reading

Memory hungry Sitecore indexing

While investigating stability issues, I’ve found a few things that may need addressing.

Sitecore updates indexes in batches. This is good in general, but it turned out it may be very memory hungry. There are essentially two config parameters that control the batching:

<setting name="ContentSearch.ParallelIndexing.Enabled" value="true" />
<setting name="ContentSearch.IndexUpdate.BatchSize" value="300" />

The default config above essentially means that Sitecore will start multiple threads, each processing 300 indexable objects. This might not be an issue at all, but combined with a multi-language setup, media indexing and crazy authors, it may become a real problem.
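If it does, both settings can be patched. The sketch below lowers the batch size and turns off parallel indexing. The value 100 is an arbitrary example, not a recommendation, so measure memory usage and indexing time before and after changing it.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Don't start multiple indexing threads in parallel -->
      <setting name="ContentSearch.ParallelIndexing.Enabled">
        <patch:attribute name="value">false</patch:attribute>
      </setting>
      <!-- Fewer indexable objects per batch; 100 is an arbitrary example value -->
      <setting name="ContentSearch.IndexUpdate.BatchSize">
        <patch:attribute name="value">100</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```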
Continue reading