Sorting with Sitecore Content Search/Solr

Sorting search results is rather straightforward at first glance, but there are some pitfalls to be aware of. When using Sitecore Content Search, the LINQ provider supports the OrderBy method, which gets serialized into a sort parameter in the Solr query. Example:

var result = searchContext.GetQueryable<MyModel>()
   .Where(...)
   .OrderBy(x => x.DisplayName)
   .GetResults();

will be serialized into a Solr query like

?q=...&fq=...&sort=_displayname ASC

This usually works quite well, but consider the following list of item display names:

  • California
  • Colorado
  • Connecticut
  • New York
  • North Carolina
  • North Dakota

A query that returns these items, sorted by display name, will actually come back in this order:

  • California
  • North Carolina
  • Colorado
  • Connecticut
  • North Dakota
  • New York

Why is that? Well, it’s because the _displayname field, like most other text fields, is indexed as field type text_general or a language-specific type such as text_en. This means that each string is tokenized at index time. For text_general, each string is split on spaces and similar separators, so each word is indexed separately. Therefore North Carolina comes between California and Colorado, because Carolina sorts between them. The same goes for North Dakota, where Dakota sorts between Connecticut and New.
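The observed order can be reproduced with a small simulation. This is only a sketch and not how Solr is implemented: it assumes that a tokenized value sorts by its alphabetically smallest token, which happens to match the result above. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenizedSortDemo
{
    // Hypothetical stand-in for a tokenizing analyzer: lowercase and split on spaces.
    public static string[] Tokenize(string value) =>
        value.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Assumption for illustration: the sort key is the smallest token of the value.
    public static string SmallestToken(string value) =>
        Tokenize(value).OrderBy(t => t, StringComparer.Ordinal).First();

    // Mimics sorting on a tokenized field such as text_general.
    public static List<string> SortLikeTokenizedField(IEnumerable<string> names) =>
        names.OrderBy(n => SmallestToken(n), StringComparer.Ordinal).ToList();

    // Mimics sorting on an untokenized string field: the whole value is the key.
    public static List<string> SortLikeStringField(IEnumerable<string> names) =>
        names.OrderBy(n => n, StringComparer.Ordinal).ToList();
}
```

Passing the six state names to SortLikeTokenizedField gives the unexpected order shown above, while SortLikeStringField gives the expected alphabetical order.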

This can be solved in a few different ways, each with its own pros and cons:

Copy field

This is perhaps the simplest and most elegant way of solving this. In Solr, we can copy the content of a field into a second field of another type, such as string instead of text_general. This means we can still do free text queries on the stemmed text field while sorting the result on the string field. The relevant parts of the Solr schema would then look like this:

<schema>
  <field name="_displayname" type="text_general" indexed="true" stored="true" />
  <dynamicField name="*_s" type="string" indexed="true" stored="true" />
  <!-- New copy field row --> 
  <copyField source="_displayname" dest="displayname_s" />
</schema>

Then this can be used in Content Search like this:

public class MyModel {
  [IndexField("_displayname")]
  public string DisplayName { get; set; }
  [IndexField("displayname_s")]
  public string SortableDisplayName { get; set; }
}

var result = searchContext.GetQueryable<MyModel>()
  .Where(x => ... /* possible to query on stemmed x.DisplayName */ )
  .OrderBy(x => x.SortableDisplayName)
  .GetResults();

This approach is fast and simple, but it involves modifying the Solr schema.

Computed Field

An alternative can be to create a new computed field that essentially does exactly the same thing, like this:

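using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class SortableDisplayNameComputedField : IComputedIndexField
{
  public string FieldName { get; set; }
  public string ReturnType { get; set; }

  public object ComputedFieldValue(IIndexable indexable)
  {
    // Only Sitecore items are handled; anything else yields null
    var item = (indexable as SitecoreIndexableItem)?.Item;
    // Lowercase with the item's culture so the sort ignores case
    return item?.DisplayName.ToLower(item.Language.CultureInfo);
  }
}

The computed field is then registered in the Content Search configuration:

<fields hint="raw:AddComputedIndexField">
  <field fieldName="displayname" returnType="string">MyNamespace.SortableDisplayNameComputedField, MyAssembly</field>
</fields>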

This approach is also quite straightforward, and you don’t need to touch the schema. There is a small performance trade-off though, since the value is computed in .NET for every item that is indexed.

The class above can easily be modified so that it can be reused for multiple fields as well. It can also be customized to return any string that represents the item in a sortable way that suits the business logic. I’ve seen examples with complex product names, including digit and letter combinations, where sorting essentially involves splitting the name into pieces and sorting them individually.
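As an illustration of such a custom sort key, here is a minimal sketch that zero-pads every run of digits so that numeric parts compare by value when the key is sorted as a plain string. The helper name SortKeyBuilder and the padding width of 8 are assumptions made for this example; real product naming rules will likely need more than this.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SortKeyBuilder
{
    // Builds a lowercase key where each run of digits is left-padded with zeros,
    // so "X2" becomes "x00000002" and sorts before "X10" ("x00000010").
    // Digit runs longer than 'width' are left unchanged.
    public static string Build(string name, int width = 8)
    {
        if (string.IsNullOrEmpty(name))
        {
            return string.Empty;
        }

        return Regex.Replace(
            name.ToLowerInvariant(),
            "[0-9]+",
            match => match.Value.PadLeft(width, '0'));
    }
}
```

A computed field could return SortKeyBuilder.Build(item.DisplayName) instead of the lowercased display name, and the query would then sort on that field.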

Solr side sorting

When managing multiple languages, the sorting methods above don’t consider that different locales sort content in different ways. We can fix this by using language-specific sorting in Solr, but it requires registering new field types. This is probably the best approach when handling large volumes of content in multiple languages.

In Solr, we can define ICU Collation field types and a set of dynamic fields of those types that we then use for sorting. Below is an example from the Solr schema file with two languages:

<fieldType name="collated_de" class="solr.ICUCollationField" locale="de" strength="primary" />
<fieldType name="collated_sv" class="solr.ICUCollationField" locale="sv" strength="primary" />
...
<dynamicField name="*_c_de" type="collated_de" indexed="false" stored="false" docValues="true" />
<dynamicField name="*_c_sv" type="collated_sv" indexed="false" stored="false" docValues="true" />
...

To get the data into these new fields, we can either use the Solr copy field feature or map the new field type in the Sitecore Content Search config and use computed fields. In Sitecore we can map those as:

<typeMatches hint="raw:AddTypeMatch">
  <typeMatch type="System.String" typeName="sort" fieldNameFormat="{0}_c" cultureFormat="_{1}" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
</typeMatches>

<fields hint="raw:AddComputedIndexField">
  <field fieldName="myfieldname" returnType="sort">MyNamespace.MyComputedField, MyAssembly</field>
</fields>

This is probably the best and most versatile solution, but as you can see, it involves a few more steps than the others. I think it’s worth it when working with many languages.

Sitecore side sorting

The fourth option is to let the .NET code sort the result instead of Solr. The order of the query result from Solr then doesn’t matter, and the result list is sorted in memory before being rendered on a page. This can be practical when sorting for a specific language is needed, but setting up the whole Solr ICU Collation sorting, as described above, isn’t. On the .NET side it’s simple to specify a locale through a comparer, like this:

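// Culture-aware comparer; "sv-SE" is only an example locale
var icomparer = StringComparer.Create(new CultureInfo("sv-SE"), ignoreCase: true);

var result = searchContext.GetQueryable<MyModel>()
   .Where(...)
   .GetResults()
   .Select(x => x.Document)
   .OrderBy(x => x.DisplayName, icomparer);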

However, this approach has a major downside. It only sorts the records that the query returned, so it cannot be combined with paging (.Skip(n).Take(n)): each page would be sorted in isolation rather than as part of the full result. Avoid this if you need both locale-specific sorting and paging support.
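The paging problem can be shown without Solr at all. The sketch below uses a plain list as a stand-in for the order Solr returned; the class name, method names and sample values are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Pages first and sorts afterwards: only the rows of that one page get ordered.
    // This is what in-memory sorting of a paged query result amounts to.
    public static List<string> SortPageOnly(
        IEnumerable<string> solrOrder, int skip, int take, IComparer<string> comparer) =>
        solrOrder.Skip(skip).Take(take).OrderBy(x => x, comparer).ToList();

    // Sorts the full result set first and pages afterwards: the correct page,
    // but it requires every matching record to be in memory.
    public static List<string> SortThenPage(
        IEnumerable<string> solrOrder, int skip, int take, IComparer<string> comparer) =>
        solrOrder.OrderBy(x => x, comparer).Skip(skip).Take(take).ToList();
}
```

With the input Delta, Alpha, Charlie, Bravo and a page size of 2, SortPageOnly returns Alpha, Delta for the first page, while SortThenPage returns Alpha, Bravo. StringComparer.Ordinal is enough to see the difference; in practice a culture-aware comparer would be passed instead.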
