Generating predictable Sitecore IDs

Sitecore IDs, or Guids, are great most of the time: content can be created in different places and moved between databases without the risk of colliding IDs. When integrating external data into Sitecore, you may need to represent some of that data as items. Once you author new content in Sitecore that references such imported items, it suddenly becomes a bit trickier to move content between instances. You basically have to transfer all the integrated data as well to keep the references consistent. It also means you can no longer just remove the integrated data and re-run the import, as that would generate new IDs.

So what if we could integrate data from an external source and represent it as items with a predictable Sitecore ID? That is, give each item a unique ID that won’t collide with anything else, but that always comes out the same every time the process is run, on every system.

As long as we have some kind of stable unique key in the source, we can run that key through a hashing algorithm and generate a valid Guid. With such a Guid, it’s possible to create new Sitecore items with a provided ID, like this:

var newItem = parentItem.Add(name, templateId, newItemID);

I found a piece of code on the net and adapted it to my needs as a small helper class. Unfortunately I’ve forgotten where I found it, so sorry for the missing credit to the original author. The helper class basically contains two methods, which generate a Guid from either a string or a byte array:

public static Guid Create(Guid namespaceId, string name);
public static Guid Create(Guid namespaceId, byte[] nameBytes);

The class takes whatever input it gets, combines it with a namespace and then computes a SHA-1 hash of the result. The namespace can be anything, typically a constant in the code. Its purpose is to keep the generated IDs unique when the helper is reused in multiple places in the code.

It then needs to truncate the hash a little bit, since a SHA-1 hash is 160 bits and a Guid is “just” 128 bits. In reality even fewer bits vary, as some of them are predefined (the version and variant fields). So basically 122 bits vary. This is still unique enough: 2^122 is roughly 5.3E36, so the odds of hitting one specific value are about 1:5,300,000,000,000,000,000,000,000,000,000,000,000. That’s a large number…
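The hash-and-truncate steps above can be sketched roughly as follows. This is not the original helper (whose source I can't point to), just a minimal sketch that matches the two signatures shown earlier and assumes the RFC 4122 name-based (version 5) layout. Encoding the string as UTF-8 and the byte-order swapping around .NET's Guid layout are assumptions of this sketch.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class GuidUtils
{
    // Assumption: strings are encoded as UTF-8 before hashing.
    public static Guid Create(Guid namespaceId, string name)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        return Create(namespaceId, Encoding.UTF8.GetBytes(name));
    }

    public static Guid Create(Guid namespaceId, byte[] nameBytes)
    {
        if (nameBytes == null) throw new ArgumentNullException(nameof(nameBytes));

        // .NET stores the first three Guid fields little-endian;
        // RFC 4122 hashes the namespace in network (big-endian) order.
        byte[] namespaceBytes = namespaceId.ToByteArray();
        SwapByteOrder(namespaceBytes);

        // Hash namespace + name as one buffer.
        byte[] input = new byte[namespaceBytes.Length + nameBytes.Length];
        Buffer.BlockCopy(namespaceBytes, 0, input, 0, namespaceBytes.Length);
        Buffer.BlockCopy(nameBytes, 0, input, namespaceBytes.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(input);
        }

        // Truncate the 160-bit hash to the 128 bits a Guid can hold.
        byte[] result = new byte[16];
        Array.Copy(hash, 0, result, 0, 16);

        // Predefined bits: version 5 in the high nibble of byte 6,
        // RFC 4122 variant in the two top bits of byte 8.
        result[6] = (byte)((result[6] & 0x0F) | 0x50);
        result[8] = (byte)((result[8] & 0x3F) | 0x80);

        // Convert back to .NET's in-memory layout before constructing the Guid.
        SwapByteOrder(result);
        return new Guid(result);
    }

    // Reverses the three little-endian fields (4, 2 and 2 bytes) in place.
    private static void SwapByteOrder(byte[] guidBytes)
    {
        Array.Reverse(guidBytes, 0, 4);
        Array.Reverse(guidBytes, 4, 2);
        Array.Reverse(guidBytes, 6, 2);
    }
}
```

Calling Create twice with the same namespace and the same name returns the same Guid, on any machine, which is the whole point here.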

As long as we have something at the source that is unique and persistent, we can feed that into this helper class, together with a namespace. The namespace could really be anything, but I chose to use a Guid, to make it less likely that two different parts of the code generate the same ID.

Here’s a more verbose example of how this could be used:

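// Placeholder: replace "random guid" with a fixed, randomly generated Guid string.
public static readonly Guid MyNamespace = new Guid("random guid");

public Item CreateItem(Item parentItem, string name, string sourceUniqueId)
{
    // The same namespace and source key always yield the same item ID.
    var newItemId = new ID(GuidUtils.Create(MyNamespace, sourceUniqueId));
    // templateId is assumed to be defined elsewhere (the template of the imported items).
    var newItem = parentItem.Add(name, templateId, newItemId);
    // ...
}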

One thought on “Generating predictable Sitecore IDs”

  1. You’re also unlikely to get collisions with anything already in Sitecore, as Sitecore uses version 4 Guids, whereas this code generates version 5 Guids. Check out the 13th hex digit. In Sitecore it is always 4 (with a couple of exceptions), and this code produces 5 (which is as per the specification for a namespaced Guid using SHA-1).
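A quick way to look at that 13th hex digit is sketched below. GuidVersion is a hypothetical helper name used only for illustration; it is not part of Sitecore or .NET.

```csharp
using System;

public static class GuidVersion
{
    // Returns the RFC 4122 version number, i.e. the 13th hex digit of the Guid.
    public static int Of(Guid guid)
    {
        // The "N" format is 32 hex digits without dashes; index 12 is the 13th digit.
        string hex = guid.ToString("N");
        return Convert.ToInt32(hex.Substring(12, 1), 16);
    }
}
```

For a Guid from Guid.NewGuid() this should give 4, and for a name-based SHA-1 Guid it should give 5.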
