Yet another Sitecore scheduled publishing engine

When you schedule an item in Sitecore, it doesn’t mean that the item gets published or unpublished at that date. It just means that it is possible to publish the item within the given period. You still have to perform an actual publish for anything to happen.

There are already many great modules that handle this, such as Scheduled Publish Module for Sitecore by Hedgehog.

In my scenario, we just wanted to trigger a publish for items as they are scheduled. Scanning through the whole database is very expensive though, so I decided to make one that utilizes our search index instead. We use Solr in our solution, but it’ll probably work just as well with Lucene.

Update: Some people had issues with the date range queries when not running in UTC mode. I’ve updated the code so that it forces UTC handling of all dates. If you’ve grabbed the code here previously, take a close look at the “u”-format in Load/SaveLastPublishTimestamp.

First, I created a computed field that stores the dates when a publish needs to occur. This can be either a start date or an end date. The field is defined as a list of dates in the index.

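using System;
using System.Collections.Generic;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class ScheduledPublishComputedField : AbstractComputedIndexField
{
	public override object ComputeFieldValue(IIndexable indexable)
	{
		Sitecore.Data.Items.Item item = indexable as SitecoreIndexableItem;
		if (item == null)
			return null;

		// Collect every date at which the publish state of the item changes.
		// An unset start date equals the minimum date, an unset end date the maximum date.
		var dateList = new List<DateTime>();

		// Item-level restrictions (publish / unpublish)
		if (item.Publishing.PublishDate != DateTimeOffset.MinValue.UtcDateTime)
			dateList.Add(item.Publishing.PublishDate);
		if (item.Publishing.UnpublishDate != DateTimeOffset.MaxValue.UtcDateTime)
			dateList.Add(item.Publishing.UnpublishDate);

		// Version-level restrictions (valid from / valid to)
		if (item.Publishing.ValidFrom != DateTimeOffset.MinValue.UtcDateTime)
			dateList.Add(item.Publishing.ValidFrom);
		if (item.Publishing.ValidTo != DateTimeOffset.MaxValue.UtcDateTime)
			dateList.Add(item.Publishing.ValidTo);

		// Items without any schedule get no value in the index
		return dateList.Count == 0 ? null : dateList;
	}
}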

The field is then added to the list of computed fields in the index configuration:

<fields hint="raw:AddComputedIndexField">
  <field fieldName="scheduledpublish" returnType="datetimeCollection">Stendahls.Sc.ScheduledPublishing.ScheduledPublishComputedField, Stendahls.Sc.ScheduledPublishing</field>
</fields>

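Then we need a class that represents an item that should be published. It is used as the result type when querying the index. Since it is declared protected, it has to be nested inside the PublishAgent class shown further down: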

protected class PublishItem
{
	[IndexField("_group")]
	public Guid ID { get; internal set; }

	[IndexField("_fullpath")]
	public string FullPath { get; internal set; }

	[IndexField("_language")]
	public string LanguageName { get; internal set; }

	[IndexField("_latestversion")]
	public bool LatestVersion { get; internal set; }

	[IndexField("scheduledpublish")]
	[IgnoreIndexField]
	internal DateTime ScheduledPublishDateTime { get; set; }
}

Then we need an agent that checks the index for items that need publishing. Essentially, all it needs is a timestamp of when the agent was previously executed, so instead of creating new database tables etc., I decided to just use a database property (the code below stores it on the source database). On the very first run there is no timestamp yet, so the agent only stores the current time and exits. The agent wrapper becomes quite simple:

public void Run()
{
	var database = Factory.GetDatabase(SourceDatabase);
	var now = DateTime.UtcNow;

	var lastRun = LoadLastPublishTimestamp(database);
	if (lastRun == DateTime.MinValue)
	{
		SaveLastPublishTimestamp(database, now);
		return;
	}

	try
	{
		PerformPublish(lastRun, now);
	}
	finally
	{
		SaveLastPublishTimestamp(database, now);
	}
}

private string PropertiesKey
{
	get
	{
		return "ScheduledPublishing_" + Settings.InstanceName;
	}
}

public DateTime LoadLastPublishTimestamp(Database db)
{
	string str = db.Properties[PropertiesKey];
	DateTime d;
	DateTime.TryParseExact(str, "u", CultureInfo.InvariantCulture,
		DateTimeStyles.None, out d);
	return DateTime.SpecifyKind(d, DateTimeKind.Utc);
}

public void SaveLastPublishTimestamp(Database db, DateTime time)
{
	db.Properties[PropertiesKey] = time.ToString("u", CultureInfo.InvariantCulture);
}
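
To see what the “u” format does to a timestamp, here is a minimal stand-alone sketch of the same save/load pair without any Sitecore dependency. The class name TimestampRoundTrip and the sample date are made up for illustration. It also shows that the format drops the sub-second part, so, as far as I can tell, the next window can start up to a second before the previous one ended. At worst an item on the boundary is picked up twice, but it should never be missed.

```csharp
using System;
using System.Globalization;

public static class TimestampRoundTrip
{
	// Same format and culture as SaveLastPublishTimestamp.
	public static string Save(DateTime utcTime)
	{
		return utcTime.ToString("u", CultureInfo.InvariantCulture);
	}

	// Same parsing as LoadLastPublishTimestamp. A missing or malformed
	// value leaves d at DateTime.MinValue, which Run() treats as "first run".
	public static DateTime Load(string stored)
	{
		DateTime d;
		DateTime.TryParseExact(stored, "u", CultureInfo.InvariantCulture,
			DateTimeStyles.None, out d);
		return DateTime.SpecifyKind(d, DateTimeKind.Utc);
	}

	public static void Main()
	{
		// Hypothetical sample value with a 250 ms fraction
		var now = new DateTime(2018, 6, 4, 12, 17, 30, 250, DateTimeKind.Utc);
		var stored = Save(now);
		var loaded = Load(stored);

		Console.WriteLine(stored);      // 2018-06-04 12:17:30Z
		Console.WriteLine(loaded.Kind); // Utc
		Console.WriteLine(loaded < now); // the fraction of a second is not stored
		Console.WriteLine(Load(null) == DateTime.MinValue); // True
	}
}
```

Note that ToString("u") does not convert the value to UTC by itself; it only appends the “Z”. That is why the agent always passes DateTime.UtcNow.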

Now we can easily find which items need publishing with a regular search query:

List<PublishItem> publishItemQueue = new List<PublishItem>();

// The search context is disposable, so release it when the query loop is done
using (var searchContext = ContentSearchManager.GetIndex(SourceIndex).CreateSearchContext())
{
	int skip = 0;
	bool fetchMore;
	do
	{
		// Fetch one page of latest versions that have a scheduled date in the window
		var queryResult = searchContext.GetQueryable<PublishItem>()
			.Filter(f => f.LatestVersion && 
				f.ScheduledPublishDateTime.Between(publishSpanFrom, publishSpanUntil, Inclusion.Upper))
			.OrderBy(f => f.FullPath)
			.Skip(skip)
			.Take(500)
			.GetResults();
		skip += 500;

		publishItemQueue.AddRange(queryResult.Hits.Select(h => h.Document));
		fetchMore = queryResult.TotalSearchResults > skip;
	} while (fetchMore);
}
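
The window boundaries matter here. As I understand Inclusion.Upper, the lower bound is excluded and the upper bound is included, so consecutive runs produce windows that neither overlap nor leave gaps. The following stand-alone sketch illustrates that with plain DateTime comparisons. The class PublishWindowDemo and the sample times are made up, and InWindow is only my assumed equivalent of the Between call, not Sitecore code.

```csharp
using System;
using System.Globalization;

public static class PublishWindowDemo
{
	// Assumed equivalent of Between(from, until, Inclusion.Upper):
	// the lower bound is excluded and the upper bound is included.
	public static bool InWindow(DateTime date, DateTime from, DateTime until)
	{
		return date > from && date <= until;
	}

	public static void Main()
	{
		// A scheduled date that falls exactly on an agent run
		var scheduled = new DateTime(2018, 6, 4, 12, 30, 0, DateTimeKind.Utc);

		// Three consecutive agent runs, 30 minutes apart
		var runs = new[]
		{
			new DateTime(2018, 6, 4, 12, 0, 0, DateTimeKind.Utc),
			new DateTime(2018, 6, 4, 12, 30, 0, DateTimeKind.Utc),
			new DateTime(2018, 6, 4, 13, 0, 0, DateTimeKind.Utc)
		};

		for (int i = 1; i < runs.Length; i++)
		{
			bool hit = InWindow(scheduled, runs[i - 1], runs[i]);
			Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
				"({0:HH:mm}, {1:HH:mm}] -> {2}", runs[i - 1], runs[i], hit));
		}
		// Prints:
		// (12:00, 12:30] -> True
		// (12:30, 13:00] -> False
	}
}
```

The item scheduled for 12:30 is picked up by the run at 12:30 and not again by the run at 13:00.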

I decided to opt for publishing related items for each item that is scheduled. This may require adaptations according to your needs. The complete source of the agent ended up like this in my case (the PublishItem class shown earlier is not repeated in the listing):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using Sitecore.Collections;
using Sitecore.Configuration;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Diagnostics.PerformanceCounters;
using Sitecore.Globalization;
using Sitecore.Publishing;

namespace Stendahls.Sc.ScheduledPublishing
{
	public class PublishAgent
	{
		public string SourceDatabase { get; private set; }
		public string SourceIndex { get; private set; }
		public List<string> TargetDatabases { get; private set; }
		public PublishMode Mode { get; private set; }

		public PublishAgent(string sourceDatabase, string sourceIndex, string targetDatabases)
		{
			Assert.ArgumentNotNullOrEmpty(sourceDatabase, "sourceDatabase");
			Assert.ArgumentNotNullOrEmpty(sourceIndex, "sourceIndex");
			Assert.ArgumentNotNullOrEmpty(targetDatabases, "targetDatabases");
			SourceDatabase = sourceDatabase;
			SourceIndex = sourceIndex;
			TargetDatabases = ParseDatabases(targetDatabases);
		}

		public void Run()
		{
			var database = Factory.GetDatabase(SourceDatabase);
			var now = DateTime.UtcNow;

			var lastRun = LoadLastPublishTimestamp(database);
			if (lastRun == DateTime.MinValue)
			{
				SaveLastPublishTimestamp(database, now);
				return;
			}

			try
			{
				PerformPublish(lastRun, now);
			}
			finally
			{
				SaveLastPublishTimestamp(database, now);
			}
		}

		public void PerformPublish(DateTime publishSpanFrom, DateTime publishSpanUntil)
		{
			List<PublishItem> publishItemQueue = new List<PublishItem>();

			// The search context is disposable, so release it when the query loop is done
			using (var searchContext = ContentSearchManager.GetIndex(SourceIndex).CreateSearchContext())
			{
				int skip = 0;
				bool fetchMore;
				do
				{
					// Fetch one page of latest versions that have a scheduled date in the window
					var queryResult = searchContext.GetQueryable<PublishItem>()
						.Filter(f => f.LatestVersion && 
							f.ScheduledPublishDateTime.Between(publishSpanFrom, publishSpanUntil, Inclusion.Upper))
						.OrderBy(f => f.FullPath)
						.Skip(skip)
						.Take(500)
						.GetResults();
					skip += 500;

					publishItemQueue.AddRange(queryResult.Hits.Select(h => h.Document));
					fetchMore = queryResult.TotalSearchResults > skip;
				} while (fetchMore);
			}

			if (publishItemQueue.Count == 0)
				return;

			var db = Factory.GetDatabase(SourceDatabase);
			var publishingTargets = GetPublishingTargets(db, TargetDatabases);

			// Loop over the queue, but not using a regular foreach, since
			// we'll remove items from the queue as they are processed as related items.
			while (publishItemQueue.Count > 0)
			{
				var publishItem = publishItemQueue.First();
				publishItemQueue.RemoveAt(0);

				if (publishItem.LanguageName == null)
					continue;

				var language = LanguageManager.GetLanguage(publishItem.LanguageName);
				if (language == null)
					continue;

				var item = db.GetItem(new ID(publishItem.ID), language);
				if (item == null)
					continue;

				PublishItemAsyncWithRelatedItems(item, publishingTargets);
			}
		}

		protected virtual void PublishItemAsyncWithRelatedItems(Item item, List<string> publishingTargets)
		{
			if (item == null)
				return;

			var targetDb = Factory.GetDatabase(TargetDatabases.First());
			var options = new PublishOptions(item.Database, targetDb, PublishMode.Incremental, item.Language,
				DateTime.UtcNow, publishingTargets)
			{
				RootItem = item,
				Deep = true,
				PublishRelatedItems = true
			};
			var publisher = new Publisher(options);
			publisher.PublishAsync();

			// Increment performance counter
			JobsCount.TasksPublishings.Increment();
		}

		private List<string> GetPublishingTargets(Database sourceDatabase, ICollection<string> targetDatabases)
		{
			var targets = new List<string>();
			var parent = sourceDatabase.GetItem("/sitecore/system/publishing targets");
			// Loop over all targets and add those that matches the database list
			foreach (Item target in parent.GetChildren(ChildListOptions.SkipSorting))
			{
				if (targetDatabases.Contains(target["Target database"]))
					targets.Add(target.ID.ToString());
			}
			return targets;
		}

		private string PropertiesKey
		{
			get
			{
				return "ScheduledPublishing_" + Settings.InstanceName;
			}
		}

		public DateTime LoadLastPublishTimestamp(Database db)
		{
			string str = db.Properties[PropertiesKey];
			DateTime d;
			DateTime.TryParseExact(str, "u", CultureInfo.InvariantCulture,
				DateTimeStyles.None, out d);
			return DateTime.SpecifyKind(d, DateTimeKind.Utc);
		}

		public void SaveLastPublishTimestamp(Database db, DateTime time)
		{
			db.Properties[PropertiesKey] = time.ToString("u", CultureInfo.InvariantCulture);
		}

		private static List<string> ParseDatabases(string databases)
		{
			return databases.Split(',')
				.Select(s => s.Trim())
				.Where(s => !string.IsNullOrWhiteSpace(s))
				.ToList();
		}
	}
}

Finally, the agent and the computed field are registered in a config patch file:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:x="http://www.sitecore.net/xmlconfig/">
	<sitecore>
		<scheduling>
			<agent type="Stendahls.Sc.ScheduledPublishing.PublishAgent, Stendahls.Sc.ScheduledPublishing" method="Run" interval="00:30:00" >
				<param desc="source database">master</param>
				<param desc="source index">sitecore_master_index</param>
				<param desc="publish targets">web</param>
			</agent>
		</scheduling>
		<contentSearch>
			<indexConfigurations>
				<defaultSolrIndexConfiguration>
					<fields hint="raw:AddComputedIndexField">
						<field fieldName="scheduledpublish" returnType="datetimeCollection">Stendahls.Sc.ScheduledPublishing.ScheduledPublishComputedField, Stendahls.Sc.ScheduledPublishing</field>
					</fields>
				</defaultSolrIndexConfiguration>
			</indexConfigurations>
		</contentSearch>
	</sitecore>
</configuration>
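
Since PublishItemAsyncWithRelatedItems is virtual, one way to adapt the publishing behavior is to subclass the agent. The following is an untested sketch that publishes only the scheduled item itself, without children or related items. The class name SingleItemPublishAgent is made up, and the choice of PublishMode.SingleItem is my assumption, so verify it against your Sitecore version before using it. It also assumes that PublishItem is nested in PublishAgent as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Sitecore.Configuration;
using Sitecore.Data.Items;
using Sitecore.Publishing;

namespace Stendahls.Sc.ScheduledPublishing
{
	// Hypothetical subclass, not part of the original solution
	public class SingleItemPublishAgent : PublishAgent
	{
		public SingleItemPublishAgent(string sourceDatabase, string sourceIndex, string targetDatabases)
			: base(sourceDatabase, sourceIndex, targetDatabases)
		{
		}

		protected override void PublishItemAsyncWithRelatedItems(Item item, List<string> publishingTargets)
		{
			if (item == null)
				return;

			var targetDb = Factory.GetDatabase(TargetDatabases.First());

			// Same constructor as in PublishAgent, but with a different mode
			// and without descendants or related items.
			var options = new PublishOptions(item.Database, targetDb, PublishMode.SingleItem, item.Language,
				DateTime.UtcNow, publishingTargets)
			{
				RootItem = item,
				Deep = false,
				PublishRelatedItems = false
			};
			new Publisher(options).PublishAsync();
		}
	}
}
```

To use it, the type attribute of the agent element in the config above would have to point at the subclass instead of PublishAgent. Unlike the original method, this sketch does not increment the performance counter.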

21 thoughts on “Yet another Sitecore scheduled publishing engine”

  1. First of all, thank you. Now I have a question: which DLL references should be used for the following?

    using Sitecore.ContentSearch;
    using Sitecore.ContentSearch.Linq;
    using Sitecore.Globalization;

  2. Sitecore.ContentSearch.dll, Sitecore.ContentSearch.Linq.dll and Sitecore.Kernel.dll if I remember right.

  3. When trying to build, I am getting these errors:

    Error CS1061 ‘PublishItem’ does not contain a definition for ‘LatestVersion’ and no extension method ‘LatestVersion’ accepting a first argument of type ‘PublishItem’ could be found (are you missing a using directive or an assembly reference?)
    Error CS1061 ‘PublishItem’ does not contain a definition for ‘ScheduledPublishDateTime’ and no extension method ‘ScheduledPublishDateTime’ accepting a first argument of type ‘PublishItem’ could be found (are you missing a using directive or an assembly reference?)
    Error CS1061 ‘PublishItem’ does not contain a definition for ‘FullPath’ and no extension method ‘FullPath’ accepting a first argument of type ‘PublishItem’ could be found (are you missing a using directive or an assembly reference?)
    Error CS1061 ‘PublishItem’ does not contain a definition for ‘LanguageName’ and no extension method ‘LanguageName’ accepting a first argument of type ‘PublishItem’ could be found (are you missing a using directive or an assembly reference?)
    Error CS1061 ‘PublishItem’ does not contain a definition for ‘LanguageName’ and no extension method ‘LanguageName’ accepting a first argument of type ‘PublishItem’ could be found (are you missing a using directive or an assembly reference?)
    Error CS1061 ‘PublishItem’ does not contain a definition for ‘ID’ and no extension method ‘ID’ accepting a first argument of type ‘PublishItem’ could be found (are you missing a using directive or an assembly reference?)

    I am running Sitecore 8.2. Is that because I am using the new Kernel.dll?

    • Hi,
      sorry about the blog post being a bit unclear about this. The solution requires three classes in order to work, and it looks like you’re missing one or two of them. The big code block at the end only covers the PublishAgent class. You also need the ScheduledPublishComputedField and PublishItem classes as described previously in the post.

      Please note also that 8.2 has a new optional publish engine that you may install separately. This code is not tested with 8.2 yet. I believe it will work if you’re still on the default/old publish engine, but I assume some changes are needed for it to work with the new stand-alone .Net Core based publishing service.

    • I’ve used it on 8.2 and 9.0, but I’m unsure if I had to change anything in the code. I’ll check and update the post later today.

        • I am running it with the publishing service on Sitecore 9, so yes, it works. I’m just not sure if it’s the latest version in that repo.

          • Hi,
            I ran this code, but publishing isn’t working for me. The job runs in the backend, but nothing is getting published to the web database. There are no errors in the log or the publishing log. Can you please share the latest repo and the Solr schema file if possible? I would like to confirm the dynamicField value for the datetime collection.

          • Hi,
            I ran this code. I had to change the mode to ‘Smart’, as Incremental can be used for single item publish. The job runs in the backend, but the item isn’t getting updated in the web database. Can you please share the latest repo if possible?

          • The field type in Sitecore is configured like this:

            <typeMatch type="System.DateTime[]" typeName="datetimeArray" fieldNameFormat="{0}_dtm" multiValued="true" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider"/>

            The field type, *_dtm, is configured like this in Solr:

            <dynamicField name="*_dtm" type="date" multiValued="true" indexed="true" stored="true"/>

            <fieldType name="date" class="solr.TrieDateField" positionIncrementGap="0" docValues="true" precisionStep="0"/>

          • I’ve reviewed the source code, and I’ve just made minor changes related to a specific customer. I’ll have to look deeper into why it’s not enqueuing the jobs. There are tons of issues in the Publishing Service too, so it might be something there as well.

  4. The computed field for scheduledpublish isn’t working. It always shows the min value (1/1/0001). Because of that, the search API always returns the item, even after the publish from and to times have elapsed.

    • Have you checked the field type of the computed field? What indexing engine are you using? I’ve found that the default config for datetime is sometimes wrong. Please note that the computed field is storing an array of datetimes – not a single datetime. A min-value (1/1/0001) is likely to be indexed among other datetime values in the array, but if it’s a single value, you might get the behavior you’re describing.

  5. Hi, we are using Solr. The field type of the computed field is DateTime (same as mentioned above). I can see that it is getting indexed in Solr as below:
    "scheduledpublish_tdtm": [
    "2018-06-04T12:17:00Z",
    "2018-06-04T12:30:00Z"
    ]
    But when querying, in debug mode, it always shows only 1/1/0001 as the value.

    • If you’re looking at the “PublishItem” query type used in the PerformPublish method, this is expected. This is because the query model uses a single DateTime and the field is stored as multiValue. The ScheduledPublishDateTime property is used only in the Solr “between” LINQ statement and cannot be read. That query is only expected to return a list of items that are in the window of publishing.

      • OK. But then the item should be fetched only once in the search results, right? Instead, it is getting picked up each time the job runs. Is there any reason for that?

        • Not really. The array of dates stored in the index contains all dates when the state of the item changes (publish from/to date, visible from/to). When a state changes, the item needs publishing. The agent runs periodically and finds all items that have any state change in the period from when the agent last ran until now, so this way it’ll get a list of all items that need to be published (or unpublished).

          • No, what I meant is that the item is getting picked up each time the job runs, even if the publish date is in the past or has ended. It should be picked up only in the publish period that was set, but it gets picked up even outside that window.
