SharePoint Search – Content Enrichment WebService
Recently we needed to manipulate the values of managed properties. In our specific case, the FileType managed property had to be empty for some content items.
To do so we used the Content Enrichment web service, which adds a custom step to content processing so that managed property values can be modified before they are indexed.
In this blog post we’ll walk you through the steps we took to configure the service, and also talk about some of its limitations and how we can overcome them.
Creating the Web Service
The web service is a basic WCF service that receives an array of managed properties and returns an array of managed properties customized according to our needs. In our scenario we just needed to clear the FileType managed property for some content items, so our output is an array containing only the FileType managed property with an empty value. Let’s begin:
- Create a WCF Service Application (e.g. SLEnrichmentService)
- Add a reference to the Microsoft.Office.Server.Search.ContentProcessingEnrichment.dll (located in Installation Path\Microsoft Office Servers\15.0\Search\Applications\External\)
- Modify the service to implement the IContentProcessingEnrichmentService contract interface as follows:
using System;
using System.Collections.Generic;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes;

public class SLEnrichmentService : IContentProcessingEnrichmentService
{
    private const string FileTypeManagedProperty = "FileType";
    private const int UnexpectedType = 1;
    private const int UnexpectedError = 2;

    // Reused for every call; reset at the start of ProcessItem.
    private readonly ProcessedItem processedItemHolder = new ProcessedItem
    {
        ItemProperties = new List<AbstractProperty>()
    };

    public ProcessedItem ProcessItem(Item item)
    {
        processedItemHolder.ErrorCode = 0;
        processedItemHolder.ItemProperties.Clear();
        try
        {
            var fileTypeProperty = item.ItemProperties.Find(x => x.Name.Equals(FileTypeManagedProperty, StringComparison.InvariantCultureIgnoreCase)) as Property<string>;
            if (fileTypeProperty == null)
            {
                // FileType is missing or is not a string property.
                processedItemHolder.ErrorCode = UnexpectedType;
                return processedItemHolder;
            }
            // Return FileType with an empty value.
            fileTypeProperty.Value = null;
            processedItemHolder.ItemProperties.Add(fileTypeProperty);
        }
        catch (Exception)
        {
            processedItemHolder.ErrorCode = UnexpectedError;
        }
        return processedItemHolder;
    }
}
- Publish your service
Configure SharePoint
Publishing the service is not enough. For the Search Service to call our service, we first need to register it. To do that we run a simple PowerShell script that states where the web service is located, which managed properties it expects, and which managed properties it returns. In this specific case we just need to clear the FileType property, so it is both our input and our output property.
Since the service only needs to be called for some types of content, the script also sets a trigger condition. In this scenario we have a custom managed property called SLExtChannel that holds the content type of each item, and we use it to decide whether the service should be called.
Here is our PowerShell script:
$ssa = Get-SPEnterpriseSearchServiceApplication
$config = New-SPEnterpriseSearchContentEnrichmentConfiguration
$config.Endpoint = "https://searchsql.sp2013cm.local/Service/SLEnrichmentService.svc"
$config.InputProperties = "FileType"
$config.OutputProperties = "FileType"
$config.SendRawData = $false
$config.Timeout = 30000
$config.Trigger = "!(IsNull(SLExtChannel)) AND ((SLExtChannel==""SLNewsContentType"") OR (SLExtChannel==""PurchaseOrder"") OR (SLExtChannel==""SLFAQContentType"") OR (SLExtChannel==""ProcessedReport"") OR (SLExtChannel==""2"") OR (SLExtChannel==""3"") OR (SLExtChannel==""4"") OR (SLExtChannel==""5"") OR (SLExtChannel==""7"") OR (SLExtChannel==""11"") OR (SLExtChannel==""12""))"
$config
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
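The trigger expression in the script is long and easy to mistype. As a rough sketch, it could instead be built from a list of SLExtChannel values. The variable names `$channels`, `$clauses` and `$trigger` are ours and purely illustrative; the values are the ones used in the trigger of this post.

```powershell
# Channel values taken from the trigger used in this post
$channels = @(
    "SLNewsContentType", "PurchaseOrder", "SLFAQContentType", "ProcessedReport",
    "2", "3", "4", "5", "7", "11", "12"
)

# One equality clause per value, e.g. (SLExtChannel=="PurchaseOrder")
$clauses = $channels | ForEach-Object { '(SLExtChannel=="{0}")' -f $_ }

# Same shape as the hand-written trigger: null check AND (clause OR clause ...)
$trigger = '!(IsNull(SLExtChannel)) AND (' + ($clauses -join ' OR ') + ')'

# Print the expression so it can be reviewed before it is registered
$trigger
```

The resulting string can then be assigned with `$config.Trigger = $trigger` before calling Set-SPEnterpriseSearchContentEnrichmentConfiguration.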
After running the script, all you need to do is run a crawl.
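To check the registration and kick off the crawl from the same PowerShell session, something like the following should work. It is only a sketch: it assumes a single Search Service Application and that every content source should be crawled again. We start a full crawl because items that are already indexed only go through the enrichment step when they are crawled again.

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Show the configuration currently registered for this Search Service Application
Get-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa

# Start a full crawl on every content source
# (assumption: all content sources should be re-crawled)
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    ForEach-Object { $_.StartFullCrawl() }

# To unregister the service later:
# Remove-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa
```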
Conclusions
Although this was a simple example, it shows how useful this web service can be. It can address issues such as missing or inconsistent metadata, clean up content, merge data from other services, and so on.
By using the trigger condition we can ensure that the service is called only when that condition is met, minimizing the impact on crawling and indexing.
Nonetheless there are still some drawbacks when it comes to using this service:
- The Content Enrichment web service is a synchronous callout in the content processing pipeline, so complex operations may increase crawl duration and hurt performance.
- We can have only one Content Enrichment service registered per Search Service Application, which means that all content sources share the same Content Enrichment service.
If we want specific callouts for each content source, we can still achieve that by registering a WCF Routing Service as our Content Enrichment web service and creating a routing table based on the content source performing the callout.
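On the first drawback, the configuration object also exposes Timeout and FailureMode settings that limit how much a slow or failing callout affects the crawl. The sketch below adjusts them on the existing registration. The values are illustrative assumptions, not recommendations.

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication
$config = Get-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa

# Illustrative values only
$config.Timeout = 5000           # milliseconds to wait for the web service
$config.FailureMode = "WARNING"  # index the item unmodified if the callout fails

Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
```

With FailureMode left at its default of "ERROR", an item whose callout fails is not indexed, so which mode fits depends on whether an unmodified item is acceptable in the index.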