Additional/Custom Properties in Elasticsearch

When the content index is first created, it contains a field mapping for many of the stock properties within Content Manager.

Field Mapping for the default ES index (as generated by Enterprise Studio)

When you create a custom property, you indicate whether the field should be present in the content index. This option is checked by default.

2017-12-08_14-05-44.png

Once you click Next and then click Finish, the custom property is available within Content Manager. However, it is not yet created in Elasticsearch. It will be added to the index mapping the first time a document with a value in this property is indexed. Once a value has been indexed, the field is listed in the index mapping.

2017-12-08_14-32-23.png
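
To confirm from outside the client whether a custom property has reached the index yet, the mapping can be read through Elasticsearch's `GET /<index>/_mapping` endpoint. The following is a minimal sketch, not something taken from Content Manager itself: the base URL, the index name, and the field name `Agency` are all assumptions for illustration, and the helper copes with both the older typed mapping layout and the newer typeless one.

```python
import json
import urllib.request


def flatten_properties(properties, prefix=""):
    """Return dotted field names found in a mapping 'properties' dict."""
    names = []
    for name, spec in properties.items():
        path = prefix + name
        names.append(path)
        # Object fields nest their children under another 'properties' key.
        nested = spec.get("properties") if isinstance(spec, dict) else None
        if nested:
            names.extend(flatten_properties(nested, path + "."))
    return names


def mapped_fields(mapping_response):
    """Collect field names from a GET /<index>/_mapping response.

    Handles the typed layout (mappings -> type -> properties) and the
    typeless layout (mappings -> properties).
    """
    fields = set()
    for index_body in mapping_response.values():
        mappings = index_body.get("mappings", {})
        if "properties" in mappings:
            fields.update(flatten_properties(mappings["properties"]))
            continue
        for type_body in mappings.values():
            if isinstance(type_body, dict):
                fields.update(flatten_properties(type_body.get("properties", {})))
    return sorted(fields)


def fetch_mapping(base_url, index):
    """Fetch the mapping over HTTP (requires a reachable cluster)."""
    with urllib.request.urlopen(f"{base_url}/{index}/_mapping") as resp:
        return json.load(resp)
```

With a cluster available, something like `"Agency" in mapped_fields(fetch_mapping("http://localhost:9200", "cm_index"))` would be expected to return False until the first record with a value in that property has been indexed, and True afterwards. Both the URL and the index name here are placeholders.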

As I add additional meta-data fields for each record, more meta-data is added to the Elasticsearch index.  Below is an example record after many fields have been added but there's no content in the electronic file.

Note that document content can be empty

The document in the example was one I downloaded from NARA and saved into CM.  Before they released it they scrubbed all meta-data from the file.  Since KeyView (the product used to extract text) found no text, nothing could be added to the content field for CM.  I corrected this by installing the Tesseract OCR Plugins and generating a new OCR rendition.   

2017-12-09_23-20-23.png

If I now delete a property from Content Manager entirely, it gives me a few warnings about data but nothing about the content index.

2017-12-09_23-29-34.png

Since I deleted the agency field, I wanted to test and see what is now contained within the index.  When I search on "KISS/SCOW" (without quotes), the value from the previous example, I get an error message.  It explains that the slash in the value doesn't parse to a valid query.

2017-12-10_14-23-25.png

If I surround the value with quotes then it parses correctly and shows me a result.  

2017-12-10_14-26-51.png
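
The behavior is consistent with Elasticsearch's `query_string` syntax, where a forward slash opens a regular expression, so a lone `/` leaves the expression unterminated. I am assuming here that the search text ends up in a `query_string` query; the source does not show the request that Content Manager actually sends. Under that assumption, a small helper that always sends the value as a quoted phrase might look like this:

```python
def as_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def query_string_body(value):
    """Build a search body that matches the value as a phrase.

    Assumes the target is a query_string query; the slash inside the
    quotes is treated as ordinary text rather than a regex delimiter.
    """
    return {"query": {"query_string": {"query": as_phrase(value)}}}
```

Calling `query_string_body("KISS/SCOW")` produces a body whose query text is the quoted phrase, which is the same thing I did by hand in the search box.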

When I check the record via Kibana I can see that the KISS/SCOW string exists in both the document content and the record meta-data...

2017-12-10_14-34-33.png

There are several options within the client with regard to reindexing. The first is an option on the administrative record context menu.

2017-12-11_13-37-53.png

Invoking this action prompts the administrator to choose which type of reindex should be performed. As you can see below, there is no mention of the document content. Submitting the reindex request results in an event being queued for each of the options selected.

2017-12-11_13-38-39.png

This made no changes to the Elasticsearch index for this record. However, if I use the Administration ribbon and perform a manual re-index of just one record, as shown below, then the value in the custom property is removed.

Custom Property Meta-data Removed from Elasticsearch

A quick peek at the index's mapping shows that the custom property still remains known to the index.  Without it we would lose sight of the fact that it remains on other records.  

Elasticsearch Head Information Window for CM Index
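
To see how many documents still carry a value in the deleted property, an `exists` query can be sent to the `_count` endpoint. This is a rough sketch under assumptions: the base URL, index name, and the field name as stored in the index are placeholders, since the source does not give them.

```python
import json
import urllib.request


def exists_query(field):
    """Build a query matching documents that have any value in 'field'."""
    return {"query": {"exists": {"field": field}}}


def count_with_field(base_url, index, field):
    """Return how many documents in the index still hold the field.

    Requires a reachable cluster; base_url and index are placeholders.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_count",
        data=json.dumps(exists_query(field)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["count"]
```

Running the count before and after a reindex gives a quick measure of how much of the dataset still holds the stale value.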

Now I need to decide what I want to do with the rest of the dataset. There are another ~6600 records which had data in this field. I could reindex the entire lot via the Administration ribbon, like I just did. I could also write a script to adjust Elasticsearch entirely outside the scope of CM. Lastly, I could table the issue until the next upgrade (which for most organizations is every 2-4 years).
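
For the scripted option, one possible approach is Elasticsearch's `_update_by_query` API with a small Painless script that drops the field from each matching document. This is only a sketch of that idea, not a procedure endorsed by Content Manager: the base URL, index name, and field name are hypothetical, it assumes the property is stored as a top-level field, and it uses the `source` script key (older 5.x releases used `inline` instead). It also leaves the field in the index mapping, because mapping entries cannot be removed without building a new index.

```python
import json
import urllib.request


def remove_field_body(field):
    """Build an _update_by_query body that removes a top-level field."""
    return {
        # Only touch documents that still have a value in the field.
        "query": {"exists": {"field": field}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }


def run_update_by_query(base_url, index, field):
    """POST the update to the cluster and return the parsed response.

    Requires a reachable cluster; base_url and index are placeholders.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_update_by_query",
        data=json.dumps(remove_field_body(field)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Anything like this should be tried against a copy of the index first, since it changes documents behind CM's back and CM will not know the index was altered.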