Additional/Custom Properties in Elasticsearch
When the content index is first created, it contains field mappings for many of Content Manager's stock properties.
When you create a custom property, you indicate whether the field should be included in the content index. This option is checked by default.
After you click Next and then Finish, the custom property is available within Content Manager. It does not yet exist in Elasticsearch, however. It is added to the index mapping only when a record with a value in this property is first indexed.
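To check whether a property has reached the index yet, you can inspect the response of `GET /<index>/_mapping` (for example from Kibana's Dev Tools). The sketch below is a minimal Python helper that walks such a response looking for a field. It assumes the Elasticsearch 7+ response shape, where `properties` sits directly under `mappings` (older versions nest it under a mapping type name). The index name `cm_content` and the field names `recTitle`, `location.name` and `agency` are hypothetical placeholders, not Content Manager's actual naming.

```python
import json


def field_in_mapping(mapping_response: dict, field_path: str) -> bool:
    """Return True if a dot-separated field path appears in any index's mapping.

    Assumes the Elasticsearch 7+ shape of GET /<index>/_mapping:
    {"<index>": {"mappings": {"properties": {...}}}}
    """
    for index_body in mapping_response.values():
        props = index_body.get("mappings", {}).get("properties", {})
        for part in field_path.split("."):
            node = props.get(part)
            if node is None:
                break  # this index does not know the field
            # Object fields keep their sub-fields under their own "properties".
            props = node.get("properties", {})
        else:
            return True  # every segment of the path was found
    return False


# Hypothetical mapping response; paste the real one from Kibana instead.
sample = json.loads(
    '{"cm_content": {"mappings": {"properties": {'
    '"recTitle": {"type": "text"},'
    '"location": {"properties": {"name": {"type": "keyword"}}}}}}}'
)
print(field_in_mapping(sample, "recTitle"))  # True
print(field_in_mapping(sample, "agency"))    # False
```

Until a record with a value in the new property is indexed, a check like this should come back False for that property.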
As I populate additional metadata fields on a record, more metadata is added to its Elasticsearch document. Below is an example record after many fields have been populated, but with no content extracted from the electronic file.
The document in the example is one I downloaded from NARA and saved into CM. NARA scrubbed all metadata from the file before releasing it. Because KeyView (the product used to extract text) found no text, nothing could be added to the record's content field in CM. I corrected this by installing the Tesseract OCR plugins and generating a new OCR rendition.
If I now delete the property from Content Manager entirely, CM warns me about the existing data but says nothing about the content index.
Having deleted the agency field, I wanted to see what the index now contains. When I search for KISS/SCOW (entered without quotes), the value from the previous example, I get an error message explaining that the slash in the value does not parse to a valid query. Elasticsearch's query string syntax treats a forward slash as the start of a regular expression, which is likely what trips up the parser.
If I surround the value with quotes, it parses correctly and returns a result.
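If the search text ends up in an Elasticsearch `query_string` query (an assumption inferred from the parse error, not something CM documents here), there are two ways to neutralize the slash: escape the reserved characters with backslashes, or wrap the value in quotes as a phrase. The minimal Python sketch below does both. The reserved set follows the Elasticsearch `query_string` documentation, which also notes that `<` and `>` cannot be escaped at all, so this helper simply drops them. The function names are hypothetical.

```python
# Characters the Elasticsearch query_string syntax treats as operators.
# '<' and '>' cannot be escaped (per the Elasticsearch docs), so they are removed.
_RESERVED = set('\\+-=&|!(){}[]^"~*?:/')


def escape_query_string(text: str) -> str:
    """Backslash-escape reserved characters so they are searched literally."""
    out = []
    for ch in text:
        if ch in "<>":
            continue  # no escape exists for these; drop them
        if ch in _RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def quote_phrase(text: str) -> str:
    """Wrap text in double quotes, escaping any backslashes or quotes inside it."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(escape_query_string("KISS/SCOW"))  # KISS\/SCOW
print(quote_phrase("KISS/SCOW"))         # "KISS/SCOW"

# Either form could be placed in a request body such as:
body = {"query": {"query_string": {"query": quote_phrase("KISS/SCOW")}}}
```

Quoting matches what worked in the client. Escaping keeps the value as individual terms rather than a phrase, which matters only when the value contains spaces.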
When I check the record in Kibana, I can see that the KISS/SCOW string still exists in both the document content and the record metadata.
The client offers several options for reindexing. The first is on the administrative record context menu.
Invoking this action prompts the administrator to choose which type of reindex to perform. As you can see below, there is no mention of the document content. Submitting the reindex request queues an event for each of the options selected.
This made no change to the record's entry in the Elasticsearch index. However, if I use the Administration ribbon to perform a manual reindex of just this one record, as shown below, the value of the deleted custom property is removed.
A quick look at the index mapping shows that the custom property is still known to the index. That is useful: without this mapping entry, we would lose sight of the fact that the value remains on other records.
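The mapping only tells you that the field exists somewhere. To see how many records still carry a value, an `exists` query sent to the `_count` endpoint answers that directly. The minimal Python sketch below builds (but does not send) that request using only the standard library. The cluster URL, the index name `cm_content`, and the field name `agency` are hypothetical. The name CM actually uses for the property inside the index may differ, so check the mapping first.

```python
import json
import urllib.request


def build_exists_count_request(base_url: str, index: str, field: str) -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_count, counting documents
    that still have a value in `field`."""
    body = {"query": {"exists": {"field": field}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster URL, index name and field name; substitute your own.
req = build_exists_count_request("http://localhost:9200", "cm_content", "agency")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/cm_content/_count

# To run it against a real cluster (authentication not shown):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["count"])
```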
Now I need to decide what to do with the rest of the dataset. Roughly 6,600 other records had data in this field. I could reindex them all via the Administration ribbon, as I just did for one record. I could write a script that adjusts Elasticsearch entirely outside of CM. Or I could table the issue until the next upgrade (which, for most organizations, comes every 2-4 years).
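For the script-outside-CM option, one approach is Elasticsearch's `_update_by_query` API with a Painless script that removes the key from each document's `_source`. The minimal Python sketch below builds that request without sending it. It limits the update to documents that still have the field and sets `conflicts=proceed` so that version conflicts are counted rather than aborting the run. This is an illustration, not a CM-sanctioned procedure. The cluster URL, index name `cm_content`, and field name `agency` are hypothetical, and only a top-level field is handled.

```python
import json
import urllib.request


def build_remove_field_request(base_url: str, index: str, field: str) -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_update_by_query that strips a
    top-level field from every document that still has a value in it."""
    body = {
        # Only touch documents that still carry a value in the field.
        "query": {"exists": {"field": field}},
        "script": {
            "lang": "painless",
            # Remove the key from the stored _source; dotted/nested paths are not handled.
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }
    return urllib.request.Request(
        # conflicts=proceed: count version conflicts and carry on instead of aborting.
        url=f"{base_url.rstrip('/')}/{index}/_update_by_query?conflicts=proceed",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster URL, index name and field name; substitute your own.
req = build_remove_field_request("http://localhost:9200", "cm_content", "agency")
print(req.get_method(), req.full_url)

# To run it against a real cluster (authentication not shown):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Even after this runs, the field stays in the index mapping. Elasticsearch does not allow a field to be dropped from an existing mapping short of reindexing into a new index.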