Monitoring CM with ELK

In this post I'll show how to use the ELK components to build an effective, free way to monitor a Content Manager implementation of any size.  By the end of this post we'll have a rudimentary dashboard like the one shown below.  This dashboard highlights errors as reported in the Windows logs (which include CM).

Kibana dashboard using winlogbeats and filebeats as a source

To make all this magic work I'll need to place beats in various locations.  On all of my servers I'll place both a filebeat and a winlogbeat.  I'll use winlogbeats to monitor Windows instrumentation, which will include Content Manager entries in the system logs.  I'll also use filebeats to monitor the Content Manager specific log files and the audit logs.

To start I create one server with both installed and configured.  I'll zip this entire structure and copy it to each server in my environment.  Then I'll register the two beats without modifying the configuration.
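The packaging step can be scripted.  Here's a minimal sketch, assuming the configured beats live together under one folder (the `C:\beats` and `C:\temp\beats-package` paths in the comment are hypothetical).  It only builds the zip; copying it to each server and registering the beats is left to whatever tooling you already use.

```python
import shutil
from pathlib import Path


def package_beats(beats_root, archive_base):
    """Zip the whole configured beats folder so it can be copied to other servers.

    beats_root   -- folder holding the configured winlogbeat and filebeat directories
    archive_base -- output path without the .zip extension
    Returns the path of the archive that was written.
    """
    root = Path(beats_root)
    if not root.is_dir():
        raise FileNotFoundError(f"beats folder not found: {root}")
    # make_archive appends ".zip" and stores paths relative to root_dir
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(root))


# Hypothetical usage; adjust the paths to your own layout:
#   package_beats(r"C:\beats", r"C:\temp\beats-package")
```

Because the configuration travels inside the archive, every server ends up with the same settings, which is what lets me register the beats without editing anything.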

2017-12-13_21-25-55.png

For now I'm going to focus on the winlogbeats.  I modified the configuration file so that it includes a tag I can use as a filter in saved searches, as well as the addresses of my Kibana and Elasticsearch hosts.

You'd use proper server host addresses
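For anyone who can't read the screenshot, here's a rough sketch of the kind of content involved, written as a small Python helper that renders the file.  The setting names (`tags`, `setup.kibana`, `output.elasticsearch`, `winlogbeat.event_logs`) are the ones I'd expect in a Winlogbeat 6.x configuration.  The tag value, the host names and the choice of event logs are placeholders, not the values from my environment.

```python
def render_winlogbeat_config(tag, elasticsearch_hosts, kibana_host):
    """Return winlogbeat.yml text with a tag plus Elasticsearch and Kibana hosts."""
    es_hosts = ", ".join(f'"{h}"' for h in elasticsearch_hosts)
    lines = [
        "# Event logs to ship (an assumed selection)",
        "winlogbeat.event_logs:",
        "  - name: Application",
        "  - name: System",
        "",
        "# Tag added to every event; used later as a filter in saved searches",
        f'tags: ["{tag}"]',
        "",
        "setup.kibana:",
        f'  host: "{kibana_host}"',
        "",
        "output.elasticsearch:",
        f"  hosts: [{es_hosts}]",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values; you'd use proper server host addresses
config_text = render_winlogbeat_config("cm", ["localhost:9200"], "localhost:5601")
```

Tagging development servers with a different value than production would then only mean changing the first argument.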

With this saved on three of my servers I switch over to Kibana.  Here I'll start configuring my dashboards.  A dashboard is composed of several visuals, and each visual is populated with content from Elasticsearch based on the query behind it.  So to start I create several saved searches.

Here's a saved search to give me just errors from any of the servers.  I'll name it "Windows Log Error".

2017-12-13_21-47-34.png

Here's a saved search I'll name "CM Windows Log Errors".  These results come from winlogbeats on all of the CM servers where I've installed the winlogbeats agent (but not from several other servers that are outside of CM).

2017-12-13_21-43-25.png
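The two saved searches boil down to short query strings.  As a sketch, assuming the Winlogbeat 6.x field name `level` and a hypothetical tag value of `cm` (use whatever tag you put in your configuration), they could be expressed like this, along with the Elasticsearch request body that such a query string turns into:

```python
def windows_error_query():
    """Query string for errors from any server (the "Windows Log Error" search)."""
    # "level" is the assumed Winlogbeat 6.x field holding the event severity
    return "level:Error"


def cm_windows_error_query(tag="cm"):
    """Query string for errors limited to servers carrying the CM tag."""
    return f'tags:"{tag}" AND level:Error'


def as_search_body(query_string):
    """Wrap a query string in an Elasticsearch query_string search body."""
    return {"query": {"query_string": {"query": query_string}}}


all_errors = as_search_body(windows_error_query())
cm_errors = as_search_body(cm_windows_error_query())
```

In Kibana itself only the query string goes into the search bar; the request body is what ends up being sent to Elasticsearch.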

Next I'll create one visual that tells me how many errors I've got overall (the first saved search).

2017-12-14_9-37-03.png

I then pick my saved search...

2017-12-13_21-57-44.png

Then configure it to split out by computer name.

2017-12-13_21-58-35.png
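Splitting by computer name is, underneath, a terms aggregation.  Here's a sketch of roughly what the visual asks Elasticsearch for, assuming the Winlogbeat 6.x field names `level` and `computer_name`; the aggregation name `split` and the bucket limit are arbitrary choices for the example.

```python
import json


def error_count_by_field(field="computer_name", max_buckets=10):
    """Build a search body that counts error events per value of `field`."""
    return {
        "size": 0,  # only the bucket counts are wanted, not the documents
        "query": {"query_string": {"query": "level:Error"}},
        "aggs": {
            "split": {"terms": {"field": field, "size": max_buckets}},
        },
    }


body = error_count_by_field()
body_json = json.dumps(body, indent=2)
```

Calling it with `field="source_name"` (again an assumed field name) gives the same kind of breakdown by error source instead of by server.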

Then I click save and name it error counts.  Next I create a data table for the error sources, as shown below.

2017-12-13_22-01-44.png

Then a pie chart based on all of the audit events....

2017-12-13_22-02-32.png

Next I created a new dashboard and placed the three visuals onto it.  Then I added one saved search, the CM specific Windows error logs.  It displays much like a data table, but I thought I'd show it.

2017-12-13_22-04-12.png

The last step is to save the dashboard.  By checking the box the user can change the dashboard content by tweaking the timeline (visible on almost all pages of Kibana).

2017-12-13_21-42-01.png

Now I can access this dashboard from a list of them.  Keep in mind that you can tag your development servers differently from production, so that might be a different dashboard.  You can also share them.

2017-12-13_22-05-17.png

That gives me this final dashboard shown below.  Note in the image the top-right corner where I've changed the time frame to the previous 1 year.  Then note how I added a filter to exclude one server in particular.  It's amazing what can be built.

2017-12-13_22-12-38.png
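The time frame and the exclusion filter both end up as clauses in the query Kibana sends.  A sketch of the equivalent Elasticsearch body, assuming the `@timestamp` and `computer_name` fields and a made-up server name:

```python
def dashboard_filter(excluded_server, time_from="now-1y", time_to="now"):
    """Build a query limited to a time range that leaves out one server."""
    return {
        "query": {
            "bool": {
                # date math: "now-1y" means one year before the current time
                "filter": [
                    {"range": {"@timestamp": {"gte": time_from, "lte": time_to}}}
                ],
                # drop every event reported by the excluded server
                "must_not": [{"term": {"computer_name": excluded_server}}],
            }
        }
    }


# "TESTSERVER01" is a hypothetical name, not one of my servers
last_year_without_test = dashboard_filter("TESTSERVER01")
```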