Analyzing Exchange Logs with Azure Log Analytics (Part 3)

Published on 8 Sept. 2016 / Last Updated on 8 Sept. 2016

In this part of this article series, we will see if we can collect data from other sources such as Exchange Message Tracking Logs, and look at searching and analyzing the data we have been collecting.

If you would like to read the other parts in this article series please go to:

Data Sources

So far we have looked at some data sources, such as Windows Event Logs, Performance Logs and Internet Information Services (IIS) logs. Another data source is Custom Logs, which allows us to collect events from log files on both Windows and Linux computers beyond the default sources. Once collected, we can parse each record in the log into individual fields using Custom Fields.

The log files to be collected must either have a single entry per line or use a timestamp matching one of the following formats at the start of each entry:

Image

Additionally, the log file must not allow circular updates where the file is overwritten with new entries.

When we first select Custom Logs we are informed that it is still in beta and that we need to enable it manually:

Image

To do this, we simply go to Preview Features and enable the Custom Logs feature:

Image

Once this is done, we can start adding our custom logs. Click Add+ to open the Custom Log Wizard:

Image

We start by uploading a sample of the custom log. In this case I am uploading a Message Tracking Log file:

Image

The wizard parses the sample file and displays the records that it identifies in this file for us to validate:

Image

Log Analytics will then use the delimiter that we specify to identify each record. New Line is the default delimiter and is used for log files that have a single entry per line. If the line starts with a date and time in one of the available formats, then we can specify a Timestamp delimiter, which supports entries that span more than one line. However, Exchange Message Tracking Logs start with “2016-08-05T14:01:12.107Z”, which is not yet a supported format, so we have to use New Line:

Image
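Since the Exchange timestamp format is not supported, one workaround is to pre-process copies of the log files so that each entry starts with a format Log Analytics does accept. A minimal Python sketch of the idea (the target format YYYY-MM-DD HH:MM:SS is an assumption on my part; check it against the supported-formats list shown above):

```python
from datetime import datetime

def convert_timestamp(line):
    """Rewrite the leading ISO-8601 timestamp of a message tracking
    entry (e.g. 2016-08-05T14:01:12.107Z) into YYYY-MM-DD HH:MM:SS."""
    stamp, sep, rest = line.partition(",")
    try:
        parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return line  # header/comment lines are left untouched
    return parsed.strftime("%Y-%m-%d %H:%M:%S") + sep + rest
```

Run over each line of a copy of the log before Log Analytics picks it up; with the rewritten timestamps, the Timestamp delimiter becomes usable.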

Two things to keep in mind here:

  1. If a timestamp delimiter is used, then the TimeGenerated property of each record stored in OMS will be populated with the date/time specified for that entry in the log file. If a new line delimiter is used, then TimeGenerated is populated with the date and time that Log Analytics collected the entry;
  2. Log Analytics currently treats the date/time collected from a log using a timestamp delimiter as UTC. This will soon be changed to use the time zone on the agent.

We now need to define one or more paths on the agent where it can locate the custom log. We can either provide a specific path and name for the log file, or we can specify a path with a wildcard for the name:

Image

The following table provides examples of valid patterns to specify different log files:

Description

Path

All files in C:\Logs with .txt extension on Windows agent

C:\Logs\*.txt

All files in C:\Logs with a name starting with log and a .txt extension on Windows agent

C:\Logs\log*.txt

All files in /var/log/audit with .txt extension on Linux agent

/var/log/audit/*.txt

All files in /var/log/audit with a name starting with log and a .txt extension on Linux agent

/var/log/audit/log*.txt

Since Exchange generates multiple log files, sometimes more than one per day, we need to use a wildcard. We start by selecting Windows, type “C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Logs\MessageTracking\*.log” as the path for our logs, and click the + button.
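The wildcard behaves like ordinary file-name globbing. As a quick illustration (the file names below are invented, merely resembling Exchange tracking log names), Python's fnmatch mimics the *.log pattern:

```python
import fnmatch

# Invented file names resembling Exchange message tracking logs
names = ["MSGTRK2016080501-1.LOG", "MSGTRK2016080601-1.LOG", "notes.txt"]

# Case-insensitive match, as file names are on a Windows agent
matches = [n for n in names if fnmatch.fnmatch(n.lower(), "*.log")]
```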

Finally, we provide a name and description for the log. The name we specify will be used for the log type (as we will see when we start analyzing our collected logs) and will always end with _CL to distinguish it as a custom log:

Image

We now have our first custom log created:

Image

It may take up to one hour for the initial data from a new custom log to appear in Log Analytics. Once it starts collecting data from the custom log, its records will be available with a Log Search.

The entire log entry will be stored in a single property called RawData, which doesn’t make it that easy to read or analyze:

Image

Still, we are able to track messages. For example, we can search for emails with a particular subject by looking for the subject as a string in the RawData field, such as by running the following query:

Image

Or checking for an EventID of DELIVER:

Image

However, if we are not familiar with the format of Message Tracking Logs, it is hard to determine who the sender is, who the recipients are, and so on.
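Each RawData entry is just a comma-separated record whose columns are named in the log's #Fields: header. A hedged Python sketch of how the raw line maps to named fields (the field list below is abbreviated and copied by hand; a real script should read the header from the file itself, which contains further columns):

```python
import csv
import io

# Abbreviated field list from the #Fields: header of a message
# tracking log; the real header contains additional columns.
FIELDS = ["date-time", "client-ip", "client-hostname", "server-ip",
          "server-hostname", "source-context", "connector-id", "source",
          "event-id", "internal-message-id", "message-id",
          "network-message-id", "recipient-address", "recipient-status",
          "total-bytes", "recipient-count", "related-recipient-address",
          "reference", "message-subject", "sender-address"]

def parse_raw_data(raw):
    """Split a RawData string into a dict keyed by field name."""
    values = next(csv.reader(io.StringIO(raw)))
    return dict(zip(FIELDS, values))
```

This is essentially what we want Log Analytics itself to do for us, which is where Custom Fields come in.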

What we can do to improve this is parse this information into multiple properties that can be analyzed and searched separately by defining Custom Fields. To do this, we need to perform a search on our custom logs, and searching is covered in detail later in this article. So, we will look at custom fields now and searching logs afterwards, so we can finish exploring custom logs.

Once information from our Message Tracking Logs starts to get collected, all the information we need is in the RawData property as we have already seen. To parse it, we start by clicking the ... on the left of the RawData field, and then select Extract fields from ‘MessageTrackingLogs_CL’:

Image

The Field Extraction Wizard opens and the record we selected is displayed in the Main Example column. The custom field will be defined for those records with the same values in the properties that are selected. We highlight the text that we want to populate the custom field, provide a name for the field (the characters _CF will automatically be appended) and perform the initial extract:

Image

The Summary and Search Results sections display the results of the extract so we can inspect its accuracy. Summary displays the criteria used to identify records and a count for each of the data values identified. Search Results provides a detailed list of records matching the criteria:

Image

When we create a custom field, Log Analytics must understand which data to use to populate its value. It uses a technology from Microsoft Research called FlashExtract to quickly identify this data. Rather than requiring us to provide explicit instructions, Log Analytics learns about the data we want to extract from examples that we provide. As we just saw with the TimeStamp field, this works great and Log Analytics is able to extract it from all the log entries. Unfortunately, this does not happen with all the other fields, most likely because the information in these logs is not consistent across all entries. For example, if we try to parse the EventID field:

Image

It does not match all the types of EventIDs in the message tracking logs:

Image

This means we cannot accurately use the custom field Event_ID_CF to perform searches as it will only contain the EventIDs listed above.

Due to the nature of these logs, we will have to perform these manual mappings for a lot of fields, and yet it will not be 100% useful... However, this will not be the case with all types of logs, and Custom Logs and Custom Fields work great in other cases!

Let’s look at another example. In this particular environment, I have a script running every day that updates a CSV file with the size and number of users of every database:

Image

Using Custom Logs we can start monitoring this log and import this data into Log Analytics. As we have seen before, I strongly recommend using one of the valid date and time formats, otherwise all the entries will have a timestamp of when they were collected. This is fine if we are starting to collect new data from this moment on, but not useful if we have a large amount of data to import, as all the entries will have the same timestamp...
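Purely to illustrate the timestamp point, here is a hedged Python sketch of what such a script might write (the file layout and column order are assumptions; the actual script in this environment is not shown). Each row is stamped with a date/time format Log Analytics can recognise:

```python
import csv
from datetime import datetime

def append_db_stats(path, database, size_gb, user_count):
    """Append one row to the database statistics log, stamped with a
    timestamp format Log Analytics can parse (YYYY-MM-DD HH:MM:SS)."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([stamp, database, size_gb, user_count])
```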

As before, we create our Custom Log and point Log Analytics to its location on our server:

Image

After a while we can see information coming in:

Image

Notice the TimeGenerated vs the time the entry was logged. This is what we want to avoid whenever possible! So let’s change the date format in our log so that all entries are properly collected with the right TimeGenerated information:

Image

Next we create custom fields to parse the information gathered and make it easier to search:

Image

Once this is done, we have all our fields properly populated:

Image

Now that we have all fields properly parsed, we can easily produce graphs to monitor and estimate the increase in database size or number of users for example!

Searching and Analyzing Logs

The log search feature of Log Analytics is such that I could spend a few articles just writing about all the great things it can do. It is that powerful! But let’s try to keep it short and sweet.

OMS Log Analytics Search is available both in the Azure Portal and in the Operations Management Suite website. Both allow us to perform the same level of searches, so let us continue with the OMS portal. To start, we either click on the magnifying glass on the left-hand side or use the dashboard’s Log Search button:

Image

Once in the search window, we are given a few examples to get us started:

Image

Let’s select Computers with available memory more than 2GB to see what it returns:

Image

On the left hand side pane, we can choose if we want data from the past 6 hours, 1 day, 7 days or a custom range, and select which servers to analyze. On the right pane, at the top we have the search it performed:

Type=Perf ObjectName=Memory CounterName="Available MBytes" | measure avg(CounterValue) by Computer interval 30MINUTE | where AggregatedValue>2000

This gives us an idea of how search queries work. In the middle we have a graphic representation of this data. Using the search query, we can easily update it to an interval of 5 minutes and return computers with over 1GB of available memory instead, which returns one additional computer:

Image

We can also click on Table to look at the same data in a table format instead of a chart:

Image

But let’s go back a bit. The first part of a search query (everything before any | vertical pipe) is a filter; it determines what subset of data to pull out of the OMS data store. The most basic filters we can use are keywords, such as “error”, “timeout”, or a computer name. These simple queries generally return diverse shapes of data within the same result set, because Log Analytics holds different types of data in the system.

For example, the query for error in the following screenshot returned 24,000 Event records (collected by Log Management), 2,000 MessageTrackingLogs_CL records (generated by our custom log), and five ConfigurationChange records (captured by Change Tracking). We can also see an increase in this type of event over time:

Image

These types are not really object types or classes. Type is just a tag, a property, a string/name/category attached to a piece of data. Some documents in the system are tagged as Type:ConfigurationAlert and some are tagged as Type:Perf, or Type:Event, and so on. Each search result, document, record, or entry displays all the raw properties and their values, and we can use those field names in the filter when we want to retrieve only the records where a field has a given value.

We can use either a colon (:) or an equal sign (=) after the field name and before the value. This means that Type:Event and Type=Event are the same, we can choose the style we prefer.

So, if the Type=Perf records have a field called “CounterName”, then we can write a query such as Type=Perf CounterName="% Processor Time". This will give us only the performance data where the performance counter name is “% Processor Time”:

Image
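Conceptually, such a field filter just keeps the records whose tagged properties match every given field=value pair. A toy Python illustration (the sample records are invented):

```python
def filter_records(records, **criteria):
    """Keep only records matching every field=value criterion,
    mimicking a filter such as: Type=Perf CounterName="% Processor Time"."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

# Invented sample of mixed record types, as stored by Log Analytics
records = [
    {"Type": "Perf", "CounterName": "% Processor Time", "CounterValue": 12.5},
    {"Type": "Perf", "CounterName": "Available MBytes", "CounterValue": 2048},
    {"Type": "Event", "EventID": 1008},
]

perf = filter_records(records, Type="Perf", CounterName="% Processor Time")
```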

The problem here is that we are shown many graphs for the same computer. Why is that? This is happening because we are collecting this type of data from dozens of different instances:

Image

So, we can be more specific and use InstanceName=_Total in the query, or simply click on _Total on the left hand side and it is automatically added to our query:

Image

OK, we still have more than one graph for the same computer! Hmm... This time the problem is that we are retrieving information from CounterName="% Processor Time" but there are three objects with this counter:

Image

We are just interested in the Processor one so we select it and, again, it gets added to our query:

Image

Now we finally have the information we want! We can expand and collapse each graph, look at the data in a list, table or graph format, and so on.

A simpler query to obtain the same data would simply be Type=Perf (ObjectName=Processor). :)

The CPU performance of this server is critical, so this is a very important graph to me. As such, I am going to save this query so I can quickly come back to it and even add it to a customized dashboard that we will later create. To do this, I filter the graph so it only shows data for this server and then click on the Save icon at the top of the window:

Image

I give it a name and a category, and click Save:

Image

Now, if we click on Favorites icon at the top:

Image

We get a list of pre-defined queries as well as the one we just saved:

Image

We can use this to quickly come back to our search and to place this graph in our dashboard. But first let us look at some more search examples.

We can also use range queries in our searches. This means we can provide the beginning and end range of values in a sequence. For example, if we want events from the Application event log where the EventID is greater than or equal to 1008 but not greater than 1009 (these are Exchange FastSearch events), then the following query would return them:

Type=Event EventLog="Application" EventID:[1008..1009]

Image

On the left we can see that 23,000 are warning events and another 23,000 are information events.

This search presents us with another method of visualizing data: minify. This method aggregates the data so that we can see how many events happened for each object. For example, using minify we can see that the error “the indexing has stopped successfully for mailbox database” happened 690 times for database MDB01:

Image

MEASURE is another useful command in Log Analytics searches. It allows us to apply statistical functions to our data and aggregate results grouped by a given field. There are multiple statistical functions that Measure supports, one of them being count(). Using this function we can, for example, see exactly how many error events there are per computer in the last 7 days by using the query:

Type=Event EventLevelName=error | Measure count() by Computer

Image

Or by EventID for example:

Image

By simply clicking on 10006 we can easily retrieve all these events and see what the problem is:

Image
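What Measure count() by Computer does above amounts to a plain group-and-count. In Python terms (the sample events are invented):

```python
from collections import Counter

# Invented error events, each tagged with the computer that raised it
events = [
    {"Computer": "EXCH01", "EventLevelName": "error", "EventID": 10006},
    {"Computer": "EXCH01", "EventLevelName": "error", "EventID": 10006},
    {"Computer": "EXCH02", "EventLevelName": "error", "EventID": 2142},
]

# Equivalent of: Type=Event EventLevelName=error | Measure count() by Computer
per_computer = Counter(e["Computer"] for e in events
                       if e["EventLevelName"] == "error")
```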

In the following example, we select the CPU Total Time performance counter and average by Computer. We then narrow down the results to only the last 6 hours:

Type=Perf ObjectName:Processor InstanceName:_Total CounterName:"% Processor Time" TimeGenerated>NOW-6HOURS | Measure Avg(CounterValue) by Computer

Image

If we want to look at any of the Exchange logs we are capturing, such as Microsoft-Exchange-ManagedAvailability/Monitoring, we can quickly search them by using the following query for example:

Type=Event Source="Microsoft-Exchange-ManagedAvailability"

Image

If we are not sure of the event log name, we can type EventLog= and Log Analytics will display all the log names we are currently capturing. All we have to do is click on the one we want to analyze:

Image

Image

Once more, these are just the very basics of the Log Analytics Search feature. Trust me when I say we have not even scratched the surface! But this gives us a good start to searching and analyzing all the information being collected by Log Analytics.

Conclusion

In this part of this article series, we saw how to collect data from other sources such as Exchange Message Tracking Logs, and looked at searching and analyzing the data we have been collecting so far.

In the next and final part, we will look at dashboards, alerts and solutions.



The Author — Nuno Mota


Nuno is an Exchange MVP working as a Senior Microsoft Messaging Consultant for a UK IT Services Provider in London. He specializes in Exchange, Lync, Active Directory and PowerShell.