Creating Datadog Alerts Based on Log Metrics

A tutorial on setting up Datadog monitoring processes

Octobot
6 min read · Jun 9, 2022

As a cloud developer at Octobot, I am dedicated to providing support to the different development teams when they need it. This article was born from one of those experiences, when I joined a project with VES-Artex and started using the monitoring tool Datadog.

Why Datadog

The platform we developed and continue to work on for VES-Artex is the DairyBOS app, used to manage environmental conditions at dairy barns, always with a strong focus on animal welfare. DairyBOS also provides reports that help dairy farmers make better, more insightful decisions. You can learn more about this project here.

In this project, we used Kibana on Elasticsearch to monitor system health and logs. Some time ago, as part of a larger consolidation effort on the client’s side, we started migrating our log collection and metrics to Datadog, where all operations can be centrally overseen and monitored.

We came across a couple of problems that required extracting metrics from log messages and alerting on them. In the following paragraphs, I’ll share how we overcame these challenges in order to successfully implement Datadog.

Datadog alerts based on log messages

Let’s get started by identifying which messages are relevant to our problem. In our case, the filtering was done like this:
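As a rough sketch, a log search of this kind in Datadog typically combines a service facet with a message pattern. The service name and message text below are illustrative assumptions, not the project’s actual query:

```
service:dairybos "Request completed"
```

Whatever matches this query in the Log Explorer is what the pipeline in the next section will process.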

And here is an example of one of those logs:
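The real log line is not reproduced here, so the one below is an illustrative stand-in with the same shape: an HTTP method, a URL, a status code, and the response time in milliseconds (81) at the end:

```
Request completed: GET /api/devices/ 200 81
```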

In my case, I wanted to extract the response time of the request in milliseconds, which appears as the number 81 at the end of the line. This way, I can send an alert every time it exceeds a given threshold; let’s go with 5 seconds for this example.

Pipelines and processors:

Once you have the log query, go to Logs > Configuration. There, create a new pipeline: paste the same query from the previous step into the filter box and give the pipeline a name. You can add tags or a description as desired.

Now, under your new pipeline, create a new processor. If you want to parse logs based on patterns, choose a Grok Parser processor. Grok Parser processors let you enter three log samples and three parsing patterns, also called rules. I recommend editing those rules so that one or two of them match every log you want to consider.

For example:
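As a sketch of what this looks like, here are three hypothetical log samples (following the stand-in format above) and a single Grok rule that matches all of them. The rule name and attribute names are assumptions; word, notSpace and integer are standard Datadog Grok matchers:

```
Log samples:
Request completed: GET /api/devices/ 200 81
Request completed: GET /api/reports/daily/ 200 4250
Request completed: POST /api/devices/12/settings/ 201 97

Parsing rule:
request_completed_rule Request completed: %{word:http.method} %{notSpace:http.url} %{integer:http.status_code} %{integer:duration_ms}
```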


At the top you can see three log samples that exactly match my rule.

It’s important to define the names of the attributes in your rule, because you are going to use them in the metrics section later on. The syntax to define an attribute is %{matcher:attribute_name}, as you can see in the rules box. Below the rules you can see how the attributes defined in your rule are extracted.

From now on, every new log that matches your rule will carry every attribute defined in it.
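With the hypothetical rule above, a matching log event would carry extracted attributes roughly like these (Datadog nests dotted attribute names):

```
{
  "http": {
    "method": "GET",
    "url": "/api/devices/",
    "status_code": 200
  },
  "duration_ms": 81
}
```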

Note: This rule is not going to apply to older logs, only the new ones.

The final step before moving on to metrics is to create a measure based on your attribute. To do so, simply click the gear icon next to your attribute and select Create Measure.

Metrics

You are almost there! Once your measure is created, go to Logs > Configuration > Generate Metrics. Create a new metric and, in the query field, use the same query as before but add your created measure. You are still filtering the same log query, but now the result is your attribute.
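As a sketch, reusing the hypothetical names from the earlier examples (the metric name is also an assumption), the generated metric definition would look roughly like this:

```
Define query:  service:dairybos "Request completed"
Measure:       @duration_ms
Metric name:   dairybos.request.duration_ms
Group by:      @http.url
```

Grouping by @http.url is optional here; it tags the metric with the URL so it can later be broken down per endpoint.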

Choose a name for your metric and check it in the Metrics Explorer to confirm it has synced alongside your other metrics.
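With the assumed metric name from above, a Metrics Explorer query for it would look roughly like:

```
avg:dairybos.request.duration_ms{*} by {http.url}
```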

Monitor

Now you are ready to create your alert. Go to Monitors > New Monitor. You can choose Metrics if you are only interested in the result of your measure, or Logs if you also want to include a sample of logs in your alert. In my case, I chose the second option.

As you can see, just selecting your measure makes the query bring up the result automatically without you having to write it, but I suggest writing it out anyway for reference. You can also group your results with the avg by option if you want to register multiple alerts, depending on your needs.

In my case, I grouped them by http.url (the URL attribute defined in my processor rule), which lets me set up a multi alert that triggers a separate alert for each URL.
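Putting it all together, and still using the hypothetical names from earlier, the log-based monitor would be configured roughly like this (the 5-second threshold expressed as 5000 ms; the evaluation window is an arbitrary choice):

```
Search query:     service:dairybos "Request completed"
Measure:          avg of @duration_ms
Group by:         @http.url   (multi alert: one alert per URL)
Alert threshold:  > 5000      (evaluated over, e.g., the last 5 minutes)
```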

You can also trigger a simple alert for any URL if you prefer. After setting your trigger conditions and notification settings, you are good to go. Here is an example of how an alert looks when it triggers in my Slack channel:

Keep learning

This is how we solved the need our client presented; I hope you found it helpful. As we adapt to changes and look for innovative solutions to the situations we face in our projects, we get the opportunity to learn new technologies and tools, and to expand our knowledge as a team.

If you want to continue learning about this tool, I recommend these links:

Originally published at octobot.io on May 13, 2022.
