Lint Optimizations

Lint optimizations scan logs for common hygiene issues, such as sensitive data (e.g. API tokens) and redundant data (e.g. a timestamp repeated in the message body).

Optimization Triggers

Nimbus can automatically optimize logs when it detects the following situations:

  1. Logs with a timestamp in the message body

  2. Common kinds of secrets (AWS tokens, GitHub and GitLab tokens, etc.)
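As a rough illustration of what these two triggers look for, the Python sketch below flags a leading timestamp and a few well-known token formats. The patterns and the `lint_findings` helper are assumptions made for this example; they are not Nimbus's actual detection rules.

```python
import re

# Illustrative patterns only (assumed for this sketch, not Nimbus's rules).
TIMESTAMP_PREFIX = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "gitlab_token": re.compile(r"\bglpat-[A-Za-z0-9_-]{20}"),
}


def lint_findings(message: str) -> list:
    """Return the names of the hygiene issues found in a log message."""
    findings = []
    if TIMESTAMP_PREFIX.match(message):
        findings.append("timestamp_in_body")
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(message):
            findings.append(name)
    return findings
```

Calling `lint_findings` on a message that starts with `2024/01/23 01:33:12 ` reports `timestamp_in_body`; a message with no timestamp prefix and no token-like strings returns an empty list.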

Example

Take the following log:

message: '2024/01/23 01:33:12 {"method": "process_checkout", "retry_count": 3}'
timestamp: 2024/01/23 01:33:16
service: checkout
...

There are two issues:

  • the timestamp is emitted in front of the JSON payload, which prevents Datadog from parsing the log as JSON

  • Datadog adds its own timestamp at ingestion time (when Datadog processed the log), which is not the same as the emission time (when the log was originally emitted)

Nimbus can now recognize this class of issues and apply a lint optimization to fix it. In this case, Nimbus would come up with the following optimization:

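process_when:
# Only touch logs from the checkout service
- key: service
  op: EQUAL
  value: 'checkout'
# ...whose message starts with a timestamp
- key: message
  op: MATCH
  value: '^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} .+'
vrl: |
  # Split the message into the leading timestamp and the remaining payload
  groups = parse_regex!(.message, r'^(?<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?<data>.+)')
  # Keep only the payload as the message
  .message = groups.data
  # Use the emitted time, interpreted as UTC, as the log timestamp
  .timestamp = parse_timestamp!(groups.time + "+00:00", format:"%Y/%m/%d %H:%M:%S%:z")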
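For readers less familiar with VRL, here is a rough Python equivalent of the rule above. It is a sketch rather than what Nimbus runs: the `apply_lint` helper is hypothetical, logs are modeled as plain dicts, and it assumes the first condition is meant to select logs from the checkout service.

```python
import re
from datetime import datetime, timezone

# Same shape as the rule's regex: a leading timestamp, a space, then the payload.
PREFIXED = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<data>.+)"
)


def apply_lint(log: dict) -> dict:
    """Move a leading timestamp out of the message and into `timestamp`."""
    if log.get("service") != "checkout":
        return log  # process_when: service filter not satisfied
    match = PREFIXED.match(log.get("message", ""))
    if match is None:
        return log  # process_when: message does not start with a timestamp
    fixed = dict(log)
    fixed["message"] = match.group("data")
    # The rule appends "+00:00", i.e. the emitted time is treated as UTC.
    fixed["timestamp"] = datetime.strptime(
        match.group("time"), "%Y/%m/%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return fixed
```

Logs that do not satisfy both conditions are returned unchanged, which mirrors how `process_when` gates the transformation.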

After the lint optimization is applied, the log looks like this:

method: "process_checkout"
retry_count: 3
timestamp: 2024/01/23 01:33:12
service: checkout
...

This applies the correct timestamp and lets Datadog parse the JSON payload as a structured log. It also enables queries like @retry_count > 0, which were not possible while the log was a plain string.
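The effect on parsing can be reproduced with any JSON parser. The sketch below uses Python's `json` module as a stand-in for Datadog's parser (an assumption made for illustration; `parse_structured` is a hypothetical helper).

```python
import json
from typing import Optional


def parse_structured(message: str) -> Optional[dict]:
    """Return the message as a dict if it is a JSON object, else None."""
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


before = '2024/01/23 01:33:12 {"method": "process_checkout", "retry_count": 3}'
after = '{"method": "process_checkout", "retry_count": 3}'

# With the timestamp prefix the message is not valid JSON.
unparsed = parse_structured(before)

# Without the prefix the attributes become available for filtering.
attrs = parse_structured(after)
has_retries = attrs is not None and attrs["retry_count"] > 0
```

Here `unparsed` is `None` because the prefix makes the message invalid JSON, while `attrs` exposes `retry_count` as a number that can be compared.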

Interaction with Existing Optimizations

In rare cases, lint optimizations can interfere with existing reduce optimizations.

For example, if an existing reduce optimization relies on a timestamp being present in the log body and a lint optimization pulls that timestamp out into a log attribute, those logs will no longer be aggregated.

Say you have the following log:

message: 2024/01/23 01:33:12 foo did bar
...

You also have the following reduce optimization:

process_when:
- key: message
  op: MATCH
  value: '^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} foo.+'
...

You might get a lint optimization that pulls the timestamp out into a separate attribute:

message: foo did bar
timestamp: 2024/01/23 01:33:12

This means that your previous reduce optimization would no longer work, because it was using the date as an activation filter.
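The breakage is easy to reproduce outside Nimbus. The Python sketch below applies the reduce filter's pattern to the message before and after the lint optimization; it assumes the MATCH operator behaves like an ordinary regex match, and `reduce_applies` is a hypothetical helper.

```python
import re

# The pattern from the reduce optimization's process_when clause.
REDUCE_FILTER = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} foo.+")


def reduce_applies(message: str) -> bool:
    """True if the reduce optimization's filter would select this message."""
    return REDUCE_FILTER.match(message) is not None


before_lint = "2024/01/23 01:33:12 foo did bar"
after_lint = "foo did bar"

selected_before = reduce_applies(before_lint)
selected_after = reduce_applies(after_lint)
```

`selected_before` is `True` and `selected_after` is `False`: once the timestamp is gone from the body, the filter never activates.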

Today, you can either manually adjust the predicate in the process_when clause or wait for Nimbus to re-analyze your logs and provide updated recommendations.
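One possible manual adjustment is to make the timestamp prefix optional, so the filter selects the log both before and after the lint optimization. The sketch below checks such a pattern in Python; whether Nimbus's MATCH operator accepts this exact regex syntax is an assumption, and dropping the date part entirely (`^foo.+`) is a simpler alternative.

```python
import re

# Hypothetical adjusted predicate: the timestamp prefix is now optional.
ADJUSTED_FILTER = re.compile(
    r"^(?:\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} )?foo.+"
)


def adjusted_reduce_applies(message: str) -> bool:
    """True if the adjusted filter would select this message."""
    return ADJUSTED_FILTER.match(message) is not None


still_selected_before = adjusted_reduce_applies("2024/01/23 01:33:12 foo did bar")
still_selected_after = adjusted_reduce_applies("foo did bar")
```

Both values are `True`, so the reduce optimization keeps aggregating these logs regardless of whether the lint optimization has run.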
