boberneprotiv.com

What to log and in what form

Logs have been incredibly helpful to me, allowing quickly investigate and resolve issues. In some situations, the lack of logs made it impossible to identify the root cause of the problem. In this article, I will share my vision of what events and information should be included in logs to make them truly valuable for troubleshooting and debugging.

I wrote this article as a reference for configuring new services, and it might be updated it in the future as needed.

In what form should logs be written?

Choosing the right logging format is the first step in setting up logging for your application. There are two widespread opposite approaches for log formatting: unstructured and structured logging.

Unstructured logs have no fixed format and are often just a stream of text with no defined structure. Despite the name, in my practice, I haven’t seen fully unstructured logs. It’s nice to read such logs in the command line while debugging locally, but I can’t find any additional advantages beyond that.

Problems begin with metadata, properties that are relevant only for a subset of log messages. It can be response status code, exception message or information about source of incoming request. The only place to include this information is within the log message itself. As a result, unstructured logs can be more challenging to analyze, since they require manual parsing and filtering to extract the relevant information. Try to parse MongoDB response time with bash or log aggregator like fluentd.

2023-01-17T15:02:49.9512Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Incoming GET request /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 from caller Gateway started
2023-01-17T15:02:49.9689Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query started
2023-01-17T15:02:49.9741Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query finished in 5 ms
2023-01-17T15:02:49.9743Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query started
2023-01-17T15:02:49.9794Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query finished in 5 ms
2023-01-17T15:02:49.9802Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Incoming GET request /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 from caller Gateway finished in 5 ms

In contrast, structured logs use json/xml/other format with well-defined schema that adds some complexity and overhead such as:

However, the benefits of structured logging often outweigh disadvantages:

{"time":"2023-01-17T15:02:49.9512Z","level":"INF","message":"GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request started","callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"incoming:http"}
{"time":"2023-01-17T15:02:49.9689Z","level":"INF","message":"Mongodb query started","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9741Z","level":"INF","message":"Mongodb query finished","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9743Z","level":"INF","message":"Mongodb query started","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9794Z","level":"INF","message":"Mongodb query finished","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9802Z","level":"INF","message":"GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request finished","callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":28,"statusCode":200,"type":"incoming:http"}

In my case, I’ve chosen something in between. The log entries begin with the timestamp log_level message format, followed by a JSON object at the end. While this format may be more difficult to parse and analyze than pure structured logging, it provides a more human-readable format than raw JSON.

2023-01-17T15:02:49.9512Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request started {"callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"incoming:http"}
2023-01-17T15:02:49.9689Z [INF] Mongodb query started {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
2023-01-17T15:02:49.9741Z [INF] Mongodb query finished {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
2023-01-17T15:02:49.9743Z [INF] Mongodb query started {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
2023-01-17T15:02:49.9794Z [INF] Mongodb query finished {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
2023-01-17T15:02:49.9802Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request finished {"callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":28,"statusCode":200,"type":"incoming:http"}

future considerations: This format is not ideal for log aggregation tools since json in the middle of the message can lead to parsing errors. In the future, I might consider using a structured format.

What events should be logged?

Now that we’ve covered logging formats, let’s dive into what information should be logged.

While logs can be incredibly helpful for troubleshooting and debugging, not all log entries are equally valuable. Including too many details can be distracting and overwhelming, while too little information can cause you to miss critical details. Striking the right balance is essential for creating truly effective logs.

I’ve made the short list of minimum required events that I want to be presented.

  1. Start and finish events for anything that triggers actions within your system from the outside, such as HTTP, gRPC, AMPQ, cron, and other protocols.
  2. Start and finish events for anything that goes outside current process, such as HTTP, database queries, cache operations, and messaging queues.
  3. Any unexpected or abnormal events, such as failed external calls, crashes, and other errors. Be careful, this type of error should be logged only once. It’s best to log these types of events only once, at the appropriate level of the application.

What information should be presented inside log entry?

There are four essential parts that should be included:

NEVER log sensitive information, such as passwords, credit card numbers etc.

I consciously avoid WARN level usage as far as it creates extra noise. For me, It’s not understandable is there any action required for this level. Do I need to start some investigation the same as for ERROR, or they can be ignored safely?

And finally, there are some helpful fields I prefer to have in the metadata

Final example

2022-10-13T08:01:02Z [INF] GET /5e35fa4e-6cde-1111-8178-e7afa036256b request started {"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","type":"incoming:http"}
2022-10-13T08:01:02Z [INF] Mongodb query started {"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","type":"outgoing:mongodb"}
2022-10-13T08:01:02Z [INF] Mongodb query finished {"elapsedMilliseconds":55,"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","type":"outgoing:mongodb"}
2022-10-13T08:01:02Z [INF] GET /5e35fa4e-6cde-1111-8178-e7afa036256b request finished {"elapsedMilliseconds":55,"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","statusCode":200,"type":"incoming:http"}
2022-10-13T08:01:04Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b request started {"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"incoming:http"}
2022-10-13T08:01:04Z [INF] Mongodb query started {"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] Mongodb query finished {"elapsedMilliseconds":55,"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] Mongodb query started {"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] Mongodb query finished {"elapsedMilliseconds":55,"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b request finished {"elapsedMilliseconds":112,"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","statusCode":200,"type":"incoming:http"}

Search for correlationId with ERROR level: cat log.txt | grep ERR | awk '{print $NF}' | jq ".correlationId"

Search all logs for specific correlationId: cat log.txt | grep d89d5278-a882-4e65-a859-09b14caada28

Print mongodb response time: cat log.txt | grep -E "outgoing:mongodb" | grep -E "elapsedMilliseconds" | awk '{print $NF}' | jq ".elapsedMilliseconds"

Print service response time: cat log.txt | grep -E "incoming:http" | grep "elapsedMilliseconds" | awk '{print $NF}' | jq ".elapsedMilliseconds"