What to log and in what form
Logs have been incredibly helpful to me, allowing quickly investigate and resolve issues. In some situations, the lack of logs made it impossible to identify the root cause of the problem. In this article, I will share my vision of what events and information should be included in logs to make them truly valuable for troubleshooting and debugging.
I wrote this article as a reference for configuring new services, and it might be updated it in the future as needed.
In what form should logs be written?
Choosing the right logging format is the first step in setting up logging for your application. There are two widespread opposite approaches for log formatting: unstructured and structured logging.
Unstructured logs have no fixed format and are often just a stream of text with no defined structure. Despite the name, in my practice, I haven’t seen fully unstructured logs. It’s nice to read such logs in the command line while debugging locally, but I can’t find any additional advantages beyond that.
Problems begin with metadata, properties that are relevant only for a subset of log messages. It can be response status code, exception message or information about source of incoming request. The only place to include this information is within the log message itself. As a result, unstructured logs can be more challenging to analyze, since they require manual parsing and filtering to extract the relevant information. Try to parse MongoDB response time with bash or log aggregator like fluentd.
2023-01-17T15:02:49.9512Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Incoming GET request /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 from caller Gateway started
2023-01-17T15:02:49.9689Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query started
2023-01-17T15:02:49.9741Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query finished in 5 ms
2023-01-17T15:02:49.9743Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query started
2023-01-17T15:02:49.9794Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Mongodb query finished in 5 ms
2023-01-17T15:02:49.9802Z [INF] be694e8d-5c92-431f-8c63-bd15e4c61ee8 Incoming GET request /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 from caller Gateway finished in 5 ms
In contrast, structured logs use json/xml/other format with well-defined schema that adds some complexity and overhead such as:
- depending on you stack, you might need to do additional work to produce structured logs
- consume more disk space than unstructured logs due to the additional metadata and formatting they include
- can be less human-readable than unstructured logs, which can make it harder for developers to quickly troubleshoot issues
However, the benefits of structured logging often outweigh disadvantages:
- structured logs can include metadata which provide more context about each log entry. This make it easier to understand the sequence of events leading up to errors or can be usefully for building advanced analytics.
- popular log collection tools such as fluentd, logstash, etc. can parse structured logs and extract relevant information from them. This makes it easier to integrate with monitoring tools such as Grafana, Kibana, etc.
{"time":"2023-01-17T15:02:49.9512Z","level":"INF","message":"GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request started","callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"incoming:http"}
{"time":"2023-01-17T15:02:49.9689Z","level":"INF","message":"Mongodb query started","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9741Z","level":"INF","message":"Mongodb query finished","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9743Z","level":"INF","message":"Mongodb query started","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9794Z","level":"INF","message":"Mongodb query finished","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
{"time":"2023-01-17T15:02:49.9802Z","level":"INF","message":"GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request finished","callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":28,"statusCode":200,"type":"incoming:http"}
In my case, I’ve chosen something in between. The log entries begin with the timestamp log_level message
format, followed by a JSON object at the end.
While this format may be more difficult to parse and analyze than pure structured logging, it provides a more human-readable format than raw JSON.
2023-01-17T15:02:49.9512Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request started {"callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"incoming:http"}
2023-01-17T15:02:49.9689Z [INF] Mongodb query started {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
2023-01-17T15:02:49.9741Z [INF] Mongodb query finished {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
2023-01-17T15:02:49.9743Z [INF] Mongodb query started {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","type":"outgoing:mongodb"}
2023-01-17T15:02:49.9794Z [INF] Mongodb query finished {"correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":5,"type":"outgoing:mongodb"}
2023-01-17T15:02:49.9802Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b?take=20 request finished {"callerId":"Gateway","correlationId":"be694e8d-5c92-431f-8c63-bd15e4c61ee8","elapsedMilliseconds":28,"statusCode":200,"type":"incoming:http"}
future considerations: This format is not ideal for log aggregation tools since json in the middle of the message can lead to parsing errors. In the future, I might consider using a structured format.
What events should be logged?
Now that we’ve covered logging formats, let’s dive into what information should be logged.
While logs can be incredibly helpful for troubleshooting and debugging, not all log entries are equally valuable. Including too many details can be distracting and overwhelming, while too little information can cause you to miss critical details. Striking the right balance is essential for creating truly effective logs.
I’ve made the short list of minimum required events that I want to be presented.
- Start and finish events for anything that triggers actions within your system from the outside, such as HTTP, gRPC, AMPQ, cron, and other protocols.
- Start and finish events for anything that goes outside current process, such as HTTP, database queries, cache operations, and messaging queues.
- Any unexpected or abnormal events, such as failed external calls, crashes, and other errors. Be careful, this type of error should be logged only once. It’s best to log these types of events only once, at the appropriate level of the application.
What information should be presented inside log entry?
There are four essential parts that should be included:
- log timestamp in ISO8061 format. To avoid any misunderstandings timezone should be specified explicitly e.g. 2023-01-17T15:02:49.9512Z
- log level, which should be one of the following
- ERROR - for unexpected errors, such as network errors and crashes, that require investigation from a developer. For example, a null reference exception is considered an error, but an incoming HTTP request failing to pass validation (BadRequest) is not.
- INFO - for everything from “What events should be logged?” section, item 1 and 2.
- DEBUG - anything that is not important in daily activities but, can be used for more detailed analysis. By default, debug log entries should be ignored to avoid unnecessary noise.
- A human-readable message that describes the event being logged. This message should be short and concise.
- JSON object that includes all other relevant data.
NEVER log sensitive information, such as passwords, credit card numbers etc.
I consciously avoid WARN level usage as far as it creates extra noise. For me, It’s not understandable is there any action required for this level. Do I need to start some investigation the same as for ERROR, or they can be ignored safely?
And finally, there are some helpful fields I prefer to have in the metadata
correlationId
- Random uniq string that connects all logs related to single action. It should be part of all log messages and should be provided by the caller, for example, through the X-Request-ID header for HTTP requests or X-Correlation-ID for AMPQ messages.elapsedMilliseconds
- At the end of incoming/outgoing requests, this field shows how long the action took to complete.statusCode
- At the end of incoming/outgoing requests shows response status code.callerId
- In the incoming requests, name of the caller. Should be provided by caller e.g. via headers for http request.
Final example
2022-10-13T08:01:02Z [INF] GET /5e35fa4e-6cde-1111-8178-e7afa036256b request started {"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","type":"incoming:http"}
2022-10-13T08:01:02Z [INF] Mongodb query started {"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","type":"outgoing:mongodb"}
2022-10-13T08:01:02Z [INF] Mongodb query finished {"elapsedMilliseconds":55,"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","type":"outgoing:mongodb"}
2022-10-13T08:01:02Z [INF] GET /5e35fa4e-6cde-1111-8178-e7afa036256b request finished {"elapsedMilliseconds":55,"correlationId":"f3631da9-0c19-43aa-9bc9-978e15e7c682","statusCode":200,"type":"incoming:http"}
2022-10-13T08:01:04Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b request started {"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"incoming:http"}
2022-10-13T08:01:04Z [INF] Mongodb query started {"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] Mongodb query finished {"elapsedMilliseconds":55,"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] Mongodb query started {"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] Mongodb query finished {"elapsedMilliseconds":55,"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","type":"outgoing:mongodb"}
2022-10-13T08:01:04Z [INF] GET /5e35fa4e-6cde-4be8-8178-e7afa036256b request finished {"elapsedMilliseconds":112,"correlationId":"d89d5278-a882-4e65-a859-09b14caada28","statusCode":200,"type":"incoming:http"}
Search for correlationId
with ERROR level: cat log.txt | grep ERR | awk '{print $NF}' | jq ".correlationId"
Search all logs for specific correlationId
: cat log.txt | grep d89d5278-a882-4e65-a859-09b14caada28
Print mongodb response time: cat log.txt | grep -E "outgoing:mongodb" | grep -E "elapsedMilliseconds" | awk '{print $NF}' | jq ".elapsedMilliseconds"
Print service response time: cat log.txt | grep -E "incoming:http" | grep "elapsedMilliseconds" | awk '{print $NF}' | jq ".elapsedMilliseconds"