Introduction
For the last few months I have been developing a microservice architecture for a personal project of mine, and everything seemed to work well except one thing: I was having trouble debugging it, due to the limited options provided by AWS CloudWatch. As such, I decided to look for something else that could provide a more robust solution and a better interface to interact with. After spending two weeks researching and reading articles, I decided to use Grafana, Loki and AWS FireLens, and the setup now looks much more versatile and robust compared to what I had a couple of weeks ago.
Nonetheless, the reason for this article is not to recount my struggles and difficulties on the project, but rather to share knowledge around centralised logging and how it can benefit a microservice architecture when it is set up correctly. In addition, tools like Grafana and Loki make your experience with microservices much smoother, especially on the monitoring side, which is where most of the pain of microservices comes from. So, without further ado, let's start.
What is Logging?
A service accepts requests and responds to them. Each request-response cycle can be thought of as an event, and developers want to record it. An error is also an event, so it should be recorded as well. All of these events alter the state of a service, and to fully understand that state we need to capture them. They help us understand the cause of a failure and reveal the series of steps that led to it. To capture those events, developers use various logging tools, which usually write them to the filesystem. While this may be enough for a monolithic application, microservices have a few more requirements. If you are not familiar with the concept of microservices, I have written an article about it, which is available here.
Monolithic applications run in a single process and all logic is baked into it. Therefore, any logging will most likely include all the information you need and is, in a sense, already centralised within the application. In contrast, a microservice architecture consists of several applications, and information flows across multiple services. Understanding the cause of a problem is much harder, since you need to combine logs from multiple services, which becomes confusing, especially when you are trying to fix a bug that has already been rolled out to production.
What is Centralised Logging?
Centralised logging is a design pattern in which applications send their logs to a central location, where administrators and developers can access them and monitor the system's behaviour.
While the core idea is to have a central location where logs are sent, modern logging tools are combined with visualisation and processing tools to further analyse the logged information. A few of those tools are Grafana, Loki and Promtail, so let's go through them.
Loki, Promtail and Grafana
The stack I have found most frequently used is Loki, Promtail and Grafana, and each one handles a specific task, which we will explain individually.
Loki
According to the official documentation, Loki is:
a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate.
Essentially, Loki exposes an API with several endpoints that either push or query log information. Our goal is to push log data from our services to those endpoints, and Loki will take care of storing it and processing all the queries for us.
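To get a feel for the API, the two commands below push a single log line and then query it back. They assume a Loki instance is listening on localhost:3100 (as it will be once we run the docker-compose setup later in the article); the service label and the message are just placeholders for this illustration:

# push one log line (the timestamp must be in nanoseconds)
curl -X POST http://localhost:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d "{\"streams\": [{\"stream\": {\"service\": \"app\"}, \"values\": [[\"$(date +%s)000000000\", \"hello from curl\"]]}]}"

# query the last hour of logs for that stream
curl -G http://localhost:3100/loki/api/v1/query_range --data-urlencode 'query={service="app"}'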
In addition, Loki comes with its own query language called LogQL (Log Query Language). The language is inspired by PromQL and can be thought of as a distributed grep that aggregates log sources. If you want to learn more about LogQL, check it out here.
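To give a small taste of the language, here are a few LogQL queries you could run against a stream labelled service="app", the label we will be using later in this article:

{service="app"}                         all log lines carrying the label service="app"
{service="app"} |= "error"              only the lines that contain the word "error"
count_over_time({service="app"}[5m])    how many lines were logged in each 5-minute window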
Promtail
Promtail is an agent that gathers logs and ships them to the Loki API. A basic Promtail configuration includes the target Loki endpoint, as well as the location of the files it should scrape logs from.
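The article does not include the Promtail configuration itself, but a minimal one, assuming the application writes its logs to /var/log/app.log and that Loki is reachable at http://loki:3100 (both assumptions, chosen to match the {service="app"} query used later), would look roughly like this:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  # Promtail remembers how far it has read in each file here
  filename: /tmp/positions.yaml

clients:
  # where to push the scraped logs
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: app
    static_configs:
      - targets:
          - localhost
        labels:
          service: app
          __path__: /var/log/app.log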
Grafana
According to their official website, Grafana is:
an open source visualisation and analytics software. It allows you to query, visualise, alert on, and explore your metrics no matter where they are stored.
While Grafana is more focused on time-series data and metrics, it can be combined with Loki to visualise log data, perform ad-hoc queries to identify trends, and create graphs and visualisations based on the log data. Grafana is the recommended visualisation tool to pair with Loki and Promtail.
Experimenting with Grafana, Loki and Promtail
The rest of this article will demonstrate an implementation of centralised logging for a small microservice. The code is available in this repository.
The setup includes the following services:
- Loki: an API using the official grafana/loki image
- Grafana: a web-based application using the official ubuntu grafana/grafana image
- App: a golang-based API with a single endpoint. Clients can interact with the endpoint on /hello, and information related to each request will be logged by the service. The logs will be available in the app.log file, and Promtail will ship those logs to the Loki API (a minimal sketch of such a service follows this list).
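The actual handler lives in the repository, but to make the idea concrete, here is a minimal sketch of a Go service that logs every request on /hello to an app.log file, which Promtail would then tail. The log format is illustrative only, not the one used in the repository:

package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	// open (or create) the file that Promtail will be tailing
	f, err := os.OpenFile("app.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	logger := log.New(f, "", log.LstdFlags)

	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		// record the request as a log event
		logger.Printf("method=%s path=%s remote=%s", r.Method, r.URL.Path, r.RemoteAddr)
		w.Write([]byte("hello"))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}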
The following docker-compose file demonstrates what I briefly explained:
version: '3.7'
services:
  grafana:
    container_name: ${PROJECT:-centralised-logging-and-visualisations-}grafana
    image: grafana/grafana:latest-ubuntu
    ports:
      - '3000:3000'
    restart: on-failure
    depends_on:
      - loki
  loki:
    container_name: ${PROJECT:-centralised-logging-and-visualisations-}loki
    image: grafana/loki:latest
    ports:
      - '3100:3100'
    volumes:
      - type: bind
        source: $PWD/loki-config.yaml
        target: /etc/loki/local-config.yaml
    restart: on-failure
  app:
    container_name: ${PROJECT:-centralised-logging-and-visualisations-}app
    image: blog.mariossimou.dev/centralised-logging-and-visualisations-app:latest
    build:
      context: ./services/app
      dockerfile: ./deployments/app.dockerfile
    privileged: true
    ports:
      - '8080:8080'
    restart: on-failure
    depends_on:
      - loki
      - grafana
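The loki-config.yaml bind-mounted into the Loki container is not shown in the article. A minimal single-node configuration, loosely based on the default local-config.yaml shipped with the grafana/loki image (exact keys may differ slightly between Loki versions, and the paths below are just sensible defaults rather than values from the repository), looks roughly like this:

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 5m

schema_config:
  configs:
    - from: 2020-05-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h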
Running docker-compose up will spin up the grafana, loki and app containers. Grafana will be available in the browser on localhost:3000 and will ask you to log in. The default username and password is admin. As soon as you finish with the authentication process, you need to register Loki as a data source for Grafana. Navigate to Configuration > Data Sources and select Add data source. Fill in the details as shown below:
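The two fields that matter here are the name (Loki) and the URL, which should be http://loki:3100, since Grafana reaches Loki through the Docker network rather than through localhost. As an alternative to the manual step, Grafana can also load data sources from a provisioning file; a small sketch, assuming a file mounted under /etc/grafana/provisioning/datasources/, would be:

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true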
Once you have filled in the details, make sure that you test the connection, and then select the Explore option. This will load Grafana's explore view, where you can experiment with running queries and viewing logs. The LogQL query we will be running is {service="app"}, which essentially shows the logs related to the app service. The view should look like this:
Let's now test that hitting http://localhost:8080/hello logs some data to the dashboard. To do that, run the following command in the terminal, which hits the endpoint once per second, 100 times:
for i in {1..100}; do curl -X GET localhost:8080/hello; sleep 1; done
If you rerun the query, the updated view of the dashboard should look like this:
Summary
That's all for this tutorial, but before we go, let's recap a few things:
- Centralised logging is a great way to organise logs in a microservice architecture, making debugging and monitoring much easier.
- Loki, Promtail and Grafana are tools that can help with centralised logging. Grafana can be used to display logs, as well as to create graphs and visualisations that show log-related trends.