Demystifying distributed tracing in Elixir via OpenTelemetry, Zipkin, and Elasticsearch

Today we are going to discuss an important topic in microservices architecture: distributed tracing. When a system is made up of many small applications, each with one or more microservices, those applications talk to each other and pass data around, and tracing lets us follow a single request across all of them.

So today’s agenda is to:

  • Set up an application and check its traces in Zipkin.
  • Set up multiple microservices.
  • Set up communication between them.
  • Show the traces of the applications in the UI.
  • Set up Zipkin and configure Elasticsearch as the backend to persist the data.

Let’s set up a simple application first, then see its traces and how to configure them.

mix new zipkin_random --sup

We just created a project with a supervisor so we can understand the flow in a single application first. Later we will create more projects and do the distributed tracing, but for now, let’s set up OpenTelemetry in a single application.

It takes just a few simple steps: make the following changes to the application.
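As a rough sketch of the configuration step, assuming a recent 1.x release of the opentelemetry and opentelemetry_zipkin packages are already listed as dependencies in mix.exs: the option names below follow the exporter's documented shape as I understand it, and the service name "zipkin_random" is my assumption based on the project name, so check both against the versions you install.

```elixir
# config/config.exs
import Config

# Batch finished spans and hand them to the Zipkin exporter.
config :opentelemetry, :processors,
  otel_batch_processor: %{
    exporter:
      {:opentelemetry_zipkin,
       %{
         # Zipkin's v2 span endpoint (assumes Zipkin listens on localhost:9411).
         address: ~c"http://localhost:9411/api/v2/spans",
         # Service name shown in the Zipkin UI (assumed, not from the original post).
         local_endpoint: %{service_name: "zipkin_random"}
       }}
  }
```

The batch processor buffers spans and exports them periodically, so an event may take a few seconds to show up in Zipkin.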

After making the changes to your project, start the Zipkin server so that you can see the events arriving in Zipkin via the Zipkin exporter we configured in config.exs.

docker run -d -p 9411:9411 openzipkin/zipkin

Now let’s take our application for a spin and see whether we are getting any events :) I hope we are.

Zipkin random after calling the hello function

We compiled and ran our application and got [true] back. Now let’s see what happened in the Zipkin UI: open localhost:9411, since Zipkin is running via Docker on port 9411.

Zipkin UI

I hope you can all see this empty UI console. To see our event, use the search controls in the corner: click on the 15-minute lookback, or narrow it to 5 minutes or whatever suits you.

the events in the last 5 mins

Wow! I can see the event in my Zipkin UI, along with details such as the total time it took, around 101ms. Cool, isn’t it? 😃

If you click on this event it’ll show you more details about it.

Zipkin event inside

So this is how a single event looks, and that completes step 1. Let’s go further and create one more function in zipkin_random.ex so that we can see a nested event.

Here we created another function and pass it the delay from the first function. Let’s compile again and see what happens now.
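A minimal sketch of what the two functions could look like, assuming the opentelemetry_api 1.x `with_span` macro. The span name "hello_span", the function name `second_event/1`, and the concrete delays are hypothetical; only `second_event_span` comes from the text.

```elixir
defmodule ZipkinRandom do
  # with_span is a macro, so the Tracer module has to be required first.
  require OpenTelemetry.Tracer, as: Tracer

  @doc "Opens the parent span, sleeps briefly, then calls the nested function."
  def hello do
    Tracer.with_span "hello_span" do
      delay = 100
      Process.sleep(delay)
      # Called inside the with_span block, so its span becomes a child automatically.
      second_event(delay * 2)
    end
  end

  @doc "Opens a child span and sleeps for the delay handed down by hello/0."
  def second_event(delay) do
    Tracer.with_span "second_event_span" do
      Process.sleep(delay)
      true
    end
  end
end
```

Because both spans are opened in the same process, the second one picks up the first as its parent without any explicit context passing.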

We got true again, which means everything went well. When we open the Zipkin UI, we should be able to see the latest event 😄.

second event Zipkin

Wow, things went well again and I can see the event. The total duration is now ~303ms and there are 2 events, as shown by the ZIPKIN_RANDOM(2) tag. Let’s check it out and see more details.


Here we can see that there are 2 events in total and the second event is the child of the first. The second event took ~200ms and the total duration was ~300ms.

The next question on your mind might be: what if we need more details, and how do we see them? Let’s explore one more API provided by opentelemetry_api.

adding set_attributes

Here we have introduced a new API, set_attributes, which lets us attach attributes to a span. We recorded the delay value as an attribute on second_event_span, so we should be able to see it in the Zipkin UI.
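The attribute call could be sketched as follows, again assuming the opentelemetry_api 1.x macros. The module name `ZipkinRandom.Tagged` is hypothetical and only keeps this snippet separate from the main module; the `delay_data` key and span name are the ones mentioned in the text.

```elixir
defmodule ZipkinRandom.Tagged do
  require OpenTelemetry.Tracer, as: Tracer

  @doc "Opens second_event_span and records the delay as a span attribute."
  def second_event(delay) do
    Tracer.with_span "second_event_span" do
      # Attributes are set on the currently active span, here second_event_span.
      Tracer.set_attributes(%{delay_data: delay})
      Process.sleep(delay)
      true
    end
  end
end
```

Attributes set this way are exported with the span, which is why they appear under the span's tags in Zipkin.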


Here, when we click on second_event_span, we see under the tags that delay_data has the value 200.


Sweet, so we have the delay data and attributes in a single system. You can explore more of the OpenTelemetry APIs in the docs on Hex.

Now let’s go distributed. We will create 3 new projects named alpha, beta, and charlie, and set up Zipkin and OpenTelemetry in all of them. We will call alpha; internally alpha will call beta, and beta will call charlie. Then we will see the whole flow in the Zipkin UI.

mix new alpha --sup
mix new beta --sup
mix new charlie --sup

This generates 3 projects. Now let’s set up Zipkin and OpenTelemetry in all of them, as we did for our previous application.

Add the dependencies for opentelemetry and the Zipkin exporter.
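For illustration, the mix.exs of the alpha project might look roughly like this. The package names are the ones published on Hex; the version requirements are assumptions for a recent 1.x setup, so check Hex for the current releases.

```elixir
# alpha/mix.exs
defmodule Alpha.MixProject do
  use Mix.Project

  def project do
    [
      app: :alpha,
      version: "0.1.0",
      start_permanent: Mix.env() == :prod,
      deps: deps()
    ]
  end

  def application do
    [
      extra_applications: [:logger],
      mod: {Alpha.Application, []}
    ]
  end

  defp deps do
    [
      # API used in application code (Tracer macros, context helpers).
      {:opentelemetry_api, "~> 1.2"},
      # SDK that records spans and runs the processors.
      {:opentelemetry, "~> 1.3"},
      # Exporter that ships finished spans to Zipkin.
      {:opentelemetry_zipkin, "~> 1.1"}
    ]
  end
end
```

The beta and charlie projects would carry the same three dependencies under their own app names.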

Configure each app so that Zipkin shows its events under the app’s service name.

Register the events in application.ex for all three apps.

Now make the following changes in your alpha.ex, beta.ex, and charlie.ex.

Here we are making an RPC call; you could make a normal HTTP call instead if the services expose HTTP endpoints. In the erpc call I’m passing the parent context along so the callee can use it as the parent of its own span. You can do the same over HTTP by passing the parent context in a header.
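One way the three modules could be sketched, assuming the opentelemetry_api 1.x context functions (`OpenTelemetry.Ctx.get_current/0` and `attach/1`). The function name `call/1` and the span names are hypothetical, and the original post may have passed the span context differently.

```elixir
# alpha/lib/alpha.ex
defmodule Alpha do
  require OpenTelemetry.Tracer, as: Tracer

  def start_call do
    Tracer.with_span "alpha_span" do
      # Capture the context that holds the active span and ship it with the call.
      ctx = OpenTelemetry.Ctx.get_current()
      :erpc.call(:beta@localhost, Beta, :call, [ctx])
    end
  end
end

# beta/lib/beta.ex
defmodule Beta do
  require OpenTelemetry.Tracer, as: Tracer

  def call(parent_ctx) do
    # Make the caller's context current so the span below becomes its child.
    OpenTelemetry.Ctx.attach(parent_ctx)

    Tracer.with_span "beta_span" do
      ctx = OpenTelemetry.Ctx.get_current()
      :erpc.call(:charlie@localhost, Charlie, :call, [ctx])
    end
  end
end

# charlie/lib/charlie.ex
defmodule Charlie do
  require OpenTelemetry.Tracer, as: Tracer

  def call(parent_ctx) do
    OpenTelemetry.Ctx.attach(parent_ctx)

    Tracer.with_span "charlie_span" do
      # Stand-in for real work so the span has a visible duration.
      Process.sleep(100)
      :ok
    end
  end
end
```

Each erpc call runs in a fresh process on the remote node, which starts with an empty context. That is why the callee has to attach the received context before opening its span.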

Now we will start our apps in three different terminals, giving each a --sname, and see how they behave.

In terminal 1, navigate to the alpha folder and run the following command:

iex --sname alpha@localhost -S mix

In terminal 2, navigate to the beta folder:

iex --sname beta@localhost -S mix

Similarly, in terminal 3, navigate to the charlie folder:

iex --sname charlie@localhost -S mix

--sname is necessary when we use erpc calls, so that the apps can be referred to by node name when calling.
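Before making the first call, it can help to confirm that the named nodes can actually reach each other. The helper below is a hypothetical convenience built on the standard `Node.ping/1`, not something from the original post.

```elixir
defmodule ClusterCheck do
  @doc "Returns the nodes from the list that do not answer a ping."
  def unreachable(nodes) do
    # Node.ping/1 returns :pong when the node is reachable and :pang otherwise.
    Enum.reject(nodes, fn node -> Node.ping(node) == :pong end)
  end
end

# In the alpha terminal:
#   ClusterCheck.unreachable([:beta@localhost, :charlie@localhost])
# An empty list means both peers answered.
```

If a node shows up as unreachable, check that it was started with the expected --sname and that all nodes share the same Erlang cookie.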

Now call the function Alpha.start_call in terminal 1, where alpha is running.

Nice, we have called the alpha application. It calls beta, beta calls charlie, and the response comes back to alpha via beta.

When you open Zipkin UI you should see something like this.

Nice! We are seeing the distributed trace in Zipkin: one request came into alpha, alpha internally called beta, beta called charlie, and we got the response back. Now let’s click on the trace.

Wow, we have done a distributed trace. The idea is simple: pass the parent context to the child. In a single application there is no need to pass the parent context around, because it is tracked automatically. In distributed tracing, however, we must pass the current context to the child so that Zipkin knows this particular span is the child of another span somewhere else in the system.

Now let’s set up Elasticsearch as a backend for Zipkin. Zipkin supports several storage backends, but today we will use Elasticsearch.

Set up Elasticsearch & Kibana

docker pull

Then run the image

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node"

When you go to localhost:9200, you should see a response from Elasticsearch:

{
  "name" : "87f1ef7a7a08",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "_j9aUs8vR-KctqsVJ5u_Bw",
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Something like this.

Let’s set up Kibana too. Just follow these steps:

$ curl -O
$ curl | shasum -a 512 -c -
$ tar -xzf kibana-7.9.2-linux-x86_64.tar.gz
$ cd kibana-7.9.2-linux-x86_64/

Once it is installed and running, you should be able to open Kibana at localhost:5601, its default port.

Now there is one thing we have to configure in Zipkin to see the traces inside Elasticsearch: setting Elasticsearch as Zipkin’s storage backend.

Stop the running Zipkin Docker container now and run the following in your terminal.

$ curl -sSL | bash -s
$ java -DSTORAGE_TYPE=elasticsearch -DES_HOSTS= -jar zipkin.jar

Zipkin is now configured with ES: we are telling Zipkin that Elasticsearch on port 9200 is its backend and that all data should be persisted there. Let’s push some events from the alpha application, and then we will configure the index in Kibana to see the data.
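Before moving on to Kibana, it can be handy to confirm that Zipkin has actually written something to Elasticsearch. The sketch below lists indices whose names start with "zipkin" through the `_cat/indices` API, using Erlang's built-in `:httpc` client. The module name is hypothetical, and the "zipkin" prefix assumes Zipkin's default index naming was left unchanged.

```elixir
defmodule ZipkinIndexCheck do
  @moduledoc "Lists Elasticsearch indices whose names start with \"zipkin\"."

  @doc "Builds the _cat/indices URL for the given Elasticsearch base URL."
  def indices_url(base \\ "http://localhost:9200") do
    base <> "/_cat/indices/zipkin*?v"
  end

  @doc "Fetches the index listing; returns {:ok, body} or {:error, reason}."
  def fetch(base \\ "http://localhost:9200") do
    # :httpc ships with Erlang/OTP and needs the :inets application running.
    {:ok, _} = Application.ensure_all_started(:inets)
    url = String.to_charlist(indices_url(base))

    case :httpc.request(:get, {url, []}, [], body_format: :binary) do
      {:ok, {{_http, 200, _reason}, _headers, body}} -> {:ok, body}
      {:ok, {{_http, status, _reason}, _headers, _body}} -> {:error, {:http_status, status}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Calling `ZipkinIndexCheck.fetch()` after pushing a few events should return a listing with one or more span indices. If the listing is empty, Zipkin is probably not pointed at this Elasticsearch instance.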

Let’s configure the index from app management, on the create index page.

configure the column

After configuring, when we go to the Discover page in Kibana, we see the data 😍

I will finish this story here. In later stories, maybe we can talk about a metaprogramming approach to tracing and how to integrate it via telemetry to make things simpler. I would love to hear from you if you want that.

Thank you!
