Set up alerts that you cannot create in Grafana, Elastic or Datadog

In this article, we look at the main challenges of a microservices architecture and then explore one that many developers and businesses face: how to track the events flowing across services.

Event tracking is required for various business use cases, such as audit trails, fraud detection, catching missing API callbacks, raising alarms on unusual patterns, and spotting delays and failures in services.

https://hashnode.com/post/clfw52pbe000609mq6mctgx4r

Microservices Architecture:

If you're well-versed in microservices, you can skip directly to the next section.

The term "microservices" has been popular since 2011, when it was discussed at a software architecture workshop near Venice. Since then, many developers have adopted the microservices architectural style to build their products.

Microservices is an approach that enables teams to work independently on different services, deploy changes more frequently, and scale services independently. The impact of microservices has been significant, enabling organizations to build and scale complex applications faster and with more agility.

Microservices offer several benefits for software development teams. The architecture provides greater agility by allowing faster development cycles and independent deployment of individual services, resulting in faster time-to-market.

Other benefits include better fault isolation, improved reliability, and uptime.

Challenges in Microservices

Microservices bring significant challenges such as distributed ownership of entities. As the number of microservices grows, it becomes difficult to track and understand the impact of decisions, leading to a lack of visibility and isolation in the decision-making process.
Identifying failures or issues across services can become challenging due to the distributed nature of transactions and data flow. Failures in one service can also have cascading effects on other services.
Managing data in a microservices architecture can become complex due to each microservice having its own data storage and retrieval requirements.
Integration testing becomes more complex.
Ensuring data privacy, having an audit trail, and implementing consistent security across multiple services can be challenging.

How to Implement Event Tracking Across Services?

Event tracking refers to the process of capturing and monitoring user interactions or events within an application, website, or system. Events are user actions or system-generated events such as clicks, taps, searches, logs, and more.

For example, say we have an e-commerce platform where customers can place orders for products. When a customer places an order, the order is created in the Checkout Service, and the order details are stored in a database. The order is then processed by various other microservices, such as the Inventory Service, Payment Service, and Delivery Service.

One important metric for the platform is the time taken for an order to be delivered. However, tracking this metric is difficult because the order passes through multiple services, and no single service has any context beyond its own step.

Below are the various ways to implement event tracking:

1. Adding Logs to ELK/Logging Tools

To facilitate issue detection across microservices, logs can be added in different services and sent to ELK or the logging system with a unique key to join events. However, this approach has cons such as complex querying in the logging system and non-trivial searching of events going into different indexes/streams.
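As a rough illustration of the "unique key" idea, the sketch below has two services write one JSON log line per event, each carrying the same order_id, and then groups the lines by that key. The helper names (make_logger, log_event, join_by_key) and the field names are assumptions made for this example; in a real setup the lines would be shipped to ELK and joined with a query there rather than in Python.

```python
import json
import logging
import time


def make_logger(service, stream):
    """Return a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger(f"events.{service}")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep event lines out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_event(logger, service, event, order_id, ts=None):
    """Emit a structured log line; order_id is the key used to join events."""
    record = {
        "service": service,
        "event": event,
        "order_id": order_id,
        "ts": time.time() if ts is None else ts,
    }
    logger.info(json.dumps(record))


def join_by_key(lines, key="order_id"):
    """Group parsed log lines by the shared key, oldest event first."""
    grouped = {}
    for line in lines:
        record = json.loads(line)
        grouped.setdefault(record[key], []).append(record)
    for records in grouped.values():
        records.sort(key=lambda r: r["ts"])
    return grouped
```

Even in this small form, the join only works if every service agrees on the key name and includes it in every line, which is part of what makes the approach hard to maintain as the number of services grows.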

2. Building an In-House System

Implementing event tracking in-house involves building a custom system to track and store events generated by the application. To achieve this we can embed code to capture events in business logic code and push the events to a database. This can also be combined with log files to create a custom report or visualization to identify anomalies or insights.
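A minimal sketch of such an in-house tracker is shown below, using SQLite purely as a stand-in for whatever database a team would actually choose. The table layout and the function names (open_store, track, timeline) are assumptions for illustration, not a recommended schema.

```python
import json
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the event store; SQLite is only a stand-in here."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               entity_id TEXT NOT NULL,
               ts REAL NOT NULL,
               payload TEXT NOT NULL)"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_events_entity ON events (entity_id, ts)"
    )
    return conn


def track(conn, name, entity_id, payload=None, ts=None):
    """Called from business logic to record that an event happened."""
    conn.execute(
        "INSERT INTO events (name, entity_id, ts, payload) VALUES (?, ?, ?, ?)",
        (name, entity_id, time.time() if ts is None else ts,
         json.dumps(payload or {})),
    )
    conn.commit()


def timeline(conn, entity_id):
    """Return (name, ts) pairs for one entity, oldest first."""
    rows = conn.execute(
        "SELECT name, ts FROM events WHERE entity_id = ? ORDER BY ts",
        (entity_id,),
    )
    return rows.fetchall()
```

The Checkout Service would call track(conn, "order_created", order_id) and the Delivery Service would call track(conn, "order_delivered", order_id); reports and anomaly checks are then queries over the events table.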

However, it can be complex and time-consuming to build and maintain such systems, so it is only recommended for applications with specific requirements that cannot be met by existing solutions. It may be more efficient to use a third-party event tracking tool, such as Dr Droid, instead.

3. Building ELT Pipelines

We can also use Extract-Load-Transform (ELT) techniques. This involves capturing data from multiple sources such as web analytics, mobile apps, and CRM systems, loading it into a centralized data warehouse, and then transforming it there for analysis. On their own, these pipelines don't provide any intelligence on top of the data. We need a BI tool and complex queries on the data to extract value from it.

Let’s consider an example:

Figure 1: A simplified ELT pipeline

Companies like Uber have created entire teams to build such pipelines in-house at scale.

To get insights on a metric like the order delivery time, an ELT pipeline can be created with any OLAP database (in this case, Snowflake) as the sink. Kafka queues are used to stream the data from the Checkout and Delivery services to the Snowflake database.

The Checkout and Delivery services both send Order fact data (the created_at and delivered_at timestamps) to Snowflake tables. These tables can be monitored via a BI tool that generates hourly reports on how the average delivery time is trending against the platform-level service level agreements (SLAs).
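To make the transform step concrete, here is a small sketch of the kind of query such an hourly report might run. SQLite stands in for Snowflake so the example runs locally, timestamps are assumed to be integer epoch seconds, and the table names (checkout_orders, delivery_orders) are hypothetical.

```python
import sqlite3

# Joins the two fact tables on order_id and averages the delivery time
# per hour in which the order was created. Orders with no delivery row
# are left out by the inner join.
HOURLY_SQL = """
SELECT (c.created_at / 3600) * 3600 AS hour_start,
       COUNT(*) AS delivered_orders,
       AVG(d.delivered_at - c.created_at) AS avg_delivery_seconds
FROM checkout_orders AS c
JOIN delivery_orders AS d ON d.order_id = c.order_id
GROUP BY hour_start
ORDER BY hour_start
"""


def load(conn, created, delivered):
    """Load step: copy raw rows from each service into its own table."""
    conn.execute(
        "CREATE TABLE checkout_orders (order_id TEXT PRIMARY KEY, created_at INTEGER)"
    )
    conn.execute(
        "CREATE TABLE delivery_orders (order_id TEXT PRIMARY KEY, delivered_at INTEGER)"
    )
    conn.executemany("INSERT INTO checkout_orders VALUES (?, ?)", created)
    conn.executemany("INSERT INTO delivery_orders VALUES (?, ?)", delivered)
    conn.commit()


def hourly_delivery_report(conn):
    """Transform step: run the aggregation inside the warehouse."""
    return conn.execute(HOURLY_SQL).fetchall()
```

Note that a report like this only covers orders that were eventually delivered. Spotting orders that are still undelivered needs a separate query or a scheduled job, which adds to the maintenance burden.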

However, setting up and maintaining a mix of tools or ELT can be complex and require significant effort. It may also require specialized skills and resources to build intelligence from the captured data.

How Dr Droid Simplifies Event Tracking

To simplify the process of tracking events in microservices and correlating them to metrics, tools like Dr Droid can be used. This eliminates the need for managing a shared infrastructure and cron jobs. Dr Droid allows us to send events from microservices and set up alerts for individual orders or aggregated delayed orders. By sending events such as order_created and order_delivered to Dr Droid, it becomes easier to track metrics and set up alerts.

Dr Droid Setup Walkthrough

In this walkthrough, we use the API specs to interact with Dr Droid over its REST API. First, we create an event called order_created.

Figure 2: Generating order_created event
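For readers who prefer code to the API specs UI, the sketch below shows the general shape of posting an event over REST from a service. Everything specific in it is a placeholder: the URL, the bearer-token header, and the payload field names (name, timestamp, payload) are assumptions for illustration and are not taken from Dr Droid's documentation, so check the actual API specs before using it.

```python
import json
import time
import urllib.request

# Placeholders only: replace with the endpoint and token from the API specs.
INGEST_URL = "https://example.invalid/v1/events"
API_TOKEN = "YOUR_API_TOKEN"


def build_event_request(name, payload, ts=None):
    """Build (but do not send) a POST request carrying one event.

    The body layout is hypothetical, not the vendor's documented schema.
    """
    body = {
        "name": name,
        "timestamp": int(time.time() if ts is None else ts),
        "payload": payload,
    }
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A service would call send(build_event_request("order_created", {"order_id": "ord-1"})) at the point where the order is created, and do the same with order_delivered in the Delivery Service.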

After creating the order_created event, you can see it in the dashboard:

Figure 3: order_created event on the dashboard

Following that, we created another event order_delivered:

Figure 4: order_delivered event generation from API specs

Figure 5: order_delivered event at the dashboard

We can then set up monitoring in the console to track the time taken for each order to be delivered.

Figure 6: Monitor dashboard

Figure 7: Selecting primary and secondary events

Next, we configure a notification that is triggered when the secondary event does not occur within the expected time after the primary event.

Figure 8: Setting up an alert
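The rule being configured here can be expressed in a few lines. The sketch below is only an illustration of the logic, not how Dr Droid implements it: given a list of events, it returns the entities whose secondary event is missing past the deadline or arrived after it. The Event class and the overdue_entities function are names invented for this example, and treating a late arrival as overdue is a design choice made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    entity_id: str
    ts: float  # seconds since some common reference point


def overdue_entities(events, primary, secondary, timeout_s, now):
    """Return sorted ids whose secondary event is missing or late.

    An entity is overdue when its primary event is older than
    `timeout_s` and the secondary event either has not arrived by
    `now` or arrived after the deadline.
    """
    started, finished = {}, {}
    for event in events:
        if event.name == primary:
            target = started
        elif event.name == secondary:
            target = finished
        else:
            continue
        # Keep the earliest timestamp seen for each entity.
        previous = target.get(event.entity_id)
        if previous is None or event.ts < previous:
            target[event.entity_id] = event.ts

    overdue = []
    for entity_id, start_ts in started.items():
        deadline = start_ts + timeout_s
        end_ts = finished.get(entity_id)
        if end_ts is None:
            if now > deadline:
                overdue.append(entity_id)
        elif end_ts > deadline:
            overdue.append(entity_id)
    return sorted(overdue)
```

An in-house version of this check would have to be run on a schedule against the event store, which is the kind of cron job a hosted tool is meant to replace.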

There are two options available for setting up notifications, and I am choosing to enable email notifications.

Figure 9: Notification options

Figure 10: Setting up email notification

Figure 11: Monitoring setup is done

We are now ready to perform a test. We proceed by creating a new order_created event.

Figure 12: A new order_created event

The primary event, which is currently active, can be viewed in the console.

Figure 13: Monitoring dashboard

As we have not yet created the secondary event, order_delivered, an alert notification has been received.

Figure 14: Monitoring dashboard alerts

An email notification was received stating that the secondary event, order_delivered, has not been created within a timeframe of 10 seconds.

Figure 15: Email notification

Figure 16: Notification on the dashboard

In this way, we can simply post events to Dr Droid over its REST API to capture them in a single place, which makes it easy to audit, monitor, and trigger alerts as per the business requirements.

Summary

I hope you have learned various use cases of event tracking and why it is vital in the microservices architecture. We also quickly learned various ways to implement event tracking including using Dr Droid.

Do try Dr Droid and let us know in the comments how you use event tracking. Our team will be happy to learn about your use cases.
