How to monitor asynchronous 3rd party integrations proactively?

In today's landscape, third-party integrations have become a crucial part of software development. These integrations offer helpful features, new capabilities, and a mashup of services to create useful products. For instance, using SaaS products like Twilio or Stripe saves significant time and resources, because building equivalent services in-house from scratch is far from trivial. (If you know any tech startup that built these in-house, let me know 😬)

Leading startups today have more than 50 business-critical integrations and, on average, more than 350 integrations overall.

Microservices rely on external API calls to connect to these services - most of which are asynchronous or moving towards being async. If an external API call is in the flow of a critical business journey or customer experience, it needs proactive monitoring.

To manage this risk, businesses need effective third-party integration monitoring. In this article, we focus on how such monitoring helps businesses ensure the reliability and availability of their services. We also discuss how Dr Droid lets you monitor these integrations and take the necessary steps when a problem appears.

To make this concrete, we will walk through a food delivery application, a kind of product you probably use every day.

Use case: Food Delivery Application

Food delivery services often rely on timely and efficient communication between drivers, dispatchers, and customers. To ensure that deliveries are made on time, it's important to have a reliable and efficient system in place. This is where third-party integrations like IVR can be useful. IVR (interactive voice response) is an automated telephony system that you often see deployed in call centres.

For instance, imagine a customer who orders a pizza for delivery and waits for it to be delivered piping hot. Timely delivery is important for a good customer experience. Everything is going fine but let’s say the delivery agent is stuck and unresponsive for any reason. This event would risk the timely pizza delivery and the customer will have a poor experience.

A Sample Third-Party Integration

For the above scenario, a simple flow for this integration could include steps such as detecting the anomaly (i.e., driver not moving from their location), triggering an automated IVR call, prompting the driver to input their reason for the delay, suggesting appropriate actions based on the driver's input, and logging the outcome of the call for future reference.
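The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-minute "stuck" threshold, the keypad-to-action mapping, and the `place_ivr_call` function are all hypothetical and stand in for whatever your IVR provider and dispatch logic actually expose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical threshold: a driver who has not moved for this long counts as stuck.
STUCK_AFTER_SECONDS = 300

# Hypothetical mapping from the driver's keypad input to a suggested action.
SUGGESTED_ACTIONS: Dict[str, str] = {
    "1": "reassign_order",    # e.g. vehicle breakdown
    "2": "notify_customer",   # e.g. heavy traffic
}

CallLog = List[Tuple[str, Optional[str], str]]


@dataclass
class DriverStatus:
    driver_id: str
    seconds_since_last_move: int


def is_stuck(status: DriverStatus) -> bool:
    """Step 1: detect the anomaly (driver not moving from their location)."""
    return status.seconds_since_last_move >= STUCK_AFTER_SECONDS


def handle_delay(
    status: DriverStatus,
    place_ivr_call: Callable[[str], Optional[str]],
    call_log: CallLog,
) -> Optional[str]:
    """Steps 2 to 5: call the driver, read the input, pick an action, log it."""
    if not is_stuck(status):
        return None
    # place_ivr_call is assumed to return the pressed digit, or None if no input.
    reason = place_ivr_call(status.driver_id)
    action = SUGGESTED_ACTIONS.get(reason, "escalate_to_support")
    call_log.append((status.driver_id, reason, action))
    return action
```

Passing `place_ivr_call` in as a parameter keeps the sketch independent of any particular IVR provider and makes the flow easy to exercise with a stub.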

Figure 1: Ideal scenario

This is the ideal scenario: the call is initiated, the response is received, and the required actions are taken. But what if no response is received? Any follow-up action is then delayed, and the customer has a bad experience.

To handle this, we need monitoring that lets us act even when no response arrives, so that the rest of the process is not affected.

Monitoring using a Cron Task

Figure 2: Monitoring using a cron task

Let’s understand how we can monitor such integrations effectively to reduce business risk.

The setup in Figure 2 involves a delivery service that uses an Interactive Voice Response (IVR) system and a Redis database to manage requests. A cron job is set up to trigger a check 20 seconds after each request is initiated.

When a customer initiates a delivery request through the IVR system, the request ID and state are stored in the Redis database. This allows the delivery service to keep track of all active delivery requests. Once the request is initiated, a cron job is set up to trigger after 20 seconds.

After 20 seconds, a signal is sent to the delivery service. The delivery service checks the Redis database to see if the request ID is still in the initiated state. If the request ID is still in the initiated state, the delivery service takes appropriate actions, such as updating the request status or sending a notification to the customer.
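The check that runs after 20 seconds might look like the sketch below. To keep it self-contained, a plain dictionary stands in for Redis, and the function and state names are assumptions rather than anything prescribed by the setup above. Note also that standard cron schedules at one-minute granularity, so in practice a 20-second check usually needs a delayed job or a scheduler loop; the sketch covers only the state check itself.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

TIMEOUT_SECONDS = 20  # the window used in this article


@dataclass
class RequestRecord:
    state: str           # "initiated" or "completed" (hypothetical state names)
    initiated_at: float  # Unix timestamp in seconds


# In-memory stand-in for the Redis database: request ID -> record.
RequestStore = Dict[str, RequestRecord]


def record_initiated(store: RequestStore, request_id: str,
                     now: Optional[float] = None) -> None:
    """Store the request ID and its state when the IVR call is initiated."""
    started = time.time() if now is None else now
    store[request_id] = RequestRecord("initiated", started)


def record_callback(store: RequestStore, request_id: str) -> None:
    """Mark the request as completed when the IVR callback arrives."""
    if request_id in store:
        store[request_id].state = "completed"


def find_timed_out(store: RequestStore, now: Optional[float] = None) -> List[str]:
    """Return IDs still in the initiated state after the timeout has passed."""
    current = time.time() if now is None else now
    return [
        request_id
        for request_id, record in store.items()
        if record.state == "initiated"
        and current - record.initiated_at >= TIMEOUT_SECONDS
    ]
```

The scheduled job would call `find_timed_out` and then run the recovery action for each returned ID.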

This setup does have some potential drawbacks. While cron jobs can be useful for automating tasks and reducing the need for manual intervention, they add another component to the architecture and increase the overall complexity of the system. This can increase the risk of failure and make it more difficult to identify and troubleshoot issues when they occur.

How does Dr Droid make the process easier?

Figure 3: Doing the above process using Dr Droid

The above flow describes the same delivery service, now using the IVR provider together with Dr Droid to monitor the external integration that we handled above with Redis and a cron job.

When a delay in delivery is detected, the Delivery Service makes an external API call to the IVR provider, which asks the Delivery Executive for the reason for the delay. At the same time, the Delivery Service sends a "call initiated" event to Dr Droid. This event notifies Dr Droid that a new call has been initiated.

When the response is received from the Delivery Executive, the IVR provider sends the response back to the Delivery Service and sends a "callback received" event to Dr Droid, which notifies Dr Droid that a callback has been received for the call.

You can set up alerts and webhooks on Dr Droid so that, if a callback is not received within 20 seconds, it takes care of tracking the call and executing the recovery flow for the Delivery Service.
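To make the idea of pairing a "call initiated" event with a "callback received" event concrete, here is a minimal sketch of such a check. It is not Dr Droid's implementation or API: the `PairedEventMonitor` class and all of its names are hypothetical and only illustrate the logic of alerting when the second event does not follow the first in time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PairedEventMonitor:
    """Hypothetical monitor: alert when `secondary` does not follow `primary`."""
    primary: str
    secondary: str
    timeout_seconds: float
    on_timeout: Callable[[str], None]
    _pending: Dict[str, float] = field(default_factory=dict)

    def record(self, event_name: str, call_id: str, timestamp: float) -> None:
        """Track a primary event, or clear it once its secondary event arrives."""
        if event_name == self.primary:
            self._pending[call_id] = timestamp
        elif event_name == self.secondary:
            self._pending.pop(call_id, None)

    def check(self, now: float) -> List[str]:
        """Fire on_timeout once for each call whose callback is overdue."""
        overdue = [
            call_id
            for call_id, started in self._pending.items()
            if now - started > self.timeout_seconds
        ]
        for call_id in overdue:
            del self._pending[call_id]  # ensures each call alerts only once
            self.on_timeout(call_id)
        return overdue
```

In this sketch `on_timeout` plays the role of the alert or webhook: it could send an email, post to Slack, or start the recovery flow.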

Overall, this flow allows the Delivery Service to efficiently manage requests by leveraging the capabilities of the IVR provider and Dr Droid. With Dr Droid, businesses can monitor the performance and anomalies of their business process, receive alerts in case of any disruptions, and set up necessary actions to resolve the issues. The use of event notifications and decision-making processes enables the Delivery Service to handle calls promptly and effectively, while also providing visibility into the call process for monitoring and optimization purposes.

Setting up the monitoring using Dr Droid

Here we follow Dr Droid's API specs and show how to interact with it through the REST API.

First, we created an event called call_initiated:

Figure 3: call_initiated event created

Following that, we created another event callback_received:

Figure 4: callback_received event created
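As a rough illustration of what such an event might carry, the sketch below builds a JSON body for each of the two events. The field names (`name`, `timestamp`, `payload`, `call_id`) are assumptions made for this example and are not taken from Dr Droid's API specs; consult the actual specs for the real schema and endpoint.

```python
import json
import time
from typing import Any, Dict, Optional


def build_event(name: str, call_id: str,
                timestamp_ms: Optional[int] = None) -> Dict[str, Any]:
    """Build an event dictionary. All field names here are hypothetical."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "name": name,                     # e.g. "call_initiated"
        "timestamp": timestamp_ms,        # milliseconds since the Unix epoch
        "payload": {"call_id": call_id},  # ID that ties the two events together
    }


def to_json(event: Dict[str, Any]) -> str:
    """Serialize the event as the JSON body of a REST request."""
    return json.dumps(event, sort_keys=True)
```

The important design point is the shared `call_id`: both call_initiated and callback_received need a common identifier so that a monitor can match them.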

After creating the events, you can see them in the dashboard:

Figure 5: Events on Dr Droid’s dashboard

We can then set up monitoring in the console to track if a callback is received in x seconds after the call is initiated.

Figure 6: Monitor Dashboard

Figure 7: Selecting primary and secondary events

Next, we configure a notification that is triggered when the secondary event does not occur in time after the primary event has been created.

Figure 8: Setting up an alert

Figure 9: Setting up email notification

Figure 10: Monitoring setup is done

We are now ready to perform a test.

We proceed by creating a new event called call_initiated:

Figure 11: A new event created call_initiated

The primary event was created, but because we have not yet created the secondary event, callback_received, an alert notification has been sent.

Figure 12: Monitoring dashboard

An email notification was received stating that the secondary event, "callback_received", has not been created within a timeframe of 20 seconds.

Figure 13: Email Notification

Figure 14: Notification on the dashboard

This makes monitoring the calls easier. If the event is delayed, Dr Droid sends notifications via email or Slack, or triggers the configured webhook to take appropriate action.
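On the receiving side, the Delivery Service needs a handler for that webhook. The sketch below shows one possible shape for it. The payload format (a JSON object with a `call_id` field), the state names, and the returned action labels are all assumptions for illustration, not the format Dr Droid actually sends.

```python
import json
from typing import Dict


def handle_timeout_webhook(raw_body: bytes, call_states: Dict[str, str]) -> str:
    """Run the recovery flow for the call named in a hypothetical alert payload."""
    payload = json.loads(raw_body)
    call_id = payload["call_id"]  # hypothetical field name

    # Skip unknown calls and calls whose callback arrived in the meantime.
    if call_states.get(call_id) != "initiated":
        return "ignored"

    # Recovery step (assumed): mark the call as timed out and escalate it.
    call_states[call_id] = "timed_out"
    return "escalated"
```

Checking the current state before acting keeps the handler safe to call more than once for the same alert.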

Summary

I hope this article, with its example use case, has shown why it is important to monitor the third-party tools you rely on, so that your system works reliably and efficiently.

Do let us know in the comments how you monitor your external APIs. We will be happy to learn about your use case.

Sign up for Dr Droid today - it's free up to 1M events/month!