🔗 This is the second part of this series about Amazon Connect. You can find the first part here: https://cloudnature.net/blog/setting-up-your-first-call-center-on-aws-a-step-by-stepguide
Introduction
Call center monitoring is not just a feature; it’s a necessity. In the fast-paced world of customer support, businesses must maintain a real-time grasp on their operations. Effective monitoring allows you to stay ahead of issues, ensure customer satisfaction, and optimize resource utilization.
Imagine your call center as a grand piano on a sold-out stage 🎹. Each call your agents handle is a note played on that piano. Effective monitoring acts as the pianist’s ear, allowing you to hit the right chords, keep the perfect tempo and produce melodies. Call center monitoring allows you to watch over your agents’ interactions, ensuring they hit the right notes. It enables you to identify and address any discordant sounds in real-time, thereby orchestrating customer satisfaction and resource utilization to a delightful tune 🎶.
In the next section, we will explore how, much like a pianist with a trained ear, you can listen to the harmonies that shape your call center’s performance on the AWS stage.
⚠️Note: As we saw in the first part of this series, a real call center is not built with just an Amazon Connect instance. This article covers only the monitoring of Amazon Connect instances and related phone numbers.
Key Metrics to Monitor
We’ve spent enough time on why monitoring matters. Now it’s time to look at the metrics we need to watch to keep the call center harmonious. The most critical ones fall into three groups:
- Service limit metrics
- Instance and Flow metrics
- Business metrics
Service Limit Metrics
CallsBreachingConcurrencyQuota — Count
This metric tracks the total number of voice calls that exceed the concurrent calls quota for your instance. Exceeding quotas can lead to service disruptions and affect customer experience. Monitoring this metric helps you react as soon as possible, for example by requesting a service quota increase from AWS.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
ThrottledCalls — Count
Throttled calls are voice calls rejected due to exceeding the maximum supported call rate. As above, monitoring this metric tells you when to request a service quota increase.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
ConcurrentCallsPercentage — Percent
This metric shows the percentage of your concurrent active voice calls service quota being used. As you can see, unlike the two metrics above, this one lets you act before the quota is breached, so you can plan capacity accordingly.
⚖️Threshold: 80
📅Evaluation Period: 1
⏳Period: 300
📈Statistic: Maximum
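The two count-based service limit alarms share every setting except the metric name, so they can be sketched as a single resource driven by a map. This is a minimal sketch, not the repository's code: the `service_limit_alarms` local and the `service_limit` resource name are illustrative, and the variables are assumed to be the same `var.project`, `var.instance_id`, and `var.sns_topic_arn` used by the alarm example later in this article. It also assumes both metrics are published under the `VoiceCalls` metric group. ConcurrentCallsPercentage is left out because that later example already covers it.

```hcl
# Illustrative map: one entry per count-based service limit metric,
# using the thresholds, periods, and statistics listed above.
locals {
  service_limit_alarms = {
    CallsBreachingConcurrencyQuota = { threshold = 1, period = 60, statistic = "Sum" }
    ThrottledCalls                 = { threshold = 1, period = 60, statistic = "Sum" }
  }
}

resource "aws_cloudwatch_metric_alarm" "service_limit" {
  for_each = local.service_limit_alarms

  alarm_name          = "${var.project}-connect-${each.key}"
  alarm_description   = "Alarm for Connect ${each.key}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  metric_name         = each.key
  namespace           = "AWS/Connect"
  statistic           = each.value.statistic
  period              = each.value.period
  evaluation_periods  = 1
  threshold           = each.value.threshold
  treat_missing_data  = "notBreaching"

  # Assumption: both metrics use the VoiceCalls metric group
  dimensions = {
    InstanceId  = var.instance_id
    MetricGroup = "VoiceCalls"
  }

  # Notify only when an SNS topic was provided
  alarm_actions = var.sns_topic_arn != null ? [var.sns_topic_arn] : null
}
```

Adding another metric with the same dimensions then only requires one more entry in the map.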
Instance and Flow Metrics
ToInstancePacketLossRate — Percent
Packet loss degrades call quality. A higher rate means worse audio and usually points to network problems.
⚖️Threshold: 0.03
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Average
⚠️ Note: this one is noteworthy. AWS suggests setting an alarm when packet loss exceeds 1% (0.01). I think it’s safe to say we can set a 3% threshold (0.03) for one minute.
CallRecordingUploadError — Count
Failed call recording uploads can result in lost data and compliance issues. However, it’s important to note that when an upload error occurs, it doesn’t necessarily indicate a complete failure. Amazon Connect automatically retries the delivery within 24 hours, mitigating potential data loss.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
⚠️ Note: in a real-world scenario, when encountering an upload error, you should reach out to AWS Support, providing them with the contact ID, contact flow logs, and CTR (Contact Trace Records). They will assist in resolving the issue and retrieving the recording for you, ensuring data integrity and compliance.
ContactFlowErrors — Count
Tracking flow errors helps you identify and correct issues in your call flow logic, ensuring smooth customer interactions.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
ContactFlowFatalErrors — Count
These errors indicate critical failures in call flow execution. Monitoring them is crucial for preventing disruptions in call center operations.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
⚠️ Note: it’s crucial to understand that these errors originate from Amazon Connect’s side, which means there’s limited action we can take independently. We may not even have access to CloudWatch logs related to the error. In such cases, the recommended procedure is to notify the AWS Support team, providing details about the affected Contact Flow and a timeline of the issue. They will investigate and resolve the problem, and reach out to you with their findings.
CallBackNotDialableNumber — Count
This metric alerts you to instances where queued callbacks couldn’t be dialed due to geographic restrictions. Ensuring compliance with outbound call rules is essential.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
MisconfiguredPhoneNumbers — Count
Calls fail when a phone number is misconfigured, for example when it is not associated with a flow, and every failed call is a missed opportunity. Monitoring this metric lets you catch the misconfiguration quickly.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
Lambda Execution Duration — Milliseconds
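Tracking the execution time of Lambda functions in your AWS account helps you optimize their performance and reduce latency. A flow waits at most 8 seconds for a Lambda response, so a 6-second threshold leaves some headroom.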
⚖️Threshold: 6000 (6s)
📅Evaluation Period: 2
⏳Period: 60
📈Statistic: Maximum
Lambda Execution Errors — Count
Monitoring Lambda function failures is critical for identifying and resolving issues within your call center infrastructure.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
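Both Lambda alarms use the standard `Duration` and `Errors` metrics in the `AWS/Lambda` namespace, scoped by the `FunctionName` dimension. The sketch below applies the values listed above to a single function. The `lambda_function_name` variable and the resource names are hypothetical, and `var.project` and `var.sns_topic_arn` are assumed to be the same variables used by the alarm example later in this article.

```hcl
# Hypothetical variable: name of a Lambda function invoked by a contact flow
variable "lambda_function_name" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
  alarm_name          = "${var.project}-lambda-duration"
  alarm_description   = "Lambda execution took 6 seconds or more"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2    # two consecutive one-minute periods
  threshold           = 6000 # milliseconds
  treat_missing_data  = "notBreaching"

  dimensions = {
    FunctionName = var.lambda_function_name
  }

  alarm_actions = var.sns_topic_arn != null ? [var.sns_topic_arn] : null
}

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "${var.project}-lambda-errors"
  alarm_description   = "At least one Lambda invocation failed"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 1
  treat_missing_data  = "notBreaching"

  dimensions = {
    FunctionName = var.lambda_function_name
  }

  alarm_actions = var.sns_topic_arn != null ? [var.sns_topic_arn] : null
}
```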
Custom: IncomingCalls — Count
A sudden drop in incoming calls to a DID number (or TFN) could indicate issues with the carrier, and quick action is required to address them. This metric can also be used to set an upper bound, depending on your carrier’s capacity.
⚖️Threshold: 1 (depending on your workload)
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
⚠️ Note: How can we trigger an alert for this metric? We are testing “Anomaly detection”, alerting whenever the value falls below the expected band. However, since most call centers do not operate during the weekend, we cannot simply send notifications using SNS. Instead, we need to route the notification through a Lambda function first and add custom logic to determine whether our call center is open.
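An anomaly detection alarm on the custom metric could look roughly like the sketch below. Several things here are assumptions and not taken from the repository: the `monitored_phone_number` variable, the resource name, and the band width of 2 standard deviations are illustrative. The namespace and the `IncomingCalls` dimension mirror the metric filter shown later in this article. Treating missing data as breaching is a design choice worth validating for your workload, since a number that receives no calls emits no data points at all.

```hcl
# Hypothetical variable: the dialed number to watch, as logged by the flow
variable "monitored_phone_number" {
  type = string
}

resource "aws_cloudwatch_metric_alarm" "incoming_calls_drop" {
  alarm_name          = "${var.project}-connect-incoming-calls-drop"
  alarm_description   = "Incoming calls fell below the expected band"
  comparison_operator = "LessThanLowerThreshold"
  evaluation_periods  = 1
  threshold_metric_id = "band"

  # Assumption: no data at all is treated as a drop in calls
  treat_missing_data = "breaching"

  # Expected range, computed by CloudWatch anomaly detection
  metric_query {
    id          = "band"
    expression  = "ANOMALY_DETECTION_BAND(calls, 2)" # illustrative band width
    label       = "IncomingCalls (expected)"
    return_data = true
  }

  # The custom metric produced by the log metric filter
  metric_query {
    id          = "calls"
    return_data = true

    metric {
      metric_name = "IncomingCalls"
      namespace   = var.project
      period      = 60
      stat        = "Sum"

      dimensions = {
        IncomingCalls = var.monitored_phone_number
      }
    }
  }

  # The topic would be consumed by the Lambda function that checks opening hours
  alarm_actions = var.sns_topic_arn != null ? [var.sns_topic_arn] : null
}
```

With this shape, the Lambda function subscribed to the topic decides whether to forward the alert, so the alarm itself stays unaware of opening hours.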
Business Metrics
LongestQueueWaitTime — Seconds
Customer satisfaction depends on short wait times. Monitoring this metric helps you identify and address queue bottlenecks in real time.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
QueueSize — Count
Knowing the number of contacts in the queue at any given moment is essential for resource allocation and managing customer expectations.
⚖️Threshold: 100 (depending on your SLAs)
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
QueueCapacityExceededError — Count
Monitoring this metric helps prevent service disruptions caused by a full queue, ensuring smooth call center operation.
⚖️Threshold: 1
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
MissedCalls — Count
Missed calls can result in lost business opportunities and dissatisfied customers. Keeping track of missed calls helps you reduce response times and improve service quality.
⚖️Threshold: 10 (depending on your SLAs)
📅Evaluation Period: 1
⏳Period: 60
📈Statistic: Sum
Monitoring with AWS
Monitoring our call center on AWS is a straightforward process. We simply combine Amazon CloudWatch with Amazon SNS to orchestrate the perfect monitoring symphony, which means creating a fully automated notification service that triggers alerts as needed.
⚠️ Note: I won’t cover every single metric alarm (you can find them inside the repository). Instead, I’ll show one standard alarm and then how to customize the monitoring of incoming calls with the Custom: IncomingCalls metric.
Let’s start by looking at an example of a metric alarm configuration:
resource "aws_cloudwatch_metric_alarm" "concurrent_calls_percentage" {
  alarm_name          = "${var.project}-connect-concurrent-calls-percentage"
  alarm_description   = "Alarm for Connect Concurrent Calls Percentage"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  metric_name         = "ConcurrentCallsPercentage"
  namespace           = "AWS/Connect"
  statistic           = "Maximum"
  unit                = "Percent"
  dimensions = {
    InstanceId  = var.instance_id
    MetricGroup = "VoiceCalls"
  }
  evaluation_periods = 1
  threshold          = 80
  period             = 300
  alarm_actions      = var.sns_topic_arn != null ? [var.sns_topic_arn] : null
  treat_missing_data = "notBreaching"
}
This alarm monitors the “ConcurrentCallsPercentage” metric. It fires as soon as the maximum value within a single 5-minute period reaches 80%, and it publishes to the SNS topic, which sends an email notification.
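The alarm only references an SNS topic ARN, so the topic and its email subscription have to exist somewhere. A minimal sketch of that piece, assuming a hypothetical `alert_email` variable and illustrative resource names, could be:

```hcl
# Hypothetical variable: mailbox that should receive the alerts
variable "alert_email" {
  type = string
}

resource "aws_sns_topic" "alarms" {
  name = "${var.project}-connect-alarms"
}

# Email subscriptions stay pending until the recipient confirms them
resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alarms.arn
  protocol  = "email"
  endpoint  = var.alert_email
}
```

The value of `aws_sns_topic.alarms.arn` would then be passed in as `sns_topic_arn`.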
But our monitoring efforts don’t end here. To gain deeper insights into incoming calls and understand which numbers are being dialed and how often, we need to set up a custom metric filter. Here’s the code for it:
resource "aws_cloudwatch_log_metric_filter" "custom_incoming_calls" {
  name           = "${var.project}-connect-incoming-calls"
  pattern        = "{ $.Parameters.Value != \"\" && $.Parameters.Key = \"SystemEndpointAddress\" }"
  log_group_name = var.connect_log_group_name

  metric_transformation {
    name      = "IncomingCalls"
    namespace = var.project
    value     = "1"
    unit      = "Count"
    dimensions = {
      IncomingCalls = "$.Parameters.Value"
    }
  }
}
As you can see, the pattern for this metric filter is defined as:
{ $.Parameters.Value != "" && $.Parameters.Key = "SystemEndpointAddress" }
To make this filter work, we must update the contact flow by adding a “Set Contact Attribute” block that stores the “System Dialed number” value under a key named SystemEndpointAddress, which is the key the pattern matches in the flow logs.
By implementing this custom attribute, we are able to track incoming calls and gain valuable insights into which phone numbers are being dialed and how frequently.
And there we have it, we have successfully monitored our call center on AWS🎉.
Data Visualization on AWS
Yeah! You’re absolutely right, we’ve accomplished a lot so far but we can’t call it a day, yet! When it comes to monitoring a call center, one of the most crucial aspects is visualizing the metrics in a single, comprehensive dashboard. Fortunately, Amazon CloudWatch allows us to create such dashboards easily, much like a composer brings together various musical notes to create a symphony. Let’s take a look at an example:
With this dashboard in place, we have successfully completed the visualization aspect of our monitoring system. This real-time dashboard provides us with the ability to ensure that our call center is running smoothly, free from errors or slowdowns. 🦥
⚠️ Note: you have the option to create a custom dashboard in Amazon CloudWatch directly with Terraform. While some may find this convenient, others, like myself, prefer to create dashboards manually using the AWS Console and then either import them or copy and paste the structure into Terraform. If you’re curious about the Terraform approach, you can find more information here 🔗https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_dashboard
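For readers who want to try the Terraform route, here is a minimal sketch of a dashboard with a single widget for ConcurrentCallsPercentage. The `aws_region` variable, the resource name, and the widget layout are assumptions for illustration. A real dashboard would carry one widget per metric group, and the JSON structure can be copied from a dashboard built in the console.

```hcl
# Hypothetical variable: region where the Amazon Connect instance lives
variable "aws_region" {
  type = string
}

resource "aws_cloudwatch_dashboard" "connect" {
  dashboard_name = "${var.project}-connect"

  # jsonencode keeps the dashboard body readable as HCL
  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "Concurrent calls (%)"
          region = var.aws_region
          stat   = "Maximum"
          period = 300
          metrics = [
            ["AWS/Connect", "ConcurrentCallsPercentage", "InstanceId", var.instance_id, "MetricGroup", "VoiceCalls"]
          ]
        }
      }
    ]
  })
}
```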
Conclusion
There you have it, folks! With that, we’ve reached the final crescendo 🎶! While the first article covered the essential aspects of creating a robust and efficient call center infrastructure, in this post we have learnt how monitoring a call center on AWS works. From the initial setup and configuration of Amazon Connect to real-time monitoring and data visualization using Amazon CloudWatch, we’ve explored the key steps to ensure our call center operates seamlessly.
By following this guide, you can not only establish a reliable call center but also gain valuable insights into its performance, ensuring that it continues to meet the needs of your customers and your business, creating a symphony of success 🎹.
Here you can find the GitHub repository: https://github.com/Depaa/amazon-connect-terraform/tree/main 😉
If you enjoyed this article, please let me know in the comment section or send me a DM. I’m always happy to chat! ✌️
Thank you so much for reading! 🙏 Keep an eye out for more AWS-related posts, and feel free to connect with me on LinkedIn 👉 https://www.linkedin.com/in/matteo-depascale/.
References
- https://docs.aws.amazon.com/connect/latest/adminguide/monitoring-cloudwatch.html
- https://docs.aws.amazon.com/connect/latest/adminguide/architecture-guidance.html
Disclaimer: opinions expressed are solely my own and do not express the views or opinions of my employer.