[US Region only] Interruption to all Hypercare services

Incident Report for Hypercare

Postmortem

Overview:

This incident was caused by the interaction between a fixed database query limit and a recent release aimed at improving the search feature on Messaging and Contacts.

What Happened:

On December 11, 2024, at approximately 11:00 a.m. ET, the platform experienced a surge in concurrent searches across a large user base. These searches hit the database's fixed data limit for query execution, causing search processes to pile up and eventually crash. The issue was confined to the United States region, where user load was higher than expected. Canadian operations were unaffected.

Impact:

From 11:10 a.m. to 11:50 a.m. ET, Hypercare Messaging and Virtual Paging services were non-operational, and on-call schedules loaded slowly.

Root Cause:

The root cause was the interaction between the database’s inherent query size limit and the recent enhancement to our search functionality on Messaging and Contacts. Under high concurrent usage, query sizes exceeded the database's fixed threshold, which caused the search failures and the wider service interruption.
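
For readers who want a concrete picture of this class of failure, the following is a minimal Python sketch of one generic mitigation: rejecting oversized queries and capping the number of searches that run at once, so that excess load fails fast instead of piling up. It is an illustration only and does not describe Hypercare's actual code or the safeguards that were put in place. The names `SearchGuard` and `SearchRejected`, and the limit values, are hypothetical.

```python
import threading

# Hypothetical limits chosen for illustration; real values depend on the database.
MAX_QUERY_BYTES = 16_384
MAX_CONCURRENT_SEARCHES = 4


class SearchRejected(Exception):
    """Raised when a search is refused instead of being sent to the database."""


class SearchGuard:
    """Rejects oversized queries and bounds the number of in-flight searches."""

    def __init__(self, max_concurrent=MAX_CONCURRENT_SEARCHES,
                 max_query_bytes=MAX_QUERY_BYTES):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._max_query_bytes = max_query_bytes

    def run(self, query, execute):
        # Check the encoded size before the query ever reaches the database.
        size = len(query.encode("utf-8"))
        if size > self._max_query_bytes:
            raise SearchRejected(f"query is {size} bytes, over the limit")

        # Fail fast when all slots are busy rather than queueing indefinitely.
        if not self._slots.acquire(blocking=False):
            raise SearchRejected("too many concurrent searches")
        try:
            return execute(query)
        finally:
            self._slots.release()
```

A caller would wrap each search, for example `guard.run(query_text, run_database_search)`, where `run_database_search` is a placeholder for whatever function issues the query, and would show a "please retry" message when `SearchRejected` is raised.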

Resolution:

The search enhancement release has been rolled back to stabilize the system and restore full functionality. All services are now operational, and additional safeguards have been implemented to prevent similar issues in the future.

We sincerely apologize for the disruption and remain dedicated to ensuring a reliable, high-performing platform for all users.

Posted Dec 11, 2024 - 18:02 EST

Resolved

The issue is resolved. All services are now operational.
Posted Dec 11, 2024 - 12:09 EST

Monitoring

Services are recovering. Messaging and Virtual Paging services should be operational. We're closely monitoring the issue.
Posted Dec 11, 2024 - 12:02 EST

Identified

The issue has been identified and a resolution is being worked on. Users can log into Hypercare, but the On-Call schedules are loading very slowly. Messaging and Virtual Paging services are still non-operational.
Posted Dec 11, 2024 - 11:38 EST

Investigating

We are currently investigating reports of all Hypercare services being inaccessible in the US region. Our Engineering team is working on identifying the root cause.
Posted Dec 11, 2024 - 11:20 EST
This incident affected: United States Region (Virtual Pager (U.S. Region), Messaging (U.S. Region), Notifications and Real-Time Syncing (U.S. Region), File Attachments (U.S. Region), Viewing Who is On-Call (U.S. Region), Code Teams (U.S. Region), Self-serve Scheduling (U.S. Region), Administration and Scheduling (U.S. Region), API & Integrations (U.S. Region)).