Issue: Widespread reports of 502 errors within Eduphoria applications.
Event start time: 9:15am CST
Event end time: 10:20am CST
Status: Resolved
Summary of most recent events:
During a deployment in our regular release cycle, an unforeseen issue caused a brief disruption in service. Our load balancers began returning a higher-than-normal number of “502 Bad Gateway” errors, most likely because of an unusually high number of connections to one of our primary database clusters. Users experienced intermittent 502 errors across Eduphoria applications from 9:15am CST until around 10:20am CST. We resolved the issue by adding server capacity to the affected cluster.
Immediate next steps for Eduphoria
- Scaled up and upgraded database capacity; added hardware
- Run test deployments while monitoring their impact on database connections and error rates
- Additional steps to fully remediate:
  - Implement RDS Proxy to better manage database connections
  - Audit caching and add more
  - Refactor code to reduce connections
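
As a rough illustration of the "reduce connections" step above, the sketch below shows one common approach: a bounded connection pool, where request handlers borrow from a fixed set of connections instead of each opening their own. This is an illustrative sketch only, not the production change. FakeConnection, BoundedPool, handle_request, and run_demo are hypothetical names, and the pool size of 4 is an arbitrary assumption.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # how many connections have been created in total
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def query(self, sql):
        return f"ran: {sql}"


class BoundedPool:
    """Creates max_size connections up front and reuses them."""

    def __init__(self, max_size):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks while every connection is in use; raises queue.Empty
        # if none is returned within the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection for reuse instead of closing it.
            self._idle.put(conn)


def handle_request(pool, request_id):
    """Simulates one request that needs a single query."""
    with pool.connection() as conn:
        return conn.query(f"SELECT {request_id}")


def run_demo(requests=50, pool_size=4):
    """Runs concurrent requests and reports how many connections were opened."""
    FakeConnection.opened = 0
    pool = BoundedPool(pool_size)
    threads = [
        threading.Thread(target=handle_request, args=(pool, i))
        for i in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return FakeConnection.opened


if __name__ == "__main__":
    print(run_demo())
```

With this design, the number of open connections is set by the pool size rather than by the number of concurrent requests, so a traffic spike during a deployment makes requests wait briefly instead of opening new connections against the database.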