Issue: Widespread reports of problems within Aware Online Testing, including prolonged wait times when starting tests and problems submitting assessments
Event start time: 1:45PM CST
Event end time: 3:00PM CST
Status: Resolved; however, additional system infrastructure improvements have been identified to prevent a recurrence.
Summary of most recent events:
1:30PM - Online Testing error rates were recorded at 4%, which is above the critical error rate threshold.
1:45PM - Investigation revealed issues connecting to the Redis cache. As a result, the cache cluster was failed over.
1:50PM - Error rates began falling, but reports of issues related to submitting tests increased.
2:30PM - Cache server DNS entries were updated to the cluster address in sosec, but this produced no change.
2:45PM - Redis cluster was failed back to the instance previously thought to be bad.
3:00PM - Online Testing normalized
Immediate next steps for Eduphoria
Further system infrastructure improvements have been identified to prevent a recurrence of the issues in the failover process.
Specific details about these improvements will be added to this article once available.