Degraded performance for some Moodle sites
Incident Report for Moodle US
Postmortem

Subject: Incident Post-Mortem - Recent Web Server Outages

We'd like to provide a concise post-mortem of the recent web server outages to maintain transparency and illustrate the steps taken to resolve the situation.

Incident Overview:

A series of outages was traced back to a configuration issue that pushed our web server to the limits of its computing capacity, leading to frequent downtime and performance degradation.

Actions Taken:

  1. Configuration Adjustment: We identified and rectified the configuration problem, resulting in an immediate performance boost.
  2. Load Reduction: Our reconfiguration led to a significant reduction in server utilization, which has been maintained consistently during high-traffic periods for over two weeks.
  3. Capacity Expansion: To bolster our infrastructure, we added a new server and doubled the memory of every existing server, ensuring better performance and readiness for future growth.

Next Steps:

Our team will continue vigilant monitoring, improve system resilience, and maintain open communication with you. We apologize for any inconvenience and appreciate your continued support.

Posted Oct 20, 2023 - 16:17 UTC

Resolved
The fixes implemented on October 5th have shown sustained success, and this issue is therefore considered resolved.
Posted Oct 16, 2023 - 17:12 UTC
Update
We are continuing to monitor for any further issues.
Posted Oct 16, 2023 - 01:39 UTC
Update
Engineers are still closely monitoring the infrastructure to ensure fixes implemented last week have permanently addressed the recent issues.
Posted Oct 09, 2023 - 16:47 UTC
Monitoring
We believe we have isolated the culprit for the problems we have been experiencing. We introduced fixes this afternoon that have shown significant performance improvement. The cluster has been stable since early morning, and we are working to make sure it stays that way permanently. We will continue to monitor performance closely.
Posted Oct 05, 2023 - 20:11 UTC
Update
While site performance is currently stable, we continue to investigate the root cause of the issue.
Posted Oct 05, 2023 - 18:02 UTC
Update
Sites are operational and performance is good. We continue to investigate the root cause.
Posted Oct 05, 2023 - 14:09 UTC
Investigating
We are investigating new reports of performance degradation on Moonami shared hosting.
Posted Oct 05, 2023 - 12:48 UTC
Monitoring
Performance has been restored. Engineers are actively investigating to identify the root cause.
Posted Oct 04, 2023 - 13:58 UTC
Investigating
We are currently investigating reports of degraded performance on a number of Moodle sites.
Posted Oct 04, 2023 - 13:20 UTC
This incident affected: Moonami Client Infrastructure (Shared Moodle Hosting).