2025-01-20 Outage Report #609

Open: wants to merge 6 commits into `main`

ThisIsMissEm (Contributor)

This describes the outage that we faced in the week starting 2025-01-20. Based on #608.


#### 2) Spreadsheet created to correctly calculate database connections required `@ThisIsMissEm`

This will be published after review from the Hachyderm team. We'd previously manually run the mathematics set out in `@hazelweakly`'s fantastic [Scaling Mastodon](https://hazelweakly.me/blog/scaling-mastodon/#db_pool-notes-from-nora's-blog) blog post.

Suggested change (from the PR author):
We had previously run the calculations set out in `@hazelweakly`'s fantastic [Scaling Mastodon](https://hazelweakly.me/blog/scaling-mastodon/#db_pool-notes-from-nora's-blog) blog post manually. However, we had not re-run the numbers when we changed our infrastructure.
You can find the spreadsheet to calculate [database connections for Mastodon](https://docs.google.com/spreadsheets/d/1KbJ-TAPLqsRvN91Li-04E2MnV8EYAqlwcplcl_q_TVY/edit?gid=0#gid=0).
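As a rough illustration of the arithmetic such a spreadsheet captures, here is a minimal Python sketch. It assumes a simple per-process model: each Puma worker (`WEB_CONCURRENCY`) can hold up to `MAX_THREADS` connections, each Sidekiq process can hold up to its concurrency (`-c`), and each streaming process can hold up to its `DB_POOL`. The machine names come from this report, but every number, class, and function below is hypothetical, and this is not the spreadsheet's actual formula.

```python
from dataclasses import dataclass


@dataclass
class WebMachine:
    """A mastodon-web host running Puma (hypothetical config)."""
    name: str
    web_concurrency: int  # Puma worker processes (WEB_CONCURRENCY)
    max_threads: int      # threads per Puma worker (MAX_THREADS)

    def connections(self) -> int:
        # Assumed: each Puma worker has its own pool, sized to its thread count.
        return self.web_concurrency * self.max_threads


@dataclass
class SidekiqMachine:
    """A mastodon-sidekiq host (hypothetical config)."""
    name: str
    processes: int    # Sidekiq processes on this host
    concurrency: int  # threads per process (sidekiq -c)

    def connections(self) -> int:
        # Assumed: each Sidekiq thread may hold one connection at a time.
        return self.processes * self.concurrency


def required_connections(web, sidekiq, streaming_processes=0, streaming_db_pool=0):
    """Worst case: every pool on every host fully checked out at once."""
    total = sum(m.connections() for m in web)
    total += sum(m.connections() for m in sidekiq)
    total += streaming_processes * streaming_db_pool
    return total


def fits(required, max_connections, reserved=3):
    """True if the worst case still leaves the reserved superuser slots free."""
    return required <= max_connections - reserved


# Hypothetical numbers, for illustration only.
web_hosts = [WebMachine("fritz", 4, 5), WebMachine("buttons", 4, 5)]
sidekiq_hosts = [SidekiqMachine("franz", 2, 25), SidekiqMachine("freud", 2, 25)]
total = required_connections(
    web_hosts, sidekiq_hosts, streaming_processes=2, streaming_db_pool=10
)

print(total)                             # 160
print(fits(total, max_connections=200))  # True
print(fits(total, max_connections=150))  # False
```

The `reserved` default of 3 mirrors PostgreSQL's default `superuser_reserved_connections`. If a connection pooler such as PgBouncer sits in front of the database, the numbers to compare against are the pooler's settings rather than `max_connections` directly.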


### Things that went well

TODO

Suggested change (from the PR author):
- The infrastructure team was responsive, thanks to members in different time zones.

date: 2025-01-24
title: "Global Outage: Elevated 500 error levels"
linkTitle: "Global Outage: Elevated 500 error levels"
description: "Hachyderm experienced elevated 500 error levels in mid-January 2025. Here is the postmortem, analysis, and writeup on the incident."

Suggested change (from the PR author):
description: "Hachyderm experienced elevated 500 error levels in mid-January 2025. Here is the timeline, analysis, and write-up on the incident."


As part of routine maintenance, we went through each of our machines and applied regular system updates.

Two of the updated packages included `libicu` and `libvips`. The affected machines involved were the `mastodon-web` machines, known as `fritz` and `buttons`, and `mastodon-sidekiq` machines, known as `franz` and `freud`. `nietzsche` is our database server, which we are in the process of replacing.

Suggested change (from the PR author):
Two of the updated packages were `libicu` and `libvips`. The affected machines were the `mastodon-web` machines, known as `fritz` and `buttons`, and the `mastodon-sidekiq` machines, known as `franz` and `freud`. Our database server, which we are in the process of replacing, is known as `nietzsche`.
