A server crash…

Frequent visitors may have noticed that this blog had some outages in the past weeks. Altogether, probably a couple of days. Usually, this blog has very good uptime despite being hosted on my own server at home. Of course, I do need to reboot the server every now and then, but all in all we’re talking about minutes of downtime per month. So a couple of days is huge.

The reason for this downtime was one of those things you always hope won’t happen to you: a hard disk crash. Not just any hard disk, though. It was my system disk, a 500GB SSD from Crucial (MX500). I always expected that one day I’d log in to my server, only to find out that one of my traditional hard disks had failed or was reporting bad sectors. I expected SSDs to be sturdier. After all, there are no moving parts in an SSD. Sure, blocks can only be written so many times, but I shouldn’t have come anywhere near that limit for many years with this drive. I was therefore quite shocked when, a couple of weeks back, I noticed that my DNS server wasn’t working. When I tried to log in to my server, I couldn’t, and when I finally hooked up a screen to the server, it told me that no system disk was found.

I’ve tried every tip on the internet to revive an SSD. I’ve hooked it up to power for 30 minutes without a data connection, disconnected it, let it sit for a minute, repeated that and then checked whether it would show any sign of life. I’ve put it in the freezer, I’ve put it in the oven, but nothing helped. My BIOS doesn’t see the drive anymore. It’s as dead as it can be. There were no prior warnings. It went from 100% working to 100% dead in the blink of an eye. Lesson learned: SSDs can die, and they die suddenly. And when they do, don’t expect to have any possibility of getting your data back.

Did I have backups? Luckily, yes. Even though my backup strategy leaves lots of room for improvement, my data backups were just over a month old. I lost two articles of this blog, but the Wayback Machine and Google cache helped me recover those quite easily. I lost some Home Assistant scripts which I had recently written, which was a bummer, but I’ll survive. All in all, data-wise I was quite fine. My server configuration was another story, though. That backup turned out to be quite ancient. That’s only a problem for the services that I run directly on my server and not inside a container, and I expected that reconfiguring those would cost me little time. I was wrong. My reverse proxy configuration and my systemd files in particular turned out to be quite advanced and not that easy to recreate. Having to do all this again did lead to quite a few lessons learned, though, and to some nice new scripts that I wrote. In the next weeks, I’ll write a couple of articles on particular problems I encountered while reconfiguring my server. I’ve found some serious WTFs and very little documentation, so articles about those might be very welcome to some readers.