A Stupid Failure With a Home Lab – Out of Disk Space

I’ve just had what is probably the most ridiculous failure in my home lab. I found that my media download server was no longer running properly: the services were coming up, but they were failing left, right and centre. I noticed that most of my SMB mounts weren’t mounted, so I mounted them manually, but that didn’t fix the issue. After a bit of digging I realized I was out of disk space.

I knew I was running that machine a bit closer to the wire than I wanted, and that some of the containers liked to store more in their caches than I would prefer, but I was fairly sure I should only be at about 60% drive usage. What I saw when I ran df -h was 100% usage on /. I switched into the /data folder and ran the following command.

sudo du -h --max-depth=1

That showed me exactly what I was expecting. The media services were using a few GB and the mounts were using a few TB, but that’s fine because the mounts aren’t on this machine. This was a complete head scratcher: the media containers weren’t using enough to fill the drive. Calculating the used space on the mounts takes a while and wasn’t helping to diagnose the problem, so I unmounted them all. Then, by a stroke of good fortune, I ran the above du command again to check I was on the right path. Imagine my surprise when /data/mnt came back at 20GB. I dug down into /data/mnt, and what did I find? The downloading service had written a ton of files in there. While the shares were mounted on top of those directories, the files underneath were hidden from du but still counted by df.

Here’s what I think happened. Earlier that day I had rebooted the Proxmox server for a kernel update. I suspect the media download server came up faster than either the DNS server or the file server. When it came time to mount the drives, the mounts failed, either because it couldn’t resolve the names or because the file server wasn’t ready. The container that does the downloading then just started writing files into the plain /data/mnt directory, filling up the (small) drive on the media download server.
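The failure above comes down to a directory that should have a share mounted on it being just an ordinary directory on the local disk. A minimal sketch of a check for that, assuming the mountpoint tool from util-linux is installed; the function name unmounted_dirs and the example paths are made up for illustration and this is not something I have wired into the server:

```shell
# Print every directory from the argument list that is NOT currently a mount point.
# No output means every listed path has a filesystem mounted on it.
unmounted_dirs() {
    local dir
    for dir in "$@"; do
        # mountpoint exits 0 only when something is mounted on $dir.
        # If the tool is missing, the path is reported too, which errs on the safe side.
        if ! mountpoint -q "$dir"; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example use (hypothetical paths and start command):
#   missing=$(unmounted_dirs /data/mnt/films /data/mnt/tv)
#   [ -z "$missing" ] && docker compose up -d
```

A startup script could refuse to start the containers whenever this prints anything, so a failed mount results in stopped services rather than a full disk.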

I’m not sure what the best solution for this is. Ideally I want the media download server to not start its containers unless the SMB mounts have worked. I can’t think of a good way to do this yet, though. For now I’m going to put a delay on the media download and media servers. This is described in the Proxmox manual. Somewhat strangely, I’m actually going to set the order and delay on the file server and DNS server rather than on the media servers. By setting an order of 1 and a delay of 30 seconds on, for example, the DNS server, all the unconfigured servers will start after it, as they always come last. If more than one server has an order of 1, they will be started in ID number order.
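As a rough illustration of that ordering, the same setting can also be applied from the Proxmox host's shell with the startup option. The guest IDs below (101 for the DNS server, 102 for the file server) are made up, and giving the file server an order of 2 is just one possible choice. qm is for VMs; pct takes the same option for LXC containers.

```shell
# Run on the Proxmox host. Guest IDs 101 and 102 are hypothetical.

# DNS server: start first, then wait 30 seconds before the next guest is started.
qm set 101 --startup order=1,up=30

# File server: start second, again followed by a 30 second pause.
qm set 102 --startup order=2,up=30

# For an LXC container rather than a VM:
# pct set 102 --startup order=2,up=30
```

Guests without a startup order, such as the media servers here, are then started after these two.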

Now I need to unscrew-up my mnt folder.