Not much more than a week after I commissioned the "new to us" server, one of the drives dropped out of the array. I've had problems with this drive before, which I thought were down to me not cabling it 100% correctly, but nope, that sucker is definitely dead... horrific noises during read attempts and everything. How disappointing: I replaced three dying 7+ year old 1TB disks with some brand new drives, and one of them dies after mere months? It happens, I guess.
After a few attempts to resuscitate it, I declared the patient a lost cause and set about working out how to RMA it. I started the process directly via WD, and they wanted me to ship the thing to Viet Nam! Instead I opened a ticket with PC Case Gear, where I bought it, and where I buy pretty much everything I can because their warranty support is excellent.
"Yep, here's where to ship it and an RMA number" was the reply on Monday, so I stuck it in a box and sent it off, and a bit over a week later the replacement arrived. During this time, my boss kept sending me articles that made me question my choice of raidz (single parity, three disks). In particular, one article reckoned that even with raidz (so not just RAID5), there was a better chance than not that the array wouldn't make it through the resilver process without an uncorrectable read error (not a huge deal, I have backups!), and that the performance of the entire array would be abysmal for the whole process, which could take DAYS for multiple terabytes.
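For the curious, here is a rough sketch of the sort of arithmetic these articles tend to lean on. I'm assuming the commonly quoted consumer-drive spec of one unrecoverable read error per 10^14 bits read, and that errors are independent. Neither figure comes from the article my boss sent, and the function name is just something I made up for illustration.

```python
import math


def p_at_least_one_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one unrecoverable read error (URE).

    Assumes every bit fails independently at `ure_per_bit`, which is a
    simplification; the 1e-14 default is an assumed datasheet-style figure.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p doesn't round away.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Amounts read from the surviving disks during a rebuild, in TB.
    for tb in (1, 2, 3, 12):
        print(f"{tb} TB read: {p_at_least_one_ure(tb * 1e12):.1%}")
```

Under those assumptions the odds only tip past 50% once you are reading somewhere around 9TB, so the scary headline numbers depend heavily on array size and on how literally you take the datasheet error rate.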
So I started the resilver process and let it run, and was pleasantly surprised. It took about 5 hours to scan nearly 3TB (just under 2TB of actual data, and about 800GB of parity data) and resilver about a third of that. It finished entirely uneventfully, and I was able to saturate my gigabit ethernet with reads from the array during the process. No idea what all the hubbub was about. I'm extremely happy with how the whole ordeal went: my raidz array did exactly what I wanted it to the entire time. I did not have to rebuild the array from backups, and I was able to access it with full performance at any point during the operation.
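To put those numbers in perspective, here is the back-of-the-envelope throughput, rounding "nearly 3TB" up to 3TB and "about 5 hours" to exactly 5. Both are approximations of my figures above, not measurements, and the helper name is made up for this sketch.

```python
def rate_mb_per_s(bytes_moved: float, hours: float) -> float:
    """Average throughput in MB/s (decimal megabytes) over a duration."""
    return bytes_moved / (hours * 3600) / 1e6


SCANNED_BYTES = 3e12  # "nearly 3TB", rounded up (approximation)
HOURS = 5             # "about 5 hours" (approximation)

scan_rate = rate_mb_per_s(SCANNED_BYTES, HOURS)
# Roughly a third of what was scanned got written to the new disk.
resilver_rate = rate_mb_per_s(SCANNED_BYTES / 3, HOURS)
# Theoretical ceiling of gigabit ethernet, ignoring protocol overhead.
gigabit_ceiling = 1e9 / 8 / 1e6

if __name__ == "__main__":
    print(f"scan:     {scan_rate:.0f} MB/s across the pool")
    print(f"resilver: {resilver_rate:.0f} MB/s onto the new disk")
    print(f"gigabit:  {gigabit_ceiling:.0f} MB/s at best")
```

So the pool averaged more scan throughput than a gigabit link can carry, while still serving network reads at line rate.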
So I'm back to full redundancy, and I'll work on taking backups seriously and actively monitoring SMART data now.