Mistakes and Swiss Cheese Ruining My Day (and Some Podcast Recordings) – Now, With a Good News Epilogue
The other day, I had an external hard drive completely fail. I lost the last 10 days' worth of data. That wasn't supposed to happen. I had protections and backups in place… yet something still went horribly wrong.
If you work in healthcare, you probably know about the “Swiss Cheese” model for medical errors, as illustrated below (thanks to the work of James Reason and others in the field):
When we see errors in healthcare, it's quite often due to MANY different things all having to go wrong at once.
“James Reason proposed the image of ‘Swiss cheese’ to explain the occurrence of system failures, such as medical mishaps [1–5]. According to this metaphor, in a complex system, hazards are prevented from causing human losses by a series of barriers. Each barrier has unintended weaknesses, or holes – hence the similarity with Swiss cheese. These weaknesses are inconstant – i.e., the holes open and close at random. When by chance all holes are aligned, the hazard reaches the patient and causes harm (Figure 1). This model draws attention to the health care system, as opposed to the individual, and to randomness, as opposed to deliberate action, in the occurrence of medical errors.”
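To put rough numbers on that metaphor, here is a minimal Python sketch. It assumes each barrier's hole is open independently of the others, which the quote's "at random" suggests but real systems rarely guarantee. The layer probabilities and function names are hypothetical, chosen only for illustration.

```python
import random


def hazard_gets_through(hole_probs, rng):
    """Return True when every barrier happens to have an open hole at once."""
    return all(rng.random() < p for p in hole_probs)


def exact_failure_rate(hole_probs):
    """Chance that all holes line up, assuming the barriers are independent."""
    result = 1.0
    for p in hole_probs:
        result *= p
    return result


def estimate_failure_rate(hole_probs, trials=100_000, seed=0):
    """Estimate the same chance by simulating many random hazards."""
    rng = random.Random(seed)
    hits = sum(hazard_gets_through(hole_probs, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Hypothetical chance that each barrier's hole is open (not measured data).
    layers = [0.3, 0.2, 0.5]
    print("exact:    ", exact_failure_rate(layers))
    print("simulated:", estimate_failure_rate(layers))
```

Under that independence assumption, every extra barrier multiplies the risk down, but it never reaches zero unless some barrier has no hole at all.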
When recording podcasts through Zoom, one of my biggest fears is losing the recorded audio and video files.
Back in 2011, I recorded a podcast with Eric Ries about his book The Lean Startup. I remember recording it through Skype onto a Windows laptop. Within hours of the recording session, the hard drive crashed. I didn't have Dropbox or other automated backup systems running continuously… so the files were gone.
I had to apologize to Eric and, thankfully, he was gracious enough to record the episode again.
Unfortunately, I had a situation this week where I thought I had lost everything I had recorded over the last ten days.
In recent years, this has been my process, along with some of the countermeasures that were intended to prevent data loss.
I record the podcasts using Zoom and I choose to record locally on my computer. Recording to the Zoom cloud would provide extra security, but Zoom cloud recording puts all of the audio into a single track. Recording locally produces multiple tracks, and that improves the audio quality because I can balance the two tracks against each other if necessary.
Recording onto one of my two Mac computers (an iMac and a MacBook Pro), I normally use a “belt and suspenders” approach. I run the Apple Time Machine software continually on the iMac and at least weekly on the MacBook Pro.
Additionally, I run Backblaze software that backs up, pretty much continuously, to a different cloud system. I also keep the files in Dropbox-linked folders, as that provides a triple-redundant backup system.
With my MacBook Pro, the built-in solid state (SSD) storage starts getting full, even with a 256 GB system. So, I've been keeping many files, including the recordings of podcast episodes that have not yet been published, on a 4 TB external hard drive.
As I well know, the biggest problem with traditional hard drives is that they crash.
One nice thing about Backblaze is that it continually backs up external drives (something that Time Machine does not do). So, it mitigated that risk.
Last week, I was back in Texas, where I tend to use my iMac, especially for recording podcasts (it has the nice 27″ screen, my good podcasting microphone is plugged in there, etc.).
I recorded many podcasts — and lost them all due to a hard drive crash… (UPDATE: Thankfully, I was able to recover them)
- 3 episodes of “Habitual Excellence“
- 4 episodes of “My Favorite Mistake“
- 2 short 5-minute segments with other Lean podcasters for an episode that features many podcasters talking about their new series
As I recorded, I put the episodes onto the external hard drive, as I knew I'd be doing the editing when I got back to California (where I sit now).
Since the iMac has Backblaze and the external hard drive's previous home had been that iMac… I thought the external drive was getting backed up. It was not. I had “moved” it to the MacBook Pro as its home (since I'm in California almost all of the time right now).
That was mistake #1 — relying on the external hard drive and NOT also keeping the files on the iMac or in Dropbox. I could have copied the files, but I moved them to the external drive.
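The "copy, don't move" habit can be sketched in a few lines of Python. This is only an illustration, not the process I actually used: the function names and folder layout are hypothetical, and it assumes the backup folder sits outside the source folder. It copies every file and then compares checksums, so the original is never deleted and a bad copy gets noticed right away.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir, backup_dir):
    """Copy every file to backup_dir, keep the originals, and verify each copy.

    Assumes backup_dir is NOT inside source_dir (hypothetical layout).
    """
    source_dir = Path(source_dir)
    backup_dir = Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = backup_dir / src.relative_to(source_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy, never move: the original stays put
        if sha256_of(src) != sha256_of(dest):
            raise IOError(f"Copy of {src} does not match the original")
        copied.append(dest)
    return copied
```

Calling something like `copy_and_verify("recordings", "second-copy/recordings")` (both paths made up) would leave two verified copies, so a single drive failure is no longer a single point of failure.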
Mistake #2 was me misunderstanding how Backblaze worked and not double checking it.
The thing that “just went wrong” was a complete failure of that external hard drive. I could not read the files. The macOS Disk Utility app gave an error… it showed the disk as being “uninitiated.” The disk seemed dead and unrecoverable.
You might say “well, hard drive crashes happen,” but I had suspected for maybe two weeks that the drive was starting to go bad. You can sort of tell from the sound… and the way the drive felt when I picked it up… it was like the main “platter” in the drive was messed up somehow… a physical failure more so than a file corruption issue.
Mistake #3 was NOT moving more quickly to another drive… getting a new hard drive or getting an external SSD. I should have acted on those suspicions.
Many mistakes… had I not traveled, the external drive would have been connected to my MacBook Pro and it would have been backed up. If the disk had failed a few days later… it would have been backed up. If I had connected the external drive to my MacBook Pro in Texas… it would have gotten backed up.
These are the slices of Swiss cheese that lined up just right for my situation to get really bad.
I've contacted my guests… apologized profusely… and owned up to what I described as “a technology problem AND a process problem.” This was a preventable situation. I don't blame the hard drive… I blame my bad management of the situation.
I bought a 2 TB external SSD. These drives are more expensive, but they are certainly more reliable than older hard drive technology. I can recover all of my data, via Backblaze, except for that last week's worth of podcasts.
I also better understand how Backblaze works, so I won't make that mistake of assuming that the iMac's Backblaze software also backs up that external drive.
I'll learn from this series of mistakes. We're all human. We all make mistakes… but I still feel bad about not preventing this one. I learned from the Eric Ries podcast… making sure I didn't have a single point of failure… until I mistakenly ended up in that same situation again.
Update: I was able to recover the files using some $40 software called “Data Recovery Essential.” The software could see that there was data on the drive. It was more likely “corrupted” than “failed altogether.” It ran a scan overnight (this much was free). Then, after a small test to confirm that some audio files could be restored, I was able to restore all of it for just $40.
One other countermeasure — I’m currently trying some Mac data recovery apps, ones that claim you don’t pay unless they can recover the data. I’d much rather pay $79 or $99 than have to re-record…
Hi Mark. I hope your heart rate has been able to go down since you recovered your files. This was a fascinating read, especially as someone who sets up plenty of backups themselves. The Swiss cheese theory is a perfect visualization of the risk vs. reward calculation that every person makes in a decision. I also think the size of the holes can be fully dependent on the person’s perception of the risk.
I hope this doesn’t paint a bad picture of my driving, but I think I can relate this best to the odds of getting a speeding ticket: the first layer is how fast you’re going; the slower you are, the smaller the hole. The next layer is the speed limit. On I-95? Probably going to have smaller odds than a school zone. Another layer could be the amount of traffic, etc. If you follow all these rules, you maximize your odds of having a safe and ticket-less trip. The unfortunate reality is every so often those holes can still line up perfectly and cause harm.
Thanks, Nick. It was a very stressful 24 hours. I do consider what happened to be a severe “near miss.”
Even though the end result was good (no apparent data loss)… I’m still going to take the same countermeasures as I had already been thinking.
And you mention our human ability to perceive risk… we’re not very good at that.
I know I definitely have been in that situation where your heart stops beating and your stomach drops after you realize what happened. This is my first time hearing about the Swiss cheese theory and it was definitely an interesting read. You could do everything right but there is still a possibility that something could go wrong, and the Swiss cheese theory illustrates that. The Swiss cheese theory is all about your perception of how big you think the risk is in each scenario. The bigger the risk, the more likely the “Swiss cheese holes” will line up. The reality is that the risk could be so small yet there is still a chance the holes can line up. Thanks for sharing and showing we are all human!