Why Would You Want to Validate a Spreadsheet?
Spreadsheets are everywhere. Their flexibility, ease-of-use, and instant visual feedback make them the go-to choice for many companies and people who want to do things with data.
But you know that already. If you’re reading this, you’ve probably interacted with a spreadsheet sometime this week, if not today. You might even be pivoting a chart right now. Chances are, you also have some idea of how unreliable spreadsheets can be.
Depending on your industry, you might have heard that “Your Spreadsheets need to be Validated for Compliance”. Sure, validation is required by the various GxPs and 21 CFR Part 11, so that's why most of us would validate a spreadsheet.
Validation
: Action of proving that a process leads to the expected results. Validation of a computerised system requires ensuring and demonstrating the fitness for its purpose.
-
OECD GLPs #17: A2: Validation
The regulatory frameworks can bring in specific requirements that we need to incorporate and validate in computerised systems and software, like audit trails and security.
But while "because I have to" is a perfectly valid answer, it's not a very satisfying one. After all,
compliance is just one of many inputs
to the design of a quality process, not a reason unto itself.
This week we ask: why would we want to validate a spreadsheet, regardless of compliance requirements? What do we get out of the deal?
You’ve all Heard the Horror Stories
You’ve all Heard the Horror Stories
In May of 2021, a single spreadsheet error cause a $10.5 million mistake for Crypto.com. They joined a growing list of infamous spreadsheet disasters: JP Morgan Chase lost
$6 billion
in 2012’s “London Whale” incident that was partly caused by cut-and-paste errors; Enron’s 2001 implosion was helped along by thousands of inaccurate spreadsheets; Barklays sent an offer to purchase to another firm that was generated from a spreadsheet, which had hidden - instead of deleted -
179 unwanted contracts
that ended up in the final offer.
The list goes on.
Forbes claims that up to 88 percent of all spreadsheets have “significant” errors in them. And I believe it.
Why is that? What’s so special about spreadsheets?
Spreadsheets are Easy, Flexible, and Everywhere
Spreadsheets are Easy, Flexible, and Everywhere
Spreadsheets are powerful tools that can be used for various purposes - as a calculator, a database, a reporting tool, quick visualizations, data exploration, a tool for data extraction and transfer, or even as a complex program that automates any of these repetitive tasks.
Excel is, in fact, a full programming language, albeit a functional and visual one that is very easy for non-programmers to learn and use.
It’s also very easy to introduce hidden bugs.
Coupling of Data and Functionality
Coupling of Data and Functionality
In most traditional programming languages you have a separation between your data and the program that runs on the data. For example your program code might be in a text file and your data might sit in a database. Changing the program and changing the input data are two separate activities.
Depending on the language, there is one program or script that can be re-used on different data sets without making separate copies of the code base. So, for example, you can test the exact same program against a test data set and your real data, and then again on as many data sets you want. If your language
compiles
to an executable, or you have access to code analysis tools (called
static
analysis tools), you can also get information about the correctness of your program and even best coding practices before you subject it to the data at all.
In contrast, spreadsheets generally have
the things that you program with
- the formulas, formatting, buttons, etc. -
all configured in the same interface
as the raw data. That may not sound like a big deal, but contemplate this: one cell can be used for data input, array input, formula input, result display and formatting, number format conversions, actual value rounding, as well as being the reference input data for other formulas, functions, conditional formatting, external code, charts, and so on and so forth.
While this makes for a great, instant feed-back-loop that lets you try different things to manipulate and visualize your data, it makes the long term stability of the code base very difficult. This is especially true if you're trying to re-use the formulas and other mechanisms to analyse different or evolving data sets.
Data Entry as an Extreme Sport
Data Entry as an Extreme Sport
So every time you enter data, you’re potentially modifying the underlying program! In fact there are a whole laundry list of ways that you can unintentionally change the functionality of the spreadsheet while just entering data. Just a few examples:
•
An accidental drag-and-drop can change the cell references for some other formula. This is especially easy to miss when the modified formula is off-screen somewhere like another worksheet.
•
Similarly, let’s say you entered data into the wrong place, and then cut-and-pasted it into the correct cells. Surprise! you’ve just changed where a formula or chart is getting its data.
•
Inadvertent selection of multiple sheets or spanning hidden cells during input.
•
Pasting-as values where formulas once were.
And this is before you even consider conditional formatting rules, named areas, macros, add-ins and other hidden sources of functionality and side effects. You can even manipulate the data using macro code that is
local to the user
rather than
local to the file
.
To put it another way: Every time someone opens and closes a spreadsheet, it could be doing something different to the data!
Spreadsheets: The Schrödinger's Cat of Data Analysis
Spreadsheets: The Schrödinger's Cat of Data Analysis
So how are you going to use the results from a spreadsheet in some critical application? How can you trust that data in a safety study, or to calculate a dosage, or to make critical business decisions with?
Did it work as intended? How about the intermediate calculations? How do you know you didn’t accidentally drag and drop an equation into the wrong cell? How do you know the red warning flags are going to show up whenever the results are out of range?
If you’re using them in a regulated facility, for compliance, or for other high-risk situations you are going to need a similarly high level of data integrity and assurance.
That might mean careful design and testing.
You Don't Need to Prove that Excel Can Add.
You Don't Need to Prove that Excel Can Add.
Though I've heard enough people argue to the contrary (just search for any Hacker News discussion on using Excel for stats...), you don’t need to prove that Excel's functions work properly. That was Microsoft's job, and you trusted them when you went all-in on using Excel to automate your nuclear-warhead business. I hope you did your vendor qual.
Now, whether Excel's the right tool for the job is a topic for another day.
What you
do
need is to test all of the “configuration” you have done - the calculations, custom code and visual indicators that you’ve introduced into the spreadsheet. The things that you rely on to be correct every time. The critical stuff. A risk-based approach is all the rage these days.
But you'll also need to keep up assurances throughout it's use in the wild - to make sure it doesn't morph and mutate like a frog living near Monty Burns' power plant.
Hmm. But wait. If you go through and QC every calculation, every data reference, etc. every time you use the spreadsheet you might as well be doing it by hand! How do you get that high level of assurance before that data gets used in a critical report or decision?
Front-Loading the Effort
Front-Loading the Effort
That’s where having a
validated template
comes in. Treat your spreadsheet like you're developing custom software: Document the core user requirements and intended functionality in a requirements specification; Learn and implement good development practices; Use built in controls and best practices to lock things like data input areas down; Put together test data sets that test normal, borderline and failure cases against the written requirements. Rinse and repeat.
What we’re doing here is front-loading all that QC work to the design and implementation phase by testing it (at various levels) against the specifications.
Once we have a template that provably meets the specifications, we lock it down and put it under version control. We release a controlled version, and only allow that verified template to be the starting point of each use. If you can't do that, you might need to have some way to test that any versions 'in the wild' haven't drifted away from correct behaviour.
This gives you freedom from having to worry as much about QC and testing
of the functionality
every time you put new data in the spreadsheet… instead the work is done once, up front. Done properly you now have a template that you can trust and reuse over and over again, and can focus instead on quality control of the
input data
.
A Risky Business
A Risky Business
We've already talked about how flexible spreadsheets are. The amount of effort that you need to put into development, testing and/or validation will really depend on the application.
For each spreadsheet you might need to use a combination of testing, development best practices, templating, external controls and ultimately validation. And how complicated that validation process is really depends on the spreadsheet - there are those that are easy to test and validate, and ones that are near impossible. Knowing your requirements up front and following some best development practices will help keep your spreadsheets in the 'easy' category.
Finally, put some effort up-front in evaluating the risks posed by any spreadsheets used in critical areas. This will allow you to decide which ones need validating, which need a few tests and a keen eye, and which just need a little bit of love.
Thanks for reading! Oh, and if you'd like more help on this topic, I offer a
range of consulting options and workshops
from practical advice for designing and building spreadsheets to coaching your team through validation best practices.
Until next time,
– Brendan
Subscribe to the Daily HaiQu!
Join me every weekday as we take a few minutes to explore, design, test and improve the critical systems we use in our regulated facilities. From spreadsheets and software to SOPs and forms and beyond.
We'll never share your email. Unsubscribe any time.