Spreadsheetology and scientific research

Thursday, August 25, 2016
by Scott Campbell

BBC News reports that some genetic scientists have run up against a problem when using the Excel spreadsheet, which would "helpfully", automatically, and presumably without notification alter the data within a column. For example: "Gene symbols like SEPT2 (Septin 2) were found to be altered to 'September 2'."

Well, we've all had word processors, spreadsheets, and our smartphones autocorrect what we type. So just how bad is the problem? According to the researchers who collated over 3500 published genetic research papers, about one-fifth were affected: "704 of those papers contained gene name errors created by Excel".

Goodness. That's a lot. I would hope that these errors haven't led to any improper research conclusions, although this wasn't part of the study which revealed the problem. Apparently Excel has been "co-authoring" scientific research like this for over a decade, and it has been getting worse instead of better over time.

In Microsoft's defense, Excel is probably not the best choice for a genetic research database, but I suspect it is the most widely and cheaply available option even for those who know it isn't ideal. It's also true that a relatively simple configuration change would probably make the whole mess go away. The default for Excel is to autocorrect whatever it can. This can be disabled, but you have to be aware of that option in the first place.
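For anyone who suspects a gene list has already passed through a spreadsheet, a quick check for date-like entries can at least flag the damage. The following is only a minimal Python sketch: the pattern, the function name `flag_date_like`, and the sample column are my own illustrative assumptions, not something taken from the study, and the pattern certainly does not cover every format a spreadsheet might produce.

```python
import re

# Month names or abbreviations, optionally followed by more letters
# (so "sep" also covers "sept" and "september").
MONTHS = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"

# Assumed date-like shapes: "2-Sep", "1/Mar", "September 2", and similar.
# This is an illustrative guess, not an exhaustive list.
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}[-/ ]{MONTHS}|{MONTHS}[-/ ]\d{{1,2}})$",
    re.IGNORECASE,
)


def flag_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


# Hypothetical column: intact symbols mixed with converted ones.
column = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "TP53", "September 2"]
print(flag_date_like(column))  # ['2-Sep', '1-Mar', 'September 2']
```

Intact symbols such as SEPT2 and MARCH1 pass through because they have no separator between the letters and the number. A check like this only detects the problem after the fact; it cannot say which gene "2-Sep" originally was.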

Trusting spreadsheets (or computer software in general) to do what we mean or what we want, instead of exactly what we say, is hardly a 21st-century problem. Over on the Risks Digest, which has been tracking computer-related risks since the mid-1980s, warnings about "Spreadsheetology" have been around almost from the beginning.

In this case, I'm also reminded of Clippy, Microsoft's unpopular, antagonistic, animated paperclip avatar that was first included with Office 97 nearly 20 years ago. For those who don't remember, it would pop up occasionally and offer to help with various tasks, like composing a letter. Microsoft felt the avatar would provide an emotional connection for users, making the whole computing experience more friendly. Unfortunately, it was notoriously unhelpful, thus widely hated and parodied, and eventually disappeared. Or maybe it went to grad school. You know the old saying: publish or perish. Maybe it's Clippy PhD now and has been secretly just trying to help and get a few publications. Know any post-doc openings for a paperclip?

It looks like you're doing genetic research. I can help! Can I auto-correct your genetic symbols?