Data, death threats and that $25 million lawsuit: How data falsification has evolved!
If data is the new oil, then data scandals are almost inevitable. Last month, a judge dismissed the defamation claims in the lawsuit brought by Francesca Gino, a former Harvard Business School professor, author and expert on honesty, who had been placed on unpaid leave over falsified data. Data tampering is nothing new, but Gino's resort to the strong-arm threat of financial ruin, in the form of a $25 million lawsuit, is a shocking development. The days of restoring a reputation by publishing a rebuttal paper may be over.
A systemic problem
The publish-or-perish culture of academia is well known. The 2010s brought a methodological crisis, with a series of accusations that authors were getting a bit creative with their data to stay ahead in the career game. Massaging figures to push p-values, the measure of statistical significance that journals expect, below the conventional 0.05 threshold was common. Large research companies such as Bayer suggested the problem was endemic: only around 20% of their attempts to replicate published findings succeeded, wasting millions in research every year.
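To see why massaging analyses towards p < 0.05 undermines replication, here is a minimal Python simulation of one common form of p-hacking: measuring several outcomes and reporting whichever one first comes out significant. It is an illustration under stated assumptions (normally distributed data with a known standard deviation, a simple two-sample z-test, and no true effect at all), not a reconstruction of any study mentioned here; the function names are hypothetical.

```python
import math
import random


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known common sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    # 2 * (1 - Phi(|z|)) simplifies to erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=30, alpha=0.05, seed=0):
    """Share of simulated null studies that report at least one 'significant' outcome.

    Every outcome is pure noise: both groups are drawn from the same N(0, 1)
    distribution, so any significant result is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if z_test_p_value(a, b) < alpha:
                hits += 1
                break  # stop at the first outcome that "works" and report that one
    return hits / n_studies


if __name__ == "__main__":
    print(f"1 outcome:  {false_positive_rate(1):.3f}")
    print(f"5 outcomes: {false_positive_rate(5):.3f}")
```

With a single outcome, the share of "significant" null studies should sit near the nominal 5%; with five independent outcomes it should approach 1 - 0.95^5, about 23%, even though nothing real is being measured. Findings produced this way are exactly the kind that later fail to replicate.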
In 2021, a new scandal emerged. Francesca Gino had been investigated not for massaging her data but for falsifying it, and was found responsible. She, however, was not going to bow out quietly.
The undoing of the Honesty Professor
It began with PhD student Zoe Ziani, who was researching networking. Reading a paper co-authored by Gino, which claimed that networking made people feel physically dirty, Ziani was surprised by the findings. She scrutinised the data to see how the authors had reached such a strange conclusion, and immediately noticed that some of the p-values didn't add up and that the encoding of the categorical variables was irregular. Ziani did not want to cite the paper in her own research: its findings were odd and the results seemed "cherry-picked".
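One standard way to check whether reported p-values "add up" is to recompute each one from the test statistic and degrees of freedom printed alongside it, the idea behind tools such as the R package statcheck. The Python sketch below does this for a two-sided t-test using only the standard library (Simpson's rule on the t density). The function names and the rounding tolerance are our own assumptions, it ignores rounding of the reported t value itself, and it is not a description of how Ziani checked the paper.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even) and uses symmetry: p = 1 - 2 * P(0 <= T <= |t|).
    """
    x_max = abs(t)
    if x_max == 0:
        return 1.0
    # Log of the density's normalising constant; lgamma avoids overflow at large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = x_max / steps
    total = density(0.0) + density(x_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def p_value_consistent(t, df, reported_p, decimals=2):
    """True if reported_p matches the recomputed p once rounding to `decimals` is allowed for."""
    tolerance = 0.5 * 10 ** -decimals + 1e-12
    return abs(t_two_sided_p(t, df) - reported_p) <= tolerance


if __name__ == "__main__":
    # Hypothetical reported results: "t(30) = 2.50, p = .02" and "t(30) = 2.50, p = .01"
    print(p_value_consistent(2.50, 30, 0.02))  # True
    print(p_value_consistent(2.50, 30, 0.01))  # False
```

A mismatch like the second line is not proof of wrongdoing (typos happen), but a paper with several of them is worth a closer look at the underlying data.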
Ziani's supervisors disagreed and insisted she cite it anyway, a reflection of the strict hierarchy that still rules academia. Ziani then turned her work over to Data Colada, a blog run by academics who investigate data falsification. After a thorough review of the original data set, the Data Colada team agreed with Ziani. Not only did the complaint against this paper withstand rigorous scrutiny, it also matched a series of concerns already raised about other published papers for which Gino had been the main data collector.
Harvard University, Gino's employer, reached the same conclusion after its own internal investigation, which produced a 1,200-page report, and placed Gino on unpaid leave. That should have ended a shameful chain of events, but Gino chose to challenge the decision, claiming that male researchers who had done similar things were tolerated. She also denied any intent to mislead.
How do you falsify data or spot falsified data?
There are different ways to falsify data. Manipulating and cleaning data is an everyday task for a data scientist, but the version history of the data sets suggested that rows had been added after sorting, at a point when unfavourable results would have been evident, and that some rows had been changed after sorting in ways that manipulated the outliers. On their own, these observations are suspicious but not proof of fraud. When the changes have a strong effect on the results, though, they point to deliberate doctoring of the data. In short, the data was not just cherry-picked; it was falsified.
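The sorting anomaly described here can be checked mechanically. If a file is supposed to be sorted by condition and then by participant ID, any row that breaks the ID sequence within its condition deserves a closer look. The Python sketch below flags rows that fall outside a longest increasing subsequence of IDs in each condition, then compares the treatment-minus-control difference in means with and without the flagged rows, a rough way to see whether suspicious rows drive the result. The data, column layout and the "control"/"treatment" labels are hypothetical, and this is a simplified illustration, not a reproduction of the Data Colada analysis.

```python
from bisect import bisect_left
from statistics import mean

# Hypothetical data: (participant_id, condition, outcome), supposedly sorted
# by condition and then by participant_id.
ROWS = [
    (1, "control", 3.0),
    (2, "control", 3.2),
    (3, "control", 2.8),
    (4, "control", 3.1),
    (5, "treatment", 3.3),
    (6, "treatment", 3.0),
    (51, "treatment", 6.5),  # breaks the ID sequence
    (7, "treatment", 3.1),
    (8, "treatment", 2.9),
]


def not_in_longest_increasing(ids):
    """Positions in `ids` that are not part of one longest strictly increasing subsequence."""
    tails, tail_pos = [], []  # smallest tail value / its position for each subsequence length
    prev = [-1] * len(ids)    # back-pointers used to rebuild the subsequence
    for i, x in enumerate(ids):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tail_pos.append(i)
        else:
            tails[k] = x
            tail_pos[k] = i
        prev[i] = tail_pos[k - 1] if k > 0 else -1
    keep = set()
    j = tail_pos[-1] if tail_pos else -1
    while j != -1:
        keep.add(j)
        j = prev[j]
    return [i for i in range(len(ids)) if i not in keep]


def flag_out_of_order(rows):
    """Row indices whose participant_id breaks the ascending order within their condition."""
    by_condition = {}
    for idx, row in enumerate(rows):
        by_condition.setdefault(row[1], []).append(idx)
    flagged = []
    for indices in by_condition.values():
        ids = [rows[i][0] for i in indices]
        flagged.extend(indices[p] for p in not_in_longest_increasing(ids))
    return sorted(flagged)


def mean_difference(rows, exclude=()):
    """Treatment mean minus control mean, optionally leaving out some row indices."""
    excluded = set(exclude)
    kept = [r for i, r in enumerate(rows) if i not in excluded]
    treat = [y for _, c, y in kept if c == "treatment"]
    ctrl = [y for _, c, y in kept if c == "control"]
    return mean(treat) - mean(ctrl)


if __name__ == "__main__":
    flagged = flag_out_of_order(ROWS)
    print(flagged)                                   # [6]
    print(round(mean_difference(ROWS), 3))           # 0.735
    print(round(mean_difference(ROWS, flagged), 3))  # 0.05
```

In this toy example the single out-of-sequence row accounts for almost the entire apparent effect. In real data an out-of-order row can have an innocent explanation, so a flag is a prompt for scrutiny rather than proof.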
Gino is not the only academic under scrutiny, but anti-fraud work tends to be carried out by dedicated individuals, like the Data Colada team, on top of their day jobs. Harvard has almost bottomless funds to fight lawsuits, but that is not true of everyone. Fortunately, the Data Colada team were supported through a GoFundMe campaign, but the threat of financial ruin is not the only risk they face. While investigating a different researcher's paper, the team received a message that the FBI agreed was a death threat. It made clear, in no uncertain terms, what would happen if they went ahead with publication. It is not known who sent the threat, but the paper in question has since been retracted.
Learn more
There is a clear line between cleaning up data and intending to mislead. As new legislation rolls out, data decisions may increasingly need to be defended in court. We have a webinar on 17 October on ways to document your work to support the challenging task of reproducibility. Sign up below: