Sloppy Science: Still Someone Else’s Problem?

“The Somebody Else’s Problem field is much simpler and more effective, and what’s more can be run for over a hundred years on a single torch battery… An SEP is something we can’t see, or don’t see, or our brain doesn’t let us see, because we think that it’s somebody else’s problem… The brain just edits it out, it’s like a blind spot.”

Douglas Adams (1952–2001), Life, the Universe and Everything

The very first blog post I wrote (back in March 2013), for the Institute of Physics’ now sadly defunct physicsfocus project, was titled “Are Flaws in Peer Review Someone Else’s Problem?” and cited the passage above from the incomparable, and sadly missed, Mr. Adams. The post described the trials and tribulations my colleagues and I were experiencing at the time in trying to critique some seriously sloppy science, on the subject of ostensibly “striped” nanoparticles, that had been published in very high profile journals by a very high profile group. Little did I suspect it at the time of writing the post, but that particular saga ended up dragging on and on, involving a litany of frustrations in our attempts to correct the scientific record.

I’ve been put in mind of the stripy saga, and that six-year-old post, for a number of reasons lately. First, the most recent stripe-related paper from the group whose work we critiqued makes absolutely no mention of the controversy. It’s as if our criticism never existed; the issues we raised are simply ignored in their latest work.

More importantly, however, I have been following Ken Rice‘s (and others’) heated exchange with the authors of a similarly fundamentally flawed paper very recently published in Scientific Reports [Oscillations of the baseline of solar magnetic field and solar irradiance on a millennial timescale, VV Zharkova, SJ Shepherd, SI Zharkov, and E Popova, Sci. Rep. 9 9197 (2019)]. Ken’s blog post on the matter is here, and the ever-expanding PubPeer thread (225 comments at the time of writing, and counting) is here. Michael Brown‘s take-no-prisoners take-down tweets on the matter are also worth reading…

The debate made it into the pages — sorry, pixels — of The Independent a few days ago: “Journal to investigate controversial study claiming global temperature rise is due to Earth moving closer to Sun”.

Although the controversy in this case is related to physics happening on astronomically larger length scales than those at the heart of our stripy squabble, there are quite a number of parallels (and not just in terms of traffic to the PubPeer site and the tenor of the authors’ responses). Some of these are laid out in the following Tweet thread by Ken…

The Zharkova et al. paper makes fundamental errors that should never have passed through peer review. But then we all know that peer review is far from perfect. The question is this: what should happen to a paper that is not fraudulent but still makes it to publication containing misleadingly sloppy and/or incorrect science? Should it remain in the scientific record? Or should it be retracted?

It turns out that this is a much more contested issue than it might appear at first blush. For what it’s worth, I am firmly of the opinion that a paper containing fundamental errors in the science and/or based on mistakes due to clearly definable f**k-ups/corner-cutting in experimental procedure should be retracted. End of story. It is unfair on other researchers — and, I would argue, blatantly unethical in many cases — to leave a paper in the literature that is fundamentally flawed. (Note that even retracted papers continue to accrue citations.) It is also a massive waste of taxpayers’ money to fund new research based on flawed work.

Here’s one example of what I mean, taken from personal, and embarrassing, experience. I screwed up the calibration of a tuning fork sensor used in a set of atomic force microscopy experiments. We discovered this screw-up after publication of the paper that was based on measurements with that particular sensor. Should that paper have remained in the literature? Absolutely not.

Some, however, including my friend and colleague Mike Merrifield, who is also Head of School here and with whom I enjoy the ever-so-occasional spat, have a slightly different take on the question of retractions:

Mike and I discussed the Zharkova et al. controversy both briefly at tea break and via an e-mail exchange last week, and it seems that there are distinct cultural differences between different sub-fields of physics when it comes to correcting the scientific record. I put the Gedankenexperiment described below to Mike and asked him whether we should retract the Gedankenpaper. The particular scenario outlined in the following stems from an exchange I had with Alessandro Strumia a few months back, and subsequently with a number of my particle physicist colleagues (both at Nottingham and elsewhere), re. the so-called 750 GeV anomaly at CERN…

“Mike, let’s say that some of us from the Nanoscience Group go to the Diamond Light Source to do a series of experiments. We acquire a set of X-ray absorption spectra that are rather noisy because, as ever, the experiment didn’t bloody well work until the last day of beamtime and we had to pack our measurements into the final few hours. Our signal-to-noise ratio is poor but we decide to not only interpret a bump in a spectrum as a true peak, but to develop a sophisticated (and perhaps even compelling) theory to explain that “peak”. We publish the paper in a prestigious journal, because the theory supporting our “peak” suggests the existence of an exciting new type of quasiparticle. 

We return to the synchrotron six months or a year later, repeat the experiment over and over but find no hint of the “peak” on which we based our (now reasonably well-cited) analysis. We realise that we had over-interpreted a statistical noise blip.

Should we retract the paper?”

I am firmly of the opinion that the paper should be retracted. After all, we could not reproduce our results when we did the experiment correctly. We didn’t bend over backwards in the initial experiment to convince ourselves that our data were robust and reliable, and instead rushed to publish (because we were so eager to get a paper out of the beamtime). So now we should eat humble pie for jumping the gun: the paper should be retracted and the scientific record corrected accordingly.
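It’s worth pausing on just how easily this kind of self-deception happens. Here’s a minimal, purely illustrative Python sketch (the bin count, noise level, and threshold are invented for the example, not taken from any real beamtime data): scan a completely featureless spectrum for its biggest bump and, thanks to the look-elsewhere effect, a “3-sigma peak” turns up somewhere in roughly half of all runs.

```python
import numpy as np

rng = np.random.default_rng(42)

# Purely illustrative numbers: a flat, featureless "spectrum" of 500
# energy bins containing nothing but unit Gaussian noise -- there is,
# by construction, no peak to find.
n_bins, n_runs = 500, 10_000

spurious = 0
for _ in range(n_runs):
    spectrum = rng.normal(0.0, 1.0, n_bins)
    # The over-eager experimentalist scans the whole spectrum and
    # seizes on the largest excursion. With 500 chances per run, a
    # "3-sigma" bump somewhere is close to a coin flip: the
    # look-elsewhere effect.
    if spectrum.max() > 3.0:
        spurious += 1

print(f"Runs containing a spurious >3-sigma 'peak': "
      f"{100 * spurious / n_runs:.0f}%")
```

The cure, of course, is exactly what the return trip to the synchrotron provides: repeat the measurement and see whether the bump survives.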

Mike, and others, were of a different opinion, however. They argued that the flawed paper should remain in the scientific literature, sometimes for the reasons to which Mike alludes in his tweet above [1]. In my conversations with particle physicists re. the 750 GeV anomaly, which arose from a similarly over-enthusiastically interpreted bump in a spectrum that turned out to be noise, there was a similarly strong inertia against correcting the scientific record. There appeared to be a feeling that a paper should be retracted only if the data were fabricated or fraudulent.

During the e-mail exchanges with my particle physics colleagues, I was struck on more than one occasion by a disturbing disconnect between theory and experiment. (This is hardly the most original take on the particle physics field, I know. I’ll take a moment to plug Sabine Hossenfelder’s Lost in Math once again.) There was an unsettling (for me) feeling among some that it didn’t matter if experimental noise had been misinterpreted, as long as the paper led to some new theoretical insights. This, I’ll stress, was not an opinion universally held — some of my colleagues said they didn’t go anywhere near the 750 GeV excess because of the lack of strong experimental evidence. Others, however, were more than willing to enthusiastically over-interpret the 750 GeV “bump” and, unsurprisingly, baulked at the suggestion that their papers should be retracted or censured in any way. If their sloppy, credulous approach to accepting noise in lieu of experimental data had advanced the field, then what’s wrong with that? After all, we need intrepid pioneers who will cross the Pillars of Hercules…

I’m a dyed-in-the-wool experimentalist; science should be driven by a strong and consistent feedback loop between experiment and theory. If a scientist mistakes experimental noise (or well-understood experimental artefacts) for valid data, or if they get fundamental physics wrong à la Zharkova et al., then there should be — must be — some censure for this. After all, we’d censure our undergrad students under similar circumstances, wouldn’t we? One student carries out an experiment for her final year project carefully and systematically, repeating measurements, pushing her signal-to-noise ratio up, putting in the hours to carefully refine and redefine the experimental protocols and procedures, refusing to make claims that are not entirely supported by the data. Another student instead gets over-excited when he sees a “signal” that chimes with his expectations and, instead of doing his utmost to make sure he’s not fooling himself, leaps to a new and exciting interpretation of the noisy data. Which student should receive the higher grade? Which student is the better scientist?
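For the record, the diligent student’s strategy isn’t just virtuous, it’s quantitative: averaging N independent scans beats uncorrelated noise down by a factor of √N. A quick sketch, again with invented numbers (a genuine Gaussian peak of amplitude 0.5 buried in unit noise):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented example: a real Gaussian peak of amplitude 0.5 buried in
# unit Gaussian noise, measured once versus averaged over 100 scans.
x = np.linspace(-5.0, 5.0, 400)
signal = 0.5 * np.exp(-x**2)

single_scan = signal + rng.normal(0.0, 1.0, x.size)
averaged = signal + rng.normal(0.0, 1.0, (100, x.size)).mean(axis=0)

# Averaging N independent scans reduces the noise by sqrt(N): here the
# residual noise drops from ~1.0 to ~0.1, a tenfold gain in S/N.
print(f"Noise, single scan:      {np.std(single_scan - signal):.2f}")
print(f"Noise, 100-scan average: {np.std(averaged - signal):.2f}")
```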

As that grand empiricist Francis Bacon put it centuries ago,

The understanding must not therefore be supplied with wings, but rather hung with weights, to keep it from leaping and flying.

It’s up to not just individual scientists but the scientific community as a whole to hang our collective understanding with weights. Sloppy science is not just someone else’s problem. It’s everyone’s problem.

[1] Mike’s suggestion in his tweet that the journal would like to retract the paper to spare their blushes doesn’t chime with our experience of journals’ reactions during the stripy saga. Retraction is the last thing they want because it impacts their brand.

 

20,000 Leagues under the THE

This monstrous tome arrived yesterday morning…

[Image: the THE World University Rankings supplement]

I subscribe to the Times Higher Education and generally look forward to the analogue version of the magazine arriving each week. Yesterday, however, it landed with a house-rattling thud as it hit the floor, prompting Daisy, the eight-year-old miniature dachshund whose duty it is to ward off all visitors (friend, foe, or pizza), to attempt to shred both the magazine and the 170-page glossy World University Rankings ‘supplement’ pictured above that accompanied it.

I should have smeared the latter with a generous helping of Cesar dog food [1] and let her have at it.

Yes, it’s yet another rant about league tables, I’m afraid. I’ve never been one to hold back on the piss and vinegar when it comes to bemoaning the pseudostatistics underpinning education league tables (be they primary school OFSTED placements or the leaderboards for august higher education institutions). I’m lucky to be in very good company. Peter Coles’ annual slamming of the THE rankings is always worth reading. (He’s on especially good form for the 2019 season.) And our very own Head of School, Mike Merrifield, has described in no uncertain terms just why university league tables are bad for you.

But this time round, and notwithstanding that WB Yeats quote I love so much [2], there’s going to be a slightly more upbeat message from yours truly. We need to give students rather more credit when it comes to seeing through the league table guff. They’re a damn sight more savvy than some imagine. Before I describe just why I have this degree of faith in the critical thinking capabilities of the next generation of undergrads, let’s take a look at a few representative (or not, as the case may be) league tables.

I’ve got one more year to go (of a five-year ‘gig’) as undergraduate admissions tutor for the School of Physics & Astronomy at Nottingham. Throughout that time, I have enjoyed the healthy catharsis of regularly lambasting league tables, not only during my University open day talks (in June and September) but also during every week of our UCAS visit/interview days (which kick off again in mid-November).

I routinely point to tables like this, taken from the annual Graduate Market report [3]:

[Image: table from The Graduate Market 2017-2018 report]

Tsk. Nottingham languishing at #8. Back in 2014-2015 we were at #2:

[Image: Table 5.8 from The Graduate Market 2014-2015 report]

Clearly there’s been a drop in quality to have slipped six places, right?

No. There’s nothing “clear” about that supposition at all. Universities and university departments are not football teams: it’s ludicrous to judge any institution (or department therein) on the basis of a single number.

Not convinced? Just sour grapes because Nottingham has ‘slipped’?

Well, take a slightly closer look at Table 5.8 directly above. Let’s leave the Nottingham “also-ran”s to one side, and focus on the top of the pops, Manchester. They’re an impressive #1 when it comes to employer perception…yet #28 in the Good University Guide. So which number do you prefer? Which has more credibility? Which is more robust?

Still have residual doubts? OK, let’s instead focus on individual schools/departments rather than consider entire universities. (And don’t get me started on the university-wide Teaching Excellence Framework (TEF), with its gold, silver, and bronze medals…) Here’s where Nottingham stands in The Times’ Physics and Astronomy league table:

[Image: The Times Physics and Astronomy league table top ten]

Yay! Go Nottingham! In at #5 with a bullet. Up a whopping thirteen places compared to last year. (Incidentally, our undergraduate applications were also up by over 20%. This correlation between league table placement and application numbers may not be entirely coincidental…)

Wow. We must really have worked hard in the intervening year. Or perhaps we brought in “star world-class players” on the academic transfer market to “up our game”?

Nope.

So what was radically different about our teaching and/or research compared to the previous year that led to this climb into the Top Ten?

Nothing. Zilch. Nada.

Feck all.

Indulge me with one last example.  Here’s the most recent (2014) Research Excellence Framework ranking for physics…

[Image: REF 2014 ranking for physics]

Nottingham is the only school/department to remain in the Top 5 over two rounds of this national research assessment exercise. (Last time round, in 2008, we were joint second with Bath and Cambridge.) Again, yay Nottingham, right? Or does it perhaps speak rather more to a certain volatility in league table placements, given that any peer review process like the REF is very far from entirely objective?
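To make that volatility concrete, here’s a toy simulation with entirely made-up numbers: thirty hypothetical departments whose underlying “quality” scores sit within a few points of one another, as they do in real league tables. Add a point or so of assessment noise each year and the rankings churn dramatically, even though nothing real has changed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up numbers: 30 hypothetical departments whose "true" quality
# scores are tightly bunched between 80 and 83 (out of 100).
true_quality = np.linspace(80.0, 83.0, 30)

# Each year's published score is the true score plus ~1 point of
# assessment noise; the departments themselves do not change at all.
year1 = true_quality + rng.normal(0.0, 1.0, 30)
year2 = true_quality + rng.normal(0.0, 1.0, 30)

# Convert scores to league table positions (1 = top of the table).
rank1 = (-year1).argsort().argsort() + 1
rank2 = (-year2).argsort().argsort() + 1

moves = np.abs(rank1 - rank2)
print(f"Median places moved year-on-year: {np.median(moves):.0f}")
print(f"Largest single jump: {moves.max()} places")
```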

Both Peter Coles and Mike Merrifield (among many others) have pointed out key reasons underpinning league table volatility. I’m not about to rehearse those arguments here. Instead, I’ll highlight a couple of rather encouraging Reddit threads I’ve read recently — and that’s not something I tend to write too often — related, at least partially, to Nottingham’s open days. The first of these Mike has very helpfully highlighted via Twitter:

 

There is indeed a lot to be said for brutal honesty, and I am delighted that the pseudostats of league table placements are being questioned by open day audiences.

The responses to this rather snobbishly overwrought comment elsewhere on Reddit also made my heart sing:

[Image: Reddit comment]

You can read the responses at the thread itself, but I especially liked this, from ‘Matthew3_14’:

[Image: Reddit response from ‘Matthew3_14’]

I’d quibble with the “outside of the top 5ish” proviso (as you might expect), but otherwise “Matthew3_14” echoes exactly what I’ll be telling visiting applicants for our courses in the coming months…

If you like Nottingham, the rankings are irrelevant.

If you don’t like Nottingham, the rankings are still irrelevant.

Go to the place where you feel best.


[1] …for small, yappy-type dogs.

[2] “Being Irish, he had an abiding sense of tragedy that sustained him through temporary periods of joy.”

[3] Yes, it’s irritating that we now unblinkingly refer to students as a market. That’s a whole other blog post or five.