Science Proves Nothing

Here’s the first, provocatively titled, lecture for this year’s “Politics, Perception, and Philosophy of Physics” module. This year, I plan to upload video here for each F34PPP session on a weekly schedule (although the best laid plans gang aft agley…)

Erratum: Around about the 43 minute mark I say “Polish group” when I mean “Czech group”. (Apologies to Pavel Jelinek et al.)

From the peer-reviewed pages of Springer Nature…a theory more bonkers than a conference of Flat Earthers.

“Wow. Just wow. What the f**k?!?!”

That was the opening line of my e-mail reply to Ivan Oransky, MD, and co-founder of Retraction Watch, when I’d picked myself up off the floor after reading the paper he sent me earlier this week. Ivan wanted my reaction to…deep breath…“Development of a safe antiparasitic against scuticociliates (Miamiensis avidus) in olive flounders: new approach to reduce the toxicity of mebendazole by material remediation technology using full-overlapped gravitational field energy”, Parasitology Research (2019).

That paper has now been retracted for reasons that will become very clear, very soon.

The more puzzling question is how the hell it got accepted in the first place.

Scroll down to page 5 of the paper (linked above) and find the section headed “Production of material remediated MBZ using full-overlapped gravitational field energy”. Actually, I’ll save you the bother. The section is reproduced in all its glory below. Are you sitting comfortably? Then we’ll begin…


…wait, there’s more…


And in case that didn’t make sense, there’s a helpful figure to explain everything:


I haven’t read anything quite as superbly crackpot as this since Jordan Peterson’s “Maps Of Meaning”.

As Ivan discusses over at the Retraction Watch blog, this, um, seminal example of truly innovative scientific reasoning was submitted on March 18. The editors and reviewers then took four months to consider the paper. And subsequently accepted it for publication.

Peer review. The gold standard on which all of science stands or falls.

Sloppy Science: Still Someone Else’s Problem?

“The Somebody Else’s Problem field is much simpler and more effective, and what’s more can be run for over a hundred years on a single torch battery… An SEP is something we can’t see, or don’t see, or our brain doesn’t let us see, because we think that it’s somebody else’s problem…. The brain just edits it out, it’s like a blind spot”.

Douglas Adams (1952 – 2001) Life, The Universe, and Everything

The very first blog post I wrote (back in March 2013), for the Institute of Physics’ now sadly defunct physicsfocus project, was titled “Are Flaws in Peer Review Someone Else’s Problem?” and cited the passage above from the incomparable, and sadly missed, Mr. Adams. The post described the trials and tribulations my colleagues and I were experiencing at the time in trying to critique some seriously sloppy science, on the subject of ostensibly “striped” nanoparticles, that had been published in very high profile journals by a very high profile group. Not that I suspected it at the time of writing the post, but that particular saga ended up dragging on and on, involving a litany of frustrations in our attempts to correct the scientific record.

I’ve been put in mind of the stripy saga, and that six-year-old post, for a number of reasons lately. First, the most recent stripe-related paper from the group whose work we critiqued makes absolutely no mention of the debate and controversy. It’s as if our criticism never existed.

More importantly, however, I have been following Ken Rice’s (and others’) heated exchange with the authors of a similarly fundamentally flawed paper very recently published in Scientific Reports [Oscillations of the baseline of solar magnetic field and solar irradiance on a millennial timescale, VV Zharkova, SJ Shepherd, SI Zharkov, and E Popova, Sci. Rep. 9 9197 (2019)]. Ken’s blog post on the matter is here, and the ever-expanding PubPeer thread (225 comments at the time of writing, and counting) is here. Michael Brown’s take-no-prisoners take-down tweets on the matter are also worth reading…

The debate made it into the pages — sorry, pixels — of The Independent a few days ago: “Journal to investigate controversial study claiming global temperature rise is due to Earth moving closer to Sun.”

Although the controversy in this case is related to physics happening on astronomically larger length scales than those at the heart of our stripy squabble, there are quite a number of parallels (and not just in terms of traffic to the PubPeer site and the tenor of the authors’ responses). Some of these are laid out in the following Tweet thread by Ken…

The Zharkova et al. paper makes fundamental errors that should never have passed through peer review. But then we all know that peer review is far from perfect. The question is what should happen to a paper that is not fraudulent but still makes it to publication containing misleadingly sloppy and/or incorrect science? Should it remain in the scientific record? Or should it be retracted?

It turns out that this is a much more contested issue than it might appear at first blush. For what it’s worth, I am firmly of the opinion that a paper containing fundamental errors in the science and/or based on mistakes due to clearly definable f**k-ups/corner-cutting in experimental procedure should be retracted. End of story. It is unfair on other researchers — and, I would argue, blatantly unethical in many cases — to leave a paper in the literature that is fundamentally flawed. (Note that even retracted papers continue to accrue citations.) It is also a massive waste of taxpayers’ money to fund new research based on flawed work.

Here’s one example of what I mean, taken from personal, and embarrassing, experience. I screwed up the calibration of a tuning fork sensor used in a set of atomic force microscopy experiments. We discovered this screw-up after publication of the paper that was based on measurements with that particular sensor. Should that paper have remained in the literature? Absolutely not.

Some, however, including my friend and colleague Mike Merrifield, who is also Head of School here and with whom I enjoy the ever-so-occasional spat, have a slightly different take on the question of retractions:

Mike and I discussed the Zharkova et al. controversy both briefly at tea break and via an e-mail exchange last week, and it seems that there are distinct cultural differences between different sub-fields of physics when it comes to correcting the scientific record. I put the Gedankenexperiment described below to Mike and asked him whether we should retract the Gedankenpaper. The particular scenario outlined in the following stems from an exchange I had with Alessandro Strumia a few months back, and subsequently with a number of my particle physicist colleagues (both at Nottingham and elsewhere), re. the so-called 750 GeV anomaly at CERN…

“Mike, let’s say that some of us from the Nanoscience Group go to the Diamond Light Source to do a series of experiments. We acquire a set of X-ray absorption spectra that are rather noisy because, as ever, the experiment didn’t bloody well work until the last day of beamtime and we had to pack our measurements into the final few hours. Our signal-to-noise ratio is poor but we decide to not only interpret a bump in a spectrum as a true peak, but to develop a sophisticated (and perhaps even compelling) theory to explain that “peak”. We publish the paper in a prestigious journal, because the theory supporting our “peak” suggests the existence of an exciting new type of quasiparticle. 

We return to the synchrotron six months or a year later, repeat the experiment over and over but find no hint of the “peak” on which we based our (now reasonably well-cited) analysis. We realise that we had over-interpreted a statistical noise blip.

Should we retract the paper?”

I am firmly of the opinion that the paper should be retracted. After all, we could not reproduce our results when we did the experiment correctly. We didn’t bend over backwards in the initial experiment to convince ourselves that our data were robust and reliable and instead rushed to publish (because we were so eager to get a paper out of the beamtime). So now we should eat humble pie for jumping the gun — the paper should be retracted and the scientific record should be corrected accordingly.
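For a rough feel of how easily pure noise can masquerade as a peak, here is a minimal Python sketch. It assumes, purely for illustration, that every channel of the spectrum carries independent Gaussian noise and that there is no real signal at all; the function name, the channel counts and the 3σ threshold are hypothetical choices, not anything taken from a real beamtime.

```python
from statistics import NormalDist


def chance_of_spurious_bump(n_channels: int, threshold_sigma: float) -> float:
    """Probability that at least one channel of a signal-free spectrum
    exceeds `threshold_sigma` standard deviations above the mean.

    Assumes independent, Gaussian noise in every channel (an idealisation).
    """
    # One-sided tail probability for a single channel.
    p_single = 1.0 - NormalDist().cdf(threshold_sigma)
    # Complement of "no channel anywhere exceeds the threshold".
    return 1.0 - (1.0 - p_single) ** n_channels


if __name__ == "__main__":
    for n in (1, 50, 200, 1000):
        p = chance_of_spurious_bump(n, threshold_sigma=3.0)
        print(f"{n:>5} channels: P(at least one 3-sigma excess) = {p:.3f}")
```

Under those assumptions the chance of finding a 3σ “bump” somewhere in the scan grows quickly with the number of channels, which is one reason a single excess in a noisy spectrum needs repeat measurements before anyone builds a theory on it.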

Mike, and others, were of a different opinion, however. They argued that the flawed paper should remain in the scientific literature, sometimes for the reasons to which Mike alludes in his tweet above [1]. In my conversations with particle physicists re. the 750 GeV anomaly, which arose from a similarly over-enthusiastically interpreted bump in a spectrum that turned out to be noise, there was a similarly strong reluctance to correct the scientific record. There appeared to be a feeling that only if the data were fabricated or fraudulent should the paper be retracted.

During the e-mail exchanges with my particle physics colleagues, I was struck on more than one occasion by a disturbing disconnect between theory and experiment. (This is hardly the most original take on the particle physics field, I know. I’ll take a moment to plug Sabine Hossenfelder’s Lost In Math once again.) There was an unsettling (for me) feeling among some that it didn’t matter if experimental noise had been misinterpreted, as long as the paper led to some new theoretical insights. This, I’ll stress, was not an opinion universally held — some of my colleagues said they didn’t go anywhere near the 750 GeV excess because of the lack of strong experimental evidence. Others, however, were more than willing to enthusiastically over-interpret the 750 GeV “bump” and, unsurprisingly, baulked at the suggestion that their papers should be retracted or censured in any way. If their sloppy, credulous approach to accepting noise in lieu of experimental data had advanced the field, then what’s wrong with that? After all, we need intrepid pioneers who will cross the Pillars of Hercules.

I’m a dyed-in-the-wool experimentalist; science should be driven by a strong and consistent feedback loop between experiment and theory. If a scientist mistakes experimental noise (or well-understood experimental artefacts) for valid data, or if they get fundamental physics wrong à la Zharkova et al., then there should be — must be — some censure for this. After all, we’d censure our undergrad students under similar circumstances, wouldn’t we? One student carries out an experiment for her final year project carefully and systematically, repeating measurements, bringing her signal-to-noise ratio up, putting in the hours to carefully refine and redefine the experimental protocols and procedures, refusing to make claims that are not entirely supported by the data. Another student instead gets over-excited when he sees a “signal” that chimes with his expectations, and instead of doing his utmost to make sure he’s not fooling himself, leaps to a new and exciting interpretation of the noisy data. Which student should receive the higher grade? Which student is the better scientist?

As that grand empiricist Francis Bacon put it centuries ago,

The understanding must not therefore be supplied with wings, but rather hung with weights, to keep it from leaping and flying.

It’s up to not just individual scientists but the scientific community as a whole to hang our collective understanding with weights. Sloppy science is not just someone else’s problem. It’s everyone’s problem.

[1] Mike’s suggestion in his tweet that the journal would like to retract the paper to spare their blushes doesn’t chime with our experience of journals’ reactions during the stripy saga. Retraction is the last thing they want because it impacts their brand.


At sixes and sevens about 3* and 4*

The post below appears in today’s Times Higher Education under the title “The REF’s star system leaves a black hole in fairness.” My original draft was improved immensely by Paul Jump’s edits (but I am slightly miffed that my choice of title (above) was rejected by the sub-editors). I’m posting the article here for those who don’t have a subscription to the THE. (I should note that the interview panel scenario described below actually happened. The question I asked was suggested in the interview pack supplied by the “University of True Excellence”.)

“In your field of study, Professor Aspire, just how does one distinguish a 3* from a 4* paper in the research excellence framework?”

The interviewee for a senior position at the University of True Excellence – names have been changed to protect the guilty – shuffled in his seat. I leaned slightly forward after posing the question, keen to hear his response to this perennial puzzler that has exercised some of the UK’s great and not-so-great academic minds.

He coughed. The panel – on which I was the external reviewer – waited expectantly.

“Well, a 4* paper is a 3* paper except that your mate is one of the REF panel members,” he answered.

I smiled and suppressed a giggle.

Other members of the panel were less amused. After all, the rating and ranking of academics’ outputs is serious stuff. Careers – indeed, the viability of entire departments, schools, institutes and universities – depend critically on the judgements made by peers on the REF panels.

Not only do the ratings directly influence the intangible benefits arising from the prestige of a high REF ranking, they also translate into cold, hard cash. An analysis by the University of Sheffield suggests that in my subject area, physics, the average annual value of a 3* paper for REF 2021 is likely to be roughly £4,300, whereas that of a 4* paper is £17,100. In other words, the formula for allocating “quality-related” research funding is such that a paper deemed 4* is worth four times one judged to be 3*; as for 2* (“internationally recognised”) or 1* (“nationally recognised”) papers, they are literally worthless.

We might have hoped that before divvying up more than £1 billion of public funds a year, the objectivity, reliability and robustness of the ranking process would be established beyond question. But, without wanting to cast any aspersions on the integrity of REF panels, I’ve got to admit that, from where I was sitting, Professor Aspire’s tongue-in-cheek answer regarding the difference between 3* and 4* papers seemed about as good as any – apart from, perhaps, “I don’t know”.

The solution certainly isn’t to reach for simplistic bibliometric numerology such as impact factors or SNIP indicators; anyone making that suggestion is not displaying even the level of critical thinking we expect of our undergraduates. But every academic also knows, deep in their studious soul, that peer review is far from wholly objective. Nevertheless, university senior managers – many of them practising or former academics themselves – are often all too willing, as part of their REF preparations, to credulously accept internal assessors’ star ratings at face value, with sometimes worrying consequences for the researcher in question (especially if the verdict is 2* or less).

Fortunately, my institution, the University of Nottingham, is a little more enlightened – last year it had the good sense to check the consistency of the internal verdicts on potential REF 2021 submissions via the use of independent reviewers for each paper. The results were sobering. Across seven scientific units of assessment, the level of full agreement between reviewers varied from 50 per cent to 75 per cent. In other words, in the worst cases, reviewers agreed on the star rating for no more than half of the papers they reviewed.

Granted, the vast majority of the disagreement was at the 1* level; very few pairs of reviewers were “out” by two stars, and none disagreed by more. But this is cold comfort. The REF’s credibility is based on an assumption that reviewers can quantitatively assess the quality of a paper with a precision better than one star. As our exercise shows, the effective error bar is actually ± 1*.

That would be worrying enough if there were a linear scaling of financial reward. But the problem is exacerbated dramatically by both the 4x multiplier for 4* papers and the total lack of financial reward for anything deemed to be below 3*.
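To put rough numbers on that, the short Python sketch below takes the Sheffield per-paper estimates quoted above and works out what a one-star misjudgement is worth in each direction. The dictionary and function names are illustrative only, and the values are the estimates for physics, not an official funding formula.

```python
# Estimated annual value (in pounds) of a physics paper for REF 2021, using the
# University of Sheffield figures quoted above; 1* and 2* papers attract nothing.
ANNUAL_VALUE_GBP = {1: 0, 2: 0, 3: 4_300, 4: 17_100}


def funding_swing(true_stars: int) -> dict[int, int]:
    """Change in annual value if a paper is rated one star below (-1) or
    one star above (+1) its 'true' rating. Ratings outside 1*-4* are skipped."""
    swings = {}
    for error in (-1, +1):
        rated = true_stars + error
        if rated in ANNUAL_VALUE_GBP:
            swings[error] = ANNUAL_VALUE_GBP[rated] - ANNUAL_VALUE_GBP[true_stars]
    return swings


if __name__ == "__main__":
    for stars in (2, 3, 4):
        print(f"true rating {stars}*: {funding_swing(stars)}")
```

On these figures, a single star of reviewer disagreement about a paper on the 3*/4* boundary moves its estimated value by £12,800 a year.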

The Nottingham analysis also examined the extent to which reviewers’ ratings agreed with authors’ self-scoring (let’s leave aside any disagreement between co-authors on that). The level of full agreement here was similarly patchy, varying between 47 per cent and 71 per cent. Unsurprisingly, there was an overall tendency for authors to “overscore” their papers, although underscoring was also common.

Some argue that what’s important is the aggregate REF score for a department, rather than the ratings of individual papers, because, according to the central limit theorem, any wayward ratings will “wash out” at the macro level. I disagree entirely. Individual academics across the UK continue to be coaxed and cajoled into producing 4* papers; there are even dedicated funding schemes to help them do so. And the repercussions arising from failure can be severe.
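The “wash out” argument, and its blind spot, can be sketched in a few lines of Python. The toy model below assumes that each paper’s rating is off by one star (up or down, equally likely) with probability 0.4, a figure picked for illustration from within the range of reviewer disagreement reported above. It ignores the 1* and 4* limits and the non-linear funding weights, and every name in it is hypothetical.

```python
import random
import statistics


def rating_error(rng: random.Random, p_disagree: float) -> int:
    """Assumed error on one paper's star rating: -1, 0 or +1."""
    if rng.random() < p_disagree:
        return rng.choice((-1, 1))
    return 0


def spread_of_mean_error(n_papers: int, p_disagree: float = 0.4,
                         trials: int = 2_000, seed: int = 1) -> float:
    """Standard deviation, across simulated departments of `n_papers` papers,
    of the department's mean rating error (in stars)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = [
        statistics.fmean(rating_error(rng, p_disagree) for _ in range(n_papers))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:>3} papers: spread of mean error ~ {spread_of_mean_error(n):.2f} stars")
```

In this toy model the spread of the departmental average shrinks roughly as one over the square root of the number of papers, while the uncertainty on any single paper stays exactly where it was. And it is single papers on which individual academics are judged.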

It is vital in any game of consequence that participants be able to agree when a goal has been scored or a boundary hit. Yet, in the case of research quality, there are far too many cases in which we just can’t. So the question must be asked: why are we still playing?

Blast from the past

While searching my e-mail archive for a message from years ago, I stumbled across this unpublished submission to the letters page of the Times Higher Education. More than a decade later, I’m still smarting a little that they didn’t accept it for publication…

From: Moriarty Philip
Sent: 30 November 2008 20:48
Subject: Comment on “Clever crazies quitting science” (THE 27 Nov)

Bruce Charlton of the University of Buckingham argues that modern scientists are boring because they are mild-mannered, agreeable, and socially inoffensive (News, 27 November).

What a dickhead.

Philip Moriarty, Condensed Matter Scientist

School of Physics & Astronomy
University of Nottingham
Nottingham NG7 2RD


How Not To Do Spectral Analysis 101

I will leave this here without further comment…


*bangs head gently on desk and sobs quietly to himself*

Source (via Sam Jarvis. Thanks, Sam.):

The original ‘peer-reviewed’ paper is this: Găluşcă et al., IOP Conf. Ser. Mater. Sci. Eng. 374 012020 (2018)



Bullshit and Beyond: From Chopra to Peterson

Harry G Frankfurt’s On Bullshit is a modern classic. He highlights the style-over-substance tenor of the most fragrant and flagrant bullshit, arguing that

It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.

In other words, the bullshitter doesn’t care about the validity or rigour of their arguments. They are much more concerned with being persuasive. One aspect of BS that doesn’t quite get the attention it deserves in Frankfurt’s essay, however, is that special blend of obscurantism and vacuity that is the hallmark of three world-leading bullshitters of our time: Deepak Chopra, Karen Barad (see my colleague Brigitte Nerlich’s important discussion of Barad’s wilfully impenetrable language here), and Jordan Peterson. In a talk for the University of Nottingham Agnostic, Secularist, and Humanist Society last night (see here for the blurb/advert), I focussed on the intriguing parallels between their writing and oratory. Here’s the video of the talk.

Thanks to UNASH for the invitation. I’ve not included the lengthy Q&A that followed (because I stupidly didn’t ask for permission to film audience members’ questions). I’m hoping that some discussion and debate might ensue in the comments section below. If you do dive in, try not to bullshit too much…