Not everything that counts can be counted


First published at physicsfocus.

My first post for physicsfocus described a number of frustrating deficiencies in the peer review system, focusing in particular on how we can ensure, via post-publication peer review, that science does not lose its ability to self-correct. I continue to rant about, or rather discuss and dissect, the issue of post-publication peer review in an article in this week’s Times Higher Education, “Spuriouser and Spuriouser”. Here, however, I want to address some of the comments left under that first physicsfocus post by a Senior Editor at Nature Materials, Pep Pamies (Curious Scientist in the comments thread). I was really pleased that a journal editor contributed to the debate but, as you might be less than surprised to hear, I disagree fundamentally with Pep’s argument that impact factors are a useful metric. As I see it, they’re not even a necessary evil.
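For anyone who has so far escaped the metric, the standard two-year impact factor is nothing more exotic than a mean citation rate. In the usual Thomson Reuters definition (the notation below is mine, added purely for illustration), a journal’s impact factor for year Y is

\mathrm{JIF}(Y) \;=\; \frac{C_Y(Y-1) + C_Y(Y-2)}{N(Y-1) + N(Y-2)}

where C_Y(y) is the number of citations received in year Y by items the journal published in year y, and N(y) is the number of citable items it published in year y. Because citation distributions are heavily skewed, that average is dominated by a handful of highly cited papers and tells you next to nothing about any individual article – which is rather the point of what follows.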

I’m certainly not alone in thinking this. In an eloquent cri de coeur posted at his blog, Reciprocal Space, last summer, Stephen Curry bluntly stated, “I am sick of impact factors. And so is science”. I won’t rehearse Stephen’s arguments – I strongly recommend that you visit his blog and read the post for yourself, along with the close to two hundred comments that it attracted – but it’s clear from the Twitter and blog storm his post generated that he had tapped into a deep well of frustration among academics. (Peter Coles’ related post, The Impact X-Factor, is also well worth a read.)

I agree with Stephen on almost everything in his post. I think that many scientists will chuckle knowingly at the description of the application of impact factors as “statistically illiterate” and I particularly liked the idea of starting a ‘smear campaign’ to discredit the entire concept. But he argues that the way forward is:

“…to find ways to attach to each piece of work the value that the scientific community places on it through use and citation. The rate of accrual of citations remains rather sluggish, even in today’s wired world, so attempts are being made to capture the internet buzz that greets each new publication; there are interesting innovations in this regard from the likes of PLOS, Mendeley and altmetrics.org.”

As is clear from the THE article, embedding Web 2.0/Web 3.0/Web n.0 feedback and debate in the peer review process is something I fully endorse and, indeed, I think that we should grasp the nettle and attempt to formalise the links between online commentary and the primary scientific literature as soon as possible. But are citations – be they through the primary literature or via an internet ‘buzz’ – really a proxy for scientific quality and the overall value of the work?

I think that we do science a great disservice if we argue that the value of a paper depends only on how often other scientists refer to it, or cite it in their work. Let me offer an example from my own field of research, condensed matter physics – aka nanoscience when I’m applying for funding – to highlight the problem.

Banging a quantum drum

Perhaps my favourite paper of the last decade or so is “Quantum Phase Extraction in Isospectral Electronic Nanostructures” by Hari Manoharan and his co-workers at Stanford. The less than punchy title doesn’t quite capture the elegance, beauty, and sheer brilliance of the work. Manoharan’s group exploited the answer to a question posed by the mathematician Mark Kac close to fifty years ago: Can one hear the shape of a drum? Or, if we ask the question in rather more concrete mathematical physics terms, “Does the spectrum of eigenfrequencies of a resonator uniquely determine its geometry?”

For a one-dimensional system the equivalent question is not too difficult and can readily be answered by guitarists and A-level physics students: yes, one can ‘hear’ the shape, i.e. the length, of a vibrating string. But for a two-dimensional system like a drum head, the answer is far from obvious. It took until 1992 before Kac’s question was finally answered by Carolyn Gordon, David Webb, and Scott Wolpert. They discovered that it was possible to have 2D isospectral domains, i.e. 2D shapes (or “drum heads”) with the same “sound”. So, no, it’s not possible to hear the shape of a drum.
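To see why the one-dimensional case is so easy, recall the textbook result (standard wave physics, nothing specific to the Stanford paper): an ideal string of length L, fixed at both ends and supporting waves of speed v, has eigenfrequencies

f_n = \frac{n v}{2L}, \qquad n = 1, 2, 3, \ldots

so the spacing of the spectrum hands you L directly. Kac’s two-dimensional version asks the analogous question for the vibrational modes of a membrane occupying a region \Omega, i.e. the eigenvalue problem

\nabla^2 \psi + \lambda \psi = 0 \ \text{in } \Omega, \qquad \psi = 0 \ \text{on } \partial\Omega,

and whether the list of eigenvalues \{\lambda_n\} uniquely determines the shape of \Omega. Gordon, Webb, and Wolpert’s isospectral domains are precisely two different shapes sharing the same \{\lambda_n\}.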

What’s this got to do with nanoscience? Well, the first elegant aspect of the paper by the Stanford group is that they constructed two-dimensional isospectral domains out of carbon monoxide molecules on a copper surface (using the tip of a scanning tunnelling microscope). In other words, they built differently shaped nanoscopic ‘drum heads’, one molecule at a time. They then “listened” to the eigenspectra of these quantum drums by measuring the resonances of the electrons confined within the molecular drum head and transposing the spectrum to audible frequencies.

So far, so impressive

But it gets better. A lot better.

The Stanford team then went on to exploit the isospectral characteristics of the differently shaped quantum drum heads to extract the quantum mechanical phase of the electronic wavefunction confined within. I could wax lyrical about this particular aspect of the work for quite some time – remember that the phase of a wavefunction is not an observable in quantum mechanics! – but I encourage you to read the paper itself. (It’s available via this link, but you, or your institution, will need a subscription to Science.)

I’ll say it again – this is elegant, beautiful, and brilliant work. For me, at least, it has a visceral quality, just like a piece of great music, literature, or art; it’s inspiring and affecting.

…and it’s picked up a grand total of 29 citations since its publication in 2008.

In the same year, and along with colleagues in Nottingham and Loughborough, I co-authored a paper published in Physical Review Letters on pattern formation in nanoparticle assemblies. To date, that paper has accrued 47 citations. While I am very proud of the work, I am confident that my co-authors would agree with me when I say that it doesn’t begin to compare to the quality of the quantum drum research. Our paper lacks the elegance and scientific “wow” factor of the Stanford team’s publication; it lacks the intellectual excitement of coupling a fundamental problem (and solution) in pure mathematics with state-of-the-art nanoscience; and it lacks the sophistication of the combined experimental and theoretical methodology.

And yet our paper has accrued more citations.

You might argue that I have cherry-picked a particular example to make my case. I really wish that were so, but I can point to many, many other exciting scientific papers in a variety of journals which have attracted a relative dearth of citations.

Einstein is credited, probably apocryphally, with the statement “Not everything that counts can be counted, and not everything that can be counted counts”. Just as multi-platinum album sales and Number 1 hits are not a reliable indicator of artistic value (note that One Direction has apparently now outsold The Beatles), citations and associated bibliometrics are not a robust measure of scientific quality.

Image credit: https://www.flickr.com/photos/bigleaftropicals/8539517000 

Are flaws in peer review someone else’s problem?


That stack of fellowship applications piled up on the coffee table isn’t going to review itself. You’ve got twenty-five to read before the rapidly approaching deadline, and you knew before you accepted the reviewing job that many of the proposals would fall outside your area of expertise. Sigh. Time to grab a coffee and get on with it.

As a professor of physics with some thirty-five years’ experience in condensed matter research, you’re fairly confident that you can make insightful and perceptive comments on that application about manipulating electron spin in nanostructures (from that talented postdoc you met at a conference last year). But what about the proposal on membrane proteins? Or, worse, the treatment of arcane aspects of string theory by the mathematician claiming a radical new approach to supersymmetry? Can you really comment on those applications with any type of authority?

Of course, thanks to Thomson Reuters there’s no need for you to be too concerned about your lack of expertise in those fields. You log on to Web of Knowledge and check the publication records. Hmmm. The membrane protein work has made quite an impact – the applicant’s Science paper from a couple of years back has already picked up a few hundred citations and her h-index is rising rapidly. She looks to be a real ‘star’ in her community. The string theorist is also blazing a trail.

Shame about the guy doing the electron spin stuff. You’d been very excited about that work when you attended his excellent talk at the conference in the U.S., but it’s picked up hardly any citations at all. Can you really rank it alongside the membrane protein proposal? After all, how could you justify that decision on any sort of objective basis to the other members of the interdisciplinary panel…?

Bibliometrics are the bane of academics’ lives. We regularly moan about the rate at which metrics such as the journal impact factor and the notorious h-index are tightening their stranglehold on the assessment of research. And, yet, as the hypothetical example above shows, we can be our own worst enemy in reaching for citation statistics to assess work outside – or even firmly inside – our ‘comfort zone’ of expertise.
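For what it’s worth, the h-index that does so much heavy lifting in these snap judgements is a crude enough quantity to fit in a few lines of code. Here is a minimal sketch in Python (the citation lists are entirely made up, purely for illustration):

def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Two very different hypothetical publication records, same h-index:
print(h_index([500, 4, 3, 1]))  # 3 -- one landmark paper plus little else
print(h_index([3, 3, 3, 0]))    # 3 -- a handful of modestly cited papers

The toy example makes the obvious point: wildly different bodies of work collapse onto the same number, and everything that made them different is exactly what the metric throws away.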

David Colquhoun, a world-leading pharmacologist at University College London and a blogger of quite some repute, has repeatedly pointed out the dangers of lazily relying on citation analyses to assess research and researchers. One article in particular, How to get good science, is a searingly honest account of the correlation (or lack thereof) between citations and the relative importance of a number of his, and others’, papers. It should be required reading for all those involved in research assessment at universities, research councils, funding bodies, and government departments – particularly those who are of the opinion that bibliometrics represent an appropriate method of ranking the ‘outputs’ of scientists.

Colquhoun, in refreshingly ‘robust’ language, puts it as follows:

“All this shows what is obvious to everyone but bone-headed bean counters. The only way to assess the merit of a paper is to ask a selection of experts in the field.

“Nothing else works.

“Nothing.”

An ongoing controversy in my area of research, nanoscience, has thrown Colquhoun’s statement into sharp relief. The controversial work in question represents a particularly compelling example of the fallacy of citation statistics as a measure of research quality. It has also provided worrying insights into scientific publishing, and has severely damaged my confidence in the peer review system.

The minutiae of the case in question are covered in great detail at Raphael Levy’s blog so I won’t rehash the detailed arguments here. In a nutshell, the problem is as follows. The authors of a series of papers in the highest-profile journals in science – including Science and the Nature Publishing Group family – have claimed that stripes form on the surfaces of nanoparticles due to phase separation of different ligand types. The only direct evidence for the formation of those stripes comes from scanning probe microscopy (SPM) data. (SPM forms the bedrock of our research in the Nanoscience group at the University of Nottingham, hence my keen interest in this particular story.)

But those SPM data display features which appear remarkably similar to well-known instrumental artifacts, and the associated data analyses appear less than rigorous at best. In my experience the work would be poorly graded even as an undergraduate project report, yet it’s been published in what are generally considered to be the most important journals in science. (And let’s be clear – those journals indeed have an impressive track record of publishing exciting and pioneering breakthroughs in science.)

So what? Isn’t this just a storm in a teacup about some arcane aspect of nanoscience? Why should we care? Won’t the problem be rooted out when others fail to reproduce the work? After all, isn’t science self-correcting in the end?

Good points. Bear with me – I’ll consider those questions in a second. Take a moment, however, to return to the academic sitting at home with that pile of proposals to review. Let’s say that she had a fellowship application related to the striped nanoparticle work to rank amongst the others. A cursory glance at the citation statistics at Web of Knowledge would indicate that this work has had a major impact over a very short period. Ipso facto, it must be of high quality.

And yet, if an expert – or, in this particular case, even a relative SPM novice – were to take a couple of minutes to read one of the ‘stripy nanoparticle’ papers, they’d be far from convinced by the conclusions reached by the authors. What was it that Colquhoun said again? “The only way to assess the merit of a paper is to ask a selection of experts in the field. Nothing else works. Nothing.”

In principle, science is indeed self-correcting. But if there are flaws in published work, who fixes them? Perhaps the most troublesome aspect of the striped nanoparticle controversy was highlighted by a comment left by Mathias Brust, a pioneer in the field of nanoparticle research, under an article in the Times Higher Education:

“I have [talked to senior experts about this controversy] … and let me tell you what they have told me. About 80% of senior gold nanoparticle scientists don’t give much of a damn about the stripes and find it unwise that Levy engages in such a potentially career damaging dispute. About 10% think that … fellow scientists should be friendlier to each other. After all, you never know [who] referees your next paper. About 5% welcome this dispute, needless to say predominantly those who feel critical about the stripes. This now includes me. I was initially with the first 80% and did advise Raphael accordingly.”

[Disclaimer: I know Mathias Brust very well and have collaborated, and co-authored papers, with him in the past].

I am well aware that the plural of anecdote is not data, but Brust’s comment resonates strongly with me. I have heard very similar arguments at times from colleagues in physics. The most troubling of all is the idea that critiquing published work is somehow at best unseemly and, at worst, career-damaging. Has science really come to this?

Douglas Adams, in an inspired passage in Life, the Universe and Everything, takes the psychological concept known as “someone else’s problem” (SEP) and uses it as the basis of an invisibility ‘cloak’ in the form of an SEP-field. (Thanks to Dave Fernig, a fellow fan of Douglas Adams, for reminding me about the Someone Else’s Problem field.) As Adams puts it, instead of attempting the mind-bogglingly complex task of actually making something invisible, an SEP is much easier to implement: “An SEP is something we can’t see, or don’t see, or our brain doesn’t let us see, because we think that it’s somebody else’s problem…. The brain just edits it out, it’s like a blind spot”.

The 80% of researchers to whom Brust refers are apparently of the opinion that flaws in the literature are someone else’s problem. We have enough to be getting on with in terms of our own original research, without repeating measurements that have already been published in the highest quality journals, right?

Wrong. This is not someone else’s problem. This is our problem and we need to address it.

Image: Paper pile. Credit: https://pixnio.com/it/oggetti/libri/libri-documento-educazione-informazioni-conoscenza-leggere-ricerca-scuola-stack-studio-lavoro