#4 of an occasional series…
At the start of this week I spent a day in a room in a university somewhat north of Nottingham with a stack of research papers and a pile of grading sheets. Along with a fellow physicist from a different university (located even further north of Nottingham), I had been asked to act as an external reviewer for the department’s mock REF assessment.
I found it a deeply uncomfortable experience. My discomfort had nothing to do, of course, with our wonderfully genial hosts — thank you all for the hospitality, the conversation, the professionalism, and, of course, lunch. But I’ve vented my spleen previously on the lack of consistency in mock REF ratings (it’s been the most-viewed post at Symptoms… since I resurrected the blog in June last year) and I agreed to participate in the mock assessment so I could see for myself how the process works in practice.
Overall, I’d say that the degree of agreement between my co-marker’s “star ratings” and mine, before moderation, was at the 70% level, give or take. This is in line with the consistency we observed at Nottingham for independent reviewers in Physics and is therefore, at least, somewhat encouraging. (Other units of assessment for Nottingham’s mock REF review had only 50% agreement.) But what set my teeth on edge for a not-insignificant number of papers — including quite a few of those on which my gradings agreed with those of my co-marker — was that I simply did not feel at all qualified to comment.
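(An aside for the quantitatively minded: a raw 70% agreement figure flatters reliability somewhat, because two markers grading on a coarse four-point scale will agree some of the time by chance alone. Cohen’s kappa corrects for that. A minimal sketch, using entirely hypothetical gradings rather than our actual data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal rating frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical star ratings (1-4) for ten papers from two markers.
marker_1 = [4, 3, 3, 2, 4, 3, 2, 3, 4, 3]
marker_2 = [4, 3, 2, 2, 4, 3, 3, 3, 3, 3]
raw = sum(a == b for a, b in zip(marker_1, marker_2)) / len(marker_1)
print(f"raw agreement: {raw:.0%}")                          # 70%, as in our mock REF
print(f"Cohen's kappa: {cohens_kappa(marker_1, marker_2)}")
```

For these made-up gradings the 70% raw agreement corresponds to a kappa of only 0.5 — “moderate” agreement on the usual interpretive scales.)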
Even though I’m a condensed matter physicist and we were asked to assess condensed matter physics papers, I simply don’t have the necessary level of hubris to pretend that I can expertly assess any paper in any CMP sub-field. The question that went through my head repeatedly was “If I got this paper from Physical Review Letters (or Phys. Rev. B, or Nature, or Nature Comms, or Advanced Materials, or J. Phys. Chem. C…etc…) would I accept the reviewing invitation or would I decline, telling them it was out of my field of expertise?” And for the majority of papers the answer to that question was a resounding “I’d decline the invitation.”
So if a paper I was asked to review wasn’t in my (sub-)field of expertise, how did I gauge its reception in the relevant scientific community?
I can’t quite believe I’m admitting this, given my severe misgivings about citation metrics, but, yes, I held my nose and turned to Web of Science. And citation metrics also played a role in the decisions my co-marker made, and in our moderation. This, despite the fact that we had no way of normalising those metrics to the prevailing citation culture of each sub-field, nor of ranking the quality as distinct from the impact of each paper. (One of my absolutely favourite papers of all time – a truly elegant and pioneering piece of work – has picked up a surprisingly low number of citations, as compared to much more pedestrian work in the field.)
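(Field normalisation of the kind we lacked is, at least mechanically, simple: divide a paper’s citation count by the mean count for papers in the same sub-field and publication year. That is roughly the idea behind field-normalised indicators such as the mean normalised citation score. A toy sketch with entirely made-up numbers — in practice the baselines would have to come from a database like Web of Science:

```python
from statistics import mean

# Hypothetical citation counts for papers grouped by (sub-field, year).
baseline = {
    ("surface science", 2012): [4, 9, 15, 22, 30],
    ("topological matter", 2012): [40, 75, 120, 210, 355],
}

def normalised_citations(count, subfield, year):
    """Citation count divided by the sub-field/year average."""
    return count / mean(baseline[(subfield, year)])

# The same raw count means very different things in different citation cultures:
print(normalised_citations(50, "surface science", 2012))     # well above field average
print(normalised_citations(50, "topological matter", 2012))  # below field average
```

Even this, of course, only normalises visibility — it says nothing at all about quality, which was precisely the problem.)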
Only when I had to face a stack of papers and grade them for myself did I realise just how exceptionally difficult it is to pass numerical judgment on a piece of work in an area that lies outside my rather small sphere of research. I was, of course, asked to comment on publications in condensed matter physics, ostensibly my area of expertise. But that’s a huge field. Not only is no-one a world-leading expert in all areas of condensed matter physics, it’s almost impossible to keep up with developments in our own narrow sub-fields of interest let alone be au fait with the state of the art in all other sub-fields.
We therefore turn to citations to try to gauge the extent to which a paper has made ripples — or perhaps even sent shockwaves — through a sub-field in which we have no expertise. My co-marker and I are hardly alone in adopting this citation-counting strategy. But that’s of course no excuse — we were relying on exactly the type of pseudoquantitative heuristic that I have criticised in the past, and I felt rather “grubby” at the end of the (rather tiring) day. David Colquhoun made the following point time and again in the run-up to the last REF (and well before):
All this shows what is obvious to everyone but bone-headed bean counters. The only way to assess the merit of a paper is to ask a selection of experts in the field.
Nothing else works.
Bibliometrics are a measure of visibility and “clout” in a particular (yet often nebulously defined) research community; they’re not a quantification of scientific quality. Therefore, very many scientists, and this most definitely includes me, have deep misgivings about using citations to judge a paper’s — let alone a scientist’s — worth.
Although I agree with that quote from David above, the problem is that we need to somehow choose the correct “boundary conditions” for each expert; I can have a reasonable level of expertise in one sub-area of a field — say, scanning probe microscopy or self-assembly or semiconductor surface physics — and a distinct lack of working knowledge, let alone expertise, in another sub-area of that self-same field. I could list literally hundreds of topics where I would, in fact, be winging it.
For many years, and because of my deep aversion to simplistic citation-counting and bibliometrics, I’ve been guilty of the type of not-particularly-joined-up thinking that Dorothy Bishop rightly chastises in this tweet…
We can’t trust the bibliometrics in isolation (for all the reasons (and others) that David Colquhoun lays out here), so when it comes to the REF the argument is that we have to supplement the metrics with “quality control” via another round of ostensibly expert peer review. But the problem is that it’s often not expert peer review; I was certainly not an expert in the subject areas of very many of the papers I was asked to judge. And I’ll hold that no-one can be a world-leading expert in every sub-field of a given area of physics (or any other discipline).
So what are the alternatives?
David has suggested that we should, in essence, retire what’s known as the “dual support” system for research funding (see the video embedded below): “…abolish the REF, and give the money to research councils, with precautions to prevent people being fired because their research wasn’t expensive enough.” I have quite some sympathy with that view because the common argument that the so-called QR funding awarded via the REF is used to support “unpopular” areas of research that wouldn’t necessarily be supported by the research councils is not at all compelling (to put it mildly). Universities demonstrably align their funding priorities and programmes very closely with research council strategic areas; they don’t hand out QR money for research that doesn’t fall within their latest Universal Targetified Globalised Research Themes.
Prof. Bishop has a different suggestion for revamping how QR funding is divvied up, which initially (and naively, for the reasons outlined above) I found a little unsettling. My first-hand experience earlier this week with the publication grading methodology used by the REF — albeit in a mock assessment — has made me significantly more comfortable with Dorothy’s strategy:
“…dispense with the review of quality, and you can obtain similar outcomes by allocating funding at institutional level in relation to research volume.”
Given that grant income is often taken as yet another proxy for research quality, and that there’s a clear Matthew effect (rightly or wrongly) at play in science funding, this correlation between research volume and REF placement is not surprising. As the Times Higher Education article on Dorothy’s proposals went on to quote,
The government should, therefore, consider allocating block funding in proportion to the number of research-active staff at a university because that would shrink the burden on universities and reduce perverse incentives in the system, [Prof Bishop] said.
Before reacting one way or the other, I strongly recommend that you take the time to listen to Prof. Bishop eloquently detail her arguments in the video below.
Here’s the final slide of that presentation:
So much rests on that final point. Ultimately, the immense time and effort devoted to/wasted on the REF boils down to a lack of trust — by government, funding bodies, and, depressingly, often university senior management — that academics can motivate themselves without perverse incentives like aiming for a 4* paper. That would be bad enough if we all could agree on what a 4* paper looks like…