
4

From the Research Assessment Exercise to the Research Excellence Framework: changing assessment models in the United Kingdom?

Abstract:

We investigate the development and evolution of the panel-based British Research Assessment Exercise and its successor model, the Research Excellence Framework. We note the benefits and costs of both models. We raise questions regarding the utility of impact measures in the proposed REF.

Key words

Research Assessment Exercise

Research Excellence Framework

panel review

United Kingdom

impact

publication and teaching

If the panel-based research assessment model, the British Research Assessment Exercise (RAE), is sometimes seen as the exemplar for research assessment, influencing models across the world, there is some irony in its abandonment by the United Kingdom. Throughout the 20 years from its first version in 1986 there was constant tinkering with the process, but generally the RAE consisted of:

1. Panel assessment of the quality of two (later four) submitted research outputs per person, with panels organised along disciplinary lines.

2. Other measures, including peer esteem and research environment in later versions, were also used, with their use, form and weighting changing over time and depending on the discipline assessed. They were generally considerably less important than the quality of the outputs.

3. Assessment by unit, in contrast to the New Zealand Performance Based Research Fund (PBRF) which was based on the assessment of individuals.

4. Units were able to be selective in whom they submitted for assessment in later assessments, as opposed to the New Zealand PBRF which required the assessment of all eligible members of staff.

In general the RAE is deemed to be more-or-less a success, in many, but not all, accounts – associated with an increase in both quantity and ‘quality’ of research – and with publicly funded research money directed more selectively to reward performance. It was however unpopular among a number of segments of academia, particularly, but not only, losers from the scheme. It generated considerable administrative burdens, with the 2008 RAE involving thousands of people, over several years, and considerable expense, as we will examine. It has created a considerable self-sustaining bureaucracy and clientele, whose existence has continued into the successor Research Excellence Framework (REF). Its effects, good and bad, on quality and quantity of research, on types of research, on collegiality, and on teaching, have been continually questioned.

Despite its putative success, the RAE’s demise was announced in 2006, with a view to replacing it with a bibliometric-based system. This idea has itself been largely jettisoned. As 2011/2012 progressed, the replacement REF was being rolled out. It is not such a departure from the RAE as first thought and perhaps better seen as a modification. It proposes quality evaluation of submitted research output by panels (65 per cent), but also a focus on measures of ‘research environment’ (15 per cent) and research ‘impact’ (20 per cent). It is the latter that is probably the most significant departure from the RAE, and which has generated considerable debate, often approaching derision.

The ongoing debate over, and historical development of, the various forms of research assessment in the United Kingdom is useful in crystallising key issues, such as effects on research output – both in terms of quantity and quality – the actual purpose of research and research assessment, panel versus bibliometric forms of assessment, and the intended ends of research. This chapter will examine the evolution of the RAE and its supposed benefits and costs, before investigating the successor model.

Research and science in the United Kingdom

Britain has a long and distinguished record of scientific and research achievement that is matched by few other countries. It has been consistently ranked second only to the United States, and sometimes first, across a wide range of disciplines on a variety of bibliometric and other measures (DBIS, 2009b). It is second to the United States in the number of Nobel prizes. In 2009, for example, the United Kingdom produced 7.9 per cent of world papers in the Thomson (formerly ISI) databases, giving it a 2nd rank across several disciplines, with citation ranks of 3rd in the biological sciences, 5th in mathematics and 6th in the physical sciences. The United Kingdom gained 11.8 per cent of world citations (second to the US), is 5th on citations per paper, 3rd on citations per researcher, 4th on papers per researcher and 3rd on highly cited papers. It is first in the G8 on papers per $billion GDP and 3rd overall in its comparator group, and is ranked 3rd on papers per unit of research spending (first in the G8 on this measure). It is the fourth largest producer of PhDs (DBIS, 2009b). While the United Kingdom’s relative share of world papers has fallen from 9.3 per cent in 1999 to 7.9 per cent (still second in many disciplines, as noted), the absolute number of papers has increased, and on some citation measures it has improved both its absolute and relative positions (DBIS, 2009b).

It is unlikely that the RAE was introduced in 1986, and maintained since, as a response to poor quality or quantity of research; nor does its introduction seem to have harmed research. Rather, it can be seen as an attempt to focus and redirect research funding in an age of constraint and in the face of the increasing ‘massification’ of university education, and to provide the accountability and performance measurement that were a fashion of the NPM era, and continue to be so. In later manifestations it was explicitly linked to quality issues and claimed to have increased the quality and quantity of research, albeit with considerable debate on its merits or otherwise.

The United Kingdom university system

The United Kingdom’s top universities are among the best known in the world, with the Times Higher Education world rankings giving it 4 out of the top 10 universities (4 out of the top 6), and 15 out of the top 100 in 2009. The United Kingdom system is largely a twentieth century creation. While four universities existed in Scotland by the end of the eighteenth century, even by the end of the nineteenth century the English universities of Oxford and Cambridge, established in medieval times, had only been supplemented by the University of London (established 1836) and Durham (established 1832). Student numbers remained tiny. The twentieth century, and particularly the post-war era, saw rapid expansion in both student numbers and organisations, so that by 1977 there were 52 separate university organisations.

Much of the higher education system was bifurcated into a university system and the technical institutions, colleges of education and, after 1965, a particular focus on the creation of polytechnics offering more vocationally orientated forms of education (Watford, 1987). The University Grants Committee (UGC), established in 1919, acted as a buffer organisation between the state and the university system, directing government funding towards organisations on a five year basis. Research funding prior to 1992 was directed through the UGC and (from 1988) the Universities Funding Council, and was included as part of university fees at a 40 per cent premium per student. Polytechnics and similar organisations received only token research funding and funding directed at particular targets. Funding was largely based on the principle of equity where universities received funding based on student numbers (Willmott, 1995).

This period of lightly regulated and well-funded universities, and locally and centrally controlled polytechnics, began to unravel in the 1980s (Tapper, 2007). A period of retrenchment followed in the 1970s and 1980s, particularly with large and selective cuts to universities under the Conservative Government after 1981. The Jarratt Report of 1985 recommended the conversion of Vice Chancellors to Chief Executives, and the use of performance indicators to assess universities. This was reflected to some extent in the development of the RAE (Willmott, 1995). The UGC was wound up in 1988 and replaced with the Universities Funding Council, and the Education Reform Act 1988 greatly centralised state control of universities. The polytechnics were incorporated as independent bodies in the same year with their own funding body, giving them a greater degree of autonomy. In 1990 free university education and means tested maintenance grants were abandoned and interest-free loans to students were introduced to cover half of living expenses, the rest topped up by private income or grants.

In 1992 the Further and Higher Education Act granted degree awarding powers to polytechnics and colleges meeting suitable criteria. Polytechnics were granted the right to use the term university (Watson and Taylor, 1998). This, unsurprisingly, saw a considerable growth in the number of universities. However, the distinction is still often made between ‘old’ and ‘new’ (post-1992) universities, some of which have continued their teaching and applied research focus, and have had mixed success in the RAE. The Dearing Report of 1997 noted the problem of underfunding of universities and insufficient spending on infrastructure – a finding echoed by Parliamentary and other reports (National Committee of Inquiry into Higher Education, 1997). This was followed by dedicated funds for infrastructure.

Student numbers increased greatly with the participation rate increasing from 14 per cent in 1990 to a third by the end of the decade, although this saw a marked drop in funding per student between 1978/9 and 1995/6 (Sutherland, 2008). Participation rates were at 34 per cent in 2009, with an express wish by government for them to reach 50 per cent (DBIS, 2009a).

Public research funding was delivered through a dual mechanism: a variety of competitive funds, and the Higher Education Funding Councils, which have been jointly responsible for the RAE, as described below. In 2001 around £1 billion was allocated on the basis of the RAE. In 2009, for example, funding for higher education was £4,782 million for teaching, £1,572 million for research, £134 million for business and community engagement, and £1,154 million in capital funding.

The Research Assessment Exercise

The RAE began in its first form in 1986, and underwent several major revisions until its last manifestation in 2008. At its heart was the panel assessment of the research quality of units (not individuals, as in the New Zealand version), based on the production of two (later four) research outputs for each individual sent to the panel for assessment, and the allocation of funds based on these assessments, albeit with the added complexity of other qualitative and quantitative measures. Grading systems changed over its life. Universities could choose whom they sent for assessment. Funding was allocated to the organisation as a whole, to spend as it saw fit, although in some cases quality scores have been used in the reallocation of internal funding, as would be expected.

The definition of what constituted research changed subtly over the period, and a distinction was drawn between scholarship and research. In 2001 research

for the purpose of the RAE is to be understood as original investigation undertaken in order to gain knowledge and understanding. It includes work of direct relevance to the needs of commerce and industry, as well as to the public and voluntary sectors; scholarship; the invention and generation of ideas, images, performances and artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction.

It excludes routine testing and analysis of materials, components and processes, e.g. for the maintenance of national standards, as distinct from the development of new analytical techniques. It also excludes the development of teaching materials that do not embody original research.

Scholarship for the RAE is defined as the creation, development and maintenance of the intellectual infrastructure of subjects and disciplines, in forms such as dictionaries, scholarly editions, catalogues and contributions to major research databases (RAE2001, 2002: 1.12).1

Evolution of the RAE

While the panel-based assessment of research was a constant through the various manifestations of the RAE, there were often significant changes to ranking methods and scores, submission requirements and other aspects of the process, which makes writing a clear account difficult. For a large part of its life, RAE assessment was based on submissions containing a nominated number of ‘research outputs’ for each individual, plus a variety of other measures. Generally RAE requirements became more prescriptive and complex over time, but the process also increased in transparency. Table 4.1 summarises this evolution, and includes its successor the REF.

Table 4.1

Evolution of the RAE and the REF

Notes

1. Refer to Table 4.2 for details.

2. All details of the REF are provisional and subject to change.

The first Research Selectivity Exercise was introduced in 1986 by what was then the UGC. It bore only some resemblance to the more elaborate Research Assessment Exercises that followed. Conducted in a period of fiscal restraint and in a climate favouring a move towards the use of performance indicators in assessing university performance, it was seen by some largely as a measure to reduce funds, or at least to direct them in a more targeted and/or efficient manner. The actual exercise conducted by the UGC was unclear in both its objectives and its methods (described by one critic as ‘rough and ready’), and in the way it was used to recalculate funding based on the UGC grant (Phillimore, 1989: 260; Willmott, 1995). ‘Cost centres’ (not necessarily disciplines) were evaluated by an anonymous group of UGC ‘experts’ based on the following, albeit in a way that was never specified:

- a two page description of the research achievements and a list of five of the best publications from the previous five year period, submitted by each cost centre;

- numbers of research grants, studentships and ‘new blood’ lectureships;

- income from industry and other external sources;

- fellowships, prizes and other honours awarded to faculty;

- peer-review judgements of research performance (Phillimore, 1989: 260).

Critics noted a lack of appeal mechanisms; lack of clarity on judgements and lack of criteria for them; inconsistencies across disciplinary areas; anonymity and lack of accountability of the assessors; a focus on arbitrary cost centres that did not necessarily reflect disciplines; and data inadequacies, among other things (Bence and Oppenheim, 2005; Phillimore, 1989). Subsequent surveys found the vast majority of academics opposed the scheme (Bence and Oppenheim, 2005; Phillimore, 1989).

The second RAE, now with that name, was clearer in its objectives. It took on a form closer to what became the more-or-less standard RAE, but still differed in significant respects from later models. Carried out in 1989, it was now explicitly about redistributing funding towards ‘work of special strength and purpose’ and using selective funding to ‘maintain the quality’ of university research (Universities Funding Council, 1989: 2–3; Tapper, 2007). However, it differed from later RAEs in being simpler and vaguer on assessment procedures, and in that only two research outputs were supplied per person (four in later assessments). No guidance was given on ‘quality’, although a five point scale was supplied. Submissions included the two nominated outputs and a numerical total of publications over the previous four years, numbers of full time equivalent (FTE) undergraduate and graduate students, research studentships and successful doctoral thesis submissions, research grants and research contracts, plus a report on ‘general observations’. All these factors were to be taken into account in quality assessments, although it was not specified how. Meetings were held over three months from April to July 1989, with results and panel membership announced in August, compared to the years taken by later exercises. Reports noted the lack of clarity over what research outputs should be counted, and found evidence of inaccuracies and possible deliberate misreporting in submissions (Universities Funding Council, 1989).

The 1992 RAE saw the inclusion of the ‘new’ universities and the merging of the Polytechnics and Colleges Funding Council and the Universities Funding Council into the new higher education funding councils. Universities were also given the option of choosing which researchers they submitted for assessment in the 1992 exercise, which later led to accusations of game playing, as less able or younger researchers were not submitted for assessment. The funding formula was constructed so that there were financial incentives to include higher numbers of research-active staff, even if the overall quality rating might be relatively lower as a result. However, in many cases only a small proportion of actual staff might be submitted for assessment, which can, according to some critics, lead to assessments that do not reflect the average quality of units.

From the 1996 RAE, a publication count was no longer supplied and panels were required to make judgements only on the (normally four) outputs listed per staff member being assessed, although mitigating factors could be taken into account. Outputs were normally publications, particularly journal articles (which constituted 75 per cent of submissions in the 2008 RAE), but also authored books (7 per cent), chapters (9 per cent) and so on, with the proportions of chapters and books versus articles varying between disciplines. In later assessments some outputs were despatched by electronic means, as the logistics of storing and supplying outputs in hard copy were substantial.

Changing panels

Panel make-up changed over time. The 1989 exercise had 70 subject areas, with 152 subject units of assessment, 300 panel members and 100 anonymous external advisors. The 1996 exercise had 60 panels covering 69 areas of assessment. The 2001 RAE saw 68 subject areas or units of assessment serviced by panels, with five overarching umbrella panels, 60 subject panels, 26 subpanels in some areas, and 464 ‘specialist advisers’, with the ability to cross-refer submissions in the case of interdisciplinary work. In 2008 there was an explicitly two-tiered system with 15 main panels and 67 subpanels, which included over 1,000 panel members and just under 1,000 ‘specialist advisers’. These outside experts could be called in, and interdisciplinary work could be referred to other panels, although this was noted to be difficult in practice (UNIVERSITAS, 2003). Each panel was able to choose its own assessment methods, within the boundaries given. In later RAEs, panels were required to publish a set of assessment criteria before embarking on assessment, and were ostensibly required to follow these. Higher education institutions (HEIs) could make submissions to as many panels as they wished.

Panel workloads varied enormously. For example, some panel members in 2001 received as few as eight submissions each to assess, others as many as 196. Unsurprisingly, direct assessment of outputs also varied hugely. Some panel members read everything submitted; other panels committed themselves to reading a minimum of 10 per cent of outputs, leaving the possibility that some individuals’ work would not be directly assessed (UNIVERSITAS, 2003).

Panel submission and assessment

What was measured in assessments changed over time. Submissions from units to later panels contained the names of ‘research active staff’, with up to four research outputs each. There were a variety of other requirements such as student numbers, degree completions, research grants and so on, depending on the version, as outlined in Table 4.1. The relative importance of these measures remained obscure, at least until the 2008 exercise. In 2001 submissions were also required to contain a statement of ‘research strategy and environment’, covering such things as staffing policy, policies towards younger researchers and funding arrangements, which was designed to be taken into account in the final grade given to the unit, although it was not clear how.

Some studies suggested outputs remained the main subject of assessment, with other qualitative and quantitative data used only at the margins (RAE2008, 2009; Roberts, 2003). However, one study of the various quantitative measures required in the 1996 and 2001 exercises found, across several subjects, that the ‘size [of departments] publications, research student performance and research council income strongly related to good performance’ in terms of RAE grades awarded (Lansley, 2007: 24).

Following recommendations from the Roberts inquiry, greater clarity was applied to the other measures in the 2008 RAE, with a 50 per cent minimum weighting applied to the four research outputs. Peer esteem factors and research environment factors for individuals and the unit being assessed were also required. The relative weightings for the three factors of outputs, peer esteem and research environment were for the Main Panels to decide, but there were minimum requirements of 50 per cent, 5 per cent and 5 per cent, respectively. Other data called for included the overall staff summary, including research staff and related academic support staff; ‘detailed’ information on individuals selected, which is outlined at length; numbers of research students and research degree completions; research studentships and sources of funding; external research funding; and individual staff circumstances and equity issues.

Under the overarching structure of the three factors of outputs, peer esteem and research environment, other information contained in the submission would be used to generate a final ‘quality profile’. It was not entirely clear, however, how this was done, and the results were accused of being highly subjectively derived (Corbyn, 2008). A percentage of submissions was audited for accuracy through various manifestations of the RAE, and errors were certainly not unheard of. Although most errors were probably not intentionally misleading, the report on the 1989 exercise suggested some probably were. From 2001, submissions were made public and published online, providing useful transparency over assessments.

Submission periods

The submission period (that is, the period in which assessed research could have been published) changed in various manifestations. The 1989 exercise covered four years. A submission period of five years was specified for the sciences and seven for the humanities in the 2001 assessment, based on the belief that publication and research development cycles were longer in the humanities. Submissions that had been made to the 1996 RAE within these seven years could be resubmitted in 2001. For 2008, the assessment period was 1 January 2001 to 31 July 2007, with outputs required to be in the public domain (which in most cases means published) between 1 January 2001 and 31 December 2007. The ‘census date’ by which eligible staff needed to be affiliated to the organisation being assessed was 31 October 2007 (Table 4.1).

Panel appointment processes

Panel appointment processes evolved towards greater clarity and transparency, albeit remaining somewhat unclear. The 1989 assessment did not publish panel membership until after the process, and the appointment process was not overly transparent. Outside experts remained anonymous. For the 1996 process, panel chairs were selected by the Chief Executives of the relevant funding bodies. Around half had already served as chairs in the previous exercise; the rest were appointed on the recommendation of previous chairs. Almost all had served previously as panel members. Chairs then made recommendations for the other members from nominations from around 1,000 professional, learned and disciplinary associations (HEFCE, 1997). For the 2001 panels, while members could be and were drawn from educational organisations, these organisations were not able to nominate panel members. For the nominations that did come from professional bodies and subject associations, it was still unclear how members were selected (UNIVERSITAS, 2003).

There was somewhat greater clarity in the 2008 RAE. In this case, Main Panel chairs were appointed after an application process by the chief executives of the university funding bodies, which took ‘into account the diversity of the UK HE research base [representing] a wide-range of HEIs and a considerable breadth of research experience [including] in earlier [RAEs]’ (RAE2008, 2009: 9). The Main Panel chairs were then responsible for recommending the appointment of the next-level chairs from the thousands of nominations received (3,000 nominations from 110 bodies). These chairs in turn, along with the Main Panel chairs, recommended the appointment of panel members. This of course still left considerable leeway for an ‘old boys’ network’ and cronyist appointments, but was at least slightly clearer than its predecessors. Despite much talk of ‘research users’ in assessment, there were few such nominations, although just under 10 per cent of panel members were from non-HEIs (RAE2008, 2009). Specialist advisors were also used: in 2008, 939 specialist advisors were appointed from 500 nominations from panels and other nominations from ‘relevant nominating bodies’ (RAE2008, 2009: 56). The advisors’ names were published online.

Ranking methods and reporting results

The method of ranking changed considerably over time. Submissions to the first three RAEs were ranked according to a five point scale, with descriptions of what constituted each rating sometimes changing during the period. This was changed to a seven point scale for 1996 and 2001. The 2008 scheme used five point ‘quality profiles’ (Table 4.2). This was justified on the grounds that ‘“quality profiles” … lessened the averaging effect of single-point ratings by allowing panels to exercise a finer degree of judgment, especially at grade boundaries’ (RAE2008, 2008: 3), although the distinction between a quality profile and a scale is perhaps mainly a semantic one. A letter grade signifying the proportion of staff in a unit returned as research active was reported in the 1992 exercise.2 The number of staff per unit entered as research active was published from 2001. In 2008 the number of FTE research active staff and the proportion of total staff submitted as research active were reported.

Table 4.2

RAE Rating Scales 1992, 2001, 2008

1992 (Note 1)
Rating Description
5 Equates to attainable levels of international excellence in some sub-areas of activity and to attainable levels of excellence in virtually all others
4 Equates to attainable levels of national excellence in some sub-areas of activity, possibly showing some evidence of international excellence, or to international level in some and at least national level in a majority
3 Equates to attainable levels of national excellence in a majority of the sub-areas of activity, or to international level in some
2 Equates to attainable levels of national excellence in up to half of the sub-areas of activity
1 Equates to attainable levels of national excellence in none, or virtually none, of the sub-areas of activity
2001 (Note 2)
Rating Description
5* Levels of international excellence in more than half of research activity submitted and attainable levels of national excellence in the remainder
5 Levels of international excellence in up to half of research activity submitted and attainable levels of national excellence in virtually all of the remainder
4 Levels of national excellence in virtually all of the research activity submitted, possibly showing some evidence of international excellence
3a Levels of national excellence in over two-thirds of the research activity submitted, possibly showing some evidence of international excellence
3b Levels of national excellence in more than half of the research activity submitted
2 Levels of national excellence in up to half of the research activity submitted
1 Levels of national excellence in virtually none of research activity submitted
2008 (Note 3)
Quality Level Description
4* World leading in terms of originality, significance and rigour
3* Internationally excellent in terms of originality, significance and rigour but which nonetheless falls short of the highest standard of excellence
2* Recognised internationally in terms of originality, significance and rigour
1* Recognised nationally in terms of originality, significance and rigour
Unclassified Falls below the standard of nationally recognised work. Or work that does not meet the published definition of research

Notes

1. Source: Universities Funding Council (1992: 15). Emphasis added.

2. Source: House of Commons Science and Technology Committee (2002: 12). Emphasis added.

3. Source: RAE2008 (2009: 11). Emphasis added.

The RAE in 2001 saw a marked increase in final scores, with the proportion of research active staff working in units rated 5 or 5* increasing from 31 per cent (in 573 units) in 1996 to 55 per cent (in 1,081 units) in 2001. Research ranked as of national or international excellence was 64 per cent, up from 43 per cent in 1996. This led to considerable debate over whether the grade inflation was largely based on game playing rather than an increase in quality (House of Commons, 2002). It also led to a difficult situation where increased funding did not necessarily follow better quality rankings, with overall HEFCE funding only increasing marginally. There was considerable difference in assessment measures across panels, with some finding the international versus national measures less than meaningful, and some developing their own rankings (UNIVERSITAS, 2003). Reporting or feedback to units on their portfolio assessments was also limited, partly due to fear of litigation, although there have been surprisingly few examples of this thus far (UNIVERSITAS, 2003). However, there had been a legal challenge after the 1992 exercise, when one institution unsuccessfully asked for a judicial review on the basis that reasons should be given for ratings (HEFCE, 1997).

Feedback increased in the 2008 RAE where three ‘subprofiles’ were provided for outputs, research environment and peer esteem; again with the threat of legal action in mind (RAE2008, 2009). A number of Main and sub-Panels also produced reports outlining the assessment processes they had used.

The results for 2008 were:

- 54 [per cent] of the research [was] either ‘world-leading’ (17 per cent in 4*) or ‘internationally excellent’ (37 per cent in 3*).

- 1,258 of the 2,363 submissions (53 per cent of total) had at least 50 per cent of their activity rated in the two highest grades. These submissions were found in 118 institutions.

- All the submissions from 16 institutions had at least 50 per cent of their activity assessed as 3* or 4*.

- 84 per cent of all submissions were judged to contain at least 5 per cent world-leading quality research.

- 150 of the 159 higher education institutions (HEIs) that took part in RAE2008 demonstrated at least 5 per cent world-leading quality research in one or more of their submissions.

- 49 HEIs have at least some world-leading quality research in all of their submissions (RAE2008, 2008: 3).

Funding allocation

Funding was calculated by complex and somewhat obscure formulae, with weightings for each quality level changing over time, usually in the direction of greater selectivity. In 1992 an assessment rating of 5 attracted a funding weighting of 4, with the rest of the scale weighted 4(3), 3(2), 2(1) and 1(0). By 2008 the weightings had become considerably more selective, at 4*(7), 3*(3), 2*(1), 1*(0) and unclassified (0). This became 9:3:1 post-2008, increasing the stakes for winning considerably. Funding was also weighted based on the putative expense of the subject: high cost subjects such as laboratory and clinical disciplines were weighted at 1.7 in 1997–2001 and 1.6 after, technical/experimental subjects at 1.3 and ‘other’ at 1 in 2001. These weightings continued into the 2008 RAE.

Weightings were also used for ‘volume’ of research, based on the number of FTE research active staff (weighted 1), but with weightings given for research assistants and research fellows (0.1), and research students (0.15).

Charitable income was weighted at 0.228 per £25,000 received in 2001; in 2008 it was weighted at 0.25, based on charitable income converted into FTE terms, although it is not clear from HEFCE documents exactly what this means. Units without three research active staff ranked 1* or above received no funding. An additional sum of £6.1 million was allocated in 2009/10 to 4* work only. Aside from this extra funding for 4* outputs and a fund to supplement full cost recovery for funding from charities, funding was derived from the 2008 RAE thus:

volume measures are weighted by the volume weightings … The product is multiplied by the relevant subject cost weighting, and then by the quality weightings. The latter are applied in proportion to the quality profile for the submission.

The overall outcomes of the formula calculation are scaled to the total amount of funding available for QR in the year in question. (RAE2008, undated)
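
The mechanics can be illustrated with a minimal sketch of this logic: weighted volume multiplied by the subject cost weighting and by the quality weightings (applied in proportion to the quality profile), then scaled to the QR budget available. All unit names and figures below are hypothetical, the additional volume weightings for research assistants, research students and charitable income are omitted for simplicity, and this is not HEFCE's actual implementation, only an illustration of the published principle.

```python
# A minimal, hypothetical sketch of the post-2008 QR allocation logic described above.
# All figures and unit names are invented for illustration only.

QUALITY_WEIGHTS = {"4*": 7, "3*": 3, "2*": 1, "1*": 0, "u": 0}  # quality weightings from the text; later changed to 9:3:1

def submission_score(fte_staff, cost_weight, quality_profile):
    """quality_profile maps each grade to the percentage of activity at that grade."""
    quality_factor = sum(QUALITY_WEIGHTS[g] * pct / 100 for g, pct in quality_profile.items())
    return fte_staff * cost_weight * quality_factor

def allocate_qr(submissions, qr_budget):
    """Scale raw scores so that allocations sum to the QR funding available."""
    scores = {name: submission_score(*args) for name, args in submissions.items()}
    total = sum(scores.values())
    return {name: round(qr_budget * s / total) for name, s in scores.items()}

# Two hypothetical units sharing a notional £10 million QR pot.
submissions = {
    "Unit A (laboratory-based, cost weight 1.6)": (40.0, 1.6, {"4*": 20, "3*": 40, "2*": 30, "1*": 10, "u": 0}),
    "Unit B (other, cost weight 1.0)":            (25.0, 1.0, {"4*": 10, "3*": 30, "2*": 40, "1*": 15, "u": 5}),
}
print(allocate_qr(submissions, 10_000_000))
```

The scaling step is what makes the exercise distributive rather than open-ended: a unit's allocation depends not only on its own profile and volume but on how every other submission performs, which is why increases in average quality (as in 2001) did not automatically translate into increased funding.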

Staff who transferred between organisations during the period of the review could be counted by both organisations in their assessments. However, in the 2001 RAE only the receiving organisation would get funding. A transferee was required to submit only two outputs if they had transferred in the previous year. This was noted as a potential distortion of rankings since, with only two outputs required, it could lead to a better assessment for the second organisation (House of Commons, 2002).

Funding to organisations could vary hugely. In 2001, for example, one group of 40 organisations made 240 submissions to the RAE, for which they received an average of £27,580 per submission, compared with the average of £455,000 per submission overall (Roberts, 2003). This could perhaps be seen as a success in terms of the selectivity aims of the RAE, although those lightly funded organisations might see it differently.

Process and cost of the RAE

The later RAEs were immensely complex processes, spreading over several years and involving thousands of people, tens of thousands of submissions and hundreds of thousands of research outputs. In the 2001 RAE, for example, the RAE manager responsible for the process was appointed in November 1998. Nominations were called for, and panel criteria and memberships were established and published, by the end of 1999. The closing date for submissions was 30 April 2001, and these were received by panel members in May of the same year. Assessment meetings, including residential ones, were carried out from May 2001, with final grades confirmed and then published in December. Overview reports and feedback were only released in spring 2002, over three years after the process began. In the 2008 RAE, guidelines to panels were released in January 2005, panel membership was published in May 2005, assessment meetings ran through 2008, and the results were finally published in December 2008.

The number of submissions and outputs submitted can be staggering. The 1996 RAE attracted 2,898 submissions from 192 HEIs, with 55,893 individuals to be assessed. The 2001 RAE assessed 50,000 researchers in 2,598 submissions from 173 HEIs. For the 2008 RAE, 2,344 submissions were made from 159 HEIs, and 215,657 eligible outputs were submitted. Arts and Design assessment involved the storage of a number of large items that also needed display space so they could be viewed by panel members.

The process involved a large number of meetings and a considerable degree of administrative work. In 2008, for example,

arrangements were made for just over 1000 days of panel meetings, hotel accommodation for 1100 panel members, secretariat and RAE team staff, and near 100,000 transactions with panel members to dispatch outputs. (RAE2008, 2009: 32)

A number of reviews noted the extreme workloads and inadequate administration budget for the 2001 RAE and earlier exercises, and the budget for the 2008 assessment was doubled. Even then, the permanent secretariat (mostly drawn from HEIs), the part time advisors contracted to provide policy advice to panels, and panel members themselves often worked in excess of the hours for which they were contracted. Additional support costs to cover extra administrative burdens were £764,000 over the original budget (RAE2008, 2009).

In terms of the costs to universities themselves, a report commissioned by HEFCE calculated that the cost to HEIs (rather than the cost to the government) of the 2008 round was approximately £47 million, an average of around £7 million per year over the seven years since the 2001 exercise. This translated to a total cost per HEI of £612,828, or £87,547 per year over the assessment period. This does not seem a huge figure when seen as a total cost per researcher of £1,127, or £161 per researcher per year, but it is still significant in aggregate terms. These costs were calculated looking at:

a variety of activities which require significant time and resource in terms of staff involvement, systems, co-ordination and internal governance. It has an impact at departmental level on the multiple active researchers taking part in the exercise, including the validation of publications information, the creation of tailored abstracts and participation in departmental and faculty review groups. Furthermore, many institutions engage in activities which go ‘above and beyond’ the minimum requirements of the RAE process in order to improve the quality of their submission, including strategic recruitment and external peer review. (PA Consulting Group, 2008:4)

These were generally seen as worth bearing, at least according to the report. However, as fiscal constraints bite in the coming years, this sum of money could call into question the merits of continuing with the exercise, or at least its successor.

Evaluating the RAE

If the RAE seemed for a while to have been abandoned in its classic form, did it then fail in the eyes of policy makers? This does not seem to be the case, with various largely positive evaluations seeing it as leading to a greater quantity and quality (measured in terms of relative citation measures) of research, with the United Kingdom’s rank in citations improving. It directed research funding more selectively, which perhaps was a key measure of success for policy makers (Chatterji and Seaman, 2007). It has entered the British university system as a method of ranking universities, building reputations and using these reputations to leverage other funding and student applications (PA Consulting Group, 2008). It has been used by university administrations to remove, retire, transfer, discipline and/or modify the behaviour of academic staff who were seen not to be performing in the expected or preferred fashion, and it added a further tool of control for central university administrations (Times Higher Education, 2004). This has been the case at some of the top universities in the United Kingdom: medical academics at Imperial College, for example, were threatened with disciplinary action if they did not have three papers accepted for publication and raise £75,000 in research funds; but it has also occurred further down the feeding chain. Perhaps for such reasons, senior university administrators have generally supported its continued existence (HEFCE, 1997; Times Higher Education, 2004). It has also highlighted so-called ‘pockets of excellence’ across the university sector, including in some post-1992 universities.

One HEFCE funded review found that while HEIs noted the administrative and financial burden, they sometimes saw this as balanced by the rewards in terms of direct funding, but also in terms of using the RAE assessment to gain prestige and reputation (including internationally), and to leverage these for other funding and positive outcomes (PA Consulting Group, 2008). Earlier reviews also found majority support for the RAE, albeit with concerns over aspects of its design and a significant degree of opposition (HEFCE, 1997). Concerns were raised regarding administrative burdens, particularly for smaller organisations, its backward-looking character, and the lack of developmental focus. The RAE also faced questions over whether its rankings reflected ‘research value’ (although it was unclear what this meant), whether it undervalued applied research, and ‘the non-transparent nature of allocation formulas’ (PA Consulting Group, 2008: 6). However, even some HEIs that received no funding still valued the RAE for its positive reputational aspects. The usefulness of accountability and performance measurement for still largely publicly funded organisations was also reiterated, where it is seen to be ‘used to foster innovation [and] improve international research competitiveness’ (PA Consulting Group, 2008: 10), as well as to promote departments to students, investors and staff themselves. Other commentators have noted there was some support for the RAE in re-orientating and rewarding research. However, owing to the highly prescriptive nature and high transaction costs of the RAE, and its continuation once its initial objective of increasing selectivity had been met, this support diminished over time (cf. Times Higher Education, 2004).

RAE results were used to justify internal changes in organisations and influenced changes in record keeping and the management of research, although both differed across organisations, as would be expected, and:

integrating these systems … of research support, financial administration and student and staff records systems … has proved a barrier for some HEIs, and … responding effectively to the RAE still requires dedicated and separate resource in most cases. (PA Consulting Group, 2008: 23)

Claims that it has encouraged a transfer market of top academics – which is not necessarily a bad thing for top researchers, who surely deserve to be rewarded just as much as football players, or for the selectivity and concentration of research excellence – are generally rebuffed by reviews, which conclude that there is little evidence for the practice (RAE2008, 2009; UNIVERSITAS, 2003). However, anecdotal accounts of head hunting before RAE (and REF) rounds continue, and there is a strong belief among some academics that this occurs in select disciplines, and among and by select schools, particularly business ones; and there is some disputable evidence (HEFCE, 1997).

While university administrators seemed broadly supportive of the RAE, the common sense understanding has been that academics were generally opposed to it. However, reviews of the RAE find that, despite these anecdotal and ‘common sense’ understandings, academics’ views on the utility or otherwise of the RAE were perhaps more positive than would be expected, with one finding that respondents rated the RAE results as at least fair, even if they had other concerns (Brinn et al., 2001). Official studies found a considerable degree of support for the process, albeit considerably less than among university administrators, and with concerns over details and lack of funding, and a sizeable opposition (HEFCE, 1997).

There is a considerable body of research – academic and governmental – that questioned aspects of the RAE. Governmental and various parliamentary reviews expressed concern over the lack of clarity on panels’ assessment methods, their lack of objectivity, the lack of applied science panel members, and the lack of transparency in the selection of members, although some of these were addressed to some extent in later versions (House of Commons, 2002). Concerns continued through various manifestations of the RAE that it directed attention away from teaching, along with highly contested claims that it discouraged collaboration and interdisciplinary work (of which there is mixed evidence), had a gender bias and a bias against applied knowledge, encouraged game playing, and was biased against younger staff, among other things. Concerns also continued over the impact of the RAE on redundancies and the closing of departments (House of Commons, 2002). Consistency across panels, neutrality and parity of assessment were also questioned in some reviews (UNIVERSITAS, 2003).

Some, but not all, of these concerns were addressed in the final 2008 RAE, at least in terms of increasing the transparency of appointments and process. Transparency had also been increased in the 2001 RAE, when submissions were published online. Explicit assessment criteria were developed for each subject, but whether these were followed is questionable. However, the introduction of the categories of peer esteem and research environment, the other measures involved in the assessment, and the potentially wide range of weightings that could be employed across different panels in ranking the three categories mean that the possible lack of comparability across panels, subjectivity, and lack of clarity in what constitutes the highly debatable notions of ‘world-leading’, ‘international’ and ‘national’ research may actually have increased in 2008. Anecdotes continue to abound of panels not awarding high scores to individual submissions, even where these included papers in top journals such as Nature, for spurious reasons such as that they contributed to fields other than those covered by the panel – and we return to problems of group decision-making in the final chapter. The constant tinkering with wordings, gradings, measurements and funding regimes (see Table 4.1) meant that to some extent there was a degree of goal shifting over the years, albeit with long lead times and with the changes usually signalled well in advance. We will deal with some more directed critiques below.

The teaching/research link and balance

A consistent claim has been that the RAE directed attention, effort and reward away from teaching. The importance of teaching was seen to be downgraded and researchers favoured over teachers in promotions and status (HEFCE, 1997; Besancenot et al., 2009; Leisyte et al., 2009; Brinn et al., 2001). Tension was sometimes created between research- and teaching-focussed staff, and between central administration and poorly performing or less research-focussed disciplines and units, particularly in post-1992 universities (Yokoyama, 2007). Given the importance this debate has for similar assessment exercises outside the United Kingdom, it deserves further attention.

The relationship between the RAE and teaching remains a contested one. Many universities and academics have a commitment to combining the two activities, and there is a normative element to much research focusing on how the link can be improved (Burke and Rau, 2010; Jenkins, 2000; Leisyte et al., 2009; Simons and Elen, 2007). Research findings on the teaching/research relationship remain highly mixed, however. Some studies suggest there is little, or a negative, relationship between teaching and research effectiveness (Barnett, 1992; Hattie and Marsh, 1996; Marsh and Hattie, 2002; Ramsden and Moses, 1992). Others find the relationship is a positive one, and that teaching enhances research and research enhances teaching (Lindsay et al., 2002; Visser-Wijnveen et al., 2009). Some studies find that the relationship depends on the discipline (McLean and Barker, 2004) and the level of the course, with a stronger relationship between research and teaching effectiveness/student assessments as the course level increases (Arnold, 2008; Lindsay et al., 2002). As such, depending on one’s view of how the balance of evidence falls, if research is encouraged by the RAE, this might lead to an increase in the quality of teaching, at least at graduate level. However, the RAE might be promoting certain individuals who, however good at research they might be, may not be suited by personality, skill or focus to be good teachers. Or it may make no difference. At worst, being good at research does not, on balance, seem to imply poor performance at teaching, suggesting perhaps that both can be rewarded. There may, however, be a trade-off, in that time devoted to one cannot be devoted to the other. In general, there is at best mixed evidence that the RAE has led to a fall in the quality of teaching, whatever that might mean or however it might be measured.

One study found that more junior respondents saw the RAE as having a greater negative impact on their teaching, administration, promotion prospects and job mobility than did more senior respondents (Brinn et al., 2001). The RAE has been associated with the growing importance of research and status, and promotions may depend on it, with ‘teaching only’ jobs a punishment for lack of research achievement; but, ironically, for many the demands of administration and teaching in a resource-strapped environment mean that it is harder than it was previously to find the time to carry out the research (Leisyte et al., 2009). This may be leading in some cases to a disconnect between research and teaching, where university employees are put into one of the two camps, but this is highly contextual and may depend on the organisational culture of the university in question and its orientation towards research.

Perhaps the RAE has not given research an important enough role, or redirected enough funding, in some universities, allowing the demands of teaching and other activities to squeeze it out, while centralising research money too much towards a smaller group of universities. Perhaps some talented researchers, particularly in post-1992 universities with a more ambivalent attitude to research and a greater focus on top down centralised management, are not being given enough support and status for their research excellence, despite respectable quality scores in the RAE. Indeed, Middlesex University announced the closure of its philosophy department in April 2010 despite it being the highest rated department in the university in the RAE, and ranked 13th in philosophy in the country. The closure was blamed on lower undergraduate numbers, although other sources suggested the low banded funding philosophy received was also a factor for the university, with it looking towards better funded STEM (science, technology, engineering and mathematics) teaching courses (Times Higher Education, 2010b).

Over time, as the research culture is cemented and those less focussed on research move into retirement or other avenues of employment, it may be that academics can accommodate themselves to new work environments and conditions. For those who think that research is important and even central to universities, this is not necessarily a bad outcome.

Central control and commodification of the university

There is a strong strand of literature that sees the RAE as reducing academic professional autonomy and collegiality and increasing managerialism and managerial control. This has included the ‘commodification’ (Gray et al., 2002) or proletarianisation of academics, where they become drones in the factory-like production of knowledge, judged by the publishing of refereed articles (Gray et al., 2002), increased surveillance, and a reduction in university organisational autonomy (Tapper, 2007). Some academics might have found the RAE leading to greater stress as they struggled to fulfil the new demands, and as research was channelled into particular ideological areas or towards certain types of research and publication outputs, such as journal articles. Bureaucratic burdens generally increased, and only the sanity-challenged see increasing form-filling and evaluation as a positive outcome and end in itself, although such people exist in bureaucracies all over the world. If the extra burdens were not seen as being balanced by extra benefits, and the measures were of questionable usefulness, then perhaps on balance the RAE may not have been seen as positive. Some academics might see such competition as beneath them, and against the spirit of a collegial university; others perhaps might be more sceptical of whether such an Oxbridge collegial model exists outside campus novels, and enjoy winning at the game, even if the game is somewhat flawed and may not entirely measure what it claims to measure.

Whether these outcomes are entirely negative depends to some extent on where one stands, and on how the RAE played out within particular research environments. The RAE was introduced initially as part of a greater effort by the state to exert central control over the universities. Central control over universities did increase from the light hand of the UGC, and the RAE did have as a driving factor the performance measurement and accompanying control that were popular at the time. University systems have changed in some cases to support the RAE, and this, along with a general managerialism in universities themselves, may have made assessment and invasive monitoring of academics more likely. The RAE has given managers extra tools with which to impose this power and discipline. Such control could have increased through alternative mechanisms, however, which might have been more onerous and without even the regard given to research in the RAE. It should be noted that managerialism in universities and the undermining of scholarly control do not require an RAE-type mechanism. As noted in Chapter 5, such changes occurred in New Zealand before its version of the RAE (the PBRF). Indeed, the PBRF was seen by some as redressing the balance of power back towards scholars.

Even if its intent was to control the work of academics, not all were losers in this process, as it is possible that those being controlled could turn and use these measures to further their own ends and agendas. Later RAEs were reliant on academics for the peer assessments, so it is not entirely clear that the process was simply one of working in the interests of state actors – unless the 1,000 or so involved in the process were simply the tools, witting or unwitting, of a controlling state. Perhaps it would be better to see the process more as an alliance of one part of the academe, and organisations within higher education, with elements of the state, perhaps in opposition to other segments of the academe and different organisations within the broad framework of higher education. Indeed, there seems to have been the creation of self-sustaining bureaucracies, a clientele within the academy who people the panels and gain the status, personal and financial satisfaction, and the joy of exerting power over members of their discipline, as well as a host of for-profit research and other consultancies that exist to advise on the RAE and its successors. Whether this is a good thing can of course be debated.

Types and forms of research

It is possible that peer review panels in models such as the RAE can lead to a homogenisation of disciplines and indeed a form of ideological policing. As noted in Chapters 1 and 2, there is a strong strand of research that sees science as often constrained by rather conformist paradigms, excluding and marginalising research, and those carrying out that research, that does not fit nicely. Such a claim has been made regarding the economics panels in various RAEs, which are seen to favour a particular ‘market-favouring’ type of economics as published in a select group of journals – with the list shortening over time (Lee and Harley, 1998; Lee, 2007) – and similar comments about the RAE have been made to RAE reviews (HEFCE, 1997).

A critique often made, including by some parliamentary assessments, is that the RAE had the potential to reward certain types of research output. Ground-breaking studies may take years of work, be published in journals with questionable rankings, and may be ignored for years. They may be produced by individuals who have published little until that time. Watson, Crick and Franklin, who uncovered the structure of DNA, did not receive immediate recognition for their work and had previously achieved little. Recognition was long in coming, and Franklin missed out on the Nobel prize, perhaps because she had died by the time it was awarded (it cannot be awarded posthumously).

Some critics suggest the RAE was inherently biased against publications other than journal articles (Paisey and Paisey, 2005). If this is true – and it is claimed not to be the case for later RAEs – for some panels this bias may simply reflect the, perhaps regrettable, bias of their disciplines.

In sum, there are grounds for suspecting that the RAE favoured paradigm-bound and discipline-specific work, and replicated the particular output biases of those disciplines. As panel members are likely to be senior and establishment figures, this conservatism and conformity could be pronounced. To some extent, it might simply reflect such tendencies within the wider academe itself, concentrated further in a small panel. Some of these issues are explored at greater depth in the final chapter.

The Research Excellence Framework

The pre-Budget report of December 2006 announced that the 2008 RAE would be the last, to be replaced by what is now called the Research Excellence Framework. The RAE was seen to have served its purpose, and it was time to change to a model better focussed on linking research to economic and other outcomes. Initially the focus was on the introduction of metrics or bibliometrics – quantitative counts and measures of publication behaviour and impact, as discussed in previous chapters – but the difficulty of operationalising this soon became apparent, and the idea was largely abandoned. The proposed REF has been modified over time to the point where quantitative measures can be used only to support judgements of research outputs. Impact was to be one measure assessed by panels, along with ‘research output’ quality and ‘research environment’.

Over time, the REF became more a modification of the RAE model than a total abandonment of it, with the central element of the RAE – panel peer assessment of quality – remaining at its heart, along with assessment by research unit rather than by individual or whole organisation. Assessment was to be weighted by research ‘quality’ (60 per cent), research ‘impact’ (25 per cent) and research ‘environment’ (15 per cent). In 2011, impact had its weighting reduced to 20 per cent of the total, with outputs counting for 65 per cent and research environment 15 per cent, but with indications that impact’s weighting might increase in the future. The highly contested notion of impact is what excited the greatest opposition, with debate on its utility continuing after the 2010 election, which replaced the Labour Government with a Conservative–Liberal Democrat coalition. Various complex, contestable and possibly highly subjective measures of impact were proposed. For a while the future existence of the REF was doubtful, with the new government making some sceptical statements and seemingly accepting many of the criticisms of the proposed impact regime. However, at the time of writing it is still under way, with panel memberships advertised and appointed. The first assessment is planned for 2014. Data on PhDs completed, research income and research environment between 1 January 2008 and 31 July 2013 are also required, as are impact measures, as discussed below. Selected staff ‘in post’ on the ‘census date’ of 31 October 2013 will be assessed. Outputs assessed are those produced from 1 January 2008 to 31 December 2013. It should be noted that mechanisms could change after this book is published, as discussion of some issues continues.

Moving to bibliometrics, or not

The first consultation report on the REF outlined two types of assessment. First, for the sciences, the use of bibliometrics, focussing on citations per article and drawing on the Web of Science (WoS), was recommended. For the ‘arts, humanities and social sciences, there will be a light touch peer review process, informed by metrics’ (HEFCE, 2007: 4). This bears a large degree of similarity to the Australian ERA, discussed in the previous chapter. It was proposed that the science measures would be phased in from 2010, driving all research funding by 2014. The ‘light touch’ process was due to begin in 2013, also driving all funding from 2014. The initial report was ‘confident that bibliometric techniques can be used to provide a robust and meaningful indicator of research quality across the science-based disciplines, particularly when applied at the broad level of the subject group’ (HEFCE, 2007: 8). A perhaps rather excited scoping study claimed that on

the basis of real life examples … bibliometric methodology can be applied in procedures that suit the academic environment in an optimal way (emphasis added). (Centre for Science and Technology Studies, 2007: 5)

However, subsequent pilot studies and reports on the use of the WoS and Scopus databases showed just how difficult the bibliometric process would be in practice. Pilot studies, and the universities responding to them, found the process immensely complex, with most universities lacking sufficient records, databases and management of research. The databases contained a large number of errors, duplicated names, differing addresses and so on, and considerable resources were needed to correct and reconcile them (Evidence Ltd and HEFCE, 2009; HEFCE, 2009; Technopolis, 2009). Difficulties were noted even in matching papers by organisational address and subject area, with the journal subject listings in the databases not matching departments and other units in HEIs. Matching staff names, including those of previous staff members, to units of assessment demanded considerable data manipulation (HEFCE, 2009) – the kind of record reconciliation sketched below. Differences were also noted in the citation counts between Scopus and WoS. The considerable effort and cost that would be needed to bring universities up to scratch for a bibliometric system to work nationwide was pointed out. Bibliometrics were not the easy answer that had been hoped for. It was concluded:

Bibliometrics are not sufficiently robust at this stage to be used formulaically to replace expert review in the REF. However, there is considerable scope for citation analysis to be used to inform expert review.

… robustness … varies across fields of research. In areas where publication in journals is the main method of scholarly communication, bibliometrics are more representative of the research undertaken. (HEFCE, 2009: 3, emphasis added)

Indeed, later reports suggested citation data would not be used for the arts and humanities, and, perhaps more surprisingly, the social sciences. More recently their use has been proposed as indicators for the physical sciences and some social sciences in the REF, although it was decided that, among the social sciences, only economics would make use of them.
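To make concrete why this reconciliation work proved so laborious, the sketch below shows a toy matching of publication records between two bibliographic databases. It is purely illustrative: the record fields, the matching rules and the helper names (normalise_name, normalise_title, match_records) are assumptions made for this example, not the procedures used in the HEFCE pilots.

```python
# Illustrative sketch only: a toy reconciliation of publication records between two
# bibliographic databases. The record fields, matching rules and helper names here
# are assumptions made for illustration, not the actual HEFCE pilot procedures.
import re

def normalise_name(name: str) -> str:
    """Reduce an author name to surname plus sorted initials so variants collide."""
    parts = [p.strip(".,") for p in name.lower().split()]
    parts = [p for p in parts if p]
    if not parts:
        return ""
    surname = max(parts, key=len)          # crude heuristic: longest token is the surname
    initials = sorted(p[0] for p in parts if p != surname)
    return surname + " " + "".join(initials)

def normalise_title(title: str) -> str:
    """Strip punctuation and case so trivial title differences do not block a match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def match_records(wos, scopus):
    """Pair each WoS-style record with a Scopus-style record by DOI, else by title."""
    by_doi = {r["doi"].lower(): r for r in scopus if r.get("doi")}
    by_title = {normalise_title(r["title"]): r for r in scopus}
    return [(r, by_doi.get((r.get("doi") or "").lower())
                or by_title.get(normalise_title(r["title"]))) for r in wos]

# Toy data: the first record matches on DOI despite differing case and punctuation;
# the second has no counterpart in the other database.
wos_records = [
    {"doi": "10.1000/abc", "title": "A Study of X", "author": "Smith, J. A."},
    {"doi": "", "title": "Findings on Y", "author": "Jones, B."},
]
scopus_records = [{"doi": "10.1000/ABC", "title": "A study of X.", "author": "J A Smith"}]

for wos_rec, scopus_rec in match_records(wos_records, scopus_records):
    status = "matched" if scopus_rec else "unmatched"
    print(normalise_name(wos_rec["author"]), "|", wos_rec["title"], "->", status)
```

Even this toy version shows where the effort goes: every variant of a name, address or title has to be normalised before records from different sources can confidently be treated as the same output.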

As the notion of an easy application of bibliometric measures fell apart under closer scrutiny, and perhaps under resistance from interested parties, later reports took the REF closer to the peer review panels found in the RAE it ostensibly replaced. However, a key concern became what was called ‘impact’, and this is perhaps the main difference from the RAE. This is not the same as the ‘impact’ used in some citation studies, where high impact is equated with high citation counts. It is an altogether broader and perhaps considerably vaguer notion. This impact was outlined in a letter of 22 January 2009 from the Secretary of State to the HEFCE, which stated that the REF:

should continue to incentivise (sic) research excellence, but also reflect the quality of researchers’ contribution to public policy and to public engagement, and not create disincentives to researchers moving between academia and the private sector. (HEFCE, 2009: 4)

The REF provided a slightly different definition of research, as ‘a process of investigation leading to new insights effectively shared’. Unit submissions would be assessed by panels, comprising four main panels and 36 sub-panels, according to three factors, as follows:

1. Output quality of a selection of research outputs submitted by a unit, judged on ‘rigour, originality and significance’. The number of outputs was initially reduced to three rather than the four found in the RAE, but increased to four again in later versions. ‘Early career researchers’ and other specified categories of staff could submit fewer than four ‘without penalty’. Co-authored outputs could be submitted more than once by the same organisation, listed against more than one staff member, although this does not exactly seem to be encouraged (REF2014: 15–16). Research quality received the greatest weight of the three – proposed at 60 per cent in the 2009 report, but increased to 65 per cent in 2011.

2. Impact, which was seen as the:

demonstrable economic and social impacts that have been achieved through activity within the submitted units that builds on excellent research … to make a positive impact on the economy and society within the assessment period. (HEFCE, 2009: 7)

Other definitions of impact in the same and other reports were considerably wider, and by 2011/2012 had not become any less vague. This measure received a weighting of 25 per cent in the 2009 report, reduced to 20 per cent by 2011, although there are indications that impact’s share of the assessment might increase in the future. Research contributing to impact was initially to be measured within a historical window of 15 years prior to the assessment date, and in mid-2011 the cut-off date was announced as 1 January 1993. It should be noted that the research cited in impact case studies had to have been carried out at the submitting institution.

3. Environment – similar to the research environment element of previous exercises, at 15 per cent of the weighting.

Those eligible for assessment are research-active staff employed on the census date, with a job description that is primarily research, or research and teaching. Other research staff with a ‘clear defined relationship’ with the unit could also be included (HEFCE, 2009). Each ‘sub-profile’ of the three measurements will be rated on five levels: four star (exceptional), three star (excellent), two star (very good), one star (good) and unclassified (below the standard required). These will be combined into an overall rating for the research unit on the same five levels, termed the ‘overall excellence profile’, as sketched below.
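As a rough illustration of the arithmetic implied here, the sketch below combines three sub-profiles into an overall excellence profile, assuming that the overall profile is simply the weighted average of the sub-profiles using the 65/20/15 weights described above. The sub-profile percentages are invented purely for illustration.

```python
# Minimal sketch, assuming the 'overall excellence profile' is formed as the weighted
# average of the three sub-profiles using the 65/20/15 weights described above.
# The sub-profile percentages below are invented purely for illustration.

WEIGHTS = {"outputs": 65, "impact": 20, "environment": 15}   # per cent
LEVELS = ["4*", "3*", "2*", "1*", "unclassified"]

# Each sub-profile records the percentage of the submission judged to fall at each level.
sub_profiles = {
    "outputs":     {"4*": 20, "3*": 40, "2*": 30, "1*": 10, "unclassified": 0},
    "impact":      {"4*": 10, "3*": 50, "2*": 30, "1*": 10, "unclassified": 0},
    "environment": {"4*": 25, "3*": 50, "2*": 25, "1*": 0,  "unclassified": 0},
}

def overall_profile(profiles, weights):
    """Combine the sub-profiles into an overall profile using the stated weights."""
    return {level: sum(weights[k] * profiles[k][level] for k in weights) / 100
            for level in LEVELS}

print(overall_profile(sub_profiles, WEIGHTS))
# {'4*': 18.75, '3*': 43.5, '2*': 29.25, '1*': 8.5, 'unclassified': 0.0}
```

Changing the weights – for instance, increasing impact’s share, as has been mooted – simply tilts the overall profile towards whichever sub-profile benefits.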

Quantitative citation data will be used to inform assessments of individual outputs, although not in most social sciences, nor in the humanities and arts (HEFCE, 2009: 7); economics is the exception among the social sciences and will also use citation data (REF 2014, 2012). A number of cautions were raised regarding the use of citations for newer articles and for applied work. ‘Grey literature’ (such as consultancy reports) and other ‘non-standard forms’ were allowed to be submitted as outputs and presumably assessed on the same basis, although bibliometric data would clearly be less useful in their assessment (unless perhaps using Google Scholar, which captures some non-published sources and will be used by some panels), and how their quality would be assessed in general is unclear (HEFCE, 2010). ‘Double-weighting’ of outputs, where outputs embodying a greater amount of work, such as monographs, would count as two for submission purposes, was also discussed and adopted by some panels, with, for example, a single-authored monograph potentially counting double in the medicine and biology sub-panels. Panels have considerable decision-making sway over weightings, and these differ between disciplines.

Measuring impact

The difficulties in assessing impact were noted from the first. The time lags involved were to be addressed by taking a ‘broad view’. The attribution of impact to particular research, the limits of the measures and the difficulty of corroboration were all noted, along with proposals as to how such difficulties could be limited. It was initially proposed that case studies detailing examples of research impacts (one for each 5–10 FTE) and a single overarching impact statement be prepared for the unit in question. This became a statement of the unit’s approach to ‘enabling impact’ as well as specific case studies (REF2014, 2012). These statements were to set out such things as the exploitation of research, the range of interactions with users, the range and significance of impacts, and evidence of continuing activity. Indicators of impact included such things as research income and the extent of collaboration with research users. Indeed, the ‘common menu’ of impact indicators runs to two pages in the 2009 report, and provides a very wide range of social, cultural, economic, policy and training impacts, among other things. The remarkable range of things that might be considered as impact is explored at length in the responses to the call for submissions, calling into question just how useful the concept is and how it could be meaningfully put into practice (HEFCE, 2010). By March 2011 this impact regime had firmed up, albeit very vaguely, to mean:

all kinds of social, economic and cultural benefits and impacts beyond academia, arising from excellent research, that have occurred between 1 January 2008 and 31 July 2013. (REF2014, 2011: 4, emphasis added)

The 2012 report also gave a remarkably wide range of definitions (REF 2014, 2012).

A key point here is that the research that led to impact had to be excellent as well; excellent in this case refers to work rated 2* or above. Dissemination of research was excluded as a measure. To obtain credit for the impact, the unit must demonstrate a ‘distinctive contribution’ that ‘meets standards of excellence that are competitive with international comparators’, although the reader’s guess is as good as ours as to what this means. Discussion with HEFCE officials seems to suggest that some change needs to be demonstrated to prove impact, but later reports differ on what counts as impact across panels (REF 2014, 2012). Research contributing to impact could be traced back to 1 January 1993, as noted. Panels have issued guidance on what indicators are appropriate and will produce a graded ‘sub-profile’ for impact submissions, rated on the same five levels, ranging from four star (exceptional) to unclassified (little or no significance).

The potential for considerably greater administrative burdens for both units and panels in assessing impacts is, of course, great, not to mention the considerable difficulty and potentially high levels of subjectivity in assessment, and the game playing and the ‘spin’, ‘selling’ and ‘marketing’ that case studies invite. There is some irony that, while the original aim of the REF was to reduce administrative burdens by focussing on a few key indicators, by 2012 it had instead developed into a regime that seemed potentially more complex, more subjective, more open to challenge, and possibly involving considerably greater administrative burdens than the RAE. It even seems to include more indicators. Impact itself is a difficult, vague and contested notion – as are the proposed case study measures of it – and this is considered at greater length in the concluding chapter of this book.

Conclusion

The RAE panel-based assessment exercise remained a central model of research assessment for over 20 years, passing through various manifestations until its last outing in 2008. While details, measurements and rankings changed, what remained constant was a focus on peer review of submitted outputs (two, then four, per individual assessed) by panels in various fields. Assessment was at the research unit level, in contrast to the New Zealand version of the RAE, which focussed on individual measurement. In later versions, assessment was of particular members of staff submitted by the unit, allowing less able, less well-published or less experienced researchers to be excluded, again in contrast to New Zealand where all eligible staff were to be included, at least ostensibly. Over time, the process grew in complexity and by the 2008 exercise was a vast undertaking involving thousands of people and costing millions, albeit still a small percentage of the total research budget. The original focus on the assessment of two, then four, submitted outputs was supplemented by a changing array of measures and indicators that perhaps added to this complexity, although the focus on quality remained its ostensible aim.

As to its effects: most official accounts see both the amount and quality of research improving during its life, albeit from an already high base, as measured by citations and papers produced – measures which have limits of their own. Others have worried about negative effects on university culture and on the working lives of academics, about a focus on short-term, journal-focussed research rather than ‘big’, long-term projects, and about the discouragement of interdisciplinary work and of the production of books and monographs. Evidence for all these claims is mixed, of course, and views on the RAE may differ according to perceptions of winning and losing. Some are definitely winners; a key aim of the RAE was to be more selective in funding and to send public money to those considered better at producing certain types of research. However, it was not always the old universities that did well out of this selectivity – pockets of high performance were found around the country.

Despite claims of success, by 2006 there was a belief among policy makers that the RAE had served its purpose, and its abandonment was announced. However, initial enthusiasm that suitable bibliometric measures might be an easy fix and reduce complexity soon fell away once the difficulty of the process was realised, and as 2010 progressed even the RAE-like model proposed in its place (now called the Research Excellence Framework) seemed to have a questionable future. Its much-touted but controversial focus on impact seemed, and remains, particularly problematic. Nor does the REF seem any less complex than the RAE, and it might be even more subjective. Despite pronouncements otherwise by the new government in 2010, however, the REF continued to be developed, with the impact focus maintained. Panel members have been appointed (by open recruitment for chairs and nomination for general members), criteria and methods of assessment have been announced, and it looks as if assessments will begin in 2014 as scheduled.


1Research was defined as: ‘original investigation undertaken in order to gain knowledge and understanding. It includes work of direct relevance to the needs of commerce, industry, and to the public and voluntary sectors; scholarship; the invention and generation of ideas, images, performances, artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction. It excludes routine testing and routine analysis of materials, components and processes such as for the maintenance of national standards, as distinct from the development of new analytical techniques. It also excludes the development of teaching materials that do not embody original research …
Scholarship [is] defined as the creation, development and maintenance of the intellectual infrastructure of subjects and disciplines, in forms such as dictionaries, scholarly editions, catalogues and contributions to major research databases’ (RAE2008, 2008: 5).

2A: 100–95% staff submitted. B: 94–80%. C: 79–60%. D: 59–40%. E: 39–20%. F: < 20%.