Research evaluation in Japan: the case of the National University Corporations
This chapter investigates the evaluation system for higher education institutions in Japan. The system evaluates not only the research of the National University Corporations but also their teaching and other social services. Performance is assessed against goals (set by the universities themselves) and expectations (of nominated ‘stakeholders’). A statistical analysis shows that the two assessments give similar results, and that teaching and research evaluations are related. Neither evaluation can readily be used in resource allocation, as neither provides data with which to compare universities within Japan or worldwide. We argue that Japan’s innovative method is better understood as a means of enhancing accountability than as a means of improving performance.
In Japan, as in other countries, research and performance evaluation have increased in recent years. This reflects the influence of New Public Management (NPM) and similar ideas, including greater pressure for performance measurement and accountability. But as with many other reforms in the Japanese public sector, aspects of higher education policy seem to reflect developments elsewhere while taking on features unique to the Japanese experience. To a considerable degree, features of research evaluation in Japan remain obscure, particularly the criteria on which some research is evaluated and the link between these evaluations and funding. Because each university is evaluated largely against criteria specific to that institution, comparisons and rankings across organisations are difficult.
Research evaluation forms only part of the overall evaluation system for universities, however, and differs by discipline and by university, with some private and public universities evaluated largely on criteria other than research performance. There is no central agency that determines the basis on which universities are to be evaluated. Nor are there clear standards or criteria for performance evaluation beyond the basic guidelines promulgated by the Education Council and the Ministry of Education, Culture, Sports, Science and Technology (MEXT). Indeed, universities are largely evaluated on criteria which they themselves determine, and/or develop in negotiation with various stakeholders. These evaluations may include a research element, but not in all cases.
First, self-evaluation is carried out. These self-evaluations are then compared with reviews carried out by quasi-independent government agencies, largely using peer review by panels of experts. Evaluation criteria are often unclear, albeit published after the evaluation. The evaluations have limited use for cross-organisational or cross-country comparisons because universities are largely evaluated against standards they set themselves, which may not be comparable. The domestic market in higher education is sheltered from international competition, and large enough that universities can survive without international scrutiny. This study focuses on the publicly owned National University Corporations (NUCs).
After testing a number of hypotheses, we find that self-evaluation is often more positive than central evaluation, which is hardly surprising. There was greater variation in the assessment of research than of teaching, and higher assessments of teaching were related to higher assessments of research. Higher inputs into research were related to better research outcomes. This chapter proceeds by outlining the Japanese research and university environment before testing a number of hypotheses. Policy implications and recommendations are examined in the conclusion.
Japan is a research powerhouse. Since the 1990s it has maintained the highest level of research spending as a percentage of GDP among developed countries, at 3.67 per cent of GDP for the 2007 financial year (FY). A large part of this expenditure (which totalled ¥18.9 trillion) is, however, shouldered by business. Government expenditure for FY 2007 was ¥3.3 trillion, or 17.4 per cent of total research expenditure, comparatively the lowest share among major countries, including Russia (61.2 per cent for FY 2006) and France (38.2 per cent for FY 2005). Total expenditure for FY 2007 by sector was 73 per cent by business, 1.6 per cent by non-profit institutions, 7.3 per cent by public organisations, and 18.1 per cent by universities. Expenditure on basic research was ¥2.41 trillion, on applied research ¥4.07 trillion, and on developmental research ¥11.06 trillion. These figures exclude social sciences and humanities research expenditures.
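The internal consistency of these figures can be checked with a few lines of arithmetic. The sketch below simply recomputes the quoted shares from the trillion-yen totals given in the text; note that the rounded inputs reproduce the quoted 17.4 per cent only approximately:

```python
# Quick arithmetic check of the FY 2007 figures quoted above
# (all values in trillions of yen, taken from the text).
total_rd = 18.9      # total research expenditure, FY 2007
government = 3.3     # government research expenditure

gov_share = government / total_rd * 100
print(f"Government share: {gov_share:.1f}%")  # ~17.5%, close to the quoted 17.4%
                                              # (the gap reflects rounding of the
                                              # trillion-yen inputs)

# Basic/applied/developmental breakdown (natural sciences only, since the
# text excludes social sciences and humanities here):
basic, applied, development = 2.41, 4.07, 11.06
print(f"Breakdown sum: {basic + applied + development:.2f} trillion yen")
```

The breakdown sums to ¥17.54 trillion rather than ¥18.9 trillion, consistent with the exclusion of social sciences and humanities from those three categories.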
In terms of the share of the world’s total number of scientific papers published in 2007, Japan has fallen recently to third position (at 8.2 per cent of papers), with the United States first at 29.3 per cent of papers and China second at 10 per cent. Since 1990 Japan has been ranked fourth in the world after the United States, the United Kingdom, and Germany in total number of article citations.
Universities dominate in the provision of basic research. While approximately three quarters of research expenditure in Japan is spent by business enterprises, universities provide half of basic research in expenditure terms. While business spends just 6.4 per cent of its research expenditure on basic research, universities spend 54.7 per cent. Fifty per cent of government funding for research goes to higher education institutions. Measured by total citations for 1997 to 2007, 17 of the top 20 institutions in Japan are higher education institutions: 15 national universities, one national inter-university research institute, and one private university. The largest group of researchers works in business, followed by universities (276,829 persons). The total number of researchers in 2008 was 857,723, with engineering and technology fields the largest at 474,304.
The share of researchers working at universities varies from field to field. In the medical sciences, social sciences and humanities, the dominant group is academics in universities. The NUCs dominate university research. There were 86 national universities among the 765 universities existing in 2008 (a total that includes national, other public and local, and private universities). Although NUCs comprise only 11.2 per cent of universities, they account for 45.4 per cent of university researchers, 42.3 per cent of research expenditure and 31.8 per cent of facilities expenditure. Of government funding to universities of ¥1.68 trillion for FY 2007, national universities received ¥1.44 trillion (86 per cent), paid through operating and competitive grants (MEXT, 2009). It is therefore reasonable to focus on national universities when assessing research activities and outputs for Japan.
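These shares follow directly from the raw counts in the text, as a short recomputation confirms:

```python
# Recomputing the NUC shares quoted above from the raw counts in the text.
nuc_universities, all_universities = 86, 765
print(f"NUC share of universities: {nuc_universities / all_universities * 100:.1f}%")
# 11.2%, as quoted

# Government funding to universities, FY 2007 (trillion yen)
funding_all, funding_nucs = 1.68, 1.44
print(f"NUC share of government funding: {funding_nucs / funding_all * 100:.0f}%")
# 86%, as quoted
```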
While there are antecedents going back centuries, modern Western-style universities were first established during the Meiji era of the late nineteenth century as Japan attempted to learn from and catch up with the West. The Fundamental Code of Education, promulgated in 1872, called for the establishment of eight universities (Fujimura-Faneslow, 1997). The University of Tokyo was the first, established in 1877, followed by several other colleges and technical institutes, and government-run research-focussed institutes and departments. As Amano (1979) notes, these diverse institutions were based on a mixture of Anglo-American and German models. By the Second World War there were 45 universities in a strict hierarchy, consisting of the seven top-tier ‘imperial universities’ (with Tokyo the most prestigious), other government universities, the elite private universities Keio and Waseda, and 23 other private universities of considerably lesser status (Fujimura-Faneslow, 1997). Attempts by the occupying forces after the Second World War to reduce the hierarchical nature of the university system largely failed, and this hierarchy continues to the present, with the considerable post-war expansion in universities occurring largely at the middle and bottom end (Amano, 1979; McVeigh, 2002). The academic profession in Japan still maintains a strong preference for research activities and academic autonomy, even in comparison to Germany.1 Traditional management styles in Japanese companies and bureaucracy persist. Research activities are implemented in a team or group led by the professor, with professors dominating all decision-making. Research groups do not generally interact with other groups, instead conducting research independently and drawing on their own equipment, staff and funds (Hanada and Miyata, 2003).
Both in effect and de jure, the public universities were arms of the Ministry of Education. They were owned by the state, seen as fulfilling functions set by the state, and subject to the same laws governing other state agencies, with staff being civil servants employed and paid by the state. For a long time the Japanese government adopted ex ante (input) quality assurance and control measures. After the 1956 University Establishment Standard, the Ministry of Education was responsible for approving university curricula, the structure of courses and teaching methods, staffing levels, student numbers, land and buildings, tuition levels, even the number of books in the library. The Ministry also had the power to appoint, discipline and dismiss academic staff, although this was supposed to be exercised in accordance with the faculty meeting and academic council. The establishment of a university required an assessment by the Council for University Chartering and School Corporation, which investigated the ‘quality’ of plans, including the organisation of teaching staff.
In the late 1980s, higher education policy in Japan evolved from ex ante (input) control to ex post (output) control and evaluation of university activities. The policy change was influenced by reform in the wider public sector. Massification (which Trow (1974) identified as the transition from elite to mass higher education) and the internationalisation or globalisation of the higher education market also provided pressure for change, as did related pressures to enhance transparency and ‘objectivity’. In 1991 the regulation for the establishment of universities was simplified, and all universities were obliged to conduct self-monitoring and self-evaluation. Considerable central control over the university system remained, however.
As the Japanese economy moved from ‘economic miracle’ to prolonged recession following the collapse of the asset bubble in 1991, falling tax revenues led to worsening public finances. Globalisation and demographic change, in the form of an ageing population and lower birth rates, also encouraged structural reform in the public sector. The Administrative Reform Council (ARC), chaired by the Prime Minister, published its final report in December 1997, proposing reform of state governance. One of the main proposals was to strengthen evaluation activities, seen as enhancing accountability and transparency, with it ‘necessary to establish the system [with] objective evaluation on policy outcomes … to be fed back to planning and budgeting’.
Following the recommendations of the ARC’s final report that implementation functions should be separated into independent public bodies, and reflecting the influence of NPM, the Basic Law of Independent Administrative Institutions was enacted in 1999. In 2001, 57 Independent Administrative Institutions (IAIs) were established, including 31 research and development institutes previously attached to ministries. The IAIs are legally separate bodies, albeit subject to a considerable degree of indirect and other control from their erstwhile departments. Under the Basic Law, the responsible minister sets medium-term (3–5 year) goals comprising the fiscal plan and planned improvements in efficiency and quality of services. Each IAI then submits its plan to the minister for approval. Within the terms of the plan, IAIs ostensibly have full discretion to manage resources in exchange for strengthened accountability for results. The framework includes a mandatory ex post evaluation each year. At the end of the medium-term period, each IAI is examined by an external evaluation committee as to whether its goals are being achieved and whether its primary activities should be maintained or abolished. National and other research institutions converted to IAIs are included in the new evaluation system.
Following the National University Corporation Law of 2003, in 2004 all national universities were transformed into NUCs. These are semi-autonomous public bodies, legally separate from MEXT. The governance of NUCs is similar to IAIs, although there are some differences in planning and evaluation. While NUCs were given a degree of greater flexibility in management, accountability for performance was strengthened through an ex post evaluation. Indirect influence over the NUCs from the Ministry continues, due in part to strong networks and the placement of former officials in university management structures (Goldfinch, 2006).
The Koizumi Administration (2001 to 2006) promoted public sector reforms, including the adoption of NPM, into government operations (Yamamoto, 2009). The corporatisation of national universities ostensibly saw the results of evaluation reflected in resource allocation. By contrast, the Basic Law of IAIs neither explicitly provides that the results of performance evaluation link to funding, nor stipulates that performance be evaluated by the same standards across IAIs.
In 2004 MEXT introduced a ‘certified evaluation and accreditation’ system. Each university, including the NUCs, is required to be evaluated by an external evaluating organisation certified by MEXT at least once every seven years. NUCs also face a further evaluation every six years by the National University Corporation Evaluation Committee (NUCEC), with this power delegated to the National Institute for Academic Degrees and University Evaluation (NIAD-UE). The differences between the evaluation of universities generally and of the NUCs in particular lie in the contents (evaluation of both teaching and research activities is mandatory for NUCs), the evaluation organ (NIAD-UE acts as subcontractor for MEXT) and the evaluation standards (NUCs face uniform standards approved by MEXT).
Each higher education institution must select an organisation from among the certified evaluation organisations, although, as noted, NIAD-UE evaluates the NUCs. MEXT may take corrective action if a university is judged to have failed to meet laws and regulations. Research activity is not a mandatory field of evaluation for universities other than NUCs. For example, only three of the 11 institutions that underwent certified evaluation by NIAD-UE in FY 2008 chose evaluation of research.
As shown in Table 6.1, there are varying evaluation systems for research activities by institution, programme or project and researcher. At the ex ante level this includes the evaluation of fund applications and the allocation of special grants to private universities, and assessments of universities before certification at establishment, and thereafter.
| Unit | Ex ante | Ex post |
| --- | --- | --- |
| Project/Programme | Assessment of competitive research projects. Internal assessment in budgeting | Evaluation of programmes and large projects. Internal evaluation |
| Institution/Organisation | Examination of special grants for private universities. Assessment in establishing universities | Certified evaluation and accreditation for universities. Evaluation of research-type IAIs and NUCs |
| Researcher | Assessment of research potential at recruitment and promotion | Evaluation of outputs and outcomes, including publications in refereed journals |
Large projects, the performance of organisations as a whole, and the research output of individual researchers are evaluated ex post. Universities are now obliged to be evaluated at the organisational level according to the relevant regulations and laws. The certified evaluation and accreditation results are made public. In contrast, details of the research activities and outputs of private universities receiving specific government grants are not open; the amounts of the grants are published by university, as required, as are the evaluation standards and measures.2 National universities differ from other universities in that both teaching and research activities are evaluated.
While the evaluation of the NUCs has some very superficial similarities to the Research Assessment Exercise (RAE) in the United Kingdom, there are considerable differences, particularly in the methods and practices. There is a lack of clarity regarding the content and detail of evaluation methods. The linkage between evaluations and funding can be unclear. The design of the evaluation system is in part in response to critiques of systems like the RAE, and the differences may be intentional, at least in part. In particular, critics from sociology, economics and public management have questioned aspects of the RAE and similar systems, including problems with:
a focus on short-term or quantitative goals rather than long-term or high goals with a degree of risk/uncertainty (Smith, 1995; Power, 1997; Rebora, 1999; Van Thiel and Leeuw, 2002; Dixon and Suckling, 1996; Holmstrom and Milgrom, 1991).
To address some of these concerns, evaluations for NUCs adopt an ostensibly comprehensive and integrated approach whose scope extends to teaching, research, public services and management. The evaluation also does not directly assess the quality of academic performance, but rates the extent of achievement or accomplishment against the midterm goals and expectations. This chapter will examine whether evaluation systems for the national university system are feasible and will work as intended.
In 2004 all national universities became NUCs, which, as noted, are modifications of the independent administrative institutions (IAIs) (Yamamoto, 2004; Goldfinch, 2006). Greater flexibility in management is given to NUCs in exchange for strengthened accountability for results. Core funding from government to NUCs became a block grant, termed the ‘operating grant’, and academic and administrative staff are no longer civil servants.
Targets for each national university are set in the medium-term goals by MEXT, after consideration of the draft goals submitted by each NUC. In contrast to the IAIs, the NUCs largely set their own medium-term goals. This is partly due to the character of higher education in Japan, including a Constitution that guarantees academic freedom. The goals are required to relate to the quality of teaching and research, and to improving operations and efficiency. To accomplish the goals, each NUC sets out a mid-term (six-year) plan, which must be approved by MEXT. An example is given in Table 6.2.
Evaluation is carried out by NUCEC. NUCEC comprises ‘experts’ in administration, teaching and research of higher education institutions, and from industries, culture and society. Members are excluded from the evaluation of NUCs in which they have interests. The first chairperson was a Nobel Prize winner. Formally the minister appoints the members. In effect however, selection is carried out by bureaucrats in charge of the evaluation division in MEXT.
The NUCEC evaluates progress towards the achievement of medium-term goals each year, albeit with this partly delegated to NIAD-UE. At the end of the medium-term goals period, the NUCEC evaluates the achievement of goals and plans for NUCs, taking into consideration the specific nature of higher education, especially teaching and research outcomes. The contracted evaluation system might be explained by a wish to draw on expertise and accumulated experience in evaluating universities, rather than simply the need to ensure ‘objectivity’ through an arm’s-length organisation. However, as the membership is selected by MEXT, its results may reflect some MEXT preferences.
NUCEC ‘requests’ NIAD-UE to implement the evaluation of achievement of medium-term goals in teaching and research. NIAD-UE is an Independent Administrative Institution whose mission is to certify and accredit, evaluate academic performance, and award academic degrees in special cases. It was previously the National Institution for Academic Degrees (NIAD).
Each national university submits two self-evaluation reports on performance to the NIAD-UE. These report on the achievement of the medium-term goals in teaching, research and public service, at both the organisational level and by department. Both self-evaluation reports are prepared in compliance with guidelines for performance reporting published by NIAD-UE.
The report on achieving the goals evaluates to what extent the actual results meet the medium-term goals. The report on the academic level (of research and teaching outcomes) compares the actual results with the expected levels, with expectations based on stakeholder engagement, as expanded below. In the case of the evaluation of research quality, NUCs are required to analyse their research activities and outcomes and judge their research level according to whether each department meets the expectations of the nominated stakeholders. Each department has to submit a list of excellent research outcomes [SS (outstanding) and S (excellent)] in terms of scientific or social and cultural impact. Impact in this case is defined as the direct and indirect (but involving some causal link) results of research outcomes.
According to the guidelines, excellence shall be judged by each department with reference to these measures, including publications in high ranked academic journals, books reviewed, citations, awarded prizes, invited lectures or speeches, and utilisation. As such, self-evaluation involves a considerable discretionary element, even to the extent of defining what the organisation will be measured upon. An example of how this operates in a particular NUC is provided in the next section to give greater context.
The NIAD-UE then assesses the self-evaluation report. It is rated against stakeholders’ expectations in four grades. These are ‘far above the expectations’, ‘above the expectations’, ‘same as the expectations’ and ‘below the expectations’. Each department is individually assessed. The grades received can differ from the self-assessment. The role of the unit being assessed is again central, with the unit defining which stakeholders are important. They can include academics, the industrial world, other organisations in the region, state (national government) figures and other citizens. Students and parents are excluded. In theory the expectations of stakeholders will differ from university to university. In practice, there is however little difference in the expectations among the universities, since each university emphasises the common mission of the NUCs.
NIAD-UE then directly assesses the achievement of medium-term goals for each national university. This includes site visits and hearings with presidents and directors, and interviews with students and graduates. After the analysis and the site investigation, NIAD-UE evaluates the achievement on teaching and research of each national university in five grades. These are ‘outstanding in achieving the mid-term goals’, ‘excellent in achieving the goals’, ‘generally good in achieving the goals’, ‘insufficient in achieving the goals’ and ‘crucial issues to be improved (poor) in achieving the goals’. The evaluation results are submitted to NUCEC. These are usually accepted pro-forma by NUCEC and they form the substantial part of teaching and research evaluation for the NUCs. Further details are given in the case study below.
The NUCEC submits the results to the Committee for Policy Evaluation and Evaluating the Independent Administrative Institutions in the Ministry of Internal Affairs and Communications (MIC). The Diet (parliament) can ask the Committee to consider the nature of national universities.
The basic structure for evaluating NUCs is identical to that of the IAIs. Both adopt the principle of management by objectives. The focus is on the extent to which actual performance at the end of the medium term achieves the goals and plans. In other words, the evaluation is in essence a comparison between actual and planned figures within the organisation; it is neither a relative evaluation nor a ranking system for universities. However, the NUC system is intended to link evaluation to funding, which in practice requires an evaluation system that can compare organisational performance across institutions, or against standard measures such as the RAE in the United Kingdom or the Valutazione triennale della ricerca (VTR) in Italy (Minelli et al., 2007). How this can be done remains problematic, and perhaps highly subjective and discretionary.
As Johnes and Taylor (1990) argue, ratings of university research performance vary significantly with the quality of staff, research resources, the priority given to teaching, and university type. A funding regime may not produce good results for a higher education system as a whole if it allocates the bulk of funds to a few outstanding universities, which may result in an unbalanced system. Moreover, depending on how performance is measured, an outstanding performer might not be efficient in terms of the ratio of inputs to outputs, or research outcomes per unit of expenditure. Universities rated less well in research performance on this basis could be more efficient (in technical efficiency terms) than those rated higher, as performance might be measured simply by the number of excellent articles in international refereed journals, or by citations, without considering the resources (time, money and staff) put into the process. As such, some measures of research performance may not be a good indication of productivity.
From this perspective at least, the evaluation methods for NUCs are an innovative way of taking into account the differences among universities. NIAD-UE assesses the level of research by comparing research activities and outcomes with the level expected by the stakeholders, including academics, government, local communities, industries and international society. Actual and expected levels might differ from those of the self-evaluations, because the evaluation criteria and expectations applied by NIAD-UE are not always identical to those of the respective departments. Actual in this case means ex post performance on activities and outcomes; expected levels are the projected performance on these indicators. In addition, as with the IAIs, NIAD-UE examines the achievement of research over the medium term by comparing it with the medium-term goals.
The link between evaluation and funding is an attempt to balance pressure from parts of the Japanese bureaucracy (in this case the Ministry of Finance (MOF)) for greater competition and more directed and selective funding against the countervailing pressure from MEXT and the NUCs themselves for balance across the NUC system as a whole. Comparing performance with expectations or goals has some merit in providing information for improving performance within the unit, as it relates to the perceived demands of stakeholders; it does not, however, allow comparison between universities.
Such a model is, by and large, the preference of both MEXT and the NUCs. They insist that all national universities remain research-intensive universities, even those located in rural areas, and that the national system needs to be sustained as a whole. If funding were directly linked to research performance, it is feared, resources would be highly concentrated in a few prestigious universities such as the University of Tokyo, while less prestigious NUCs would suffer financially in comparison. In contrast, many private universities differ significantly in type and research focus, with the majority teaching-focussed.
On the other hand, the MOF prefers greater competition among higher education institutions, including the use of performance-based funding at the national level. The MOF insists there is a need for comparable and standardised data on research activities. The hybrid evaluation system of research level and achievement against targets might therefore be an attempt to harmonise the conflicting needs of sustainability and competitiveness, and to find a middle path between the policy preferences of two powerful state agencies.
According to the National University Corporation Law, the evaluation of NUCs must be conducted at the end of the medium-term goal period (FY 2004 to FY 2009). However, in order to reflect the results in resource allocation and planning for the subsequent medium-term goals for FY 2010 to FY 2015, each national university and MEXT needed the evaluation results before the budget proposals for FY 2010 were finalised, that is, by the end of August 2009. This meant a provisional evaluation of the six-year medium-term goals had to be implemented in FY 2008.
The provisional evaluation on teaching and research was subcontracted to NIAD-UE by NUCEC. NIAD-UE set up the Evaluation Committee for Teaching and Research of National Universities consisting of 30 persons. This was composed of three sub-committees for assessing current achievements by university, analysing the present conditions on teaching and research activities, and evaluating the level of each research activity and outcome by department.
The first sub-committee consisted of eight groups classified by organisational scale and structure, with 187 members. The second was divided into ten working groups corresponding to ten major areas, with 260 assessors in total. The third sub-committee comprised 66 special working groups by research area, with a total of 334 members. In each area at least two persons assessed self-reported excellent performance, with each department able to submit up to half of its academic staff for assessment (if the staff are self-assessed as excellent). Performance in this case is defined in terms of the objectives and ‘character’ of the department. Evaluation started with examination of the self-evaluation reports on teaching and research in July 2008. NIAD-UE visited each university from October to November 2008. Finally, in February 2009, the provisional evaluation reports were submitted to NUCEC, which accepted them pro forma.
NIAD-UE evaluates both the level of teaching and research by department and the achievement of each university in teaching and research. For research, the level is evaluated by comparison with the expected level. Research activities are measured in volume or quantity; research outcomes focus on quality, including assessments of the excellent performance selected by each national university. By contrast, achievement refers to the extent to which actual performance meets the mid-term goals and plans.
The Provisional Evaluation examined progress towards the achievement of the medium-term goals and analysed teaching and research. As mentioned earlier, teaching and research activities were evaluated by NIAD-UE, whether examining achievement or level. The ratings of achievement of medium-term goals in research are shown in Table 6.3. They range from ‘outstanding’, ‘excellent’ and ‘generally good’ to ‘insufficient’ and ‘poor’.
The ratings of achievement on operations or management, other than teaching and research, are prepared by NUCEC. The level of education activities by department is analysed based on teaching ‘organisation’, content, method, outcomes and the destination of new graduates entering employment. These are compared to expectations. The level of research activities is divided into activity and outcomes (see Table 6.4). Further, the extent to which the quality of teaching and research by department has improved is reported in three grades: 201 departments were rated ‘highly improving’, 372 ‘reasonably improving’ and 21 ‘not improving’. NIAD-UE rated quality improvements by examining the materials on ‘enhancing research quality’ submitted by each national university, according to the missions of each department. NIAD-UE claims the quality of teaching and research has improved since incorporation.
Each national university submits to the NIAD-UE information on publications and awards, which must be disclosed to the public. While the individual publications listed as ‘distinguished performance’ are not required to be disclosed owing to privacy protection issues, some universities do publish the full reports. National University A is an exceptional case of full disclosure, and is studied here for that reason. It consists of six departments or schools.
Outcomes rated ‘SS’ (excellent) and ‘S’ (outstanding) were selected by the university. The maximum number of staff allowed (those ranked as showing excellent or distinguished performance) – half of academic staff per department – was submitted to NIAD-UE for evaluation. The university is able to choose whether an outcome is to be evaluated in the academic field or in the social, economic or cultural fields. Here, the academic field covers areas where research outputs contribute to the academic discipline, while the social, economic and cultural fields cover those where research outputs contribute to enriching the social, economic and cultural life of the people. Rankings are shown in Tables 6.5 and 6.6.
A large part of distinguished outcomes are found in academic fields rather than social, economic and cultural fields. Except for science and engineering whose staff number just 14, there are only small differences among departments in the number of highly ranked outcomes (at SS or S level) (see Table 6.5). On the other hand, Table 6.6 shows that the departments used a variety of different measures in validating claimed excellence, as follows:
4. The departments of humanities (36.3 per cent) and education (37.5 per cent) commonly adopted the number of books or performances reviewed as measures. None of the departments of engineering, science and agriculture used such measures.
6. Impact factor – a measure reflecting the average number of citations to articles published – is used in the science and engineering fields. The impact factor is considered a possible measure of the quality of a paper published in a refereed journal (see Chapter 2).
7. The utilisation of research outputs is a measure focusing on contribution to society and industry. As expected, the measure is adopted more readily in engineering (52.3 per cent) and agriculture (37.5 per cent).
We should be careful in generalising results from University A.
As many studies (Cave et al., 1991; Johnes and Taylor, 1990; Jackson and Lund, 2000; Bruijn, 2001) show, it is more difficult to measure the performance of teaching than that of research. There is little standardised testing in measuring learning outcomes for universities. By contrast research activities and outcomes can be measured by such things as publication number and citations, which serve as proxies for quantity and quality, albeit flawed ones. From this discussion we derive the following hypothesis:
The assessment of research level focuses on the difference between actual performance and expectations by department, while the achievement evaluation of research compares performance with the targets listed in the medium-term goals by university. The expectations were finally approved by NIAD-UE during the evaluation process. In contrast, the goals were approved by MEXT before the process. As such, we suspect they are independent measures. This leads to hypothesis 2a.
As noted, the research level is assessed in terms of research activities and outcomes against expectations, referring to the guidelines given by NIAD-UE. Activities are generally measured by the number of papers and books published, and the amount of research funds, especially competitive funds. Outcomes are measured by the number of academic prizes and citations in international academic journals which each department reported. However, the two are likely linked as inputs will flow into outputs/outcome with, ceteris paribus, better funded and resourced units better able to carry out research, particularly in ‘big science’ projects. Therefore, we develop the following hypothesis.
The evaluation system of NUCs is ostensibly comprehensive in scope, process and content. Since the system rests in part on management by objectives, the targets in the medium-term goals have to be checked and fed back into the next planning round. Each year the NUCEC monitors progress against the medium-term goals and plans for every NUC. The mid-term assessments of teaching and research are largely based on peer review informed by some quantitative data, including the number of publications and the amount of research money. Both assessments are implemented at the level of activities and outcomes by department, and of achievements by university. However, in both the NIAD-UE and NUCEC evaluation stages, the link between evaluation and funding was unclear.
Consequently, every national university is obliged to make a greater effort to be assessed as a good performer on all indicators being measured, including potential ones. This is simply a low risk strategy to avoid poor performance on measures that may have some influence on evaluation and funding outcomes, even if not explicitly so. Accordingly, the following hypothesis is derived.
Every national university, whether a former imperial university or not, likely has a preference for better evaluation results as a higher assessment would be linked to a higher reputation and greater funding from the government (albeit in an unclear manner). On the other hand, NIAD-UE, which is in charge of evaluating teaching and research activities and outcomes, has processes and procedures professionally controlled to ensure greater neutrality and objectivity in assessments. Hence, the following two hypotheses are derived.
To test the hypotheses above, data on the evaluations is needed. Fortunately, as discussed earlier, all results from both the self-evaluation and NIAD-UE/NUCEC evaluations on teaching and research are published: we analyse the results of these evaluations using basic statistics such as correlation and variation.
The number of NUCs and departments evaluated is 86 and 596 respectively. Survey data on the impact of universities from the NIAD-UE is used. Primary data is given by written documents on evaluation. In order to operationalise evaluation results in quantitative terms, we transformed the assessments of achievements and levels of teaching and research into Likert scales. Achievement assessments against medium-term goals range from 5 (excellent) to 1 (poor). Level assessments against expectations range from 4 (much higher) to 1 (lower).
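The transformation described above can be sketched as follows. This is an illustrative reconstruction only: the rating labels and the numeric anchors at the extremes come from the text, while the intermediate labels of the level scale are our own assumption.

```python
# Illustrative sketch: the 5-1 achievement anchors and the 4/1 level
# anchors are from the text; intermediate level labels are assumed.

ACHIEVEMENT_SCALE = {  # assessed against medium-term goals
    "excellent": 5,
    "outstanding": 4,
    "generally good": 3,
    "insufficient": 2,
    "poor": 1,
}

LEVEL_SCALE = {  # assessed against stakeholder expectations
    "much higher": 4,    # anchor from the text
    "higher": 3,         # assumed intermediate label
    "as expected": 2,    # assumed intermediate label
    "lower": 1,          # anchor from the text
}

def encode(ratings, scale):
    """Convert verbal ratings into numeric Likert scores."""
    return [scale[r] for r in ratings]

print(encode(["excellent", "generally good", "poor"], ACHIEVEMENT_SCALE))
# [5, 3, 1]
```

Once encoded this way, the verbal ratings become ordinal scores on which the correlations and group averages reported below can be computed.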
National universities were classified into eight groups by character (see Table 6.5).3 This classification can be explained by differences in the historical backgrounds and resource holdings among NUCs. Bias was assessed as any measurable preference given to particular groups within this classification.
In order to test H1, the following was undertaken. For the achievement of teaching and research, the difference between the two measures among universities was calculated by subtracting the minimum score from the maximum score by university type (see Table 6.7). As shown in Table 6.7, the difference in teaching achievement (0.285 = 3.285 – 3.000) is smaller than that in research achievement (0.857 = 3.857 – 3.000). Therefore H1 is supported.
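The H1 computation is simple arithmetic on the group means in Table 6.7. A minimal sketch, using only the four means quoted in the text (the remaining group means are not reproduced here):

```python
# Group means quoted in the text (Table 6.7); other groups omitted.
teaching_achievement = {"former imperial": 3.285, "lowest-rated group": 3.000}
research_achievement = {"former imperial": 3.857, "lowest-rated group": 3.000}

def spread(means):
    """Difference between the highest and lowest group mean."""
    return round(max(means.values()) - min(means.values()), 3)

print(spread(teaching_achievement))  # 0.285
print(spread(research_achievement))  # 0.857
# The larger spread in research supports H1.
```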
In order to test H2a, the correlation of scores in research achievement and in research level was calculated. The correlation coefficient between the achievement and research activities in the level is 0.253 (p < 0.05) and the correlation coefficient between the achievement and research outcome in the level is 0.315 (p < 0.01). This means that research achievement is significantly associated with research level on activity or outcome. As a consequence, H2a – hypothesising the independence between assessments of achievement and level – is rejected.
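The association tests reported in this section are plain Pearson correlations. Since the underlying department-level scores are not reproduced in the chapter, the sketch below implements the coefficient and runs it on toy data only:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy scores: a perfect positive association gives r = 1.0
assert abs(pearson([1, 2, 3, 4], [2, 4, 6, 8]) - 1.0) < 1e-9
```

In practice, coefficients such as 0.253 and 0.315 would be computed over the 596 department-level Likert scores, with significance then assessed against the relevant t distribution.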
H2b is tested thus. The research level was analysed by NIAD-UE in terms of activities and outcomes using the self-evaluation report on teaching and research. This follows standard evaluation theory, which divides the focus into inputs, efforts (activities), outputs and outcomes over time. Of course, as research develops, external environmental factors affect research performance and the relation between inputs and performance becomes indeterminate. In other words, it is presumed that higher activity would not always produce higher outcomes. However, the correlation coefficient between assessments of research activities and outcomes is 0.707, which is statistically significant (p < 0.01). Hence, H2b is confirmed.
H3 is tested thus. In order to analyse the relationship between research and teaching performance, the correlation coefficients between the achievements and levels of teaching and research were calculated. The correlation coefficient between teaching and research achievement is 0.352 (p < 0.01) and statistically significant. Since the teaching level is measured in five elements as described earlier, teaching performance at the level was measured as the correlation between the average teaching level score and the research level (for activities and outcomes respectively). The results were 0.604 (p < 0.01) and 0.591 (p < 0.01). Accordingly, research performance was significantly associated with teaching performance in both achievement and level. H3 is therefore supported.
H4 is tested as follows. The relationship between self-evaluation and NIAD-UE’s evaluation of research level is analysed by measuring the difference between the scores in the two types of evaluation. The value ranges from 3 to −3, owing to the maximum being 4 (much higher than expected) and the minimum being 1 (lower than expected). A positive value means that the rating by NIAD-UE is higher than that of the self-evaluation, while a negative value means the self-evaluation is higher than NIAD-UE’s. A zero means the two are identical. The average value in research activities is −0.587 and that in research outcomes is −0.625. Accordingly, it is found that self-evaluations of research activities and outcomes are generally higher, or softer, than those by NIAD-UE. H4 is therefore supported.
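A minimal sketch of the H4 difference measure, with invented department scores (the real department-level scores are not reproduced in the chapter):

```python
# (self-evaluation, NIAD-UE evaluation) pairs on the 1-4 level scale;
# the four departments below are invented for illustration only.
pairs = [(4, 3), (3, 3), (4, 4), (3, 2)]

# NIAD-UE rating minus self-rating: negative values mean the
# self-evaluation was 'softer' (higher) than the external rating.
diffs = [niad - self_rating for self_rating, niad in pairs]
mean_diff = sum(diffs) / len(diffs)
print(mean_diff)  # -0.5
```

A negative mean, as in the reported averages of −0.587 and −0.625, indicates that self-evaluations run systematically above the external NIAD-UE ratings.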
H5 is tested thus. The performance of national universities varies by type, including graduate or medical schools, as noted in Table 6.6. However, evaluation is implemented by comparing actual performance in activities and outcomes with the goals/expectations. These might reflect the character or nature of the faculty in a national university and may differ due to subjectivity in their derivation and evaluation. On the other hand, NIAD-UE attempts to ensure neutrality and reliability through assessment procedures using more than one assessor with professional knowledge. As shown in Table 6.8, in every type of university, self-evaluation is higher than that of NIAD-UE. However, except for graduate universities or institutes, there are neither significant nor systematic differences in average scores among university types. Consequently, although a common bias toward softer rating was found, H5 is generally supported.
The theoretical hypotheses other than H2a were generally supported. The empirical findings in the previous section show that differences in research assessments among universities were bigger than those in teaching, while research performance was associated with teaching performance. Research level ratings for research activities were related to those for research outcomes. Self-evaluation assessments of teaching and research were generally higher than those by the professional evaluation body, NIAD-UE. Finally, there was no significant bias towards specific groups of universities in the ratings by NIAD-UE.
However, how the evaluation results would be linked to resource allocation or the operating grants for the next medium-term goals was not determined at the time of writing. Debates on higher education policy and methodological matters continue. While evaluation and funding are ostensibly linked, it is not clear as to:
Provisional evaluation assesses the achievement and level of teaching, research and operations. The ‘level’ evaluation is carried out by department, while ‘achievement’ assessment is carried out at university level. As Gibbons and Murphy (1990) argue, a relative performance evaluation is an optimal measure, as reward or payment is linked to the difference in performance between individuals in the same organisation, or organisations in the same sector. Differences in performance are more likely to be related to individual and/or unit performance if background conditions are similar. However, the provisional evaluation of achievement and level is, respectively, the comparison of actual performance with the targets by university and with the expectations by department. These cannot be compared to each other, as they exist on different scales. Since targets are determined in the medium-term goals, and are largely developed by the individual national university in any event, the achievement evaluation is affected by the level of difficulty. In other words, when a university sets easy (difficult) targets in its medium-term goals, those targets are more (less) likely to be achieved.
The level evaluation has a similar problem. A response from the national universities to the survey by MEXT indicates that ‘softer’ self-evaluation of research performance might be associated with good evaluation results by the NIAD-UE. In practice, as shown in Table 6.8, the NIAD-UE’s evaluation of research level is significantly related to the self-evaluation of each university. There exists a common bias towards a softer rating: the higher the self-evaluation, the higher the NIAD-UE’s rating. Accordingly, evaluation results can be distorted. Agency theory (Ross, 1973; Baker, 1992) also suggests that the strength of the linkage between performance evaluation and funding will be marginal due to possibly larger measurement errors. It is argued that when it is difficult to measure performance correctly, incentives will be lower, and so the linkage between performance assessments and funding will not be strong.
The second issue is the relationship between inputs and outputs or outcomes. The current evaluation system neglects inputs and does not take productivity into consideration. However, university performance, especially research activities, outputs and outcomes, is to a large extent determined by inputs. The NIAD-UE’s assessment does not explicitly consider the impact of inputs on research performance. The research achievement and level of former imperial universities was, on average, rated highest, although a relative evaluation was not adopted (see Table 6.7). The average achievement in research is 3.857, higher than that of the other groups. For research level, the averages for activities and outcomes are respectively 2.798 and 2.744, which are larger than those of the other groups. Except for graduate institutes, the ranking of research achievement is related to research expenses per academic staff member by university type. It was found that the group of former imperial universities got the highest rating on research performance in both achievement and level, corresponding to the highest inputs (14,881 thousand yen). However, the unit cost of teaching is not associated with the ratings on teaching achievement (see Table 6.9). For example, the former imperial group is rated highest (3.285) in teaching achievement, as in research; however, its unit teaching cost of 237 thousand yen is lower than those of the medical and normal groups (respectively 367 and 307 thousand yen).
University type                   Teaching cost per student (’000 yen)   Research cost per academic staff (’000 yen)
Science and Technology            266                                     9,424
Humanities and Social Sciences    215                                     2,970
Comprehensive with hospital       204                                     4,657
Comprehensive without hospital    189                                     3,997
Perhaps in order to cope with the above problems, the linkage between performance and funding should be limited to cases of good performance. Also, the optimal production function, or the most efficient function that can identify the relationship between inputs and outputs/outcomes, should be estimated if possible. Of course, the first prescription is constrained by the public finance situation. The second also faces considerable difficulties in identifying an optimal production function, because the function presumes the quality and quantity of resources for teaching and research can be easily measured.
The third issue is an attribution problem. The provisional evaluation examined performance from FY 2004 to FY 2007 across the operational, financial and academic activities of all national universities. This means that articles, books and awards in this period are subject to the evaluation by NIAD-UE. It takes time, however, for research inputs and activities to result in research outputs and outcomes, including awards. As a consequence, some research outcomes and awards result from research inputs or activities before FY 2004; that is, the time horizon for evaluation is not necessarily matched with its target period. Research performance for the evaluation period (2004 to 2007) covers those academic staff employed in the period, even though some may have retired from the university or moved to other institutions. While these limitations are recognised by NIAD-UE, they are seen as unavoidable in research evaluation.4
The fourth issue is ensuring the neutrality of the evaluation. The Council of Public Finance in the MOF argued – in its proposal for the FY 2010 budget – that the evaluations lacked objectivity and that the performance of one university could not be compared with that of other universities. The council noted that a large proportion of members5 were persons related to universities, and argued that the number of NUCEC members from the business world and other experts should be increased to provide some distance from the universities. The evaluation method itself is qualitative, using a peer review system, which may have increased subjectivity. While the National University Corporations Law stipulates that evaluation shall be implemented by comparing achievements with medium-term goals, NUCs are accountable to the public and society in a broader sense. If evaluation is to be ‘objectively’ linked to government funding, distinguished research performance should perhaps be evaluated in light of social, economic and cultural perspectives, rather than simply an academic perspective. In addition, NIAD-UE’s evaluation results cannot be used to compare performance among universities. Advantages and disadvantages are summarised in Table 6.10.
The National University Corporations Law enacted in 2004 stipulates that NUCEC shall evaluate the performance of NUCs. The evaluation ranges from teaching and research to administration and public services. It is presumed that the performance shall be linked to government funding, which presumably requires a degree of comparability among universities. However, owing to the system adopting management by objectives, evaluation is carried out by examining the achievements of targets in the medium-term goals of each national university. These targets are essentially determined by the universities themselves. In addition, in contrast to IAIs, evaluation is examined at the level of teaching and research performance by field. This is seen to promote ‘individuality’ as well as enhance quality. Since teaching and research levels are compared with expectations, evaluations are not comparable across units.
As such, there is a difficulty in linking performance evaluation to funding, particularly as there is no real basis for comparison across organisations. On the other hand, such a system might have merit in recognising the individual character of each national university. The full coverage of activities in the evaluation means that NUCs are accountable to multiple stakeholders, such as government, parents, industries, citizens and society, whose needs are varied (Talbot et al., 2005). Hence, there are trade-offs6 between some measures and elements of performance and others.
We set up several hypotheses concerning the research assessment of national universities. Analysing the evaluation results, we generally confirmed the hypotheses. However, achievement evaluation was significantly associated with level evaluation, contrary to the hypothesis. The two types of evaluation are intended to assess performance from different perspectives: the former is assessed against medium-term goals, while the latter is examined against the expectations of stakeholders. It was presumed that the two ratings would be independent of each other, but in practice there was a significant relationship between them. On the other hand, as expected, it was found that variations in achievement evaluation were larger than those in level evaluation. Research outcomes were significantly associated with research activities. It was confirmed that self-evaluation by NUCs was softer than NIAD-UE’s evaluations, and that ratings of research level by NIAD-UE were not biased by university type. This suggests NIAD-UE, a professional evaluating body, played a role in improving the reliability of research evaluation.
If we consider the intended aims of national universities at incorporation, evaluation has to contribute to strengthening international competitiveness and sustainability. National universities are the leading core of a knowledge-based society and might be an engine of the economy, both locally and nationally. Although national universities focus on basic research, the relationships among basic, applied and development research have been noted as interrelated and complementary.7 This means that, to some extent, resources should perhaps be concentrated in internationally competitive groups or fields, while at the same time keeping national universities financially sustainable across the system as a whole. This is of course a difficult balancing act.
Since corporatisation, operating grants, a core revenue source from the government, have decreased each year. As a consequence local universities, unable to obtain sufficient external or competitive funds, have faced severe financial constraints. Although total research publications in science and technology fields in Japan increased in the period between 1996 and 2006, the publications whose citations per article are ranked in the top 10 per cent decreased in the same period.8 Since most publications are produced by national universities in Japan, this points to a possible negative impact on research outcomes for NUCs in the aftermath of incorporation.
Owing to the landslide victory of the centre-left Democratic Party of Japan in 2009, which removed the Liberal Democratic Party that had held power almost continuously since 1955, research assessment and evaluation in Japan has entered a new stage. In November 2009, research projects and programmes that included funding for universities were reviewed in public screening sessions by budget examiners from the Ministry of Finance. The public sessions were led by Democratic Party of Japan lawmakers and specialists, of whom very few were scientists. Cuts were recommended in some scientific research funding, including supercomputer development and space programmes (Cyranoski, 2009).
Previous reviews of scientific or research projects were based on a closed system in which research programmes consisting of projects were examined by the MOF in terms of budget allocation, while research projects were assessed and allocated funding by professionals or peer review. As such, this is potentially a revolutionary change. A number of scientists, including eight Japanese Nobel laureates, publicly defended their budgets and criticised the new review process (Associated Press, 2009). However, with calls for budget constraint and for accountability and transparency of results, particularly under continuing fiscal stress, scientists and researchers are increasingly forced to justify their studies. As Dr Ezaki, a Nobel laureate, indicated, research assessment is required to justify the social value and meaning of science to the public in an open and understandable way.
1Arimoto et al. (2008) found Japan’s scholars are less interested in teaching than those in the United States and Germany. Respondents who answered ‘yes’ for ‘primary concern is in teaching’ amounted to 5.5 per cent in Japan, 13.2 per cent in Germany and 30.2 per cent in the United States.
2Current grants for private universities are composed of general grants and special grants, in a ratio of approximately 2 to 1. The former is largely determined by the number of academic staff and students. The latter is related to performance and innovative activities. The share of research performance was 18.5 per cent for FY 2008, and the performance measures include the percentage of degrees awarded, whether contracted or joint research is implemented, whether talented younger researchers granted government scholarships are hosted, whether grants-in-aid for scientific research are accepted, and whether patents are acquired or applied for. Hence, most of the measures are at the activity level.
3Eight groups are classified by the character of research and teaching. The former imperial group is historically high ranked. Groups of Normal (broad-based), Science and Engineering, Social Sciences and Humanities, and Medical are divided according to the disciplines. The existence of a medical school influences the amount of external funds and publications and citations. A further classification is given for universities conducting only graduate teaching.
4NIAD-UE mentioned in Q&A for the guidelines as follows. ‘Time inconsistency caused by the gap between activities and outputs or outcomes is an inevitable measurement issue in research evaluation, no matter which period will be set as a time horizon for evaluation’.
5Of 30 members, 24 were academics, 3 were from business and 3 were from high schools.
6E.g. cost-savings or downsizing the staff versus improving the quality of teaching and research.
7For example, the Induced Pluripotent Stem (iPS) Research Institute in Kyoto University stipulates the mission to pursue the possibilities of iPS cells through both fundamental and applied research, with the goal of contributing to the development of regenerative medicine, in addition to serving as the world’s first core institute dedicated to leading iPS cell research (emphasis added).
8In this decade publication numbers increased by 10.6 per cent. In contrast, numbers of the top 10 per cent highly cited papers in each field decreased by 1.9 per cent.