Judging Behaviour and Rater Errors: An Application of the Many-Facet Rasch Model

Noor Lide Abu Kassim

Abstract


Of the potential sources of construct-irrelevant variance, or unwanted variability, in performance assessment, those associated with raters have been found to be extensive, difficult to control, and impossible to eliminate. Because rater-related errors are non-trivial and threaten the validity of test results, it is necessary that they be accounted for and controlled in some way. This paper explains the different types of rater errors and illustrates how they can be identified using the Many-Facet Rasch Model, as implemented by FACETS. It also demonstrates what these errors mean in terms of actual judging or rating behaviour and elucidates how they may affect the accuracy with which performance is estimated. The rater errors explicated in this paper are those related to rater severity, restriction of range, central tendency, and internal consistency. As assessment and its procedures are central to student learning, matters related to valid and fair testing need to be taken seriously. It is hoped that with greater awareness of how we judge and a better understanding of how rater-related errors are introduced into the assessment process, we can become better raters and better teachers.
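For context, the Many-Facet Rasch Model in Linacre's (1989) formulation can be written, for an illustrative three-facet design (examinee, task, rater) sharing a common rating scale, as

\log \left( \frac{P_{nijk}}{P_{nij(k-1)}} \right) = B_n - D_i - C_j - F_k

where P_{nijk} is the probability of examinee n receiving a rating in category k from rater j on task i, B_n is the ability of examinee n, D_i the difficulty of task i, C_j the severity of rater j, and F_k the difficulty of the step from category k-1 to category k. The three-facet configuration shown here is a generic sketch, not necessarily the exact design analysed in the paper. Under this model, differences in rater severity appear directly in the C_j estimates, while central tendency, restriction of range, and inconsistency are typically diagnosed through the rating-scale structure and the infit and outfit mean-square fit statistics that FACETS reports for each rater.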


Keywords


rater error, judging behaviour, Many-Facet Rasch Model, performance assessment, validity.


References


Engelhard, G., Jr. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112.

Henning, G. (1997). Accounting for nonsystematic error in performance ratings. Language Testing, 13(1), 53-63.

Holzbach, R. L. (1978). Rater bias in performance ratings: Superior, self-, and peer ratings. Journal of Applied Psychology, 63(5), 579-588.

Kondo-Brown, K. (2002). A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, 19(1), 3-31.

Lee King Siong, Hazita Azman, & Koo Yew Lie. (2010). Investigating the undergraduate experience of assessment in higher education. GEMA Online™ Journal of Language Studies, 10(1), 17-33.

Linacre, J. M. (1989). Many-Facet Rasch measurement. Chicago, IL: MESA Press.

Linacre, J. M. (1998). Rating, judges and fairness. Rasch Measurement Transactions, 12(2), 630-631.

Linacre, J. M. (2003). Facets (Version 3.48.0) [Computer software and manual]. Chicago: Winsteps.com.

Linacre, J. M., Engelhard, G. Jr., Tatum, D.S., & Myford, C. M. (1994). Measurement with judges: Many-faceted conjoint measurement. International Journal of Educational Research, 21, 569-577.

Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 55-71.

Lunz, M. E. (1997). Performance examinations: Technology for analysis and standard setting. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Chicago, IL. (ERIC Document Reproduction Service No. ED409377)

McNamara, T. F. (1996). Measuring second language performance. New York: Addison Wesley Longman.

Saal, F. E., Downey, R. G., & Lahey, M. A. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413-428.

Upshur, J. A., & Turner, C. E. (1999). Systematic effects in the rating of second language speaking ability: Test method and learner discourse. Language Testing, 16(1), 82-111.

Wigglesworth, G. (1993). Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing, 10(3), 305-336.

Wilson, M., & Case, H. (2000). An examination of variation in rater severity over time: A study of rater drift. In M. Wilson & G. Engelhard (Eds.), Objective measurement: Theory into practice (Volume V) (pp. 113-134). Stamford, CT: Ablex.


