INTER-RATER RELIABILITY AND VALIDITY OF SCORING MEN'S INDIVIDUAL TRAMPOLINE ROUTINES AT EUROPEAN CHAMPIONSHIPS 2014
DOI: https://doi.org/10.52165/sgj.10.1.69-79
Keywords: trampoline, judging, accuracy, objectivity
Abstract
Execution scores of men's individual trampoline routines at the European Championships (EC) 2014 in Guimarães, Portugal, were analysed. In total, 66 men competed in the qualifying round. The old, classic scoring format, in which the execution score is the sum of the individual judges' scores after the lowest and highest scores are discarded, was compared with the new format, in which only the median score of each skill is tripled and these values are summed to give the final score. Execution was found to be the most significant component of the total score, surpassing degree of difficulty and time of flight in both routines. Intra-class correlation (ICC) coefficients and Kendall's coefficient of concordance W were computed. Judging bias was small: only one judge scored significantly higher than the other judges. Inter-rater reliability was good for single skills (ICC around .9 and Kendall's W around .7) and excellent for the sum of all ten skills (all ICC coefficients above .99 and Kendall's W above .97) in both routines. Although the correlation coefficients between old- and new-format scores were high (r = .965 and r = .997 for the first and second routines, respectively), there were some substantial differences in the rankings of competitors between the two formats (Spearman rank correlation rho = .94 and rho = .96 for the first and second routines, respectively). While the reliability and validity of judging trampoline routines were high, some possible means of improvement are suggested. Regarding the differences between the old and new formats, no clear advantages or disadvantages of either were found.
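To make the two scoring formats and the concordance statistic concrete, here is a minimal sketch. It assumes each panel's execution marks are stored as a judges × skills list of per-skill scores (five judges, toy numbers rather than data from the study); the function names `old_format`, `new_format`, `average_ranks` and `kendalls_w` are hypothetical. The old format totals each judge's marks, drops the highest and lowest totals and sums the rest; the new format takes the per-skill median across judges, triples it and sums over skills. Kendall's W is computed from average ranks using the textbook formula W = 12S / (m²(n³ − n)), without the tie correction.

```python
from statistics import median


def old_format(scores):
    """Old format: total each judge's marks over all skills, discard the
    highest and lowest judge totals, and sum the remaining totals."""
    totals = sorted(sum(judge) for judge in scores)
    return sum(totals[1:-1])


def new_format(scores, factor=3):
    """New format: median across judges for each skill, multiplied by
    `factor` (3, to match the three judge totals kept by the old format),
    then summed over skills."""
    n_skills = len(scores[0])
    return sum(factor * median(judge[k] for judge in scores)
               for k in range(n_skills))


def average_ranks(values):
    """Rank values 1..n (1 = lowest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for ratings[judge][competitor], without tie correction."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Toy panel: 5 judges x 3 skills (illustrative values only).
scores = [
    [0.8, 0.7, 0.9],
    [0.9, 0.9, 0.9],
    [0.7, 0.7, 0.8],
    [0.8, 0.9, 1.0],
    [0.6, 0.6, 0.7],
]
print(round(old_format(scores), 2), round(new_format(scores), 2))  # 7.3 7.2

# Toy ratings: 3 judges x 4 competitors (illustrative values only).
ratings = [
    [7.2, 8.1, 6.5, 7.9],
    [7.0, 8.3, 6.6, 7.7],
    [7.4, 7.9, 6.4, 8.0],
]
print(round(kendalls_w(ratings), 3))  # 0.911
```

In this toy panel the two formats already diverge by a tenth of a point, because discarding whole judge totals and taking per-skill medians trim different marks; that is the mechanism behind the ranking differences the abstract reports.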