How many heads are better than one? The reliability and validity of teenagers' self- and peer assessments

Yao Ting Sung, Kuo En Chang, Tzyy Hua Chang, Wen Cheng Yu

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

Self- and peer assessments are becoming more popular in classrooms, but there are few data on the reliability and validity of such assessments performed by school children. Because these factors are greatly affected by the number of raters, we conducted two studies to determine the rating behaviours of teenagers in self- and peer assessments, and how the number of raters influences the reliability and validity of self- and peer assessments. The first study involved 116 seventh graders (the first grade of middle school), where students individually playing musical recorders were subject to self- and peer assessments. The second study involved 110 eighth graders, with Web pages constructed by students being subject to self- and peer assessments. Generalizability theory and criterion-related validity were used to obtain the reliability and validity coefficients of the self- and peer ratings. Analyses of variance were used to compare differences in self- and peer ratings between low- and high-achieving students. The coefficients of reliability and validity increased with the number of raters in both studies, reaching the acceptable levels of 0.80 and 0.70, respectively, with 3 or 4 raters in the first study (involving assessments of individual performance) and with 14-17 raters in the second study (involving assessments of group work). Furthermore, low- and high-achieving students tended to over- and underestimate the quality of their work in self-assessment, respectively. The discrepancy between the ratings of students and experts was higher in group-work assessments than in individual-work assessments. The results have both theoretical and practical implications for researchers and teachers.
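
As a rough illustration of why reliability rises with the number of raters, the sketch below projects a relative generalizability coefficient for the mean of several raters. It assumes a simple one-facet crossed design (students by raters), which the abstract does not confirm, and the variance components in the usage note are hypothetical placeholders, not values from the studies. The function names `g_coefficient` and `raters_needed` are likewise made up for this example.

```python
def g_coefficient(var_person: float, var_residual: float, n_raters: int) -> float:
    """Relative G coefficient for the mean of n_raters ratings.

    Assumes a one-facet crossed design (persons x raters):
    var_person / (var_person + var_residual / n_raters).
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    if var_person <= 0 or var_residual < 0:
        raise ValueError("variance components must be positive")
    return var_person / (var_person + var_residual / n_raters)


def raters_needed(var_person: float, var_residual: float,
                  target: float, max_raters: int = 1000) -> int:
    """Smallest number of raters whose averaged rating reaches the target coefficient."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    for n in range(1, max_raters + 1):
        if g_coefficient(var_person, var_residual, n) >= target:
            return n
    raise ValueError("target not reached within max_raters")


if __name__ == "__main__":
    # Hypothetical variance components, chosen only to show the shape of the curve.
    var_person, var_residual = 2.0, 3.0
    for n in (1, 2, 4, 8, 16):
        print(n, round(g_coefficient(var_person, var_residual, n), 3))
    print("raters needed for 0.75:", raters_needed(var_person, var_residual, 0.75))
```

Under this model, a larger residual variance relative to the student variance means more raters are needed to reach the same coefficient, which is one way to read the gap between the two studies.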

Original language: English
Pages (from-to): 135-145
Number of pages: 11
Journal: Journal of Adolescence
Volume: 33
Issue number: 1
DOI: 10.1016/j.adolescence.2009.04.004
Publication status: Published - 2010 Feb 1

Keywords

  • Peer assessment
  • Reliability
  • Self-assessment
  • Teenager
  • Validity

ASJC Scopus subject areas

  • Pediatrics, Perinatology, and Child Health
  • Social Psychology
  • Developmental and Educational Psychology
  • Psychiatry and Mental health

Cite this

How many heads are better than one? The reliability and validity of teenagers' self- and peer assessments. / Sung, Yao Ting; Chang, Kuo En; Chang, Tzyy Hua; Yu, Wen Cheng.

In: Journal of Adolescence, Vol. 33, No. 1, 01.02.2010, p. 135-145.

@article{8b494d250f7243358937b5292442f71a,
title = "How many heads are better than one? The reliability and validity of teenagers' self- and peer assessments",
abstract = "Self- and peer assessments are becoming more popular in classrooms, but there are few data on the reliability and validity of such assessments performed by school children. Because these factors are greatly affected by the number of raters, we conducted two studies to determine the rating behaviours of teenagers in self- and peer assessments, and how the number of raters influences the reliability and validity of self- and peer assessments. The first study involved 116 seventh graders (the first grade of middle school), where students individually playing musical recorders were subject to self- and peer assessments. The second study involved 110 eighth graders, with Web pages constructed by students being subject to self- and peer assessments. Generalizability theory and criterion-related validity were used to obtain the reliability and validity coefficients of the self- and peer ratings. Analyses of variance were used to compare differences in self- and peer ratings between low- and high-achieving students. The coefficients of reliability and validity increased with the number of raters in both studies, reaching the acceptable levels of 0.80 and 0.70, respectively, with 3 or 4 raters in the first study (involving assessments of individual performance) and with 14-17 raters in the second study (involving assessments of group work). Furthermore, low- and high-achieving students tended to over- and underestimate the quality of their work in self-assessment, respectively. The discrepancy between the ratings of students and experts was higher in group-work assessments than in individual-work assessments. The results have both theoretical and practical implications for researchers and teachers.",
keywords = "Peer assessment, Reliability, Self-assessment, Teenager, Validity",
author = "Sung, {Yao Ting} and Chang, {Kuo En} and Chang, {Tzyy Hua} and Yu, {Wen Cheng}",
year = "2010",
month = "2",
day = "1",
doi = "10.1016/j.adolescence.2009.04.004",
language = "English",
volume = "33",
pages = "135--145",
journal = "Journal of Adolescence",
issn = "0140-1971",
publisher = "Academic Press Inc.",
number = "1",
}

TY - JOUR

T1 - How many heads are better than one? The reliability and validity of teenagers' self- and peer assessments

AU - Sung, Yao Ting

AU - Chang, Kuo En

AU - Chang, Tzyy Hua

AU - Yu, Wen Cheng

PY - 2010/2/1

Y1 - 2010/2/1

N2 - Self- and peer assessments are becoming more popular in classrooms, but there are few data on the reliability and validity of such assessments performed by school children. Because these factors are greatly affected by the number of raters, we conducted two studies to determine the rating behaviours of teenagers in self- and peer assessments, and how the number of raters influences the reliability and validity of self- and peer assessments. The first study involved 116 seventh graders (the first grade of middle school), where students individually playing musical recorders were subject to self- and peer assessments. The second study involved 110 eighth graders, with Web pages constructed by students being subject to self- and peer assessments. Generalizability theory and criterion-related validity were used to obtain the reliability and validity coefficients of the self- and peer ratings. Analyses of variance were used to compare differences in self- and peer ratings between low- and high-achieving students. The coefficients of reliability and validity increased with the number of raters in both studies, reaching the acceptable levels of 0.80 and 0.70, respectively, with 3 or 4 raters in the first study (involving assessments of individual performance) and with 14-17 raters in the second study (involving assessments of group work). Furthermore, low- and high-achieving students tended to over- and underestimate the quality of their work in self-assessment, respectively. The discrepancy between the ratings of students and experts was higher in group-work assessments than in individual-work assessments. The results have both theoretical and practical implications for researchers and teachers.

AB - Self- and peer assessments are becoming more popular in classrooms, but there are few data on the reliability and validity of such assessments performed by school children. Because these factors are greatly affected by the number of raters, we conducted two studies to determine the rating behaviours of teenagers in self- and peer assessments, and how the number of raters influences the reliability and validity of self- and peer assessments. The first study involved 116 seventh graders (the first grade of middle school), where students individually playing musical recorders were subject to self- and peer assessments. The second study involved 110 eighth graders, with Web pages constructed by students being subject to self- and peer assessments. Generalizability theory and criterion-related validity were used to obtain the reliability and validity coefficients of the self- and peer ratings. Analyses of variance were used to compare differences in self- and peer ratings between low- and high-achieving students. The coefficients of reliability and validity increased with the number of raters in both studies, reaching the acceptable levels of 0.80 and 0.70, respectively, with 3 or 4 raters in the first study (involving assessments of individual performance) and with 14-17 raters in the second study (involving assessments of group work). Furthermore, low- and high-achieving students tended to over- and underestimate the quality of their work in self-assessment, respectively. The discrepancy between the ratings of students and experts was higher in group-work assessments than in individual-work assessments. The results have both theoretical and practical implications for researchers and teachers.

KW - Peer assessment

KW - Reliability

KW - Self-assessment

KW - Teenager

KW - Validity

UR - http://www.scopus.com/inward/record.url?scp=75949117905&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=75949117905&partnerID=8YFLogxK

U2 - 10.1016/j.adolescence.2009.04.004

DO - 10.1016/j.adolescence.2009.04.004

M3 - Article

C2 - 19505717

AN - SCOPUS:75949117905

VL - 33

SP - 135

EP - 145

JO - Journal of Adolescence

JF - Journal of Adolescence

SN - 0140-1971

IS - 1

ER -