Estimating Measurement Error and Score Dependability in Examinations Using Generalizability Theory

Filed in Articles on November 11, 2022

ABSTRACT

This study investigated the estimation of measurement error and score dependability in examinations using Generalizability Theory.

Because the scores obtained by the students (the objects of measurement) in examinations were affected by multiple sources of error (facets), and because these scores were used to make relative and absolute decisions about the students, there was a need to estimate the measurement error and dependability of the scores.

This was to find the extent of the contributions of these sources of error (facets) in examination scores. Four research questions and two hypotheses were posed to guide the study.

The population of the study comprised 25,530 senior secondary three (SS3) students in public secondary schools in Rivers State for the 2011/2012 academic year.

The sample consisted of 2,553 SS3 students selected through the proportionate stratified random sampling technique. A Mathematics Achievement Test with items drawn from past WAEC and NECO SSCE questions was used for data collection.

EduG version 6.0-e, which is based on ANOVA and Generalizability Theory, was used to answer the four research questions.

A 95% confidence interval was computed using the standard errors (SE) of the variance components to determine whether there were significant differences in the contributions and effects of the facets and their interactions on measurement error and score dependability in examinations.
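The interval check described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: it assumes a normal-approximation interval (estimate ± 1.96 × SE), which the abstract does not specify, and the function names and input values are hypothetical.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Return a (lower, upper) interval around a variance component estimate.

    Assumption: a normal approximation with z = 1.96 for 95% coverage.
    The abstract does not say which interval method the study used.
    """
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def excludes_zero(interval):
    """True if the interval does not contain zero.

    Under the normal approximation, an interval that excludes zero
    corresponds to a contribution that is significant at roughly p < 0.05.
    """
    lower, upper = interval
    return lower > 0 or upper < 0


if __name__ == "__main__":
    # Made-up (estimate, SE) pairs, purely for illustration.
    made_up_components = {
        "large component, small SE": (0.50, 0.10),
        "small component, large SE": (0.01, 0.02),
    }
    for label, (estimate, std_error) in made_up_components.items():
        interval = confidence_interval(estimate, std_error)
        print(label, interval, "excludes zero:", excludes_zero(interval))
```

Variance components cannot be negative, so a symmetric interval is only a rough guide for small estimates; it is used here because it is the simplest reading of "computed using the SE".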

The findings of the study revealed that some hidden sources of error were at play in the study. The residual made the highest contribution to measurement error.

This was followed by the student factor. The residual and student variance components differed significantly (p < 0.05) in their contributions to measurement error in examination scores.

Conversely, questions and invigilators were not significantly different in their contributions and effects on measurement error and score dependability in examinations (p > 0.05).

Invigilators and the students × invigilators interaction had the highest effect on score dependability in examinations.

The findings also revealed that increasing the number of invigilators to 90 increased the generalizability coefficient (Eρ²) and the index of dependability (Φ), which are used, respectively, to rank-order students and to classify them based on their own performance, irrespective of the performance of other students.
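The effect of adding invigilators on the two coefficients can be illustrated with a small decision (D) study sketch. It assumes a fully crossed students × questions × invigilators random-effects design, which the abstract does not confirm, and uses the standard G-theory formulas for that design. Every variance component value below is made up for illustration and is not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """G-study variance components for a crossed p x q x i design (assumed)."""
    p: float      # students (object of measurement)
    q: float      # questions
    i: float      # invigilators
    pq: float     # students x questions
    pi: float     # students x invigilators
    qi: float     # questions x invigilators
    pqi_e: float  # three-way interaction confounded with residual error


def d_study(vc, n_q, n_i):
    """Return (generalizability coefficient, index of dependability)."""
    # Relative error: only interactions with students affect rank ordering.
    rel_error = vc.pq / n_q + vc.pi / n_i + vc.pqi_e / (n_q * n_i)
    # Absolute error: facet main effects and their interaction also count.
    abs_error = rel_error + vc.q / n_q + vc.i / n_i + vc.qi / (n_q * n_i)
    g_coefficient = vc.p / (vc.p + rel_error)
    phi = vc.p / (vc.p + abs_error)
    return g_coefficient, phi


def sweep_invigilators(vc, n_q, invigilator_counts):
    """Run the D study for several hypothetical numbers of invigilators."""
    return {n_i: d_study(vc, n_q, n_i) for n_i in invigilator_counts}


# Hypothetical values, NOT the study's estimates.
HYPOTHETICAL_VC = VarianceComponents(
    p=0.020, q=0.010, i=0.005, pq=0.015, pi=0.030, qi=0.002, pqi_e=0.150
)

if __name__ == "__main__":
    for n_i, (g, phi) in sweep_invigilators(HYPOTHETICAL_VC, 50, [10, 30, 90]).items():
        print(f"n_i={n_i:3d}  G coefficient={g:.3f}  Phi={phi:.3f}")
```

Because every error term that involves invigilators is divided by the number of invigilators, both coefficients rise as that number grows, and the index of dependability never exceeds the generalizability coefficient.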

Therefore, generalizability theory provided a framework for evaluating multiple sources of variability in examination scores and for deriving implications for test development and test score interpretation.

It is therefore recommended that, in conducting examinations, enough invigilators should be recruited to minimize error and maximize score reliability.

TABLE OF CONTENTS

Title Page
Approval Page
Certification
Dedication
Acknowledgement
List of Tables
List of Figures
Abstract

CHAPTER ONE: INTRODUCTION

Background of the Study
Statement of the Problem
Purpose of the Study
Significance of the Study
Scope of the Study
Research Questions
Hypotheses

CHAPTER TWO: REVIEW OF LITERATURE

Conceptual Framework
Universe of Admissible Observations
Generalizability (G) Study
Universe of Generalization
Decision (D) Study
Variance Components
Error Variances
Generalizability Coefficient / Indices
Theoretical Framework
Classical Test Theory
An Overview of Generalizability Theory
Related Empirical Studies in G-Theory
Sampling Model for Validity
Dependability and Interchangeability of Assessment Methods in Science
Dependability of Scores for a New ESL Speaking Test
Generalizability Investigation of Cognitive Demand and Rigor Ratings of Items and Standards in an Alignment Study
Investigations of Raters and Occasions as Potential Sources of Error in Children’s Draw-A-Person Scores Using G-Theory
A Multigroup G-Theory Analysis of a Large-Scale Reading Comprehension Test
Generalizability Study of the Medical Vignettes in Interview to Assess Students’ Non-Cognitive Attributes for Medical School
How Accurate Are ESL Students’ Holistic Writing Scores on Large-Scale Assessment? A Generalizability Theory Approach
Generalizability Study of Job Performance Measurement of Navy Machinist’s Mates
The Generalizability of NU-6 Word Recognition Scores
A Generalizability Analysis of the Speech Perception in Noise (SPIN) Test
Summary of Literature

CHAPTER THREE: RESEARCH METHOD

Design
Area of the Study
Population of the Study
Sample/Sampling Techniques
Instrument for Data Collection
Validation of the Instrument(s)
Reliability of the Instrument(s)
Method of Data Collection
Method of Data Analysis

CHAPTER FOUR: RESULTS

CHAPTER FIVE: DISCUSSION OF RESULTS

Discussion of Findings
Conclusion
Educational Implications
Recommendations
Limitations of the Study
Suggestions for Further Studies
Summary of the Study
References

INTRODUCTION

1.1 Background of the Study

Measurement pervades almost every aspect of modern society. Nworgu (2003) looked at measurement as the process of assigning numerical values to describe features or characteristics of objects, persons, or events in a systematic manner.

Measurement involves assigning figures, numerical quantities, or scores to variables or traits of interest.

For example, a great variety of things about individuals (achievement, aptitude, intelligence, height, weight) are measured on a regular basis by people such as teachers and doctors.

At a glance, obtaining scores for these attributes seems quite simple. Unfortunately, a major problem with many kinds of measurement is that there is often no basis for assuming that the numerical value provided accurately and truthfully represents the underlying quantity of interest.

Because the results of these measurements can have a profound influence on an individual’s life, it is important to understand how these scores are derived and the accuracy of the information they contain.

Teachers give tests to determine what students know and are able to do in a particular content area. If there is confidence in a test, the belief is that a student who scores high on the test knows more in that area than a student who scores low.

In like manner, two students whose scores are similar probably have roughly the same level of ability in the area being tested.

Two questions therefore arise: How much confidence should we have in any particular test? How should our level of confidence in the test affect the way we think about students’ scores?

Yet no test, however well designed, can measure a student’s true ability, because numerous factors interfere with our ability to measure it accurately and precisely.

REFERENCES

AERA, APA, & NCME (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Akeju, S.A. (1972). The reliability of general certificate in education examination, English composition papers in West Africa. Journal of Educational Measurement. Seminar 1972. Retrieved March 10th, 2011 from http://www.Jstor.org/pss/1434162
Allen, M.J. & Yen, W.M. (1979). Introduction to measurement theory. Monterey, California: Brooks/Cole.
Allen, M.J. & Yen, W.M. (2002). Introduction to measurement theory. Long Grove, IL: Waveland Press.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Ary, D., Jacobs, L. & Razavieh, A. (1996). Introduction to research in education (5th ed.). US: Harcourt Brace & Co.
