Levels of Rigor: A New Way of Measuring Growth in Student Achievement

An increasing number of school systems are seeking to evaluate teachers by measuring growth in student achievement.  From the perspective of policymakers far removed from the classroom, this seems like an easy proposition: when standardized test scores increase from X to Y, the teacher is deemed effective.  Some states have also allowed schools to use district-based test scores or even teacher-assigned grades to show growth.  The problem with all of these alternatives is that they create a perverse incentive for low achievement in the first marking period – the “X” score – and a high-stakes, anxiety-laden push for high scores in the second marking period – the “Y” score.  There is a better way:  Levels of Rigor.
What if the measurement of improvement were based not on the student score but on the degree of rigor in the student assessment?  Students could earn the same score in the fall and spring, but if the degree of rigor increased, those identical scores would represent substantial gains in student learning.  This system would completely change the incentives for teachers and students. Rather than rewarding students and teachers for false gains – low scores and grades in the fall compared to higher scores and grades in the spring – rigor-based measurement creates a single positive incentive: learn at higher levels throughout the year. 
What does this mean in practical classroom terms?  In the fall, students in secondary school might write an essay with a claim supported by arguments and evidence.  But in the spring, the same students might write an argumentative essay that includes external citations and court decisions, analogous to a legal brief.  The number of “proficient” students might be the same from fall to spring, but the degree of rigor has increased substantially.  Elementary students might begin the school year with linear measurement and end the year with two- and three-dimensional measurement using both the English and metric systems.  Scores and grades don’t need to change – the improvement is in rigor. 
When we use accountability systems based on scores, every incentive is to make the fall test difficult and the spring test easy, artificially boosting the percentage of students with higher grades and proficiency levels.  That is gaming the system – and at least some administrators and teachers facing termination for insufficient progress will certainly do it.  If, on the contrary, we use an accountability system based on rigor, then the incentives of policymakers, taxpayers, teachers, and citizens are all in alignment: when students learn at a higher level during the year, we recognize the real improvement that has taken place. 
“Not everything that can be counted counts, and not everything that counts can be counted.”  This epigram, often misattributed to Albert Einstein, actually comes from the lesser-known sociologist William Bruce Cameron.  But despite the less illustrious provenance, it remains an important concept, particularly in the context of teacher evaluation.  If the question is, “To what degree has the teacher demonstrated improvement in student achievement?” the answer cannot be accurately represented by changes in test scores or grades.  Rather, we should be able to say with confidence that the same students with the same teacher demonstrated proficiency at higher degrees of rigor during the same school year.  For those demanding quantitative indicators, we can label rigor during the year as Rigor Levels I, II, and III.  If the same percentage of students are performing at Level I in the fall and Level III in the spring, a comparison of test scores would call that “zero progress.”  But a comparison of Levels of Rigor would record a gain of two levels out of three – a 66% improvement.  Certainly no measurement system is impervious to criticism, but that doesn’t stop state and national policymakers from demanding numerical accountability systems.  “Levels of Rigor” is not perfect, but it is substantially better than comparisons of test scores and grades for the evaluation of teacher impact on student learning.
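For readers who want the arithmetic made explicit, the contrast between the two accountability systems can be sketched in a few lines of code. This is a minimal illustration, not part of the original proposal: the function names and the choice to express rigor gain as a fraction of the maximum level (Level III) are assumptions made here to reproduce the 66% figure cited above.

```python
def score_based_gain(fall_rate: float, spring_rate: float) -> float:
    """Traditional comparison: change in the proficiency rate itself."""
    return spring_rate - fall_rate


def rigor_based_gain(fall_level: int, spring_level: int, max_level: int = 3) -> float:
    """Gain in Levels of Rigor, expressed as a fraction of the top level.

    One plausible reading of the article's arithmetic: moving from
    Level I to Level III is a gain of two levels out of three.
    """
    return (spring_level - fall_level) / max_level


# Same proficiency rate in fall and spring: the score-based system
# reports "zero progress" even though the work became far more rigorous.
print(score_based_gain(0.70, 0.70))              # 0.0

# Rigor-based system: Level I in the fall, Level III in the spring.
print(int(rigor_based_gain(1, 3) * 100))         # 66 (percent)
```

Under a score-based system the example above registers no growth at all; under the rigor-based reading, the same scores register roughly the 66% improvement described in the text.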