How do we solve too much testing?

Snapshot: How James Would Fix Testing

  1. Tie tests to standards-based units, for more frequent, lower-stakes assessments that replace other grading and end-of-year testing
  2. Measure mastery of standards: Students can see their progress and not be penalized if mastery doesn’t happen on the first try; understanding whether students are getting at least a year’s worth of learning growth each year becomes as easy as adding up how many standards they mastered
  3. Stop tying student results to pay for performance, to eliminate incentives to cheat the system

What Should North Carolina Do About Over-Testing in Schools?

Testing is a hot topic on the campaign trail for State Superintendent.  North Carolina has a long history of standardized testing for various purposes, going back to our ABCs program, which included bonuses for teachers at “high performing” schools.  We know from strong research and this state’s experience that high-stakes testing does not improve education. But policymakers have kept asking us to do more. How can we truly fix this system and benefit our students?


Testing, testing … right now

To meet our requirements under the Every Student Succeeds Act (ESSA), North Carolina tests all students in grades three through eight in English language arts and math at the end of every year.  There are also science tests in fifth and eighth grade, and several required tests in high school, including the ACT for all eleventh-graders.  These tests all show a measure of “proficiency” for each student against an arbitrary scale score. That’s supposed to reflect progress against the standards the state sets for what students should learn.  Once students have taken multiple tests over more than one year, a secret, private algorithm will also generate a “growth” score, meant to show how a student’s learning grew from the first test to the second, based on expectations set by prior data.  Combining the proficiency and growth measures, the state spits out all sorts of results that claim to measure subgroups of students, teachers, principals, and schools on an A-F grading scale.


What’s wrong with that?

How the algorithm arrives at the scale scores and growth indexes is not well understood by teachers, parents, or policymakers, so these numbers do little to help anyone change how we educate students.

The growth indexes are averaged across students, so if Anita grows 6.0 and Johnny grows -1.0, the teacher gets a score of 2.5, which is reported as having “exceeded growth” expectations even though half her class did not show a year’s progress.
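The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: the names are hypothetical, the cutoff of 0 for "fell short of expected growth" is an assumption, and the state's actual growth model is not reproduced here.

```python
# Hypothetical growth indexes taken from the example above (not real data).
growth = {"Anita": 6.0, "Johnny": -1.0}

# The teacher's score is the simple average of the students' indexes.
teacher_score = sum(growth.values()) / len(growth)

# Assumption for illustration: an index below 0 means the student
# fell short of expected growth.
below_expected = [name for name, index in growth.items() if index < 0]
share_below = len(below_expected) / len(growth)

print(teacher_score)   # 2.5
print(below_expected)  # ['Johnny']
print(share_below)     # 0.5
```

The average hides the split: one number of 2.5 describes a class in which half the students fell short.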

Testing windows are generally two weeks, and occur at the beginning of the year (to get some baseline data) and the end of year, sometimes with district benchmarks in between. When so few tests set such high stakes for students, teachers, and principals, inevitably, everyone feels immense stress. Even schools that try to make this a “positive” experience send the message to their kids that these tests are massively important, and kids feel the strain. Do we really think, then, that they’re showing us their best work? Additionally, during testing season (a phrase that really should not exist), teachers and all support staff are distracted by having to be strict proctors, and students during such a stressful time can’t even go outside to play, because their noise will impact other students who are testing.

Learning is also impacted by “review” cycles and test-taking strategies, which begin at least a month before testing, so our overall calendar for learning gets squeezed.

These few high-stakes tests become the basis for far too many other high-stakes decisions. We use the results to determine bonus pay for teachers. We use them to label schools (in a way that penalizes schools that have a higher proportion of harder-to-educate students). We use them to determine bonus pay for principals.

So why not ditch the whole thing?

So, should we just stop assessing students? Of course not. There are three main reasons to have some standard assessments. First, students simply do better when they have measures of progress to understand their goals and how they are progressing. (Think about how a score in a game guides your playing!)  Second, teachers need assessments to know what students are learning and when they need to adjust instruction so that all students meet the standards. Finally, there is some public policy purpose in testing (but it rightly comes last in this list): To better understand and talk about achievement gaps. Ever since — under No Child Left Behind legislation — we had to start reporting on the progress of subgroups of students, we’ve had somewhat better measures of these gaps. Data from assessments provides some consistent measures of whether policy and program choices are working for students.  And ideally, standardized data increases the public’s confidence in our schools.

Two changes for a better system

I recommend two significant changes to our testing culture.

First, we need much more frequent but much lower-stakes assessments that will genuinely help teachers adjust instruction, which we can also use to measure our public policies. This should happen at the end of every unit, completely replacing other assessments teachers use for grades–eliminating duplication of testing. Do we still need some sort of testing to check the validity of those scores? Possibly; if so, we should consider limiting this testing to a sample of students.

Second, those assessments should tie directly to North Carolina’s strong set of standards, which clearly state what students should learn in each grade/course. We can base our measurement of schools solely on a student’s progress against these standards–how many standards did each student master? That clearer measurement provides significant and simple snapshots of our students’ learning for the students themselves, parents, teachers, and the public.

For example, fourth-grade math has 25 standards.  One of them is “Add and subtract multi-digit whole numbers up to and including 100,000 using the standard algorithm with place value understanding.” 

We should have an assessment (a quiz or another format, as appropriate) that measures a student’s understanding of this standard. It can be delivered at any time. The student will show some level of mastery on the standard. If they reach the level of “mastered,” they don’t need to be tested on that standard again. If not, they can demonstrate mastery later. Even if a student hasn’t completed all their third-grade standards, a teacher who sees progress can let the student show mastery of standards in other grades to measure growth.

At the end of the year, we count the number of standards the student has mastered and see where they stand against all fourth-grade standards. A student is considered proficient at fourth-grade math if they’ve mastered all 25 standards. If the student started the year still missing mastery of 10 third-grade standards, but completed those 10 in fourth grade plus 20 of the fourth-grade standards, they aren’t yet proficient, but they exceeded growth expectations because they completed 30 standards in the year. They will have the opportunity to finish the fourth-grade standards the next year as well.
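The counting described here can be sketched as follows. It is a minimal illustration, assuming that a year's worth of growth equals the number of standards in the grade (25 in this example); the `YearRecord` class and both function names are hypothetical and not part of any existing state system.

```python
from dataclasses import dataclass

GRADE_4_MATH_STANDARDS = 25  # figure from the example above


@dataclass
class YearRecord:
    """Standards one student mastered during one school year (hypothetical shape)."""
    prior_grade_mastered: int    # catch-up standards from earlier grades
    current_grade_mastered: int  # standards from the student's own grade


def is_proficient(record, grade_total=GRADE_4_MATH_STANDARDS):
    # Proficient means every standard for the grade has been mastered.
    return record.current_grade_mastered >= grade_total


def growth_status(record, expected=GRADE_4_MATH_STANDARDS):
    # Assumption: expected growth for a year is one grade's worth of standards.
    total = record.prior_grade_mastered + record.current_grade_mastered
    if total > expected:
        return "exceeded"
    if total == expected:
        return "met"
    return "below"


# The student from the example: 10 third-grade plus 20 fourth-grade standards.
student = YearRecord(prior_grade_mastered=10, current_grade_mastered=20)
print(is_proficient(student))  # False
print(growth_status(student))  # exceeded
```

Proficiency and growth are kept as separate questions, so a student who is catching up can be recognized for growth before reaching grade level.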

Students and families thus have a clear picture of where they stand at any point in the year against the expected standards. They can also see growth occurring in real time as students complete standards during the year. Think of the stress relief when we eliminate Read to Achieve because we can measure where students are against the spectrum of reading skills and address issues at appropriate instructional points in their schooling, instead of relying on the high-stakes test at the end of third grade.

And now teachers know precisely where all their students are against the standards, and they can guide instruction appropriately.  They know that students who just take longer to master content will have a chance to re-assess later, when it clicks for them.  What’s the value in marking a student a failure the first time through a lesson if a little later they’ve mastered what we expect them to learn?  We should grade students based on what they demonstrate knowing.

And we can get more useful public policy data from this change as well. We will not only know what percentage of students are proficient, but we will for the first time be able to see what percentage of students are getting at least a year’s worth of learning growth each year, and we can also clearly measure whether students who came in behind are showing more than a year’s worth of growth each year–what they need to eventually catch up. If I can say that Obama Elementary has 80 percent of students clearly on grade level because they’ve shown proficiency on the standards, and 95 percent are growing at least a year for every year we have them in the public schools, that’s a great school that all parents will want to send their children to. If we’re not delivering those clear results, principals and teachers will have the real-time data they need to adjust instruction. Policymakers will also have the clear results needed to know which interventions need more resources (and which are not effective). Remember–the point here isn’t just to show which schools are succeeding or failing; it’s to get the struggling schools the resources they and their students need.
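A school-level summary of this kind could be computed roughly as below. The student records are invented for illustration, and the threshold of 25 standards for a year's growth is an assumption carried over from the fourth-grade math example.

```python
YEAR_OF_GROWTH = 25  # assumption: one year's growth = all standards in the grade

# Hypothetical students: (standards mastered this year, proficient on grade level?)
students = [
    (25, True),
    (30, False),  # caught up on earlier standards, not yet proficient
    (25, True),
    (18, False),
    (27, True),
]


def percent(count, total):
    # Share of students, expressed as a percentage.
    return 100 * count / total


pct_proficient = percent(
    sum(1 for _, proficient in students if proficient), len(students)
)
pct_year_growth = percent(
    sum(1 for mastered, _ in students if mastered >= YEAR_OF_GROWTH), len(students)
)

print(pct_proficient)   # 60.0
print(pct_year_growth)  # 80.0
```

Reporting the two percentages side by side shows both how many students are on grade level and how many are gaining at least a year, which a single averaged score cannot do.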

We can use these measures in any course that we have clear standards for–and shouldn’t we have clear standards for every course? By simply tracking mastery of standards, we’ll actually expand the data we have available to understand how students are doing.

Keys to this transition will be:

  • Having strong assessments. I propose we pilot this effort with a single grade and a statewide team of teachers and experts in assessment and data, to work out how to assess each and every standard. Such a team and pilot may not be necessary in every grade afterward, but a transparent implementation at the start will help us know we are on the right track.
  • Removing incentives to cheat the system by eliminating all ties to pay for performance. Everyone wants valid measures of student achievement.  But human nature makes it harder to get that if there are reasons to cheat.
  • Communicating the difference between traditional letter grades and standards-based grading. An “A” (or a 93) is actually an odd, arbitrary designation of how a student did on an assignment at a point in time. Standards-based mastery assessment allows us to see what the student has mastered at any point in time. You are not hurt by “failing” five times if you show you fully understand the standard on the sixth try. This is a significant mindset shift that will help show whether students are achieving the skills we want them to learn in school, lessen stress on students, and potentially reinstate some of the joy of learning.
  • Working with all stakeholders (General Assembly, families, teachers, administrators, business communities) to understand how these changes can deliver more useful information to students and teachers and still meet public policy goals, and then driving the legislative, professional development, and implementation work needed to see this through.

We have the technology to tie teacher assessments into large data sets for public policy purposes. We just need to clearly align all teacher assessments with our existing strong standards, and then change how we aggregate those results to report on what we expect of schools–that is, that each and every child we put in school in August should gain a year’s worth of knowledge by June. And students who start out behind should get more than a year’s growth so that they catch up over time.

There are other areas the next Superintendent of Public Instruction needs to address as well: future columns will look at ensuring that all students are prepared to learn when they walk into the school building (including wraparound services) and at School Performance Grades (which, while based on testing, need to be addressed separately to ensure alignment regardless of how we assess).

This Post Has 3 Comments

  1. Dot Sulock

    The questions on the test matter a lot. Current Math EOG tests are unnecessarily difficult. They include no fair standard mastery questions that non-elite students could successfully do. The tests are frightening and discouraging from question 1 to the end and non-elite students just give up quickly. This destroys their belief in the fairness and rationality of education as well as destroying their self-confidence and possible liking for math. These tests are criminal.
