November 1, 2025
Vol. 83, No. 3

Is That Grade Accurate?

Before recording a grade, teachers should ask whether it accurately captures what the student knows—and gather better evidence when it doesn’t.


Assessment & Grading
It’s a common practice in classrooms: A student takes a deep breath, opens the test (on paper or on a screen), and does their best to earn as many points as possible. The teacher collects the test, assigns points to each correct answer, totals the points, and enters the score into the gradebook. That score goes into a “tests” category made up of several summative assessments and is weighted alongside other categories like “homework” and “labs,” then the gradebook app calculates a final course grade.
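To make the math concrete: if, hypothetically, the gradebook weights tests at 50 percent, homework at 30 percent, and labs at 20 percent, and a student’s category averages are 78, 90, and 85, the app computes (0.50 × 78) + (0.30 × 90) + (0.20 × 85) = 83. That 83 becomes the course grade regardless of whether any individual score accurately reflected the student’s understanding.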
Notably, the teacher’s energy and expertise in this process are invested in constructing the assessment and in evaluating student responses. Depending on the context, a teacher may design the assessment from scratch or adapt an assessment provided by colleagues or a publisher. Likewise, some assessments are evaluated and scored entirely by the teacher, while others rely on computer or machine scoring. Once scoring is complete, the teacher’s remaining task is simple data entry that requires only administrative skills, not pedagogical knowledge or judgment.
But this familiar assess–score–report process (see fig. 1) can produce student grades that are inaccurate, even misleading, because it sidelines teachers’ professional judgment too early. We need a process that ensures that grades are accurate and valid representations of student understanding and that fully uses and trusts teachers’ professional judgment.
Figure 1. The traditional grading process: Assess–Score–Report.

Grading What Matters

Let’s begin with a premise that should not be controversial: A student’s course grade should accurately reflect their understanding of what was taught. It should not include credit for things outside the content—such as whether the student signed the syllabus or brought classroom supplies. When we blend noninstructional factors into the grade, we diminish its power to communicate meaningful information about what a student has learned. The “B” that collapses academic and nonacademic information cannot communicate whether the student had strong content knowledge but did not return a signed form or had weaker understanding but completed all nonacademic tasks.
If the primary purpose of grades is to communicate accurately what course content students have learned, then the information we enter into our gradebooks must be valid. If a student’s assessment yields a score that truly reflects their understanding, we should enter it. But let’s stress-test the assess–score–report paradigm. What happens when a student understands the content, yet gets a low score? How well does our assess–score–report process support our goal of accurate grades when the data we get from an assessment is invalid?

When Assessments Yield Invalid Data

Our assessments can yield inaccurate scores in two broad circumstances.
First, there may be flaws in assessment design. A poorly worded, ambiguous, or misleading item can cause students—sometimes many of them—to misunderstand what is being asked, and therefore to respond incorrectly even though they know the content the question was meant to elicit. Within the traditional assess–score–report process, this often goes unnoticed. The teacher marks the response wrong, totals the points, and enters the resulting score in the gradebook, where it no longer accurately reflects what the student knows. Teachers occasionally recognize the flaw—perhaps because an unusual number of students make the same error—and try to remedy it by giving students the benefit of the doubt and awarding points as if the student answered correctly, or by excluding the question entirely. Either approach distorts accuracy: Awarding unearned points inflates scores, dropping the item alters the weight of the remaining questions, and both leave the targeted knowledge unmeasured.

Because of the diversity of learning styles and the unpredictable events of students’ lives, no single assessment will yield valid information about every learner.

Second, circumstances can interfere with performance. Students may have the required understanding but be unable to express it. Some reasons are extraneous and outside the student’s control: illness, lack of sleep because a baby brother was up all night, or witnessing violence on the way to school. Others are psychological or biological: test anxiety or a learning profile that does not align with the format of the assessment. When teachers are aware of these situations and suspect that the student’s performance was compromised, the assess–score–report paradigm still requires them to enter what they believe to be an inaccurate score. After they enter the grade, some teachers invent remedies—bonus tasks or extra credit—as a manufactured counterbalance to the low score, but the assess–score–report process itself offers no solution. The result can be grades that communicate a false description of student knowledge unless teachers step outside the process to fix it.

A New Paradigm

To find a better solution, return to our primary purpose for a grade: to accurately describe a student’s understanding of course content. That responsibility is shared. It is the student’s job to learn, prepare, and demonstrate what they know; it is the teacher’s professional duty to ensure that the assessments measure that knowledge and yield scores that accurately reflect it. In medicine, clinicians administer tests that they believe will capture aspects of the patient’s health, but they don’t just record the results; they apply their expertise to ensure that the results are valid before relying on them. Teachers likewise should bring their expertise not only to designing assessments and scoring responses, but also to validating whether a score truly represents learning. Just as we trust our physicians not to rely on invalid test results, we trust our teachers not to enter invalid assessment scores.
By inserting a validating step into our process—amending it to assess–score–validate–report—everything changes (see fig. 2). Before entering an assessment score, a teacher asks, “Did this assessment correctly reveal the student’s understanding?” If the teacher is confident the score is valid, they report it in the gradebook. If there is reason to doubt the score’s validity, they do not proceed to the “report” stage. Prior evidence about the student’s understanding—results from related formative checks, earlier summative tasks, or observed performance—may suggest that the student’s opportunity to demonstrate knowledge was impeded by the assessment design or by external circumstances, yielding a false score, or at least one worth interrogating before it enters the gradebook.
Figure 2. The revised grading process: Assess–Score–Validate–Report, shown as a progression and cycle.
To resolve doubts about a score’s validity, a teacher might create an alternate way to assess the student’s understanding of that content and compare results to determine which score is more valid. They could consult prior formative evidence that showed a student’s strong understanding and decide to reject portions of the summative result that are inconsistent with it. Or they could conduct a quick verbal assessment with the student, targeted at the specific standard. Only when a teacher is confident in the validity of the assessment results should they proceed to reporting that information in the gradebook. It’s worth repeating: A teacher has no interest in including invalid data about student performance in the grade calculation.
Of course, this validating step can be time-consuming, and it raises hard questions: What counts as enough evidence to adjust or dismiss a test result? How can this process be applied with integrity and fairness for every student? Because of the diversity of learning styles and the unpredictable events of students’ lives, no single assessment will yield valid information about every learner. Rather than validate each summative assessment score before reporting it in the gradebook, therefore, many teachers administer several kinds of assessments that measure the same course outcomes and then determine the most valid grade from the constellation of evidence (see fig. 3). By expanding the aperture of assessment design—varying the modalities, timing, and item types—the teacher assembles a bank of data across situations and formats from which to determine the most valid description of a student’s level of understanding.
Figure 3. The evidence bank: data gathered through different modalities, on different days, and with different item types.
For example, high school English teacher Ruhiyyih Wittwer provides an assessment “map” for her unit on Of Mice and Men to explain how she uses multiple types of assessments to determine a student’s understanding of each standard. As shown in an excerpt of this “map” (see fig. 4), a student’s score on Reading Standard 1 is based on evidence from four different assignments: two on-demand writing tasks, an optional paragraph assessment, and an essay. With this expanded body of evidence, gathered at different moments and through different modes, the teacher derives a single “overall” grade for the standard. Importantly, the teacher still records in the gradebook the performance on each assessment so the data is preserved, but she includes only the validated overall grades in the term’s final grade calculation.
Figure 4. An excerpt from Ruhiyyih Wittwer’s assessment map: four types of student work collected as evidence for two reading standards across a unit on Of Mice and Men.
Ironically, amending the summative paradigm to include validation makes it resemble what teachers already do with formative assessment. Stiggins and colleagues (2004) describe formative assessment as assessment for learning: gathering information about where students are on their learning trajectory in order to provide responsive instruction and target knowledge gaps or misunderstandings. During learning, invalid formative data can send teachers aiming at the wrong target and waste time, so teachers work to ensure that formative data communicates accurate information. The commitment to valid evidence should be even greater in summative contexts, so that our grades accurately communicate student knowledge not just to teachers, but to students, families, and external audiences.
This increased application of professional judgment can feel daunting. It means sharpening our assessment craft: designing prompts that are clear and aligned to what was taught; writing test items that elicit the thinking we intend to measure; building a battery of assessments that address the diversity of students’ learning and communication styles; and collaborating with colleagues to leverage collective expertise and to norm expectations for performance. Most teachers have not been trained explicitly to evaluate the validity of a particular student’s score. It can be tempting—especially under time pressure—to retreat to the apparent safety of entering whatever score the test produces, without questioning its validity. But it is our professional obligation, and an opportunity to more fully use our professional expertise, to enter only valid data in our gradebook.
I often begin my workshops by asking teachers about their experiences receiving grades as students, and some of their most painful and frustrating memories are of times when a grade didn’t match what they actually knew of the subject. Adding a “validating” step in the assessment process can prevent us from replicating these experiences for our students and instead communicate to them that assessment and grading are shared responsibilities. Students commit to learning and demonstrating; teachers commit to ensuring that the grade they report is accurate.

Building the Case for Fairer Grades

Recent studies on grade inflation (The New Teacher Project, 2023) and on the mismatch between teachers’ grades and external course-aligned tests (Equitable Grading Project, 2024) have cast doubt on whether teachers’ grades are accurate descriptions of student understanding. Our adjusted assess–score–validate–report paradigm gives us the opportunity to apply our professional judgment when we enter summative assessment scores into the gradebook, thereby fortifying the case that our grades accurately and reliably describe student understanding of course content and strengthening our credibility with parents, students, colleagues, and college admissions officers.

Validating evidence of student understanding before reporting it supports the relationship between teacher and learner.

Validating evidence of student understanding before reporting it also supports the relationship between teacher and learner. It holds students accountable for learning—they can no longer get by on participation and extra credit points—while simultaneously reducing counterproductive pressure and test anxiety. Students know that if their teacher cares about their understanding of the material, then even if they bomb a test, the teacher will consider other ways to find out what they’ve learned. That bad test grade won’t consign them to failure.
None of this means that grades become softer or standards lower. On the contrary, insisting on valid evidence makes grades more rigorous and more honest. It requires more, not less, from both students and teachers. Students must learn and be ready to demonstrate what they know; teachers must judge the evidence. When we use our expertise in this way, the grade becomes what it should have always been—accurate and trustworthy.
The familiar ritual will not disappear. Students will still take a breath and open the test, and teachers will still evaluate responses and enter results. The difference is that before a score is allowed to shape a student’s future, we pause to ask whether it truly describes their understanding. If it does, we can report it with confidence. If not, we do what professionals do: gather better evidence and get to the truth. In doing so, we elevate our practice, improve the fairness and accuracy of grades, and better serve students, families, and the larger community that relies on our judgments.

Reflect & Discuss

  • Has there been an instance when you, as a teacher, declined to report a score until you gathered better evidence? What did you learn?

  • If your school moved to assess–score–validate–report, what support would school leaders need to give teachers (time, templates, training, models) to make it work with integrity?

  • Whose knowledge is most likely to be underrepresented by your school’s current assessments (e.g., multilingual learners, students with test anxiety)? What alternate measures could surface their understanding?

 

References

Equitable Grading Project. (2024). Can we trust the transcript? Recognizing student potential through more accurate grading.
The New Teacher Project. (2023). False signals: How pandemic-era grades mislead families and threaten student learning.
Stiggins, R. J., Arter, J. A., Chappuis, J., & Chappuis, S. (2004). Classroom assessment for student learning: Doing it right—using it well. Pearson.

Joe Feldman has worked in education for over 20 years as a teacher, principal, and district administrator. He is the founder and CEO of Crescendo Education Group, which since 2013 has supported K-12 schools, districts, colleges, and universities nationwide in improving grading and assessment practices. He has presented at numerous education conferences, and his writings have been published in Education Week, Kappan, Educational Leadership, District Administrator, and Black Press USA. His book, Grading for Equity: What It Is, Why It Matters, and How It Can Transform Schools and Classrooms (Corwin), was published in 2018.

From our issue: A New Era for Assessment