An Approach to Specifications Grading: Guest Post by John Ross

I have been involved in a lot of discussions about assessment strategies lately. There is a bit of a swell of young faculty who are rethinking their assessment strategies carefully. For some, this is a first serious step to rethinking their jobs as educators, and for others it is a further step into the details of how to be effective.

Today we have a guest post by John Ross of Southwestern University. I met John at the Legacy of R.L. Moore meeting this summer, so I already know he is interested in effective teaching methods. This past weekend he mentioned in passing on Twitter that he is using a new assessment setup. I wanted to hear the details, so I invited him to write about it. I am very pleased that he accepted my challenge.

My Version of Specs-Based Grading

by John Ross, Southwestern University
This semester I am running my calculus class using a specifications-based grading system. The decision to do this was made after discovering Robert Talbert’s blog and reading the many informative things he had to say about specs grading. If you’re unfamiliar with this style of grading, I’d recommend starting there.

I developed my syllabus late last summer after talking with Robert at the IBL conference and with several people at MathFest (Spencer Bagley and Tom Clark, among others). My version is similar to many other versions that I’ve read about, but it does differ in a few ways. I use two major types of assessment that run parallel to each other: (weekly) quizzes and (approximately monthly) tests. Quizzes and tests cover most of the same material, but use different “grain sizes” and are assessed differently. I don’t think this quiz/test setup is particularly innovative, but it does allow me to take some liberties with my grading method for my quizzes. Quizzes are assessed using a mastery-based grading system with four possible grades. After a quiz is returned, students have the opportunity to revise their grade by doing additional work.

The quiz/test balance

As mentioned above, my students take quizzes and tests with different “grain sizes” and different grading systems for each. Quizzes are used to assess discrete “skills.” There are 33 skills marked out for the whole semester, and each week between 1 and 3 skills appear on the quiz. Tests, on the other hand, are used to assess more broadly defined “subjects.” There are 13 subjects in total. At the end of the semester, a student’s grade will be determined by the number of skills and the number of subjects that student has been able to master (in addition to some other factors, such as homework).
If a student fails to master a skill on a quiz, or fails to master a subject on a test, there are other chances to show mastery. I’ll talk about the quizzes below, but mastery on the tests is done in a traditional “mastery” format (so students who fail to pass a subject on the first test will have a second chance on the next test and, if necessary, chances on future tests as well). The final exam serves as a final test to show mastery: if students have done well throughout the entire semester, they may not even need to take this exam.
By design, there is a large amount of overlap between skills and subjects. This allows me to assess students’ understanding of the material twice, and makes me feel more comfortable with my quiz grading system. In particular, the tests give me some comfort that students can’t succeed in this class by cramming for a skill quiz, passing it, and then forgetting the material.

Paths to mastery on the skill quizzes

The weekly quizzes, and the way I assess them, are the heart of my class. I grade each skill separately, give as much feedback as possible, and give students one of four possible grades: Mastered, Progressing: Email, Progressing: Discussion, or Insufficient. Grades of M (or I) are used when students do nearly perfectly (or very poorly). The “progressing” grades allow for some wiggle room in the middle, with minor mistakes getting a grade of “Progressing: Email” and more moderate mistakes getting a grade of “Progressing: Discussion”.
After getting their graded quizzes back, students have 2 weeks to raise their grade for each skill to “Mastered”. How they do this depends on what grade they received. A student who receives a P:E must send me an email (usually a paragraph or less) detailing what they got wrong, and what the correct answer is, to receive mastery. A grade of P:D means that the student must come and talk to me during office hours (or set up an appointment outside of office hours) for a short discussion about the skill. A grade of Insufficient means they must come to me for a minor discussion, and take an alternate quiz (which will be graded on the same scale of M—P:E—P:D—I).
What happens, then, is a student can have several chances (and go through several iterations) attempting to receive mastery for a skill. As an example, a student could do very poorly and receive an I on the first quiz; receive a P:E on the alternate quiz; send me an initial round of corrections via email, which I push back against because I disagree with their explanation; and, finally, submit good corrections and receive mastery.
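
For readers who think in code, the revision paths described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not a tool used in the course: the names Grade and revise are hypothetical, the two-week deadline is not modeled, and a correction that is pushed back is assumed to leave the grade where it was.

```python
from enum import Enum


class Grade(Enum):
    MASTERED = "M"
    PROGRESSING_EMAIL = "P:E"
    PROGRESSING_DISCUSSION = "P:D"
    INSUFFICIENT = "I"


def revise(current, accepted=False, alternate_quiz_grade=None):
    """Return the grade after one revision attempt (hypothetical helper).

    - M: nothing left to do.
    - I: the student takes an alternate quiz, graded on the same four-grade scale.
    - P:E or P:D: the email or discussion either earns mastery or is pushed back.
    """
    if current is Grade.MASTERED:
        return current
    if current is Grade.INSUFFICIENT:
        if alternate_quiz_grade is None:
            raise ValueError("an Insufficient grade needs an alternate quiz result")
        return alternate_quiz_grade
    # Assumption: a rejected correction leaves the grade unchanged.
    return Grade.MASTERED if accepted else current


# The example path from the paragraph above.
history = [Grade.INSUFFICIENT]
history.append(revise(history[-1], alternate_quiz_grade=Grade.PROGRESSING_EMAIL))
history.append(revise(history[-1], accepted=False))  # first email pushed back
history.append(revise(history[-1], accepted=True))   # good corrections
print(" -> ".join(g.value for g in history))  # I -> P:E -> P:E -> M
```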

Positive aspects of this system

I like this system for a number of reasons. First of all, it gives students multiple chances (and multiple ways) to display mastery of a skill. There are some students, for example, who have told me that they suffer from test anxiety and constantly underperform on the quizzes, yet can clearly communicate the material to me via email or by face-to-face discussion.

Second, it gives me a greater degree of control and flexibility than I would have in a simpler Mastered/Insufficient grading system. This means I don’t have to agonize over how to grade “mid-range” quizzes.

Third, it can let me “cheat” by pushing students to relearn various things, even if they are not directly related to the skill at hand. For a concrete example: if a student does well on a Chain Rule quiz but forgets how to compute an important derivative, I can reward them for their work on the chain rule while forcing them to revisit (and hopefully learn) the derivative in question by giving them a grade of P:E. If I were using a binary grading system, I would probably have to give them a grade of M (since they had mastered the chain rule) and there’s no guarantee that they would read my comments or learn that derivative.
Finally, I believe I spend less time grading under this system than if I were using a binary “Mastered or Insufficient” system. This is probably debatable, but I believe that I spend less time reading emails and having short discussions than I would by regrading and regrading multiple papers.

Negative aspects of this system

There are definitely some downsides to this system. The biggest downside is that I am concerned that some students are using the system as a crutch, and aren’t learning the skills as deeply or completely as they should. Some students seem to be banking on getting “progressing” grades, and then copying out of the book/notes to fix that grade to a “mastery” level. This is concerning, but the parallel test system helps me feel more confident that these students must (eventually) learn the material.
Other students tend to view the discussions as a time when I will explain to them what they did wrong, rather than the intended goal of having students lead the discussion and explain to me what they had done wrong. This is definitely my fault for failing to adequately communicate my intentions for these discussions. I have found the following work-around: if a student comes to a discussion woefully unprepared, I will discuss the problem with them, but won’t give them mastery for the skill. Instead, I’ll bump them up to P:E, and make them email me a summary of the discussion we’ve had, plus corrections to their quiz.
Because I have a large number of students (60 students in Calculus, plus an additional 30 in another class), I end up hosting a lot of office hours (5 a week, plus appointments). These office hours aren’t always full, but many students do show up. I maintain this isn’t a terrible thing, though, because a lot of that time spent meeting with students would be spent grading otherwise (and these discussions can offer a lot of chances for formative assessment and feedback for the student).

Finally: 60 calculus students being quizzed on 33 skills, and each skill being graded on a 4-point scale with probable revisions, can lead to a rather complicated-looking grade book. I try to keep a record not only of each student’s current grade, but their whole grade history for each skill (so I can reference, for example, that a student has already taken an alternate quiz and gotten a grade of P:E on it). Halfway through the semester, I have had several students who were unclear on what their grade was in the class. It’s very nice to have this amount of information to refer back to, although the grade book was rather difficult to set up initially.
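
One way to picture a grade book that keeps the whole history for each skill is a mapping from (student, skill) to the list of grades received so far. The sketch below is a hypothetical illustration, not the actual grade book: the GradeBook class, its method names, and the sample student and skill labels are all assumptions. It only counts mastered skills, since the thresholds that turn those counts into a course grade are not given here.

```python
from collections import defaultdict

VALID_GRADES = {"M", "P:E", "P:D", "I"}


class GradeBook:
    """Hypothetical grade book that stores every grade a student has received."""

    def __init__(self):
        # (student, skill) -> grades in the order they were recorded
        self._history = defaultdict(list)

    def record(self, student, skill, grade):
        if grade not in VALID_GRADES:
            raise ValueError(f"unknown grade: {grade!r}")
        self._history[(student, skill)].append(grade)

    def history(self, student, skill):
        # Return a copy so callers cannot alter the stored record.
        return list(self._history.get((student, skill), []))

    def current(self, student, skill):
        attempts = self._history.get((student, skill))
        return attempts[-1] if attempts else None

    def mastered_count(self, student):
        # A skill counts as mastered when its most recent grade is M.
        return sum(
            1
            for (name, _skill), attempts in self._history.items()
            if name == student and attempts[-1] == "M"
        )


book = GradeBook()
book.record("student_a", "chain rule", "I")    # first quiz
book.record("student_a", "chain rule", "P:E")  # alternate quiz
book.record("student_a", "product rule", "M")

print(book.history("student_a", "chain rule"))  # ['I', 'P:E']
print(book.current("student_a", "chain rule"))  # P:E
print(book.mastered_count("student_a"))         # 1
```

Keeping the full list rather than only the latest grade is what makes it possible to see that a student has already used an alternate quiz.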

As the semester wears on, I’m sure that more positive and negative aspects of this system will come to light. For now, though, I am relatively happy with how this system is working out, and an informal anonymous survey of my students suggests that they like it too. I am happy to provide more information to anyone who has any questions.