A 'D' For Details: Should The City Release Teachers' Ratings?

By Helen Zelon.

Published November 9, 2010

Photo by: Marc Fader

Principal Ben Shuldiner of the High School for Public Service: Heroes of Tomorrow school in Crown Heights says the public should know how teachers are graded.

When publishing exec Cathie Black soon replaces Joel Klein as the head of the nation’s largest school system, she’ll bring big-time business-world cred with questionable applicability to the world of education.

So she’ll have something in common with the internal teacher ratings at the center of a very public tug-of-war between the city’s Department of Education (DOE) and the United Federation of Teachers (UFT), a dispute that will be either one of the last things Klein handles or one of the first tasks Black takes up.

On October 20, the DOE, citing Freedom of Information Act requests from prominent news organizations, announced that it would release the names and individual ratings of 12,000 public school teachers. This came hard on the heels of an August Los Angeles Times exposé that published local teachers’ scores.

The next day, the United Federation of Teachers filed suit to prevent the release and publication of teacher names. On October 22, a state judge stayed any release of the information before November 24.

In the tempest, details have gotten lost. So far, the debate has turned on questions of teacher privacy and alleged flaws in how the scores are calculated.

But what’s missing is any understanding of who will be affected—fewer than 1 in 6 teachers—should the scores be released, or what the scores in question really mean.

‘D’ is for ‘details’

While such ratings have been used in states like Tennessee since 1992, New York City’s Department of Education only began using internal value-added measures in 2006. This year, the teachers union and the city agreed that the ratings will count for a quarter of a teacher’s overall professional evaluation – one of a range of measures, like classroom observation and other assessments, used to assess teacher performance. But the city and the union never agreed to make the scores public.

Value-added scores, a paradigm adopted from the profit-loss model in the business world, use student progress, as reflected in standardized test scores over time, to assess the impact of individual teachers. They are meant to show which teachers help students make stronger gains – and which don’t. A teacher whose students post higher gains than comparable students in a comparable class earns a higher value-added score. Because scores are calculated by comparing test results from year to year, students and teachers need at least two years of test scores to be able to develop a value-added rating.
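To make the idea concrete, here is a minimal sketch of the comparison described above: a teacher's average student gain set against the average gain of comparable students. It is not the DOE's actual model, which is a far more elaborate statistical exercise. The function names and every score below are invented for illustration.

```python
from statistics import mean

def average_gain(score_pairs):
    """Mean year-over-year change for (prior_score, current_score) pairs."""
    return mean(current - prior for prior, current in score_pairs)

def simple_value_added(teacher_pairs, comparison_pairs):
    """Teacher's average student gain minus the comparison group's average gain.

    A positive result means the teacher's students gained more than
    the comparable students did; a negative result means they gained less.
    """
    return average_gain(teacher_pairs) - average_gain(comparison_pairs)

# Hypothetical scale scores (last year, this year); not real data.
teacher_pairs = [(650, 668), (640, 655), (660, 681)]
comparison_pairs = [(652, 662), (641, 650), (659, 670)]

print(simple_value_added(teacher_pairs, comparison_pairs))
```

Each pair needs both a prior-year and a current-year score, which is why a rating cannot be calculated without two years of test results.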

It’s unclear if value-added scores provide the kind of precise measure they promise. What is clear is that, in New York City at least, the reports pertain to only 12,000 of the city’s nearly 80,000-member teaching force.
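The arithmetic behind that coverage figure is simple enough to check. The sketch below uses only the two numbers reported here; because the workforce is described as "nearly" 80,000, the true share of rated teachers may be slightly higher than this estimate.

```python
# Figures as reported in the article.
rated_teachers = 12_000   # teachers covered by value-added reports
total_teachers = 80_000   # "nearly 80,000", so treated here as an approximation

share_rated = rated_teachers / total_teachers
share_unrated = 1 - share_rated

print(f"rated: {share_rated:.0%}, unrated: {share_unrated:.0%}")
print("fewer than 1 in 6:", share_rated < 1 / 6)
```

On these figures, the rated group stays under 1 in 6 as long as the total workforce exceeds 72,000 teachers.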

By design, the value-added model, developed by a team led by Columbia economics professor Jonah Rockoff, only applies to the small subset of teachers in the “testing grades”—grades 4 through 8. New York students begin taking standardized tests in the 3rd grade, but since two years of testing are needed to calculate value-added scores, 4th grade is the first year where students have produced enough data for their teachers to be scored.

In the middle schools, where most teachers specialize by subject, only English and math teachers are eligible for value-added ratings, because those subjects are tested by the state.

All other teachers—those in kindergarten, first, second and third grades; those who teach middle school science, social studies, art, or foreign languages; and every high school teacher in New York City’s more than 400 high schools—are “ignored by value-added assessment,” according to Sean Corcoran, assistant professor of education economics at NYU, whose 2010 report analyzed value-added measures of teacher ratings.

So the measure that Klein says will allow parents to evaluate teacher quality applies only to a small fraction of the city’s teachers, with roughly 85 percent of the teaching workforce entirely outside the process.

‘F’ is for ‘flaws’

Underpinning it all are profound concerns about the integrity of the state student tests, and the validity of those test scores. In 2009, state tests documented 82 percent of students as proficient in math, compared with 54 percent this year. Reading scores fell even more precipitously; in 2009, two-thirds of students were judged proficient or better. This year, only 42 percent made the grade. The city and state, like the rest of the country, are moving toward using national assessments, which are expected to cut down on state-by-state grade manipulation. But those new tests won’t be in use for three to five years.

“There are problems with the tests,” Department of Education spokesman Matt Mittenthal told City Limits. “The word ‘inflation’ is right. What we found this year was that students have a longer way to go than we might have thought. That said, value-added measures are designed to measure growth, no matter where you start.”

So even a test that is flawed, Mittenthal said, can still be used as an effective yardstick – both good and bad in one.

But there are questions about the way students’ test scores figure into the value-added teachers’ ratings.

By the end of October, hundreds of individual teachers had filed complaints about their personal test scores, claiming there were inaccuracies and omissions. Corcoran argues that irregularities and uncertainties in the city’s value-added rating system undermine the integrity of the scores themselves.

His report cites persistent omissions of student attendance and mobility data in the calculations, especially for students of color, new immigrants and students with special needs: Because the ratings compare student scores from year to year, students who are absent on test days are excluded from the calculations, as are students who may be new to a school (or the country) or who have moved during the school year. So value-added scores for teachers who teach in schools or districts with high student mobility and chronic absenteeism are affected by the frequent absences of the students they teach. As Corcoran puts it: “From the standpoint of value-added assessment, these students and teachers do not count.”
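The exclusion Corcoran describes can be pictured as a filter on a class roster: only students with a score in both years can contribute to a year-to-year comparison. The sketch below is an illustration under that assumption, not the city's procedure. The record layout, the function name and the four-student roster are all hypothetical.

```python
from typing import NamedTuple, Optional

class StudentRecord(NamedTuple):
    student_id: str
    prior_score: Optional[int]    # None: no score on file from the previous year
    current_score: Optional[int]  # None: no score this year (e.g., absent on test day)

def split_by_usability(records):
    """Separate students with both scores from those missing either one."""
    usable, excluded = [], []
    for record in records:
        if record.prior_score is not None and record.current_score is not None:
            usable.append(record)
        else:
            excluded.append(record)
    return usable, excluded

# Hypothetical roster, invented for illustration.
roster = [
    StudentRecord("A", 650, 668),
    StudentRecord("B", None, 655),  # new to the school or country
    StudentRecord("C", 660, None),  # absent on test day
    StudentRecord("D", 645, 652),
]

usable, excluded = split_by_usability(roster)
print(len(usable), "students counted;", len(excluded), "left out")
```

In a classroom with many new arrivals or frequent absences, the counted group shrinks, and the teacher's rating rests on fewer students.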

Joanna Cannon, the research director at the DOE’s Office of Accountability, concurs that students who are absent on a test day cannot be included in their teachers’ value-added measure.

“It is the case that students who are more mobile come from a more disadvantaged background and often have lower performance,” Cannon says. “The alternative is more problematic for us: We don’t want to create a situation where we attribute information to teachers who are not responsible for that.”

Ratings are also inconsistent from year to year, Corcoran noted in his report: 31 percent of English teachers who ranked in the bottom quintile (rated “failing”) in 2007 ranked in the top quintile (“exceptional”) in 2008. About 1 in 4 math teachers showed the same dramatic upward trajectory. The traffic flows both ways: About 60 percent of high-ranked English teachers in 2007 fell from the top rank in 2008, with 12 percent tumbling to the lowest possible ranking.