When publishing exec Cathie Black replaces Joel Klein as the head of the nation's largest school system in the coming weeks, she'll bring big-time business-world cred with questionable applicability to the world of education.

So she'll have something in common with the internal teacher ratings at the center of a very public tug-of-war between the city's Department of Education (DOE) and the United Federation of Teachers (UFT)—a dispute that will be either one of the last things Klein handles or one of the first tasks Black takes up.

On October 20, the DOE, citing Freedom of Information Law requests from prominent news organizations, announced that it would release the names and individual ratings of 12,000 public school teachers. This came hard on the heels of an August Los Angeles Times exposé that published local teachers' scores.

The next day, the United Federation of Teachers filed suit to prevent the release and publication of teacher names. On October 22, a state judge stayed any release of the information until November 24.

In the tempest, details have gotten lost. So far, the debate has turned on questions of teacher privacy and alleged flaws in how the scores are calculated.

But what's missing is any understanding of who will be affected—fewer than 1 in 6 teachers—should the scores be released, or what the scores in question really mean.

'D' is for 'details'

While such ratings have been used in states like Tennessee since 1992, New York City's Department of Education only began using internal value-added measures in 2006. This year, the teachers union and the city agreed that the ratings will count for a quarter of a teacher's overall professional evaluation – one of a range of measures, alongside classroom observations and other assessments, used to evaluate teacher performance. But the city and the union never agreed to make the scores public.

Value-added scores, a paradigm adopted from the profit-loss model in the business world, use student progress, as reflected in standardized test scores over time, to assess the impact of individual teachers. They're meant to show which teachers help students make stronger gains – and which don't. A teacher whose students post higher gains than comparable students in a comparable class earns a higher value-added score. Because scores are calculated by comparing test results from year to year, students need at least two years of test scores before a teacher can receive a value-added rating.
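To make the mechanics concrete, here is a minimal sketch, in Python, of a gain-based calculation. The student scores, the predicted gain, and the function itself are invented for illustration only; the DOE's actual model applies many more statistical adjustments.

```python
# Hypothetical sketch of a gain-based value-added estimate.
# All numbers and names are invented; the DOE's actual model
# involves many more statistical adjustments.

def value_added(prior_scores, current_scores, predicted_gain):
    """Average amount by which each student's year-over-year gain
    exceeds the gain predicted for comparable students."""
    gains = [current - prior
             for prior, current in zip(prior_scores, current_scores)]
    excess = [gain - predicted_gain for gain in gains]
    return sum(excess) / len(excess)

# A hypothetical class of four students: last year's scores,
# this year's scores, and the gain expected for similar students.
last_year = [650, 662, 640, 671]
this_year = [668, 675, 649, 690]
expected_gain = 12

print(value_added(last_year, this_year, expected_gain))
# A positive result suggests the students gained more than predicted;
# a negative one suggests they gained less.
```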

It's unclear if value-added scores provide the kind of precise measure they promise. What is clear is that, in New York City at least, the reports pertain to only 12,000 of the city's nearly 80,000 teachers.

By design, the value-added model, developed by a team led by Columbia economics professor Jonah Rockoff, only applies to the small subset of teachers in the "testing grades" – grades 4 through 8. New York students begin taking standardized tests in the 3rd grade, but since two years of testing are needed to calculate value-added scores, 4th grade is the first year in which students have produced enough data for their teachers to be scored.

In the middle schools, where most teachers specialize by subject, only English and math teachers are eligible for value-added ratings, because those subjects are tested by the state.

All other teachers—those in kindergarten, first, second and third grades; those who teach middle school science, social studies, art, or foreign languages; and every high school teacher in New York City's more than 400 high schools—are "ignored by value-added assessment," according to Sean Corcoran, assistant professor of education economics at NYU, whose 2010 report analyzed value-added measures of teacher ratings.

So the measure that Klein says will allow parents to evaluate teacher quality applies only to a small fraction of the city's teachers, with roughly 85 percent of the teaching workforce entirely outside the process.

'F' is for 'flaws'

Underpinning all of this are profound concerns about the integrity of the state student tests and the validity of those test scores. In 2009, state tests rated 82 percent of students proficient in math, compared with 54 percent this year. Reading scores fell sharply, too; in 2009, two-thirds of students were judged proficient or better. This year, only 42 percent made the grade. The city and state, like the rest of the country, are moving toward national assessments, which are expected to cut down on state-by-state grade manipulation. But those new tests won't be in use for three to five years.

"There are problems with the tests," Department of Education spokesman Matt Mittenthal told City Limits. "The word 'inflation' is right. What we found this year was that students have a longer way to go than we might have thought. That said, value-added measures are designed to measure growth, no matter where you start."

So even a flawed test, Mittenthal said, can still be used as an effective yardstick – good and bad in one.

But there are questions about the way students' test scores figure into teachers' value-added ratings.

By the end of October, hundreds of individual teachers had filed complaints about their personal scores, claiming inaccuracies and omissions. Corcoran argues that irregularities and uncertainties in the city's value-added rating system undermine the integrity of the scores themselves.

His report cites persistent omissions of student attendance and mobility data in the calculations, especially for students of color, new immigrants and students with special needs: Because the ratings compare student scores from year to year, students who are absent on test days are excluded from the calculations, as are students who may be new to a school (or the country) or who have moved during the school year. So value-added scores for teachers in schools or districts with high student mobility and chronic absenteeism are affected by the frequent absences of their students. As Corcoran puts it: "From the standpoint of value-added assessment, these students and teachers do not count."

Joanna Cannon, the research director at the DOE's Office of Accountability, concurs that students who are absent on a test day cannot be included in their teachers' value-added measure.

"It is the case that students who are more mobile come from a more disadvantaged background and often have lower performance," Cannon says. "The alternative is more problematic for us: We don't want to create a situation where we attribute information to teachers who are not responsible for that."

Ratings are also inconsistent from year to year, Corcoran noted in his report: 31 percent of English teachers rated in the bottom quintile ("failing") in 2007 ranked in the top quintile ("exceptional") in 2008. About 1 in 4 math teachers showed the same dramatic upward swing. The traffic flows both ways: About 60 percent of English teachers ranked at the top in 2007 fell from the top rank in 2008, with 12 percent tumbling to the lowest possible ranking.

The scores are "curved"; they compare teachers to other teachers, not to an objective scale. And since the value-added ratings rely on data that's been gathered over a comparatively short time, the scores are subject to confidence intervals (the range within which a score is thought to be accurate) of 30 to 60 points. That means a teacher with only a two years of data whose "score" is 43 (about average among teachers) has a potential score span of 15 (below average) to 71 (above average) on a 100-point scale.
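As a rough illustration of what those confidence intervals mean, the snippet below works through the example above – a reported score of 43 with an assumed 56-point-wide interval, a width chosen only to match the 15-to-71 range cited here – in Python. The DOE's real calculation is more involved.

```python
# Hypothetical sketch: a reported percentile score of 43 with an
# assumed 56-point-wide confidence interval (within the 30-60 point
# range described above) spans roughly 15 to 71 on a 100-point scale.
reported_score = 43
interval_width = 56  # assumed for illustration

lower = max(0, reported_score - interval_width / 2)
upper = min(100, reported_score + interval_width / 2)

print(f"Reported score: {reported_score}")
print(f"Plausible range: {lower:.0f} to {upper:.0f}")
# The teacher's "true" rating could plausibly fall anywhere in that
# range, from well below average to well above it.
```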

"The huge confidence intervals are an indication," says DOE's Mittenthal, of "wide margins" of uncertainty. "They say a lot about how confident we are about a particular assessment. They're not an error, not a flaw," but a statistical measure that demonstrates a potentially wide range of values within which an actual score resides.

And, says Corcoran, it's "difficult to impossible" to define the "unique contributions" of individual teachers by teasing out all the other influences that shape a child's achievement – despite the intention of value-added scores to do just that. Corcoran and others say that "school effects" like leadership, collegiality, school discipline and student mix can't be quantified easily – and that the current value-added measure falls short.

DOE spokeswoman Natalie Ravitz argues that the DOE's value-added model accounts for "life factors" because "statisticians make a prediction that factors in race, poverty, English Language Learner [status], disabilities, and absences." The DOE's model does take multiple factors into account in its calculations, but its algorithms do not seek to define or measure the school effects Corcoran identifies.

"The fundamental problem," NYU's Corcoran told City Limits, is that "the people behind these [value-added] systems have no experience with actual education or instruction." he was quick to point out that he is an economist, himself. "Economists and statisticians devised the value-added method, advised by policy experts with law and business backgrounds. Cognitive development experts have not been part of this process since day one."

'A' is for 'accountability'

DOE administrators currently use value-added data to help teachers understand their strengths and weaknesses in the classroom. Principals can look at student achievement across their school or across an individual grade and identify good and bad practices – again, as evidenced by test scores – that can be supported, shared, or reworked.

DOE officials are aware that data linked to teacher names can be taken out of context, Mittenthal said. Releasing a single score or teacher rating risks muting the impact of all the other elements that make up a teacher's evaluation. "If we wanted this out there before we received the [FOIL] request, we'd have released it already."

"This is only one part of a bigger picture for each teacher. If the court compels us to release this information, we're going to include every caveat around it that we can. But the only information that was requested was the value-added data," Mittenthal says. "We're not going to volunteer any other information."

"We want to do our best to make sure that teachers are not shamed or humiliated," he added, recognizing the implicit risk to individuals that releasing the ratings contains, and echoing his boss, the Chancellor.

Teachers claim they are the only public employees whose job performance the city proposes to measure and make public by name. While it's true that police officers, firefighters and other civil service workers do not face that degree of personalized public scrutiny, public school principals do face a comparable public airing: each school earns a letter grade on the city's Progress Reports, linked to the principal's name.

Though subjected to personal scrutiny themselves, principals are divided on the merits of releasing value-added scores.

Principal Ben Shuldiner of the High School for Public Service: Heroes of Tomorrow in Crown Heights says the public should know how teachers are graded.

Releasing the scores and teacher names, he argues, would undermine the "monopoly tendencies" of many schools, which don't permit parents to participate in choosing their children's teachers – an option the DOE has neither mentioned nor promised. "What's the fear?" Shuldiner challenges. "Why not? Why would we privilege that information? Who are you protecting – incompetent teachers? It doesn't make sense to me."

But, he adds, it's critical to know that the instrument used to assess teacher quality is sensitive and accurate: "Are the metrics good? Do they show quality? Is this the right yardstick? You don't want any one exam to be a total indicator of good or bad."

Phil Weinberg, principal of Telecommunication Arts High School in Bay Ridge, thinks value-added measures are short-sighted: "They assume that the previous years of instruction don't contribute to achievement," he says. Can the achievement of an upper-grade teacher, in middle school or beyond, be separated from that of the teachers who worked with her students in years before?

Value-added lacks nuance, Weinberg says, because of its reductionist nature. "People don't reduce that easily to a binary or 100-point standard. Students have more variables attached to them than we can capture in a single measure. The profit-loss business model has failed us as a nation. Why would we overlay it on education?"