In Liberty We Trust: The measurement problem

How do you measure how well some enterprise or project is performing? Nobel laureate Friedrich Hayek, in his classic lecture The Pretense of Knowledge, shows us that what is measurable is not always what is important, and what is important is not always measurable. This has profound implications for government-run institutions such as schools and police departments, and it helps explain some of the most intractable problems we see playing out before us.

Before I get too far, I should make clear an important distinction. By "measurement," I mean a mathematically quantifiable observation which can be expressed in objective units: inches, kilograms, degrees Celsius, nanoseconds, volts, or what-have-you. It is possible to observe and draw conclusions about something without measuring it in this fashion, and in fact we do so routinely. I can measure my weight after consuming a bowl of ice cream. I can also assess, and describe verbally, the experience of eating it, but I cannot measure that experience or express it mathematically in any meaningful and objective sense.

Now...onward and upward.

There are all sorts of factors that figure into an assessment of quality and performance, and it's extremely difficult, if not impossible, to quantify them objectively. This problem is entirely avoided at the individual level. The difference is analogous to that between knowing whether or not you like a piece of art and attempting to create a universal formula or algorithm of artistic merit. This is so because value is subjective; it exists in the mind of the individual valuer rather than in any inherent property of the thing valued.

In order to understand how this throws a wrench into government programs, let's first look at how it plays out in the free market, using the example of a restaurant. Assessing how much you like a particular restaurant is deceptively easy: It pleases or displeases you to some degree, which you feel on an intuitive level. A variety of factors figure into your assessment, including food quality, atmosphere, service, and price, and every one of them is highly subjective. You can probably describe in words why you did or didn't enjoy your dining experience (say, to write a review), and others can interpret and use that information, again on an intuitive level. It is impossible, however, to quantify it objectively with mathematical precision, much less in a way that is intersubjectively comparable with another person's experience. Any rating system is necessarily subjective and imprecise, whether it be written reviews or a scale of one to four stars. There is not, and cannot be, a universal equation of restaurant quality.

This is where Hayek's wisdom comes in. We can, of course, objectively measure all sorts of things about the restaurant: number of seats and tables, number of cooks and wait staff, waiting time to be seated, time between ordering and being served, weight and volume of portions, number of dishes on the menu, calories and fat per serving, average time the wait staff spends with each customer. None of them correlates directly with quality, though. How long is too long to wait for one's food, for instance, is a subjective matter. The person habitually in a hurry may find a wait of more than five minutes intolerable, while one interested in leisurely conversation with dining companions might shrug it off, or even feel rushed by too-quick service. A long wait time, moreover, could be due to inefficient service, or to the practice of cooking each dish from scratch as it's ordered. Similarly, wait staff might spend a lot of time per customer because they're very attentive to the customer's needs, because they hover annoyingly, or because they must deal with so many complaints.

Each measurement you might take measures exactly that thing, and not necessarily whatever you hope to measure by proxy. Most factors that make a restaurant pleasing to you, while they may be observed and stated, are not susceptible of objective measurement. What is important is often not measurable at all, and what's measurable is seldom unambiguously important.

Fortunately, an objective quantitative analysis is not necessary, because the free market provides a built-in meter of customer satisfaction: Profit and loss. Because the customer controls both the means to pay and the decision of whether or not to buy, his money reflects his satisfaction with the product. His intuitive level of satisfaction is translated directly into profit or loss without the need to synthesize a dozen, a hundred, or a thousand different highly subjective criteria into a universal scale of restaurant quality. Funding is automatically allocated to those restaurants that do a good job pleasing their customers, and the most successful models inspire imitators and emulators, so there's no need to impose a standard.

Now, compare the public school system. Unlike in a restaurant, the customer is disconnected from the decision to buy or to abstain from buying. Instead, money is extracted from him via taxation and the product is purchased on his behalf. This removes the job of assessing the quality of the service from the customer and payer; someone else must decide how well the institution is performing, and thus which models of service should be supported and funded and which not.

So, how do you determine which models should be abandoned and which supported, and to what degree? Some decision-making body must be appointed, and this body will not have access to the particular knowledge of the preferences and situations of all the individuals to be served, knowledge which would otherwise inform those individuals' own assessments. Therefore, the tastes and judgments of the individual, relating specifically to his or her own unique circumstances, must be supplanted by some universal rule. This could be done simply by having the decision-making body openly impose its own subjective preferences in dictatorial fashion. Usually such a naked exercise of arbitrary power is not well tolerated, though. The common alternative is to attempt to devise some universal and objective means of measuring the institution's performance, which will then guide, in scientific fashion, the direction things ought to take. Unfortunately, as Hayek taught, what is important is not necessarily directly measurable, and what is measurable quite often misleads us.

The traditional letter grade system is one means of assessing the outcome of education, and it is a useful one at a certain level, but ultimately still a subjective one. A grade is, at its root, simply a teacher's opinion (hopefully, but not always, well-informed) of how well a student is learning a subject. It cannot really be considered objective, nor can a grade from one teacher or school be compared apples-to-apples with one from another. It is inseparable from the teacher's intuitive subjective judgments: what points of a subject to focus on, what sort of tests or other methods to use to ascertain mastery, and how the raw results of those methods are to be interpreted. Even the best testing does not necessarily measure what it purports to measure. A grade might reflect a student's mastery of a subject, but it may also reflect unquestioning compliance with orders, extroversion or introversion (via a "class participation" portion of the grade), teacher bias, and student interest in the subject or the teacher's methods, just to list a few. My Ds and Fs in high school English and literature classes, for instance, were far more representative of my extreme shyness and quiet rebellion than of a faulty command of language, and I suspect that many similar cases exist.

Arguably, a grade is more meaningful when it is tempered with intuitive judgment. An "objective" grade may tell us that Sally is routinely scoring below the class average on daily assignments, and that Johnny is able to answer 85% of pop quiz questions correctly. It does not take into account that Sally is just going through the motions because she knows the material forward and backward and feels bored and under-challenged, or that Johnny is just regurgitating answers without any deeper understanding - important details that an attentive teacher, parent, or the student might recognize. Thus, in the quest to be more "objective" and "scientific" we end up stripping out some of the most valuable and important insights. Without non-quantifiable individual judgment, we hamstring our capacity to interpret what those numbers or letters actually mean.

Standardized testing takes all the potential pitfalls of traditional grading and distills them to near purity. It seeks to eradicate individual interpretation entirely, and so steps up to full Pretense of Knowledge-level error. It does not, and cannot, directly measure a student's understanding of a subject, much less his or her capacity for critical thinking and practical application of knowledge. What it actually measures is no more and no less than how often students provide the answers deemed correct by the test's authors. In fact, how those answers are arrived at - whether by real mastery of the subject, rote regurgitation, being coached on test-taking skills at the expense of understanding of the subject, or even outright cheating by teachers or school administrators terrified of justly or unjustly being labeled incompetent - cannot be ascertained. It is simply taken as given that this is a reliable proxy for, and accurately reflects, mastery and understanding of academic subjects.

Yet that false objectivity is exactly what is demanded, and what must be demanded, by the system as it exists. There must be at least a strong pretense that funds and resources are allocated rationally, and not merely according to a politician's or administrator's subjective preference or whim; and since a truly objective standard is impossible, a pretense is the most that can be offered.

The accountability to customers that the market provides is short-circuited. Instead of directing progress toward a desirable outcome, the incentives become perverse. Instead of focusing on pleasing their ostensible customers (parents and students) as a restaurant must, schools and teachers must please the proximate source of their funding: government. Government, which is necessarily removed from the individual preferences of students and parents, must use some alternative way of assessing performance, and that way must be cloaked in a veneer of scientific objectivity. Thus, instead of a focus on developing the skills, knowledge, and qualities that students and parents value, schools and teachers must focus on churning out classes of proficient test-takers...or at least creating the illusion of such.

This in turn creates a systemic bias in favor of the bland conformity of order-following automatons and against attentive, creative innovators and problem-solvers, and a dilemma for the latter sort of teacher: Teach to the test and be rewarded, or teach to the individual and risk punishment. Teachers who aspire to be mentors to their students are suppressed and frustrated. The system encourages treating the job as an assembly line. I know a few dedicated and intelligent teachers who have persevered and done what they could in spite of the institutional shackles imposed on them; I can only imagine how many brilliant minds have been driven from the profession over the years.

Of course, the farther removed from the individual level the decision-making power resides, the more pronounced are the perverse incentives and perverse results. Federal government control of education is vastly inferior to local government control, but even local government control is no substitute for the individual sovereignty of the free market.

The very same analysis can be applied to such subjects as law enforcement, too, and I had intended to do so, but this post is already running long, so that sub-topic will have to wait for another day.

Saturday, June 13, 2015