Transcript of:

Measuring, Mismeasuring, and Measuring the Wrong Thing

Carl Bialik and Jerry Muller on the Use and Abuse of Data

Thursday, May 23, 2019

6 to 7 pm

Ruggles Hall

Carl Bialik

Jerry Z. Muller

Free and open to the public; free tickets required.

Open to the Public

Conversations at the Newberry

In this installment of “Conversations at the Newberry,” Carl Bialik and Jerry Muller discuss our society’s increasing obsession with quantifying performance in all walks of life: education, medicine, business and finance, government, the police and military, and philanthropy and foreign aid. Have we moved from measuring performance to fixating on measurement itself?

Carl Bialik is an American journalist, currently Data Science Editor of Yelp, working on the Yelp blog. Formerly, he was the creator and writer of the weekly Numbers Guy column for the Wall Street Journal, about the use and (particularly) misuse of numbers and statistics in the news and advocacy. He is also a cofounder of the online-only Gelf Magazine, and has written for FiveThirtyEight.com.

Jerry Z. Muller, professor of history at Catholic University of America, is author of seven books, including most recently The Tyranny of Metrics. His research crosses borders among history, social science, philosophy, and public policy on a variety of historical and contemporary subjects, including capitalism; nationalism; conservatism; the history of social, political, economic, and religious thought; and modern German and Jewish history.

Transcript

David Spadafora: I'm David Spadafora, President of the Newberry Library. Welcome to everyone, and thank you for joining us for this, the last of this academic year's Conversations at the Newberry. This evening's program constitutes the beginning of an experiment in which the Newberry fosters the exploration of topics that associate the humanities with the so-called STEM disciplines.

David Spadafora: Each of these sets of disciplines needs and benefits from being connected to the other. We may think that they operate in isolation, but they should not. Each has contributions to make to the other. In the years ahead, you will observe other Newberry programs that bring the humanities into closer contact with the sciences and technology.

David Spadafora: On this occasion, our topic is measurement, and specifically, performance measurement and its problems. Everywhere we might care to look we observe measurement activity. Our times admire assessment based on calculation and precision, or at least the appearance of precision. Although early modern thinkers and scientists like Johannes Kepler and Galileo Galilei set the stage for a world driven by measurement, it is in the last two centuries that it has thrived.

David Spadafora: The great scientist William Thomson, Lord Kelvin, told an audience in 1883 that, and I'm quoting, "When you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind. It may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be."

Measuring, Mismeasuring, and Measuring the Wrong Thing, May 23, 2019, Newberry Library. Transcript by Rev.com

David Spadafora: John Maynard Keynes, typically, put it more condescendingly in 1936: "To say that net output is greater, but the price level lower, than 10 years ago or one year ago is a proposition of a similar character to the statement that Queen Victoria was a better queen, but not a happier woman, than Queen Elizabeth, a proposition not without meaning, and not without interest, but unsuitable as material for the differential calculus."

David Spadafora: So, today we have come to scrutinize numerical lists of the best and worst products and services, to focus on stock analysts' quarterly estimates of likely revenues and per-share earnings, and to take seriously the most recent epidemiological correlations of risk factors with heart disease or cancer or dementia. We construct major international agreements that rest on projections of average world temperature increases expressed in tenths of a degree centigrade, and we assess baseball players on the basis of the spin rate of pitched balls or the exit velocity off a bat.

David Spadafora: There is, of course, nothing inherently wrong with wanting to measure things, with employing measurement in making policy decisions, or with choosing a gastroenterologist to do your colonoscopy because he has performed the procedure thousands of times without puncturing a colon. Indeed, it would be foolish not to take into account the relevant numbers.

David Spadafora: Which numbers are truly relevant for specific purposes? What dangers lurk when we make decisions that reflect only the things that are easily measured? How do we know the numbers we are using are actually reliable? How do we handle conflicts between divergent sets of numbers? In short, what are the principles external to the numbers themselves that we ought to consider seriously when we're planning to measure performance?

David Spadafora: To help us think about such questions, we are fortunate to have with us tonight two experts on measuring performance. Carl Bialik is a journalist who is Data Science Editor of Yelp, the company that publishes the well-known crowdsourced reviews of local businesses and also runs, as I trust you know, an online reservation business. His work includes contributing to the Yelp blog.

David Spadafora: I first became familiar with him as the Numbers Guy columnist for the Wall Street Journal, a wonderful column he created to explore the use and misuse, intentional and unintentional, of numbers in business and public life. He is a New Yorker and graduated from the Bronx High School of Science before attending Yale College. He's also a co-founder of Gelf Magazine, which concentrates on sports and the politics of sports, and he's written for FiveThirtyEight.com, which covers polling, politics, sports, and economics.

David Spadafora: Jerry Z. Muller, Professor of History at the Catholic University of America, is a modern European intellectual historian, who has concentrated on the history of capitalism and on German thought. He was an undergraduate at Brandeis, and earned his PhD at Columbia. He has taught at Catholic University since 1984. He has published seven books, the most recent to appear being his fascinating and provocative book, The Tyranny of Metrics. Translations of this book are forthcoming in, as I understand it, Chinese, Japanese, Korean, Russian, and Turkish. It was reading his book that gave me the idea for this conversation.

David Spadafora: So, please welcome Carl Bialik and Jerry Muller to the Newberry. Let's begin our conversation with this question to Jerry. What do you mean in your book by metric fixation, and why is it a problem?

Jerry Muller: Thank you, David, for that question, and thank you for inviting me. What I call metric fixation is a kind of cultural pattern among contemporary organizations in a wide range of fields. It's a way of thinking about how organizations should be run. After doing a lot of research, I concluded that it was a recurring pattern, one that you find in one field after another, and that, although it rested on a number of ideas that seem plausible when you first hear them, when you put those ideas together in practice, they're often dysfunctional, and dysfunctional in ways that are quite similar from one organization or one realm to another: from K-12 education to college education, from policing to medicine, and from finance to foreign aid.

Jerry Muller: So, let me just give you some idea of what I mean by metric fixation. It's based on a few ideas that are connected in the way people put them into practice. The first notion is an updating of Lord Kelvin by management gurus: what gets measured gets done. Closely related to that idea, not in theory but in the way it's put into practice, is the notion that human judgment based upon experience and talent is unreliable, and that therefore one should depend upon standardized forms of measurement to make sure things get done.

Jerry Muller: The next related idea is that people respond to incentives. So, you ought to reward people if they meet metric targets, and you ought to punish them or disadvantage them in some ways if they don't live up to the metric targets.

Jerry Muller: The other idea that's, as I say, often related in this metric fixation is that organizations are made more accountable by transparency, and transparency means taking these standardized measurements and making them public, putting them on a website or what have you.

Jerry Muller: Now, as I say, each of these ideas seems plausible, taken individually, when you first hear it, but when they're combined and actually put into practice, they're often counterproductive and demoralizing. So, let me say a little bit about some of the ways in which they're typically counterproductive. If you give people incentives that are based upon standardized metrics, and you reward them if they meet the metrics and punish them if they don't, then they tend to focus upon the things that get measured.

Jerry Muller: The problem is that in most organizations and in most sophisticated jobs, there are actually a number of facets of the organization and a number of facets of the job, and people may focus on the parts of the job that get measured and rewarded as opposed to the parts that don't get measured, in part because they're very difficult or perhaps impossible to measure: things like mentoring younger colleagues, which often makes all the difference in how much people appreciate their work, or cooperating with colleagues.

Jerry Muller: These sorts of things are hard to measure, but they're often at least as important as the things that do get measured, and if you incentivize the things that get measured, people will focus on those at the expense of these less tangible but no less important factors.

Jerry Muller: Closely related to that is ... Well, to give you another example that you're probably familiar with: the phenomenon of teaching to the test. In the United States over the last 20 years and more, we've had standardized tests, and pupils are measured on them. Sometimes the schools are rewarded or punished based on those standardized measurements. That was instituted under the George W. Bush administration, and under the Obama administration they went a step further and wanted to reward or punish individual teachers on the basis of those standardized tests.

Jerry Muller: Well, there's a lot to be said for testing when it's used properly, but what often happens in these cases is teaching to the test. That is to say, the students are tested in mathematics and reading and writing. So, what happens? Teachers tend to focus on those narrow elements of mathematics and reading and writing that are on the test.

Jerry Muller: So, for example, students coming into college now know how to write a five-paragraph essay, but they don't know how to write a 10-page essay because that's not what their education has been oriented towards. In general, metric fixation tends to lead to gaming, that is, reaching the metric goal in ways that are at odds with the larger purpose of the organization.

Jerry Muller: So, for example, about 15 years ago, a number of states and then eventually the federal government in the United States started to issue surgical report cards on the success rates of individual surgeons for certain procedures: knee replacements, but also heart procedures. One of the effects of that was that surgeons tended to shy away from operating on patients who were riskier because they were frailer or had more comorbidities, that is, more things wrong with them than just the thing they were being operated on for. The chances that the operation would be a success were lower, and that would screw up the surgeon's metric score.

Jerry Muller: So, they didn't operate on them, or they sent them elsewhere. That improved the surgeons' scores. The people who were disadvantaged by this were the people who needed the operations; they didn't get operated on as a result. Or conversely, again in the field of medicine, the federal government, through CMS, rewards and punishes hospitals based on whether patients remain alive 30 days after certain procedures.

Jerry Muller: Now, it happens, and one hears lots of anecdotes about this from people who manage surgical units, that there are operations that are fundamentally unsuccessful, but the surgeons urged the hospital administration to keep the patients alive for at least 31 days so it won't ruin the metric. Yes. These things happen.

Jerry Muller: It happens in all sorts of more subtle ways, too. There's a lot of pressure now from state legislatures to increase the graduation rates of state colleges and universities, and the easiest way of doing that is by lowering the standards for graduation. That's going on all over, too.

Jerry Muller: There are other problems, oh, in publicly traded corporations. David alluded to this. There are these quarterly projections that they make, and then there's a lot of gaming that goes on in order to meet those quarterly projections, which means that the executives in question favor short-term gains over what might be the longer-term wellbeing of the corporation, which might mean putting more capital into R&D or into employee education or building new facilities or what have you. We can talk more about that if you're interested.

Jerry Muller: Then there are all sorts of deceptive metrics that result from this metric fixation. Then there's ... This is what got me interested in it. There's all the time that is required to input these metrics: time put into measuring that takes away from actually doing. You may have noticed this when you've visited your primary care physician in the recent past.

Jerry Muller: So, there are a variety of recurrent patterns that come under the rubric of what I call metric fixation, dysfunctions that occur in such a wide range of fields. What's so striking to me is that there are reams of evidence, mountains of evidence, about how poorly this works in a variety of settings, if you read the medical journals, for example.

Jerry Muller: Yet, people believe in it. They have this faith in metric fixation, and I try to figure out not just why it was dysfunctional, but why they have that faith in it. We can talk about that later.

Carl Bialik: It's quite a picture you paint across so many industries. One thing you take pains to make clear throughout the book is that you do see metrics as being potentially quite useful in some applications, in some organizations. You just gave a whole lot of great examples of poor use of metrics. What's an example of a good use that illustrates the right way, as you see it, to use metrics for assessment?

Jerry Muller: So, measurement ... Here, Lord Kelvin was three-quarters right. Measurement is often genuinely useful. It depends who does the measuring and for what purpose, and what incentives are linked to that measuring. So, there are many examples in which measurement is useful. For example, standardized tests. If you're a teacher, including if you're a college professor, and you give your students a test, you can see how much they're actually picking up of what you've been teaching. It's often quite disheartening, but it's a worthwhile exercise.

Jerry Muller: So, when the tests are developed by the practitioners and then used by the practitioners, not by a third party, and there's no reward and punishment except the psychic reward of knowing whether or not the students are picking up a lot, then standardized measurement can be really useful. There are many cases in a variety of fields where that's been shown. One of the cases I discuss in the book is Peter Pronovost, a doctor who was at Johns Hopkins, who developed a set of standard measures, a set of guidelines for putting in central lines. Those are the flexible tubes that are put into veins and arteries in order to inject various fluids. In the past, they were often a source of huge rates of infection in hospitals, which cost a lot of money and cost a certain number of lives.

Jerry Muller: Well, he put together this checklist, and then he had the units in the hospitals that he was working with keep track of their rates of infection. When you had, say, two surgical units in two different hospitals that were using the same system of metrics, and you saw that one was doing better than the other, the heads of the two groups could meet or the staff of the two groups could meet and see what the one that was doing better was doing that the other one wasn't. In that sense, they could be and they really are genuinely useful. In that case, they've cut down on the rate of infection from central lines tremendously.

Jerry Muller: There are other medical organizations like Geisinger in rural Pennsylvania that have also used metrics to compare how various physicians in various units were doing. In those cases, they actually tied a certain amount of the physician's remuneration to the level of success and the level of patient satisfaction, and it worked. One of the reasons that it worked is because the things that were being rewarded were things that the practitioners themselves, that is the doctors and nurses, genuinely believed in.

Jerry Muller: So, when you use metrics that way so that the practitioners play a role in formulating them and in evaluating them, and it's consonant with the purposes of the practitioners themselves, they can be genuinely useful.

Carl Bialik: That's a psychological insight into how using incentives in different ways, or using different kinds of incentives, can affect how people approach their work, and in the book, you discuss intrinsic and extrinsic motivation. Do you want to expand on that? I have a question related to it, but first, just what that means and what that does.

Jerry Muller: Yeah. I understand. Let me take a sec to explain it because until a few years ago, I think I was familiar with the ideas, but not with the terminology. So, intrinsic motivation means to what extent you do a job because of things within you. That is to say because you feel that the purpose of the organization is important or because you find the job challenging, either challenging intellectually or challenging manually if you're in an artisanal field or you like working with the people that you work with. Those are all forms of intrinsic motivation.

Jerry Muller: Extrinsic motivation is when you try to motivate people externally, usually in monetary terms or sometimes in reputational terms that translate into monetary terms, right? So, if you're a surgeon with a high number on your surgical report card, that, ultimately, is going to have some monetary ramifications, right?

Jerry Muller: So, it turns out that a lot of metric fixation is based on a kind of simple-minded conception of human motivation that says that, really, extrinsic motivation is everything. Now, don't get me wrong. Monetary motivation is important for almost everybody to some degree or another, and in some jobs, it's the only thing that motivates people: either in jobs that are very repetitive and not intrinsically interesting, so you do them for the monetary reward, or in fields like finance where, for better or for worse, people tend to measure themselves by how much they make.

Jerry Muller: Most people are motivated by some combination of extrinsic reward and intrinsic reward. If you try to motivate them only through extrinsic reward, through monetary reward, you are sending a signal to them that it's the money that really counts. So, if you're a doctor, the healing isn't what really counts; it's how much money you make. If you're a nurse, it's not the safety and comfort of the patient; it's how much the surgical unit makes. If you're a teacher ... sometimes they've tried offering teachers bonuses, on an experimental basis, to get them to raise the level of tested math performance, for example.

Jerry Muller: So, not only does it turn out that it doesn't work, but it sends a bad message. To people who are motivated in good part by intrinsic motivation, it signals that you really ought to be in this for the money. That in itself can be very demoralizing for people in an organization.

Carl Bialik: Is there also some psychological effect of having a numerical assessment that maybe hasn't been studied directly, but that you've seen or heard anecdotes about? I think of very silly examples like people trying to reach 10,000 steps for the day and what they do. It's a silly example because it doesn't matter and it's something only they know, and yet people will go outside and walk for 20 minutes when they wouldn't have otherwise. What does it do to someone psychologically to have a transparent measure with a clear threshold and a clear outcome, and perhaps the judgment of their peers?

Jerry Muller: Yeah. That's hard to answer, but one of the interesting phenomena of our age is more and more of what Carl's been talking about: people setting these metric goals for themselves, either in terms of their exercise or in terms of their health. In California and other places, there are communities of people who get together and discuss their metrics, their biological metrics and their exercise metrics. It's called the quantified self. You can look it up on the internet.

Jerry Muller: It's related to this concept of metric fixation in that people are driven by this notion of efficiency, of maximizing efficiency all the time in every area of one's life, which I think is, first of all, bad for organizations when they try to ... I've been corresponding lately with some people who design work processes. They tell me that when you have an organization that is set up to have everybody working at maximum efficiency to meet certain metrics all the time, those organizations become very fragile because there's no room for people to be flexible, and move around, and innovate, and so on.

Jerry Muller: So, this notion of making everything as efficient as possible in a business, in a government organization, perhaps in your baseball team, I don't know, but even in yourself, is part of this larger enchantment, you might say, with quantification. I think especially for people who don't have some other larger sense of meaning and purpose in their life, perhaps this acts as a temporary substitute.

Jerry Muller: Here's something I can measure and know that I've done it. I've done the 10,000 steps. By the way, this leads to gaming, too. There are now various sorts of devices that one can buy that interact with your Fitbit in a way that inflates the number of steps it records, and so on. So, even when people are doing it for their own sake to some degree, they still try to game their own metrics.

Carl Bialik: So, I'm nodding along with everything you say. It's all quite reasonable. I imagine someone could write a book, or could have written a book, maybe before metric fixation fixated all of us, called The Tyranny of Judgment. There are certainly downsides to judgment. In fact, baseball has come up a couple of times. Moneyball, in 2003, was largely about what happened when people actually tried to measure things that had previously been assessed only by judgment. One of the things that happened is that people who hadn't gotten chances because they didn't fit a certain mold now got a chance, because you could measure their performance and see they were deserving.

Carl Bialik: So, there's a lot that I want to ask you in that realm, but to start with, how do we know? How would you measure, pardon the word, that what we have now is worse than what it replaced, what came before it, which must to some degree have motivated it?

Jerry Muller: So, there isn't a book called The Joy of Metrics or The Tyranny of Judgment, but there's a whole field called behavioral economics, which is really about the tyranny of judgment, right? So, if you've read Tversky and Kahneman, and to a lesser extent Richard Thaler and so on, this is what they do. When you look at what they do, especially Tversky and Kahneman, it's often about how we're not very good at calculating numerical percentages.

Jerry Muller: There's a spillover effect where people say, "Oh, yeah. You see, judgment doesn't work that well." So, we should distrust judgment in general, and that's where I think it gets overgeneralized. Now, there are problems with judgment. One of the criticisms of judgment is that it can often be based on inadequate experience, or it can be tied in with bias and prejudice. All those things are true. Those are possibilities. For me, those are arguments for using measurement with judgment.

Jerry Muller: In other words, it's not judgment or measurement. It's using the two together, so that based on experience and talent, you have some sense of what the metrics can validly measure, what they can't validly measure, and how much weight to give to the things that you have been able to measure compared to the things that you haven't.

Jerry Muller: So, I think that's the case. So, sports, in general, are an area that I'm quite ignorant about, and baseball, in particular, I've always found particularly boring, but I have done reading on it because after I published my book, baseball scouts started to write to me and say, "You've described exactly what's been going on in our field." In the past and even more so in the present, my understanding is that baseball scouts do use a combination of standardized metrics to measure things together with some degree of judgment.

Jerry Muller: It's true that sabermetrics were able to discover some of these useful patterns, but look what's happened to baseball. Now, I say this as an outsider, but this is what people who follow the field more carefully, what I would call ex-fans, tell me. So, first, this book came out, Moneyball. Everybody was very impressed by it. Then all the baseball teams started to get into metrics, and they measured more and more things. One of the things they found was that a team is likely to score more if the batters aim at home runs as opposed to getting on base, and that kind of thing.

Jerry Muller: One of the ways in which this was measured and institutionalized is through what's called a launch angle. Is that a term? Did I get it right? Yeah. So, batters are now taught that this is the angle at which they're supposed to bat, even though that may not be their natural propensity, but that's what the metrics show. So, everybody aims at this launch angle, which is more likely to get you a home run, but often results also in a lot of walks or strikeouts.

Jerry Muller: So, what's happened is baseball has become more metricized. It's become more standardized, but there are fewer people getting on base, and running around, and doing all the things that made baseball interesting. So, baseball has become more metricized. It's more lucrative. It's more "scientific" in that sense, but it's become more predictable and less interesting, and there's been, as you may know, a falloff in baseball attendance.

Jerry Muller: So, I think this is a result also of the unanticipated negative effects of metricization. Again, here, I'm out on the outer edge of my knowledge.

Carl Bialik: It's a funny case because baseball has always been metricized. They're just looking at less useful metrics. There's also the question of how you define usefulness. What's useful for a player or for a team might not be what's best for the sport. You want to win more games, so maybe you play a style that alienates more people, but you have different goals.

Carl Bialik: It does make me want to hear more about the picture you paint of metrics aiding judgment. I mean, it is not a new thing, in a sense; there are people already doing it, and you described some of them. I'm just wondering, what would that look like? What would the way out of tyranny look like that you think would be most successful? And maybe scouts now being more metrics-focused is one place to look?

Jerry Muller: So, that may be. It's an area that, as I say, I don't know particularly well. I think that when metric ... In a wide range of organizations, when people use metrics in a way that involves the practitioners in developing and evaluating them, and where it isn't tied to reward and punishment, or, if it is tied to reward and punishment, it's because the practitioners themselves believe in that, then it can be useful.

Jerry Muller: So, let's take an example of how it's been both useful and misused: CompStat, computerized statistics on the incidence of crime, which was pioneered in New York City in the early 1990s and has since been adopted in a wide variety of cities, not just in the United States but abroad as well. Insofar as it was tied to GIS, geographical information systems, you could find out where crimes were occurring and what time of day they were occurring, and you could then deploy officers in those areas in a more regular way. That was a genuinely useful way of using metrics to fight crime.

Jerry Muller: When you told officers that, for example, the promotion of an officer or the promotion of a commander was going to be based on the metrics in that area, then that created all kinds of incentives for misuse and abuse. This has been documented in a wide range of cities, and not just in the United States but in the United Kingdom and elsewhere, too. The crime metrics that are gathered by the FBI are in four major categories of felonies.

Measuring, Mismeasuring, and Measuring the Wrong ThingMay 23, 2019, Newberry LibraryTranscript by Rev.com

Jerry Muller: When police are told that they're going to be rewarded or punished based on the number of those felonies in their precinct, what happens often enough is things get ... Crimes that are reported to the police don't make their way into the statistics or more frequently, crimes that were felonies like grand theft are downgraded to misdemeanors like minor theft, so that they don't make their way into the major metrics. So, it seems to me that policing is a good example of where metrics have been both tremendously useful and sometimes dysfunctional and counterproductive.

Carl Bialik: A lot of the examples we're talking about are within organizations, but there are also times when people have to make decisions from the outside that may be very important for themselves. I'm thinking of consumers. I work for a company that is involved with rating businesses, and those ratings are then used, but we're not the only ones by any means. I think we talked earlier about Rate My Professors and services like that. Every product out there has numerical ratings. Do you consider it part of metric fixation to use those kinds of ratings when you don't necessarily have the judgment and expertise you need to make an important decision?

Jerry Muller: So, those kinds of metrics that are based essentially on consumer responses can be useful, but there are two things that I think need to be said about them. One is that people try to game them. When Amazon first began selling books and having its customers rate them, people who were new authors told me, "Well, you've got to write a number of positive reviews under a variety of names, and then input them into Amazon." The second point is that when one looks at Amazon ratings now, one has to learn to be a critical consumer of those kinds of metric ratings.

Jerry Muller: For example, I noticed that a book related to mine, but by a more famous person, had a lot of very high ratings on Amazon from very early on. When I looked at them, I noticed that a lot of them were from reviewers for whom this was the sole thing they had ever put on Amazon, which often means it's actually a form of gaming.

Jerry Muller: So, I think that although those kinds of reviews can be useful, one has to learn to be an informed and skeptical consumer of these kinds of metrics because it's not just that people try to game them, it's that the people who are inputting the consumer reviews may have very different tastes and priorities from your own.

Jerry Muller: So, when I'm in a city or a part of town that I'm not that familiar with, I do look at Yelp ratings for restaurants. I've learned to be a critical consumer of them because a lot of the reviews, given the age cohort the reviewers come from and so on, typically read something like this: "This is a great place because the mimosas during happy hour are only $2.75," right? Hence, the place gets five stars.

Jerry Muller: Now, the truth is I'm happy all the time, so I don't have to go to happy hour, and I don't drink mimosas. So, it's not a very useful metric for me. Now, that doesn't mean that the system is useless. As I say, I make use of it, but for any of these kinds of consumer-oriented metrics, you really have to pay attention to what's the nature of the people who are most likely to be contributing.

Jerry Muller: So, too, for Rate My Professors, I find there's a propensity there for people to contribute who are either extremely enthusiastic about the professor or very pissed off at the professor for some reason or another, and that broad swath in the middle tends to be underrepresented. So, again, if you read enough of them, you'll learn to read these things critically.

Jerry Muller: So, too, sometimes with physicians' practices. I've looked at these, and I'm often surprised at how often people comment on how nice the office staff was or wasn't to them, or what the parking was like at the doctor's office, while patient outcomes may get under-reviewed.

Jerry Muller: So, with all of these consumer-oriented metrics, as I say, they have their place, but one ought not to accept them naively, and it actually requires a good deal of self-education to use them critically, I think.

Carl Bialik: On Rate My Professors, I'd still want to know whether there are way more people who are pissed off than really enthusiastic. That seems useful. Did you write the reviews for your book? Are they still up?

Jerry Muller: No. I wrote some highly critical ones. No, I did not, and I eschew that practice. By the way, when you look at Rate My Professors, you can, again, use it critically. One got something similar when I was a department head and received the student reviews of various professors. What they frequently say is, "I didn't like this class because there was too much reading," or "There was too much work," or, on Rate My Professors, it's often, "Take this course because all you have to do is show up in class, you don't have to do the reading, and you'll get an A," right?

Jerry Muller: So, if you're a critical consumer as I was when I was a department head, if the reviews from the students said, "There was too much work in this course," I would say, "Good. Thank you to my colleague."

Carl Bialik: So, a couple of other things on this topic. One, quickly: Yelp, and I think several of our peers, do try to find the reviews that are trying to game the system, and rather than remove them, put them on a separate page and don't count them in the rating. It's a very difficult problem, and we have a lot of smart people working on it. I'm not involved in building the algorithm, but among the kinds of signals that could be looked at is, "Is this the only review they ever wrote?"

Carl Bialik: On the doctors, I think this is obviously a really hard problem. It's hard to figure out how to choose a doctor. In fact, even though we're better known for restaurants, what inspired the creation of Yelp was that the founder was sick and couldn't find a good doctor. Of course the outcome is important, but I think the things you described are underrated, because people decide whether to go back to the doctor when they should based on what the experience is like: Is the TV blaring? Are they waiting an hour past their appointment time? Those things do matter for health, not just for satisfaction.

Carl Bialik: I think the broader point there is that there are almost three ingredients we need here: good metrics, good judgment and expertise, and expertise in how to work with metrics. So, I think probably everyone here, whatever their level of familiarity with metrics, would be interested in what you'd suggest as the best way to get better at deciding how to weight metrics, or whether to consider them at all, in a big decision.

Jerry Muller: Yeah. So, one of the themes of my book is that the kind of question that Carl just asked is unanswerable because metrics are highly context-dependent, right? So, it's very hard to come up with general rules about what makes for a good metric or what makes for a bad metric or an easily gamed metric. A lot of it depends on contextual knowledge, and the contextual knowledge in turn is based upon experience.

Jerry Muller: So, I think it's important to have people who have a lot of experience in a particular field in order to, first of all, try to come up with good metrics, and then look from time to time at whether they continue to be good metrics because maybe the purposes of that field or the priorities of that field have changed, and to consider to what extent the metrics have been gamed, and how one might change them in keeping with that.

Jerry Muller: So, I think the problem is there's a tendency for consultants to try to have standardized formulas about how to use metrics in every field in a way that isn't context-dependent, and therefore, doesn't work very well as opposed to people who work in the field, including data scientists who work in the field, and who over time no doubt gain experience and judgment in terms of what metrics seem to be working, what don't, but probably that has to be in consultation with practitioners themselves, too.

Jerry Muller: So, it's one of those many areas where the attraction of metrics is often that they seem to be a set of rules that you can learn. There are lots of books by business consultants and so on about how to use metrics that are supposed to apply to every field. There are far fewer like mine, which are skeptical about metrics: they believe that metrics have a place, but also that they're easily overused or misused.

Carl Bialik: I'm wondering what you make of some of the most successful companies of recent years in the tech space and their use of metrics, to the extent you've observed it or thought about it. I have some thoughts myself about my own employer, but that's really my only experience with it. They do tend to be very metrics-driven. That's my understanding. Everything is about OKRs and KPIs, objectives and key results, key performance indicators. Everything is being measured when making decisions. Experiments are used with pre-specified outcomes that determine whether to go ahead with something like a change to the product. By certain metrics, these companies have been quite successful. So, are there lessons there? Or are the seeds of their decline actually in being so metrics-driven? What do you see with those companies?

Jerry Muller: So, as you know, this is not an area where I'm an expert, but a few thoughts. I'm struck by the fact that Facebook, and I'm on Facebook, sends you ads. I get ads pretty often for books by Jerry Z. Muller. I can see why, but frankly, I'm not in the market. I've also noticed that if, say, you buy a toaster oven, you tend to get a lot of ads for toaster ovens, but you don't really want that many in your household. So, the algorithms are imperfect, but a lot of them do work, and there's something to be said for them. From the point of view of the consumer, they're sometimes genuinely useful in suggesting things you might want to look at or buy, absolutely.

Jerry Muller: In terms of the companies, the companies that are good at them, like Google, make a tremendous amount of money from them, and that's partly because they're good at them. In that sense, they can be genuinely useful. On the other hand, these companies, especially Facebook and Google, have profited tremendously from what we sometimes call measurability bias. That is the fact that people put more faith in things that can be measured.

Jerry Muller: So, people in the field of advertising have told me that in the past, companies tried to build a fair amount of brand recognition. That is, they advertised their brand in a wide range of venues, so that people would get to know it and be more likely to think of it when the time came to buy something that company produces.

Jerry Muller: Nowadays, because of measurability bias, more and more companies are only interested in advertising in clickable venues, and they're interested in that in good part because it's measurable. They can go to their superiors and say, "Look at all the clicks that we've had on this." On the one hand, this measurability bias has been tremendously profitable, especially for Facebook and Google. On the other hand, it has led to dysfunctional forms of gaming of its own. That is to say, unscrupulous people, of whom I'm afraid there are a lot, have created bots and fake websites that click on the ads and then divert the money from the advertisers, or indirectly from Google, to the owners of those bots. So, it's a form of gaming, too.

Jerry Muller: So, these algorithms, which are a form of metrics, obviously have a tremendous amount of utility as used by these companies, but they also create certain dysfunctions of their own.

Carl Bialik: Where do you see this headed? You've written a book that could change where it's headed, but where do you think we'll be in 10 years?

Jerry Muller: I think it will be worse. Here's why: the things that make metric fixation attractive, the cultural factors and the institutional factors, aren't going to go away. Take this kind of cultural managerialism: the notion that management, rather than being an art and a practice based upon experience and talent, is a set of techniques that you can learn in business school, or from a consultant, or maybe from a book, and then apply in whatever organization you work in. That's very attractive to a lot of people.

Jerry Muller: The concreteness of numbers makes them very attractive. A lot of people have an interest, I mean an economic interest, in this kind of metric fixation. And in order to use metrics properly, you have to educate yourself about a particular context, which means that if you're a manager or the head of an organization, you can't easily move from one company to another, or from one whole sphere to another. You couldn't say, "Go from being the head of Exxon to being, I don't know, the secretary of state," to take an imaginary example. But the notion that you can is very powerful in our culture.

Jerry Muller: So, I hope that my book is going to make people at all levels of organizations, including the top levels, more critical and self-critical about the use of metrics, but I'm not at all sure it will happen. As the Italian Marxist Antonio Gramsci said, "Pessimism of the intellect, optimism of the will."

David Spadafora: Gentlemen, maybe this is a good place for us to bring the audience into the equation, so to speak. So, we're going to ask for people who want to ask questions or make comments to raise their hands and there are folks here with the microphones who will come to you.

Audience Questioner #1:

Hi. I've never been first. I feel like one of the big metrics in the news last week was the College Board's adversity score. I was wondering if you would comment on that.

Jerry Muller: Sure. We don't know exactly why the College Board developed that adversity score. Sorry. So, the College Board is the organization that produces the standardized tests that students take toward the end of high school, which are then often used by colleges and universities to determine admission, merit aid, and things like that, right? That's the main thing the College Board has historically done.

Jerry Muller: Now, they've developed this adversity index, which is intended to quantify in a standard way certain disadvantages that a student is presumed to have. For example, if they live in a neighborhood with a high degree of ... Oh, this is an important part: we don't actually know what the inputs are going to be. They're presumed to be things like the average wages in a neighborhood, or the percentage of students who get free lunches at the high school they come from, and that sort of thing.

Jerry Muller: So, it's intended to quantify adversity. The main motivation seems to be that there are people and organizations that are dissatisfied with the rates of admission of certain minorities, above all African-Americans and Hispanics, to institutions of higher education. Increasingly, institutions are not allowed by law to take these factors into account, and in most of the analysis I've read, this is seen as a way of getting around that.

Jerry Muller: The problem is, if you admit people with ... Well, first of all, one of the purposes of college essays is to make college admissions counselors more aware of the advantages and disadvantages that people have had in their lives, and the essays do play that role to some degree. If you admit people with lower levels of achievement and ability, they're less likely to do equally well in college. Then, if you look at their graduation rates and you're concerned that they're low, the typical way of solving that is through various forms of gaming the metrics, or imprecise metrics.

Jerry Muller: So, if you lower the standards for graduation, you can graduate more people. If you graduate more students in fields that are less cognitively demanding, you can graduate more of them. If you eliminate certain requirements, like in California now, in order to get more students graduating more quickly, they're eliminating courses in civics and American history. You can do that, too, but all of these things have costs. So, I think college admissions people should take adversity into account, but I'm very skeptical of adversity indexes. Maybe you feel differently, Carl.

Carl Bialik: I do. Well, first of all, as you said, adversity is already being taken into account. Whether coming up with something standardized makes sense depends on whether it's constructed well. An essay, I think, makes sense for conveying things that wouldn't normally be captured by something easily quantifiable, but there's something to the idea of giving the score to any school that wants to use it and may not have the capacity to calculate something credible itself. I mean, that's what the SAT is, right? Every school doesn't come up with its own SAT, and the SAT has flaws and gaps, and perhaps this is supplementary.

Carl Bialik: My understanding is also that metrics of how somebody did before college are not as predictive of college performance as you described. I don't know all the research, but that's my impression from what I've studied. But yeah, it will definitely depend on how it's constructed and how it's used, like with any metric.

Audience Questioner #2:

If there ever were a case of the tyranny of metrics, it's the U.S. News & World Report rankings of various academic departments, hospitals, grade schools, for all I know. When you look at them with any objectivity at all, you would say they are quantifying things that really can't be quantified in any meaningful way, and then they're ranking institutions. I mean, it's silly. Yet, in my own experience with some very, very prominent academic institutions, at some level the deans of the various departments are driven by what some character sitting in a back room in the marketing department of U.S. News & World Report came up with 30 years ago. It's ridiculous. Can you comment on ... Is there any way out of that morass for our colleges, institutions, hospitals, grade schools?

Jerry Muller: Okay. So, I actually devote a substantial part of a chapter to that in the book. Let me say something in favor of the rankings first, before I criticize them: if you really have no other sources of information and you use these rankings very broadly, they can be useful. If you're going to apply to X college, it's useful to know whether it's in the top 50 colleges in its category or whether it's number 400. That's genuinely useful.

Jerry Muller: The difference between number 30 and number 40 is probably not that useful, but the effect of having these rankings is that they lead to a tremendous amount of gaming within institutions of higher education. Indeed, there are boards of trustees who offer incoming presidents a bonus, a very substantial bonus, if they can raise their university's ranking by X number of places. And since part of the U.S. News & World Report rating is based on the reputation of the university, as measured by questionnaires sent to other university presidents, universities spend a large amount of money putting out glossy brochures, which they send around to the various university presidents, most of whom don't have time to look at them and toss them in the garbage.

Jerry Muller: There are all sorts of ways in which gaming occurs. One of the U.S. News criteria is, I think, something like the number of classes with fewer than 20 students. I was talking to a colleague at a college, a very good teacher, who was teaching a course that sounded really interesting to me.

Jerry Muller: I said, "How many students do you have?"

Jerry Muller: He said, "19."

Jerry Muller: I said, "Gee! That doesn't seem like very many."

Jerry Muller: He says, "Well, but the administration tells us we can only admit 19 students into this course because otherwise, it would affect our ratings."

Jerry Muller: So, yes, as I say, those ratings have some level of utility if you take them in terms of their broadest categories, but they do create tremendous incentives for gaming and misallocation of resources. Oh, and then there's outright cheating, which is what leads to the scandals. That's a step beyond. Right.

Carl Bialik: Yeah. I agree with what Jerry said there, and I have a couple of additional thoughts. One thing that happens when you have a hard border between two groups of numbers is that it just invites gaming, with everyone landing right at that edge. So, having a more gradual measure would make some difference here. On U.S. News & World Report, I know the guy you were describing. I don't think he was in the marketing department. It is still one guy, and maybe not how we should all be making our decisions on education in our households, but he was trying to update the methodology from 30 years before. For certain measures, he went from the median to the mean of the 25th and 75th percentiles, which should account more for outliers. So, of course, everyone just shifted where they gamed things.

Carl Bialik: I have two more points, more meta points. First, what is the alternative? I think that's the first thing Jerry said: these can be useful when you don't have an alternative. Why are these popular? Why do they exist? Because people are looking for some information here. So, if lots of people don't like it (not you personally), they should make a better alternative, and there are a lot of easy ways to make it better.

Carl Bialik: The second point is that even if you had perfect information and everything was quantifiable, it would be somewhat absurd to rank, because at the very top there would be almost no difference. It's basically a tie. Yet people love rankings. I work at a company where every year we put out the top 100 places to eat in the country. We know, everybody knows, that in a way it's impossible, because even if you could quantify it, there would be 1,000 or 10,000 places that would be effectively tied, to the extent that you can measure it. Yet we put it out, because media organizations are way more interested in rankings than in a list like "Here are some great colleges" or "Here are some great universities." So, that's why I continue to be interested in what metrics fixation is doing to all of us psychologically, that we are so interested in numbers even where they don't make much sense.

Audience Questioner #3:

So, people sometimes do things when they think things are being given to them that allow them to do better. Take Moneyball: a small-market team, not a billionaire owner. You've got to do things, but I wonder how much of that is judgment and how much is the will to win. We're coming up on D-Day. Is that a metrics case? Did it make any logical sense? There are other kinds of things where human initiative just decides to win that day. I just wanted you to comment.

Jerry Muller: So, again, sports, in general, and baseball in particular, is an area of minimal interest to me, but traditionally, Carl can help me on this, this is one of the things that scouts looked at, right? They didn't call it will to win, but there was some rough equivalent of that. No doubt, that does play a role and no doubt, it is really hard to quantify.

Carl Bialik: Yeah. I think it exists, and I also don't know how you would figure out ahead of time which player had it. In my field, we try to design experiments all the time: "Okay. We think this thing is real, but we don't know for sure. We think this change to our product will be good, but we don't know for sure." So, we choose a random set of users, and we see, "Does this actually get them more engaged with the product?" That's very common in tech. How would you randomly select some people and say, "These ones have the will to win. We're going to add them to our team"? There's no second team that you can compare it to. So, that gets at Jerry's point that what you don't measure can get lost.

Carl Bialik: Moneyball was describing a phenomenon that was already happening in baseball. The book popularized it, but the team was just doing what it had been doing before other people were aware of it. The larger point there is also that it's hard for us to trace what causes what, anywhere. The trend of baseball becoming less popular probably has a lot of drivers besides fewer base hits and more strikeouts. So, even when we're making the case about the tyranny of metrics, we have to use some level of metrics or metric intuition to understand cause and effect. The tool is necessary even there. Yeah. Thank you for the question.

David Spadafora: Before thanking our conversationalists tonight, I have one observation to make. In about 1994-1995, when I was a college president who did not get a bonus for our college's standing in U.S. News & World Report, and was never offered one, I did go to visit the founder of that particular set of questionnaires and that issue, and told him that I didn't think what U.S. News was doing was a good thing, and that they really should quit talking about inputs and talk about outputs. I think there has been an attempt in many of these cases to do a little more of that, but only a little more, across the last 30 to 35 years. It's a great shame.

David Spadafora: I would like to thank Carl and Jerry very much for a great conversation.
