Monday, September 19, 2005

Practical and Statistical Significance

Almost two years ago I first read Deirdre McCloskey arguing that economists are too focused on statistical significance at the expense of practical significance. Oh, the glories of econometrics! Now I actually sort of know what she's talking about.

When you run a regression, you set up a null hypothesis: typically that the coefficient on the variable is 0. You then compute a statistic to see if you can reject the null (the t-statistic: the ratio of the estimated coefficient to its standard error), and embrace the hypothesis that the variable has an effect in one direction or the other. But really that isn't enough: sure, we might be able to say that we "reject the null at the five percent level," but that doesn't tell us whether the effect is really worth talking about -- whether it has any practical significance. As in the textbook*, you might find that the total number of employees has a statistically significant relationship (at the five percent level) with the participation rate in a 401(k) plan, but the effect is rather small: an increase in firm size of 10,000 employees means the participation rate falls by about 1.3 percentage points. Sure it matters, but does it have any practical significance?
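
To see the gap concretely, here's a minimal sketch in Python (the numbers are invented for illustration; this is not the textbook's actual 401(k) data) of a regression whose coefficient is statistically significant but practically small:

    # Toy illustration (invented numbers, not Wooldridge's data): with a
    # large enough sample, even a practically tiny coefficient produces a
    # t-statistic far beyond the 5% cutoff.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50_000                                  # many firms
    size = rng.uniform(100, 20_000, n)          # total employees per firm
    # Assumed "true" effect: each extra 10,000 employees lowers the
    # participation rate by about 1.3 percentage points.
    prate = 85.0 - 0.00013 * size + rng.normal(0, 10, n)

    # OLS by hand: prate = b0 + b1 * size + u
    X = np.column_stack([np.ones(n), size])
    beta, *_ = np.linalg.lstsq(X, prate, rcond=None)
    resid = prate - X @ beta
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())

    t_stat = beta[1] / se[1]
    print(f"coefficient on firm size: {beta[1]:.6f}")  # tiny in magnitude
    print(f"t-statistic:              {t_stat:.1f}")   # |t| far above 1.96

The null gets rejected at any conventional level, yet moving firm size by 10,000 employees shifts participation by barely more than a percentage point -- McCloskey's complaint in miniature.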

Economic theory is excellent at predicting that something will have an effect, but to learn something from econometrics you first need to know whether an observed effect is likely to be an artifact of the data, and then you need to figure out whether that effect is "important" in some sense. Or at least that's what McCloskey has been arguing all these years.
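
Continuing the sketch above, that second step has to import a judgment from outside the statistics: you must stipulate how big an effect would have to be to matter. The 2.5-point threshold below is an invented normative line, not something any regression can produce:

    # Continuing the sketch (beta and se come from the previous snippet).
    ci_low = beta[1] - 1.96 * se[1]
    ci_high = beta[1] + 1.96 * se[1]
    print(f"95% CI per 10,000 employees: "
          f"[{ci_low * 10_000:.2f}, {ci_high * 10_000:.2f}]")

    # A stipulated line of practical importance: 2.5 percentage points per
    # 10,000 employees. This number is a value judgment, not a statistic.
    PRACTICAL_THRESHOLD = 2.5
    statistically_significant = not (ci_low <= 0.0 <= ci_high)
    practically_significant = abs(beta[1]) * 10_000 >= PRACTICAL_THRESHOLD
    print(statistically_significant, practically_significant)  # True False

Everything above the threshold line is statistics; the threshold itself is the step McCloskey says economists tend to skip.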

It plays nicely into what Henry and I always return to: this sense that everyone agrees that lots of things happen in the world (having welfare reduces incentives to work; cutting welfare hurts many families) but people disagree about whether these things are important. That is, we agree that things are statistically significant (there is an effect which is highly unlikely to be an artifact of the data), but we disagree on whether the effect is practically significant. So many ways to say the same thing!

All McCloskey is asking is for economists to be more attentive to the size of the effects, rather than just trying to get a statistically significant result. Now you might say: it's the economist's job to figure out everything that has an effect, and the policy maker's job to weigh whether a given effect is large enough to alter or reinforce their views. To which I'm not sure there is a good response.

Regardless, this is exciting, for it's something I've wanted to understand for some time.

*Wooldridge, Introductory Econometrics: A Modern Approach (3rd edition), pp. 142-43.

5 Comments:

Anonymous said...

I think this is not just a problem for economists.

In fact, it's a far, far worse problem at the interface of social science (econ, poli sci, psych, population biology, etc.) and public policy. It is one of the cripplingly dangerous dirty secrets of the way expertise circulates in liberal democratic societies, as an entrepreneurial product bought by policy makers, credulous media, and the public sphere.

McCloskey is absolutely correct that you cannot make an argument about effect size as a statistical or empirical argument: it is necessarily an ethical, philosophical, political argument. There is no way of telling whether a given result is significant without that kind of claim. A very small effect size may be extremely important if it is found in a domain that we commonly regard as critically important for cultural, social or philosophical reasons. A fairly large effect size may be immaterial if it affects something that most people are not concerned by.

We leave individual social scientists to make those claims. Many don't make them at all, but act as if they've been immaculately made somehow in the mere act of running a regression. Others make those claims in the mode of a demagogue, manipulatively, counting on a wider public illiteracy or on forms of moral panic to sell the result and the policy recommendations that come with them.

If I could change anything about economics, political science or psychology as disciplines, it would be to make them *require* "philosophical literacy in making claims of significance" as a basic component of their work, rather than an optional extra which a liberal arts grad might pick up somewhere else.

Tim Burke

8:53 AM  
henry said...

She has a point that we shouldn't really care about small (but significant) effects. I guess I just can't believe that there is a large (and significant) amount of that going on. I suppose she has done the research...yet it just seems reasonable that most people would take a look at the coefficient and ask what the practical effects are.

"Now you might say: it's the economist's job to figure out everything that has an effect, and the policy maker's job to weigh whether a given effect is large enough to alter or reinforce their views. To which I'm not sure there is a good response."

I suppose the response is that if there is not a large practical effect then the answer isn't very interesting. Or the answer should be that there is little or no effect, which may or may not be interesting. People should be working on interesting things, not boring things.

11:57 AM  
Anonymous said...

Let me put it this way. You're an expert or an academic social scientist, let's say. You start on a study of something that might be important. You finish your study many moons later and it's a very small effect size.

Now what? If you happen to be studying something that everyone assumes to have a large effect size, you could certainly play a productively contrarian card, and say that you'd shown the conventional wisdom to be false. Let's suppose instead that there isn't any common assumption, or you don't want to be contrarian. Congratulations! You just wasted a lot of time.

Unless...you can make a big deal out of it because of some pre-existing moral panic or "common sense", or you can manage to make a problem out of thin air. Both are time-honored strategies among practicing experts; we owe some of the worst public policy of the last forty years to just such entrepreneurial activities.

3:13 PM  
Isaac said...

Henry: McCloskey documents in excruciating detail the absence of discussion of practical significance in empirical work published in the AER.

Professor Burke (I like hierarchy, I guess): The point you are making is that, assuming an effect exists, we need some "normative orientation" to decide whether the effect is large enough to alter our views. And how we weigh the importance of the relative sizes of effects is a fundamentally moral question. Is that right? Hence, any policy analysis which slips from "there is a statistically significant effect" to "therefore we ought" has elided an important step? (a question in the valley girl sense). That step being: the effect is of size x, and something of size x matters to us because [invoke a liberal arts education].

11:32 PM  
Anonymous said...

Isaac: exactly. This is McCloskey's point as well, the "secret sin" of economics (which I would argue is shared across the hard social sciences): demonstrating that effect X is statistically significant alchemically becomes another and very different kind of significance in many contexts, as if statistical significance were equivalent to moral or political significance, QED.

One place where I really have noted this is in psychological studies of the effect of "violent" television on children's behavior. Many studies get packaged by their authors as demonstrating an effect which requires various institutional and social responses merely because they've found that there is AN effect, as if a statistical finding is also of necessity a finding of social significance. But when you look at the effect size in many studies, it's minuscule; in fact, the striking thing is almost how rarely watching "violent" children's programming conditions behavior observed immediately following the exposure. There are other ways to critique such studies (they don't generally look at behavior more than ten minutes removed from the exposure to programming; they often control very poorly for whether the children observed have a prior history of antisocial or violent behavior), but I think the point about effect size is sufficiently potent before you even get to those other critiques.

T. Burke

8:30 AM  
