Slog - The Stranger's Blog

Thursday, December 8, 2005

Proof Our Readers Are Smarter Than Us?

Posted on December 8 at 3:30 PM

Speaking of my “Tom Carr bust,” check out the comments on my Tom Carr post from yesterday (which provides links to the whole sordid affair).

Tom Carr and I (and Dominic Holden, and others) are locked in a debate over the statistical significance of certain numbers on Seattle’s marijuana filings. Well, turns out that if you raise the issue of statistical significance on the Slog, you get comments like this:

So, now, if you do a linear regression on the model: 1/(# of filings) = A + B*year + C*(0 if before I-75, 1 if after) + error, you find that it fits the data really well. The model is statistically significant at levels as low as 0.001013% (5% or 1% are usually considered convincing), and it explains around 98% of the variability of the data (adjusted R-squared = 0.9859). Further, the estimated value of “C” is significant at levels as low as 0.012%, and the value of “C” is positive, confirming that years after I-75 are correlated with lower annual filings (or higher 1/filings). The estimated value for “B” is significant at levels as low as 0.62%. Linear regression assumes that the “error” values are independent, identically distributed, normal random variables. Checking the “residual” errors of the fitted model with various plots indicates that these assumptions have not been violated.

Conclusion: At a 5% (or even 1%) significance level, there is a statistically significant correlation between the passage of I-75 and reduced annual marijuana filings. (But remember kids, correlation does not imply causation).
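
For the curious, here is a rough sketch of the kind of fit “student” describes, written in Python with NumPy. The numbers are NOT Seattle's filing data: they are placeholder values generated from the model itself, so the fit comes out perfect by construction. The function name, the ten made-up "years," and the cutoff are all assumptions for illustration, and the significance levels quoted above are not computed here. The year is centered on its mean to keep the arithmetic stable, which changes the estimate of A but not B or C.

```python
import numpy as np

def fit_inverse_filings(years, filings, first_year_after):
    """Least-squares fit of 1/filings = A + B*year + C*after + error.

    `first_year_after` is a hypothetical cutoff: years at or past it
    get the indicator value 1, earlier years get 0.
    """
    years = np.asarray(years, dtype=float)
    y = 1.0 / np.asarray(filings, dtype=float)        # the inverse transform
    after = (years >= first_year_after).astype(float)  # 0 before, 1 after

    # Columns: intercept (A), centered year (B), before/after indicator (C).
    X = np.column_stack([np.ones_like(years), years - years.mean(), after])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Adjusted R-squared: share of variability explained, penalized
    # for the number of fitted coefficients.
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, p = X.shape
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return coef, adj_r2

# Placeholder data built from known coefficients (illustration only).
years = np.arange(1, 11)
after = (years >= 7).astype(float)
inverse = 0.004 + 0.0002 * (years - years.mean()) + 0.003 * after
filings = 1.0 / inverse

coef, adj_r2 = fit_inverse_filings(years, filings, first_year_after=7)
print("A, B, C =", coef)
print("adjusted R-squared =", adj_r2)
```

A positive C here means 1/filings jumps up after the cutoff, which is the same thing as filings dropping. With real, noisy data you would also want the p-values and residual plots “student” mentions, which a statistics package would report.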

Thanks for posting, “student.” And thanks to you too, “j-lon,” for your thoughts. Now, can someone, in a very remedial way, explain in the comments section of this post what the hell “student” just did?


Comments

God Damn. I want to go on a date with that person. They can lecture me, blackboard, erasers and all, anytime they want.

The common meaning of significant is "important," while the meaning of statistical significance is "likely to be true - not due to chance."

The poster is saying that there is a 99.9% chance that the data are not due to chance. That is well above the 95% or 99% limits normally used to determine a statistic's significance. So these numbers would be considered "statistically significant," i.e. the declining number of pot busts has a relationship to the passage of I-75 and cannot be explained by random chance.

Lark Hawk's explanation is good. Another way to think of it is that we have two hypotheses: a null hypothesis that says all of these numbers come from the same distribution, and an experimental hypothesis that says the numbers represent two separate distributions (in this case, distributions before and after I-75). To find that p < 0.05 or 0.01 means that the likelihood of the null hypothesis is very small.
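
One way to make the two-hypothesis idea concrete is a small permutation test, sketched below in Python. This is not what "student" did (that was a regression), and the counts are invented for illustration, not real filing numbers. The idea: if the null hypothesis holds and every year comes from the same distribution, the before/after labels are arbitrary, so we can try every possible relabeling and ask how often the "after" years look at least as low as they actually did. Strictly speaking, the resulting p-value is the probability of a result this extreme if the null hypothesis were true, rather than the probability that the null hypothesis itself is true.

```python
from itertools import combinations

def permutation_p_value(before, after):
    """One-sided exact permutation test on made-up yearly counts.

    Returns the fraction of all possible before/after relabelings in
    which the 'after' group's total is at most the observed total.
    """
    pooled = list(before) + list(after)
    k = len(after)
    observed = sum(after)

    # Every way of picking which k years get the 'after' label.
    splits = list(combinations(range(len(pooled)), k))
    as_extreme = sum(
        1 for idx in splits if sum(pooled[i] for i in idx) <= observed
    )
    return as_extreme / len(splits)

# Hypothetical counts, for illustration only.
before = [400, 380, 410, 390, 370]
after = [200, 180, 210]

p = permutation_p_value(before, after)
print(p)  # 1 of the 56 possible splits, about 0.018
```

With these invented numbers the result clears the 5% bar but not the 1% bar, which shows how much the verdict depends on how many years of data you have.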

To stranger-ize these already good comments:

There is a 99.998987% chance that Carr is full of shit in claiming that the drop in the number of pot-busts after I-75 was due to random year-to-year changes in the number of busts.

All of this Carr discussion assumes that the sole purpose of I-75 was to reduce pot arrests. If that's true, then the initiative seems to have been effective. But one of Carr's statements (from yesterday? I'm too lazy to try to find it again) suggests that he thinks I-75 also had the purpose of freeing up police resources to focus on other issues. If it did have that purpose, you might argue that its effect was insignificant--pot arrests were already a negligible part of police workload before I-75. I don't want to make Carr out to be a hero or anything, and I don't know if this is actually the argument he intended to make; it's just been bugging me that I haven't seen this point of view represented here.

You've already gotten some good explanations. It's important to note that "student's" model also shows a statistically significant decrease in the number of arrests per year.
It's also worth mentioning that to find a significant association between the enactment of the law and the number of arrests, you need to transform the outcome ("student" used 1/n). Without doing this, the only significant predictor of the number of arrests is year (and not the law).

josh
if you don't use the inverse transform, none of the assumptions about the "errors" being normal with constant variance hold, so the lack of significance of I-75 using untransformed data is based on bad assumptions and therefore meaningless.

oops, I take that back. A simple linear model with the year as the predictor does not violate any assumptions. But my model still gives you a bigger R-squared, so it explains more of the observed changes in the data.

My brain hurts.

Awesome! I was going to post something before about what statistical significance means but was too lazy. Glad someone did it.
