Debating Statistics

Statistics… suffice it to say, not my strong suit. But one thing I know about debating them is that they are usually BS. The way I often teach students to think about statistics is that they are generally calculated in the social sciences using a 95% confidence interval- which in shorthand is a way of saying "we are 95% sure this is correct" (obviously not the technical definition).

If we conduct 1,000 studies and publish the results, and 95% of them are likely to be correct, that means roughly 50 studies are probably incorrect. Those studies make up the oddball news items like "drinking soda makes you live longer" and other weird claims you see that have "statistical" support. (Credit to Sklansky on this one.)
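
To make that arithmetic concrete, here is a quick simulation sketch (the per-study sample size and the t-test are my own illustrative choices, not anything from a particular field): run 1,000 studies of an effect that does not exist, test each at the conventional 5% level, and roughly 50 of them come out "significant" by chance alone.

```python
# Illustrative sketch: 1,000 studies of a non-existent effect, tested at the 5% level.
# Sample size and distributions are made-up assumptions for the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group, alpha = 1000, 50, 0.05

false_positives = 0
for _ in range(n_studies):
    # Both groups come from the same distribution, so there is no real effect.
    treatment = rng.normal(0, 1, n_per_group)
    control = rng.normal(0, 1, n_per_group)
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < alpha:
        false_positives += 1

print(f"'Significant' findings out of {n_studies} studies of nothing: {false_positives}")
# Typically prints something close to 50 -- the studies that clear the bar by luck.
```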


Those 50 also probably make up a disproportionately large percentage of statistics used in debate. The reason is obvious- they are crazy, and debaters gravitate towards crazy arguments. As people are taught more and more to rely on peer-reviewed, quality sources, it should be remembered that just because something has some math behind it doesn't necessarily mean it's "true".


In that vein, here is an interesting link:

At long last, regressions were run and… no result. No relationship between price shocks and conflict, even in the most generous scenarios. I shrugged and thought, “Well, so much for that.” My committee said, “Huh, what about that child soldiering project we told you not to do?” And off I went on my career as micro-conflict man.

In the meantime, lots of papers that did see an impact of economic shocks on conflict or instability did get published. The conventional wisdom grew: Rising incomes made the state more attractive to rebels as a prize, and falling incomes made it easier to recruit rebels. No matter that these two ideas ran in apparently opposite directions.

Meanwhile, I met other academics that had run the same regression as me. Famous ones you have probably heard of. Their reaction was the same as mine: “Oh, I found that result,” several said, “but I’m worried there’s nothing there because my data have problems, and the specification wasn’t quite right. So I left it out of the paper. I’ve been meaning to get back to that.”

Let’s follow a simple decision rule: run your regressions with inevitably imperfect data and models. If you get the theoretically predicted result (any of them), publish. If not, wait and look into your data and empirical strategy more.

The result? As in the natural sciences, most published research findings are probably false.
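
One way to see why that decision rule leads somewhere so grim is a back-of-the-envelope calculation; the numbers below are assumptions I am supplying purely for illustration, not figures from the article. If only a small share of the hypotheses being tested are actually true, and only "significant" results get written up, the false positives can outnumber the true ones among published findings.

```python
# Back-of-the-envelope sketch; all three inputs are assumed values for illustration.
prior_true = 0.05  # share of tested hypotheses that are actually true
power = 0.50       # chance a real effect comes out statistically significant
alpha = 0.05       # chance a null effect comes out "significant" anyway

published_true = prior_true * power          # true effects that clear the bar
published_false = (1 - prior_true) * alpha   # null effects that clear the bar
share_false = published_false / (published_true + published_false)

print(f"Share of published 'significant' findings that are false: {share_false:.0%}")
# With these assumptions, about 66% -- and it gets worse if researchers also try
# many specifications and only report the one that crosses the threshold.
```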


15 thoughts on "Debating Statistics"

  1. QuantGoddess

    You're an idiot. Math is real.

    Let's start with the misread of the academic community, mathematics, and natural science. First of all, the "interesting link" cited is from a Yale professor, and at Yale, they don't do quant methodology. They don't go through the actual science, but instead just go through the motions. The University of North Texas's political science program is ranked higher than Yale's. Therefore, the Yale professor is not qualified to judge statistics entirely.

    Next, let's start with the implications of the last line of the article. According to this, gravity probably doesn't exist, and AIDS really doesn't kill that many people, since statistical data points to the high number of people dying as a result of AIDS-caused diseases. The problem with statistics isn't that ALL statistics are wrong. Instead, when debating statistics, one must look at the sample size and the methodology, or how the regressions were run. Faulty methodology would cause incorrect conclusions; that does not mean that statistics are BS. As time continues, scientists go back over the data and improve it.

    Take for example the rational actor model. Bruce Bueno de Mesquita was correct on his overall idea, but two years ago, Dan Lindley produced a paper performing the same models and regressions, but broke the data down by year and disproved the rational actor model, noting the factors that contributed to the shift between rational and irrational acting. Rational actors are assumed to have all the knowledge of what is going on at the time, to be able to make any choice they want, and to be in the right mindset to make the "best" decision. Lindley proved that actors are more irrational than rational because actors do not have total knowledge of what is going on, or there might be factors which prevent states from making the "best" decision for themselves.

    Statistics, as an established field, shows us not what is 100% true and what is 100% false, but the probability of what will happen under the conditions that are tested. Just as a statistician can tell you your chance of winning the lottery, better decisions can be made.

    Also, the article that you used for evidence only talks about how the author couldn't find data; it doesn't say anything about how statistics are BS.

    Math is how we prove things true because it's a common language that we can resort back to since IT NEVER CHANGES, meaning that it's a reliable thing to use for testing. As a result, 2 + 2 is always 4 (for the mathematicians out there, 2 + 2 is always 4 in any mod or base greater than 4). Therefore math is the foundation for logical process. In any graduate book on rhetoric or logic (I allude to the one sitting before me, "Methods of Logic" by W.V. Quine), logic is predicated on mathematics. The book models logical statements mathematically because LOGIC STATEMENTS ARE PREDICATED ON MATHEMATICS. Introductory logic books introduce logic as math. It's not the math that is flawed; it's the data or the methodology used. Therefore, if statistics are wrong, the math itself will indicate the error. It is how we truth test.

    My debate coach, with his graduate degrees in political science, would have helped me explain this better, but after reading your article, he had a seizure and collapsed on the ground. Our team was planning on taking him to the doctor, but according to your article ("in the natural sciences, most published research findings are probably false"), that would be pointless. Therefore going to a witch doctor or a faith healer and going to a doctor with a medical degree are equivalent. Now, while the atheists stand confused, we are dancing and chanting around our coach, since it is a cheaper option.

    1. Scottyp4313nr

      Your debate coach with his graduate degrees should assign you some reading comprehension exercises. Nowhere in anything I wrote did I say math isn't real. It is about how biases affect which research is published. At no point in your string of rambling did you acknowledge this point or refute it.

      Also, not that I really care about U.S. News and World Report-style rankings, but your point about Yale interested me, so I did some research. You are wrong about that as well- http://www.usnewsuniversitydirectory.com/graduate

      1. QuantGoddess

        The argument wasn't about whether or not math was real. It was about how statistical information determines how we understand and know what works and what doesn't. To reject its value in a round is to reject concrete support. Take the CX space topic. Without models based on statistics, we wouldn't be able to run simulations to determine the effectiveness of technology. Even if biases affect which research is published, the math and the sample data, as stated earlier, would reflect the flaw in the study, and the work can therefore still be refuted like any other claim without a solid warrant. They might be misleading, but the hard science is still the way that we prove things work. The math is the concrete way of proving these claims. Your argument about how biases affect which research is published can be cross-applied not only to statistical papers, but to other papers as well. For example, a paper written by a politician could say something different than what a scientist in a US think tank would. Now the question is, when it comes to comparing evidence, what do we look at? The qualifications? My point was that the math would prove and support claims through repetitive testing, or destroy the argument by revealing the holes.

        1. Scottyp4313nr

          Ok, apparently you really just don't get it. So I will try and explain things in a simpler way.

          1. Debaters present statistics in round to support arguments, such as the capitalist peace hypothesis. They then argue that because they have statistics they have presented objective truth that cannot be questioned by other policy or kritik arguments.

          2. The article I linked to, and the article linked within it, are indicting the rigor of common statistics that are presented in journal articles and saying they are not as precise/well vetted as one would think seeing them in a peer-reviewed journal. You clearly did not read either the quote I posted, the article it came from, the other article it linked to, or the source material for that article. You have blown up over one sentence you did not understand the context of.

          3. Most high school students do not present data sets or the statistical analysis done on them in debates. They present a conclusion "capitalism promotes peace"- so refuting the methodology in the round is not really an option.

          4. At no point, anywhere, did I, the articles I cited, or the source material for those articles indict hard science or the use of statistics. They are discussing the MISUSE of statistics.

          1. QuantGoddess

            I understand what you are talking about: how debaters claim that just because they have statistics, any statistics, behind them, they cannot be questioned.

            I did read and understand the article. However, my point was that the article makes such a broad generalization about all statistics, which is not true. Statistics are important to proving things, but peer review and repeated testing have shown which statistics are true and which are false. If a debater wants to get into a statistics debate, they should at least understand where it is coming from.

            As a high school debater, when debating statistics, our team always keeps the statistical analysis done on them. Most researchers, when publishing papers, must also include their methodology. Cutting the final conclusion still requires cutting from the paper, which has the analysis and how it was produced. I agree that just presenting a conclusion is faulty, but a debate about it in the round is still possible.

            It's not the statistics that are bad; it's how debaters run, misuse, or misunderstand them. Running statistics is similar to running any other argument. If the opposing team A doesn't catch the missing warrant or link in an argument run by team B, then team B is granted the validity of the argument. Similarly, with statistical arguments, the same thing would happen. We all know that nuke war isn't a probable option for retaliation, yet teams still run it anyway. If it goes uncontested, then the team gains their impact. Statistical arguments function the same way. Like any other debate, it's the debaters' responsibility to be wary when listening to stats and to force the team running stats to support their argument.

          2. Scottyp4313nr

            It's NOT just how debaters use statistics, it's also how the publication process CHOOSES which statistics to publish.

      2. DrakeSilvor

        Have you taken a statistics course? First of all, when testing hypotheses you do not find a "confidence interval," you perform a test of significance. Usually the threshold is a p-value of .05 or .01, and if that threshold is met, then you have proven statistical significance. Thereby you reject the null hypothesis of no correlation and find a statistically significant relationship. Despite your caveat "obviously not the technical definition," the definition you give your students, "we are 95% sure this is correct," is grossly misleading. Furthermore, most statistical studies go beyond simple hypothesis testing. OLS (ordinary least squares) regressions are typically run hundreds of times for replication, and more complicated studies create models that they then test using Monte Carlo simulations.
        The intent behind your work is not as extreme as to argue math isn't real, but you seek to discredit and cast skepticism onto the application of quantitative methodologies within debate. By doing so you are limiting the potential for meaningful education in areas beyond policy, which is especially harmful because most debaters do not go on to be politicians or policy makers. Any higher-level statistics and proper application of research that you can learn to do in high school is immensely beneficial in college.
        Please do not misinterpret my response as a "string of rambling" that ignores your "points." As a scholar in the area of econometrics and someone who formerly debated in high school, I find that your article and the attached link, which claims "most published research findings are probably false," discredit the life's work of myself and many of my colleagues.

        1. Scottyp4313nr

          Drake,

          I'm sorry you disagree with my fast and loose use of terms. I do not care to quibble with you over the difference between "sure this is correct" and "statistically significant" (and I would have thought my caveat would be sufficient for this)- my point is about how statistics relate to debate, and when they are used in debate the correlation between "capitalism and peace" is represented as "capitalism causes peace, statistics prove". I do not seek to "discredit and cast skepticism onto the application of quantitative methodologies within debate", I have no agenda. I switch sides; you can find other posts I have made defending statistics. My point was to bring to people's attention an argument they can make to refute other teams' uses of statistics. How you could come to any other conclusion based on what I wrote is beyond me (I guess it's from the sentence "But one thing I know about debating them is that they are usually BS", which is probably where you stopped reading; what I meant was "one thing I know from observing debates about statistics is that usually the way they are argued is BS").

          That you find the link to "discredit your life's work" is humorous at best. Had you followed the links or actually, you know, read and understood anything they said, you would not have come to this conclusion.

          1. DrakeSilvor

            The claim that I had not bothered to follow the link is rather funny, because if you had read my response you would see that I quote "most published research findings are probably false," which is from within the link. Obviously I have done my reading.
            I have issues with your "fast and loose use of terms" because anyone who has taken even an elementary statistics course would know better than to equate "sure this is correct" with statistically significant. A major issue within debate is this "fast and loose" attitude of misrepresenting scholarly fields. How many arguments can I fit in an 8-minute constructive? That question is the source of the abuse of statistics. The way your article and link criticize published statistics as "BS" and as overwhelmingly false shifts the blame from poor debate practices onto statisticians.
            Please serve as a good example to your students by teaching only what you know and referring them to others for areas you are uncomfortable in. There is no shame in not knowing enough about statistics to properly test hypotheses or cite the studies. However, allowing the poor debater habit of oversimplifying and misrepresenting to influence the way you teach your students to address methodological debates is wrong.

          2. Scottyp4313nr

            Oy, a younger me would no doubt have gone to town on this reply, but it's clear you are just obtuse. I don't think it's possible for you to make such comments had you read the articles; if you did read them and still don't get it, I will bow out, since experts couldn't convince you.

            I will say this- why don't you write up an explanation of the 95% confidence interval issue I discussed that you find more appropriate, and I will amend the original post with your superior analysis. You can email it to scott@the3nr.com

  2. Stats101guy

    What's even funnier than Scotty's claim is that to prove him wrong you reference a quote that could be found just by reading Scotty's original post, and would not necessitate any outside reading.

  3. WhitWhitmore

    I feel like I'm taking crazy pills. I swear, it's like I'm playing cards with my brother's kids or something.

    The only argument being made is that authors often cherry-pick data and choose to omit contradictory findings from their papers. How is this a controversial or surprising contention? Academics are under enormous pressure to get published. It's key to tenure and money. They have an incentive to make bold findings. An article that reads "I thought maybe the price of rice in China had an impact on conflict, but, upon rigorous statistical investigation, not so much" is not as likely to get published as "ZOMG…LOOK AT THESE NUMBERS!!! Food prices key to global wars." The author Scotty P quoted cites conversations with published authors who openly admit to disregarding findings that would call the statistical validity of their papers into question. Now, you can say that it is a broad over-generalization to claim that this happens all the time (or even often), but that doesn't mean that it doesn't take place. Furthermore, I would argue that it is MORE likely to be the case for debate arguments, which tend to trend towards the more hyperbolic.

    In addition, this is just an ARGUMENT that can be made in a debate round when teams make statistical claims. It is not an attack on science or math as an institution.

  4. DavidKP

    Also…Doesn't everyone know that the attempt of math and science to be completely objective is masculine?

  5. seanbram

    You are definitely right to encourage questioning of statistical evidence in debate. However, I think people may draw the wrong lessons from this post and use the evidence referenced in it as a broad brush to indict all quantitative social science work in debate. Although I don't think you are arguing for this, people may read it the wrong way.
    Debaters should keep in mind that there are disagreements between researchers doing statistical work which can be used for evidence. Although it is sometimes dense to wade through, it is definitely worth it. When I'm judging the cap K, I would rather hear "Gartzke's work on trade and conflict shows an inaccurate relationship because it doesn't adequately control for the fact that conflict decreases trade, and once this is taken into account, the result is that trade causes war" than "disregard statistics because researchers are biased towards hyperbole (most likely in favor of the neg's cherry-picked historical example)".
    Obviously there are more substantive criticisms of quantitative methodologies in social sciences, but I don't think publishing bias is close to the best one for the purposes of debate.

  6. brianrubaie

    Scotty's argument has been the source of quite a bit of recent dispute.

    Some of the folks over at Marginal Revolution agree with Scotty: "…(T)he problem is common to most fields of empirical science. If the sample size is small then statistically significant results must have big effect sizes. Combine this with a publication bias toward statistically significant results, plenty of opportunities to subset the data in various ways and lots of researchers looking at lots of data and the result is diminishing effects with increasing confidence…"
    Source: Tabarrok, 11-29-11 – Alex, Chair in Economics at the Mercatus Center, associate professor of economics at George Mason, research director for The Independent Institute. "Small samples mean statistically significant results should usually be ignored," Marginal Revolution, http://marginalrevolution.com/marginalrevolution/

    The links at the bottom of the post are particularly helpful.
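
A quick sketch of the small-sample point in the Tabarrok quote above, with made-up numbers rather than anything from his post: when the true effect is tiny and samples are small, only the studies that happen to overestimate the effect clear the significance bar, so the estimates that survive to publication are inflated.

```python
# Illustrative sketch of the small-sample / publication-bias point; the effect
# size, sample size, and number of studies are assumptions made for the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n_per_group, n_studies, alpha = 0.1, 20, 5000, 0.05

significant_estimates = []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1, n_per_group)
    control = rng.normal(0, 1, n_per_group)
    estimate = treatment.mean() - control.mean()
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < alpha:  # only the "significant" studies get written up
        significant_estimates.append(estimate)

print(f"True effect: {true_effect}")
print(f"Average estimate among the significant studies: {np.mean(significant_estimates):.2f}")
# The significant subset typically reports an effect several times larger than the truth.
```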
