Brainbox Research Institute Ltd’s Post

Brainbox Research Institute Ltd reposted this

Jason Thatcher

Parent to a College Student | Tandean Rustandy Esteemed Endowed Chair, University of Colorado-Boulder | TUM Ambassador | Professor, Alliance Manchester Business School

On statistical significance and research design.

26 years ago, Carver argued that "Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method." He goes on to suggest that "too often statistical significance covers up an inferior research design" (p. 386).

Truth be told, Carver wasn't wrong - but - many people fail to actually read his work. Why read it? Because to participate in the current conversation about statistical significance and research methods, you need to understand the discourse, where it came from, and why it's important. Paired with the 1993 update, Carver's work is worth reading.

The citation: Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378-399.

The abstract: In recent years the use of traditional statistical methods in educational research has increasingly come under attack. In this article, Ronald P. Carver exposes the fantasies often entertained by researchers about the meaning of statistical significance. The author recommends abandoning all statistical significance testing and suggests other ways of evaluating research results. Carver concludes that we should return to the scientific method of examining data and replicating results rather than relying on statistical significance testing to provide equivalent information.

The link: https://lnkd.in/em39ZanZ

The 1993 citation: Carver, R. P. (1993). The case against statistical significance testing, revisited. The Journal of Experimental Education, 61(4), 287-292.

The abstract: At present, too many research results in education are blatantly described as significant, when they are in fact trivially small and unimportant. There are several things researchers can do to minimize the importance of statistical significance testing and get articles published without using these tests. First, they can insert statistically in front of significant in research reports. Second, results can be interpreted before p values are reported. Third, effect sizes can be reported along with measures of sampling error. Fourth, replication can be built into the design. The touting of insignificant results as significant because they are statistically significant is not likely to change until researchers break the stranglehold that statistical significance testing has on journal editors.

The link: https://lnkd.in/emW8MJku

Aditya Singh, PhD, FRGS

Asstt. Prof. of Geography @ BHU | PhD: School of Health and Care Professions, University of Portsmouth, UK | Previously @IIPS, OPM, PHFI, Girl Centre, PC | Editor @PLOS, BMC, Scientific Reports, Frontiers | Fellow of RGS

2w

Statistical significance and p-values are often overused and misunderstood, even by experienced researchers. Statistical significance and p-values only tell us if an effect exists, not whether it is meaningful or relevant in the real world. A small effect can become “significant” with a large sample size, while important effects can be missed with small samples. Instead of over-relying on p-values and statistical significance, we should focus on effect sizes, confidence intervals, and the practical implications of the results. Science isn’t just about rejecting null hypotheses—it’s about understanding and applying findings in meaningful ways.
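For illustration, here is a minimal simulation sketch of that point, assuming NumPy and SciPy are available (the numbers are made up): with a million observations per group, a true difference of 0.01 standard deviations comes out overwhelmingly "significant" yet is practically negligible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n = 1_000_000                           # one million observations per group
a = rng.normal(0.00, 1.0, n)            # "control" group
b = rng.normal(0.01, 1.0, n)            # "treatment" group: true effect of 0.01 SD

t_stat, p_value = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd

print(f"p-value:   {p_value:.2e}")      # typically far below .05
print(f"Cohen's d: {cohens_d:.3f}")     # about 0.01, practically negligible
```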

Francisco Lamosa

Computer Vision | Image Processing | GPU (Compute) & Embedded Systems | SMIEEE

1w

Statistical significance is a measure of whether something could reasonably be discounted or not. It does not prove anything. If you reject the null hypothesis, then the alternative is plausible, and perhaps even more likely (though it is still only a measure of which hypothesis is more likely, which does not in itself prove anything). However, at that point, you need to demonstrate that a mechanism can explain the observed phenomenon, and even then, you need to keep trying to refute it (repeatedly). If the observed phenomenon is real, then there must be a mechanism to account for it.

Eric B. Weiser, Ph.D.

Professor of Psychology, Research Scientist, Statistician

2w

This topic was a big issue when I was in grad school in the '90s, and at the time ran counter to the deeply ingrained principles of hardcore experimental psychology that had been drilled into me (see: Winer and Keppel stat textbooks), that the ONLY thing that mattered in life was achieving p < .05, at any cost. Fast forward to today, and I couldn't agree more. Traditional hypothesis testing and its fixation on statistical significance is a relic from the past that should be deemphasized. I firmly advocate for effect sizes and CIs as vastly superior to any p-value. If you're talking t-tests, ANOVAs, and the like, count me out. I'm all about meaningful metrics like Cohen's d, Hedges' g, eta squared, omega squared, r squared, coefficient of determination, odds ratios, and beyond.
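As a rough sketch of reporting a few of the metrics mentioned above, assuming NumPy and standard textbook formulas (the example data are invented):

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference using a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def hedges_g(x, y):
    """Cohen's d with the usual small-sample bias correction."""
    nx, ny = len(x), len(y)
    return cohens_d(x, y) * (1 - 3 / (4 * (nx + ny) - 9))

def mean_diff_ci95(x, y):
    """Approximate 95% CI for the difference in means (normal approximation)."""
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return diff - 1.96 * se, diff + 1.96 * se

# Invented example data:
rng = np.random.default_rng(0)
x, y = rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40)
print(cohens_d(x, y), hedges_g(x, y), mean_diff_ci95(x, y))
```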

As a reviewer, I often find myself looking at both statistical significance and effect size. More recently, I have started noticing papers that identify "highly significant" coefficients using sample sizes in the millions, yet the effect sizes are rather small. Sure, it's an interesting statistical effect in a qualitative sense, but how much does it ultimately matter? Then there's the whole mix-up between statistical "non-significance" and "insignificance." That has been a losing battle.
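A quick sketch of that reviewer scenario, assuming NumPy/SciPy and invented data: a regression slope estimated on millions of rows is "highly significant" while explaining almost none of the variance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2_000_000
x = rng.normal(size=n)
y = 0.005 * x + rng.normal(size=n)      # true slope is tiny relative to the noise

fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.4f}, p = {fit.pvalue:.1e}")   # "highly significant"
print(f"R^2   = {fit.rvalue ** 2:.6f}")                   # explains ~0.0025% of variance
```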

The use of statistics gives the impression that a field is scientific, so that people can boast that they are on the Stanford Top 2% Scientist list even though they haven't done a single experiment in their lives. Statistics lends legitimacy to fields that are otherwise looked down upon. Science makes new cases; empiricists collect a large enough sample to make their p values significant. Statistics is useful not for science but for organizational, institutional, and societal reasons.

Faruk Arslan

Associate Professor of Information Systems/Business Analytics at New Mexico State University

2w

Thank you. An important reminder, indeed. The eye-opening work of Ziliak and McCloskey, "The Cult of Statistical Significance," echoes the work of Dr. Carver and elaborates on the negative consequences of relying solely on statistical significance testing.  https://press.umich.edu/Books/T/The-Cult-of-Statistical-Significance2 

Christian Zerfaß, PhD

LC-MS | Environmental | Metabolomics | Chromatography

2w

It's nice to see this being promoted. Statistical methods are powerful tools to structure & prioritise data, but they are only descriptors of the underlying data. Useful enough, they just shouldn't be interpreted in isolation. I always advocate relating statistically significant findings to a mechanistic hypothesis, or using them as a starting point for mechanistic investigation (data prioritisation by statistical methods). Working in 'omics, common datasets hold thousands of variates per measurement, so in each measurement you always have some that look somewhat significant in statistical terms, be they noise or relevant. Closing, I'd like to quote an excellent Nature Methods series on statistics directed at biological scientists (the examples given are mostly biological/medical data), which throughout condenses the principles behind statistical testing into brilliantly clear statements like these two: "Statistics does not tell us whether we are right. It tells us the chances of being wrong." Nature Methods 2013, 10, 809-810. "A P-value measures a sample's compatibility with a hypothesis, not the truth of the hypothesis." Nature Methods 2017, 14, 213-214. Full series: https://www.nature.com/collections/qghhqm/pointsofsignificance
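A small sketch of that 'omics point, assuming NumPy/SciPy and purely simulated noise: test a few thousand variates with no real effects and roughly 5% of them will still come out "significant" at p < .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_samples, n_features = 20, 5_000       # e.g. 20 samples, 5,000 measured variates

group_a = rng.normal(size=(n_samples, n_features))   # pure noise, no real effects
group_b = rng.normal(size=(n_samples, n_features))

_, p = stats.ttest_ind(group_a, group_b, axis=0)     # one t-test per variate
print(f"'significant' at p < .05: {(p < 0.05).sum()} of {n_features}")   # ~250 expected
```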

Julien Lagarde

PhD, HDR, maître de conférence, Univ Pau Adour, coordination dynamics, behavioural neurosciences

2w

Well, my take would be to start by studying and training in probability and statistics proper, estimation too, instead of pressing software buttons and applying recipes. Check your data; at least make a histogram once in a while. But that's useless, too time-consuming and too maths-heavy for a large part of the scientific community. Let's keep pressing buttons and studying stats with black-box software. Replace the frequentist buttons with Bayesian ones and you've made a big deal of it. Sorry, just aiming to be a little ironic.
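In that spirit, a trivial sketch (NumPy and Matplotlib assumed, invented data) of looking at the data before reaching for any test:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # skewed, clearly non-normal

plt.hist(data, bins=40)                               # eyeball the distribution first
plt.xlabel("measured value")
plt.ylabel("count")
plt.title("Check the data before testing")
plt.show()
```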

Antony Andrews, PhD

Assistant Professor of Economics

2w

Jason Thatcher, you're poking the bear here, and honestly, I'm all for it; sometimes you've got to stir things up to see what shakes loose (though the paper mill mafia might not be too thrilled). Anyway, here's my two cents: statistical significance testing (using the p-value) estimates the probability of observing a relationship, pattern, or effect at least as large as the one in your sample purely by chance, assuming no real relationship or effect exists in the population (i.e., the null hypothesis is true). Shoutout to Prof. Andrew Gelman for finally helping me wrap my head around this!
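A tiny simulation sketch of that definition, assuming NumPy and entirely hypothetical numbers: the p-value is approximated by the share of null-hypothesis experiments whose effect is at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(1)
observed_diff = 0.6          # hypothetical observed difference in group means
n = 30                       # hypothetical sample size per group
n_sims = 100_000

# Simulate many experiments in which the null hypothesis is true (no real effect).
null_diffs = (rng.normal(0, 1, (n_sims, n)).mean(axis=1)
              - rng.normal(0, 1, (n_sims, n)).mean(axis=1))

# Two-sided p-value: share of null experiments at least as extreme as what we saw.
p_sim = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"simulated p-value: {p_sim:.3f}")   # roughly 0.02 for these numbers
```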

Darren W.

Senior Specialist | KPMG Fed Lighthouse | AI, Analytics, & Engineering **This is my personal account. Opinions expressed are my own.**

2w

Michael N. Liebman Have you seen this article before?
