Google Statisticians: Looking Beyond P-Values

There’s a recent term called “Google Statisticians.” No, they are not statisticians who work for Google; they are people who do statistical analyses by googling words like “how to do significance testing” or “how to calculate p.”

As biostatistician Jeff Leek pointed out, most analyses are no longer performed by statisticians, as data are now abundant and cheap to collect. Long gone are the days of door-to-door surveys, and phone surveys are almost a thing of the past. Online surveys are everywhere, thanks to platforms like LimeSurvey and the powerful Google Consumer Surveys that make it easy to collect and analyze survey data. Log file data is free and overwhelming in size. There’s even software geared towards non-statisticians that automates statistical analyses.

I’m not advocating that statisticians should monopolize statistical analyses. Let’s face it: statistical analysis is not surgery, and no one’s going to die because you did the analysis incorrectly (unless you’re analyzing data from clinical trials for a new drug). But I believe people need to educate themselves in some basic statistics if they are to run statistical analyses or evaluate conclusions drawn from them, and googling “how to calculate p” is not going to cut it.

Statistics is more than just looking at p-values. Some questions to ask include: What data was analyzed to calculate the p-value? Does it relate to the research question? Is the data a reliable and valid measure of whatever it is you want to measure? What statistical test was conducted to produce this p-value? What is in place to control for error, such as false positives?

For instance, suppose you want to optimize your website by running A/B experiments, and you compare the control vs. the treatment on 20 Key Performance Indicators (KPIs), e.g. clicks per user. You should know that you will probably obtain at least one “significant” result (i.e. a p-value less than .05, the threshold that corresponds to a 95% confidence level) even when the treatment had no impact: if the 20 tests were independent, the chance of at least one false positive would be 1 - 0.95^20, or about 64%. If you’re running an A/B/C/D/E experiment with 1 control and 4 treatments, and you compare all 5 groups against one another on the 20 KPIs, that is 10 pairwise comparisons times 20 KPIs, or 200 tests, so you can expect about 10 significant results on average even when none of the treatments had any impact on the KPIs. Just looking at p-values is not enough; you need procedures in place, such as a correction for multiple comparisons, to ensure that p-values are calculated and interpreted accurately.
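To make the arithmetic behind the A/B and A/B/C/D/E scenarios concrete, here is a minimal Python sketch. The function names are made up for illustration, and the family-wise calculation assumes the tests are independent, which real KPIs usually are not (clicks and sessions per user tend to move together), so treat the first number as a rough guide rather than an exact probability. The Bonferroni threshold is shown only as one common, conservative correction, not as the recommended procedure for every experiment.

```python
from math import comb

ALPHA = 0.05  # significance threshold corresponding to a 95% confidence level


def familywise_error_rate(num_tests, alpha=ALPHA):
    """Chance of at least one false positive when no effect exists.

    Assumes the tests are independent (an idealization).
    """
    return 1 - (1 - alpha) ** num_tests


def expected_false_positives(num_tests, alpha=ALPHA):
    """Average number of false positives when no effect exists.

    Holds whether or not the tests are independent.
    """
    return num_tests * alpha


def bonferroni_threshold(num_tests, alpha=ALPHA):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


# A/B experiment: 1 comparison (control vs. treatment) on 20 KPIs.
ab_tests = 1 * 20

# A/B/C/D/E experiment: every pair among 5 groups, on 20 KPIs.
abcde_tests = comb(5, 2) * 20

for label, n in [("A/B", ab_tests), ("A/B/C/D/E", abcde_tests)]:
    print(f"{label}: {n} tests")
    print(f"  chance of at least one false positive: {familywise_error_rate(n):.1%}")
    print(f"  expected false positives: {expected_false_positives(n):.1f}")
    print(f"  Bonferroni per-test threshold: {bonferroni_threshold(n):.5f}")
```

With 200 tests, the Bonferroni threshold drops to .00025 per test, which shows how much stricter each individual comparison has to be once you account for how many of them you are running.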

The danger of being a “Google Statistician” can be summed up in a quote often attributed to Mark Twain: “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”