There’s a recent term called “Google Statisticians.” No, they are not statisticians who work for Google; they are people who do statistical analyses by googling words like “how to do significance testing” or “how to calculate p.”
As biostatistician Jeff Leek pointed out, most analyses are no longer performed by statisticians, as data are now abundant and cheap to collect. Long gone are the days of door-to-door surveys, and phone surveys are almost a thing of the past. Online surveys are everywhere thanks to platforms like LimeSurvey and the powerful Google Consumer Surveys, which make it easy to collect and analyze survey data. Log file data is free and overwhelming in size. There’s even software geared towards non-statisticians that automates statistical analyses.
My main recommendation: include in your analysis only the users impacted by the change, and exclude those who are not.
- Let’s say you have an e-commerce site. You want to test whether certain changes to your checkout page would increase conversion (% of users purchasing).
- You want to run a 2 x 2 Multi-Variable experiment with 1 control and 3 treatment groups.
- Your current conversion rate is 5%; you want to detect relative changes in conversion as small as 10%, that is, from 5% to 5.5% (with the conventional 80% probability of detection and a 95% confidence level).
- According to this table in my blog post, you would need 30,400 users in each group, or 30,400 x 4 = 121,600 users in total visiting your site. (That’s a lot!)
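If you want to sanity-check the numbers in this example, here is a minimal sketch. It assumes the common rule of thumb n ≈ 16 × p(1 − p) / δ² per group, where 16 is a rounded-up 2 × (1.96 + 0.84)² for 80% power at a two-sided 95% confidence level. The table itself isn't reproduced here, so treating it as built on this rule is my assumption for illustration, though it does give the same 30,400 figure. The function name `users_per_group` and its parameters are made up for this sketch.

```python
def users_per_group(baseline_rate, relative_change):
    """Approximate users needed per group (80% power, 95% confidence)."""
    # Absolute difference to detect, e.g. 5% * 10% = 0.5 percentage points.
    delta = baseline_rate * relative_change
    # Variance of a single converted / not-converted outcome.
    variance = baseline_rate * (1 - baseline_rate)
    # Rule of thumb (an assumption, see above): n ~ 16 * variance / delta^2.
    return round(16 * variance / delta ** 2)


per_group = users_per_group(baseline_rate=0.05, relative_change=0.10)
total = per_group * 4  # 1 control + 3 treatment groups

print(per_group)  # 30400
print(total)      # 121600
```

Because the required sample grows with 1 / δ², halving the change you want to detect roughly quadruples the number of users you need.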
To calculate how many people you need in your experiment, you need to know 3 things:
1. How many groups are in your experiment?
- In an A/B experiment with a control and treatment group, you have 2 groups.
- In a 2 x 2 Multi-Variable experiment with 1 control and 3 treatment groups, you have 4 groups.
- The more groups you have, the more people you need.