I’ve noticed that many A/B testing tools look at a static snapshot of test results – e.g., seven days after the test launched, what’s the difference between the average of metric X in the test group versus the control group?
Depending on the type of experiment you’ve designed, I’d recommend taking a look at the experiment results over time:
- There could be major novelty effects. I’ve seen many tests where we changed an icon or the color of a button, naively pronounced “People are doing this 2x more!” on the first day, and then realized the effect was close to zero on the following days.
- There could be early adopter effects. If you are experimenting on mobile apps, your experiments often ship only in the latest release of the app. So the users experiencing the test are not only daily active users (who are likely more engaged than weekly or monthly actives), but also users engaged enough to update their apps promptly and frequently.
- People learn new behavior. In cases where we’ve made drastic changes to the core experience, we’ve sometimes seen major hits to every metric that proxies for user experience – e.g., user retention, reviews, revenue. But with changes that shake up the whole app, looking at the results over time lets you tell whether you’re seeing a one-time step change in how people use the app, a continuous negative spiral, or an experience people adjust to over time.
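To make the idea concrete, here’s a minimal sketch of what “results over time” might look like. The data shapes and numbers are hypothetical (per-user binary outcomes keyed by day since launch, as a stand-in for whatever your experiment pipeline produces); the day-1 spike followed by flat days mimics a novelty effect like the icon-change example above.

```python
from statistics import mean

# Hypothetical per-user daily outcomes (e.g., did the user tap the new icon),
# keyed by day since the experiment launched.
control = {1: [0, 1, 0, 0, 1, 0], 2: [0, 0, 1, 0, 0, 1], 3: [1, 0, 0, 1, 0, 0]}
test    = {1: [1, 1, 1, 0, 1, 1], 2: [0, 1, 0, 1, 0, 0], 3: [0, 0, 1, 0, 1, 0]}

def daily_lift(test_by_day, control_by_day):
    """Relative lift of the test group's mean over control, computed per day."""
    lifts = {}
    for day in sorted(control_by_day):
        c = mean(control_by_day[day])
        t = mean(test_by_day[day])
        lifts[day] = (t - c) / c
    return lifts

for day, lift in daily_lift(test, control).items():
    print(f"day {day}: {lift:+.0%}")
```

A single snapshot averaged over all three days would report a healthy positive lift, while the per-day view shows the effect vanishing after day 1 – exactly the pattern a static report hides.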
I’m all for easy A/B testing tools with clear visual snapshots, but let’s be mindful that the data can hold more of the story than a single snapshot shows.