At Jana, we run a lot of A/B tests on our app mCent. We use experiments to answer questions like: which call-to-action is clearest to users who want to start messaging, or which registration flow is easiest for users to complete? We run many of these experiments for several days, sometimes a few weeks, choose the winning variant, and ship it. In a few cases, though, we decide to keep a small holdout group for a much longer period of time to understand the long-term effects of a feature on engagement.
Last month, we built the ability to show notification badges on our app’s icon on Android. We saw that this increased DAU and the frequency with which users came back to our app. While we were excited about that, we were concerned that over time, this would become an annoying experience for users – would they eventually ignore the badges, or turn off push, or worse, uninstall the app entirely?
To address this, we set aside 1% of our users who would never see the badge and compared them to the 99% of our users who now experience this feature. Through this holdout group, we were able to verify that across different cohorts of users who joined our app, those with the badge continued to come back to mCent more often and kept push notifications on.
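The key property of a split like this is that it's deterministic: a user assigned to the holdout must stay there for the life of the experiment. Here's a minimal sketch of one common way to do that, hashing a user id together with an experiment name into a stable bucket. The function names and the `icon_badge` experiment name are illustrative, not Jana's actual system:

```python
import hashlib

HOLDOUT_PERCENT = 1  # hold out 1% of users from the feature


def in_holdout(user_id: str, experiment: str = "icon_badge") -> bool:
    """Deterministically assign a user to the holdout group.

    Hashing the user id with the experiment name yields a stable
    bucket in [0, 100), so a user lands in the same group on every
    session and every app launch.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < HOLDOUT_PERCENT


def should_show_badge(user_id: str) -> bool:
    # Users in the holdout never see the badge; everyone else does.
    return not in_holdout(user_id)
```

Including the experiment name in the hash means the same user can fall into different buckets for different experiments, so one long-running holdout doesn't permanently bias that user's experience of every other feature.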
The 99%/1% split isn’t the only way to do it. I’ve seen teams at other tech companies hold out entire small countries from features that depend on strong network effects – messaging, for example. I’ve also seen teams hold out 1% of their users from all the features they build in a quarter, to measure the combined impact of those changes over time. Whatever approach you choose, a holdout will help you understand the value your feature is bringing to your users.