At Jana one of our core values is “Measure It” — turn the abstract into the quantitative, so that we aren’t guessing what works and what doesn’t, so we aren’t just relying on the HIghest Paid Person’s Opinion. On the whole, I doubt anybody, including myself, would debate the wisdom of this. But what is “it” that we are measuring? Are there times when whatever we’re measuring can be wrong, misguided, or just a red herring? How do we know we’re measuring the right things?
Even with such a quantitative company culture (a great thing) we still fall in the following trap at times. We start out with a statistic that the industry believes to be an indicator of success or failure. So for example, mobile app developers care about retention of its user base, because retention is a logical measurement of audience engagement. To make that metric more tractable, it gets simplified — we make up a heuristic that should do a good job representing retention overall — for example, how many users still have the app 7 (or any number X) days after they installed it?
Even at this point, we’re all still okay: the client, who decided this was a good quick way to measure things in the first place (plus, “everybody measures it this way now!”), feels (s)he has a good proxy. But then things become like a game of telephone, and the original purpose of having that quick, relatively easy-to-calculate stat gets lost, after (let’s say) an account manager tells a product manager, and a product manager tells a few engineers…. and pretty soon 7-day retention accidentally becomes a goal in and of itself.
We all start building tools — analytics to monitor 7-day retention (harmless so far) … campaigns to incentivize users to come back to an app repeatedly (still okay…), an internal KPI to maximize day 7 retention (here’s where it goes wrong), and finally the worst offender — campaigns that incentivize users to come back on day 7. We forgot the original purpose of why that was a good measurement.
7-day retention was (was…) a good KPI partially because (a) someone, at some point, found it to correlate with whatever they really care about (engagement) and (b) probably also unfortunately because it just became trendy.
Still all seems good and well until the inevitable happens: the industry, fickle as it can be, settles on a new KPI of the day. (Just google ‘mobile app success measures’, see how many articles and different metrics you’ll stumble on.) “Well we found that 7-day retention doesn’t matter as much, it’s whether users stream at least 10 songs, that’s what does the trick!” Account managers scramble to find an answer, product managers have to wrestle with re-prioritizing, analysts and engineers start building new code … and we’re already behind.
The problem is the metric (whatever metric) accidentally became the goal and not a measure of the goal.
In season 3 of The Wire, the Baltimore police leadership demands a 5% decrease in felony rates and a murder cap of “only” 275 bodies. Bunny Colvin, the resident skeptic, responds “Deputy, as familiar as we all are with the urban crime environment, I think we all understand there’s certain processes, by which you can reduce the number of overall felonies. You can re-classify an agg assault or you can unfound a robbery. But, how do you make a body disappear?”. I.e., messing with numbers to hit a KPI, instead of using to KPI just to monitor progress on a real initiative.
In season 4, the theme carries forward into “curriculum alignment”, where the city’s school system teaches questions from the standardized test directly — foregoing teaching the underlying skills that the test was originally created to measure. The lanky character Pryzbylewski, former cop turned teacher, comments: “I don’t get it. All this so we score higher on the state tests? If we’re teaching the kids the test questions, what is it assessing in them? … Juking the stats … Making robberies into larcenies. Making rapes disappear. You juke the stats, and majors become colonels. I’ve been here before.”
(I probably have developed a bad habit of taking any life situation and saying, “This is just like in the Wire when ” and everyone, to their credit, stops paying attention to me.)
In both of these examples the metric trumps the real goal.
I don’t think the problem is quite as grandiose as one of the running themes of The Wire — and definitely in the show there’s an air of conspiracy that isn’t really present in our case. But the point of the analogies is just in showing general tendency for people to misuse metrics and statistics — and it’s usually not something we do consciously. In mobile app performance measurement we ultimately need to get better at staying cognizant of the real goal and remembering that the metric is just a proxy for it — in other words, avoiding making the means the end, and try to never let the metric wander too far on its own.
In a future post I’ll talk about some ways of thinking about solutions to this, probably with some more borderline analogies.