One of our core values here at Jana is “Measure It”. This has been a really great value to have—because it encourages experimentation, and lets us know when we’re killing it! We have enough traffic and enough ideas bouncing around the office that we needed to implement a good, simple, and extensible method to measure things on the fly.
We use Apache Kafka to pass around our analytics data and HDFS (the Apache Hadoop File System) to capture and store it. We have dozens of Hadoop jobs that aggregate the data into various tables and create reports. This is great for specialized data—data we know we want and have decided to put some thought and time into. But the process is a little overkill for a quick “hey I wonder if” type of question. For that, we have the counter table. This table takes a few standard keys (a tracking id, the number of times the event occurred, etc) but it also has 6 flexible keys—these are anything-goes string fields.
As an example: part of my job is ensuring that we’re giving our members money quickly and reliably. We’ve noticed lately that sometimes transactions are failing because we’re trying to top-up members up on the wrong operator. We have a sneaking suspicion that this is happening because members are changing their operator and trying with ones they know are wrong.
To test the hypothesis I put in counters that “fire” (get recorded) at the end of each transaction:
tracking_client.count( tracking_id=member.tracking_id, key1=’airtime_topup_operator_mismatch’, key2=’member_changed_operator’ if matched else ‘member_didnt_change_operator’ key3=status # (‘succeeded’ or ‘failed’), key4=operator_name, key5=predicted_operator_name, count=1 )
Then, after waiting a day or two to collect some data, I can run the following sql query on the counters table, no custom hadoop jobs required!
SELECT key2 as 'changed', key3 as 'status', sum(count) from counter WHERE key1='operator_mismatch_counter' GROUP BY key2, key3
and with a little math, I get:
Showing that members did succeed more when using the pre-predicted operator. Additionally, the percent of recalled transactions is almost double for members that changed their op. This supports our hypothesis that members attempt to topup with multiple operators, recalling transactions that don’t succeed quickly, including ones that they know our incorrect.
We’re in the process of migrating from MySQL to Amazon RedShift, but the idea will remain the same.
If any of these things sound interesting, we’re hiring 🙂