Restraint in Analytics
One of the reasons I joined Jana a year ago was access to a massive database on how people use smartphones all over the world. The research behind nearly everything we “know” about human behavior comes from studies performed on college students in the United States and Europe (myself included). Though most of us are perfectly nice, we are WEIRD people: Western, Educated, Industrialized, Rich, and Democratic. And we just don’t know as much about people who aren’t WEIRD. Mobile phones promise to change this. Equipped with a variety of sensors and used for more and more things every day, they collect data that provides an incredible window into the lives of their users. The academic in me was (and still is) overflowing with questions that could be answered given the ability to measure and analyze data on millions of people in understudied populations.
As the first data engineer at Jana, I discovered that one of the most important skills I needed to develop was restraint.
With all the buzz and hype surrounding “Big Data” and “Data Science”, it’s easy to be swept up in the endless stream of cool blog posts and GitHub projects that fly across social media. When you have a massive, novel data set at your fingertips, you naturally want to emulate the popularity of blogs like FlowingData and OkCupid Data Blog. But in my first few weeks, it became clear that blog posts were not what Jana needed. We had done a great job of producing valuable data, but we were growing so fast that there was hardly any time to look at it. We had dashboards and an A/B testing framework to track important metrics, but we lacked a fundamental understanding of how those metrics causally related to each other. We needed to know where our levers were and what numbers they moved. We needed a process and tools to make finding levers easier.
This type of work isn’t always sexy. It doesn’t involve deep learning or GPU-accelerated clustering algorithms (although we love both) and it doesn’t produce viral blog posts. Instead, we are writing SQL queries that product managers and salespeople can use. We are building domain knowledge on how our business works so we can build a simple causal model. Honestly, it involves a lot of spreadsheets.
Over this past year, I’ve come across many data sets and thought about a neat study we could do, or a new algorithm we could apply. But I realized these things aren’t high leverage for a company at our stage. Restraint is not about sucking all the fun out of everything. Restraint is about looking at where your company is and what it needs to succeed, and prioritizing those things first. I see evidence of this in most successful companies. It took Google over 10 years of tedious backend work before they started building personalization into Google Maps. Spotify’s Discover Weekly playlist, perhaps my favorite data-powered product feature of all time, came 9 years after the company’s inception. Even Facebook is still struggling to get its News Feed algorithm right. Different analytics projects are needed for companies and products at different stages of maturity.
Whenever I have an idea, I write it down. After a year at Jana, and now with a team of four awesome data engineers, I get to dig into that idea backlog. We have been building out our personalization, targeting, and recommendation pipelines. We are also getting more sophisticated about the algorithms and tools we use. I still spend more time in spreadsheets than in Hadoop or Spark, but I’m comfortable knowing that what I am doing is highly valuable and that when we do get to the heavier tools, we’ll have a great foundation to support us.
If you’re interested in the foundation we’re building, check out our careers page!