
Python and Non-uniform Random Distributions

I ran across this awesome site, emojitracker, the other day. If you love data as much as I do, you might recognize (or have guessed) that emoji usage probably follows a Pareto or Zipf distribution. I downloaded a snapshot of the data, and it's a reasonable match:


A semilog plot of the frequencies of the 500 most common emoji on emojitracker (blue line), alongside several Pareto distributions with different values of alpha and scale (red, green, and purple lines).

These non-uniform distributions show up in all kinds of applications. In our work, for example, a Zipf distribution is a fairly good model for how activity is spread across countries, across individual advertising campaigns, and even across users (a small set of very active users, followed by a long tail of less active ones). Knowing this helps simulations mirror real-world behavior more closely; some of our integration tests take advantage of it.
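For reference, here are the textbook forms of the two distributions, in standard notation (the symbols N, k, s, and x_m are conventional names, not taken from the post). A finite Zipf distribution over N items ranked k = 1, …, N, with exponent s, assigns

$$
P(k) = \frac{k^{-s}}{\sum_{n=1}^{N} n^{-s}}, \qquad k = 1, \dots, N,
$$

while a Pareto distribution with shape α and scale x_m has density

$$
f(x) = \frac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}}, \qquad x \ge x_m.
$$

For example, with N = 3 and s = 1 the normalizer is 1 + 1/2 + 1/3 = 11/6, so the three items are drawn with probabilities 6/11, 3/11, and 2/11. Python's random.paretovariate(alpha) samples the x_m = 1 case.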


You may not know that Python has some non-uniform random distribution functions built in. Scroll down to the bottom of the random module's documentation and you'll find plenty that are useful for simulations, such as gauss, expovariate, and even paretovariate!
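Here is a quick sketch of calling the three functions named above. The seed, parameter values, and sample size are arbitrary choices for illustration; the largest of many Pareto draws gives a feel for how long that distribution's tail is.

```python
import random

# Seed the module-level generator so repeated runs give the same numbers.
random.seed(2015)

# Normal distribution: mu (mean) = 0.0, sigma (standard deviation) = 1.0.
gaussian_value = random.gauss(0.0, 1.0)

# Exponential distribution: the argument is lambd = 1 / mean, so this has mean 5.0.
exponential_value = random.expovariate(1 / 5.0)

# Pareto distribution with shape alpha = 1.5: results are >= 1.0, with no upper bound.
pareto_value = random.paretovariate(1.5)

# Drawing many Pareto samples shows how far the long tail can reach.
pareto_samples = [random.paretovariate(1.5) for _ in range(100_000)]

print(f"gauss: {gaussian_value:.3f}")
print(f"expovariate: {exponential_value:.3f}")
print(f"paretovariate: {pareto_value:.3f}")
print(f"smallest of 100,000 Pareto draws: {min(pareto_samples):.3f}")
print(f"largest of 100,000 Pareto draws: {max(pareto_samples):.1f}")
```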

In practice, though, you may find paretovariate awkward to use, because it returns a real number of at least 1 with no upper bound. For simulations like the ones above, where there is a finite set of, say, emoji to choose from, you really want a finite Zipf distribution. Fortunately, it's pretty easy to implement.
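One way to implement it, as a minimal sketch (not necessarily the post's own code): assume the items are already sorted from most to least common, give rank k a weight of 1/k^s, build the running total of those weights, draw a uniform number below the total, and binary-search for it with bisect. The name zipf_choice and its s and rng parameters are illustrative choices.

```python
import bisect
import itertools
import random


def zipf_choice(items, s=1.0, rng=random):
    """Return one element of `items`, assumed sorted from most to least common.

    The element at rank k (1-based) is chosen with probability proportional
    to 1 / k**s, i.e. a finite Zipf distribution over len(items) choices.
    """
    if not items:
        raise ValueError("items must be non-empty")
    # Unnormalized Zipf weights for ranks 1..N.
    weights = [1.0 / rank ** s for rank in range(1, len(items) + 1)]
    # cumulative[i] is the total weight of ranks 1..i+1; the last entry is the sum.
    cumulative = list(itertools.accumulate(weights))
    # Pick a uniform point in [0, total) and locate the first bucket past it.
    point = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, point)
    # Guard against floating-point rounding pushing the index past the end.
    return items[min(index, len(items) - 1)]


if __name__ == "__main__":
    # Hypothetical ranking, most frequent first.
    ranked_emoji = ["😂", "😍", "😭", "😊", "😒"]
    print([zipf_choice(ranked_emoji) for _ in range(10)])
```

Note that this recomputes the cumulative weights on every call, which costs O(N) per draw even though the binary search itself is only O(log N). Since Python 3.6, random.choices(items, weights=weights) can also do the weighted pick for you.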

Altering the method to cache the cumulative weights and to accept optional shape parameters (α and s) is left as an exercise for the reader ❤
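One possible take on that exercise, as a sketch: compute the cumulative weights once in a constructor and hand them to random.Random.choices through its cum_weights argument. This version exposes only the exponent s; how α would enter isn't spelled out above, so it is omitted. The class name ZipfSampler and its parameters are hypothetical.

```python
import itertools
import random


class ZipfSampler:
    """Hypothetical helper: finite Zipf sampling with cached cumulative weights.

    `items` should be sorted from most to least common; rank k (1-based)
    gets weight 1 / k**s.
    """

    def __init__(self, items, s=1.0, seed=None):
        if not items:
            raise ValueError("items must be non-empty")
        self.items = list(items)
        self.s = s
        # Computed once here, instead of on every draw.
        self.cum_weights = list(
            itertools.accumulate(
                1.0 / rank ** s for rank in range(1, len(self.items) + 1)
            )
        )
        # A private generator, so seeding here doesn't affect the global one.
        self._rng = random.Random(seed)

    def sample(self, k=1):
        """Return a list of k independent draws."""
        # random.Random.choices accepts precomputed cumulative weights directly.
        return self._rng.choices(self.items, cum_weights=self.cum_weights, k=k)


if __name__ == "__main__":
    sampler = ZipfSampler(["😂", "😍", "😭", "😊", "😒"], s=1.2, seed=42)
    print(sampler.sample(k=10))
```

Each draw now costs only the O(log N) search that choices performs over the cached totals, so generating many samples in a simulation stays cheap.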

Also, we’re hiring!
