We send hundreds of thousands of SMS messages at Jana daily. They’re a primary driver of our registration process and a mission-critical part of how we grow our application’s user base. As a member of the vendors engineering team, one of my primary responsibilities is ensuring timely and reliable delivery of SMS messages to our users. Here’s how that works.
When a user signs up for mCent (our app discovery platform that helps make the Internet free), they supply a phone number, which we use for the rest of the registration process. After all the typical member creation steps, we send an SMS to confirm that this is a live phone number with a real user at the other end.
First, we have to determine which mobile operator the user is on. To do this, we perform an HLR lookup. Once we have their operator, we need to figure out how to reliably send the message. Jana is integrated with many mobile partners capable of sending SMS messages to operators around the world. We do this for breadth of coverage and depth of redundancy. Coverage means having a mobile partner whose infrastructure is capable of reaching the user, which is a tall order in some regions of the world. Redundancy means having more than one partner capable of successfully delivering an SMS message. We strive for both, and in most cases we have at least two partners able to fulfill an SMS request for a given user.
Once we know the set of partners that are theoretically capable of delivering an SMS message, how do we decide which one should deliver it to our new user? Statistics! More concretely, we use historical deliverability data to decide who is most likely to get the SMS to the user’s device. Our algorithm looks like this:
- Loop over the potential partners capable of delivering an SMS message to this user’s mobile operator. For the trailing N hours, compute the ratio of users who successfully confirmed their membership after receiving a message sent via this partner to the total number of messages sent via this partner. Call this the “deliverability”.
- Compute a 95% confidence interval that captures the uncertainty of the deliverability. We assume a binomial distribution and compute the interval with the corresponding binomial proportion formula.
- Filter out any partners whose deliverability upper bound is lower than the best partner’s lower bound. That is, filter out any partner that is having a bad time and is unlikely to deliver a message right now.
- Sort the partners that remain after filtering by their estimated deliverability rate and return a random selection from that list as the first partner to try.
- For subsequent send attempts (e.g., when there’s a bug or failure during an API call, or when we receive a firm negative response from a partner’s API) we again determine the set of potential partners using the just-described approach, excluding any that have already been attempted.
- Only once we’ve exhausted all of the “good” partners do we fall back to the partners that were filtered out.
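
The selection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not our production code: the names `PartnerStats`, `deliverability_interval`, `split_partners`, and `choose_partner` are hypothetical, and the interval uses the normal approximation to the binomial, which is one common choice and may differ from the formula we actually use. The sketch also assumes the filter is computed over all candidate partners before already-attempted ones are excluded.

```python
import math
import random
from dataclasses import dataclass

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


@dataclass(frozen=True)
class PartnerStats:
    name: str       # hypothetical partner identifier
    sent: int       # messages sent via this partner in the trailing window
    confirmed: int  # users who confirmed after a message from this partner


def deliverability_interval(stats, z=Z_95):
    """Return (rate, lower, upper) via the normal approximation (an assumption)."""
    if stats.sent == 0:
        # No data yet: treat as maximally uncertain so it is never filtered out.
        return 0.0, 0.0, 1.0
    rate = stats.confirmed / stats.sent
    half_width = z * math.sqrt(rate * (1.0 - rate) / stats.sent)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


def split_partners(partners):
    """Split into (good, filtered_out), each sorted by rate, best first."""
    if not partners:
        return [], []
    scored = [(p, *deliverability_interval(p)) for p in partners]
    # Lower bound of the partner with the highest estimated rate.
    best_lower = max(scored, key=lambda s: s[1])[2]
    ranked = sorted(scored, key=lambda s: s[1], reverse=True)
    good = [p for p, _, _, upper in ranked if upper >= best_lower]
    filtered_out = [p for p, _, _, upper in ranked if upper < best_lower]
    return good, filtered_out


def choose_partner(partners, attempted=frozenset(), rng=random):
    """Pick the next partner to try, or None if every partner was attempted."""
    good, filtered_out = split_partners(partners)
    good = [p for p in good if p.name not in attempted]
    if good:
        return rng.choice(good)
    # Fall back to filtered-out partners only when no "good" ones remain.
    fallback = [p for p in filtered_out if p.name not in attempted]
    return rng.choice(fallback) if fallback else None


partners = [
    PartnerStats("partner_a", sent=1000, confirmed=900),
    PartnerStats("partner_b", sent=1000, confirmed=890),
    PartnerStats("partner_c", sent=1000, confirmed=500),
]
first = choose_partner(partners)
retry = choose_partner(partners, attempted={first.name})
```

With these made-up numbers, `partner_a` and `partner_b` have overlapping intervals and are both eligible for the first attempt, while `partner_c` is only tried after both of them have failed. Choosing at random among the eligible partners keeps traffic flowing to each of them, so their statistics stay fresh.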
In practice, we find that this is a good approach for ensuring deliverability of these key messages: it’s a system that automatically chooses the likely best route to a user. We are able to quickly and automatically adapt to partners having operational issues with zero engineering intervention. Equally important, it automatically switches back to those partners once they have recovered.
Interested in delivering free Internet to the next billion quickly and reliably? Come work with us!