In my last post I talked about how I use curl to help me integrate with new vendors. While that's always a fun challenge, the reason I think my team is so kick-ass (well, other than that it's made up of incredibly talented and fun people) is that we manage the core infrastructure those vendors interact with. We all know that the internet is fallible. Everyone has a bad day, from the ISP to the vendor to the mobile carrier. Just because things go down doesn't mean we don't still love them. We've got a lot of moving pieces, which can be a liability. But it also means we've got a lot of different things to try.
Our system is built on the idea of failover and retry, which means we work hard to make sure that we don't have just one connection to all 300+ operators, but many. In our biggest markets (like India) we have 5 different routes. The goal is that if one goes down, we have backups. Here's what's worked for us:
1) Vendor redundancy. Without this, we don't have any other options. We have a goal every quarter to integrate new vendors to improve both our reach and our reliability.
2) Pricelist redundancy. Pricelists tell us what products a vendor offers (for example, in India we can give a member 10 INR, 20 INR, etc. at a time). We can have 10 vendors in India, but if only one of them can top up a member with 10 INR, we're just as vulnerable on that product as we would be with a single vendor.
3) Detect failing products and disable them. We have a cron job that runs once a day, checks all of the different products we offer, and lets us know which ones have been failing over the past 2 days. Vendors get product information from operators and then pass it on to us. It takes a while for that to filter through to an engineer who has time to make the change in the codebase, so we started relying heavily on the alert to tell us when products become unavailable. We also added a web tool that lets us disable products temporarily until we can get in touch with the vendors and find out what's going on.
| Vendor | mobile_brand | denom | % Success | # Transactions |
4) Improve vendor relationships (thanks Juliet!). This has actually been the biggest help to us. Being on top of vendor payments, getting postpaid accounts, and having technical contacts who respond quickly and helpfully have been invaluable in improving our reliability. I think as engineers we sometimes forget that there are non-technical solutions to some problems.
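To make points 2 and 3 a bit more concrete, here is a rough Python sketch of the two checks. It is not our production code. The `Transaction` record, the function names, the 50% success threshold, and the minimum transaction count are all assumptions made up for illustration; only the 2-day window and the report columns (vendor, mobile_brand, denom, % success, # transactions) come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; field names mirror the report columns.
    vendor: str
    mobile_brand: str
    denom: int
    succeeded: bool
    created_at: datetime


def failing_products(transactions, now, window_days=2,
                     max_success_pct=50.0, min_transactions=5):
    """Return report rows for products with a low success rate in the window.

    The threshold and minimum volume are illustrative assumptions.
    """
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(lambda: [0, 0])  # product key -> [successes, total]
    for t in transactions:
        if t.created_at < cutoff:
            continue  # older than the reporting window
        key = (t.vendor, t.mobile_brand, t.denom)
        counts[key][1] += 1
        if t.succeeded:
            counts[key][0] += 1

    rows = []
    for (vendor, brand, denom), (ok, total) in sorted(counts.items()):
        pct = 100.0 * ok / total
        if total >= min_transactions and pct <= max_success_pct:
            rows.append((vendor, brand, denom, round(pct, 1), total))
    return rows


def single_vendor_products(pricelists):
    """Return (mobile_brand, denom) pairs that only one vendor offers.

    `pricelists` maps a vendor name to a set of (mobile_brand, denom) pairs.
    """
    vendors_by_product = defaultdict(set)
    for vendor, products in pricelists.items():
        for product in products:
            vendors_by_product[product].add(vendor)
    return sorted(p for p, vendors in vendors_by_product.items()
                  if len(vendors) == 1)
```

A daily job could call `failing_products` with the recent transactions and send the rows out as the alert, while `single_vendor_products` flags the denominations where adding a second vendor would help the most.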
We’ve got big plans to add to this secret sauce in the coming months and make the system more agile and more reliable. Want to help with this super cool project? Or just want to know our ideas? Get in touch! We’re hiring!