
Cloud burst for disaster recovery

What if you could completely eliminate your charger’s disaster recovery (DR) system?

Yep, I said it. I mean, you would be insane to not set up a DR system for your main monetization system—your charger—but at the same time, have you ever really used your charger’s DR system? Talk about a total waste of money. Charging DR systems are a critical component of your organization’s risk management strategy, designed to operate quietly (albeit expensively) in the background to keep revenue flowing in the event of a disaster. Yet, whenever there is a rare but actual outage, most teams don’t roll over to DR, either because they think they’ll have the primary system up and running ASAP (so why bother?), or because of the real reason: the DR doesn’t actually work.

DR systems rarely fail over exactly as expected. That's because a DR system is the twin of your primary charger, and you have to keep the two in sync all the time. Ongoing updates, data replication, testing, and coordinating the latest hardware and software changes across both systems are just some of the things you have to do to keep the DR up to date and usable. These systems are a pain to maintain, and when there are competing priorities with the IT team, testing the DR system usually falls to the bottom of the list.

On top of that, there's the very real problem that DR systems are expensive to set up. The infrastructure, software, and expert personnel required represent significant investments. Typically, DR is provisioned with capacity identical to the primary's… capacity that sits unused. And while maintaining these systems consumes substantial resources, the cost of not having an effective disaster recovery plan in place can be far greater, potentially leading to irreparable damage to a telco's operations, reputation, and financial stability. So, what do you do?

What if there was a better way, where you could save 90% of your spend, and still have a disaster recovery system that you could spin up in minutes, paying only for what you need, when you need it? You’ll be SHOCKED to find out that there’s a way to do it—if you use the public cloud and Totogi.

Cloud bursting

First, let me introduce a concept that you may not be familiar with: cloud bursting. It's a hybrid private/public cloud configuration set up specifically to handle temporary peaks in demand. Companies use cloud bursting to manage spikes in traffic during marketing campaigns or short-term, resource-intensive tasks like big data analytics. Basically, it's a cost-effective way to quickly and seamlessly scale up compute for temporary or variable workloads. What's great about it is that it doesn't require buying a lot of hardware in advance or pre-provisioning compute you'll barely use. All the public cloud providers offer it; you can read more about it from Amazon Web Services (AWS), Azure, and Google Cloud.

With cloud bursting, you can do manual bursting, where you go in and flip a switch to “turn on” the public-cloud component, or you can set up automated bursting for continuous, distributed load balancing. For example, if your on-prem charging system reaches capacity, you could use cloud bursting to handle the extra traffic, directing the overflow to a public cloud. Or, if your on-prem charging system goes down entirely, you could send all traffic to the public cloud to keep charging going. And voilà: disaster recovery for charging with cloud bursting.
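If you like to see things in code, here's a minimal sketch of the automated flavor of that decision, written in Python with made-up endpoint names and thresholds. In a real network this logic would live in your DNS, load balancer, or Diameter routing layer rather than a script, but the idea is the same: health-check the on-prem charger and burst to the cloud when it stops answering.

# Minimal sketch of an automated burst decision: if the on-prem charger stops
# answering health checks, send charging traffic to the cloud charger instead.
# The endpoint URLs and threshold are hypothetical placeholders.
import time
import requests

ON_PREM_CHARGER = "https://charger.onprem.example"   # hypothetical
CLOUD_CHARGER = "https://charging.cloud.example"     # hypothetical
FAILED_CHECKS_BEFORE_BURST = 3

def on_prem_is_healthy() -> bool:
    try:
        return requests.get(f"{ON_PREM_CHARGER}/health", timeout=2).status_code == 200
    except requests.RequestException:
        return False

def choose_charging_target(consecutive_failures: int) -> str:
    # Burst only after several consecutive failed health checks,
    # so a single blip doesn't trigger a failover.
    if consecutive_failures >= FAILED_CHECKS_BEFORE_BURST:
        return CLOUD_CHARGER
    return ON_PREM_CHARGER

failures = 0
for _ in range(6):  # in production this loop would run continuously
    failures = 0 if on_prem_is_healthy() else failures + 1
    target = choose_charging_target(failures)
    # In practice you'd update DNS, a load balancer, or Diameter routing here.
    print("Routing charging traffic to", target)
    time.sleep(10)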

Using cloud bursting for charger disaster recovery

Recently, a Tier 1 mobile network operator (MNO) approached us at Totogi and asked if it could use our Totogi Charging-as-a-Service on AWS as disaster recovery for its Ericsson on-prem charger. Instead of continuing to maintain an expensive twin backup system, the telco wanted to know if it could just burst to Totogi when/if needed. We thought this was a great idea: plausible and, frankly, awesome. We also thought this could be a really great way to migrate off a legacy charging system: roll over, and never look back. So we said to ourselves—let's go for it.

We started out trying to design the most basic, business-continuity type configuration. Just the basics. Whereas “business-as-usual” charging includes plans and promotions, custom rates for friends and family, and roaming, a business-continuity configuration needs only the mission-critical stuff: simple rates and plans for voice, SMS, and data. A system like this may not even be set up to take on new customers or handle top-ups. Just keep the lights on until the primary system is back up and running (anticipated to be within a few hours or days); everything else goes on hold for the duration of the emergency.
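If it helps to picture it, a business-continuity configuration boils down to a short list of switches: rate the mission-critical traffic, turn everything else off. The sketch below is purely illustrative; the switch names are invented for this example and aren't actual Totogi settings.

# Illustrative "keep the lights on" configuration: charge the basics, defer
# everything else until the primary charger is back. Switch names are
# hypothetical, not actual Totogi settings.
BUSINESS_CONTINUITY_CONFIG = {
    "rate_voice": True,
    "rate_sms": True,
    "rate_data": True,
    "promotions": False,              # no plans-and-promotions logic
    "friends_and_family_rates": False,
    "roaming": False,
    "accept_new_subscribers": False,  # no onboarding during the emergency
    "accept_topups": False,           # deferred until the primary is back
}

def is_supported(operation: str) -> bool:
    # Anything not explicitly enabled is on hold for the duration of the outage.
    return BUSINESS_CONTINUITY_CONFIG.get(operation, False)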

We knew we could connect to the MNO's online charging system and integrate Totogi with its network. And because we've done scalability testing for Totogi, we knew we could handle its 20M+ subscribers easily. But there were some other interesting problems to solve, since the primary system wasn't also a Totogi system. Namely, how would we keep subscribers, balances, plans, and event data records (EDRs) in sync across the two systems? 🤔

Subscribers, balances, and EDRs would be easy. We decided to sync once per day, using Amazon S3 to exchange data in both directions: from the on-prem system to Totogi, and from Totogi back to Ericsson. Each night we'd get a report of subscribers that were added or deleted. Same with balances: we'd take a snapshot every night at midnight and sync the two systems. As for EDRs, any EDRs created in Totogi we could easily sync back to Ericsson using S3 as the repository. No problem.
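To make that concrete, a nightly sync job could look roughly like the sketch below (Python with boto3). The bucket name, key layout, and JSON format are all assumptions for illustration; the real exchange would use whatever formats the two chargers agree on.

# Sketch of a nightly sync job: push the day's subscriber changes and a
# midnight balance snapshot to S3, and pull back any EDRs created in the
# cloud charger. Bucket and key names are hypothetical.
import datetime
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "telco-charging-sync"  # hypothetical bucket
today = datetime.date.today().isoformat()

def push_daily_delta(subscriber_changes: list, balance_snapshot: list) -> None:
    # Subscribers added or deleted since the last sync, plus a balance snapshot.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"onprem-to-cloud/{today}/subscribers.json",
        Body=json.dumps(subscriber_changes),
    )
    s3.put_object(
        Bucket=BUCKET,
        Key=f"onprem-to-cloud/{today}/balances.json",
        Body=json.dumps(balance_snapshot),
    )

def pull_cloud_edrs() -> list:
    # EDRs generated while charging ran in the cloud, synced back nightly.
    edrs = []
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"cloud-to-onprem/{today}/edrs/")
    for obj in resp.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        edrs.extend(json.loads(body))
    return edrs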

Plans were a little trickier because they depend on the source system—Ericsson's plan design. Since Totogi was meant to be the failover for only a very short period (hours or single-digit days), the team decided to implement simple basic plans to cover the telco for the duration of the failover period. Not perfect, but good enough for what the system was designed to do—simple business continuity. We were ready.
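For a sense of how simple "simple basic plans" can be, here's an illustrative rate table and rating function. The plan names and rates are made up for the example and aren't the telco's actual tariffs.

# Illustrative business-continuity plans: flat rates for voice, SMS, and data,
# with none of the promotions, friends-and-family rates, or roaming logic of
# the primary charger. All values are made up for the example.
CONTINUITY_PLANS = {
    "prepaid_basic": {
        "voice_per_minute": 0.05,  # currency units per minute
        "sms_per_message": 0.02,
        "data_per_mb": 0.01,
    },
    "postpaid_basic": {
        "voice_per_minute": 0.03,
        "sms_per_message": 0.01,
        "data_per_mb": 0.008,
    },
}

def rate_event(plan_id: str, event_type: str, units: float) -> float:
    # Charge = units consumed x the flat rate for that event type.
    plan = CONTINUITY_PLANS[plan_id]
    key = {"voice": "voice_per_minute", "sms": "sms_per_message", "data": "data_per_mb"}[event_type]
    return round(units * plan[key], 4)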

But, like all things telco, the customer wasn't ready to sign and was moving a little more slowly than we were. Its team was still unsure and had lots of cloud questions. So we put the project on the back burner… for now.

Disaster strikes

Then this happened:

[Image: complete loss of network starting on 2024-02-07. Source: netblocks.org]

Like most telcos, this telco had two redundant on-prem data centers set up as active/active, with one data center effectively acting as DR for the other. But one day in early February, its worst nightmare came true: both data centers went down and couldn't come back up. A complete loss of network and services—including charging—countrywide. This was a scenario the best-laid active/active plans couldn't cover, and with no end in sight to the outage, the customer was in a panic and ready to take the cloud-bursting plunge with Totogi.

While most traditional (ahem, legacy) charging systems would typically take MONTHS to set up, the customer asked us if we could do it in THREE WEEKS. THREE WEEKS! For those of you in telco, you know how insane an ask this is. But Team Totogi realized how important this was: for subscribers, it meant restoring critical services to an entire country; for the telco, the ability to re-establish service and collect revenue. We were all-in and we went to WORK.

Totogi Charging-as-a-Service first went live in 18 days with voice charging. Data and SMS charging came up in the following weeks. Although it was designed to be an emergency backup to the telco's primary and disaster recovery systems, Totogi soon became the primary charging system. And what about the previous data centers and charging system? We're almost two months into the outage and everything is still down. The telco, though, is up and running and charging—all because of the public cloud.

While not all telcos will experience this kind of extreme outage in their lifetime, just imagine what you could do with Totogi Charging-as-a-Service as your disaster recovery system. What would you do with all the extra machines, people, and budget you'd free up? With Totogi, the system would be waiting to spring into action with no need to pre-provision capacity. It would scale whenever you needed it, in minutes. I can't promise we can deploy in 18 days ALL THE TIME, but if you're ready to cut the wasted cost of your charger's disaster recovery system, we're your gal. Just give it a try.

Want more info? Check out this case study and get in touch with the Totogi team today. It’s time to transform disaster recovery.

