Critical systems buckled under coronavirus traffic. Could it have been avoided?

Written by Bottle Rocket | Mar 25, 2020 8:12:00 PM

Unemployment websites and other services are bending and breaking under traffic pressure. What could be done to fix them?

The coronavirus pandemic has sent shockwaves through systems across the web, perhaps none more so than the unemployment benefits websites that were not set up to deal with the sudden crush of traffic.

Tens of thousands of workers have been laid off or furloughed as cities, states and countries issue shelter-in-place ordinances. In the U.S. alone, around 2.25 million unemployment claims are predicted for last week, up from around 281,000 the week prior.

The cascading effects on benefits systems, many of which are based on outmoded technologies designed for much lighter traffic loads, have been devastating. The problems could have been avoided — but experts say getting organizations of all kinds to embrace better technologies before it's too late can be challenging.


MAJOR SLOWDOWNS

In some areas of New York, claims were up over 1,000%, according to a representative of the state's department of labor. As a result, the state's site has crashed for people looking to apply for benefits.

"To address this unprecedented increase, we have added server capacity, increased bandwidth, extended phone hours, and dedicated more than 700 staff members to address the influx," Deanna Cohen, the deputy director of communications at the department, told Protocol. From March 16 to 21, the state's site had 2,270,300 web hits and saw around a 400% increase in logins on some days, Cohen said. "Our representatives are doing the best they can to make sure everyone is served, even if it takes longer than usual," she added.

In Ohio, jobless claims shot up from under 5,000 to nearly 140,000 in a single week. The state's system buckled under the pressure and briefly went completely offline on March 23. "During previous downturns in the economy, claims came in waves as the recession worsened and industries began to shut down, whereas these claims came in all at once," Bret Crow, a representative for Ohio's Department of Job and Family Services, told Protocol. "This amount of claims in this short expanse of time would tax any online system."

Crow said the state has been "working around the clock" to add servers to deal with the additional traffic. The state's current unemployment website was built in 2004, before modern smartphones and at a time when a large slice of the American population still got its internet from AOL. Crow added that Ohio is working with Sagitec Solutions (a leading provider of unemployment systems that Crow said did not build the original site) to modernize the state's unemployment system. He was not optimistic about the short-term prospects. "Given the complexity of UI benefits, it will take a minimum of 24 months to configure the system to match our law," Crow said.

Tens of thousands of new applicants in the U.S. had a similar effect on benefits sites across the country. Pennsylvania, Oregon and New Jersey, among other states, all saw their employment and labor sites slow or crash under the unprecedented demand. In New Jersey, the state's unemployment benefits site is asking applicants to log on at different times of the day based on the last four digits of their Social Security number to help mitigate the crush. Colorado and New York are asking residents to apply on different days depending on the first letter of their last name.
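The last-name approach can be sketched in a few lines of Python. This is an illustration only, not any state's actual schedule: the three-day window, the even split of the alphabet, and the names `build_day_schedule` and `filing_day` are all assumptions made for the example.

```python
import string


def build_day_schedule(days):
    """Split A-Z into contiguous, near-equal letter ranges, one per day.

    Hypothetical helper: real agencies publish their own letter ranges.
    """
    letters = string.ascii_uppercase
    base, extra = divmod(len(letters), len(days))
    schedule = {}
    start = 0
    for i, day in enumerate(days):
        # The first `extra` days take one additional letter each.
        size = base + (1 if i < extra else 0)
        for letter in letters[start:start + size]:
            schedule[letter] = day
        start += size
    return schedule


def filing_day(last_name, schedule):
    """Return the assigned filing day for a last name."""
    first = last_name.strip()[:1].upper()
    if first not in schedule:
        raise ValueError(f"cannot assign a day for last name {last_name!r}")
    return schedule[first]


# Assumed three-day filing window, purely for illustration.
SCHEDULE = build_day_schedule(["Monday", "Tuesday", "Wednesday"])

print(filing_day("Garcia", SCHEDULE))
print(filing_day("murphy", SCHEDULE))
print(filing_day("Smith", SCHEDULE))
```

An even split of the alphabet does not guarantee an even split of applicants, because surnames are not spread uniformly across letters. A real schedule would likely be tuned to the observed distribution.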

And the problem is global: Australia's MyGov, a site for accessing government services, slowed to a crawl last week after thousands more people than usual attempted to apply for unemployment benefits. The government initially claimed it was a DDoS attack. In the U.K., reports suggest that thousands of people have been left waiting in a digital queue to access the Universal Credit benefits system.


THE PROBLEM OF SYSTEMS NOT BUILT FOR SCALE

For the most part, the backbone of the internet has held up admirably under increased traffic, and many modern sites are built atop cloud services that allow them to scale with demand. But sites that are built on older, more rigid software and hardware, like many unemployment sites, are struggling under the pressure, experts say.

"Any piece of hardware or any piece of software has some maximum capacity," said Josh Chessman, a senior director analyst at Gartner who covers the IT industry. "That's just the reality of life, it doesn't matter if it's your phone or a server or a mainframe — there's a limit to what it can do."

When building web applications, Chessman said, you'll have a plan in mind: say you're expecting to get 1,000 visitors, you may build it to be able to maintain 2,000 visitors, to give yourself a little leeway. "The problem comes when you're in that position, but all of a sudden you go from X to 500 X overnight," he added.
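Chessman's numbers make for a quick back-of-the-envelope calculation. The sketch below assumes his "X" is the 1,000 expected visitors and that capacity is a hard ceiling; the function name `served_fraction` is made up for the example, and real systems tend to degrade less cleanly than this.

```python
def served_fraction(demand, capacity):
    """Share of incoming visitors a site can serve, capped at 100%."""
    if demand <= 0:
        return 1.0
    return min(1.0, capacity / demand)


expected_visitors = 1_000                 # the planned-for load ("X")
built_capacity = 2 * expected_visitors    # built with 2x headroom
surge_visitors = 500 * expected_visitors  # the overnight jump to 500X

# Under normal load, everyone gets through.
print(served_fraction(expected_visitors, built_capacity))

# Under the surge, only a sliver of visitors can be served.
print(served_fraction(surge_visitors, built_capacity))

# How many times over capacity the site is being asked to run.
print(surge_visitors / built_capacity)
```

Under these assumptions the site would be running at 250 times its built capacity, so doubling capacity in advance would barely register against a surge of that size.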

And it's not just the front-end architecture of these websites that's buckling. The database systems managing people's private data — things that companies and governments have been reluctant to move to the cloud — are also straining under unintended use.

"Very often these back-end systems are running on technology that could be 30, 40 or 50 or more years old," Chessman said of these sites that often struggle in high-stress situations. "You've put something in place that works really well, and you put it in place in 1970 or 1980 and the problem is if it's really working well, nobody wants to upgrade it because (a) why?, and (b), it means downtime."

Outside the world of benefits, other suddenly essential services, including some that should be technologically adept, have been stung, as well.

U.K. retailer Ocado, one of the early success stories in online groceries, experienced pretty much the exact issue Chessman outlined. Demand for grocery deliveries has skyrocketed as fewer and fewer people are able to leave their homes. To deal with the sheer weight of customers attempting to book deliveries, Ocado put up a digital queue page, much like the one for the Universal Credit system, where thousands of customers waited for hours to potentially get groceries. The company compared the demand to a "denial-of-service attack," had to pause new customer signups, and even suspended service temporarily to get through the backlog.

"At some point … just adding more servers isn't making it faster because of something else," Chessman said. "It could be communication between servers, back-end stuff — that's another issue that can happen and cause problems."


IT INERTIA VS. DIGITAL TRANSFORMATION

Across the board, experts agree that while it's easy to fault organizations for not adopting newer, more resilient technologies, it's understandable that administrators haven't leaped to upgrade systems that are fundamentally working most of the time. But in light of their current failures, it's possible to fix the flaws, given time, willingness — and cash.

"Things change. It's not like these systems were built poorly 10, 15, 20 years ago — they were just built [with] 10, 15, 20 years ago's knowledge," said Peter Klayman, head of strategy at Bottle Rocket, an agency that works with companies on digital transformation. "Now, if you know that you have cloud capacity, it changes the way our enterprise architecture practice approaches problems because they're able to architect it for scale."

Some companies have leaned into the digital transformation that has been taking place over the last decade and are better placed to stay flexible in pressing times, Klayman said. He pointed to fast-food restaurants like Chick-fil-A and Burger King, as well as sites like Netflix, which built their systems on cloud platforms that allow them to scale as demand does.

But for companies looking to offer some new mobile solution, such as curbside pickup of food, or to build out a robust digital experience for thousands of new users, executives should be wary of what's really going to be possible. Chick-fil-A's app was in continual development for over three years, Klayman said, and isn't something you can just throw together. "You can't just stand that stuff up right now," he said.

Even businesses that want to take advantage of the coronavirus pandemic, such as food and delivery companies looking to add curbside drop-offs or scheduled deliveries to their apps, are likely to struggle to get anything done quickly. "Everyone's running around with their hair on fire," Klayman said. "It's very difficult to get the appropriate number of stakeholders together to actually launch a new enterprise endeavor."

"That's quite a lot of logistical undertaking and change management to do at a time when everyone is stressed, when people are concerned about their jobs, when people are concerned about their health," Klayman added.

"If you are on these legacy applications, even small changes, it takes our clients months," Wipro Digital President Rajan Kohli told Protocol. In some companies it can take six approvals in six months just to get a small change made, Kohli said. "That's how complex the environment is: It's not that the people are slow — it is because it touches six systems on four different release cycles, and by the end of it, they have to wait six months to just to get that small change in."

But after the virus has started to recede and life starts to return to normal, should companies begin to invest in more durable, flexible systems? It really depends on what they do.

"It's almost always a case of money can fix it, but it doesn't make sense to spend that money upfront or deal with it afterwards — or hope it never happens," Chessman said. But not every business is likely to see a cost-benefit analysis where the likelihood of another pandemic happening anytime soon means they should invest in their systems.

"I wouldn't say they should because why spend the money on something that you might need once every five years, 10 years versus an Amazon or somebody like that who is pretty critical to the country right now," Chessman said.

This article was originally published on Protocol.com by Mike Murphy.