
Amazon failure takes down Internet sites

By Peter Svensson | Hagadone News Network | April 22, 2011 9:00 PM

NEW YORK - Major websites including Foursquare and Reddit crashed or suffered slowdowns Thursday after technical problems rattled Amazon.com's widely used Web servers, frustrating millions of people who couldn't access their favorite sites.

Though better known for selling books, DVDs and other consumer goods, Amazon also rents out space on huge computer servers that run many websites and other online services.

The problems began at an Amazon data center near Dulles Airport outside Washington and persisted into the afternoon. The failures were widespread, but they varied in severity.

HootSuite, which lets users monitor Twitter and other social networks more easily, was down completely, as was questions-and-answers site Quora.

The location-sharing social network Foursquare experienced glitches, while the news-sharing site Reddit was in "emergency read-only mode."

Many other companies that use Amazon Web Services, like Netflix Inc. and Zynga Inc., which runs Facebook games, appeared to be unscathed. Amazon has at least one other major data center, in California, that stayed up.

It's not uncommon for Internet services to become inaccessible due to technical problems, sometimes for hours or even days. But Thursday's outages were notable because Amazon's servers are so commonly used, meaning many sites went down at once.

Amazon did not respond to requests for comment. It has not revealed how many companies use its Web services or how many were affected by the outage.

No one knew for sure how many people were inconvenienced, but the services affected are used by millions.

Amazon Web Services provides "cloud," or utility-style, computing, in which customers pay only for the computing power and storage they need on remote computers.

Seattle-based Amazon has big plans for AWS. Although it now makes up just a few percent of the company's revenue, CEO Jeff Bezos said last year that it could eventually be as large as Amazon's retail business. Competitors include Rackspace Hosting Inc. and Microsoft Corp.'s Azure platform.

Some people consider cloud computing more reliable than conventional hosting services in which a small company might rent a handful of computers in a data center.

If one of those computers malfunctions, the failure can take down a website. But "clouds" like AWS use vast banks of computers. If one fails, the tasks it performs, such as running a website or a game, can immediately be taken over by others.

When a company needs more capacity, maybe because of a surge in visitors to its website, it only takes minutes to rent more computers from Amazon.

But cloud computing isn't immune to failure, either.

Lydia Leong, an analyst for the tech research firm Gartner, said that judging by details posted on Amazon's AWS status page, a network connection failed Thursday morning, triggering an automatic recovery mechanism that then also failed.

Amazon's computers are divided into groups that are supposed to be independent of each other. If one group fails, others should stay up. And customers are encouraged to spread the computers they rent over several groups to ensure reliable service. But Thursday's problem took out many groups simultaneously.

Outages with Amazon's services are rare but not unprecedented. In 2008, several companies lost access to their own files for about two hours when one of Amazon's data centers failed. The companies included DigitalChalk Inc., which delivers multimedia training over the Web.

In general, Amazon Web Services have been more reliable and, above all, cheaper than many other hosting systems, said Josh Cochrane, vice president of product development at Palo Alto Software in Eugene, Ore.

But the firm's websites and Web-based applications that create business plans were all brought down by Thursday's crash.

"It's a pretty vulnerable feeling," he said. "This is a really big message to us that we need to revisit our strategy."

That might include spreading the applications more widely over Amazon's network, so that problems at one data center won't bring down everything, he said.

Amazon engineers struggled throughout the day to rectify the problem. Leong said the problems are of a type that's not covered by Amazon's money-back guarantees.
