Comments on Pete Miron: How to Scale Backend Infrastructure

Great point, Mike. One thing I didn't mention...

2010-09-16T20:00:00.509-07:00

Great point, Mike.

One thing I didn't mention in the initial post is that an old boss of mine used to threaten to do "Axe Testing": he threatened to take a random box in the data center offline, and we needed to make sure the system stayed online. I've had a few boxes in AWS become totally inaccessible. We need to write a post on when bad things happen to good infrastructure.

- Both plugs in a server plugged into the same (failed) UPS.
- Data center power runs on the generator but the AC does not. (How hot did those machines get?)
- DNS failure causes app server to take 60 seconds to find its localhost IP (127.0.0.1). I believe that was an Oracle JDBC driver.
- Machines auto-negotiate network down to 10 Mbps. (Personal fave.)
- Nasdaq shut off switch due to NetBIOS chatter on misconfigured server.
- Shower pump floods server closet and takes out T1 routers.

Things will go offline in unpredictable, unimaginable ways.
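To make the "Axe Testing" idea concrete, here is a minimal sketch of what such a drill could look like against a toy model. The `Cluster` class and the `axe_test` function are hypothetical names invented for illustration; a real drill would pull the plug on an actual box rather than flip a flag.

```python
import random


class Cluster:
    """Toy model of a pool of interchangeable boxes (hypothetical, for illustration)."""

    def __init__(self, names):
        # Every box starts out online.
        self.online = {name: True for name in names}

    def take_offline(self, name):
        self.online[name] = False

    def handle_request(self, payload):
        # Route to the first box that is still up; fail only if none are left.
        for name, up in self.online.items():
            if up:
                return f"{name} handled {payload}"
        raise RuntimeError("no boxes online")


def axe_test(cluster, rng=random):
    """Take one random online box offline and report whether requests still succeed."""
    candidates = [name for name, up in cluster.online.items() if up]
    victim = rng.choice(candidates)
    cluster.take_offline(victim)
    try:
        cluster.handle_request("ping")
        survived = True
    except RuntimeError:
        survived = False
    return victim, survived
```

With three boxes the system survives losing any one of them; with a single box the same drill exposes the single point of failure.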

I'd like to highlight an implied concept, whic...

2010-09-16T18:46:17.094-07:00

I'd like to highlight an implied concept, which is usually taken for granted until something bad happens: build your systems around the idea of robust infrastructure. When you build systems that can really scale, you know that hardware and software will break. Make your systems robust enough to withstand multiple hardware/software failures at the same time. Just because you can reproduce a system quickly doesn't mean you don't have a giant single point of failure. Provisioning and configuration management are not enough. The app must understand how to survive when components go offline.
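One way an app can "understand how to survive" is to try the next replica when a component is unreachable instead of failing outright. The sketch below assumes each replica is a plain callable that raises `ConnectionError` when it is down; `call_with_failover` and `AllReplicasFailed` are hypothetical names, not part of any library.

```python
class AllReplicasFailed(Exception):
    """Raised when every replica in the list was unreachable."""


def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response.

    Assumption: a replica is a callable that raises ConnectionError when down.
    """
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            # Remember the failure and move on to the next replica.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed")
```

This tolerates several simultaneous failures as long as one replica is left. It does nothing about failures the replicas share, such as a common UPS or switch.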

In memory caching is the logical extension of gett...

2010-09-16T18:14:54.651-07:00

In-memory caching is the logical extension of getting your DB in RAM. We use memcached pretty extensively. The market data systems I've worked on kept all real-time quotes in in-memory hashtables.
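As a rough illustration of the pattern, here is a minimal cache-aside sketch. It uses a plain dict as a stand-in for memcached, and `CacheAside`, `load_from_db`, and the 30-second TTL are assumptions made for the example rather than anything from the systems described above.

```python
import time


class CacheAside:
    """Read-through cache: serve from memory, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=30.0, clock=time.monotonic):
        self._load = load_from_db   # hypothetical callable that hits the database
        self._ttl = ttl_seconds
        self._clock = clock         # injectable so expiry can be tested
        self._store = {}            # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh hit, no database call
        # Miss or expired entry: reload and remember the value for ttl seconds.
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)
        return value
```

A short TTL keeps stale data bounded, which matters for something like quotes; the right value depends on how fresh the data has to be.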

I've not personally used Redis, but was discussing it today as a possible replacement for a homegrown workflow system. If you use it, let me know what you find.

Nice write up Pete, thanks for sharing. I'm c...

2010-09-16T17:14:18.447-07:00

Nice write-up, Pete, thanks for sharing.

I'm curious what your opinion is on in-memory caching systems like Redis or memcached.