Friday, August 31, 2012

Facebook makes big data look... big!

Oh I love these things: http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/

Every day there are 2.5B content items shares, and 2.7B "Like"s. I care less about GiGo content itself, but metadata, connections, relations are kept transactionally in a relational database. The above 2 use-cases generate 5.2B transactions on the database, and since there are only 86400 seconds a day, we get over 60000 write transactions per second on the database, from these 2 use-cases alone, not to mention all other use-cases, such as new profiles, emails, queries...

And what's the size of new data, on top of all the existing data, that cannot be deleted so easily, (remember why? Get a hint here: http://database-scalability.blogspot.com/2012/08/twitter-and-new-big-data-lifecycle.html). A total 500+TB is added every day, I would exaggerated and assume 98% is pictures and other GiGo content only to leaves us with fuzzy new daily 10TB. There were times Oracle called VLDB to a DB of over 1TB, and here we have 10TB, every day.

So how do FB handle all this? They have a scaled-out grid of several 10000s of MySQL servers.

The size alone is not the entire problem. Enough juice, memory, MPP, columnar - will do the trick.
The but if we put this throughput of 100Ks transactions per second, it'll rip the guts out of any database engine. Remember that it translates every write operation into at least 4 internal operations (table, index(s), undo, log) and also needs to do "buffer management, locking, thread locks/semaphores, and recovery tasks". Can't happen.

The only way to handle such data size and such load is scale-out, divide the big problem to 20000 smaller problems, and this is what FB is doing with their cluster of 10000s of MySQLs.

You're probably thinking "naaa it's not my problem", "hey how many Facebooks are out there?". Take a look and try to put yourself, your organization, on the chart below:

Where are you today? Where will you be in 1 year? In 5 years? Things today go wild and and they go wild faster than ever. Big data is everywhere. New social apps aren't afraid of "what if no one will show up to my party?", rather they're afraid "what if EVERYBODY show up?"

You don't need to be Facebook to need a good solution to scale out your database

7 comments:

  1. The article is well written and amazing post to read everytime. Thanks for sharing this info.
    Social Network

    Video Search Engine

    ReplyDelete
  2. The article is well written and amazing post to read everytime. Thanks for sharing this info. Facebook Covers

    ReplyDelete
  3. The dimension alone is not the whole problem. Enough juice, storage, MPP, columnar - will do the key.

    AllenCarlos

    ReplyDelete
  4. I was just looking to see what type of data center software Google uses to store and keep tabs on all of its virtual assets. I guess it would be equally good to know what Facebook uses. Any idea?

    ReplyDelete
  5. Nice stuff. This is my first blog on scalability , looking for more information
    :)

    ReplyDelete
  6. This is so true. With so much data out there how are companies supposed to manage it and make the most of it all? The answer is, data center management software.

    ReplyDelete