Saturday, April 21, 2012

Impressions from Amazon's AWS Summit in NYC

Yesterday (4/19) I attended the AWS Summit in NYC (http://aws.amazon.com/aws-summit-2012/nyc).

I'm a big fan and a heavy user of AWS, especially S3, EC2 and, naturally, RDS. At any point in time I have several dozen AWS machines running for me out there in the East region, and when we run special benchmarks and tests, the number of EC2 and RDS machines can easily reach three digits. As I said, I'm a fan...

A few quotes I was able to catch and document on my laptop, on my lap...:
"When you develop an app for Facebook, you must be prepared (and be afraid) that to your party, not no one will show up, but everybody will show up!"
So true! Simple and true. We all want to succeed, to have success with our app. We have to think about scaling from day 1.
"Database was bottleneck for building of sophisticated apps. This is no longer the case when building DynamoDB".
The quote above was about DynamoDB, which is an excellent new NoSQL service by AWS. But we can all think about YesSQL databases and hope, wish and work to make the same true for them. Databases, good old RDBMSs, are great for applications: they offload a lot of complexity, SQL is a rich language and API to access data, it leverages existing skills and it allows ACID. RDBMSs should not be a bottleneck for building sophisticated apps either! They should be able to scale.
"How people really want to interact w the database? not by 'how many servers' but with 'give me a DB to handle 1000 reads, 10000 writes'. That's all. Users want a situation when you cannot run out of space, you cannot run out of capacity."
Inspiring. I couldn't agree more. A service is a good service when it hides away all complexities, gives me a URL and, boom, everything works. AWS are getting there, no doubt, and whoever provides a product or a service (including myself...) should work according to this quote!
"RDS has 2 push button scaling: Scale-Up or Scale-Out, read replica, or, sharding... 'have the application go to the right shard'"
This quote comes from the excellent Solutions track seminar "Building scalable database application with Amazon RDS". As I said, RDS is an excellent service. Its capacity to be a "service", automatically tuned, backed up, upgraded and so on, is impressive. The ability to ensure transparent high availability across Availability Zones (Multi-AZ) and to set up read replicas with a click is nothing short of phenomenal. However, in the scale-out department I think the solution is good, but not excellent. The support for read replicas is great, but it covers only the transportation of data between the databases. It leaves the application with two or more IP addresses to deal with: the application has to route reads and writes, handle replication lag and consistency, and so on. In the sharding department it's even less complete: while I can spawn as many RDS servers as I like, the application needs to do all the command routing to the right shard and also handle the transportation of the data. It's quite far from the vision in the 3rd quote above, quoted and inspired from Dr. Werner Vogels. I think this good service by AWS can be completed to become an excellent service with 3rd party products, such as ScaleBase.
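
To make concrete what "2 or more IP addresses to deal with" means for the application, here is a minimal Python sketch of read/write splitting done by hand. Everything in it is an assumption for illustration: the class name, the endpoint strings, the crude "starts with SELECT" test and the fixed lag window are all hypothetical, and it is not how RDS or any specific product does routing.

```python
import time


class ReadWriteRouter:
    """Chooses a database endpoint per SQL statement (illustrative only)."""

    def __init__(self, primary, replicas, lag_window=1.0, clock=time.monotonic):
        self.primary = primary
        self.replicas = list(replicas)
        self.lag_window = lag_window   # assumed upper bound on replication lag, in seconds
        self._clock = clock            # injectable so the behavior can be tested
        self._last_write = None
        self._next_replica = 0

    def endpoint_for(self, sql):
        # Crude classification: anything that is not a SELECT counts as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read:
            self._last_write = self._clock()
            return self.primary

        # Right after a write the replicas may still be behind, so read from
        # the primary until the assumed lag window has passed.
        recently_written = (
            self._last_write is not None
            and self._clock() - self._last_write < self.lag_window
        )
        if recently_written or not self.replicas:
            return self.primary

        # Otherwise spread reads over the replicas, round-robin.
        endpoint = self.replicas[self._next_replica % len(self.replicas)]
        self._next_replica += 1
        return endpoint


router = ReadWriteRouter("primary.example", ["replica-1.example", "replica-2.example"])
print(router.endpoint_for("SELECT * FROM pins"))          # replica-1.example
print(router.endpoint_for("UPDATE pins SET likes = 1"))   # primary.example
print(router.endpoint_for("SELECT * FROM pins"))          # primary.example (inside the lag window)
```

Even this toy version has to guess at replication lag and parse SQL, which is exactly the kind of work that ends up in the application when the service only moves the data.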

In addition to the above quotes, I enjoyed hearing a good scale-out case study from Pinterest (http://pinterest.com), who invested in sharding themselves over almost 70 RDS databases. See here a good article about Pinterest's case: http://www.itworld.com/software/269670/amazon-cloud-set-stage-rapid-pinterest-growth

I just love those case studies. Every one of those, especially by my prospects, customers, partners, makes me much smarter and my products much much better. If you have a scale-out story - don't be shy to share!!

A quick update: Look at this article, http://econsultancy.com/us/blog/9669-amazon-s-cto-highlights-seven-transformations-cloud-services-will-enable. Search for the "Transformation three: We're moving from scaling by architecture to scaling by command". Good statement about database scale-out.

Wednesday, April 18, 2012

So how can we scale databases?

There are ways to scale databases. Unfortunately some are limited, some introduce complexities, and some do not fit the cloud...

By scaling solution I mean a solution that helps me scale my existing environment, my existing RDBMS. Some magic or technology that will take my existing Oracle or MySQL, for example, to the next level, without porting to a new DB engine/vendor and without completely recoding my app.

Let's try to organize things a bit in this very summarized table, just to get the hang of it. I can't imagine covering it all in 1 table or even 100 pages, but it should be the start of a meaningful discussion to continue in the next posts:

Solution                                     | Scales reads? | Scales writes? | Scales data? | Scales sessions? | Cloud? | Bottom line
---------------------------------------------+---------------+----------------+--------------+------------------+--------+------------
Scale-Up: faster HW, CPU, memory, disks, SSD | Y             | Y              | N            | N                | N      | Costly, limited
Shared disk cluster: Oracle RAC and similar  | Y             | Y--            | N            | N                | N      | Costly, hard to implement, might damage non-read-mostly apps
Replication based: Read/Write splitting      | Y             | N              | N            | N                | Y      | A valid solution, easy to implement, limited
Replication based: Multi master replication  | Y             | Y--            | N            | Y--              | Y      | Strict data ownership is a MUST to enjoy any advantage
Scale Out (Sharding?)                        | Y             | Y              | Y            | Y                | Y      | A valid solution, might introduce major complexities...


I think it's safe to say: there are solutions, there are supporting technologies, however database tech alone is not enough. To really get the benefit from a shared-disk cluster we need to really know what we're doing, so as not to be overwhelmed by disk latency or by noisy network traffic between the database machines. Multi-master replication without proper data ownership definition and enforcement is a recipe for disaster: conflicts, split-brain, loss of data. Shared-disk cluster and multi-master replication (3 versions of it) have existed in Oracle for over 10 years. And still, they can't solve all RDBMS scaling limits, because those technologies alone are like double-edged swords: they should be handled gently by experienced craftsmen, and with good integration with additional concepts and tools.

No, having the long-awaited multi-master replication available for MySQL will not solve scale issues for MySQL, and it will not bring peace to geo-clustering and the like. Without proper data ownership and proper design, it'll just introduce all the same flaws Oracle DBAs have been dealing with for the last decade...
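
A deliberately simplified Python model shows why data ownership matters here. The assumptions are loud ones: each master is just a dict, "replication" is modeled as shipping the resulting row image to the peer, and the function names are made up for this sketch. It is not a description of how Oracle or MySQL replicate.

```python
def apply_without_ownership():
    """Both masters accept a write to the same row, then exchange row images."""
    master_a = {"balance": 100}
    master_b = {"balance": 100}

    master_a["balance"] += 10   # deposit accepted by master A
    master_b["balance"] += 20   # concurrent deposit accepted by master B

    # Asynchronous replication ships each resulting row image to the peer,
    # and each peer overwrites its own copy with what it receives.
    image_from_a, image_from_b = master_a["balance"], master_b["balance"]
    master_b["balance"] = image_from_a
    master_a["balance"] = image_from_b
    return master_a["balance"], master_b["balance"]


def apply_with_ownership():
    """Every write to the row is routed to its single owning master."""
    owner = {"balance": 100}
    replica = {"balance": 100}
    for deposit in (10, 20):
        owner["balance"] += deposit             # writes are serialized on the owner
        replica["balance"] = owner["balance"]   # one-way replication, nothing to conflict with
    return owner["balance"], replica["balance"]


print(apply_without_ownership())   # (120, 110): the masters diverge and neither holds 130
print(apply_with_ownership())      # (130, 130)
```

In the first case one deposit is lost on each side and the two masters no longer agree; in the second, the ownership rule turns the pair into an ordinary primary and replica for that row.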

And as for scale-out, it's absolutely great, but it's very hard to do with data and databases.
Sharding - it's like the database outsourced scaling to the application. Application developers should concentrate on the application logic, strive to make it better, and make the business competitive with new features. Every hour an app developer spends on database-specific matters is wasted effort and a skill mismatch.
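
As a rough illustration of what "outsourced to the application" looks like, here is a minimal Python sketch of application-tier sharding. The shards are plain in-memory dicts standing in for separate databases, and the class, the hash-modulo placement rule and the method names are all hypothetical choices made for this example.

```python
import hashlib


class ShardedStore:
    """Application-side sharding over N stand-in 'databases' (illustrative only)."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def shard_index(self, key):
        # Stable hash of the shard key, so every app server picks the same shard.
        digest = hashlib.sha1(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, row):
        self.shards[self.shard_index(key)][key] = row

    def get(self, key):
        return self.shards[self.shard_index(key)].get(key)

    def count_where(self, predicate):
        # A query that is not bound to one shard key must visit every shard
        # and merge the partial results in the application.
        return sum(
            1
            for shard in self.shards
            for row in shard.values()
            if predicate(row)
        )


store = ShardedStore(shard_count=4)
for user_id in range(100):
    store.put(user_id, {"id": user_id, "active": user_id % 2 == 0})

print(store.get(42))                                   # {'id': 42, 'active': True}
print(store.count_where(lambda row: row["active"]))    # 50
```

Single-key reads and writes stay simple, but the cross-shard count already needs scatter-gather code, and this sketch does not even touch joins, transactions across shards, or moving data when the shard count changes.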

It shouldn't be a surprise that in recent years I have been seeking a way to scale out over standard database building blocks. If you like, a solution that brings the obvious advantages of sharding, but without the pains of doing it in the application tier... I gather we have been quite successful in introducing that for the MySQL database.

In future posts I'll drill down and elaborate on rows in the table above, feel free to add and comment, I'll address any comment!!

The light at the end of the tunnel is that the basic building blocks for solving database scalability are there! There is still no safe, well packaged, polished solution, like an iPhone4 that can be easily used by my 1 year old youngest daughter. Those building blocks still need to be put together into a solution by experienced professionals, or with tools and design, 3rd party products and lots of thinking... It's not a simple task...

Tuesday, April 17, 2012

Applications come and go. Databases are here to scale.

In my heart, I'm a DBA, always was and always will be. People say I'm a database guy by the way I think, keep my car, and file my music and also bank statements... However, I have done a great deal of development, design and architecture on the apps side. I (hope to) have some perspective.

Applications come and go. The second programming language I ever learned and worked in was COBOL. Some still say most of the world's lines of code are written in this language, and maybe so, but since then I have known and written in dozens of programming languages, from Assembly to Force.com, from Pascal to Delphi, from functional C to object-oriented Smalltalk, C++ and Java, from compiled C/CGI to interpreted Perl, ASP and Ruby, back to compiled node.js... My first applications ran on a mainframe with a green screen, later I created beautiful graphic client-server applications, later I had to create hideous white web applications (like the green MF), later Ajax, Flex and HTML5 made it client based again... And today we call them Apps...

Applications come and go, redesigned, refactored, rewritten. They should. 2 things are constant in the business universe, and those are any business's real assets. Users and data.

Applications are the pipe that gives data to users and lets them generate and modify the data. And in this universe those are also expanding. Any business wants more users, more customers, more business, more data. Data is never deleted, forever growing, written and updated, can be read many times, and can give intelligence to the business when analyzed even years after it was generated. Data is always audited, backed up, 100% available. DBAs make the highest salaries in IT, and the DBMS is the most expensive software on the enterprise's shelf. Ask Larry Ellison, rumors say he took Sharon Stone to hang out in a MiG jet fighter. And he can relax: migrating and porting a database is one of the hardest, riskiest operations known to the CIO.

Applications should keep pace with times, trends and fashion, and make the users happy. Data, however, must NEVER be compromised. No data should be generated as a silo/an island/isolated. Data WILL live long after the application that generated it has been replaced with another, and the programmer who wrote it has changed jobs or retired. The data will be integrated by other apps, new generation apps will use it, reports and online analytics will analyze it, enterprise data marts and data warehouses will ETL it.

While there are hundreds of ways to develop an application, 97% of the world's structured data is in fewer than a dozen environments, all of them RDBMSs. I've worked with most, but Oracle and MySQL are the ones I have the most mileage in and the most scars from (DBAs always count scars...).

Today, those good old RDBMSs are under attack. RDBMS cannot scale to cope with throughput, data size, concurrency, complexity, distribution, virtualization, consolidation...

The need for good, reliable, interchangeable data is stronger than any man or any trend. Nature will correct itself, and we're living in interesting times. Applications come and go. Databases are here to scale.

Further in this blog let's try to see why, and what can be done, is being done and will be done in database scalability.

Welcome to the database scalability blog.