Well, scale-out cannot be achieved with a shared storage and the word "shared" is the key. Scale-out is done with absolutely nothing shared or a "shared-nothing" architecture. This what makes it linear and unlimited. Any shared resource, creates a tremendous burden on each and every database server in the cluster.
In a previous post, I identified database engine activities such as buffer management, locking, thread locks/semaphores, and recovery tasks - as the main bottleneck in the OLTP database, handling reads and writes mixture. No matter which database engine, they all have all of the above, Oracle, MySQL, all of them. "The database engine itself becomes the bottleneck!" I wrote.
With a shared disk - there is a single shared copy of my big data on the shared disk, the database engine still have to maintain "buffer management, locking, thread locks/semaphores, and recovery tasks". But now - with a twist! Now all of the above need to be done "globally" between all participating servers, thru network adapters and cables, introducing latency. Every database server in the cluster needs to update all other nodes for every "buffer management, locking, thread locks/semaphores, and recovery tasks" it is doing on a block of data. See here number of "conversation paths" between the 4 nodes:
reminder from Computer Science courses...). Imagine the noise, the latency, for every "buffer management, locking, thread locks/semaphores, and recovery tasks". Shared-storage becomes shared-everything. Node A makes an update to block X - 10 machines need to acknowledge. I wanted to scale, but instead I multiplied the initial problem.
And I didn't even mention the fact that the shared disk might become a SPOF and a bottleneck.
And I didn't even mention the limitations when you wanna go with this to cloud or virtualization.
And I didn't mention the tons of money this toy costs. I prefer buying 2 of these, one for me and one for a good friend, and hit the road together...
Shared-disk solutions gives a very limited solution to OLTP scale. If the data is not distributed, computing resources required to handle this data will not be distributed and thus reduced, on the contrary, they will be multiplied and consume all available resources from all machines.