So the question was posed, and the potential buyer of the storage system was perplexed. His team was evaluating replacement systems and storage and comparing them to their existing production system. What they couldn’t figure out was why the old storage system was outperforming the new one.
Both old and new were running Oracle databases. There were architectural changes, with the old being a single Oracle 9i instance with direct-attached storage, and the new being an Oracle 10g RAC cluster with an NFS appliance on the back end.
I wasn’t involved with the old or the new infrastructure. I was brought in to help them understand the perplexing results.
Because they wanted to test only the storage performance, their test consisted of repeated runs of the mkfile command. After investigating the issue themselves, they learned that the performance problem was related to a required mount option: forcedirectio. When this option wasn’t used on the new system, they saw performance that was both consistent with the older system and desirable. The next questions were: what was this option for, why was it required, and why would anyone want an option that slowed down performance? Did it have to do with data integrity? They had tried to push back on the vendors about this option, but no one could tell them exactly what it did, only that it was required.
What this option means in Solaris is that all IO for this mount point will be performed using Direct IO. In this mode, the file buffer cache is bypassed. Writes go directly to storage, and reads come directly from the storage subsystem. Direct IO has a number of performance benefits related to access serialization, access alignment, and double-buffering. With Direct IO, it is claimed that near-raw-device performance can be achieved. So why was it still slower?
The fundamental reason is that the test was flawed: it was not testing storage performance. Without forcedirectio, mkfile was creating a file on disk, but the writes generated by the command were being acknowledged as complete by the OS as soon as the data was copied into the file buffer cache. When I suggested that they use the same mount option on the existing system and rerun the tests, they found that they were indeed getting better performance on the new storage system.
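To make the flaw concrete, here is a minimal Python sketch of the same effect. It is not the customer's test and it does not use forcedirectio. It simply times a write that stops when write() returns against one that also waits for os.fsync(). The function names, file names, and sizes are my own illustrative choices.

```python
import os
import tempfile
import time


def time_write(path, size_bytes, block_size=1 << 20, sync=False):
    """Write size_bytes of zeros to path and return the elapsed seconds.

    With sync=False the timer stops once write() has returned, which
    usually means the data only reached the file buffer cache.
    With sync=True the timer also covers os.fsync(), which asks the OS
    to push the data to the storage device before returning.
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        remaining = size_bytes
        while remaining > 0:
            chunk = block[:min(block_size, remaining)]
            # os.write may write fewer bytes than requested.
            remaining -= os.write(fd, chunk)
        if sync:
            os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


def compare(size_bytes=8 << 20):
    """Return (buffered_seconds, synced_seconds) for two scratch files."""
    with tempfile.TemporaryDirectory() as scratch:
        buffered = time_write(os.path.join(scratch, "buffered.dat"), size_bytes)
        synced = time_write(os.path.join(scratch, "synced.dat"), size_bytes, sync=True)
    return buffered, synced
```

Calling compare() on a machine with free memory will often show the buffered number looking far better than the synced one, even though both runs hit the same storage. How large the gap is depends entirely on the OS, the filesystem, and the hardware, so treat any numbers it prints as a demonstration of the measurement problem and not as a benchmark.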
The question still remained as to why anyone would want this option, when what was being observed was in fact worse performance. Data integrity isn’t the issue; that is governed by the O_DSYNC flag, which means that data goes to disk before the write is acknowledged as complete.
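For readers who haven't used it, this is roughly what asking for O_DSYNC looks like from an application. It is a sketch in Python, not how Oracle opens its data files. The helper name write_durably is hypothetical, and the fallback to O_SYNC (or to no flag at all) on platforms where Python doesn't expose os.O_DSYNC is my own assumption for portability.

```python
import os

# Assumption: os.O_DSYNC is not defined on every platform, so fall back
# to O_SYNC, and finally to 0 (no synchronous flag) if neither exists.
SYNC_FLAG = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))


def write_durably(path, data):
    """Write data to path with synchronous-write semantics requested.

    When O_DSYNC is in effect, each write() returns only after the data
    has been handed to stable storage, independent of whether the file
    buffer cache is used or bypassed.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | SYNC_FLAG
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may be a partial write
            view = view[written:]
    finally:
        os.close(fd)
    return len(data)
```

The point of the sketch is that the integrity guarantee comes from a flag on the open call, not from the mount option. Direct IO decides whether the file buffer cache sits in the path; O_DSYNC decides when the write is acknowledged.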
This option isn’t always a great option. For database systems, it usually is. The reason is that we can give that file buffer cache memory to the database instead. The database is better at buffering its blocks than the OS. When the DB issues a read, the data should go directly from disk to DB cache. On a write, we want it to go directly from DB cache to disk. We’d expect better overall performance with this configuration. For database systems that experience high concurrency, we are in even better shape for good performance because writers to the data files aren’t blocking out readers. Access to the data isn’t serialized.
It becomes easier to see how a database system can benefit from Direct IO. Of course, there is no substitute for actually testing the application, and that is really the only time when we can feel better about things, isn’t it? If only things always worked out as I thought they should.
One situation that I can think of where Direct IO might not be a good idea is a database system that doesn’t have highly concurrent access and can’t take advantage of larger cache sizes. If it is limited to a 32-bit memory footprint, but the system has much more memory than that available, then performance might be better with the OS caching those file buffers.
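A quick back-of-the-envelope calculation shows why. The sketch below assumes the database can address at most 2^32 bytes (4 GiB) and treats everything beyond that as memory the OS could use for file buffers. The 16 GiB host and the function name are made up for illustration, and a real 32-bit process usually gets less than the full 4 GiB.

```python
GIB = 1 << 30


def memory_left_for_os_cache(total_ram_gib, address_bits=32):
    """Return the GiB of RAM the database cannot address.

    Upper bound only: assumes the database uses its entire address
    space and ignores what the OS and other processes need.
    """
    max_db_gib = (1 << address_bits) / GIB
    return max(total_ram_gib - max_db_gib, 0.0)


print(memory_left_for_os_cache(16))  # 12.0
```

On that hypothetical host, forcing Direct IO would leave as much as 12 GiB that neither the database nor the file buffer cache could put to work for those data files.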
If there is a moral to the story, I suppose it might be that when testing performance of a new system, test the performance of the applications that are actually going to be running. That is what matters in the end, no?