Okay, it’s a bit of hyperbole. But let me elaborate.
I regularly bump into SQL Server discussions, online or elsewhere, in which people speak with confidence that an operation behaves the way it does because it is doing sequential disk I/Os, or because it is doing random disk I/Os. That always makes me wonder how they know, and how they can be so confident. In most cases, I have absolutely no idea whether the disk I/Os are sequential or random.
It makes sense to talk about I/Os issued by SQL Server being sequential or random at the SQL Server level. For instance, if a SQL Server operation (e.g. a table scan or a backup) accesses the database pages in file page-number order, it’s considered to be issuing sequential I/Os, and SQL Server certainly has control over that. Similarly, if you write a program (outside of SQL Server, in the OS) to access a file, you can control how your I/O requests are issued with respect to how the data is laid out in that file. Whether the I/Os are sequential or random is relative to that file. No question there!
However, in today’s storage environments, there is usually a very long path and many layers of API calls between the SQL Server operations that we are interested in and the I/Os on the physical drives (i.e. spindles and platters). It’s rare that you can control how your application-level I/O requests are serviced at the physical disk level. In a lot of cases, you don’t have any visibility into how your I/O requests are eventually handled on the disk platters.
In addition, unlike other I/O access characteristics, whether I/Os are sequential or random, or to what extent they are sequential or random, is almost never exposed as a measurable metric. If you can’t measure or monitor it, you can’t be sure of it. I don’t know where people get their confidence when they talk about sequential disk I/Os or random disk I/Os.
All is not lost, though. For all intents and purposes of a discussion of sequential vs. random disk I/Os, we can turn our attention to large vs. small disk I/Os.
First of all, disk I/O size is a measurable metric, and it is readily exposed by many tools.
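For instance, SQL Server’s sys.dm_io_virtual_file_stats DMV exposes cumulative counters such as num_of_reads and num_of_bytes_read per database file, and PerfMon exposes Avg. Disk Bytes/Read. The arithmetic for turning those counters into an average I/O size is trivial; here is a sketch (the sample numbers below are made up for illustration):

```python
def avg_io_size_kb(num_of_ios: int, num_of_bytes: int) -> float:
    """Average I/O size in KB from cumulative counters, e.g. the
    num_of_reads / num_of_bytes_read columns of
    sys.dm_io_virtual_file_stats sampled at two points in time."""
    if num_of_ios == 0:
        return 0.0
    return num_of_bytes / num_of_ios / 1024


# Hypothetical sample: 12,000 reads totaling 12,000 * 512 KB hints at
# read-ahead-sized I/Os rather than single-page 8 KB reads.
print(avg_io_size_kb(12_000, 12_000 * 512 * 1024))  # 512.0
```

A large average (hundreds of KB) suggests scans, read-ahead, or backups; an average near 8 KB suggests single-page random access patterns.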
In addition, when people speak of a database system such as SQL Server being optimized for sequential I/Os, they really mean that the system is optimized for large I/Os. Even if they didn’t mean it, you are free to interpret it as such. From an I/O perspective, you can view key database techniques such as database checkpoints, read-ahead reads, and transaction logging as methods to optimize for large disk I/Os (among their other purposes).
On a disk platter, fewer sequential disk I/Os are indeed much more efficient than many random disk I/Os for accessing the same amount of data. The bet is that when you make a single large I/O request at the application level (e.g. SQL Server or your own code), it has a high probability of being translated into sequential disk I/Os at the lowest level (i.e. disk I/Os on the spindles and platters). Or at least, the chance of getting sequential disk I/Os is much higher for a large application-level I/O request than it is for multiple small I/O requests at the same application level. And that is a good bet!
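A back-of-envelope model shows why this bet pays off. The numbers below (4 ms per seek plus rotational latency, 150 MB/s sequential transfer rate) are assumptions picked for illustration, not measurements of any particular drive:

```python
SEEK_MS = 4.0            # assumed seek + rotational latency per positioning
TRANSFER_MB_PER_S = 150  # assumed sequential transfer rate

def io_time_ms(num_ios: int, io_size_kb: int, seeks_per_io: float) -> float:
    """Estimated total service time: positioning cost + pure transfer cost."""
    seek_cost = num_ios * seeks_per_io * SEEK_MS
    transfer_cost = num_ios * io_size_kb / 1024 / TRANSFER_MB_PER_S * 1000
    return seek_cost + transfer_cost


# Reading 4 MB as one large request (ideally a single seek, then
# sequential transfer) vs. 512 random 8 KB requests (a seek each):
print(io_time_ms(1, 4096, 1.0))   # ~31 ms
print(io_time_ms(512, 8, 1.0))    # ~2075 ms
```

The transfer cost is identical in both cases; the roughly 65x difference is entirely positioning overhead, which is why a single large application-level request that stands a good chance of becoming sequential disk I/Os is such a good bet.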
Personally, I pay more attention to I/O size than to whether disk I/Os are sequential or random.