
At Tue, 3 May 2011 15:27:29 +0100, Mark Brown wrote:
> On Tue, May 03, 2011 at 04:07:42PM +0200, Takashi Iwai wrote:
> > Mark Brown wrote:
> > > If we can't get the data laid out in a contiguous array in memory then we have to gather the data for transmit in the I/O code which is painful and wasteful.
> > But looking up the caches is just a matter of CPU usage. Well, what I don't understand is the text you wrote:
> No, as I said in my initial reply the big win is I/O bandwidth from block writes.
This is a mysterious part. See below.
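[To make the bandwidth point concrete, here is a back-of-the-envelope sketch. The framing numbers are an assumption for illustration, not from the thread: on an I2C-style bus, a single-register write costs roughly one device-address byte, one register-address byte and one data byte, while a block write pays the addressing cost once for N consecutive registers.]

```c
/* Rough wire-cost arithmetic for N 8-bit registers (assumed bus
 * framing: 1 device-address byte + 1 register-address byte per
 * transaction, then the data bytes). */

/* bytes on the wire when each register is written individually */
unsigned int bytes_single_writes(unsigned int n)
{
	return n * 3;
}

/* bytes on the wire for one block write of N consecutive registers */
unsigned int bytes_block_write(unsigned int n)
{
	return 2 + n;
}
```

[For a 64-register sync that is 192 bytes versus 66, before counting per-transaction start/stop overhead, which makes the gap even larger.]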
> There will also be a memory and consequent dcache win from reducing the size of the data structure but that is likely to be dwarfed by any write coalescing.
Yes, but this can also bloat depending on the register map, too :)
> > | This isn't about CPU usage, it's about I/O bandwidth which is a big
> > | concern in situations like resume where you can be bringing the device
> > | back up from cold.
> > How does the contiguous memory layout inside the cache management reduce the I/O bandwidth...?
> If the data is laid out contiguously then we can easily send it to the chip in one block.
This is what I don't understand. The sync loop is done in sorted order over the rb-tree anyway. How can the contiguous array help to change the I/O pattern...?
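[For context, the difference Mark is describing can be sketched as follows. All names here are hypothetical, not the real regcache code: with a flat contiguous cache, a run of adjacent dirty registers can be coalesced into a single block transfer, whereas a sorted walk over individually stored registers still issues one bus transaction per register, even though both visit the registers in the same order.]

```c
#include <stdint.h>
#include <string.h>

#define NUM_REGS 16

struct fake_bus {
	uint8_t hw[NUM_REGS];	/* what the "chip" ends up holding */
	unsigned int xfers;	/* bus transactions issued */
};

/* one bus transaction covering `len` consecutive registers */
void block_write(struct fake_bus *bus, unsigned int reg,
		 const uint8_t *buf, unsigned int len)
{
	memcpy(&bus->hw[reg], buf, len);
	bus->xfers++;
}

/* sync from a contiguous cache, coalescing adjacent dirty registers */
void sync_contiguous(struct fake_bus *bus, const uint8_t *cache,
		     const uint8_t *dirty)
{
	unsigned int reg = 0;

	while (reg < NUM_REGS) {
		unsigned int start;

		if (!dirty[reg]) {
			reg++;
			continue;
		}
		start = reg;
		while (reg < NUM_REGS && dirty[reg])
			reg++;
		block_write(bus, start, &cache[start], reg - start);
	}
}

/* per-register sync, as a sorted walk over single entries would do */
void sync_single(struct fake_bus *bus, const uint8_t *cache,
		 const uint8_t *dirty)
{
	unsigned int reg;

	for (reg = 0; reg < NUM_REGS; reg++)
		if (dirty[reg])
			block_write(bus, reg, &cache[reg], 1);
}
```

[With eight adjacent dirty registers, `sync_contiguous` issues one transfer where `sync_single` issues eight, while both leave the device in the same state; the sort order is identical, only the grouping differs.]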
Takashi