On Mon, May 02, 2011 at 04:47:41PM +0200, Takashi Iwai wrote:
At Mon, 2 May 2011 15:42:56 +0100, Dimitris Papastamos wrote:
On Mon, May 02, 2011 at 04:29:01PM +0200, Takashi Iwai wrote:
At Mon, 2 May 2011 13:28:28 +0100, Dimitris Papastamos wrote:
This patch prepares the ground for the actual rbtree optimization patch which will save a pointer to the last accessed register block that was used in either the read() or write() functions.
Each node manages a block of up to RBTREE_BLOCK_NUM registers. There can be no two nodes with overlapping blocks. Currently there is no check in the code to scream in case that ever happens. Each block has a base and top register, all others lie in between these registers. Note that variable length blocks aren't supported. So if you have an interval [8, 15] and only some of those registers actually exist on the device, the block will have the non-existent registers as zero. There is also no way of reporting that any of those non-existent registers were accessed/modified.
The larger the block size, the more probable it is that one of the managed registers is non-zero, and therefore the node will need to be allocated at initialization time and waste space.
If register N is accessed and it is not part of any of the current blocks in the rbtree, a new node is created with a base register which is floor(N / RBTREE_BLOCK_NUM) * RBTREE_BLOCK_NUM and a top register as base_register + RBTREE_BLOCK_NUM - 1. All other registers in the block are initialized as expected and the node is inserted into the tree.
Signed-off-by: Dimitris Papastamos dp@opensource.wolfsonmicro.com
Looking through the patch, I wonder whether it gives a performance gain enough for the additional complexity. Did you measure somehow?
This patch might still provide a performance benefit by itself since by increasing the block size we decrease the number of nodes in the rbtree that need to be looked up everytime we touch the cache.
Right. But does it buy so much, in comparison with the additional complexity of the code? That's my primary question.
For some codecs, the register maps group together registers of similar functionality. With an optimal block size or near optimal, there will be only one rbtree look up if we continually touch any of those registers. This helps primarily when we sync the register map as well as during initialization. I think the added code complexity is worth it. I could perhaps send patches incrementally to simplify this further.
Thanks, Dimitris