[Sound-open-firmware] cache invalidation evicting wrong lines
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation. This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
Therefore questions:
1. Am I using cache invalidation wrongly?
2. If the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
3. Is that a software or a hardware bug?
4. How to fix?
Thanks Guennadi
On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation.
Fwiw, this depends on the ISA and xtensa IP configuration (and this can vary between xtensa devices since the ISA is extremely configurable). This may also be true for other DSP ISAs too.
This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
The DW IRQ controller IP address space is non-cacheable (since it's HW), so you shouldn't need to invalidate any access to it.
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
So iiuc dw_intc->irq is being trashed when you invalidate desc ?
Therefore questions:
- am I using cache invalidation wrongly?
- if the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
- is that a software or a hardware bug?
- how to fix?
More questions
1) Are other dw_intc-> members trashed? How is this structure/member aligned in memory?
2) Has descriptors been invalidated? i.e. are you invalidating a valid address for desc?
3) If desc and sizeof(*desc) are not cache line/size aligned, then try and make sure that size is rounded up to the nearest cache line and desc rounded down to the nearest cache line for alignment.
4) Is this code running on core 0 (the same core that registered all the IRQs)?
Thanks
Liam
Thanks
Guennadi
_______________________________________________
Sound-open-firmware mailing list
Sound-open-firmware@alsa-project.org
https://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware
On Fri, Mar 15, 2019 at 10:30:30AM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation.
Fwiw, this depends on the ISA and xtensa IP configuration (and this can vary between xtensa devices since the ISA is extremely configurable). This may also be true for other DSP ISAs too.
So not all configurations require manual cache sync? Do we know which ones do and which ones don't?
This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
The DW IRQ controller IP address space is non-cacheable (since it's HW), so you shouldn't need to invalidate any access to it.
Sorry, I didn't mean physical DW INTC registers, I meant a descriptor in software, in RAM.
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
So iiuc dw_intc->irq is being trashed when you invalidate desc ?
Yes.
Therefore questions:
- am I using cache invalidation wrongly?
- if the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
- is that a software or a hardware bug?
- how to fix?
More questions
- Are other dw_intc-> members trashed ? How is this structure/member aligned in
memory ?
Not in my test, because other members were initialised statically and only .irq was assigned before scanning the descriptor list and invalidating caches. I don't impose any alignment so far. The DW INTC descriptor was at 0xbe046aa8 and the descriptor whose invalidation caused the trouble was at 0xbe04640c. .irq is in fact the first element in the struct.
- Has descriptors been invalidated ? i.e. are you invalidating a valid address
for desc ?
Yes, addresses are valid.
- If desc and sizeof(*desc) are not cache line/size aligned, then try and make
sure that size is rounded up to nearest cache line and desc rounded down to nearest cache line for alignment.
If this is the problem, would this indicate a hardware bug? I thought about trying this, but according to our understanding it shouldn't be necessary?
- Is this code running on core 0 (the same core that registered all the IRQs) ?
So far Sue Creek doesn't enable SMP, so, yes.
Thanks Guennadi
On Fri, 2019-03-15 at 12:47 +0100, Guennadi Liakhovetski wrote:
On Fri, Mar 15, 2019 at 10:30:30AM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation.
Fwiw, this depends on the ISA and xtensa IP configuration (and this can vary between xtensa devices since the ISA is extremely configurable). This may also be true for other DSP ISAs too.
So not all configurations require manual cache sync? Do we know which ones do and which ones don't?
Defined for each ISA, please see arch/cache.h. Client code does not need to know the cache config.
This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
The DW IRQ controller IP address space is non-cacheable (since it's HW), so you shouldn't need to invalidate any access to it.
Sorry, I didn't mean physical DW INTC registers, I meant a descriptor in software, in RAM.
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
So iiuc dw_intc->irq is being trashed when you invalidate desc ?
Yes.
Btw, have you tried both gcc and xt-xcc ? Also have you tried adding a memory barrier here (setting dw_intc volatile should do this) ?
Therefore questions:
- am I using cache invalidation wrongly?
- if the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
- is that a software or a hardware bug?
- how to fix?
More questions
- Are other dw_intc-> members trashed ? How is this structure/member
aligned in memory ?
Not in my test, because other members were initialised statically and only .irq was assigned before scanning the descriptor list and invalidating caches. I don't impose any alignment so far. The DW INTC descriptor was at 0xbe046aa8 and the descriptor whose invalidation caused the trouble was at 0xbe04640c. .irq is in fact the first element in the struct.
ok, so both not aligned on 64 bytes and not on the same cache line
- Has descriptors been invalidated ? i.e. are you invalidating a valid
address for desc ?
Yes, addresses are valid.
- If desc and sizeof(*desc) are not cache line/size aligned, then try and
make sure that size is rounded up to nearest cache line and desc rounded down to nearest cache line for alignment.
If this is the problem, would this indicate a hardware bug? I thought about trying this, but according to our understanding it shouldn't be necessary?
No HW bug, but HW can only invalidate/writeback in blocks of cache-line size with cache-line alignment. Currently the code is not checking this, and this may be "undefined" at ISA level. Can you check xthal_dcache_region_writeback()?
- Is this code running on core 0 (the same core that registered all the
IRQs) ?
So far Sue Creek doesn't enable SMP, so, yes.
Ok, so to confirm: there is also no other IP (like DMA) that can write to L2/SRAM except core 0? And can you provide the whole function where this fails?
Thanks
Liam
Thanks Guennadi
On Fri, Mar 15, 2019 at 12:04:20PM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 12:47 +0100, Guennadi Liakhovetski wrote:
On Fri, Mar 15, 2019 at 10:30:30AM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation.
Fwiw, this depends on the ISA and xtensa IP configuration (and this can vary between xtensa devices since the ISA is extremely configurable). This may also be true for other DSP ISAs too.
So not all configurations require manual cache sync? Do we know which ones do and which ones don't?
Defined for each ISA, please see arch/cache.h. Client code does not need to know the cache config.
This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
The DW IRQ controller IP address space is non-cacheable (since it's HW), so you shouldn't need to invalidate any access to it.
Sorry, I didn't mean physical DW INTC registers, I meant a descriptor in software, in RAM.
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
So iiuc dw_intc->irq is being trashed when you invalidate desc ?
Yes.
Btw, have you tried both gcc and xt-xcc ? Also have you tried adding a memory barrier here (setting dw_intc volatile should do this) ?
No, only gcc. I thought about trying volatile, but I couldn't convince myself that that would be the correct solution.
Therefore questions:
- am I using cache invalidation wrongly?
- if the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
- is that a software or a hardware bug?
- how to fix?
More questions
- Are other dw_intc-> members trashed ? How is this structure/member
aligned in memory ?
Not in my test, because other members were initialised statically and only .irq was assigned before scanning the descriptor list and invalidating caches. I don't impose any alignment so far. The DW INTC descriptor was at 0xbe046aa8 and the descriptor whose invalidation caused the trouble was at 0xbe04640c. .irq is in fact the first element in the struct.
ok, so both not aligned on 64 bytes and not on the same cache line
- Has descriptors been invalidated ? i.e. are you invalidating a valid
address for desc ?
Yes, addresses are valid.
- If desc and sizeof(*desc) are not cache line/size aligned, then try and
make sure that size is rounded up to nearest cache line and desc rounded down to nearest cache line for alignment.
If this is the problem, would this indicate a hardware bug? I thought about trying this, but according to our understanding it shouldn't be necessary?
No HW bug, but HW can only invalidate/writeback in blocks of cache-line size with cache-line alignment. Currently the code is not checking this, and this may be "undefined" at ISA level. Can you check xthal_dcache_region_writeback()?
Right, that's what I suspected. So invalidating non-cache-aligned data causes memory around the desired area to also lose its cached contents, possibly leading to data loss.
- Is this code running on core 0 (the same core that registered all the
IRQs) ?
So far Sue Creek doesn't enable SMP, so, yes.
Ok, so to confirm: there is also no other IP (like DMA) that can write to L2/SRAM except core 0? And can you provide the whole function where this fails?
There is DMA, but later. This happens during initialisation, before DMA. And DMA doesn't touch those areas anyway. The beginning of the function where this happens is below.
Thanks Guennadi
int interrupt_cascade_register(struct irq_cascade_desc *cascade)
{
	struct irq_desc *desc = &cascade->desc;
	struct irq_cascade_desc *p_cascade;
	struct list_item *list;
	unsigned long flags, i;
	int ret;

	if (!cascade->name || !cascade->ops)
		return -EINVAL;

	spinlock_init(&cascade->lock);
	for (i = 0; i < PLATFORM_IRQ_CHILDREN; i++)
		list_init(&cascade->child[i].list);

	spin_lock_irq(&cascade_lock, flags);

	list_for_item_smp (list, &cascade_list) {
		struct irq_cascade_desc *c = container_of(list,
			struct irq_cascade_desc, list);

		// One of these kills "cascade"
		dcache_invalidate_region(c, sizeof(*c));
		if (!rstrcmp(c->name, cascade->name)) {
			ret = -EEXIST;
			goto unlock;
		}
	}
On Fri, 2019-03-15 at 13:29 +0100, Guennadi Liakhovetski wrote:
On Fri, Mar 15, 2019 at 12:04:20PM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 12:47 +0100, Guennadi Liakhovetski wrote:
On Fri, Mar 15, 2019 at 10:30:30AM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation.
Fwiw, this depends on the ISA and xtensa IP configuration (and this can vary between xtensa devices since the ISA is extremely configurable). This may also be true for other DSP ISAs too.
So not all configurations require manual cache sync? Do we know which ones do and which ones don't?
Defined for each ISA, please see arch/cache.h. Client code does not need to know the cache config.
This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
The DW IRQ controller IP address space is non-cacheable (since it's HW), so you shouldn't need to invalidate any access to it.
Sorry, I didn't mean physical DW INTC registers, I meant a descriptor in software, in RAM.
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
So iiuc dw_intc->irq is being trashed when you invalidate desc ?
Yes.
Btw, have you tried both gcc and xt-xcc ? Also have you tried adding a memory barrier here (setting dw_intc volatile should do this) ?
No, only gcc. I thought about trying volatile, but I couldn't convince myself that that would be the correct solution.
Can you try with xt-xcc and volatile? This will rule out any compiler shenanigans.
Therefore questions:
- am I using cache invalidation wrongly?
- if the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
- is that a software or a hardware bug?
- how to fix?
More questions
- Are other dw_intc-> members trashed ? How is this structure/member
aligned in memory ?
Not in my test, because other members were initialised statically and only .irq was assigned before scanning the descriptor list and invalidating caches. I don't impose any alignment so far. The DW INTC descriptor was at 0xbe046aa8 and the descriptor whose invalidation caused the trouble was at 0xbe04640c. .irq is in fact the first element in the struct.
ok, so both not aligned on 64 bytes and not on the same cache line
- Has descriptors been invalidated ? i.e. are you invalidating a valid
address for desc ?
Yes, addresses are valid.
- If desc and sizeof(*desc) are not cache line/size aligned, then try
and make sure that size is rounded up to nearest cache line and desc rounded down to nearest cache line for alignment.
If this is the problem, would this indicate a hardware bug? I thought about trying this, but according to our understanding it shouldn't be necessary?
No HW bug, but HW can only invalidate/writeback in blocks of cache-line size with cache-line alignment. Currently the code is not checking this, and this may be "undefined" at ISA level. Can you check xthal_dcache_region_writeback()?
Right, that's what I suspected. So invalidating non-cache-aligned data causes memory around the desired area to also lose its cached contents, possibly leading to data loss.
- Is this code running on core 0 (the same core that registered all the
IRQs) ?
So far Sue Creek doesn't enable SMP, so, yes.
Ok, so to confirm: there is also no other IP (like DMA) that can write to L2/SRAM except core 0? And can you provide the whole function where this fails?
There is DMA, but later. This happens during initialisation, before DMA. And DMA doesn't touch those areas anyway. The beginning of the function where this happens is below.
Ok, so is this during IRQ core init or IRQ client code init ?
Thanks Guennadi
int interrupt_cascade_register(struct irq_cascade_desc *cascade)
{
	struct irq_desc *desc = &cascade->desc;
	struct irq_cascade_desc *p_cascade;
	struct list_item *list;
	unsigned long flags, i;
	int ret;

	if (!cascade->name || !cascade->ops)
		return -EINVAL;

	spinlock_init(&cascade->lock);
	for (i = 0; i < PLATFORM_IRQ_CHILDREN; i++)
		list_init(&cascade->child[i].list);

	spin_lock_irq(&cascade_lock, flags);
Cascade lock doesn't need invalidating since it's an SMP atomic op.
list_for_item_smp (list, &cascade_list) {
Could the list be corrupt? I can't see where cascade_list is invalidated?
struct irq_cascade_desc *c = container_of(list, struct irq_cascade_desc, list);
// One of these kills "cascade"
Is cascade killed on the first list iteration or a subsequent iteration? Is this iteration random or always consistent? Does the bug location change with debug code added?
Liam
		dcache_invalidate_region(c, sizeof(*c));
		if (!rstrcmp(c->name, cascade->name)) {
			ret = -EEXIST;
			goto unlock;
		}
}
On Fri, Mar 15, 2019 at 03:59:41PM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 13:29 +0100, Guennadi Liakhovetski wrote:
On Fri, Mar 15, 2019 at 12:04:20PM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 12:47 +0100, Guennadi Liakhovetski wrote:
On Fri, Mar 15, 2019 at 10:30:30AM +0000, Liam Girdwood wrote:
On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
Hi,
Working on https://github.com/thesofproject/sof/pull/1053 I'm making sure all the DSP cores see the same data, which on Xtensa seems to require manual cache synchronisation.
Fwiw, this depends on the ISA and xtensa IP configuration (and this can vary between xtensa devices since the ISA is extremely configurable). This may also be true for other DSP ISAs too.
So not all configurations require manual cache sync? Do we know which ones do and which ones don't?
Defined for each ISA, please see arch/cache.h. Client code does not need to know the cache config.
This is a painful task; it's actually only practically possible because there aren't many objects in SOF that can be accessed by different cores. However, with that PR interrupt descriptors will also become accessible by multiple cores.
On Sue Creek there are two kinds of cascading interrupt controllers: the internal "level2" to "level5" ones and a DesignWare interrupt controller IP, used by several peripherals, including the UART. With the addition of manual cache synchronisation, UART interrupts stopped working. Debugging showed that DW INTC interrupt descriptors lose their updates when other descriptors have their cache invalidated. And those descriptors aren't even close to each other in RAM. Roughly the following is happening:
The DW IRQ controller IP address space is non-cacheable (since it's HW), so you shouldn't need to invalidate any access to it.
Sorry, I didn't mean physical DW INTC registers, I meant a descriptor in software, in RAM.
	dw_intc->irq = new_irq;
	list_for_item(list, descriptors) {
		desc = container_of(list, struct irq_desc, list);
		dcache_invalidate_region(desc, sizeof(*desc));
at which point dw_intc->irq loses its new_irq value and restores the initial -EINVAL value. If I either remove the invalidation line above or add a writeback for dw_intc between the first two lines, the problem goes away.
So iiuc dw_intc->irq is being trashed when you invalidate desc ?
Yes.
Btw, have you tried both gcc and xt-xcc ? Also have you tried adding a memory barrier here (setting dw_intc volatile should do this) ?
No, only gcc. I thought about trying volatile, but I couldn't convince myself that that would be the correct solution.
Can you try with xt-xcc and volatile? This will rule out any compiler shenanigans.
If I have a working xt-xcc install... I'll try next week.
Therefore questions:
- am I using cache invalidation wrongly?
- if the invalidated memory region isn't cache-line aligned, what happens then? Are all full lines invalidated, thus potentially erasing data? But even if this were handled wrongly, it wouldn't cause this problem, because those descriptors aren't adjacent in RAM.
- is that a software or a hardware bug?
- how to fix?
More questions
- Are other dw_intc-> members trashed ? How is this structure/member
aligned in memory ?
Not in my test, because other members were initialised statically and only .irq was assigned before scanning the descriptor list and invalidating caches. I don't impose any alignment so far. The DW INTC descriptor was at 0xbe046aa8 and the descriptor whose invalidation caused the trouble was at 0xbe04640c. .irq is in fact the first element in the struct.
ok, so both not aligned on 64 bytes and not on the same cache line
- Has descriptors been invalidated ? i.e. are you invalidating a valid
address for desc ?
Yes, addresses are valid.
- If desc and sizeof(*desc) are not cache line/size aligned, then try
and make sure that size is rounded up to nearest cache line and desc rounded down to nearest cache line for alignment.
If this is the problem, would this indicate a hardware bug? I thought about trying this, but according to our understanding it shouldn't be necessary?
No HW bug, but HW can only invalidate/writeback in blocks of cache-line size with cache-line alignment. Currently the code is not checking this, and this may be "undefined" at ISA level. Can you check xthal_dcache_region_writeback()?
Right, that's what I suspected. So invalidating non-cache-aligned data causes memory around the desired area to also lose its cached contents, possibly leading to data loss.
- Is this code running on core 0 (the same core that registered all the
IRQs) ?
So far Sue Creek doesn't enable SMP, so, yes.
Ok, so to confirm: there is also no other IP (like DMA) that can write to L2/SRAM except core 0? And can you provide the whole function where this fails?
There is DMA, but later. This happens during initialisation, before DMA. And DMA doesn't touch those areas anyway. The beginning of the function where this happens is below.
Ok, so is this during IRQ core init or IRQ client code init ?
during IRQ core and platform init.
Thanks Guennadi
int interrupt_cascade_register(struct irq_cascade_desc *cascade)
{
	struct irq_desc *desc = &cascade->desc;
	struct irq_cascade_desc *p_cascade;
	struct list_item *list;
	unsigned long flags, i;
	int ret;

	if (!cascade->name || !cascade->ops)
		return -EINVAL;

	spinlock_init(&cascade->lock);
	for (i = 0; i < PLATFORM_IRQ_CHILDREN; i++)
		list_init(&cascade->child[i].list);

	spin_lock_irq(&cascade_lock, flags);
Cascade lock doesn't need invalidating since it's an SMP atomic op.
list_for_item_smp (list, &cascade_list) {
Could the list be corrupt? I can't see where cascade_list is invalidated?
list_for_item_smp does that:
#define list_for_item_smp(item, list) \
	dcache_invalidate_region(list, sizeof(*(list))); \
	for (item = (list)->next, \
	     dcache_invalidate_region(item, sizeof(*(item))); \
	     item != list; \
	     item = (item)->next, \
	     dcache_invalidate_region(item, sizeof(*(item))))
struct irq_cascade_desc *c = container_of(list, struct irq_cascade_desc, list);
// One of these kills "cascade"
Is cascade killed on the first list iteration or a subsequent iteration? Is this iteration random or always consistent? Does the bug location change with debug code added?
It seemed to be consistent. First the 4 "level2" to "level5" cascading controllers are added, and all is good then. Then the 2 INTC controllers are added, for the low and the high 32 bits respectively. When the first one of those is added and the list of already installed controllers is checked, the .irq of a specific one of them gets invalidated.
Thanks Guennadi
Liam
		dcache_invalidate_region(c, sizeof(*c));
		if (!rstrcmp(c->name, cascade->name)) {
			ret = -EEXIST;
			goto unlock;
		}
}
On Fri, 2019-03-15 at 17:56 +0100, Guennadi Liakhovetski wrote:
Could the list be corrupt? I can't see where cascade_list is invalidated?
list_for_item_smp does that:
#define list_for_item_smp(item, list) \
	dcache_invalidate_region(list, sizeof(*(list))); \
	for (item = (list)->next, \
	     dcache_invalidate_region(item, sizeof(*(item))); \
	     item != list; \
	     item = (item)->next, \
	     dcache_invalidate_region(item, sizeof(*(item))))
struct irq_cascade_desc *c = container_of(list, struct irq_cascade_desc,
list);
// One of these kills "cascade"
Another issue could possibly be the compiler re-ordering some of these statements above? Try removing -O2.
Btw, are you writing back prior to releasing the lock on the data ?
Is cascade killed on the first list iteration or a subsequent iteration? Is this iteration random or always consistent? Does the bug location change with debug code added?
It seemed to be consistent. First the 4 "level2" to "level5" cascading controllers are added, and all is good then. Then the 2 INTC controllers are added, for the low and the high 32 bits respectively. When the first one of those is added and the list of already installed controllers is checked, the .irq of a specific one of them gets invalidated.
Also see cacheasm.h, called by dcache_invalidate(); this takes care of cache-line alignment for address and size.
/*
 * Macro to apply a 'hit' cache instruction to a memory region,
 * ie. to any cache entries that cache a specified portion (region) of memory.
 * Takes care of the unaligned cases, ie. may apply to one
 * more cache line than $asize / lineSize if $aaddr is not aligned.
 *
 * Parameters are:
 *	cainst		instruction/macro that takes an address register parameter
 *			and an offset parameter (currently always zero)
 *			and generates a cache instruction (eg. "dhi", "dhwb", "ihi", etc.)
 *	linesize_log2	log2(size of cache line in bytes)
 *	addr		register containing start address of region (clobbered)
 *	asize		register containing size of the region in bytes (clobbered)
 *	askew		unique register used as temporary
 *	awb		unique register used as temporary for erratum 497.
 *
 * Note: A possible optimization to this macro is to apply the operation
 * to the entire cache if the region exceeds the size of the cache
 * by some empirically determined amount or factor. Some experimentation
 * is required to determine the appropriate factors, which also need
 * to be tunable if required.
 */
.macro cache_hit_region cainst, linesize_log2, addr, asize, askew, awb=a0
	// Make \asize the number of iterations:
	extui	\askew, \addr, 0, \linesize_log2	// get unalignment amount of \addr
	add	\asize, \asize, \askew			// ... and add it to \asize
	addi	\asize, \asize, (1 << \linesize_log2) - 1	// round up!
	srli	\asize, \asize, \linesize_log2

	// Iterate over region:
	.ifne	((\awb != a0) & XCHAL_ERRATUM_497)	// don't use awb if set to a0
	movi	\awb, 0
	.endif
	floopnez	\asize, cacheh@
	\cainst	\addr, 0
	.ifne	((\awb != a0) & XCHAL_ERRATUM_497)	// do memw after 8 cainst wb instructions
	addi	\awb, \awb, 1
	blti	\awb, 8, .Lstart_memw@
	memw
	movi	\awb, 0
.Lstart_memw@:
	.endif
	addi	\addr, \addr, (1 << \linesize_log2)	// move to next line
	floopend	\asize, cacheh@
	.endm
Liam
Participants (2):
- Guennadi Liakhovetski
- Liam Girdwood