[Sound-open-firmware] cache invalidation evicting wrong lines

Liam Girdwood liam.r.girdwood at linux.intel.com
Fri Mar 15 16:59:41 CET 2019


On Fri, 2019-03-15 at 13:29 +0100, Guennadi Liakhovetski wrote:
> On Fri, Mar 15, 2019 at 12:04:20PM +0000, Liam Girdwood wrote:
> > On Fri, 2019-03-15 at 12:47 +0100, Guennadi Liakhovetski wrote:
> > > On Fri, Mar 15, 2019 at 10:30:30AM +0000, Liam Girdwood wrote:
> > > > On Fri, 2019-03-15 at 10:20 +0100, Guennadi Liakhovetski wrote:
> > > > > Hi,
> > > > > 
> > > > > Working on https://github.com/thesofproject/sof/pull/1053 I'm making
> > > > > sure
> > > > > all the DSP cores see the same data, which on Xtensa seems to require
> > > > > manual cache synchronisation. 
> > > > 
> > > > Fwiw, this depends on the ISA and xtensa IP configuration (and this can
> > > > vary between xtensa devices, since the ISA is extremely configurable).
> > > > This may also be true for other DSP ISAs too. 
> > > 
> > > So not all configurations require manual cache sync? Do we know which
> > > ones do and which ones don't?
> > 
> > Defined for each ISA, please see arch/cache.h. Client code does not need
> > to know the cache config.
> > 
> > > > > This is a painful task; it's actually only practically possible
> > > > > because there aren't many objects in SOF that can be accessed by
> > > > > different cores. However, with that PR interrupt descriptors will
> > > > > also become accessible by multiple cores.
> > > > > 
> > > > > On Sue Creek there are two kinds of cascading interrupt controllers:
> > > > > internal "level2" - "level5" and a DesignWare interrupt controller IP,
> > > > > used by several peripherals, including the UART. With the addition of
> > > > > manual cache synchronisation, UART interrupts stopped working.
> > > > > Debugging showed that DW INTC interrupt descriptors lose their
> > > > > updates when other descriptors have their cache invalidated. And
> > > > > those descriptors aren't even immediately close to each other in RAM.
> > > > > Roughly the following is happening:
> > > > 
> > > > The DW IRQ controller IP address space is non-cacheable (since it's
> > > > HW), so you shouldn't need to invalidate any access to it. 
> > > 
> > > Sorry, I didn't mean physical DW INTC registers, I meant a descriptor in
> > > software, in RAM.
> > > 
> > > > > dw_intc->irq = new_irq;
> > > > > list_for_item(list, descriptors) {
> > > > > 	desc = container_of(list, struct irq_desc, list);
> > > > > 	dcache_invalidate_region(desc, sizeof(*desc));
> > > > > 
> > > > > at which point dw_intc->irq loses its new_irq value and restores the
> > > > > initial -EINVAL value. If I either remove the invalidation line above
> > > > > or add a writeback for dw_intc between the first two lines, the
> > > > > problem goes away.
> > > > 
> > > > So iiuc dw_intc->irq is being trashed when you invalidate desc ?
> > > 
> > > Yes,
> > 
> > Btw, have you tried both gcc and xt-xcc? Also, have you tried adding a
> > memory barrier here (setting dw_intc volatile should do this)?
> 
> No, only gcc. I thought about trying volatile, but I couldn't convince
> myself that that would be the correct solution.

Can you try with xt-xcc and volatile? This will rule out any compiler
shenanigans.

> 
> > > > > Therefore questions:
> > > > > 
> > > > > 1. am I using cache invalidation wrongly?
> > > > > 2. if the invalidated memory region isn't cache-line aligned, what
> > > > >    happens then? Are all full lines invalidated, thus potentially
> > > > >    erasing data? But even if this were handled wrongly, it wouldn't
> > > > >    cause this problem, because those descriptors aren't adjacent in
> > > > >    RAM.
> > > > > 3. is that a software or a hardware bug?
> > > > > 4. how to fix?
> > > > > 
> > > > 
> > > > More questions
> > > > 
> > > > 1) Are other dw_intc-> members trashed? How is this structure/member
> > > > aligned in memory?
> > > 
> > > Not in my test, because other members were initialised statically and
> > > only .irq was assigned before scanning the descriptor list and
> > > invalidating caches. I don't impose any alignment so far. The DW INTC
> > > descriptor was at 0xbe046aa8 and the descriptor whose invalidation
> > > caused the trouble was at 0xbe04640c. .irq is in fact the first element
> > > in the struct.
> > > 
> > 
> > Ok, so they are both not aligned on 64 bytes and not on the same cache
> > line.
> > 
> > 
> > > > 2) Have the descriptors been invalidated? i.e. are you invalidating a
> > > > valid address for desc?
> > > 
> > > Yes, addresses are valid.
> > > 
> > > > 3) If desc and sizeof(*desc) are not cache line/size aligned, then try
> > > > and make sure that size is rounded up to the nearest cache line and
> > > > desc rounded down to the nearest cache line for alignment.
> > > 
> > > If this is the problem, would this indicate a hardware bug? I thought
> > > about trying this, but according to our understanding this shouldn't be
> > > necessary?
> > 
> > No HW bug, but HW can only invalidate/writeback in blocks of cacheline
> > size with cacheline alignment. Currently the code is not checking this,
> > and this may be "undefined" at ISA level. Can you check
> > xthal_dcache_region_writeback()?
> 
> Right, what I suspected. So invalidating non-cache-aligned data causes
> memory around the desired area to also lose its cache contents, possibly
> leading to data loss.
> 
> > > > 4) Is this code running on core 0 (the same core that registered all
> > > > the IRQs)?
> > > 
> > > So far Sue Creek doesn't enable SMP, so, yes.
> > 
> > Ok, so to confirm: there is also no other IP (like DMA) that can write to
> > L2/SRAM except core 0? And can you provide the whole function where this
> > fails?
> 
> There is DMA, but later. This happens during initialisation, before DMA,
> and DMA doesn't touch those areas anyway. The beginning of the function
> where this happens is below.
> 

Ok, so is this during IRQ core init or IRQ client code init ?

> Thanks
> Guennadi
> 
> int interrupt_cascade_register(struct irq_cascade_desc *cascade)
> {
> 	struct irq_desc *desc = &cascade->desc;
> 	struct irq_cascade_desc *p_cascade;
> 	struct list_item *list;
> 	unsigned long flags, i;
> 	int ret;
> 
> 	if (!cascade->name || !cascade->ops)
> 		return -EINVAL;
> 
> 	spinlock_init(&cascade->lock);
> 	for (i = 0; i < PLATFORM_IRQ_CHILDREN; i++)
> 		list_init(&cascade->child[i].list);
> 
> 	spin_lock_irq(&cascade_lock, flags);

Cascade lock doesn't need invalidating since it's an SMP atomic op.

> 
> 	list_for_item_smp (list, &cascade_list) {

Could the list be corrupt? I can't see where cascade_list is invalidated.

> 		struct irq_cascade_desc *c = container_of(list,
> 						struct irq_cascade_desc, list);
> 
> // One of these kills "cascade"

Is cascade killed on the first list iteration or a subsequent one? Is the
failing iteration random or always consistent? Does the bug location change
with debug code added?

Liam

> 		dcache_invalidate_region(c, sizeof(*c));
> 		if (!rstrcmp(c->name, cascade->name)) {
> 			ret = -EEXIST;
> 			goto unlock;
> 		}
> 	}
> 
> _______________________________________________
> Sound-open-firmware mailing list
> Sound-open-firmware at alsa-project.org
> https://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware


