I have mixed feelings about this.
On the one hand, this looks simple enough.
But on the other hand, we have other users of memcpy_fromio(), including SOF drivers, so what are the odds that we have the same problem in other places? Wouldn't it be safer to either change this function so that its behavior is not ambiguous or compiler-dependent, or fix the compiler?
Hi Pierre and Amadeusz,
I have to admit that I didn't dig into clang's __builtin_memcpy to see what happens inside, so I don't have direct evidence that this is clang's problem. What I do know is that a kernel built with clang10 works fine, but the issue appears once we change to clang11. At first I also suspected a timing issue, so I checked the command transaction. The transaction is simple: the host writes a command to the SST_IPCX register, then the DSP writes its reply to the SST_IPCD register and triggers an interrupt. Finally, the irq thread sst_byt_irq_thread() reads the SST_IPCD register to complete the transaction. I added some debug messages to see whether anything goes wrong in the transaction, but it all looks good.
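For reference, the three steps of the transaction can be modelled roughly as below. This is only a userspace sketch, not driver code: struct fake_ipc, host_send(), dsp_reply() and irq_thread() are hypothetical names, and the reply encoding (command echoed with the top bit set) is invented purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two mailbox registers and the interrupt line. */
struct fake_ipc {
	uint64_t ipcx;		/* host -> DSP command (stands in for SST_IPCX) */
	uint64_t ipcd;		/* DSP -> host reply (stands in for SST_IPCD) */
	bool irq_pending;	/* set by the DSP, cleared by the irq thread */
};

/* Step 1: the host writes the command. */
static void host_send(struct fake_ipc *ipc, uint64_t cmd)
{
	ipc->ipcx = cmd;
}

/* Step 2: the DSP writes its reply and raises the interrupt.
 * The reply value is made up: the command with bit 63 set. */
static void dsp_reply(struct fake_ipc *ipc)
{
	ipc->ipcd = ipc->ipcx | (1ULL << 63);
	ipc->irq_pending = true;
}

/* Step 3: the irq thread reads the reply to complete the transaction.
 * Returns true if a reply was consumed, false if no irq was pending. */
static bool irq_thread(struct fake_ipc *ipc, uint64_t *reply)
{
	if (!ipc->irq_pending)
		return false;
	*reply = ipc->ipcd;
	ipc->irq_pending = false;
	return true;
}
```

In this model nothing can go wrong between steps 2 and 3, which matches what the debug messages showed: the sequence itself looks fine, so the suspect is how the reply register is read.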
I was also confused about why this only happens on BYT and not on BDW, since they share the same register access code in sst-dsp.c. I checked the code and realized that on BDW the irq thread (hsw_irq_thread) performs 32-bit register reads, whereas on BYT it performs a 64-bit read. So I changed the BYT code to use two readl() calls, and the problem went away. My best guess is that it's related to the implementation of __builtin_memcpy(), but I'm not sure whether it is the timing or the implementation that causes the problem.
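To make the idea concrete, here is a minimal userspace sketch (not the actual patch) of reading a 64-bit value as two explicit 32-bit accesses instead of one memcpy-style copy. fake_readl() and read64_as_two_32() are hypothetical names; fake_readl() only stands in for the kernel's readl(), and reading the low word first is an assumption made for illustration.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's readl(): one 32-bit volatile load. */
static uint32_t fake_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Read a 64-bit register as two 32-bit accesses, low word first,
 * so the access width no longer depends on how the compiler
 * chooses to expand a memcpy. */
static uint64_t read64_as_two_32(const volatile void *addr)
{
	const volatile uint8_t *p = addr;
	uint32_t lo = fake_readl(p);
	uint32_t hi = fake_readl(p + 4);

	return ((uint64_t)hi << 32) | lo;
}
```

The point of the sketch is only that each access is a fixed-width volatile load, so the compiler cannot merge, split, or reorder them the way it is free to do inside a builtin memcpy.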
Regards, Brent