Moving the discussion from https://github.com/thesofproject/sof/pull/1998 over to this list.
In summary: with the system agent enabled and at its default values, I get a complete freeze after two or three continuous playthroughs of the same song (duration 17:06). How many playthroughs it takes depends mostly on which timer I use as the platform timer. With verbose traces enabled, I see no more activity in the trace mailbox after the freeze, not even the WFX/WFE traces that essentially flood the mailbox during normal operation. There are no signs of a panic either, although platform_panic is currently empty, so I'm going to test now with a trace added there.
Disabling the system agent helps, and so does changing CONFIG_SYSTICK_PERIOD to 2000. I hear no glitches at all during the playthroughs with either of these changes applied (although a glitch of 1 ms or less once in 30 minutes would probably be unnoticeable to the human ear, wouldn't it?).
How should I continue with this?
I'll come back and report whether there is a panic or whether the freeze comes from something else.