Re: [alsa-devel] [PATCH] ASoC: omap-mcbsp: Add PM QoS support for McBSP to prevent glitches

5 Sep 2016

      On 09/02/16 23:40, Tony Lindgren wrote:
...

Peter Ujfalusi peter.ujfalusi@ti.com [160902 12:39]:

...
On 09/02/2016 05:56 PM, Tony Lindgren wrote:
...
That's because the hardware or a timer triggers the next dma automagically
so we don't need to do anything.
McBSP is triggering the DMA request. In the non-OMAP3.McBSP2 case the time
between requests is shorter than 1.45ms, as the maximum FIFO size is 128. But in
your case you also have a threshold of 128, which means that we have DMA
requests coming in every 1.45ms.
In theory no audio should be working on OMAP3 if we can hit C7 runtime.
I agree C7 audio playback should only work if the external audio chip
is buffering. And there needs to be a wakeup event from the external
audio chip based on the external audio fifo threshold to wake up the
SoC and queue up more data. Or a hrtimer in the external audio chip
that wakes up the SoC and McBSP.
The only C state we can not take is the one where the McBSP would need context
save/restore... In that case the McBSP FIFO content will be lost and there is no
way to recover it. If the external chip has a FIFO we could try to somehow make
sure that at the end of a codec FIFO fill the McBSP FIFO is empty. I don't
think it is possible at all.
Another way is to check the amount of data in the FIFO before off and have the
CPU try to write back the data from the ALSA buffer - from the DMA point we step
back and read the data... But if it is on a period boundary, the previous period
might already be updated -> corrupted audio. The sDMA also has an internal FIFO
which holds an unknown amount of data -> audio corruption, etc.
If the codec does not have a FIFO, we can not hit a C state where McBSP would be
off at all.
...
Sure, we are not going to be able to hit C6 with this based on the numbers we
have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Right, but we don't need to block C6 because of the hardware and/or Linux
timers doing things for us :)
But the correct QoS latency for McBSP is the one which would ensure that the
FIFO will not be drained. This is the whole point of PM_QOS. And that is:
(1000/sampling_rate) * ((FIFOsize - threshold) / channels)
In case of playback
On OMAP3.McBSP2 for example (44.1kHz, stereo):
FIFO threshold 128
  DMA request will be triggered when 128 slots are free in the FIFO;
  at that point we still have 1152 words in the FIFO.
  If the C state wakeup latency is longer than what it takes to play out the
  samples from the FIFO (13.06ms), we will drain the FIFO and get an underflow.
  In this case the QoS should be set to 13.06ms.

FIFO threshold 1024
  DMA request will be triggered when 1024 slots are free in the FIFO;
  at that point we still have 256 words in the FIFO.
  If the C state wakeup latency is longer than what it takes to play out the
  samples from the FIFO (2.9ms), we will drain the FIFO and get an underflow.
  In this case the QoS should be set to 2.9ms.
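
For reference, the playback formula above can be checked with a few lines of
Python. This is only a sketch, not part of the patch: the function name is made
up for illustration, and the 1280 word FIFO size is an assumption inferred from
the numbers in the example (128 + 1152 and 1024 + 256).

```python
def playback_qos_latency_ms(sampling_rate_hz, fifo_size_words,
                            threshold_words, channels):
    """Time in ms to play out the words still in the FIFO when the
    DMA request fires (hypothetical helper, playback only)."""
    words_left = fifo_size_words - threshold_words
    frames_left = words_left / channels
    return (1000.0 / sampling_rate_hz) * frames_left


# Assumption: OMAP3 McBSP2 FIFO size inferred from the example figures.
OMAP3_MCBSP2_FIFO_WORDS = 1280

for threshold in (128, 1024):
    latency = playback_qos_latency_ms(44100, OMAP3_MCBSP2_FIFO_WORDS,
                                      threshold, 2)
    print(f"threshold {threshold}: {latency:.2f} ms")
# threshold 128: 13.06 ms
# threshold 1024: 2.90 ms
```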

On other McBSPs with 128 word FIFO the required latency is shorter to ensure
we don't drain the FIFO.
IMHO if the driver sets the PM_QOS, it should set it to what it actually needs
and not to work around system issues.
Hmm well actually with the fifo threshold your calculations above
start making some sense. The missing piece is that we will never hit
off mode until DMA is done. So the true worst case would be:

2.9 - 13.06 ms for the fifo threshold minus dma setup time
plus the time it takes to complete the fifo fill with dma
as dma will keep the SoC active
plus the time it takes to idle the dma after the last fifo fill
as the dma will keep SoC active

Yeah, I have done these calculations for the DAC33 mode7LP mode :D
While the calculation is correct for the wake latency requirements, there is a
flip side:
FIFO threshold 128:
- wake threshold is ~13.06ms to ensure that we don't drain the FIFO
- DMA requests are coming at 1.45ms rate.
While we could take a C state which would take ~13.06ms to leave, in reality
the system will be busy responding to the DMA requests coming in every 1.45ms.
FIFO threshold 1024:
- wake threshold is ~2.9ms to ensure that we don't drain the FIFO
- DMA requests are coming at 11.6ms rate.
In this case we are only going to allow C states which we can leave in under
2.9ms, but between DMA bursts we have 11.6ms. We could be in a deeper state, but
we are going to be woken up by the DMA request and after the DMA request we
have ~2.9ms before the FIFO is empty...
So either we allow a deeper C state (threshold 128) which we might not be able
to take due to the 'fast' DMA requests, or we only allow shallow C states
(threshold 1024) and have more time between the DMA requests.
Does this make sense?
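
The trade-off between the two thresholds can be put side by side with a short
Python sketch. The helper names are hypothetical, the 1280 word FIFO size is an
assumption inferred from the figures in this thread, and the DMA request period
is taken to be the time needed to play out one threshold worth of words in
steady state.

```python
def dma_request_period_ms(sampling_rate_hz, threshold_words, channels):
    """Steady-state time between DMA requests: one request each time
    'threshold_words' slots have been freed by playback."""
    return 1000.0 * (threshold_words / channels) / sampling_rate_hz


def wake_budget_ms(sampling_rate_hz, fifo_size_words, threshold_words,
                   channels):
    """Time left before the FIFO drains once the DMA request has fired."""
    words_left = fifo_size_words - threshold_words
    return 1000.0 * (words_left / channels) / sampling_rate_hz


FIFO_WORDS = 1280  # assumption, see the lead-in

for threshold in (128, 1024):
    period = dma_request_period_ms(44100, threshold, 2)
    budget = wake_budget_ms(44100, FIFO_WORDS, threshold, 2)
    print(f"threshold {threshold}: request every {period:.2f} ms, "
          f"wake budget {budget:.2f} ms")
# threshold 128: request every 1.45 ms, wake budget 13.06 ms
# threshold 1024: request every 11.61 ms, wake budget 2.90 ms
```

A small threshold gives a long wake budget but frequent requests; a large
threshold gives the opposite.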
...
The dma setup, complete, and idle latencies here can probably
be measured with hrtimer :) That still does not explain why we don't
miss anything in retention idle currently.
...
I don't know the PM code that well, but is there a way to set an attribute on a
device to tell how deep a C state it can tolerate on the given board or SoC?
I believe we can only set the pm_qos latency requirements and there
is no direct limiting of C states. Then I think the idea of
dev_pm_qos and set_latency_tolerance callback is that it allows the
SoC specific code to select the allowed C states. So if we
implemented set_latency_tolerance in the cpuidle driver, we could
tinker directly with the C states for latencies.
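
To illustrate how a latency requirement ends up limiting C states only
indirectly, here is a rough Python sketch of the selection: walk the idle
states from shallow to deep and stop at the first one whose exit latency
exceeds the requirement. The state names and exit latencies are invented for
the example and are NOT the values from cpuidle34xx.c; a real governor also
weighs other factors (such as predicted idle time) that are left out here.

```python
from typing import NamedTuple, Optional, Sequence


class IdleState(NamedTuple):
    name: str
    exit_latency_us: int


# Hypothetical table, ordered shallow to deep. Illustrative values only.
EXAMPLE_STATES = [
    IdleState("state0", 50),
    IdleState("state1", 500),
    IdleState("state2", 2500),
    IdleState("state3", 10000),
    IdleState("state4", 20000),
]


def deepest_allowed_state(states: Sequence[IdleState],
                          latency_req_us: int) -> Optional[IdleState]:
    """Return the deepest state whose exit latency fits the requirement,
    or None if even the shallowest state is too slow to leave."""
    allowed = None
    for state in states:
        if state.exit_latency_us > latency_req_us:
            break
        allowed = state
    return allowed


print(deepest_allowed_state(EXAMPLE_STATES, 2900).name)   # state2
print(deepest_allowed_state(EXAMPLE_STATES, 13060).name)  # state3
```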
What we can agree on is that OFF needs to be blocked when audio is in use. But
I'm not sure what the correct way is.
-- 
Péter