[alsa-devel] [PATCH] ASoC: omap-mcbsp: Add PM QoS support for McBSP to prevent glitches

Mon Sep 5 09:53:28 CEST 2016

On 09/02/16 23:40, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi at ti.com> [160902 12:39]:
>> On 09/02/2016 05:56 PM, Tony Lindgren wrote:
>>> That's because the hardware or a timer triggers the next dma automagically
>>> so we don't need to do anything.
>>
>> McBSP is triggering the DMA request. In case of non OMAP3.McBSP2 the time
>> between them is shorter than 1.45ms, as the maximum FIFO size is 128. But in
>> your case you also have threshold of 128, which means that we have DMA
>> requests coming in every 1.45ms.
>> In theory no audio should be working on OMAP3 if we can hit C7 runtime.
> 
> I agree C7 audio playback should only work if the external audio chip
> is buffering. And there needs to be a wakeup event from the external
> audio chip based on the external audio fifo threshold to wake up the
> SoC and queue up more data. Or a hrtimer in the external audio chip
> that wakes up the SoC and McBSP.

The only C state we can not take is the one where the McBSP would need context
save/restore... In that case the McBSP FIFO content is lost and there is no way
to recover it. If the external chip has a FIFO we could try to somehow make
sure that at the end of a codec FIFO fill the McBSP FIFO is empty, but I don't
think it is possible at all.
Another way is to check the amount of data in the FIFO before off and have the
CPU write the data back from the ALSA buffer - from the DMA position we step
back and read the data... But if it is at a period boundary, the previous period
might be already updated -> corrupted audio. The sDMA also has an internal FIFO
which holds an unknown amount of data -> audio corruption, etc.

If the codec does not have a FIFO, we can not hit a C state where McBSP would
be off at all.

>>>> Sure, we are not going to be able to hit C6 with this based on the numbers we
>>>> have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
>>>
>>> Right, but we don't need to block C6 because of the hardware and/or Linux
>>> timers doing things for us :)
>>
>> But the correct QoS latency for McBSP is the one which would ensure that the
>> FIFO will not be drained. This is the whole point of PM_QOS. And that is:
>> (1000/sampling_rate) * ((FIFOsize - threshold) / channels)
>> In case of playback
>>
>> On OMAP3.McBSP2 for example (44.1KHz, stereo):
>> FIFO threshold 128
>>  - DMA request will be triggered when 128 slots are free in the FIFO
>>  - at that point we have still 1152 words in the FIFO.
>>  - if the C wakeup latency is longer than what it takes to play out the
>> samples from the FIFO (13.06ms), we will drain the FIFO and get an underflow.
>>  - in this case the QOS should be set as 13.06ms
>>
>> FIFO threshold 1024
>>  - DMA request will be triggered when 1024 slots are free in the FIFO
>>  - at that point we have still 256 words in the FIFO.
>>  - if the C wakeup latency is longer than what it takes to play out the
>> samples from the FIFO (2.9ms), we will drain the FIFO and get an underflow.
>>  - in this case the QOS should be 2.9ms
>>
>> On other McBSPs with 128 word FIFO the required latency is shorter to ensure
>> we don't drain the FIFO.
>>
>> IMHO if the driver sets the PM_QOS, it should set it the way it needs it
>> and not to work around system issues.
> 
> Hmm well actually with the fifo threshold your calculations above
> start making some sense. The missing piece is that we will never hit
> off mode until DMA is done. So the true worst case would be:
> 
> - 2.9 - 13.06 ms for the fifo threshold minus dma setup time
> - plus the time it takes to complete the fifo fill with dma
>   as dma will keep the SoC active
> - plus the time it takes to idle the dma after the last fifo fill
>   as the dma will keep SoC active

Yeah, I have done these calculations for the DADC33 mode7LP mode :D

While the calculation is correct for the wake latency requirements, there is a
flip side:
FIFO threshold 128:
- wake threshold is ~13.06ms to ensure that we don't drain the FIFO
- DMA requests are coming at 1.45ms rate.

While we could take a C state which would take ~13.06ms to leave, in reality
the system will be busy responding to the DMA requests coming in every 1.45ms.

FIFO threshold is 1024:
- wake threshold is ~2.9ms to ensure that we don't drain the FIFO
- DMA requests are coming at 11.6ms rate.

In this case we are only going to allow a C state which we can leave in under
2.9ms, but between DMA bursts we have 11.6ms. We could be in a deeper state,
but we are going to be woken up by the DMA request and after the DMA request we
have ~2.9ms before the FIFO is empty...

So either we allow a deeper C state (threshold 128) which we might not be able
to take due to the 'fast' DMA requests, or we only allow a shallow C state
(threshold 1024), and have more time between the DMA requests.

Does this make sense?

> The dma setup, complete, and idle latencies here can be probably
> be measured with hrtimer :) That still does not explain why we don't
> miss anything in retention idle currently.
> 
>> I don't know the PM code that well, but is there a way to set attribute to a
>> device to tell how deep C state it can tolerate on the given board or SoC?
> 
> I believe we can only set the pm_qos latency requirements and there
> is no direct limiting of C states. Then I think the idea of
> dev_pm_qos and set_latency_tolerance callback is that it allows the
> SoC specific code to select the allowed C states. So if we
> implemented set_latency_tolerance in the cpuidle driver, we could
> tinker directly with the C states for latencies.

What we can agree on is that OFF needs to be blocked when audio is in use. But
I'm not sure what the correct way is.

-- 
Péter