On 09/02/16 23:40, Tony Lindgren wrote:
* Peter Ujfalusi <peter.ujfalusi@ti.com> [160902 12:39]:
On 09/02/2016 05:56 PM, Tony Lindgren wrote:
That's because the hardware or a timer triggers the next dma automagically so we don't need to do anything.
McBSP is triggering the DMA request. In the non-OMAP3.McBSP2 case the time between them is shorter than 1.45ms, as the maximum FIFO size is 128. But in your case you also have a threshold of 128, which means that we have DMA requests coming in every 1.45ms. In theory no audio should be working on OMAP3 if we can hit C7 at runtime.
I agree C7 audio playback should only work if the external audio chip is buffering. And there needs to be a wakeup event from the external audio chip based on the external audio fifo threshold to wake up the SoC and queue up more data. Or a hrtimer in the external audio chip that wakes up the SoC and McBSP.
The only C state we cannot take is one where the McBSP would need context save/restore... In that case the McBSP FIFO content will be lost and there is no way to recover it. If the external chip has a FIFO we could try to somehow make sure that at the end of a codec FIFO fill the McBSP FIFO is empty. I don't think it is possible at all. Another way is to check the amount of data in the FIFO before off and have the CPU write the data back from the ALSA buffer - from the DMA position we step back and read the data... But if it is at a period boundary, the previous period might already be updated -> corrupted audio. The sDMA also has an internal FIFO which holds an unknown amount of data -> audio corruption, etc.
If the codec does not have a FIFO, we cannot hit a C state where McBSP would be off at all.
Sure, we are not going to be able to hit C6 with this based on the numbers we have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Right, but we don't need to block C6 because of the hardware and/or Linux timers doing things for us :)
But the correct QoS latency for McBSP is the one which ensures that the FIFO will not be drained. This is the whole point of PM_QOS. In case of playback that is:

  (1000 / sampling_rate) * ((FIFOsize - threshold) / channels)

On OMAP3.McBSP2 for example (44.1 kHz, stereo):
FIFO threshold 128
- DMA request will be triggered when 128 slots are free in the FIFO
- at that point we still have 1152 words in the FIFO.
- if the C state wakeup latency is longer than what it takes to play out the
  samples from the FIFO (13.06ms), we will drain the FIFO and get an underflow.
- in this case the QOS should be set to 13.06ms
FIFO threshold 1024
- DMA request will be triggered when 1024 slots are free in the FIFO
- at that point we still have 256 words in the FIFO.
- if the C state wakeup latency is longer than what it takes to play out the
  samples from the FIFO (2.9ms), we will drain the FIFO and get an underflow.
- in this case the QOS should be set to 2.9ms
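The two worked cases above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and the 1280-word FIFO size is inferred from the figures in the mail (128 free slots plus 1152 words still queued), not taken from the driver.

```python
def playback_wake_latency_ms(rate_hz, channels, fifo_words, threshold_words):
    """Time in ms until the FIFO runs empty once a DMA request is raised.

    Implements (1000 / sampling_rate) * ((FIFOsize - threshold) / channels)
    for the playback direction.
    """
    frames_left = (fifo_words - threshold_words) / channels
    return (1000.0 / rate_hz) * frames_left


# Assumed values: OMAP3 McBSP2 with a 1280-word FIFO, 44.1 kHz, stereo.
FIFO_WORDS = 1280

for threshold in (128, 1024):
    latency = playback_wake_latency_ms(44100, 2, FIFO_WORDS, threshold)
    print(f"threshold {threshold:4d}: wake latency must stay below {latency:.2f} ms")
```

With these inputs the results round to the 13.06 ms and 2.9 ms figures quoted above.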
On other McBSPs with 128 word FIFO the required latency is shorter to ensure we don't drain the FIFO.
IMHO if the driver sets the PM_QOS, it should set it the way it needs it and not to work around system issues.
Hmm well actually with the fifo threshold your calculations above start making some sense. The missing piece is that we will never hit off mode until DMA is done. So the true worst case would be:
- 2.9 - 13.06 ms for the fifo threshold minus dma setup time
- plus the time it takes to complete the fifo fill with dma as dma will keep the SoC active
- plus the time it takes to idle the dma after the last fifo fill as the dma will keep SoC active
Yeah, I have done these calculations for the DADC33 mode7LP mode :D
While the calculation is correct for the wake latency requirements, there is a flip side:
FIFO threshold 128:
- wake threshold is ~13.06ms to ensure that we don't drain the FIFO
- DMA requests are coming at a 1.45ms rate.
While we could take a C state which would take ~13.06ms to leave, in reality the system will be busy responding to the DMA requests coming in every 1.45ms.
FIFO threshold 1024:
- wake threshold is ~2.9ms to ensure that we don't drain the FIFO
- DMA requests are coming at an 11.6ms rate.
In this case we are only going to allow C states which we can leave in under 2.9ms, but between DMA bursts we have 11.6ms. We could be in a deeper state, but we are going to be woken up by the DMA request and after the DMA request we have ~2.9ms before the FIFO is empty...
So either we allow a deeper C state (threshold 128) which we might not be able to enter because of the 'fast' DMA requests, or we only allow a shallow C state (threshold 1024) and have more time between the DMA requests.
Does this make sense?
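The trade-off between the two thresholds can be tabulated as below. Again only a sketch: the helper names are hypothetical, the 1280-word FIFO size is inferred from the numbers in this thread, and the DMA request interval is modelled simply as the time it takes to play out one threshold's worth of words.

```python
def dma_request_interval_ms(rate_hz, channels, threshold_words):
    """Time in ms between DMA requests: one threshold's worth of samples."""
    return (1000.0 / rate_hz) * (threshold_words / channels)


def wake_latency_budget_ms(rate_hz, channels, fifo_words, threshold_words):
    """Time in ms left in the FIFO when the DMA request is raised."""
    return (1000.0 / rate_hz) * ((fifo_words - threshold_words) / channels)


RATE_HZ, CHANNELS, FIFO_WORDS = 44100, 2, 1280  # assumed OMAP3 McBSP2 setup

for threshold in (128, 1024):
    interval = dma_request_interval_ms(RATE_HZ, CHANNELS, threshold)
    budget = wake_latency_budget_ms(RATE_HZ, CHANNELS, FIFO_WORDS, threshold)
    print(f"threshold {threshold:4d}: DMA request every {interval:5.2f} ms, "
          f"wake latency budget {budget:5.2f} ms")
```

Under this simple model the two values always add up to the play-out time of the whole FIFO (about 14.5 ms here), so raising the threshold trades wake latency budget for idle time between requests.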
The dma setup, complete, and idle latencies here can probably be measured with hrtimer :) That still does not explain why we don't miss anything in retention idle currently.
I don't know the PM code that well, but is there a way to set an attribute on a device to tell how deep a C state it can tolerate on the given board or SoC?
I believe we can only set the pm_qos latency requirements and there is no direct limiting of C states. Then I think the idea of dev_pm_qos and set_latency_tolerance callback is that it allows the SoC specific code to select the allowed C states. So if we implemented set_latency_tolerance in the cpuidle driver, we could tinker directly with the C states for latencies.
What we can agree on is that OFF needs to be blocked when audio is in use. But I'm not sure what the correct way is.