[alsa-devel] appl_ptr and DMA overrun at end of stream

Jon Smirl

7 May 2009

6:23 p.m.

I am having a problem with DMA overrun at the end of the audio stream. Is there an official way to know the address of the last valid audio sample?

mpc5200 ac97 is keeping a bunch of descriptors queued in a loop to continuously play music. I believe this is the way ALSA wants it. Now say the last period is half full. ALSA fills the other half with silence. When that period finishes playing it will generate an interrupt. ALSA comes back from that interrupt with trigger(STOP).

But our CPU is slow compared to a 3 GHz desktop, and there is considerable latency between the period-end interrupt and trigger(STOP) getting called. So the DMA hardware starts playing the next period before trigger(STOP) can get the DMA stopped. I tried turning off BestComm, flushing the FIFO, and turning off the audio clocks. None can be done fast enough. That next period contains stale data from further back in the stream. When the front part of it plays, it makes a burst of noise.

What I need is the address of the end of valid data in the buffer. I need that address so that I can program the DMA to automatically stop at the end of the stream and not overrun. Searching around in the guts of ALSA I found appl_ptr. I can use appl_ptr to determine the location of the end of the stream and prevent DMA overrun. When there is no valid data I don't enqueue the descriptor.

s->appl_ptr tracks the previous value of s->runtime->control->appl_ptr. The difference between these two is the amount of valid data in the buffer. When this difference goes to zero, I stop queuing new buffers. That fixes the DMA overrun.

static void psc_dma_bcom_enqueue_next_buffer(struct psc_dma_stream *s)
{
	struct bcom_bd *bd;

	while (s->appl_ptr < s->runtime->control->appl_ptr) {

		if (bcom_queue_full(s->bcom_task))
			return;

		s->appl_ptr += s->period_size;

		/* Prepare and enqueue the next buffer descriptor */
		bd = bcom_prepare_next_buffer(s->bcom_task);
		bd->status = s->period_bytes;
		bd->data[0] = s->period_next_pt;
		bcom_submit_next_buffer(s->bcom_task, NULL);

		/* Update for next period */
		s->period_next_pt += s->period_bytes;
		if (s->period_next_pt >= s->period_end)
			s->period_next_pt = s->period_start;
	}
}

-- Jon Smirl jonsmirl@gmail.com


Jon Smirl

7 May 2009

6:35 p.m.

Debug trace... at the bottom you can see where I needed to stop queuing new DMA data

If you dig around in the archives, other people have run into this same problem.

root@phyCORE:~ aplay phone.wav
mpc5200-psc-ac97 f0002200.sound: psc_dma_open(substream=c7913700)
Playing WAVE 'phone.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
mpc5200-psc-ac97 f0002200.sound: psc_ac97_hw_analog_params(substream=c7913700) p_size=5513 p_bytes=44104 periods=4 buffer_size=22052 buffer_bytes=176416 channels=2 rate=44100 format=11
AC97_EXTENDED_STATUS 411
mpc5200-psc-ac97 f0002200.sound: psc_dma_trigger(substream=c7913700, cmd=1) stream_id=0
runtime->dma_area c7a00000
Initial pointer 0
Playback

--- DMA address / s->appl_ptr / runtime->control->appl_ptr

enqueue 7a00000 0 22052 queued
enqueue 7a0ac48 5513 22052 queued
enqueue 7a15890 11026 22052 queued
enqueue 7a204d8 16539 22052 queued
enqueue 7a00000 22052 22052
enqueue 7a00000 22052 22052
enqueue 7a00000 22052 27565 queued
enqueue 7a0ac48 27565 33078 queued
enqueue 7a15890 33078 38591 queued
enqueue 7a204d8 38591 44104 queued
enqueue 7a00000 44104 49617 queued
enqueue 7a0ac48 49617 55130 queued
enqueue 7a15890 55130 60643 queued
enqueue 7a204d8 60643 66156 queued
enqueue 7a00000 66156 71669 queued
enqueue 7a0ac48 71669 77182 queued
enqueue 7a15890 77182 82695 queued
enqueue 7a204d8 82695 88208 queued
enqueue 7a00000 88208 93721 queued
enqueue 7a0ac48 93721 99234 queued
enqueue 7a15890 99234 104747 queued
enqueue 7a204d8 104747 110260 queued
enqueue 7a00000 110260 115773 queued
enqueue 7a0ac48 115773 115773 - stopped queuing new buffers here
enqueue 7a0ac48 115773 115773
enqueue 7a0ac48 115773 115773
mpc5200-psc-ac97 f0002200.sound: psc_dma_trigger(substream=c7913700, cmd=0) stream_id=0 -- this trigger is too late. Samples have already started going into the codec
mpc5200-psc-ac97 f0002200.sound: psc_dma_close(substream=c7913700)
root@phyCORE:
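The addresses in the trace can be cross-checked against the hw_params line. This is a minimal sketch, assuming each enqueued address is the buffer's bus address (0x7a00000 in the trace, which I take to be the physical counterpart of runtime->dma_area c7a00000) plus the frame position taken modulo the buffer size. period_dma_addr is a hypothetical helper for illustration, not a function in the driver.

```c
#include <stdint.h>

/* Values taken from the hw_params line in the trace. */
#define PERIOD_FRAMES   5513u
#define PERIOD_BYTES    44104u
#define PERIODS         4u
#define BUFFER_FRAMES   (PERIOD_FRAMES * PERIODS)        /* 22052 frames */
#define BYTES_PER_FRAME (PERIOD_BYTES / PERIOD_FRAMES)   /* 8 bytes      */

/*
 * Hypothetical helper: bus address of the data at frame position
 * frame_pos, where frame_pos is a free-running frame counter such as
 * the appl_ptr column in the trace.
 */
static uint32_t period_dma_addr(uint32_t dma_base, unsigned long frame_pos)
{
	unsigned long offset_frames = frame_pos % BUFFER_FRAMES;

	return dma_base + (uint32_t)(offset_frames * BYTES_PER_FRAME);
}
```

With these numbers, frame position 115773 maps to an offset of 5513 frames, that is 0x7a0ac48, which matches the address on the line where queuing stopped.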

-- Jon Smirl jonsmirl@gmail.com

Takashi Iwai

11 May 2009

11:53 a.m.

At Thu, 7 May 2009 12:23:51 -0400, Jon Smirl wrote:

...

I am having a problem with DMA overrun at the end of the audio stream. Is there an official way to know the address of the last valid audio sample?

mpc5200 ac97 is keeping a bunch of descriptors queued in a loop to continuously play music. I believe this is the way ALSA wants it. Now say the last period is half full. ALSA fills the other half with silence. When that period finishes playing it will generate an interrupt. ALSA comes back from that interrupt with trigger(STOP).

But our CPU is slow compared to a 3 GHz desktop, and there is considerable latency between the period-end interrupt and trigger(STOP) getting called. So the DMA hardware starts playing the next period before trigger(STOP) can get the DMA stopped. I tried turning off BestComm, flushing the FIFO, and turning off the audio clocks. None can be done fast enough. That next period contains stale data from further back in the stream. When the front part of it plays, it makes a burst of noise.

What I need is the address of the end of valid data in the buffer. I need that address so that I can program the DMA to automatically stop at the end of the stream and not overrun. Searching around in the guts of ALSA I found appl_ptr. I can use appl_ptr to determine the location of the end of the stream and prevent DMA overrun. When there is no valid data I don't enqueue the descriptor.

Right. This is the value to check in your case.

...

s->appl_ptr tracks the previous value of s->runtime->control->appl_ptr. The difference between these two is the amount of valid data in the buffer. When this difference goes to zero, I stop queuing new buffers.

Yes, that'll work, I guess.

...

That fixes the DMA overrun.

static void psc_dma_bcom_enqueue_next_buffer(struct psc_dma_stream *s)
{
	struct bcom_bd *bd;

	while (s->appl_ptr < s->runtime->control->appl_ptr) {

You'd need to think of boundary overlap, too. It's a bit nasty because we wrap the value at runtime->boundary...
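To make the boundary issue concrete: both s->appl_ptr and runtime->control->appl_ptr are frame counters that wrap to zero at runtime->boundary, so a plain `<` comparison gives the wrong answer once one of them has wrapped and the other has not. The following is only a sketch with hypothetical helper names (frames_pending, advance_ptr), written as plain C rather than as driver code, and it assumes the boundary is at most half the range of unsigned long so the additions cannot overflow.

```c
#include <assert.h>

/*
 * Frames written by the application but not yet handed to the DMA
 * engine. Both positions wrap to 0 at `boundary`.
 */
static unsigned long frames_pending(unsigned long appl_ptr,
				    unsigned long queued_ptr,
				    unsigned long boundary)
{
	assert(boundary > 0 && appl_ptr < boundary && queued_ptr < boundary);

	if (appl_ptr >= queued_ptr)
		return appl_ptr - queued_ptr;

	/* appl_ptr has already wrapped, queued_ptr has not yet. */
	return boundary - queued_ptr + appl_ptr;
}

/* Advance a position by `frames` (at most `boundary`), wrapping at `boundary`. */
static unsigned long advance_ptr(unsigned long ptr, unsigned long frames,
				 unsigned long boundary)
{
	assert(frames <= boundary);

	ptr += frames;
	if (ptr >= boundary)
		ptr -= boundary;
	return ptr;
}
```

In the enqueue loop, the condition would then become something like `frames_pending(...) > 0`, and the `s->appl_ptr += s->period_size` step would go through advance_ptr.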

thanks,

Takashi

Jon Smirl

3:03 p.m.

On Mon, May 11, 2009 at 5:53 AM, Takashi Iwai tiwai@suse.de wrote:

Right. This is the value to check in your case.

What do you think about redesigning the ALSA DMA interface to support detection of over and under run? Leaving the DMA engine in a loop and not coordinating with ALSA as to where the valid data is does not seem to be a safe way of exchanging data. That interface may be a source of the problems pulseaudio is encountering.

A simple solution would be for snd_pcm_period_elapsed() to return the physical address of the last valid sample. That would let me avoid playing with s->runtime->control->appl_ptr. You could provide the same data in the pointer() function.

If I know where the end of valid data is, I can program the hardware to play silence instead of stale data. Don't other embedded systems suffer from this same problem? Seems to me like every system has this problem, but desktop machines are fast enough that you can't hear it.

-- Jon Smirl jonsmirl@gmail.com

Jaroslav Kysela

3:45 p.m.

On Mon, 11 May 2009, Jon Smirl wrote:

What do you think about redesigning the ALSA DMA interface to support detection of over and under run? Leaving the DMA engine in a loop and not coordinating with ALSA as to where the valid data is does not seem to be a safe way of exchanging data. That interface may be a source of the problems pulseaudio is encountering.

A simple solution would be for snd_pcm_period_elapsed() to return the physical address of the last valid sample. That would let me avoid playing with s->runtime->control->appl_ptr. You could provide the same data in the pointer() function.

A simpler solution is to check the stream state in the low-level driver. If it's in the DRAINING state, then the end of the stream has been signaled by the application and the driver can skip queuing the next buffer. We may also add another callback (or use the ioctl callback) to pass this stream state change to the low-level driver immediately, so the driver can react more quickly to this situation.

All other cases are bad (underruns / overruns) and I'm not sure we can do much. There is a silence mode in the mid-level kernel API that fills some samples ahead in the ring buffer with silence, so applications can choose whether the last samples or silence will be played in the underrun case.
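As a rough illustration of the state check: in the kernel the state to test is SNDRV_PCM_STATE_DRAINING, but the sketch below uses stand-in types so that it compiles on its own. The struct, its fields, and should_queue_next_period are hypothetical, and boundary wrap is ignored for brevity.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's PCM states; only the names mirror ALSA. */
enum pcm_state { PCM_RUNNING, PCM_DRAINING };

/* Hypothetical driver-side view of one playback stream. */
struct stream {
	enum pcm_state state;
	unsigned long appl_ptr;    /* end of data written by the application */
	unsigned long queued_ptr;  /* end of data already handed to the DMA  */
};

/*
 * While running, keep the DMA queue fed as usual. Once the application
 * has signaled end of stream (DRAINING), queue only while unqueued valid
 * data remains, so the engine never reaches stale periods.
 */
static bool should_queue_next_period(const struct stream *s)
{
	if (s->state != PCM_DRAINING)
		return true;

	return s->appl_ptr != s->queued_ptr;
}
```

This only covers the end-of-stream case; an application that simply stops supplying data while the stream is still RUNNING is not detected by this check.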

Jaroslav

----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.

Takashi Iwai

4:14 p.m.

At Mon, 11 May 2009 15:45:49 +0200 (CEST), Jaroslav Kysela wrote:

A simpler solution is to check the stream state in the low-level driver. If it's in the DRAINING state, then the end of the stream has been signaled by the application and the driver can skip queuing the next buffer.

Ah, yes, that's a more elegant solution.

...

We may also add another callback (or use the ioctl callback) to pass this stream state change to the low-level driver immediately, so the driver can react more quickly to this situation.

I don't think such an extension is needed so far.

thanks,

Takashi

Takashi Iwai

4:21 p.m.

At Mon, 11 May 2009 15:45:49 +0200 (CEST), Jaroslav Kysela wrote:

A simpler solution is to check the stream state in the low-level driver. If it's in the DRAINING state, then the end of the stream has been signaled by the application and the driver can skip queuing the next buffer. We may also add another callback (or use the ioctl callback) to pass this stream state change to the low-level driver immediately, so the driver can react more quickly to this situation.

BTW, regarding the PCM core implementation: one missing thing in the PCM core is a proper way of queuing before trigger(START) and a proper clean-up after trigger(STOP). Since the trigger callback is atomic (in the ALSA sense), it cannot perform a long task.

For a long task like buffer prefetching or DMA clean-up, there might be a need for yet another callback, such as ops->pre_start() or ops->post_stop(), called before the spinlocked trigger calls. For example, pre_start should be used to send the available data (up to appl_ptr) to the hardware.
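A sketch of how such hooks might be wired up. Everything here is hypothetical: pre_start and post_stop do not exist in the ALSA ops table, the struct names are invented, and running post_stop after the stop trigger is my assumption about the intended ordering. The demo callbacks only record the call order.

```c
#include <stddef.h>

enum event { EV_PRE_START = 1, EV_TRIGGER_START, EV_TRIGGER_STOP, EV_POST_STOP };

struct stream;

/* Hypothetical ops table with the two proposed extra callbacks. */
struct stream_ops {
	int  (*pre_start)(struct stream *s);          /* may sleep           */
	int  (*trigger)(struct stream *s, int start); /* atomic, keep short  */
	void (*post_stop)(struct stream *s);          /* may sleep           */
};

struct stream {
	const struct stream_ops *ops;
	int log[8];   /* call order, for demonstration only */
	int nlog;
};

static int stream_start(struct stream *s)
{
	if (s->ops->pre_start) {
		int err = s->ops->pre_start(s);  /* e.g. prefetch data up to appl_ptr */
		if (err)
			return err;
	}
	/* In the kernel, this call would run under the stream spinlock. */
	return s->ops->trigger(s, 1);
}

static int stream_stop(struct stream *s)
{
	int err = s->ops->trigger(s, 0);     /* spinlocked in the kernel */

	if (s->ops->post_stop)
		s->ops->post_stop(s);            /* lengthy DMA clean-up goes here */
	return err;
}

/* Demo callbacks that just record what was called. */
static int demo_pre_start(struct stream *s)
{
	s->log[s->nlog++] = EV_PRE_START;
	return 0;
}

static int demo_trigger(struct stream *s, int start)
{
	s->log[s->nlog++] = start ? EV_TRIGGER_START : EV_TRIGGER_STOP;
	return 0;
}

static void demo_post_stop(struct stream *s)
{
	s->log[s->nlog++] = EV_POST_STOP;
}

static const struct stream_ops demo_ops = {
	demo_pre_start, demo_trigger, demo_post_stop
};
```

The point of the split is that the trigger path stays short enough to run atomically, while anything slow happens in a context that is allowed to sleep.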

Just my $0.02 for now, though.

Takashi

Jon Smirl

5:11 p.m.

On Mon, May 11, 2009 at 9:45 AM, Jaroslav Kysela perex@perex.cz wrote:

A simpler solution is to check the stream state in the low-level driver. If it's in the DRAINING state, then the end of the stream has been signaled by the application and the driver can skip queuing the next buffer. We may also add another callback (or use the ioctl callback) to pass this stream state change to the low-level driver immediately, so the driver can react more quickly to this situation.

Quickness is the wrong way to think about this problem. ALSA knows exactly when it has placed valid data into the buffer. It's the asynchronous nature of the interface that is the problem. Leaving DMA free running and counting on ALSA to be fast enough to keep data in front of the DMA engine is not a reliable mechanism.

Wouldn't it be better to get a synchronous notification and know for sure what data is valid? DRAINING might fix the end of stream, but what if the user space app gets swapped out and can't provide data fast enough?

The most reliable approach for me would be to add enough info so that in my IRQ handler and the pointer() routine I can check whether ALSA has emptied/filled the DMA buffers, and then only add them to the queue when they are full of valid data.

-- Jon Smirl jonsmirl@gmail.com

Takashi Iwai

5:40 p.m.

At Mon, 11 May 2009 11:11:55 -0400, Jon Smirl wrote:

Quickness is the wrong way to think about this problem. ALSA knows exactly when it has placed valid data into the buffer.

Not really. When the mmap mode is used, the update isn't always notified to the driver and the transfer can be completely asynchronous.

...

It's the asynchronous nature of the interface that is the problem. Leaving DMA free running and counting on ALSA to be fast enough to keep data in front of the DMA engine is not a reliable mechanism.

Wouldn't it be better to get a synchronous notification and know for sure what data is valid?

The question is whether you need mmap or not. If you don't need mmap, all the transfer and buffer update code could be much simpler and more robust. But if you support mmap, appl_ptr can be updated at any time, even without any driver interaction, so async support like the current model is required.

...

DRAINING might fix the end of stream, but what if the user space app gets swapped out and can't provide data fast enough?

The most reliable approach for me would be to add enough info so that in my IRQ handler and the pointer() routine I can check whether ALSA has emptied/filled the DMA buffers, and then only add them to the queue when they are full of valid data.

Actually, for hardware like yours, the appl_ptr check could indeed help. Similar implementations are found in include/sound/pcm-indirect.h. It's basically for hardware with its own buffer to transfer to, but a similar idea can be used for DMA-queue-type devices, and hopefully can be generalized in a better form...

Takashi

Jon Smirl

5:50 p.m.

On Mon, May 11, 2009 at 11:40 AM, Takashi Iwai tiwai@suse.de wrote:

Quickness is the wrong way to think about this problem. ALSA knows exactly when it has placed valid data into the buffer.

Not really. When the mmap mode is used, the update isn't always notified to the driver and the transfer can be completely asynchronous.

This seems to me to be a broken design. ALSA is being put into the position of guessing when the application has supplied new data. Shouldn't the app be required to make a commit() call after filling in the data? Without commit it is impossible to detect over/underrun.

-- Jon Smirl jonsmirl@gmail.com

Takashi Iwai

5:58 p.m.

At Mon, 11 May 2009 11:50:22 -0400, Jon Smirl wrote:

This seems to me to be a broken design. ALSA is being put into the position of guessing when the application has supplied new data. Shouldn't the app be required to make a commit() call after filling in the data? Without commit it is impossible to detect over/underrun.

The commit updates the mmapped control data (so that it works even without a context switch) if the architecture supports it. In other cases, a commit issues an explicit sync ioctl.

Actually it should be possible to disable the mmap-control mode explicitly, but right now it's not done from the driver side but only checks the architecture.

Takashi

Jon Smirl

6:28 p.m.

On Mon, May 11, 2009 at 11:58 AM, Takashi Iwai tiwai@suse.de wrote:

The commit updates the mmapped control data (so that it works even without a context switch) if the architecture supports it. In other cases, a commit issues an explicit sync ioctl.

Actually it should be possible to disable the mmap-control mode explicitly, but right now it's not done from the driver side but only checks the architecture.

Shared memory is another solution that doesn't involve context switches.

The app can update its valid pointer in shared memory. My IRQ will call snd_pcm_period_elapsed(). snd_pcm_period_elapsed() can find the updated valid pointer, convert it to a physical address and leave it in a shared structure. When snd_pcm_period_elapsed() returns, my IRQ can get to the pointer and submit the necessary buffers.

What's missing is an official way of accessing s->runtime->control->appl_ptr from the low-level driver. We're implementing a ring buffer. In a ring buffer I have to know where both pointers are in order to detect over/under run. I also don't understand why this is specific to my hardware; every DMA implementation should need these two pointers.

-- Jon Smirl jonsmirl@gmail.com

Takashi Iwai

6:43 p.m.

At Mon, 11 May 2009 12:28:48 -0400, Jon Smirl wrote:

Shared memory is another solution that doesn't involve context switches.

It's mmap when you do between kernel <-> user spaces :)

...

The app can update its valid pointer in shared memory. My IRQ will call snd_pcm_period_elapsed(). snd_pcm_period_elapsed() can find the updated valid pointer, convert it to a physical address and leave it in a shared structure. When snd_pcm_period_elapsed() returns, my IRQ can get to the pointer and submit the necessary buffers.

What's missing is an official way of accessing s->runtime->control->appl_ptr from the low level driver.

The appl_ptr itself can be accessed at any time from the driver, so there is no need for an "official" accessor to that.

...

We're implementing a ring buffer. In a ring buffer I have to know where both pointers are in order to detect over/under run.

Well, when you call snd_pcm_period_elapsed(), the PCM core actually checks the buffer XRUN there.

...

I also don't understand why this is specific to my hardware; every DMA implementation should need these two pointers.

These two pointers *are* available. That's why your first suggestion, checking appl_ptr, did work. That was basically right.
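For reference, the two pointers combine into the usual playback "available" count. The sketch below is modelled on the arithmetic the PCM core uses for playback (snd_pcm_playback_avail() in the kernel headers), but it is written as standalone C with plain integer types; playback_underrun is a hypothetical name, and the check assumes the stop threshold equals the buffer size and that the boundary plus the buffer size fits in a long.

```c
#include <stdbool.h>

/*
 * Free space in the ring, in frames, as seen by the application.
 * hw_ptr and appl_ptr are frame counters that wrap at `boundary`.
 */
static long playback_avail(unsigned long hw_ptr, unsigned long appl_ptr,
			   unsigned long buffer_size, unsigned long boundary)
{
	long avail = (long)hw_ptr + (long)buffer_size - (long)appl_ptr;

	if (avail < 0)
		avail += (long)boundary;
	else if (avail >= (long)boundary)
		avail -= (long)boundary;
	return avail;
}

/*
 * The hardware has consumed everything the application wrote once the
 * whole buffer is free again (or more than the whole buffer, if hw_ptr
 * has run past appl_ptr).
 */
static bool playback_underrun(unsigned long hw_ptr, unsigned long appl_ptr,
			      unsigned long buffer_size, unsigned long boundary)
{
	return playback_avail(hw_ptr, appl_ptr, buffer_size, boundary)
		>= (long)buffer_size;
}
```

Using the numbers from the earlier trace, hw_ptr 110260 and appl_ptr 115773 with a 22052-frame buffer leave 16539 frames free, meaning one period of valid data is still waiting to be played.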

Yet there is another question: whether we need a better way to do the buffer transfer on a queue-style device. For such a device, the async transfer with mmap is somewhat troublesome.

Takashi

Jon Smirl

7:09 p.m.

On Mon, May 11, 2009 at 12:43 PM, Takashi Iwai tiwai@suse.de wrote:

We're implementing a ring buffer. In a ring buffer I have to know where both pointers are in order to detect over/under run.

Well, when you call snd_pcm_period_elapsed(), the PCM core actually checks the buffer XRUN there.

Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do it itself. AFAIK all modern DMA hardware can be programmed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

Also, if I had more detailed buffer information, I could set the hardware to only play a partial last period and not require that the period be padded with silence.

mpc5200 has full scatter/gather capability. You could even send me random length buffers scattered anywhere in memory. But not all DMA hardware can deal with non-contiguous buffers.
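A minimal sketch of the partial-last-period arithmetic, in plain C outside the kernel. The helper name next_descriptor_bytes and its parameters are hypothetical (they are not an ALSA or BestComm API); it only shows how a descriptor length could be chosen if the number of valid frames still to play were known.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Given the number of valid frames not yet queued, return the byte length
 * to program into the next DMA descriptor. A full period is used while
 * enough data remains; the final descriptor is shortened instead of being
 * padded with silence. A return value of 0 means "queue no descriptor".
 */
size_t next_descriptor_bytes(size_t valid_frames, size_t period_frames,
                             size_t frame_bytes)
{
    size_t frames = valid_frames < period_frames ? valid_frames : period_frames;

    return frames * frame_bytes;
}
```

With 512-frame periods and 4-byte frames, 100 remaining valid frames would give a 400-byte final descriptor rather than a 2048-byte one. Whether a given DMA engine accepts a descriptor shorter than a period is hardware dependent.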

...

...
I also don't understand why this is specific to my hardware, every DMA implementation should need these two pointers.

These two pointers *are* available. That's why your first suggestion, checking appl_ptr, did work. That was basically right.

Yet, there is another question whether we need a better way for the buffer transfer on a queue-style device. For such a device, the async transfer with mmap is somehow troublesome.

Takashi

-- Jon Smirl jonsmirl@gmail.com

Jaroslav Kysela

7:37 p.m.

On Mon, 11 May 2009, Jon Smirl wrote:

...

Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do itself. AFAIK all DMA modern hardware can be programed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

I'll comment on playback only, for simplicity.

The key point is when you fill the DMA engine. Nothing forces you to queue the whole ring buffer in the scatter-gather list. You may queue just one period and then the next one (including a partial one). The available samples can be determined at any time using the snd_pcm_playback_hw_avail() function. If you have a large FIFO and you receive the interrupt before the FIFO goes empty, you can fill a partial period. The elapsed notifier does not require any change.
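The arithmetic behind snd_pcm_playback_hw_avail() can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: the struct pcm_model and the model_ function names are hypothetical, with fields named after the snd_pcm_runtime values they stand in for, and both pointers are assumed to wrap at boundary.

```c
/* Userspace stand-in for the relevant runtime fields (hypothetical type). */
struct pcm_model {
    unsigned long hw_ptr;      /* frames consumed by the hardware, wraps at boundary */
    unsigned long appl_ptr;    /* frames written by the application, wraps at boundary */
    unsigned long buffer_size; /* ring buffer size in frames */
    unsigned long boundary;    /* wrap point of both pointers */
};

/* Free space, in frames, that the application may still write into. */
unsigned long model_playback_avail(const struct pcm_model *m)
{
    long avail = (long)(m->hw_ptr + m->buffer_size - m->appl_ptr);

    if (avail < 0)
        avail += (long)m->boundary;
    else if ((unsigned long)avail >= m->boundary)
        avail -= (long)m->boundary;
    return (unsigned long)avail;
}

/* Frames already written that the hardware has not played yet.
 * A negative result means the hardware has run past the written data. */
long model_playback_hw_avail(const struct pcm_model *m)
{
    return (long)m->buffer_size - (long)model_playback_avail(m);
}
```

For example, with a 2048-frame buffer, hw_ptr = 1024 and appl_ptr = 1536, the model reports 512 frames still to play, so only one 512-frame period would be worth queueing.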

Underruns should be avoided in the first place, because they are really unwanted system behaviour. All methods of fixing underruns fail in some respect.

Also note that we also have a mode in which appl_ptr is not updated at all (when stop_threshold == boundary).

Jaroslav

----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.

Jon Smirl

8 p.m.

On Mon, May 11, 2009 at 1:37 PM, Jaroslav Kysela perex@perex.cz wrote:

...

On Mon, 11 May 2009, Jon Smirl wrote:

...
Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do itself. AFAIK all DMA modern hardware can be programed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

I comment playback for simplicity.

The key point is when you fill DMA engine. Nothing forces you to queue whole ring buffer in the scatter-gather list. You may queue just one period and then next one (including partial). The available samples can be determined using snd_pcm_playback_hw_avail() function at any time. If you have large

I didn't know about this function. I can use it to replace the direct manipulation of appl_ptr.

...

FIFO and you receive interrupt before FIFO goes empty, you can fill partial period. The elapsed notifier does not require any change.

The underruns should be avoided as first step, because it's really unwanted system behaviour. All methods to fix underruns fails in some respects.

Also note that we have also mode when appl_ptr is not updated at all (when stop_threshold == boundary).

How does this mode work? Could it be hidden inside snd_pcm_playback_hw_avail() such that low level drivers don't have to worry about it?

...

Jaroslav

Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.

-- Jon Smirl jonsmirl@gmail.com

Takashi Iwai

14 May 2009

12:39 p.m.

At Mon, 11 May 2009 14:00:59 -0400, Jon Smirl wrote:

...

On Mon, May 11, 2009 at 1:37 PM, Jaroslav Kysela perex@perex.cz wrote:

...
On Mon, 11 May 2009, Jon Smirl wrote:

...
Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do itself. AFAIK all DMA modern hardware can be programed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

I comment playback for simplicity.

The key point is when you fill DMA engine. Nothing forces you to queue whole ring buffer in the scatter-gather list. You may queue just one period and then next one (including partial). The available samples can be determined using snd_pcm_playback_hw_avail() function at any time. If you have large

I didn't know about this function. I can use it to replace the direct manipulation of appl_ptr.

Well, this computes a value between appl_ptr and hw_ptr. That is, it relies on the current hw_ptr value. Just to make sure...
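One way to read this caution, sketched in plain C: a count derived from hw_ptr says how much written data the hardware has not yet played, which is not the same as how much data the driver has not yet queued. The helpers below and the driver-side queued_ptr are hypothetical names used for illustration, not part of ALSA.

```c
/*
 * Distance from `from` to `to` for two positions that both wrap at
 * `boundary` (both are assumed to be smaller than boundary).
 */
unsigned long frames_between(unsigned long from, unsigned long to,
                             unsigned long boundary)
{
    return to >= from ? to - from : boundary - from + to;
}

/* Written by the application but not yet played: depends on hw_ptr. */
unsigned long frames_unplayed(unsigned long hw_ptr, unsigned long appl_ptr,
                              unsigned long boundary)
{
    return frames_between(hw_ptr, appl_ptr, boundary);
}

/* Written by the application but not yet handed to the DMA queue:
 * depends on the driver's own (hypothetical) queued_ptr instead. */
unsigned long frames_unqueued(unsigned long queued_ptr, unsigned long appl_ptr,
                              unsigned long boundary)
{
    return frames_between(queued_ptr, appl_ptr, boundary);
}
```

With hw_ptr = 1024, a driver-side queued position of 2048 and appl_ptr = 2560, there are 1536 unplayed frames but only 512 frames still waiting to be queued. A driver that already has descriptors in flight would need the second figure.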

...

...
FIFO and you receive interrupt before FIFO goes empty, you can fill partial period. The elapsed notifier does not require any change.

The underruns should be avoided as first step, because it's really unwanted system behaviour. All methods to fix underruns fails in some respects.

Also note that we have also mode when appl_ptr is not updated at all (when stop_threshold == boundary).

How does this mode work? Could it be hidden inside snd_pcm_playback_hw_avail() such that low level drivers don't have to worry about it?

Hm, no, in this case, it won't work properly. *_avail() simply tells you a bogus value, even a negative one.

Basically this is a free-wheeling mode, e.g. used by dmix. In this mode, you simply ignore appl_ptr but assume that period_size samples are available at each period update. So, this would be a different implementation.
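The difference between the two modes can be sketched as a small policy function in plain C. The name frames_to_queue and its parameters are hypothetical; valid_frames stands for whatever count the driver derives from appl_ptr, which may be bogus or negative in free-wheeling mode.

```c
/*
 * Hypothetical helper, for illustration only.
 * Decide how many frames to hand to the DMA queue at a period update.
 *
 * - Free-wheeling mode (stop_threshold == boundary): appl_ptr is not
 *   maintained, so valid_frames is ignored and a full period is assumed
 *   to be available.
 * - Normal mode: queue at most one period, and never more than the
 *   valid data that remains.
 */
unsigned long frames_to_queue(long valid_frames, unsigned long period_size,
                              unsigned long stop_threshold,
                              unsigned long boundary)
{
    if (stop_threshold == boundary)
        return period_size;
    if (valid_frames <= 0)
        return 0;
    if ((unsigned long)valid_frames < period_size)
        return (unsigned long)valid_frames;
    return period_size;
}
```

In free-wheeling mode the end-of-stream protection discussed in this thread is lost by construction, since the driver can no longer tell valid data from stale data.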

I don't know whether this mode has to be implemented for hardware like yours. I feel it's better to rule out the free-wheel mode through some condition (a new pcm_info flag?)

thanks,

Takashi

Jon Smirl

4:25 p.m.

On Thu, May 14, 2009 at 6:39 AM, Takashi Iwai tiwai@suse.de wrote:

...

At Mon, 11 May 2009 14:00:59 -0400, Jon Smirl wrote:

...
On Mon, May 11, 2009 at 1:37 PM, Jaroslav Kysela perex@perex.cz wrote:

...
On Mon, 11 May 2009, Jon Smirl wrote:

...
Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do itself. AFAIK all DMA modern hardware can be programed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

I comment playback for simplicity.

The key point is when you fill DMA engine. Nothing forces you to queue whole ring buffer in the scatter-gather list. You may queue just one period and then next one (including partial). The available samples can be determined using snd_pcm_playback_hw_avail() function at any time. If you have large

I didn't know about this function. I can use it to replace the direct manipulation of appl_ptr.

Well, this computes a value between appl_ptr and hw_ptr. That is, it relies on the current hw_ptr value. Just to make sure...

...
...
FIFO and you receive interrupt before FIFO goes empty, you can fill partial period. The elapsed notifier does not require any change.

The underruns should be avoided as first step, because it's really unwanted system behaviour. All methods to fix underruns fails in some respects.

Also note that we have also mode when appl_ptr is not updated at all (when stop_threshold == boundary).

How does this mode work? Could it be hidden inside snd_pcm_playback_hw_avail() such that low level drivers don't have to worry about it?

Hm, no, in this case, it won't work properly. *_avail() simply tells you a bogus value, even a negative one.

Basically this is a free-wheeling mode, e.g. used by dmix. In this mode, you simply ignore appl_ptr but assume that period_size samples are available at each period update. So, this would be a different implementation.

Free-wheeling mode is basically unreliable. Wouldn't it be better to start phasing dmix out in favor of pulseaudio?

Why is OSS still in the kernel? Things never seem to die in the audio world. Other areas add a deprecation warning and then delete the old systems a year later.

...

I don't know whether this mode has to be implemented for the hardware like yours. I feel it's better to avoid the free-wheel mode by some condition (a new pcm_info flag?)

thanks,

Takashi

-- Jon Smirl jonsmirl@gmail.com

Takashi Iwai

5:05 p.m.

At Thu, 14 May 2009 10:25:22 -0400, Jon Smirl wrote:

...

On Thu, May 14, 2009 at 6:39 AM, Takashi Iwai tiwai@suse.de wrote:

...
At Mon, 11 May 2009 14:00:59 -0400, Jon Smirl wrote:

...
On Mon, May 11, 2009 at 1:37 PM, Jaroslav Kysela perex@perex.cz wrote:

...
On Mon, 11 May 2009, Jon Smirl wrote:

...
Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do itself. AFAIK all DMA modern hardware can be programed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

I comment playback for simplicity.

The key point is when you fill DMA engine. Nothing forces you to queue whole ring buffer in the scatter-gather list. You may queue just one period and then next one (including partial). The available samples can be determined using snd_pcm_playback_hw_avail() function at any time. If you have large

I didn't know about this function. I can use it to replace the direct manipulation of appl_ptr.

Well, this computes a value between appl_ptr and hw_ptr. That is, it relies on the current hw_ptr value. Just to make sure...

...
...
FIFO and you receive interrupt before FIFO goes empty, you can fill partial period. The elapsed notifier does not require any change.

The underruns should be avoided as first step, because it's really unwanted system behaviour. All methods to fix underruns fails in some respects.

Also note that we have also mode when appl_ptr is not updated at all (when stop_threshold == boundary).

How does this mode work? Could it be hidden inside snd_pcm_playback_hw_avail() such that low level drivers don't have to worry about it?

Hm, no, in this case, it won't work properly. *_avail() simply tells you a bogus value, even a negative one.

Basically this is a free-wheeling mode, e.g. used by dmix. In this mode, you simply ignore appl_ptr but assume that period_size samples are available at each period update. So, this would be a different implementation.

Free-wheeling mode is basically unreliable. Wouldn't it be better to start phasing dmix out in favor of pulseaudio?

That's not what developers can decide.

...

Why is OSS still in the kernel?

Why? It is designed so :)

Takashi

Jon Smirl

5:29 p.m.

On Thu, May 14, 2009 at 11:05 AM, Takashi Iwai tiwai@suse.de wrote:

...

At Thu, 14 May 2009 10:25:22 -0400, Jon Smirl wrote:

...
On Thu, May 14, 2009 at 6:39 AM, Takashi Iwai tiwai@suse.de wrote:

...
At Mon, 11 May 2009 14:00:59 -0400, Jon Smirl wrote:

...
On Mon, May 11, 2009 at 1:37 PM, Jaroslav Kysela perex@perex.cz wrote:

...
On Mon, 11 May 2009, Jon Smirl wrote:

...
Checking for over/under run in software is not reliable since the DMA hardware runs asynchronously with the CPU. There will always be variable latencies between when the CPU detects the condition and when it can control the DMA hardware. The only reliable way to do this is to program the DMA hardware to do itself. AFAIK all DMA modern hardware can be programed to do this if the right information is made available. Programming the DMA hardware to do this is a 100% reliable solution and not subject to random latency problems.

I comment playback for simplicity.

The key point is when you fill DMA engine. Nothing forces you to queue whole ring buffer in the scatter-gather list. You may queue just one period and then next one (including partial). The available samples can be determined using snd_pcm_playback_hw_avail() function at any time. If you have large

I didn't know about this function. I can use it to replace the direct manipulation of appl_ptr.

Well, this computes a value between appl_ptr and hw_ptr. That is, it relies on the current hw_ptr value. Just to make sure...

...
...
FIFO and you receive interrupt before FIFO goes empty, you can fill partial period. The elapsed notifier does not require any change.

The underruns should be avoided as first step, because it's really unwanted system behaviour. All methods to fix underruns fails in some respects.

Also note that we have also mode when appl_ptr is not updated at all (when stop_threshold == boundary).

How does this mode work? Could it be hidden inside snd_pcm_playback_hw_avail() such that low level drivers don't have to worry about it?

Hm, no, in this case, it won't work properly. *_avail() simply tells you a bogus value, even a negative one.

Basically this is a free-wheeling mode, e.g. used by dmix. In this mode, you simply ignore appl_ptr but assume that period_size samples are available at each period update. So, this would be a different implementation.

Free-wheeling mode is basically unreliable. Wouldn't it be better to start phasing dmix out in favor of pulseaudio?

That's not what developers can decide.

Sure you can. Add a deprecation notification to dmix right now that says dmix is going to be phased out in the future. Print it in the log once at dmix startup time. Write a blurb for LKML/LWN.net that says dmix is being phased out in favor of pulse. You want to do this to stop new people from starting to use dmix. Work with distributions to start removing it when they feel pulse is ready. After the major distributions convert, remove dmix from the standard ALSA build packaging. Anything can be removed by giving sufficient notice and providing a clear migration path.

Until you make the announcement that dmix is not the way of the future, there is no incentive for anyone to migrate.

...

...
Why is OSS still in the kernel?

Why? It is designed so :)

Leaving old code in the kernel with no one maintaining it is worse than removing it. Old code will slowly suffer from bit rot as the kernel evolves. Unsuspecting people will try and use it and it will fail in random ways. It is better to make a one year notification and then remove it.

Don't forget that removing OSS from the kernel only affects people going forward. People using OSS on ancient 2.4 kernels will still have it. You want to stop creating new OSS users.

...

Takashi

-- Jon Smirl jonsmirl@gmail.com
