[alsa-devel] problems writing pcm driver
I'm trying to write an ALSA driver, but there are some things I'm not clear on and it's not working as well as I'd like.
Atomicity: trigger, pointer, and ack are atomic with respect to themselves. e.g., two copies of the trigger callback can't be running at the same time. That much is clear. But are they atomic with respect to each other? Can the trigger callback run at the same time as the pointer callback? How about the atomic callbacks with respect to the non-atomic ones? Can trigger run at the same time as prepare?
The pointer callback: What exactly is the current hardware position? My hardware has a counter that tells me how many periods have been DMAed into the buffer.
Suppose my period size is 256 frames and I'm doing audio capture. If pointer is called before any DMA transfers have completed, do I return 0? Now suppose it's called after one period of audio has been DMAed into the buffer. Do I return 255 or 256? After two periods have been received, should I return 511 or 0?
Why is pointer called so many times? Here's a log of callbacks and irqs my driver is getting. I'm using the cycle counter to give them microsecond timestamps. My period is 256 frames at 48kHz, so each period is 5333 us. There are only two periods in the buffer.
Time (us)   | Callback or IRQ
----------------------------------------------
    0.000   | trigger - start DMA
  134.513   | pointer - count 1, returned 0
  160.254   | IRQ - not calling snd_pcm_period_elapsed
 5313.030   | IRQ - called snd_pcm_period_elapsed
 5318.808   | pointer - count 2, returned 256
            | IRQ handler finished
 5348.309   | pointer - count 2, returned 256
10627.526   | IRQ - called snd_pcm_period_elapsed
10633.075   | pointer - count 3, returned 0
            | IRQ handler finished
10659.755   | pointer - count 3, returned 0
15941.173   | IRQ - called snd_pcm_period_elapsed
15946.601   | pointer - count 4, returned 256
            | IRQ handler finished
15969.743   | pointer - count 4, returned 256
[and so on]
The pointer callback is called only 134 us after the trigger, before the first IRQ or a period of data could possibly be received. Why is this done? To get the position in the buffer to start at? I assumed the data should start at the beginning of the buffer, but is this not the case?
You'll notice the first IRQ is received at only 160 us. The hardware generates an IRQ when each period _starts_, not when one ends. So the first IRQ is ignored, the second IRQ tells me the first period has finished, and so on. The hardware counter is the number of periods started; I have to subtract one to get the number of periods finished.
When the IRQ handler calls snd_pcm_period_elapsed(), that calls the pointer callback. That's what I expected from what the docs said. But about 30 us after the pointer callback is called the first time, after the irq handler has exited, it's called again. What is the purpose of this? Is it ok that I return the same value as the previous time it was called?
At Mon, 20 Aug 2007 13:47:48 -0700 (PDT), Trent Piepho wrote:
I'm trying to write an ALSA driver, but there are some things I'm not clear on and it's not working as well as I'd like.
Atomicity: trigger, pointer, and ack are atomic with respect to themselves. e.g., two copies of the trigger callback can't be running at the same time. That much is clear. But are they atomic with respect to each other? Can the trigger callback run at the same time as the pointer callback? How about the atomic callbacks with respect to the non-atomic ones? Can trigger run at the same time as prepare?
No, these callbacks are exclusive. In principle, they are called with the substream->lock spinlock already held by the PCM core, so they cannot be called at the same time as long as they belong to the same PCM substream instance.
The pointer callback: What exactly is the current hardware position? My hardware has a counter that tells me how many periods have been DMAed into the buffer.
Suppose my period size is 256 frames and I'm doing audio capture. If pointer is called before any DMA transfers have completed, do I return 0? Now suppose it's called after one period of audio has been DMAed into the buffer. Do I return 255 or 256? After two periods have been received, should I return 511 or 0?
The pointer callback returns the position offset within the "ring buffer", not within a single period. Usually, a ring buffer consists of multiple periods, and buffer_size = num_periods * period_size. In such a case, the pointer callback returns a value in the range 0 to buffer_size - 1. If you have period_size=256 and periods=16, the range is 0 to 4095. When the position passes the end of the buffer, it wraps around to zero again. Note that the unit is "frames", not bytes: 1 frame = bytes_per_sample * nchannels.
Why is pointer called so many times?
This callback is called not only in the IRQ handler (via snd_pcm_period_elapsed()) but also when the PCM status is queried, e.g. on a snd_pcm_status() invocation, and to check the status before read/write operations.
(snip)
exited, it's called again. What is the purpose of this? Is it ok that I return the same value as the previous time it was called?
The pointer callback doesn't need to be very accurate, but it's supposed to be fast. Hence, yes, if reading the position takes a long time, it's fine to return the same value as before until it is updated via the IRQ.
Takashi
On Tue, 21 Aug 2007, Takashi Iwai wrote:
At Mon, 20 Aug 2007 13:47:48 -0700 (PDT), Trent Piepho wrote:
Atomicity: trigger, pointer, and ack are atomic with respect to themselves. e.g., two copies of the trigger callback can't be running at the same time. That much is clear. But are they atomic with respect to each other? Can the trigger callback run at the same time as the pointer callback? How about the atomic callbacks with respect to the non-atomic ones? Can trigger run at the same time as prepare?
No, these callbacks are exclusive. In principle, they are called with the substream->lock spinlock already held by the PCM core, so they cannot be called at the same time as long as they belong to the same PCM substream instance.
The driver writing howto says for hw_params, "is that this callback is non-atomic (schedulable)."
If it's non-atomic, that would mean to me that it can be called multiple times at once, and at the same time as other callbacks. But, you're saying that's not the case?
The pointer callback: What exactly is the current hardware position? My hardware has a counter that tells me how many periods have been DMAed into the buffer.
Suppose my period size is 256 frames and I'm doing audio capture. If pointer is called before any DMA transfers have completed, do I return 0? Now suppose it's called after one period of audio has been DMAed into the buffer. Do I return 255 or 256? After two periods have been received, should I return 511 or 0?
The pointer callback returns the position offset of the "ring buffer". It's not the period size. Usually, a ring buffer consists
Suppose the ring buffer has 256 frames of valid data in it, from position 0 to position 255. Do I return 255 or 256? It could be either one, depending on how your ring buffer operates.
(snip)
exited, it's called again. What is the purpose of this? Is it ok that I return the same value as the previous time it was called?
The pointer callback doesn't need to be very accurate, but it's supposed to be fast. Hence, yes, if reading the position takes a long time, it's fine to return the same value as before until it is updated via the IRQ.
I have the pointer callback read a hardware register to get the number of frames transferred. Maybe I should instead have the irq handler save this register in the chip struct, and just read the value from there in the pointer callback? This means I have to use locking between the irq handler and the pointer callback, which I have been able to avoid so far.
Trent Piepho wrote:
The driver writing howto says for hw_params, "is that this callback is non-atomic (schedulable)."
If it's non-atomic, that would mean to me that it can be called multiple times at once, and at the same time as other callbacks. But, you're saying that's not the case?
Sounds like the documentation is using the wrong definition of "atomic". Perhaps it's confused with GFP_ATOMIC, which is a parameter used for kmalloc() when you don't want the kernel to schedule another task.
Suppose the ring buffer has 256 frames of valid data in it, from position 0 to position 255. Do I return 255 or 256? It could be either one, depending on how your ring buffer operates.
If you have 256 frames of valid data in a 256-frame buffer, then you are about to have an underrun condition. I think it would be better if your driver reported an underrun before it got the 256th frame. Just don't allow the buffer to be completely full.
Most circular buffers don't really support that concept anyway - they generally require at least one empty slot.
I have the pointer callback read a hardware register to get the number of frames transferred. Maybe I should instead have the irq handler save this register in the chip struct, and just read the value from there in the pointer callback? This means I have to use locking between the irq handler and the pointer callback, which I have been able to avoid so far.
Is reading a hardware register really that slow?
Trent Piepho wrote:
Suppose my period size is 256 frames and I'm doing audio capture. If pointer is called before any DMA transfers have completed, do I return 0?
yes
Now suppose it's called after one period of audio has been DMAed into the buffer. Do I return 255 or 256? After two periods have been received, should I return 511 or 0?
0 (assuming buffer size is 2 periods)
The pointer callback returns the position offset of the "ring buffer". It's not the period size. Usually, a ring buffer consists
Suppose the ring buffer has 256 frames of valid data in it, from position 0 to position 255. Do I return 255 or 256? It could be either one, depending on how your ring buffer operates.
The difference between the current and previous pointer values needs to be the amount of new data (capture) in the buffer. Given that the pointer starts at zero, you should return 256 in the above scenario.
(snip)
exited, it's called again. What is the purpose of this? Is it ok that I return the same value as the previous time it was called?
The pointer callback doesn't need to be very accurate, but it's supposed to be fast. Hence, yes, if reading the position takes a long time, it's fine to return the same value as before until it is updated via the IRQ.
I have the pointer callback read a hardware register to get the number of frames transferred.
Does this report partial periods (i.e. count 0, 1, 2, 3, 4 rather than 0, 256, etc.)? If so, it is better to read the register in the pointer callback (unless, as Takashi mentions, it is an expensive operation).
One common scenario is that by the time the ALSA core calls the pointer callback, more than one period is already available (with short periods and scheduling delays).
Maybe I should instead have the irq handler save this register in the chip struct, and just read the value from there in the pointer callback?
It is OK to do it this way too.
This means I have to use locking between the irq handler and the pointer callback, which I have been able to avoid so far.
?? Probably not, as long as you only read the cached value from the chip struct once in the pointer callback. However, you might need locking to provide exclusive access to the hardware if there are accesses that have to happen in sequence.
On Wed, 22 Aug 2007, Eliot Blennerhassett wrote:
(snip)
exited, it's called again. What is the purpose of this? Is it ok that I return the same value as the previous time it was called?
The pointer callback doesn't need to be very accurate, but it's supposed to be fast. Hence, yes, if reading the position takes a long time, it's fine to return the same value as before until it is updated via the IRQ.
I have the pointer callback read a hardware register to get the number of frames transferred.
Does this report partial periods (i.e. count 0, 1, 2, 3, 4 rather than 0, 256, etc.)? If so, it is better to read the register in the pointer callback (unless, as Takashi mentions, it is an expensive operation).
It returns the number of periods transferred. e.g. 1 = 256 frames of data, 2 = 512 frames of data, etc. Not high resolution. Right now the driver only supports periods of 1k bytes. I could extend it to support periods that are multiples of 1k and program the counter to keep counting 1k blocks or to count whole periods. I'm trying to get the simplest stuff working correctly first.
This means I have to use locking between the irq handler and the pointer callback, which I have been able to avoid so far.
?? Probably not, as long as you only read the cached value from the chip struct once in the pointer callback. However, you might need locking to provide exclusive access to the hardware if there are accesses that have to happen in sequence.
I was under the impression that one cannot count on it working correctly when one CPU writes an integer while another CPU reads the same integer at the same time. If you want to do this, you have to use atomic_t and related functions.
I think on x86, atomic_read and atomic_set are nothing more than simple moves. The important part is the compiler optimization barrier. Just because you only read from the variable once in the C code does not mean the compiler will not turn it into multiple reads. Something like:
    count = chip->count;
    if (count == 1) {
        ...
    } else if (count == 2) {
        ...
    }
Could be turned by the compiler into:
    p = &chip->count;   /* only the address is kept in a register */
    if (*p == 1) {
        ...
    } else if (*p == 2) {
        ...
    }
Just because you pulled a value out of the chip struct and put it in a local, doesn't mean the compiler will actually emit code to do that. If the compiler thinks the value can't change, and it doesn't know about other threads or interrupt handlers, it's free to remove reads or insert extra reads.
At Tue, 21 Aug 2007 22:13:45 -0700 (PDT), Trent Piepho wrote:
This means I have to use locking between the irq handler and the pointer callback, which I have been able to avoid so far.
?? Probably not, as long as you only read the cached value from the chip struct once in the pointer callback. However, you might need locking to provide exclusive access to the hardware if there are accesses that have to happen in sequence.
I was under the impression that one cannot count on it working correctly when one CPU writes an integer while another CPU reads the same integer at the same time. If you want to do this, you have to use atomic_t and related functions.
See $LINUXKERNEL/Documentation/memory-barriers.txt. It's worth reading to understand what you need in such a case.
Takashi
At Tue, 21 Aug 2007 14:22:54 -0700 (PDT), Trent Piepho wrote:
On Tue, 21 Aug 2007, Takashi Iwai wrote:
At Mon, 20 Aug 2007 13:47:48 -0700 (PDT), Trent Piepho wrote:
Atomicity: trigger, pointer, and ack are atomic with respect to themselves. e.g., two copies of the trigger callback can't be running at the same time. That much is clear. But are they atomic with respect to each other? Can the trigger callback run at the same time as the pointer callback? How about the atomic callbacks with respect to the non-atomic ones? Can trigger run at the same time as prepare?
No, these callbacks are exclusive. In principle, they are called with the substream->lock spinlock already held by the PCM core, so they cannot be called at the same time as long as they belong to the same PCM substream instance.
The driver writing howto says for hw_params, "is that this callback is non-atomic (schedulable)."
If it's non-atomic, that would mean to me that it can be called multiple times at once, and at the same time as other callbacks. But, you're saying that's not the case?
No, in that context, non-atomic means schedulable. That is, the function can get scheduled safely. It's outside the spinlock.
OTOH, an atomic callback cannot be scheduled. You cannot call functions that might sleep, e.g. kmalloc with GFP_KERNEL. It also implicitly means that you shouldn't spend too much time in such callbacks.
About the concurrent access: since the PCM substream instance is exclusive, all PCM callbacks are also basically exclusive. Even open and close are already protected via mutex in PCM core. Thus they cannot be called simultaneously for the same substream instance. (For different substreams, they can be called.)
Takashi
On Wed, 22 Aug 2007, Takashi Iwai wrote:
On Tue, 21 Aug 2007, Takashi Iwai wrote:
At Mon, 20 Aug 2007 13:47:48 -0700 (PDT), Trent Piepho wrote:
Atomicity: trigger, pointer, and ack are atomic with respect to themselves. e.g., two copies of the trigger callback can't be running at the same time. That much is clear. But are they atomic with respect to each other? Can the trigger callback run at the same time as the pointer callback? How about the atomic callbacks with respect to the non-atomic ones? Can trigger run at the same time as prepare?
No, these callbacks are exclusive. In principle, they are called with the substream->lock spinlock already held by the PCM core, so they cannot be called at the same time as long as they belong to the same PCM substream instance.
The driver writing howto says for hw_params, "is that this callback is non-atomic (schedulable)."
If it's non-atomic, that would mean to me that it can be called multiple times at once, and at the same time as other callbacks. But, you're saying that's not the case?
No, in that context, non-atomic means schedulable. That is, the function can get scheduled safely. It's outside the spinlock.
I see, it's not the normal definition of atomic: http://en.wikipedia.org/wiki/Atomic_operations
I suggest something like "schedulable" or "process context". When I see "X is not atomic" I think that means I need to code the function to be re-entrant.
About the concurrent access: since the PCM substream instance is exclusive, all PCM callbacks are also basically exclusive. Even open and close are already protected via mutex in PCM core. Thus they cannot be called simultaneously for the same substream instance. (For different substreams, they can be called.)
Speaking of open, does one need to worry about the same pcm device being opened multiple times? I see code in the open callback of the bt87x driver and the cx88-alsa driver that checks if the chip has already been opened using atomic bit ops. I'm guessing this is correct for the bt87x driver, which has two substreams of which only one can be used at one time. But in the cx88-alsa driver, which has only one substream, it is unnecessary.
I've figured out one of my problems, and it looks like it's not my fault.
I tried running arecord piped to aplay, both using the ca0106 driver, and it works ok. Very few overruns. So this operation with this buffer size _can_ be done.
    arecord -v -D hw:0,0 -f DAT --buffer-size=512 | aplay -D hw:0,1 --buffer-size=512
So, I do the same thing with the device I'm working on, running arecord on hw:1,0, but it doesn't work so well! It's ok for a minute or two, but then I start getting a lot of overruns. I thought it was my driver, but then I noticed something.
Once the overruns start, they were consistently about 3.5 seconds apart. If I increased the arecord buffer size to 1024, they were 7 seconds apart, at 2048 they were 13 seconds apart.
When I recorded and played back using the ca0106 card, both the ADC and DAC were probably using the same master clock signal and the rates were locked. But the other card was using a different clock and must have been producing data faster than the ca0106 would play it. After a while the pipe buffer would fill up and arecord would block too long waiting for aplay to consume its data, and there would be an overrun. That empties the driver's pcm buffer, but not the pipe between arecord and aplay. So as soon as the pcm buffer fills up, another overrun. Each time the pcm buffer doubles in size, it takes twice as long to fill up, and the time between overruns doubles.
At Wed, 22 Aug 2007 02:54:28 -0700 (PDT), Trent Piepho wrote:
On Wed, 22 Aug 2007, Takashi Iwai wrote:
On Tue, 21 Aug 2007, Takashi Iwai wrote:
At Mon, 20 Aug 2007 13:47:48 -0700 (PDT), Trent Piepho wrote:
Atomicity: trigger, pointer, and ack are atomic with respect to themselves. e.g., two copies of the trigger callback can't be running at the same time. That much is clear. But are they atomic with respect to each other? Can the trigger callback run at the same time as the pointer callback? How about the atomic callbacks with respect to the non-atomic ones? Can trigger run at the same time as prepare?
No, these callbacks are exclusive. In principle, they are called with the substream->lock spinlock already held by the PCM core, so they cannot be called at the same time as long as they belong to the same PCM substream instance.
The driver writing howto says for hw_params, "is that this callback is non-atomic (schedulable)."
If it's non-atomic, that would mean to me that it can be called multiple times at once, and at the same time as other callbacks. But, you're saying that's not the case?
No, in that context, non-atomic means schedulable. That is, the function can get scheduled safely. It's outside the spinlock.
I see, it's not the normal definition of atomic: http://en.wikipedia.org/wiki/Atomic_operations
I suggest something like "schedulable" or "process context". When I see "X is not atomic" I think that means I need to code the function to be re-entrant.
Yeah, maybe it's better.
About the concurrent access: since the PCM substream instance is exclusive, all PCM callbacks are also basically exclusive. Even open and close are already protected via mutex in PCM core. Thus they cannot be called simultaneously for the same substream instance. (For different substreams, they can be called.)
Speaking of open, does one need to worry about the same pcm device being opened multiple times?
There are two cases in which open/close can be called for the same PCM instance:
- for different directions (full-duplex), playback and capture
- for different substreams (only when multiple substreams are defined)
I see code in the open callback of the bt87x driver and the cx88-alsa driver that checks if the chip has already been opened using atomic bit ops. I'm guessing this is correct for the bt87x driver, which has two substreams of which only one can be used at one time. But in the cx88-alsa driver, which has only one substream, it is unnecessary.
Yes, likely so.
I've figured out one of my problems, and it looks like it's not my fault.
I tried running arecord piped to aplay, both using the ca0106 driver, and it works ok. Very few overruns. So this operation with this buffer size _can_ be done.
    arecord -v -D hw:0,0 -f DAT --buffer-size=512 | aplay -D hw:0,1 --buffer-size=512
So, I do the same thing with the device I'm working on, running arecord on hw:1,0, but it doesn't work so well! It's ok for a minute or two, but then I start getting a lot of overruns. I thought it was my driver, but then I noticed something.
Once the overruns start, they were consistently about 3.5 seconds apart. If I increased the arecord buffer size to 1024, they were 7 seconds apart, at 2048 they were 13 seconds apart.
When I recorded and played back using the ca0106 card, both the ADC and DAC were probably using the same master clock signal and the rates were locked. But the other card was using a different clock and must have been producing data faster than the ca0106 would play it. After a while the pipe buffer would fill up and arecord would block too long waiting for aplay to consume its data, and there would be an overrun. That empties the driver's pcm buffer, but not the pipe between arecord and aplay. So as soon as the pcm buffer fills up, another overrun. Each time the pcm buffer doubles in size, it takes twice as long to fill up, and the time between overruns doubles.
Indeed, syncing multiple devices with such a small buffer size is often difficult.
Takashi
participants (4)
- Eliot Blennerhassett
- Takashi Iwai
- Timur Tabi
- Trent Piepho