[alsa-devel] Buffer size for ALSA USB PCM audio
I have been studying the data_ep_set_params() function in sound/usb/endpoint.c. This is the routine that calculates the number of samples and I/O requests to keep on the USB hardware queue for PCM audio, based on the ALSA parameters.
It uses the PERIOD_BYTES parameter but not BUFFER_BYTES. In simplified terms (ignoring rounding, boundary cases, and other things), the number of periods per buffer is fixed at 24 for recording and 1 for playback, completely ignoring the user's setting. If you look at the parameters copied in snd_usb_hw_params() in sound/usb/pcm.c, you'll see what I mean.
Is this really the intended behavior? It doesn't seem right at all.
Alan Stern
At Wed, 24 Jul 2013 10:41:43 -0400 (EDT), Alan Stern wrote:
I have been studying the data_ep_set_params() function in sound/usb/endpoint.c. This is the routine that calculates the number of samples and I/O requests to keep on the USB hardware queue for PCM audio, based on the ALSA parameters.
It uses the PERIOD_BYTES parameter but not BUFFER_BYTES. In simplified terms (ignoring rounding, boundary cases, and other things), the number of periods per buffer is fixed at 24 for recording and 1 for playback, completely ignoring the user's setting. If you look at the parameters copied in snd_usb_hw_params() in sound/usb/pcm.c, you'll see what I mean.
Is this really the intended behavior? It doesn't seem right at all.
The buffer size doesn't matter for urb setup because the usb-audio driver transfers the data by the driver itself at urb completes. The buffer size is considered in these callbacks, i.e. prepare_playback_urb() and retire_capture_urb().
OTOH, the period size is evaluated for determining the urb buffer size, so that the data transfer (thus the wakeup via complete) is aligned to the period size.
Takashi
On Wed, 24 Jul 2013, Takashi Iwai wrote:
At Wed, 24 Jul 2013 10:41:43 -0400 (EDT), Alan Stern wrote:
I have been studying the data_ep_set_params() function in sound/usb/endpoint.c. This is the routine that calculates the number of samples and I/O requests to keep on the USB hardware queue for PCM audio, based on the ALSA parameters.
It uses the PERIOD_BYTES parameter but not BUFFER_BYTES. In simplified terms (ignoring rounding, boundary cases, and other things), the number of periods per buffer is fixed at 24 for recording and 1 for playback, completely ignoring the user's setting. If you look at the parameters copied in snd_usb_hw_params() in sound/usb/pcm.c, you'll see what I mean.
Is this really the intended behavior? It doesn't seem right at all.
The buffer size doesn't matter for urb setup because the usb-audio driver transfers the data by the driver itself at urb completes. The buffer size is considered in these callbacks, i.e. prepare_playback_urb() and retire_capture_urb().
I don't understand. Consider a simple playback example. Suppose the user wants to keep the latency low, so he requests 2 periods per buffer. snd-usb-audio ignores this value and decides to use 10 URBs, which is equivalent to setting the buffer size to 10 periods.
Now the latency will be five times higher than the user wants, because data from the user is stored in the next available URB, and it won't get sent to the hardware until all 9 of the URBs preceding it in the queue have been sent.
Alan Stern
At Wed, 24 Jul 2013 11:22:00 -0400 (EDT), Alan Stern wrote:
On Wed, 24 Jul 2013, Takashi Iwai wrote:
At Wed, 24 Jul 2013 10:41:43 -0400 (EDT), Alan Stern wrote:
I have been studying the data_ep_set_params() function in sound/usb/endpoint.c. This is the routine that calculates the number of samples and I/O requests to keep on the USB hardware queue for PCM audio, based on the ALSA parameters.
It uses the PERIOD_BYTES parameter but not BUFFER_BYTES. In simplified terms (ignoring rounding, boundary cases, and other things), the number of periods per buffer is fixed at 24 for recording and 1 for playback, completely ignoring the user's setting. If you look at the parameters copied in snd_usb_hw_params() in sound/usb/pcm.c, you'll see what I mean.
Is this really the intended behavior? It doesn't seem right at all.
The buffer size doesn't matter for urb setup because the usb-audio driver transfers the data by the driver itself at urb completes. The buffer size is considered in these callbacks, i.e. prepare_playback_urb() and retire_capture_urb().
I don't understand. Consider a simple playback example. Suppose the user wants to keep the latency low, so he requests 2 periods per buffer. snd-usb-audio ignores this value and decides to use 10 URBs, which is equivalent to setting the buffer size to 10 periods.
You don't have to fill all 10 URBs to start the stream...
Takashi
On Wed, 24 Jul 2013, Takashi Iwai wrote:
I don't understand. Consider a simple playback example. Suppose the user wants to keep the latency low, so he requests 2 periods per buffer. snd-usb-audio ignores this value and decides to use 10 URBs, which is equivalent to setting the buffer size to 10 periods.
You don't have to fill all 10 URBs to start the stream...
Maybe you don't have to, but it looks the driver does just that. From snd_usb_endpoint_start():
if (snd_usb_endpoint_implicit_feedback_sink(ep)) { for (i = 0; i < ep->nurbs; i++) { struct snd_urb_ctx *ctx = ep->urb + i; list_add_tail(&ctx->ready_list, &ep->ready_playback_urbs); }
return 0; }
for (i = 0; i < ep->nurbs; i++) { struct urb *urb = ep->urb[i].urb;
if (snd_BUG_ON(!urb)) goto __error;
if (usb_pipeout(ep->pipe)) { prepare_outbound_urb(ep, urb->context); } else { prepare_inbound_urb(ep, urb->context); }
err = usb_submit_urb(urb, GFP_ATOMIC); if (err < 0) { snd_printk(KERN_ERR "cannot submit urb %d, error %d: %s\n", i, err, usb_error_string(err)); goto __error; } set_bit(i, &ep->active_mask); }
In this case, I'm particularly considering what happens when snd_usb_endpoint_implicit_feedback_sink(ep) is false.
Besides, what's the reason for allocating 10 URBs if you're not going to submit more than 2 of them at a time?
Alan Stern
At Wed, 24 Jul 2013 11:43:58 -0400 (EDT), Alan Stern wrote:
On Wed, 24 Jul 2013, Takashi Iwai wrote:
I don't understand. Consider a simple playback example. Suppose the user wants to keep the latency low, so he requests 2 periods per buffer. snd-usb-audio ignores this value and decides to use 10 URBs, which is equivalent to setting the buffer size to 10 periods.
You don't have to fill all 10 URBs to start the stream...
Maybe you don't have to, but it looks the driver does just that. From snd_usb_endpoint_start():
if (snd_usb_endpoint_implicit_feedback_sink(ep)) { for (i = 0; i < ep->nurbs; i++) { struct snd_urb_ctx *ctx = ep->urb + i; list_add_tail(&ctx->ready_list, &ep->ready_playback_urbs); }
return 0;
}
for (i = 0; i < ep->nurbs; i++) { struct urb *urb = ep->urb[i].urb;
if (snd_BUG_ON(!urb)) goto __error; if (usb_pipeout(ep->pipe)) { prepare_outbound_urb(ep, urb->context); } else { prepare_inbound_urb(ep, urb->context); } err = usb_submit_urb(urb, GFP_ATOMIC); if (err < 0) { snd_printk(KERN_ERR "cannot submit urb %d, error %d: %s\n", i, err, usb_error_string(err)); goto __error; } set_bit(i, &ep->active_mask);
}
In this case, I'm particularly considering what happens when snd_usb_endpoint_implicit_feedback_sink(ep) is false.
Besides, what's the reason for allocating 10 URBs if you're not going to submit more than 2 of them at a time?
I don't know how you deduced 10 urbs in your example, but in general, ep->nurbs is calculated so that the driver can hold at least two ep->periods (i.e. double buffer). The USB audio driver has essentially two buffers: an internal hardware buffer via URBs and an intermediate buffer via vmalloc. The latter is exposed to user-space and its content is copied to the URBs appropriately via complete callbacks.
Due to this design, we just need two periods for URB buffers, ideally, no matter how many periods are used in the latter buffer specified by user. That's why no buffer_size is needed in ep->nurbs estimation. The actual calculation is, however, a bit more complicated to achieve enough fine-grained updates but yet not too big buffer size. I guess this can be simplified / improved.
Takashi
On Thu, 25 Jul 2013, Takashi Iwai wrote:
Besides, what's the reason for allocating 10 URBs if you're not going to submit more than 2 of them at a time?
I don't know how you deduced 10 urbs in your example,
I made it up. :-)
but in general, ep->nurbs is calculated so that the driver can hold at least two ep->periods (i.e. double buffer). The USB audio driver has essentially two buffers: an internal hardware buffer via URBs and an intermediate buffer via vmalloc. The latter is exposed to user-space and its content is copied to the URBs appropriately via complete callbacks.
Due to this design, we just need two periods for URB buffers, ideally, no matter how many periods are used in the latter buffer specified by user. That's why no buffer_size is needed in ep->nurbs estimation. The actual calculation is, however, a bit more complicated to achieve enough fine-grained updates but yet not too big buffer size. I guess this can be simplified / improved.
Ah, that makes sense. Thanks for the explanation.
The existing code has a problem: Under some conditions the URB queue will be too short. EHCI requires at least 10 microframes on the queue at all times (even when an URB is completing and is therefore no longer on the queue) to avoid underruns. Well, 9 microframes is probably good enough, but I'll use 10 to be safe. A typical arrangement of 2 URBs each with 8 packets each (at 1 packet/uframe) violates this constraint, and I have seen underruns in practice.
The patch below fixes this problem and drastically simplifies the calculation. How does it look?
Alan Stern
Index: usb-3.10/sound/usb/endpoint.c =================================================================== --- usb-3.10.orig/sound/usb/endpoint.c +++ usb-3.10/sound/usb/endpoint.c @@ -575,6 +575,7 @@ static int data_ep_set_params(struct snd struct snd_usb_endpoint *sync_ep) { unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms; + unsigned int min_queued_packs, max_packs; int is_playback = usb_pipeout(ep->pipe); int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
@@ -608,11 +609,21 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) + if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; - else + + /* high speed needs 10 USB uframes on the queue at all times */ + min_queued_packs = DIV_ROUND_UP(10, 8 / packs_per_ms); + max_packs = MAX_PACKS_HS; + } else { packs_per_ms = 1;
+ /* full speed needs one USB frame on the queue at all times */ + min_queued_packs = 1; + max_packs = MAX_PACKS; + } + max_packs = min(max_packs, MAX_QUEUE * packs_per_ms); + if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { urb_packs = max(ep->chip->nrpacks, 1); urb_packs = min(urb_packs, (unsigned int) MAX_PACKS); @@ -625,42 +636,18 @@ static int data_ep_set_params(struct snd if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) urb_packs = min(urb_packs, 1U << sync_ep->syncinterval);
- /* decide how many packets to be used */ - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - unsigned int minsize, maxpacks; - /* determine how small a packet can be */ - minsize = (ep->freqn >> (16 - ep->datainterval)) - * (frame_bits >> 3); - /* with sync from device, assume it can be 12% lower */ - if (sync_ep) - minsize -= minsize >> 3; - minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; - /* we need at least two URBs for queueing */ - if (total_packs < 2) { - total_packs = 2; - } else { - /* and we don't want too long a queue either */ - maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2); - total_packs = min(total_packs, maxpacks); - } - } else { - while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) - urb_packs >>= 1; - total_packs = MAX_URBS * urb_packs; - } - - ep->nurbs = (total_packs + urb_packs - 1) / urb_packs; - if (ep->nurbs > MAX_URBS) { - /* too much... */ - ep->nurbs = MAX_URBS; - total_packs = MAX_URBS * urb_packs; - } else if (ep->nurbs < 2) { - /* too little - we need at least two packets - * to ensure contiguous playback/capture - */ - ep->nurbs = 2; + /* each URB must fit into one period */ + urb_packs = min(urb_packs, period_bytes / maxsize); + urb_packs = max(1u, urb_packs); + + /* at least one more URB than the minimum queue size */ + ep->nurbs = 1 + DIV_ROUND_UP(min_queued_packs, urb_packs); + + if (ep->nurbs * urb_packs > max_packs) { + /* too much -- URBs have too many packets */ + urb_packs = max_packs / ep->nurbs; } + total_packs = ep->nurbs * urb_packs;
/* allocate and initialize data urbs */ for (i = 0; i < ep->nurbs; i++) {
At Thu, 25 Jul 2013 10:50:49 -0400 (EDT), Alan Stern wrote:
On Thu, 25 Jul 2013, Takashi Iwai wrote:
Besides, what's the reason for allocating 10 URBs if you're not going to submit more than 2 of them at a time?
I don't know how you deduced 10 urbs in your example,
I made it up. :-)
but in general, ep->nurbs is calculated so that the driver can hold at least two ep->periods (i.e. double buffer). The USB audio driver has essentially two buffers: an internal hardware buffer via URBs and an intermediate buffer via vmalloc. The latter is exposed to user-space and its content is copied to the URBs appropriately via complete callbacks.
Due to this design, we just need two periods for URB buffers, ideally, no matter how many periods are used in the latter buffer specified by user. That's why no buffer_size is needed in ep->nurbs estimation. The actual calculation is, however, a bit more complicated to achieve enough fine-grained updates but yet not too big buffer size. I guess this can be simplified / improved.
Ah, that makes sense. Thanks for the explanation.
The existing code has a problem: Under some conditions the URB queue will be too short. EHCI requires at least 10 microframes on the queue at all times (even when an URB is completing and is therefore no longer on the queue) to avoid underruns. Well, 9 microframes is probably good enough, but I'll use 10 to be safe. A typical arrangement of 2 URBs each with 8 packets each (at 1 packet/uframe) violates this constraint, and I have seen underruns in practice.
The patch below fixes this problem and drastically simplifies the calculation. How does it look?
Looks good through a quick glance. But I'd rather like to let review Clemens and Daniel as I already forgot the old voodoo logic of the current driver code :)
Thanks!
Takashi
Alan Stern
Index: usb-3.10/sound/usb/endpoint.c
--- usb-3.10.orig/sound/usb/endpoint.c +++ usb-3.10/sound/usb/endpoint.c @@ -575,6 +575,7 @@ static int data_ep_set_params(struct snd struct snd_usb_endpoint *sync_ep) { unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms;
- unsigned int min_queued_packs, max_packs; int is_playback = usb_pipeout(ep->pipe); int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
@@ -608,11 +609,21 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL)
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval;
- else
/* high speed needs 10 USB uframes on the queue at all times */
min_queued_packs = DIV_ROUND_UP(10, 8 / packs_per_ms);
max_packs = MAX_PACKS_HS;
} else { packs_per_ms = 1;
/* full speed needs one USB frame on the queue at all times */
min_queued_packs = 1;
max_packs = MAX_PACKS;
}
max_packs = min(max_packs, MAX_QUEUE * packs_per_ms);
if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { urb_packs = max(ep->chip->nrpacks, 1); urb_packs = min(urb_packs, (unsigned int) MAX_PACKS);
@@ -625,42 +636,18 @@ static int data_ep_set_params(struct snd if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) urb_packs = min(urb_packs, 1U << sync_ep->syncinterval);
- /* decide how many packets to be used */
- if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) {
unsigned int minsize, maxpacks;
/* determine how small a packet can be */
minsize = (ep->freqn >> (16 - ep->datainterval))
* (frame_bits >> 3);
/* with sync from device, assume it can be 12% lower */
if (sync_ep)
minsize -= minsize >> 3;
minsize = max(minsize, 1u);
total_packs = (period_bytes + minsize - 1) / minsize;
/* we need at least two URBs for queueing */
if (total_packs < 2) {
total_packs = 2;
} else {
/* and we don't want too long a queue either */
maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2);
total_packs = min(total_packs, maxpacks);
}
- } else {
while (urb_packs > 1 && urb_packs * maxsize >= period_bytes)
urb_packs >>= 1;
total_packs = MAX_URBS * urb_packs;
- }
- ep->nurbs = (total_packs + urb_packs - 1) / urb_packs;
- if (ep->nurbs > MAX_URBS) {
/* too much... */
ep->nurbs = MAX_URBS;
total_packs = MAX_URBS * urb_packs;
- } else if (ep->nurbs < 2) {
/* too little - we need at least two packets
* to ensure contiguous playback/capture
*/
ep->nurbs = 2;
/* each URB must fit into one period */
urb_packs = min(urb_packs, period_bytes / maxsize);
urb_packs = max(1u, urb_packs);
/* at least one more URB than the minimum queue size */
ep->nurbs = 1 + DIV_ROUND_UP(min_queued_packs, urb_packs);
if (ep->nurbs * urb_packs > max_packs) {
/* too much -- URBs have too many packets */
urb_packs = max_packs / ep->nurbs;
}
total_packs = ep->nurbs * urb_packs;
/* allocate and initialize data urbs */ for (i = 0; i < ep->nurbs; i++) {
On Mon, 29 Jul 2013, Takashi Iwai wrote:
The existing code has a problem: Under some conditions the URB queue will be too short. EHCI requires at least 10 microframes on the queue at all times (even when an URB is completing and is therefore no longer on the queue) to avoid underruns. Well, 9 microframes is probably good enough, but I'll use 10 to be safe. A typical arrangement of 2 URBs each with 8 packets each (at 1 packet/uframe) violates this constraint, and I have seen underruns in practice.
The patch below fixes this problem and drastically simplifies the calculation. How does it look?
Looks good through a quick glance. But I'd rather like to let review Clemens and Daniel as I already forgot the old voodoo logic of the current driver code :)
Okay.
To recap, the bug I want to fix is that snd-usb-audio does not always keep the USB hardware queue sufficiently full. (There is at least one other minor bug -- the driver uses MAX_PACKS instead of MAX_PACKS_HS for high-speed devices.)
Clemens remarked some time ago that keeping the queue full would be trivial, if only he knew how full it needed to be. The answer to that is given above. I have been trying to make the appropriate changes, but I'm not finding it "trivial". :-( Partly because I don't fully understand all the constraints that are already present, but also because it isn't a straightforward calculation.
For a recording endpoint, it truly is simple because then the number of packets per URB never changes (as far as I can tell).
For a playback endpoint, the number of packets gets reduced when necessary, to insure that none of the packets in the URB starts after the end of the current ALSA period. Since a period is defined in terms of the number of samples (ALSA frames) rather than an absolute time, the end of the period depends on the size of the packets. And this size can vary if the output endpoint has a feedback endpoint.
The current driver seems to assume that endpoints with an _implicit_ feedback endpoint won't have variable-length packets. I don't understand why not; doesn't queue_pending_output_urbs() make the packet sizes match the sizes from the corresponding feedback endpoint?
I'd appreciate any advice you can offer.
Alan Stern
Alan Stern wrote:
Clemens remarked some time ago that keeping the queue full would be trivial, if only he knew how full it needed to be. The answer to that is given above. I have been trying to make the appropriate changes, but I'm not finding it "trivial".
What I meant was that it would be trivial to change the lower bound of the period size (from which many queueing parameters are derived).
The current driver seems to assume that endpoints with an _implicit_ feedback endpoint won't have variable-length packets.
That's likely to be a bug.
Regards, Clemens
On Mon, 29 Jul 2013, Clemens Ladisch wrote:
Alan Stern wrote:
Clemens remarked some time ago that keeping the queue full would be trivial, if only he knew how full it needed to be. The answer to that is given above. I have been trying to make the appropriate changes, but I'm not finding it "trivial".
What I meant was that it would be trivial to change the lower bound of the period size (from which many queueing parameters are derived).
I doubt that would help. What really matters here is the relation between urb_packs (the maximum number of packets in an URB) and the number of packets in a period (which can vary as the device's clock frequency drifts).
For a bizarre example of sort of thing that can happen, suppose urb_packs is 8 and and a period always consists of 8 packets. Then to occupy 10 uframes requires two URBs (assuming one packet per uframe).
But now suppose a period might need 9 packets (say because the device's clock frequency is a little low, so it consumes samples less quickly and therefore a fixed number of samples takes more time). Then to occupy 10 uframes, two URBs would no longer be enough, because the numbers of packets in successive URBs could be 8, 1, 8, etc. Two URBs would occupy only 9 uframes.
The current driver seems to assume that endpoints with an _implicit_ feedback endpoint won't have variable-length packets.
That's likely to be a bug.
data_ep_set_params() checks snd_usb_endpoint_implicit_feedback_sink() in three places. It looks like only the second place is correct.
Can you verify whether the other two are right, and explain to me if they are?
Alan Stern
On 29.07.2013 20:20, Alan Stern wrote:
data_ep_set_params() checks snd_usb_endpoint_implicit_feedback_sink() in three places. It looks like only the second place is correct.
Can you verify whether the other two are right, and explain to me if they are?
Which version are you looking at? Eldad Zack recently posted work in that area, but I can't seem to find them anywhere yet in the sound tree. Takashi?
Eldad, as you're likely more into the detail right now - any oppinion on Alan's findings?
Thanks, Daniel
On Mon, 29 Jul 2013, Daniel Mack wrote:
On 29.07.2013 20:20, Alan Stern wrote:
data_ep_set_params() checks snd_usb_endpoint_implicit_feedback_sink() in three places. It looks like only the second place is correct.
Can you verify whether the other two are right, and explain to me if they are?
Which version are you looking at? Eldad Zack recently posted work in that area, but I can't seem to find them anywhere yet in the sound tree.
I'm working with Linus's 3.11-rc3 kernel.
Takashi?
Eldad, as you're likely more into the detail right now - any oppinion on Alan's findings?
Alan Stern
At Mon, 29 Jul 2013 21:23:15 +0200, Daniel Mack wrote:
On 29.07.2013 20:20, Alan Stern wrote:
data_ep_set_params() checks snd_usb_endpoint_implicit_feedback_sink() in three places. It looks like only the second place is correct.
Can you verify whether the other two are right, and explain to me if they are?
Which version are you looking at? Eldad Zack recently posted work in that area, but I can't seem to find them anywhere yet in the sound tree. Takashi?
I'm waiting for the revised patchset (for the issue in patch 9/10 Clemens pointed).
Takashi
Eldad, as you're likely more into the detail right now - any oppinion on Alan's findings?
Thanks, Daniel
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
On Tue, 30 Jul 2013, Takashi Iwai wrote:
At Mon, 29 Jul 2013 21:23:15 +0200, Daniel Mack wrote:
On 29.07.2013 20:20, Alan Stern wrote:
data_ep_set_params() checks snd_usb_endpoint_implicit_feedback_sink() in three places. It looks like only the second place is correct.
Can you verify whether the other two are right, and explain to me if they are?
Which version are you looking at? Eldad Zack recently posted work in that area, but I can't seem to find them anywhere yet in the sound tree. Takashi?
I'm waiting for the revised patchset (for the issue in patch 9/10 Clemens pointed).
Sorry for the delayed responses - I'm working on my free time, and that can be very sporadic.
I'll get around to that patchset on the weekend.
Eldad, as you're likely more into the detail right now - any oppinion on Alan's findings?
Unfortunately I haven't read too deeply into that part of the code. FWIW, I will take a closer look at the weekend.
Alan, let me know if you plan to post the patch as-is, I'll happily test it.
Cheers, Eldad
On Mon, 29 Jul 2013, Clemens Ladisch wrote:
Alan Stern wrote:
Clemens remarked some time ago that keeping the queue full would be trivial, if only he knew how full it needed to be. The answer to that is given above. I have been trying to make the appropriate changes, but I'm not finding it "trivial".
What I meant was that it would be trivial to change the lower bound of the period size (from which many queueing parameters are derived).
Here's what I've got. In turns out the predicting the optimum number of URBs needed is extremely difficult. I decided it would be better to make an overestimate and then to submit URBs as needed, rather than keeping all of them active all the time.
I ended up with this patch (untested). It is certainly incomplete because it doesn't address endpoints with implicit sync. It also suffers from a race between snd_submit_urbs() and deactivate_urbs(): an URB may be resubmitted after it has been deactivated. (In all fairness, I think this race probably exists in the current code too -- there are no spinlocks for synchronization.)
The patch also fixes a couple of unrelated items: MAX_PACKS vs. MAX_PACKS_HS, and the maxsize calculation should realize that a packet can't contain a partial frame.
What do you think of this approach?
Alan Stern
sound/usb/card.h | 7 + sound/usb/endpoint.c | 185 +++++++++++++++++++++++++++++---------------------- sound/usb/pcm.c | 3 3 files changed, 114 insertions(+), 81 deletions(-)
Index: usb-3.11/sound/usb/card.h =================================================================== --- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -4,7 +4,7 @@ #define MAX_NR_RATES 1024 #define MAX_PACKS 20 #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 11 #define SYNC_URBS 4 /* always four urbs for sync */ #define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */
@@ -43,6 +43,7 @@ struct snd_urb_ctx { struct snd_usb_endpoint *ep; int index; /* index for urb array */ int packets; /* number of packets per urb */ + int data_packets; /* current number of data packets */ int packet_size[MAX_PACKS_HS]; /* size of packets for next submission */ struct list_head ready_list; }; @@ -99,6 +100,10 @@ struct snd_usb_endpoint { int skip_packets; /* quirks for devices to ignore the first n packets in a stream */
+ unsigned int min_queued_packs; /* USB hardware queue requirement */ + unsigned int queued_packs; /* number of packets currently queued */ + unsigned int queued_urbs; /* number of URBs currently queued */ + int next_urb; /* next to submit */ spinlock_t lock; struct list_head list; }; Index: usb-3.11/sound/usb/pcm.c =================================================================== --- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -1328,7 +1328,7 @@ static void prepare_playback_urb(struct stride = runtime->frame_bits >> 3;
frames = 0; - urb->number_of_packets = 0; + ctx->data_packets = urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags); for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) @@ -1341,6 +1341,7 @@ static void prepare_playback_urb(struct urb->iso_frame_desc[i].length = counts * ep->stride; frames += counts; urb->number_of_packets++; + ctx->data_packets++; subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size; Index: usb-3.11/sound/usb/endpoint.c =================================================================== --- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -217,6 +217,7 @@ static void prepare_outbound_urb(struct }
urb->number_of_packets = ctx->packets; + ctx->data_packets = ctx->packets; urb->transfer_buffer_length = offs * ep->stride; memset(urb->transfer_buffer, ep->silence_value, offs * ep->stride); @@ -273,6 +274,7 @@ static inline void prepare_inbound_urb(s
urb->transfer_buffer_length = offs; urb->number_of_packets = urb_ctx->packets; + urb_ctx->data_packets = urb_ctx->packets; break;
case SND_USB_ENDPOINT_TYPE_SYNC: @@ -341,6 +343,47 @@ static void queue_pending_output_urbs(st } }
+/** + * snd_submit_urbs: Submit data URBs for endpoints without implicit feedback + * + * @ep: The endpoint to use + * @ctx: Context for the first URB on the queue (or to be queued) + * + * Submit enough URBs so that when the next one completes, the hardware queue + * will still contain sufficiently many packets. + * + * Note: If the hardware queue is empty (and @ctx refers to the next URB to be + * submitted), then ctx->data_packets must be equal to 0. + */ +static int snd_submit_urbs(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) +{ + int err = 0; + + while (ep->queued_packs - ctx->data_packets < ep->min_queued_packs && + ep->queued_urbs < ep->nurbs) { + struct snd_urb_ctx *u = &ep->urb[ep->next_urb]; + + if (usb_pipeout(ep->pipe)) + prepare_outbound_urb(ep, u); + else + prepare_inbound_urb(ep, u); + + err = usb_submit_urb(u->urb, GFP_ATOMIC); + if (err) { + snd_printk(KERN_ERR "cannot submit urb (err = %d: %s)\n", + err, usb_error_string(err)); + break; + } + set_bit(ep->next_urb, &ep->active_mask); + ep->queued_packs += u->data_packets; + ++ep->queued_urbs; + + if (++ep->next_urb >= ep->nurbs) + ep->next_urb = 0; + } + return err; +} + /* * complete callback for urbs */ @@ -348,20 +391,23 @@ static void snd_complete_urb(struct urb { struct snd_urb_ctx *ctx = urb->context; struct snd_usb_endpoint *ep = ctx->ep; - int err; + + clear_bit(ctx->index, &ep->active_mask); + ep->queued_packs -= ctx->data_packets; + --ep->queued_urbs;
if (unlikely(urb->status == -ENOENT || /* unlinked */ urb->status == -ENODEV || /* device removed */ urb->status == -ECONNRESET || /* unlinked */ urb->status == -ESHUTDOWN || /* device disabled */ ep->chip->shutdown)) /* device disconnected */ - goto exit_clear; + return;
if (usb_pipeout(ep->pipe)) { retire_outbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!test_bit(EP_FLAG_RUNNING, &ep->flags))) - goto exit_clear; + return;
if (snd_usb_endpoint_implicit_feedback_sink(ep)) { unsigned long flags; @@ -371,28 +417,24 @@ static void snd_complete_urb(struct urb spin_unlock_irqrestore(&ep->lock, flags); queue_pending_output_urbs(ep);
- goto exit_clear; + return; }
- prepare_outbound_urb(ep, ctx); } else { retire_inbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!test_bit(EP_FLAG_RUNNING, &ep->flags))) - goto exit_clear; - - prepare_inbound_urb(ep, ctx); + return; }
- err = usb_submit_urb(urb, GFP_ATOMIC); - if (err == 0) - return; + ctx->data_packets = 0; + ++ctx; + if (ctx >= ep->urb + ep->nurbs) + ctx = &ep->urb[0];
- snd_printk(KERN_ERR "cannot submit urb (err = %d)\n", err); + if (snd_submit_urbs(ep, ctx) == 0) + return; //snd_pcm_stop(substream, SNDRV_PCM_STATE_XRUN); - -exit_clear: - clear_bit(ctx->index, &ep->active_mask); }
/** @@ -574,7 +616,7 @@ static int data_ep_set_params(struct snd struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) { - unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms; + unsigned int maxsize, i, urb_packs, max_packs, packs_per_ms; int is_playback = usb_pipeout(ep->pipe); int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
@@ -593,8 +635,8 @@ static int data_ep_set_params(struct snd
/* assume max. frequency is 25% higher than nominal */ ep->freqmax = ep->freqn + (ep->freqn >> 2); - maxsize = ((ep->freqmax + 0xffff) * (frame_bits >> 3)) - >> (16 - ep->datainterval); + maxsize = ((ep->freqmax + 0xffff) >> (16 - ep->datainterval)) + * (frame_bits >> 3); /* but wMaxPacketSize might reduce this */ if (ep->maxpacksize && ep->maxpacksize < maxsize) { /* whatever fits into a max. size packet */ @@ -608,11 +650,21 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) + if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; - else + + /* high speed needs 10 USB uframes on the queue at all times */ + ep->min_queued_packs = DIV_ROUND_UP(10, 8 / packs_per_ms); + max_packs = MAX_PACKS_HS; + } else { packs_per_ms = 1;
+ /* full speed needs one USB frame on the queue at all times */ + ep->min_queued_packs = 1; + max_packs = MAX_PACKS; + } + max_packs = min(max_packs, MAX_QUEUE * packs_per_ms); + if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { urb_packs = max(ep->chip->nrpacks, 1); urb_packs = min(urb_packs, (unsigned int) MAX_PACKS); @@ -625,41 +677,32 @@ static int data_ep_set_params(struct snd if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) urb_packs = min(urb_packs, 1U << sync_ep->syncinterval);
- /* decide how many packets to be used */ - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - unsigned int minsize, maxpacks; - /* determine how small a packet can be */ - minsize = (ep->freqn >> (16 - ep->datainterval)) - * (frame_bits >> 3); - /* with sync from device, assume it can be 12% lower */ - if (sync_ep) - minsize -= minsize >> 3; - minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; - /* we need at least two URBs for queueing */ - if (total_packs < 2) { - total_packs = 2; - } else { - /* and we don't want too long a queue either */ - maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2); - total_packs = min(total_packs, maxpacks); - } - } else { - while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) - urb_packs >>= 1; - total_packs = MAX_URBS * urb_packs; - } + /* no point having an URB much longer than one period */ + urb_packs = min(urb_packs, 1 + period_bytes / maxsize);
- ep->nurbs = (total_packs + urb_packs - 1) / urb_packs; - if (ep->nurbs > MAX_URBS) { - /* too much... */ - ep->nurbs = MAX_URBS; - total_packs = MAX_URBS * urb_packs; - } else if (ep->nurbs < 2) { - /* too little - we need at least two packets - * to ensure contiguous playback/capture - */ - ep->nurbs = 2; + /* + * We can't tell in advance how many URBs are needed to fill the + * USB hardware queue, because the number of packets per URB is + * variable (it depends on the period size and the device's clock + * frequency). Instead, get a worst-case overestimate by assuming + * the number of packets alternates between 1 and urb_packs. + * + * The total number of URBs needed is one higher than this, because + * the hardware queue must remain full even while one URB is + * completing (and therefore not on the queue). + * + * Recording endpoints with no sync always use the same number of + * packets per URB, so we can get a better estimate for them. + */ + if (is_playback || sync_ep) + ep->nurbs = 1 + 2 * DIV_ROUND_UP(ep->min_queued_packs, + urb_packs + 1); + else + ep->nurbs = 1 + DIV_ROUND_UP(ep->min_queued_packs, urb_packs); + + if (ep->nurbs * urb_packs > max_packs) { + /* too much -- URBs have too many packets */ + urb_packs = max_packs / ep->nurbs; }
/* allocate and initialize data urbs */ @@ -667,8 +710,8 @@ static int data_ep_set_params(struct snd struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; - u->packets = (i + 1) * total_packs / ep->nurbs - - i * total_packs / ep->nurbs; + u->packets = urb_packs; + u->data_packets = 0; u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II) @@ -861,30 +904,14 @@ int snd_usb_endpoint_start(struct snd_us return 0; }
- for (i = 0; i < ep->nurbs; i++) { - struct urb *urb = ep->urb[i].urb; - - if (snd_BUG_ON(!urb)) - goto __error; - - if (usb_pipeout(ep->pipe)) { - prepare_outbound_urb(ep, urb->context); - } else { - prepare_inbound_urb(ep, urb->context); - } - - err = usb_submit_urb(urb, GFP_ATOMIC); - if (err < 0) { - snd_printk(KERN_ERR "cannot submit urb %d, error %d: %s\n", - i, err, usb_error_string(err)); - goto __error; - } - set_bit(i, &ep->active_mask); - } - - return 0; + ep->queued_packs = 0; + ep->queued_urbs = 0; + ep->next_urb = 0; + ep->urb[0].data_packets = 0; + err = snd_submit_urbs(ep, &ep->urb[0]); + if (!err) + return 0;
-__error: clear_bit(EP_FLAG_RUNNING, &ep->flags); ep->use_count--; deactivate_urbs(ep, false);
At Thu, 1 Aug 2013 13:37:45 -0400 (EDT), Alan Stern wrote:
On Mon, 29 Jul 2013, Clemens Ladisch wrote:
Alan Stern wrote:
Clemens remarked some time ago that keeping the queue full would be trivial, if only he knew how full it needed to be. The answer to that is given above. I have been trying to make the appropriate changes, but I'm not finding it "trivial".
What I meant was that it would be trivial to change the lower bound of the period size (from which many queueing parameters are derived).
Here's what I've got. In turns out the predicting the optimum number of URBs needed is extremely difficult. I decided it would be better to make an overestimate and then to submit URBs as needed, rather than keeping all of them active all the time.
I ended up with this patch (untested). It is certainly incomplete because it doesn't address endpoints with implicit sync. It also suffers from a race between snd_submit_urbs() and deactivate_urbs(): an URB may be resubmitted after it has been deactivated.
What's the reason to clear ep->active_mask at the beginning of snd_complete_urb()?
(In all fairness, I think this race probably exists in the current code too -- there are no spinlocks for synchronization.)
The calls of usb_submit_urb() and usb_unlink_urb() might race indeed. The current code somehow assumed that the USB accepts such concurrent calls. deactivate_urbs() just kills the pending urbs and it doesn't change ep->active_mask bit by itself, and the synchronization is done in wait_clear_urbs(). So, if the concurrent calls of usb_submit_urb() and usb_unlink_urbs() were allowed, it should work as is (in the worst case, the complete will be called once again, but then it goes to exit_clear).
thanks,
Takashi
The patch also fixes a couple of unrelated items: MAX_PACKS vs. MAX_PACKS_HS, and the maxsize calculation should realize that a packet can't contain a partial frame.
What do you think of this approach?
Alan Stern
sound/usb/card.h | 7 + sound/usb/endpoint.c | 185 +++++++++++++++++++++++++++++---------------------- sound/usb/pcm.c | 3 3 files changed, 114 insertions(+), 81 deletions(-)
Index: usb-3.11/sound/usb/card.h
--- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -4,7 +4,7 @@ #define MAX_NR_RATES 1024 #define MAX_PACKS 20 #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 11 #define SYNC_URBS 4 /* always four urbs for sync */ #define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */
@@ -43,6 +43,7 @@ struct snd_urb_ctx { struct snd_usb_endpoint *ep; int index; /* index for urb array */ int packets; /* number of packets per urb */
- int data_packets; /* current number of data packets */ int packet_size[MAX_PACKS_HS]; /* size of packets for next submission */ struct list_head ready_list;
}; @@ -99,6 +100,10 @@ struct snd_usb_endpoint { int skip_packets; /* quirks for devices to ignore the first n packets in a stream */
- unsigned int min_queued_packs; /* USB hardware queue requirement */
- unsigned int queued_packs; /* number of packets currently queued */
- unsigned int queued_urbs; /* number of URBs currently queued */
- int next_urb; /* next to submit */ spinlock_t lock; struct list_head list;
}; Index: usb-3.11/sound/usb/pcm.c =================================================================== --- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -1328,7 +1328,7 @@ static void prepare_playback_urb(struct stride = runtime->frame_bits >> 3;
frames = 0;
- urb->number_of_packets = 0;
- ctx->data_packets = urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags); for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i])
@@ -1341,6 +1341,7 @@ static void prepare_playback_urb(struct urb->iso_frame_desc[i].length = counts * ep->stride; frames += counts; urb->number_of_packets++;
subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size;ctx->data_packets++;
Index: usb-3.11/sound/usb/endpoint.c
--- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -217,6 +217,7 @@ static void prepare_outbound_urb(struct }
urb->number_of_packets = ctx->packets;
ctx->data_packets = ctx->packets; urb->transfer_buffer_length = offs * ep->stride; memset(urb->transfer_buffer, ep->silence_value, offs * ep->stride);
@@ -273,6 +274,7 @@ static inline void prepare_inbound_urb(s
urb->transfer_buffer_length = offs; urb->number_of_packets = urb_ctx->packets;
urb_ctx->data_packets = urb_ctx->packets;
break;
case SND_USB_ENDPOINT_TYPE_SYNC:
@@ -341,6 +343,47 @@ static void queue_pending_output_urbs(st } }
+/**
- snd_submit_urbs: Submit data URBs for endpoints without implicit feedback
- @ep: The endpoint to use
- @ctx: Context for the first URB on the queue (or to be queued)
- Submit enough URBs so that when the next one completes, the hardware queue
- will still contain sufficiently many packets.
- Note: If the hardware queue is empty (and @ctx refers to the next URB to be
- submitted), then ctx->data_packets must be equal to 0.
- */
+static int snd_submit_urbs(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) +{
- int err = 0;
- while (ep->queued_packs - ctx->data_packets < ep->min_queued_packs &&
ep->queued_urbs < ep->nurbs) {
struct snd_urb_ctx *u = &ep->urb[ep->next_urb];
if (usb_pipeout(ep->pipe))
prepare_outbound_urb(ep, u);
else
prepare_inbound_urb(ep, u);
err = usb_submit_urb(u->urb, GFP_ATOMIC);
if (err) {
snd_printk(KERN_ERR "cannot submit urb (err = %d: %s)\n",
err, usb_error_string(err));
break;
}
set_bit(ep->next_urb, &ep->active_mask);
ep->queued_packs += u->data_packets;
++ep->queued_urbs;
if (++ep->next_urb >= ep->nurbs)
ep->next_urb = 0;
- }
- return err;
+}
/*
- complete callback for urbs
*/ @@ -348,20 +391,23 @@ static void snd_complete_urb(struct urb { struct snd_urb_ctx *ctx = urb->context; struct snd_usb_endpoint *ep = ctx->ep;
- int err;
clear_bit(ctx->index, &ep->active_mask);
ep->queued_packs -= ctx->data_packets;
--ep->queued_urbs;
if (unlikely(urb->status == -ENOENT || /* unlinked */ urb->status == -ENODEV || /* device removed */ urb->status == -ECONNRESET || /* unlinked */ urb->status == -ESHUTDOWN || /* device disabled */ ep->chip->shutdown)) /* device disconnected */
goto exit_clear;
return;
if (usb_pipeout(ep->pipe)) { retire_outbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!test_bit(EP_FLAG_RUNNING, &ep->flags)))
goto exit_clear;
return;
if (snd_usb_endpoint_implicit_feedback_sink(ep)) { unsigned long flags;
@@ -371,28 +417,24 @@ static void snd_complete_urb(struct urb spin_unlock_irqrestore(&ep->lock, flags); queue_pending_output_urbs(ep);
goto exit_clear;
}return;
} else { retire_inbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!test_bit(EP_FLAG_RUNNING, &ep->flags)))prepare_outbound_urb(ep, ctx);
goto exit_clear;
prepare_inbound_urb(ep, ctx);
}return;
- err = usb_submit_urb(urb, GFP_ATOMIC);
- if (err == 0)
return;
- ctx->data_packets = 0;
- ++ctx;
- if (ctx >= ep->urb + ep->nurbs)
ctx = &ep->urb[0];
- snd_printk(KERN_ERR "cannot submit urb (err = %d)\n", err);
- if (snd_submit_urbs(ep, ctx) == 0)
//snd_pcm_stop(substream, SNDRV_PCM_STATE_XRUN);return;
-exit_clear:
- clear_bit(ctx->index, &ep->active_mask);
}
/** @@ -574,7 +616,7 @@ static int data_ep_set_params(struct snd struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) {
- unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms;
- unsigned int maxsize, i, urb_packs, max_packs, packs_per_ms; int is_playback = usb_pipeout(ep->pipe); int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
@@ -593,8 +635,8 @@ static int data_ep_set_params(struct snd
/* assume max. frequency is 25% higher than nominal */ ep->freqmax = ep->freqn + (ep->freqn >> 2);
- maxsize = ((ep->freqmax + 0xffff) * (frame_bits >> 3))
>> (16 - ep->datainterval);
- maxsize = ((ep->freqmax + 0xffff) >> (16 - ep->datainterval))
/* but wMaxPacketSize might reduce this */ if (ep->maxpacksize && ep->maxpacksize < maxsize) { /* whatever fits into a max. size packet */* (frame_bits >> 3);
@@ -608,11 +650,21 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL)
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval;
- else
/* high speed needs 10 USB uframes on the queue at all times */
ep->min_queued_packs = DIV_ROUND_UP(10, 8 / packs_per_ms);
max_packs = MAX_PACKS_HS;
} else { packs_per_ms = 1;
/* full speed needs one USB frame on the queue at all times */
ep->min_queued_packs = 1;
max_packs = MAX_PACKS;
}
max_packs = min(max_packs, MAX_QUEUE * packs_per_ms);
if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { urb_packs = max(ep->chip->nrpacks, 1); urb_packs = min(urb_packs, (unsigned int) MAX_PACKS);
@@ -625,41 +677,32 @@ static int data_ep_set_params(struct snd if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) urb_packs = min(urb_packs, 1U << sync_ep->syncinterval);
- /* decide how many packets to be used */
- if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) {
unsigned int minsize, maxpacks;
/* determine how small a packet can be */
minsize = (ep->freqn >> (16 - ep->datainterval))
* (frame_bits >> 3);
/* with sync from device, assume it can be 12% lower */
if (sync_ep)
minsize -= minsize >> 3;
minsize = max(minsize, 1u);
total_packs = (period_bytes + minsize - 1) / minsize;
/* we need at least two URBs for queueing */
if (total_packs < 2) {
total_packs = 2;
} else {
/* and we don't want too long a queue either */
maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2);
total_packs = min(total_packs, maxpacks);
}
- } else {
while (urb_packs > 1 && urb_packs * maxsize >= period_bytes)
urb_packs >>= 1;
total_packs = MAX_URBS * urb_packs;
- }
- /* no point having an URB much longer than one period */
- urb_packs = min(urb_packs, 1 + period_bytes / maxsize);
- ep->nurbs = (total_packs + urb_packs - 1) / urb_packs;
- if (ep->nurbs > MAX_URBS) {
/* too much... */
ep->nurbs = MAX_URBS;
total_packs = MAX_URBS * urb_packs;
- } else if (ep->nurbs < 2) {
/* too little - we need at least two packets
* to ensure contiguous playback/capture
*/
ep->nurbs = 2;
/*
* We can't tell in advance how many URBs are needed to fill the
* USB hardware queue, because the number of packets per URB is
* variable (it depends on the period size and the device's clock
* frequency). Instead, get a worst-case overestimate by assuming
* the number of packets alternates between 1 and urb_packs.
*
* The total number of URBs needed is one higher than this, because
* the hardware queue must remain full even while one URB is
* completing (and therefore not on the queue).
*
* Recording endpoints with no sync always use the same number of
* packets per URB, so we can get a better estimate for them.
*/
if (is_playback || sync_ep)
ep->nurbs = 1 + 2 * DIV_ROUND_UP(ep->min_queued_packs,
urb_packs + 1);
else
ep->nurbs = 1 + DIV_ROUND_UP(ep->min_queued_packs, urb_packs);
if (ep->nurbs * urb_packs > max_packs) {
/* too much -- URBs have too many packets */
urb_packs = max_packs / ep->nurbs;
}
/* allocate and initialize data urbs */
@@ -667,8 +710,8 @@ static int data_ep_set_params(struct snd struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep;
u->packets = (i + 1) * total_packs / ep->nurbs
- i * total_packs / ep->nurbs;
u->packets = urb_packs;
u->data_packets = 0;
u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II)
@@ -861,30 +904,14 @@ int snd_usb_endpoint_start(struct snd_us return 0; }
- for (i = 0; i < ep->nurbs; i++) {
struct urb *urb = ep->urb[i].urb;
if (snd_BUG_ON(!urb))
goto __error;
if (usb_pipeout(ep->pipe)) {
prepare_outbound_urb(ep, urb->context);
} else {
prepare_inbound_urb(ep, urb->context);
}
err = usb_submit_urb(urb, GFP_ATOMIC);
if (err < 0) {
snd_printk(KERN_ERR "cannot submit urb %d, error %d: %s\n",
i, err, usb_error_string(err));
goto __error;
}
set_bit(i, &ep->active_mask);
- }
- return 0;
- ep->queued_packs = 0;
- ep->queued_urbs = 0;
- ep->next_urb = 0;
- ep->urb[0].data_packets = 0;
- err = snd_submit_urbs(ep, &ep->urb[0]);
- if (!err)
return 0;
-__error: clear_bit(EP_FLAG_RUNNING, &ep->flags); ep->use_count--; deactivate_urbs(ep, false);
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
On Mon, 12 Aug 2013, Takashi Iwai wrote:
Here's what I've got. In turns out the predicting the optimum number of URBs needed is extremely difficult. I decided it would be better to make an overestimate and then to submit URBs as needed, rather than keeping all of them active all the time.
I ended up with this patch (untested). It is certainly incomplete because it doesn't address endpoints with implicit sync. It also suffers from a race between snd_submit_urbs() and deactivate_urbs(): an URB may be resubmitted after it has been deactivated.
What's the reason to clear ep->active_mask at the beginning of snd_complete_urb()?
In order to keep ep->active_mask accurate. snd_complete_urb() might not resubmit the URB.
(In all fairness, I think this race probably exists in the current code too -- there are no spinlocks for synchronization.)
The calls of usb_submit_urb() and usb_unlink_urb() might race indeed. The current code somehow assumed that the USB accepts such concurrent calls.
The USB API does indeed allow concurrent calls of usb_submit_urb() and usb_unlink_urb(). They are serialized by a spinlock in the host controller driver, and if the usb_unlink_urb() call ends up coming first, it will fail.
(Ironically, in the EHCI driver, trying to unlink an isochronous URB doesn't do anything at all. The URB will complete in the usual way, when all its time slots expire.)
deactivate_urbs() just kills the pending urbs and it doesn't change ep->active_mask bit by itself, and the synchronization is done in wait_clear_urbs().
Oh, so that's where ep->active_mask gets used. Why call bitmap_weight()? Why not simply see if the mask value is equal to 0?
So, if the concurrent calls of usb_submit_urb() and usb_unlink_urbs() were allowed, it should work as is (in the worst case, the complete will be called once again, but then it goes to exit_clear).
I see. Assuming there's always at least one active URB while the endpoint is running, wait_clear_urbs() should work okay. The patch won't violate this assumption, so it's good.
Alan Stern
At Mon, 12 Aug 2013 10:53:36 -0400 (EDT), Alan Stern wrote:
On Mon, 12 Aug 2013, Takashi Iwai wrote:
Here's what I've got. In turns out the predicting the optimum number of URBs needed is extremely difficult. I decided it would be better to make an overestimate and then to submit URBs as needed, rather than keeping all of them active all the time.
I ended up with this patch (untested). It is certainly incomplete because it doesn't address endpoints with implicit sync. It also suffers from a race between snd_submit_urbs() and deactivate_urbs(): an URB may be resubmitted after it has been deactivated.
What's the reason to clear ep->active_mask at the beginning of snd_complete_urb()?
In order to keep ep->active_mask accurate. snd_complete_urb() might not resubmit the URB.
(In all fairness, I think this race probably exists in the current code too -- there are no spinlocks for synchronization.)
The calls of usb_submit_urb() and usb_unlink_urb() might race indeed. The current code somehow assumed that the USB accepts such concurrent calls.
The USB API does indeed allow concurrent calls of usb_submit_urb() and usb_unlink_urb(). They are serialized by a spinlock in the host controller driver, and if the usb_unlink_urb() call ends up coming first, it will fail.
(Ironically, in the EHCI driver, trying to unlink an isochronous URB doesn't do anything at all. The URB will complete in the usual way, when all its time slots expire.)
deactivate_urbs() just kills the pending urbs and it doesn't change ep->active_mask bit by itself, and the synchronization is done in wait_clear_urbs().
Oh, so that's where ep->active_mask gets used. Why call bitmap_weight()? Why not simply see if the mask value is equal to 0?
Yeah, it can be so. Though, the number of pending urbs is printed in the error message, thus bitmap_weight() is needed there instead.
So, if the concurrent calls of usb_submit_urb() and usb_unlink_urbs() were allowed, it should work as is (in the worst case, the complete will be called once again, but then it goes to exit_clear).
I see. Assuming there's always at least one active URB while the endpoint is running, wait_clear_urbs() should work okay. The patch won't violate this assumption, so it's good.
OK.
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
thanks,
Takashi
On Mon, 12 Aug 2013, Takashi Iwai wrote:
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
Here's a revised version of the patch (still untested). The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue. This will provide better protection against underruns when the period is larger than the queue's minimum requirement.
Alan Stern
sound/usb/card.h | 9 ++ sound/usb/endpoint.c | 200 ++++++++++++++++++++++++++++++--------------------- sound/usb/pcm.c | 4 - 3 files changed, 132 insertions(+), 81 deletions(-)
Index: usb-3.11/sound/usb/card.h =================================================================== --- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -4,7 +4,7 @@ #define MAX_NR_RATES 1024 #define MAX_PACKS 20 #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 11 #define SYNC_URBS 4 /* always four urbs for sync */ #define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */
@@ -43,6 +43,8 @@ struct snd_urb_ctx { struct snd_usb_endpoint *ep; int index; /* index for urb array */ int packets; /* number of packets per urb */ + int data_packets; /* current number of data packets */ + int data_frames; /* current number of data frames */ int packet_size[MAX_PACKS_HS]; /* size of packets for next submission */ struct list_head ready_list; }; @@ -99,6 +101,11 @@ struct snd_usb_endpoint { int skip_packets; /* quirks for devices to ignore the first n packets in a stream */
+ unsigned int min_queued_packs; /* USB hardware queue requirement */ + unsigned int queued_packs; /* number of packets on the queue */ + unsigned int queued_urbs; /* number of URBs on the queue */ + unsigned int queued_frames; /* number of data frames on the queue */ + int next_urb; /* next to submit */ spinlock_t lock; struct list_head list; }; Index: usb-3.11/sound/usb/pcm.c =================================================================== --- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -1328,7 +1328,7 @@ static void prepare_playback_urb(struct stride = runtime->frame_bits >> 3;
frames = 0; - urb->number_of_packets = 0; + ctx->data_packets = urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags); for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) @@ -1341,6 +1341,7 @@ static void prepare_playback_urb(struct urb->iso_frame_desc[i].length = counts * ep->stride; frames += counts; urb->number_of_packets++; + ctx->data_packets++; subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size; @@ -1370,6 +1371,7 @@ static void prepare_playback_urb(struct !snd_usb_endpoint_implicit_feedback_sink(subs->data_endpoint)) /* finish at the period boundary */ break; } + ctx->data_frames = frames; bytes = frames * ep->stride;
if (unlikely(subs->pcm_format == SNDRV_PCM_FORMAT_DSD_U16_LE && Index: usb-3.11/sound/usb/endpoint.c =================================================================== --- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -203,6 +203,7 @@ static void prepare_outbound_urb(struct } else { /* no data provider, so send silence */ unsigned int offs = 0; + for (i = 0; i < ctx->packets; ++i) { int counts;
@@ -217,6 +218,8 @@ static void prepare_outbound_urb(struct }
urb->number_of_packets = ctx->packets; + ctx->data_packets = ctx->packets; + ctx->data_frames = offs; urb->transfer_buffer_length = offs * ep->stride; memset(urb->transfer_buffer, ep->silence_value, offs * ep->stride); @@ -273,6 +276,7 @@ static inline void prepare_inbound_urb(s
urb->transfer_buffer_length = offs; urb->number_of_packets = urb_ctx->packets; + urb_ctx->data_packets = urb_ctx->packets; break;
case SND_USB_ENDPOINT_TYPE_SYNC: @@ -341,6 +345,55 @@ static void queue_pending_output_urbs(st } }
+/** + * snd_submit_urbs: Submit data URBs for endpoints without implicit feedback + * + * @ep: The endpoint to use + * @ctx: Context for the first URB on the queue (or to be queued) + * + * Submit enough URBs so that when the next one completes, the hardware queue + * will still contain sufficiently many packets. + * + * Note: If the hardware queue is empty (and @ctx refers to the next URB to be + * submitted), then ctx->data_packets must be equal to 0. + */ +static int snd_submit_urbs(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx) +{ + int err = 0; + unsigned int period_size = ep->data_subs->pcm_substream->runtime-> + period_size; + + while (ep->queued_urbs < ep->nurbs) { + struct snd_urb_ctx *u; + + if (ep->queued_frames >= period_size && + ep->queued_packs - ctx->data_packets >= + ep->min_queued_packs) + break; + + u = &ep->urb[ep->next_urb]; + if (usb_pipeout(ep->pipe)) + prepare_outbound_urb(ep, u); + else + prepare_inbound_urb(ep, u); + + err = usb_submit_urb(u->urb, GFP_ATOMIC); + if (err) { + snd_printk(KERN_ERR "cannot submit urb (err = %d: %s)\n", + err, usb_error_string(err)); + break; + } + set_bit(ep->next_urb, &ep->active_mask); + ep->queued_packs += u->data_packets; + ++ep->queued_urbs; + ep->queued_frames += u->data_frames; + + if (++ep->next_urb >= ep->nurbs) + ep->next_urb = 0; + } + return err; +} + /* * complete callback for urbs */ @@ -348,20 +401,24 @@ static void snd_complete_urb(struct urb { struct snd_urb_ctx *ctx = urb->context; struct snd_usb_endpoint *ep = ctx->ep; - int err; + + clear_bit(ctx->index, &ep->active_mask); + ep->queued_packs -= ctx->data_packets; + --ep->queued_urbs; + ep->queued_frames -= ctx->data_frames;
if (unlikely(urb->status == -ENOENT || /* unlinked */ urb->status == -ENODEV || /* device removed */ urb->status == -ECONNRESET || /* unlinked */ urb->status == -ESHUTDOWN || /* device disabled */ ep->chip->shutdown)) /* device disconnected */ - goto exit_clear; + return;
if (usb_pipeout(ep->pipe)) { retire_outbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!test_bit(EP_FLAG_RUNNING, &ep->flags))) - goto exit_clear; + return;
if (snd_usb_endpoint_implicit_feedback_sink(ep)) { unsigned long flags; @@ -371,28 +428,24 @@ static void snd_complete_urb(struct urb spin_unlock_irqrestore(&ep->lock, flags); queue_pending_output_urbs(ep);
- goto exit_clear; + return; }
- prepare_outbound_urb(ep, ctx); } else { retire_inbound_urb(ep, ctx); /* can be stopped during retire callback */ if (unlikely(!test_bit(EP_FLAG_RUNNING, &ep->flags))) - goto exit_clear; - - prepare_inbound_urb(ep, ctx); + return; }
- err = usb_submit_urb(urb, GFP_ATOMIC); - if (err == 0) - return; + ctx->data_packets = 0; + ++ctx; + if (ctx >= ep->urb + ep->nurbs) + ctx = &ep->urb[0];
- snd_printk(KERN_ERR "cannot submit urb (err = %d)\n", err); + if (snd_submit_urbs(ep, ctx) == 0) + return; //snd_pcm_stop(substream, SNDRV_PCM_STATE_XRUN); - -exit_clear: - clear_bit(ctx->index, &ep->active_mask); }
/** @@ -574,7 +627,7 @@ static int data_ep_set_params(struct snd struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) { - unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms; + unsigned int maxsize, i, urb_packs, max_packs, packs_per_ms; int is_playback = usb_pipeout(ep->pipe); int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
@@ -593,8 +646,8 @@ static int data_ep_set_params(struct snd
/* assume max. frequency is 25% higher than nominal */ ep->freqmax = ep->freqn + (ep->freqn >> 2); - maxsize = ((ep->freqmax + 0xffff) * (frame_bits >> 3)) - >> (16 - ep->datainterval); + maxsize = ((ep->freqmax + 0xffff) >> (16 - ep->datainterval)) + * (frame_bits >> 3); /* but wMaxPacketSize might reduce this */ if (ep->maxpacksize && ep->maxpacksize < maxsize) { /* whatever fits into a max. size packet */ @@ -608,11 +661,21 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) + if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; - else + + /* high speed needs 10 USB uframes on the queue at all times */ + ep->min_queued_packs = DIV_ROUND_UP(10, 8 / packs_per_ms); + max_packs = MAX_PACKS_HS; + } else { packs_per_ms = 1;
+ /* full speed needs one USB frame on the queue at all times */ + ep->min_queued_packs = 1; + max_packs = MAX_PACKS; + } + max_packs = min(max_packs, MAX_QUEUE * packs_per_ms); + if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { urb_packs = max(ep->chip->nrpacks, 1); urb_packs = min(urb_packs, (unsigned int) MAX_PACKS); @@ -625,41 +688,36 @@ static int data_ep_set_params(struct snd if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) urb_packs = min(urb_packs, 1U << sync_ep->syncinterval);
- /* decide how many packets to be used */ - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - unsigned int minsize, maxpacks; - /* determine how small a packet can be */ - minsize = (ep->freqn >> (16 - ep->datainterval)) - * (frame_bits >> 3); - /* with sync from device, assume it can be 12% lower */ - if (sync_ep) - minsize -= minsize >> 3; - minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; - /* we need at least two URBs for queueing */ - if (total_packs < 2) { - total_packs = 2; - } else { - /* and we don't want too long a queue either */ - maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2); - total_packs = min(total_packs, maxpacks); - } - } else { - while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) - urb_packs >>= 1; - total_packs = MAX_URBS * urb_packs; - } + /* no point having an URB much longer than one period */ + urb_packs = min(urb_packs, 1 + period_bytes / maxsize);
- ep->nurbs = (total_packs + urb_packs - 1) / urb_packs; - if (ep->nurbs > MAX_URBS) { - /* too much... */ - ep->nurbs = MAX_URBS; - total_packs = MAX_URBS * urb_packs; - } else if (ep->nurbs < 2) { - /* too little - we need at least two packets - * to ensure contiguous playback/capture - */ - ep->nurbs = 2; + /* + * We can't tell in advance how many URBs are needed to fill the + * USB hardware queue, because the number of packets per URB is + * variable (it depends on the period size and the device's clock + * frequency). Instead, get a worst-case overestimate by assuming + * the number of packets alternates between 1 and urb_packs. + * + * The total number of URBs needed is one higher than this, because + * the hardware queue must remain full even while one URB is + * completing (and therefore not on the queue). + * + * Recording endpoints with no sync always use the same number of + * packets per URB, so we can get a better estimate for them. + */ + if (is_playback || sync_ep) + ep->nurbs = 1 + 2 * DIV_ROUND_UP(ep->min_queued_packs, + urb_packs + 1); + else + ep->nurbs = 1 + DIV_ROUND_UP(ep->min_queued_packs, urb_packs); + + /* at least enough URBs to last an entire period */ + ep->nurbs = max(ep->nurbs, period_bytes / (urb_packs * maxsize)); + ep->nurbs = min(ep->nurbs, (unsigned) MAX_URBS); + + if (ep->nurbs * urb_packs > max_packs) { + /* too much -- URBs have too many packets */ + urb_packs = max_packs / ep->nurbs; }
/* allocate and initialize data urbs */ @@ -667,8 +725,8 @@ static int data_ep_set_params(struct snd struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; - u->packets = (i + 1) * total_packs / ep->nurbs - - i * total_packs / ep->nurbs; + u->packets = urb_packs; + u->data_packets = 0; u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II) @@ -861,30 +919,14 @@ int snd_usb_endpoint_start(struct snd_us return 0; }
- for (i = 0; i < ep->nurbs; i++) { - struct urb *urb = ep->urb[i].urb; - - if (snd_BUG_ON(!urb)) - goto __error; - - if (usb_pipeout(ep->pipe)) { - prepare_outbound_urb(ep, urb->context); - } else { - prepare_inbound_urb(ep, urb->context); - } - - err = usb_submit_urb(urb, GFP_ATOMIC); - if (err < 0) { - snd_printk(KERN_ERR "cannot submit urb %d, error %d: %s\n", - i, err, usb_error_string(err)); - goto __error; - } - set_bit(i, &ep->active_mask); - } - - return 0; + ep->queued_packs = 0; + ep->queued_urbs = 0; + ep->next_urb = 0; + ep->urb[0].data_packets = 0; + err = snd_submit_urbs(ep, &ep->urb[0]); + if (!err) + return 0;
-__error: clear_bit(EP_FLAG_RUNNING, &ep->flags); ep->use_count--; deactivate_urbs(ep, false);
On Mon, 12 Aug 2013, Alan Stern wrote:
On Mon, 12 Aug 2013, Takashi Iwai wrote:
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
Here's a revised version of the patch (still untested). The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue. This will provide better protection against underruns when the period is larger than the queue's minimum requirement.
After more thought, I decided that patch does too much. It's not necessary to keep track of the number of packets. Instead, the driver should always try to keep as much data in the USB hardware queue as it is allowed to.
In other words, there should be enough URBs so that an entire ALSA buffer can be queued at any time, subject only to the limit on the maximum number of URBs and packets. It doesn't make sense to allocate just enough URBs to cover a single period.
Does this seem reasonable?
Alan Stern
Hi Alan,
On 13.08.2013 23:06, Alan Stern wrote:
On Mon, 12 Aug 2013, Alan Stern wrote:
On Mon, 12 Aug 2013, Takashi Iwai wrote:
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
Here's a revised version of the patch (still untested). The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue. This will provide better protection against underruns when the period is larger than the queue's minimum requirement.
After more thought, I decided that patch does too much. It's not necessary to keep track of the number of packets. Instead, the driver should always try to keep as much data in the USB hardware queue as it is allowed to.
In other words, there should be enough URBs so that an entire ALSA buffer can be queued at any time, subject only to the limit on the maximum number of URBs and packets. It doesn't make sense to allocate just enough URBs to cover a single period.
Does this seem reasonable?
I think so, yes. But I'd like to comment on the actual patch, and also give it a try first of course. It took me some days to gather my setup again, but if you send a revised version, I hope to be able to test it in the next days.
Thanks, Daniel
On Tue, 13 Aug 2013, Daniel Mack wrote:
Hi Alan,
On 13.08.2013 23:06, Alan Stern wrote:
On Mon, 12 Aug 2013, Alan Stern wrote:
On Mon, 12 Aug 2013, Takashi Iwai wrote:
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
Here's a revised version of the patch (still untested). The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue. This will provide better protection against underruns when the period is larger than the queue's minimum requirement.
After more thought, I decided that patch does too much. It's not necessary to keep track of the number of packets. Instead, the driver should always try to keep as much data in the USB hardware queue as it is allowed to.
In other words, there should be enough URBs so that an entire ALSA buffer can be queued at any time, subject only to the limit on the maximum number of URBs and packets. It doesn't make sense to allocate just enough URBs to cover a single period.
Does this seem reasonable?
I think so, yes. But I'd like to comment on the actual patch, and also give it a try first of course. It took me some days to gather my setup again, but if you send a revised version, I hope to be able to test it in the next days.
I can also test the revised patch on the weekend. My device uses implicit feedback though.
Cheers, Eldad
On Tue, 13 Aug 2013, Daniel Mack wrote:
Does this seem reasonable?
I think so, yes. But I'd like to comment on the actual patch, and also give it a try first of course. It took me some days to gather my setup again, but if you send a revised version, I hope to be able to test it in the next days.
On Wed, 14 Aug 2013, Eldad Zack wrote:
I can also test the revised patch on the weekend. My device uses implicit feedback though.
Thanks guys. I won't have anything ready for testing for some time; this is still pretty much in the discussion and design stage.
On Tue, 13 Aug 2013, Clemens Ladisch wrote:
- maxsize = ((ep->freqmax + 0xffff) * (frame_bits >> 3))
>> (16 - ep->datainterval);
- maxsize = ((ep->freqmax + 0xffff) >> (16 - ep->datainterval))
* (frame_bits >> 3);
What do you assume prevents firmware writers from splitting frames across packages? The specification? Sanity? :) (see txfr_quirk)
Three things. First, I was thinking mostly about playback endpoints, not recording endpoints, and the code in prepare_playback_urb() doesn't split frames.
Second, even for recording endpoints, it doesn't make sense for the device to split frames across packets, since the data samples making up the frame are all collected at the same time. (Or if they aren't collected at the same time, the device has got a nasty phase problem.)
Third, the calculation already overestimates the clock rate by 25%. (The corresponding calculation for a slow clock assumes only a 12% deviation.) The length difference caused by a partial frame pales in comparison to this excess, and it seemed better to keep the calculation sensible.
The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue.
Having truncated URBs is possible only when URBs are shorter than one period,
No. URBs are truncated when a full-sized URB would extend at least one packet beyond the end of a frame. Even then, the frame end actually occurs somewhere in the middle of the last packet of the truncated URB.
so I think that a queue length of at least two periods, together with a minimum period size, should ensure the isochrounous scheduling threshold.
This depends on how long a period is. Assumptions about the length should be avoided.
On Wed, 14 Aug 2013, Clemens Ladisch wrote:
After more thought, I decided that patch does too much. It's not necessary to keep track of the number of packets. Instead, the driver should always try to keep as much data in the USB hardware queue as it is allowed to.
This is what the current code does (i.e., re-submitting all URBs in their completion handler).
But that isn't as much as is allowed. Consider an example where each URB is 8 uframes, a period is 1 ms, and the buffer size is 4 periods. The driver will allocate 2 or 3 URBs and keep them all busy, but it is allowed to use as many as 4.
In other words, there should be enough URBs so that an entire ALSA buffer can be queued at any time, subject only to the limit on the maximum number of URBs and packets.
The URB queue adds latency, so it should never be made too big, even for big ALSA buffers. Once we have reached a queue length that is practically guaranteed to be safe from underruns, more URBs do not make sense. (The current limit of 24 ms is somewhat arbitrary.)
I agree. The total queue size should be limited by:
a maximum total number of URBs,
a maximum total number of packets,
a maximum reasonable latency (maybe smaller than 24 ms),
and the ALSA buffer size.
All these factors should be taken into account, and whatever the final value ends up being, the queue should be kept as close to it as possible. If the user specifies a buffer size so small that the scheduling threshold is violated and an underrun occurs, it's the user's own fault.
The difficulty arises because (for playback endpoints) URBs don't have fixed numbers of packets and the packets don't have fixed numbers of frames. The numbers change to avoid sending too much data to the device in a single packet and to avoid going beyond the end of an ALSA period in a single URB.
This means that the driver should overestimate the number of URBs required, and submit them as needed to keep the queue full. If this means that sometimes a completion callback doesn't submit anything and other times it submits more than one URB, so be it.
It doesn't make sense to allocate just enough URBs to cover a single period.
Indeed. I've used two periods in my recent patch, but I don't really know how I would justify this choice in the commit message. ;-)
What doesn't make sense either is the current algorithm that computes a specific value for the total number of packets (total_packs) and then takes great care to allocate the URBs so that this number is reached, even if this means that the number of packets per URBs varies.
Yeah, that's one of the things that got changed in my patch.
And while we're at it: the default number of packets per URB was chosen before the driver had the ability to estimate the delay from the current frame number; it should now be quite safe to increase it.
I don't understand how this delay estimation is supposed to work, or what it is meant to accomplish. Can you explain?
Alan Stern
Alan Stern wrote:
On Tue, 13 Aug 2013, Clemens Ladisch wrote:
The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue.
Having truncated URBs is possible only when URBs are shorter than one period,
No. URBs are truncated when a full-sized URB would extend at least one packet beyond the end of a frame.
This thought was in the context of avoiding too-short queues. When there are multiple URBs per period, the queue is long enough anyway.
so I think that a queue length of at least two periods, together with a minimum period size, should ensure the isochrounous scheduling threshold.
This depends on how long a period is.
In that patch, a period is guaranteed to be no smaller than the IST.
And while we're at it: the default number of packets per URB was chosen before the driver had the ability to estimate the delay from the current frame number; it should now be quite safe to increase it.
I don't understand how this delay estimation is supposed to work, or what it is meant to accomplish. Can you explain?
The "delay" is the difference between the time when a sample is written by the application and when that sample is actually played, and is important for things like A/V synchronization. It depends on 1) the amount of samples already in the ring buffer; 2) any processing done by the driver; and 3) any processing done by the hardware.
Most drivers don't do any processing, and most of the hardware has very low delays, so it is common for drivers to pretend 2) and 3) are zero.
snd-usb-audio computes 2) from the current number of packets in the queue, with the progress estimated based on the current frame. Without this computation, it was desirable to have short URBs because the delay would jump by a large amount whenever a URB was completed.
Regards, Clemens
Alan Stern wrote:
On Mon, 12 Aug 2013, Alan Stern wrote:
Here's a revised version of the patch (still untested). The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue. This will provide better protection against underruns when the period is larger than the queue's minimum requirement.
After more thought, I decided that patch does too much. It's not necessary to keep track of the number of packets. Instead, the driver should always try to keep as much data in the USB hardware queue as it is allowed to.
This is what the current code does (i.e., re-submitting all URBs in their completion handler).
In other words, there should be enough URBs so that an entire ALSA buffer can be queued at any time, subject only to the limit on the maximum number of URBs and packets.
The URB queue adds latency, so it should never be made too big, even for big ALSA buffers. Once we have reached a queue length that is practically guaranteed to be safe from underruns, more URBs do not make sense. (The current limit of 24 ms is somewhat arbitrary.)
It doesn't make sense to allocate just enough URBs to cover a single period.
Indeed. I've used two periods in my recent patch, but I don't really know how I would justify this choice in the commit message. ;-)
What doesn't make sense either is the current algorithm that computes a specific value for the total number of packets (total_packs) and then takes great care to allocate the URBs so that this number is reached, even if this means that the number of packets per URBs varies.
And while we're at it: the default number of packets per URB was chosen before the driver had the ability to estimate the delay from the current frame number; it should now be quite safe to increase it.
Regards, Clemens
On Wed, 14 Aug 2013, Clemens Ladisch wrote:
In other words, there should be enough URBs so that an entire ALSA buffer can be queued at any time, subject only to the limit on the maximum number of URBs and packets.
The URB queue adds latency, so it should never be made too big, even for big ALSA buffers. Once we have reached a queue length that is practically guaranteed to be safe from underruns, more URBs do not make sense. (The current limit of 24 ms is somewhat arbitrary.)
It doesn't make sense to allocate just enough URBs to cover a single period.
Indeed. I've used two periods in my recent patch, but I don't really know how I would justify this choice in the commit message. ;-)
What doesn't make sense either is the current algorithm that computes a specific value for the total number of packets (total_packs) and then takes great care to allocate the URBs so that this number is reached, even if this means that the number of packets per URBs varies.
And while we're at it: the default number of packets per URB was chosen before the driver had the ability to estimate the delay from the current frame number; it should now be quite safe to increase it.
Okay, so here's my next attempt. It's based on a simple idea: All the difficulty arises from the fact that we don't know beforehand how many URBs will constitute an ALSA period since for playback endpoints, the URB sizes can vary. The problem can be solved simply by setting the number of URBs per period to a fixed value, and truncating URBs as needed to make it work.
Briefly, the patch fixes the max number of packets per URB, based on the minimum packet size. The only requirement is that no URB should exceed 8 ms (MAX_PACKS) or the length of a period. Given this, the number of URBs per period is fixed, and the number of packets in each URB is adjusted during playback so that all the URBs end up having roughly the same number of frames. As a result, we don't need to submit variable numbers of URBs in the completion handler after all.
The total number of URBs is limited to 12 (MAX_URBS) and 16 ms worth (MAX_QUEUE), and it is also restricted so that the total amount of data in the queue does not exceed the size of an ALSA buffer. If the user sets the buffer size too small, he gets what he deserves.
The parameter calculation now ends up being the same for both recording and playback endpoints. This may seem odd, but it follows from the reasoning above.
Although I believe the logic is now sound, the patch itself is incomplete. I don't know the best way of passing the frames/period and periods/buffer values, so they are stubbed as 0 with a FIXME comment. Maybe you can tell me how to do this properly.
The patch eliminates the use of ep->chip->nrpacks and therefore the nrpacks module parameter. Does that matter? I left the parameter in place, but now it doesn't do anything.
I'm not sure about the interaction with ep->curpacksize. I left it alone, since it looks like it won't affect playback endpoints.
Also, the relations between frames_per_period, period_bytes, and frame_bits are a little confusing. It's not clear to me that they will always behave as one would expect, i.e., that period_bytes will always be equal to (frame_bits >> 3) * frames_per_period.
Overall, the patch removes more lines of code than it adds. I think it looks pretty good, as far as it goes. What do you think?
Alan Stern
Index: usb-3.11/sound/usb/card.h =================================================================== --- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -2,11 +2,11 @@ #define __USBAUDIO_CARD_H
#define MAX_NR_RATES 1024 -#define MAX_PACKS 20 +#define MAX_PACKS 8 /* per URB */ #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 12 #define SYNC_URBS 4 /* always four urbs for sync */ -#define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */ +#define MAX_QUEUE 16 /* try not to exceed this queue length, in ms */
struct audioformat { struct list_head list; @@ -87,6 +87,7 @@ struct snd_usb_endpoint { unsigned int phase; /* phase accumulator */ unsigned int maxpacksize; /* max packet size in bytes */ unsigned int maxframesize; /* max packet size in frames */ + unsigned int max_urb_frames; /* max URB size in frames */ unsigned int curpacksize; /* current packet size in bytes (for capture) */ unsigned int curframesize; /* current packet size in frames (for capture) */ unsigned int syncmaxsize; /* sync endpoint packet size */ @@ -125,6 +126,7 @@ struct snd_usb_substream {
unsigned int hwptr_done; /* processed byte position in the buffer */ unsigned int transfer_done; /* processed frames since last period update */ + unsigned int frame_limit; /* limits number of packets in URB */
/* data and sync endpoints for this stream */ unsigned int ep_num; /* the endpoint number */ Index: usb-3.11/sound/usb/pcm.c =================================================================== --- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -1330,6 +1330,7 @@ static void prepare_playback_urb(struct frames = 0; urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags); + subs->frame_limit += ep->max_urb_frames; for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) counts = ctx->packet_size[i]; @@ -1344,6 +1345,7 @@ static void prepare_playback_urb(struct subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size; + subs->frame_limit = 0; period_elapsed = 1; if (subs->fmt_type == UAC_FORMAT_TYPE_II) { if (subs->transfer_done > 0) { @@ -1366,8 +1368,10 @@ static void prepare_playback_urb(struct break; } } - if (period_elapsed && - !snd_usb_endpoint_implicit_feedback_sink(subs->data_endpoint)) /* finish at the period boundary */ + /* finish at the period boundary or after enough frames */ + if ((period_elapsed || + subs->transfer_done >= subs->frame_limit) && + !snd_usb_endpoint_implicit_feedback_sink(ep)) break; } bytes = frames * ep->stride; Index: usb-3.11/sound/usb/endpoint.c =================================================================== --- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -571,11 +571,14 @@ static int data_ep_set_params(struct snd snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int frames_per_period, + unsigned int periods_per_buffer, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) { - unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms; - int is_playback = usb_pipeout(ep->pipe); + unsigned int maxsize, minsize, packs_per_ms, max_packs_per_urb; + unsigned int max_packs_per_period, urbs_per_period, urb_packs; + unsigned int max_urbs, i; int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
if (pcm_format == SNDRV_PCM_FORMAT_DSD_U16_LE && fmt->dsd_dop) { @@ -608,67 +611,44 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) + if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; - else - packs_per_ms = 1; - - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - urb_packs = max(ep->chip->nrpacks, 1); - urb_packs = min(urb_packs, (unsigned int) MAX_PACKS); + max_packs_per_urb = MAX_PACKS_HS >> ep->datainterval; } else { - urb_packs = 1; + packs_per_ms = 1; + max_packs_per_urb = MAX_PACKS; } - - urb_packs *= packs_per_ms; - if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) - urb_packs = min(urb_packs, 1U << sync_ep->syncinterval); + max_packs_per_urb = min(max_packs_per_urb, + 1U << sync_ep->syncinterval);
- /* decide how many packets to be used */ - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - unsigned int minsize, maxpacks; - /* determine how small a packet can be */ - minsize = (ep->freqn >> (16 - ep->datainterval)) - * (frame_bits >> 3); - /* with sync from device, assume it can be 12% lower */ - if (sync_ep) - minsize -= minsize >> 3; - minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; - /* we need at least two URBs for queueing */ - if (total_packs < 2) { - total_packs = 2; - } else { - /* and we don't want too long a queue either */ - maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2); - total_packs = min(total_packs, maxpacks); - } - } else { - while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) - urb_packs >>= 1; - total_packs = MAX_URBS * urb_packs; - } - - ep->nurbs = (total_packs + urb_packs - 1) / urb_packs; - if (ep->nurbs > MAX_URBS) { - /* too much... */ - ep->nurbs = MAX_URBS; - total_packs = MAX_URBS * urb_packs; - } else if (ep->nurbs < 2) { - /* too little - we need at least two packets - * to ensure contiguous playback/capture - */ - ep->nurbs = 2; - } + /* determine how small a packet can be */ + minsize = (ep->freqn >> (16 - ep->datainterval)) * (frame_bits >> 3); + /* with sync from device, assume it can be 12% lower */ + if (sync_ep) + minsize -= minsize >> 3; + minsize = max(minsize, 1u); + + /* how many packets will contain an entire ALSA period? */ + max_packs_per_period = DIV_ROUND_UP(period_bytes, minsize); + + /* how many URBs contain a period, and how many packets in each? */ + urbs_per_period = DIV_ROUND_UP(max_packs_per_period, max_packs_per_urb); + urb_packs = DIV_ROUND_UP(max_packs_per_period, urbs_per_period); + ep->max_urb_frames = DIV_ROUND_UP(frames_per_period, + urbs_per_period); + + /* try to use enough URBs to contain an entire ALSA buffer */ + max_urbs = min((unsigned) MAX_URBS, + MAX_QUEUE * packs_per_ms / urb_packs); + ep->nurbs = min(max_urbs, urbs_per_period * periods_per_buffer);
/* allocate and initialize data urbs */ for (i = 0; i < ep->nurbs; i++) { struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; - u->packets = (i + 1) * total_packs / ep->nurbs - - i * total_packs / ep->nurbs; + u->packets = urb_packs; u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II) @@ -790,7 +770,8 @@ int snd_usb_endpoint_set_params(struct s switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: err = data_ep_set_params(ep, pcm_format, channels, - period_bytes, fmt, sync_ep); + period_bytes, 0, 0, fmt, sync_ep); + /* FIXME: buffer size isn't 0! */ break; case SND_USB_ENDPOINT_TYPE_SYNC: err = sync_ep_set_params(ep, fmt);
Clemens and everyone else:
Not having heard any responses to the patch posted last Wednesday, I have updated and completed it. The version below is ready for testing. Please let me know what you find.
It is not very different from the previous version. I got rid of the nrpacks module parameter and figured out how to pass the frames/period and periods/buffer values to data_ep_set_params(). (Maybe this isn't the best way to do it, but it works.) I also fixed up the calculation that limits URBs to a sync interval.
Of greater interest, I decided to limit each URB to 6 ms and the total queue to 18 ms, thereby encouraging the driver to use three URBs rather than two. (These values could be increased to 8 and 24, respectively.) This helps to reduce the effects of an underrun, as follows:
With only two URBs, following an underrun the driver would submit URB1 and URB2. Because of the delay, it can easily happen that the last packet of URB1 comes before the isochronous scheduling threshold. The URB gets scheduled, but the hardware never sees it. Consequently there is no completion interrupt until URB2 finishes, at which point the queue is empty, causing a secondary underrun. I actually saw this happen several times during testing.
James, this patch should be tested along with Clemens's original "maxpacket is too large" patch and my "EHCI accepts late iso URBs" patches. It should allow you to go down to ridiculously low parameter values, provided only that the total latency is higher than ~1.2 ms. For example, at 48 KHz this patch should work okay with 8 frames/period and 8 periods/buffer. Of course, larger values will provide greater resilience against underruns.
Alan Stern
Index: usb-3.11/sound/usb/usbaudio.h =================================================================== --- usb-3.11.orig/sound/usb/usbaudio.h +++ usb-3.11/sound/usb/usbaudio.h @@ -55,7 +55,6 @@ struct snd_usb_audio { struct list_head mixer_list; /* list of mixer interfaces */
int setup; /* from the 'device_setup' module param */ - int nrpacks; /* from the 'nrpacks' module param */ bool autoclock; /* from the 'autoclock' module param */
struct usb_host_interface *ctrl_intf; /* the audio control interface */ Index: usb-3.11/sound/usb/card.c =================================================================== --- usb-3.11.orig/sound/usb/card.c +++ usb-3.11/sound/usb/card.c @@ -79,7 +79,6 @@ static bool enable[SNDRV_CARDS] = SNDRV_ /* Vendor/product IDs for this card */ static int vid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 }; static int pid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 }; -static int nrpacks = 8; /* max. number of packets per urb */ static int device_setup[SNDRV_CARDS]; /* device parameter for this card */ static bool ignore_ctl_error; static bool autoclock = true; @@ -94,8 +93,6 @@ module_param_array(vid, int, NULL, 0444) MODULE_PARM_DESC(vid, "Vendor ID for the USB audio device."); module_param_array(pid, int, NULL, 0444); MODULE_PARM_DESC(pid, "Product ID for the USB audio device."); -module_param(nrpacks, int, 0644); -MODULE_PARM_DESC(nrpacks, "Max. number of packets per URB."); module_param_array(device_setup, int, NULL, 0444); MODULE_PARM_DESC(device_setup, "Specific device setup (if needed)."); module_param(ignore_ctl_error, bool, 0444); @@ -374,7 +371,6 @@ static int snd_usb_audio_create(struct u chip->dev = dev; chip->card = card; chip->setup = device_setup[idx]; - chip->nrpacks = nrpacks; chip->autoclock = autoclock; chip->probing = 1;
@@ -756,10 +752,6 @@ static struct usb_driver usb_audio_drive
static int __init snd_usb_audio_init(void) { - if (nrpacks < 1 || nrpacks > MAX_PACKS) { - printk(KERN_WARNING "invalid nrpacks value.\n"); - return -EINVAL; - } return usb_register(&usb_audio_driver); }
Index: usb-3.11/sound/usb/endpoint.h =================================================================== --- usb-3.11.orig/sound/usb/endpoint.h +++ usb-3.11/sound/usb/endpoint.h @@ -12,6 +12,8 @@ int snd_usb_endpoint_set_params(struct s snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int period_frames, + unsigned int buffer_periods, unsigned int rate, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep); Index: usb-3.11/sound/usb/card.h =================================================================== --- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -2,11 +2,11 @@ #define __USBAUDIO_CARD_H
#define MAX_NR_RATES 1024 -#define MAX_PACKS 20 +#define MAX_PACKS 6 /* per URB */ #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 12 #define SYNC_URBS 4 /* always four urbs for sync */ -#define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */ +#define MAX_QUEUE 18 /* try not to exceed this queue length, in ms */
struct audioformat { struct list_head list; @@ -87,6 +87,7 @@ struct snd_usb_endpoint { unsigned int phase; /* phase accumulator */ unsigned int maxpacksize; /* max packet size in bytes */ unsigned int maxframesize; /* max packet size in frames */ + unsigned int max_urb_frames; /* max URB size in frames */ unsigned int curpacksize; /* current packet size in bytes (for capture) */ unsigned int curframesize; /* current packet size in frames (for capture) */ unsigned int syncmaxsize; /* sync endpoint packet size */ @@ -116,6 +117,8 @@ struct snd_usb_substream { unsigned int channels_max; /* max channels in the all audiofmts */ unsigned int cur_rate; /* current rate (for hw_params callback) */ unsigned int period_bytes; /* current period bytes (for hw_params callback) */ + unsigned int period_frames; /* current frames per period */ + unsigned int buffer_periods; /* current periods per buffer */ unsigned int altset_idx; /* USB data format: index of alternate setting */ unsigned int txfr_quirk:1; /* allow sub-frame alignment */ unsigned int fmt_type; /* USB audio format type (1-3) */ @@ -125,6 +128,7 @@ struct snd_usb_substream {
unsigned int hwptr_done; /* processed byte position in the buffer */ unsigned int transfer_done; /* processed frames since last period update */ + unsigned int frame_limit; /* limits number of packets in URB */
/* data and sync endpoints for this stream */ unsigned int ep_num; /* the endpoint number */ Index: usb-3.11/sound/usb/pcm.c =================================================================== --- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -562,6 +562,7 @@ static int configure_sync_endpoint(struc subs->pcm_format, subs->channels, subs->period_bytes, + 0, 0, subs->cur_rate, subs->cur_audiofmt, NULL); @@ -598,6 +599,7 @@ static int configure_sync_endpoint(struc subs->pcm_format, sync_fp->channels, sync_period_bytes, + 0, 0, subs->cur_rate, sync_fp, NULL); @@ -620,6 +622,8 @@ static int configure_endpoint(struct snd subs->pcm_format, subs->channels, subs->period_bytes, + subs->period_frames, + subs->buffer_periods, subs->cur_rate, subs->cur_audiofmt, subs->sync_endpoint); @@ -656,6 +660,8 @@ static int snd_usb_hw_params(struct snd_
subs->pcm_format = params_format(hw_params); subs->period_bytes = params_period_bytes(hw_params); + subs->period_frames = params_period_size(hw_params); + subs->buffer_periods = params_periods(hw_params); subs->channels = params_channels(hw_params); subs->cur_rate = params_rate(hw_params);
@@ -1330,6 +1336,7 @@ static void prepare_playback_urb(struct frames = 0; urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags); + subs->frame_limit += ep->max_urb_frames; for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) counts = ctx->packet_size[i]; @@ -1344,6 +1351,7 @@ static void prepare_playback_urb(struct subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size; + subs->frame_limit = 0; period_elapsed = 1; if (subs->fmt_type == UAC_FORMAT_TYPE_II) { if (subs->transfer_done > 0) { @@ -1366,8 +1374,10 @@ static void prepare_playback_urb(struct break; } } - if (period_elapsed && - !snd_usb_endpoint_implicit_feedback_sink(subs->data_endpoint)) /* finish at the period boundary */ + /* finish at the period boundary or after enough frames */ + if ((period_elapsed || + subs->transfer_done >= subs->frame_limit) && + !snd_usb_endpoint_implicit_feedback_sink(ep)) break; } bytes = frames * ep->stride; Index: usb-3.11/sound/usb/endpoint.c =================================================================== --- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -571,11 +571,14 @@ static int data_ep_set_params(struct snd snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int frames_per_period, + unsigned int periods_per_buffer, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) { - unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms; - int is_playback = usb_pipeout(ep->pipe); + unsigned int maxsize, minsize, packs_per_ms, max_packs_per_urb; + unsigned int max_packs_per_period, urbs_per_period, urb_packs; + unsigned int max_urbs, i; int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
if (pcm_format == SNDRV_PCM_FORMAT_DSD_U16_LE && fmt->dsd_dop) { @@ -608,67 +611,47 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) + if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; - else - packs_per_ms = 1; - - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - urb_packs = max(ep->chip->nrpacks, 1); - urb_packs = min(urb_packs, (unsigned int) MAX_PACKS); + max_packs_per_urb = MAX_PACKS_HS; } else { - urb_packs = 1; + packs_per_ms = 1; + max_packs_per_urb = MAX_PACKS; } - - urb_packs *= packs_per_ms; - if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) - urb_packs = min(urb_packs, 1U << sync_ep->syncinterval); - - /* decide how many packets to be used */ - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - unsigned int minsize, maxpacks; - /* determine how small a packet can be */ - minsize = (ep->freqn >> (16 - ep->datainterval)) - * (frame_bits >> 3); - /* with sync from device, assume it can be 12% lower */ - if (sync_ep) - minsize -= minsize >> 3; - minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; - /* we need at least two URBs for queueing */ - if (total_packs < 2) { - total_packs = 2; - } else { - /* and we don't want too long a queue either */ - maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2); - total_packs = min(total_packs, maxpacks); - } - } else { - while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) - urb_packs >>= 1; - total_packs = MAX_URBS * urb_packs; - } - - ep->nurbs = (total_packs + urb_packs - 1) / urb_packs; - if (ep->nurbs > MAX_URBS) { - /* too much... */ - ep->nurbs = MAX_URBS; - total_packs = MAX_URBS * urb_packs; - } else if (ep->nurbs < 2) { - /* too little - we need at least two packets - * to ensure contiguous playback/capture - */ - ep->nurbs = 2; - } + max_packs_per_urb = min(max_packs_per_urb, + 1U << sync_ep->syncinterval); + max_packs_per_urb = max(1u, max_packs_per_urb >> ep->datainterval); + + /* determine how small a packet can be */ + minsize = (ep->freqn >> (16 - ep->datainterval)) * (frame_bits >> 3); + /* with sync from device, assume it can be 12% lower */ + if (sync_ep) + minsize -= minsize >> 3; + minsize = max(minsize, 1u); + + /* how many packets will contain an entire ALSA period? */ + max_packs_per_period = DIV_ROUND_UP(period_bytes, minsize); + + /* how many URBs contain a period, and how many packets in each? */ + urbs_per_period = DIV_ROUND_UP(max_packs_per_period, max_packs_per_urb); + urb_packs = DIV_ROUND_UP(max_packs_per_period, urbs_per_period); + + /* limit the number of frames in a single URB */ + ep->max_urb_frames = DIV_ROUND_UP(frames_per_period, + urbs_per_period); + + /* try to use enough URBs to contain an entire ALSA buffer */ + max_urbs = min((unsigned) MAX_URBS, + MAX_QUEUE * packs_per_ms / urb_packs); + ep->nurbs = min(max_urbs, urbs_per_period * periods_per_buffer);
/* allocate and initialize data urbs */ for (i = 0; i < ep->nurbs; i++) { struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; - u->packets = (i + 1) * total_packs / ep->nurbs - - i * total_packs / ep->nurbs; + u->packets = urb_packs; u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II) @@ -745,6 +728,8 @@ out_of_memory: * @pcm_format: the audio fomat. * @channels: the number of audio channels. * @period_bytes: the number of bytes in one alsa period. + * @period_frames: the number of frames in one alsa period. + * @buffer_periods: the number of periods in one alsa buffer. * @rate: the frame rate. * @fmt: the USB audio format information * @sync_ep: the sync endpoint to use, if any @@ -757,6 +742,8 @@ int snd_usb_endpoint_set_params(struct s snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int period_frames, + unsigned int buffer_periods, unsigned int rate, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) @@ -790,7 +777,8 @@ int snd_usb_endpoint_set_params(struct s switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: err = data_ep_set_params(ep, pcm_format, channels, - period_bytes, fmt, sync_ep); + period_bytes, period_frames, + buffer_periods, fmt, sync_ep); break; case SND_USB_ENDPOINT_TYPE_SYNC: err = sync_ep_set_params(ep, fmt);
Alan Stern wrote:
All the difficulty arises from the fact that we don't know beforehand how many URBs will constitute an ALSA period since for playback endpoints, the URB sizes can vary. [...] the number of URBs per period is fixed, and the number of packets in each URB is adjusted during playback so that all the URBs end up having roughly the same number of frames. [...] The parameter calculation now ends up being the same for both recording and playback endpoints. This may seem odd, but it follows from the reasoning above.
There is no reasoning about capture endpoints.
The driver cannot control how many samples actually end up in a capture packet, so it is possible that URBs end up being one USB frame longer than a period, in which case the ALSA period interrupts will accumulate delays of up to one period, or that URBs end up being one USB frame shorter than a period, in which case the ALSA period interrupt will be delayed to the end of the _next_ URB.
The current algorithm uses very short capture URBs to ensure that _some_ URB is completed as soon as possible after a period ends.
Your patch makes things worse for running jackd at lower latencies because playback and capture period interrupts are expected to be more or less synchronous (jackd waits for both interrupts before acting on one period). With only two periods per buffer and the capture period interrupt randomly delayed by up to one period, the time available for processing one period is reduced to zero.
I'd suggest to keep the old calculation for capture URBs. It would make sense to use longer capture URBs only if the period size is relatively large.
Note: for capturing, the number of URBs has no effect on latency and just reduces the risk of overruns, so it is desirable to make the queue as long as possible (for a given URB length).
Not having heard any responses to the patch posted last Wednesday,
Sorry, my #patches / free_time quotient is rather large.
Regards, Clemens
On 27.8.2013 09:19, Clemens Ladisch wrote:
The driver cannot control how many samples actually end up in a capture packet,...
Does this reasoning apply to asynchronous playback too? I understand the driver has some control, but has to satisfy the endpoint feedback requests.
Sorry if this is cluelessly offtopic :-)
Regards,
Pavel.
Pavel Hofman wrote:
On 27.8.2013 09:19, Clemens Ladisch wrote:
The driver cannot control how many samples actually end up in a capture packet,...
Does this reasoning apply to asynchronous playback too?
No.
I understand the driver has some control, but has to satisfy the endpoint feedback requests.
When the driver wants to submit a playback URB, it knows how many samples it is copying into the packets. (There is a delay between the device reporting its desired rate and the driver actually using it.)
Regards, Clemens
On Tue, 27 Aug 2013, Clemens Ladisch wrote:
There is no reasoning about capture endpoints.
The driver cannot control how many samples actually end up in a capture packet, so it is possible that URBs end up being one USB frame longer than a period, in which case the ALSA period interrupts will accumulate delays of up to one period, or that URBs end up being one USB frame shorter than a period, in which case the ALSA period interrupt will be delayed to the end of the _next_ URB.
The current algorithm uses very short capture URBs to ensure that _some_ URB is completed as soon as possible after a period ends.
Your patch makes things worse for running jackd at lower latencies because playback and capture period interrupts are expected to be more or less synchronous (jackd waits for both interrupts before acting on one period). With only two periods per buffer and the capture period interrupt randomly delayed by up to one period, the time available for processing one period is reduced to zero.
I'd suggest to keep the old calculation for capture URBs. It would make sense to use longer capture URBs only if the period size is relatively large.
Okay. What about playback endpoints with implicit sync? The current driver treats them the same as capture endpoints. Is that really the right thing to do?
Note: for capturing, the number of URBs has no effect on latency and just reduces the risk of overruns, so it is desirable to make the queue as long as possible (for a given URB length).
Sure. It looks like the only limit that will matter here is MAX_URBS.
Not having heard any responses to the patch posted last Wednesday,
Sorry, my #patches / free_time quotient is rather large.
Me too. And don't forget bug reports. :-)
Alan Stern
Alan Stern wrote:
On Tue, 27 Aug 2013, Clemens Ladisch wrote:
The current algorithm uses very short capture URBs to ensure that _some_ URB is completed as soon as possible after a period ends. [...] I'd suggest to keep the old calculation for capture URBs. It would make sense to use longer capture URBs only if the period size is relatively large.
Okay. What about playback endpoints with implicit sync? The current driver treats them the same as capture endpoints. Is that really the right thing to do?
The only reason to not have an interrupt after each packet is to avoid the interrupt overhead (for both CPU and power); shorter URBs would result in a more precise DMA position reported to user space. If there is already an interrupt for some capture URB (or for the feedback packet in case of explicit feedback), we might as well handle the playback packets so far.
Regards, Clemens
On Tue, 27 Aug 2013, Clemens Ladisch wrote:
Alan Stern wrote:
On Tue, 27 Aug 2013, Clemens Ladisch wrote:
The current algorithm uses very short capture URBs to ensure that _some_ URB is completed as soon as possible after a period ends. [...] I'd suggest to keep the old calculation for capture URBs. It would make sense to use longer capture URBs only if the period size is relatively large.
Okay. What about playback endpoints with implicit sync? The current driver treats them the same as capture endpoints. Is that really the right thing to do?
The only reason to not have an interrupt after each packet is to avoid the interrupt overhead (for both CPU and power); shorter URBs would result in a more precise DMA position reported to user space. If there is already an interrupt for some capture URB (or for the feedback packet in case of explicit feedback), we might as well handle the playback packets so far.
I don't quite understand. Are you saying that playback endpoints with implicit sync may as well use the same values for urb_packs and ep->nurbs as other playback endpoints, rather than using the values calculated for capture endpoints (which is what the current driver does)?
Alan Stern
Alan Stern wrote:
On Tue, 27 Aug 2013, Clemens Ladisch wrote:
Alan Stern wrote:
On Tue, 27 Aug 2013, Clemens Ladisch wrote:
The current algorithm uses very short capture URBs to ensure that _some_ URB is completed as soon as possible after a period ends. [...] I'd suggest to keep the old calculation for capture URBs. It would make sense to use longer capture URBs only if the period size is relatively large.
Okay. What about playback endpoints with implicit sync? The current driver treats them the same as capture endpoints. Is that really the right thing to do?
The only reason to not have an interrupt after each packet is to avoid the interrupt overhead (for both CPU and power); shorter URBs would result in a more precise DMA position reported to user space. If there is already an interrupt for some capture URB (or for the feedback packet in case of explicit feedback), we might as well handle the playback packets so far.
I don't quite understand. Are you saying that playback endpoints with implicit sync may as well use the same values for urb_packs and ep->nurbs as other playback endpoints, rather than using the values calculated for capture endpoints (which is what the current driver does)?
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Regards, Clemens
On Wed, 28 Aug 2013, Clemens Ladisch wrote:
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Got it. Below is an updated patch.
James, the problem you encountered was most likely a result of the faulty treatment of capture endpoints that Clemens pointed out. It was quite obvious in the usbmon traces that the unpatched driver used 8 packets per URB whereas the patched driver used 22. This updated patch should fix that problem.
Alan Stern
Index: usb-3.11/sound/usb/usbaudio.h =================================================================== --- usb-3.11.orig/sound/usb/usbaudio.h +++ usb-3.11/sound/usb/usbaudio.h @@ -55,7 +55,6 @@ struct snd_usb_audio { struct list_head mixer_list; /* list of mixer interfaces */
int setup; /* from the 'device_setup' module param */ - int nrpacks; /* from the 'nrpacks' module param */ bool autoclock; /* from the 'autoclock' module param */
struct usb_host_interface *ctrl_intf; /* the audio control interface */ Index: usb-3.11/sound/usb/card.c =================================================================== --- usb-3.11.orig/sound/usb/card.c +++ usb-3.11/sound/usb/card.c @@ -79,7 +79,6 @@ static bool enable[SNDRV_CARDS] = SNDRV_ /* Vendor/product IDs for this card */ static int vid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 }; static int pid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 }; -static int nrpacks = 8; /* max. number of packets per urb */ static int device_setup[SNDRV_CARDS]; /* device parameter for this card */ static bool ignore_ctl_error; static bool autoclock = true; @@ -94,8 +93,6 @@ module_param_array(vid, int, NULL, 0444) MODULE_PARM_DESC(vid, "Vendor ID for the USB audio device."); module_param_array(pid, int, NULL, 0444); MODULE_PARM_DESC(pid, "Product ID for the USB audio device."); -module_param(nrpacks, int, 0644); -MODULE_PARM_DESC(nrpacks, "Max. number of packets per URB."); module_param_array(device_setup, int, NULL, 0444); MODULE_PARM_DESC(device_setup, "Specific device setup (if needed)."); module_param(ignore_ctl_error, bool, 0444); @@ -374,7 +371,6 @@ static int snd_usb_audio_create(struct u chip->dev = dev; chip->card = card; chip->setup = device_setup[idx]; - chip->nrpacks = nrpacks; chip->autoclock = autoclock; chip->probing = 1;
@@ -756,10 +752,6 @@ static struct usb_driver usb_audio_drive
static int __init snd_usb_audio_init(void) { - if (nrpacks < 1 || nrpacks > MAX_PACKS) { - printk(KERN_WARNING "invalid nrpacks value.\n"); - return -EINVAL; - } return usb_register(&usb_audio_driver); }
Index: usb-3.11/sound/usb/endpoint.h =================================================================== --- usb-3.11.orig/sound/usb/endpoint.h +++ usb-3.11/sound/usb/endpoint.h @@ -12,6 +12,8 @@ int snd_usb_endpoint_set_params(struct s snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int period_frames, + unsigned int buffer_periods, unsigned int rate, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep); Index: usb-3.11/sound/usb/card.h =================================================================== --- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -2,11 +2,11 @@ #define __USBAUDIO_CARD_H
#define MAX_NR_RATES 1024 -#define MAX_PACKS 20 +#define MAX_PACKS 6 /* per URB */ #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 12 #define SYNC_URBS 4 /* always four urbs for sync */ -#define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */ +#define MAX_QUEUE 18 /* try not to exceed this queue length, in ms */
struct audioformat { struct list_head list; @@ -87,6 +87,7 @@ struct snd_usb_endpoint { unsigned int phase; /* phase accumulator */ unsigned int maxpacksize; /* max packet size in bytes */ unsigned int maxframesize; /* max packet size in frames */ + unsigned int max_urb_frames; /* max URB size in frames */ unsigned int curpacksize; /* current packet size in bytes (for capture) */ unsigned int curframesize; /* current packet size in frames (for capture) */ unsigned int syncmaxsize; /* sync endpoint packet size */ @@ -116,6 +117,8 @@ struct snd_usb_substream { unsigned int channels_max; /* max channels in the all audiofmts */ unsigned int cur_rate; /* current rate (for hw_params callback) */ unsigned int period_bytes; /* current period bytes (for hw_params callback) */ + unsigned int period_frames; /* current frames per period */ + unsigned int buffer_periods; /* current periods per buffer */ unsigned int altset_idx; /* USB data format: index of alternate setting */ unsigned int txfr_quirk:1; /* allow sub-frame alignment */ unsigned int fmt_type; /* USB audio format type (1-3) */ @@ -125,6 +128,7 @@ struct snd_usb_substream {
unsigned int hwptr_done; /* processed byte position in the buffer */ unsigned int transfer_done; /* processed frames since last period update */ + unsigned int frame_limit; /* limits number of packets in URB */
/* data and sync endpoints for this stream */ unsigned int ep_num; /* the endpoint number */ Index: usb-3.11/sound/usb/pcm.c =================================================================== --- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -562,6 +562,7 @@ static int configure_sync_endpoint(struc subs->pcm_format, subs->channels, subs->period_bytes, + 0, 0, subs->cur_rate, subs->cur_audiofmt, NULL); @@ -598,6 +599,7 @@ static int configure_sync_endpoint(struc subs->pcm_format, sync_fp->channels, sync_period_bytes, + 0, 0, subs->cur_rate, sync_fp, NULL); @@ -620,6 +622,8 @@ static int configure_endpoint(struct snd subs->pcm_format, subs->channels, subs->period_bytes, + subs->period_frames, + subs->buffer_periods, subs->cur_rate, subs->cur_audiofmt, subs->sync_endpoint); @@ -656,6 +660,8 @@ static int snd_usb_hw_params(struct snd_
subs->pcm_format = params_format(hw_params); subs->period_bytes = params_period_bytes(hw_params); + subs->period_frames = params_period_size(hw_params); + subs->buffer_periods = params_periods(hw_params); subs->channels = params_channels(hw_params); subs->cur_rate = params_rate(hw_params);
@@ -1330,6 +1336,7 @@ static void prepare_playback_urb(struct frames = 0; urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags); + subs->frame_limit += ep->max_urb_frames; for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) counts = ctx->packet_size[i]; @@ -1344,6 +1351,7 @@ static void prepare_playback_urb(struct subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size; + subs->frame_limit = 0; period_elapsed = 1; if (subs->fmt_type == UAC_FORMAT_TYPE_II) { if (subs->transfer_done > 0) { @@ -1366,8 +1374,10 @@ static void prepare_playback_urb(struct break; } } - if (period_elapsed && - !snd_usb_endpoint_implicit_feedback_sink(subs->data_endpoint)) /* finish at the period boundary */ + /* finish at the period boundary or after enough frames */ + if ((period_elapsed || + subs->transfer_done >= subs->frame_limit) && + !snd_usb_endpoint_implicit_feedback_sink(ep)) break; } bytes = frames * ep->stride; Index: usb-3.11/sound/usb/endpoint.c =================================================================== --- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -571,11 +571,14 @@ static int data_ep_set_params(struct snd snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int frames_per_period, + unsigned int periods_per_buffer, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) { - unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms; - int is_playback = usb_pipeout(ep->pipe); + unsigned int maxsize, minsize, packs_per_ms, max_packs_per_urb; + unsigned int max_packs_per_period, urbs_per_period, urb_packs; + unsigned int max_urbs, i; int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
if (pcm_format == SNDRV_PCM_FORMAT_DSD_U16_LE && fmt->dsd_dop) { @@ -608,58 +611,67 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) + if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval; - else - packs_per_ms = 1; - - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - urb_packs = max(ep->chip->nrpacks, 1); - urb_packs = min(urb_packs, (unsigned int) MAX_PACKS); + max_packs_per_urb = MAX_PACKS_HS; } else { - urb_packs = 1; + packs_per_ms = 1; + max_packs_per_urb = MAX_PACKS; } - - urb_packs *= packs_per_ms; - if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep)) - urb_packs = min(urb_packs, 1U << sync_ep->syncinterval); + max_packs_per_urb = min(max_packs_per_urb, + 1U << sync_ep->syncinterval); + max_packs_per_urb = max(1u, max_packs_per_urb >> ep->datainterval); + + /* + * Capture endpoints need to use small URBs because there's no way + * to tell in advance where the next period will end, and we don't + * want the next URB to complete much after the period ends. + * + * Playback endpoints with implicit sync much use the same parameters + * as their corresponding capture endpoint. + */ + if (usb_pipein(ep->pipe) || + snd_usb_endpoint_implicit_feedback_sink(ep)) { + + /* make capture URBs <= 1 ms and smaller than a period */ + urb_packs = min(max_packs_per_urb, packs_per_ms); + while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) + urb_packs >>= 1; + ep->nurbs = MAX_URBS;
- /* decide how many packets to be used */ - if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) { - unsigned int minsize, maxpacks; + /* + * Playback endpoints without implicit sync are adjusted so that + * a period fits as evenly as possible in the smallest number of + * URBs. The total number of URBs is adjusted to the size of the + * ALSA buffer, subject to the MAX_URBS and MAX_QUEUE limits. + */ + } else { /* determine how small a packet can be */ - minsize = (ep->freqn >> (16 - ep->datainterval)) - * (frame_bits >> 3); + minsize = (ep->freqn >> (16 - ep->datainterval)) * + (frame_bits >> 3); /* with sync from device, assume it can be 12% lower */ if (sync_ep) minsize -= minsize >> 3; minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; - /* we need at least two URBs for queueing */ - if (total_packs < 2) { - total_packs = 2; - } else { - /* and we don't want too long a queue either */ - maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2); - total_packs = min(total_packs, maxpacks); - } - } else { - while (urb_packs > 1 && urb_packs * maxsize >= period_bytes) - urb_packs >>= 1; - total_packs = MAX_URBS * urb_packs; - }
- ep->nurbs = (total_packs + urb_packs - 1) / urb_packs; - if (ep->nurbs > MAX_URBS) { - /* too much... */ - ep->nurbs = MAX_URBS; - total_packs = MAX_URBS * urb_packs; - } else if (ep->nurbs < 2) { - /* too little - we need at least two packets - * to ensure contiguous playback/capture - */ - ep->nurbs = 2; + /* how many packets will contain an entire ALSA period? */ + max_packs_per_period = DIV_ROUND_UP(period_bytes, minsize); + + /* how many URBs will contain a period? */ + urbs_per_period = DIV_ROUND_UP(max_packs_per_period, + max_packs_per_urb); + /* how many packets are needed in each URB? */ + urb_packs = DIV_ROUND_UP(max_packs_per_period, urbs_per_period); + + /* limit the number of frames in a single URB */ + ep->max_urb_frames = DIV_ROUND_UP(frames_per_period, + urbs_per_period); + + /* try to use enough URBs to contain an entire ALSA buffer */ + max_urbs = min((unsigned) MAX_URBS, + MAX_QUEUE * packs_per_ms / urb_packs); + ep->nurbs = min(max_urbs, urbs_per_period * periods_per_buffer); }
/* allocate and initialize data urbs */ @@ -667,8 +679,7 @@ static int data_ep_set_params(struct snd struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep; - u->packets = (i + 1) * total_packs / ep->nurbs - - i * total_packs / ep->nurbs; + u->packets = urb_packs; u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II) @@ -745,6 +756,8 @@ out_of_memory: * @pcm_format: the audio fomat. * @channels: the number of audio channels. * @period_bytes: the number of bytes in one alsa period. + * @period_frames: the number of frames in one alsa period. + * @buffer_periods: the number of periods in one alsa buffer. * @rate: the frame rate. * @fmt: the USB audio format information * @sync_ep: the sync endpoint to use, if any @@ -757,6 +770,8 @@ int snd_usb_endpoint_set_params(struct s snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes, + unsigned int period_frames, + unsigned int buffer_periods, unsigned int rate, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep) @@ -790,7 +805,8 @@ int snd_usb_endpoint_set_params(struct s switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: err = data_ep_set_params(ep, pcm_format, channels, - period_bytes, fmt, sync_ep); + period_bytes, period_frames, + buffer_periods, fmt, sync_ep); break; case SND_USB_ENDPOINT_TYPE_SYNC: err = sync_ep_set_params(ep, fmt);
On Wed, Aug 28, 2013 at 7:46 PM, Alan Stern stern@rowland.harvard.edu wrote:
On Wed, 28 Aug 2013, Clemens Ladisch wrote:
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Got it. Below is an updated patch.
James, the problem you encountered was most likely a result of the faulty treatment of capture endpoints that Clemens pointed out. It was quite obvious in the usbmon traces that the unpatched driver used 8 packets per URB whereas the patched driver used 22. This updated patch should fix that problem.
Works fine. With this, it is possible to get clear playback at 64 frames/period too. With vanilla 3.10.9, there was some glitchy distortion to the sound at that latency, so this seems to be an improvement.
James
On Thu, 29 Aug 2013, James Stone wrote:
On Wed, Aug 28, 2013 at 7:46 PM, Alan Stern stern@rowland.harvard.edu wrote:
On Wed, 28 Aug 2013, Clemens Ladisch wrote:
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Got it. Below is an updated patch.
James, the problem you encountered was most likely a result of the faulty treatment of capture endpoints that Clemens pointed out. It was quite obvious in the usbmon traces that the unpatched driver used 8 packets per URB whereas the patched driver used 22. This updated patch should fix that problem.
Works fine. With this, it is possible to get clear playback at 64 frames/period too. With vanilla 3.10.9, there was some glitchy distortion to the sound at that latency, so this seems to be an improvement.
That's good. What happens if you push frames/period even lower?
Daniel and Eldad, more testing of the most recent proposed patch would be welcome.
Alan Stern
At 32 frames/period (reported round-trip latency 1.33ms), it starts up but there are too many xruns for it to be usable.
James
On Thu, Aug 29, 2013 at 4:00 PM, Alan Stern stern@rowland.harvard.edu wrote:
On Thu, 29 Aug 2013, James Stone wrote:
On Wed, Aug 28, 2013 at 7:46 PM, Alan Stern stern@rowland.harvard.edu wrote:
On Wed, 28 Aug 2013, Clemens Ladisch wrote:
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Got it. Below is an updated patch.
James, the problem you encountered was most likely a result of the faulty treatment of capture endpoints that Clemens pointed out. It was quite obvious in the usbmon traces that the unpatched driver used 8 packets per URB whereas the patched driver used 22. This updated patch should fix that problem.
Works fine. With this, it is possible to get clear playback at 64 frames/period too. With vanilla 3.10.9, there was some glitchy distortion to the sound at that latency, so this seems to be an improvement.
That's good. What happens if you push frames/period even lower?
Daniel and Eldad, more testing of the most recent proposed patch would be welcome.
Alan Stern
On Thu, 29 Aug 2013, James Stone wrote:
At 32 frames/period (reported round-trip latency 1.33ms), it starts up but there are too many xruns for it to be usable.
Interesting. Can you try doing the same thing with just playback, no recording? If that still gets underruns, please collect a usbmon trace.
Alan Stern
On Thu, 29 Aug 2013, James Stone wrote:
At 32 frames/period (reported round-trip latency 1.33ms), it starts up but there are too many xruns for it to be usable.
After further testing (off-list), it turns out that 32 frames/period works okay with three periods/buffer instead of two. Not surprising really; with two periods you get only two URBs, and whenever one of them completes, the other isn't long enough to exceed the scheduling threshold.
In summary, it now appears that the most recent patch works better than the current driver.
Daniel and Eldad, more testing of the most recent proposed patch would be welcome.
Yes, please, more testing is needed. I intend to submit the patch when the current merge period ends.
Alan Stern
On Thu, 29 Aug 2013, Alan Stern wrote:
On Thu, 29 Aug 2013, James Stone wrote:
On Wed, Aug 28, 2013 at 7:46 PM, Alan Stern stern@rowland.harvard.edu wrote:
On Wed, 28 Aug 2013, Clemens Ladisch wrote:
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Got it. Below is an updated patch.
James, the problem you encountered was most likely a result of the faulty treatment of capture endpoints that Clemens pointed out. It was quite obvious in the usbmon traces that the unpatched driver used 8 packets per URB whereas the patched driver used 22. This updated patch should fix that problem.
Works fine. With this, it is possible to get clear playback at 64 frames/period too. With vanilla 3.10.9, there was some glitchy distortion to the sound at that latency, so this seems to be an improvement.
That's good. What happens if you push frames/period even lower?
Daniel and Eldad, more testing of the most recent proposed patch would be welcome.
Sorry for the long response time but I've been busy lately.
I see a good improvement on my card (M-Audio C400, implicit feedback) - using a 48 frame period (@48kHz) I get no xruns with jack running without clients. Without the patch, the above test generates constant xruns. I tested the patch with mainline 3.11.0, and let jack run about 5 minutes.
Thank you for working on this! FWIW, feel free to add:
Tested-by: Eldad Zack eldad@fogrefinery.com
Cheers, Eldad
On 28.08.2013 20:46, Alan Stern wrote:
On Wed, 28 Aug 2013, Clemens Ladisch wrote:
Sorry, what I said applies more to explicit sync endpoints. When using implicit sync, a playback URB is submitted for each completed capture URB, with the number of samples per packet identical to the corresponding capture packet, so the parameters must be identical.
Got it. Below is an updated patch.
James, the problem you encountered was most likely a result of the faulty treatment of capture endpoints that Clemens pointed out. It was quite obvious in the usbmon traces that the unpatched driver used 8 packets per URB whereas the patched driver used 22. This updated patch should fix that problem.
I gave this patch a try and I can confirm that it results in a sigificant improvement for small sample buffers.
So feel free to add my
Tested-by: Daniel Mack zonque@gmail.com
Thanks, Daniel
Index: usb-3.11/sound/usb/usbaudio.h
--- usb-3.11.orig/sound/usb/usbaudio.h +++ usb-3.11/sound/usb/usbaudio.h @@ -55,7 +55,6 @@ struct snd_usb_audio { struct list_head mixer_list; /* list of mixer interfaces */
int setup; /* from the 'device_setup' module param */
int nrpacks; /* from the 'nrpacks' module param */ bool autoclock; /* from the 'autoclock' module param */
struct usb_host_interface *ctrl_intf; /* the audio control interface */
Index: usb-3.11/sound/usb/card.c
--- usb-3.11.orig/sound/usb/card.c +++ usb-3.11/sound/usb/card.c @@ -79,7 +79,6 @@ static bool enable[SNDRV_CARDS] = SNDRV_ /* Vendor/product IDs for this card */ static int vid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 }; static int pid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 }; -static int nrpacks = 8; /* max. number of packets per urb */ static int device_setup[SNDRV_CARDS]; /* device parameter for this card */ static bool ignore_ctl_error; static bool autoclock = true; @@ -94,8 +93,6 @@ module_param_array(vid, int, NULL, 0444) MODULE_PARM_DESC(vid, "Vendor ID for the USB audio device."); module_param_array(pid, int, NULL, 0444); MODULE_PARM_DESC(pid, "Product ID for the USB audio device."); -module_param(nrpacks, int, 0644); -MODULE_PARM_DESC(nrpacks, "Max. number of packets per URB."); module_param_array(device_setup, int, NULL, 0444); MODULE_PARM_DESC(device_setup, "Specific device setup (if needed)."); module_param(ignore_ctl_error, bool, 0444); @@ -374,7 +371,6 @@ static int snd_usb_audio_create(struct u chip->dev = dev; chip->card = card; chip->setup = device_setup[idx];
- chip->nrpacks = nrpacks; chip->autoclock = autoclock; chip->probing = 1;
@@ -756,10 +752,6 @@ static struct usb_driver usb_audio_drive
static int __init snd_usb_audio_init(void) {
- if (nrpacks < 1 || nrpacks > MAX_PACKS) {
printk(KERN_WARNING "invalid nrpacks value.\n");
return -EINVAL;
- } return usb_register(&usb_audio_driver);
}
Index: usb-3.11/sound/usb/endpoint.h
--- usb-3.11.orig/sound/usb/endpoint.h +++ usb-3.11/sound/usb/endpoint.h @@ -12,6 +12,8 @@ int snd_usb_endpoint_set_params(struct s snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes,
unsigned int period_frames,
unsigned int buffer_periods, unsigned int rate, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep);
Index: usb-3.11/sound/usb/card.h
--- usb-3.11.orig/sound/usb/card.h +++ usb-3.11/sound/usb/card.h @@ -2,11 +2,11 @@ #define __USBAUDIO_CARD_H
#define MAX_NR_RATES 1024 -#define MAX_PACKS 20 +#define MAX_PACKS 6 /* per URB */ #define MAX_PACKS_HS (MAX_PACKS * 8) /* in high speed mode */ -#define MAX_URBS 8 +#define MAX_URBS 12 #define SYNC_URBS 4 /* always four urbs for sync */ -#define MAX_QUEUE 24 /* try not to exceed this queue length, in ms */ +#define MAX_QUEUE 18 /* try not to exceed this queue length, in ms */
struct audioformat { struct list_head list; @@ -87,6 +87,7 @@ struct snd_usb_endpoint { unsigned int phase; /* phase accumulator */ unsigned int maxpacksize; /* max packet size in bytes */ unsigned int maxframesize; /* max packet size in frames */
- unsigned int max_urb_frames; /* max URB size in frames */ unsigned int curpacksize; /* current packet size in bytes (for capture) */ unsigned int curframesize; /* current packet size in frames (for capture) */ unsigned int syncmaxsize; /* sync endpoint packet size */
@@ -116,6 +117,8 @@ struct snd_usb_substream { unsigned int channels_max; /* max channels in the all audiofmts */ unsigned int cur_rate; /* current rate (for hw_params callback) */ unsigned int period_bytes; /* current period bytes (for hw_params callback) */
- unsigned int period_frames; /* current frames per period */
- unsigned int buffer_periods; /* current periods per buffer */ unsigned int altset_idx; /* USB data format: index of alternate setting */ unsigned int txfr_quirk:1; /* allow sub-frame alignment */ unsigned int fmt_type; /* USB audio format type (1-3) */
@@ -125,6 +128,7 @@ struct snd_usb_substream {
unsigned int hwptr_done; /* processed byte position in the buffer */ unsigned int transfer_done; /* processed frames since last period update */
unsigned int frame_limit; /* limits number of packets in URB */
/* data and sync endpoints for this stream */ unsigned int ep_num; /* the endpoint number */
Index: usb-3.11/sound/usb/pcm.c
--- usb-3.11.orig/sound/usb/pcm.c +++ usb-3.11/sound/usb/pcm.c @@ -562,6 +562,7 @@ static int configure_sync_endpoint(struc subs->pcm_format, subs->channels, subs->period_bytes,
0, 0, subs->cur_rate, subs->cur_audiofmt, NULL);
@@ -598,6 +599,7 @@ static int configure_sync_endpoint(struc subs->pcm_format, sync_fp->channels, sync_period_bytes,
0, 0, subs->cur_rate, sync_fp, NULL);
@@ -620,6 +622,8 @@ static int configure_endpoint(struct snd subs->pcm_format, subs->channels, subs->period_bytes,
subs->period_frames,
subs->buffer_periods, subs->cur_rate, subs->cur_audiofmt, subs->sync_endpoint);
@@ -656,6 +660,8 @@ static int snd_usb_hw_params(struct snd_
subs->pcm_format = params_format(hw_params); subs->period_bytes = params_period_bytes(hw_params);
- subs->period_frames = params_period_size(hw_params);
- subs->buffer_periods = params_periods(hw_params); subs->channels = params_channels(hw_params); subs->cur_rate = params_rate(hw_params);
@@ -1330,6 +1336,7 @@ static void prepare_playback_urb(struct frames = 0; urb->number_of_packets = 0; spin_lock_irqsave(&subs->lock, flags);
- subs->frame_limit += ep->max_urb_frames; for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) counts = ctx->packet_size[i];
@@ -1344,6 +1351,7 @@ static void prepare_playback_urb(struct subs->transfer_done += counts; if (subs->transfer_done >= runtime->period_size) { subs->transfer_done -= runtime->period_size;
subs->frame_limit = 0; period_elapsed = 1; if (subs->fmt_type == UAC_FORMAT_TYPE_II) { if (subs->transfer_done > 0) {
@@ -1366,8 +1374,10 @@ static void prepare_playback_urb(struct break; } }
if (period_elapsed &&
!snd_usb_endpoint_implicit_feedback_sink(subs->data_endpoint)) /* finish at the period boundary */
/* finish at the period boundary or after enough frames */
if ((period_elapsed ||
subs->transfer_done >= subs->frame_limit) &&
} bytes = frames * ep->stride;!snd_usb_endpoint_implicit_feedback_sink(ep)) break;
Index: usb-3.11/sound/usb/endpoint.c
--- usb-3.11.orig/sound/usb/endpoint.c +++ usb-3.11/sound/usb/endpoint.c @@ -571,11 +571,14 @@ static int data_ep_set_params(struct snd snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes,
unsigned int frames_per_period,
unsigned int periods_per_buffer, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep)
{
- unsigned int maxsize, i, urb_packs, total_packs, packs_per_ms;
- int is_playback = usb_pipeout(ep->pipe);
unsigned int maxsize, minsize, packs_per_ms, max_packs_per_urb;
unsigned int max_packs_per_period, urbs_per_period, urb_packs;
unsigned int max_urbs, i; int frame_bits = snd_pcm_format_physical_width(pcm_format) * channels;
if (pcm_format == SNDRV_PCM_FORMAT_DSD_U16_LE && fmt->dsd_dop) {
@@ -608,58 +611,67 @@ static int data_ep_set_params(struct snd else ep->curpacksize = maxsize;
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL)
- if (snd_usb_get_speed(ep->chip->dev) != USB_SPEED_FULL) { packs_per_ms = 8 >> ep->datainterval;
- else
packs_per_ms = 1;
- if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) {
urb_packs = max(ep->chip->nrpacks, 1);
urb_packs = min(urb_packs, (unsigned int) MAX_PACKS);
} else {max_packs_per_urb = MAX_PACKS_HS;
urb_packs = 1;
packs_per_ms = 1;
}max_packs_per_urb = MAX_PACKS;
- urb_packs *= packs_per_ms;
- if (sync_ep && !snd_usb_endpoint_implicit_feedback_sink(ep))
urb_packs = min(urb_packs, 1U << sync_ep->syncinterval);
max_packs_per_urb = min(max_packs_per_urb,
1U << sync_ep->syncinterval);
- max_packs_per_urb = max(1u, max_packs_per_urb >> ep->datainterval);
- /*
* Capture endpoints need to use small URBs because there's no way
* to tell in advance where the next period will end, and we don't
* want the next URB to complete much after the period ends.
*
* Playback endpoints with implicit sync much use the same parameters
* as their corresponding capture endpoint.
*/
- if (usb_pipein(ep->pipe) ||
snd_usb_endpoint_implicit_feedback_sink(ep)) {
/* make capture URBs <= 1 ms and smaller than a period */
urb_packs = min(max_packs_per_urb, packs_per_ms);
while (urb_packs > 1 && urb_packs * maxsize >= period_bytes)
urb_packs >>= 1;
ep->nurbs = MAX_URBS;
- /* decide how many packets to be used */
- if (is_playback && !snd_usb_endpoint_implicit_feedback_sink(ep)) {
unsigned int minsize, maxpacks;
- /*
* Playback endpoints without implicit sync are adjusted so that
* a period fits as evenly as possible in the smallest number of
* URBs. The total number of URBs is adjusted to the size of the
* ALSA buffer, subject to the MAX_URBS and MAX_QUEUE limits.
*/
- } else { /* determine how small a packet can be */
minsize = (ep->freqn >> (16 - ep->datainterval))
* (frame_bits >> 3);
minsize = (ep->freqn >> (16 - ep->datainterval)) *
/* with sync from device, assume it can be 12% lower */ if (sync_ep) minsize -= minsize >> 3; minsize = max(minsize, 1u);(frame_bits >> 3);
total_packs = (period_bytes + minsize - 1) / minsize;
/* we need at least two URBs for queueing */
if (total_packs < 2) {
total_packs = 2;
} else {
/* and we don't want too long a queue either */
maxpacks = max(MAX_QUEUE * packs_per_ms, urb_packs * 2);
total_packs = min(total_packs, maxpacks);
}
} else {
while (urb_packs > 1 && urb_packs * maxsize >= period_bytes)
urb_packs >>= 1;
total_packs = MAX_URBS * urb_packs;
}
ep->nurbs = (total_packs + urb_packs - 1) / urb_packs;
if (ep->nurbs > MAX_URBS) {
/* too much... */
ep->nurbs = MAX_URBS;
total_packs = MAX_URBS * urb_packs;
} else if (ep->nurbs < 2) {
/* too little - we need at least two packets
* to ensure contiguous playback/capture
*/
ep->nurbs = 2;
/* how many packets will contain an entire ALSA period? */
max_packs_per_period = DIV_ROUND_UP(period_bytes, minsize);
/* how many URBs will contain a period? */
urbs_per_period = DIV_ROUND_UP(max_packs_per_period,
max_packs_per_urb);
/* how many packets are needed in each URB? */
urb_packs = DIV_ROUND_UP(max_packs_per_period, urbs_per_period);
/* limit the number of frames in a single URB */
ep->max_urb_frames = DIV_ROUND_UP(frames_per_period,
urbs_per_period);
/* try to use enough URBs to contain an entire ALSA buffer */
max_urbs = min((unsigned) MAX_URBS,
MAX_QUEUE * packs_per_ms / urb_packs);
ep->nurbs = min(max_urbs, urbs_per_period * periods_per_buffer);
}
/* allocate and initialize data urbs */
@@ -667,8 +679,7 @@ static int data_ep_set_params(struct snd struct snd_urb_ctx *u = &ep->urb[i]; u->index = i; u->ep = ep;
u->packets = (i + 1) * total_packs / ep->nurbs
- i * total_packs / ep->nurbs;
u->packets = urb_packs;
u->buffer_size = maxsize * u->packets;
if (fmt->fmt_type == UAC_FORMAT_TYPE_II)
@@ -745,6 +756,8 @@ out_of_memory:
- @pcm_format: the audio fomat.
- @channels: the number of audio channels.
- @period_bytes: the number of bytes in one alsa period.
- @period_frames: the number of frames in one alsa period.
- @buffer_periods: the number of periods in one alsa buffer.
- @rate: the frame rate.
- @fmt: the USB audio format information
- @sync_ep: the sync endpoint to use, if any
@@ -757,6 +770,8 @@ int snd_usb_endpoint_set_params(struct s snd_pcm_format_t pcm_format, unsigned int channels, unsigned int period_bytes,
unsigned int period_frames,
unsigned int buffer_periods, unsigned int rate, struct audioformat *fmt, struct snd_usb_endpoint *sync_ep)
@@ -790,7 +805,8 @@ int snd_usb_endpoint_set_params(struct s switch (ep->type) { case SND_USB_ENDPOINT_TYPE_DATA: err = data_ep_set_params(ep, pcm_format, channels,
period_bytes, fmt, sync_ep);
period_bytes, period_frames,
break; case SND_USB_ENDPOINT_TYPE_SYNC: err = sync_ep_set_params(ep, fmt);buffer_periods, fmt, sync_ep);
On Mon, 9 Sep 2013, Daniel Mack wrote:
I gave this patch a try and I can confirm that it results in a sigificant improvement for small sample buffers.
So feel free to add my
Tested-by: Daniel Mack zonque@gmail.com
Thanks; I'll include this when the patch is submitted.
Alan Stern
Alan Stern wrote:
On Mon, 12 Aug 2013, Takashi Iwai wrote:
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
Here's a revised version of the patch (still untested).
- maxsize = ((ep->freqmax + 0xffff) * (frame_bits >> 3))
>> (16 - ep->datainterval);
- maxsize = ((ep->freqmax + 0xffff) >> (16 - ep->datainterval))
* (frame_bits >> 3);
What do you assume prevents firmware writers from splitting frames across packages? The specification? Sanity? :) (see txfr_quirk)
The difference is that this version tries always to keep a period's worth of bytes in the USB hardware queue.
Having truncated URBs is possible only when URBs are shorter than one period, so I think that a queue length of at least two periods, together with a minimum period size, should ensure the isochrounous scheduling threshold.
This isn't tested either.
Regards, Clemens
sound/usb/endpoint.c | 3 ++- sound/usb/pcm.c | 16 ++++++++++++++-- 2 files changed, 16 insertions(+), 3 deletions(-)
--- a/sound/usb/endpoint.c +++ b/sound/usb/endpoint.c @@ -638,7 +638,8 @@ static int data_ep_set_params(struct snd_usb_endpoint *ep, if (sync_ep) minsize -= minsize >> 3; minsize = max(minsize, 1u); - total_packs = (period_bytes + minsize - 1) / minsize; + /* try to make the queue length the same as two periods */ + total_packs = (2 * period_bytes + minsize - 1) / minsize; /* we need at least two URBs for queueing */ if (total_packs < 2) { total_packs = 2; --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -1131,10 +1131,22 @@ static int setup_hw_info(struct snd_pcm_runtime *runtime, struct snd_usb_substre return err;
param_period_time_if_needed = SNDRV_PCM_HW_PARAM_PERIOD_TIME; - if (subs->speed == USB_SPEED_FULL) + switch (subs->speed) { + case USB_SPEED_FULL: /* full speed devices have fixed data packet interval */ ptmin = 1000; - if (ptmin == 1000) + break; + case USB_SPEED_HIGH: + /* + * Assume we have an EHCI controller with an Isochronous + * Scheduling Threshold of one frame (8 microframes), and + * add 2 microframes for boundary cases. Increase by 4% + * to have a margin for clock inaccuracies. + */ + ptmin = max(ptmin, (8 + 2) * 130u); + break; + } + if (ptmin >= 1000) /* if period time doesn't go below 1 ms, no rules needed */ param_period_time_if_needed = -1; snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_PERIOD_TIME,
participants (7)
-
Alan Stern
-
Clemens Ladisch
-
Daniel Mack
-
Eldad Zack
-
James Stone
-
Pavel Hofman
-
Takashi Iwai