At Mon, 12 Aug 2013 10:53:36 -0400 (EDT), Alan Stern wrote:
On Mon, 12 Aug 2013, Takashi Iwai wrote:
Here's what I've got. In turns out the predicting the optimum number of URBs needed is extremely difficult. I decided it would be better to make an overestimate and then to submit URBs as needed, rather than keeping all of them active all the time.
I ended up with this patch (untested). It is certainly incomplete because it doesn't address endpoints with implicit sync. It also suffers from a race between snd_submit_urbs() and deactivate_urbs(): an URB may be resubmitted after it has been deactivated.
What's the reason to clear ep->active_mask at the beginning of snd_complete_urb()?
In order to keep ep->active_mask accurate. snd_complete_urb() might not resubmit the URB.
(In all fairness, I think this race probably exists in the current code too -- there are no spinlocks for synchronization.)
The calls of usb_submit_urb() and usb_unlink_urb() might race indeed. The current code somehow assumed that the USB accepts such concurrent calls.
The USB API does indeed allow concurrent calls of usb_submit_urb() and usb_unlink_urb(). They are serialized by a spinlock in the host controller driver, and if the usb_unlink_urb() call ends up coming first, it will fail.
(Ironically, in the EHCI driver, trying to unlink an isochronous URB doesn't do anything at all. The URB will complete in the usual way, when all its time slots expire.)
deactivate_urbs() just kills the pending urbs and it doesn't change ep->active_mask bit by itself, and the synchronization is done in wait_clear_urbs().
Oh, so that's where ep->active_mask gets used. Why call bitmap_weight()? Why not simply see if the mask value is equal to 0?
Yeah, it can be so. Though, the number of pending urbs is printed in the error message, thus bitmap_weight() is needed there instead.
So, if the concurrent calls of usb_submit_urb() and usb_unlink_urbs() were allowed, it should work as is (in the worst case, the complete will be called once again, but then it goes to exit_clear).
I see. Assuming there's always at least one active URB while the endpoint is running, wait_clear_urbs() should work okay. The patch won't violate this assumption, so it's good.
OK.
So... Clemens, Daniel, Eldad, could you guys review the latest version of Alan's patch? I'd love to sort this out for 3.12.
thanks,
Takashi