[alsa-devel] Zoom R16/24 playback silent
Bringing up an old thread from 2014 since I've gotten access to a Zoom R16 and am keen on getting it to work in full duplex under Linux. Apparently (not verified here yet; I'm rebuilding a kernel with the patch applied at the moment), 8-channel capture works with the quirk, but playback doesn't (it appears to work, but no sound comes out), and neither does the mixer.
At Sun, 30 Nov 2014 12:08:38 +0200, Panu Matilainen wrote:
On 11/29/2014 09:39 PM, Takashi Iwai wrote:
At Sat, 29 Nov 2014 21:35:04 +0200, Panu Matilainen wrote:
This makes the midi interface and capture work out of the box with R16 (and presumably R24 too but untested). Playback stream would also seem to function fine except for one caveat: no sound is produced, so it is disabled for now.
[ ... ]
In the meanwhile I started looking at the mixer - this is just guessing but maybe it's starting with outputs muted, which could explain the playback stream running but without sound.
The descriptors for interface 0 are garbage but what is there would seem to map to logical elements by subtype and terminaltype, after those the data doesn't seem to add up (e.g. number of input/output channels missing/wrongly placed etc):
** UNRECOGNIZED: 0b 24 01 00 01 35 00 03 01 02 03 -> Header?
** UNRECOGNIZED: 0c 24 02 05 01 01 00 02 03 00 00 00 -> USB streaming input terminal?
** UNRECOGNIZED: 09 24 03 08 01 03 00 05 00 -> Speaker output terminal?
** UNRECOGNIZED: 0c 24 02 09 01 02 00 08 00 00 00 00 -> Microphone input terminal?
** UNRECOGNIZED: 09 24 03 0c 01 01 00 09 00 -> USB streaming output terminal?
Anything there that might ring a bell, or other ideas where to start poking at this thing?
Yeah, most of these look almost sane, so any off-by-one firmware issue?
Searching the archives, I could find no further posts on the subject of the R16. Panu, did you do any more work on this?
Where does the above output come from, and where could I find information on what type of data is expected? I assume what's expected is something one would get from a USB mixer device?
In an earlier incarnation of a patch for the R16 (on the 10th of March 2014), there's a discussion between Jason Mancine and Takashi regarding the .format field in the quirks table (http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/074117.html) which didn't seem to reach a conclusion; the problem then seemed to be that even when setting the .format to SNDRV_PCM_FMTBIT_S24_3LE, ALSA still set up a 32-bit transfer. However, Panu's patch also added a small delay in snd_usb_ctl_msg_quirk(), so perhaps that was the real solution to the alleged format problem? (The final patch is much smaller than in this particular thread, possibly due to the use of QUIRK_ANY_INTERFACE since kernel version >3.11 ?)
/Ricard
On 09/29/2015 11:19 PM, Ricard Wanderlof wrote:
Searching the archives, I could find no further posts on the subject of the R16. Panu, did you do any more work on this?
I did, but haven't gotten the playback to work (otherwise I would've sent a patch, obviously). Last five months have been a series of health issues for me so haven't had any energy left for poking at a stubbornly silent white box :)
Where does the above output come from, and where could I find information on what type of data is expected? I assume what's expected is something one would get from a USB mixer device?
The output is simply 'lsusb -v' plus manually added guesses as to what these unrecognized bits are. The information is "out there", official USB documentation is valuable (http://www.usb.org/developers/docs/devclass_docs/) and looking at lsusb output of other similar products helps figure what might be there.
As for the mixer, I concluded the Zoom does not really have one. The interface 0 is a control interface where a mixer would reside, but there's no actual mixer unit in there.
In an earlier incarnation of a patch for the R16 (on the 10th of March 2014), there's a discussion between Jason Mancine and Takashi regarding the .format field in the quirks table (http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/074117.html) which didn't seem to reach a conclusion; the problem then seemed to be that even when setting the .format to SNDRV_PCM_FMTBIT_S24_3LE, ALSA still set up a 32-bit transfer. However, Panu's patch also added a small delay in snd_usb_ctl_msg_quirk(), so perhaps that was the real solution to the alleged format problem? (The final patch is much smaller than in this particular thread, possibly due to the use of QUIRK_ANY_INTERFACE since kernel version >3.11 ?)
My memory is a bit hazy on the details, but the automatically detected SNDRV_PCM_FMTBIT_S32 (which iirc means the actual 24bit values are carried in 32bit chunks) seems to be the right thing for the device, forcing it to something else just makes things worse. IIRC. The delay was a key part, but not enough.
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
So perhaps it needs another quirk similar in spirit to the E-Mu:
    /* When capture is active
     * sample rate shouldn't be changed
     * by playback substream
     */
    if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK) {
        if (subs->stream->substream[SNDRV_PCM_STREAM_CAPTURE].interface != -1)
            return;
    }
Or something. I do intend to continue poking at it sooner or later but can't make any promises.
- Panu -
/Ricard
On Wed, 30 Sep 2015, Panu Matilainen wrote:
Where does the above output come from, and where could I find information on what type of data is expected? I assume what's expected is something one would get from a USB mixer device?
The output is simply 'lsusb -v' plus manually added guesses as to what these unrecognized bits are.
Ah, I should have guessed as much... (I was looking in the kernel for some USB debug thing via /sys or similar).
As for the mixer, I concluded the Zoom does not really have one. The interface 0 is a control interface where a mixer would reside, but there's no actual mixer unit in there.
I wonder if any information could be gleaned from running the Windows driver and somehow snooping the USB bus.
There must be some way of enabling the playback stream.
My memory is a bit hazy on the details, but the automatically detected SNDRV_PCM_FMTBIT_S32 (which iirc means the actual 24bit values are carried in 32bit chunks) seems to be the right thing for the device, forcing it to something else just makes things worse. IIRC.
I think that many devices put 24 bits in the most significant bits of a 32 bit word, hence SNDRV_PCM_FMTBIT_S32 works fine, and the device simply disregards the lowest 8 bits. I don't know about USB, but I've come across SoC devices where it is the lower 24 bits that carry the data, in which case SNDRV_PCM_FMTBIT_S24LE is needed, and then there are devices which really handle the data in 24 bit chunks, using SNDRV_PCM_FMTBIT_S24_3LE.
At any rate this issue seems to be cleared up.
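As a rough illustration of the three layouts described above, the following Python sketch packs one signed 24-bit sample the way the S32_LE, S24_LE and S24_3LE formats would carry it. The function names are made up for this example, and the sign-extended top byte in the S24_LE case is an assumption (that byte is padding as far as the sample value is concerned).

```python
import struct


def pack_s32_le(sample24: int) -> bytes:
    """24-bit sample in the most significant bits of a 32-bit word."""
    # Shift left by 8 so the low byte is zero padding the device can ignore.
    return struct.pack("<i", sample24 << 8)


def pack_s24_le(sample24: int) -> bytes:
    """24-bit sample in the lower three bytes of a 32-bit word."""
    # Assumption: the unused top byte simply carries the sign extension.
    return struct.pack("<i", sample24)


def pack_s24_3le(sample24: int) -> bytes:
    """24-bit sample stored in exactly three bytes, no padding."""
    return struct.pack("<i", sample24)[:3]


if __name__ == "__main__":
    for pack in (pack_s32_le, pack_s24_le, pack_s24_3le):
        print(pack.__name__, pack(0x123456).hex(" "))
```

All three carry the same 24 significant bits; only the position of the padding byte (or its absence) differs, which is why picking the wrong one shifts or truncates every sample.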
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
Ok, thanks for that tidbit, I'd read about problems with simultaneous capture and playback in previous threads; I thought perhaps (hopefully...) the only remaining issue was the silent playback.
So perhaps it needs another quirk similar in spirit to the E-Mu:
    /* When capture is active
     * sample rate shouldn't be changed
     * by playback substream
     */
    if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK) {
        if (subs->stream->substream[SNDRV_PCM_STREAM_CAPTURE].interface != -1)
            return;
    }
So your theory is that reconfiguring the device once opened for one direction is what causes it to hang?
Or something. I do intend to continue poking at it sooner or later but can't make any promises.
I understand. Since I have a setup at home that works now I thought I'd take a look at it and see how far I get.
/Ricard
On 09/30/2015 11:05 AM, Ricard Wanderlof wrote:
On Wed, 30 Sep 2015, Panu Matilainen wrote:
Where does the above output come from, and where could I find information on what type of data is expected? I assume what's expected is something one would get from a USB mixer device?
The output is simply 'lsusb -v' plus manually added guesses as to what these unrecognized bits are.
Ah, I should have guessed as much... (I was looking in the kernel for some USB debug thing via /sys or similar).
As for the mixer, I concluded the Zoom does not really have one. The interface 0 is a control interface where a mixer would reside, but there's no actual mixer unit in there.
I wonder if any information could be gleaned from running the Windows driver and somehow snooping the USB bus.
There must be some way of enabling the playback stream.
I do have a USB dump of Windows initializing the thing and can/will share and/or generate new ones if you/others want to have a look. It hasn't helped me so far, the problem being I don't really know what to look for in the first place.
My memory is a bit hazy on the details, but the automatically detected SNDRV_PCM_FMTBIT_S32 (which iirc means the actual 24bit values are carried in 32bit chunks) seems to be the right thing for the device, forcing it to something else just makes things worse. IIRC.
I think that many devices put 24 bits in the most significant bits of a 32 bit word, hence SNDRV_PCM_FMTBIT_S32 works fine, and the device simply disregards the lowest 8 bits. I don't know about USB, but I've come across SoC devices where it is the lower 24 bits that carry the data, in which case SNDRV_PCM_FMTBIT_S24LE is needed, and then there are devices which really handle the data in 24 bit chunks, using SNDRV_PCM_FMTBIT_S24_3LE.
At any rate this issue seems to be cleared up.
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
Ok, thanks for that tidbit, I'd read about problems with simultaneous capture and playback in previous threads; I thought perhaps (hopefully...) the only remaining issue was the silent playback.
So perhaps it needs another quirk similar in spirit to the E-Mu:
    /* When capture is active
     * sample rate shouldn't be changed
     * by playback substream
     */
    if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK) {
        if (subs->stream->substream[SNDRV_PCM_STREAM_CAPTURE].interface != -1)
            return;
    }
So your theory is that reconfiguring the device once opened for one direction is what causes it to hang?
IIRC you start getting hangs accompanied with "cannot get/set freq XXX to ep Y" (or similar) messages when you try to open both. For a single interface at a time, those messages went away by adding the small delay, but adding more delay doesn't help with the case where both are active.
So to me the evidence seems to suggest it only supports a global frequency for the device, instead of per interface. Which, considering what the device is, seems quite reasonable.
Or something. I do intend to continue poking at it sooner or later but can't make any promises.
I understand. Since I have a setup at home that works now I thought I'd take a look at it and see how far I get.
Cool.
- Panu -
/Ricard
On Wed, 30 Sep 2015, Panu Matilainen wrote:
I do have a USB dump of Windows initializing the thing and can/will share and/or generate new ones if you/others want to have a look. It hasn't helped me so far, the problem being I don't really know what to look for in the first place.
Although I doubt I'll be able to make more sense of it than you, it would be interesting to have a look at, perhaps it can be useful.
What might also be interesting if you don't already have a dump of it is Windows actually starting a playback stream.
IIRC you start getting hangs accompanied with "cannot get/set freq XXX to ep Y" (or similar) messages when you try to open both. For a single interface at a time, those messages went away by adding the small delay, but adding more delay doesn't help with the case where both are active.
So to me the evidence seems to suggest it only supports a global frequency for the device, instead of per interface. Which, considering what the device is, seems quite reasonable.
Yes, I agree.
/Ricard
On Wed, 30 Sep 2015, Panu Matilainen wrote:
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
I did a small bit of experimenting, and as far as I can tell, playback seems to hang the device so it becomes completely unresponsive and needs to be restarted.
What I did was change QUIRK_IGNORE_INTERFACE to QUIRK_AUDIO_STANDARD_INTERFACE for ifnum 1 (playback) in the quirks table.
What happens here is that if I attempt to aplay a file that is 10.5 seconds long, it actually plays for about 16 seconds, but no sound comes out. If I attempt it again, it takes about 5 seconds, after which I get aplay: set_params:1297: Unable to install hw params: <dump of attempted parameter settings>. There is no error message the first time.
The problem is the same whether or not I have IGNORE_INTERFACE set for ifnum 2 (capture). With both ifnum 1 and 2 set to AUDIO_STANDARD_INTERFACE, capture still works as expected.
Interestingly enough, if I instrument the call to pcm:set_params(), it seems to get called as soon as the device is plugged in, with no ill effects. The sample rate display on the R16 also changes from 96.0 kHz to 44.1 kHz. I don't seem to remember it doing this the first times I plugged it in, but the behavior remains after rebooting my computer, so it doesn't seem to be anything cached or something like that.
The problem doesn't seem to be in set_params(), as I can tell by adding delays that the unit freezes well after calling this function.
/Ricard
On Wed, 30 Sep 2015, Ricard Wanderlof wrote:
I did a small bit of experimenting, and as far as I can tell, playback seems to hang the device so it becomes completely unresponsive and needs to be restarted.
Actually, after a bit more experimenting, the R16 actually wakes up from its unresponsive state after a couple of minutes, at least when playing a small file. After that it seems to behave as before.
I'm learning USB and USB-audio at the same time so progress is slow.
Installed the Zoom Windows driver on a Win7 box, followed by the bundled DAW software (Cubase 6 LE), in order to dump the USB data to see what was going on, using USBPcap.
Captured the corresponding traffic in Linux using usbmon and Wireshark.
I transferred the Windows pcap dumps to Linux for subsequent analysis in Wireshark.
It was getting late so I didn't do more than a cursory look. I noted that the packet format is very different between Windows and Linux, but if that's due to the different capture methods I don't know. For the USBPcap dump each Isochronous frame contains a number of isochronous packets, whereas the dump I do on Linux just lists the payload as 'captured isochronous data' or somesuch. Even basic things such as the packet lengths don't match, but I also noted that USBPcap seems to insert metadata into the frames (to ease decoding?) Got to research that more.
I did conclude however that when playing back to the R16 there seems to be an isochronous packet flow going on with data from both ends of the link (again, I need to read up on what to really expect). What is odd is that in Linux, the final packet from the host is a SET CONFIGURATION, and the R16 seems to take about 5 seconds to respond to it. I thought 5 seconds was supposed to be a general timeout when waiting for responses in USB, not the maximum time before a device can respond. But perhaps the R16 thinks it's in a mode where it's expecting more data from the host, and thus times out when receiving a request it's not expecting, and only then goes ahead and processes that request.
My next step may be to use USBPcap to capture the data in Linux too so I get comparable dumps. And do read up on USB audio documentation...
/Ricard
On 09/30/2015 11:05 AM, Ricard Wanderlof wrote:
On Wed, 30 Sep 2015, Panu Matilainen wrote:
[...]
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
Ok, thanks for that tidbit, I'd read about problems with simultaneous capture and playback in previous threads; I thought perhaps (hopefully...) the only remaining issue was the silent playback.
Forgot to clarify this: I actually thought the silent playback was the only remaining issue when submitting the patch, but later realized I only ever had one of the interfaces enabled at a time to avoid any side-effects, and apparently never bothered to check with both since the playback was not really working anyway.
- Panu -
On Wed, 30 Sep 2015, Panu Matilainen wrote:
At Sat, 29 Nov 2014 21:35:04 +0200, Panu Matilainen wrote:
The descriptors for interface 0 are garbage but what is there would seem to map to logical elements by subtype and terminaltype, after those the data doesn't seem to add up (e.g. number of input/output channels missing/wrongly placed etc):
** UNRECOGNIZED: 0b 24 01 00 01 35 00 03 01 02 03 -> Header?
** UNRECOGNIZED: 0c 24 02 05 01 01 00 02 03 00 00 00 -> USB streaming input terminal?
** UNRECOGNIZED: 09 24 03 08 01 03 00 05 00 -> Speaker output terminal?
** UNRECOGNIZED: 0c 24 02 09 01 02 00 08 00 00 00 00 -> Microphone input terminal?
** UNRECOGNIZED: 09 24 03 0c 01 01 00 09 00 -> USB streaming output terminal?
Anything there that might ring a bell, or other ideas where to start poking at this thing?
Yeah, most of these look almost sane, so any off-by-one firmware issue?
I was looking at this, and comparing it to the corresponding output from my Edirol UA-5, which is a class-compliant (in its standard mode) 2 in / 2 out device. The descriptors look suspiciously similar on both devices, especially considering that the R16 has 8 channels of input instead of 2 and a MIDI interface which the UA-5 lacks.
I'm not sure exactly which version of the USB audio standard they correspond to, looking at the specifications for versions 1 and 2 it didn't seem to make sense, but they do happen to correspond very much to the descriptions in the following document:
https://www.silabs.com/Support%20Documents/TechnicalDocs/AN295.pdf
The fact that lsusb doesn't recognize them I would think boils down to the fact that it simply doesn't know about USB audio; it appears that the ALSA USB driver can make sense of the information, which is why the resulting entry in the quirks table doesn't need a FIXED_ENDPOINT quirk.
/Ricard
On 10/06/2015 09:47 AM, Ricard Wanderlof wrote:
The fact that lsusb doesn't recognize them I would think boils down to the fact that it simply doesn't know about USB audio; it appears that the ALSA USB driver can make sense of the information, which is why the resulting entry in the quirks table doesn't need a FIXED_ENDPOINT quirk.
lsusb knows about USB audio alright, the problem is all these stupid vendors tagging their devices and their interfaces as vendor specific, even when they actually are class compliant. Which is why the quirks-table is needed: QUIRK_AUDIO_STANDARD_INTERFACE just tells the driver to parse the descriptor as a standard audio interface despite it claiming to be vendor specific. But lsusb doesn't have access to the quirks table.
- Panu -
/Ricard
On Tue, 6 Oct 2015, Panu Matilainen wrote:
At Sat, 29 Nov 2014 21:35:04 +0200, Panu Matilainen wrote:
** UNRECOGNIZED: 0b 24 01 00 01 35 00 03 01 02 03 -> Header?
** UNRECOGNIZED: 0c 24 02 05 01 01 00 02 03 00 00 00 -> USB streaming input terminal?
** UNRECOGNIZED: 09 24 03 08 01 03 00 05 00 -> Speaker output terminal?
** UNRECOGNIZED: 0c 24 02 09 01 02 00 08 00 00 00 00 -> Microphone input terminal?
** UNRECOGNIZED: 09 24 03 0c 01 01 00 09 00 -> USB streaming output terminal?
lsusb knows about USB audio alright, the problem is all these stupid vendors tagging their devices and their interfaces as vendor specific, even when they actually are class compliant. Which is why the quirks-table is needed: QUIRK_AUDIO_STANDARD_INTERFACE just tells the driver to parse the descriptor as a standard audio interface despite it claiming to be vendor specific. But lsusb doesn't have access to the quirks table.
That makes sense, however, I seem to recall (will double check tonight) that lsusb still output UNRECOGNIZED for the corresponding descriptors even when the UA-5 was set to its class compliant mode.
I'd have to do more research, but it would seem that the above descriptors match something other than the USB audio version 1 or 2 specs, whatever that might be.
Part of the decoding I found was: (partly with reference to https://www.silabs.com/Support%20Documents/TechnicalDocs/AN295.pdf)
** UNRECOGNIZED: 0b 24 01 00 01 35 00 03 01 02 03
                       -- ----- ----- -- --------
                       |  |       |    |     +---- device nos (cap,pb,midi)
                       |  |       |    +---------- no of devices
                       |  |       +--------------- total descriptors length
                       |  +----------------------- bcdADC (0x0001) ?
                       +-------------------------- subtype HEADER

** UNRECOGNIZED: 0c 24 02 05 01 01 00 02 03 00 00 00
                       -- -- -----    -- -----
                       |  |   |       |     +------ channel placement (L,R)
                       |  |   |       +------------ no of channels
                       |  |   +-------------------- terminal type? (output)
                       |  +------------------------ terminal ID (5)
                       +--------------------------- streaming terminal descr.

** UNRECOGNIZED: 09 24 03 08 01 03 00 05 00

** UNRECOGNIZED: 0c 24 02 09 01 02 00 08 00 00 00 00
                       -- -- -----    -- -----
                       |  |   |       |     +------ channel placement (none)
                       |  |   |       +------------ no of channels
                       |  |   +-------------------- terminal type? (input)
                       |  +------------------------ terminal ID (9)
                       +--------------------------- streaming terminal descr.

** UNRECOGNIZED: 09 24 03 0c 01 01 00 09 00

and further down in the dump (after the interface descriptor for the output interface)

** UNRECOGNIZED: 07 24 01 05 01 01 00
                       -- -- -- -----
                       |  |  |    +-------- format tag (0x0100 = PCM)
                       |  |  +------------- bDelay
                       |  +---------------- terminal link?
                       +------------------- subtype AS_GENERAL

** UNRECOGNIZED: 14 24 02 01 02 04 18 04 44 ac 00 80 bb 00 88 58 01 00 77 01
                          -- -- -- -- -- -------- -------- -------- --------
                          |  |  |  |  |  44.1 kHz 48 kHz   88.2 kHz 96 kHz
                          |  |  |  |  |
                          |  |  |  |  +--- no of sample rate specifications
                          |  |  |  +------ bit resolution (24 bits)
                          |  |  +--------- subframe size (4 bytes)
                          |  +------------ no of channels (2)
                          +--------------- format type (TYPE_I)

and for the input interface:

** UNRECOGNIZED: 07 24 01 0c 01 01 00
                       -- -- -- -----
                       |  |  |    +-------- format tag (0x0100 = PCM)
                       |  |  +------------- bDelay
                       |  +---------------- terminal link?
                       +------------------- subtype AS_GENERAL

** UNRECOGNIZED: 14 24 02 01 08 04 18 04 44 ac 00 80 bb 00 88 58 01 00 77 01
                          -- -- -- -- -- -------- -------- -------- --------
                          |  |  |  |  |  44.1 kHz 48 kHz   88.2 kHz 96 kHz
                          |  |  |  |  |
                          |  |  |  |  +--- no of sample rate specifications
                          |  |  |  +------ bit resolution (24 bits)
                          |  |  +--------- subframe size (4 bytes)
                          |  +------------ no of channels (8)
                          +--------------- format type (TYPE_I)
What doesn't make sense to me here is that certain subtypes seem to be reused for different purposes even with the same descriptor type, for instance descriptor type 24 has subtype 02 being used to specify terminal type and also the data format. Perhaps the length is also a part of the type specifier?
While this could all be vendor specific, based on a standard but only loosely following it, when attempting the same interpretation on the corresponding lsusb -v output from the Edirol UA-5, they correspond perfectly, given the differences in the devices (for instance, the UA-5 only runs at a single sample rate governed by a front panel switch; in its class-compliant mode it uses 2 bytes per sample with 16 bit resolution, and in its advanced mode 3 bytes per sample with 24 bit resolution). So apparently they are based on the same standard, whatever that may be.
Perhaps someone can point me in the right direction to some standards document. For now, my main interest is getting the playback streaming going, and since QUIRK_AUDIO_STANDARD_INTERFACE seems to suffice in order to grab the format information, it is apparent that the ALSA USB layer understands the information at least sufficiently for its purpose, so I don't really intend to dive deeper into the descriptor interpretation for now. Most importantly, it confirms that the correct audio format is 24 bits in a 32 bit word.
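For anyone wanting to repeat the decoding above without counting bytes by hand, here is a minimal Python sketch that parses the two 20-byte format descriptors from the dump. It assumes the layout used in the hand decoding (length, descriptor type 0x24, subtype 0x02, format type, channel count, subframe size, bit resolution, number of discrete rates, then 3-byte little-endian sample rates). The function name and dictionary keys are invented for this example, and a descriptor advertising a continuous rate range is deliberately not handled.

```python
def parse_format_type_i(desc: bytes) -> dict:
    """Decode a class-specific 'format type I' descriptor with discrete rates."""
    if len(desc) < 8 or desc[0] != len(desc):
        raise ValueError("length byte does not match descriptor size")
    if desc[1] != 0x24 or desc[2] != 0x02 or desc[3] != 0x01:
        raise ValueError("not a type I format descriptor")
    n_rates = desc[7]
    if n_rates == 0 or len(desc) != 8 + 3 * n_rates:
        raise ValueError("continuous or malformed rate list not handled here")
    # Each sample rate is a 3-byte little-endian integer in Hz.
    rates = [
        int.from_bytes(desc[8 + 3 * i : 11 + 3 * i], "little")
        for i in range(n_rates)
    ]
    return {
        "channels": desc[4],
        "subframe_size": desc[5],
        "bit_resolution": desc[6],
        "sample_rates": rates,
    }


PLAYBACK = bytes.fromhex("14240201020418" "04" "44ac00" "80bb00" "885801" "007701")
CAPTURE = bytes.fromhex("14240201080418" "04" "44ac00" "80bb00" "885801" "007701")

if __name__ == "__main__":
    print(parse_format_type_i(PLAYBACK))
    print(parse_format_type_i(CAPTURE))
```

Multi-byte fields in USB descriptors are little-endian, so the byte pair `44 ac 00` reads as 0x00ac44, i.e. 44100 Hz.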
/Ricard
On Tuesday 06 Oct 2015 10:16:05 Ricard Wanderlof wrote:
What doesn't make sense to me here is that certain subtypes seem to be reused for different purposes even with the same descriptor type, for instance descriptor type 24 has subtype 02 being used to specify terminal type and also the data format. Perhaps the length is also a part of the type specifier?
My reading of it is that the meaning of the CS_INTERFACE subtype is context specific, so it depends where in the descriptor it appears.
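A small Python sketch of that reading: the same subtype byte is looked up in a different table depending on which kind of interface the class-specific descriptor follows. The subtype values are the ones from the USB Audio Class 1.0 definition; the table and function names are invented for this example, and the caller is assumed to already know whether the descriptor belongs to a control or a streaming interface (on the R16 the interfaces claim to be vendor specific, so that knowledge has to come from context).

```python
AUDIOCONTROL = 0x01    # audio control interface subclass
AUDIOSTREAMING = 0x02  # audio streaming interface subclass

# Subtype byte -> meaning, keyed by the interface the descriptor follows.
SUBTYPE_NAMES = {
    AUDIOCONTROL: {
        0x01: "HEADER",
        0x02: "INPUT_TERMINAL",
        0x03: "OUTPUT_TERMINAL",
    },
    AUDIOSTREAMING: {
        0x01: "AS_GENERAL",
        0x02: "FORMAT_TYPE",
    },
}


def subtype_name(interface_subclass: int, subtype: int) -> str:
    """Name a class-specific descriptor subtype in its interface context."""
    return SUBTYPE_NAMES.get(interface_subclass, {}).get(subtype, "UNKNOWN")


if __name__ == "__main__":
    print(subtype_name(AUDIOCONTROL, 0x02))
    print(subtype_name(AUDIOSTREAMING, 0x02))
```

This is why subtype 02 shows up both as a terminal descriptor and as a format descriptor in the dump: the length is not part of the type, the surrounding interface is.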
Cheers,
Keith
On Wed, 30 Sep 2015, Panu Matilainen wrote:
My memory is a bit hazy on the details, but the automatically detected SNDRV_PCM_FMTBIT_S32 (which iirc means the actual 24bit values are carried in 32bit chunks) seems to be the right thing for the device, forcing it to something else just makes things worse. IIRC. The delay was a key part, but not enough.
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
Having dived into this, and looking carefully at the data produced by the Windows driver, it appears that what's happening is that the driver stuffs a 32-bit length specifier at the start of each isochronous data packet.
So, for instance, instead of transferring the sample data
00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 .. .. (40 bytes)
the Zoom driver would send:
28 00 00 00 00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 ... (44 bytes)
No wonder the Zoom gets confused when it gets sent ordinary sample data, and tries to interpret the first sample value as a length.
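The framing seen in the Windows capture can be sketched in a few lines of Python. This assumes, based only on the example above, that the prefix is a 32-bit little-endian count of the sample bytes that follow (not including the prefix itself); the function name is made up for illustration.

```python
import struct


def add_length_prefix(payload: bytes) -> bytes:
    """Prepend a 32-bit little-endian byte count to one packet's sample data."""
    return struct.pack("<I", len(payload)) + payload


if __name__ == "__main__":
    samples = bytes(40)  # 40 bytes of (silent) sample data
    packet = add_length_prefix(samples)
    print(len(packet), packet[:4].hex(" "))
```

With a 40-byte payload this gives a 44-byte packet starting with `28 00 00 00`, matching the dump.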
There doesn't seem to be anything in the ALSA USB driver to do this, and I'm thinking it's Zoom specific (perhaps to overcome some deficiency in the hardware (or USB firmware) in the R16?).
I'm trying to figure out where would be the best place to add a quirk for this. Input from anyone knowledgeable about the ALSA USB system would be very helpful.
At the moment I'm considering adding some additional code to sound/usb/pcm.c: prepare_playback_urb(), governed by a new boolean in struct snd_usb_substream in a similar vein as txfr_quirk in that structure (which in turn is set in some quirk function detecting the R16).
What needs to be done is to add 4 bytes to the length, and adjust the offset accordingly, in urb->iso_frame_desc[i], and then add the additional length descriptor for each packet when copying out the data further down in the same function.
It would be nice to add a foo_quirk() function but since the actual copying of the data needs to be changed, it's not really possible to do efficiently with a separate routine.
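To make the bookkeeping concrete, here is a user-space Python model of the idea, not kernel code. The (offset, length) tuples stand in for the offset and length fields of urb->iso_frame_desc[i], and the buffer stands in for the URB transfer buffer; both function names and the 4-byte little-endian header are assumptions carried over from the Windows dump.

```python
HEADER_LEN = 4  # assumed size of the per-packet length field


def layout_iso_packets(payload_sizes):
    """Return ([(offset, length), ...], total_bytes) with room for each header."""
    descs = []
    offset = 0
    for size in payload_sizes:
        length = size + HEADER_LEN      # each packet grows by the header
        descs.append((offset, length))  # later packets shift accordingly
        offset += length
    return descs, offset


def fill_transfer_buffer(payloads):
    """Copy each payload into one buffer, preceded by its own byte count."""
    buf = bytearray()
    for payload in payloads:
        buf += len(payload).to_bytes(HEADER_LEN, "little")
        buf += payload
    return bytes(buf)


if __name__ == "__main__":
    payloads = [bytes(40), bytes(48)]
    descs, total = layout_iso_packets(len(p) for p in payloads)
    print(descs, total, len(fill_transfer_buffer(payloads)))
```

The point of the model is that the offsets accumulate the extra 4 bytes per packet, so the header cannot simply be patched in after the sample data has been laid out contiguously.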
/Ricard
On 10/06/2015 10:02 AM, Ricard Wanderlof wrote:
On Wed, 30 Sep 2015, Panu Matilainen wrote:
My memory is a bit hazy on the details, but the automatically detected SNDRV_PCM_FMTBIT_S32 (which iirc means the actual 24bit values are carried in 32bit chunks) seems to be the right thing for the device, forcing it to something else just makes things worse. IIRC. The delay was a key part, but not enough.
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
Having dived into this, and looking carefully at the data produced by the Windows driver, it appears that what's happening is that the driver stuffs a 32-bit length specifier at the start of each isochronous data packet.
So, for instance, instead of transferring the sample data
00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 .. .. (40 bytes)
the Zoom driver would send:
28 00 00 00 00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 ... (44 bytes)
Wow, nice detective work! :)
No wonder the Zoom gets confused when it gets sent ordinary sample data, and tries to interpret the first sample value as a length.
Heh, indeed.
There doesn't seem to be anything in the ALSA USB driver to do this, and I'm thinking it's Zoom specific (perhaps to overcome some deficiency in the hardware (or USB firmware) in the R16?).
I'm trying to figure out where would be the best place to add a quirk for this. Input from anyone knowledgeable about the ALSA USB system would be very helpful.
At the moment I'm considering adding some additional code to sound/usb/pcm.c: prepare_playback_urb(), governed by a new boolean in struct snd_usb_substream in a similar vein to txfr_quirk in that structure (which in turn is set in some quirk function detecting the R16).
What needs to be done is to add 4 bytes to the length, and adjust the offset accordingly, in urb->iso_frame_desc[i], and then add the additional length descriptor for each packet when copying out the data further down in the same function.
It would be nice to add a foo_quirk() function but since the actual copying of the data needs to be changed, it's not really possible to do efficiently with a separate routine.
Sounds like a plan to me, but keep in mind I'm just another newbie in all this. Anyway, I wouldn't worry about the cleanest possible way at this point, just do a quick-n-dirty hack to see if adding the length is enough to get it working and worry about the rest later. I'll try to have a look at it too as soon as time permits, but meanwhile if more experienced people have better suggestions...
- Panu -
On Tue, 6 Oct 2015, Panu Matilainen wrote:
At the moment I'm considering adding some additional code to sound/usb/pcm.c: prepare_playback_urb(), governed by a new boolean in struct snd_usb_substream in a similar vein to txfr_quirk in that structure (which in turn is set in some quirk function detecting the R16).
What needs to be done is to add 4 bytes to the length, and adjust the offset accordingly, in urb->iso_frame_desc[i], and then add the additional length descriptor for each packet when copying out the data further down in the same function.
It would be nice to add a foo_quirk() function but since the actual copying of the data needs to be changed, it's not really possible to do efficiently with a separate routine.
Sounds like a plan to me, but keep in mind I'm just another newbie in all this. Anyway, I wouldn't worry about the cleanest possible way at this point, just do a quick-n-dirty hack to see if adding the length is enough to get it working and worry about the rest later. I'll try to have a look at it too as soon as time permits, but meanwhile if more experienced people have better suggestions...
Yes, a proof-of-concept is needed, and there may be further issues, but I'm hoping for some input too from the 'more experienced people' on whether this is the right way to go about it before I go too far in the wrong direction.
/Ricard
On Tue, 6 Oct 2015, Panu Matilainen wrote:
Having dived into this, and looking carefully at the data produced by the Windows driver, it appears that what's happening is that the driver stuffs a 32-bit length specifier at the start of each isochronous data packet.
So, for instance, instead of transferring the sample data
00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 .. .. (40 bytes)
the Zoom driver would send:
28 00 00 00 00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 ... (44 bytes)
At the moment I'm considering adding some additional code to sound/usb/pcm.c: prepare_playback_urb(), governed by a new boolean in struct snd_usb_substream in a similar vein to txfr_quirk in that structure (which in turn is set in some quirk function detecting the R16).
What needs to be done is to add 4 bytes to the length, and adjust the offset accordingly, in urb->iso_frame_desc[i], and then add the additional length descriptor for each packet when copying out the data further down in the same function.
It would be nice to add a foo_quirk() function but since the actual copying of the data needs to be changed, it's not really possible to do efficiently with a separate routine.
Sounds like a plan to me, but keep in mind I'm just another newbie in all this. Anyway, I wouldn't worry about the cleanest possible way at this point, just do a quick-n-dirty hack to see if adding the length is enough to get it working and worry about the rest later. I'll try to have a look at it too as soon as time permits, but meanwhile if more experienced people have better suggestions...
I tried this out yesterday, with good results. I only had time for limited testing, but I managed to play back a couple of 44.1 kHz files (didn't test anything else) without any hitches. So this is definitely the right way to go. Actually it was a real joy after the R16 had been silent for a week. :-)
Apart from prepare_playback_urb(), I had to make a similar change to endpoint.c:prepare_outbound_urb() which outputs silence before there is data to send.
Patch to follow, but there are a couple of things I need to straighten out first:
- Since we're stuffing more data in the outgoing packets, some allowance must be made for this when allocating the urb->transfer_buffer. I've got to review that code so that we don't spill over the end of the allocated buffer in some case.
- The best way to trigger the quirk. I'm thinking something along the lines of introducing a QUIRK_AUDIO_ZOOM_INTERFACE which enables the aforementioned bit in order to enable the quirk in prepare_playback_urb() and prepare_outbound_urb(), and then calls create_standard_audio_quirk() (just as QUIRK_AUDIO_STANDARD_INTERFACE would have resulted in). Then it's clear in quirks-table.h that something special is going on. (Currently I've added the quirk initialization directly in create_standard_audio_quirk(), but it seems wrong to hide it away in there).
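As a rough illustration of the buffer allowance in the first point, the worst-case size could be computed along these lines. zoom_urb_buffer_size() and ZOOM_HDR_LEN are hypothetical names, and the assumption that every packet in the URB may carry a full-size payload plus the prefix is mine; the real allocation code in endpoint.c has more to consider.

```c
#include <assert.h>
#include <stddef.h>

#define ZOOM_HDR_LEN 4 /* hypothetical name: size of the length prefix */

/*
 * Worst-case transfer buffer size for one URB: every packet may carry
 * max_payload bytes of samples, plus the 4-byte prefix when the quirk
 * is enabled.
 */
static size_t zoom_urb_buffer_size(size_t npackets, size_t max_payload,
				   int add_hdr)
{
	size_t per_packet = max_payload + (add_hdr ? ZOOM_HDR_LEN : 0);

	/* guard against overflow of the multiplication below */
	assert(npackets == 0 || per_packet <= (size_t)-1 / npackets);

	return npackets * per_packet;
}
```

So an URB with 8 packets of up to 40 payload bytes each would need 352 bytes instead of 320 once the prefix is added.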
And I also need to test some more.
Also, in the dump from the Windows driver I saw some form of sample rate control message being sent that Linux doesn't send. On the other hand, that was sent as part of starting capture, which has already been proven to work, so it might simply not be needed.
/Ricard
On 10/07/2015 10:53 AM, Ricard Wanderlof wrote:
On Tue, 6 Oct 2015, Panu Matilainen wrote:
Having dived into this, and looking carefully at the data produced by the Windows driver, it appears that what's happening is that the driver stuffs a 32-bit length specifier at the start of each isochronous data packet.
So, for instance, instead of transferring the sample data
00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 .. .. (40 bytes)
the Zoom driver would send:
28 00 00 00 00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 ... (44 bytes)
At the moment I'm considering adding some additional code to sound/usb/pcm.c: prepare_playback_urb(), governed by a new boolean in struct snd_usb_substream in a similar vein to txfr_quirk in that structure (which in turn is set in some quirk function detecting the R16).
What needs to be done is to add 4 bytes to the length, and adjust the offset accordingly, in urb->iso_frame_desc[i], and then add the additional length descriptor for each packet when copying out the data further down in the same function.
It would be nice to add a foo_quirk() function but since the actual copying of the data needs to be changed, it's not really possible to do efficiently with a separate routine.
Sounds like a plan to me, but keep in mind I'm just another newbie in all this. Anyway, I wouldn't worry about the cleanest possible way at this point, just do a quick-n-dirty hack to see if adding the length is enough to get it working and worry about the rest later. I'll try to have a look at it too as soon as time permits, but meanwhile if more experienced people have better suggestions...
I tried this out yesterday, with good results. I only had time for limited testing, but I managed to play back a couple of 44.1 kHz files (didn't test anything else) without any hitches. So this is definitely the right way to go. Actually it was a real joy after the R16 had been silent for a week. :-)
Awesome! This will make you a hero for many :)
Apart from prepare_playback_urb(), I had to make a similar change to endpoint.c:prepare_outbound_urb() which outputs silence before there is data to send.
Patch to follow, but there are a couple of things I need to straighten out first:
- Since we're stuffing more data in the outgoing packets, some allowance must be made for this when allocating the urb->transfer_buffer. I've got to review that code so that we don't spill over the end of the allocated buffer in some case.
- The best way to trigger the quirk. I'm thinking something along the lines of introducing a QUIRK_AUDIO_ZOOM_INTERFACE which enables the aforementioned bit in order to enable the quirk in prepare_playback_urb() and prepare_outbound_urb(), and then calls create_standard_audio_quirk() (just as QUIRK_AUDIO_STANDARD_INTERFACE would have resulted in). Then it's clear in quirks-table.h that something special is going on. (Currently I've added the quirk initialization directly in create_standard_audio_quirk(), but it seems wrong to hide it away in there).
Clearly the device is not following the standard here, but then all sorts of other quirks get silently handled behind the scenes of QUIRK_AUDIO_STANDARD_INTERFACE too so... If handled in create_standard_audio_quirk() then the quirks-table entry for the thing could be reduced to just:
    .driver_info = (unsigned long) &(const struct snd_usb_audio_quirk) {
            .ifnum = QUIRK_ANY_INTERFACE,
            .type = QUIRK_AUTODETECT
    }
Not that it matters a whole lot I guess.
And I also need to test some more.
I'd be happy to help with testing + extra eyeballs for any work-in-progress patch you have. Feel free to send privately if you feel it's not suitable for mass-consumption yet.
Also, in the dump from the Windows driver I saw some form of sample rate control message being sent that Linux doesn't send. On the other hand, that was sent as part of starting capture, which has already been proven to work, so it might simply not be needed.
Right, good to know. While capture appears to work just fine, I wouldn't be surprised if there are glitches left; it's probably not being used all that much because of the dead playback (until now of course).
- Panu -
/Ricard
On 10/06/2015 10:02 AM, Ricard Wanderlof wrote:
On Wed, 30 Sep 2015, Panu Matilainen wrote:
My memory is a bit hazy on the details, but the automatically detected SNDRV_PCM_FMTBIT_S32 (which iirc means the actual 24bit values are carried in 32bit chunks) seems to be the right thing for the device, forcing it to something else just makes things worse. IIRC. The delay was a key part, but not enough.
Another finding from my last round in the spring I do remember clearly, is that enabling both the input and output interfaces makes things hang up (timeouts and nothing really works). But if you disable the input interface and enable the output interface, you get an apparently working playback stream. Only no sound comes out. If you disable the output and enable input, capture itself works fine.
Having dived into this, and looking carefully at the data produced by the Windows driver, it appears that what's happening is that the driver stuffs a 32-bit length specifier at the start of each isochronous data packet.
So, for instance, instead of transferring the sample data
00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 .. .. (40 bytes)
the Zoom driver would send:
28 00 00 00 00 12 bf 34 00 98 87 76 00 3c 24 35 00 86 75 64 ... (44 bytes)
No wonder the Zoom gets confused when it gets sent ordinary sample data, and tries to interpret the first sample value as a length.
There doesn't seem to be anything in the ALSA USB driver to do this, and I'm thinking it's Zoom specific (perhaps to overcome some deficiency in the hardware (or USB firmware) in the R16?).
FWIW, while poking around the source I spotted this in sound/usb/midi.c:
/*
 * Novation USB MIDI protocol: number of data bytes is in the first byte
 * (when receiving) (+1!) or in the second byte (when sending); data begins
 * at the third byte.
 */
Just goes to show such quirks are not previously unheard-of. It's MIDI so the details differ, but perhaps a similar approach could be used for PCM too.
- Panu -
participants (3)
- Keith A. Milner
- Panu Matilainen
- Ricard Wanderlof