[alsa-devel] [RFC] AVB - network-based soundcards in ALSA

Takashi Iwai tiwai at suse.de
Tue May 27 16:36:27 CEST 2014


At Mon, 26 May 2014 15:03:52 +0200,
Henrik Austad wrote:
> 
> Hi all!
> 
> This is an RFC for a new class of soundcards. I am not very familiar
> with how ALSA is tied together underneath the hood, so what you see
> here is based on my naive understanding of ALSA. I wear asbestos
> underwear on a regular basis, so I prefer honesty over sugarcoating :)
> 
> I use "I" and "we" interchangeably. By 'we' I mean a small R&D group at
> Cisco Norway, by "I", I mean.. well, me. So, we plan for AVB, I do the
> kernel side work. We plan to upstream this, given that the community
> accepts it.
> 
> Also, I've used my private address as that is set up to track 
> kernel-related lists, but added my Cisco-address so please keep that on the 
> CC if you reply.
> 
> We have recently begun working on Audio Video Bridging (AVB, [1]) and are
> looking into how this can be added to the Linux Kernel via ALSA and
> video4linux.
> 
> But first; for those of you who are not familiar with AVB:
> 
> In short, AVB is just a set of open standards governing network and
> timing configuration so that you can stream audio and video reliably and
> with low latency. Note that this is not the kind of streaming services
> currently associated with streaming (a few companies distributing movies
> and TV-shows come to mind; one rhyming with lightsticks). It is the
> kind of streaming you use when connecting a pair of speakers to your
> computer - via ethernet. Or a webcam via the wireless network. (I'm
> aware of the security implications here, but bear with me).
> 
> For the eager reader, AVB is being promoted by AVnu Alliance [2], which
> has a lot of information available. I also added a link to a very short
> intro to AVB that Hans held a few weeks back (focus on the network
> though) in [3]. Then the IEEE 802.1 working group [4] has a few standards,
> but these are probably not that relevant to this list, at least not
> right now.
> 
> For AVB to work, you need support in the networking infrastructure. This
> is not prevalent but it is coming. There are a few manufacturers that
> provide AVB ready equipment and some networking gear.
> 
> What you need of standards for AVB:
> 
> * gPTP support (IEEE 802.1AS), this is an IEEE 1588 (PTP) profile for
>   AVB. This is needed for accurate timestamping of samples, and all
>   nodes in an AVB domain must agree to the _same_ time (note that the
>   _correct_ time is not that important in this setting). .1AS should
>   give you a <1us error between the clocks for the systems involved.
> 
> * Stream Reservation (IEEE 802.1Qat, or 802.1Q:2011 Sec. #35) to make
>   sure we have guaranteed bandwidth. This will avoid dropped etherframes
>   due to a congested network. It also caps the amount you can reserve to
>   75% of total BW, making sure AVB can coexist with normal traffic.
> 
> * Traffic Shaping and admission control (IEEE 802.1Qav, or 802.1Q:2011
>   Sec. #34) to improve utilization but also avoid/minimize jitter due to
>   queues inside switches/routers/bridges.
> 
> * IEEE 802.1BA, default configuration for AVB devices and what the
>   network looks like.
> 
> * IEEE 1722 (and 1733 for layer-3) Layer 2 Transport for
>   audio/video. The packing is similar to what is done in Firewire. You
>   have a frame rate of 8kHz for class A and 4kHz for class B. This gives
>   relatively few samples per frame. Currently we only look at Layer 2 as
>   small peripherals (microphones, speakers) will only have to implement
>   L2 instead of the entire IP-stack.
> 
> * IEEE 1722.1 Device discovery protocol (AVDECC) defines how Talkers and
>   Listeners find each other and connect. Any talker will regularly
>   announce its presence, and 1722.1 defines how to announce - and how to
>   respond.
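
To put numbers on "relatively few samples per frame" and on the 75%
reservation cap from the list above, here is a minimal sketch. The 48 kHz
sample rate and the 1 Gbit/s link are assumed example values, not figures
from the RFC, and the function names are made up for illustration.

```python
from fractions import Fraction

# Frame rates per AVB class, as stated in the RFC text above.
CLASS_FRAME_RATE_HZ = {"A": 8000, "B": 4000}


def samples_per_frame(sample_rate_hz, avb_class):
    """Average number of samples per channel carried in one frame.

    Returned as a Fraction because some rates (e.g. 44.1 kHz) do not
    divide evenly by the frame rate.
    """
    return Fraction(sample_rate_hz, CLASS_FRAME_RATE_HZ[avb_class])


def max_reservable_bps(link_bps, cap_percent=75):
    """Upper bound on bandwidth that stream reservation may hand out."""
    return link_bps * cap_percent // 100


if __name__ == "__main__":
    for cls in ("A", "B"):
        print(cls, samples_per_frame(48000, cls))
    print(max_reservable_bps(1_000_000_000))
```

With the assumed 48 kHz rate this comes out to 6 samples per channel per
frame for class A and 12 for class B.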
> 
> Of all these standards, the 802.1BA and 1722 are probably the most
> interesting ones. AVnu also has a 'best practice' [5] document that
> gives an outline that serves as a nice starting point.
> 
> Terminology (brief)
> - Bridge: Node in the network with more than 1 port (think switches)
> - End-station: Node in the network with 1 port.
> - Talkers: End-station that produces media (mic, camera)
> - Listeners: End-station that receives from Talkers
> - Streams & Channels: A talker creates a stream through the network to a
>   Listener. Each stream is composed of 1..N channels where each sample
>   is interleaved.
> - An end-station can act as both Talker and Listener.
> - gPTP domain: set of PTP-capable nodes connected (gPTP will not allow
>   non-timeaware nodes in the domain).
> - SRP domain: nodes in a network that supports stream reservation.
> - AVB Domain: intersection of SRP domain and gPTP domain.
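
The "1..N channels where each sample is interleaved" part of the
terminology can be illustrated roughly as below. This is only a sketch of
plain sample interleaving on Python lists; it says nothing about the real
IEEE 1722 packet layout, and the function names are hypothetical.

```python
def interleave(channels):
    """Merge per-channel sample lists into one interleaved stream.

    channels: list of N equal-length lists, one per channel.
    Result order: ch0[0], ch1[0], ..., chN-1[0], ch0[1], ...
    """
    if len({len(c) for c in channels}) > 1:
        raise ValueError("all channels must have the same number of samples")
    return [sample for frame in zip(*channels) for sample in frame]


def deinterleave(samples, n_channels):
    """Split an interleaved stream back into per-channel lists."""
    if n_channels < 1 or len(samples) % n_channels:
        raise ValueError("stream length must be a multiple of n_channels")
    return [samples[i::n_channels] for i in range(n_channels)]


if __name__ == "__main__":
    left, right = [1, 2, 3], [10, 20, 30]
    stream = interleave([left, right])
    print(stream)
    print(deinterleave(stream, 2))
```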
> 
> To put it in simpler terms, AVB gives you a way to add 'stuff' to your
> computer and play music to them via the network.
> 
> Moving out into ALSA-land and introducing "The plan":
> 
> * A central driver, an "avb_core" if you like. Once loaded it will
>   create a configfs directory and start looking at etherframes to see if
>   anything of interest comes along. This will be present from the start
>   and is required for all the rest to work.
> 
> * An "avb_media_driver" to split data going to ALSA and v4l as well as
>   combining streams coming back. The easiest way is probably to combine
>   snd_avb and the corresponding v4l driver into a single driver, but
>   expose it as "snd_avb" to ALSA (and ditto for v4l2).
> 
> * A userspace tool for tapping into the AVDECC data (for autodiscovery of
>   nodes). Let's call this avdecclib for now (there are a few userspace
>   libraries available on github).
> 
> * ConfigFS [6] is then used by userspace to spawn a new
>   avb_media_driver for each stream we want to connect to.
> 
>   Tree-structure will look something like this
>   mkdir /config/avb/node0;
>   config/
>   └── avb
>       └── node0
>           ├── channels_in
>           ├── channels_out
>           ├── enable
>           └── mac
> 
>   (the number of attributes will have to be adjusted as I figure out
>   what makes sense to have in the configfs item).
> 
>   Writing 1 to enable will then trigger the negotiating phase and wait
>   for the driver to come online. A new ALSA soundcard will then pop into
>   existence, which can then be used as any regular soundcard attached to
>   the computer.
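
From the userspace side, the ConfigFS flow described above might look
something like the following sketch. Nothing here exists yet: the
/config/avb path and the attribute names are taken from the tree in the
RFC, while the value formats (a MAC string, decimal channel counts) and
the create_avb_node() helper are assumptions for illustration only.

```python
from pathlib import Path


def create_avb_node(name, mac, channels_in, channels_out, root="/config/avb"):
    """Create one proposed AVB configfs item and enable it.

    On a real configfs mount the kernel would populate the attribute
    files when the directory is created; here we simply write to them.
    """
    node = Path(root) / name
    node.mkdir(exist_ok=True)          # corresponds to: mkdir /config/avb/node0

    # Assumed value formats; the RFC does not define them.
    (node / "mac").write_text(mac + "\n")
    (node / "channels_in").write_text(f"{channels_in}\n")
    (node / "channels_out").write_text(f"{channels_out}\n")

    # Written last: per the RFC, writing 1 triggers the negotiation phase.
    (node / "enable").write_text("1\n")
    return node
```

The root parameter is only there so the sketch can be pointed at a
scratch directory instead of a real configfs mount.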
> 
> So, an attempt to bring this to life using state of the art ASCII skills
> 
>                     +----------------------------------------------------+
>                     |                                                    |
>                     |                media application                   |
>                     |                                                    |
>                     +-------+-----------------+--------------------+-----+
>                             |                 |                    |
>                             |                 |                    |
>                     +-------+-----+    +------+------+     +-------+-----+
>                     |             |    |             |     |             |
>                     |  alsalib    |    |   v4l2lib   |     |  avdecclib  |
>                     |             |    |             |     |             |
> userspace           +-------+-----+    +------+------+     +-------+-----+
> ................            |                 |                    |
> kernelspace                 |                 |                    |
>                     +-------+-----+           |                    |
>                     |             |           |                    |
>                     |  alsa core  |           |                    |
>                     |             |           |                    |
>                     +-------+-----+           |                    |
>                             |                 |                    |
>                     +-------+-----------------+------+             |
>                     |                                |     +-------+-----+
>                     |    snd_avb          v4l2_avb   |     |             |
>                     |                                |     |  ConfigFS   |
>                     |       avb_media_driver         |     |             |
>                     |                                |     +-------+-----+
>                     +-------------+------------+-----+             |
>                                   |            |           +-------+-----+
> +---------------+          +------+------+     |           |             |
> |               +----------+             +-----------------|  avb_config |
> |    time       |          |   avb_core  |     |           |             |
> |               +-----+    |             |     |           +-------------+
> +-------+-------+     |    +------+------+     |
>         |             |           |            |
>         |             |           |            |
> +-------+-------+     |           |            |
> |               |     |           |            |
> |  media_clock  |     |    +------+------+     |
> |               |     |    |             |     |
> +---------------+     +----+     net     +-----+
>                            |             |
>                            +-------------+
> 
> 
> 
> 
> So, why in the kernel and not completely in userspace?
> 
> Primarily because we would like to make it as easy as possible to create
> a Talker or a Listener in an AVB domain. Sure, you would need some kind
> of tool to manage the ConfigFS interface and set up the detailed
> configuration, but once that is done, _any_ program on a standard
> GNU/Linux box can use AVB as if it was a regular soundcard. That is a
> real benefit, and what makes it really exciting.
> 
> It is also a bit difficult to associate a physical location to a
> MAC-address. A userspace tool can be configured to remember this, but
> this is not information that belongs in the kernel. This needs to be
> persistent anyway, so setting 00:00:A4.. to be "L&R Speaker in Henrik's
> Den" doesn't really make sense to compile into the kernel.
> 
> Then there is the notion of security. If the kernel triggers on every
> newly discovered device, it is pretty simple to write a metasploit
> plugin that will bring any AVB enabled Linux box to its knees by just
> flooding the network with Announce-messages. Also, I don't necessarily
> want the stream from my computer to my speakers to be accessed by
> someone (tm) on my network.
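
One way to read the two points above (persistent MAC-to-location mapping
and not reacting to every Announce) is a userspace allowlist that decides
which announced devices get a ConfigFS entry at all. The sketch below
assumes a JSON file format and uses a made-up MAC address and helper
names; none of this is part of the RFC.

```python
import json


def load_known_devices(text):
    """Parse a JSON object mapping MAC address -> human-readable label."""
    return {mac.lower(): label for mac, label in json.loads(text).items()}


def should_connect(announced_mac, known_devices):
    """Act only on Announce messages from devices the user has configured."""
    return announced_mac.lower() in known_devices


if __name__ == "__main__":
    # Hypothetical persisted configuration, kept in userspace, not the kernel.
    config = '{"02:00:00:AA:BB:CC": "L&R speakers in the den"}'
    known = load_known_devices(config)
    print(should_connect("02:00:00:aa:bb:cc", known))
    print(should_connect("02:00:00:00:00:01", known))
```

Unknown talkers are simply ignored, so a flood of Announce messages from
unconfigured addresses would never reach the kernel side.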
> 
> I'd greatly appreciate feedback and comments, especially with regards to
> the rough outline and the usage of ConfigFS and ioctls.
> 
> Stay tuned! Once we have something that doesn't crash and burn in the
> most horrible sense, I'll submit a few patches for people to look at. If
> the interest is high, I'll probably create a public repo that I'll
> update more frequently, but with more of the bleeding-part of the edge.
> 
> Thanks!
> 
> 
> 1) http://en.wikipedia.org/wiki/Audio_Video_Bridging
> 2) http://www.avnu.org/
> 3) http://www.slideshare.net/henrikau/avb-v4l2summit
> 4) http://en.wikipedia.org/wiki/IEEE_802.1
> 5) http://www.avnu.org/knowledge_center
> 6) http://events.linuxfoundation.org/sites/events/files/slides/USB%20Gadget%20Configfs%20API_0.pdf

This reminds me of the talk Pierre gave at LPC in San Diego a couple
of years ago.  Although his topic was more about audio time
accounting, wouldn't the framework mentioned at that time fit this
scenario?


Takashi

