[alsa-devel] [RFC] AVB - network-based soundcards in ALSA

Tue May 27 16:36:27 CEST 2014

At Mon, 26 May 2014 15:03:52 +0200,
Henrik Austad wrote:
> 
> Hi all!
> 
> This is an RFC for a new class of soundcards. I am not very familiar
> with how ALSA is tied together underneath the hood, so what you see
> here, is based on my naive understanding of ALSA. I wear asbestos
> underwear on a regular basis, so I prefer honesty over sugarcoating :)
> 
> I use "I" and "we" interchangeably. By 'we' I mean a small R&D group at
> Cisco Norway, by "I", I mean.. well, me. So, we plan for AVB, I do the
> kernel side work. We plan to upstream this, given that the community
> accepts it.
> 
> Also, I've used my private address as that is set up to track 
> kernel-related lists, but added my Cisco-address so please keep that on the 
> CC if you reply.
> 
> We have recently begun working on Audio Video Bridging (AVB, [1]) and are
> looking into how this can be added to the Linux kernel via ALSA and
> video4linux.
> 
> But first; for those of you who are not familiar with AVB:
> 
> In short, AVB is just a set of open standards governing network and
> timing configuration so that you can stream audio and video reliably and
> with low latency. Note that this is not the kind of service usually
> associated with streaming (a few companies distributing movies and
> TV shows come to mind; one rhymes with lightsticks). It is the
> kind of streaming you use when connecting a pair of speakers to your
> computer - via ethernet. Or a webcam via the wireless network. (I'm
> aware of the security implications here, but bear with me).
> 
> For the eager reader, AVB is being promoted by the AVnu Alliance [2], which
> has a lot of information available. I also added a link to a very short
> intro to AVB that Hans gave a few weeks back (focus on the network
> though) in [3]. Then the IEEE 802.1 working group [4] has a few standards,
> but these are probably not that relevant to this list, at least not
> right now.
> 
> For AVB to work, you need support in the networking infrastructure. This
> is not prevalent but it is coming. There are a few manufacturers that
> provide AVB ready equipment and some networking gear.
> 
> The standards you need for AVB:
> 
> * gPTP support (IEEE 802.1AS), this is an IEEE 1588 (PTP) profile for
>   AVB. This is needed for accurate timestamping of samples, and all
>   nodes in an AVB domain must agree on the _same_ time (note that the
>   _correct_ time is not that important in this setting). .1AS should
>   give you a <1us error between the clocks for the systems involved.
> 
> * Stream Reservation (IEEE 802.1Qat, or 802.1Q:2011 Sec. #35) to make
>   sure we have guaranteed bandwidth. This will avoid dropped etherframes
>   due to congested network. It also caps the amount you can reserve to
>   75% of total BW, making sure AVB can coexist with normal traffic.
> 
> * Traffic Shaping and admission control (IEEE 802.1Qav, or 802.1Q:2011
>   Sec. #34) to improve utilization but also avoid/minimize jitter due to
>   queues inside switches/routers/bridges.
> 
> * IEEE 802.1BA, default configuration for AVB devices and what the
>   network looks like.
> 
> * IEEE 1722 (and 1733 for layer-3) Layer 2 Transport for
>   audio/video. The packing is similar to what is done in Firewire.
>   Frames are sent at 8kHz for class A and 4kHz for class B, which gives
>   relatively few samples per frame. Currently we only look at Layer 2 as
>   small peripherals (microphones, speakers) will only have to implement
>   L2 instead of the entire IP-stack.
> 
> * IEEE 1722.1 Device discovery protocol (AVDECC) defines how Talkers and
>   Listeners find each other and connect. Any talker will regularly
>   announce its presence, and 1722.1 defines how to announce - and how to
>   respond.
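
As a rough illustration of the frame-rate and bandwidth figures in the
list above, here is a minimal Python sketch. The 4-byte sample container
and the function names are assumptions made for the example, and the
bandwidth figure counts audio payload only (no Ethernet or 1722 header
overhead), so treat the numbers as a lower bound.

```python
# Sketch only: names and the 4-byte sample container are assumptions.
CLASS_FRAME_RATE_HZ = {"A": 8000, "B": 4000}  # frames per second per class


def samples_per_frame(sample_rate_hz, avb_class):
    """Worst-case samples per channel carried in one frame."""
    frame_rate = CLASS_FRAME_RATE_HZ[avb_class]
    # Ceiling division: rates such as 44100 Hz do not divide evenly.
    return -(-sample_rate_hz // frame_rate)


def payload_bits_per_second(sample_rate_hz, channels, avb_class,
                            bytes_per_sample=4):
    """Audio payload bandwidth, ignoring all header overhead."""
    frame_rate = CLASS_FRAME_RATE_HZ[avb_class]
    per_frame_bytes = (samples_per_frame(sample_rate_hz, avb_class)
                       * channels * bytes_per_sample)
    return per_frame_bytes * 8 * frame_rate


def fits_reservation(stream_bps, link_bps, cap=0.75):
    """True if the stream stays within the 75% reservation cap."""
    return stream_bps <= cap * link_bps


if __name__ == "__main__":
    bps = payload_bits_per_second(48000, 2, "A")
    print(samples_per_frame(48000, "A"), bps,
          fits_reservation(bps, 100_000_000))
```

For a 48 kHz stereo stream in class A this gives 6 samples per channel
per frame, which is what "relatively few samples per frame" means in
practice.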
> 
> Of all these standards, the 802.1BA and 1722 are probably the most
> interesting ones. AVnu also has a 'best practice' [5] document that
> gives an outline that serves as a nice starting point.
> 
> Terminology (brief)
> - Bridge: Node in the network with more than 1 port (think switches)
> - End-station: Node in the network with 1 port.
> - Talkers: End-stations that produce media (mic, camera)
> - Listeners: End-stations that receive from Talkers
> - Streams & Channels: A talker creates a stream through the network to a
>   Listener. Each stream is composed of 1..N channels where each sample
>   is interleaved.
> - An end-station can act as both Talker and Listener.
> - gPTP domain: set of PTP-capable nodes connected (gPTP will not allow
>   non-timeaware nodes in the domain).
> - SRP domain: nodes in a network that supports stream reservation.
> - AVB Domain: intersection of SRP domain and gPTP domain.
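
To make the "1..N channels where each sample is interleaved" entry above
concrete, here is a small Python sketch of interleaving and
de-interleaving. It only shows the sample ordering; the function names
are made up for the example and nothing here reflects the actual 1722
packet layout.

```python
# Sketch only: illustrates channel interleaving, not the 1722 wire format.

def interleave(channels):
    """Turn per-channel sample lists into one interleaved list.

    [[L0, L1], [R0, R1]] -> [L0, R0, L1, R1]
    """
    if len({len(c) for c in channels}) > 1:
        raise ValueError("all channels must have the same length")
    return [sample for frame in zip(*channels) for sample in frame]


def deinterleave(samples, n_channels):
    """Split an interleaved list back into per-channel lists."""
    if n_channels < 1 or len(samples) % n_channels:
        raise ValueError("sample count must be a multiple of n_channels")
    return [samples[i::n_channels] for i in range(n_channels)]


if __name__ == "__main__":
    left, right = [1, 2, 3], [10, 20, 30]
    stream = interleave([left, right])
    print(stream)
    print(deinterleave(stream, 2))
```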
> 
> To put it in simpler terms, AVB gives you a way to add 'stuff' to your
> computer and play music to it via the network.
> 
> Moving out into ALSA-land and introducing "The plan":
> 
> * A central driver, an "avb_core" if you like. Once loaded it will
>   create a configfs directory and start looking at etherframes to see if
>   anything of interest comes along. This will be present from the start
>   and is required for all the rest to work.
> 
> * An "avb_media_driver" to split data going to ALSA and v4l as well as
>   combining streams coming back. The easiest way is probably to combine
>   snd_avb and the corresponding v4l driver into a single driver, but
>   expose it as "snd_avb" to ALSA (and ditto for v4l2).
> 
> * A userspace tool for tapping into the AVDECC data (for autodiscovery of
>   nodes). Let's call this avdecclib for now (there are a few userspace
>   libraries available on github).
> 
> * ConfigFS [6] is then used by userspace to spawn a new
>   avb_media_driver for each stream we want to connect to.
> 
>   Tree-structure will look something like this
>   mkdir /config/avb/node0;
>   config/
>   └── avb
>       └── node0
>           ├── channels_in
>           ├── channels_out
>           ├── enable
>           └── mac
> 
>   (the number of attributes will have to be adjusted as I figure out
>   what makes sense to have in the configfs item).
> 
>   Writing 1 to enable will then trigger the negotiating phase and wait
>   for the driver to come online. A new ALSA soundcard will then pop into
>   existence, which can then be used as any regular soundcard attached to
>   the computer.
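
A userspace tool driving the proposed tree could look roughly like the
Python sketch below. The interface is only a proposal at this point, so
everything here is hypothetical: the attribute names are taken from the
tree above, while the function names, the example MAC address and the
idea of writing "enable" last are assumptions made for the example. The
root directory is a parameter so the sketch can be tried against a
scratch directory.

```python
# Sketch only: the avb configfs layout is proposed, not an existing API.
import os


def write_attr(node_dir, attr, value):
    """Write one attribute file, the way `echo value > attr` would."""
    with open(os.path.join(node_dir, attr), "w") as f:
        f.write(f"{value}\n")


def configure_node(root, name, mac, channels_in, channels_out):
    """Create <root>/avb/<name>, fill in its attributes, then enable it."""
    node_dir = os.path.join(root, "avb", name)
    # On a real configfs mount, mkdir is what asks the kernel for a new item.
    os.makedirs(node_dir, exist_ok=True)
    write_attr(node_dir, "mac", mac)
    write_attr(node_dir, "channels_in", channels_in)
    write_attr(node_dir, "channels_out", channels_out)
    # Assumption: enable goes last so negotiation sees a complete config.
    write_attr(node_dir, "enable", 1)
    return node_dir


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as scratch:
        # Placeholder MAC address, not a real device.
        node = configure_node(scratch, "node0", "00:11:22:33:44:55", 0, 2)
        print(sorted(os.listdir(node)))
```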
> 
> So, an attempt to bring this to life using state of the art ASCII skills
> 
>                     +----------------------------------------------------+
>                     |                                                    |
>                     |                media application                   |
>                     |                                                    |
>                     +-------+-----------------+--------------------+-----+
>                             |                 |                    |
>                             |                 |                    |
>                     +-------+-----+    +------+------+     +-------+-----+
>                     |             |    |             |     |             |
>                     |  alsalib    |    |   v4l2lib   |     |  avdecclib  |
>                     |             |    |             |     |             |
> userspace           +-------+-----+    +------+------+     +-------+-----+
> ................            |                 |                    |
> kernelspace                 |                 |                    |
>                     +-------+-----+           |                    |
>                     |             |           |                    |
>                     |  alsa core  |           |                    |
>                     |             |           |                    |
>                     +-------+-----+           |                    |
>                             |                 |                    |
>                     +-------+-----------------+------+             |
>                     |                                |     +-------+-----+
>                     |    snd_avb          v4l2_avb   |     |             |
>                     |                                |     |  ConfigFS   |
>                     |       avb_media_driver         |     |             |
>                     |                                |     +-------+-----+
>                     +-------------+------------+-----+             |
>                                   |            |           +-------+-----+
> +---------------+          +------+------+     |           |             |
> |               +----------+             +-----------------|  avb_config |
> |    time       |          |   avb_core  |     |           |             |
> |               +-----+    |             |     |           +-------------+
> +-------+-------+     |    +------+------+     |
>         |             |           |            |
>         |             |           |            |
> +-------+-------+     |           |            |
> |               |     |           |            |
> |  media_clock  |     |    +------+------+     |
> |               |     |    |             |     |
> +---------------+     +----+     net     +-----+
>                            |             |
>                            +-------------+
> 
> 
> 
> 
> So, why in the kernel and not completely in userspace?
> 
> Primarily because we would like to make it as easy as possible to create
> a Talker or a Listener in an AVB domain. Sure, you would need some kind
> of tool to manage the ConfigFS interface and set up the detailed
> configuration, but once that is done, _any_ program on a standard
> GNU/Linux box can use AVB as if it was a regular soundcard. That is a
> real benefit, and what makes it really exciting.
> 
> It is also a bit difficult to associate a physical location to a
> MAC-address. A userspace tool can be configured to remember this, but
> this is not information that belongs in the kernel. This needs to be
> persistent anyway, so setting 00:00:A4.. to be "L&R Speaker in Henrik's
> Den" doesn't really make sense to compile into the kernel.
> 
> Then there is the notion of security. If the kernel triggers on every
> newly discovered device, it is pretty simple to write a metasploit
> plugin that will bring any AVB enabled Linux box to its knees by just
> flooding the network with Announce-messages. Also, I don't necessarily
> want the stream from my computer to my speakers to be accessed by
> someone (tm) on my network.
> 
> I'd greatly appreciate feedback and comments, especially with regards to
> the rough outline and the usage of ConfigFS and ioctls.
> 
> Stay tuned! Once we have something that doesn't crash and burn in the
> most horrible sense, I'll submit a few patches for people to look at. If
> the interest is high, I'll probably create a public repo that I'll
> update more frequently, but with more of the bleeding-part of the edge.
> 
> Thanks!
> 
> 
> 1) http://en.wikipedia.org/wiki/Audio_Video_Bridging
> 2) http://www.avnu.org/
> 3) http://www.slideshare.net/henrikau/avb-v4l2summit
> 4) http://en.wikipedia.org/wiki/IEEE_802.1
> 5) http://www.avnu.org/knowledge_center
> 6) http://events.linuxfoundation.org/sites/events/files/slides/USB%20Gadget%20Configfs%20API_0.pdf

This reminds me of the talk Pierre gave at LPC in San Diego a couple
of years ago.  Although his topic was more about audio time
accounting, wouldn't the framework mentioned at that time fit this
scenario?

Takashi