[alsa-devel] [RFC] AVB - network-based soundcards in ALSA

Mon May 26 15:03:52 CEST 2014

Hi all!

This is an RFC for a new class of soundcards. I am not very familiar
with how ALSA is tied together underneath the hood, so what you see
here is based on my naive understanding of ALSA. I wear asbestos
underwear on a regular basis, so I prefer honesty over sugarcoating :)

I use "I" and "we" interchangeably. By 'we' I mean a small R&D group at
Cisco Norway, by "I", I mean.. well, me. So, we plan for AVB, I do the
kernel side work. We plan to upstream this, given that the community
accepts it.

Also, I've used my private address as that is set up to track 
kernel-related lists, but added my Cisco-address so please keep that on the 
CC if you reply.

We have recently begun working on Audio Video Bridging (AVB, [1]) and are
looking into how this can be added to the Linux kernel via ALSA and
video4linux.

But first; for those of you who are not familiar with AVB:

In short, AVB is just a set of open standards governing network and
timing configuration so that you can stream audio and video reliably and
with low latency. Note that this is not the kind of service usually
associated with streaming (a few companies distributing movies and
TV-shows come to mind; one rhyming with lightsticks). It is the
kind of streaming you use when connecting a pair of speakers to your
computer - via ethernet. Or a webcam via the wireless network. (I'm
aware of the security implications here, but bear with me).

For the eager reader: AVB is being promoted by the AVnu Alliance [2], which
has a lot of information available. I also added a link to a very short
intro to AVB that Hans held a few weeks back (focus on the network
though) in [3]. Then the IEEE 802.1 working group [4] has a few standards,
but these are probably not that relevant to this list, at least not
right now.

For AVB to work, you need support in the networking infrastructure. This
is not prevalent but it is coming. There are a few manufacturers that
provide AVB ready equipment and some networking gear.

The standards you need for AVB:

* gPTP support (IEEE 802.1AS), this is an IEEE 1588 (PTP) profile for
  AVB. This is needed for accurate timestamping of samples, and all
  nodes in an AVB domain must agree on the _same_ time (note that the
  _correct_ time is not that important in this setting). .1AS should
  give you a <1us error between the clocks for the systems involved.

* Stream Reservation (IEEE 802.1Qat, or 802.1Q:2011 Sec. #35) to make
  sure we have guaranteed bandwidth. This will avoid dropped etherframes
  due to a congested network. It also caps the amount you can reserve at
  75% of total BW, making sure AVB can coexist with normal traffic.

* Traffic Shaping and admission control (IEEE 802.1Qav, or 802.1Q:2011
  Sec. #34) to improve utilization but also avoid/minimize jitter due to
  queues inside switches/routers/bridges.

* IEEE 802.1BA, default configuration for AVB devices and what the
  network looks like.

* IEEE 1722 (and 1733 for layer 3), Layer 2 transport for
  audio/video. The packing is similar to what is done in FireWire.
  Frames are sent at 8kHz for class A and 4kHz for class B, which gives
  relatively few samples per frame. Currently we only look at Layer 2, as
  small peripherals (microphones, speakers) will then only have to
  implement L2 instead of the entire IP stack.

* IEEE 1722.1 Device discovery protocol (AVDECC) defines how Talkers and
  Listeners find each other and connect. Any talker will regularly
  announce its presence, and 1722.1 defines how to announce - and how to
  respond.

Of all these standards, the 802.1BA and 1722 are probably the most
interesting ones. AVnu also has a 'best practice' [5] document that
gives an outline that serves as a nice starting point.
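
To make the numbers in the list above a bit more concrete, here is a
minimal sketch of the arithmetic. It only uses the frame rates (8kHz for
class A, 4kHz for class B) and the 75% reservation cap mentioned above.
The function names are made up for illustration, and the per-frame
overhead is deliberately left as a caller-supplied parameter since no
header sizes are given here.

```python
from fractions import Fraction

# Frame rates quoted in the list above: 8 kHz for class A, 4 kHz for class B.
FRAMES_PER_SECOND = {"A": 8000, "B": 4000}

# Stream reservation caps AVB traffic at 75% of the total link bandwidth.
RESERVABLE_FRACTION = Fraction(3, 4)


def samples_per_frame(sample_rate_hz, sr_class):
    """Average number of samples per channel carried in one frame."""
    return Fraction(sample_rate_hz, FRAMES_PER_SECOND[sr_class])


def stream_bits_per_second(sample_rate_hz, channels, bytes_per_sample,
                           sr_class, overhead_bytes_per_frame):
    """Bandwidth of one stream: audio payload plus per-frame overhead.

    overhead_bytes_per_frame is an assumption supplied by the caller
    (headers, framing); no particular value is implied here.
    """
    payload = sample_rate_hz * channels * bytes_per_sample
    overhead = FRAMES_PER_SECOND[sr_class] * overhead_bytes_per_frame
    return 8 * (payload + overhead)


def fits_reservation(stream_bps, link_bps, already_reserved_bps=0):
    """True if the stream still fits under the 75% reservation cap."""
    return already_reserved_bps + stream_bps <= RESERVABLE_FRACTION * link_bps
```

With 48kHz audio, samples_per_frame(48000, "A") gives 6 samples per
channel per frame and class B gives 12. A rate like 44.1kHz does not
divide evenly (441/80 on average for class A), so the number of samples
per frame has to vary from frame to frame.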

Terminology (brief)
- Bridge: Node in the network with more than 1 port (think switches)
- End-station: Node in the network with 1 port.
- Talkers: End-station that produces media (mic, camera)
- Listeners: End-station that receives from Talkers
- Streams & Channels: A talker creates a stream through the network to a
  Listener. Each stream is composed of 1..N channels whose samples
  are interleaved.
- An end-station can act as both Talker and Listener.
- gPTP domain: set of PTP-capable nodes connected (gPTP will not allow
  non-timeaware nodes in the domain).
- SRP domain: nodes in a network that supports stream reservation.
- AVB Domain: intersection of SRP domain and gPTP domain.
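
Two of the terms above are easy to show in a few lines: how the samples
of 1..N channels end up interleaved in one stream, and the AVB domain as
the intersection of the gPTP and SRP domains. This is only a sketch with
made-up function names; real 1722 packing adds headers and timestamps
that are not modelled here.

```python
def interleave(channels):
    """Merge per-channel sample lists into one interleaved stream."""
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must carry the same number of samples")
    # zip(*channels) yields one tuple per sampling instant: (ch0[i], ch1[i], ...)
    return [sample for instant in zip(*channels) for sample in instant]


def deinterleave(stream, channel_count):
    """Split an interleaved stream back into per-channel sample lists."""
    if channel_count < 1 or len(stream) % channel_count:
        raise ValueError("stream length must be a multiple of channel_count")
    # Every channel_count-th sample belongs to the same channel.
    return [list(stream[i::channel_count]) for i in range(channel_count)]


def avb_domain(gptp_nodes, srp_nodes):
    """Nodes that are both time-aware (gPTP) and support stream reservation."""
    return set(gptp_nodes) & set(srp_nodes)
```

For a stereo stream, interleave([left, right]) gives L0, R0, L1, R1 and
so on, and deinterleave(stream, 2) recovers the two channels.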

To put it in simpler terms, AVB gives you a way to add 'stuff' to your
computer and play music to it via the network.

Moving out into ALSA-land and introducing "The plan":

* A central driver, an "avb_core" if you like. Once loaded it will
  create a configfs directory and start looking at etherframes to see if
  anything of interest comes along. This will be present from the start
  and is required for all the rest to work.

* An "avb_media_driver" to split data going to ALSA and v4l as well as
  combining streams coming back. The easiest way is probably to combine
  snd_avb and the corresponding v4l driver into a single driver, but
  expose it as "snd_avb" to ALSA (and ditto for v4l2).

* A userspace tool for tapping into the AVDECC data (for autodiscovery of
  nodes). Let's call this avdecclib for now (there are a few userspace
  libraries available on github).

* ConfigFS [6] is then used by userspace to spawn a new
  avb_media_driver for each stream we want to connect to.

  Tree-structure will look something like this
  mkdir /config/avb/node0;
  config/
  └── avb
      └── node0
          ├── channels_in
          ├── channels_out
          ├── enable
          └── mac

  (The number of attributes will have to be adjusted as I figure out
  what makes sense to have in the configfs item.)

  Writing 1 to enable will then trigger the negotiating phase and wait
  for the driver to come online. A new ALSA soundcard will then pop into
  existence, which can then be used as any regular soundcard attached to
  the computer.
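
From the userspace side, the sequence above could look roughly like the
sketch below. None of this exists yet: the avb directory, the attribute
names and the meaning of "enable" are taken from the proposed tree and
may change, and configure_node is a hypothetical helper. The root
directory is a parameter, so the logic can be tried against an ordinary
directory.

```python
from pathlib import Path


def configure_node(avb_root, name, mac, channels_in, channels_out):
    """Create one node directory, fill in its attributes, then enable it."""
    node = Path(avb_root) / name
    # On configfs, the mkdir itself is what asks the kernel to create the item.
    node.mkdir(exist_ok=True)

    settings = {
        "mac": mac,
        "channels_in": str(channels_in),
        "channels_out": str(channels_out),
    }
    for attribute, value in settings.items():
        (node / attribute).write_text(value + "\n")

    # Enable last, so negotiation starts with the full configuration in place.
    (node / "enable").write_text("1\n")
    return node
```

Assuming configfs is mounted on /config as in the tree above, a call
would be configure_node("/config/avb", "node0", mac, 2, 2), with mac
being the address of the remote end-station.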

So, an attempt to bring this to life using state of the art ASCII skills

                    +----------------------------------------------------+
                    |                                                    |
                    |                media application                   |
                    |                                                    |
                    +-------+-----------------+--------------------+-----+
                            |                 |                    |
                            |                 |                    |
                    +-------+-----+    +------+------+     +-------+-----+
                    |             |    |             |     |             |
                    |  alsalib    |    |   v4l2lib   |     |  avdecclib  |
                    |             |    |             |     |             |
userspace           +-------+-----+    +------+------+     +-------+-----+
................            |                 |                    |
kernelspace                 |                 |                    |
                    +-------+-----+           |                    |
                    |             |           |                    |
                    |  alsa core  |           |                    |
                    |             |           |                    |
                    +-------+-----+           |                    |
                            |                 |                    |
                    +-------+-----------------+------+             |
                    |                                |     +-------+-----+
                    |    snd_avb          v4l2_avb   |     |             |
                    |                                |     |  ConfigFS   |
                    |       avb_media_driver         |     |             |
                    |                                |     +-------+-----+
                    +-------------+------------+-----+             |
                                  |            |           +-------+-----+
+---------------+          +------+------+     |           |             |
|               +----------+             +-----------------|  avb_config |
|    time       |          |   avb_core  |     |           |             |
|               +-----+    |             |     |           +-------------+
+-------+-------+     |    +------+------+     |
        |             |           |            |
        |             |           |            |
+-------+-------+     |           |            |
|               |     |           |            |
|  media_clock  |     |    +------+------+     |
|               |     |    |             |     |
+---------------+     +----+     net     +-----+
                           |             |
                           +-------------+

So, why in the kernel and not completely in userspace?

Primarily because we would like to make it as easy as possible to create
a Talker or a Listener in an AVB domain. Sure, you would need some kind
of tool to manage the ConfigFS interface and set up the detailed
configuration, but once that is done, _any_ program on a standard
GNU/Linux box can use AVB as if it was a regular soundcard. That is a
real benefit, and what makes it really exciting.

It is also a bit difficult to associate a physical location with a
MAC-address. A userspace tool can be configured to remember this, but
this is not information that belongs in the kernel. This needs to be
persistent anyway, so setting 00:00:A4.. to be "L&R Speaker in Henrik's
Den" doesn't really make sense to compile into the kernel.
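
As an illustration of what "remember this" could mean in a userspace
tool, here is a small sketch that keeps a MAC-to-label mapping in a JSON
file. The file format and all function names are assumptions made up for
this example, not part of any existing tool.

```python
import json
import string
from pathlib import Path


def normalize_mac(mac):
    """Canonical form: six lowercase hex octets separated by colons."""
    octets = mac.replace("-", ":").lower().split(":")
    valid = len(octets) == 6 and all(
        len(octet) == 2 and all(c in string.hexdigits for c in octet)
        for octet in octets
    )
    if not valid:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(octets)


def load_labels(path):
    """Read the persistent MAC -> label mapping; empty if no file exists."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def remember(path, mac, label):
    """Store a human-readable label for one MAC address."""
    labels = load_labels(path)
    labels[normalize_mac(mac)] = label
    Path(path).write_text(json.dumps(labels, indent=2, sort_keys=True))
    return labels
```

The mapping survives reboots because it lives in a file owned by the
tool, and the kernel only ever sees the MAC address.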

Then there is the notion of security. If the kernel triggers on every
newly discovered device, it is pretty simple to write a metasploit
plugin that will bring any AVB enabled Linux box to its knees by just
flooding the network with Announce-messages. Also, I don't necessarily
want the stream from my computer to my speakers to be accessed by
someone (tm) on my network.
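
One way a userspace tool could guard against such a flood is to act only
on announcements from explicitly approved devices, and to rate-limit
even those. The sketch below is an assumption about how that might be
structured; the class name and parameters are hypothetical, and parsing
the actual 1722.1 messages is not shown.

```python
class AnnounceFilter:
    """Decide in userspace which announcements may lead to a new node."""

    def __init__(self, allowed_macs, min_interval_s=1.0):
        self.allowed = {mac.lower() for mac in allowed_macs}
        self.min_interval_s = min_interval_s
        # Only approved devices are tracked, so a flood of unknown
        # addresses cannot make this table grow.
        self.last_accepted = {}

    def accept(self, mac, now_s):
        """Return True if this announcement should be acted upon."""
        mac = mac.lower()
        if mac not in self.allowed:
            return False
        last = self.last_accepted.get(mac)
        if last is not None and now_s - last < self.min_interval_s:
            return False
        self.last_accepted[mac] = now_s
        return True
```

Announcements that pass the filter are the only ones that would go on to
touch ConfigFS, so the kernel never reacts to a device the user has not
approved.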

I'd greatly appreciate feedback and comments, especially with regard to
the rough outline and the usage of ConfigFS and ioctls.

Stay tuned! Once we have something that doesn't crash and burn in the
most horrible sense, I'll submit a few patches for people to look at. If
the interest is high, I'll probably create a public repo that I'll
update more frequently, but with more of the bleeding-part of the edge.

Thanks!

[1] http://en.wikipedia.org/wiki/Audio_Video_Bridging
[2] http://www.avnu.org/
[3] http://www.slideshare.net/henrikau/avb-v4l2summit
[4] http://en.wikipedia.org/wiki/IEEE_802.1
[5] http://www.avnu.org/knowledge_center
[6] http://events.linuxfoundation.org/sites/events/files/slides/USB%20Gadget%20Configfs%20API_0.pdf

-- 
Henrik Austad