[alsa-devel] [very-RFC 0/8] TSN driver for the kernel

Sun Jun 19 00:45:50 CEST 2016

On Sat, Jun 18, 2016 at 02:22:13PM +0900, Takashi Sakamoto wrote:
> Hi,

Hi Takashi,

You raise a lot of valid points and questions, I'll try to answer them.

Edit: this turned out to be a somewhat lengthy answer. I have tried to 
shorten it in places. It is getting late and I'm getting increasingly 
incoherent (Richard probably knows what I'm talking about ;) so I'll stop 
for now.

Please post a follow-up with everything that's not clear!
Thanks!

> Sorry to be late. On weekdays I have little time for this thread
> because I'm working on alsa-lib[1]. Besides, I'm not a full-time
> developer for this kind of work. In short, I use my limited private
> time for this discussion.

Thank you for taking the time to reply to this thread then, it is much 
appreciated.

> On Jun 15 2016 17:06, Richard Cochran wrote:
> > On Wed, Jun 15, 2016 at 12:15:24PM +0900, Takashi Sakamoto wrote:
> >>> On Mon, Jun 13, 2016 at 01:47:13PM +0200, Richard Cochran wrote:
> >>>> I have seen audio PLL/multiplier chips that will take, for example, a
> >>>> 10 kHz input and produce your 48 kHz media clock.  With the right HW
> >>>> design, you can tell your PTP Hardware Clock to produce a 10000 PPS,
> >>>> and you will have a synchronized AVB endpoint.  The software is all
> >>>> there already.  Somebody should tell the ALSA guys about it.
> >>
> >> Just from my curiosity, could I ask you more explanation for it in ALSA
> >> side?
> > 
> > (Disclaimer: I really don't know too much about ALSA, except that it is
> > fairly big and complex ;)
> 
> This morning, I read IEEE 1722:2011 and realized that it refers to
> IEC 61883-1/6 only quite roughly and leaves many ambiguities for end
> applications.

As far as I know, 1722 aims to describe how the data is wrapped in an 
AVTPDU (and likewise for control data), not how the end-station should 
implement it.

If there are ambiguities, would you mind listing a few? It would serve as a 
useful guide when looking for other pitfalls as well (thanks!)

> (In my opinion, the author just focuses on packets with timestamps,
> without enough consideration of how to implement endpoint applications
> which perform semi-real-time sampling, fetching, queueing and so on, and
> so do you. They're satisfied just by handling packets with timestamps,
> without enough consideration of actual hardware/software applications.)

You are correct, none of the standards explain exactly how it should be 
implemented, only what the end result should look like. One target of this 
collection of standards is embedded, dedicated AV equipment, and the 
authors have no way of knowing (nor should they care, I think) the 
underlying architecture of these.

> > Here is what I think ALSA should provide:
> > 
> > - The DA and AD clocks should appear as attributes of the HW device.

This would be very useful when determining whether the HW clock is falling 
behind or racing ahead of the gPTP time domain. It will also help in 
finding the capture time, or in calculating when a sample in the buffer 
will be played back by the device.
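A minimal sketch of that last calculation, assuming the driver can report a PHC timestamp together with the number of frames still queued ahead of the application's write pointer (similar in spirit to alsa-lib's snd_pcm_delay()), and assuming the DA clock runs at exactly its nominal rate. The function name is hypothetical, not an existing ALSA API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the PHC time (ns) at which a frame reaches
 * the D/A converter, assuming the DA clock runs exactly at rate_hz.
 *   ref_ns       - PHC time at which delay_frames was sampled
 *   delay_frames - frames queued ahead of the write pointer at ref_ns
 *   offset       - index of the frame of interest, counted from the
 *                  write pointer
 *   rate_hz      - nominal sample rate
 */
static uint64_t sample_play_time_ns(uint64_t ref_ns, uint64_t delay_frames,
                                    uint64_t offset, uint32_t rate_hz)
{
	assert(rate_hz > 0);
	/* Frames that must be played before ours, converted to ns. */
	uint64_t frames = delay_frames + offset;
	return ref_ns + frames * 1000000000ULL / rate_hz;
}
```

With 480 frames queued at 48kHz, the next frame written plays 10 ms after the reference time. Any drift between the DA clock and the PHC makes this estimate wrong over time, which is why the rate measurement below matters.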

> > - There should be a method for measuring the DA/AD clock rate with
> >   respect to both the system time and the PTP Hardware Clock (PHC)
> >   time.

As above.
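A minimal sketch of such a measurement, assuming the hardware can provide (frame counter, PHC time) pairs read close enough together to be treated as simultaneous; how to obtain them is hardware specific. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One observation: frames processed by the DA/AD, paired with the PHC
 * time at which the counter was read (assumed to be read atomically). */
struct clock_obs {
	uint64_t frames;
	uint64_t phc_ns;
};

/* DA/AD clock offset in parts per million relative to the nominal rate,
 * measured against the PHC. Positive means the audio clock runs fast
 * compared with gPTP time. */
static double audio_clock_ppm(struct clock_obs a, struct clock_obs b,
                              double nominal_hz)
{
	assert(b.phc_ns > a.phc_ns && b.frames >= a.frames && nominal_hz > 0.0);
	double elapsed_s = (double)(b.phc_ns - a.phc_ns) / 1e9;
	double measured_hz = (double)(b.frames - a.frames) / elapsed_s;
	return (measured_hz / nominal_hz - 1.0) * 1e6;
}
```

For example, 480048 frames counted over 10 s of PHC time at a nominal 48kHz corresponds to +100 ppm. Measuring against system time instead would use the same arithmetic with CLOCK_MONOTONIC timestamps.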

> > - There should be a method for adjusting the DA/AD clock rate if
> >   possible.  If not, then ALSA should fall back to sample rate
> >   conversion.

This is not a requirement from the standard, but will help avoid costly 
resampling. At least it should be possible to detect the *need* for 
resampling so that we can try to avoid underruns.
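One way to detect that need, sketched under the assumption that the clock offset in ppm is already known (for instance from a measurement like the one above) and that the application knows how much buffer slack it has. The function name and the "horizon" parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: would a clock offset of `ppm` between the local
 * DA/AD and the gPTP-derived stream rate consume `margin_frames` of
 * buffer slack within `horizon_s` seconds? If so, and the DA/AD clock
 * cannot be adjusted, the stream has to be resampled to avoid an
 * underrun or overrun (which one depends on the sign of the offset). */
static bool needs_resampling(double ppm, double nominal_hz,
                             double margin_frames, double horizon_s)
{
	assert(nominal_hz > 0.0 && margin_frames >= 0.0 && horizon_s >= 0.0);
	double abs_ppm = ppm < 0.0 ? -ppm : ppm;
	double drift_frames_per_s = abs_ppm * 1e-6 * nominal_hz;
	return drift_frames_per_s * horizon_s > margin_frames;
}
```

At 48kHz a 100 ppm offset drifts by 4.8 frames per second, so 48 frames of slack last about 10 seconds.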

> > - There should be a method to determine the time delay from the point
> >   when the audio data are enqueued into ALSA until they pass through
> >   the D/A converter.  If this cannot be known precisely, then the
> >   library should provide an estimate with an error bound.
> > 
> > - I think some AVB use cases will need to know the time delay from A/D
> >   until the data are available to the local application.  (Distributed
> >   microphones?  I'm not too sure about that.)

Yes. If you have multiple microphones whose signals you want to combine 
into a stream and run signal processing on, some cases require sample-sync 
(i.e. within 1 us accuracy at 48kHz).
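To put the 1 us figure in perspective, a small helper (hypothetical name) that expresses a timing error as a fraction of one sample period:

```c
#include <assert.h>

/* Express a timing error in nanoseconds as a fraction of one sample
 * period at the given sample rate. */
static double ns_to_samples(double ns, double rate_hz)
{
	assert(rate_hz > 0.0);
	return ns * rate_hz / 1e9;
}
```

At 48kHz one sample period is about 20.8 us, so a 1 us error is roughly 0.048 of a sample.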

> > - If the DA/AD clocks are connected to other clock devices in HW,
> >   there should be a way to find this out in SW.  For example, if SW
> >   can see the PTP-PHC-PLL-DA relationship from the above example, then
> >   it knows how to synchronize the DA clock using the network.
> > 
> >   [ Implementing this point involves other subsystems beyond ALSA.  It
> >     isn't really necessary for people designing AVB systems, since
> >     they know their designs, but it would be nice to have for writing
> >     generic applications that can deal with any kind of HW setup. ]
> 
> Depends on which subsystem decides "AVTP presentation time"[3]. 

Presentation time is set by one of:
a) the local sound card performing capture (in which case it will be the
   'capture time');
b) a local media application sending a stream across the network
   (the time when the sample should be played out remotely);
c) a remote media application streaming data *to* the host, in which case
   it will be the local presentation time on the local sound card.

> This value is dominant to the number of events included in an IEC 61883-1 
> packet. If this TSN subsystem decides it, most of these items don't need 
> to be in ALSA.

Not sure if I understand this correctly.

TSN should have a reference to the timing-domain of each *local* 
sound-device (for local capture or playback) as well as the shared 
time-reference provided by gPTP.

Unless an End-station acts as GrandMaster for the gPTP domain, the time set 
forth by gPTP is immutable and cannot be adjusted. It follows that either 
the sample frequency of the local audio devices must be adjusted, or the 
audio streams to/from said devices must be resampled.

> As far as I know, the number of AVTPDUs per second is not fixed. So
> each application cannot calculate the timestamp on its own unless the
> TSN implementation gives that information to each application.

Before initiating a stream, an application needs to reserve a path and 
bandwidth through the network. Every bridge (switch/router) along the path 
must accept the reservation for the stream allocation to succeed; if a 
single bridge along the way declines, the entire stream is denied. The 
StreamID, combined with the traffic class and destination address, is used 
to uniquely identify the stream.
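For reference, the 64-bit StreamID is formed from the Talker's 48-bit MAC address followed by a 16-bit unique ID that the Talker assigns to each stream it sources. A minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit StreamID: Talker MAC address in the upper 48 bits,
 * a Talker-chosen 16-bit unique ID in the lower 16 bits. */
static uint64_t make_stream_id(const uint8_t mac[6], uint16_t unique_id)
{
	uint64_t id = 0;

	assert(mac != 0);
	for (int i = 0; i < 6; i++)
		id = (id << 8) | mac[i];
	return (id << 16) | unique_id;
}
```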

Once ready, frames leaving the End-station with the same StreamID will be 
forwarded through the bridges to the End-station(s).

If you choose to transmit *less* than the bandwidth you reserved, that is 
fine, but you cannot transmit *more*.

As for timestamps: when a Talker transmits a frame, the timestamp in the 
AVTPDU describes the presentation time.

1) The Talker is a mic, and the timestamp will then be the capture-time 
   of the sample.
2) For a Listener, the timestamp will be the presentation-time, 
   the time when the *first* sample in the sample-set should be played (or 
   aligned in an offline format with other samples).
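Note that the timestamp field in the AVTPDU is only 32 bits wide: it carries gPTP time in nanoseconds modulo 2^32, so it wraps roughly every 4.3 s. A minimal sketch of the conversion and of a wrap-aware comparison; the helper names are hypothetical, and the signed cast assumes two's complement, as on every compiler Linux supports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lower 32 bits of gPTP time in ns, i.e. gPTP time modulo 2^32. */
static uint32_t avtp_timestamp(uint64_t gptp_ns)
{
	return (uint32_t)gptp_ns;
}

/* Signed difference a - b across wraparound; only meaningful while the
 * true difference is within about +/-2.1 s. */
static int32_t avtp_ts_diff(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b);
}

/* Has (32-bit) time `now` reached timestamp `ts`? */
static bool avtp_ts_reached(uint32_t now, uint32_t ts)
{
	assert(sizeof(now) == 4);
	return avtp_ts_diff(now, ts) >= 0;
}
```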

The application should be part of the same gPTP-domain as all the other 
nodes in the domain, and all the nodes share a common sense of time. That 
means that time X will be the exact same time (or, within a sub-microsecond 
error) for all the nodes in the same domain.

> For your information, in current ALSA implementation of IEC 61883-1/6 on
> IEEE 1394 bus, the presentation timestamp is decided in ALSA side. The
> number of isochronous packets transmitted per second is fixed at 8,000 in
> IEEE 1394, and the number of data blocks in an IEC 61883-1 packet is
> deterministic according to 'sampling transfer frequency' in IEC 61883-6
> and isochronous cycle count passed from Linux FireWire subsystem.

For an audio stream, it will be very similar. The difference is the split 
between class A and class B: the former has an 8kHz frame rate and a 
guaranteed 2ms latency across the network (think required buffering at 
end-stations), while class B has a 4kHz frame rate and a 50ms maximum 
latency. Class B is used for paths traversing one or two wireless links.

If you look at the avb-shim in the series, you will see that for 48kHz, 
2ch, S16_LE, every frame is the same size: 6 samples per channel per 
frame, for a total of 24 bytes per frame. For class B, the size doubles to 
48 bytes, as frames are transmitted 4000 times per second.
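The arithmetic above, as a small sketch. It covers only the audio payload (no AVTP or Ethernet headers) and assumes the sample rate is an exact multiple of the class packet rate; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second for each SR class, as described above. */
enum sr_class { SR_CLASS_A = 8000, SR_CLASS_B = 4000 };

/* Frames (one sample per channel) carried in each packet. */
static uint32_t frames_per_packet(uint32_t rate_hz, enum sr_class cls)
{
	assert(rate_hz % (uint32_t)cls == 0);
	return rate_hz / (uint32_t)cls;
}

/* Audio payload bytes per packet. */
static uint32_t payload_bytes(uint32_t rate_hz, enum sr_class cls,
                              uint32_t channels, uint32_t bytes_per_sample)
{
	return frames_per_packet(rate_hz, cls) * channels * bytes_per_sample;
}
```

For 48kHz, 2 channels of S16_LE this gives 6 frames and 24 bytes per class A packet, and 12 frames and 48 bytes per class B packet.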

The 44.1kHz case is a bit more painful/messy/horrible, but it is doable 
because the stream reservation only gives an *upper* bound on bandwidth.
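At 44.1kHz with class A, 44100 / 8000 = 5.5125 frames per packet, so packets have to alternate between 5 and 6 frames while the reservation is sized for the 6-frame maximum. A minimal sketch using a simple remainder accumulator; this is one possible way to spread the frames, not necessarily what the avb-shim does, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct pkt_spreader {
	uint32_t rate_hz;   /* audio sample rate */
	uint32_t pkt_rate;  /* packets per second (8000 for class A) */
	uint32_t acc;       /* leftover frame credit, scaled by pkt_rate */
};

/* Frames to put in the next packet. */
static uint32_t next_packet_frames(struct pkt_spreader *s)
{
	assert(s->pkt_rate > 0);
	s->acc += s->rate_hz;
	uint32_t n = s->acc / s->pkt_rate;
	s->acc -= n * s->pkt_rate;
	return n;
}

/* Upper bound to reserve bandwidth for: ceil(rate / pkt_rate). */
static uint32_t max_frames_per_packet(uint32_t rate_hz, uint32_t pkt_rate)
{
	return (rate_hz + pkt_rate - 1) / pkt_rate;
}

/* Total frames sent over one second of packets; equals rate_hz when the
 * accumulator starts at zero. */
static uint32_t frames_per_second(uint32_t rate_hz, uint32_t pkt_rate)
{
	struct pkt_spreader s = { rate_hz, pkt_rate, 0 };
	uint32_t max = max_frames_per_packet(rate_hz, pkt_rate);
	uint32_t total = 0;

	for (uint32_t i = 0; i < pkt_rate; i++) {
		uint32_t n = next_packet_frames(&s);
		assert(n <= max);  /* never exceeds the reserved size */
		total += n;
	}
	return total;
}
```

The first two class A packets at 44.1kHz carry 5 and 6 frames, and over one second exactly 44100 frames are sent.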

> In the TSN subsystem, like FireWire subsystem, callback for filling
> payload should have information of 'when the packet is scheduled to be
> transmitted'. 

[ Given that you are part of a gPTP domain and that you share a common 
  sense of what time it is *now* with all the other devices ]

A frame should be transmitted so that it does not arrive too late to be 
presented. A class A link guarantees that a frame will be delivered within 
2ms. So, by taking the timestamp and subtracting the maximum delivery time, 
you get the latest time at which the frame must be sent.
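A minimal sketch of that subtraction, using full 64-bit gPTP time and the maximum transit times quoted above for each class. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Maximum transit times quoted above, in ns. */
#define CLASS_A_MAX_TRANSIT_NS  2000000ULL  /* 2 ms */
#define CLASS_B_MAX_TRANSIT_NS 50000000ULL  /* 50 ms */

/* Latest gPTP time at which a frame can leave the Talker and still be
 * guaranteed to arrive before its presentation time. */
static uint64_t latest_tx_time_ns(uint64_t presentation_ns,
                                  uint64_t max_transit_ns)
{
	assert(presentation_ns >= max_transit_ns);
	return presentation_ns - max_transit_ns;
}

/* True if it is already too late to send the frame at time now_ns. */
static bool frame_is_late(uint64_t now_ns, uint64_t presentation_ns,
                          uint64_t max_transit_ns)
{
	return now_ns > latest_tx_time_ns(presentation_ns, max_transit_ns);
}
```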

> With the information, each application can calculate the
> number of events in the packet and the presentation timestamp. Of course,
> this timestamp should be handled as 'avtp_timestamp' in packet queueing.

Not sure if I understand what you are asking, but I think I may have 
answered this above (re. 48kHz, 44.1kHz and the upper bound on frame size?)

> >> In ALSA, sampling rate conversion should be in userspace, not in kernel
> >> land. In alsa-lib, sampling rate conversion is implemented in shared object.
> >> When userspace applications start playbacking/capturing, depending on PCM
> >> node to access, these applications load the shared object and convert PCM
> >> frames from buffer in userspace to mmapped DMA-buffer, then commit them.
> > 
> > The AVB use case places an additional requirement on the rate
> > conversion.  You will need to adjust the frequency on the fly, as the
> > stream is playing.  I would guess that ALSA doesn't have that option?
> 
> In ALSA kernel/userspace interfaces, the specification cannot be
> supported, at all.
> 
> Please explain about this requirement, where it comes from, which
> specification and clause describe it (802.1AS or 802.1Q?). As long as I
> read IEEE 1722, I cannot find such a requirement.

1722 only describes how the L2 frames are constructed and transmitted. You 
are correct that it does not mention adjustable clocks.

- 802.1BA gives an overview of AVB.

- 802.1Q-2011 Sec 34 and 35 describe forwarding and queueing, and Stream 
  Reservation (basically what the network needs in order to correctly 
  prioritize TSN streams).

- 802.1AS-2011 (gPTP) describes the timing in great detail (from a PTP 
  point of view), including how the clocks should be syntonized 
  (802.1AS-2011, 7.3.3).

Since the clock that drives the sample-rate for the DA/AD must be 
controlled by the shared clock, the fact that gPTP can adjust the time 
means that the DA/AD circuit needs to be adjustable as well.

Note that an adjustable sample clock is not a *requirement*, but in general 
you'd want to avoid resampling in software.

> (When considering about actual hardware codecs, on-board serial bus such
> as Inter-IC Sound, corresponding controller, immediate change of
> sampling rate is something imaginary for semi-realtime applications. And
> the idea has no meaning for typical playback/capture softwares.)

Yes and no. When you play back a stored file on your sound card, data is 
pulled by the card from memory, so you only have a single timing domain to 
worry about. So in normal scenarios I'd agree: you don't have to worry 
about it.

When you send a stream across the network, you cannot let the Listener 
pull data from you; you need a common sense of time in order to send just 
enough data, and that is why the gPTP domain is so important.

802.1Q gives you low latency through the network, but more importantly, no 
dropped frames. gPTP gives you a central reference to time.

> [1] [alsa-lib][PATCH 0/9 v3] ctl: add APIs for control element set
> http://mailman.alsa-project.org/pipermail/alsa-devel/2016-June/109274.html
> [2] IEEE 1722-2011
> http://ieeexplore.ieee.org/servlet/opac?punumber=5764873
> [3] 5.5 Timing and Synchronization
> op. cit.
> [4] 1394 Open Host Controller Interface Specification
> http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/ohci_11.pdf

I hope this clears up some of the questions.

-- 
Henrik Austad