[alsa-devel] [RFC 2/5] compress: add compress parameter definitions

Sat Sep 3 02:05:14 CEST 2011

On Fri, Sep 02, 2011 at 02:26:01PM -0500, Pierre-Louis Bossart wrote:

> > > +/* AUDIO CODECS SUPPORTED */
> > > +#define MAX_NUM_CODECS 32
> > > +#define MAX_NUM_CODEC_DESCRIPTORS 32
> > > +#define MAX_NUM_RATES 32
> > > +#define MAX_NUM_BITRATES 32

> > Can we avoid these limitations?  The limit on the number of CODECs in
> > particular strikes me as not sufficiently high for me to be confident
> > we'd never run into it.  Consider a server side telephony system...

> The MAX_NUM_CODECS is actually the number of formats supported by your
> firmware, it's not related to the number of streams supported in
> parallel on your hardware. We could see support for 8 MP3 decoders, the
> number of codecs would be 1. This was dynamic but we limited it to make
> our life simpler. There's no problem to make it more flexible.

Yeah, I know.  I can't think it'll be a practical issue right now but
it's near enough to actual numbers that it doesn't make me happy seeing
it hard coded into an ABI.  The issue with server side telephony stuff
is that you end up interoperating with all sorts of weird stuff, some of
the PSTN stuff I used to work on would be getting close to this limit
due to some of the funky file formats people liked to do records in.
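One way to keep the limit out of the ABI is to report a count and let userspace fetch entries by index, one call per entry. Below is a minimal sketch under that assumption. The struct and function names are hypothetical and are not part of the RFC, and the two codec IDs are simply the G.723.1/G.729 values suggested later in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Fixed-size layout in the style of the RFC: the bound is part of the ABI. */
#define MAX_NUM_CODECS 32

struct fixed_codec_caps {
	uint32_t num_codecs;
	uint32_t codecs[MAX_NUM_CODECS];
};

/* Hypothetical alternative: userspace first reads a count, then fetches
 * one entry per call, so no array bound is frozen into the ABI. */
struct codec_query {
	uint32_t index;	/* in: entry to fetch */
	uint32_t codec;	/* out: codec ID at that index */
};

/* Stand-in for the firmware's format list; its size is a driver detail
 * only (IDs are the G.723.1/G.729 values suggested in this thread). */
static const uint32_t driver_codecs[] = { 0x0000000C, 0x0000000D };

static uint32_t codec_count(void)
{
	return (uint32_t)(sizeof(driver_codecs) / sizeof(driver_codecs[0]));
}

/* Fill q->codec for q->index; -EINVAL once the index runs off the end. */
static int codec_get(struct codec_query *q)
{
	if (q->index >= codec_count())
		return -EINVAL;
	q->codec = driver_codecs[q->index];
	return 0;
}
```

Userspace would loop the index from 0 up to the reported count. The cost is one call per entry, which is presumably acceptable for a query done once at stream setup.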

> We can align the sampling rates to use the existing ALSA definitions.
> The descriptors correspond to the number of variations for a given
> format, we can probably restrict it to 32...

That one is probably reasonable, yes.
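Aligning with ALSA would presumably mean describing rates as a bitmask, the way the `rates` field of `struct snd_pcm_hardware` uses `SNDRV_PCM_RATE_*` flags, instead of an array bounded by MAX_NUM_RATES. A minimal sketch follows. The `RATE_*` names and their bit positions are local and illustrative, not copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Local flags in the spirit of SNDRV_PCM_RATE_*; bit positions are
 * illustrative only and do not match the kernel's values. */
#define RATE_8000   (1u << 0)
#define RATE_16000  (1u << 1)
#define RATE_32000  (1u << 2)
#define RATE_44100  (1u << 3)
#define RATE_48000  (1u << 4)

static const struct { unsigned int hz; uint32_t bit; } rate_table[] = {
	{ 8000, RATE_8000 }, { 16000, RATE_16000 }, { 32000, RATE_32000 },
	{ 44100, RATE_44100 }, { 48000, RATE_48000 },
};

/* Map a rate in Hz to its flag, or 0 if the rate has no flag. */
static uint32_t rate_to_bit(unsigned int hz)
{
	for (size_t i = 0; i < sizeof(rate_table) / sizeof(rate_table[0]); i++)
		if (rate_table[i].hz == hz)
			return rate_table[i].bit;
	return 0;
}

/* One 32-bit mask replaces a rates[MAX_NUM_RATES] array in the ABI. */
static int rate_supported(uint32_t supported_mask, unsigned int hz)
{
	uint32_t bit = rate_to_bit(hz);

	return bit != 0 && (supported_mask & bit) != 0;
}
```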

> > I'd be inclined to add:

> > +#define SND_AUDIOCODEC_G723_1                  ((__u32) 0x0000000C)
> > +#define SND_AUDIOCODEC_G729                    ((__u32) 0x0000000D)

> > for VoIP usage as part of the default set but obviously it doesn't
> > really matter as it's trivial to add new numbers.

> Yes we can add these codecs, but it's actually extremely difficult to do
> any kind of hw acceleration for VoIP. G723.1 needs extra signaling for
> bad/lost frames, and you may want coupling between jitter buffer
> management, decoding and possibly a time-stretching solution to
> compensate for timing issues or dropped frames. This is difficult to

It's really not that hard, and there's also the answerphone use
case where you're not dealing with a live VoIP stream but rather the
recorded data from one.  That was actually my main thought here - an
answerphone type thing rather than calls.

The G.723.1 lost frame stuff is generally just totally ignored, I'd be
astonished if anyone ever implements it and I'm not convinced from
memory that there's even a place for it in the RTP encoding.

> implement if the speech encoding/decoding is done on the DSP, while the
> jitter buffer management is done on the host. Data transfers based
> on ring buffers/DMA also make it difficult to handle frames of varying
> sizes while limiting latency.
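To make the "frames of varying sizes" point concrete: in the G.723.1 RTP payload format (RFC 3551), the two least significant bits of a frame's first octet give the frame type, and the length follows from it. A 6.3 kbit/s speech frame is 24 octets, a 5.3 kbit/s frame is 20 octets, and a SID frame is 4 octets. The sketch below walks a buffer of concatenated frames. The function names are hypothetical, and the reserved type value is simply rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in octets of a G.723.1 frame, from the two least significant
 * bits of its first octet; 0 for the reserved value (not handled here). */
static size_t g7231_frame_len(uint8_t first_octet)
{
	switch (first_octet & 0x03) {
	case 0: return 24;	/* 6.3 kbit/s speech */
	case 1: return 20;	/* 5.3 kbit/s speech */
	case 2: return 4;	/* SID (comfort noise) */
	default: return 0;	/* reserved */
	}
}

/* Count whole frames in buf; -1 if a frame is reserved or truncated. */
static int g7231_count_frames(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		size_t flen = g7231_frame_len(buf[off]);

		if (flen == 0 || flen > len - off)
			return -1;
		off += flen;
		n++;
	}
	return n;
}
```

A byte-oriented ring buffer has no notion of these boundaries, so the consumer has to re-derive them from the data before it can decode anything.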

Yes, you would be using a message based thing if it were live audio
(which might be DMAed obviously but nothing like a single audio stream
with a single buffer) - the sort of thing copy() is good for.  The DMA
ring buffers just don't make much sense with the low volume low latency
traffic a live VoIP call generates.
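A message-based transfer keeps each frame as a discrete unit with its own length, so the reader always gets exactly one whole frame. A minimal sketch of such a queue follows, assuming a small fixed slot count and a per-message size cap. All names and sizes are hypothetical, and this is not the ALSA copy() callback itself, only an illustration of the kind of framing it makes easy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAX_LEN 64	/* hypothetical per-message size cap */
#define QUEUE_DEPTH 8	/* hypothetical number of slots */

/* Each slot holds one whole frame plus its length, so frame boundaries
 * survive the transfer, unlike a plain byte ring buffer. */
struct msg_slot {
	size_t len;
	uint8_t data[MSG_MAX_LEN];
};

struct msg_queue {
	struct msg_slot slot[QUEUE_DEPTH];
	unsigned int head;	/* next slot to read */
	unsigned int count;	/* slots in use */
};

/* Enqueue one frame; 0 on success, -1 if full or the frame is too big. */
static int msgq_push(struct msg_queue *q, const uint8_t *frame, size_t len)
{
	struct msg_slot *s;

	if (q->count == QUEUE_DEPTH || len > MSG_MAX_LEN)
		return -1;
	s = &q->slot[(q->head + q->count) % QUEUE_DEPTH];
	memcpy(s->data, frame, len);
	s->len = len;
	q->count++;
	return 0;
}

/* Dequeue exactly one frame into out; its length, or -1 if empty or
 * the caller's buffer is too small. */
static long msgq_pop(struct msg_queue *q, uint8_t *out, size_t cap)
{
	struct msg_slot *s;

	if (q->count == 0)
		return -1;
	s = &q->slot[q->head];
	if (s->len > cap)
		return -1;
	memcpy(out, s->data, s->len);
	q->head = (q->head + 1) % QUEUE_DEPTH;
	q->count--;
	return (long)s->len;
}
```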

> I'd rather push RTP packets down to the DSP and have the complete VoIP
> stack handled there.
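Pushing RTP packets down means whatever receives them has to at least parse the RTP fixed header (RFC 3550) and choose a decoder. A minimal sketch follows, assuming the static payload types from RFC 3551 (4 for G.723, 18 for G.729) map onto the codec IDs proposed earlier in this thread. `struct rtp_info` and `rtp_parse()` are hypothetical names, and jitter buffering is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Codec IDs as proposed earlier in this thread. */
#define SND_AUDIOCODEC_G723_1 ((uint32_t) 0x0000000C)
#define SND_AUDIOCODEC_G729   ((uint32_t) 0x0000000D)

struct rtp_info {
	uint16_t seq;
	uint32_t timestamp;
	uint32_t codec;		/* SND_AUDIOCODEC_* or 0 if unknown */
	const uint8_t *payload;
	size_t payload_len;
};

/* Parse the RFC 3550 fixed header, skipping CSRCs, any header extension
 * and trailing padding; 0 on success, -1 on a malformed packet. */
static int rtp_parse(const uint8_t *pkt, size_t len, struct rtp_info *out)
{
	size_t hdr, pad = 0;

	if (len < 12 || (pkt[0] >> 6) != 2)	/* version must be 2 */
		return -1;
	hdr = 12 + 4u * (pkt[0] & 0x0F);	/* CSRC list */
	if (pkt[0] & 0x10) {			/* header extension */
		if (len < hdr + 4)
			return -1;
		hdr += 4 + 4u * (((size_t)pkt[hdr + 2] << 8) | pkt[hdr + 3]);
	}
	if (pkt[0] & 0x20)			/* padding count in last octet */
		pad = pkt[len - 1];
	if (hdr + pad > len)
		return -1;

	out->seq = (uint16_t)((pkt[2] << 8) | pkt[3]);
	out->timestamp = ((uint32_t)pkt[4] << 24) | ((uint32_t)pkt[5] << 16) |
			 ((uint32_t)pkt[6] << 8) | pkt[7];
	switch (pkt[1] & 0x7F) {		/* payload type, marker masked off */
	case 4:  out->codec = SND_AUDIOCODEC_G723_1; break;
	case 18: out->codec = SND_AUDIOCODEC_G729;   break;
	default: out->codec = 0;                     break;
	}
	out->payload = pkt + hdr;
	out->payload_len = len - hdr - pad;
	return 0;
}
```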

Better yet, have a network stack on the DSP and never bother the host
with the data in the first place.