[Sound-open-firmware] Mixer doubts

newer
[Sound-open-firmware] For the...

older
[Sound-open-firmware] the SOF...

slawomir.blauciak＠linux.intel.com

17 Jul 2018 17 Jul '18

9:45 a.m.

Hello, everyone.

I’ve been reading through the mixer code and there’s one part which leaves me somewhat doubtful:

/* mix N PCM source streams to one sink stream */ static void mix_n(struct comp_dev *dev, struct comp_buffer *sink, struct comp_buffer **sources, uint32_t num_sources, uint32_t frames) { int32_t *src; int32_t *dest = sink->w_ptr; int32_t count; int64_t val[2]; int i; int j;

count = frames * dev->params.channels;

for (i = 0; i < count; i += 2) { val[0] = 0; val[1] = 0; for (j = 0; j < num_sources; j++) { src = sources[j]->r_ptr;

/* TODO: clamp */ val[0] += src[i]; val[1] += src[i + 1]; }

/* TODO: best place for attenuation ? */ dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1)); } }

More specifically: dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1));

Going back in the history of the file I found it was supposed to be an average. However, it will only be an average if num_sources is power of 2, because otherwise such bitwise operations will yield different results from what I take should be an optimized way for division.

For example, suppose we have 2 cases: • We have 2 sources, thus num_sources >> 1 will equal 1, valid right bitshift, same as division by the number of sources, which is 2 • We have 3 sources, thus num_sources >> 1 will equal 1 once again, no longer valid, because it will yield the same result as division by 2, instead of 3.

There’s one more “problem”, by averaging the samples like that we will end up with quieter output for when we have 1 source completely silent. Is that intentional?

Whether it’s all deliberate or not, or simply whether I misunderstood the code, I hope you can help me clear this up.

Thank you, Sławomir Błauciak

Show replies by date

Liam Girdwood

17 Jul 17 Jul

12:45 p.m.

On Tue, 2018-07-17 at 08:45 +0200, slawomir.blauciak@linux.intel.com wrote:

...

Hello, everyone.

I’ve been reading through the mixer code and there’s one part which leaves me somewhat doubtful:

/* mix N PCM source streams to one sink stream */ static void mix_n(struct comp_dev *dev, struct comp_buffer *sink, struct comp_buffer **sources, uint32_t num_sources, uint32_t frames) { int32_t *src; int32_t *dest = sink->w_ptr; int32_t count; int64_t val[2]; int i; int j;
   count = frames * dev->params.channels;

   for (i = 0; i < count; i += 2) {
          val[0] = 0;
          val[1] = 0;
          for (j = 0; j < num_sources; j++) {
                 src = sources[j]->r_ptr;

                 /* TODO: clamp */
                 val[0] += src[i];
                 val[1] += src[i + 1];
          }

          /* TODO: best place for attenuation ? */
          dest[i] = (val[0] >> (num_sources >> 1));
          dest[i + 1] = (val[1] >> (num_sources >> 1));
   }
}

More specifically: dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1));

Going back in the history of the file I found it was supposed to be an average. However, it will only be an average if num_sources is power of 2, because otherwise such bitwise operations will yield different results from what I take should be an optimized way for division.

For example, suppose we have 2 cases: • We have 2 sources, thus num_sources >> 1 will equal 1, valid right bitshift, same as division by the number of sources, which is 2 • We have 3 sources, thus num_sources >> 1 will equal 1 once again, no longer valid, because it will yield the same result as division by 2, instead of 3.

There’s one more “problem”, by averaging the samples like that we will end up with quieter output for when we have 1 source completely silent. Is that intentional?

Whether it’s all deliberate or not, or simply whether I misunderstood the code, I hope you can help me clear this up.

The intention is we have a generic optimised version that is easy to vectorize for 2,4,8 channels (on any architecture). We will need to have a separate generic mixer for 1,3,5,6 etc.

We should also have architecture specific mixing too like we have for xtensa HiFi3 volume.

Liam

...

Thank you, Sławomir Błauciak

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

Pierre-Louis Bossart

4:02 p.m.

On 7/17/18 4:45 AM, Liam Girdwood wrote:

...

On Tue, 2018-07-17 at 08:45 +0200, slawomir.blauciak@linux.intel.com wrote:

...
Hello, everyone.

I’ve been reading through the mixer code and there’s one part which leaves me somewhat doubtful:

/* mix N PCM source streams to one sink stream */ static void mix_n(struct comp_dev *dev, struct comp_buffer *sink, struct comp_buffer **sources, uint32_t num_sources, uint32_t frames) { int32_t *src; int32_t *dest = sink->w_ptr; int32_t count; int64_t val[2]; int i; int j;
    count = frames * dev->params.channels;

    for (i = 0; i < count; i += 2) {
           val[0] = 0;
           val[1] = 0;
           for (j = 0; j < num_sources; j++) {
                  src = sources[j]->r_ptr;

                  /* TODO: clamp */
                  val[0] += src[i];
                  val[1] += src[i + 1];
           }

           /* TODO: best place for attenuation ? */
           dest[i] = (val[0] >> (num_sources >> 1));
           dest[i + 1] = (val[1] >> (num_sources >> 1));
    }
}

More specifically: dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1));

Going back in the history of the file I found it was supposed to be an average. However, it will only be an average if num_sources is power of 2, because otherwise such bitwise operations will yield different results from what I take should be an optimized way for division.

For example, suppose we have 2 cases: • We have 2 sources, thus num_sources >> 1 will equal 1, valid right bitshift, same as division by the number of sources, which is 2 • We have 3 sources, thus num_sources >> 1 will equal 1 once again, no longer valid, because it will yield the same result as division by 2, instead of 3.

There’s one more “problem”, by averaging the samples like that we will end up with quieter output for when we have 1 source completely silent. Is that intentional?

Whether it’s all deliberate or not, or simply whether I misunderstood the code, I hope you can help me clear this up.
The intention is we have a generic optimised version that is easy to vectorize for 2,4,8 channels (on any architecture). We will need to have a separate generic mixer for 1,3,5,6 etc.

We should also have architecture specific mixing too like we have for xtensa HiFi3 volume.

I also don't agree with this right shift. It's well intended and aimed at avoiding saturations as you add more sources into the mix, but I don't think it should be done this way. In general, the only solutions that work well are: 1. make sure an attenuation is applied prior to the mix (i.e. don't compensate for saturations in the mix) 2. implement a soft limiter as part of the mixer

The #1 always win in practice for customer devices due to its simplicity. #2 is typical for pro-audio stuff.

...

Liam

...
Thank you, Sławomir Błauciak

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

Seppo Ingalsuo

23 Jul 23 Jul

2:38 p.m.

On 17.07.2018 16:02, Pierre-Louis Bossart wrote:

...

On 7/17/18 4:45 AM, Liam Girdwood wrote:

...
On Tue, 2018-07-17 at 08:45 +0200, slawomir.blauciak@linux.intel.com wrote:

...
More specifically: dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1));

Going back in the history of the file I found it was supposed to be an average. However, it will only be an average if num_sources is power of 2, because otherwise such bitwise operations will yield different results from what I take should be an optimized way for division.

For example, suppose we have 2 cases: • We have 2 sources, thus num_sources >> 1 will equal 1, valid right bitshift, same as division by the number of sources, which is 2 • We have 3 sources, thus num_sources >> 1 will equal 1 once again, no longer valid, because it will yield the same result as division by 2, instead of 3.

There’s one more “problem”, by averaging the samples like that we will end up with quieter output for when we have 1 source completely silent. Is that intentional?

Whether it’s all deliberate or not, or simply whether I misunderstood the code, I hope you can help me clear this up.

The intention is we have a generic optimised version that is easy to vectorize for 2,4,8 channels (on any architecture). We will need to have a separate generic mixer for 1,3,5,6 etc.

We should also have architecture specific mixing too like we have for xtensa HiFi3 volume.

I also don't agree with this right shift. It's well intended and aimed at avoiding saturations as you add more sources into the mix, but I don't think it should be done this way. In general, the only solutions that work well are:

make sure an attenuation is applied prior to the mix (i.e. don't

compensate for saturations in the mix) 2. implement a soft limiter as part of the mixer

The #1 always win in practice for customer devices due to its simplicity. #2 is typical for pro-audio stuff.

Yep, I proposed earlier this also but it was postponed. We should make sure the mixer uses saturating add operation and set mixer gain to 0 dB for all inputs. It's too tricky for system gains planning to have hidden gain enhancements (and the issue with non 2^N inputs number). We have already #1 enabled if the topology is setup to ensure no signal clipping in mixing.

The #2 with soft limiter as a separate component is in future plans too.

There's also third option to scale the pipelines to have some margin above 0 dB level (e.g. 24 dB) and have soft limiter or saturation only as last processing component in the pipelines. Though it works only with fixed point 32 bit and float pipelines where audio quality is not harmed significantly due to the extra room for the signal.

Thanks, Seppo

...

...
Liam

...
Thank you, Sławomir Błauciak

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

Pierre-Louis Bossart

4:56 p.m.

On 7/23/18 6:38 AM, Seppo Ingalsuo wrote:

...

On 17.07.2018 16:02, Pierre-Louis Bossart wrote:

...
On 7/17/18 4:45 AM, Liam Girdwood wrote:

...
On Tue, 2018-07-17 at 08:45 +0200, slawomir.blauciak@linux.intel.com wrote:

...
More specifically: dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1));

Going back in the history of the file I found it was supposed to be an average. However, it will only be an average if num_sources is power of 2, because otherwise such bitwise operations will yield different results from what I take should be an optimized way for division.

For example, suppose we have 2 cases: • We have 2 sources, thus num_sources >> 1 will equal 1, valid right bitshift, same as division by the number of sources, which is 2 • We have 3 sources, thus num_sources >> 1 will equal 1 once again, no longer valid, because it will yield the same result as division by 2, instead of 3.

There’s one more “problem”, by averaging the samples like that we will end up with quieter output for when we have 1 source completely silent. Is that intentional?

Whether it’s all deliberate or not, or simply whether I misunderstood the code, I hope you can help me clear this up.

The intention is we have a generic optimised version that is easy to vectorize for 2,4,8 channels (on any architecture). We will need to have a separate generic mixer for 1,3,5,6 etc.

We should also have architecture specific mixing too like we have for xtensa HiFi3 volume.

I also don't agree with this right shift. It's well intended and aimed at avoiding saturations as you add more sources into the mix, but I don't think it should be done this way. In general, the only solutions that work well are:

make sure an attenuation is applied prior to the mix (i.e. don't

compensate for saturations in the mix) 2. implement a soft limiter as part of the mixer

The #1 always win in practice for customer devices due to its simplicity. #2 is typical for pro-audio stuff.

Yep, I proposed earlier this also but it was postponed. We should make sure the mixer uses saturating add operation and set mixer gain to 0 dB for all inputs. It's too tricky for system gains planning to have hidden gain enhancements (and the issue with non 2^N inputs number). We have already #1 enabled if the topology is setup to ensure no signal clipping in mixing.

The #2 with soft limiter as a separate component is in future plans too.

There's also third option to scale the pipelines to have some margin above 0 dB level (e.g. 24 dB) and have soft limiter or saturation only as last processing component in the pipelines. Though it works only with fixed point 32 bit and float pipelines where audio quality is not harmed significantly due to the extra room for the signal.

So there's no one in favor of keeping the existing solution so let's remove this right-shift for now, and refine better plans with a soft limiter for later.

Seppo Ingalsuo

7:55 p.m.

On 23.07.2018 16:56, Pierre-Louis Bossart wrote:

...

On 7/23/18 6:38 AM, Seppo Ingalsuo wrote:

...
On 17.07.2018 16:02, Pierre-Louis Bossart wrote:

...
On 7/17/18 4:45 AM, Liam Girdwood wrote:

...
On Tue, 2018-07-17 at 08:45 +0200, slawomir.blauciak@linux.intel.com wrote:

...
More specifically: dest[i] = (val[0] >> (num_sources >> 1)); dest[i + 1] = (val[1] >> (num_sources >> 1));

Going back in the history of the file I found it was supposed to be an average. However, it will only be an average if num_sources is power of 2, because otherwise such bitwise operations will yield different results from what I take should be an optimized way for division.

For example, suppose we have 2 cases: • We have 2 sources, thus num_sources >> 1 will equal 1, valid right bitshift, same as division by the number of sources, which is 2 • We have 3 sources, thus num_sources >> 1 will equal 1 once again, no longer valid, because it will yield the same result as division by 2, instead of 3.

There’s one more “problem”, by averaging the samples like that we will end up with quieter output for when we have 1 source completely silent. Is that intentional?

Whether it’s all deliberate or not, or simply whether I misunderstood the code, I hope you can help me clear this up.

The intention is we have a generic optimised version that is easy to vectorize for 2,4,8 channels (on any architecture). We will need to have a separate generic mixer for 1,3,5,6 etc.

We should also have architecture specific mixing too like we have for xtensa HiFi3 volume.

I also don't agree with this right shift. It's well intended and aimed at avoiding saturations as you add more sources into the mix, but I don't think it should be done this way. In general, the only solutions that work well are:

make sure an attenuation is applied prior to the mix (i.e. don't

compensate for saturations in the mix) 2. implement a soft limiter as part of the mixer

The #1 always win in practice for customer devices due to its simplicity. #2 is typical for pro-audio stuff.

Yep, I proposed earlier this also but it was postponed. We should make sure the mixer uses saturating add operation and set mixer gain to 0 dB for all inputs. It's too tricky for system gains planning to have hidden gain enhancements (and the issue with non 2^N inputs number). We have already #1 enabled if the topology is setup to ensure no signal clipping in mixing.

The #2 with soft limiter as a separate component is in future plans too.

There's also third option to scale the pipelines to have some margin above 0 dB level (e.g. 24 dB) and have soft limiter or saturation only as last processing component in the pipelines. Though it works only with fixed point 32 bit and float pipelines where audio quality is not harmed significantly due to the extra room for the signal.

So there's no one in favor of keeping the existing solution so let's remove this right-shift for now, and refine better plans with a soft limiter for later.

OK, I will make a pull request!

Thanks, Seppo

...

Sound-open-firmware mailing list Sound-open-firmware@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/sound-open-firmware

2538

Age (days ago)

2544

Last active (days ago)

List overview

Download

5 comments

4 participants

tags (0)

participants (4)

Liam Girdwood
Pierre-Louis Bossart
Seppo Ingalsuo
slawomir.blauciak＠linux.intel.com