[PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed

Richard Fitzgerald rf at opensource.cirrus.com
Tue Sep 13 17:30:07 CEST 2022


On 12/09/2022 18:09, Pierre-Louis Bossart wrote:
> 
> 
> On 9/12/22 14:25, Richard Fitzgerald wrote:
>> On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
>>>
>>>
>>> On 9/7/22 10:52, Richard Fitzgerald wrote:
>>>> Only exit sdw_handle_slave_status() right after calling
>>>> sdw_program_device_num() if it actually programmed an ID into at
>>>> least one device.
>>>>
>>>> sdw_handle_slave_status() should protect itself against phantom
>>>> device #0 ATTACHED indications. In that case there is no actual
>>>> device still on #0. The early exit relies on there being a status
>>>> change to ATTACHED on the reprogrammed device to trigger another
>>>> call to sdw_handle_slave_status() which will then handle the status
>>>> of all peripherals. If no device was actually programmed with an
>>>> ID there won't be a new ATTACHED indication. This can lead to the
>>>> status of other peripherals not being handled.
>>>>
>>>> The status passed to sdw_handle_slave_status() is obviously always
>>>> from a point of time in the past, and may indicate accumulated
>>>> unhandled events (depending how the bus manager operates). It's
>>>> possible that a device ID is reprogrammed but the last PING status
>>>> captured state just before that, when it was still reporting on
>>>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>>>> just before a new PING status is available showing it now on its new
>>>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>>>> device on #0, but it will not find one.
>>>>
>>>> Signed-off-by: Richard Fitzgerald <rf at opensource.cirrus.com>
>>>> ---
>>>>    drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>>>    1 file changed, 15 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>>> --- a/drivers/soundwire/bus.c
>>>> +++ b/drivers/soundwire/bus.c
>>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>>> sdw_bus *bus)
>>>>        struct sdw_slave_id id;
>>>>        struct sdw_msg msg;
>>>>        bool found;
>>>> -    int count = 0, ret;
>>>> +    int count = 0, num_programmed = 0, ret;
>>>>        u64 addr;
>>>>          /* No Slave, so use raw xfer api */
>>>>        ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>>                   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>>>        if (ret < 0)
>>>> -        return ret;
>>>> +        return 0;
>>>
>>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>>> handled in sdw_fill_msg().
>>>
>>> I didn't check if all these error cases are irrelevant in that specific
>>> enumeration case, if that was the case maybe we need to break that
>>> function in two helpers so that all the checks can be skipped.
>>>
>>
>> I don't think that there's anything useful that
>> sdw_modify_slave_status() could do to recover from an error.
>>
>> If any device IDs were programmed then, according to the statement in
>> sdw_modify_slave_status()
>>
>>      * programming a device number will have side effects,
>>      * so we deal with other devices at a later time
>>
>> if this is true, then we need to exit to deal with what _was_
>> programmed, even if one of them failed.
>>
>> If nothing was programmed, and there was an error, we can't bail out of
>> sdw_modify_slave_status(). We have status for other devices which
>> we can't simply ignore.
>>
>> Ultimately I can't see how pushing the error code up is useful.
>> sdw_modify_slave_status() can't really do any effective recovery action,
>> and the original behavior of giving up and returning means that
>> an error in programming dev ID potentially causes collateral damage to
>> the status of other peripherals.
> 
> I was suggesting something like
> 
> 
> void sdw_fill_msg_data(...)
> {
>    copy data in the msg structure
> }
> 
> int sdw_fill_msg(...)
> {
>      sdw_fill_msg_data();
>      handle_error_cases
> }
> 
> and in sdw sdw_program_device_num() we call directly sdw_fill_msg_data()
> 
> So no change in functionality beyond explicit skip of error checks that
> are not relevant and cannot be handled even if they were.
> 

sdw_fill_msg() will never report an error during
sdw_program_device_num() because the first check is to return if
the address doesn't need paging, and sdw_program_device_num() only
accesses SCP registers.

I don't want to mix coding improvements with bugfixes. Splitting
sdw_fill_msg() isn't needed to fix this bug.

> 
> 


More information about the Alsa-devel mailing list