Conceptual bug on SoundWire probe/remove?

Pierre-Louis Bossart pierre-louis.bossart at linux.intel.com
Fri Mar 25 17:51:52 CET 2022



On 3/25/22 11:26, Greg Kroah-Hartman wrote:
> On Fri, Mar 25, 2022 at 10:51:51AM -0500, Pierre-Louis Bossart wrote:
>>
>>
>> On 3/24/22 06:15, Greg Kroah-Hartman wrote:
>>> On Wed, Mar 23, 2022 at 02:45:59PM -0500, Pierre-Louis Bossart wrote:
>>>> Hi,
>>>> I could use feedback/guidance on a possible conceptual bug in the SoundWire
>>>> probe and bus handling.
>>>>
>>>> When we probe a driver, the code does this:
>>>>
>>>> static int sdw_drv_probe(struct device *dev)
>>>> {
>>>> 	struct sdw_slave *slave = dev_to_sdw_dev(dev);
>>>> 	struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
>>>> 	const struct sdw_device_id *id;
>>>> 	const char *name;
>>>> 	int ret;
>>>>
>>>> 	/*
>>>> 	 * fw description is mandatory to bind
>>>> 	 */
>>>> 	if (!dev->fwnode)
>>>> 		return -ENODEV;
>>>>
>>>> 	if (!IS_ENABLED(CONFIG_ACPI) && !dev->of_node)
>>>> 		return -ENODEV;
>>>>
>>>> 	id = sdw_get_device_id(slave, drv);
>>>> 	if (!id)
>>>> 		return -ENODEV;
>>>>
>>>> 	slave->ops = drv->ops;
>>>
>>> That is wrong and should never happen as you lost all reference
>>> counting.  Please don't do that.
>>
>> ok, so I think we all agree on the issue. It's not new code, it's been there
>> since December 2017 and 4.16
> 
> It's hard to notice that in code review :(
> 
>>>> The last line is the problematic one. If at some point, the user does an
>>>> rmmod and unbinds the SoundWire codec driver, the .remove will be called and
>>>> the 'drv' will no longer be valid, but we will still have a reference to
>>>> drv->ops and use that pointer in the bus code, e.g.
>>>>
>>>> 		/* Update the Slave driver */
>>>> 		if (slave_notify && slave->ops &&
>>>> 		    slave->ops->interrupt_callback) {
>>>> 			slave_intr.sdca_cascade = sdca_cascade;
>>>> 			slave_intr.control_port = clear;
>>>> 			memcpy(slave_intr.port, &port_status,
>>>> 			       sizeof(slave_intr.port));
>>>>
>>>> 			slave->ops->interrupt_callback(slave,
>>>> &slave_intr);
>>>> 		}
>>>>
>>>> I noodled with a potential fix in
>>>> https://github.com/thesofproject/linux/pull/3534/commits/82d64fb0fd39b532263f060a8ec86e47e9ab305b
>>>>
>>>> where I force-reset this slave->ops pointer, but it is likely to be very
>>>> racy.
>>>
>>> Just properly reference count things and this should be ok.  You can
>>> NEVER just save off a pointer to a random thing that you do not own
>>> without increasing the reference count, otherwise bad things will
>>> happen.  It's always been this way.
>>
>> but I am not following the reference count recommendation in that specific
>> case. If we increase a reference count in the probe, wouldn't that prevent
>> the unbind/remove?
> 
> Depends on what you are increasing the reference count of :)
> 
> bind/unbind has nothing to do with the reference count of objects, it
> only prevents the devices/whatever from being removed from the system
> (hopefully.)
> 
> When doing "give me the driver ops of that driver over there" you have
> to be VERY careful.  It's a very uncommon pattern that I can't think of
> anyone else doing outside of a bus logic.  I don't recommend it, instead
> grab references to the devices themselves and go through the device and
> what is bound/unbound to that.  Also keep things in sync when you grab
> those other devices, that can also get messy.
> 
> How about just not doing this at all?

That's what I was exploring, I am not sure at all we need to use this 
pointer at all.

Maybe we should go back to the problem statement:

The SoundWire codec driver provides a set of callbacks that need to be 
invoked from bus layer when events are detected (bus reconfiguration, 
clock about to be stopped, attach/detach and interrupt).

The devices are created upfront, and if there's no driver things will 
work just fine. When we bind a driver, then these callbacks are detected 
as not null.

The part that I am not familiar with is how we can prevent a 
remove/unbind while invoking the callbacks.

Maybe we can indeed increase the refcount on the device, follow the 
device->driver->ops pointers and reduce the refcount once the callback 
completes.




More information about the Alsa-devel mailing list