Conceptual bug on SoundWire probe/remove?
Pierre-Louis Bossart
pierre-louis.bossart at linux.intel.com
Fri Mar 25 17:51:52 CET 2022
On 3/25/22 11:26, Greg Kroah-Hartman wrote:
> On Fri, Mar 25, 2022 at 10:51:51AM -0500, Pierre-Louis Bossart wrote:
>>
>>
>> On 3/24/22 06:15, Greg Kroah-Hartman wrote:
>>> On Wed, Mar 23, 2022 at 02:45:59PM -0500, Pierre-Louis Bossart wrote:
>>>> Hi,
>>>> I could use feedback/guidance on a possible conceptual bug in the SoundWire
>>>> probe and bus handling.
>>>>
>>>> When we probe a driver, the code does this:
>>>>
>>>> static int sdw_drv_probe(struct device *dev)
>>>> {
>>>> struct sdw_slave *slave = dev_to_sdw_dev(dev);
>>>> struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
>>>> const struct sdw_device_id *id;
>>>> const char *name;
>>>> int ret;
>>>>
>>>> /*
>>>> * fw description is mandatory to bind
>>>> */
>>>> if (!dev->fwnode)
>>>> return -ENODEV;
>>>>
>>>> if (!IS_ENABLED(CONFIG_ACPI) && !dev->of_node)
>>>> return -ENODEV;
>>>>
>>>> id = sdw_get_device_id(slave, drv);
>>>> if (!id)
>>>> return -ENODEV;
>>>>
>>>> slave->ops = drv->ops;
>>>
>>> That is wrong and should never happen as you lost all reference
>>> counting. Please don't do that.
>>
>> ok, so I think we all agree on the issue. It's not new code, it's been there
>> since December 2017 and 4.16
>
> It's hard to notice that in code review :(
>
>>>> The last line is the problematic one. If at some point, the user does an
>>>> rmmod and unbinds the SoundWire codec driver, the .remove will be called and
>>>> the 'drv' will no longer be valid, but we will still have a reference to
>>>> drv->ops and use that pointer in the bus code, e.g.
>>>>
>>>> /* Update the Slave driver */
>>>> if (slave_notify && slave->ops &&
>>>> slave->ops->interrupt_callback) {
>>>> slave_intr.sdca_cascade = sdca_cascade;
>>>> slave_intr.control_port = clear;
>>>> memcpy(slave_intr.port, &port_status,
>>>> sizeof(slave_intr.port));
>>>>
>>>> slave->ops->interrupt_callback(slave,
>>>> &slave_intr);
>>>> }
>>>>
>>>> I noodled with a potential fix in
>>>> https://github.com/thesofproject/linux/pull/3534/commits/82d64fb0fd39b532263f060a8ec86e47e9ab305b
>>>>
>>>> where I force-reset this slave->ops pointer, but it is likely to be very
>>>> racy.
>>>
>>> Just properly reference count things and this should be ok. You can
>>> NEVER just save off a pointer to a random thing that you do not own
>>> without increasing the reference count, otherwise bad things will
>>> happen. It's always been this way.
>>
>> but I am not following the reference count recommendation in that specific
>> case. If we increase a reference count in the probe, wouldn't that prevent
>> the unbind/remove?
>
> Depends on what you are increasing the reference count of :)
>
> bind/unbind has nothing to do with the reference count of objects, it
> only prevents the devices/whatever from being removed from the system
> (hopefully.)
>
> When doing "give me the driver ops of that driver over there" you have
> to be VERY careful. It's a very uncommon pattern that I can't think of
> anyone else doing outside of a bus logic. I don't recommend it, instead
> grab references to the devices themselves and go through the device and
> what is bound/unbound to that. Also keep things in sync when you grab
> those other devices, that can also get messy.
>
> How about just not doing this at all?
That's what I was exploring, I am not sure at all we need to use this
pointer at all.
Maybe we should go back to the problem statement:
The SoundWire codec driver provides a set of callbacks that need to be
invoked from bus layer when events are detected (bus reconfiguration,
clock about to be stopped, attach/detach and interrupt).
The devices are created upfront, and if there's no driver things will
work just fine. When we bind a driver, then these callbacks are detected
as not null.
The part that I am not familiar with is how we can prevent a
remove/unbind while invoking the callbacks.
Maybe we can indeed increase the refcount on the device, follow the
device->driver->ops pointers and reduce the refcount once the callback
completes.
More information about the Alsa-devel
mailing list