[alsa-devel] Public ridicule due to sound/soc/soc-core.c abuse of the driver model

Hi Frank, Liam, and sound maintainers.
It was recently pointed out to me that the sound/soc/soc-core.c is flagrantly abusing the driver model by providing "empty" release functions, just to keep the kernel from complaining:
/* stop no dev release warning */ static void soc_ac97_device_release(struct device *dev){}
....
static void rtd_release(struct device *dev) {}
As per the documentation in the kernel tree[1], I now get to publicly mock you for thinking you know better than the driver model.
Come on people, do you think that I wrote the code in the kernel that produces those errors just for fun? It was telling you to fix your code by providing a function to free the structure that is being released, not to try to trick the kernel because you think you know better. The kernel was trying to help you out here, to get the programming model correct, in a place that was commonly misunderstood.
But, by you trying to be "smart", you just proved that your code was incorrect and buggy.
Please fix this for the 3.3 kernel release.
Yes, I know it's been there for a number of years, but that still doesn't make it correct. Especially as stuff like this tends to get cut-and-pasted over time.
thanks,
greg k-h
[1] Documentation/kobject.txt

On Fri, Jan 06, 2012 at 11:40:52AM -0800, Greg KH wrote:
It was recently pointed out to me that the sound/soc/soc-core.c is flagrantly abusing the driver model by providing "empty" release functions, just to keep the kernel from complaining:
Come on people, do you think that I wrote the code in the kernel that produces those errors just for fun? It was telling you to fix your code by providing a function to free the structure that is being released, not to try to trick the kernel because you think you know better. The kernel was trying to help you out here, to get the programming model correct, in a place that was commonly misunderstood.
The problem is that due to the entertaining nature of AC'97 support in Linux we don't actually have anything to free at this point - we'd need to redo the whole infrastructure, not just this code.
You really need to find someone with an ongoing interest in AC'97 and convince them that it's worth overhauling the bus, the whole thing is just too much of a can of worms to touch. Fixing this for 3.3 seems completely insane, we're already in the merge window and there's too many skeletons.
I'm not even sure I have any AC'97 hardware any more myself.

On Fri, Jan 06, 2012 at 12:15:01PM -0800, Mark Brown wrote:
On Fri, Jan 06, 2012 at 11:40:52AM -0800, Greg KH wrote:
It was recently pointed out to me that the sound/soc/soc-core.c is flagrantly abusing the driver model by providing "empty" release functions, just to keep the kernel from complaining:
Come on people, do you think that I wrote the code in the kernel that produces those errors just for fun? It was telling you to fix your code by providing a function to free the structure that is being released, not to try to trick the kernel because you think you know better. The kernel was trying to help you out here, to get the programming model correct, in a place that was commonly misunderstood.
The problem is that due to the entertaining nature of AC'97 support in Linux we don't actually have anything to free at this point - we'd need to redo the whole infrastructure, not just this code.
The fact is that these bugs are oopses waiting to happen. All you need is the kernel to hold a reference to the kobject underlying the struct device, and then release that reference after you've kfree'd the structure containing the struct device, and the kernel will go bang.
I also disagree with your statement - I think it's quite simple to fix:
1. Change to using a flag to indicate whether the struct device is registered rather than ac97->dev.bus. 2. Change snd_soc_new_ac97_codec() to initialize codec->ac97->dev, including calling device_initialize() on it and setting the release function. 3. Change device_register(&codec->ac97->dev); to device_add(&codec->ac97->dev); 4. Make the AC97 device release function do the freeing of 'codec->ac97' 5. Change the current device_unregister(&codec->ac97->dev); to device_del(&codec->ac97->dev); 6. Change the various kfree(codec->ac97); in the file to do a device_put(&codec->ac97->dev);
This will cause device_put() to check the refcount, when that hits zero it will call the release function, and that in turn will then kfree() the ac97 structure.
If someone else is holding a reference to this struct device, kfree will be called once that reference is dropped - and this is the exact behaviour the driver model as a whole requires.
At least this gets the AC97 ASoC stuff fixed, and one less abuse of the driver model in the kernel.

On Fri, Jan 06, 2012 at 08:50:36PM +0000, Russell King - ARM Linux wrote:
I also disagree with your statement - I think it's quite simple to fix:
- Change to using a flag to indicate whether the struct device is registered rather than ac97->dev.bus.
Looking at this deeper, you already have such a flag. It's called codec->ac97_registered:
ret = soc_ac97_dev_register(rtd->codec); if (ret < 0) { printk(KERN_ERR "asoc: AC97 device register failed\n"); return ret; }
rtd->codec->ac97_registered = 1;
static void soc_unregister_ac97_dai_link(struct snd_soc_codec *codec) { if (codec->ac97_registered) { soc_ac97_dev_unregister(codec); codec->ac97_registered = 0; } }
and soc_ac97_dev_{un,}register() do this:
static int soc_ac97_dev_unregister(struct snd_soc_codec *codec) { if (codec->ac97->dev.bus) device_unregister(&codec->ac97->dev); return 0; }
static int soc_ac97_dev_register(struct snd_soc_codec *codec) { int err;
codec->ac97->dev.bus = &ac97_bus_type; ... err = device_register(&codec->ac97->dev); if (err < 0) { ... codec->ac97->dev.bus = NULL; return err; } return 0; }
So, dev.bus is really duplicating what ac97_registered is doing and nothing more, and I'd suggest that the very first patch cleaning up this buggy code is to get rid of this duplication:
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index a25fa63..961f980 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -452,8 +452,7 @@ static inline void soc_cleanup_card_debugfs(struct snd_soc_card *card) /* unregister ac97 codec */ static int soc_ac97_dev_unregister(struct snd_soc_codec *codec) { - if (codec->ac97->dev.bus) - device_unregister(&codec->ac97->dev); + device_unregister(&codec->ac97->dev); return 0; }
@@ -474,7 +473,6 @@ static int soc_ac97_dev_register(struct snd_soc_codec *codec) err = device_register(&codec->ac97->dev); if (err < 0) { snd_printk(KERN_ERR "Can't register ac97 bus\n"); - codec->ac97->dev.bus = NULL; return err; } return 0;
Second patch would be to make soc_ac97_dev_unregister return type void - it's pointless to have a function which always returns 0 and its return value is never checked at its solitary call site.
Overall, it should look something like this (untested):
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index a25fa63..090b89e 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -448,33 +448,34 @@ static inline void soc_cleanup_card_debugfs(struct snd_soc_card *card) } #endif
+/* stop no dev release warning */ +static void soc_ac97_device_release(struct device *dev) +{ + struct snd_ac97 *ac97 = container_of(dev, snd_ac97, dev); + + kfree(ac97); +} + #ifdef CONFIG_SND_SOC_AC97_BUS /* unregister ac97 codec */ static int soc_ac97_dev_unregister(struct snd_soc_codec *codec) { - if (codec->ac97->dev.bus) - device_unregister(&codec->ac97->dev); + device_del(&codec->ac97->dev); return 0; }
-/* stop no dev release warning */ -static void soc_ac97_device_release(struct device *dev){} - /* register ac97 codec to bus */ static int soc_ac97_dev_register(struct snd_soc_codec *codec) { int err;
codec->ac97->dev.bus = &ac97_bus_type; - codec->ac97->dev.parent = codec->card->dev; - codec->ac97->dev.release = soc_ac97_device_release;
dev_set_name(&codec->ac97->dev, "%d-%d:%s", codec->card->snd_card->number, 0, codec->name); - err = device_register(&codec->ac97->dev); + err = device_add(&codec->ac97->dev); if (err < 0) { snd_printk(KERN_ERR "Can't register ac97 bus\n"); - codec->ac97->dev.bus = NULL; return err; } return 0; @@ -1748,9 +1749,13 @@ int snd_soc_new_ac97_codec(struct snd_soc_codec *codec, return -ENOMEM; }
+ device_initialize(&codec->ac97->dev); + codec->ac97->dev.parent = codec->card->dev; + codec->ac97->dev.release = soc_ac97_device_release; + codec->ac97->bus = kzalloc(sizeof(struct snd_ac97_bus), GFP_KERNEL); if (codec->ac97->bus == NULL) { - kfree(codec->ac97); + device_put(&codec->ac97->dev); codec->ac97 = NULL; mutex_unlock(&codec->mutex); return -ENOMEM; @@ -1783,7 +1788,7 @@ void snd_soc_free_ac97_codec(struct snd_soc_codec *codec) soc_unregister_ac97_dai_link(codec); #endif kfree(codec->ac97->bus); - kfree(codec->ac97); + device_put(&codec->ac97->dev); codec->ac97 = NULL; codec->ac97_created = 0; mutex_unlock(&codec->mutex);

On Fri, Jan 06, 2012 at 08:50:36PM +0000, Russell King - ARM Linux wrote:
On Fri, Jan 06, 2012 at 12:15:01PM -0800, Mark Brown wrote:
The problem is that due to the entertaining nature of AC'97 support in Linux we don't actually have anything to free at this point - we'd need to redo the whole infrastructure, not just this code.
The fact is that these bugs are oopses waiting to happen. All you need is the kernel to hold a reference to the kobject underlying the struct device, and then release that reference after you've kfree'd the structure containing the struct device, and the kernel will go bang.
Well, clearly. But given that the chances of anyone actually removing an AC'97 device from their system at runtime except in development is pretty much zero it's not a practical problem. I'm not saying this is good, I'm just saying this is *far* from the only problem AC'97 has with device model and the general fragility of the AC'97 code is such that fiddling with it in Linus' tree to resolve an issue which is currently only visible to code inspection doesn't fill me with happy thoughts.
I also disagree with your statement - I think it's quite simple to fix:
- Change to using a flag to indicate whether the struct device is registered rather than ac97->dev.bus.
- Change snd_soc_new_ac97_codec() to initialize codec->ac97->dev, including calling device_initialize() on it and setting the release function.
Which is crazy in itself since there is an AC'97 bus which shouldn't be making us open code things. Never mind the fact that the AC'97 bus is enumerable so the fact that something other than the bus is even instantiating these devices is at best questionalble. But anyway.
This will cause device_put() to check the refcount, when that hits zero it will call the release function, and that in turn will then kfree() the ac97 structure.
If someone else is holding a reference to this struct device, kfree will be called once that reference is dropped - and this is the exact behaviour the driver model as a whole requires.
This only helps this specific device model bit of things, if anything is still actually holding a reference to the device and tries to use it after we tore away the resource underneath it we'll still explode (never mind the fact that we're backing this stuff up with some global pointers to other devices which may or may not actually be there...).

On Fri, Jan 06, 2012 at 03:41:39PM -0800, Mark Brown wrote:
This only helps this specific device model bit of things, if anything is still actually holding a reference to the device and tries to use it after we tore away the resource underneath it we'll still explode (never mind the fact that we're backing this stuff up with some global pointers to other devices which may or may not actually be there...).
Err what? Explain showing the code where you think this is the case with the patch I proposed please.

On Fri, Jan 06, 2012 at 11:44:45PM +0000, Russell King - ARM Linux wrote:
On Fri, Jan 06, 2012 at 03:41:39PM -0800, Mark Brown wrote:
This only helps this specific device model bit of things, if anything is still actually holding a reference to the device and tries to use it after we tore away the resource underneath it we'll still explode (never mind the fact that we're backing this stuff up with some global pointers to other devices which may or may not actually be there...).
Err what? Explain showing the code where you think this is the case with the patch I proposed please.
These aren't new problems being introduced, they're preexisting problems which aren't fixed by this (hence why I say it "only helps with this specific device model side of things") but can come up in pretty much the same circumstances.

On Fri, Jan 06, 2012 at 03:49:27PM -0800, Mark Brown wrote:
On Fri, Jan 06, 2012 at 11:44:45PM +0000, Russell King - ARM Linux wrote:
On Fri, Jan 06, 2012 at 03:41:39PM -0800, Mark Brown wrote:
This only helps this specific device model bit of things, if anything is still actually holding a reference to the device and tries to use it after we tore away the resource underneath it we'll still explode (never mind the fact that we're backing this stuff up with some global pointers to other devices which may or may not actually be there...).
Err what? Explain showing the code where you think this is the case with the patch I proposed please.
These aren't new problems being introduced, they're preexisting problems which aren't fixed by this (hence why I say it "only helps with this specific device model side of things") but can come up in pretty much the same circumstances.
Anything which helps reduce the abuses of the driver model is a plus - it helps remove the possibility of kernel oopses.

On Mon, Jan 09, 2012 at 09:51:25AM +0000, Russell King - ARM Linux wrote:
Anything which helps reduce the abuses of the driver model is a plus - it helps remove the possibility of kernel oopses.
Trying to make any sort of modification to code this fragile is risky, especially during what's supposed to be a stabalization phase (which is what Greg is requesting). It just seems completely irresponsible for something that isn't actually a practical problem.

On Mon, Jan 09, 2012 at 07:52:13PM +0000, Mark Brown wrote:
On Mon, Jan 09, 2012 at 09:51:25AM +0000, Russell King - ARM Linux wrote:
Anything which helps reduce the abuses of the driver model is a plus - it helps remove the possibility of kernel oopses.
Trying to make any sort of modification to code this fragile is risky, especially during what's supposed to be a stabalization phase (which is what Greg is requesting). It just seems completely irresponsible for something that isn't actually a practical problem.
I find it hard to believe that ignoring the driver model is not a "practical" problem :)
For details as to why this is a problem, please see the kobject.txt file.
Please fix this up, as you have seen, people end up cutting-and-pasting bad code.
greg k-h

On Mon, Jan 09, 2012 at 12:11:10PM -0800, Greg KH wrote:
On Mon, Jan 09, 2012 at 07:52:13PM +0000, Mark Brown wrote:
Trying to make any sort of modification to code this fragile is risky, especially during what's supposed to be a stabalization phase (which is what Greg is requesting). It just seems completely irresponsible for something that isn't actually a practical problem.
I find it hard to believe that ignoring the driver model is not a "practical" problem :)
For details as to why this is a problem, please see the kobject.txt file.
Sure, I'm fully aware of the issue. The reason this is so painful to work with is that AC'97 is just generally doing a really bad job of using the driver model.
Please fix this up, as you have seen, people end up cutting-and-pasting bad code.
In my copious free time, but like I say trying to do this for 3.3 (you only posted *after* the merge window opened) is just nuts and I'd rather hope someone who cares about AC'97 systems will come forward and work on it (having one would be a real bonus). If people actually had problems we were fixing that'd be one thing but if they do they're being extremely quiet about it.
If it's code like the rtd devices which is reasonably sensible and stable that's one thing but wading into code we know is fragile for what's essentially a warning fix is completely disproportionate. It's not like we'd actually make a substantial improvement in the overall code quality or robustness, if anyone's using AC'97 as a reference for how to do this stuff they're going to have a bunch of other issues to work through.

On Mon, Jan 09, 2012 at 12:31:30PM -0800, Mark Brown wrote:
If it's code like the rtd devices which is reasonably sensible and stable that's one thing but wading into code we know is fragile for what's essentially a warning fix is completely disproportionate.
You're still under the impression that it's "just a warning" and not "a potential oops".
And I disagree with your assessment of how easy it is to fix each problem.
The rtd devices are stored in an array - this array needs breaking up into separate allocations for each rtd as the very first step in fixing that. That then needs more error checking added, etc.
On the other hand, the AC'97 code in ASoC should need this to fix it:
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index a25fa63..e85ae4d 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -452,14 +452,10 @@ static inline void soc_cleanup_card_debugfs(struct snd_soc_card *card) /* unregister ac97 codec */ static int soc_ac97_dev_unregister(struct snd_soc_codec *codec) { - if (codec->ac97->dev.bus) - device_unregister(&codec->ac97->dev); + device_del(&codec->ac97->dev); return 0; }
-/* stop no dev release warning */ -static void soc_ac97_device_release(struct device *dev){} - /* register ac97 codec to bus */ static int soc_ac97_dev_register(struct snd_soc_codec *codec) { @@ -467,14 +463,12 @@ static int soc_ac97_dev_register(struct snd_soc_codec *codec)
codec->ac97->dev.bus = &ac97_bus_type; codec->ac97->dev.parent = codec->card->dev; - codec->ac97->dev.release = soc_ac97_device_release;
dev_set_name(&codec->ac97->dev, "%d-%d:%s", codec->card->snd_card->number, 0, codec->name); - err = device_register(&codec->ac97->dev); + err = device_add(&codec->ac97->dev); if (err < 0) { snd_printk(KERN_ERR "Can't register ac97 bus\n"); - codec->ac97->dev.bus = NULL; return err; } return 0; @@ -1729,6 +1723,12 @@ int snd_soc_platform_write(struct snd_soc_platform *platform, } EXPORT_SYMBOL_GPL(snd_soc_platform_write);
+static void soc_ac97_device_release(struct device *dev) +{ + struct snd_ac97 *ac97 = container_of(dev, struct snd_ac97, dev); + kfree(ac97); +} + /** * snd_soc_new_ac97_codec - initailise AC97 device * @codec: audio codec @@ -1756,6 +1756,9 @@ int snd_soc_new_ac97_codec(struct snd_soc_codec *codec, return -ENOMEM; }
+ codec->ac97->dev.release = soc_ac97_device_release; + device_initialize(&codec->ac97->dev); + codec->ac97->bus->ops = ops; codec->ac97->num = num;
@@ -1783,7 +1786,7 @@ void snd_soc_free_ac97_codec(struct snd_soc_codec *codec) soc_unregister_ac97_dai_link(codec); #endif kfree(codec->ac97->bus); - kfree(codec->ac97); + put_device(&codec->ac97->dev); codec->ac97 = NULL; codec->ac97_created = 0; mutex_unlock(&codec->mutex);

On Mon, Jan 09, 2012 at 09:37:24PM +0000, Russell King - ARM Linux wrote:
On Mon, Jan 09, 2012 at 12:31:30PM -0800, Mark Brown wrote:
If it's code like the rtd devices which is reasonably sensible and stable that's one thing but wading into code we know is fragile for what's essentially a warning fix is completely disproportionate.
You're still under the impression that it's "just a warning" and not "a potential oops".
To repeat myself yet again: there are a bunch of other issues here, if we're ever in the situation where we can trigger this oops we're also likely to be running into other equally severe problems. As I keep having to say we've got code which is buggy, difficult to work with but has been there for quite some time without actually triggering issues on systems. Nothing about this is saying -rc is a good time to be wading into the code.
And I disagree with your assessment of how easy it is to fix each problem.
The rtd devices are stored in an array - this array needs breaking up into separate allocations for each rtd as the very first step in fixing that. That then needs more error checking added, etc.
There's a patch for the rtds already, they should be fine now - we just dynamically allocate the device rather than worrying about the rtd. The issue with AC'97 isn't that it's especially hard to make this specific change, it's that the code is horrible and I don't trust it not to explode underneath us if we change anything.
If you want to completely restructure the rtds as well as providng a release function for 3.3 then you've got similar issues, you're talking about restructuring which is *completely* out of scope for -rc.
On the other hand, the AC'97 code in ASoC should need this to fix it:
Which is roughly what we did for the rtds too. This won't do as-is though:
#endif kfree(codec->ac97->bus);
- kfree(codec->ac97);
- put_device(&codec->ac97->dev);
We can't have potentially live devices floating around without a bus, there's an assumption that a valid AC'97 device will be on a bus. Moving the bus free into the device release function is probably all we need to do for that.

On Mon, Jan 09, 2012 at 12:31:30PM -0800, Mark Brown wrote:
On Mon, Jan 09, 2012 at 12:11:10PM -0800, Greg KH wrote:
On Mon, Jan 09, 2012 at 07:52:13PM +0000, Mark Brown wrote:
Trying to make any sort of modification to code this fragile is risky, especially during what's supposed to be a stabalization phase (which is what Greg is requesting). It just seems completely irresponsible for something that isn't actually a practical problem.
I find it hard to believe that ignoring the driver model is not a "practical" problem :)
For details as to why this is a problem, please see the kobject.txt file.
Sure, I'm fully aware of the issue. The reason this is so painful to work with is that AC'97 is just generally doing a really bad job of using the driver model.
Please fix this up, as you have seen, people end up cutting-and-pasting bad code.
In my copious free time, but like I say trying to do this for 3.3 (you only posted *after* the merge window opened) is just nuts and I'd rather hope someone who cares about AC'97 systems will come forward and work on it (having one would be a real bonus). If people actually had problems we were fixing that'd be one thing but if they do they're being extremely quiet about it.
Ok, fair enough, if this is fixed by 3.4, I'll be happy.
thanks,
greg k-h

On Fri, Jan 06, 2012 at 11:40:52AM -0800, Greg KH wrote:
static void rtd_release(struct device *dev) {}
Oh, great - didn't notice this one. Liam, this is from the multi-component stuff and does need sorting.
participants (3)
-
Greg KH
-
Mark Brown
-
Russell King - ARM Linux