[alsa-devel] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver. In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug. Since the dependency of IRQ chip data was removed from HDMI LPE audio by Commit 9bd9590997b92fbd79fd028f704f6c584b4439d7 ("drm/i915: Stop pretending to mask/unmask LPE audio interrupts"), remove the code of setting IRQ chip data to resolve this issue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103731 Cc: Pierre-Louis Bossart pierre-louis.bossart@intel.com Cc: Jerome Anand jerome.anand@intel.com Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Augustine.Chen augustine.chen@intel.com --- drivers/gpu/drm/i915/intel_lpe_audio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_lpe_audio.c b/drivers/gpu/drm/i915/intel_lpe_audio.c index 3bf6528..56176f9 100644 --- a/drivers/gpu/drm/i915/intel_lpe_audio.c +++ b/drivers/gpu/drm/i915/intel_lpe_audio.c @@ -176,7 +176,7 @@ static int lpe_audio_irq_init(struct drm_i915_private *dev_priv) handle_simple_irq, "hdmi_lpe_audio_irq_handler");
- return irq_set_chip_data(irq, dev_priv); + return 0; }
static bool lpe_audio_detect(struct drm_i915_private *dev_priv)
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
Do we need something like - dev_priv->lpe_audio.irq = irq_alloc_desc(0); + dev_priv->lpe_audio.irq = irq_alloc_desc(-1); ?
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug. Since the dependency of IRQ chip data was removed from HDMI LPE audio by Commit 9bd9590997b92fbd79fd028f704f6c584b4439d7 ("drm/i915: Stop pretending to mask/unmask LPE audio interrupts"), remove the code of setting IRQ chip data to resolve this issue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103731 Cc: Pierre-Louis Bossart pierre-louis.bossart@intel.com Cc: Jerome Anand jerome.anand@intel.com Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Augustine.Chen augustine.chen@intel.com
drivers/gpu/drm/i915/intel_lpe_audio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_lpe_audio.c b/drivers/gpu/drm/i915/intel_lpe_audio.c index 3bf6528..56176f9 100644 --- a/drivers/gpu/drm/i915/intel_lpe_audio.c +++ b/drivers/gpu/drm/i915/intel_lpe_audio.c @@ -176,7 +176,7 @@ static int lpe_audio_irq_init(struct drm_i915_private *dev_priv) handle_simple_irq, "hdmi_lpe_audio_irq_handler");
- return irq_set_chip_data(irq, dev_priv);
- return 0;
}
static bool lpe_audio_detect(struct drm_i915_private *dev_priv)
1.9.1
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
Do we need something like
- dev_priv->lpe_audio.irq = irq_alloc_desc(0);
- dev_priv->lpe_audio.irq = irq_alloc_desc(-1);
#define irq_alloc_desc(node)
So instead of handing in node 0 you hand in node -1 which is NUMA_NO_NODE
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have others in GPIO etc.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Curing the symptom is never a good approach.
Thanks,
tglx
-----Original Message----- From: Thomas Gleixner [mailto:tglx@linutronix.de] Sent: Saturday, December 9, 2017 4:22 AM To: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Chen, Augustine augustine.chen@intel.com; intel- gfx@lists.freedesktop.org; alsa-devel@alsa-project.org; Anand, Jerome jerome.anand@intel.com; Bossart, Pierre-louis <pierre- louis.bossart@intel.com>; tiwai@suse.de; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
Do we need something like
- dev_priv->lpe_audio.irq = irq_alloc_desc(0);
- dev_priv->lpe_audio.irq = irq_alloc_desc(-1);
#define irq_alloc_desc(node)
So instead of handing in node 0 you hand in node -1 which is NUMA_NO_NODE
Agree - am not sure whether it will make any difference.
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have others in GPIO etc.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers with hwirq. It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
Curing the symptom is never a good approach.
Thanks,
tglx
On Mon, Dec 11, 2017 at 08:33:33AM +0000, Anand, Jerome wrote:
-----Original Message----- From: Thomas Gleixner [mailto:tglx@linutronix.de] Sent: Saturday, December 9, 2017 4:22 AM To: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Chen, Augustine augustine.chen@intel.com; intel- gfx@lists.freedesktop.org; alsa-devel@alsa-project.org; Anand, Jerome jerome.anand@intel.com; Bossart, Pierre-louis <pierre- louis.bossart@intel.com>; tiwai@suse.de; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
Do we need something like
- dev_priv->lpe_audio.irq = irq_alloc_desc(0);
- dev_priv->lpe_audio.irq = irq_alloc_desc(-1);
#define irq_alloc_desc(node)
So instead of handing in node 0 you hand in node -1 which is NUMA_NO_NODE
Ah. I misread the macros. So we already pass irq=-1.
Agree - am not sure whether it will make any difference.
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have others in GPIO etc.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers with hwirq. It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
There is no need right now. But there might be a need in the future. LPE audio isn't even the only piece of hardware whose irq goes through the i915 display engine (there's also the ISP and VED). So I would much prefer a proper solution instead of sweeping the problem under the rug.
On Mon, 11 Dec 2017 14:20:23 +0100, Ville Syrjälä wrote:
On Mon, Dec 11, 2017 at 08:33:33AM +0000, Anand, Jerome wrote:
-----Original Message----- From: Thomas Gleixner [mailto:tglx@linutronix.de] Sent: Saturday, December 9, 2017 4:22 AM To: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Chen, Augustine augustine.chen@intel.com; intel- gfx@lists.freedesktop.org; alsa-devel@alsa-project.org; Anand, Jerome jerome.anand@intel.com; Bossart, Pierre-louis <pierre- louis.bossart@intel.com>; tiwai@suse.de; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
Do we need something like
- dev_priv->lpe_audio.irq = irq_alloc_desc(0);
- dev_priv->lpe_audio.irq = irq_alloc_desc(-1);
#define irq_alloc_desc(node)
So instead of handing in node 0 you hand in node -1 which is NUMA_NO_NODE
Ah. I misread the macros. So we already pass irq=-1.
Agree - am not sure whether it will make any difference.
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have others in GPIO etc.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers with hwirq. It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
There is no need right now. But there might be a need in the future. LPE audio isn't even the only piece of hardware whose irq goes through the i915 display engine (there's also the ISP and VED). So I would much prefer a proper solution instead of sweeping the problem under the rug.
IMO, the primary question is whether the usage of irq chip without irq domain is valid or not. If an irq domain is mandatory, that's the thing to be fixed in i915 side.
thanks,
Takashi
-----Original Message----- From: Takashi Iwai [mailto:tiwai@suse.de] Sent: Monday, December 11, 2017 9:23 PM To: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Anand, Jerome jerome.anand@intel.com; Thomas Gleixner tglx@linutronix.de; Chen, Augustine augustine.chen@intel.com; intel- gfx@lists.freedesktop.org; alsa-devel@alsa-project.org; Bossart, Pierre-louis pierre-louis.bossart@intel.com; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Mon, 11 Dec 2017 14:20:23 +0100, Ville Syrjälä wrote:
On Mon, Dec 11, 2017 at 08:33:33AM +0000, Anand, Jerome wrote:
-----Original Message----- From: Thomas Gleixner [mailto:tglx@linutronix.de] Sent: Saturday, December 9, 2017 4:22 AM To: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Chen, Augustine augustine.chen@intel.com; intel- gfx@lists.freedesktop.org; alsa-devel@alsa-project.org; Anand, Jerome jerome.anand@intel.com; Bossart, Pierre-louis <pierre- louis.bossart@intel.com>; tiwai@suse.de; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
Do we need something like
- dev_priv->lpe_audio.irq = irq_alloc_desc(0);
- dev_priv->lpe_audio.irq = irq_alloc_desc(-1);
#define irq_alloc_desc(node)
So instead of handing in node 0 you hand in node -1 which is NUMA_NO_NODE
Ah. I misread the macros. So we already pass irq=-1.
No matter from the code perspective or from the real test result, this change doesn't make any difference in terms of the issue symptom.
Agree - am not sure whether it will make any difference.
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have others in GPIO etc.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers
with hwirq.
It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
There is no need right now. But there might be a need in the future. LPE audio isn't even the only piece of hardware whose irq goes through the i915 display engine (there's also the ISP and VED). So I would much prefer a proper solution instead of sweeping the problem under the rug.
IMO, the primary question is whether the usage of irq chip without irq domain is valid or not. If an irq domain is mandatory, that's the thing to be fixed in i915 side.
In terms of functionality, the interrupt and hdmi audio work fine without irq domain according to the validation. And besides, there are other drivers with similar implementation which doesn't set chip data at all.
thanks,
Takashi
On Tue, 12 Dec 2017 10:26:08 +0100, Chen, Augustine wrote:
That *looks* more correct to me based on a cursory glance at the x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have others in GPIO etc.
> In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this > would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers
with hwirq.
It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
There is no need right now. But there might be a need in the future. LPE audio isn't even the only piece of hardware whose irq goes through the i915 display engine (there's also the ISP and VED). So I would much prefer a proper solution instead of sweeping the problem under the rug.
IMO, the primary question is whether the usage of irq chip without irq domain is valid or not. If an irq domain is mandatory, that's the thing to be fixed in i915 side.
In terms of functionality, the interrupt and hdmi audio work fine without irq domain according to the validation.
Sure, otherwise the patch never landed to mainline :)
And besides, there are other drivers with similar implementation which doesn't set chip data at all.
Yes, dropping the chip data should be OK in the driver, but the only question is whether it is the right fix.
Did you check whether the issue happens with 4.15-rc, too? This was never answered in bugzilla. If I understand correctly, the code triggering Oops has been changed largely there and it should prune the non-legacy irqs.
Takashi
-----Original Message----- From: Takashi Iwai [mailto:tiwai@suse.de] Sent: Tuesday, December 12, 2017 5:45 PM To: Chen, Augustine augustine.chen@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com; Anand, Jerome jerome.anand@intel.com; Thomas Gleixner tglx@linutronix.de; intel- gfx@lists.freedesktop.org; alsa-devel@alsa-project.org; Bossart, Pierre-louis pierre-louis.bossart@intel.com; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Tue, 12 Dec 2017 10:26:08 +0100, Chen, Augustine wrote:
> That *looks* more correct to me based on a cursory glance at > the > x86 code, but I didn't trawl very deeply.
The x86 core cares not at all about interrupt chips which are created in a driver and not connected to an actual apic/ioapic/msi interrupt. This interrupt chip is its own thing as we have
others in GPIO etc.
> > In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this > > would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers
with hwirq.
It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
There is no need right now. But there might be a need in the future. LPE audio isn't even the only piece of hardware whose irq goes through the i915 display engine (there's also the ISP and VED). So I would much prefer a proper solution instead of sweeping the problem under the rug.
IMO, the primary question is whether the usage of irq chip without irq domain is valid or not. If an irq domain is mandatory, that's the thing to be fixed in i915 side.
In terms of functionality, the interrupt and hdmi audio work fine without irq domain according to the validation.
Sure, otherwise the patch never landed to mainline :)
And besides, there are other drivers with similar implementation which doesn't set chip data at all.
Yes, dropping the chip data should be OK in the driver, but the only question is whether it is the right fix.
Did you check whether the issue happens with 4.15-rc, too? This was never answered in bugzilla. If I understand correctly, the code triggering Oops has been changed largely there and it should prune the non-legacy irqs.
I tested it with 4.15-RC1 today and couldn't observed the same symptom. And yes, the code running into invalid pointer was modified dramatically. Feels like it's available to keep this driver intact.
Takashi
On Mon, 11 Dec 2017, Anand, Jerome wrote:
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers with hwirq. It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
That still does not explain why this is broken with CONFIG_CPUMASK_OFFSTACK=n and cpu hotplug.
Thanks,
tglx
-----Original Message----- From: Thomas Gleixner [mailto:tglx@linutronix.de] Sent: Tuesday, December 12, 2017 10:37 PM To: Anand, Jerome jerome.anand@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com; Chen, Augustine augustine.chen@intel.com; intel-gfx@lists.freedesktop.org; alsa-devel@alsa- project.org; Bossart, Pierre-louis pierre-louis.bossart@intel.com; tiwai@suse.de; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: RE: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
On Mon, 11 Dec 2017, Anand, Jerome wrote:
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
That's the problem - APIC just converts the chip data to its internal format and fails.
In the case of not enabling CONFIG_CPUMASK_OFFSTACK, this would cause kernel panic while doing CPU hotplug.
And why so? Surely not because you set irq_chip_data. That's really no explanation at all.
Ideally, I feel there needs to be an irq domain for mapping the irq numbers with hwirq. It is not created as part of the hdmi lpe audio bridge. Since the logic to mask/unmask lpe audio interrupts is removed, there is no need of the Chip data or creation of the domain now.
That still does not explain why this is broken with CONFIG_CPUMASK_OFFSTACK=n and cpu hotplug.
If the config option is enabled, the address accessed by APIC is valid as shown below else it tries to access locations which might or might not be valid based on the chip data set. This is the reason why some drivers work and some don't. I feel this needs to be fixed in APIC - not sure though !
#ifdef CONFIG_CPUMASK_OFFSTACK typedef struct cpumask *cpumask_var_t; ... #else typedef struct cpumask cpumask_var_t[1]; ... #endif
Thanks,
tglx
On Wed, 13 Dec 2017, Anand, Jerome wrote:
-----Original Message----- From: Thomas Gleixner [mailto:tglx@linutronix.de] Sent: Tuesday, December 12, 2017 10:37 PM To: Anand, Jerome jerome.anand@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com; Chen, Augustine augustine.chen@intel.com; intel-gfx@lists.freedesktop.org; alsa-devel@alsa- project.org; Bossart, Pierre-louis pierre-louis.bossart@intel.com; tiwai@suse.de; Ingo Molnar mingo@redhat.com; H. Peter Anvin hpa@zytor.com; Jiang Liu jiang.liu@linux.intel.com; Juergen Gross jgross@suse.com; Dou Liyang douly.fnst@cn.fujitsu.com; linux- kernel@vger.kernel.org Subject: RE: [Intel-gfx] [PATCH] drm/i915: Remove unused IRQ chip data of HDMI LPE audio
Can you please fix your mail client NOT to replicate the full mail header. That's just annoying.
On Mon, 11 Dec 2017, Anand, Jerome wrote:
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote:
The chip data of HDMI LPE audio is set to drm_i915_private which is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
That's the problem - APIC just converts the chip data to its internal format and fails.
How does APIC code end up to touch that interrupt at all? Call stack please.
And please enable CONFIG_GENERIC_IRQ_DEBUGFS and provide the output of
cat /sys/kernel/debug/irq/irqs/$N
where N is the interrupt number of that thing.
Thanks,
tglx
On Wed, 13 Dec 2017 12:35:54 +0100, Thomas Gleixner wrote:
On Mon, 11 Dec 2017, Anand, Jerome wrote:
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote: > The chip data of HDMI LPE audio is set to drm_i915_private which > is not consistent with the expectation by x86 APIC driver.
Hmm. Why is the apic code looking at data for an irq chip it hasn't created?
apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
That's the problem - APIC just converts the chip data to its internal format and fails.
How does APIC code end up to touch that interrupt at all? Call stack please.
It's found in the bugzilla referred in the patch: https://bugs.freedesktop.org/show_bug.cgi?id=103731
[ 87.353072] irq 298 idata->chip->name hdmi_lpe_audio_irqchip [ 87.353072] irq 298 apic_chip_data [ 87.353073] irq 298 data->domain is NULL [ 87.353120] BUG: unable to handle kernel NULL pointer dereference at (null) [ 87.353132] IP: setup_vector_irq+0x1ba/0x230 [ 87.353133] PGD 0
If my understanding is correct, it happens only with 4.14 and earlier kernels where __setup_vector_irq() loops over the all irqs:
static void __setup_vector_irq(int cpu) { struct apic_chip_data *data; struct irq_desc *desc; int irq, vector;
/* Mark the inuse vectors */ for_each_irq_desc(irq, desc) { struct irq_data *idata = irq_desc_get_irq_data(desc);
data = apic_chip_data(idata); if (!data || !cpumask_test_cpu(cpu, data->domain)) continue; ....
And since we have assigned a non-APIC chip data in the driver, the code above refers to a wrong object, leading to Oops.
As a further note, the setup_vector_irq() code has been changed in 4.15, and such a reference won't happen any longer. So the patch isn't necessary for now, although it's not bad to take as a cleanup. And we can eventually put Cc to stable there since it actually works around the issue above for the older kernels -- of course, with more detailed descriptions about the background.
thanks,
Takashi
On Wed, 13 Dec 2017, Takashi Iwai wrote:
On Wed, 13 Dec 2017 12:35:54 +0100, Thomas Gleixner wrote:
On Mon, 11 Dec 2017, Anand, Jerome wrote:
On Fri, 8 Dec 2017, Ville Syrjälä wrote:
> On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote: > > The chip data of HDMI LPE audio is set to drm_i915_private which > > is not consistent with the expectation by x86 APIC driver. > > Hmm. Why is the apic code looking at data for an irq chip it > hasn't created? >
apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
That's the problem - APIC just converts the chip data to its internal format and fails.
How does APIC code end up to touch that interrupt at all? Call stack please.
It's found in the bugzilla referred in the patch: https://bugs.freedesktop.org/show_bug.cgi?id=103731
[ 87.353072] irq 298 idata->chip->name hdmi_lpe_audio_irqchip [ 87.353072] irq 298 apic_chip_data [ 87.353073] irq 298 data->domain is NULL [ 87.353120] BUG: unable to handle kernel NULL pointer dereference at (null) [ 87.353132] IP: setup_vector_irq+0x1ba/0x230 [ 87.353133] PGD 0
If my understanding is correct, it happens only with 4.14 and earlier kernels where __setup_vector_irq() loops over the all irqs:
static void __setup_vector_irq(int cpu) { struct apic_chip_data *data; struct irq_desc *desc; int irq, vector;
/* Mark the inuse vectors */ for_each_irq_desc(irq, desc) { struct irq_data *idata = irq_desc_get_irq_data(desc);
data = apic_chip_data(idata); if (!data || !cpumask_test_cpu(cpu, data->domain)) continue; ....
And since we have assigned a non-APIC chip data in the driver, the code above refers to a wrong object, leading to Oops.
Bah crap. This information should have been provided earlier instead of handwavy 'doesnt work with CONFIG_FOO and hotplug'.
As a further note, the setup_vector_irq() code has been changed in 4.15, and such a reference won't happen any longer. So the patch isn't necessary for now, although it's not bad to take as a cleanup. And we can eventually put Cc to stable there since it actually works around the issue above for the older kernels -- of course, with more detailed descriptions about the background.
No, that's just tinkering. The proper fix is to make that code robust.
Something like the completely untested patch below should do the trick.
Thanks,
tglx --- diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index f3557a1eb562..02e6a3cc0d74 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -58,6 +58,9 @@ static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data) while (irq_data->parent_data) irq_data = irq_data->parent_data;
+ if (irq_data->domain != x86_vector_domain) + return NULL; + return irq_data->chip_data; }
On Wed, Dec 13, 2017 at 03:06:55PM +0100, Thomas Gleixner wrote:
On Wed, 13 Dec 2017, Takashi Iwai wrote:
On Wed, 13 Dec 2017 12:35:54 +0100, Thomas Gleixner wrote:
On Mon, 11 Dec 2017, Anand, Jerome wrote:
> On Fri, 8 Dec 2017, Ville Syrjälä wrote: > > > On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote: > > > The chip data of HDMI LPE audio is set to drm_i915_private which > > > is not consistent with the expectation by x86 APIC driver. > > > > Hmm. Why is the apic code looking at data for an irq chip it > > hasn't created? > >
apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
That's the problem - APIC just converts the chip data to its internal format and fails.
How does APIC code end up to touch that interrupt at all? Call stack please.
It's found in the bugzilla referred in the patch: https://bugs.freedesktop.org/show_bug.cgi?id=103731
[ 87.353072] irq 298 idata->chip->name hdmi_lpe_audio_irqchip [ 87.353072] irq 298 apic_chip_data [ 87.353073] irq 298 data->domain is NULL [ 87.353120] BUG: unable to handle kernel NULL pointer dereference at (null) [ 87.353132] IP: setup_vector_irq+0x1ba/0x230 [ 87.353133] PGD 0
If my understanding is correct, it happens only with 4.14 and earlier kernels where __setup_vector_irq() loops over the all irqs:
static void __setup_vector_irq(int cpu) { struct apic_chip_data *data; struct irq_desc *desc; int irq, vector;
/* Mark the inuse vectors */ for_each_irq_desc(irq, desc) { struct irq_data *idata = irq_desc_get_irq_data(desc);
data = apic_chip_data(idata); if (!data || !cpumask_test_cpu(cpu, data->domain)) continue; ....
And since we have assigned a non-APIC chip data in the driver, the code above refers to a wrong object, leading to Oops.
Bah crap. This information should have been provided earlier instead of handwavy 'doesnt work with CONFIG_FOO and hotplug'.
As a further note, the setup_vector_irq() code has been changed in 4.15, and such a reference won't happen any longer. So the patch isn't necessary for now, although it's not bad to take as a cleanup. And we can eventually put Cc to stable there since it actually works around the issue above for the older kernels -- of course, with more detailed descriptions about the background.
No, that's just tinkering. The proper fix is to make that code robust.
Something like the completely untested patch below should do the trick.
Thanks,
tglx
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index f3557a1eb562..02e6a3cc0d74 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -58,6 +58,9 @@ static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data) while (irq_data->parent_data) irq_data = irq_data->parent_data;
- if (irq_data->domain != x86_vector_domain)
return NULL;
- return irq_data->chip_data;
}
Did this patch resolve the issue?
On Mon, Jan 29, 2018 at 07:09:22PM +0200, Ville Syrjälä wrote:
On Wed, Dec 13, 2017 at 03:06:55PM +0100, Thomas Gleixner wrote:
On Wed, 13 Dec 2017, Takashi Iwai wrote:
On Wed, 13 Dec 2017 12:35:54 +0100, Thomas Gleixner wrote:
On Mon, 11 Dec 2017, Anand, Jerome wrote: > > On Fri, 8 Dec 2017, Ville Syrjälä wrote: > > > > > On Fri, Dec 08, 2017 at 05:33:23PM +0800, Augustine.Chen wrote: > > > > The chip data of HDMI LPE audio is set to drm_i915_private which > > > > is not consistent with the expectation by x86 APIC driver. > > > > > > Hmm. Why is the apic code looking at data for an irq chip it > > > hasn't created? > > > > > apic code expects an irq domain to be place as generic approach.
APIC code does not even see that interrupt at all. It's completely disconnected.
That's the problem - APIC just converts the chip data to its internal format and fails.
How does APIC code end up to touch that interrupt at all? Call stack please.
It's found in the bugzilla referred in the patch: https://bugs.freedesktop.org/show_bug.cgi?id=103731
[ 87.353072] irq 298 idata->chip->name hdmi_lpe_audio_irqchip [ 87.353072] irq 298 apic_chip_data [ 87.353073] irq 298 data->domain is NULL [ 87.353120] BUG: unable to handle kernel NULL pointer dereference at (null) [ 87.353132] IP: setup_vector_irq+0x1ba/0x230 [ 87.353133] PGD 0
If my understanding is correct, it happens only with 4.14 and earlier kernels where __setup_vector_irq() loops over the all irqs:
static void __setup_vector_irq(int cpu) { struct apic_chip_data *data; struct irq_desc *desc; int irq, vector;
/* Mark the inuse vectors */ for_each_irq_desc(irq, desc) { struct irq_data *idata = irq_desc_get_irq_data(desc);
data = apic_chip_data(idata); if (!data || !cpumask_test_cpu(cpu, data->domain)) continue; ....
And since we have assigned a non-APIC chip data in the driver, the code above refers to a wrong object, leading to Oops.
Bah crap. This information should have been provided earlier instead of handwavy 'doesnt work with CONFIG_FOO and hotplug'.
As a further note, the setup_vector_irq() code has been changed in 4.15, and such a reference won't happen any longer. So the patch isn't necessary for now, although it's not bad to take as a cleanup. And we can eventually put Cc to stable there since it actually works around the issue above for the older kernels -- of course, with more detailed descriptions about the background.
No, that's just tinkering. The proper fix is to make that code robust.
Something like the completely untested patch below should do the trick.
Thanks,
tglx
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index f3557a1eb562..02e6a3cc0d74 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -58,6 +58,9 @@ static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data) while (irq_data->parent_data) irq_data = irq_data->parent_data;
- if (irq_data->domain != x86_vector_domain)
return NULL;
- return irq_data->chip_data;
}
Did this patch resolve the issue?
Ping.
participants (6)
-
Anand, Jerome
-
Augustine.Chen
-
Chen, Augustine
-
Takashi Iwai
-
Thomas Gleixner
-
Ville Syrjälä