[PATCH 0/6] ASoC/soundwire: qcom: correctly probe devices after link init
Hi,
Dependencies ============ 1. ASoC codec: changes are independent, however they should rather come the same cycle as Soundwire changes, to avoid new warning and small delay.
2. Soundwire: changes (context) depend on: https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.... https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@lina...
Problems solved =============== Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence: 1. request_threaded_irq() 2. sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers. 3. qcom_swrm_init() - which initializes the link/bus and enables interrupts.
Any access to device registers at (2) above, will fail because link/bus is not yet initialized.
Cc: Patrick Lai quic_plai@quicinc.com
Best regards, Krzysztof
Dmitry Torokhov (1): ASoC: wcd938x: switch to using gpiod API
Krzysztof Kozlowski (5): ASoC: codecs: wcd938x: Keep device in reset till bind ASoC: codecs: wcd938x: Check for enumeration before using TX device soundwire: qcom: drop unused struct qcom_swrm_ctrl members soudnwire: master: protect concurrecnt check for bus->md soundwire: qcom: do not probe devices before bus/link init
drivers/soundwire/master.c | 7 ++- drivers/soundwire/qcom.c | 92 +++++++++++++++++++++++++++++--------- sound/soc/codecs/wcd938x.c | 44 +++++++++++------- 3 files changed, 107 insertions(+), 36 deletions(-)
From: Dmitry Torokhov dmitry.torokhov@gmail.com
Switch the driver from legacy gpio API that is deprecated to the newer gpiod API that respects line polarities described in ACPI/DT.
Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Tested-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org [krzysztof: rebased on recent dev_err_probe() changes] Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
---
Cc: Patrick Lai quic_plai@quicinc.com --- sound/soc/codecs/wcd938x.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c index 11b264a63b04..33fd8fdde9fd 100644 --- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -6,12 +6,14 @@ #include <linux/platform_device.h> #include <linux/device.h> #include <linux/delay.h> +#include <linux/err.h> #include <linux/gpio/consumer.h> +#include <linux/irq.h> +#include <linux/irqdomain.h> #include <linux/kernel.h> #include <linux/pm_runtime.h> #include <linux/component.h> #include <sound/tlv.h> -#include <linux/of_gpio.h> #include <linux/of.h> #include <sound/jack.h> #include <sound/pcm.h> @@ -194,7 +196,7 @@ struct wcd938x_priv { int flyback_cur_det_disable; int ear_rx_path; int variant; - int reset_gpio; + struct gpio_desc *reset_gpio; struct gpio_desc *us_euro_gpio; u32 micb1_mv; u32 micb2_mv; @@ -4234,16 +4236,16 @@ static int wcd938x_populate_dt_data(struct wcd938x_priv *wcd938x, struct device struct wcd_mbhc_config *cfg = &wcd938x->mbhc_cfg; int ret;
- wcd938x->reset_gpio = of_get_named_gpio(dev->of_node, "reset-gpios", 0); - if (wcd938x->reset_gpio < 0) - return dev_err_probe(dev, wcd938x->reset_gpio, - "Failed to get reset gpio\n"); + wcd938x->reset_gpio = devm_gpiod_get(dev, "reset", GPIOD_ASIS); + ret = PTR_ERR_OR_ZERO(wcd938x->reset_gpio); + if (ret) + return dev_err_probe(dev, ret, "Failed to get reset gpio\n");
wcd938x->us_euro_gpio = devm_gpiod_get_optional(dev, "us-euro", GPIOD_OUT_LOW); - if (IS_ERR(wcd938x->us_euro_gpio)) - return dev_err_probe(dev, PTR_ERR(wcd938x->us_euro_gpio), - "us-euro swap Control GPIO not found\n"); + ret = PTR_ERR_OR_ZERO(wcd938x->us_euro_gpio); + if (ret) + return dev_err_probe(dev, ret, "us-euro swap Control GPIO not found\n");
cfg->swap_gnd_mic = wcd938x_swap_gnd_mic;
@@ -4278,11 +4280,11 @@ static int wcd938x_populate_dt_data(struct wcd938x_priv *wcd938x, struct device
static int wcd938x_reset(struct wcd938x_priv *wcd938x) { - gpio_direction_output(wcd938x->reset_gpio, 0); - /* 20us sleep required after pulling the reset gpio to LOW */ + gpiod_set_value_cansleep(wcd938x->reset_gpio, 1); + /* 20us sleep required after asserting the reset gpio */ usleep_range(20, 30); - gpio_set_value(wcd938x->reset_gpio, 1); - /* 20us sleep required after pulling the reset gpio to HIGH */ + gpiod_set_value_cansleep(wcd938x->reset_gpio, 0); + /* 20us sleep required after releasing the reset gpio */ usleep_range(20, 30);
return 0;
On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
- gpio_direction_output(wcd938x->reset_gpio, 0);
- /* 20us sleep required after pulling the reset gpio to LOW */
- gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
- /* 20us sleep required after asserting the reset gpio */
This is inverting the sense of the GPIO in the API from active low to active high which will mean we're introducing a new reliance on having the signal described as active low in DT. That's an ABI concern.
I remain deeply unconvinced that remapping active low outputs like this in the GPIO API is helping.
On 20/04/2023 13:58, Mark Brown wrote:
On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
- gpio_direction_output(wcd938x->reset_gpio, 0);
- /* 20us sleep required after pulling the reset gpio to LOW */
- gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
- /* 20us sleep required after asserting the reset gpio */
This is inverting the sense of the GPIO in the API from active low to active high which will mean we're introducing a new reliance on having the signal described as active low in DT. That's an ABI concern.
It's bringing it to the correct level. Old code was not respecting the DTS thus if such DTS came with inverted design, the driver would not work.
We were already fixing the upstream DTS users and I thought all of them are fixed since long time (half a year) or even correct from the beginning. Now I found one more case with incorrect level, which I will fix.
I remain deeply unconvinced that remapping active low outputs like this in the GPIO API is helping.
The code is mapping them to correct state. The previous state was incorrect and did not allow to handle active high (which can happen). This is the effort to make code correct - driver and DTS.
Best regards, Krzysztof
On 20/04/2023 14:30, Krzysztof Kozlowski wrote:
On 20/04/2023 13:58, Mark Brown wrote:
On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
- gpio_direction_output(wcd938x->reset_gpio, 0);
- /* 20us sleep required after pulling the reset gpio to LOW */
- gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
- /* 20us sleep required after asserting the reset gpio */
This is inverting the sense of the GPIO in the API from active low to active high which will mean we're introducing a new reliance on having the signal described as active low in DT. That's an ABI concern.
It's bringing it to the correct level. Old code was not respecting the DTS thus if such DTS came with inverted design, the driver would not work.
We were already fixing the upstream DTS users and I thought all of them are fixed since long time (half a year) or even correct from the beginning. Now I found one more case with incorrect level, which I will fix.
No, my bad - all upstream DTSes are corrected since half year.
I remain deeply unconvinced that remapping active low outputs like this in the GPIO API is helping.
The code is mapping them to correct state. The previous state was incorrect and did not allow to handle active high (which can happen). This is the effort to make code correct - driver and DTS.
Best regards, Krzysztof
On Thu, Apr 20, 2023 at 02:30:17PM +0200, Krzysztof Kozlowski wrote:
On 20/04/2023 13:58, Mark Brown wrote:
On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
- gpio_direction_output(wcd938x->reset_gpio, 0);
- /* 20us sleep required after pulling the reset gpio to LOW */
- gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
- /* 20us sleep required after asserting the reset gpio */
This is inverting the sense of the GPIO in the API from active low to active high which will mean we're introducing a new reliance on having the signal described as active low in DT. That's an ABI concern.
It's bringing it to the correct level. Old code was not respecting the DTS thus if such DTS came with inverted design, the driver would not work.
Sure, but OTOH if the user didn't bother specifying as active low it would work. I suspect it's more likely that someone missed a flag that had no practical impact in DT than that someone would add an inverter to their design.
We were already fixing the upstream DTS users and I thought all of them are fixed since long time (half a year) or even correct from the beginning. Now I found one more case with incorrect level, which I will fix.
That's just upstream, what about any downstream users?
I remain deeply unconvinced that remapping active low outputs like this in the GPIO API is helping.
The code is mapping them to correct state. The previous state was incorrect and did not allow to handle active high (which can happen). This is the effort to make code correct - driver and DTS.
We could handle inversions through an explicit property if that were needed, that would be a less problematic transition and clearer in the consumer code.
On 20/04/2023 15:00, Mark Brown wrote:
On Thu, Apr 20, 2023 at 02:30:17PM +0200, Krzysztof Kozlowski wrote:
On 20/04/2023 13:58, Mark Brown wrote:
On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
- gpio_direction_output(wcd938x->reset_gpio, 0);
- /* 20us sleep required after pulling the reset gpio to LOW */
- gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
- /* 20us sleep required after asserting the reset gpio */
This is inverting the sense of the GPIO in the API from active low to active high which will mean we're introducing a new reliance on having the signal described as active low in DT. That's an ABI concern.
It's bringing it to the correct level. Old code was not respecting the DTS thus if such DTS came with inverted design, the driver would not work.
Sure, but OTOH if the user didn't bother specifying as active low it would work. I suspect it's more likely that someone missed a flag that had no practical impact in DT than that someone would add an inverter to their design.
We were already fixing the upstream DTS users and I thought all of them are fixed since long time (half a year) or even correct from the beginning. Now I found one more case with incorrect level, which I will fix.
That's just upstream, what about any downstream users?
Life of downstream. We all know the drill: merge your DTS or suffer. The WCD938x codecs are moderately new, so I do not expect many downstream users. They are in theory possible, because driver was merged in v5.14-rc1 and for the newest products Qualcomm uses v5.15. Although now it is v5.15, but the time driver was merged, maybe it was v5.10.
I could rework this patch to provide backwards compatible solution like I did for WSA: https://lore.kernel.org/all/20230102114152.297305-4-krzysztof.kozlowski@lina...
There are downsides of it, but as you pointed out - it's actually very rare to have the signal inverted in hardware.
I remain deeply unconvinced that remapping active low outputs like this in the GPIO API is helping.
The code is mapping them to correct state. The previous state was incorrect and did not allow to handle active high (which can happen). This is the effort to make code correct - driver and DTS.
We could handle inversions through an explicit property if that were needed, that would be a less problematic transition and clearer in the consumer code.
I am not sure if it is worth. The DTS is supposed to describe hardware, so even if reset pin flag was not effective, it is a mistake to describe it as ACTIVE_HIGH. Do we care about keeping broken code happy? If yes, then property is the way to go. If partially, then I can add backwards-compatible approach like I mentioned above.
Best regards, Krzysztof
On Thu, Apr 20, 2023 at 04:16:59PM +0200, Krzysztof Kozlowski wrote:
On 20/04/2023 15:00, Mark Brown wrote:
That's just upstream, what about any downstream users?
Life of downstream. We all know the drill: merge your DTS or suffer. The
No, the DT is supposed to be an ABI. The point in having a domain specific language with a compiler is to allow device trees to be distributed independently of the kernel.
I could rework this patch to provide backwards compatible solution like I did for WSA: https://lore.kernel.org/all/20230102114152.297305-4-krzysztof.kozlowski@lina...
There we go...
We could handle inversions through an explicit property if that were needed, that would be a less problematic transition and clearer in the consumer code.
I am not sure if it is worth. The DTS is supposed to describe hardware, so even if reset pin flag was not effective, it is a mistake to describe it as ACTIVE_HIGH. Do we care about keeping broken code happy? If yes, then property is the way to go. If partially, then I can add backwards-compatible approach like I mentioned above.
It's not just this individual transition, it's the whole thing with encoding the polarity of the signal at all - it's a layer of abstraction that feels like it introduces at least as many problems as it solves, and requiring configuration on every single system integration doesn't feel like the right choice in general.
On 20/04/2023 18:28, Mark Brown wrote:
On Thu, Apr 20, 2023 at 04:16:59PM +0200, Krzysztof Kozlowski wrote:
On 20/04/2023 15:00, Mark Brown wrote:
That's just upstream, what about any downstream users?
Life of downstream. We all know the drill: merge your DTS or suffer. The
No, the DT is supposed to be an ABI.
No, the DT bindings are the ABI. We are supposed not to break user-space, but out-of-tree users of drivers are not ABI by itself. Bindings are. If out-of-tree users make mistakes in their DTS and do not want to upstream it, it's their choice but it does not come for free.
The point in having a domain specific language with a compiler is to allow device trees to be distributed independently of the kernel.
When it is written incorrectly - wrong flag used for GPIO - there is no requirement to support it.
I could rework this patch to provide backwards compatible solution like I did for WSA: https://lore.kernel.org/all/20230102114152.297305-4-krzysztof.kozlowski@lina...
There we go...
We could handle inversions through an explicit property if that were needed, that would be a less problematic transition and clearer in the consumer code.
I am not sure if it is worth. The DTS is supposed to describe hardware, so even if reset pin flag was not effective, it is a mistake to describe it as ACTIVE_HIGH. Do we care about keeping broken code happy? If yes, then property is the way to go. If partially, then I can add backwards-compatible approach like I mentioned above.
It's not just this individual transition, it's the whole thing with encoding the polarity of the signal at all - it's a layer of abstraction that feels like it introduces at least as many problems as it solves, and requiring configuration on every single system integration doesn't feel like the right choice in general.
Choosing appropriate flag for GPIO in DTS is not difficult. It was skipped because we rarely cared in the drivers, but it should have been chosen correctly. The same about interrupt flags. We had many DTS for many times marking all possible interrupts as IRQ_TYPE_NONE. Did it matter for many drivers and setups? No, was perfectly "fine". Is it correct from DTS point of view. Also no.
Best regards, Krzysztof
On Thu, Apr 20, 2023 at 07:51:27PM +0200, Krzysztof Kozlowski wrote:
On 20/04/2023 18:28, Mark Brown wrote:
On Thu, Apr 20, 2023 at 04:16:59PM +0200, Krzysztof Kozlowski wrote:
Life of downstream. We all know the drill: merge your DTS or suffer. The
No, the DT is supposed to be an ABI.
No, the DT bindings are the ABI. We are supposed not to break user-space, but out-of-tree users of drivers are not ABI by itself. Bindings are. If out-of-tree users make mistakes in their DTS and do not want to upstream it, it's their choice but it does not come for free.
This is absolutely not the case, users should be able to ship DTs without upstreaming them and run multiple operating systems on top of a single DT - ideally boards would ship with DTs in firmware and people would be able to install generic OSs onto them with just off the shelf install media. This is even a thing that people have actually done, both non-FDT systems like SPARC and the PowerPC systems from Apple and a few FDT ones like Synquacer.
The enormous costs of DT would hardly be worth it if it were purely an in tree thing.
The point in having a domain specific language with a compiler is to allow device trees to be distributed independently of the kernel.
When it is written incorrectly - wrong flag used for GPIO - there is no requirement to support it.
If it worked was it ever really wrong (and note that the bindings may not always be super clear...)? While there is a point at which things never worked, can be fixed and we don't need to care about it or where we know the userbase well enough to know there won't be any issue those shouldn't be the default and should generally be avoided. Where there is a good reason to break compatibility it should be something we're actively deciding to do for a clear reason having considered the tradeoffs, not something that gets done on a whim without even mentioning it.
It's not just this individual transition, it's the whole thing with encoding the polarity of the signal at all - it's a layer of abstraction that feels like it introduces at least as many problems as it solves, and requiring configuration on every single system integration doesn't feel like the right choice in general.
Choosing appropriate flag for GPIO in DTS is not difficult. It was skipped because we rarely cared in the drivers, but it should have been chosen correctly. The same about interrupt flags. We had many DTS for many times marking all possible interrupts as IRQ_TYPE_NONE. Did it matter for many drivers and setups? No, was perfectly "fine". Is it correct from DTS point of view. Also no.
There's no natural definition of "correct" here though - it's just picking something in a binding. If someone for example flips the label on a signal from reset to enable (perhaps during review) that ends up changing active high to active low, and really I'm not sure how much we're really winning compared to just having code in the end consumer which just directly says what value it wants the physical signal to have.
My point is not that we haven't defined things such that the user has to specify if something is active high or active low, it's that it feels like it's more trouble han it's worth.
The Soundwire master expects that bus devices will be kept in reset state and brought out of it in their Soundwire probe() or bind(). Keeping it in reset state avoids early new Soundwire device interrupts in the master. Fix this in WCD938x platform driver by moving the reset toggle code from platform probe() to component bind().
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
---
I wasn't sure whether this deserves a Fixes tag. It looks like a fix, but OTOH, I don't think Soundwire master expectation is documented anywhere.
Cc: Patrick Lai quic_plai@quicinc.com --- sound/soc/codecs/wcd938x.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c index 33fd8fdde9fd..212667a7249c 100644 --- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -4236,7 +4236,8 @@ static int wcd938x_populate_dt_data(struct wcd938x_priv *wcd938x, struct device struct wcd_mbhc_config *cfg = &wcd938x->mbhc_cfg; int ret;
- wcd938x->reset_gpio = devm_gpiod_get(dev, "reset", GPIOD_ASIS); + /* Keep device in reset status till wcd938x_bind() */ + wcd938x->reset_gpio = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH); ret = PTR_ERR_OR_ZERO(wcd938x->reset_gpio); if (ret) return dev_err_probe(dev, ret, "Failed to get reset gpio\n"); @@ -4407,6 +4408,8 @@ static int wcd938x_bind(struct device *dev) return -EINVAL; }
+ wcd938x_reset(wcd938x); + wcd938x->regmap = devm_regmap_init_sdw(wcd938x->tx_sdw_dev, &wcd938x_regmap_config); if (IS_ERR(wcd938x->regmap)) { dev_err(dev, "%s: tx csr regmap not found\n", __func__); @@ -4508,8 +4511,6 @@ static int wcd938x_probe(struct platform_device *pdev) if (ret) return ret;
- wcd938x_reset(wcd938x); - ret = component_master_add_with_match(dev, &wcd938x_comp_ops, match); if (ret) return ret;
Qualcomm WCD938x Soundwire codecs come as two Soundwire devices - TX and RX - on two Soundwire buses. In DTS they are represented as three device nodes: Soundwire TX, Soundwire RX and the platform codec node (binding to this driver).
Probing (and Soundwire enumeration) of all devices can happen in any order, but only the Soundwire TX WCD938x device is used for accessing actual WCD938x registers. It is possible that component bind() in the platform driver will be called too early, before the Soundwire TX device is fully enumerated. This might work or might not, but we cannot handle it correctly from the codec driver. It's job for Soundwire master to bring up devices in correct order. At least add some simple warning, so such condition will not be undetected.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
---
Cc: Patrick Lai quic_plai@quicinc.com --- sound/soc/codecs/wcd938x.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c index 212667a7249c..e8e07e120fa1 100644 --- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -77,6 +77,8 @@ #define WCD938X_MBHC_MOISTURE_RREF R_24_KOHM #define WCD_MBHC_HS_V_MAX 1600
+#define WCD938X_ENUM_TIMEOUT_MS 500 + #define WCD938X_EAR_PA_GAIN_TLV(xname, reg, shift, max, invert, tlv_array) \ { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = xname, \ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ |\ @@ -4425,6 +4427,15 @@ static int wcd938x_bind(struct device *dev) wcd938x->sdw_priv[AIF1_PB]->slave_irq = wcd938x->virq; wcd938x->sdw_priv[AIF1_CAP]->slave_irq = wcd938x->virq;
+ /* + * Before any TX slave regmap usage, be sure the TX slave is actually + * enumerated. + */ + ret = wait_for_completion_timeout(&wcd938x->tx_sdw_dev->enumeration_complete, + msecs_to_jiffies(WCD938X_ENUM_TIMEOUT_MS)); + if (!ret) + dev_warn(dev, "Enumeration timeout in bind, possible failures in accessing registers\n"); + ret = wcd938x_set_micbias_data(wcd938x); if (ret < 0) { dev_err(dev, "%s: bad micbias pdata\n", __func__);
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Qualcomm WCD938x Soundwire codecs come as two Soundwire devices - TX and RX - on two Soundwire buses. In DTS they are represented as three device nodes: Soundwire TX, Soundwire RX and the platform codec node (binding to this driver).
Probing (and Soundwire enumeration) of all devices can happen in any order, but only the Soundwire TX WCD938x device is used for accessing actual WCD938x registers. It is possible that component bind() in the platform driver will be called too early, before the Soundwire TX device is fully enumerated. This might work or might not, but we cannot handle it correctly from the codec driver. It's job for Soundwire master to bring up devices in correct order.
That last sentence isn't aligned with the way enumeration works in general for SoundWire.
The Manager starts the clock, usually after a bus reset, and waits for Peripherals to signal their presence with Device0 Attached.
If multiple Peripherals are attached as Device0, the enumeration will resolve conflicts at the hardware level, and the Manager *cannot* control the order of enumeration; the order is defined by the values in the devID registers, whichever Peripheral has the highest value in the DevID registers wins the enumeration, and others have to back-off and be enumerated later.
Probing and enumeration are also different concepts. The SoundWire design allows for drivers to be probed even in the absence of any active hardware. This was added on purpose so that the driver could e.g. program a GPIO or talk to a power-management chip to allow SoundWire devices to start interacting with the bus.
see also suggestion below...
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Cc: Patrick Lai quic_plai@quicinc.com
sound/soc/codecs/wcd938x.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c index 212667a7249c..e8e07e120fa1 100644 --- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -77,6 +77,8 @@ #define WCD938X_MBHC_MOISTURE_RREF R_24_KOHM #define WCD_MBHC_HS_V_MAX 1600
+#define WCD938X_ENUM_TIMEOUT_MS 500
#define WCD938X_EAR_PA_GAIN_TLV(xname, reg, shift, max, invert, tlv_array) \ { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = xname, \ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ |\ @@ -4425,6 +4427,15 @@ static int wcd938x_bind(struct device *dev) wcd938x->sdw_priv[AIF1_PB]->slave_irq = wcd938x->virq; wcd938x->sdw_priv[AIF1_CAP]->slave_irq = wcd938x->virq;
- /*
* Before any TX slave regmap usage, be sure the TX slave is actually
* enumerated.
*/
...
the alternative is to move regmap to be cache-only in the probe and remove the cache-only property when the device is enumerated.
That's a trick that's used for all resume cases in codecs in Intel platforms, and we need to extend it for the startup cases as well.
- ret = wait_for_completion_timeout(&wcd938x->tx_sdw_dev->enumeration_complete,
msecs_to_jiffies(WCD938X_ENUM_TIMEOUT_MS));
- if (!ret)
dev_warn(dev, "Enumeration timeout in bind, possible failures in accessing registers\n");
- ret = wcd938x_set_micbias_data(wcd938x); if (ret < 0) { dev_err(dev, "%s: bad micbias pdata\n", __func__);
On Thu, Apr 20, 2023 at 09:18:15AM -0500, Pierre-Louis Bossart wrote:
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
- /*
* Before any TX slave regmap usage, be sure the TX slave is actually
* enumerated.
*/
the alternative is to move regmap to be cache-only in the probe and remove the cache-only property when the device is enumerated.
Right, it's generally a better approach unless you need the hardware to actually do something immediately - if you're just setting up what's needed whenver things actually get started then using cache only mode is much less complicated.
On 20/04/2023 16:18, Pierre-Louis Bossart wrote:
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Qualcomm WCD938x Soundwire codecs come as two Soundwire devices - TX and RX - on two Soundwire buses. In DTS they are represented as three device nodes: Soundwire TX, Soundwire RX and the platform codec node (binding to this driver).
Probing (and Soundwire enumeration) of all devices can happen in any order, but only the Soundwire TX WCD938x device is used for accessing actual WCD938x registers. It is possible that component bind() in the platform driver will be called too early, before the Soundwire TX device is fully enumerated. This might work or might not, but we cannot handle it correctly from the codec driver. It's job for Soundwire master to bring up devices in correct order.
That last sentence isn't aligned with the way enumeration works in general for SoundWire.
I was rather referring to driver point of view. The Qualcomm Soundwire should work, not expect devices to be powered off during their bind...
The Manager starts the clock, usually after a bus reset, and waits for Peripherals to signal their presence with Device0 Attached.
If multiple Peripherals are attached as Device0, the enumeration will resolve conflicts at the hardware level, and the Manager *cannot* control the order of enumeration; the order is defined by the values in the devID registers, whichever Peripheral has the highest value in the DevID registers wins the enumeration, and others have to back-off and be enumerated later.
Probing and enumeration are also different concepts. The SoundWire design allows for drivers to be probed even in the absence of any active hardware. This was added on purpose so that the driver could e.g. program a GPIO or talk to a power-management chip to allow SoundWire devices to start interacting with the bus.
see also suggestion below...
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Cc: Patrick Lai quic_plai@quicinc.com
sound/soc/codecs/wcd938x.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/sound/soc/codecs/wcd938x.c b/sound/soc/codecs/wcd938x.c index 212667a7249c..e8e07e120fa1 100644 --- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -77,6 +77,8 @@ #define WCD938X_MBHC_MOISTURE_RREF R_24_KOHM #define WCD_MBHC_HS_V_MAX 1600
+#define WCD938X_ENUM_TIMEOUT_MS 500
#define WCD938X_EAR_PA_GAIN_TLV(xname, reg, shift, max, invert, tlv_array) \ { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, .name = xname, \ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ |\ @@ -4425,6 +4427,15 @@ static int wcd938x_bind(struct device *dev) wcd938x->sdw_priv[AIF1_PB]->slave_irq = wcd938x->virq; wcd938x->sdw_priv[AIF1_CAP]->slave_irq = wcd938x->virq;
- /*
* Before any TX slave regmap usage, be sure the TX slave is actually
* enumerated.
*/
...
the alternative is to move regmap to be cache-only in the probe and remove the cache-only property when the device is enumerated.
The driver wants already to use the regmap in RW just few lines below in wcd938x_set_micbias_data().
I guess I could move this entire piece of code to other place...
That's a trick that's used for all resume cases in codecs in Intel platforms, and we need to extend it for the startup cases as well.
Can you point me to some specific piece of driver, so I could see how it is done? It might help me to prepare a better patch for this.
- ret = wait_for_completion_timeout(&wcd938x->tx_sdw_dev->enumeration_complete,
msecs_to_jiffies(WCD938X_ENUM_TIMEOUT_MS));
- if (!ret)
dev_warn(dev, "Enumeration timeout in bind, possible failures in accessing registers\n");
- ret = wcd938x_set_micbias_data(wcd938x); if (ret < 0) { dev_err(dev, "%s: bad micbias pdata\n", __func__);
Best regards, Krzysztof
On Thu, Apr 20, 2023 at 07:57:11PM +0200, Krzysztof Kozlowski wrote:
On 20/04/2023 16:18, Pierre-Louis Bossart wrote:
the alternative is to move regmap to be cache-only in the probe and remove the cache-only property when the device is enumerated.
The driver wants already to use the regmap in RW just few lines below in wcd938x_set_micbias_data().
I guess I could move this entire piece of code to other place...
The register map is fully writeable (modulo any volatile registers of course) in cache only mode, it's just that hardware access is suppressed. The driver needs to resync the cache when transitioning out of cache only mode (often in a runtime PM resume or something), this will ensure any pending changes make it to the hardware.
Drop unused members from the driver state container: struct qcom_swrm_ctrl.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
---
Cc: Patrick Lai quic_plai@quicinc.com --- drivers/soundwire/qcom.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index fae8640b142b..679990dc3cc4 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -187,12 +187,9 @@ struct qcom_swrm_ctrl { #endif struct completion broadcast; struct completion enumeration; - struct work_struct slave_work; /* Port alloc/free lock */ struct mutex port_lock; struct clk *hclk; - u8 wr_cmd_id; - u8 rd_cmd_id; int irq; unsigned int version; int wake_irq;
The Soundwire master controllers might want to check for bus->md initialization to avoid race between early interrupt and finish of sdw_bus_master_add()/sdw_master_device_add(). Such early interrupt can happen if Soundwire devices are not powered off during their probe.
Add a store release barrier, so the Soundwire controllers can safely check it in concurrent (e.g. in interrupt) way.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
---
Cc: Patrick Lai quic_plai@quicinc.com --- drivers/soundwire/master.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/soundwire/master.c b/drivers/soundwire/master.c index 9b05c9e25ebe..d5bf13e7e602 100644 --- a/drivers/soundwire/master.c +++ b/drivers/soundwire/master.c @@ -161,7 +161,12 @@ int sdw_master_device_add(struct sdw_bus *bus, struct device *parent, /* add shortcuts to improve code readability/compactness */ md->bus = bus; bus->dev = &md->dev; - bus->md = md; + /* + * Make sure the contents of md is stored before storing bus->md. + * Paired with new slave attached and slave status interrupts + * on the Soundwire master side. + */ + smp_store_release(&bus->md, md);
pm_runtime_set_autosuspend_delay(&bus->md->dev, SDW_MASTER_SUSPEND_DELAY_MS); pm_runtime_use_autosuspend(&bus->md->dev);
typos in commit title...
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
The Soundwire master controllers might want to check for bus->md
Apologies for being pedantic but 'manager' and 'controller' are different concepts in SoundWire, see DisCo spec. It's not a 1:1 mapping, a controller can rely on M managers
initialization to avoid race between early interrupt and finish of sdw_bus_master_add()/sdw_master_device_add(). Such early interrupt can happen if Soundwire devices are not powered off during their probe.
Add a store release barrier, so the Soundwire controllers can safely check it in concurrent (e.g. in interrupt) way.
Can you elaborate on the race condition? I am not following what breaks, and what entity generates the 'early interrupt'.
I am specifically concerned about adding this in common code without any matching smp_load_acquire() - which is only added in the following patch for the Qualcomm manager only, but not added for Intel/AMD managers. Is this not a problem?
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Cc: Patrick Lai quic_plai@quicinc.com
drivers/soundwire/master.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/soundwire/master.c b/drivers/soundwire/master.c index 9b05c9e25ebe..d5bf13e7e602 100644 --- a/drivers/soundwire/master.c +++ b/drivers/soundwire/master.c @@ -161,7 +161,12 @@ int sdw_master_device_add(struct sdw_bus *bus, struct device *parent, /* add shortcuts to improve code readability/compactness */ md->bus = bus; bus->dev = &md->dev;
- bus->md = md;
/*
* Make sure the contents of md is stored before storing bus->md.
* Paired with new slave attached and slave status interrupts
* on the Soundwire master side.
*/
smp_store_release(&bus->md, md);
pm_runtime_set_autosuspend_delay(&bus->md->dev, SDW_MASTER_SUSPEND_DELAY_MS); pm_runtime_use_autosuspend(&bus->md->dev);
On 20/04/2023 18:42, Pierre-Louis Bossart wrote:
typos in commit title...
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
The Soundwire master controllers might want to check for bus->md
Apologies for being pedantic but 'manager' and 'controller' are different concepts in SoundWire, see DisCo spec. It's not a 1:1 mapping, a controller can rely on M managers
I wrote master, not manager. For the Qualcomm case one controller is one master, but in general I try to avoid the master/slave terminology.
initialization to avoid race between early interrupt and finish of sdw_bus_master_add()/sdw_master_device_add(). Such early interrupt can happen if Soundwire devices are not powered off during their probe.
Add a store release barrier, so the Soundwire controllers can safely check it in concurrent (e.g. in interrupt) way.
Can you elaborate on the race condition? I am not following what breaks, and what entity generates the 'early interrupt'.
The condition is explained in next patch. If you think it's better, I can squash it with next.
If the condition is still not clear, drop a note in next patch, so I will elaborate there.
I am specifically concerned about adding this in common code without any matching smp_load_acquire() - which is only added in the following patch for the Qualcomm manager only, but not added for Intel/AMD managers. Is this not a problem?
Shouldn't be. The barrier just won't be effective for these drivers, but that should not be a problem, because I also did not add to these checking bus->md in a concurrent path.
Basically the barrier here is necessary because I want to check bus->md in Qualcomm master interrupt handler.
Best regards, Krzysztof
On 4/20/23 12:27, Krzysztof Kozlowski wrote:
On 20/04/2023 18:42, Pierre-Louis Bossart wrote:
typos in commit title...
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
The Soundwire master controllers might want to check for bus->md
Apologies for being pedantic but 'manager' and 'controller' are different concepts in SoundWire, see DisCo spec. It's not a 1:1 mapping, a controller can rely on M managers
I wrote master, not manager. For the Qualcomm case one controller is one master, but in general I try to avoid the master/slave terminology.
The Soundwire 1.2.1 spec moved away from master/slave and uses manager/peripheral. It's the same concepts, just different terms. At some point we'll update the code, it's just been too busy in 2022/2023 to do this replacement. It doesn't hurt to use the new terms.
initialization to avoid race between early interrupt and finish of sdw_bus_master_add()/sdw_master_device_add(). Such early interrupt can happen if Soundwire devices are not powered off during their probe.
Add a store release barrier, so the Soundwire controllers can safely check it in concurrent (e.g. in interrupt) way.
Can you elaborate on the race condition? I am not following what breaks, and what entity generates the 'early interrupt'.
The condition is explained in next patch. If you think it's better, I can squash it with next.
If the condition is still not clear, drop a note in next patch, so I will elaborate there.
will do.
I am specifically concerned about adding this in common code without any matching smp_load_acquire() - which is only added in the following patch for the Qualcomm manager only, but not added for Intel/AMD managers. Is this not a problem?
Shouldn't be. The barrier just won't be effective for these drivers, but that should not be a problem, because I also did not add to these checking bus->md in a concurrent path.
Basically the barrier here is necessary because I want to check bus->md in Qualcomm master interrupt handler.
I really don't have any understanding or background on what this does.
Is there actually a precedent for this? I mean, dealing with the device/driver model is already complicated, if now we have to be careful on when the device pointer is stored it adds a whole new element of complexity or skillset required to understand the bus operation.
Re-looking at the code, the 'md' variable is allocated in sdw_master_device_add(), initialized with all kinds of values, used by device_register() so presumably when you store the value it's stored somewhere consistent, no?
Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence: 1. request_threaded_irq() 2. sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers. 3. qcom_swrm_init() - which initializes the link/bus and enables interrupts.
Any access to device registers at (2) above, will fail because link/bus is not yet initialized.
However the fix is not as simple as moving qcom_swrm_init() before sdw_bus_master_add(), because this will cause early interrupt of new slave attached. The interrupt handler expects bus master (ctrl->bus.md) to be allocated, so this would lead to NULL pointer exception.
Rework the init sequence and change the interrupt handler. The correct sequence fixing accessing device registers before link init is now: 1. qcom_swrm_init() 2. request_threaded_irq() 3. sdw_bus_master_add() which still might cause early interrupts, if Soundwire devices are not in powered off state before their probe. This early interrupt issue is fixed by checking if bus master (ctrl->bus.md) was allocated and if not, scheduling delayed work for enumerating the slave device. Since we actually can handle early interrupt now, drop IRQF_TRIGGER_RISING flag from the interrupt, because it is not really valid and driver should use flags provided by DTS.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
---
Change context depends on: https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.... https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@lina...
Cc: Patrick Lai quic_plai@quicinc.com --- drivers/soundwire/qcom.c | 89 ++++++++++++++++++++++++++++++++-------- 1 file changed, 72 insertions(+), 17 deletions(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index 679990dc3cc4..802d939ce7aa 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -19,6 +19,7 @@ #include <linux/slimbus.h> #include <linux/soundwire/sdw.h> #include <linux/soundwire/sdw_registers.h> +#include <linux/workqueue.h> #include <sound/pcm_params.h> #include <sound/soc.h> #include "bus.h" @@ -187,6 +188,7 @@ struct qcom_swrm_ctrl { #endif struct completion broadcast; struct completion enumeration; + struct delayed_work new_slave_work; /* Port alloc/free lock */ struct mutex port_lock; struct clk *hclk; @@ -606,6 +608,37 @@ static int qcom_swrm_enumerate(struct sdw_bus *bus) return 0; }
+static void qcom_swrm_new_slave(struct work_struct *work) +{ + struct qcom_swrm_ctrl *ctrl = container_of(work, struct qcom_swrm_ctrl, + new_slave_work.work); + + /* + * All Soundwire slave deviecs are expected to be in reset state (powered down) + * during sdw_bus_master_add(). The slave device should be brougth + * from reset by its probe() or bind() function, as a result of + * sdw_bus_master_add(). + * Add a simple check to avoid NULL pointer except on early interrupts. + * Note that if this condition happens, the slave device will not be + * enumerated. Its driver should be fixed. + * + * smp_load_acquire() paired with sdw_master_device_add(). + */ + if (!smp_load_acquire(&ctrl->bus.md)) { + dev_err(ctrl->dev, + "Got unexpected, early interrupt, device will not be enumerated\n"); + return; + } + + clk_prepare_enable(ctrl->hclk); + + qcom_swrm_get_device_status(ctrl); + qcom_swrm_enumerate(&ctrl->bus); + sdw_handle_slave_status(&ctrl->bus, ctrl->status); + + clk_disable_unprepare(ctrl->hclk); +}; + static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id) { struct qcom_swrm_ctrl *ctrl = dev_id; @@ -668,9 +701,17 @@ static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id) dev_dbg(ctrl->dev, "Slave status not changed %x\n", slave_status); } else { - qcom_swrm_get_device_status(ctrl); - qcom_swrm_enumerate(&ctrl->bus); - sdw_handle_slave_status(&ctrl->bus, ctrl->status); + unsigned long delay = 0; + + /* + * See qcom_swrm_new_slave() for + * explanation. smp_load_acquire() paired + * with sdw_master_device_add(). + */ + if (!smp_load_acquire(&ctrl->bus.md)) + delay = 10; + schedule_delayed_work(&ctrl->new_slave_work, + delay); } break; case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET: @@ -780,6 +821,7 @@ static int qcom_swrm_init(struct qcom_swrm_ctrl *ctrl)
ctrl->intr_mask = SWRM_INTERRUPT_STATUS_RMSK; /* Mask soundwire interrupts */ + if (ctrl->version < SWRM_VERSION_2_0_0) ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR], SWRM_INTERRUPT_STATUS_RMSK); @@ -1485,6 +1527,7 @@ static int qcom_swrm_probe(struct platform_device *pdev) mutex_init(&ctrl->port_lock); init_completion(&ctrl->broadcast); init_completion(&ctrl->enumeration); + INIT_DELAYED_WORK(&ctrl->new_slave_work, qcom_swrm_new_slave);
ctrl->bus.ops = &qcom_swrm_ops; ctrl->bus.port_ops = &qcom_swrm_port_ops; @@ -1514,9 +1557,10 @@ static int qcom_swrm_probe(struct platform_device *pdev)
ctrl->reg_read(ctrl, SWRM_COMP_HW_VERSION, &ctrl->version);
+ qcom_swrm_init(ctrl); + ret = devm_request_threaded_irq(dev, ctrl->irq, NULL, qcom_swrm_irq_handler, - IRQF_TRIGGER_RISING | IRQF_ONESHOT, "soundwire", ctrl); if (ret) { @@ -1524,18 +1568,6 @@ static int qcom_swrm_probe(struct platform_device *pdev) goto err_clk; }
- ctrl->wake_irq = of_irq_get(dev->of_node, 1); - if (ctrl->wake_irq > 0) { - ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL, - qcom_swrm_wake_irq_handler, - IRQF_TRIGGER_HIGH | IRQF_ONESHOT, - "swr_wake_irq", ctrl); - if (ret) { - dev_err(dev, "Failed to request soundwire wake irq\n"); - goto err_init; - } - } - pm_runtime_set_autosuspend_delay(dev, 3000); pm_runtime_use_autosuspend(dev); pm_runtime_mark_last_busy(dev); @@ -1549,7 +1581,18 @@ static int qcom_swrm_probe(struct platform_device *pdev) goto err_clk; }
- qcom_swrm_init(ctrl); + ctrl->wake_irq = of_irq_get(dev->of_node, 1); + if (ctrl->wake_irq > 0) { + ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL, + qcom_swrm_wake_irq_handler, + IRQF_TRIGGER_HIGH | IRQF_ONESHOT, + "swr_wake_irq", ctrl); + if (ret) { + dev_err(dev, "Failed to request soundwire wake irq\n"); + goto err_init; + } + } + wait_for_completion_timeout(&ctrl->enumeration, msecs_to_jiffies(TIMEOUT_MS)); ret = qcom_swrm_register_dais(ctrl); @@ -1589,6 +1632,18 @@ static int qcom_swrm_remove(struct platform_device *pdev) { struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(&pdev->dev);
+ /* + * Mask interrupts to be sure no delayed work can be scheduler after + * removing Soundwire bus master. + */ + if (ctrl->version < SWRM_VERSION_2_0_0) + ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR], + 0); + if (ctrl->mmio) + ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN], + 0); + + cancel_delayed_work_sync(&ctrl->new_slave_work); sdw_bus_master_delete(&ctrl->bus); clk_disable_unprepare(ctrl->hclk);
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
The problem statement is really hard to follow.
The peripheral can only be enumerated AFTER a) the manager starts the bus clock and transmitting PING frames b) the peripheral detects the sync words for 16 frames in a row. c) the peripheral reports as Attached in the Device0 slot
That sequence holds whether the manager does the enumeration manually or relies on hardware-assisted autoenumeration. This is what the spec requires.
So why can't the bus clock start be controlled by the manager driver, and started once all required initializations are done?
I mean, there's got to be some sort of parent-child hierarchy with manager first, peripheral(s) second, I don't get how these steps could be inverted or race.
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
- request_threaded_irq()
- sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers.
not if the bus clock hasn't started...
- qcom_swrm_init() - which initializes the link/bus and enables interrupts.
if you can move the clock start in 3) then problem solved. Why can't this be done?
Any access to device registers at (2) above, will fail because link/bus is not yet initialized.
However the fix is not as simple as moving qcom_swrm_init() before sdw_bus_master_add(), because this will cause early interrupt of new slave attached. The interrupt handler expects bus master (ctrl->bus.md) to be allocated, so this would lead to NULL pointer exception.
Rework the init sequence and change the interrupt handler. The correct sequence fixing accessing device registers before link init is now:
- qcom_swrm_init()
- request_threaded_irq()
- sdw_bus_master_add()
which still might cause early interrupts, if Soundwire devices are not in powered off state before their probe. This early interrupt issue is
You'd need to clarify in which step the bus clock starts. In general, you want to clock started last.
fixed by checking if bus master (ctrl->bus.md) was allocated and if not, scheduling delayed work for enumerating the slave device. Since we actually can handle early interrupt now, drop IRQF_TRIGGER_RISING flag from the interrupt, because it is not really valid and driver should use flags provided by DTS.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Change context depends on: https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.... https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@lina...
Cc: Patrick Lai quic_plai@quicinc.com
drivers/soundwire/qcom.c | 89 ++++++++++++++++++++++++++++++++-------- 1 file changed, 72 insertions(+), 17 deletions(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index 679990dc3cc4..802d939ce7aa 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -19,6 +19,7 @@ #include <linux/slimbus.h> #include <linux/soundwire/sdw.h> #include <linux/soundwire/sdw_registers.h> +#include <linux/workqueue.h> #include <sound/pcm_params.h> #include <sound/soc.h> #include "bus.h" @@ -187,6 +188,7 @@ struct qcom_swrm_ctrl { #endif struct completion broadcast; struct completion enumeration;
- struct delayed_work new_slave_work; /* Port alloc/free lock */ struct mutex port_lock; struct clk *hclk;
@@ -606,6 +608,37 @@ static int qcom_swrm_enumerate(struct sdw_bus *bus) return 0; }
+static void qcom_swrm_new_slave(struct work_struct *work) +{
- struct qcom_swrm_ctrl *ctrl = container_of(work, struct qcom_swrm_ctrl,
new_slave_work.work);
- /*
* All Soundwire slave deviecs are expected to be in reset state (powered down)
* during sdw_bus_master_add(). The slave device should be brougth
typo/grammar: brought out of reset
* from reset by its probe() or bind() function, as a result of
* sdw_bus_master_add().
* Add a simple check to avoid NULL pointer except on early interrupts.
* Note that if this condition happens, the slave device will not be
* enumerated. Its driver should be fixed.
???
The codec driver is NEVER involved in enumeration.
The only thing a codec driver should do is provide a callback to be notified of a status change for additional initialization, but the enumeration can be done even in the absence of a codec driver.
The proof in the pudding is that you can 'blacklist' a codec driver and bind it later, after the hardware is enumerated. You can even unbind a codec driver and nothing bad will happen (we hardened that sequence last year).
probe != enumeration != initialization for SoundWire codecs.
Probe and enumeration can happen in any order Initialization can only happen after both probe and enumeration happened.
*
* smp_load_acquire() paired with sdw_master_device_add().
*/
- if (!smp_load_acquire(&ctrl->bus.md)) {
dev_err(ctrl->dev,
"Got unexpected, early interrupt, device will not be enumerated\n");
return;
- }
- clk_prepare_enable(ctrl->hclk);
- qcom_swrm_get_device_status(ctrl);
- qcom_swrm_enumerate(&ctrl->bus);
- sdw_handle_slave_status(&ctrl->bus, ctrl->status);
- clk_disable_unprepare(ctrl->hclk);
+};
static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id) { struct qcom_swrm_ctrl *ctrl = dev_id; @@ -668,9 +701,17 @@ static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id) dev_dbg(ctrl->dev, "Slave status not changed %x\n", slave_status); } else {
qcom_swrm_get_device_status(ctrl);
qcom_swrm_enumerate(&ctrl->bus);
sdw_handle_slave_status(&ctrl->bus, ctrl->status);
unsigned long delay = 0;
/*
* See qcom_swrm_new_slave() for
* explanation. smp_load_acquire() paired
* with sdw_master_device_add().
*/
if (!smp_load_acquire(&ctrl->bus.md))
delay = 10;
schedule_delayed_work(&ctrl->new_slave_work,
delay); } break; case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET:
@@ -780,6 +821,7 @@ static int qcom_swrm_init(struct qcom_swrm_ctrl *ctrl)
ctrl->intr_mask = SWRM_INTERRUPT_STATUS_RMSK; /* Mask soundwire interrupts */
- if (ctrl->version < SWRM_VERSION_2_0_0) ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR], SWRM_INTERRUPT_STATUS_RMSK);
@@ -1485,6 +1527,7 @@ static int qcom_swrm_probe(struct platform_device *pdev) mutex_init(&ctrl->port_lock); init_completion(&ctrl->broadcast); init_completion(&ctrl->enumeration);
INIT_DELAYED_WORK(&ctrl->new_slave_work, qcom_swrm_new_slave);
ctrl->bus.ops = &qcom_swrm_ops; ctrl->bus.port_ops = &qcom_swrm_port_ops;
@@ -1514,9 +1557,10 @@ static int qcom_swrm_probe(struct platform_device *pdev)
ctrl->reg_read(ctrl, SWRM_COMP_HW_VERSION, &ctrl->version);
- qcom_swrm_init(ctrl);
- ret = devm_request_threaded_irq(dev, ctrl->irq, NULL, qcom_swrm_irq_handler,
if (ret) {IRQF_TRIGGER_RISING | IRQF_ONESHOT, "soundwire", ctrl);
@@ -1524,18 +1568,6 @@ static int qcom_swrm_probe(struct platform_device *pdev) goto err_clk; }
- ctrl->wake_irq = of_irq_get(dev->of_node, 1);
- if (ctrl->wake_irq > 0) {
ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
qcom_swrm_wake_irq_handler,
IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
"swr_wake_irq", ctrl);
if (ret) {
dev_err(dev, "Failed to request soundwire wake irq\n");
goto err_init;
}
- }
- pm_runtime_set_autosuspend_delay(dev, 3000); pm_runtime_use_autosuspend(dev); pm_runtime_mark_last_busy(dev);
@@ -1549,7 +1581,18 @@ static int qcom_swrm_probe(struct platform_device *pdev) goto err_clk; }
- qcom_swrm_init(ctrl);
- ctrl->wake_irq = of_irq_get(dev->of_node, 1);
- if (ctrl->wake_irq > 0) {
ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
qcom_swrm_wake_irq_handler,
IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
"swr_wake_irq", ctrl);
if (ret) {
dev_err(dev, "Failed to request soundwire wake irq\n");
goto err_init;
}
- }
- wait_for_completion_timeout(&ctrl->enumeration, msecs_to_jiffies(TIMEOUT_MS)); ret = qcom_swrm_register_dais(ctrl);
@@ -1589,6 +1632,18 @@ static int qcom_swrm_remove(struct platform_device *pdev) { struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(&pdev->dev);
- /*
* Mask interrupts to be sure no delayed work can be scheduler after
typo/grammar: scheduled
* removing Soundwire bus master.
*/
- if (ctrl->version < SWRM_VERSION_2_0_0)
ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
0);
- if (ctrl->mmio)
ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
0);
- cancel_delayed_work_sync(&ctrl->new_slave_work); sdw_bus_master_delete(&ctrl->bus); clk_disable_unprepare(ctrl->hclk);
should the last two be inverted? Keeping a clock running while removing stuff is asking for trouble.
On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
The problem statement is really hard to follow.
The peripheral can only be enumerated AFTER a) the manager starts the bus clock and transmitting PING frames b) the peripheral detects the sync words for 16 frames in a row. c) the peripheral reports as Attached in the Device0 slot
That sequence holds whether the manager does the enumeration manually or relies on hardware-assisted autoenumeration. This is what the spec requires.
So why can't the bus clock start be controlled by the manager driver, and started once all required initializations are done?
I mean, there's got to be some sort of parent-child hierarchy with manager first, peripheral(s) second, I don't get how these steps could be inverted or race.
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
- request_threaded_irq()
- sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers.
not if the bus clock hasn't started...
- qcom_swrm_init() - which initializes the link/bus and enables interrupts.
if you can move the clock start in 3) then problem solved. Why can't this be done?
Responding to all your three responses: The clock is enabled in this 3. qcom_swrm_init(), so the old code to my knowledge is written exactly how you expect.
However even with stopped clock, the device enumerates at sdw_bus_master_add(), before anything is enabled.
I also checked the reset values of these registers - clock is off after reset. Assuming of course I look at correct clock registers... but I have only one.
Any access to device registers at (2) above, will fail because link/bus is not yet initialized.
However the fix is not as simple as moving qcom_swrm_init() before sdw_bus_master_add(), because this will cause early interrupt of new slave attached. The interrupt handler expects bus master (ctrl->bus.md) to be allocated, so this would lead to NULL pointer exception.
Rework the init sequence and change the interrupt handler. The correct sequence fixing accessing device registers before link init is now:
- qcom_swrm_init()
- request_threaded_irq()
- sdw_bus_master_add()
which still might cause early interrupts, if Soundwire devices are not in powered off state before their probe. This early interrupt issue is
You'd need to clarify in which step the bus clock starts. In general, you want to clock started last.
Clock is enabled in qcom_swrm_init() step, but as I wrote above, it looks like it does not matter for enumeration.
fixed by checking if bus master (ctrl->bus.md) was allocated and if not, scheduling delayed work for enumerating the slave device. Since we actually can handle early interrupt now, drop IRQF_TRIGGER_RISING flag from the interrupt, because it is not really valid and driver should use flags provided by DTS.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
Change context depends on: https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.... https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@lina...
Cc: Patrick Lai quic_plai@quicinc.com
drivers/soundwire/qcom.c | 89 ++++++++++++++++++++++++++++++++-------- 1 file changed, 72 insertions(+), 17 deletions(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index 679990dc3cc4..802d939ce7aa 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -19,6 +19,7 @@ #include <linux/slimbus.h> #include <linux/soundwire/sdw.h> #include <linux/soundwire/sdw_registers.h> +#include <linux/workqueue.h> #include <sound/pcm_params.h> #include <sound/soc.h> #include "bus.h" @@ -187,6 +188,7 @@ struct qcom_swrm_ctrl { #endif struct completion broadcast; struct completion enumeration;
- struct delayed_work new_slave_work; /* Port alloc/free lock */ struct mutex port_lock; struct clk *hclk;
@@ -606,6 +608,37 @@ static int qcom_swrm_enumerate(struct sdw_bus *bus) return 0; }
+static void qcom_swrm_new_slave(struct work_struct *work) +{
- struct qcom_swrm_ctrl *ctrl = container_of(work, struct qcom_swrm_ctrl,
new_slave_work.work);
- /*
* All Soundwire slave deviecs are expected to be in reset state (powered down)
* during sdw_bus_master_add(). The slave device should be brougth
typo/grammar: brought out of reset
ack
* from reset by its probe() or bind() function, as a result of
* sdw_bus_master_add().
* Add a simple check to avoid NULL pointer except on early interrupts.
* Note that if this condition happens, the slave device will not be
* enumerated. Its driver should be fixed.
???
The codec driver is NEVER involved in enumeration.
If the device stays in power down, only the driver bind can bring it on. enumeration won't happen when device is powered down, right?
The only thing a codec driver should do is provide a callback to be notified of a status change for additional initialization, but the enumeration can be done even in the absence of a codec driver.
The proof in the pudding is that you can 'blacklist' a codec driver and bind it later, after the hardware is enumerated. You can even unbind a codec driver and nothing bad will happen (we hardened that sequence last year).
probe != enumeration != initialization for SoundWire codecs.
Probe and enumeration can happen in any order Initialization can only happen after both probe and enumeration happened.
I am speaking here about component_master_ops->bind() callback.
*
* smp_load_acquire() paired with sdw_master_device_add().
*/
- if (!smp_load_acquire(&ctrl->bus.md)) {
dev_err(ctrl->dev,
"Got unexpected, early interrupt, device will not be enumerated\n");
return;
- }
- clk_prepare_enable(ctrl->hclk);
- qcom_swrm_get_device_status(ctrl);
- qcom_swrm_enumerate(&ctrl->bus);
- sdw_handle_slave_status(&ctrl->bus, ctrl->status);
- clk_disable_unprepare(ctrl->hclk);
+};
static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id) { struct qcom_swrm_ctrl *ctrl = dev_id; @@ -668,9 +701,17 @@ static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id) dev_dbg(ctrl->dev, "Slave status not changed %x\n", slave_status); } else {
qcom_swrm_get_device_status(ctrl);
qcom_swrm_enumerate(&ctrl->bus);
sdw_handle_slave_status(&ctrl->bus, ctrl->status);
unsigned long delay = 0;
/*
* See qcom_swrm_new_slave() for
* explanation. smp_load_acquire() paired
* with sdw_master_device_add().
*/
if (!smp_load_acquire(&ctrl->bus.md))
delay = 10;
schedule_delayed_work(&ctrl->new_slave_work,
delay); } break; case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET:
@@ -780,6 +821,7 @@ static int qcom_swrm_init(struct qcom_swrm_ctrl *ctrl)
ctrl->intr_mask = SWRM_INTERRUPT_STATUS_RMSK; /* Mask soundwire interrupts */
- if (ctrl->version < SWRM_VERSION_2_0_0) ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR], SWRM_INTERRUPT_STATUS_RMSK);
@@ -1485,6 +1527,7 @@ static int qcom_swrm_probe(struct platform_device *pdev) mutex_init(&ctrl->port_lock); init_completion(&ctrl->broadcast); init_completion(&ctrl->enumeration);
INIT_DELAYED_WORK(&ctrl->new_slave_work, qcom_swrm_new_slave);
ctrl->bus.ops = &qcom_swrm_ops; ctrl->bus.port_ops = &qcom_swrm_port_ops;
@@ -1514,9 +1557,10 @@ static int qcom_swrm_probe(struct platform_device *pdev)
ctrl->reg_read(ctrl, SWRM_COMP_HW_VERSION, &ctrl->version);
- qcom_swrm_init(ctrl);
- ret = devm_request_threaded_irq(dev, ctrl->irq, NULL, qcom_swrm_irq_handler,
if (ret) {IRQF_TRIGGER_RISING | IRQF_ONESHOT, "soundwire", ctrl);
@@ -1524,18 +1568,6 @@ static int qcom_swrm_probe(struct platform_device *pdev) goto err_clk; }
- ctrl->wake_irq = of_irq_get(dev->of_node, 1);
- if (ctrl->wake_irq > 0) {
ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
qcom_swrm_wake_irq_handler,
IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
"swr_wake_irq", ctrl);
if (ret) {
dev_err(dev, "Failed to request soundwire wake irq\n");
goto err_init;
}
- }
- pm_runtime_set_autosuspend_delay(dev, 3000); pm_runtime_use_autosuspend(dev); pm_runtime_mark_last_busy(dev);
@@ -1549,7 +1581,18 @@ static int qcom_swrm_probe(struct platform_device *pdev) goto err_clk; }
- qcom_swrm_init(ctrl);
- ctrl->wake_irq = of_irq_get(dev->of_node, 1);
- if (ctrl->wake_irq > 0) {
ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
qcom_swrm_wake_irq_handler,
IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
"swr_wake_irq", ctrl);
if (ret) {
dev_err(dev, "Failed to request soundwire wake irq\n");
goto err_init;
}
- }
- wait_for_completion_timeout(&ctrl->enumeration, msecs_to_jiffies(TIMEOUT_MS)); ret = qcom_swrm_register_dais(ctrl);
@@ -1589,6 +1632,18 @@ static int qcom_swrm_remove(struct platform_device *pdev) { struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(&pdev->dev);
- /*
* Mask interrupts to be sure no delayed work can be scheduler after
typo/grammar: scheduled
ack
* removing Soundwire bus master.
*/
- if (ctrl->version < SWRM_VERSION_2_0_0)
ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
0);
- if (ctrl->mmio)
ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
0);
- cancel_delayed_work_sync(&ctrl->new_slave_work); sdw_bus_master_delete(&ctrl->bus); clk_disable_unprepare(ctrl->hclk);
should the last two be inverted? Keeping a clock running while removing stuff is asking for trouble.
It is a reversed probe(), which is usually correct. Do you expect probe() like:
clk_enable sdw_bus_master_add
but remove() like:
clk_disable sdw_bus_master_delete
?
Best regards, Krzysztof
On 5/1/23 07:24, Krzysztof Kozlowski wrote:
On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
The problem statement is really hard to follow.
The peripheral can only be enumerated AFTER a) the manager starts the bus clock and transmitting PING frames b) the peripheral detects the sync words for 16 frames in a row. c) the peripheral reports as Attached in the Device0 slot
That sequence holds whether the manager does the enumeration manually or relies on hardware-assisted autoenumeration. This is what the spec requires.
So why can't the bus clock start be controlled by the manager driver, and started once all required initializations are done?
I mean, there's got to be some sort of parent-child hierarchy with manager first, peripheral(s) second, I don't get how these steps could be inverted or race.
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
- request_threaded_irq()
- sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers.
not if the bus clock hasn't started...
- qcom_swrm_init() - which initializes the link/bus and enables interrupts.
if you can move the clock start in 3) then problem solved. Why can't this be done?
Responding to all your three responses: The clock is enabled in this 3. qcom_swrm_init(), so the old code to my knowledge is written exactly how you expect.
However even with stopped clock, the device enumerates at sdw_bus_master_add(), before anything is enabled.
Erm, that's not physically possible...
The peripheral can report as attached and be enumerated by the manager, i.e. assigned a non-zero "Device Number" after the peripheral synchronizes on 16 frames with valid static and dynamic syncwords. That can only happen if there is a clock toggling and PING frames transmitted on the data line.
There's something else at play here.
I also checked the reset values of these registers - clock is off after reset. Assuming of course I look at correct clock registers... but I have only one.
Any access to device registers at (2) above, will fail because link/bus is not yet initialized.
However the fix is not as simple as moving qcom_swrm_init() before sdw_bus_master_add(), because this will cause early interrupt of new slave attached. The interrupt handler expects bus master (ctrl->bus.md) to be allocated, so this would lead to NULL pointer exception.
Rework the init sequence and change the interrupt handler. The correct sequence fixing accessing device registers before link init is now:
- qcom_swrm_init()
- request_threaded_irq()
- sdw_bus_master_add()
which still might cause early interrupts, if Soundwire devices are not in powered off state before their probe. This early interrupt issue is
You'd need to clarify in which step the bus clock starts. In general, you want to clock started last.
Clock is enabled in qcom_swrm_init() step, but as I wrote above, it looks like it does not matter for enumeration.
without a clock you can't have any enumeration.
* from reset by its probe() or bind() function, as a result of
* sdw_bus_master_add().
* Add a simple check to avoid NULL pointer except on early interrupts.
* Note that if this condition happens, the slave device will not be
* enumerated. Its driver should be fixed.
???
The codec driver is NEVER involved in enumeration.
If the device stays in power down, only the driver bind can bring it on. enumeration won't happen when device is powered down, right?
The codec driver can indeed control the codec power with sideband links - i.e. not with SoundWire but gpios/I2C/SPI, etc. - and it could very well prevent the codec hardware from showing up on the bus until TBD platform-specific criteria are met.
But that's not really taking part in the enumeration process, rather gating the enumeration process.
The only thing a codec driver should do is provide a callback to be notified of a status change for additional initialization, but the enumeration can be done even in the absence of a codec driver.
The proof in the pudding is that you can 'blacklist' a codec driver and bind it later, after the hardware is enumerated. You can even unbind a codec driver and nothing bad will happen (we hardened that sequence last year).
probe != enumeration != initialization for SoundWire codecs.
Probe and enumeration can happen in any order Initialization can only happen after both probe and enumeration happened.
I am speaking here about component_master_ops->bind() callback.
That's on the manager side, I really don't see how this is related to the codec?
* removing Soundwire bus master.
*/
- if (ctrl->version < SWRM_VERSION_2_0_0)
ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
0);
- if (ctrl->mmio)
ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
0);
- cancel_delayed_work_sync(&ctrl->new_slave_work); sdw_bus_master_delete(&ctrl->bus); clk_disable_unprepare(ctrl->hclk);
should the last two be inverted? Keeping a clock running while removing stuff is asking for trouble.
actually it doesn't really matter, if the interrupts are disabled first and you wait for in-flight work to be done. Ignore this comment.
It is a reversed probe(), which is usually correct. Do you expect probe() like:
clk_enable sdw_bus_master_add
it's likely the other way around,
probe(): sdw_bus_master_add clk_enable
assuming that clk_enable() starts the bus - not sure it does based on the answers above.
On 01/05/2023 15:43, Pierre-Louis Bossart wrote:
On 5/1/23 07:24, Krzysztof Kozlowski wrote:
On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
The problem statement is really hard to follow.
The peripheral can only be enumerated AFTER a) the manager starts the bus clock and transmitting PING frames b) the peripheral detects the sync words for 16 frames in a row. c) the peripheral reports as Attached in the Device0 slot
That sequence holds whether the manager does the enumeration manually or relies on hardware-assisted autoenumeration. This is what the spec requires.
So why can't the bus clock start be controlled by the manager driver, and started once all required initializations are done?
I mean, there's got to be some sort of parent-child hierarchy with manager first, peripheral(s) second, I don't get how these steps could be inverted or race.
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
- request_threaded_irq()
- sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers.
not if the bus clock hasn't started...
- qcom_swrm_init() - which initializes the link/bus and enables interrupts.
if you can move the clock start in 3) then problem solved. Why can't this be done?
Responding to all your three responses: The clock is enabled in this 3. qcom_swrm_init(), so the old code to my knowledge is written exactly how you expect.
However even with stopped clock, the device enumerates at sdw_bus_master_add(), before anything is enabled.
Erm, that's not physically possible...
The peripheral can report as attached and be enumerated by the manager, i.e. assigned a non-zero "Device Number" after the peripheral synchronizes on 16 frames with valid static and dynamic syncwords. That can only happen if there is a clock toggling and PING frames transmitted on the data line.
There's something else at play here.
Yes, I think you are right and that "else" is my limited knowledge on the entire setup.
You gave me awesome hint in email before that probe != enumeration != initialization, however the wcd938x sound codec drivers were assuming some steps are equal.
wcd938x comes with three devices on two drivers: 1. RX Soundwire device (wcd938x-sdw.c driver) 2. TX Soundwire device, which is used as regmap (also wcd938x-sdw.c driver) 3. platform device (wcd938x.c driver) - glue and component master, actually having most of the code using TX Soundwire device regmap.
The probe of each RX and TX Soundwire devices added components, but that this did not mean devices were enumerated, as you said.
Considering what Mark said about using regcache (and sync it), I am now replacing entire solution with proper regcache handling device enumeration after component bind.
Best regards, Krzysztof
On 5/3/23 03:00, Krzysztof Kozlowski wrote:
On 01/05/2023 15:43, Pierre-Louis Bossart wrote:
On 5/1/23 07:24, Krzysztof Kozlowski wrote:
On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
Soundwire devices are supposed to be kept in reset state (powered off) till their probe() or component bind() callbacks. However if they are already powered on, then they might enumerate before the master initializes bus in qcom_swrm_init() leading to occasional errors like:
The problem statement is really hard to follow.
The peripheral can only be enumerated AFTER a) the manager starts the bus clock and transmitting PING frames b) the peripheral detects the sync words for 16 frames in a row. c) the peripheral reports as Attached in the Device0 slot
That sequence holds whether the manager does the enumeration manually or relies on hardware-assisted autoenumeration. This is what the spec requires.
So why can't the bus clock start be controlled by the manager driver, and started once all required initializations are done?
I mean, there's got to be some sort of parent-child hierarchy with manager first, peripheral(s) second, I don't get how these steps could be inverted or race.
qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops) wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops) qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
- request_threaded_irq()
- sdw_bus_master_add() - which will cause probe() and component bind() of Soundwire devices, e.g. WCD938x codec drivers. Device drivers might already start accessing their registers.
not if the bus clock hasn't started...
- qcom_swrm_init() - which initializes the link/bus and enables interrupts.
if you can move the clock start in 3) then problem solved. Why can't this be done?
Responding to all your three responses: The clock is enabled in this 3. qcom_swrm_init(), so the old code to my knowledge is written exactly how you expect.
However even with stopped clock, the device enumerates at sdw_bus_master_add(), before anything is enabled.
Erm, that's not physically possible...
The peripheral can report as attached and be enumerated by the manager, i.e. assigned a non-zero "Device Number" after the peripheral synchronizes on 16 frames with valid static and dynamic syncwords. That can only happen if there is a clock toggling and PING frames transmitted on the data line.
There's something else at play here.
Yes, I think you are right and that "else" is my limited knowledge on the entire setup.
You gave me awesome hint in email before that probe != enumeration != initialization, however the wcd938x sound codec drivers were assuming some steps are equal.
wcd938x comes with three devices on two drivers:
- RX Soundwire device (wcd938x-sdw.c driver)
- TX Soundwire device, which is used as regmap (also wcd938x-sdw.c driver)
- platform device (wcd938x.c driver) - glue and component master,
actually having most of the code using TX Soundwire device regmap.
The probe of each RX and TX Soundwire devices added components, but that this did not mean devices were enumerated, as you said.
Considering what Mark said about using regcache (and sync it), I am now replacing entire solution with proper regcache handling device enumeration after component bind.
I thought you were talking about ASoC "struct snd_soc_component" but no, that's "struct component_ops" in wcd938x-sdw.c
Now I think I get the scope of the problem, I didn't click on the fact that it's a platform driver that registers ASoC components, and not the SoundWire codec driver.
That's certainly more complicated than any other SoundWire-based systems I've seen so far. It's already difficult to keep track of the SoundWire peripheral driver .probe(), the ASoC component .probe(), the SoundWire bus .update_status() callback and now there's the component .bind() added.
Wow. What could possibly go wrong, eh?
participants (3)
-
Krzysztof Kozlowski
-
Mark Brown
-
Pierre-Louis Bossart