[PATCH v4 0/5] soundwire: Fixes for spurious and missing UNATTACH
The bus and cadence code has several bugs that cause UNATTACH notifications to either be sent spuriously or to be missed.
These can be seen occasionally with a single peripheral on the bus, but are much more frequent with multiple peripherals, where several peripherals could change state and report in consecutive PINGs.
The root of all of these bugs seems to be a code design flaw that assumed every PING status change would be handled separately. However, PINGs are handled by a workqueue function and there is no guarantee when that function will be scheduled to run or how much CPU time it will receive. PINGs will continue while the work function is handling a snapshot of a previous PING, so the code must take into account that (a) status could change during the work function and (b) there can be a backlog of changes before the IRQ work function runs again.
Tested with 4 peripherals on 1 bus, and 8 peripherals on 2 buses.
CHANGES SINCE V3: Fixed minor comment typo in patch #4.
Richard Fitzgerald (4):
  soundwire: bus: Don't lose unattach notifications
  soundwire: bus: Don't re-enumerate before status is UNATTACHED
  soundwire: cadence: Fix lost ATTACHED interrupts when enumerating
  soundwire: bus: Don't exit early if no device IDs were programmed

Simon Trimmer (1):
  soundwire: cadence: fix updating slave status when a bus has multiple peripherals

 drivers/soundwire/bus.c            | 44 +++++++++++++---
 drivers/soundwire/cadence_master.c | 80 ++++++++++++++++--------------
 2 files changed, 80 insertions(+), 44 deletions(-)
From: Simon Trimmer simont@opensource.cirrus.com
The cadence IP explicitly reports slave status changes with bits for each possible change. The function cdns_update_slave_status() attempts to translate this into the current status of each of the slaves.
However, when there are multiple peripherals on a bus, any slave that did not have a status change when the work function ran would not have its status updated. The array is initialised to a value that equates to UNATTACHED, and this can cause spurious reports that slaves had dropped off the bus.
In the case where a slave has no status change or has multiple status changes the value from the last PING command is used.
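As a rough illustration of the rule described above (not the driver code itself), the per-device decode can be sketched in standalone C. The bit positions, the enum values and the decode_status() helper are assumptions made for this example; only the logic comes from the description: use the change bit when exactly one is set, otherwise fall back to the status from the last PING.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real driver defines its own constants. */
enum slave_status {
	ST_UNATTACHED = 0,
	ST_ATTACHED = 1,
	ST_ALERT = 2,
	ST_RESERVED = 3,
};

/* Hypothetical "status changed" bits, four per device. */
#define CHG_NPRESENT		(1u << 0)
#define CHG_ATTACHED		(1u << 1)
#define CHG_ALERT		(1u << 2)
#define CHG_RESERVED		(1u << 3)
#define CHG_BITS_PER_DEV	4
#define MAX_DEV			11

/*
 * change_bits: sticky change flags for all devices (4 bits each).
 * ping_stat:   current 2-bit status per device from the latest PING.
 */
static enum slave_status decode_status(uint64_t change_bits,
				       uint32_t ping_stat, int dev)
{
	enum slave_status st = ST_UNATTACHED;
	uint32_t mask;
	int set_count = 0;

	assert(dev >= 0 && dev <= MAX_DEV);

	mask = (uint32_t)(change_bits >> (dev * CHG_BITS_PER_DEV)) & 0xfu;

	if (mask & CHG_RESERVED) {
		st = ST_RESERVED;
		set_count++;
	}
	if (mask & CHG_ATTACHED) {
		st = ST_ATTACHED;
		set_count++;
	}
	if (mask & CHG_ALERT) {
		st = ST_ALERT;
		set_count++;
	}
	if (mask & CHG_NPRESENT) {
		st = ST_UNATTACHED;
		set_count++;
	}

	/* No change, or several changes: trust the latest PING status. */
	if (set_count != 1)
		st = (enum slave_status)((ping_stat >> (dev * 2)) & 0x3u);

	return st;
}
```

With no change bits set for a device that the last PING shows as attached, this returns ST_ATTACHED rather than the zero-initialised UNATTACHED value, which is the spurious report the patch removes.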
Signed-off-by: Simon Trimmer simont@opensource.cirrus.com
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com
Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
---
 drivers/soundwire/cadence_master.c | 57 +++++++++++++-----------------
 1 file changed, 25 insertions(+), 32 deletions(-)
diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
index 4fbb19557f5e..245191d22ccd 100644
--- a/drivers/soundwire/cadence_master.c
+++ b/drivers/soundwire/cadence_master.c
@@ -782,6 +782,7 @@ static int cdns_update_slave_status(struct sdw_cdns *cdns,
 	enum sdw_slave_status status[SDW_MAX_DEVICES + 1];
 	bool is_slave = false;
 	u32 mask;
+	u32 val;
 	int i, set_status;

 	memset(status, 0, sizeof(status));
@@ -789,41 +790,38 @@ static int cdns_update_slave_status(struct sdw_cdns *cdns,
 	for (i = 0; i <= SDW_MAX_DEVICES; i++) {
 		mask = (slave_intstat >> (i * CDNS_MCP_SLAVE_STATUS_NUM)) &
 			CDNS_MCP_SLAVE_STATUS_BITS;
-		if (!mask)
-			continue;

-		is_slave = true;
 		set_status = 0;

-		if (mask & CDNS_MCP_SLAVE_INTSTAT_RESERVED) {
-			status[i] = SDW_SLAVE_RESERVED;
-			set_status++;
-		}
-
-		if (mask & CDNS_MCP_SLAVE_INTSTAT_ATTACHED) {
-			status[i] = SDW_SLAVE_ATTACHED;
-			set_status++;
-		}
+		if (mask) {
+			is_slave = true;

-		if (mask & CDNS_MCP_SLAVE_INTSTAT_ALERT) {
-			status[i] = SDW_SLAVE_ALERT;
-			set_status++;
-		}
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_RESERVED) {
+				status[i] = SDW_SLAVE_RESERVED;
+				set_status++;
+			}

-		if (mask & CDNS_MCP_SLAVE_INTSTAT_NPRESENT) {
-			status[i] = SDW_SLAVE_UNATTACHED;
-			set_status++;
-		}
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_ATTACHED) {
+				status[i] = SDW_SLAVE_ATTACHED;
+				set_status++;
+			}

-		/* first check if Slave reported multiple status */
-		if (set_status > 1) {
-			u32 val;
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_ALERT) {
+				status[i] = SDW_SLAVE_ALERT;
+				set_status++;
+			}

-			dev_warn_ratelimited(cdns->dev,
-					     "Slave %d reported multiple Status: %d\n",
-					     i, mask);
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_NPRESENT) {
+				status[i] = SDW_SLAVE_UNATTACHED;
+				set_status++;
+			}
+		}

-			/* check latest status extracted from PING commands */
+		/*
+		 * check that there was a single reported Slave status and when
+		 * there is not use the latest status extracted from PING commands
+		 */
+		if (set_status != 1) {
 			val = cdns_readl(cdns, CDNS_MCP_SLAVE_STAT);
 			val >>= (i * 2);

@@ -842,11 +840,6 @@ static int cdns_update_slave_status(struct sdw_cdns *cdns,
 				status[i] = SDW_SLAVE_RESERVED;
 				break;
 			}
-
-			dev_warn_ratelimited(cdns->dev,
-					     "Slave %d status updated to %d\n",
-					     i, status[i]);
-
-
 		}
 	}
Ensure that if sdw_handle_slave_status() sees a peripheral has dropped off the bus it reports it to the client driver.
If any devices are reporting on address 0, sdw_handle_slave_status() bails out after programming the device IDs, so it never reaches the second loop that calls sdw_update_slave_status().
If the missing device is one that is now showing as unenumerated, it has been given a device ID and so will report as attached the next time sdw_handle_slave_status() runs.
With the previous code the client driver would only see another ATTACHED notification because the UNATTACHED state was lost when sdw_handle_slave_status() bailed out after programming the device ID.
This shows up most when the peripheral has to be reset after downloading updated firmware and there are multiple of these peripherals on the bus. They will all return to unenumerated state after the reset, and then there is a mix of unattached, attached and unenumerated PING states from the peripherals, as each is reset and they reboot.
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com
Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
---
 drivers/soundwire/bus.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index d773eee71bc1..1cc858b4107d 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -1767,6 +1767,11 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
 			dev_warn(&slave->dev, "Slave %d state check1: UNATTACHED, status was %d\n",
 				 i, slave->status);
 			sdw_modify_slave_status(slave, SDW_SLAVE_UNATTACHED);
+
+			/* Ensure driver knows that peripheral unattached */
+			ret = sdw_update_slave_status(slave, status[i]);
+			if (ret < 0)
+				dev_warn(&slave->dev, "Update Slave status failed:%d\n", ret);
 		}
 	}
Don't re-enumerate a peripheral on #0 until we have seen and handled an UNATTACHED notification for that peripheral.
Without this, it is possible for the UNATTACHED status to be missed and so the slave->status remains at ATTACHED. If slave->status never changes to UNATTACHED the child driver will never be notified of the UNATTACH, and the code in sdw_handle_slave_status() will skip the second part of enumeration because the slave->status has not changed.
This scenario can happen because PINGs are handled in a workqueue function which is working from a snapshot of an old PING, and there is no guarantee when this function will run.
A peripheral could report attached in the PING being handled by sdw_handle_slave_status(), but has since reverted to device #0 and is then found in the loop in sdw_program_device_num(). Previously the code would not have updated slave->status to UNATTACHED because it had not yet handled a PING in which that peripheral reported UNATTACHED.
This situation happens fairly frequently with multiple peripherals on a bus that are intentionally reset (for example after downloading firmware).
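The guard described above can be sketched in standalone C as follows. This is only a model under stated assumptions: struct peripheral and try_program_dev_num() are hypothetical names for this example, not part of the SoundWire bus code. It shows a device found on #0 being left alone until the bus code has processed its UNATTACHED state.

```c
#include <assert.h>
#include <stdbool.h>

enum slave_status { ST_UNATTACHED, ST_ATTACHED };

/* Hypothetical stand-in for the bus code's per-device bookkeeping. */
struct peripheral {
	enum slave_status status;	/* last status the bus code processed */
	int dev_num;			/* currently assigned device number */
};

/*
 * Returns true if a new device number was assigned. A device that is
 * still recorded as ATTACHED is skipped so that the UNATTACHED stage
 * of the state machine is not bypassed.
 */
static bool try_program_dev_num(struct peripheral *p, int new_num)
{
	assert(p);

	if (p->status != ST_UNATTACHED)
		return false;	/* wait until UNATTACHED has been handled */

	p->dev_num = new_num;
	return true;
}
```

A caller would simply retry on a later status update, once the recorded status has moved to ST_UNATTACHED.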
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com
Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
---
 drivers/soundwire/bus.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 1cc858b4107d..6e569a875a9b 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -773,6 +773,16 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 			if (sdw_compare_devid(slave, id) == 0) {
 				found = true;

+				/*
+				 * To prevent skipping state-machine stages don't
+				 * program a device until we've seen it UNATTACH.
+				 * Must return here because no other device on #0
+				 * can be detected until this one has been
+				 * assigned a device ID.
+				 */
+				if (slave->status != SDW_SLAVE_UNATTACHED)
+					return 0;
+
 				/*
 				 * Assign a new dev_num to this Slave and
 				 * not mark it present. It will be marked
The correct way to handle interrupts is to clear the bits we are about to handle _before_ handling them. Thus if the condition then re-asserts during the handling we won't lose it.
This patch changes cdns_update_slave_status_work() to do this.
The previous code cleared the interrupts after handling them. The problem with this is that when handling enumeration of devices the ATTACH statuses can be accidentally cleared and so some or all of the devices never complete their enumeration.
Thus we can have a situation like this:
- one or more devices are reverting to ID #0
- accumulated status bits indicate some devices attached and some on ID #0. (Remember: status bits are sticky until they are handled)
- Because of device on #0 sdw_handle_slave_status() programs the device ID and exits without handling the other status, expecting to get an ATTACHED from this reprogrammed device.
- The device immediately starts reporting ATTACHED in PINGs, which will assert its CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit.
- cdns_update_slave_status_work() clears INTSTAT0/1. If the initial status had CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit set it will be cleared.
- The ATTACHED change for the device has now been lost.
- cdns_update_slave_status_work() clears CDNS_MCP_INT_SLAVE_MASK so if the new ATTACHED state had set it, it will be cleared without ever having been handled.
Unless there is some other state change from another device to cause a new interrupt, the ATTACHED state of the reprogrammed device will never cause an interrupt so its enumeration will not be completed.
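The ordering problem can be reproduced with a small standalone C model. Everything here is an assumption made for illustration: struct fake_hw models a sticky, write-1-to-clear status register, and raise_during models bits the hardware re-asserts while the handler is running. It is not the Cadence register interface.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated sticky interrupt status register (write-1-to-clear). */
struct fake_hw {
	uint32_t intstat;	/* sticky status bits */
	uint32_t raise_during;	/* bits asserted while the handler runs */
};

static void hw_write1_clear(struct fake_hw *hw, uint32_t bits)
{
	hw->intstat &= ~bits;
}

/* Stand-in for the slow handling step; the hardware keeps running. */
static void handle(struct fake_hw *hw, uint32_t snapshot)
{
	(void)snapshot;
	hw->intstat |= hw->raise_during;
	hw->raise_during = 0;
}

/* Old order: handle, then clear. Returns the bits still pending. */
static uint32_t handle_then_clear(struct fake_hw *hw)
{
	uint32_t snap;

	assert(hw);
	snap = hw->intstat;
	handle(hw, snap);
	hw_write1_clear(hw, snap);	/* also wipes any re-asserted bit */
	return hw->intstat;
}

/* New order: clear what was read, then handle. */
static uint32_t clear_then_handle(struct fake_hw *hw)
{
	uint32_t snap;

	assert(hw);
	snap = hw->intstat;
	hw_write1_clear(hw, snap);
	handle(hw, snap);		/* re-asserted bits stay pending */
	return hw->intstat;
}
```

If the same bit is set in the snapshot and re-asserts during handling, handle_then_clear() leaves nothing pending and the new event is lost, while clear_then_handle() leaves the bit set for the next run.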
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com
Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
---
 drivers/soundwire/cadence_master.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
index 245191d22ccd..be9cd47f31ec 100644
--- a/drivers/soundwire/cadence_master.c
+++ b/drivers/soundwire/cadence_master.c
@@ -954,9 +954,22 @@ static void cdns_update_slave_status_work(struct work_struct *work)
 	u32 device0_status;
 	int retry_count = 0;

+	/*
+	 * Clear main interrupt first so we don't lose any assertions
+	 * that happen during this function.
+	 */
+	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
+
 	slave0 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT0);
 	slave1 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT1);

+	/*
+	 * Clear the bits before handling so we don't lose any
+	 * bits that re-assert.
+	 */
+	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
+	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
+
 	/* combine the two status */
 	slave_intstat = ((u64)slave1 << 32) | slave0;

@@ -964,8 +977,6 @@ static void cdns_update_slave_status_work(struct work_struct *work)

 update_status:
 	cdns_update_slave_status(cdns, slave_intstat);
-	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
-	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);

 	/*
 	 * When there is more than one peripheral per link, it's
@@ -982,6 +993,11 @@ static void cdns_update_slave_status_work(struct work_struct *work)
 	 * attention with PING commands. There is no need to check for
 	 * ALERTS since they are not allowed until a non-zero
 	 * device_number is assigned.
+	 *
+	 * Do not clear the INTSTAT0/1. While looping to enumerate devices on
+	 * #0 there could be status changes on other devices - these must
+	 * be kept in the INTSTAT so they can be handled when all #0 devices
+	 * have been handled.
 	 */

 	device0_status = cdns_readl(cdns, CDNS_MCP_SLAVE_STAT);
@@ -1001,8 +1017,7 @@ static void cdns_update_slave_status_work(struct work_struct *work)
 		}
 	}

-	/* clear and unmask Slave interrupt now */
-	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
+	/* unmask Slave interrupt now */
 	cdns_updatel(cdns, CDNS_MCP_INTMASK,
 		     CDNS_MCP_INT_SLAVE_MASK, CDNS_MCP_INT_SLAVE_MASK);
Only exit sdw_handle_slave_status() right after calling sdw_program_device_num() if it actually programmed an ID into at least one device.
sdw_handle_slave_status() should protect itself against phantom device #0 ATTACHED indications. In that case there is no actual device still on #0. The early exit relies on there being a status change to ATTACHED on the reprogrammed device to trigger another call to sdw_handle_slave_status() which will then handle the status of all peripherals. If no device was actually programmed with an ID there won't be a new ATTACHED indication. This can lead to the status of other peripherals not being handled.
The status passed to sdw_handle_slave_status() is obviously always from a point of time in the past, and may indicate accumulated unhandled events (depending how the bus manager operates). It's possible that a device ID is reprogrammed but the last PING status captured state just before that, when it was still reporting on ID #0. Then sdw_handle_slave_status() is called with this PING info, just before a new PING status is available showing it now on its new ID. So sdw_handle_slave_status() will receive a phantom report of a device on #0, but it will not find one.
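The control flow described above can be modelled in a few lines of standalone C. The helper names and parameters are hypothetical and exist only for this sketch: devices_on_zero stands for how many devices are really still on #0, and the return value of handle_status() says whether the statuses of the other devices were processed in this call.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the programming step: reports whether any ID was written. */
static int program_device_num(int devices_on_zero, bool *programmed)
{
	assert(programmed);
	*programmed = (devices_on_zero > 0);
	return 0;
}

/*
 * Returns true if the other devices' statuses were handled now,
 * false if handling was deferred to the next call.
 */
static bool handle_status(bool dev0_attached_in_snapshot, int devices_on_zero)
{
	bool id_programmed = false;

	if (dev0_attached_in_snapshot) {
		(void)program_device_num(devices_on_zero, &id_programmed);

		/*
		 * Deferring is only safe when an ID was programmed,
		 * because that device's ATTACHED report triggers
		 * another call.
		 */
		if (id_programmed)
			return false;
	}

	/* Phantom #0 report or no #0 report: carry on with the others. */
	return true;
}
```

A phantom report corresponds to handle_status(true, 0): nothing is programmed, so the function continues instead of exiting and waiting for an ATTACHED event that may never arrive.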
Signed-off-by: Richard Fitzgerald rf@opensource.cirrus.com
Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
---
 drivers/soundwire/bus.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 6e569a875a9b..8eded1a55227 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -729,7 +729,7 @@ void sdw_extract_slave_id(struct sdw_bus *bus,
 }
 EXPORT_SYMBOL(sdw_extract_slave_id);

-static int sdw_program_device_num(struct sdw_bus *bus)
+static int sdw_program_device_num(struct sdw_bus *bus, bool *programmed)
 {
 	u8 buf[SDW_NUM_DEV_ID_REGISTERS] = {0};
 	struct sdw_slave *slave, *_s;
@@ -739,6 +739,8 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 	int count = 0, ret;
 	u64 addr;

+	*programmed = false;
+
 	/* No Slave, so use raw xfer api */
 	ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
 			   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
@@ -797,6 +799,8 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 					return ret;
 				}

+				*programmed = true;
+
 				break;
 			}
 		}
@@ -1756,7 +1760,7 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
 {
 	enum sdw_slave_status prev_status;
 	struct sdw_slave *slave;
-	bool attached_initializing;
+	bool attached_initializing, id_programmed;
 	int i, ret = 0;

 	/* first check if any Slaves fell off the bus */
@@ -1787,14 +1791,23 @@ int sdw_handle_slave_status(struct sdw_bus *bus,

 	if (status[0] == SDW_SLAVE_ATTACHED) {
 		dev_dbg(bus->dev, "Slave attached, programming device number\n");
-		ret = sdw_program_device_num(bus);
-		if (ret < 0)
-			dev_err(bus->dev, "Slave attach failed: %d\n", ret);
+
 		/*
-		 * programming a device number will have side effects,
-		 * so we deal with other devices at a later time
+		 * Programming a device number will have side effects,
+		 * so we deal with other devices at a later time.
+		 * This relies on those devices reporting ATTACHED, which will
+		 * trigger another call to this function. This will only
+		 * happen if at least one device ID was programmed.
+		 * Error returns from sdw_program_device_num() are currently
+		 * ignored because there's no useful recovery that can be done.
+		 * Returning the error here could result in the current status
+		 * of other devices not being handled, because if no device IDs
+		 * were programmed there's nothing to guarantee a status change
+		 * to trigger another call to this function.
 		 */
-		return ret;
+		sdw_program_device_num(bus, &id_programmed);
+		if (id_programmed)
+			return 0;
 	}

 	/* Continue to check other slave statuses */
On 14-09-22, 17:02, Richard Fitzgerald wrote:
> The bus and cadence code has several bugs that cause UNATTACH notifications to either be sent spuriously or to be missed.
>
> These can be seen occasionally with a single peripheral on the bus, but are much more frequent with multiple peripherals, where several peripherals could change state and report in consecutive PINGs.
>
> The root of all of these bugs seems to be a code design flaw that assumed every PING status change would be handled separately. However, PINGs are handled by a workqueue function and there is no guarantee when that function will be scheduled to run or how much CPU time it will receive. PINGs will continue while the work function is handling a snapshot of a previous PING so the code must take account that (a) status could change during the work function and (b) there can be a backlog of changes before the IRQ work function runs again.
Applied, thanks
Participants (2): Richard Fitzgerald, Vinod Koul