sdw_prep_deprep_slave_ports() cannot use the port interrupt to signal CP_SM completion because it is called while holding the bus lock, which blocks the alert handler from running.
Change to polling the PREPARESTATUS register and remove the infrastructure that was implemented for the previous interrupt waiting.
I am afraid the entire PORT_READY support is completely untested at the moment: all the existing codecs use the simpler state machine, e.g.
dpn[i].simple_ch_prep_sm = true;
So my main question is: how did you test this change? Is this on a platform already supported upstream? yes/no answer is fine, no details necessary.
No, it isn't upstream yet, but it doesn't support Simplified_CP_SM.
ok, good to know.
I am also not fully clear on the locking issue.
Is the problem that sdw_handle_slave_status() uses a mutex_lock(&bus->bus_lock), which is already held in sdw_prepare_stream
Yes
so dealing with an ALERT status while doing pre-deprep would cause a self-inflicted deadlock?
Not strictly a deadlock, but the ALERT handling will be delayed until the sdw_prepare_stream() has released the bus lock. Of course, by then the wait_for_completion() has timed out.
ok.
There is another bug in the original code. After wait_for_completion() times out, there is a read of the PREPARESTATUS register. But the test
if (!time_left || val)
will always treat a timeout as a failure even if the port is now reporting successful prepare.
I can do a fix for that bug so that full CP_SM devices will prepare successfully after the wait_for_completion() times out.
What you are describing seems to be a case of the device completing the preparation just after the timeout expires but just before double-checking the register value?
Doesn't this indicate that the timeout value is just wrong?
Or did you mean that a possible work-around would be to essentially ignore the timeout due to the locking? That might help you make progress and in a second step we could attempt to remove the locking issue.
If yes, that's a more general problem that would need to be fixed. this lock is taken in ALL stream operations, not just prepare/deprepare.
If e.g. a jack detection was signaled during any stream reconfiguration, we would also not be able to deal with the information, would we?
The ALERT would be delayed until after the stream reconfig has ended.
the purpose of the lock in sdw_handle_slave_status() was to check if devices attach or fall-off the bus before handling their status. The command/control protocol is always operational so nothing prevents the interrupt from being handled.
There's also something odd about directly polling on the status bits. The status bits will be used to signal the ALERT condition which is transmitted to the host during PING frames. This solution would result in the host noticing an interrupt: host controllers typically have a sticky status where all changes in PING frames are detected and used as a trigger for interrupt processing. In this case the interrupt would still be handled, but the sdw_handle_slave_status() would be deferred until the stream prepare is complete, and at that point the interrupt processing would not see any sources. It's not illegal but it's a bit odd to cause empty interrupts to be handled.
It might be a better solution to fix conflicts between stream reconfiguration and interrupts. I don't have a turn-key proposal but the suggested patch feels like a work-around. maybe using mutex_is_locked()?
Possibly but I am very reluctant to rewrite bus locking, as I don't have the ability to test a wide range of hardware configurations.
I wasn't referring to a complete rewrite of the bus locking, only for a check of the interaction between alerts and stream handling.
If we can't figure something out, then at least the PORT_READY mask should not be set, i.e. this code would need to be removed:
...
A new configuration field 'ch_prep_poll_us' is added to struct sdw_dpn_prop so that the peripheral driver may select a polling interval. If this is left at zero a default interval of 500 usec is used.
we already have a 'ch_prep_timeout' that's defined, do you need to redefine a new field?
The new field is how often to poll, so the driver can select a slower poll rate if it knows its CP_SM takes longer.
You missed my point: if you think this property is needed, we'd want to add it to the MIPI DisCo list of properties so that it can be optionally provided by platform firmware.
why not just read once at the end of that timeout? It's a prepare operation, there's no requirement to be fast, is there?
That is broken in the current code, as noted above. But I could make a patch only to fix that bug.
that might indeed be less invasive for an initial fix, and we can deal with the locking problem later.