Hi,
So what I ended up doing was to split off the context error handling into a separate helper API, which can also be called from the sync ep stop API. If the helper re-queues the stop EP command, it returns a specific value to signify that it has done so. The sync based API will then re-wait for the completion of the subsequent stop endpoint command that was queued.
AFAIK retries are only necessary on buggy hardware. I don't see them on my controllers except for two old ones, both with the same buggy chip.
In all other context error cases, it'd return the error to the caller, and it's up to them to handle it accordingly.
For the record, all existing callers end up ignoring this return value.
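To make the flow under discussion concrete, here is a rough, standalone sketch of the "helper signals a re-queue, sync caller waits again" idea. It is only a simulation under stated assumptions: every name in it (ctx_result, fake_ep, handle_stop_ep_ctx_error, stop_ep_sync) is made up for illustration and none of it is actual xhci code.

```c
/* Hypothetical sketch: none of these names exist in the xHCI driver. */

enum ctx_result {
	CTX_DONE = 0,		/* endpoint stopped, nothing more to do */
	CTX_REQUEUED = 1,	/* helper queued another Stop Endpoint command */
	CTX_ERROR = -1,		/* any other context error, returned to the caller */
};

/* Simulated endpoint state standing in for real hardware behaviour. */
struct fake_ep {
	int retryable_errors_left;	/* context errors that warrant a retry */
	int fatal;			/* nonzero: a non-retryable error follows */
	int commands_queued;		/* Stop Endpoint commands queued so far */
};

/* Helper assumed to be shared by the async and sync paths. */
static enum ctx_result handle_stop_ep_ctx_error(struct fake_ep *ep)
{
	if (ep->retryable_errors_left > 0) {
		ep->retryable_errors_left--;
		ep->commands_queued++;	/* re-queue Stop Endpoint */
		return CTX_REQUEUED;
	}
	return ep->fatal ? CTX_ERROR : CTX_DONE;
}

/* Sync variant: queue once, then wait again each time the helper re-queues. */
static int stop_ep_sync(struct fake_ep *ep)
{
	enum ctx_result ret;

	ep->commands_queued = 1;
	do {
		/* a real implementation would wait for command completion here */
		ret = handle_stop_ep_ctx_error(ep);
	} while (ret == CTX_REQUEUED);

	return ret == CTX_DONE ? 0 : -1;
}
```

In this sketch the caller of stop_ep_sync() only ever sees success or failure; the "re-queued" value stays internal to the wait loop.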
Honestly, I don't know if improving this function is worth your effort if it's working for you as-is. There are no users except xhci-sideband, and there probably shouldn't be any others: besides failing to fix stalled endpoints, this function also does nothing to prevent automatic restart of the EP when new URBs are submitted through xhci_hcd, so it is mainly relevant for sideband users who never submit URBs the usual way.
My issue with this function is that what it is or isn't expected to achieve is poorly documented (both here and in the calling code in xhci-sideband.c), and the changelog message is wrong to suggest that the default completion handler will run (unless somewhere there are patches to make it happen), making it look like this code can do things that it really cannot do. And this is apparently a public, exported API.
Regards, Michal