Is this timeout on suspend or on resume? I had assumed the former. Or is the result only seen on resume?
Rereading the race described in the steps above, I think this should be handled in step c. Btw, is it system suspend or runtime suspend that causes this? The former would be the bigger issue, as we should not have work running when we return from the suspend call. The latter should be dealt with anyway, as the device might be off after suspend.
This happens with a system suspend. Because we disable the interrupts, the workqueue never completes, and we have a timeout on system resume.
That's why we want to either prevent the workqueue from starting or let it complete, rather than end up in this zombie state where we suspend while a wait for completion is still pending and then times out later. The point here is really to make sure the workqueue is not in use before suspend.
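
To make the ordering concrete, here is a small userspace model of the race, not driver code. Every name in it (model_dev, model_suspend() and so on) is made up for illustration, and the "drain" step only stands in for whatever the driver would actually do before disabling interrupts, for example flush_work() or cancel_work_sync(); which of those fits is an assumption I have not checked against the code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the driver. */
struct model_dev {
	bool irq_enabled;    /* the completion is signalled from the IRQ handler */
	bool work_in_flight; /* deferred work is waiting for that completion */
	bool suspended;
};

void model_init(struct model_dev *d)
{
	d->irq_enabled = true;
	d->work_in_flight = false;
	d->suspended = false;
}

/* Queue the deferred work; refused once the device is suspended. */
bool model_queue_work(struct model_dev *d)
{
	if (d->suspended)
		return false;
	d->work_in_flight = true;
	return true;
}

/* An interrupt completes the pending work only if interrupts are enabled. */
void model_irq(struct model_dev *d)
{
	if (d->irq_enabled)
		d->work_in_flight = false;
}

/*
 * System suspend. With drain == true the pending work is allowed to
 * complete while interrupts are still enabled (the stand-in for flushing
 * or cancelling the work). With drain == false interrupts are disabled
 * underneath it, which is the zombie state.
 */
void model_suspend(struct model_dev *d, bool drain)
{
	if (drain) {
		model_irq(d);
		assert(!d->work_in_flight); /* nothing left waiting */
	}
	d->irq_enabled = false;
	d->suspended = true;
}

/* System resume. Returns true if a stale wait was left over and times out. */
bool model_resume_times_out(struct model_dev *d)
{
	bool timed_out = d->work_in_flight;

	d->work_in_flight = false;
	d->irq_enabled = true;
	d->suspended = false;
	return timed_out;
}
```

Queueing work and then calling model_suspend(&d, false) leaves work_in_flight set, so model_resume_times_out() reports the timeout we see today. With model_suspend(&d, true) nothing is pending by the time interrupts go away, and any later model_queue_work() is refused until resume.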