-
Notifications
You must be signed in to change notification settings - Fork 1k
TestMonitoredCmd is flaky #763
Comments
This flaky test been a thorn in my side for some months now. I managed to make it a bit less flaky by playing with the timeouts, but the issues still occur. I'd love to find a way to achieve the same or similar guarantees from the test, while avoiding the issues related to timing. |
Found a race condition that could happen on This is a different failure reason than the one @carolynvs reported. I'd love if we could document the failing builds here to collect the possible other reasons that could cause this to fail. And I'm still thinking of ways to figure out an alternative to the current test. 🤔 |
The current approach relies on unsynchronized clocks to agree, so it's pretty much always going to be flaky. We need to figure out how to get rid of the clock in the invoked process if there's going to be any hope of making it not flaky. |
Maybe we should not check the output for specific values, which is what leads to this. I'm not sure if just checking whether the command timed out or not is enough, though, because then we lose the check of when the command timed out. Other than that, I can't think of a way to test this that does not rely on the clock 😞 |
Well, it sounds like we need coordination between processes. Coordination means communication. Probably best to do with additional fd. If we replace echosleep with an implementation that expects two additional fds, then we can use those to coordinate IPC and exercise fine-grained control over the child process from the parent in order to simulate the desired behavior. |
lot less flaky now - i'm gonna close this as good enough. |
This test fails intermittently with the following error. This time I saw it fail on go tip, but haven't noticed if it fails on other versions as well.
The text was updated successfully, but these errors were encountered: