-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
post to studio in thread to avoid blocking #814
Conversation
src/dvclive/live.py
Outdated
@@ -882,7 +883,8 @@ def make_dvcyaml(self): | |||
|
|||
@catch_and_warn(DvcException, logger) | |||
def post_to_studio(self, event: Literal["start", "data", "done"]): | |||
post_to_studio(self, event) | |||
thread = threading.Thread(target=post_to_studio, args=(self, event)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions:
- do we still need
@catch_and_warn(DvcException, logger)
above? - do we start a new thread per each request now? sound expensive and can for example exhaust OS if post_to_studio is running faster then we process them
- can we use a single thread (and a queue for example)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points. I ended up leaving the start
and done
events in the main thread and create a single queue for all data
events to not overwhelm the OS or Studio. PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not run start and done through the same queue? it's better a bit since start will initialize the queue - makes it a bit more thread-safer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dberenbaum for a prompt fix! A few questions / comments.
post_to_studio(item, "data") | ||
self._studio_queue.task_done() | ||
|
||
threading.Thread(target=worker, daemon=True).start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is post_to_studio defined now?
One thing to test - Ctrl-C handling for this. Just to make sure that our threads and queue wait doesn't mess up apps in any way. Thanks @dberenbaum for iterating on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we wrap the call to post_to_studio in @catch_and_warn(DvcException, logger) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is post_to_studio defined now?
Line 99 in edb5ee3
def post_to_studio(live: Live, event: Literal["start", "data", "done"]): # noqa: C901 |
One thing to test - Ctrl-C handling for this. Just to make sure that our threads and queue wait doesn't mess up apps in any way. Thanks @dberenbaum for iterating on this!
No, thanks for keeping me honest here. It's a pretty simple change but could easily see it breaking some things. This is one reason I wanted to keep start and done separate per your question above. The queue is set up in daemon mode which can exit without making all expected calls (without it I noticed tests were hanging). By separating start and end, we ensure those are called and returned in most scenarios. If a particular live data call gets missed, I don't think it's as critical. (edit: it also simplifies testing to have those calls in the main thread)
should we wrap the call to post_to_studio in @catch_and_warn(DvcException, logger) ?
Sure, pushed an update with that change
Closes iterative/studio-support#95