-
Notifications
You must be signed in to change notification settings - Fork 382
Keep InProgressProperties up-to-date #1916
Keep InProgressProperties up-to-date #1916
Conversation
@kibbles-n-bytes Please check if this fix looks good. |
} | ||
const interval = 100 * time.Millisecond | ||
const timeout = 10 * time.Second | ||
err := wait.PollImmediate(interval, timeout, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @kibbles-n-bytes described previously, we need this retry loop to correctly handle the situation after we make locks optional, when user updates the spec before controller sends a successful status update.
instance.Status.InProgressProperties = &v1beta1.ServiceInstancePropertiesState{ | ||
ClusterServicePlanExternalName: testClusterServicePlanName, | ||
ClusterServicePlanExternalID: testClusterServicePlanGUID, | ||
Parameters: &runtime.RawExtension{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking that the InProgressProperties
stale fields will be updated at retry.
if s1 == nil && s2 == nil { | ||
return true | ||
} | ||
if s1 == nil && s2 != nil || s1 != nil && s2 == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know go will do the right thing, but to help readability, and newbie understandability, can you put parens around each side of the ||
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edited - sorry, I meant ||
if s1.UserInfo != nil || s2.UserInfo != nil { | ||
u1 := s1.UserInfo | ||
u2 := s2.UserInfo | ||
if u1 == nil && u2 != nil || u1 != nil && u2 == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
@@ -377,6 +379,16 @@ func (c *controller) reconcileServiceInstanceAdd(instance *v1beta1.ServiceInstan | |||
// recordStartOfServiceInstanceOperation has updated the instance, so we need to continue in the next iteration | |||
return nil | |||
} | |||
if !isServiceInstancePropertiesStateEqual(instance.Status.InProgressProperties, inProgressProperties) { | |||
instance, err = c.updateServiceInstanceInProgressProperties(instance, inProgressProperties) | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just make it:
_, err = c.update...
return err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, buf I follow the pattern just a few lines above with recordStartOfServiceInstanceOperation
. I guess it makes the code safer for the future refactoring + has nice comments.
@nilebox Definitely an improvement! Though I think it's not just the So really I think what we could do is group those two blocks as so: if instance.Status.CurrentOperation == "" || !isServiceInstancePropertiesStateEqual(instance.Status.InProgressProperties, inProgressProperties) {
instance, err = c.recordStartOfServiceInstanceOperation(instance, v1beta1.ServiceInstanceProvision, inProgressProperties)
if err != nil {
// bla bla bla
return err
}
// more bla
return nil
} I still think we should defensively start orphan mitigating before starting a new request if we notice And finally, I think we should add the conflict-protection as the default for all status updates, as the implementation of our |
@@ -377,6 +379,16 @@ func (c *controller) reconcileServiceInstanceAdd(instance *v1beta1.ServiceInstan | |||
// recordStartOfServiceInstanceOperation has updated the instance, so we need to continue in the next iteration | |||
return nil | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminded me of something else.
I'm unclear about what return err vs nil means in the context of this outermost function.
Should the if instance.Status.CurrentOperation == "" {
be a guard clause, with the recordStart
taking place fully after the entire of the outer if
body?
My read is, if we're not doing an operation, start one, but if we are doing an operation, go back to the beginning of the loop.
Is this to avoid updating the existing instance after we've already updated it, because it's been requeued by making a change? Shouldn't we proceed onward and do the rest of the instance processing? Is there something at the beginning of t he function before this that needs to be rechecked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear about what return err vs nil means in the context of this outermost function.
Returning error mostly tells the controller to perform exponential backoff (rate limiter) and not marking item as done.
Is this to avoid updating the existing instance after we've already updated it, because it's been requeued by making a change?
Yes.
Shouldn't we proceed onward and do the rest of the instance processing?
No, because there are issues with cache caused by this like #1639
We should only update instance once during a single iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MHBauer In case you (and others) wonder why do we update status in the middle of the process at all. The normal Kube controller logic is to never read the status - it's only used to report to the user. So it's perfectly fine if we do some multi-step work for a minute and then fail to update the status - the operation will be retried and everything's gonna be eventually fine.
We, on the other hand, have to read the status and drive our controller logic based on the last written status. Hence we can't just update the status in the end - we have to commit every stage of the process - effectively we have a "status driven workflow".
The reason for that is that OSB is not Kube friendly - it doesn't have GET endpoints (so we can't get the state of instance/binding), and it's hard to tell which exactly operation is currently in progress (there is just last_operation
without details of operation itself). I see this as a hole in OSB REST API, and hope that we will eventually fix it in v3
.
Ok, will do.
And this as well.
@kibbles-n-bytes I think it's a bit more complicated than that - it's not only about orphan mitigation, but also potentially about performing status cleanup in general.
That said, I don't think this is a blocker for #1917 given that we already have this issue no matter do we have locks or not. |
I'm just talking about in
You're right in that it is less priority, now that you've added conflict-resolution to the status updates. Without that we would have seen it a lot more frequently without the lock. With the lock it's not an issue because we just repeat the same request. But I'm fine with it not being included in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Just a note of some follow-up work items I notice:
-
We should defensively start orphan mitigating before starting a new request if we notice
ObservedGeneration != Generation && instance.Status.DeprovisionStatus == v1beta1.ServiceInstanceDeprovisionStatusRequired
. -
We have a hole where if we start an operation, get a error where we don't have to orphan mitigate, and then retry and get an error where we do have to, we will skip actually deprovisioning at the broker because we never mark
DeprovisionStatus
asRequired
. -
We no longer do multiple status updates per reconciliation, so the contract of
updateServiceInstanceStatus
can be simplified to just returnerror
.
I'll make issues for the first two.
@kibbles-n-bytes links for issues created? |
@@ -4269,7 +4355,7 @@ func TestResolveReferencesNoClusterServiceClass(t *testing.T) { | |||
} | |||
|
|||
// TestReconcileServiceInstanceUpdateParameters tests updating a | |||
// ServiceInstance with new paramaters | |||
// ServiceInstance with new parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Thanks a lot for the patch |
Alternative fix for #1872 (comment) as described in #1903 (comment)