Keep InProgressProperties up-to-date #1916

nilebox · 2018-04-05T13:45:50Z

Alternative fix for #1872 (comment) as described in #1903 (comment)

nilebox · 2018-04-05T13:50:17Z

@kibbles-n-bytes Please check if this fix looks good.

nilebox · 2018-04-05T13:52:55Z

pkg/controller/controller_instance.go

-	}
+	const interval = 100 * time.Millisecond
+	const timeout = 10 * time.Second
+	err := wait.PollImmediate(interval, timeout, func() (bool, error) {


As @kibbles-n-bytes described previously, we need this retry loop to correctly handle the situation after we make locks optional, when user updates the spec before controller sends a successful status update.

nilebox · 2018-04-05T13:54:00Z

pkg/controller/controller_instance_test.go

+	instance.Status.InProgressProperties = &v1beta1.ServiceInstancePropertiesState{
+		ClusterServicePlanExternalName: testClusterServicePlanName,
+		ClusterServicePlanExternalID:   testClusterServicePlanGUID,
+		Parameters: &runtime.RawExtension{


Checking that the InProgressProperties stale fields will be updated at retry.

duglin · 2018-04-05T14:04:59Z

pkg/controller/controller_instance.go

+	if s1 == nil && s2 == nil {
+		return true
+	}
+	if s1 == nil && s2 != nil || s1 != nil && s2 == nil {


I know go will do the right thing, but to help readability, and newbie understandability, can you put parens around each side of the || ?

edited - sorry, I meant ||

duglin · 2018-04-05T14:06:54Z

pkg/controller/controller_instance.go

+	if s1.UserInfo != nil || s2.UserInfo != nil {
+		u1 := s1.UserInfo
+		u2 := s2.UserInfo
+		if u1 == nil && u2 != nil || u1 != nil && u2 == nil {


duglin · 2018-04-05T14:20:17Z

pkg/controller/controller_instance.go

@@ -377,6 +379,16 @@ func (c *controller) reconcileServiceInstanceAdd(instance *v1beta1.ServiceInstan
 		// recordStartOfServiceInstanceOperation has updated the instance, so we need to continue in the next iteration
 		return nil
 	}
+	if !isServiceInstancePropertiesStateEqual(instance.Status.InProgressProperties, inProgressProperties) {
+		instance, err = c.updateServiceInstanceInProgressProperties(instance, inProgressProperties)
+		if err != nil {


you could just make it:

_, err = c.update... return err

true, buf I follow the pattern just a few lines above with recordStartOfServiceInstanceOperation. I guess it makes the code safer for the future refactoring + has nice comments.

kibbles-n-bytes · 2018-04-05T22:14:08Z

@nilebox Definitely an improvement! Though I think it's not just the InProgressProperties that we need to make sure are set. We should really be doing everything that is part of recordStartOfServiceInstanceOperation; after all, what we're really doing is starting a new operation that is overriding the previous one we were attempting. So this includes resetting the OperationStartTime and setting DeprovisionStatus to Required as a safe-guard (currently the code would not orphan mitigate correctly if we originally have a 4XX error, but then have a 5XX error or async failure).

So really I think what we could do is group those two blocks as so:

if instance.Status.CurrentOperation == "" || !isServiceInstancePropertiesStateEqual(instance.Status.InProgressProperties, inProgressProperties) {
    instance, err = c.recordStartOfServiceInstanceOperation(instance, v1beta1.ServiceInstanceProvision, inProgressProperties)
    if err != nil {
        // bla bla bla
        return err
    }
    // more bla
    return nil
}

I still think we should defensively start orphan mitigating before starting a new request if we notice ObservedGeneration != Generation && instance.Status.DeprovisionStatus == v1beta1.ServiceInstanceDeprovisionStatusRequired. There will always be holes that could get us into this state, no matter how defensive we are.

And finally, I think we should add the conflict-protection as the default for all status updates, as the implementation of our c.updateServiceInstanceStatus function. It's not just missing successful provisions we are worried about; instance updates (in the OSB sense) are in fact even more dangerous, since if the status update fails, we lose all information as to what came back from the broker. At least for provisioning, we can always orphan mitigate, but we're SOL for instance updates.

MHBauer · 2018-04-05T22:26:33Z

pkg/controller/controller_instance.go

@@ -377,6 +379,16 @@ func (c *controller) reconcileServiceInstanceAdd(instance *v1beta1.ServiceInstan
 		// recordStartOfServiceInstanceOperation has updated the instance, so we need to continue in the next iteration
 		return nil
 	}


This reminded me of something else.

I'm unclear about what return err vs nil means in the context of this outermost function.

Should the if instance.Status.CurrentOperation == "" { be a guard clause, with the recordStart taking place fully after the entire of the outer if body?

My read is, if we're not doing an operation, start one, but if we are doing an operation, go back to the beginning of the loop.

Is this to avoid updating the existing instance after we've already updated it, because it's been requeued by making a change? Shouldn't we proceed onward and do the rest of the instance processing? Is there something at the beginning of t he function before this that needs to be rechecked?

I'm unclear about what return err vs nil means in the context of this outermost function.

Returning error mostly tells the controller to perform exponential backoff (rate limiter) and not marking item as done.

Is this to avoid updating the existing instance after we've already updated it, because it's been requeued by making a change?

Yes.

Shouldn't we proceed onward and do the rest of the instance processing?

No, because there are issues with cache caused by this like #1639
We should only update instance once during a single iteration.

@MHBauer In case you (and others) wonder why do we update status in the middle of the process at all. The normal Kube controller logic is to never read the status - it's only used to report to the user. So it's perfectly fine if we do some multi-step work for a minute and then fail to update the status - the operation will be retried and everything's gonna be eventually fine.

We, on the other hand, have to read the status and drive our controller logic based on the last written status. Hence we can't just update the status in the end - we have to commit every stage of the process - effectively we have a "status driven workflow".

The reason for that is that OSB is not Kube friendly - it doesn't have GET endpoints (so we can't get the state of instance/binding), and it's hard to tell which exactly operation is currently in progress (there is just last_operation without details of operation itself). I see this as a hole in OSB REST API, and hope that we will eventually fix it in v3.

nilebox · 2018-04-05T22:53:49Z

So really I think what we could do is group those two blocks

Ok, will do.

And finally, I think we should add the conflict-protection as the default for all status updates, as the implementation of our c.updateServiceInstanceStatus function.

And this as well.
The rest can be done as separate follow-ups.

I still think we should defensively start orphan mitigating before starting a new request if we notice ObservedGeneration != Generation && instance.Status.DeprovisionStatus == v1beta1.ServiceInstanceDeprovisionStatusRequired.

@kibbles-n-bytes I think it's a bit more complicated than that - it's not only about orphan mitigation, but also potentially about performing status cleanup in general.
There are at least 2 cases:

ProvisionStatus == NotProvisioned. You could start orphan mitigation via processTemporaryProvisionFailure.
ProvisionStatus == Provisioned. There is no need for orphan mitigation, since it was an update case, not provision. For that you could probably invoke processTemporaryUpdateServiceInstanceFailure to trigger cleanup.

That said, I don't think this is a blocker for #1917 given that we already have this issue no matter do we have locks or not.
Feel free to pick up this change.

kibbles-n-bytes · 2018-04-05T23:44:31Z

There are at least 2 cases

I'm just talking about in reconcileServiceInstanceAdd; we'd only get into that function if getReconciliationActionForServiceInstance returned NotProvisioned.

That said, I don't think this is a blocker for #1917 given that we already have this issue no matter do we have locks or not.

You're right in that it is less priority, now that you've added conflict-resolution to the status updates. Without that we would have seen it a lot more frequently without the lock. With the lock it's not an issue because we just repeat the same request. But I'm fine with it not being included in this PR.

kibbles-n-bytes

LGTM.

Just a note of some follow-up work items I notice:

We should defensively start orphan mitigating before starting a new request if we notice ObservedGeneration != Generation && instance.Status.DeprovisionStatus == v1beta1.ServiceInstanceDeprovisionStatusRequired.
We have a hole where if we start an operation, get a error where we don't have to orphan mitigate, and then retry and get an error where we do have to, we will skip actually deprovisioning at the broker because we never mark DeprovisionStatus as Required.
We no longer do multiple status updates per reconciliation, so the contract of updateServiceInstanceStatus can be simplified to just return error.

I'll make issues for the first two.

MHBauer · 2018-04-13T11:38:21Z

@kibbles-n-bytes links for issues created?

kibbles-n-bytes · 2018-04-27T18:22:53Z

@MHBauer #2001 for the first. For the second I need to double check to see if I can actually trigger it.

pmorie · 2018-04-27T20:44:48Z

pkg/controller/controller_instance_test.go

@@ -4269,7 +4355,7 @@ func TestResolveReferencesNoClusterServiceClass(t *testing.T) {
 }

 // TestReconcileServiceInstanceUpdateParameters tests updating a
-// ServiceInstance with new paramaters
+// ServiceInstance with new parameters


pmorie · 2018-04-27T20:53:56Z

Thanks a lot for the patch

Nail Islamov added 2 commits April 5, 2018 18:00

Update InProgressProperties on retry

387c275

Add unit tests for InProgressProperties update

a9a5af5

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 5, 2018

nilebox mentioned this pull request Apr 5, 2018

Handle InProgressProperties for retries properly #1903

Closed

nilebox commented Apr 5, 2018

View reviewed changes

duglin reviewed Apr 5, 2018

View reviewed changes

Fix review comments

1e15c36

duglin reviewed Apr 5, 2018

View reviewed changes

nilebox mentioned this pull request Apr 5, 2018

Allow the same user to edit an instance #1872

Closed

nilebox added the non-happy-path label Apr 5, 2018

MHBauer assigned duglin Apr 5, 2018

MHBauer reviewed Apr 5, 2018

View reviewed changes

Keep InProgressProperties up-to-date

2abf835

kibbles-n-bytes approved these changes Apr 6, 2018

View reviewed changes

kibbles-n-bytes added the LGTM1 label Apr 6, 2018

pmorie reviewed Apr 27, 2018

View reviewed changes

pmorie added the LGTM2 label Apr 27, 2018

pmorie merged commit 541866d into kubernetes-retired:master Apr 27, 2018

nilebox deleted the sync-in-progress-properties branch May 14, 2018 00:01

cblecker unassigned duglin Jun 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep InProgressProperties up-to-date #1916

Keep InProgressProperties up-to-date #1916

nilebox commented Apr 5, 2018

nilebox commented Apr 5, 2018

nilebox Apr 5, 2018

nilebox Apr 5, 2018

duglin Apr 5, 2018 •

edited

Loading

duglin Apr 5, 2018

duglin Apr 5, 2018

duglin Apr 5, 2018

nilebox Apr 5, 2018

kibbles-n-bytes commented Apr 5, 2018 •

edited

Loading

MHBauer Apr 5, 2018

nilebox Apr 5, 2018

nilebox Apr 5, 2018 •

edited

Loading

nilebox commented Apr 5, 2018

kibbles-n-bytes commented Apr 5, 2018

kibbles-n-bytes left a comment

MHBauer commented Apr 13, 2018

kibbles-n-bytes commented Apr 27, 2018 •

edited

Loading

pmorie Apr 27, 2018

pmorie commented Apr 27, 2018

Keep InProgressProperties up-to-date #1916

Keep InProgressProperties up-to-date #1916

Conversation

nilebox commented Apr 5, 2018

nilebox commented Apr 5, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

duglin Apr 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kibbles-n-bytes commented Apr 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nilebox Apr 5, 2018 • edited Loading

Choose a reason for hiding this comment

nilebox commented Apr 5, 2018

kibbles-n-bytes commented Apr 5, 2018

kibbles-n-bytes left a comment

Choose a reason for hiding this comment

MHBauer commented Apr 13, 2018

kibbles-n-bytes commented Apr 27, 2018 • edited Loading

Choose a reason for hiding this comment

pmorie commented Apr 27, 2018

duglin Apr 5, 2018 •

edited

Loading

kibbles-n-bytes commented Apr 5, 2018 •

edited

Loading

nilebox Apr 5, 2018 •

edited

Loading

kibbles-n-bytes commented Apr 27, 2018 •

edited

Loading