Was looking through the backoff logic after we hit some ProvisionedThroughputExceededException errors, and noticed the following (see aws-fluent-plugin-kinesis/lib/fluent/plugin/kinesis_helper/api.rb, lines 154 to 160 at b9ab13a):
scaling_factor is always going to be a number between 0.25 and 0.35. That is not a very wide spread.
Given that the default number of retries is 3 and the scaling factor is relatively narrow, calc(count) for counts {0, 1, 2} is going to return backoffs of approximately 0.5, 1, and 2 seconds, respectively. That is under 5 seconds of total backoff, so a temporary (~5 second) spike in traffic could cause records to be dropped via a ProvisionedThroughputExceededException under the default configuration.
We're getting around this by setting retries_on_batch_request to 7 in order to give ourselves 30+ seconds of retries, but I think one could reasonably argue that the defaults here for backoffs & retries are not great.
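The arithmetic above can be sketched as follows. This is an illustrative reconstruction, not the plugin's actual code: the method names and the exact formula (2**(count + 1) times a random scaling factor) are assumptions chosen to reproduce the ~0.5/1/2-second backoffs described above.

```ruby
# Illustrative sketch of the backoff math discussed in this issue.
# NOT the plugin's actual code: names and the formula are assumptions.

# A random value in roughly (0.25, 0.35] -- a narrow spread.
def scaling_factor
  0.3 + (0.5 - rand) * 0.1
end

# Exponential backoff: roughly 0.5s, 1s, 2s for counts 0, 1, 2.
def calc(count)
  (2 ** (count + 1)) * scaling_factor
end

default_total = (0...3).sum { |n| calc(n) }  # ~3.5-4.9s with the default 3 retries
longer_total  = (0...7).sum { |n| calc(n) }  # well over 30s with 7 retries
```

Under these assumptions, the default 3 retries always finish backing off in under 5 seconds, while 7 retries comfortably exceeds the 30 seconds mentioned above.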
Thank you for your feedback from the real world. I'll consider changing the default value, but it is an incompatible change that will affect all users who rely on the default, so I'll update it when I bump the major version.
By the way, do you need more configurable parameters to adjust the backoff logic, such as base_of_scaling_factor?
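If such a knob existed, it might look like the following sketch. base_of_scaling_factor is only a name floated in this thread, not an existing option, and the formula is an assumption for illustration:

```ruby
# Hypothetical: a configurable exponent base to widen the backoff curve.
# "base" corresponds to the proposed base_of_scaling_factor; the formula
# is assumed for illustration and is not the plugin's actual code.
def calc(count, base: 2, scaling_factor: 0.3)
  scaling_factor * (base ** (count + 1))
end

calc(2)           # base 2 -> ~2.4s
calc(2, base: 3)  # base 3 -> ~8.1s, a much steeper curve
```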
re: changing the behavior w/a major version bump - Yea that makes sense.
re: more configurable parameters - Yea, I was going to suggest something like that. Seems ok to me. Or just clearly document, in the README section for retries_on_batch_request, what the default retry configuration will result in.
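For anyone landing here, a minimal sketch of a fluentd output config that raises the retry count as described above. The match pattern, stream_name, and region are placeholders; retries_on_batch_request is the option discussed in this issue (default 3):

```
<match app.**>
  @type kinesis_streams
  # stream_name and region are placeholder values
  stream_name my-stream
  region us-east-1
  # default is 3; 7 gives 30+ seconds of cumulative backoff
  retries_on_batch_request 7
</match>
```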