
Failed to flush the buffer with error_class=NoMethodError error="undefined method `compact' #133

Closed
gurunathj opened this issue Oct 11, 2017 · 18 comments · Fixed by #171 or #177
The fluent-plugin-kinesis plugin produces the following errors when running with type kinesis_streams_aggregated and fluentd version 0.14.21.

2017-10-11 17:07:45 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2017-10-11 17:07:45 +0000 chunk="55b4874e145808169ea53006b65fdb54" error_class=NoMethodError error="undefined method `compact' for 92:Fixnum"

2017-10-11 17:09:15 +0000 [warn]: #0 failed to flush the buffer. retry_time=8 next_retry_seconds=2017-10-11 17:09:21 +0000 chunk="55b487651c7eeff3d29c0c2caa6bc7c2" error_class=NoMethodError error="undefined method `compact' for 50:Fixnum"

Additionally I am using fluent-plugin-s3 along with fluent-plugin-kinesis.

gurunathj commented Oct 11, 2017

Additional logs.

2017-10-11 20:27:25 +0000 [info]: gem 'fluent-plugin-kinesis' version '2.0.1'
2017-10-11 20:27:25 +0000 [info]: gem 'fluent-plugin-s3' version '1.0.0.rc7'
2017-10-11 20:27:25 +0000 [info]: gem 'fluentd' version '0.14.21'
2017-10-11 20:27:25 +0000 [info]: adding match in @syslog pattern="rtb" type="copy"
2017-10-11 20:27:25 +0000 [info]: adding source type="tail"
2017-10-11 20:27:25 +0000 [info]: #0 starting fluentd worker pid=24 ppid=20 worker=0
2017-10-11 20:27:25 +0000 [info]: #0 fluentd worker is now running worker=0
2017-10-11 20:30:25 +0000 [info]: #0 [input2] following tail of /val/log/rtb.log
2017-10-11 20:30:31 +0000 [warn]: #0 chunk bytes limit exceeds for an emitted event stream: 10974bytes
2017-10-11 20:30:32 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2017-10-11 20:30:33 +0000 chunk="55b4b4b8fdc9fd26227884b5baa68ae5" error_class=NoMethodError error="undefined method `compact' for 34:Fixnum"
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/kinesis_helper/api.rb:67:in `size_of_values'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/out_kinesis_streams_aggregated.rb:65:in `size_of_values'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/kinesis_helper/api.rb:76:in `block in split_to_batches'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/event.rb:323:in `each'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/event.rb:323:in `block in each'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/plugin/buffer/file_chunk.rb:170:in `open'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/event.rb:322:in `each'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/kinesis_helper/api.rb:75:in `each'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/kinesis_helper/api.rb:75:in `split_to_batches'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/kinesis.rb:147:in `write_records_batch'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kinesis-2.0.1/lib/fluent/plugin/out_kinesis_streams_aggregated.rb:45:in `write'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/compat/output.rb:131:in `write'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/plugin/output.rb:1061:in `try_flush'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/plugin/output.rb:1286:in `flush_thread_run'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/plugin/output.rb:438:in `block (2 levels) in start'
2017-10-11 20:30:32 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.21/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

riywo commented Oct 11, 2017

Hi @gurunathj ,

Thank you for reporting the issue. Could you give me the whole fluent.conf related to this plugin? Also, how does it work with @type kinesis_streams? Finally, could you show an example record causing this problem?

Thanks,

gurunathj commented Oct 12, 2017

test_record.txt

test_fluentd_conf.txt

Hi @riywo

Thanks for looking into this. I have some further observations and details below.

  • The error 'failed to flush the buffer' is followed by 'chunk bytes limit exceeds for an emitted event stream'.

  • The issue seems related to chunk_limit_size and the total size of the messages to be added to the chunk.

  • For example, if the record size is 333 bytes and chunk_limit_size is 1k, the error is easily reproducible when the input source has 3 or more records to process. (It works for 1 or 2 records in this case.)
    --chunk bytes limit exceeds for an emitted event stream: 1133bytes

  • Eventually it results in the error: failed to flush the buffer, and hit limit for retries. dropping all chunks in the buffer queue. retry_times=10 records=10 error_class=NoMethodError error="undefined method `compact' for 51:Fixnum"

  • This is reproducible with both the kinesis_streams_aggregated and kinesis_streams types.

I've attached the fluentd config and the message used for testing.
Let me know if anything is wrong with the config.
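For reference, a buffer section along these lines reproduces the error once three or more records arrive (the tag, region, and stream name here are hypothetical placeholders; only the 1k limit and record size come from this report):

```
<match rtb.**>
  @type kinesis_streams_aggregated
  region us-east-1
  stream_name test-stream
  <buffer>
    # Deliberately tiny so that three 333-byte records exceed the chunk limit
    chunk_limit_size 1k
  </buffer>
</match>
```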

riywo commented Oct 13, 2017

Thank you for providing the detail and examples. I've reproduced this issue. Let me investigate it.

@riywo riywo added bug and removed need more info labels Oct 13, 2017
riywo commented Oct 16, 2017

Hi @gurunathj ,

Could you share why you configure chunk_limit_size so small? I think larger chunk_limit_size is the easiest way to avoid this issue.

gurunathj commented

Hi @riywo

The chunk_limit_size of 1k in the example was just to reproduce the issue. In the actual environment, the issue occurs even with chunk_limit_size up to 1m. I keep chunk_limit_size under 1m because of the Kinesis shard put request limit.

So is chunk_limit_size actually related to Kinesis shard put request limit? Does it make any difference when multiple instances and threads are configured for fluentd? Otherwise I can configure chunk_limit_size much higher to handle heavy load.

riywo commented Oct 17, 2017

When putting the events of one chunk to Kinesis, this plugin automatically splits the chunk to fit within the API limits, so you can set chunk_limit_size as large as you want. The downside of a larger chunk size is that it can cause many duplicated put records in Kinesis, since fluentd's retry unit is the chunk.

You can also control the slicing with batch_request_max_count and batch_request_max_size.
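As a sketch of how these knobs fit together (the tag, region, stream name, and concrete values here are hypothetical, not from this thread; batch_request_max_count and batch_request_max_size default to the PutRecords API limits of 500 records and 5 MB per call):

```
<match my.tag>
  @type kinesis_streams
  region us-east-1
  stream_name my-stream
  # Per-PutRecords-call slicing; these are the documented defaults
  batch_request_max_count 500
  batch_request_max_size 5242880
  <buffer>
    # Large chunks are fine (the plugin slices per API call), but fluentd
    # retries whole chunks, so bigger chunks mean more duplicates on retry
    chunk_limit_size 8m
  </buffer>
</match>
```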

gurunathj commented

Setting larger chunk_limit_size resolved the error. Thank you.

riywo commented Oct 20, 2017

Hi @gurunathj ,

I'm happy to hear that a larger chunk_limit_size solved the error.

This error only happens with fluentd v0.14, since it automatically splits data into multiple chunks but this plugin doesn't handle that yet. The solution could be to use the standard format of the v0.14 plugin API, but that is a big change. So I'd like to keep the current code and defer fixing this issue to the next major version, v3. I'll also add this workaround to the README later.

@riywo riywo added this to the v3.0 milestone Oct 20, 2017
cknowles commented Nov 2, 2017

Experiencing the same error with v2.0.1 of this plugin and fluentd 0.12.40. Prior to updating we were on v1.1.3 of this plugin with fluentd 0.12.36 and never saw this problem. I don't have a root cause yet, so I can't say whether it's entirely tied to the plugin update.

rgarcia commented Apr 12, 2018

We're hitting this issue running fluentd v1.1 (fluent/fluentd:v1.1-debian docker image), version 2.1.1 of this plugin, and the following configuration:

  # kinesis plugin recommends these fluentd buffer settings:
  buffer_chunk_limit 1m
  buffer_queue_full_action block
  flush_interval 1
  try_flush_interval 0.1
  queued_chunk_flush_interval 0.01
  num_threads 15

  buffer_queue_full_action block
  buffer_queue_limit 16

  <kinesis_producer>
    aggregation_enabled true
    log_level info
    record_max_buffered_time 300
    record_ttl 120000
    metrics_namespace "#{ENV['KINESIS_METRICS_NAMESPACE']}"
  </kinesis_producer>

error message:

2018-04-12 23:23:46 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2018-04-12 23:23:46 +0000 chunk="569af0beb96f3e77e169bba1e2a54b00" error_class=NoMethodError error="undefined method `compact' for 57:Fixnum"

What's the recommended setting for chunk_limit_size?

gurunathj commented

I think chunk_limit_size needs to be configured according to the message size and the available memory. I used a higher value like 128m.

pawel-lmcb commented
I increased my chunk_limit_size but I am still receiving this error. Curious if anyone has any tips for debugging this.

phagunbaya commented
Same here. Please help!

adammw commented Jul 21, 2018

Here is a test script that seems to reliably reproduce the error:

require 'aws-sdk-kinesis'
require 'fluent/output'
require 'fluent/plugin/out_kinesis_streams_aggregated'
require 'fluent/test'
require 'fluent/test/driver/output'
require 'ostruct' # needed for OpenStruct below

driver = Fluent::Test::Driver::Output.new(Fluent::KinesisStreamsAggregatedOutput).configure <<~CONF
  log_level debug
  region us-west-1
  stream_name dummy
  aws_key_id abcdef123
  aws_sec_key abcdef123

  <buffer>
    chunk_limit_size "1m"
  </buffer>
CONF

Aws::Kinesis::Client.prepend(Module.new do
  def put_records(*args)
    OpenStruct.new(
      encryption_type: "KMS",
      failed_record_count: 0,
      records: [
        OpenStruct.new(
          sequence_number: "12345",
          shard_id: "12345"
        )
      ]
    )
  end
end)

driver.run(force_flush_retry: true) do
  10.times do
    time = Fluent::EventTime.now
    events = Array.new(Kernel.rand(3000..5000)).map { [time, { msg: "x" * 256 }] }
    driver.feed("my.tag", events)
  end
end

puts driver.logs

adammw commented Jul 25, 2018

From further investigation, this error seems to occur because of the old v0.12 plugin API compatibility layer. Internally, the records get msgpack'd and appended to a single string that is emitted to the buffer; when the chunk is too large, that string is split as a plain string, without regard to the msgpack framing. So when the plugin tries to parse the chunk as msgpack (via the msgpack_each helper), decoding starts mid-record and yields Integers instead of records.

A test version of the same script using the Fluentd v1.0 plugin API appears not to exhibit this same issue: https://gist.github.com/adammw/27b7a3f236cb8fbbea8e1b3a4907225e (as long as the Fluent::SetTimeKeyMixin and Fluent::SetTagKeyMixin are not included)

riywo commented Jan 21, 2019

We've released v3.0.0.rc.1.0 https://rubygems.org/gems/fluent-plugin-kinesis/versions/3.0.0.rc.1.0

If there is no negative feedback, we'll release v3.0.0 soon.
