[403] {\"message\":\"The security token included in the request is expired\"} #24

Closed
vishalmamidi opened this issue Feb 23, 2022 · 9 comments · Fixed by #52
Labels
help wanted Extra attention is needed Kubernetes

Comments

@vishalmamidi

Issue

Fluentd initially works fine for a few days,
but after some time this issue appears and Fluentd stops sending logs to OpenSearch.
The issue goes away after restarting the pod, or after deleting and re-creating the Fluentd pod.

Issue log

2022-02-21 15:43:50 +0000 [debug]: #0 taking back chunk for errors. chunk="5d88918bc47758f16ab54ea9a55a8407"
2022-02-21 15:43:50 +0000 [warn]: #0 failed to flush the buffer. retry_times=0 next_retry_time=2022-02-21 15:43:53 +0000 chunk="5d88918bc47758f16ab54ea9a55a8407" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"vpc-east-1.es.amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"admin\", :password=>\"obfuscated\"}): [429] 429 Too Many Requests /_bulk"
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:1101:in `rescue in send_bulk'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:1063:in `send_bulk'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:878:in `block in write'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:877:in `each'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:877:in `write'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1491:in `flush_thread_run'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-02-21 15:43:50 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-02-21 15:43:52 +0000 [warn]: #0 retry succeeded. chunk_id="5d88918bc47758f16ab54ea9a55a8407"
2022-02-21 15:43:58 +0000 [debug]: #0 Created new chunk chunk_id="5d88919cf1b1dbf266ce66e8c93f6df4" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="jb-audit-user", variables=nil, seq=0>
2022-02-21 15:44:16 +0000 [debug]: #0 Created new chunk chunk_id="5d8891ae20521b741c6002d1f72f4f2d" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="jb-audit-user", variables=nil, seq=0>

2022-02-22 06:26:56 +0000 [warn]: #0 failed to flush the buffer. retry_times=1 next_retry_time=2022-02-22 06:26:59 +0000 chunk="5d8956eca3b266ee08a756b1862745f7" error_class=Fluent::Plugin::OpenSearchOutput::RecoverableRequestFailure error="could not push logs to OpenSearch cluster ({:host=>\"vpc.es.amazonaws.com\", :port=>443, :scheme=>\"https\", :user=>\"admin\", :password=>\"obfuscated\"}): [403] {\"message\":\"The security token included in the request is expired\"}"
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:1101:in `rescue in send_bulk'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:1063:in `send_bulk'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:878:in `block in write'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:877:in `each'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-opensearch-1.0.1/lib/fluent/plugin/out_opensearch.rb:877:in `write'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:1491:in `flush_thread_run'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2022-02-22 06:26:56 +0000 [warn]: #0 /fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.14.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2022-02-22 06:26:58 +0000 [debug]: #0 taking back chunk for errors. chunk="5d8956eca3b266ee08a756b1862745f7"

Steps to replicate

Dockerfile used for deploying Fluentd to Kubernetes

FROM fluent/fluentd-kubernetes-daemonset:v1.14-debian-kafka-1

USER root
RUN gem install fluent-plugin-opensearch
RUN gem install fluent-plugin-concat
RUN gem install fluent-plugin-stdout-pp
RUN gem install fluent-plugin-multi-format-parser

Config file used to connect to AWS OpenSearch

2022-02-23 02:32:29 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type tail
    read_from_head true
    tag "kubernetes"
    path "/var/log/containers/*java*container*.log"
    path_key "path"
    pos_file "/var/log/fluentd-containers.log.pos"
    exclude_path ["/var/log/containers/fluent*"]
    <parse>
      @type "json"
      json_parser json
      time_key "time"
      time_format "%iso8601"
      unmatched_lines 
      time_type string
    </parse>
  </source>
  <source>
    @type kafka_group
    brokers "b-1.amazonaws.com:9092,b-2.amazonaws.com:9092"
    consumer_group "amazon.broker-2"
    topics "jb-audit-user"
    format "json"
  </source>
  <source>
    @type kafka_group
    brokers "b-1.amazonaws.com:9092,b-2.amazonaws.com:9092"
    consumer_group "amazon.broker-2"
    topics "jb-audit-sys"
    format "json"
  </source>
  <source>
    @type kafka_group
    brokers "b-1.amazonaws.com:9092,b-2.amazonaws.com:9092"
    consumer_group "amazon.broker-2"
    topics "MT-Incoming-payments"
    format "json"
  </source>
  <filter kubernetes>
    @type parser
    key_name "log"
    reserve_time true
    reserve_data true
    remove_key_name_field true
    replace_invalid_sequence true
    emit_invalid_record_to_error true
    <parse>
      @type "multi_format"
      <pattern>
        format json
      </pattern>
      <pattern>
        format none
      </pattern>
    </parse>
  </filter>
  <match kubernetes>
    @type opensearch
    ssl_verify false
    @log_level "debug"
    logstash_format true
    logstash_prefix "jb-app-log-java"
    logstash_prefix_separator "-"
    logstash_dateformat "%Y.%m"
    user "admin"
    password xxxxxx
    <endpoint>
      url https://vpc.amazonaws.com:443
      region "us-east-1"
    </endpoint>
    <buffer>
      @type "file"
      path "/var/log/fluentd-buffers/kubernetes.system.buffer"
      flush_mode interval
      flush_interval 10s
      flush_thread_count 8
      flush_at_shutdown true
      chunk_full_threshold 0.9
      retry_forever true
      retry_type exponential_backoff
      retry_wait 2s
    </buffer>
  </match>
  <match jb-audit-user>
    @type opensearch
    ssl_verify false
    @log_level "debug"
    logstash_format true
    logstash_prefix "jb-audit-user"
    logstash_prefix_separator "-"
    logstash_dateformat "%Y.%m"
    user "admin"
    password xxxxxx
    <endpoint>
      url https://vpc.amazonaws.com:443
      region "us-east-1"
    </endpoint>
    <buffer>
      @type "file"
      path "/var/log/fluentd-buffers/kubernetes-jb-audit-user.system.buffer"
      flush_mode interval
      flush_interval 10s
      flush_thread_count 8
      flush_at_shutdown true
      chunk_full_threshold 0.9
      retry_forever true
      retry_type exponential_backoff
      retry_wait 2s
    </buffer>
  </match>
  <match jb-audit-sys>
    @type opensearch
    ssl_verify false
    @log_level "debug"
    logstash_format true
    logstash_prefix "jb-audit-sys"
    logstash_prefix_separator "-"
    logstash_dateformat "%Y.%m"
    user "admin"
    password xxxxxx
    <endpoint>
      url https://vpc.amazonaws.com:443
      region "us-east-1"
    </endpoint>
    <buffer>
      @type "file"
      path "/var/log/fluentd-buffers/kubernetes-jb-audit-sys.system.buffer"
      flush_mode interval
      flush_interval 10s
      flush_thread_count 8
      flush_at_shutdown true
      chunk_full_threshold 0.9
      retry_forever true
      retry_type exponential_backoff
      retry_wait 2s
    </buffer>
  </match>
  <match MT-Incoming-payments>
    @type opensearch
    ssl_verify false
    @log_level "debug"
    logstash_format true
    logstash_prefix "jb-event-txn"
    logstash_prefix_separator "-"
    logstash_dateformat "%Y.%m"
    user "admin"
    password xxxxxx
    <endpoint>
      url https://vpc.amazonaws.com:443
      region "us-east-1"
    </endpoint>
    <buffer>
      @type "file"
      path "/var/log/fluentd-buffers/kubernetes-jb-event-txn.system.buffer"
      flush_mode interval
      flush_interval 1s
      flush_thread_count 4
      chunk_full_threshold 0.9
    </buffer>
  </match>
</ROOT>

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html
I have added a backend role to connect to OpenSearch through fine-grained access control.

image

Expected Behavior or What you need to ask

How can I solve this issue without restarting the pod?

...

Using Fluentd and OpenSearch plugin versions

2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-avro' version '1.1.1'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-concat' version '2.5.0'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.14'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-grok-parser' version '2.6.2'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-json-in-json-2' version '1.0.2'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-kafka' version '0.17.3'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.9.2'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.0.1'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-parser-avro' version '0.3.1'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-parser-cri' version '0.1.1'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.2'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-stdout-pp' version '0.2.0'
2022-02-23 02:32:28 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2022-02-23 02:32:28 +0000 [info]: gem 'fluentd' version '1.14.3'
2022-02-21 13:50:38 +0000 [info]: starting fluentd-1.14.3 pid=7 ruby="2.6.9"

@cosmo0920
Collaborator

cosmo0920 commented Mar 1, 2022

This could be handled by custom assume roles.
You can create an assume role and use that role.

ref.)

@DaemonDude23

DaemonDude23 commented Mar 1, 2022

OP's Fluentd config doesn't mention passing any static credentials, and his screenshot shows the use of an IAM role associated with an EKS worker node. That makes me think he's already using IAM role(s) for Fluentd to assume. This is how I'm passing these to the plugin:

        <endpoint>
          url "#{ENV['REDACTED_FLUENT_ELASTICSEARCH_URL']}"
          region "#{ENV['AWS_REGION']}"
          assume_role_arn "#{ENV['AWS_ROLE_ARN']}"
          assume_role_web_identity_token_file "#{ENV['AWS_WEB_IDENTITY_TOKEN_FILE']}"
        </endpoint>

I encounter this same error. I am using IRSA so that the Fluentd pods assume the IAM role, not the node like OP. Same overall approach, though.

I was using fluent-plugin-aws-elasticsearch-service for over 6 months without issue. I switched from that older plugin to this one and bumped Fluentd from 1.13 to 1.14. The IAM role session duration defaults to 1 hour. As a result, after roughly 1 hour of running the new images with this plugin, I get the error from this thread. A restart 'fixes' it, as it does for OP. I've had to roll back to the old plugin to avoid losing tons of logs.

This looks to me like the plugin successfully retrieves the credentials for the role at startup, but does not renew them any time after that. Same behavior with v1.0.2 and v1.0.1.
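
The behaviour described above (credentials fetched once at startup and never renewed) is a hypothesis from this thread, not something confirmed against the plugin source. As a rough illustration of what "renew before expiry" could look like, here is a minimal Python sketch. `Credentials`, `RefreshingCredentials`, the `fetch` callback, and the 300-second margin are hypothetical names and values chosen for the example; they are not part of fluent-plugin-opensearch or any AWS SDK.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expires_at: float  # Unix timestamp in seconds


class RefreshingCredentials:
    """Caches temporary credentials and re-fetches them shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        margin_seconds: float = 300.0,  # assumed safety margin, not an AWS default
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._lock = threading.Lock()  # several flush threads may call get()
        self._current: Optional[Credentials] = None

    def get(self) -> Credentials:
        with self._lock:
            stale = (
                self._current is None
                or self._clock() >= self._current.expires_at - self._margin
            )
            if stale:
                self._current = self._fetch()
            return self._current


if __name__ == "__main__":
    now = [0.0]      # fake clock so the demo does not have to wait an hour
    issued = [0]

    def fake_fetch() -> Credentials:
        # Stand-in for a call to STS or the instance metadata service.
        issued[0] += 1
        return Credentials("AKIDEXAMPLE", "secret", f"token-{issued[0]}", now[0] + 3600)

    creds = RefreshingCredentials(fake_fetch, clock=lambda: now[0])
    print(creds.get().session_token)  # token-1
    now[0] = 1800
    print(creds.get().session_token)  # token-1 (still valid)
    now[0] = 3400                     # inside the 300 s margin before expiry
    print(creds.get().session_token)  # token-2
```

Calling `get()` on every request, instead of reading the credentials once at startup, is the design point: a session that expires after an hour is replaced without restarting the process.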

Here's a graph of my buffers. The long lead-up is when I was using the older container/plugin versions. After I deployed updated containers with this plugin, I saw logs flowing in fine in Kibana, with no errors from Fluentd. Then it starts throwing this 403 error. I restart the pods and it's fine for a while, then the buffers begin filling again as it gets the 403, since the credentials it's trying to pass are no longer valid.

Screenshot_20220225_134534

@vishalmamidi
Author

@DaemonDude23 Could you please help me set up IRSA so that the Fluentd pods assume the IAM role?
I am using fluent-plugin-aws-elasticsearch-service as a workaround for now.

I am facing this issue:

2022-03-01 16:44:08 +0000 [error]: #0 unexpected error error_class=Aws::STS::Errors::InvalidIdentityToken error="Missing a required claim: aud"

Is the assume_role_web_identity_token_file location correct, or am I doing something wrong?

        <endpoint>
          url    "#{ENV['FLUENT_OPENSEARCH_HOST'] + ':' + ENV['FLUENT_OPENSEARCH_PORT']}"
          region "#{ENV['FLUENT_OPENSEARCH_REGION']}"
          assume_role_arn "#{ENV['AWS_ROLE_ARN']}"
          assume_role_web_identity_token_file "#{ENV['AWS_WEB_IDENTITY_TOKEN_FILE']}"
        </endpoint>

          - name:  AWS_ROLE_ARN
            value: "arn:aws:iam::XXX:role/fluent-access-role"
          - name:  AWS_WEB_IDENTITY_TOKEN_FILE
            value: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"

My Dockerfile

FROM fluent/fluentd-kubernetes-daemonset:v1.14-debian-kafka-1

USER root
RUN gem install elasticsearch -v 7.13.3
RUN gem install elasticsearch-api -v 7.13.3
RUN gem install elasticsearch-transport -v 7.13.3
RUN gem install fluent-plugin-elasticsearch -v 5.0.5
RUN gem install fluent-plugin-aws-elasticsearch-service
RUN gem install fluent-plugin-opensearch
RUN gem install fluent-plugin-concat
RUN gem install fluent-plugin-stdout-pp
RUN gem install fluent-plugin-multi-format-parser

My role in IAM dashboard

fluentd_access_role

OpenSearch Cluster security configuration

opensearch_access_policy

@DaemonDude23

DaemonDude23 commented Mar 1, 2022

@vishalmamidi It's fairly complicated, but I'll try to help get you started.
I'm certain the maintainers here don't want this thread to turn into a troubleshooting session for general EKS config, so I won't help here with getting IRSA to work in general beyond this one post.

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

What you don't need:

  • To specify the port number to the OpenSearch instance unless you're not running it on 443 (not sure if that's even possible to specify another port)
  • You do not need to explicitly pass in env vars AWS_ROLE_ARN or AWS_WEB_IDENTITY_TOKEN_FILE. This is done automatically with a valid/functioning k8s service account attachment to the pod.
    • I'm only adding in the opensearch endpoint into environment variables in the Fluentd container. Nothing more than just the address to reach it.
    • If you change your env vars to what I've posted previously in this thread (e.g. don't use FLUENT_OPENSEARCH_REGION but instead AWS_REGION), they will be picked up by Fluentd as they're already in the container if IRSA is configured correctly.

You need:

  • OIDC configured in your EKS cluster to enable use of IRSA.
  • My IAM role permission policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "es:ESHttpPost",
            "Resource": "arn:aws:es:us-east-1:REDACTED:domain/REDACTED"
        }
    ]
}
  • My IAM assume role policy/Trust relationship:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::REDACTED:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/REDACTED"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "oidc.eks.us-east-1.amazonaws.com/id/REDACTED:sub": "system:serviceaccount:NAMESPACE_OF_POD_HERE:EXACT_NAME_OF_K8S_SERVICE_ACCOUNT"
                }
            }
        }
    ]
}
  • The k8s service account used by your pods needs an annotation like this in order for Kubernetes to know how to map its objects to an IAM role:
eks.amazonaws.com/role-arn: arn:aws:iam::REDACTED:role/EXACT_NAME_OF_K8S_SERVICE_ACCOUNT
  • Edit the permissions inside of Kibana/OpenSearch to allow this role to perform actions against it, like write to some indices. Your original screenshot indicates you've pretty much done that, but you'll need to add the IAM role ARN that you're creating here.
  • Once that annotation is in place, the pod(s) need to be restarted
  • It should work at this point.
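
A common failure point in the list above is a mismatch between the `sub` pattern in the trust relationship and the namespace and service account the pod actually runs under. The following Python sketch approximates the `StringLike` comparison so a pattern can be checked locally before deploying. It assumes `StringLike` is case-sensitive with `*` matching any run of characters and `?` matching a single character; `string_like`, `expected_subject`, and the `logging`/`fluentd` names are hypothetical.

```python
import re


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' = any run of characters, '?' = one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def expected_subject(namespace: str, service_account: str) -> str:
    """Build the 'sub' value in the format used by the trust policy above."""
    return f"system:serviceaccount:{namespace}:{service_account}"


if __name__ == "__main__":
    # Hypothetical namespace and service account names.
    subject = expected_subject("logging", "fluentd")
    print(string_like("system:serviceaccount:logging:fluentd", subject))  # True
    print(string_like("system:serviceaccount:logging:*", subject))        # True
    print(string_like("system:serviceaccount:default:fluentd", subject))  # False
```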

@vishalmamidi
Author

@DaemonDude23 Thank you so much

@sagimann

sagimann commented Mar 16, 2022

Hi, I've encountered the same issue but in a slightly different environment. Note that I, too, have a working environment with the old plugin, so I know the IAM role is wired correctly.

  • I'm using Docker Beanstalk v3 - not EKS
  • I'm using the plugin v1.0.2 inside a fluentd docker container
  • I see at runtime that the plugin picks up the instance profile credentials and there is no local credential file
  • The error in the container is the same, happens after an hour or so
  • My fluentd config is:
<match **>
    @type opensearch
    @id   output_os
    type_name "access_log"
    logstash_format true
    include_tag_key true
    flush_interval 1s
    retry_limit 2
    tag_key "@log_name"
    reconnect_on_error true
    buffer_chunk_limit 400k
    <endpoint>
        url REDACTED
        region REDACTED
    </endpoint>
</match>

I don't see anything missing between what's written above and what I have, that would explain this "outage" after 1 hour and the fact that it worked with the older plugin.

Any ideas?

@vishalmamidi
Author

vishalmamidi commented Apr 4, 2022

@cosmo0920 @DaemonDude23

I am still facing the "The security token included in the request is expired" issue. In my Fluentd config I am not passing any credentials such as user and password in the match block; I only give the url and region, and Fluentd connects using the EKS node role.
The issue does not occur every hour: it appears after 20+ hours, and sometimes after 10 hours.

The issue persists for some hours and then it suddenly starts working again.

Below is the updated configuration:

<match kubernetes>
    @type opensearch
    ssl_verify false
    @log_level "debug"
    logstash_format true
    logstash_prefix "jb-app-log-java"
    logstash_prefix_separator "-"
    logstash_dateformat "%Y.%m"
    <endpoint>
      url https://vpc.amazonaws.com:443
      region "us-east-1"
    </endpoint>
</match>

Any help is appreciated. Thanks in advance!

@kyleli666

kyleli666 commented Apr 8, 2022

I have the same issue here with bare metal td-agent on an AWS Ubuntu EC2 instance.

  1. I have an IAM role with AWS OpenSearch access attached to my EC2 instance, and it works fine with the old fluent-plugin-aws-elasticsearch-service.
  2. By running curl http://169.254.169.254/latest/meta-data/iam/security-credentials/examplerole, I can see that the IAM role's token expires after about 6 hours, while new tokens are issued every few minutes with the same expiration duration.
  3. However, the opensearch plugin seems to use only the first token it obtained when td-agent started (or restarted), and never fetches a new token for later requests. After about 6 hours, I get the error {\"message\":\"The security token included in the request is expired\"}.

I spent all day today observing this issue, and I now believe it's a bug.
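
The expiry check from step 2 can be sketched in Python as follows. The JSON field names (`Expiration`, `Token`, and so on) follow the format returned by the instance metadata endpoint, but every value in `SAMPLE` is made up for illustration, and `seconds_until_expiry`, `needs_refresh`, and the 300-second margin are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical response body; a real one would come from the curl command in step 2.
SAMPLE = json.dumps({
    "Code": "Success",
    "Type": "AWS-HMAC",
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-token",
    "LastUpdated": "2022-04-08T06:00:00Z",
    "Expiration": "2022-04-08T12:00:00Z",
})


def seconds_until_expiry(document: str, now: datetime) -> float:
    """Seconds between 'now' and the Expiration timestamp in the credentials JSON."""
    data = json.loads(document)
    expiration = datetime.strptime(
        data["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (expiration - now).total_seconds()


def needs_refresh(remaining_seconds: float, margin_seconds: float = 300.0) -> bool:
    """True once the credentials are inside the safety margin or already expired."""
    return remaining_seconds <= margin_seconds


if __name__ == "__main__":
    now = datetime(2022, 4, 8, 6, 0, 0, tzinfo=timezone.utc)
    remaining = seconds_until_expiry(SAMPLE, now)
    print(f"{remaining / 3600:.1f} hours")  # 6.0 hours
    print(needs_refresh(remaining))         # False
```

A client that runs this check before each request, and re-reads the endpoint when `needs_refresh` returns true, would pick up the newer tokens instead of reusing the first one until it expires.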

I have had to go back to the old aws-elasticsearch plugin and pin my td-agent version at 4.2.0, because there are some other compatibility issues.

@kyleli666

I think I found a workaround; see #46 (comment)
