Container telemetry not shown in AppHost dashboard #4131

Open
paulomorgado opened this issue May 9, 2024 · 40 comments
Labels
area-telemetry rancher Issues related to rancher

Comments

@paulomorgado
Contributor

paulomorgado commented May 9, 2024

In preview 5, adding .WithOtlpExporter() to .AddContainer(...) would show OpenTelemetry logs, traces and metrics in the dashboard.

Since preview 6, this no longer happens.

But if I add a dashboard container (mcr.microsoft.com/dotnet/aspire-dashboard or mcr.microsoft.com/dotnet/nightly/aspire-dashboard:8.0-preview):

var aspireDashboard = builder.AddContainer(
    "aspire-dashboard",
    "mcr.microsoft.com/dotnet/aspire-dashboard",
    "8.0-preview")
    .WithEndpoint(targetPort: 18888, port: 5018, name: "dash", scheme: "http", isProxied: true)
    .WithEndpoint(targetPort: 18889, port: 5019, name: "otel", scheme: "http", isProxied: true)
    .WithEnvironment("DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS", "true")
    .WithEnvironment("ASPIRE_ALLOW_UNSECURED_TRANSPORT", "true")
    .WithEnvironment("DASHBOARD__OTLP__AUTHMODE", "ApiKey")
    .WithEnvironment("DASHBOARD__OTLP__PRIMARYAPIKEY", builder.Configuration["AppHost:OtlpApiKey"])
    ;
var service = builder.AddContainer(...)
    .WithEndpoint(targetPort: 8080, port: 5012, name: "http", scheme: "http")
    .WithEnvironment("OTEL_BLRP_SCHEDULE_DELAY", "1000")
    .WithEnvironment("OTEL_BSP_SCHEDULE_DELAY", "1000")
    .WithEnvironment("OTEL_METRIC_EXPORT_INTERVAL", "1000")
    .WithEnvironment("OTEL_TRACES_SAMPLER", "always_on")
    .WithEnvironment("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc")
    .WithEnvironment("OTEL_SERVICE_NAME", "hub")
    .WithEnvironment("OTEL_RESOURCE_ATTRIBUTES", "service.instance.id=hub")
    .WithEnvironment("OTEL_EXPORTER_OTLP_HEADERS", $"x-otlp-api-key={builder.Configuration["AppHost:OtlpApiKey"]}")
    .WithEnvironment(ctx => ctx.EnvironmentVariables["OTEL_EXPORTER_OTLP_ENDPOINT"] = new HostUrl(aspireDashboard.GetEndpoint("otel").Url))
...

The dashboard in the aspireDashboard container shows telemetry from the service container.

The service container is referenced by other resources, and references other resources, without any issues.

@DamianEdwards
Member

Which container runtime are you using?

@paulomorgado
Copy link
Contributor Author

Rancher Desktop: https://rancherdesktop.io/

@davidfowl
Member

Could it be https?

@paulomorgado
Contributor Author

I think I tried every possibility.

Always works with the external dashboard, but never with the host dashboard.

@davidfowl
Member

Provide a minimal repro that we can run. That’ll help us diagnose the problem.

@paulomorgado
Contributor Author

I was able to reproduce with the Aspire starter demo on Rancher Desktop.

  1. Create a new starter solution: dotnet new aspire-starter.
  2. Change the ApiService to produce a container.
  3. Build and publish the container.
  4. Change the Aspire host:
var builder = DistributedApplication.CreateBuilder(args);

//var apiService = builder.AddProject<Projects.ApiService>("apiservice")
var apiService = builder.AddContainer("apiservice", "apiservice")
    .WithHttpEndpoint(targetPort: 8080, name: "apiservice")
    .WithOtlpExporter()
    ;

builder.AddProject<Projects.Web>("webfrontend")
    .WithExternalHttpEndpoints()
    .WithReference(apiService.GetEndpoint("apiservice"))
    ;

builder.Build().Run();
  5. Run the Aspire host.

You can check that all OTEL environment variables are set for the container, but no telemetry is captured by the Aspire host.
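To double-check from inside the container that the OTEL_* variables actually reach the process, the app can print them at startup. This is a minimal sketch, assuming a plain .NET app in the container; the helper name `DescribeOtelEnvironment` is hypothetical, and masking `OTEL_EXPORTER_OTLP_HEADERS` is only a precaution because that variable can carry the dashboard API key:

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical helper: lists every OTEL_* environment variable visible to this process,
// sorted by name. OTEL_EXPORTER_OTLP_HEADERS is masked because it may carry an API key.
string[] DescribeOtelEnvironment()
{
    return Environment.GetEnvironmentVariables()
        .Cast<DictionaryEntry>()
        .Select(entry => (Name: (string)entry.Key, Value: entry.Value as string ?? ""))
        .Where(v => v.Name.StartsWith("OTEL_", StringComparison.Ordinal))
        .OrderBy(v => v.Name, StringComparer.Ordinal)
        .Select(v => v.Name == "OTEL_EXPORTER_OTLP_HEADERS"
            ? $"{v.Name}=<redacted>"
            : $"{v.Name}={v.Value}")
        .ToArray();
}

// Print at startup. An empty list, or one without OTEL_EXPORTER_OTLP_ENDPOINT,
// means the variables never reached the process.
foreach (var line in DescribeOtelEnvironment())
{
    Console.WriteLine(line);
}
```

If the variables are all present and correct, the problem lies between the exporter and the dashboard endpoint rather than in the AppHost configuration.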

@davidfowl
Member

I'm guessing it's to do with https.

@paulomorgado
Contributor Author

Were you able to repro?

It used to work.

@DamianEdwards
Member

Can you turn up logging for the app in the container and look at the logs?

@paulomorgado
Contributor Author

I omitted them for brevity, but I'm setting these environment variables:

    .WithEnvironment("Logging__LogLevel__Default", "Debug")
    .WithEnvironment("Logging__LogLevel__Microsoft.AspNetCore", "Debug")

@davidfowl might be onto something here:

> I'm guessing it's to do with https.

But it works fine with a containerized dashboard (mcr.microsoft.com/dotnet/aspire-dashboard:8.0-preview). That's what I'm using now:

var aspireDashboard = builder.AddContainer(
    "hub-aspire-dashboard",
    "mcr.microsoft.com/dotnet/aspire-dashboard",
    "8.0-preview")
    .WithEndpoint(targetPort: 18888, name: "dash", scheme: "http", isProxied: true)
    .WithEndpoint(targetPort: 18889, name: "otel", scheme: "http", isProxied: true)
    .WithEnvironment("DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS", "true")
    .WithEnvironment("ASPIRE_ALLOW_UNSECURED_TRANSPORT", "true")
    .WithEnvironment("DASHBOARD__OTLP__AUTHMODE", "ApiKey")
    .WithEnvironment("DASHBOARD__OTLP__PRIMARYAPIKEY", builder.Configuration["AppHost:OtlpApiKey"])
    .WithHealthCheck(
        endpointName: "otel",
        path: string.Empty,
        httpClientFactory: () => new HttpClient
        {
            DefaultRequestVersion = HttpVersion.Version20,
            DefaultVersionPolicy = HttpVersionPolicy.RequestVersionOrHigher,
        })
    ;
...
var mongoDB = builder.AddMongoDB("mongodb")
    .WithImageTag("5.0")
    .WithHealthCheck()
    ;
var hubIdp = builder.AddProject<Projects.Project1>(name: "hub-identity", launchProfileName: null)
    .WithEndpoint(scheme: "http")
    .WithEnvironment(ctx => ctx.EnvironmentVariables["OTEL_EXPORTER_OTLP_ENDPOINT"] = new HostUrl(aspireDashboard.GetEndpoint("otel").Url))
    ;
var hubContainer = builder.AddContainer("container", "Image", "Tag")
    .WithEndpoint(targetPort: 8080, name: "http", scheme: "http")
    .WithReference(mongoDB, connectionName: "MongoDB")
    .WithOtlpExporter()
    .WithEnvironment(ctx => ctx.EnvironmentVariables["OTEL_EXPORTER_OTLP_ENDPOINT"] = new 
    ;
...

@afscrome
Contributor

The OpenTelemetry SDK's own logs don't show up in the normal logs - you have to do a bit of extra magic to get them by creating an OTEL_DIAGNOSTICS.json file (self-diagnostics):
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/README.md#self-diagnostics

@DamianEdwards
Member

It's likely HTTPS as the HTTPS certificate being used to secure the dashboard endpoints when it's run from the host is the ASP.NET Core Development Certificate which will not be trusted by default from inside a container. This will cause the OTLP exporter to fail. This change was introduced in preview 6 as part of security work. In the example you provided of running the dashboard itself in a container, you're explicitly disabling the HTTPS transport via the ASPIRE_ALLOW_UNSECURED_TRANSPORT environment variable and creating only HTTP endpoints using WithEndpoint.

Having composed containers be automatically configured to trust the ASP.NET Core Development Certificate is something we want to investigate enabling in the future.

@paulomorgado
Contributor Author

> It's likely HTTPS as the HTTPS certificate being used to secure the dashboard endpoints when it's run from the host is the ASP.NET Core Development Certificate which will not be trusted by default from inside a container. This will cause the OTLP exporter to fail. This change was introduced in preview 6 as part of security work. In the example you provided of running the dashboard itself in a container, you're explicitly disabling the HTTPS transport via the ASPIRE_ALLOW_UNSECURED_TRANSPORT environment variable and creating only HTTP endpoints using WithEndpoint.

Thanks. I have a demo working with this launch profile:

"http": {
  "commandName": "Project",
  "dotnetRunMessages": true,
  "launchBrowser": true,
  "applicationUrl": "http://localhost:15190",
  "environmentVariables": {
    "ASPNETCORE_ENVIRONMENT": "Development",
    "DOTNET_ENVIRONMENT": "Development",
    "DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS": "true",
    "ASPIRE_ALLOW_UNSECURED_TRANSPORT": "true",
    "DOTNET_DASHBOARD_OTLP_ENDPOINT_URL": "http://localhost:19164",
    "DOTNET_RESOURCE_SERVICE_ENDPOINT_URL": "http://localhost:20178"
  }
}

> Having composed containers be automatically configured to trust the ASP.NET Core Development Certificate is something we want to investigate enabling in the future.

For images that are being pulled, wouldn't forcing the container to trust a certificate that shouldn't be trusted defeat the whole purpose of the security added to Aspire?

It would probably be better to use HTTP endpoints for OTEL. Or both HTTP and HTTPS.

@DamianEdwards
Member

> For images that are being pulled, wouldn't forcing the container to trust a certificate that shouldn't be trusted defeat the whole purpose of the security added to Aspire?

I don't think so. Configuring a container to trust a self-signed certificate that is already trusted by the host for development purposes (and only at development time) seems no different to what we do today with respect to trusting the ASP.NET Core Development Certificate for the purposes of localhost HTTPS testing. The trust would only happen when running during local development, as the certificate is only for localhost anyway.

Using an unencrypted transport for OTEL means processes running as other accounts on the same machine during local development can potentially observe sensitive information being passed from apps to the dashboard's OTLP endpoint so we decided that behavior must be opt-in.

@davidfowl
Member

> Using an unencrypted transport for OTEL means processes running as other accounts on the same machine during local development can potentially observe sensitive information being passed from apps to the dashboard's OTLP endpoint so we decided that behavior must be opt-in.

For the standalone dashboard, though, it is unsecured, and there's a warning in the dashboard about it.

@leslierichardson95

@JamesNK should we consider adding some new docs about this scenario?

@Depechie
Contributor

Depechie commented Sep 29, 2024

> Thanks. I have a demo working with this launch profile:
> […]

I had a similar setup, but after moving from Aspire 8.0.2 to 8.2.0 the dashboard again shows an untrusted exception and an empty resources list, even with this configuration (forcing HTTP and allowing unsecured transport).
@JamesNK or @paulomorgado, did anything change that isn't in the release docs? I can't find anything related to this.

@afscrome
Contributor

@Depechie Try upgrading to 8.2.1 - sounds like you're hitting #5532 .

@Depechie
Contributor

@afscrome awesome, I missed that listed issue. Thanks for the heads up; indeed, everything works fine again now.

@davidfowl davidfowl added the rancher Issues related to rancher label Sep 30, 2024
@davidfowl davidfowl removed the bug label Oct 16, 2024
@davidfowl davidfowl reopened this Jan 13, 2025
@davidfowl
Member

Did we confirm that this was an https issue?

@paulomorgado
Contributor Author

> Did we confirm that this was an https issue?

I tried HTTP and it has the same issue.

@davidfowl
Member

If you can make a minimal repro that we can run with instructions that would be great.

@paulomorgado
Contributor Author

> If you can make a minimal repro that we can run with instructions that would be great.

@davidfowl, I've provided instructions more than once in this issue. Unless someone tells me what's wrong with those instructions, I don't know what other instructions I can provide.

@paulomorgado
Contributor Author

paulomorgado commented Jan 13, 2025

@davidfowl, you can add the OpenTelemetry collector to test this:

var otel = builder.AddContainer("otel", "otel/opentelemetry-collector-contrib", "0.117.0")
    .WithHttpEndpoint(targetPort: 4317, name: "grpc")
    .WithHttpEndpoint(targetPort: 4318, name: "http")
    .WithBindMount("Configuration/otel/config.yaml", "/etc/otelcol-contrib/config.yaml")
    .WithOtlpExporter()
    ;

var webservice = builder.AddProject<Projects.MyWebService>("webservice", launchProfileName: null)
    .WithHttpEndpoint()
    .WithServerAccessTokenHandler(idp)
    .WithLoggingDefaults()
    .WithEnvironment("OTEL_EXPORTER_OTLP_ENDPOINT", otel.GetEndpoint("grpc")).WaitFor(otel)
    ;

Where config.yaml is:

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]

If you look at the logs from the OTEL collector, you'll see something like this:

2025-01-13T14:39:37
 2025-01-13T14:39:37.467Z	warn	[email protected]/clientconn.go:1381	[core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "host.containers.internal:19270", ServerName: "host.containers.internal:19270", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 169.254.1.2:19270: i/o timeout"	{"grpc_log": true}

@DamianEdwards
Member

I can't repro this in Docker Desktop on Windows. I get a connection error but it's due to HTTPS which is expected:

2025-01-13T11:14:51
 2025-01-13T19:14:51.419Z	warn	[email protected]/clientconn.go:1381	[core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "host.docker.internal:21120", ServerName: "host.docker.internal:21120", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate is valid for localhost, not host.docker.internal"	{"grpc_log": true}

@paulomorgado you're getting a timeout which suggests it's actually a networking issue. Does container->host networking just work with Rancher on macOS?
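One way to separate the two failure modes is to test plain TCP reachability from inside the container before any TLS or gRPC is involved: a timeout at this level points at container-to-host networking, not at the certificate. This is a minimal sketch; the helper name `CanConnectAsync` is hypothetical, and `TcpClient.ConnectAsync` with a `CancellationToken` assumes .NET 5 or later:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: returns true if a plain TCP connection to host:port succeeds
// within the timeout. No TLS or gRPC is involved, so a false result points at
// networking (routing, firewall, port mapping) rather than at certificates.
async Task<bool> CanConnectAsync(string host, int port, TimeSpan timeout)
{
    using var client = new TcpClient();
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        await client.ConnectAsync(host, port, cts.Token);
        return true;
    }
    catch (Exception ex) when (ex is SocketException or OperationCanceledException)
    {
        // Connection refused, unreachable host, or the timeout elapsed.
        return false;
    }
}

// Self-check on loopback: a port with a listener, then the same port after it closes.
// Inside the container you would probe the host name and port taken from
// OTEL_EXPORTER_OTLP_ENDPOINT instead.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
int testPort = ((IPEndPoint)listener.LocalEndpoint).Port;
bool open = await CanConnectAsync("127.0.0.1", testPort, TimeSpan.FromSeconds(2));
listener.Stop();
bool closed = await CanConnectAsync("127.0.0.1", testPort, TimeSpan.FromSeconds(2));
Console.WriteLine($"listening port reachable: {open}; closed port reachable: {closed}");
```

If the probe succeeds but the exporter still fails, the TLS handshake (as in the x509 error above) is the more likely culprit.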

@afscrome
Contributor

@paulomorgado This works for me - see https://github.com/afscrome/AspireOtelExample/tree/main/AspireApp1.AppHost. I can't explain the no-connection issue - if you're still using Rancher, make sure you're on 1.17.0.

In terms of your config file, there are a few things that mean it's unlikely ever to work:

  • Since v0.104.0, you'll need to explicitly set the endpoint if you want to be able to access the collector outside of localhost
  • If your destination is https and you want to ignore TLS cert errors, you'll need to set insecure_skip_verify: true
  • If your destination is http, you'll need to explicitly tell the otlp exporter not to use TLS (insecure: true)
  • You'll likely need to pass the otlp api key to authenticate with the aspire dashboard

See https://github.com/afscrome/AspireOtelExample/blob/main/AspireApp1.AppHost/otel-config.yaml for a working example. That successfully forwards telemetry from projects on the host into the otel collector in a container and then back out to the dashboard. You can verify this by looking for the from-collector attribute, which the collector adds.

@paulomorgado
Contributor Author

paulomorgado commented Jan 13, 2025

> @paulomorgado you're getting a timeout which suggests it's actually a networking issue. Does container->host networking just work with Rancher on macOS?

@DamianEdwards, I'm using Podman now, and the OpenTelemetry data is coming from a container.

@davidfowl
Member

So 2 issues:

  1. HTTPS support inside of the container.
  2. podman container to host communication is borked (similar to Container to host networking not working with podman #6846)

@paulomorgado
Contributor Author

@afscrome, you don't need the API key if your Aspire host has DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS=true, right?

@paulomorgado
Contributor Author

> So 2 issues:
>
>   1. HTTPS support inside of the container.
>   2. podman container to host communication is borked (similar to dapr sidecar not visible to container when running via podman #6846)

@davidfowl, I'm using ASPIRE_ALLOW_UNSECURED_TRANSPORT=true on the Aspire host to avoid HTTPS issues.

I don't have issues having one container validating JWTs against a Keycloak container - all HTTP.

Could this be a GRPC issue, because it's HTTPS?

@afscrome
Contributor

> @afscrome, you don't need the API key if your Aspire host has DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS=true, right?

Perhaps - I've never tried that.

> Could this be a GRPC issue, because it's HTTPS?

The otlp exporter in the otel collector will use tls by default, even if the endpoint url contains http. You need to tell it to explicitly use http. That said if this were the problem, the error you get should be transport: authentication handshake failed: tls: first record does not look like a TLS handshake. The dial error you get suggests it's failing long before that point.

@atrauzzi

atrauzzi commented Jan 13, 2025

@davidfowl - based on the tests I did late last year, I don't think container-to-host networking is borked. I think Aspire's built-in proxy isn't working correctly.

See this comment: #6846 (comment)

@richshadman

richshadman commented Feb 15, 2025

I ran into this issue this week and am proposing a feature to BYO certificates for aspire (just like we can for other kestrel certs via kestrel config). See issue #7627 for details.

@davidfowl
Member

We're working on a broader solution for this; @DamianEdwards can add color.

@DamianEdwards
Member

OK, so the issue here is that when containers want to talk to services running on the host machine (rather than other containers), they do so via a special host name, host.docker.internal or host.containers.internal (depending on whether you're using Docker or Podman). The Aspire Dashboard is running on the host machine and is hosting the OTLP endpoint via HTTPS. The certificate used for the HTTPS endpoint is the ASP.NET Core HTTPS dev cert, the same one used by all ASP.NET Core apps by default during development. This self-signed certificate is generally trusted on the host machine as part of the .NET inner-loop (either by VS/C#DK prompting you to trust it, or by running dotnet dev-certs https --trust manually), but it won't be trusted by processes running in containers, or executables that don't use the hosting machine's certificate store/infra for cert operations (e.g. Node.js). Additionally, the certificate is only valid for the localhost host name (i.e. that's the certificate subject).

So to make this work we need two things:

  1. The dev cert needs to have additional SANs (subject alternative names) for the special container network host names (host.docker.internal and host.containers.internal). In the future it's possible we'll expand this to other special reserved local-only host names like *.localhost to enable scenarios like Provide a non-localhost domain for Aspire project & service endpoints #5508
  2. The dev cert needs to be trusted by any clients trying to access a service via an HTTPS endpoint using that cert
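The hostname half of point 1 can be illustrated with plain .NET APIs. This is a minimal sketch, assuming .NET 7 or later for `X509Certificate2.MatchesHostname`; it uses throwaway self-signed certificates that only stand in for the ASP.NET Core dev cert (the helper `CreateSelfSignedCert` is hypothetical and is not what `dotnet dev-certs` does internally). It checks name matching only, not trust, which is point 2:

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper: a throwaway self-signed cert whose SAN list holds the given DNS names.
// It stands in for the ASP.NET Core dev cert; it is not generated by `dotnet dev-certs`.
X509Certificate2 CreateSelfSignedCert(params string[] dnsNames)
{
    using var key = RSA.Create(2048);
    var request = new CertificateRequest(
        "CN=localhost", key, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    var sans = new SubjectAlternativeNameBuilder();
    foreach (var name in dnsNames)
    {
        sans.AddDnsName(name);
    }
    request.CertificateExtensions.Add(sans.Build());
    return request.CreateSelfSigned(DateTimeOffset.UtcNow.AddDays(-1), DateTimeOffset.UtcNow.AddDays(30));
}

string[] containerHostNames = { "host.docker.internal", "host.containers.internal" };

// Today: the cert is valid for localhost only, so the container-side host names don't match.
using var localhostOnly = CreateSelfSignedCert("localhost");
// Proposed: the same cert with extra SANs for the special container host names.
using var withContainerSans = CreateSelfSignedCert(
    "localhost", "host.docker.internal", "host.containers.internal");

foreach (var host in containerHostNames)
{
    Console.WriteLine($"{host}: localhost-only={localhostOnly.MatchesHostname(host)}, " +
                      $"with SANs={withContainerSans.MatchesHostname(host)}");
}
```

With the localhost-only certificate, both container host names should fail to match, which is the same mismatch as the `certificate is valid for localhost, not host.docker.internal` error earlier in the thread; with the extra SANs they should match. A client would still reject either certificate until it trusts it.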

For 1, we're planning on updating dotnet dev-certs in a .NET SDK update so that the dev cert includes the SANs described. We'd likely do that in a servicing release of the 8.0.xxx and 9.0.xxx SDKs. The dev cert infrastructure includes a mechanism to version the cert so that we can make changes like this and the new cert will be laid down for you when you update your SDK. You'll need to re-trust it of course but if you're using VS or VS Code with C# Dev Kit you'll be prompted to do so the first time you launch after updating your SDK, otherwise you'll need to run dotnet dev-certs https --trust yourself (like usual).

For 2, we'll productize the capability from our samples repo that facilitates injecting the dev cert into Aspire resources (containers or executables) by exporting it from the host (using dotnet dev-certs https --export-path), setting an environment variable on the resource pointing to the cert file(s), and, in the case of containers, bind-mounting it into the container. Then each hosting integration can be updated to use this capability and do whatever other configuration is required to configure the specific container/executable to use and/or trust the dev cert (e.g. Node.js requires setting the NODE_EXTRA_CA_CERTS environment variable to point to the paths of certificate files that you want to be trusted beyond the usual public root CAs). (Related dotnet/aspnetcore#60369)

We'd like to be able to get all this done in an Aspire 9.x update. Soonest would be 9.2 but that's optimistic.

@richshadman

Thank you for the detailed response on your planned solution. Regarding container injection, can you provide a bit more detail on how that would work and what level of configuration one has to modify the default behavior? I am concerned with the potential for bind mount clashes, or non-typical workloads or custom dockerfile images that may not play well with steps to inject the certificate into the container. If the bind mount is customizable, we could more easily incorporate it into our cert bootstrapping processes already present in our docker builds.

What about the ability to simply set the certificate used by aspire (request #7627 above)? Depending on how simple that is, it gives the development team the ability to take ownership over authentication, and those with a mature approach already in place could simply use our existing generated certificates and immediately solve the problem. (we include all necessary subject alternative names in our short lived self signed developer certs already in play).

@DamianEdwards
Member

RE how flexible the cert injection will be, you'll be able to customize the environment variables and bind mount path. If you want to take over completely, you can simply call an API to request the dev cert file path (Pem or Pfx format) and then take full control of how you want to inject it into the container.

RE custom certificates, that seems like a nice complementary feature. For ASP.NET Core apps, you can already specify the path to a cert on disk to use (via Kestrel configuration), so that shouldn't be a problem. Making that first-class is something I could see us doing.

@richshadman

Awesome, thank you for the updates! Looking forward to seeing this feature released.


10 participants