fix enclave panic reporting so that it always shows the error #3274

cbeck88 · 2023-03-23T03:31:36Z

Most of the time when an enclave panics, particularly during local development, the process dies with a hard abort before slog gets a chance to flush it's buffer.

This commit makes it emit the panic message on STDERR first, using eprintln, which means it will be in an OS buffer before the process ends. We continue to write it as well as to slog, which writes to STDOUT.

I see no reason not to do this, it's a massive quality of life improvement for anyone actually working on enclave-impl's.

Without this patch, 100% of the time i do not get panic messages from the enclave, and just a core dumped (aborted) message.

Most of the time when an enclave panics, particularly during local development, the process dies with a hard abort before slog gets a chance to flush it's buffer. This commit makes it emit the panic message on STDERR first, which means it will be in an OS buffer before the process ends. We continue to write it as well as to slog, which writes to STDOUT. I see no reason not to do this, it's a massive quality of life improvement for anyone actually working on enclave-impl's.

jcape

LGTM, in sgx-std we're just writing the bytes to stderr, because:

There's no expectation the logging system would get involved in a regular panic, so stderr needs to get captured somehow anyways
It doesn't seem like there's any appetite to do anything more sophisticated in production.

cbeck88 · 2023-03-23T04:00:14Z

There's no expectation the logging system would get involved in a regular panic,

I think eran has made it this way for most of our services though:

https://github.com/mobilecoinfoundation/mobilecoin/blob/master/common/src/panic_handler.rs#L18

I am not sure if we capture anything other than the slog output right now in production

cbeck88 · 2023-03-23T04:01:10Z

I think everything in production is calling this today:

mobilecoin/common/src/panic_handler.rs

Line 12 in 3259a5d

pub fn setup_panic_handler() {

jcape · 2023-03-23T04:24:58Z

There's no expectation the logging system would get involved in a regular panic,

I think eran has made it this way for most of our services though:

https://github.com/mobilecoinfoundation/mobilecoin/blob/master/common/src/panic_handler.rs#L18

This is starting to feel like we should be trying to panic in the logging OCALL (i.e. in untrusted). It's not necessary for this PR, but we can bake this in the SGX2/DCAP stuff.

I am not sure if we capture anything other than the slog output right now in production

AFAIK, it's the other way around: we're only capturing stdout/stderr in production. slog is only even involved because the logs it writes go to stdout/stderr. c.f. #2273

cbeck88 · 2023-03-23T04:41:28Z

This is starting to feel like we should be trying to panic in the logging OCALL (i.e. in untrusted). It's not necessary for this PR, but we can bake this in the SGX2/DCAP stuff.

Yeah that's a good call -- propagate the panic from the enclave to the outside.

In principle, we could try to keep the server up and bring back a new copy of the enclave? but I think in reality there's no good reason to attempt that.

cbeck88 · 2023-03-23T05:03:54Z

In unwind mode it might be fucked up i guess, if rust is trying to get a backtrace through the ocall?

jcape · 2023-03-23T05:10:06Z

In unwind mode it might be fucked up i guess, if rust is trying to get a backtrace through the ocall?

The nomicon seems to indicate that it should work OK:

... If an unwinding operation does encounter an ABI boundary that is not permitted to unwind, the behavior depends on the source of the unwinding (Rust panic or a foreign exception):

panic will cause the process to safely abort.

A foreign exception entering Rust will cause undefined behavior.

AFAIK the first rule should apply since we're the one calling panic, and the function is marked extern "C". ~~Alternatively, we could mark the OCALL "C-unwind", and it would unwind up through the generated code.~~ [I don't think that actually will do what we want]

sgx/urts/src/panic.rs

cbeck88 requested review from jcape, eranrund, a team and varsha888 and removed request for a team March 23, 2023 03:31

fmt

0e6b728

jcape assigned cbeck88 Mar 23, 2023

jcape approved these changes Mar 23, 2023

View reviewed changes

cbeck88 requested a review from nick-mobilecoin March 23, 2023 03:58

nick-mobilecoin approved these changes Mar 23, 2023

View reviewed changes

sgx/urts/src/panic.rs Outdated Show resolved Hide resolved

add a code comment

c8ccf98

cbeck88 merged commit 061d4e7 into master Mar 23, 2023

cbeck88 deleted the fix-enclave-panic-reporting branch March 23, 2023 15:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix enclave panic reporting so that it always shows the error #3274

fix enclave panic reporting so that it always shows the error #3274

cbeck88 commented Mar 23, 2023 •

edited

Loading

jcape left a comment

cbeck88 commented Mar 23, 2023

cbeck88 commented Mar 23, 2023

jcape commented Mar 23, 2023 •

edited

Loading

cbeck88 commented Mar 23, 2023

cbeck88 commented Mar 23, 2023

jcape commented Mar 23, 2023 •

edited

Loading

fix enclave panic reporting so that it always shows the error #3274

fix enclave panic reporting so that it always shows the error #3274

Conversation

cbeck88 commented Mar 23, 2023 • edited Loading

jcape left a comment

Choose a reason for hiding this comment

cbeck88 commented Mar 23, 2023

cbeck88 commented Mar 23, 2023

jcape commented Mar 23, 2023 • edited Loading

cbeck88 commented Mar 23, 2023

cbeck88 commented Mar 23, 2023

jcape commented Mar 23, 2023 • edited Loading

cbeck88 commented Mar 23, 2023 •

edited

Loading

jcape commented Mar 23, 2023 •

edited

Loading

jcape commented Mar 23, 2023 •

edited

Loading