Neither way can build zig w/o SegFault #21941

Open
lefsha opened this issue Nov 9, 2024 · 19 comments
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@lefsha

lefsha commented Nov 9, 2024

Zig Version

latest

Steps to Reproduce and Observed Behavior

Running the standard emerge zig on Gentoo produces:

/var/tmp/portage/dev-lang/zig-9999/temp/environment: line 3320: 12302 Segmentation fault "${BUILD_DIR}/stage3/bin/zig" env

 * ERROR: dev-lang/zig-9999::gentoo failed (compile phase):
 *   Zig compilation failed
 * Call stack:
 *   ebuild.sh, line 136:  Called src_compile
 *   environment, line 3329: Called die
 * The specific snippet of code:
 *   "${BUILD_DIR}/stage3/bin/zig" env || die "Zig compilation failed";

with any zig version!

The problem arises at zig stage3, which ends up generating a segmentation fault.

valgrind ./zig

generates:

==30791== Memcheck, a memory error detector
==30791== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==30791== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==30791== Command: ./zig
==30791==
==30791== Jump to the invalid address stated on the next line
==30791== at 0x0: ???
==30791== by 0x12A7B44: ??? (in /var/tmp/portage/dev-lang/zig-9999/work/zig-9999/build/stage3/bin/zig)
==30791== by 0x1FFEFFF027: ???
==30791== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==30791==
==30791==
==30791== Process terminating with default action of signal 11 (SIGSEGV)
==30791== Bad permissions for mapped region at address 0x0
==30791== at 0x0: ???
==30791== by 0x12A7B44: ??? (in /var/tmp/portage/dev-lang/zig-9999/work/zig-9999/build/stage3/bin/zig)
==30791== by 0x1FFEFFF027: ???
==30791==
==30791== HEAP SUMMARY:
==30791== in use at exit: 0 bytes in 0 blocks
==30791== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==30791==
==30791== All heap blocks were freed -- no leaks are possible
==30791==
==30791== For lists of detected and suppressed errors, rerun with: -s
==30791== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault

The same happens when building zig manually from the git repos:
https://github.com/ziglang/zig
or
https://github.com/ziglang/zig-bootstrap

No similar issues occur anywhere else on my Gentoo system, despite several thousand packages installed.

Reproducible on two laptops with Tiger Lake and Alder Lake CPUs, 64 GB RAM each.

Expected Behavior

No SegFault

@lefsha lefsha added the bug Observed behavior contradicts documented or intended behavior label Nov 9, 2024
@BratishkaErik
Contributor

  1. Report this to https://bugs.gentoo.org/ first in the future please, there might be same bugs with more details or even solutions. You didn't upload build.log for example, so how would I, Gentoo side or Zig core team know your CFLAGS etc. :)
  2. Does the zig binary from dev-lang/zig-bin (or the official site) work without segfaulting for you?
  3. If it does, can you show output of zig env, zig libc and zig build-exe --show-builtin here?
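
The three diagnostic commands requested above can be collected in one go. The following is a minimal sketch, assuming a POSIX shell; `collect_zig_diagnostics` is a hypothetical helper name and not part of Zig or Portage.

```shell
# Hypothetical helper: run the three diagnostic commands requested above and
# print each one under a header. $1 is the zig executable (default: zig).
collect_zig_diagnostics() {
    zig_bin=${1:-zig}
    for args in "env" "libc" "build-exe --show-builtin"; do
        printf '### %s %s\n' "$zig_bin" "$args"
        # $args is intentionally unquoted so that "build-exe --show-builtin"
        # splits into two arguments. A failing command is reported, but the
        # loop continues with the next one.
        "$zig_bin" $args 2>&1 || printf '(exit status %s)\n' "$?"
    done
}
```

Running `collect_zig_diagnostics /opt/zig-bin-0.13.0/zig > zig-report.txt` (the path is only an example) would give a single file that can be attached to a report.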

@TCROC
Contributor

TCROC commented Nov 9, 2024

After following https://github.com/ziglang/zig-bootstrap and then trying https://github.com/ziglang/zig/wiki/Building-Zig-From-Source#option-b-use-a-pre-built-zig-binary, I can confirm it is segfaulting on my system as well.

Linux Pop OS 22.04

@TCROC
Contributor

TCROC commented Nov 9, 2024

zig env

{
 "zig_exe": "/home/tcroc/dev/zig/zig-bootstrap/out/zig-native-linux-gnu-native/zig",
 "lib_dir": "/home/tcroc/dev/zig/zig-bootstrap/out/zig-native-linux-gnu-native/lib",
 "std_dir": "/home/tcroc/dev/zig/zig-bootstrap/out/zig-native-linux-gnu-native/lib/std",
 "global_cache_dir": "/home/tcroc/.cache/zig",
 "version": "0.14.0-dev.1876+41dbd0d0d",
 "target": "x86_64-linux.6.9.3...6.9.3-gnu.2.35",
 "env": {
  "ZIG_GLOBAL_CACHE_DIR": null,
  "ZIG_LOCAL_CACHE_DIR": null,
  "ZIG_LIB_DIR": null,
  "ZIG_LIBC": null,
  "ZIG_BUILD_RUNNER": null,
  "ZIG_VERBOSE_LINK": null,
  "ZIG_VERBOSE_CC": null,
  "ZIG_BTRFS_WORKAROUND": null,
  "ZIG_DEBUG_CMD": null,
  "CC": null,
  "NO_COLOR": null,
  "CLICOLOR_FORCE": null,
  "XDG_CACHE_HOME": null,
  "HOME": "/home/tcroc"
 }
}
zig libc

"$ZIG_PREFIX/zig" libc
/home/tcroc/dev/zig/zig-bootstrap/out/zig-native-linux-gnu-native/lib/std/Target/Query.zig:377:38: error: no field or member function named 'isAndroid' in '?Target.Abi'
        if (os == .linux and self.abi.isAndroid()) return true;
                             ~~~~~~~~^~~~~~~~~~
referenced by:
    main: /home/tcroc/dev/zig/zig-bootstrap/out/zig-native-linux-gnu-native/lib/compiler/libc.zig:116:40
    posixCallMainAndExit: /home/tcroc/dev/zig/zig-bootstrap/out/zig-native-linux-gnu-native/lib/std/start.zig:621:37
    4 reference(s) hidden; use '-freference-trace=6' to see all references
zig build-exe --show-builtin
const std = @import("std");
/// Zig version. When writing code that supports multiple versions of Zig, prefer
/// feature detection (i.e. with `@hasDecl` or `@hasField`) over version checks.
pub const zig_version = std.SemanticVersion.parse(zig_version_string) catch unreachable;
pub const zig_version_string = "0.14.0-dev.1876+41dbd0d0d";
pub const zig_backend = std.builtin.CompilerBackend.stage2_llvm;

pub const output_mode = std.builtin.OutputMode.Exe;
pub const link_mode = std.builtin.LinkMode.static;
pub const is_test = false;
pub const single_threaded = false;
pub const abi = std.Target.Abi.gnu;
pub const cpu: std.Target.Cpu = .{
    .arch = .x86_64,
    .model = &std.Target.x86.cpu.znver3,
    .features = std.Target.x86.featureSet(&[_]std.Target.x86.Feature{
        .@"64bit",
        .adx,
        .aes,
        .allow_light_256_bit,
        .avx,
        .avx2,
        .bmi,
        .bmi2,
        .branchfusion,
        .clflushopt,
        .clwb,
        .clzero,
        .cmov,
        .crc32,
        .cx16,
        .cx8,
        .f16c,
        .fast_15bytenop,
        .fast_bextr,
        .fast_imm16,
        .fast_lzcnt,
        .fast_movbe,
        .fast_scalar_fsqrt,
        .fast_scalar_shift_masks,
        .fast_variable_perlane_shuffle,
        .fast_vector_fsqrt,
        .fma,
        .fsgsbase,
        .fsrm,
        .fxsr,
        .idivq_to_divl,
        .invpcid,
        .lzcnt,
        .macrofusion,
        .mmx,
        .movbe,
        .mwaitx,
        .nopl,
        .pclmul,
        .pku,
        .popcnt,
        .prfchw,
        .rdpid,
        .rdpru,
        .rdrnd,
        .rdseed,
        .sahf,
        .sbb_dep_breaking,
        .sha,
        .shstk,
        .slow_shld,
        .sse,
        .sse2,
        .sse3,
        .sse4_1,
        .sse4_2,
        .sse4a,
        .ssse3,
        .vaes,
        .vpclmulqdq,
        .vzeroupper,
        .wbnoinvd,
        .x87,
        .xsave,
        .xsavec,
        .xsaveopt,
        .xsaves,
    }),
};
pub const os = std.Target.Os{
    .tag = .linux,
    .version_range = .{ .linux = .{
        .range = .{
            .min = .{
                .major = 6,
                .minor = 9,
                .patch = 3,
            },
            .max = .{
                .major = 6,
                .minor = 9,
                .patch = 3,
            },
        },
        .glibc = .{
            .major = 2,
            .minor = 35,
            .patch = 0,
        },
    }},
};
pub const target: std.Target = .{
    .cpu = cpu,
    .os = os,
    .abi = abi,
    .ofmt = object_format,
    .dynamic_linker = std.Target.DynamicLinker.init("/lib64/ld-linux-x86-64.so.2"),
};
pub const object_format = std.Target.ObjectFormat.elf;
pub const mode = std.builtin.OptimizeMode.Debug;
pub const link_libc = false;
pub const link_libcpp = false;
pub const have_error_return_tracing = true;
pub const valgrind_support = true;
pub const sanitize_thread = false;
pub const fuzz = false;
pub const position_independent_code = false;
pub const position_independent_executable = false;
pub const strip_debug_info = false;
pub const code_model = std.builtin.CodeModel.default;
pub const omit_frame_pointer = false;

@lefsha
Author

lefsha commented Nov 10, 2024

  1. Report this to https://bugs.gentoo.org/ first in the future please, there might be same bugs with more details or even solutions. You didn't upload build.log for example, so how would I, Gentoo side or Zig core team know your CFLAGS etc. :)

How is it related to Gentoo? I see zero relation whatsoever! I have used all possible ways to build zig, all of them with the same results. And the results are always consistent. "emerge zig" is just a straightforward way to replicate the issue. It could be that it works well with a different CPU. This I don't know. Therefore I have mentioned the two I was using.

Please, read the report again. I will not report this bug to Gentoo, because it is not their issue.

How is that related to ANY CFLAGS in the system if Zig stage3 is built by... ZIG stage2 with whatever ZIG flags the developers use? ZIG is a statically linked binary. No one can blame wrong system modules.

The same issue persists with CFLAGS="" (an empty string). It is not related at all. I tried various CFLAGS as well as none at all. I used tigerlake, alderlake and generic CPU settings, with no difference.
It is always trying to access NULL pointer at start. The ZIG code is faulty.

I was trying to search for bugs and added sanitize and debug built, but I have failed already at zig-wasm, which I have reported previously. Andrew mentioned it is by design. I am sorry I cannot use buggy language in production which has such primitive design issues and no one feels like being responsible, not even the creator.

I hate C++, but I have literally no other choice.

2. Does the zig binary from `dev-lang/zig-bin` (or the official site) work without segfaulting for you?

Binary version downloaded from ziglang, which is the same available from dev-lang/zig-bin works fine.
I cannot compile zig even with hand built LLVM/Clang. Both from zig-bootstrap or my local build.

I have used 5x versions of LLVM:

  1. llvm-18 from Gentoo - zig 0.13
  2. llvm-19 from Gentoo - zig 0.14
  3. llvm-18 local built within zig-bootstrap for zig 0.13
  4. llvm-18 my own custom version for zig 0.13
  5. llvm-19 my own custom version for zig 0.14

I did everything and tried everything I could to build zig, without success. The problem is always the same:
zig stage3 is built, but it crashes trying to access address zero in memory (nullptr).

This report came after 2-3 months playing with zig. I am done. I cannot do that anymore.

I remember I was able to build old 0.7 or 0.9 version, but not anymore.

3. If it does, can you show output of `zig env`, `zig libc` and `zig build-exe --show-builtin` here?

zig env

{
"zig_exe": "/opt/zig-bin-0.13.0/zig",
"lib_dir": "/opt/zig-bin-0.13.0/lib",
"std_dir": "/opt/zig-bin-0.13.0/lib/std",
"global_cache_dir": "/root/.cache/zig",
"version": "0.13.0",
"target": "x86_64-linux.6.6.58...6.6.58-musl",
"env": {
"ZIG_GLOBAL_CACHE_DIR": null,
"ZIG_LOCAL_CACHE_DIR": null,
"ZIG_LIB_DIR": null,
"ZIG_LIBC": null,
"ZIG_BUILD_RUNNER": null,
"ZIG_VERBOSE_LINK": null,
"ZIG_VERBOSE_CC": null,
"ZIG_BTRFS_WORKAROUND": null,
"ZIG_DEBUG_CMD": null,
"CC": null,
"NO_COLOR": null,
"CLICOLOR_FORCE": null,
"XDG_CACHE_HOME": null,
"HOME": "/root"
}
}

zig build-exe --show-builtin

const std = @import("std");
/// Zig version. When writing code that supports multiple versions of Zig, prefer
/// feature detection (i.e. with @hasDecl or @hasField) over version checks.
pub const zig_version = std.SemanticVersion.parse(zig_version_string) catch unreachable;
pub const zig_version_string = "0.13.0";
pub const zig_backend = std.builtin.CompilerBackend.stage2_llvm;

pub const output_mode = std.builtin.OutputMode.Exe;
pub const link_mode = std.builtin.LinkMode.static;
pub const is_test = false;
pub const single_threaded = false;
pub const abi = std.Target.Abi.musl;
pub const cpu: std.Target.Cpu = .{
.arch = .x86_64,
.model = &std.Target.x86.cpu.x86_64,
.features = std.Target.x86.featureSet(&[_]std.Target.x86.Feature{
.@"64bit",
.adx,
.aes,
.avx,
.avx2,
.avx512bitalg,
.avx512bw,
.avx512cd,
.avx512dq,
.avx512f,
.avx512ifma,
.avx512vbmi,
.avx512vbmi2,
.avx512vl,
.avx512vnni,
.avx512vp2intersect,
.avx512vpopcntdq,
.bmi,
.bmi2,
.clflushopt,
.clwb,
.cmov,
.cx16,
.cx8,
.f16c,
.fma,
.fsgsbase,
.fxsr,
.gfni,
.idivq_to_divl,
.invpcid,
.lzcnt,
.macrofusion,
.mmx,
.movbe,
.movdir64b,
.movdiri,
.nopl,
.pclmul,
.popcnt,
.prfchw,
.rdpid,
.rdrnd,
.rdseed,
.sahf,
.sha,
.shstk,
.slow_3ops_lea,
.slow_incdec,
.sse,
.sse2,
.sse3,
.sse4_1,
.sse4_2,
.ssse3,
.vaes,
.vpclmulqdq,
.vzeroupper,
.x87,
.xsave,
.xsavec,
.xsaveopt,
.xsaves,
}),
};
pub const os = std.Target.Os{
.tag = .linux,
.version_range = .{ .linux = .{
.range = .{
.min = .{
.major = 6,
.minor = 6,
.patch = 58,
},
.max = .{
.major = 6,
.minor = 6,
.patch = 58,
},
},
.glibc = .{
.major = 2,
.minor = 28,
.patch = 0,
},
}},
};
pub const target: std.Target = .{
.cpu = cpu,
.os = os,
.abi = abi,
.ofmt = object_format,
.dynamic_linker = std.Target.DynamicLinker.none,
};
pub const object_format = std.Target.ObjectFormat.elf;
pub const mode = std.builtin.OptimizeMode.Debug;
pub const link_libc = false;
pub const link_libcpp = false;
pub const have_error_return_tracing = true;
pub const valgrind_support = true;
pub const sanitize_thread = false;
pub const position_independent_code = false;
pub const position_independent_executable = false;
pub const strip_debug_info = false;
pub const code_model = std.builtin.CodeModel.default;
pub const omit_frame_pointer = false;

@mocompute

@lefsha I don't know if this would help you, but after having some trouble building zig from source I eventually succeeded by using nix to pull in the build dependencies: https://github.com/mocompute/zig/tree/dev/nix

@BratishkaErik
Contributor

BratishkaErik commented Nov 15, 2024

"target": "x86_64-linux.6.6.58...6.6.58-musl",
.glibc = .{
    .major = 2,
    .minor = 28,
    .patch = 0,
},

Can you please also:

  1. show output of ldd /usr/bin/env,
  2. try to compile https://github.com/BratishkaErik/zig-libc-test using official binary and:
    2.1 zig build
    2.2 zig build -Ddynamic-linker=/lib64/ld-linux-x86-64.so.2
    2.3 zig build -Ddynamic-linker=/lib64/ld-linux-x86-64.so.2 -Dtarget=native-native-gnu

@lefsha
Author

lefsha commented Nov 16, 2024

.glibc = .{
    .major = 2,
    .minor = 28,
    .patch = 0,
},

That is ZIG's own issue. IDK why it shows that information.
Here is what I have:

  • sys-libs/glibc
    Latest version available: 2.40-r5
    Latest version installed: 2.40-r5
    Size of files: 18,399 KiB
    Homepage: https://www.gnu.org/software/libc/
    Description: GNU libc C library
    License: LGPL-2.1+ BSD HPND ISC inner-net rc PCRE
1. show output of `ldd /usr/bin/env`,

ldd /usr/bin/env
not a dynamic executable

qfile /usr/bin/env
sys-apps/coreutils: /usr/bin/env

2. try to compile https://github.com/BratishkaErik/zig-libc-test using official binary and:
   2.1 `zig build`

./zig-libc-test
Segmentation fault

   2.2 `zig build -Ddynamic-linker=/lib64/ld-linux-x86-64.so.2`

./zig-libc-test
printf from C standard library works fine!

   2.3 `zig build -Ddynamic-linker=/lib64/ld-linux-x86-64.so.2 -Dtarget=native-native-gnu`

./zig-libc-test
printf from C standard library works fine!

@BratishkaErik
Contributor

ldd /usr/bin/env
not a dynamic executable

qfile /usr/bin/env
sys-apps/coreutils: /usr/bin/env

Is multicall USE-flag enabled in your coreutils? If so, can you show all USE-flags for coreutils, glibc, content of /usr/bin/env and ldd /usr/bin/coreutils?

That is ZIG's own issue. IDK why it shows that information.
Here is what I have:

Yes, but this is supposed to be fixed in 0.13 and later versions: #19749 . And it works for me, no matter with or without multicall, so I sadly can't reproduce:

$ zig targets | jq '.native.triple'
"x86_64-linux.6.11.5...6.11.5-gnu.2.39"

$ zig env | jq '.target'                                         
"x86_64-linux.6.11.5...6.11.5-gnu.2.39"

$ zig build-exe --show-builtin
// ...
pub const abi = std.Target.Abi.gnu;
// ...
pub const os = std.Target.Os{
    .tag = .linux,
    .version_range = .{ .linux = .{
        .range = .{
            .min = .{
                .major = 6,
                .minor = 11,
                .patch = 5,
            },
            .max = .{
                .major = 6,
                .minor = 11,
                .patch = 5,
            },
        },
        .glibc = .{
            .major = 2,
            .minor = 39,
            .patch = 0,
        },
        .android = 14,
    }},
};
pub const target: std.Target = .{
    .cpu = cpu,
    .os = os,
    .abi = abi,
    .ofmt = object_format,
    .dynamic_linker = std.Target.DynamicLinker.init("/lib64/ld-linux-x86-64.so.2"),
};

And we have same sys-libs/glibc versions...

@lefsha
Author

lefsha commented Nov 17, 2024

Is multicall USE-flag enabled in your coreutils? If so, can you show all USE-flags for coreutils, glibc, content of /usr/bin/env and ldd /usr/bin/coreutils?

[binary R ] sys-apps/coreutils-9.5::gentoo USE="acl (-caps) -gmp -hostname -kill -multicall -nls openssl (-selinux) (split-usr) static (-test) -vanilla -verify-sig (-xattr)" 0 KiB

It is not enabled.

Likely zig-bin has been built against glibc 2.28, that is why it does report this.
If building from sources it should report the actual one. But it fails.

Still that must be a bug. If only one program generates a SegFault, then it is guilty. Andrew seems not to care about
memory use, as demonstrated previously. So I cannot use ZIG in production - too risky.

Developers don't care to develop - users don't care to use. That is trivial and simple.

On the same machine I can build RUST w/o any problem and it works using the same LLVM.
Having >1500 gentoo packages I don't have a single issue like that one with Segfaults.

Only once, when I transferred binaries from Tigerlake to Alderlake I had it, but that is only because Intel switched off avx512 at Alderlake despite being released later.

$ zig targets | jq '.native.triple'
"x86_64-linux.6.11.5...6.11.5-gnu.2.39"

"x86_64-linux.6.6.58...6.6.58-gnu.2.39"
why is that actually? What is wrong with zig?
I do have 2.40, but it reports 2.39 or 2.28 like above..
zig is not consistent and not reliable!

$ zig env | jq '.target'
"x86_64-linux.6.11.5...6.11.5-gnu.2.39"
"x86_64-linux.6.6.58...6.6.58-gnu.2.39"

$ zig build-exe --show-builtin
// ...
pub const abi = std.Target.Abi.gnu;
// ...
pub const os = std.Target.Os{
.tag = .linux,
.version_range = .{ .linux = .{
.range = .{
.min = .{
.major = 6,
.minor = 11,
.patch = 5,
},
.max = .{
.major = 6,
.minor = 11,
.patch = 5,
},
},
.glibc = .{
.major = 2,
.minor = 39,
.patch = 0,
},
.android = 14,
}},
};

Now I have the same - 2.39 like above

@lefsha
Author

lefsha commented Nov 17, 2024

WOW! I have rebuilt coreutils w/ USE=-static and suddenly it starts working.

No other package on the system had any sensitivity to that. It seems like zig is using coreutils in a special way
and if built statically it does wrong things.

I could not even imagine that this USE flag from another package would have such an influence.

Thanks to BratishkaErik, who got me thinking in that direction.

Still, I would consider that a ZIG bug.

@BratishkaErik
Contributor

Still that must be a bug. If only one program generates a SegFault, then it is guilty.

I think it's (missing) dynamic linker that causes segfault here and not program code itself. You could do same with patchelf on a working binary and it would still cause segfault, this is not fault of codegen.

Andrew seems not to care about memory use, as demonstrated previously.

If you meant the memory leaks in wasm2c, they do not cause your issue here. That memory is freed by the system anyway and is insignificant compared to how much RAM building stage3 consumes, which is ~4G on my system and up to ~8G on CI.

why is that actually? What is wrong with zig?
I do have 2.40, but it reports 2.39 or 2.28 like above..

Yes, 2.28 was reported because it's the default version when cross-compiling glibc (which is enabled because it couldn't detect yours), and 2.39 is reported because of the specifics of detecting the glibc version:

$ readelf --string-dump='.dynstr' /lib64/libc.so.6 | grep --only-matching 'GLIBC_.*'

GLIBC_2.2.5
GLIBC_2.2.6
GLIBC_2.3
GLIBC_2.3.2
GLIBC_2.3.3
GLIBC_2.3.4
GLIBC_2.4
GLIBC_2.5
GLIBC_2.6
GLIBC_2.7
GLIBC_2.8
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12
GLIBC_2.13
GLIBC_2.14
GLIBC_2.15
GLIBC_2.16
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_2.24
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_2.28
GLIBC_2.29
GLIBC_2.30
GLIBC_2.31
GLIBC_2.32
GLIBC_2.33
GLIBC_2.34
GLIBC_2.35
GLIBC_2.36
GLIBC_2.38
GLIBC_2.39
GLIBC_ABI_DT_RELR
GLIBC_PRIVATE

As you can see, on Glibc 2.40 we have symbols up to 2.39, on 2.39 up to 2.38 etc., that's why detection lags a bit behind. But it's not critical.
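
The lag described above can be checked with a small helper. This is a minimal sketch, not Zig's actual detection code: `highest_glibc_version` is a hypothetical function name, and it assumes GNU `grep`, `sed` and `sort -V` are available. It reads version-symbol strings on stdin and prints the highest `GLIBC_x.y` it sees.

```shell
# Hypothetical helper (not part of Zig): print the highest GLIBC_x.y[.z]
# symbol version found on stdin. Assumes GNU grep, sed and `sort -V`.
# Unversioned names such as GLIBC_PRIVATE or GLIBC_ABI_DT_RELR are ignored
# because the pattern requires digits after the prefix.
highest_glibc_version() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' |
        sed 's/^GLIBC_//' |
        sort -V |
        tail -n 1
}
```

It can be fed from the same command shown above, for example `readelf --string-dump='.dynstr' /lib64/libc.so.6 | highest_glibc_version`. For the symbol list in this comment, the highest entry is 2.39 even though the installed glibc is 2.40.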

WOW! I have rebuilt coreutils w/ USE=-static and suddenly it starts working.

Good it works for you now :)

No other package on the system had any sensitivity to that. It seems like zig is using coreutils in a special way and if built statically it does wrong things.

Yes, because it checks /usr/bin/env or its interpreter (/usr/bin/coreutils, /bin/sh) to find out libc, and if you compile it statically, it would assume that whole system is static -> musl needed.
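
As a rough illustration of the check described above (not Zig's actual implementation), the program interpreter of a binary can be read from its ELF program headers. The sketch below assumes binutils `readelf` is installed; `elf_interpreter` and `guess_linkage` are hypothetical helper names.

```shell
# Hypothetical helpers; a sketch of the idea, not Zig's detection code.

# Read `readelf -l` output on stdin and print the PT_INTERP path, if any.
elf_interpreter() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Print "dynamic <interpreter>" or "static" for the ELF file given as $1.
# Returns 1 if readelf is missing or cannot read the file.
guess_linkage() {
    headers=$(readelf -l "$1" 2>/dev/null) || return 1
    interp=$(printf '%s\n' "$headers" | elf_interpreter)
    if [ -n "$interp" ]; then
        echo "dynamic $interp"
    else
        echo "static"
    fi
}
```

On a system like the one in this report, `guess_linkage /usr/bin/env` would be expected to print `static` while coreutils is built with USE=static, which matches the `ldd` output "not a dynamic executable" shown earlier.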

I could not even imagine that this USE flag from another package would have such an influence.

I can:

  • Hardcode coreutils[-static] in dev-lang/zig and dev-lang/zig-bin packages, but I want to ask more knowledgeable people from Gentoo before that,
  • or do like other distros (I think Guix, Nix and Termux do that) and patch sources to check /usr/bin/python or /bin/bash or smth like this instead, since they should be guaranteed to be installed in dynamic form in Gentoo.

@lefsha
Author

lefsha commented Nov 17, 2024

Still that must be a bug. If only one program generates a SegFault, then it is guilty.

I think it's (missing) dynamic linker that causes segfault here and not program code itself. You could do same with patchelf on a working binary and it would still cause segfault, this is not fault of codegen.

we are not talking about ANY binary made by ZIG. We are talking about ZIG stage3 itself!
ZIG stage3 itself is statically linked binary and why it segfaulted while having statically linked coreutils is a miracle for me.

Nothing prevents me from building zig against musl or glibc on the same system. Detecting whether glibc or musl is the core C library is unrelated to coreutils being built statically. At least no other program or language, like RUST, has such an issue.

If you meant the memory leaks in wasm2c, they do not cause your issue here. That memory is freed by the system anyway and is insignificant compared to how much RAM building stage3 consumes, which is ~4G on my system and up to ~8G on CI.

Well, I just turned on all possible debug flags to track down the issue with stage3, but I was not able to get there, because it faulted earlier. So I have reported that.

Yes, because it checks /usr/bin/env or its interpreter (/usr/bin/coreutils, /bin/sh) to find out libc, and if you compile it statically, it would assume that whole system is static -> musl needed.

Yes. And that is a WRONG assumption. One cannot assume system is running on musl because of statically linked binary, which is fully unrelated to glibc. I can build bash also statically linked, so what?

I do personally prefer always statically linked binaries if I can make them. It doesn't mean I am not using glibc unfortunately.
I would get rid of glibc completely, but I cannot, because too many components or binaries depend on it.
Musl is not fully compatible with glibc, unfortunately.

* Hardcode `coreutils[-static]` in `dev-lang/zig` and `dev-lang/zig-bin` packages, but I want to ask more knowledgeable people from Gentoo before that,

Yes, you can and should add this dependency for dev-lang/zig package, but not for dev-lang/zig-bin - it is unaffected.
The binary which works already will continue to work.

* or do like other distros (I think Guix, Nix and Termux do that) and patch sources to check `/usr/bin/python` or `/bin/bash` or smth like this instead, since they should be guaranteed to be installed in dynamic form in Gentoo.

Not really. Bash can be statically linked - no problem. Why should anyone check if a program is statically linked or not?
Why not search for either musl or glibc directly?

I do have both at my system... so what?

@BratishkaErik
Contributor

BratishkaErik commented Nov 17, 2024

we are not talking about ANY binary made by ZIG. We are talking about ZIG stage3 itself!
Yes, you can and should add this dependency for dev-lang/zig package, but not for dev-lang/zig-bin - it is unaffected.
The binary which works already will continue to work.

zig-bin is affected at runtime, as you checked yourself: when compiling https://github.com/BratishkaErik/zig-libc-test, there is a segmentation fault because the dynamic interpreter and libc are added wrongly. So yes, it applies to every binary that is compiled with Zig and linked to libc, including the LLVM-enabled Zig compiler itself.

ZIG stage3 itself is statically linked binary and why it segfaulted while having statically linked coreutils is a miracle for me.

It is not a statically linked binary in Gentoo ebuild, it's dynamically linked binary that uses at least libllvm and libclang, and C++ and C standard library. There is no static LLVM in gentoo repository.

Yes. And that is a WRONG assumption. One cannot assume system is running on musl because of statically linked binary, which is fully unrelated to glibc. I can build bash also statically linked, so what?
I do have both at my system... so what?

In this case, I can suggest you to try to set target exactly, not letting Zig detect it wrongly, by using dev-lang/zig from my repository https://github.com/gentoo-mirror/bratishkaerik-overlay. It is a temporary copy of https://www.github.com/gentoo/gentoo/pull/37283 for testing until that PR is merged. You just need to set ZIG_TARGET same way you set CFLAGS or CHOST, in make.conf

ZIG_TARGET="native-native-gnu.2.39"
# or musl if you want it and LLVM is using musl

Then emerge -a1 -j1 dev-lang/zig::bratishkaerik-overlay and it should work even with static coreutils (for regular usage you would need to pass it like this: zig build -Dtarget=native-native-gnu.2.39)
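
To avoid typing the target string by hand, it can be derived from a glibc version. This is a minimal sketch; `zig_gnu_target` is a hypothetical helper, and the `native-native-gnu.<version>` form is simply the one used in the ZIG_TARGET example above.

```shell
# Hypothetical helper: build an explicit Zig target string for a given glibc
# version, in the same form as the ZIG_TARGET value shown above.
zig_gnu_target() {
    case "$1" in
        [0-9]*.[0-9]*)
            printf 'native-native-gnu.%s\n' "$1"
            ;;
        *)
            echo "usage: zig_gnu_target MAJOR.MINOR" >&2
            return 1
            ;;
    esac
}
```

For regular usage this could be passed as `zig build -Dtarget="$(zig_gnu_target 2.39)"`.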

BTW, are you the "Alexey Ivanov" user who was writing in https://t.me/ziglang_en in August with a similar problem?

@TCROC
Contributor

TCROC commented Nov 28, 2024

Are there any updates on this issue? I'd love to test my changes to add thin lto but I can't build due to segfaults :/

@BratishkaErik
Contributor

Are there any updates on this issue? I'd love to test my changes to add thin lto but I can't build due to segfaults :/

I think your problem is different, because it detects libc and other information correctly. What kind of segfault do you have? If its smth like "unsupported instruction" it can be a wrong detected CPU (it shows znver3 for you currently), in this case you can try to compile using older znver2 or maybe some generic variant x86_64_v3.
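
Before picking a generic CPU level, it may help to confirm that the machine actually supports it. The sketch below is an illustration only: `missing_x86_64_v3_flags` is a hypothetical helper, and the flag names are an assumption based on how Linux spells them in /proc/cpuinfo (LZCNT appears as `abm`, and `xsave` is used as a stand-in for OSXSAVE, which is not listed there).

```shell
# Hypothetical helper: read a whitespace-separated CPU flag list on stdin
# (for example the "flags" line of /proc/cpuinfo) and print every x86-64-v3
# feature that is missing. No output means all listed features were found.
missing_x86_64_v3_flags() {
    # Normalise all whitespace to single spaces and pad both ends so that
    # each flag can be matched as a whole word.
    have=" $(tr -s '[:space:]' ' ') "
    for flag in avx avx2 bmi1 bmi2 f16c fma abm movbe xsave; do
        case "$have" in
            *" $flag "*) ;;
            *) printf '%s\n' "$flag" ;;
        esac
    done
}
```

A possible invocation is `grep -m1 '^flags' /proc/cpuinfo | missing_x86_64_v3_flags`. If it prints nothing, building with x86_64_v3 should at least not fail for lack of these instructions.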

@TCROC
Contributor

TCROC commented Nov 29, 2024

Are there any updates on this issue? I'd love to test my changes to add thin lto but I can't build due to segfaults :/

I think your problem is different, because it detects libc and other information correctly. What kind of segfault do you have? If its smth like "unsupported instruction" it can be a wrong detected CPU (it shows znver3 for you currently), in this case you can try to compile using older znver2 or maybe some generic variant x86_64_v3.

How do I tell it to do this? Looking at the build instructions here: https://github.com/ziglang/zig-bootstrap?tab=readme-ov-file#build-instructions

Am I supposed to pass x86_64_v3 instead of baseline?

@TCROC
Contributor

TCROC commented Nov 29, 2024

I tried building with this command: CMAKE_GENERATOR=Ninja ./build x86_64-linux-gnu x86_64_v3. It is progressing so I guess we will see what happens :)

@TCROC
Contributor

TCROC commented Nov 29, 2024

@BratishkaErik That got me past the segfaults! :) Thank you! :)

@fogti
Contributor

fogti commented Dec 4, 2024

If one really wanted to check some binary for being statically linked, shouldn't one instead use a binary that is actually packaged with glibc? E.g.:

% qlist glibc | grep /usr/bin
/usr/bin/getconf
/usr/bin/ld.so (*)
/usr/bin/pldd
/usr/bin/sprof
/usr/bin/sotruss (*)
/usr/bin/ldd (*)
/usr/bin/makedb
/usr/bin/getent
/usr/bin/xtrace (*)
/usr/bin/pcprofiledump
/usr/bin/mtrace (*)
/usr/bin/gencat
/usr/bin/localedef
/usr/bin/locale
/usr/bin/iconv

(those marked with (*) aren't recognized as dynamically linked by ldd, so they wouldn't work)
