-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace StaticStringMap with far more optimized version #21498
base: master
Are you sure you want to change the base?
Conversation
be28052
to
4de75ff
Compare
impressive results! can you post the benchmark(s) source code and also expose a version that lets |
08a1339
to
6ce870d
Compare
Why remove the initComptime() and init() API? The idea behind #19719 was to move kvs_list from a type param to a init() param in order to avoid very long type names which result from them being a type param. EDIT: I like the simplicity of this implementation and the results seem to speak for themselves 👍 . FWIW, I don't care about runtime init(). But I think keeping the StaticStringMap name and moving kvs_list to an initComptime() param might be preferrable. |
Thank you for explaining the original reasoning! It's well founded. - I'll work on that and see how soon I can get it working. What are your thoughts on replacing |
034277d
to
56b7655
Compare
e24762b
to
67e31c5
Compare
Seems like the x86_64 backend hasn't implemented gte vector comparison yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive results.
Please document the breaking changes in the PR writeup (pretend you are writing release notes).
Please also explain why you deleted a lot of test coverage, otherwise it's difficult to be sure these changes don't break existing use cases.
Sounds good - let me know if you would like further changes. |
c5b9e07
to
d60e6d1
Compare
c90a8b0
to
d93419d
Compare
I updated the simple benchmarking code: |
I finally learned how to use git, and resolved that pesky rebase conflict, squashing my changes into one commit. If this issue warrants some release notes, the commit message is a good resource. Please let me know if there is something different that should be done for this PR. |
3d6e6ef
to
0143320
Compare
I'm pretty sure the failures are unrelated, but I am curious about the aarch64-linux-release failure. Looks like it is incremental related. Is this test nondeterministic by any chance? |
My working theory for our inconsistent CI failures is that some system condition when under load causes processes to frequently crash. That aligns with what we see here -- note that on the failed incremental test, we also get a It's possible there is an incremental test case which is sometimes crashing the compiler, but I'd have to see the crash on a I'll re-run the failed jobs when the in-progress one finishes. EDIT: oh, that incremental test totally is actually flaky, my bad -- see this run. Will fix now! |
0143320
to
86bf50c
Compare
lib/std/static_array_map.zig
Outdated
fn ignoreCaseEql(comptime len: usize, comptime expected: [len]u8, actual: [len]u8) bool { | ||
const lower_expected = comptime toLowerSimd(len, expected); | ||
|
||
// TODO: x86_64 self hosted backend hasn't implemented genBinOp for cmp_gte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the x86 backend has been improved a ton since I made this PR, should I yeet this workaround code and see what happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answer to this question is always "yes" 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit changed one (possibly insignificant) part of the My motivation for this is that the length can be easily grabbed from one of the other arguments, and in refactoring a bit of the code I noticed I had accidentally removed the need to do anything with the length at all. In any case; I don't see anything else I should do to this branch, so unless there are other changes that are proposed this is ready to merge. |
Fortunately, the design of StaticArrayMap here reduces the number of comparisons that need to be made when checking for duplicate keys, which I just added :) Ayo - I thought x86_64-windows debug took more than 23 minutes? This is a pretty nice speedup. |
|
There is a limitation to both the old approach, and the new approach. Custom equal functions cannot say that two different-length arrays/strings are equal, because in both implementations the length is compared before they even reach the equal function. There are a few options we can take:
The third option could be done in two ways: we could say "make a function that takes one comptime-known element and one runtime-known element, and it decides how to compare them". This has the distinct advantage that the interface is further simplified, but cripples the current advantage of the defaultEql function. The second way to accept the third option is to keep the current interface, changing nothing. Personally I would prefer the first option, as I like simplicity. In my opinion, we don't need StaticArrayMap, but a comptime-optimized mapping between strings (normal or case-insensitive) is more than enough. Thoughts? I could make this into an issue if it's desired, but I figure it would be smoother to make such a change now rather than later. |
I don't know how aarch64-macos-debug is failing. It looks like a MachO error, though my code shouldn't have touched any of that. I believe that the x86_64-windows-debug error is coincidental, something about build system file permissions? |
I agree. |
Please rebase against latest master branch and force push. When I do that with github ui I receive this:
|
398e89f
to
5f3ba85
Compare
I'll make the necessary changes - this should simplify things. :) |
5f3ba85
to
4aefeb8
Compare
StaticArrayMap and StaticArrayMapWithEql are more alternatives to StaticStringMap and StaticStringMapWithEql, the difference being that they can have an array of any type of uniquely represented data as the key, rather than just an array of u8. StaticStringMapIgnoreCase has been added to provide a case-insensitive definition of StaticArrayMap. While you *can* use StaticArrayMapWithEql, StaticStringMapIgnoreCase is a common usecase of StaticStringMap while also being optimized with SIMD. The construction of StaticStringMap (provided by StaticArrayMap) has changed to be comptime-only. This change allows us to greatly improve the generated machine code through sorting the keys by length and comparing multiple bytes at the same time. The breaking changes include: - StaticStringMapWithEql interface has changed - specifically, take a look at the function interface of the eql function, which now takes a comptime-time known length, a comptime-time known array of an expected key, a runtime-known array of the recieved key, and returns a boolean indicating whether the two classify as matching. - The getLongestPrefix and getLongestPrefixIndex functions have been removed. They did not see much use. As a result of these changes, performance is greatly improved. If you have been using a runtime initialization of a StaticStringMap, consider using comptime initialization or std.StaticStringMap. Many test cases depended on getLongestPrefix, getLongestPrefixIndex, or runtime initialization. Those test cases have been removed, as they are no longer testing existing code.
…the extraneous equal function parameter on array length. Add test for a static mapping between arrays that aren't bitpacked.
…the expected key is a starting subset of the input key. Add a test. Implement ziglang#20055 - use the fact we have already grouped the keys by length to reduce the number of comparisons, and use the eql function to compile-error on redundant entries, determined by whether the eql function believes they are identical. Remove tests which now compile-error.
…ly) unnecessary comptime. Fixed StaticStringMapWithEql and added a test.
… too complicated, or was going to have limitations that made it equivalent to a runtime staticStringMap) - remove staticArrayMap (and withEql) - YAGNI - keep only StaticStringMap and StaticStringMapIgnoreCase
4aefeb8
to
7f43808
Compare
Running the benchmarks again does not net me the same perf increase that I had originally seen. I have done some investigation and found that my old laptop was at fault. Don't worry, the new version is still more than double as performant. However, the old version of StaticStringMap is more optimized on my new laptop (2 core i7 versus 24 core i9 14th gen), thus instead of about a 66% reduction, it's more of a 56% reduction in wall clock time. |
const too_many_bits = child_bits > 65535; | ||
|
||
// TODO: riscv64 backend can't airBitCast [7]u8 to u56 | ||
const limited_backend = @import("builtin").zig_backend == .stage2_riscv64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My code has somehow triggered a MachO linker bug -
aarch64-macos-none MachO linker bug?
[19/20] Install the project...
-- Install configuration: "Debug"
+ stage3-debug/bin/zig build test docs --zig-lib-dir /Users/ci/zach1/_work/zig/zig/build-debug/../lib -Denable-macos-sdk -Dstatic-llvm -Dskip-non-native --search-prefix /Users/ci/zig+llvm+lld+clang-aarch64-macos-none-0.14.0-dev.1622+2ac543388
test
+- zig build-exe zig Debug native failure
error: thread 53377973 panic: integer cast truncated bits
/Users/ci/zach1/_work/zig/zig/src/link/MachO/UnwindInfo.zig:71:43: 0x10b6db177 in generate (zig)
rec.enc.setDwarfSectionOffset(@intCast(fde.out_offset));
^
/Users/ci/zach1/_work/zig/zig/src/link/MachO.zig:2074:34: 0x10b6f09ef in generateUnwindInfo (zig)
self.unwind_info.generate(self) catch |err| switch (err) {
^
/Users/ci/zach1/_work/zig/zig/src/link/MachO.zig:535:32: 0x10b41ab27 in flushModule (zig)
try self.generateUnwindInfo();
^
/Users/ci/zach1/_work/zig/zig/src/link/MachO.zig:348:25: 0x10b25c0ab in flush (zig)
try self.flushModule(arena, tid, prog_node);
^
/Users/ci/zach1/_work/zig/zig/src/link.zig:833:77: 0x10b0f8e37 in flush (zig)
return @as(*tag.Type(), @fieldParentPtr("base", base)).flush(arena, tid, prog_node);
^
/Users/ci/zach1/_work/zig/zig/src/Compilation.zig:2432:17: 0x10b0f86b3 in flush (zig)
lf.flush(arena, tid, prog_node) catch |err| switch (err) {
^
/Users/ci/zach1/_work/zig/zig/src/Compilation.zig:2369:22: 0x10b0fc35b in update (zig)
try flush(comp, arena, .{
^
/Users/ci/zach1/_work/zig/zig/src/main.zig:4209:32: 0x10b16e417 in serve (zig)
try comp.update(main_progress_node);
^
/Users/ci/zach1/_work/zig/zig/src/main.zig:3649:22: 0x10b18f2f7 in buildOutputType (zig)
try serve(
^
/Users/ci/zach1/_work/zig/zig/src/main.zig:271:31: 0x10b077383 in mainArgs (zig)
return buildOutputType(gpa, arena, args, .{ .build = .Exe });
^
/Users/ci/zach1/_work/zig/zig/src/main.zig:212:20: 0x10b074d17 in main (zig)
return mainArgs(gpa, arena, args);
^
/Users/ci/zach1/_work/zig/zig/lib/std/start.zig:656:37: 0x10b074a2f in main (zig)
const result = root.main() catch |err| {
^
???:?:?: 0x19d390273 in ??? (???)
???:?:?: 0x0 in ??? (???)
error: the following command terminated unexpectedly:
/Users/ci/zach1/_work/zig/zig/build-debug/stage3-debug/bin/zig build-exe --stack 48234496 -cflags -std=c++17 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D_GNU_SOURCE -fno-exceptions -fno-rtti -fno-stack-protector -fvisibility-inlines-hidden -Wno-type-limits -Wno-missing-braces -Wno-comment -DNDEBUG=1 -- /Users/ci/zach1/_work/zig/zig/src/zig_llvm.cpp /Users/ci/zach1/_work/zig/zig/src/zig_clang.cpp /Users/ci/zach1/_work/zig/zig/src/zig_llvm-ar.cpp /Users/ci/zach1/_work/zig/zig/src/zig_clang_driver.cpp /Users/ci/zach1/_work/zig/zig/src/zig_clang_cc1_main.cpp /Users/ci/zach1/_work/zig/zig/src/zig_clang_cc1as_main.cpp -lclangFrontendTool -lclangCodeGen -lclangFrontend -lclangDriver -lclangSerialization -lclangSema -lclangStaticAnalyzerFrontend -lclangStaticAnalyzerCheckers -lclangStaticAnalyzerCore -lclangAnalysis -lclangASTMatchers -lclangAST -lclangParse -lclangAPINotes -lclangBasic -lclangEdit -lclangLex -lclangARCMigrate -lclangRewriteFrontend -lclangRewrite -lclangCrossTU -lclangIndex -lclangToolingCore -lclangExtractAPI -lclangSupport -lclangInstallAPI -llldMinGW -llldELF -llldCOFF -llldWasm -llldMachO -llldCommon -lLLVMWindowsManifest -lLLVMXRay -lLLVMLibDriver -lLLVMDlltoolDriver -lLLVMTextAPIBinaryReader -lLLVMCoverage -lLLVMLineEditor -lLLVMSandboxIR -lLLVMXCoreDisassembler -lLLVMXCoreCodeGen -lLLVMXCoreDesc -lLLVMXCoreInfo -lLLVMX86TargetMCA -lLLVMX86Disassembler -lLLVMX86AsmParser -lLLVMX86CodeGen -lLLVMX86Desc -lLLVMX86Info -lLLVMWebAssemblyDisassembler -lLLVMWebAssemblyAsmParser -lLLVMWebAssemblyCodeGen -lLLVMWebAssemblyUtils -lLLVMWebAssemblyDesc -lLLVMWebAssemblyInfo -lLLVMVEDisassembler -lLLVMVEAsmParser -lLLVMVECodeGen -lLLVMVEDesc -lLLVMVEInfo -lLLVMSystemZDisassembler -lLLVMSystemZAsmParser -lLLVMSystemZCodeGen -lLLVMSystemZDesc -lLLVMSystemZInfo -lLLVMSparcDisassembler -lLLVMSparcAsmParser -lLLVMSparcCodeGen -lLLVMSparcDesc -lLLVMSparcInfo -lLLVMRISCVTargetMCA -lLLVMRISCVDisassembler -lLLVMRISCVAsmParser -lLLVMRISCVCodeGen -lLLVMRISCVDesc -lLLVMRISCVInfo -lLLVMPowerPCDisassembler -lLLVMPowerPCAsmParser -lLLVMPowerPCCodeGen -lLLVMPowerPCDesc -lLLVMPowerPCInfo -lLLVMNVPTXCodeGen -lLLVMNVPTXDesc -lLLVMNVPTXInfo -lLLVMMSP430Disassembler -lLLVMMSP430AsmParser -lLLVMMSP430CodeGen -lLLVMMSP430Desc -lLLVMMSP430Info -lLLVMMipsDisassembler -lLLVMMipsAsmParser -lLLVMMipsCodeGen -lLLVMMipsDesc -lLLVMMipsInfo -lLLVMLoongArchDisassembler -lLLVMLoongArchAsmParser -lLLVMLoongArchCodeGen -lLLVMLoongArchDesc -lLLVMLoongArchInfo -lLLVMLanaiDisassembler -lLLVMLanaiCodeGen -lLLVMLanaiAsmParser -lLLVMLanaiDesc -lLLVMLanaiInfo -lLLVMHexagonDisassembler -lLLVMHexagonCodeGen -lLLVMHexagonAsmParser -lLLVMHexagonDesc -lLLVMHexagonInfo -lLLVMBPFDisassembler -lLLVMBPFAsmParser -lLLVMBPFCodeGen -lLLVMBPFDesc -lLLVMBPFInfo -lLLVMAVRDisassembler -lLLVMAVRAsmParser -lLLVMAVRCodeGen -lLLVMAVRDesc -lLLVMAVRInfo -lLLVMARMDisassembler -lLLVMARMAsmParser -lLLVMARMCodeGen -lLLVMARMDesc -lLLVMARMUtils -lLLVMARMInfo -lLLVMAMDGPUTargetMCA -lLLVMAMDGPUDisassembler -lLLVMAMDGPUAsmParser -lLLVMAMDGPUCodeGen -lLLVMAMDGPUDesc -lLLVMAMDGPUUtils -lLLVMAMDGPUInfo -lLLVMAArch64Disassembler -lLLVMAArch64AsmParser -lLLVMAArch64CodeGen -lLLVMAArch64Desc -lLLVMAArch64Utils -lLLVMAArch64Info -lLLVMOrcDebugging -lLLVMOrcJIT -lLLVMWindowsDriver -lLLVMMCJIT -lLLVMJITLink -lLLVMInterpreter -lLLVMExecutionEngine -lLLVMRuntimeDyld -lLLVMOrcTargetProcess -lLLVMOrcShared -lLLVMDWP -lLLVMDebugInfoLogicalView -lLLVMDebugInfoGSYM -lLLVMOption -lLLVMObjectYAML -lLLVMObjCopy -lLLVMMCA -lLLVMMCDisassembler -lLLVMLTO -lLLVMPasses -lLLVMHipStdPar -lLLVMCFGuard -lLLVMCoroutines -lLLVMipo -lLLVMVectorize -lLLVMLinker -lLLVMInstrumentation -lLLVMFrontendOpenMP -lLLVMFrontendOffloading -lLLVMFrontendOpenACC -lLLVMFrontendHLSL -lLLVMFrontendDriver -lLLVMExtensions -lLLVMDWARFLinkerParallel -lLLVMDWARFLinkerClassic -lLLVMDWARFLinker -lLLVMCodeGenData -lLLVMGlobalISel -lLLVMMIRParser -lLLVMAsmPrinter -lLLVMSelectionDAG -lLLVMCodeGen -lLLVMTarget -lLLVMObjCARCOpts -lLLVMCodeGenTypes -lLLVMIRPrinter -lLLVMInterfaceStub -lLLVMFileCheck -lLLVMFuzzMutate -lLLVMScalarOpts -lLLVMInstCombine -lLLVMAggressiveInstCombine -lLLVMTransformUtils -lLLVMBitWriter -lLLVMAnalysis -lLLVMProfileData -lLLVMSymbolize -lLLVMDebugInfoBTF -lLLVMDebugInfoPDB -lLLVMDebugInfoMSF -lLLVMDebugInfoDWARF -lLLVMObject -lLLVMTextAPI -lLLVMMCParser -lLLVMIRReader -lLLVMAsmParser -lLLVMMC -lLLVMDebugInfoCodeView -lLLVMBitReader -lLLVMFuzzerCLI -lLLVMCore -lLLVMRemarks -lLLVMBitstreamReader -lLLVMBinaryFormat -lLLVMTargetParser -lLLVMSupport -lLLVMDemangle -lz -lzstd -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=/Users/ci/zach1/_work/zig/zig/src/main.zig -Maro=/Users/ci/zach1/_work/zig/zig/lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=/Users/ci/zach1/_work/zig/zig/lib/compiler/aro_translate_c.zig -Mbuild_options=/Users/ci/zach1/_work/zig/zig/zig-local-cache/c/c8629b17c14c8b08115144a888890be1/options.zig -lc++ -lc --cache-dir /Users/ci/zach1/_work/zig/zig/zig-local-cache --global-cache-dir /Users/ci/zach1/_work/zig/zig/zig-global-cache --name zig -L /Users/ci/zig+llvm+lld+clang-aarch64-macos-none-0.14.0-dev.1622+2ac543388/lib -I /Users/ci/zig+llvm+lld+clang-aarch64-macos-none-0.14.0-dev.1622+2ac543388/include --zig-lib-dir /Users/ci/zach1/_work/zig/zig/build-debug/../lib/ --listen=-
Build Summary: 6018/6306 steps succeeded; 286 skipped; 1 failed; 38614/39925 tests passed; 1311 skipped
My best guess is that MachO isn't correctly handling the big integers in the defaultEql
function, but I have no way of testing my theory other than making the CI run once again. (I don't have MacOS unfortunately). If this were the case, I could simply change this line of code to also include the aarch64-macos target.
What is the policy on testing changes like this? Should I make the CI run once more on my educated guess, or should I instead wait for/ask someone else to test it on their hardware?
Optimized StaticStringMap
StaticStringMap
is a type that allows one to map strings to values. It is typically used in parsing files, handling command-line arguments, and can be used as a general "switch" over strings.The main driving point of this PR is speed. Because we now know the strings that are in the map at compile-time, the compiler will now generate more efficient code to compare the strings. A basic string comparison works somewhat like this:
This can be improved. Similar to how the old
StaticStringMap
worked, we actually branch on the length of the string earlier to determine the other strings we need to compare it to! This greatly speeds up both implementations. Because we now know the length and contents of each of the "key" strings at compile-time, we compare many bytes at a time, which the compiler can optimize much better than before.Due to these changes, performance for
data:image/s3,"s3://crabby-images/2a6da/2a6da049b37d7aea23dee9a93ab636ae3aed8705" alt="image"
get
andhas
operations has more than doubled:Of course, this PR comes with small cost. Any code that relied on
getLongestPrefix
,getLongestPrefixIndex
, the runtimeinit()
, or theStaticStringMapWithEql
type of the previous implementation is now broken. If your key/value pairs were comptime-known before, then you can simply switch to theinitComptime()
initialization for this new implementation. In case your key/value pairs are runtime-known, you can use the fabulousstd.StringHashMap
andstd.StringHashMapUnmanaged
types instead for the same purpose.Through research, it was determined that
StaticStringMapWithEql
was all but used for case-insensitive string matching. This is whyStaticStringMapIgnoreCase
has been introduced. It has all the advantages ofStaticStringMap
, converts the comptime-known keys to lowercase at comptime, removing the runtime overhead with previous implementations. The lower-casing function it uses is also vectorized.This PR additionally adds redundancy checking of input strings for both
StaticStringMap
andStaticStringMapIgnoreCase
- so if you accidentally put two equivalent strings in a map, don't worry - the compiler will let you know.In summary:
std.StaticStringMapIgnoreCase
has been added - case insensitive string matchingstd.StaticStringMapWithEql
has been removed (common use case satisfied byStaticStringMapIgnoreCase
)StaticStringMap
(.init()
) has been removedgetLongestPrefix
andgetLongestPrefixIndex
functions have been removed.Test cases were modified or removed if they were testing the removed runtime-known key/value initialization,
getLongestPrefix
,getLongestPrefixIndex
, or if they compile-error due to testing redundant keys.Closes #20055