cmd/compile, runtime: investigate Windows stack overflows calling into system C libraries #20975
Stack size isn't a knob programmers should think about in Go. That's why Go stacks dynamically split or adjust in size as needed. When making a call to code not managed by Go's dynamic stack size checks, Go should be switching to a conservatively-sized stack. Maybe there's a bug in how that works on Windows. |
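To illustrate the point about dynamically sized stacks, here is a minimal sketch. The function name `descend` and the 256-byte frame padding are illustrative assumptions, not something from this issue. It recurses to a depth that needs far more than 128 KB of stack, which works because the Go runtime grows the goroutine stack on demand. None of this applies to foreign code reached through syscall or cgo, which runs on the fixed-size OS thread stack.

```go
package main

import "fmt"

// descend recurses n levels deep and returns n.
// The pad array inflates every frame, so a deep recursion needs
// tens of megabytes of stack in total.
func descend(n int) int {
	var pad [256]byte
	pad[0] = byte(n) // touch the frame so the padding is really used
	if n == 0 {
		return int(pad[0]) // pad[0] is 0 here
	}
	return 1 + descend(n-1)
}

func main() {
	// No stack size was configured anywhere; the runtime grows the
	// goroutine stack as the recursion deepens.
	fmt.Println("reached depth:", descend(100000))
}
```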
For what it's worth, here's a callstack of the crash. It's on main thread:
Go code doing the windows message pump https://github.com/lxn/walk/blob/master/form.go#L336 |
For easy reproduction, I've created https://github.com/kjk/go20975 |
The problem isn't in Go code.
Based on my reading of the code, the stack should be set to 2MB in cgo binaries. It's only 128KB in pure Go binaries. Could you double check what the PE header says? |
The thing is, this code doesn't use cgo. All calls to win libraries are via syscall. |
The discussion in lxn/walk#261 claims that adding a dummy |
Scratch that, it probably would have worked if I had
|
I had quick look at how we set reserved stack size. For Cgo program, it is 1M for 386 and 2M for amd64.
in runtime/cgo/gcc_windows_386.c and runtime/cgo/gcc_windows_amd64.c, and _beginthread defaults to what first thread does. For non-Cgo program, it is 128K.
So Cgo programs run with standard (by Windows standards) stacks. But non-Cgo programs run with small stacks. CL 2237 doubled the stack size (from 64K to 128K) already. @kjk can you please try making it 256K for non-Cgo and see if it fixes your problem? Maybe we can increase the stack size again. I don't think it matters on amd64, and we have fewer and fewer 386 computers. Thank you. Alex |
I tried 256 KB, 512 KB, 1 MB and 2 MB. My website stops crashing at 512 KB. Other websites need more. My vote would be for 2 MB on win/64bit. |
I was hoping it would work with 256K.
What about 386? Should we make stacks different on 386 and amd64? Alex |
For 386 the standard seems to be 1 MB. |
This will only allow a Go program to start about 1000 threads on 386. See my comment https://go-review.googlesource.com/#/c/2237/2//COMMIT_MSG@18 Is that acceptable? Alex |
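The thread limit follows from simple address-space arithmetic, which can be sketched as below. The 2 GiB figure is an assumption (the default user-mode address space of a 32-bit Windows process), and `maxThreads` is a hypothetical helper. The result is only an upper bound: the heap, loaded DLLs and the executable itself also consume address space, so the practical limit is lower.

```go
package main

import "fmt"

const (
	kib = 1 << 10
	mib = 1 << 20
	gib = 1 << 30
)

// maxThreads returns an upper bound on the number of threads whose
// stack reservations fit into addrSpace bytes of virtual address space.
func maxThreads(addrSpace, stackReserve uint64) uint64 {
	if stackReserve == 0 {
		return 0
	}
	return addrSpace / stackReserve
}

func main() {
	// Assumption: 2 GiB of user address space per 32-bit process.
	const userSpace = 2 * gib
	for _, reserve := range []uint64{128 * kib, 1 * mib, 2 * mib} {
		fmt.Printf("reserve %4d KiB: at most %d threads\n",
			reserve/kib, maxThreads(userSpace, reserve))
	}
}
```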
1000 real OS threads on 32-bit Windows should be more than enough for anybody. Any program with that many threads would thrash the CPU anyway: there's really no point launching more than ~NUMCPU OS threads. It's really mostly about the main GUI thread, which is the thread pumping the message loop, which ends up calling the wndproc of windows and the code called from there. That's the main thread, which is created outside the control of the Go runtime and uses the stack size from the PE header. I imagine other threads created by the Go runtime can have a stack size independent of the PE value (given to CreateThread), and there's an argument that those threads are not expected to have such heavy nesting as the main GUI thread, so they could get away with less stack. |
It is plenty for me. But other people might think differently.
I really do not know. I will let @aclements decide.
GUI does not have to run on the main thread. You can even run multiple GUI threads.
I think these different thread sizes for different environments are already complicated enough. I do not want to make things even more complicated. Alex |
Can you explain how a "system call" wound up deep in the MSHTML library? (Every time I think I'm starting to get a feel for Windows there's something even more mystifying...) Skimming over lxn, it looks like the Go syscall package has provided enough rope to actually call into arbitrary Windows libraries without going through cgo, but I'm still not sure how that wound up in MSHTML... Maybe a call to
I wish this were true. :) There's (basically) no point in launching more than NUMCPU compute threads, but threads are also the unit of blocking for many operations, including many system calls and cgo calls. We generally do a good job of mapping blocking IO on to asynchronous system calls, but other things consume extra threads. So it's really NUMCPU + the number of things that can be blocked or in cgo simultaneously.
Unless you're calling |
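To see how many OS threads a given program actually ends up with, one option is the runtime's built-in "threadcreate" profile. The sketch below is only illustrative; `threadsCreated` is a hypothetical helper name, and the count reflects threads created so far, not threads currently running.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// threadsCreated reports how many OS threads the runtime has created
// so far, according to the "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	fmt.Println("NumCPU:         ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:     ", runtime.GOMAXPROCS(0))
	fmt.Println("threads created:", threadsCreated())
}
```

In a program that blocks in many system calls or cgo calls at once, the last number can be well above NumCPU.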
Please see the callstack in #20975 (comment). The structure of a Windows GUI app is:
Message pump is:
DispatchMessage is a syscall:
This triggers arbitrarily deep processing inside windows code. In this case mshtml control was told to display a website, so, after downloading html, it started to parse and render it and that happened on main GUI thread, as part of processing a message that was triggered with |
LoadLibrary loads DLL from a file. Alex |
@kjk, thanks for the explanation. It's interesting that message loops haven't changed since I last wrote Windows GUI code 15+ years ago. :) I take it some other part of your application set up the MSHTML control, again without using any cgo?
Right. My point is that we're now well outside the realm of what's really considered a "system call". I realize the distinction is blurry on Windows, but I assert that all of the "system calls" exposed by the std syscall package on Windows use very little stack and then enter the kernel (or use clever tricks to stay in user space, but still use very little stack). The stack size is set based on this assumption. The lxn package may be using That suggests that we need to either always use 2MB stacks on Windows, or we need to do something cleverer to detect if you're escaping into the deep C world, such as observing /cc @randall77 |
I note that the linker already changes its behavior based on whether |
Correct. The MSHTML control is just one of many used to build GUI windows. So you create a window, then you add controls to it (buttons, edit controls, lists and others); the MSHTML control is just another control. You do all that by calling Windows APIs. I can give you some links, if you are interested, but these are just functions that are loaded from DLLs. There are some complications, for example all GUI code must run on a single thread (runtime.LockOSThread) and it uses callbacks (syscall.NewCallback), but a Go program is no different from others that use these DLLs.
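The single-thread requirement is commonly handled in Go by pinning one goroutine to its OS thread and funnelling all GUI work through it. The following is a portable sketch of that pattern only; `uiLoop` and `runOnUI` are hypothetical names, and no real Windows API is called here.

```go
package main

import (
	"fmt"
	"runtime"
)

// uiLoop pins the calling goroutine to its OS thread and runs queued
// functions on that thread until the queue is closed.
func uiLoop(queue <-chan func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	for f := range queue {
		f()
	}
}

// runOnUI hands f to the loop and waits until it has finished.
func runOnUI(queue chan<- func(), f func()) {
	done := make(chan struct{})
	queue <- func() {
		f()
		close(done)
	}
	<-done
}

func main() {
	queue := make(chan func())
	finished := make(chan struct{})
	go func() {
		uiLoop(queue)
		close(finished)
	}()

	total := 0
	for i := 1; i <= 3; i++ {
		v := i
		// Each closure runs on the locked thread, one at a time.
		runOnUI(queue, func() { total += v })
	}
	close(queue)
	<-finished
	fmt.Println("total:", total)
}
```

In a real GUI program the body of the loop would be the message pump, and callbacks from Windows would arrive on that same thread.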
On Windows every thread (including first) starts with standard Windows stack. All Windows syscalls switch to that stack before calling DLL functions. These DLL functions are called in user space. Some of them switch to kernel space, but some (for example if you build C DLL yourself) do not switch to kernel space.
I do not see what is violated.
We can do that. @kjk program stops crashing at 512kb, so we can go with that instead. Alex |
I don't think 512 KB is enough. It's where the simplest program stopped crashing on the simplest website. I haven't done much testing beyond that, but it stands to reason that other websites (or other programs) might require more than 512 KB. I think the value should be the standard for the Windows linker, i.e. 1 MB (https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx), which seems to be the same for 32-bit and 64-bit (https://blogs.technet.microsoft.com/markrussinovich/2009/07/05/pushing-the-limits-of-windows-processes-and-threads/). Less than that and Go code will run into this crash more often than C code. And I would still like a linker option to bump it to a larger value if I determine that my particular application needs that, especially on 64-bit, where a larger reserved stack has essentially no penalty because address space is plentiful. I get the desire to make it "just work", but you can't in this particular case. |
Please, see if you can break Go with 512kb stack. Thank you. Alex |
There is no doubt that I can. I'll just allocate more than 512 KB on the stack. I broke 256 KB on literally one of the first real programs I wrote that used mshtml. There's an infinite number of websites and an infinite number of programs that use other potentially stack-hungry libraries. If I'm intentional about it then I'll break any limit. If I'm not intentional and just writing random Windows programs then I have an infinite search space. The question is: what is a reasonably safe value? And the answer is: what everyone else is using, i.e. 1 MB. Those are the constraints under which Windows OS developers and other Windows developers operate. If they write code that uses more than 1 MB of stack, their programs will crash, so they have an incentive to stay under that limit. The lower the limit, the higher the probability that Go programs will crash because they'll hit the limit. I'll also note that there might be more going wrong than just the stack size limit. When I use the repro program I linked to and visit https://vox.com, it crashes even if I set a very large limit, like 2 MB. Sometimes it prints "fatal: more gc", sometimes it doesn't, and it crashes in a very bad way (not an orderly panic that prints a crashing callstack but a vanishing act where the process is wiped out without a trace). Unfortunately delve doesn't debug this kind of code at all, windbg doesn't understand Go symbols, and I know nothing about debugging the runtime, so that's as deep as I can go on this. But I do have a consistent repro of a bad crash. |
@ianlancetaylor, I believe (
@alexbrainman, I think we're talking past each other, since the violation I'm talking about is what this whole issue is about. The runtime assumes anything called a "syscall" will not consume more than 128kB of user-space stack. I believe that is true of everything exported by the
@kjk, I think this is an excellent reason to raise the size to the default when there's any possibility of a call to regular Windows code (which we clearly need to be more conservative about). Maybe we should just always do this on 64-bit. But it also suggests that a linker flag to set it manually may be overkill? Unless it's common to build Windows applications with larger stacks?
What if you try setting it to something significantly larger? Say, 32MB? |
@ianlancetaylor Ah, so your point was that setting the stack size via In any case, the stack size should be sufficiently large even in the non-CGO case in the Go 1.9 RCs (it's at least as big as the stacks for C/C++ binaries that use IE via COM/OLE and don't crash), but since the reproducer seems to crash with the Go 1.9 RCs, it definitely seems like there's something beyond stack size at play here. |
@kjk You're right about the behavior of But again, in any case, Go 1.9 RCs should be giving every thread loads of stack, and something is still clearly broken. |
After stepping through the code in the debugger, I don't think it's related to stack size. The issue is that we have:
We are in:
The source of the problem, it seems, is that Best I can tell, this is called from exitsyscallfast:
It seems like systemstack is supposed to ensure that the function called is not on scheduler stack but sometimes it fails. Then the called function has unconditional call to morecallstack which triggers call to Unfortunately systemstack purposefully cuts off callstack, so I can't tell which function calls it but this is correlated with mshtml showing a modal dialog, so it most likely has to do with nested Go => C => Go transitions. |
Narrowing it down further, it seems like it's always a result of calling back from C to Go:
I captured it by using
The last callstack is the one which triggered |
This seems to be limited to amd64. I spent a while trying to repro this on 386 build and couldn't. |
Just to be clear - this debugging is on non-CGO binaries (i.e. those without a blank |
Yes, non-CGO, latest tests are with go 1.9 rc2 |
newosproc is not used to create main thread. main thread is created by Windows process loader. The loader uses pe file parameters to configure main thread stack.
If any non Go code calls Windows CreateThread API, the call cannot magically "be re-routed through newosproc". In fact this scenario is broken at this moment (see issue #6751). I would not be surprised if your problem is a dup of #6751. Given how much external code you use, can you be certain that none of that code calls Go code on a thread that has not been created by Go? Alex |
The crash always happens on the main thread. If you read earlier comments I think I narrowed it down to So my best guess is that |
Commit c2c07c7 (CL 49331) changed the linker and runtime to always use 2MB stacks on 64-bit Windows. This is the corresponding change to make 32-bit Windows always use large (1MB) stacks because it's difficult to detect when Windows applications will call into arbitrary C code that may expect a large stack. This is done as a separate change because it's possible this will cause too much address space pressure for a 32-bit address space. On the other hand, cgo binaries on Windows already use 1MB stacks and there haven't been complaints. Updates #20975. Change-Id: I8ce583f07cb52254fb4bd47250f1ef2b789bc490 Reviewed-on: https://go-review.googlesource.com/49610 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Alex Brainman <[email protected]>
Hi @kjk, I have tested your app.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
<assemblyIdentity version="1.0.0.0" processorArchitecture="*" name="SomeFunkyNameHere" type="win32"/>
<dependency>
<dependentAssembly>
<assemblyIdentity type="win32" name="Microsoft.Windows.Common-Controls" version="6.0.0.0" processorArchitecture="*" publicKeyToken="6595b64144ccf1df" language="*"/>
</dependentAssembly>
</dependency>
<asmv3:application>
<asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2005/WindowsSettings">
<dpiAware>true</dpiAware>
</asmv3:windowsSettings>
</asmv3:application>
</assembly>
$ go get -u github.com/lxn/win
$ go get -u github.com/lxn/walk
$ go get -u github.com/akavel/rsrc
$ rsrc -manifest test.manifest -o rsrc.syso
$ go build -ldflags="-H windowsgui"
$ ./testwalk.exe # It runs as expected and does not crash!
My Windows OS is
$ go version
go version go1.10 windows/amd6
$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\xgf\AppData\Local\go-build
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=D:\gopath
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\xgf\AppData\Local\Temp\go-build028597650=/tmp/go-build -gno-record-gcc-switches
/all, FYI! Updated: |
@xgfone my website (https://blog.kowalczyk.info) is no longer a good test because I fixed a bug on my website generator that created mis-formatted, deeply nested html that triggered this problem easily. Try https://vox.com or some other complicated website and make sure to click on links etc. It doesn't necessarily just crash on first render. |
@kjk Yes, I have also tested https://vox.com. It works after starting, and the program does not crash at first. Then I clicked and opened some links. After I kept clicking links, it crashed as follows:
# ...
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
fatal: morestack on g0
Segmentation fault
My console buffer holds 10000 lines, and the output exceeded the whole buffer. FYI. |
Based on your debugging, both The question is why we ran out of space on the scheduler stack. Are you able to print the value of RSP and g0.stack.lo and g0.stack.hi at the crash from windbg?
I suspect I know why it's repeating until it crashes hard. After printing this message, it does a |
Actually, the fact that it was able to print >10000 "morestack on g0" errors before segfaulting is telling. I assume the OS stack has a guard page, which means there must be a lot more space on the OS stack than Go thinks there is. |
Oh, I'm pretty sure I know what's going on. While we allocate large stacks for both the main thread and new threads, both (Edited to add: in the cgo case it looks like we get the stack bounds right for all of the threads. |
@aclements That makes sense. I'm not sure I'll be capable of testing unreleased Go version but if you have a fix, it should be easy to verify using https://github.com/kjk/go20975 test program and going e.g. to vox.com and clicking around. It should crash 100% with current Go and not crash after the fix. |
@kjk, thanks. It would be easy if I had a graphical Windows machine/VM. :) @alexbrainman, can you think of an easy automated test for this? I think I need a C function in a DLL that uses lots of stack and then calls back into Go, but all in a non-cgo binary. |
Change https://golang.org/cl/120195 mentions this issue: |
Change https://golang.org/cl/120336 mentions this issue: |
It does sound reasonable (you assume that
I suppose we could call
We do handle debug exceptions (search for _EXCEPTION_BREAKPOINT), but it is too late for this scenario. So, sure, we could modify
We have quite a few tests in the runtime package that build C DLLs on the fly and use them to test runtime code. See TestStdcallAndCDeclCallbacks, TestReturnAfterStackGrowInCallback, TestFloatArgs and TestDLLPreloadMitigation. Alex |
Currently, on Windows, the thread stack size is set or assumed in many different places. In non-cgo binaries, both the Go linker and the runtime have a copy of the stack size, the Go linker sets the size of the main thread stack, and the runtime sets the size of other thread stacks. In cgo binaries, the external linker sets the main thread stack size, the runtime assumes the size of the main thread stack will be the same as used by the Go linker, and the cgo entry code assumes the same. Furthermore, users can change the main thread stack size using editbin, so the runtime doesn't even really know what size it is, and user C code can create threads with unknown thread stack sizes, which we also assume have the same default stack size. This is all a mess. Fix the corner cases of this and the duplication of knowledge between the linker and the runtime by querying the OS for the stack bounds during thread setup. Furthermore, we unify all of this into just runtime.minit for both cgo and non-cgo binaries and for the main thread, other runtime-created threads, and C-created threads. Updates #20975. Change-Id: I45dbee2b5ea2ae721a85a27680737ff046f9d464 Reviewed-on: https://go-review.googlesource.com/120336 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Alex Brainman <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
This came up with 1.8.3 when working on GUI Windows programs using lxn\walk library.
Relevant issues:
The problem is that the Go linker sets a very small stack size (I think 128 kB) in PE executables. The standard on Windows is more like 1-2 MB.
This is fine for code that only lightly uses system C libraries, but when writing code that talks to win32 UI APIs, it's very likely to hit the stack limit and silently crash.
I encountered it because I tried to use the webview (mshtml.dll) control and it crashed on 64-bit when rendering my (not very complicated) website. Other people have seen such crashes as well.
There are work-arounds: one can edit the PE header after the exe is built using e.g. editbin.exe, but such a tool is not necessarily available to the programmer (it's part of Visual Studio).
It would be much easier on Windows developers if there was a linker flag to set a custom stack size so that it could be done directly with go build. A library like lxn\walk could then document the need to increase stack size for Go Windows programs and recommend a
I don't know if such an option would be relevant/needed for other OSes/exe formats.
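For readers without editbin, the reserved stack size can in principle be read or patched directly, because the PE format stores it as SizeOfStackReserve at offset 72 of the optional header (4 bytes wide for PE32, 8 bytes for PE32+). The sketch below works on an in-memory image; the helper names are hypothetical, the image in `main` is a synthetic header rather than a real executable, and the code does not touch the header checksum or any code signature.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	peOffsetField      = 0x3c // DOS header field holding the PE header offset
	coffHeaderSize     = 20
	stackReserveOffset = 72 // SizeOfStackReserve within the optional header
	magicPE32          = 0x10b
	magicPE32Plus      = 0x20b
)

// locate finds SizeOfStackReserve and reports whether it is 8 bytes wide.
func locate(image []byte) (off int, wide bool, err error) {
	if len(image) < peOffsetField+4 || image[0] != 'M' || image[1] != 'Z' {
		return 0, false, errors.New("not an MZ executable")
	}
	pe := int(binary.LittleEndian.Uint32(image[peOffsetField:]))
	opt := pe + 4 + coffHeaderSize // optional header follows signature and COFF header
	if pe < 0 || opt+stackReserveOffset+8 > len(image) || string(image[pe:pe+4]) != "PE\x00\x00" {
		return 0, false, errors.New("missing or truncated PE header")
	}
	switch binary.LittleEndian.Uint16(image[opt:]) {
	case magicPE32:
		return opt + stackReserveOffset, false, nil
	case magicPE32Plus:
		return opt + stackReserveOffset, true, nil
	}
	return 0, false, errors.New("unknown optional header magic")
}

// stackReserve returns the reserved stack size recorded in the image.
func stackReserve(image []byte) (uint64, error) {
	off, wide, err := locate(image)
	if err != nil {
		return 0, err
	}
	if wide {
		return binary.LittleEndian.Uint64(image[off:]), nil
	}
	return uint64(binary.LittleEndian.Uint32(image[off:])), nil
}

// setStackReserve overwrites the reserved stack size in place.
func setStackReserve(image []byte, size uint64) error {
	off, wide, err := locate(image)
	if err != nil {
		return err
	}
	if wide {
		binary.LittleEndian.PutUint64(image[off:], size)
		return nil
	}
	if size > 0xffffffff {
		return errors.New("size does not fit in a PE32 header")
	}
	binary.LittleEndian.PutUint32(image[off:], uint32(size))
	return nil
}

func main() {
	// Synthetic PE32+ header, just large enough for the demonstration.
	img := make([]byte, 256)
	copy(img, "MZ")
	img[peOffsetField] = 0x40
	copy(img[0x40:], "PE\x00\x00")
	binary.LittleEndian.PutUint16(img[0x40+4+coffHeaderSize:], magicPE32Plus)

	if err := setStackReserve(img, 2<<20); err != nil {
		fmt.Println("error:", err)
		return
	}
	size, _ := stackReserve(img)
	fmt.Println("reserved stack bytes:", size)
}
```

Applied to a real file, the same functions would operate on the bytes read from the .exe and the result would be written back, which is roughly the edit that the work-around above performs.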