adding in-process tests #525

Merged: 18 commits merged into golang:master from in_process_tests on May 9, 2017

Conversation

jmank88 (Collaborator) commented May 6, 2017:

Fixes #510

This change adds in-process tests. Most of the churn is around refactoring logging away from global functions and state, and replacing various os package references (Stdout, Stderr, Getenv, Getwd, etc.).
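
A rough sketch of the injection pattern, with illustrative names rather than the exact dep types: commands receive their working directory, environment, and loggers instead of reaching for package-level os/fmt state, so tests can run them in-process against temp dirs and in-memory buffers.

package main

import (
    "log"
    "os"
)

// Ctx carries everything a command used to pull from globals, so tests can
// substitute a temp dir, a synthetic environment, and buffered loggers.
type Ctx struct {
    WorkingDir string
    Env        []string
    Out, Err   *log.Logger
}

func defaultCtx() (*Ctx, error) {
    wd, err := os.Getwd()
    if err != nil {
        return nil, err
    }
    return &Ctx{
        WorkingDir: wd,
        Env:        os.Environ(),
        Out:        log.New(os.Stdout, "", 0),
        Err:        log.New(os.Stderr, "", 0),
    }, nil
}

func main() {
    ctx, err := defaultCtx()
    if err != nil {
        log.Fatal(err)
    }
    ctx.Out.Println("dep: working dir is " + ctx.WorkingDir)
}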

googlebot (Collaborator):

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

log/logger.go Outdated
}

// LogDepfln logs a formatted line, prefixed with `dep: `.
func (l *Logger) LogDepfln(format string, args ...interface{}) {
jmank88 (Collaborator, Author):

This method could use a better name. Or we could just inline the format adjustments and call Logf. There are 13 usages.

jmank88 (Collaborator, Author):

This minimal logger is racy, so I'm going to try dissolving this logger package completely and using the std lib log.Logger instead, so this helper will move or be in-lined.
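
For reference, the inlined shape against the standard library logger would be roughly this (helper name is illustrative); alternatively the prefix can be baked in once with log.New(os.Stderr, "dep: ", 0) so call sites use Printf directly:

// logDepf is a sketch of the inlined replacement: a plain *log.Logger plus a
// one-line wrapper that prepends the dep: prefix.
func logDepf(l *log.Logger, format string, args ...interface{}) {
    l.Printf("dep: "+format, args...)
}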

jmank88 (Collaborator, Author) commented May 6, 2017:

bump googlebot: I signed it!

googlebot (Collaborator):

CLAs look good, thanks!

googlebot added the cla: yes label and removed the cla: no label May 6, 2017
sdboyer (Member) left a review:

just some quick comments while you're on it

testProj.CompareImportPaths()
testCase.CompareVendorPaths(testProj.GetVendorPaths())
t.Run("-external", testIntegration(testName, wd, execCmd))
t.Run("-internal", testIntegration(testName, wd, runMain))
sdboyer (Member):

no dash needed, the system automatically separates with a slash
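
For illustration, a toy example (not the dep test itself) of the slash separation the testing package already provides for nested subtests:

func TestIntegration(t *testing.T) {
    t.Run("foo", func(t *testing.T) {
        t.Run("external", func(t *testing.T) { /* reported as TestIntegration/foo/external */ })
        t.Run("internal", func(t *testing.T) { /* reported as TestIntegration/foo/internal */ })
    })
}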

t.Fatal("rm with one arg not in manifest should have failed")
}
t.Run(testName+"-external", removeErrors(testName, wd, execCmd))
t.Run(testName+"-internal", removeErrors(testName, wd, runMain))
sdboyer (Member):

ibid - no need for dash

o string
g *graphviz
p *dep.Project
logger *log.Logger
sdboyer (Member):

I think it's probably preferable to have this remain as an io.Writer, then map that to the *log.Logger in some intermediate step.

jmank88 (Collaborator, Author):

Yeah I ended up having to do that for the tabwriter anyways.
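
The intermediate step is small, something like this sketch: keep the struct field as an io.Writer and derive a *log.Logger or tabwriter only where one is needed.

func writersFor(out io.Writer) (*log.Logger, *tabwriter.Writer) {
    // the command holds io.Writer; specific writers are built on demand
    return log.New(out, "", 0), tabwriter.NewWriter(out, 0, 4, 2, ' ', 0)
}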

@@ -216,7 +218,7 @@ func (sm *SourceMgr) HandleSignals(sigch chan os.Signal) {

opc := sm.suprvsr.count()
if opc > 0 {
fmt.Printf("Signal received: waiting for %v ops to complete...\n", opc)
sm.logf("Signal received: waiting for %v ops to complete...\n", opc)
sdboyer (Member):

sorry, this one has to stay as-is. yes, it's obnoxious to print directly in this way, but the interface here is carefully designed; we're not injecting a logger for this one, narrow case. if it's that bad, we'll just delete this logging output entirely.

jmank88 (Collaborator, Author):

It would clean up a lot to remove it, but how would it affect the user experience?
On the other hand, is there a chance that leaving it to log as before could result in arbitrary test failures?

sdboyer (Member):

No, there's no chance of it bunking up tests; the only possible way for it to occur is if a signal is sent. It's a very specific case, which is why it exists as it does in the first place.

The proper solution here involves moving direct signal handling out of gps, and into the tool-level code - not changing gps' interface for this sole logging statement.

gps/solver.go Outdated
@@ -105,12 +105,9 @@ type SolveParameters struct {
// typical case.
Downgrade bool

// Trace controls whether the solver will generate informative trace output
// as it moves through the solving process.
Trace bool
sdboyer (Member):

Let's just leave everything underneath gps/ out of this refactor, unless it absolutely must change.

jmank88 (Collaborator, Author):

I can put this back, but it's not doing anything, and I suspect it will confuse others in the future as it did me. (Maybe a separate issue if it's just clouding the water on this PR)

If it was set to true, then there was a validation check for a non-nil TraceLogger (even though nil checking the logger is the actual mechanism that triggers the trace logging internally).

If it was set to false, then it wasn't used. So you could pass a non-nil logger, and it would be passed along and tracing would take place, in spite of the false Trace value.

So in its current state it really amounts to "double-check that TraceLogger is non-nil", but the only usages set a constant true value directly adjacent to a call that sets TraceLogger non-nil.
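
In code terms, the behavior described above amounts to something like this (a paraphrase of the described logic, not the actual gps source):

// Trace only adds a validation step; the nil check on TraceLogger is what
// actually gates trace output.
if params.Trace && params.TraceLogger == nil {
    return nil, errors.New("trace requested, but no TraceLogger provided")
}
// ... later, wherever tracing is emitted:
if params.TraceLogger != nil {
    // trace output happens here whether or not params.Trace was set
}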

sdboyer (Member):

Yep, I know that it looks like it amounts to a double non-nil check, and it may actually be that (I vaguely recall, from when I wrote it a year ago, there being an actual reason for having the extra bool). But that needs its own PR - as you said, the issue is about muddying the water.

@@ -35,8 +35,6 @@ func TestIntegration(t *testing.T) {
parse := strings.Split(path, string(filepath.Separator))
testName := strings.Join(parse[2:len(parse)-1], "/")
t.Run(testName, func(t *testing.T) {
t.Parallel()
sdboyer (Member):

It's a major speedup to run these tests in parallel; there'd need to be a justification for dropping back to running them in serial.

jmank88 (Collaborator, Author):

Yeah this was an attempt to mitigate the race popping up on the integration test buffers. Definitely trying to keep this in.

jmank88 (Collaborator, Author) commented May 6, 2017:

There are still currently race conditions blocking re-parallelization of the integration tests. It appears to be orphaned goroutines continuing to log after the initial call has returned, which explains why the external-process tests are not affected. It may be possible to quick-fix by swapping out the logger when returning, so any lingering logging will go to ioutil.Discard, but really each test should be waiting for all spawned goroutines to complete.

jmank88 (Collaborator, Author) commented May 6, 2017:

Sometimes, but not always, the OSX tests fail: https://travis-ci.org/golang/dep/jobs/229474199
Hopefully this is a race to set the working dir or an env var that can be fixed, but I've not been able to reproduce it (and don't have OSX available).

jmank88 (Collaborator, Author) commented May 8, 2017:

Resolved the race conditions with a few sync.WaitGroups. There are still some remaining background goroutines that aren't synced up before completion (e.g. lots of SyncSourceFor called downstream from Solve), but they don't log so they aren't racing for the stderr buffer.
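
The shape of that fix, roughly (illustrative, not the exact dep code): goroutines that write to the injected logger are tracked and joined before the call returns, so nothing writes to the test's buffer after the test completes.

func syncAll(logger *log.Logger, projects []string) {
    var wg sync.WaitGroup
    for _, p := range projects {
        wg.Add(1)
        go func(p string) {
            defer wg.Done()
            logger.Printf("syncing %s", p) // logging stays inside a waited-on goroutine
        }(p)
    }
    wg.Wait() // nothing touches logger (or the test's buffer) after this returns
}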

Resolved the OSX issue as well - the existing hack only needed to be applied to the external-process tests.

sdboyer (Member) commented May 8, 2017:

awesome, thanks for keeping at this! 🎉 🎉 i'm working through a review. meantime, to this:

(e.g. lots of SyncSourceFor called downstream from Solve)

If gps' internals are working correctly, then these should all be cleaned up/terminated by calling sourceMgr.Release(), though the current implementation may not wait until all of the outstanding calls terminate before it returns. (If it does not, it's a bug.)

A note - the current philosophy of test design is, when the SUT is not the SourceManager itself, but some functionality that relies on it, to treat the underlying sourceMgr as safe for [heavily, aggressively] concurrent reuse across tests. This is partly for performance reasons, but also a way of casting a broad net to catch possible concurrency bugs in sourceMgr. We have tests that attempt to do this directly, as well, but the possibility space here is large, and to the extent that all of our other tests represent more realistic workloads (and they often do), they add quite a bit of value.

jmank88 (Collaborator, Author) commented May 8, 2017:

I don't think that what you are describing is fundamentally a problem. I was primarily concerned with reliance on the process exiting to terminate background goroutines (either by design, or from a race). I think that some of these calls, like firing off an un-monitored go s.b.SyncSourceFor(dep.Ident) or go sm.DeduceProjectRoot(arg) to pre-fetch, are technically racy since nothing waits for them to complete. They are likely no problem under normal execution (the critical path has more work), but these parallel tests are ripe for irregular scheduling scenarios, so if we have issues in the future, I'd look to these first.

sdboyer (Member) left a review:

this is looking pretty good - still chewing on it a little, but here are some comments while i do 😄

@@ -104,6 +110,8 @@ func testWriteDepTree(t *testing.T) {
if _, err = os.Stat(filepath.Join(tmp, "bitbucket.org", "sdboyer", "withbm")); err != nil {
t.Errorf("Directory for bitbucket.org/sdboyer/withbm does not exist")
}

wg.Wait()
sdboyer (Member):

Mostly for my curiosity - was this causing an actual problem, or is the change here prophylactic?

The implementation of WriteDepTree() should entail that all of the SyncSourceFor() goroutines finish before WriteDepTree() does. It's possible that they wouldn't if one of the items encountered an error (and it's generally not great test design to rely on internal details of the SUT to resolve concurrency issues established by the test), but I'd definitely like to know if that entailment doesn't hold.

jmank88 (Collaborator, Author):

Only the logging goroutines were causing actual race failures. I fixed this one the easy way when I started chasing these down, but I will revert it.

// this way we get the default GOPATH that was added in 1.8
buildContext := build.Default
wd, err := os.Getwd()
func NewContext(wd string, env []string) (*Ctx, error) {
sdboyer (Member):

I'm glad you went to injecting this without me having to suggest it 😄 You went further than I'd envisioned by passing in a whole env []string; I was imagining just GOPATH string. But that's great - it affords us flexibility for the future, in the event that we have more env vars we want to incorporate into dep.Ctx. Well done 😄 🎉

jmank88 (Collaborator, Author):

Then my unfamiliarity with the code base worked to my advantage, because I started by targeting general parity with the exec.Cmd API already in use, since I didn't know the minimum input requirements 😆
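
For illustration, one way an injected env slice stands in for os.Getenv (getEnv here is a hypothetical helper, not part of the dep API):

func getEnv(env []string, key string) string {
    prefix := key + "="
    for _, kv := range env {
        if strings.HasPrefix(kv, prefix) {
            return strings.TrimPrefix(kv, prefix)
        }
    }
    return ""
}

// Production: NewContext(wd, os.Environ())
// Tests:      NewContext("/tmp/project", []string{"GOPATH=/tmp/gopath"})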


func testIntegration(name, wd string, externalProc bool, run test.RunFunc) func(t *testing.T) {
return func(t *testing.T) {
// Set up environment
sdboyer (Member):

Should probably also t.Parallel() here.
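
i.e. (sketch):

func testIntegration(name, wd string, externalProc bool, run test.RunFunc) func(t *testing.T) {
    return func(t *testing.T) {
        t.Parallel() // run each integration case concurrently with its siblings
        // Set up environment
        // ...
    }
}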

txn_writer.go Outdated
@@ -436,43 +437,43 @@ fail:
return failerr
}

func (sw *SafeWriter) PrintPreparedActions() error {
func (sw *SafeWriter) PrintPreparedActions(stdout *log.Logger) error {
sdboyer (Member):

nit: let's name the local var output instead of stdout. Even if we always (currently) end up attaching stdout here, I don't think we need to suggest that via the var name.

sdboyer (Member) commented May 9, 2017:

I think that some of these calls, like firing off an un-monitored go s.b.SyncSourceFor(dep.Ident) or go sm.DeduceProjectRoot(arg) to pre-fetch, are technically racy since nothing waits for them to complete.

Right, this is what I was trying to address with the comment about sm.Release() - that call should wait for all of them to complete (but currently may not, and if it doesn't it's a bug). It also does more than just wait - if there are any outstanding calls, it should also send a cancelation signal. I'm sure it does that much.

So, as long as the source manager is released, we shouldn't be relying on process termination to end errant goroutines.

Are we saying the same thing?

jmank88 (Collaborator, Author) commented May 9, 2017:

Are we saying the same thing?

Yes I think so, we were both just a bit ambiguous. I expect these tests to be fine, since no remaining cases fail the race detector, and none resemble the fixed cases. The Release process as it stands can short-circuit background work (so it returns an error), but there is no explicit mechanism to block the routine requesting the release all the way until that return takes place. doRelease blocks on sm.suprvsr.wait(), which does bookkeeping and cancels/waits for calls launched through suprvsr.do (and e.g. go SyncSourceFor calls sg.suprvsr.do internally), but the doRelease caller will be unblocked after those internal dos and before SyncSourceFor has fully completed. However, I don't see any internal state problems, since the remaining work would only be relevant if something was waiting for the returned value.
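
A much-simplified model of the supervision being described (illustrative names, not the gps implementation): do() covers only the supervised segment, and wait() cancels and blocks until those segments return, so whatever a caller does after its do() returns is outside wait()'s guarantee.

type supervisor struct {
    ctx    context.Context
    cancel context.CancelFunc
    wg     sync.WaitGroup
}

func newSupervisor() *supervisor {
    ctx, cancel := context.WithCancel(context.Background())
    return &supervisor{ctx: ctx, cancel: cancel}
}

func (s *supervisor) do(fn func(context.Context) error) error {
    s.wg.Add(1)
    defer s.wg.Done()
    return fn(s.ctx) // only this span is covered by wait()
}

func (s *supervisor) wait() {
    s.cancel()  // ask in-flight supervised calls to stop
    s.wg.Wait() // block until the supervised segments return
}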

So basically I think that this PR and general program correctness should be fine. I only suspect technical incorrectness by definition, due to the orphaned top-level routines (w/o side effects; of course, the ones that were logging as well were a problem). Perhaps a separate issue to audit these is in order - some usages look like good candidates for translation into a simpler fork-join model, and others will be removed soon (e.g. as part of the remove command).

sdboyer (Member) commented May 9, 2017:

doRelease blocks on sm.suprvsr.wait(), which does bookkeeping and cancels/waits for calls launched through suprvsr.do (and e.g. go SyncSourceFor calls sg.suprvsr.do internally)...

Ah good, I did do this when I did the latest big source manager refactor; I couldn't remember if I did or not.

some usages look like good candidates for translation into a simpler fork-join model

Ultimately, it's important to keep in mind that these initial goroutines are only spawned as helpers, potentially achieving some up-front parallelism gains on work we happen to know the SourceManager will need to do. I'd be open to an argument for pushing these down into gps directly, either into the solver and/or a new SourceManager method.

That said, what are you envisioning, and what value do you think it would add?

I only suspect technical incorrectness by definition, due to the orphaned top-level routines

Ehh...I mean, yeah, if the rule is that "logic that starts a goroutine is directly responsible for ensuring its appropriate termination." Not a bad rule, though idk if it rises to the level of correctness - it seems like mostly a question of whether it's good design to delegate that responsibility to guarantees provided by SourceManager.Release(). Which...

...but the doRelease caller will be unblocked after those internal dos and before SyncSourceFor has fully completed. However, I don't see any internal state problems, since the remaining work would only be relevant if something was waiting for the returned value.

The old system, prior to the refactor, kept an atomic counter of the number of running processes, which it incremented on ingress into one of the top-level SourceManager methods, and decremented only immediately prior to returning. I opted to remove that in favor of this internal checking, with the goal of adding the cancelation semantics. The OK-ness of that change was generally built on the assumption that any operations that occur between the end of the supervised action and the method's return are safe to lose.

There's logic that, ostensibly, runs counter to this. In three places, we update the cache after the supervised segment ends: in getManifestAndLock(), listPackages(), and revisionPresentIn() (which actually has a totally unsupervised call into the src layer, ugh - that does need to be fixed).

This is clearly not a problem now, as the "cache" is memory-only and ephemeral - but we do plan to move to a persistent caching layer - #431. When that change happens, we could have a situation where the SourceManager releases while writing to disk. The Release() call will return, and the caller will presumably proceed to process termination immediately - but the goroutine attempting to update the cache will still be running. Arguably, this would be a reason for restoring the outermost-boundary call counters and blocking in Release() until they all clear. However, my initial tendency is to follow the principles of crash-only design, and place our focus instead on (correctly) using a (reasonably correct) transactional persistence lib, rather than trying to handle just this one relatively narrow case of graceful shutdown.

sdboyer (Member) commented May 9, 2017:

oh, also, i think we're good to go here 😄

sdboyer merged commit a919470 into golang:master May 9, 2017
jmank88 (Collaborator, Author) commented May 9, 2017:

"logic that starts a goroutine is directly responsible for ensuring its appropriate termination."

That's slightly stricter than I was thinking (in that I'm not worried about ownership, so much as any eventual sync/join back into the main goroutine), but I was flipping it around and observing:
"Any goroutine that doesn't join/sync back with the main routine directly or indirectly either (1) has observable side effects and thus is a race condition, or (2) doesn't have observable side effects, so why execute it at all?".

If nothing is waiting for the returned values from these routines, then why is there work queued or a stack to unwind and value to return in the first place? Practically speaking, likely because the method was already written and available, but can we use a similar method that doesn't queue up any extra work, and only requests what is necessary? Or maybe the async routine can be spawned internally, rather than at the current level. (Hopefully these approaches wouldn't require a messy trade-off like a bunch of parallel helper methods, but I haven't explored)

In any case, some of the current usages seem brittle, since you only have to try and log the error returned from go SyncSourceFor... to introduce the same sort of races fixed here. Maybe I'm over thinking it, and this is something that a few blank identifiers and comments can sufficiently document, or we can just trust the race detector.
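
A sketch of the contrast (SyncSourceFor stands in for any SourceManager call; the surrounding function is illustrative): the fire-and-forget form go sm.SyncSourceFor(id) is only safe while its result is discarded, whereas a fork-join form makes the completion point explicit, so adding logging later cannot outlive the caller.

func syncLocked(sm gps.SourceManager, logger *log.Logger, ids []gps.ProjectIdentifier) {
    var wg sync.WaitGroup
    for _, id := range ids {
        wg.Add(1)
        go func(id gps.ProjectIdentifier) {
            defer wg.Done()
            if err := sm.SyncSourceFor(id); err != nil {
                logger.Printf("sync of %s failed: %v", id.ProjectRoot, err)
            }
        }(id)
    }
    wg.Wait() // join point: any logging above happens before the caller moves on
}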

jmank88 deleted the in_process_tests branch May 9, 2017 18:53
sdboyer (Member) commented May 9, 2017:

(1) has observable side effects and thus is a race condition

I'm not sure this necessarily follows; if the effect is idempotent (as it is in this case), then there's no race. Though perhaps it doesn't apply here simply because the goroutine is indirectly joined back with the main goroutine.

then why is there work queued or a stack to unwind and value to return in the first place?

To initiate work in parallel that would have been serially initiated later anyway (by the solver), with the understanding that the later serial call will pick up the same error as was unwound and returned but discarded here.

Practically speaking, likely because the method was already written and available, but can we use a similar method that doesn't queue up any extra work, and only requests what is necessary?

If we're comparing to another hypothetical method that's still invoked in its own goroutine, but has no return value, then the only additional work that's being queued up is returning an error value through one, maybe two stack frames.

Or maybe the async routine can be spawned internally, rather than at the current level.

They could, yes - such a thing is what I was referring to with "I'd be open to an argument for pushing these down into gps directly, either into the solver and/or a new SourceManager method."

Internally, the mechanism would still be pretty much the same: initiate a bunch of goroutines to do the sync work in the background so that that work is done perhaps a bit sooner (wall time) by the time the main serial thread needs that data. And it would still be implicitly supervised and managed in the same way via Release(). The difference is that it would all be behind the SourceManager interface boundary, rather than crossing it. That's the bit of coupling here that I do agree is uncomfortable.

A possible signature for such a method would be SourceManager.PrefetchLock(gps.Lock).
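
A rough sketch of what that hypothetical method could look like (never part of this PR; the body is illustrative):

// PrefetchLock kicks off background syncs for everything in the lock, keeping
// the goroutines behind the SourceManager boundary; Release() still supervises
// and cancels them.
func (sm *SourceMgr) PrefetchLock(l Lock) {
    for _, lp := range l.Projects() {
        go sm.SyncSourceFor(lp.Ident()) // error discarded; later serial calls pick it up
    }
}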

Maybe I'm over thinking it

I think so 😄

jmank88 (Collaborator, Author) commented May 9, 2017:

if the effect is idempotent (as it is in this case), then there's no race.

I would consider idempotent actions as not-observable by definition in this case, since the observer can't distinguish them.

To initiate work in parallel that would have been serially initiated later anyway (by the solver)

If we're comparing to another hypothetical method that's still invoked in its own goroutine, but has no return value, then the only additional work that's being queued up is returning an error value through one, maybe two stack frames.

Yes, I'm referring strictly to the work done on the orphaned goroutines after doRelease is unblocked. Sure, in some cases this is as simple and trivial as preparing and returning a variable to a routine that immediately finishes (perhaps with a few calls on the stack to unwind first), but other cases looked more complicated (like some locks and async calls to prep the return value), though I didn't dissect them any further.

The larger point is just that they are doing more work than they need to, which strikes me as analogous to something like leaving an unused variable hanging around - it's just something to clean up that has no effect, but otherwise can be semantically distracting or misleading.

sdboyer (Member) commented May 9, 2017:

I'm happy to look at issues/PRs for any particular issues that you identify. My general rubric will be whether the more tailored calls/logic merit both the added code weight and (if applicable) the expanded API surface area that the user has to choose from.

ibrasho pushed a commit to ibrasho-forks/dep that referenced this pull request May 10, 2017