-
Notifications
You must be signed in to change notification settings - Fork 1k
Warn on use of abbreviated sha1 commit hash #582
Conversation
Update manifest validation to warn when an abbreviated version of a sha1 commit hash is used. Add test for warn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet! Thanks for picking this up.
manifest.go
Outdated
// valid key | ||
case "revision": | ||
// Check if sha1 hash is abbreviated | ||
if valueStr, ok := value.(string); ok && len(valueStr) < 40 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that revisions are only hg/git (which are both commit hashes) but dep also supports other version control systems, such as bazaar. Let's add a test with bazaar revision id's as well to make sure that those won't be affected by this new validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know if it's the bazaar revision-id
or revno
that's used as the revision
in the manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure it's a revision-id, see https://github.com/golang/dep/blob/master/cmd/dep/ensure_test.go#L24 for what we are validating against in our tests. I think that's a revision-id, yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant! thanks for incredibly speedy reply
Update manifest validation to accept a valid bzr revision id. Update git abbreviated hash check to use a regex instead of len check. Add tests for passing and failing bzr revision.
811717f
to
57a79c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better to me but I worry that I'm missing something around why the code didn't originally use regex. 😀
@jessfraz Was there a reason why you didn't originally use regex for validating a bazaar revision id that we should watch out for here?
manifest.go
Outdated
case "revision": | ||
if valueStr, ok := value.(string); ok { | ||
// Check if sha1 hash is abbreviated | ||
validGit := regexp.MustCompile("[a-f0-9]{40}").MatchString(valueStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend renaming this variable to validHash
instead, since it could be from git or hg
Yeah absolutely! I was looking that and wondering |
@carolynvs the validation logic we use there is originally inherited from me - @jessfraz is in the blame b/c she discovered that i was totally wrong about the final component of the revid being hex-encoded 😄 No, there's no reason we don't use a regex. Before we go down that path, though...maybe it'd be best to look through the bzr codebase to find where they validate the shape of these identifiers, and mimic that directly? |
this is how bzr generate new I'm not able to find the code that validate the format of |
Also, I would recommend storing the result of |
manifest.go
Outdated
case "revision": | ||
if valueStr, ok := value.(string); ok { | ||
// Check if sha1 hash is abbreviated | ||
validHash := regexp.MustCompile("[a-f0-9]{40}").MatchString(valueStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's prefer to use encoding/hex.DecodeString()
here - faster than compiling a regexp, and also just accurately reflects the data.
Update the regex to match that from the bzr revision-id generation tests and reuse it's compiled form. Update to match sha1 revision hash on hex string and length rather than regex. Add a test for an even length, abbreviated, sha1 revision hash.
Use raw string with backticks to avoid having to double escape inside a regex.
OK, my apologies, but I'm just actually thinking this all through now properly, and I've realized that we're kinda going about this the wrong way. 😦 First, we've disregarded svn here in a way that might be problematic. svn revision numbers will eventually be able to go into this field, then the algorithm as-written will mistake those numbers for abbreviated hex-encoded SHA1s. So, that's no good. We need to narrow the scope of our search to a particular, expected form, rather than trying to validate the field more broadly. To that end, we actually need to be looking for just one specific pattern - a 7-char long (git's abbrev length) or 12-char long (hg's abbrev length, I believe) string containing only [0-9a-f], with at least one character being in Being that we've already done the work to figure out how to validate bzr revs, though, let's not throw that work away. Let's open up a separate PR in which we apply it to |
Ok, I agree this seems wrong and / or problematic I am not super familiar with the code base, and I don't know if it's even possible, but it feels to me that it would be better to know which vcs we were using at the point at which we validate the revision, and ensure that we are using the correct validation rules for that vcs. Else you are right, it does feel a little like blindly guessing at what's acceptable. |
Yes, this definitely would make it easier. Getting that information, though, isn't zero-cost:
Fortunately, we can make these changes now, then improve them down the line - say, once we have #431 and can reliably avoid network access to get vcs type. (Note that we actually probably could do this all locally with the information we cache now, but because , I think it may be better to not try to tackle this until after #431) |
Ok nice! Thanks for the explanation. So plan going forward? Closing this and opening another with changes to match the specific cases you laid out above? Should that logic live in validateManifest or else where? |
@sdboyer I'm not sure about this part:
In Unless I misunderstood your point... |
Sure 😄 I'd say this PR should implement the character-counting solution I outlined; if it's easier for you to move that to a separate PR, though, we can close this and do that.
Yep, that's still the right place.
Right - assuming a perfectly random distribution, it would happen (10/16)^7 = 3.7% of the time. My thinking here is that it's worse to have false positives than miss a small percentage of cases, especially when this is a relative rare situation we're guarding against, anyway. |
@zknill have you decided? would you rather close this and start fresh, or just push up revised commits to this PR? |
just making the changes, I will close and push to another PR. Should be easier to review etc. |
Closing in favour of #671 |
Update manifest validation to warn when an abbreviated version of a
sha1 commit hash is used.
Add test for warn.
Fixes #567