-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow cops to invalidate results cache #7496
Conversation
ca5e454
to
62bf479
Compare
62bf479
to
722c25a
Compare
5674fea
to
5f930a2
Compare
I'm fine with the proposed solution, but I'll defer to @jonas054 to evaluate the implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks very good to me. I just found a couple of small things to complain about. :)
5f930a2
to
e7fa9a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Here's a blog post that goes into more detail about the Rails Engine cops mentioned in the PR description: https://flexport.engineering/isolating-rails-engines-with-rubocop-210feaba3164 When this is merged, I will be able to full upstream those cops. Thanks! |
Thanks! |
# ResultCache system when those external dependencies change, | ||
# ie when the ResultCache should be invalidated. | ||
def external_dependency_checksum | ||
nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run the cop on n
files, this method will be called every time, recomputing the checksum n
times, correct?
Maybe it would be useful for rubocop to have a "run" context that keeps state across files and is accessible from a cop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question -- please see #7543. I believe these results should be cached per team/config, so don't need to be recomputed per inspected file.
…updating schema.rb Fixes rubocop#227. This PR makes `Rails/UniqueValidationWithoutIndex` aware of updating db/schema.rb `Rails/UniqueValidationWithoutIndex` cop needs to know both model and db/schema.rb changes to register an offense. However, with default RuboCop, only changes to the model affect cache behavior. This PR ensures that changes to db/schema.rb affect the cache by overriding the following method: ```ruby # This method should be overridden when a cop's behavior depends # on state that lives outside of these locations: # # (1) the file under inspection # (2) the cop's source code # (3) the config (eg a .rubocop.yml file) # # For example, some cops may want to look at other parts of # the codebase being inspected to find violations. A cop may # use the presence or absence of file `foo.rb` to determine # whether a certain violation exists in `bar.rb`. # # Overriding this method allows the cop to indicate to RuboCop's # ResultCache system when those external dependencies change, # ie when the ResultCache should be invalidated. def external_dependency_checksum nil end ``` https://github.com/rubocop-hq/rubocop/blob/v0.81.0/lib/rubocop/cop/cop.rb#L222-L239 See for more details: rubocop/rubocop#7496
Since its inception in rubocop#7496, this never really worked despite good intentions. Because `ResultCache` instances are per-file, the instance variable supposed to do the heavy lifting just gets discarded. Benchmarking with this change on the RuboCop repo itself yields about 100ms improvements for me with a primed cache: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 1.800 s ± 0.023 s [User: 2.247 s, System: 1.050 s] Range (min … max): 1.774 s … 1.845 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 1.670 s ± 0.014 s [User: 1.999 s, System: 1.040 s] Range (min … max): 1.657 s … 1.696 s 10 runs ``` For Rails, ~3s improvement: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 10.302 s ± 0.173 s [User: 26.179 s, System: 2.307 s] Range (min … max): 10.158 s … 10.743 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 7.574 s ± 0.102 s [User: 17.133 s, System: 1.977 s] Range (min … max): 7.451 s … 7.758 s 10 runs ``` On the GitLab repo containing ~38k files (nice to benchmark against) the gains are even more impressive at 45s (~55% faster!). Lots more files (and cops) means lots more redundant work: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 80.163 s ± 0.944 s [User: 199.783 s, System: 13.855 s] Range (min … max): 79.215 s … 81.796 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 35.413 s ± 0.312 s [User: 93.588 s, System: 9.236 s] Range (min … max): 35.007 s … 35.989 s 10 runs ```
Since its inception in rubocop#7496, this never really worked despite good intentions. Because `ResultCache` instances are per-file, the instance variable supposed to do the heavy lifting just gets discarded. Benchmarking with this change on the RuboCop repo itself yields about 100ms improvements for me with a primed cache: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 1.800 s ± 0.023 s [User: 2.247 s, System: 1.050 s] Range (min … max): 1.774 s … 1.845 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 1.670 s ± 0.014 s [User: 1.999 s, System: 1.040 s] Range (min … max): 1.657 s … 1.696 s 10 runs ``` For Rails, ~3s improvement: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 10.302 s ± 0.173 s [User: 26.179 s, System: 2.307 s] Range (min … max): 10.158 s … 10.743 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 7.574 s ± 0.102 s [User: 17.133 s, System: 1.977 s] Range (min … max): 7.451 s … 7.758 s 10 runs ``` On the GitLab repo containing ~38k files (nice to benchmark against) the gains are even more impressive at 45s (~55% faster!). Lots more files (and cops) means lots more redundant work: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 80.163 s ± 0.944 s [User: 199.783 s, System: 13.855 s] Range (min … max): 79.215 s … 81.796 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 35.413 s ± 0.312 s [User: 93.588 s, System: 9.236 s] Range (min … max): 35.007 s … 35.989 s 10 runs ```
Since its inception in #7496, this never really worked despite good intentions. Because `ResultCache` instances are per-file, the instance variable supposed to do the heavy lifting just gets discarded. Benchmarking with this change on the RuboCop repo itself yields about 100ms improvements for me with a primed cache: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 1.800 s ± 0.023 s [User: 2.247 s, System: 1.050 s] Range (min … max): 1.774 s … 1.845 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 1.670 s ± 0.014 s [User: 1.999 s, System: 1.040 s] Range (min … max): 1.657 s … 1.696 s 10 runs ``` For Rails, ~3s improvement: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 10.302 s ± 0.173 s [User: 26.179 s, System: 2.307 s] Range (min … max): 10.158 s … 10.743 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 7.574 s ± 0.102 s [User: 17.133 s, System: 1.977 s] Range (min … max): 7.451 s … 7.758 s 10 runs ``` On the GitLab repo containing ~38k files (nice to benchmark against) the gains are even more impressive at 45s (~55% faster!). Lots more files (and cops) means lots more redundant work: ``` old: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 80.163 s ± 0.944 s [User: 199.783 s, System: 13.855 s] Range (min … max): 79.215 s … 81.796 s 10 runs new: $ hyperfine -w 3 "bundle exec rubocop" Benchmark 1: bundle exec rubocop Time (mean ± σ): 35.413 s ± 0.312 s [User: 93.588 s, System: 9.236 s] Range (min … max): 35.007 s … 35.989 s 10 runs ```
Internally at Flexport, we've written a few cops to enforce Rails Engine isolation. They've helped us incrementally modularize our system, and we think they might be valuable to others too. I'm working to open source them now.
For example,
GlobalModelAccessFromEngine
(flexport/rubocop-flexport#5) forbids code within Rails Engines from directly accessing global models in the mainapp/
directory. Another example isEngineApiBoundary
(flexport/rubocop-flexport#6).(Note: we considered upstream these to
rubocop-rails
, and @koic suggested we create our gem instead: rubocop/rubocop-rails#152 (comment))These cops read from the filesystem, taking a holistic view of our codebase while they inspect. Unfortunately, this approach does not play nice with the RuboCop results cache. As a workaround, we created a custom
lint.rb
that wrapsrubocop
and busts the cache when needed using the--cache
param. It works ok.But it's not ideal, especially for broader use. This PR aims to fix the issue at its source by allowing cops themselves to invalidate the cache.
Example
Consider the following example:
(1) A filesystem contains these files:
(2) We run
rubocop engines/my_engine/app/services/my_engine/my_service.rb
and find a violation --service.rb
containsMyModel.find(123)
.(2b) During that run, a cached result is stored, keyed by: the inspected source code, the RuboCop config, command-line options, and executable version.
(3) We run the same command again. Cache is hit, same violation shown. All good.
(4) We move the model file into the engine. So now
app/models/my_model.rb
no longer exists and we haveengines/my_engine/app/models/my_engine/my_model.rb
.(5) We run the same RuboCop command again. Because none of the cache key inputs changed, the same violation is shown again. This is incorrect.
(6) We run the same command again with
--cache false
and see that the violation no longer exists. This is correct.Implementation overview
This PR allows cops to define an
external_dependency_checksum
method that busts the cache when their external dependencies change. Cops are responsible for computing their own checksum however they deem appropriate.Among existing cops, I believe there are no use cases for this method. For the cops we've written, there are two types of external dependencies: (1) the presence or absence of certain files and/or (2) the contents of certain files.
Discussion
I recognize that this is perhaps an unusual use case for RuboCop. As an alternative, we could create our own repo with these cache-unfriendly cops and package them with a custom runner script that does cache busting inside the script. But that seems suboptimal. And I suspect other cops may make use of this feature in the future as well.
Feedback welcome! Thanks!
Before submitting the PR make sure the following are checked:
[Fix #issue-number]
(if the related issue exists).master
(if not - rebase it).bundle exec rake default
. It executes all tests and RuboCop for itself, and generates the documentation.