Allow cops to invalidate results cache #7496

maxh · 2019-11-11T02:58:54Z

Internally at Flexport, we've written a few cops to enforce Rails Engine isolation. They've helped us incrementally modularize our system, and we think they might be valuable to others too. I'm working to open source them now.

For example, GlobalModelAccessFromEngine (flexport/rubocop-flexport#5) forbids code within Rails Engines from directly accessing global models in the main app/ directory. Another example is EngineApiBoundary (flexport/rubocop-flexport#6).

(Note: we considered upstreaming these to rubocop-rails, and @koic suggested we create our own gem instead: rubocop/rubocop-rails#152 (comment))

These cops read from the filesystem, taking a holistic view of our codebase while they inspect. Unfortunately, this approach does not play nice with the RuboCop results cache. As a workaround, we created a custom lint.rb that wraps rubocop and busts the cache when needed using the --cache param. It works ok.

But it's not ideal, especially for broader use. This PR aims to fix the issue at its source by allowing cops themselves to invalidate the cache.

Example

Consider the following example:

(1) A filesystem contains these files:

app/models/my_model.rb
engines/my_engine/app/services/my_engine/my_service.rb

(2) We run rubocop engines/my_engine/app/services/my_engine/my_service.rb and find a violation -- my_service.rb contains MyModel.find(123).

(2b) During that run, a cached result is stored, keyed by: the inspected source code, the RuboCop config, command-line options, and executable version.

(3) We run the same command again. Cache is hit, same violation shown. All good.

(4) We move the model file into the engine. So now app/models/my_model.rb no longer exists and we have engines/my_engine/app/models/my_engine/my_model.rb.

(5) We run the same RuboCop command again. Because none of the cache key inputs changed, the same violation is shown again. This is incorrect.

(6) We run the same command again with --cache false and see that the violation no longer exists. This is correct.
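The stale result in step (5) follows from what goes into the cache key. Below is a minimal sketch, assuming a simplified key built only from the four inputs listed in (2b). The `cache_key` helper, the config string, and the version string are hypothetical placeholders, not RuboCop's actual implementation:

```ruby
require 'digest'

# Hypothetical, simplified cache key built from the four inputs listed in (2b).
# RuboCop's real ResultCache is more involved; this only illustrates the problem.
def cache_key(source:, config:, options:, version:)
  Digest::SHA1.hexdigest([source, config, options.sort.join(' '), version].join("\0"))
end

inputs = {
  source: "MyModel.find(123)\n", # contents of my_service.rb
  config: 'AllCops: {}',         # placeholder config
  options: [],                   # no command-line options
  version: '0.0.0'               # placeholder executable version
}

key_before_move = cache_key(**inputs)
# Moving app/models/my_model.rb into the engine changes none of these inputs,
# so the key (and therefore the cached violation) stays the same.
key_after_move = cache_key(**inputs)

puts key_before_move == key_after_move
```

Nothing about the location of my_model.rb feeds into the key, which is why the cached violation keeps being reported until the cache is bypassed.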

Implementation overview

This PR allows cops to define an external_dependency_checksum method that busts the cache when their external dependencies change. Cops are responsible for computing their own checksum however they deem appropriate.

Among existing cops, I believe there are no use cases for this method. For the cops we've written, there are two types of external dependencies: (1) the presence or absence of certain files and/or (2) the contents of certain files.
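As a rough illustration of the two dependency types, here is a sketch of how a cop might compute such a checksum. It is written as a plain Ruby class so that it runs on its own; in a real cop the method would live on a class inheriting from RuboCop's cop base class. The class name, the `root` argument, and the `engines.yml` file are assumptions made for the example:

```ruby
require 'digest'

# Hypothetical sketch: the class name, root argument, and paths below are
# assumptions for illustration, not part of RuboCop or the Flexport cops.
class GlobalModelAccessSketch
  def initialize(root)
    @root = root
  end

  # Returns a String that changes whenever the external dependencies change.
  def external_dependency_checksum
    Digest::SHA1.hexdigest([presence_checksum, contents_checksum].join)
  end

  private

  # (1) Presence or absence of files: hash the sorted list of model paths.
  def presence_checksum
    paths = Dir.glob(File.join(@root, 'app/models/**/*.rb')).sort
    Digest::SHA1.hexdigest(paths.join("\n"))
  end

  # (2) Contents of a file: hash the file itself, or a marker if it is missing.
  def contents_checksum
    path = File.join(@root, 'engines.yml') # hypothetical file the cop reads
    File.exist?(path) ? Digest::SHA1.file(path).hexdigest : 'missing'
  end
end
```

With a checksum like this, moving app/models/my_model.rb into an engine changes the list of matching paths, so the checksum changes and the cached result is no longer reused.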

Discussion

I recognize that this is perhaps an unusual use case for RuboCop. As an alternative, we could create our own repo with these cache-unfriendly cops and package them with a custom runner script that does cache busting inside the script. But that seems suboptimal. And I suspect other cops may make use of this feature in the future as well.

Feedback welcome! Thanks!

Before submitting the PR make sure the following are checked:

Wrote good commit messages.
Commit message starts with [Fix #issue-number] (if the related issue exists).
Feature branch is up-to-date with master (if not - rebase it).
Squashed related commits together.
Added tests.
Added an entry to the Changelog if the new code introduces user-observable changes. See changelog entry format.
The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
Run bundle exec rake default. It executes all tests and RuboCop for itself, and generates the documentation.

bbatsov · 2019-11-14T20:44:10Z

I'm fine with the proposed solution, but I'll defer to @jonas054 to evaluate the implementation.

lib/rubocop/cop/cop.rb

jonas054

Everything looks very good to me. I just found a couple of small things to complain about. :)

lib/rubocop/runner.rb

jonas054

👍

maxh · 2019-11-22T20:20:37Z

Here's a blog post that goes into more detail about the Rails Engine cops mentioned in the PR description:

https://flexport.engineering/isolating-rails-engines-with-rubocop-210feaba3164

When this is merged, I will be able to fully upstream those cops. Thanks!

bbatsov · 2019-11-23T08:06:58Z

Thanks!

exterm · 2019-11-25T22:46:58Z

lib/rubocop/cop/cop.rb

+      # ResultCache system when those external dependencies change,
+      # ie when the ResultCache should be invalidated.
+      def external_dependency_checksum
+        nil


If you run the cop on n files, this method will be called every time, recomputing the checksum n times, correct?

Maybe it would be useful for rubocop to have a "run" context that keeps state across files and is accessible from a cop.

maxh · 2019-12-02

Good question -- please see #7543. I believe these results should be cached per team/config, so they don't need to be recomputed per inspected file.
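To make the per-team idea concrete, here is a small sketch in which the cops' checksums are combined once and the result is reused for every inspected file. `CountingCop` and `TeamSketch` are hypothetical stand-ins written for this example, not RuboCop's own classes:

```ruby
require 'digest'

# Hypothetical stand-in for a cop; it counts how often its checksum is requested.
class CountingCop
  attr_reader :calls

  def initialize(checksum)
    @checksum = checksum
    @calls = 0
  end

  def external_dependency_checksum
    @calls += 1
    @checksum
  end
end

# Hypothetical stand-in for a team of cops sharing one config.
class TeamSketch
  def initialize(cops)
    @cops = cops
  end

  # Combine the cops' checksums once and reuse the result for every file.
  # Cops that return nil (the default) contribute nothing.
  def external_dependency_checksum
    @external_dependency_checksum ||= begin
      keys = @cops.map(&:external_dependency_checksum).compact
      Digest::SHA1.hexdigest(keys.join)
    end
  end
end

cops = [CountingCop.new('abc'), CountingCop.new(nil)]
team = TeamSketch.new(cops)
3.times { team.external_dependency_checksum } # e.g. three inspected files
puts cops.map(&:calls).inspect # prints [1, 1]
```

The memoization only helps as long as the same team object is reused across the inspected files.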

…updating schema.rb

Fixes rubocop#227.

This PR makes `Rails/UniqueValidationWithoutIndex` aware of updating db/schema.rb.

`Rails/UniqueValidationWithoutIndex` cop needs to know both model and db/schema.rb changes to register an offense. However, with default RuboCop, only changes to the model affect cache behavior. This PR ensures that changes to db/schema.rb affect the cache by overriding the following method:

```ruby
# This method should be overridden when a cop's behavior depends
# on state that lives outside of these locations:
#
#   (1) the file under inspection
#   (2) the cop's source code
#   (3) the config (eg a .rubocop.yml file)
#
# For example, some cops may want to look at other parts of
# the codebase being inspected to find violations. A cop may
# use the presence or absence of file `foo.rb` to determine
# whether a certain violation exists in `bar.rb`.
#
# Overriding this method allows the cop to indicate to RuboCop's
# ResultCache system when those external dependencies change,
# ie when the ResultCache should be invalidated.
def external_dependency_checksum
  nil
end
```

https://github.com/rubocop-hq/rubocop/blob/v0.81.0/lib/rubocop/cop/cop.rb#L222-L239

See for more details: rubocop/rubocop#7496

Since its inception in rubocop#7496, this never really worked despite good intentions. Because `ResultCache` instances are per-file, the instance variable supposed to do the heavy lifting just gets discarded.

Benchmarking with this change on the RuboCop repo itself yields about 100ms improvements for me with a primed cache:

```
old:
$ hyperfine -w 3 "bundle exec rubocop"
Benchmark 1: bundle exec rubocop
  Time (mean ± σ):      1.800 s ±  0.023 s    [User: 2.247 s, System: 1.050 s]
  Range (min … max):    1.774 s …  1.845 s    10 runs

new:
$ hyperfine -w 3 "bundle exec rubocop"
Benchmark 1: bundle exec rubocop
  Time (mean ± σ):      1.670 s ±  0.014 s    [User: 1.999 s, System: 1.040 s]
  Range (min … max):    1.657 s …  1.696 s    10 runs
```

For Rails, ~3s improvement:

```
old:
$ hyperfine -w 3 "bundle exec rubocop"
Benchmark 1: bundle exec rubocop
  Time (mean ± σ):     10.302 s ±  0.173 s    [User: 26.179 s, System: 2.307 s]
  Range (min … max):   10.158 s … 10.743 s    10 runs

new:
$ hyperfine -w 3 "bundle exec rubocop"
Benchmark 1: bundle exec rubocop
  Time (mean ± σ):      7.574 s ±  0.102 s    [User: 17.133 s, System: 1.977 s]
  Range (min … max):    7.451 s …  7.758 s    10 runs
```

On the GitLab repo containing ~38k files (nice to benchmark against) the gains are even more impressive at 45s (~55% faster!). Lots more files (and cops) means lots more redundant work:

```
old:
$ hyperfine -w 3 "bundle exec rubocop"
Benchmark 1: bundle exec rubocop
  Time (mean ± σ):     80.163 s ±  0.944 s    [User: 199.783 s, System: 13.855 s]
  Range (min … max):   79.215 s … 81.796 s    10 runs

new:
$ hyperfine -w 3 "bundle exec rubocop"
Benchmark 1: bundle exec rubocop
  Time (mean ± σ):     35.413 s ±  0.312 s    [User: 93.588 s, System: 9.236 s]
  Range (min … max):   35.007 s … 35.989 s    10 runs
```
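The pitfall described above can be shown in isolation. This is a sketch under stated assumptions: `PerFileCacheSketch` and its methods are hypothetical names, and the class-level hash is just one way to share the value across per-file instances, not necessarily how the referenced change does it:

```ruby
# Hypothetical sketch; class and method names are assumptions.
class PerFileCacheSketch
  @computations = 0
  @shared = {}

  class << self
    attr_accessor :computations
    attr_reader :shared
  end

  def initialize(team_id)
    @team_id = team_id
  end

  # Stands in for the expensive work of combining the cops' checksums.
  def expensive_checksum
    self.class.computations += 1
    "checksum-for-#{@team_id}"
  end

  # Memoized on the instance: lost as soon as the per-file instance is dropped.
  def instance_memoized
    @checksum ||= expensive_checksum
  end

  # Memoized on the class, keyed by team: survives across per-file instances.
  def class_memoized
    self.class.shared[@team_id] ||= expensive_checksum
  end
end

files = %w[a.rb b.rb c.rb]

PerFileCacheSketch.computations = 0
files.each { PerFileCacheSketch.new(:team).instance_memoized }
instance_level = PerFileCacheSketch.computations

PerFileCacheSketch.computations = 0
files.each { PerFileCacheSketch.new(:team).class_memoized }
class_level = PerFileCacheSketch.computations

puts "instance-level: #{instance_level}, class-level: #{class_level}"
# prints "instance-level: 3, class-level: 1"
```

With one instance per file, instance-level memoization recomputes the value for every file, which matches the redundant work that grows with the number of files in the benchmarks.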

maxh force-pushed the maxh/enable-cache-key branch from ca5e454 to 62bf479 Compare November 11, 2019 03:04

maxh mentioned this pull request Nov 11, 2019

Add new Rails/EngineGlobalModelAccess cop rubocop/rubocop-rails#151

Closed

8 tasks

maxh force-pushed the maxh/enable-cache-key branch from 62bf479 to 722c25a Compare November 11, 2019 14:34

maxh mentioned this pull request Nov 11, 2019

Add new Rails/EngineApiViolation cop rubocop/rubocop-rails#153

Closed

8 tasks

maxh force-pushed the maxh/enable-cache-key branch 2 times, most recently from 5674fea to 5f930a2 Compare November 12, 2019 15:04

bbatsov requested a review from jonas054 November 14, 2019 20:44

jonas054 reviewed Nov 15, 2019

lib/rubocop/cop/cop.rb (outdated)

jonas054 requested changes Nov 16, 2019

lib/rubocop/runner.rb (outdated)

Allow cops to invalidate results cache

e7fa9a9

maxh force-pushed the maxh/enable-cache-key branch from 5f930a2 to e7fa9a9 Compare November 16, 2019 17:42

maxh requested a review from jonas054 November 16, 2019 17:55

jonas054 approved these changes Nov 16, 2019

This was referenced Nov 19, 2019

Add GlobalModelAccessFromEngine cop flexport/rubocop-flexport#3

Closed

Add GlobalModelAccessFromEngine cop flexport/rubocop-flexport#5

Merged

Add EngineApiBoundary cop flexport/rubocop-flexport#6

Merged

bbatsov merged commit 5654998 into rubocop:master Nov 23, 2019

exterm reviewed Nov 25, 2019

koic added a commit that referenced this pull request Nov 28, 2019

Fix a typo

b21e8f2

Follow up of #7496.

maxh mentioned this pull request Dec 2, 2019

Add comments explaining external dependency checksum caching #7543

Merged

8 tasks

maxh deleted the maxh/enable-cache-key branch December 5, 2019 22:53

koic mentioned this pull request Apr 9, 2020

[Fix #227] Make Rails/UniqueValidationWithoutIndex aware of updating schema.rb rubocop/rubocop-rails#229

Merged

8 tasks

Earlopain mentioned this pull request Aug 29, 2024

Properly cache team checksums #13169

Merged

8 tasks
