Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: move rust tests out of vms onto dedicated server #1384

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JakeHillion
Copy link
Contributor

Move Rust tests out of VMs as they don't need to run with a specific kernel, and should run on any dev's machine. Nix'ify the build process and move them to the large machine.

Specific changes:

  • Move rust-tests to the big self-hosted machine when in the sched-ext org.
  • Nix'ify the dependencies. This uses a fixed version of cargo, rustc and clippy rather than the moving target of rust-toolchain.toml, but it is still stable.
  • Remove the dependency on build-kernel so the tests start sooner.
  • Add a new package.metadata.scx section to describe whether to run clippy, moving this out of the GitHub actions spec and into each package's spec.
  • Run for the entire workspace instead of a set of named packages. This means we can require rust-tests in the CI instead of requiring each individual package's matrix entry, reducing errors. It also significantly reduces duplicated work.

This runs much faster in repo and avoids repeated work. It takes longer out of repo on the smaller GitHub runners, but still takes far less machine time which seems like a fair compromise.

If we do require a specific kernel for cargo tests in the future, we should either implement this in the test or add a Cargo package.metadata field to opt into that behaviour where necessary, given how much slower tests in a VM run.

Test plan:

  • This PR's CI.
  • Ran in my fork with the usual GitHub Actions runners. It works.

Move Rust tests out of VMs as they don't need to run with a specific kernel,
and should run on any dev's machine. Nix'ify the build process and move them to
the large machine.

Specific changes:
- Move `rust-tests` to the big self-hosted machine when in the `sched-ext` org.
- Nix'ify the dependencies. This uses a fixed version of `cargo`, `rustc` and
  `clippy` rather than the moving target of `rust-toolchain.toml`, but it is
  still `stable`.
- Remove the dependency on `build-kernel` so the tests start sooner.
- Add a new `package.metadata.scx` section to describe whether to run clippy,
  moving this out of the GitHub actions spec and into each package's spec.
- Run for the entire workspace instead of a set of named packages. This means
  we can require `rust-tests` in the CI instead of requiring each individual
  package's matrix entry, reducing errors. It also significantly reduces
  duplicated work.

This runs much faster in repo and avoids repeated work. It takes longer out of
repo on the smaller GitHub runners, but still takes far less machine time which
seems like a fair compromise.

If we do require a specific kernel for cargo tests in the future, we should
either implement this in the test or add a Cargo `package.metadata` field to opt
into that behaviour where necessary, given how much slower tests in a VM run.

Test plan:
- This PR's CI.
- Ran in my fork with the usual GitHub Actions runners. It works.
@likewhatevs
Copy link
Contributor

This move is in the wrong direction IMO.

It presumes the following:

  • Perfect backwards compatibility (tests will never need a VM to be ran). This sets us up for a self-fulling prophecy (tests w/o such compatibility will never be wrote because it makes the bar for writing them dealing w/ nix and re-adding a virtualized test env). This is, IMO, bad.

It removes the following:

  • Actual reproducibility (like, the kind of reproducibility that actually matters, which nix does not actually provide) -- the ability for someone to copy-paste a couple lines of shell script and run exactly what is ran on CI for the purposes of getting kernel regressions fixed.

These aren't problems for some folks (who know how to like, work with this stuff from scratch, kinda), but I guess my point is that all this complex nix magic is setting the starting point for folks who want to contribute/get things done to "here's some code figure it out" from "yeah just run what CI runs/it's all standard/basic shell commands on ubuntu etc.".

@likewhatevs likewhatevs requested a review from htejun February 19, 2025 18:38
@htejun
Copy link
Contributor

htejun commented Feb 19, 2025

We had kernel API breakages in the past will have them in the future too. They shouldn't be frequent but they will be there. Also, being able to target specific kernel versions with CI is great - we also want to use CI to verify kernel side after all. We can't really do perf testing in VMs but keeping correctness testing in VMs makes sense to me.

@JakeHillion
Copy link
Contributor Author

JakeHillion commented Feb 20, 2025

I disagree here but I think the easiest way to show this works is to stack some PRs on top of it demonstrating it. Will follow up with a more concrete stack showing how this can work without compromises (leaving this as a draft for now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants