Commit
Signed-off-by: Jess Frazelle <[email protected]>
Showing 1 changed file with 73 additions and 0 deletions.
@@ -0,0 +1,73 @@
# RawProc Option

## Background

Docker and most other container runtimes currently mask certain paths in
`/proc` and set others read-only. This prevents data that should stay hidden
from being exposed inside a container. However, some use cases require
turning this behavior off.
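
To make the masking behavior concrete, here is a minimal Go sketch. The path lists are illustrative only: they resemble the defaults shipped by Docker-style runtimes, but the exact set varies by runtime and version. The `procRestrictions` helper and its `rawProc` parameter are hypothetical names used for this example and are not part of any runtime's API.

```go
package main

import "fmt"

// Illustrative defaults only (an assumption): real runtimes ship their own
// lists, and those lists change between versions.
var defaultMaskedPaths = []string{
	"/proc/kcore",
	"/proc/latency_stats",
	"/proc/timer_list",
	"/proc/timer_stats",
	"/proc/sched_debug",
}

var defaultReadonlyPaths = []string{
	"/proc/asound",
	"/proc/bus",
	"/proc/fs",
	"/proc/irq",
	"/proc/sys",
	"/proc/sysrq-trigger",
}

// procRestrictions returns the paths a runtime would mask and the paths it
// would mount read-only. With rawProc set (hypothetical switch), neither
// restriction is applied.
func procRestrictions(rawProc bool) (masked, readonly []string) {
	if rawProc {
		return nil, nil
	}
	return defaultMaskedPaths, defaultReadonlyPaths
}

func main() {
	masked, readonly := procRestrictions(false)
	fmt.Println("masked:", masked)
	fmt.Println("read-only:", readonly)

	masked, readonly = procRestrictions(true)
	fmt.Println("restrictions with raw proc:", len(masked)+len(readonly))
}
```

With the switch off, the container sees the usual restricted `/proc`; with it on, the runtime applies no masking or read-only mounts.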

## Motivation

End users who want to run unprivileged containers that use user namespaces
_nested inside_ CRI containers need an option to have a `RawProc`. That is,
an option that explicitly turns off the masking and read-only handling of
these paths so that `/proc` can be mounted in the nested container as an
unprivileged user.

Please see the following issues and pull requests for more information:
- [opencontainers/runc#1658](https://github.com/opencontainers/runc/issues/1658#issuecomment-373122073)
- [moby/moby#36597](https://github.com/moby/moby/issues/36597)
- [moby/moby#36644](https://github.com/moby/moby/pull/36644)

Please also see the [use case for building images securely in Kubernetes](https://github.com/jessfraz/blog/blob/master/content/post/building-container-images-securely-on-kubernetes.md).

This option only makes sense when a user is nesting unprivileged containers
with user namespaces, because it exposes more information than is necessary
to the program running in the container spawned by Kubernetes.

The main use case for this option is to run
[genuinetools/img](https://github.com/genuinetools/img) inside a Kubernetes
container. That program then launches sub-containers that take advantage of
user namespaces, re-mask `/proc`, and set `/proc` as read-only. There is
therefore no concern with having a raw proc open in the top-level container.

Since the only use case for this option is to run unprivileged nested
containers, it should only be allowed if the user in the container is not
`root`. Because the user inside is still unprivileged, modifying `/proc` is
off limits regardless; the existing Linux user support already prevents it.

## Existing SecurityContext objects

Kubernetes defines `SecurityContext` for `Container` and `PodSecurityContext`
for `PodSpec`. `SecurityContext` objects define the security options for
Kubernetes containers, e.g. SELinux options.

To support the `rawProc` option in Kubernetes, the following changes are
proposed:

## Changes of SecurityContext objects

Add a new `bool` field named `rawProc` to the `SecurityContext` definition.

By default, `rawProc` is `false`.

The API will reject the combination of `rawProc=true` and `user=0/root` as
invalid, since `rawProc` only makes sense if you want to nest unprivileged
user namespaces.

This means that no root user can exploit the unmasked, read-write paths in
`/proc`; protection relies on the already implemented Linux user support.
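
The field and the validation rule described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Kubernetes API code: the `SecurityContext` type here is a trimmed stand-in, `validateRawProc` is a hypothetical helper, and rejecting an unset `RunAsUser` is an extra assumption made because the image's default user might be root.

```go
package main

import (
	"errors"
	"fmt"
)

// SecurityContext is a trimmed, hypothetical stand-in for the Kubernetes
// type; only the fields relevant to this sketch are included.
type SecurityContext struct {
	RunAsUser *int64 // nil means the image's default user is used
	RawProc   *bool  // proposed field; nil is treated as false
}

// validateRawProc rejects rawProc=true for root. Rejecting an unset
// RunAsUser is an assumption of this sketch, not part of the proposal.
func validateRawProc(sc *SecurityContext) error {
	if sc == nil || sc.RawProc == nil || !*sc.RawProc {
		return nil // rawProc defaults to false, nothing to check
	}
	if sc.RunAsUser == nil {
		return errors.New("rawProc=true requires an explicit non-root runAsUser")
	}
	if *sc.RunAsUser == 0 {
		return errors.New("rawProc=true is not allowed with runAsUser=0 (root)")
	}
	return nil
}

func main() {
	rawProc := true
	root, unprivileged := int64(0), int64(1000)

	fmt.Println(validateRawProc(&SecurityContext{RawProc: &rawProc, RunAsUser: &root}))
	fmt.Println(validateRawProc(&SecurityContext{RawProc: &rawProc, RunAsUser: &unprivileged}))
}
```

Using a pointer for `RawProc` keeps "unset" distinguishable from an explicit `false`, which matches how other optional `SecurityContext` fields are modeled.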

This requires changes to the CRI runtime integrations so that the kubelet
adds the corresponding runtime option (`raw_access`, or whatever name is
ultimately chosen).

## Pod Security Policy changes

A new `bool` field named `allowRawProc` will also be added to the Pod
Security Policy to gate whether a user is allowed to set `rawProc=true` in
the security context. This field will default to `false`.
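
A minimal sketch of how such a gate might be evaluated at admission time is shown below. The types are trimmed, hypothetical stand-ins, and `admitRawProc` is an illustrative helper rather than an existing Kubernetes function.

```go
package main

import (
	"errors"
	"fmt"
)

// PodSecurityPolicySpec is a hypothetical, trimmed stand-in that carries
// only the proposed field.
type PodSecurityPolicySpec struct {
	AllowRawProc bool // proposed field; the zero value (false) is the default
}

// SecurityContext is a hypothetical, trimmed stand-in for the container
// security context.
type SecurityContext struct {
	RawProc *bool // proposed field; nil is treated as false
}

// admitRawProc returns an error when a container asks for rawProc=true but
// the policy does not allow it.
func admitRawProc(psp PodSecurityPolicySpec, sc *SecurityContext) error {
	wantsRawProc := sc != nil && sc.RawProc != nil && *sc.RawProc
	if wantsRawProc && !psp.AllowRawProc {
		return errors.New("rawProc=true is not allowed by the pod security policy")
	}
	return nil
}

func main() {
	rawProc := true
	sc := &SecurityContext{RawProc: &rawProc}

	fmt.Println(admitRawProc(PodSecurityPolicySpec{}, sc))
	fmt.Println(admitRawProc(PodSecurityPolicySpec{AllowRawProc: true}, sc))
}
```

Because `AllowRawProc` defaults to `false`, existing policies would keep rejecting `rawProc=true` until an administrator opts in.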