Capabilities and Flags
From Linux-VServer
Latest revision as of 18:29, 22 July 2016
In computer science, a capability is a token used by a process to prove that it is allowed to perform an operation on an object. The Linux Capability System is based on "POSIX Capabilities", a somewhat different concept, designed to split up the all-powerful root privilege into a set of distinct privileges.
The Capability/Flag System
POSIX Capabilities
A process has three bitmaps, called the inheritable (I), permitted (P), and effective (E) capability sets. Each capability is implemented as a bit in each of these bitmaps that is either set or unset.
When a process tries to do a privileged operation, the operating system will check the appropriate bit in the effective set of the process (instead of checking whether the effective uid of the process is 0 as is normally done).
For example, when a process tries to set the clock, the Linux kernel will check that the process has the CAP_SYS_TIME bit (which is currently bit 25) set in its effective set.
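As a minimal illustration of the bit test described above, the following Python sketch models the effective set as a plain integer. The helper name `has_cap` is hypothetical; the real check happens inside the kernel and is not shown here.

```python
# Bit number of CAP_SYS_TIME as given in the text above.
CAP_SYS_TIME = 25


def has_cap(effective: int, cap_bit: int) -> bool:
    """Return True if bit `cap_bit` is set in the `effective` bitmap."""
    return bool(effective & (1 << cap_bit))


# A process whose effective set contains only CAP_SYS_TIME.
effective = 1 << CAP_SYS_TIME
print(hex(effective))                    # 0x2000000
print(has_cap(effective, CAP_SYS_TIME))  # True
print(has_cap(0, CAP_SYS_TIME))          # False
```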
The permitted set of the process indicates the capabilities the process can use. The process can have capabilities set in the permitted set that are not in the effective set.
This indicates that the process has temporarily disabled this capability. A process is allowed to set a bit in its effective set only if it is available in the permitted set. The distinction between effective and permitted exists so that processes can "bracket" operations that need privilege.
The inheritable capabilities are the capabilities of the current process that should be inherited by a program executed by the current process. The permitted set of a process is masked against the inheritable set during exec(). Nothing special happens during fork() or clone(). Child processes and threads are given an exact copy of the capabilities of the parent process.
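The rules above can be sketched as a small model. This is a simplified illustration that follows the wording of this page only, not the kernel implementation: the class `CapSets` and its method names are hypothetical, and clamping the effective set to the new permitted set after exec() is an assumption made to keep the model consistent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CapSets:
    """Hypothetical model of the three per-process capability bitmaps."""
    inheritable: int
    permitted: int
    effective: int

    def raise_cap(self, bit: int) -> "CapSets":
        """Set a bit in the effective set; only allowed if it is permitted."""
        mask = 1 << bit
        if not self.permitted & mask:
            raise PermissionError(f"capability bit {bit} is not permitted")
        return replace(self, effective=self.effective | mask)

    def drop_cap(self, bit: int) -> "CapSets":
        """Temporarily disable a capability (it stays in the permitted set)."""
        return replace(self, effective=self.effective & ~(1 << bit))

    def fork(self) -> "CapSets":
        """Children and threads get an exact copy of the parent's sets."""
        return replace(self)

    def after_exec(self) -> "CapSets":
        """Mask the permitted set against the inheritable set.

        Assumption: the effective set is clamped to the new permitted set.
        """
        permitted = self.permitted & self.inheritable
        return replace(self, permitted=permitted,
                       effective=self.effective & permitted)


# "Bracketing": raise the capability only around the privileged operation.
CAP_SYS_TIME = 25
proc = CapSets(inheritable=0, permitted=1 << CAP_SYS_TIME, effective=0)
privileged = proc.raise_cap(CAP_SYS_TIME)    # enable just before the operation
unprivileged = privileged.drop_cap(CAP_SYS_TIME)  # disable again afterwards
```

Because `inheritable` is empty in this example, `proc.after_exec()` ends up with an empty permitted set.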
The implementation in Linux stopped at this point, whereas POSIX Capabilities also require capability sets on files, to replace the SUID flag (at least for executables).
Upper Bound for Capabilities
The current Linux capability system does not implement the filesystem-related portions of POSIX Capabilities, which would make setuid and setgid executables secure. It is also much safer to have a secure upper bound for all processes within a context. For both reasons, an additional per-context capability mask has been added that limits all processes belonging to that context to this mask. The meaning of the individual caps (bits) of the capability bound mask is exactly the same as with the permitted capability set.
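A rough sketch of how such an upper bound behaves, assuming it is applied as a plain bitwise AND (this page does not describe the exact kernel mechanism). The bit numbers come from the system capabilities table further down; `limit_to_bound` is a hypothetical helper.

```python
# Bit numbers taken from the system capabilities (bcaps) table.
CAP_SYS_ADMIN = 21
CAP_SYS_TIME = 25

# All capabilities listed in the bcaps table (bits 0..30).
ALL_CAPS = (1 << 31) - 1


def limit_to_bound(cap_set: int, bound_mask: int) -> int:
    """Return the part of `cap_set` that the context bound still allows."""
    return cap_set & bound_mask


# A context bound that withholds SYS_ADMIN and SYS_TIME from the guest.
bound = ALL_CAPS & ~((1 << CAP_SYS_ADMIN) | (1 << CAP_SYS_TIME))

# Even a guest root process holding every capability is capped by the bound.
guest_root = limit_to_bound(ALL_CAPS, bound)
print(bool(guest_root & (1 << CAP_SYS_TIME)))  # False
```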
Context Capabilities
As the Linux capabilities have almost reached the maximum number that is possible without heavy modifications to the kernel, it was a natural step to add a context-specific capability system.
The Linux-VServer context capability set acts as a mechanism to fine-tune existing Linux capabilities. It is not visible to the processes within a context, as they would not know how to modify or verify it.
In general there are two ways to use those capabilities:
- Require one or more context capabilities to be set in addition to a given Linux capability, each one controlling a distinct part of the functionality. For example, CAP_NET_ADMIN could be split into RAW and PACKET sockets, so you could take away each of them separately by not providing the required context capability.
- Consider the context capability sufficient for a specified functionality, even if the Linux capability says something different. For example, mount() requires CAP_SYS_ADMIN, which grants a dozen other things we do not want, so we define VXC_SECURE_MOUNT to allow mounts for certain contexts.
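The two usage patterns can be sketched as two kinds of permission check. This is an illustration only: the function names are hypothetical, `HYPOTHETICAL_RAW_SOCKET` is an invented context capability standing in for the "could be split" example, and the real kernel checks are not reproduced here. The other bit values come from the tables on this page.

```python
CAP_NET_ADMIN = 1 << 12     # system capability, bit 12
CAP_SYS_ADMIN = 1 << 21     # system capability, bit 21
VXC_SECURE_MOUNT = 1 << 16  # context capability SECURE_MOUNT, bit 16

# Invented for illustration; no such context capability is listed on this page.
HYPOTHETICAL_RAW_SOCKET = 1 << 30


def may_use_raw_socket(bcaps: int, ccaps: int) -> bool:
    """Pattern 1: the context capability is required IN ADDITION."""
    return bool(bcaps & CAP_NET_ADMIN) and bool(ccaps & HYPOTHETICAL_RAW_SOCKET)


def may_mount(bcaps: int, ccaps: int) -> bool:
    """Pattern 2: the context capability alone is SUFFICIENT."""
    return bool(bcaps & CAP_SYS_ADMIN) or bool(ccaps & VXC_SECURE_MOUNT)


# A guest without CAP_SYS_ADMIN can still mount if it has SECURE_MOUNT.
print(may_mount(bcaps=0, ccaps=VXC_SECURE_MOUNT))           # True
# CAP_NET_ADMIN alone is not enough under pattern 1.
print(may_use_raw_socket(bcaps=CAP_NET_ADMIN, ccaps=0))     # False
```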
The difference between the context flags and the context capabilities is more an abstract logical separation than a functional one, because they are handled in a very similar way.
List of capabilities/flags
Below is a list of the capabilities and flags used for contexts and the processes within them. The tables contain the following information:
- Bit: The bit number of the capability/flag
- Mask: The bit mask for that bit, in hexadecimal notation
- Name: Human-readable identifier used in userspace utilities
- Tag: Special capability/flag code to denote special behaviour, legacy usage and others (see below)
- Description: Description of capability/flag effects
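The Bit and Mask columns carry the same information: the mask is the value 1 shifted left by the bit number. A short sketch of the conversion in both directions, with hypothetical helper names:

```python
def mask_for_bit(bit: int) -> int:
    """Return the mask value for a given bit number."""
    return 1 << bit


def bit_for_mask(mask: int) -> int:
    """Return the bit number for a mask that has exactly one bit set."""
    if mask <= 0 or mask & (mask - 1):
        raise ValueError("mask must have exactly one bit set")
    return mask.bit_length() - 1


print(f"0x{mask_for_bit(4):08x}")  # 0x00000010
print(bit_for_mask(0x02000000))    # 25
```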
Special capability/flag codes
The tag column may contain one or more of the following tags:
Tag | Description |
---|---|
I | Internal use only |
L | Only supported with legacy enabled |
O | One time capability/flag (once it's cleared, it can't be re-enabled again) |
U | Unsupported |
X | Slightly different meaning in legacy |
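The "one time" tag (O) can be pictured with the following sketch. It only models the rule stated in the table (once cleared, the bit cannot be set again) and assumes the flag starts out set; the function `update_flags` is hypothetical and is not taken from the Linux-VServer sources.

```python
STATE_ADMIN = 1 << 34  # tagged O in the context flags table
PERSISTENT = 1 << 38   # ordinary flag, no tag

ONE_TIME = STATE_ADMIN  # mask of all one-time flags in this example


def update_flags(current: int, requested: int, one_time: int = ONE_TIME) -> int:
    """Apply `requested`, but never re-enable a cleared one-time flag."""
    blocked = one_time & ~current  # one-time bits that are already cleared
    return requested & ~blocked


flags = STATE_ADMIN | PERSISTENT
flags = update_flags(flags, 0)                         # clear everything
flags = update_flags(flags, STATE_ADMIN | PERSISTENT)  # try to set both again
print(bool(flags & PERSISTENT))   # True: ordinary flags can be re-enabled
print(bool(flags & STATE_ADMIN))  # False: the one-time flag stays cleared
```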
Context capabilities (ccaps)
The set of available context capabilities is specific to Linux-VServer and applied to all processes contained within a context. Below is a list of capabilities currently available in 2.1.1 and above.
Bit | Mask | Name | Tag | Description |
---|---|---|---|---|
0 | 0x00000001 | SET_UTSNAME | | Allow setdomainname(2) and sethostname(2) |
1 | 0x00000002 | SET_RLIMIT | | Allow setrlimit(2) |
2 | 0x00000004 | FS_SECURITY | | Allow setxattr for security attributes |
4 | 0x00000010 | TIOCSTI | | Allow the tiocsti ioctl (fake input character) |
8 | 0x00000100 | RAW_ICMP | L | Allow usage of raw ICMP sockets |
12 | 0x00001000 | SYSLOG | | Allow syslog(2) |
13 | 0x00002000 | OOM_ADJUST | | Allow 'safe' oom adjustments |
14 | 0x00004000 | AUDIT_CONTROL | | Allow loginuid write (for auditing) |
16 | 0x00010000 | SECURE_MOUNT | | Allow secure mount(2) |
17 | 0x00020000 | SECURE_REMOUNT | | Allow secure remount |
18 | 0x00040000 | BINARY_MOUNT | | Allow binary/network mounts |
20 | 0x00100000 | QUOTA_CTL | | Allow quota ioctls |
21 | 0x00200000 | ADMIN_MAPPER | | Allow access to device mapper |
22 | 0x00400000 | ADMIN_CLOOP | | Allow access to loop devices |
24 | 0x01000000 | KTHREAD | | Allow creating kernel threads |
25 | 0x02000000 | NAMESPACE | | Allow namespace-related operations |
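To read a numeric ccaps value against this table, the set bits can be mapped back to names. The sketch below copies the bit numbers and names from the table; the function `decode_ccaps` is a hypothetical helper and not part of any Linux-VServer tool.

```python
# Bit number -> name, copied from the context capabilities table above.
CCAP_NAMES = {
    0: "SET_UTSNAME", 1: "SET_RLIMIT", 2: "FS_SECURITY", 4: "TIOCSTI",
    8: "RAW_ICMP", 12: "SYSLOG", 13: "OOM_ADJUST", 14: "AUDIT_CONTROL",
    16: "SECURE_MOUNT", 17: "SECURE_REMOUNT", 18: "BINARY_MOUNT",
    20: "QUOTA_CTL", 21: "ADMIN_MAPPER", 22: "ADMIN_CLOOP",
    24: "KTHREAD", 25: "NAMESPACE",
}


def decode_ccaps(mask: int) -> list:
    """Return the names of all bits set in `mask`, lowest bit first."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Bits that are not in the table are reported by number.
            names.append(CCAP_NAMES.get(bit, f"unknown bit {bit}"))
        bit += 1
    return names


print(decode_ccaps(0x00010100))  # ['RAW_ICMP', 'SECURE_MOUNT']
```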
Context flags (cflags)
The set of available context flags is specific to Linux-VServer and applied to all processes contained within a context. Below is a list of flags available in 2.1.1 and above.
Bit | Mask | Name | Tag | Description |
---|---|---|---|---|
0 | 0x00000001 | INFO_LOCK | L | Prohibit further context migration |
1 | 0x00000002 | INFO_SCHED | L | Account all processes as one |
2 | 0x00000004 | INFO_NPROC | L | Apply process limits to context |
3 | 0x00000008 | INFO_PRIVATE | L | Context cannot be entered |
4 | 0x00000010 | INFO_INIT | X | Show a fake init process |
5 | 0x00000020 | INFO_HIDE | X | Hide context information in task status |
6 | 0x00000040 | INFO_ULIMIT | L | Apply ulimits to context |
7 | 0x00000080 | INFO_NSPACE | L | Use private namespace |
8 | 0x00000100 | SCHED_HARD | | Enable hard scheduler |
9 | 0x00000200 | SCHED_PRIO | | Enable priority scheduler |
10 | 0x00000400 | SCHED_PAUSE | | Pause context (unschedule) |
16 | 0x00010000 | VIRT_MEM | | Virtualize memory information |
17 | 0x00020000 | VIRT_UPTIME | | Virtualize uptime information |
18 | 0x00040000 | VIRT_CPU | | Virtualize cpu usage information |
19 | 0x00080000 | VIRT_LOAD | | Virtualize load average information |
20 | 0x00100000 | VIRT_TIME | | Allow per guest time offsets |
24 | 0x01000000 | HIDE_MOUNT | | Hide entries in /proc/$pid/mounts |
25 | 0x02000000 | HIDE_NETIF | L | Hide foreign network interfaces |
26 | 0x04000000 | HIDE_VINFO | | Hide context information in task status |
32 | 0x0001<<32 | STATE_SETUP | IO | Enable setup state |
33 | 0x0002<<32 | STATE_INIT | IO | Enable init state |
34 | 0x0004<<32 | STATE_ADMIN | O | Enable admin state |
36 | 0x0010<<32 | SC_HELPER | I | Enable state change helper |
37 | 0x0020<<32 | REBOOT_KILL | | Kill all processes on reboot(2) |
38 | 0x0040<<32 | PERSISTENT | | Make context persistent |
48 | 0x0001<<48 | FORK_RSS | | Block fork when RSS limit is exceeded |
49 | 0x0002<<48 | PROLIFIC | | Allow context to create new contexts |
52 | 0x0010<<48 | IGNEG_NICE | | Ignore priority raise |
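Flags above bit 31 are written in the Mask column as a 16-bit value shifted left, for example `0x0010<<48`. The following sketch, with a hypothetical helper `parse_mask`, turns both notations used in the table into an integer so that the Bit and Mask columns can be compared.

```python
def parse_mask(text: str) -> int:
    """Parse a Mask cell such as '0x02000000' or '0x0010<<48'."""
    base, _, shift = text.partition("<<")
    # A cell without '<<' has an empty shift part, which counts as 0.
    return int(base, 16) << int(shift or "0")


mask = parse_mask("0x0010<<48")   # IGNEG_NICE
print(mask.bit_length() - 1)      # 52
print(parse_mask("0x0001<<32") == 1 << 32)  # True (STATE_SETUP)
```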
Network capabilities (ncaps)
The set of available network capabilities is specific to Linux-VServer and applied to all processes contained within a network context. Below is a list of capabilities available since at least 2.3.0.34.
Bit | Mask | Name | Tag | Description |
---|---|---|---|---|
0 | 0x00000001 | TUN_CREATE | | Allows the guest to open TUN devices (from the kernel's TUN/TAP interface, i.e. network devices which link to userspace), e.g. tun_set_iff() |
8 | 0x00000100 | RAW_ICMP | | Allow usage of raw ICMP sockets (makes ping work) |
Network context flags (nflags)
The set of available network context flags is specific to Linux-VServer and applied to all processes contained within a network context. Below is a list of flags available in 2.1.1 and above.
Bit | Mask | Name | Tag | Description |
---|---|---|---|---|
0 | 0x00000001 | INFO_LOCK | L | Prohibit further context migration |
3 | 0x00000008 | INFO_PRIVATE | | Context cannot be entered |
8 | 0x00000100 | SINGLE_IP | | Enable special handling of network contexts with a single IP only |
9 | 0x00000200 | LBACK_REMAP | | Use loopback virtualisation (will only work in 2.3.0.xx or greater) |
10 | 0x00000400 | LBACK_ALLOW | | If set, allows guests without LBACK_REMAP to connect to 127.0.0.0/8 |
25 | 0x02000000 | HIDE_NETIF | | Hide foreign network interfaces (i.e. network interfaces not carrying an IP belonging to the guest) |
26 | 0x04000000 | HIDE_LBACK | | Hides the real loopback address from the guest (rewrites to 127.0.0.1 when queried) (will only work in 2.3.0.xx or greater) |
32 | 0x0001<<32 | STATE_SETUP | IO | Enable setup state |
34 | 0x0004<<32 | STATE_ADMIN | O | Enable admin state |
36 | 0x0010<<32 | SC_HELPER | I | Enable state change helper |
38 | 0x0040<<32 | PERSISTENT | | Make network context persistent. Allows you to configure a network context to use a certain IP with specific flags, and quickly run processes at later points with long pauses between them, rather than re-creating it every time a process needs to be executed. |
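When several network context flags are combined, the resulting value is the bitwise OR of their masks. The sketch below expresses the table as a Python `enum.IntFlag`; the class name `NFlag` is hypothetical, while the member names and bit numbers are copied from the table.

```python
from enum import IntFlag


class NFlag(IntFlag):
    """Network context flags, bit numbers as listed in the table above."""
    INFO_LOCK = 1 << 0
    INFO_PRIVATE = 1 << 3
    SINGLE_IP = 1 << 8
    LBACK_REMAP = 1 << 9
    LBACK_ALLOW = 1 << 10
    HIDE_NETIF = 1 << 25
    HIDE_LBACK = 1 << 26
    STATE_SETUP = 1 << 32
    STATE_ADMIN = 1 << 34
    SC_HELPER = 1 << 36
    PERSISTENT = 1 << 38


# Combine three flags and show the value as a 64-bit hexadecimal number.
wanted = NFlag.SINGLE_IP | NFlag.HIDE_NETIF | NFlag.PERSISTENT
print(f"0x{int(wanted):016x}")     # 0x0000004002000100
print(NFlag.HIDE_NETIF in wanted)  # True
print(NFlag.HIDE_LBACK in wanted)  # False
```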
System capabilities (bcaps)
The set of available system capabilities is inherited from the Linux kernel and applied to all processes contained within a context. Below is a list of capabilities currently available in the vanilla kernel.
BIG FAT WARNING: Adding any system capability to your virtual server WILL reduce security. Do not change the default values unless you absolutely know what you are doing!
Bit | Mask | Name | Description |
---|---|---|---|
0 | 0x00000001 | CHOWN | In a system with the [_POSIX_CHOWN_RESTRICTED] option defined, this overrides the restriction of changing file ownership and group ownership. |
1 | 0x00000002 | DAC_OVERRIDE | Override all DAC access, including ACL execute access if [_POSIX_ACL] is defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE. |
2 | 0x00000004 | DAC_READ_SEARCH | Overrides all DAC restrictions regarding read and search on files and directories, including ACL restrictions if [_POSIX_ACL] is defined. Excluding DAC access covered by CAP_LINUX_IMMUTABLE. |
3 | 0x00000008 | FOWNER | Overrides all restrictions about allowed operations on files, where file owner ID must be equal to the user ID, except where CAP_FSETID is applicable. It doesn't override MAC and DAC restrictions. |
4 | 0x00000010 | FSETID | Overrides the following restrictions that the effective user ID shall match the file owner ID when setting the S_ISUID and S_ISGID bits on that file; that the effective group ID (or one of the supplementary group IDs) shall match the file owner ID when setting the S_ISGID bit on that file; that the S_ISUID and S_ISGID bits are cleared on successful return from chown(2) (not implemented). |
5 | 0x00000020 | KILL | Overrides the restriction that the real or effective user ID of a process sending a signal must match the real or effective user ID of the process receiving the signal. |
6 | 0x00000040 | SETGID | |
7 | 0x00000080 | SETUID | |
8 | 0x00000100 | SETPCAP | Transfer any capability in your permitted set to any pid, remove any capability in your permitted set from any pid |
9 | 0x00000200 | LINUX_IMMUTABLE | Allow modification of S_IMMUTABLE and S_APPEND file attributes |
10 | 0x00000400 | NET_BIND_SERVICE | |
11 | 0x00000800 | NET_BROADCAST | Allow broadcasting, listen to multicast |
12 | 0x00001000 | NET_ADMIN | |
13 | 0x00002000 | NET_RAW | |
14 | 0x00004000 | IPC_LOCK | |
15 | 0x00008000 | IPC_OWNER | Override IPC ownership checks |
16 | 0x00010000 | SYS_MODULE | |
17 | 0x00020000 | SYS_RAWIO | |
18 | 0x00040000 | SYS_CHROOT | Allow use of chroot() |
19 | 0x00080000 | SYS_PTRACE | Allow ptrace() of any process |
20 | 0x00100000 | SYS_PACCT | Allow configuration of process accounting |
21 | 0x00200000 | SYS_ADMIN | |
22 | 0x00400000 | SYS_BOOT | Allow use of reboot() |
23 | 0x00800000 | SYS_NICE | |
24 | 0x01000000 | SYS_RESOURCE | |
25 | 0x02000000 | SYS_TIME | |
26 | 0x04000000 | SYS_TTY_CONFIG | |
27 | 0x08000000 | MKNOD | Allow the privileged aspects of mknod() |
28 | 0x10000000 | LEASE | Allow taking of leases on files |
29 | 0x20000000 | AUDIT_WRITE | ?? |
30 | 0x40000000 | AUDIT_CONTROL | ?? |
Setting flags and capabilities
If you are using util-vserver, see util-vserver:Capabilities and Flags for how to set the flags and capabilities.
To change those flags without restarting the vservers, you can use vattribute and nattribute; see util-vserver:Cheatsheet.