CPU Scheduler
From Linux-VServer
Overview of Processes and Threads
It is important to have a decent understanding of both processes and threads before learning about schedulers.
Programs and Processes
A program is a combination of instructions and data put together to perform a task when executed. A process is an instance of a program (what one might call a "running" program). As an analogy, programs are like classes in languages such as C++ and Java, and processes are like objects (instances of classes). Processes are an abstraction created to embody the state of a program during its execution. This means keeping track of the data associated with one or more threads of execution, which includes variables, hardware state (e.g. registers and the program counter), and the contents of an address space.
Threads
A process can have multiple threads of execution that work together to accomplish its goals. These threads of execution are aptly named threads. A kernel must keep track of each thread's stack and hardware state, or whatever else is necessary to track a single flow of execution within a process. Usually threads share address spaces, but they do not have to (often they merely overlap). It is important to remember that only one thread can execute on a CPU at any given time, which is essentially why kernels have CPU schedulers. An example of multiple threads within a process can be found in most web browsers. Usually at least one thread exists to handle user interface events (like stopping a page load), one thread exists to handle network transactions, and one thread exists to render web pages.
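As a small illustration of threads sharing an address space, the Python sketch below starts three threads inside one process; all of them append to the same list object. The thread names are hypothetical and merely borrowed from the browser example above; a lock guards the shared list.

```python
import threading

# All threads below run inside one process, so they see the same
# Python objects (they share the process's address space).
shared_log = []
lock = threading.Lock()

def worker(name: str, count: int) -> None:
    for i in range(count):
        # The lock serializes access to the shared list.
        with lock:
            shared_log.append((name, i))

# Hypothetical thread roles, mirroring the web browser example.
threads = [
    threading.Thread(target=worker, args=(name, 3))
    for name in ("ui", "network", "render")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared_log))  # 9
```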
Scheduling in Linux
Multitasking kernels (like Linux) allow more than one process to exist at any given time, and each process is allowed to run as if it were the only process on the system. Processes do not need to be aware of any other processes unless they are explicitly designed to be. This makes programs easier to develop, maintain, and port. Though each CPU in a system can execute only one thread at a time, many threads from many processes appear to be executing at the same time. This is because threads are scheduled to run for very short periods of time, after which other threads are given a chance to run. A kernel's scheduler enforces a thread scheduling policy, including when, for how long, and in some cases where (on Symmetric Multiprocessing (SMP) systems) threads can execute. The scheduler is normally invoked periodically from a timer interrupt; it can also be invoked via a system call or by a kernel thread that wishes to yield the CPU. A thread is allowed to execute for a certain amount of time, then the scheduler takes over and switches to a thread of its choice. This cycle continues, and in this way a certain policy for CPU usage is carried out.
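The time-slicing idea can be illustrated with a toy round-robin policy on a single CPU: each thread runs for at most one quantum, then goes to the back of the queue. This is only a minimal sketch of the general principle, assuming hypothetical thread names and CPU demands; it is not the policy Linux actually implements.

```python
from __future__ import annotations

from collections import deque

def round_robin(bursts: dict[str, int], quantum: int) -> list[tuple[str, int]]:
    """Simulate one CPU running threads in fixed time slices.

    bursts maps a thread name to the CPU time it still needs.
    Returns the sequence of (thread, time_used) slices.
    """
    ready = deque(bursts.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        used = min(quantum, remaining)
        timeline.append((name, used))
        remaining -= used
        if remaining > 0:
            # Preempted: the thread goes to the back of the ready queue.
            ready.append((name, remaining))
    return timeline

print(round_robin({"A": 5, "B": 2, "C": 3}, quantum=2))
# [('A', 2), ('B', 2), ('C', 2), ('A', 2), ('C', 1), ('A', 1)]
```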
Token Bucket Extension
While the basic idea of Linux-VServer is a peaceful coexistence of all contexts, sharing the common resources in a respectful way, it is sometimes useful to control the resource distribution for resource-hungry processes.
The basic principle of a Token Bucket is not new. It is described here using the Hard CPU Limit as an example, but the same principle also applies to scheduler priorities, network bandwidth limitation, and resource control in general.
The Linux-VServer scheduler uses this mechanism in the following way: consider a bucket of a certain size S which is filled with a specified amount of tokens R every interval T, until the bucket is "full"; excess tokens are spilled. At each timer tick, a running process (here "running" means actually needing the CPU, as opposed to merely existing) consumes exactly one token from the bucket, unless the bucket is empty, in which case the process is put on a hold queue until the bucket has been refilled with a minimum of M tokens. The process is then rescheduled.
A major advantage of a Token Bucket is that a certain amount of tokens can be accumulated in times of quiescence, which later can be used to burst when resources are required.
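The mechanism can be sketched as a small tick-driven simulation. This is a minimal illustration, not the Linux-VServer kernel code; it assumes a single bucket per context, a refill that happens before the current tick's token is consumed, and hypothetical names (`TokenBucket`, `tick`). Starting with a full bucket shows the burst that accumulated tokens allow.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Hypothetical model of one context's bucket (not the kernel's code)."""
    size: int        # S: maximum number of tokens
    fill_rate: int   # R: tokens added every interval
    interval: int    # T: ticks between refills
    minimum: int     # M: tokens required to leave the hold queue
    tokens: int = 0
    tick_count: int = 0
    on_hold: bool = False

    def tick(self, wants_cpu: bool) -> bool:
        """Advance one timer tick; return True if the context ran."""
        self.tick_count += 1
        if self.tick_count % self.interval == 0:
            # Refill; tokens above the bucket size are spilled.
            self.tokens = min(self.size, self.tokens + self.fill_rate)
        if self.on_hold:
            if self.tokens < self.minimum:
                return False
            self.on_hold = False  # refilled to M: reschedule
        if not wants_cpu:
            return False  # idle ticks let tokens accumulate
        if self.tokens == 0:
            self.on_hold = True  # empty bucket: put on the hold queue
            return False
        self.tokens -= 1  # one token per tick of CPU use
        return True

# A full bucket (accumulated during quiescence) permits a burst.
bucket = TokenBucket(size=100, fill_rate=1, interval=4, minimum=10, tokens=100)
burst = 0
while bucket.tick(wants_cpu=True):
    burst += 1
print(burst)  # 133
```

While running, the bucket drains by 1 - R/T = 3/4 of a token per tick, so 100 tokens last about 133 ticks; after that the context only runs in short bursts, averaging R/T = 1/4 of the ticks.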
Where a per-process Token Bucket would allow for a CPU resource limitation of a single process, a Context Token Bucket allows controlling the combined CPU usage of all processes confined to a context.
Another approach, which is also implemented, is to use the current fill level of the bucket to adjust the process priority, thus reducing the priority of processes belonging to contexts that exceed their share.
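To illustrate the idea of deriving a priority adjustment from the fill level, here is a sketch using a purely hypothetical linear mapping: a full bucket earns a bonus, a half-full bucket is neutral, and an empty one is penalized. The function name, the `max_bonus` parameter, and the linear shape are assumptions for illustration; the actual Linux-VServer mapping may differ.

```python
def priority_adjustment(tokens: int, size: int, max_bonus: int = 5) -> int:
    """Map a bucket's fill level to a priority adjustment.

    Hypothetical linear mapping (positive = favored): a full bucket
    earns +max_bonus, a half-full bucket 0, an empty bucket -max_bonus.
    """
    if size <= 0:
        raise ValueError("bucket size must be positive")
    # Clamp the fill level to [0, 1] before mapping it.
    fill = min(max(tokens, 0), size) / size
    return round((2 * fill - 1) * max_bonus)

print(priority_adjustment(100, 100))  # 5
print(priority_adjustment(50, 100))   # 0
print(priority_adjustment(0, 100))    # -5
```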
Token Bucket Examples
Hard Limit
The simplest configuration is to just give every context an upper bound for CPU allocation. The important factor is the ratio of the fill rate R to the interval T:
<math>\frac{R}{T}</math>
Note that this is the proportion of a single CPU in the system. So, if you have four CPUs and you want one context to get an average of one whole CPU to itself, then you would set fill-rate to 1 and interval to 4.
For smooth operation of the algorithm, it is advantageous to make the interval as small as possible (or at least much smaller than the bucket size). In most cases you can simplify the fraction, e.g. change 30/100 to 3/10.
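Simplifying the fraction can be automated, since Python's `fractions.Fraction` always stores a value in lowest terms. The helper below is a hypothetical sketch, not a Linux-VServer tool: it turns a desired share into a (fill rate, interval) pair and rejects an interval that is not smaller than the bucket size.

```python
from __future__ import annotations

from fractions import Fraction

def hard_limit(share: Fraction, bucket_size: int) -> tuple[int, int]:
    """Return a hypothetical (fill_rate, interval) pair for a CPU share."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    # Fraction keeps the value in lowest terms, e.g. 30/100 -> 3/10.
    fill_rate, interval = share.numerator, share.denominator
    # The interval should be much smaller than S; this only rejects
    # the clearly bad case where it is not smaller at all.
    if interval >= bucket_size:
        raise ValueError("interval should be much smaller than the bucket size")
    return fill_rate, interval

print(hard_limit(Fraction(30, 100), bucket_size=1000))  # (3, 10)
print(hard_limit(Fraction(1, 4), bucket_size=1000))     # (1, 4)
```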
Burst time
To penalize processes after a certain amount of burst time, i.e. by putting them on the hold queue, you can use the maximum size S of the bucket and the minimum number of tokens M required to leave the hold queue.
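Consider a context with a limit of 1/2 of CPU time, a bucket of 15000 tokens, and a minimum of 2500 tokens. If your scheduler runs at 1000 Hz, processes that have used the CPU for 30 seconds will be put on hold for 5 seconds. This follows because a running process drains the bucket by 1 - R/T tokens per tick, while a held process waits for the bucket to refill at R/T tokens per tick. The following formulas can be used to calculate S and M, using B as the burst time, H as the hold time (both in seconds), and HZ as the scheduler tick rate:
<math>S = B \cdot HZ \cdot \left(1 - \frac{R}{T}\right)</math>
<math>M = H \cdot HZ \cdot \frac{R}{T}</math>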
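The burst/hold arithmetic can be checked with a short calculation: while running, a context drains the bucket by 1 - R/T tokens per tick, and while on hold the bucket refills at R/T tokens per tick. The sketch below uses exact fractions; the function name and the tick-rate parameter `hz` are assumptions for illustration.

```python
from __future__ import annotations

from fractions import Fraction

def burst_hold_params(burst_s, hold_s, ratio: Fraction, hz: int) -> tuple[int, int]:
    """Hypothetical helper: bucket size S and minimum M from burst/hold times."""
    # Running drains (1 - R/T) tokens per tick for burst_s seconds.
    size = burst_s * hz * (1 - ratio)
    # On hold, the bucket refills R/T tokens per tick for hold_s seconds.
    minimum = hold_s * hz * ratio
    return round(size), round(minimum)

# The example above: limit 1/2, 1000 Hz, 30 s burst, 5 s hold.
print(burst_hold_params(30, 5, Fraction(1, 2), 1000))  # (15000, 2500)
```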
Guarantees
A guarantee is nearly the same as a pure hard limit, except that the total CPU time allocated to all contexts must not exceed 100%. In other words, if you have N contexts and give each one a guarantee of more than 1/N of the CPU time, the contexts would need more CPU time than is physically available, which cannot work. The important factor here is the sum of all ratios, which must not exceed one:
<math>\sum_{i=1}^{N} \frac{R_i}{T_i} \le 1</math>
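Checking a set of guarantees amounts to summing the ratios and comparing against one CPU. A minimal sketch, assuming each context's guarantee is given as an exact fraction R/T (the helper name is hypothetical):

```python
from fractions import Fraction

def guarantees_fit(ratios) -> bool:
    """True if the guaranteed shares R_i/T_i sum to at most one CPU."""
    return sum(ratios, Fraction(0)) <= 1

print(guarantees_fit([Fraction(1, 5)] * 5))  # True: 5 * 1/5 = 1
print(guarantees_fit([Fraction(1, 4)] * 5))  # False: 5 * 1/4 > 1
```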
The fair share configuration is similar to guarantees, except that if the CPU is idle, a context can allocate more CPU time than its guarantee/limit. The scheduler and bucket configuration was extended in Linux-VServer 2.1.1 to allow fair share scheduling; this extension is also known as IDLE time.
Consider a configuration with 5 contexts, each limited to 1/5 of the CPU time, where two of these contexts run CPU-intensive processes and the rest are idle. Since each context may only allocate 1/5 of the CPU time, 3/5 of the CPU time is wasted because 3 contexts are idle.
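To distribute the wasted CPU time fairly among the contexts that could use it, you can configure an additional allocation ratio for idle times, namely R2/T2. The CPU share of context k is then calculated with the following formula, where the sum runs over the N contexts that currently want CPU time:
<math>\frac{R_k}{T_k} + C \cdot \frac{R2_k / T2_k}{\sum_{i=1}^{N} R2_i / T2_i}</math>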
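where C is the idle CPU time, 3/5 in our example. Consider an R2/T2 ratio of 1/2 for the first guest and 1/4 for the second. This would result in:
<math>\frac{1}{5} + \frac{3}{5} \cdot \frac{1/2}{1/2 + 1/4} = \frac{3}{5}</math> (60%) for the first guest and
<math>\frac{1}{5} + \frac{3}{5} \cdot \frac{1/4}{1/2 + 1/4} = \frac{2}{5}</math> (40%) for the second.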
If the idle time ratio is the same for all contexts, the formula can be simplified:
<math>\frac{R_k}{T_k} + \frac{C}{N}</math>
Therefore, if 3 of the above 5 contexts were running, i.e. <math>C = \frac{2}{5}, N = 3</math>, the result would be the expected 33% split:
<math>\frac{1}{5} + \frac{2/5}{3} = \frac{1}{3} \approx 33\%</math>
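Both fair-share examples can be reproduced with exact fractions. In this sketch (hypothetical function name and argument layout, not Linux-VServer code), each active context k gets its guarantee R_k/T_k plus a slice of the idle time C proportional to its idle ratio R2_k/T2_k; with equal idle ratios this reduces to C/N per context.

```python
from fractions import Fraction

def fair_share(guarantees, idle_ratios, idle):
    """CPU share of each active context under the idle-time extension.

    guarantees[k]  -- R_k/T_k, the normal allocation of context k
    idle_ratios[k] -- R2_k/T2_k, its allocation ratio during idle time
    idle           -- C, the CPU time left unused by the guarantees
    """
    total = sum(idle_ratios, Fraction(0))
    return [g + idle * r / total for g, r in zip(guarantees, idle_ratios)]

fifth = Fraction(1, 5)
# Two busy contexts out of five, C = 3/5, R2/T2 = 1/2 and 1/4.
print(fair_share([fifth, fifth], [Fraction(1, 2), Fraction(1, 4)], Fraction(3, 5)))
# [Fraction(3, 5), Fraction(2, 5)]
# Three busy contexts with equal idle ratios, C = 2/5: each gets 1/5 + C/3.
print(fair_share([fifth] * 3, [Fraction(1, 2)] * 3, Fraction(2, 5)))
# [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```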