Skip to main content

Unix-style Load Balance

All Unix and Unix-like systems generate a metric of three "load average" numbers in the kernel. Users can easily query the current result from a Unix shell by running the uptime command:

$ uptime  14:34:03 up 10:43,  4 users,  load average: 0.06, 0.11, 0.09 

The w and top commands show the same three load average numbers, as do a range of graphical user interface utilities. In Linux, they can also be accessed by reading the /proc/loadavg file.

An idle computer has a load number of 0 and each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1. Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system. This, for example, includes processes blocking due to an NFS server failure or to slow media (e.g., USB 1.x storage devices). Such circumstances can result in an elevated load average, which does not reflect an actual increase in CPU use (but still gives an idea on how long users have to wait).

Systems calculate the load average as the exponentially damped/weighted moving average of the load number. The three values of load average refer to the past one, five, and fifteen minutes of system operation.

For single-CPU systems that are CPU-bound, one can think of load average as a percentage of system utilization during the respective time period. For systems with multiple CPUs, one must divide the number by the number of processors in order to get a comparable percentage.

For example, one can interpret a load average of "1.73 0.50 7.98" on a single-CPU system as:

  • during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes had to wait for a turn)
  • during the last 5 minutes, the CPU was underloaded 50% (no processes had to wait for a turn)
  • during the last 15 minutes, the CPU was overloaded 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes had to wait for a turn)

This means that this CPU could have handled all of the work scheduled for the last minute if it were 1.73 times as fast, or if there were two (the ceiling of 1.73) times as many CPUs, but that over the last five minutes it was twice as fast as necessary to prevent runnable processes from waiting their turn.

In a system with four CPUs, a load average of 3.73 would indicate that there were, on average, 3.73 processes ready to run, and each one could be scheduled into a CPU.

On modern UNIX systems, the treatment of threading with respect to load averages varies. Some systems treat threads as processes for the purposes of load average calculation: each thread waiting to run will add 1 to the load. However, other systems, especially systems implementing so-called N:M threading, use different strategies, such as counting the process exactly once for the purpose of load (regardless of the number of threads), or counting only threads currently exposed by the user-thread scheduler to the kernel, which may depend on the level of concurrency set on the process.

Many systems generate the load average by sampling the state of the scheduler periodically, rather than recalculating on all pertinent scheduler events. They adopt this approach for performance reasons, as scheduler events occur frequently, and scheduler efficiency impacts significantly on system efficiency. As a result, sampling error can lead to load averages inaccurately representing actual system behavior. This can pose a particular problem for programs that wake up at a fixed interval that aligns with the load-average sampling, in which case a process may be under- or over-represented in the load average numbers.

CPU load vs CPU utilization

A comparative study of different load indices carried out by Ferrari et al. reported that CPU load information based upon the CPU queue length does much better in load balancing compared to CPU utilization. The reason CPU queue length did better is probably because when a host is heavily loaded, its CPU utilization is likely to be close to 100% and it is unable to reflect the exact load level of the utilization. In contrast, CPU queue lengths can directly reflect the amount of load on a CPU. As an example, two systems, one with 3 and the other with 6 processes in the queue, will probably have utilizations close to 100% although they obviously differ.

CPU Load Computation

On Linux systems, the load-average is not calculated on each clock tick, but driven by a variable value that is based on the HZ frequency setting and tested on each clock tick. (HZ variable is the pulse rate of particular Linux kernel activity. 1HZ is equal to one clock tick; 10ms by default.) Although the HZ value can be configured in some versions of the kernel, it is normally set to 100. The calculation code uses the HZ value to determine the CPU Load calculation frequency. Specifically, the timer.c::calc_load() function will run the algorithm every 5 * HZ, or roughly every five seconds. Following is that function in its entirety

Comments

Popular posts from this blog

Docker Container Management from Cockpit

Cockpit can manage containers via docker. This functionality is present in the Cockpit docker package. Cockpit communicates with docker via its API via the /var/run/docker.sock unix socket. The docker API is root equivalent, and on a properly configured system, only root can access the docker API. If the currently logged in user is not root then Cockpit will try to escalate the user’s privileges via Polkit or sudo before connecting to the socket. Alternatively, we can create a docker Unix group. Anyone in that docker group can then access the docker API, and gain root privileges on the system. [root@rhel8 ~] #  yum install cockpit-docker    -y  Once the package installed then "containers" section would be added in the dashboard and we can manage the containers and images from the console. We can search or pull an image from docker hub just by searching with the keyword like nginx centos.   Once the Image download...

Remote Systems Management With Cockpit

The cockpit is a Red Hat Enterprise Linux web-based interface designed for managing and monitoring your local system, as well as Linux servers located in your network environment. In RHEL 8 Cockpit is the default installation candidate we can just start the service and then can start the management of machines. For RHEL7 or Fedora based machines we can follow steps to install and configure the cockpit.  Following are the few features of cockpit.  Managing services Managing user accounts Managing and monitoring system services Configuring network interfaces and firewall Reviewing system logs Managing virtual machines Creating diagnostic reports Setting kernel dump configuration Configuring SELinux Updating software Managing system subscriptions Installation of cockpit package.  [root@rhel8 ~] #  dnf   install cockpit cockpit-dashboard  -y  We need to enable the socket.  [root@rhel8 ~] #  systemctl enable --n...

Add The Group Information IN Yum Repository in simple Two steps

= Yum groups and repositories = Yum supports the group commands   * grouplist   * groupinfo   * groupinstall   * groupremove   * groupupdate Groups are read from the "group" xml metadata that is optionally available from each repository. If yum has no repositories which support groups then none of  the group operations will work.  #yum grouplist    This will list the installed and available groups for your system in two    separate lists. If you pass the optional 'hidden' argument then all of     the groups which are set to 'no' in the group xml tag.   yum groupinfo groupname     This will give you detailed information for each group including:   description, mandatory, default and optional packages.       #yum groupinstall groupname      #yum groupupdate groupname   Despite their differing names both of these commands perform the same   func...