From LQWiki

The Deadline IO Scheduler/elevator

The algorithms used to schedule disk IO requests are referred to as elevator algorithms. Since kernel 2.6 you have had the choice of four elevators: the Deadline elevator, the Anticipatory elevator (as), the No-op elevator (noop) and the Completely Fair Queuing elevator (CFQ).
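To see which elevators a kernel offers for a device, a minimal Python sketch follows. It assumes the kernel exposes the list in /sys/block/<device>/queue/scheduler with the active elevator in square brackets; that file is not mentioned above, and the helper names parse_scheduler_line and read_scheduler are made up for illustration.

```python
def parse_scheduler_line(line):
    """Split a 'noop deadline [cfq]' style line into (available, active)."""
    available = []
    active = None
    for token in line.split():
        # The active elevator is the one wrapped in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read and parse the scheduler file for a device such as 'sda'."""
    with open(f"{sysfs_root}/{device}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())
```

For example, parse_scheduler_line("noop anticipatory deadline [cfq]") reports four available elevators with cfq active. The names listed vary between kernel versions.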

The deadline elevator, as the name suggests, assigns each IO request an expiry time, or deadline. When the expiry time is reached the elevator will move the disk heads to the correct location and carry out the IO operation. In addition to carrying out the IO operation that has just reached its deadline, the elevator will also attempt to service requests which are near the disk heads' current location; this is done to avoid thrashing the disk with excess head movement. An IO request may be carried out before its deadline, either because of this location-based behaviour or because there is simply nothing else for the elevator to process: the elevator won't stall an IO operation if it is possible to service the request before its deadline.
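The behaviour described above can be sketched as a toy model in Python. This is not the kernel's implementation: the Request tuple, the pick_next and run helpers, the abstract time units and the fixed service time are all assumptions made for illustration. An expired request is serviced first; otherwise the request nearest to the head is chosen.

```python
from collections import namedtuple

# Hypothetical request record: a label, a disk sector and an expiry time.
Request = namedtuple("Request", ["name", "sector", "deadline"])


def pick_next(pending, head, now):
    """Return the request a toy deadline elevator would service next."""
    expired = [r for r in pending if r.deadline <= now]
    if expired:
        # An expired request wins; the earliest deadline goes first.
        return min(expired, key=lambda r: r.deadline)
    # Nothing has expired, so take the request closest to the head.
    return min(pending, key=lambda r: abs(r.sector - head))


def run(requests, head=0, now=0, service_time=1):
    """Service every request and return the order of their names."""
    pending = list(requests)
    order = []
    while pending:
        req = pick_next(pending, head, now)
        pending.remove(req)
        order.append(req.name)
        head = req.sector      # the head is now at the serviced sector
        now += service_time    # assume each request takes one time unit
    return order


requests = [
    Request("a", 100, 10),
    Request("b", 900, 2),   # far from the head, but with a short deadline
    Request("c", 120, 10),
    Request("d", 110, 10),
]
print(run(requests))  # -> ['a', 'd', 'b', 'c']
```

Request b sits far from the other three, so a purely location-based elevator would leave it until last. Here it is serviced as soon as its deadline is reached, and a and d are carried out before their own deadlines because they are close to the head.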

The deadline scheduler can be useful on systems where one process is expected to dominate disk IO, such as a database server. In this situation you don't necessarily want fairness between processes but do want to ensure your IO requests are not stalled for too long.

The tunable settings can be found in /sys/block/<device>/queue/iosched/ and are:

  • read_expire - the number of milliseconds before a read IO request expires
  • write_expire - the number of milliseconds before a write IO request expires
  • fifo_batch - the number of requests which will be moved from the elevator's list to the FIFO queue for the block device
  • writes_starved - how much preference reads have over writes. After writes_starved read operations have been placed on the block device FIFO queue, some writes will be processed
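  • front_merges - typically a new IO request that is contiguous with one already in the queue is merged onto the end of it (a back merge); if this boolean is set true the elevator also checks whether the request can be merged onto the front of an existing request. This lookup takes more work, so it can be disabled by setting the value to 0. In mainline kernels it is enabled (1) by default.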
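The tunables listed above can be read with a short Python sketch. It assumes each file holds a single integer, which is how sysfs normally presents these values; the function name read_deadline_tunables and the sysfs_root parameter are hypothetical, the latter added so the function can be pointed at a test directory.

```python
import os

# The file names listed above, found under /sys/block/<device>/queue/iosched/.
TUNABLES = ("read_expire", "write_expire", "fifo_batch",
            "writes_starved", "front_merges")


def read_deadline_tunables(device, sysfs_root="/sys/block"):
    """Return {tunable: int} for the files that exist under iosched/."""
    iosched = os.path.join(sysfs_root, device, "queue", "iosched")
    values = {}
    for name in TUNABLES:
        path = os.path.join(iosched, name)
        # Skip files that are missing, e.g. when another elevator is active.
        if os.path.isfile(path):
            with open(path) as f:
                values[name] = int(f.read().strip())
    return values
```

Calling read_deadline_tunables("sda") returns an empty dictionary if the device is not using the deadline elevator. Changing a value means writing a new number to the same file, which normally requires root.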