CSMN612:
Computer Operating Systems

Dr. James Cook
|| Content  ||  Focus Questions  || Assignments/Readings  ||
||  Recommended Readings  ||  Web Sites  ||

wkf_brg.gif (1750 bytes)
Device Management

wkf_brg.gif (1742 bytes)

Session Objectives 
Continue and extend the device management discussion started in the previous session.
desk_med.jpg (13447 bytes)
To Do List

     Tasks

  • Get started on your readings. 
  • Respond to the session discussion item. 
     Readings: 
  • Lecture notes (see below) 
  • Nutt, Chapter 5

  •  
Focus Questions
  • What component manipulates the device controller interface? 
  • What steps are involved in performing a write operation using polling?
  • What is the purpose of the Device Status Table for interrupt-driven I/O? 
  • Do memory-mapped I/O computers provide application programs direct access to the device's registers?
  • Why to computers with direct memory access (DMA) normally have higher performance than those without it? 
  • Why is buffering needed for CPU device operation?
  • What is double buffering? 
  • What is circular buffering? 
  • How many bytes per second can be transmitted over a 9,600 baud line? 
  • What the difference between seek and latency times?
  • What is scan disk scheduling?
  • What is circular scan disk scheduling?
  • What is SSTF disk scheduling? 
  • What is FCFS disk scheduling? 
  • What components are used for a polling I/O read operation. 
  • What components are used for interrupt-driven I/O read operation. 
  • How does the OS provide a generic interface for making function calls on specialized device drivers?
  • How does an application process invoke a particular device driver function in a system with reconfigurable device drivers?

  •  
wkf_brg.gif (1750 bytes)

Lecture Notes

Overview
We now continue the device management discussion.  This was the first of the four primary OS functional requirements introduced in the previous session.  Since I/O devices generally involve the mechanical movement of physical parts, their operational speeds are typically considerably less than those of the CPU's, which is based on electronic switching.  The efficient handling of this operational difference is a primary challenge of device management.
 Nutt Chapter 5 Reading Guide
Section
Key Concepts
.The I/O System  Device Manager Abstraction
 I/O Processor Overlap within an application
 I/O Processor Overlap across threads
.I/O Strategies  Direct I/O Polling
 Interrupt-Driven I/O
 Polling versus interrupt-driven I/O performance
.Device Manager Design  Device-dependent driver infrastructure framework
 Servicing interrupts
 Linux device I/O
.Buffering  Input-output, hardware buffer, double buffering
.Device Class Characteristics
 
 
 
 

 

 Communication devices
 Asynchronous serial devices
 Sequentially accessed storage devices – traditional magnetic tape
 Randomly accessed storage devices – magnetic disk
 Optimizing access on magnetic disks – FCFS, SSTF, Scan/Lock, Circular Scan/Lock
 CD-ROM and DVD devices

 

The I/O System

Device Manager Abstraction: The Device Manager part of the OS consists of a device independent part connected with device dependent parts associated with each device. The following diagram overlays the diagram we used in previous session, when we were introducing the device controllers, onto a diagram similar to that in Nutt's Figure 5.2.  The combination of the Device Interface and the collection of Device Drivers constitute the OS software that provides the Device Manager Abstraction. Previously  we mentioned, but did not picture, the Device Driver software. This software is the device dependent part of the Device Manager and is  responsible for interfacing with the abstractions created by the Device Controller.
 
Figure 1.  This illustrates the various parts of the Device Manager for a portable scanner.

The Device Interface, the device independent part of the Device Manager, is responsible for exporting these abstractions as a standard interface. This interface, referred to as "device manager system calls," represents the devices in the programmers abstract machine.

Such system calls in POSIX compliant systems would include the open(), close(), read(), write(), lseek(), and fcntl() calls listed in Nutt's Table 2.1. Due to the differences between devices, some of these calls must be interpreted according to the specific device. For instance, the scanner pictured here would not implement write() and lseek() calls. A monitor wouldn't implement a read() call, unless it implemented touch screen functionality.

I/O Processor Overlap within an application: In Chapter 2 Nutt mentions the assumption of sequential computation that is fundamental to the semantics of imperative (based on von Neumann architectures) programming languages. In Chapter 5 he describes how serial execution semantics (same concept as sequential computation) can lead to significant inefficiencies for the operation of a single process. It forces the process to block until the called device has completed its task. If the next few statements after a call would not have required the device call to have been completed, the processor could have continued working in parallel with the controller and device until they finished their work. It's conceivable that a language could be designed without sequential computation semantics for these device calls by including a mechanism that tests for system call completion.

I/O Processor Overlap across threads: Even with sequential computation, efficiencies can be realized by letting the OS provide processor time to other processes or threads while a given process blocks waiting for completion of a device call.  Of course, the efficiency improvement for the given process only accrues when it find itself on the receiving end of such a sharing policy. If it's the only process running that requires device access, it may actually operate less efficiently, since once having given up the processor, it may not get it back immediately after the device completes its work. Providing the scheduling policies to optimize such sharing is a major responsibility of the OS.

I/O Strategies

Nutt talks about four I/O strategies, Direct I/O, Direct Memory Access (DMA), Polling, and Interrupt Driven.

Direct I/O and Direct Memory Access (DMA) are approaches for making a data transfer, assuming the device is ready.  We were introduced to both of these briefly in the previous session.  The DMA approach requires a specially designed device controller, but takes 60% fewer processor steps.

Direct I/O: This is the conventional five step access approach.  For a read, it involves:
1. The CPU puts a read command on the bus addressed to the device.
2. The controller puts the data on the bus addressed to a CPU register.
3. The CPU reads this data into the register.
4. The CPU then places the data back onto the bus, addressed to somewhere in memory.
5. The memory location receives the data addressed to it.

DMA: The more efficient DMA approach is implemented by the controller being designed to directly access memory, collapsing the 5 steps of the Direct I/O into 3.
1. The CPU puts a read command plus a target memory location on the bus addressed to the device.
2. The controller puts the data on the bus addressed to the memory location.
3. The memory location receives the data addressed directly to it.

As mentioned, these steps assume the device is ready.  If it isn't, more steps are needed to find out when it is. These steps will involve either Polling or Interrupt Driven I/O strategies.
Polling Approach: The following are the conventional approach for reading data using Polling:
1. The application requests a read operation.
2. The driver queries the device's status register to see if it's ready(busy=0 and done=0) and waits if it isn't.
3. When the device is ready, the driver stores an input command into the controller's command register, starting the device and setting busy=1.
4. The driver repeatedly polls the device's status register to see if it's completed the read (busy=0 and done=1).
5. Once it finds that the read has completed, the driver transfers the contents of the controller's data register(s) in the user process's space using either DMA or Direct I/O approaches.
6. The device handler relinquishes the processor.

Interrupt Driven Approach: Like DMA, the Interrupt approach requires a special design, but it avoids the repetitive polling of the Polling approach. Only step 4, the polling step, is different between the two approaches. (Look at Nutt's Figure 5.8).

1. The application requests a read operation.
2. The driver queries the device's status register to see if it's ready (busy=0, done=0) and waits if it isn't.
3. When the device is ready, the driver stores an input command into the controller's command register, starting the device and setting busy=1.
4a. The driver stores the status of what's been done to this point in the device status table and relinquishes the processor.
4b. When the device completes the operation, it interrupts the CPU, which starts the Interrupt Handler.
4c. The Interrupt Handler determines which device set the interrupt and allocates processor time to that device's device handler.
4d. The device handler retrieves the status information from the device status table.
5. The device handler transfers the contents of the controller's data register(s) in the user process's space using either DMA or Direct I/O approaches.
6. The device handler relinquishes the processor.

The key difference between the polling and interrupt driven I/O approach for device managers is having an application relinquish the processor after step 4a in the interrupt-driven approach. This splits the functionality of the device driver into two parts, a top half (steps 1 - 4a) and bottom half (step 4d - 6). In between these two halves (steps 4b and 4c) is the interrupt functionality. (Nutt mentions this top/bottom device driver concept in the text around his Figure 5.8.)

The reason for separating the device manager functionality into these two parts on either side of an interrupt involves processor efficiency. Due to the relative slowness of I/O devices, a processor running a device driver would spend a considerable amount of unproductive time polling the device controller to find out if its work has been completed. An interrupt capable device driver is designed such that the top half does just enough to get the device controller started up and then relinquishes the processor to let another process make use of this unproductive polling time. After the device has completed its work, it throws an interrupt to get the processor back and let the bottom half of the device driver present the results. So splitting the device driver into halves like this improves processor efficiency.

Interrupt handlers are also themselves split into top and bottom halves, but this is for a slightly different reason.  It's considered unsafe to block interrupts, but when one is being set it's necessary to do that. (See Nutt's discussion about "race condition" on page 134.) So the top half of an interrupt consists of just the minimum  functionality during which it is necessary to block other interrupts. This top half consists of throwing the interrupt and setting a pointer to the proper handler. The bottom half, to be run whenever the scheduler decides to give it access to the processor, is the handler that was pointed to. The separation into these two halves allows other processes to have access to the processor between the two halves, just as in the case of the two halves of the device driver. Thus if another process had a higher priority than the interrupt handler that was pointed to, the scheduler would give it access after the top half completes and queue the handler process. However, the main reason for the top and bottom half architecture of an interrupt is to minimize the time during which other interrupts must be blocked. A good additional reference for this is found at  http://www.xml.com/ldd/chapter/book/ch09.html#t5

Device Manager Design

The device manager software pictured earlier in Figure 1 consists of device-independent and device dependent parts. The device-independent part provides a generic framework of device management functions for application programs to use. Frameworks of routines such as this are known as Application Programming Interfaces (API). The system calls, open(), close(), read(), write(), lseek(), and fcntl(), mentioned above and in Nutt's Table 2.1 are part of a particular UNIX API known as POSIX.  POSIX, a registered trademark of the IEEE, stands for Portable Operating System Interface. The Open Group, a standards organization addressing software interfaces, has a short FAQ and other background information about POSIX.

The device manager API functions, when implemented, will point to the device specific functions of the appropriate device drivers. In Figure 1, the device specific functions for the scanner would be in the box labeled "Scanner Driver." There are two ways for binding the device specific functions to the device-independent framework, (1) compile them with the OS code and (2) implement an indirection capability in the OS code that enables them to be dynamically bound at run time. The latter approach is referred to as using reconfigurable device drivers.

Nutt's Figure 5.11 contains some pseudo code for the various Interrupt-driven I/O functions pictured in his Figure 5.8.  In part (b) of Figure 5.11 he shows how the introduction of a synchronization mechanism can make the code more efficient by eliminating the need for writing to the Device Status Table. We'll see much more about synchronization techniques is a later session.

On page 168 in "Linux device I/O", Nutt describes some of the details associated with implementation of the specific device driver functions under Linux.

Buffering

Buffering is a common practice for making the bandwidth requirements for provider-consumer processes more manageable. In Figure 5.14 Nutt shows how hardware (controller) and software (driver) double buffering can be combined.  Note that, although Nutt doesn't mention it, the controller should have a Direct Memory Access (DMA) capability for the software double buffering to provide true parallelism. This would let the controller fill one buffer via DMA while the processor is transferring the contents of the other buffer to the application program.

As Nutt points out, input buffering requires predictable data inputs for maximum advantage. Also, processes with significant I/O will realize greater improvements from buffering than compute intensive ones since they provide more opportunities for the device controllers to work in parallel with the processor. Taken to extremes though, this advantage disappears.

An I/O bound process will be very fast to fill or empty controller buffers since it has nothing else to do. As soon as the device freshens a controller's buffers the processor will read or fill them and block, waiting for them to be refreshed again. Now buffering won't be an advantage since there is little opportunity for the device to work in parallel with the processor. The throughput will essentially be limited to the speed of the I/O device.

Device Class Characteristics

There are two device classes, those involved with the communication of data and those involved with the storage of data.

Communication Devices, such as remote terminals, printers, mice, etc., transmit bytes of information (characters) to or from their device controller in the computer. The device and its controller will be designed to a standard that specifies not only the physical interface but also the syntax and semantics of the byte patterns being transmitted. The basic characteristics of several popular standards along with links for tutorials on each are summarized in the following table:
 
Protocol Bus-type Speed Tutorial Link
RS232 Serial
<57.6 kbps
ARC Electronics
GPIB Parallel
1 MBps
National Instruments
USB Serial/power
1.5-12Mbps
Philips Semiconductors
Firewire Serial
<400 Mbps
National Instruments
SCSI Parallel
 < 160 MBps
Hardware Central
Table 1  This lists several popular communications standards.

Communications protocols generally deal with (provide specifications for) the smallest group of bits being sent that fully describe a piece of information, calling it a frame. The meaning of any particular bit within the frame can only be understood if the entire frame is present. The data items being sent are always represented by particular bit patterns. Asynchronous devices define the bits in a frame by absolute voltage levels. They also focus on the character as the basic unit of information, limiting each frame to just one character. Synchronous devices define the bits in a frame by changes in voltage levels, and they focus on a multi-character message as the basic unit of information found in a frame. These difference between synchronous and asynchronous communications devices and other interesting information about data communications are explained rather nicely in an on-line tutorial titled "Industrial Data Communications - Fundamentals" by IDC Technologies.

The speeds listed here are in bits per second (bps) or bytes per second (Bps), prefaced by "k" or "M" for thousands or millions respectively. The data transfer speed of a device is a tricky characteristic to specify.  There is the actual number of signal bits per second one would see moving across a cable, generally called the baud rate.  These bits are organized into frames in which some of them perform overhead functions, such as synchronization, instead of carrying data.  In addition, some standards provide for the signal bits to have multiple voltage levels instead of just representing on and off states. A signal bit traveling down a cable having 8 possible voltage levels carries 3 bits of information. Thus if one is interested in the information transfer in bits per second, knowing just the baud rate isn't enough.
 
The word "bit" has a different meaning when it refers to an element of a digital signal than it does when it refers to an amount of information. The term "baud rate" is used when the reference is to the number of signal bits (fundamental symbols of the signal) being transferred per second. The "bit rate" or "bps" is used when the reference is to an amount of information being transferred per second. If the fundamental symbol of a signal has multiple states, such as the 8 possible voltage levels just mentioned, the bit rate of the information transfer will exceed the signal's baud rate unless the signal coding provides lots of redundancy and/or overhead. See http://en.wikipedia.org/wiki/Baud for more on this difference.

Storage Devices come in two types, sequential access devices such as the traditional magnetic tape drives and random access devices such as magnetic disks. They handle data in blocks of bytes, usually 512 bytes per block.

There's lots of potential for the disk driver to optimize disk access in a multiprogramming environment by clever disk access strategies to minimize seek time and latency.  Nutt describes several approaches (FCFS, SSTF, Scan/Lock, Circular Scan/Lock) on pages 182 - 185. We'll see some of these same strategies applied for process scheduling in a later session.

The following summarize typical characteristics for currently available storage devices. The link(s) is/are a source for some, but necessarily all, of the data. The Conference discussion question will provide the class an opportunity to verify and extend this data.

Digital Audio Tape (DAT)
    -- Magnetic Tape Cartridge
    -- 40 GB storage
    --   8 GB/hr data transfers
    --  30s - 45s average access times
    -- http://en.wikipedia.org/wiki/Digital_Audio_Tape
    -- SEAGATE STD624000N-RYT TAPESTOR DAT 24
Floppy Disk
    -- 3.5 inch removable magnetic disk
    -- 1.44MB storage
    -- 60 kBps data transfers
    --  4 ms typical seek time
    --  83 ms typical latency time (floppies typically rotate 360 rpm)
    -- http://www.usbyte.com/common/floppy_disk.htm
Hard Drive
    -- fixed magnetic disk
    -- 40 GB storage (typical entry-level PC)
    -- 5 MBps data transfers
    -- 5 - 25 ms typical seek time
    -- 2 ms typical latency time (fixed hard disks can rotate 15,000 rpm or more)
    -- http://seagate.com/docs/pdf/whitepaper/disc_capacity_performance.pdf
CD-ROM
    -- 120mm removable composite disk
    -- 700-800 MB storage
    -- 150kBps data transfers at 1x up to 32x, although above 20x the increase isn't linear
    -- 110 ms typical access time (combination of seek and latency)
    -- http://en.wikipedia.org/wiki/CD-ROM
CD-REadWrite or REwritable (CD-RW) (Samsung CDRW 52X32X52X)
    -- 120mm removable disk
    -- 700MB  storage (may be reduced somewhat by overhead)
    -- 7.8 MBps data transfers (write and read) 4.8MBps rewrite
         (the 52x 32x 52x standing for write-rewrite-read,  where x=150MBps)
    -- 110 ms typical access time (combination of seek and latency)
    --  http://osnews.pricegrabber.com/search_getprod.php/masterid=2292832/search=Samsung+CDRW+52X32X52X
DVD-ROM
    --  120mm removable disk
    --  5 - 17 GB storage
    --  21.6 MBps data transfers (Dell Dimension 2200 DVD-ROM)
    --  170 ms typical access time (Sony’s DRU-700A DVD+R DL)
    -- http://osnews.pricegrabber.com/search_attrib.php/page_id=7

Summary

In this session, we've covered the first basic OS functional requirement, management of device drivers, involving concepts such as polling and interrupt drivers, buffering, seek times, and latency.  In the next session, we'll move to the second basic function, Process, Threads, and Resource Management.
 
 

Recommended Reading
Books (Note: Check details, availability etc. at Amazon.com site) 

Rubini, A., & Corbet, J. (2001). Linux Device Drivers (2nd ed.). Sebastopol, CA: O'Reilly. ISBN: 0-59600-008-1

Web Sites
http://www.xml.com/ldd/chapter/book/bookindexpdf.html
Rubini, A., & Corbet, J. (2001). Linux Device Drivers (2nd ed.). Sebastopol, CA: O'Reilly. ISBN: 0-59600-008-1

http://www.idc-online.com/technical_references/data.asp?country=United+States
IDC Technologies Technical References and Tutorials on Data Communications

wkf_brg.gif (1750 bytes)
Library Services  ||  Graduate School HomePage  ||  E-Mail Directory
 Graduate DE Home || Back to Tycho Login

© 2003-2004 University of Maryland University College.