Operating System Concepts — Silberschatz, Galvin, Gagne
Ch 10 Mass Storage • Ch 11 File-System Interface • Ch 12 File-System Implementation • Ch 13 I/O Systems
The Storage Hierarchy
10
Mass-Storage Systems
HDD • SSD • Disk Scheduling • RAID
Ch 10
Hard Disk Drive — Anatomy
Access Time
Ch 10
Disk Access Time Breakdown
Ch 10
Practice: Disk Access Time
✎ Practice — A 15,000 RPM hard disk has an average seek time of 4 ms. Each track has 500 sectors of 512 bytes.
(a) Avg rotational latency = ? Rotation period = 60 / 15,000 = 4 ms per rotation → avg latency = 4 / 2 = 2 ms
(b) Transfer time per sector = ? Time per track = 4 ms, sectors per track = 500 → transfer = 4 / 500 = 0.008 ms
(c) Total avg access time = seek + rotation + transfer = 4 + 2 + 0.008 = 6.008 ms ≈ 6 ms
⚡ Bonus: the same operation on an NVMe SSD takes ~0.1 ms. That's 60× faster!
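A quick way to sanity-check this arithmetic is to compute it directly; a minimal sketch, with all figures taken from the problem:

```python
# Disk access time for the practice problem above.
rpm = 15_000
seek_ms = 4.0
sectors_per_track = 500

rotation_ms = 60_000 / rpm                      # 4 ms per full rotation
latency_ms = rotation_ms / 2                    # avg: half a rotation = 2 ms
transfer_ms = rotation_ms / sectors_per_track   # one sector = 0.008 ms

print(seek_ms + latency_ms + transfer_ms)       # 6.008 ms
```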
Disk Scheduling
Goal: minimize total seek distance when servicing queued I/O requests
Request queue: 98, 183, 37, 122, 14, 124, 65, 67
Head starts at: cylinder 53
Range: 0 – 199
Algorithms we'll compare:
FCFS — First Come First Served
SSTF — Shortest Seek Time First
SCAN — Elevator Algorithm
C-SCAN — Circular SCAN
LOOK / C-LOOK — Practical variants
Ch 10
FCFS vs SSTF
FCFS — 640 cylinders
Head path: 53 → 98 → 183 → 37 → 122 → 14 → 124 → 65 → 67. Wild zigzag!
SSTF — 236 cylinders
Head path: 53 → 65 → 67 → 37 → 14 → 98 → 122 → 124 → 183. Much smoother, but may starve distant requests
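SSTF is just a greedy loop: always service the closest pending request next. A minimal sketch using the slide's queue and start cylinder:

```python
# Greedy SSTF: repeatedly pick the request nearest the current head.
def sstf(head, requests):
    pending, total, order = list(requests), 0, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
        order.append(nearest)
    return total, order

print(sstf(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# (236, [65, 67, 37, 14, 98, 122, 124, 183]) — matches the figure
```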
Ch 10
SCAN & C-SCAN
SCAN (Elevator)
Head starts at 53: go to one end of the disk, reverse, continue servicing. No starvation • Bounded wait
C-SCAN (Circular)
Service in one direction only →, then jump back to the start (no service on the return). More uniform wait time than SCAN
Ch 10
LOOK & C-LOOK (Practical Variants)
LOOK
SCAN goes to the end of the disk (0 or 199); LOOK reverses at the last request in each direction
C-LOOK
C-SCAN goes to the end and wraps to 0; C-LOOK jumps (no service) back to the first request
LOOK/C-LOOK are the algorithms actually used in real operating systems — more efficient than pure SCAN/C-SCAN
Ch 10
Scheduling Comparison
Total head movement (cylinders): FCFS 640 • C-SCAN 382 • C-LOOK 322 • SCAN 236 • SSTF 236 • LOOK 208
SSTF/LOOK best for light loads • SCAN/C-SCAN best for heavy loads • SSDs need no scheduling!
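These totals are easy to reproduce: each sweep algorithm is just a particular head path, and total movement is the sum of adjacent distances. A sketch (helper names are ours; wrap jumps are counted, matching the C-SCAN/C-LOOK figures above; direction choices match the slides):

```python
# Reproduce the table: build each algorithm's head path, sum distances.
def movement(path):
    return sum(abs(b - a) for a, b in zip(path, path[1:]))

def sstf_path(head, q):
    pending, path = list(q), [head]
    while pending:
        head = min(pending, key=lambda c: abs(c - head))
        pending.remove(head)
        path.append(head)
    return path

q, head, lo, hi = [98, 183, 37, 122, 14, 124, 65, 67], 53, 0, 199
below = sorted(c for c in q if c < head)         # [14, 37]
above = sorted(c for c in q if c > head)         # [65, 67, ..., 183]

paths = {
    "FCFS":   [head] + q,
    "SSTF":   sstf_path(head, q),
    "SCAN":   [head] + below[::-1] + [lo] + above,  # sweep to 0, reverse
    "C-SCAN": [head] + above + [hi, lo] + below,    # sweep up, wrap around
    "LOOK":   [head] + below[::-1] + above,         # reverse at last request
    "C-LOOK": [head] + above + below,               # wrap to lowest request
}
for name, p in paths.items():
    print(f"{name:7s} {movement(p)}")   # 640, 236, 236, 382, 208, 322
```

Swapping in the practice queue from the next slide (head 100) reproduces those answers as well.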
Ch 10
Practice: Disk Scheduling
✎ Practice — Request queue: [23, 89, 132, 42, 187, 64, 157, 98] • head at 100 • disk 0–199
Calculate total head movement for each algorithm:
FCFS: |100-23| + |23-89| + |89-132| + |132-42| + |42-187| + |187-64| + |64-157| + |157-98| = 77+66+43+90+145+123+93+59 = 696
SSTF: 100 → 98 → 89 → 64 → 42 → 23 → 132 → 157 → 187 = 2+9+25+22+19+109+25+30 = 241
SCAN (toward 0): 100 → 98 → 89 → 64 → 42 → 23 → 0 → 132 → 157 → 187 = 100 (down to 0) + 187 (up to 187) = 287
C-LOOK (toward 199): 100 → 132 → 157 → 187 → 23 → 42 → 64 → 89 → 98 = 87 (up) + 164 (jump) + 75 (up) = 326
Total head movement comparison: SSTF 241 • SCAN 287 • C-LOOK 326 • FCFS 696
Ch 10
RAID Levels
RAID 0 — Striping: blocks A1 A2 A3 A4 spread across disks. No redundancy!
RAID 1 — Mirroring: A A' and B B' on paired disks. 50% overhead, full redundancy
RAID 5 — Distributed parity (e.g. 3 disks): parity blocks Ap, Bp, Cp rotate across the disks; can lose 1 disk
RAID 6 — Dual parity (P + Q): tolerates 2 failures
RAID 10 (1+0) — Production Favorite: mirror first (A|A', B|B', C|C'), then stripe across the mirror sets
MTTDL for a mirrored pair ≈ MTTF² / (2 × MTTR) = 100,000² / (2 × 10) hours ≈ 57,000 years
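The parity idea behind RAID 5 is plain XOR: the parity block is the XOR of the data blocks in a stripe, so any single lost block is the XOR of the survivors. A toy sketch with two data blocks:

```python
# RAID 5 parity in miniature: P = A xor B, and A = B xor P.
a = bytes([0b1100_1010, 0b0011_0101])
b = bytes([0b1010_0001, 0b0101_1110])
parity = bytes(x ^ y for x, y in zip(a, b))    # written to the parity disk

# The disk holding A fails: rebuild A from the survivors.
rebuilt = bytes(x ^ y for x, y in zip(b, parity))
assert rebuilt == a                            # recovered intact
```

RAID 6's second syndrome (Q) needs more than XOR (Reed-Solomon coding), which is what lets it survive two failures.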
Ch 10
Practice: RAID Capacity & Fault Tolerance
✎ Practice — You have 8 disks, each 2 TB. Raw total = 16 TB.
Calculate usable capacity and fault tolerance for each RAID level:
RAID Level | Formula         | Usable Capacity | Fault Tolerance
RAID 0     | 8 × 2 TB        | 16 TB (100%)    | None — any disk fails, all data lost
RAID 1     | (8 / 2) × 2 TB  | 8 TB (50%)      | 1 failure per mirror pair
RAID 5     | (8 − 1) × 2 TB  | 14 TB (87.5%)   | 1 disk failure
RAID 6     | (8 − 2) × 2 TB  | 12 TB (75%)     | 2 disk failures
RAID 10    | (8 / 2) × 2 TB  | 8 TB (50%)      | 1 per mirror pair (up to 4)
For a database server, which RAID? → RAID 10
Best random write performance + good redundancy
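The table's arithmetic in one sketch (our helper, parameterized by disk count and size):

```python
# Usable capacity per RAID level for n disks of a given size.
def usable_tb(level, n_disks, disk_tb):
    data_disks = {
        "RAID 0":  n_disks,        # stripe everything, no redundancy
        "RAID 1":  n_disks / 2,    # mirror pairs
        "RAID 5":  n_disks - 1,    # one disk's worth of parity
        "RAID 6":  n_disks - 2,    # two disks' worth of parity
        "RAID 10": n_disks / 2,    # mirror first, then stripe
    }[level]
    return data_disks * disk_tb

n, size = 8, 2                     # 8 disks × 2 TB = 16 TB raw
for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"):
    tb = usable_tb(level, n, size)
    print(f"{level:8s} {tb:5.1f} TB  ({tb / (n * size):.1%})")
```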
Ch 10
Swap Space Management
Physical memory (RAM) holds active processes (P1, P2, P3) — limited capacity, fast. Pages are swapped out to, and swapped in from, swap space on disk (P4 pages, P5 pages, P1 pages) — extending virtual memory beyond RAM.
Swap space location:
Dedicated partition — faster, no FS overhead
Swap file — flexible size, easier to manage
Linux: both supported, use swapon / swapoff
Windows: pagefile.sys (a swap file)
How much swap?
Traditional rule: 2× RAM
Modern (with lots of RAM): = RAM or less
Too much swapping = thrashing (system spends all its time swapping, doing no real work)
SSD swap ≫ HDD swap (random access matters most)
11
File-System Interface
The File Concept
Name: report.pdf • Size: 2.4 MB • Owner: alice • Perms: rwxr-x--- • Time: Apr 14 2026 • Location: inode #4872
Operations: create • open • read • write • seek • close • delete
Open-File Table
OS caches metadata of open files in memory to avoid repeated disk lookups
System-wide Open File Table:
fd | file ptr    | open count | inode
3  | offset 1024 | 2          | #4872
5  | offset 0    | 1          | #9201
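A small sketch (hypothetical file name) of why the offset lives in the open-file-table entry rather than the inode: two independent opens of the same file keep independent offsets.

```python
import os

path = "demo.txt"                       # hypothetical file name
with open(path, "w") as f:
    f.write("hello world")

fd1 = os.open(path, os.O_RDONLY)        # two opens → two table entries
fd2 = os.open(path, os.O_RDONLY)
os.lseek(fd1, 6, os.SEEK_SET)           # moves fd1's offset only
print(os.read(fd1, 5))                  # b'world'
print(os.read(fd2, 5))                  # b'hello'  (fd2 still at offset 0)
os.close(fd1); os.close(fd2)
```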
Ch 11
Access Methods
Sequential
Read blocks in order: 1 → 2 → 3 → 4 → 5
Read/write one by one
Like a tape
Direct (Random)
read(4): jump straight to block 4
Jump to block n
Fixed-length records
Indexed
Index maps key → block: "foo" → block 4
Key lookup → block
Index in memory
Ch 11
Directory Structures
Single-Level: all files (a b c d e) in one directory. Name conflicts!
Two-Level: root → User1, User2. No grouping within a user
Tree (most common): / → bin, home, etc; home → alice, bob. Absolute & relative paths
Acyclic Graph: one file reachable from both /A and /B. Hard / soft links
Ch 11
Hard Links vs Symbolic Links
Hard Link
fileA.txt and fileB.txt both name inode #4872 (link count = 2), which owns the data blocks
+ Same inode, same data
+ Delete one, the other still works
- Cannot cross filesystems
- Cannot link to directories
Symbolic (Soft) Link
shortcut.txt has its own inode (a symlink) storing the path "/path/original.txt"; following that path leads to original.txt → inode #4872 (link count = 1) → data blocks
+ Can cross filesystems
+ Can link to directories
- Dangling if target deleted
- Extra indirection (slower)
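Both link types are one call away; a minimal POSIX sketch (file names hypothetical) showing the inode behavior described above:

```python
import os

with open("original.txt", "w") as f:
    f.write("data")

os.link("original.txt", "hard.txt")       # hard link: shares the inode
os.symlink("original.txt", "soft.txt")    # symlink: new inode, stores a path

print(os.stat("hard.txt").st_ino == os.stat("original.txt").st_ino)   # True
print(os.lstat("soft.txt").st_ino == os.stat("original.txt").st_ino)  # False

os.remove("original.txt")
# hard.txt still opens fine (link count was 2); soft.txt now dangles
```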
✎ Practice (r=4, w=2, x=1, 0=none)
Q1: Convert -rw-r----- to octal. rw- = 6, r-- = 4, --- = 0 → 640
Q2: What does chmod 755 mean in rwx? 7 = rwx, 5 = r-x, 5 = r-x → rwxr-xr-x
Q3: File permissions 644, owner = alice. Can user bob (group = staff, file group = staff) write? 6 = rw- (owner) • 4 = r-- (group) • 4 = r-- (other). Bob is in group staff → group permission = r-- (read only). Answer: no, bob cannot write.
Q4: You run chmod 4755 script.sh. What does the 4 do? It sets the SUID bit (Set User ID): when executed, the program runs with the file owner's privileges, not the caller's. Shown as -rwsr-xr-x (note the s in the owner execute slot).
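The octal-to-rwx conversion is just three 3-bit triads; a small sketch reproducing Q1 and Q2:

```python
# Convert an octal mode to its rwx string (owner, group, other).
def rwx(mode_octal: int) -> str:
    out = []
    for shift in (6, 3, 0):                  # owner, group, other triads
        triad = (mode_octal >> shift) & 0b111
        out.append("".join(b if triad & (4 >> i) else "-"
                           for i, b in enumerate("rwx")))
    return "".join(out)

print(rwx(0o640))   # rw-r-----
print(rwx(0o755))   # rwxr-xr-x
```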
Ch 11
Mounting & File Sharing
Mount Point
mount /dev/sdb1 /mnt — attach a USB drive (FAT32, containing photos/ and docs/) at /mnt alongside /home and /etc
Attach any FS to any directory; transparent to applications
/etc/fstab — auto-mount at boot
File Sharing (NFS)
Client mounts the remote dir; server exports the directory; they communicate via RPC
Consistency semantics: Unix — writes visible immediately to all; Session (AFS) — visible on close
Sharing challenges: concurrent access → locking needed • different user IDs across machines • network failures → stateless protocol (NFSv3)
Ch 11
Access Control Lists (ACL)
Traditional rwx limitation: -rwxr-x--- alice staff file.txt — what if Bob (not in staff) needs read access? Only 3 categories (owner / group / other): not fine-grained enough!
ACL solution:
$ getfacl file.txt
user::rwx
user:bob:r--
group::r-x
other::---
Feature     | Traditional rwx   | ACL
Granularity | 3 categories only | Per-user, per-group
Complexity  | Simple            | More complex
Storage     | 9 bits in inode   | Extended attributes
Used in     | All Unix/Linux    | NTFS, ext4, macOS, NFSv4
12
File-System Implementation
Layered Structure • Allocation • Free Space • Journaling
Ch 12
Layered File System Architecture
Application Programs
↓ Logical File System — metadata, directories, protection, FCB/inode
↓ File-Organization Module — logical → physical blocks, free-space management
↓ Basic File System — generic block I/O, buffer cache
↓ I/O Control (Device Drivers) — translates to hardware commands
↓ Hardware: Disk / SSD / RAID
Ch 12
Virtual File System (VFS)
Processes A, B, C → System call interface: open() read() write() close() → VFS — Virtual File System: uniform API • vnode interface • filesystem-independent operations
Below the VFS: ext4 (Linux default) • XFS (high performance) • FAT32 (USB drives) • NFS (network), backed by SSD (/dev/sda), HDD (/dev/sdb), USB (/dev/sdc), or a remote server
The same open/read/write works on any filesystem; applications never know which FS they're using
Ch 12
Directory Implementation
Linear List
Directory file maps filename → inode #:
readme.md → 4201 • main.c → 4205 • data.csv → 4210 • test.py → 4218
+ Simple to implement
- Linear search: O(n)
- Slow for large directories
Can sort entries for binary search: O(log n)
Allocation: Contiguous
Disk blocks 0–10: File A occupies start=2, len=4; File B occupies start=8, len=3
+ Best sequential & random performance
+ Simple: directory stores just (start, length)
- External fragmentation
- File size must be known at creation
- Files can't grow easily
Fragmentation problem: free space splinters into small holes — a new file needing 5 blocks won't fit contiguously!
Modern solution: extents — a file is one or more contiguous chunks (extents). Used by ext4, NTFS, XFS, Btrfs
Ch 12
Allocation: Linked & FAT
Linked allocation: each block stores a pointer to the next — Block 9 → next: 16, Block 16 → next: 1, Block 1 → next: nil
+ No external fragmentation
- No random access (must traverse the chain)
- A broken pointer = data loss
FAT (File Allocation Table): all next-pointers moved into one table at the start of the volume —
blk#: 0  1    2  ...  9   16
next: -  EOF  -  ...  16  1
(this file's chain: 9 → 16 → 1 → EOF)
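A FAT is just a table of next-pointers indexed by block number; a sketch traversing the chain from the slide:

```python
# Follow FAT next-pointers from a file's starting block.
EOF = -1
fat = {9: 16, 16: 1, 1: EOF}        # the slide's chain: 9 -> 16 -> 1

def file_blocks(start):
    blocks, blk = [], start
    while blk != EOF:
        blocks.append(blk)
        blk = fat[blk]              # one table lookup per block
    return blocks

print(file_blocks(9))               # [9, 16, 1]
```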
Indexed allocation (inode): mode, uid, size, ..., then 12 direct pointers (direct 0–11), a single indirect, a double indirect, and a triple indirect pointer
Reach: direct = 12 blocks • single indirect = 1K blocks • double = 1M blocks • triple = 1G blocks → max file ≈ 4 TB (4 KB blocks)
+ Random access • + No external fragmentation
Small files are fast (direct), large files scale (indirect)
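The "≈ 4 TB" figure falls out of the pointer arithmetic, assuming 4 KB blocks and 4-byte block pointers:

```python
# Max file size for 12 direct + single/double/triple indirect pointers.
BLOCK = 4096
PTRS = BLOCK // 4                        # 1024 pointers fit in one block

blocks = 12 + PTRS + PTRS**2 + PTRS**3   # direct + 1K + 1M + 1G blocks
print(blocks * BLOCK / 2**40, "TiB")     # ≈ 4 TiB, dominated by triple
```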
Free-space management: besides the common bitmap, alternatives include a linked list (no wasted space), grouping, and counting (start + count for contiguous runs)
Journaling (Write-Ahead Log)
1 Begin TX → 2 Write to journal → 3 Commit → 4 Apply to disk → 5 Free log entry
On crash: replay the journal — recovery in seconds vs. fsck taking hours
Used by ext3/ext4, NTFS, XFS, HFS+
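A toy write-ahead log in the same spirit (all names and the on-disk format are ours, not any real filesystem's): each change is logged and fsync'd before being applied, so recovery replays only fully committed records.

```python
import json, os

JOURNAL = "journal.log"              # hypothetical journal file

def journaled_update(state, key, value):
    with open(JOURNAL, "a") as j:    # steps 1-3: log the intent, then commit
        j.write(json.dumps({"key": key, "value": value}) + "\n")
        j.write("COMMIT\n")
        j.flush()
        os.fsync(j.fileno())         # journal is durable before we proceed
    state[key] = value               # step 4: apply in place
    os.remove(JOURNAL)               # step 5: free the log entry

def recover(state):
    """After a crash, replay only records followed by a COMMIT."""
    if not os.path.exists(JOURNAL):
        return
    pending = None
    with open(JOURNAL) as j:
        for line in j:
            if line.strip() == "COMMIT" and pending is not None:
                state[pending["key"]] = pending["value"]
                pending = None
            else:
                pending = json.loads(line)
    os.remove(JOURNAL)
```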
Ch 12
Log-Structured File System (LFS)
Traditional FS problem: small writes = many random seeks (update inode + data + bitmap + dir → 4 seeks); HDD random write: ~200 IOPS
LFS idea: buffer all writes, flush as one sequential log — all writes become sequential → fast! HDD sequential write: ~100 MB/s
Disk = one big log: segments (inode+data+dir) written in order, with free space ahead of the write frontier
Inode map: inode# → current location in the log (the map itself lives in the log too!)
Garbage collection (cleaner): compact live data, reclaim stale segments
+ Write throughput: 10× improvement
+ Crash recovery: just replay the log tail
- Random reads: must look up the inode map
Used by: F2FS (flash), WAFL (NetApp), ZFS (copy-on-write)
13
I/O Systems
Hardware • Polling • Interrupts • DMA
Ch 13
I/O Hardware Architecture
CPU (executes I/O instructions) and memory sit on the system bus (PCIe) alongside the device controllers: disk controller (SATA / NVMe → HDD, SSD), USB controller (keyboard, mouse), network NIC (Ethernet / WiFi), GPU (display)
Device registers: data-in • data-out • status • control — accessed via I/O ports or memory-mapped I/O
Ch 13
Polling vs Interrupts vs DMA
Polling — CPU busy-waits checking status: still checking... still checking... done! Transfer 1 byte, busy-wait again. The CPU does nothing but loop & check.
+ Simple, low latency • + OK for fast devices • - Wastes CPU cycles • - Terrible for slow devices
Interrupts — CPU does other work until the device raises an IRQ; handle the interrupt, process data, return, resume. The CPU is free until the device signals.
+ CPU efficient • + Good for slow devices • - Context-switch overhead • - CPU still moves each byte
DMA — CPU sets up the DMA command, then does other work while the DMA controller moves data between device and memory directly; an IRQ signals completion and the CPU processes the result.
+ CPU barely involved • + Best for bulk data • + Essential for disk & network • - Setup overhead
Ch 13
Kernel I/O Subsystem
Scheduling — per-device request queue; reorder for efficiency (disk scheduling algorithms); priority • fairness • QoS
Buffering — cope with speed mismatch; cope with size mismatch; copy semantics; double buffering
Caching — keep copies on faster storage (key to performance!); unified buffer cache: buffer + page cache merged
Spooling — queue for exclusive devices (e.g. printer queue)
Error handling — retry transient failures; return error codes; system error logs
I/O protection — all I/O instructions are privileged; must go through syscalls
Blocking I/O (process waits) • Non-blocking (returns immediately) • Async (signal on complete)
Ch 13
Life Cycle of an I/O Request
User process calls read(fd, buf, n) → 1 syscall traps to kernel → 2 kernel I/O checks the buffer cache (cache HIT: fast return!) → 3 on a miss, the device driver builds an I/O command → 4 controller runs the DMA transfer → 5 disk/SSD seeks + reads sectors → 6 IRQ handler moves data → buffer → 7 process is woken up, data returned
Process blocks at step 3 • wakes at step 7 • a cache hit skips steps 3–7 entirely
Ch 13
Blocking vs Non-blocking vs Async I/O
Blocking (synchronous): process runs... read() called → BLOCKED (waiting for I/O) → data ready, resume → process continues. Simple to program.
Non-blocking (returns immediately): read() → returns EAGAIN → process does other work → read() again → EAGAIN → more work... → read() → got data! Must poll repeatedly. Used for UI, network.
Asynchronous (signal when done): aio_read() → returns now → process works freely, no polling needed → signal/callback: done! → process uses the data. Best CPU utilization. Used for high-performance servers.
Blocking = simple • Non-blocking = responsive • Async = scalable
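Non-blocking I/O is easy to see on a pipe: with the fd set non-blocking, read() fails fast with EAGAIN instead of parking the process. A minimal POSIX sketch:

```python
import os

r, w = os.pipe()
os.set_blocking(r, False)            # make the read end non-blocking

try:
    os.read(r, 100)                  # nothing written yet
except BlockingIOError:              # EAGAIN: would have blocked
    print("no data yet — do other work instead of blocking")

os.write(w, b"hello")
print(os.read(r, 100))               # b'hello' — data is ready now
os.close(r); os.close(w)
```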
Ch 13
STREAMS Architecture
Full-duplex communication channel (System V UNIX): user process ↔ stream head ↔ Module A (e.g. protocol) ↔ Module B (e.g. filter) ↔ device driver end ↔ hardware device (writes flow downstream ↓, reads flow upstream ↑)
Key concepts: messages flow through a pipeline; each module has read & write queues; modules are stackable at runtime
Advantages: + modular — add/remove processing layers; + reusable modules across drivers
Example use — network: IP module → TCP module → NIC driver
Linux uses a different approach (the socket layer), not STREAMS
Ch 13
Improving I/O Performance
I/O is the major bottleneck in most systems
Reduce CPU load: use DMA over polling • offload to smart controllers • reduce interrupt frequency (coalesced interrupts) • NIC offload: checksum, TCP
Reduce copies: traditional read() goes device → kernel buf → user buf; zero-copy skips the user buffer: device → kernel buf → socket; sendfile(), splice(), mmap()
Smart scheduling: buffer cache / page cache • read-ahead (prefetch) • I/O scheduler reordering • async I/O (io_uring) • overlap compute + I/O
Device performance spectrum: keyboard 10 B/s • WiFi 100 MB/s • SATA SSD 550 MB/s • NVMe SSD 3,500 MB/s • PCIe 5 SSD 12,000 MB/s • RAM 50 GB/s
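One concrete zero-copy example: on Linux, os.sendfile pushes file bytes to a socket entirely inside the kernel, skipping the user-space buffer (serve_file is our hypothetical helper):

```python
import os, socket

def serve_file(conn: socket.socket, path: str):
    """Send a whole file over a connected socket with no user-space copy."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        sent = 0
        while sent < size:           # sendfile may send less than requested
            sent += os.sendfile(conn.fileno(), f.fileno(), sent, size - sent)
```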
Part 4: Key Takeaways
Ch 10: Mass Storage
HDD: seek + rotation = 99% of access time
SSD: 1000× faster random I/O, no scheduling needed
Scheduling: SSTF/LOOK (light load), SCAN (heavy load)
RAID 10: best for production databases
MTTDL dramatically improved by mirroring
Ch 11: FS Interface
File = name + metadata + data blocks
Access: sequential, direct, indexed
Directories: tree structure + links
Protection: rwx × owner/group/other
NFS for remote file sharing
Ch 12: FS Implementation
Layered: app → logical FS → basic FS → I/O
Allocation: contiguous vs linked vs indexed (inode)
Free space: bitmap most common
Journaling = fast crash recovery (seconds)
VFS unifies multiple FS types
Ch 13: I/O Systems
Polling: simple but wastes CPU
Interrupts: efficient, CPU free until signaled
DMA: essential for bulk data transfer
All I/O instructions are privileged
Kernel: scheduling, buffering, caching, spooling
Thank You!
Part 4: Storage Management
Chapters 10–13 • Operating System Concepts, 9th Edition