Reference:
See also Dr. Barnard's notes on
file system implementation
Chapter 11 described the public-facing API of file systems
Chapter 12 describes how the API might be implemented
Algorithms and data structures to map the logical file system onto the physical
secondary-storage devices
We will discuss
12.1 File-System Structure
12.2 File-System Implementation
12.3 Directory Implementation
12.4 Allocation Methods
12.5 Free-Space Management
12.6 Efficiency and Performance
12.7 Recovery
12.9 NFS
12.1 File-System Structure
Magnetic disks:
Can be rewritten in place
Can access directly any block
Levels, see Figure 12.1
12.2 File-System Implementation
On the disk
- Boot control block - Information needed to boot the OS from that volume
- Volume control block - Volume parameters, e.g., number of blocks, size
of blocks, free block count, free block pointers, free FCB count, free
FCB pointers
- Structure to organize files, e.g., file names and inode numbers
- Per-file File Control Block (FCB), e.g., inode with permissions, ownership,
size, location of data blocks
In memory
- Mount table - Each mounted volume
- Directory structure cache - Information about recently-accessed directories
- System-wide open-file table - Copy of FCB for each open file
- Per-process open-file table - Pointer to system-wide table
Details vary widely by system
inode (from Minix code): or go to Minix
book > Files >
src > inode.h
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
src/fs/inode.h
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
20500 /* Inode table. This table holds inodes that are currently in use. In some
20501 * cases they have been opened by an open() or creat() system call, in other
20502 * cases the file system itself needs the inode for one reason or another,
20503 * such as to search a directory for a path name.
20504 * The first part of the struct holds fields that are present on the
20505 * disk; the second part holds fields not present on the disk.
20506 * The disk inode part is also declared in "type.h" as 'd1_inode' for V1
20507 * file systems and 'd2_inode' for V2 file systems.
20508 */
20509
20510 EXTERN struct inode {
20511 mode_t i_mode; /* file type, protection, etc. */
20512 nlink_t i_nlinks; /* how many links to this file */
20513 uid_t i_uid; /* user id of the file's owner */
20514 gid_t i_gid; /* group number */
20515 off_t i_size; /* current file size in bytes */
20516 time_t i_atime; /* time of last access (V2 only) */
20517 time_t i_mtime; /* when was file data last changed */
20518 time_t i_ctime; /* when was inode itself changed (V2 only)*/
20519 zone_t i_zone[V2_NR_TZONES]; /* zone numbers for direct, ind, and dbl ind */
20520
20521 /* The following items are not present on the disk. */
20522 dev_t i_dev; /* which device is the inode on */
20523 ino_t i_num; /* inode number on its (minor) device */
20524 int i_count; /* # times inode used; 0 means slot is free */
20525 int i_ndzones; /* # direct zones (Vx_NR_DZONES) */
20526 int i_nindirs; /* # indirect zones per indirect block */
20527 struct super_block *i_sp; /* pointer to super block for inode's device */
20528 char i_dirt; /* CLEAN or DIRTY */
20529 char i_pipe; /* set to I_PIPE if pipe */
20530 char i_mount; /* this bit is set if file mounted on */
20531 char i_seek; /* set on LSEEK, cleared on READ/WRITE */
20532 char i_update; /* the ATIME, CTIME, and MTIME bits are here */
20533 } inode[NR_INODES];
create(): To create a new file
- Application program calls logical file system create()
- Allocated new FCB (where?)
- Read appropriate directory into memory
(File system has no concept of "current directory")
- Update with file name and FCB
- Writes (what?) back to disk
open():
- Pass a file name
- Search system-wide open files table
- If found, create per-process open file table entry pointing to system-wide
open file entry
- Search directory structure for given file name (parts of directory structure
many be cached in memory)
- When file is found, copy FCB to system-wide open file table, which also
tracks how many processes have that file open
- Create per-process open file table entry pointing to system-wide
open file entry
- open() returns a pointer to the appropriate entry in the per-process
open file table
- All subsequent accesses are through that pointer or file handle
- Errors?
read() or write():
- Access through pointer to per-process open file table
- For sequential access, per-process open file table includes pointer to
current location
- For direct access
- Errors?
close():
- Remove per-process open file table entry
- Decrement process count in system-wide open file table entry
- When process count == 0, copy any updated file meta data back to disk-based
directory structure
- Remove system-wide open file table entry
- Should be done "automatically" for every dying process
- Errors?
12.3 Directory Implementation
Operations include:
- create
- read
- create new file
- find file
- delete file
- remove
- get/set attributes
Directory attributes include:
Primarily on disk. Cached in memory
12.3.1 Linear list
Directory is
Create file
- find file --> none
- add entry
Delete file
- find file
- recover its space
- "remove" entry
- Observe: easy to mark as "deleted"
Table as array, mark as available, move entries, linked list
Problem: linear search time for find file
12.3.2 Hash table
As you learned in Data Structures
hash(fileName) --> table index
Chained overflow hash table
12.4 Allocation Methods
Where on the disk are data blocks for a file?
How do we Create(); Write(); Read(); Delete(); recover deleted?
Disk address includes: volume, cylinder, sector, block
For now, consider as a linear array
12.4.1 Contiguous allocation
File is a contiguous set of blocks
Directory (cf. Figure 12.5):
File name |
Start block |
Number of blocks |
Where? First fit, best fit, worst fit
See 12.5 Free-space management
Advantages:
- Easy access: Sequential?, direct?, indexed?
-
Disadvantages:
- External fragmentation
- Initial allocation size
- File growth
Disk defragmentation?
Variant: Extent - Clusters of blocks
12.4.2 Linked allocation
File is a linked list of blocks, located anywhere
Directory (cf. Figure 12.6):
File name |
Start block |
End block |
Blocks of file are in a linked list
Where? Anywhere
Create();
Write();
Advantages:
- Efficient space allocation
- No external fragmentation
- No need to compact
Disadvantages:
- Random access? Sequential?, direct?, indexed?
- Speed of disk search
- Reliability - lost or damaged pointers?
- Overhead of disk space?
Disk defragmentation makes linked allocation behave like contiguous allocation
Variant: Allocate clusters of blocks
Many variants, including File-Allocation Table (FAT) of MS-DOS. See Figure
12.7. Cache FAT?
12.4.3 Indexed allocation
Gather block pointers into a single(?) index block. See Figure 12.8
UNIX inode, see Figure 12.9
Advantages:
Disadvantages:
- Overhead of disk space? E.g., 1 - 2 block file
Answer?
All of the above?
12.5 Free-Space Management
Disk equivalent of malloc and free
A new file is composed of blocks
How do we know which disk blocks are free for assignment?
Reuse space from deleted files
Obvious: List of free blocks
12.5.1 Bit vector: One bit for each disk block
Advantages: Simple
Disadvantages: Must cache for performance
Cache size?
12.5.2 Linked list
Use head? Stack
Immediate reuse
Recovery?
FIFO?
12.5.3 Grouping: On first free block, store links to n-1 free blocks. nth
is another block of links
12.5.4 Counting: Keep address of free block and number of contiguous free
blocks there
12.6 Efficiency and Performance
Disk performance is a major system bottleneck
Solutions include
- Disk strategies, see Chapter 10
- Caching, e.g., for sequential file access, use free-behind and read-ahead
12.7 Recovery
Problems:
- Do in-memory structures agree with on-disk structures?
E.g., Write-through and write-back memory management strategies
- Are on-disk structures consistent? E.g., files in directory, locations
of file blocks
How? Crash?
12.7.1 Consistency checking
Exercise: Write fsck in UNIX or chkdsk in MS-DOS
What can you check?
What can you fix?
Suggests redundancy?
Error-detecting or error correcting codes?
Cyclic Redundancy Check (CRC)?
12.7.2 Backup and restore
Your hard drive will fail
Back up to tape? CD? other?
Backups: Full, incremental, differential
Schedule?
Recovery?
Disk synchronization
RAID (Redundant Arrays of Independent/Inexpensive Disks), see section 12.7
12.8 NFS (Omit)
Postpone until we talk more about networks.
|