Lab 5 – Simple File Operations and Mystery
Section 1. Overview
To celebrate the end of the lab sessions for CS2106, this lab should be "a bit lighter" on your time (theoretically). Exercise 1 is a simple task on Unix file operations. The lab demo exercise (exercise 1) probably takes fewer than 5 lines of code :-).
Please note that additional exercise(s) will be released only in week 11, as they require additional topics covered in lecture 11. Note that there is no change to the lab demo schedule (i.e. demo only in weeks 11 and 12 as per normal). There is no formal lab session in week 13, so you can use the lab as needed to finish the additional exercises.
Section 2. Exercise One [Lab Demo Exercise]
Take some time to familiarize yourself with the following Unix file related system calls:
read(): Reading data from an opened file descriptor.
lseek(): Move to a specified location in the file
You can read through the lecture slide and/or the relevant man pages on Unix for the above functions. Hint: You do not need any other file-related system calls to solve exercise 1 and 2.
Take a look at the sample execution session below. User input in bold font.
Sample Run 1
File Name: 10int.dat     [One of the provided input data files.]
Size = 40 bytes          [File size is printed.]
123                      [Data in the file are read and printed (a total of 10 integers in this case).]
124
125
126
127
128
129
130
131
132
Sample Run 2
File Name: wrong.dat
Cannot Open              [This is a non-existent file. Complain and exit.]
Prepared By SYJ AY1819S1 CS2106
The given skeleton file ex1.c has quite a large chunk of logic implemented. Your tasks are essentially:
Check that the file can be opened.
Find out the file size (hint: use lseek()).
Read all data until end of the file.
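The three tasks above can be sketched roughly as follows. This is only an illustration, not the skeleton's code: the helper name dump_int_file() is hypothetical, and it calls open() itself, whereas the skeleton may already handle that part for you.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the skeleton): prints the size of the
 * file at `path`, then every int stored in it.
 * Returns the number of integers printed, or -1 if the file cannot be opened. */
int dump_int_file(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        printf("Cannot Open\n");
        return -1;
    }

    /* Seeking to the end returns the offset of the end, i.e. the file size. */
    off_t size = lseek(fd, 0, SEEK_END);
    printf("Size = %ld bytes\n", (long)size);

    /* Move back to the start before reading the data. */
    lseek(fd, 0, SEEK_SET);

    int value;
    int count = 0;
    /* read() returns the number of bytes read; stop at EOF or a short read. */
    while (read(fd, &value, sizeof value) == (ssize_t)sizeof value) {
        printf("%d\n", value);
        count++;
    }

    close(fd);
    return count;
}
```

The key idea is that lseek() returns the resulting offset, so seeking to the end doubles as a size query. Remember to seek back to the beginning before reading.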
A note on 32-bit vs 64-bit
The exercises in this lab are sensitive to the word size of the underlying execution environment. As some of you may have a 64-bit processor and OS, it is a little tricky to ensure the lab works across both 32-bit and 64-bit environments. For maximum compatibility, we have decided to stick to 32-bit for this lab.
To ensure the correct execution environment, the first part of the skeleton source will do a simple check on the integer size and warn you if your machine + compiler operates in 64-bit. It will terminate the program if 64-bit environment is detected.
The remedy is quite simple: compile your source code with the additional flag "-m32", e.g.
gcc ex1.c -m32 //compiles as a 32-bit application.
Please make sure you compile your code correctly.
For your own exploration:
If you open up the 10int.dat in a normal editor, what do you see? Can you explain?
Lab 5 – USFAT File System
Section 1. USFAT File System Overview
USFAT (pronounced as /ˈʌŋkl/ /suː/ /ɪz/ FAT) is a fictional file system invented just for CS2106 :-)! It draws inspiration from the basic FAT-based file allocation scheme and the MS-DOS FAT16 file system. Your task in this lab is to understand and provide functionalities that interact with the underlying USFAT file system.
1.1 USFAT File System Layout
Absolute Sector # → | 0   | 1    | 2    | 3    | … | … | 128
Content             | FAT | Data | Data | Data | … | … | …
Data Sector # →     |     | 0    | 1    | 2    | … | … | 127
Note that a "logical block" is the same size as a "sector" in USFAT and we use the two terms interchangeably.
There are a total of 129 sectors (with absolute index 0 to 128) and each sector is 256 bytes. So, a typical USFAT file system is 129 * 256 = 33,024 bytes in size. The FAT table occupies one sector and is located at sector 0. All remaining sectors (128) are used for data storage.
Each FAT entry is 2 bytes in size, i.e. the FAT contains 256 / 2 = 128 entries. Note that the FAT index refers only to blocks in the file data region. For ease of reference (and coding), we will use the data sector number to indicate sectors in the data region. For example, the status of data sector 2 can be found in FAT[2], but the actual storage on the "hard disk" is at absolute sector 3. So, pay attention to whether you are using the data sector number or the absolute sector number during coding to avoid "off-by-one" errors.
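To make the off-by-one concrete, here is a minimal sketch of the index arithmetic. The constant and function names are hypothetical; USFAT.h defines its own system parameters, which you should use in your actual code.

```c
#include <assert.h>

enum {
    SECTOR_SIZE  = 256, /* bytes per sector */
    DATA_SECTORS = 128, /* data sectors 0..127 */
    FAT_SECTORS  = 1    /* the FAT occupies absolute sector 0 */
};

/* Data sector number -> absolute sector number. */
int data_to_absolute(int data_sector)
{
    assert(data_sector >= 0 && data_sector < DATA_SECTORS);
    return data_sector + FAT_SECTORS;
}

/* Byte offset of a data sector from the start of the disk image. */
long data_sector_offset(int data_sector)
{
    return (long)data_to_absolute(data_sector) * SECTOR_SIZE;
}

/* Byte offset of FAT[index] within the image (each entry is 2 bytes). */
long fat_entry_offset(int index)
{
    assert(index >= 0 && index < DATA_SECTORS);
    return (long)index * 2;
}
```

For data sector 2, the status lives in FAT[2] (byte offset 4 of the image), while the content lives in absolute sector 3 (byte offset 3 * 256 = 768).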
Each FAT entry can contain one of the following values:
Values           | Meaning
0xFFFA           | The sector is free.
0xFFF7           | The sector is bad (i.e. not working, don’t store any content here).
0xFFFF           | The sector is the END of a linked sector chain.
0x0000 to 0x007F | The sector leads to the indicated sector as part of a linked sector chain.
Here’s a sample FAT printout:
From the FAT printout above, we can see that data sector 0x0000 is free; data sectors 0x004a -> 0x004b -> 0x004c (end) are part of a linked sector chain, etc.
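As an illustration of how the FAT values drive a chain traversal, the sketch below counts the sectors in a chain held in an in-memory FAT array. The macro and function names are hypothetical (USFAT.h may name these differently), and the FAT is assumed to be already loaded into a uint16_t array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT_FREE    0xFFFA
#define FAT_BAD     0xFFF7
#define FAT_END     0xFFFF
#define FAT_ENTRIES 128

/* Counts the sectors in the chain that begins at data sector `start`.
 * Returns -1 if the chain is broken: it runs into a free or bad sector,
 * points outside 0x0000..0x007F, or loops back on itself. */
int chain_length(const uint16_t fat[FAT_ENTRIES], uint16_t start)
{
    assert(fat != NULL);

    int length = 0;
    uint16_t current = start;

    /* A valid chain can never have more links than there are sectors. */
    while (length < FAT_ENTRIES) {
        if (current >= FAT_ENTRIES)
            return -1;              /* not a valid data sector number */

        uint16_t next = fat[current];
        if (next == FAT_FREE || next == FAT_BAD)
            return -1;              /* sector is not part of any chain */

        length++;
        if (next == FAT_END)
            return length;          /* reached the last sector */
        current = next;
    }
    return -1;                      /* too many links: the chain loops */
}
```

With FAT[0x4a] = 0x004b, FAT[0x4b] = 0x004c and FAT[0x4c] = 0xFFFF, chain_length(fat, 0x4a) evaluates to 3.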
1.2 Directory (Folder) and File under USFAT
Under USFAT, directories and files both use the file data sectors to store information. For a directory, the data sector stores directory entries, which contain information about the files under that directory. For a file, the data sectors store the actual file content.
For simplicity, the USFAT media provided in this lab has the following limitations:
There is only one directory, the Root Directory. It is located at data sector 5.
A directory uses only 1 sector for its directory entries, which places an upper limit on the number of files it can store.
Your code only needs to work with these limitations in place.
[Note: These limits are imposed to simplify the exercises, the design of the USFAT is much more general / flexible.]
Each directory entry in a directory's data sector occupies 32 bytes and has the following layout:
Offset | 0 … 10 | 11   | 12 … 25    | 26 27        | 28 … 31
Usage  | Name   | Attr | <not used> | Start Sector | File Size
The name uses the old "8+3" format, where the file name is 8 characters long and the extension takes up 3 characters, e.g., a file with name "sample.cc" is stored as:
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Char   | ꟷ | ꟷ | s | a | m | p | l | e | c | c | ꟷ
Note that the filename is right-aligned to the "." while the extension is left-aligned. The "." itself is not stored. We use 'ꟷ' to represent a space, i.e. ' '.
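The alignment rule can be sketched as a small conversion routine. The function name to_8_3() is hypothetical and not part of the provided library; check USFAT.h first, as an equivalent helper may already exist.

```c
#include <assert.h>
#include <string.h>

/* Converts "name.ext" into the 11-byte on-disk form: the name is
 * right-aligned in bytes 0..7, the extension is left-aligned in bytes
 * 8..10, and unused positions are filled with spaces.
 * Returns 0 on success, -1 if the input does not fit the "8+3" format. */
int to_8_3(const char *filename, char out[11])
{
    assert(filename != NULL && out != NULL);

    const char *dot = strrchr(filename, '.');
    size_t name_len = dot ? (size_t)(dot - filename) : strlen(filename);
    size_t ext_len  = dot ? strlen(dot + 1) : 0;

    if (name_len == 0 || name_len > 8 || ext_len > 3)
        return -1;

    memset(out, ' ', 11);
    memcpy(out + (8 - name_len), filename, name_len); /* right-aligned */
    if (ext_len > 0)
        memcpy(out + 8, dot + 1, ext_len);            /* left-aligned */
    return 0;
}
```

Note that the 11-byte result is not NUL-terminated, so compare it with memcmp() rather than strcmp().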
The attribute is a single byte (8 bits):
Bit   | 7 | … | 0
Usage | … | Is directory? | Is system? | Is hidden? | Is readable?
For our exercises, you can assume that all files have an attribute 0x01, i.e. readable, not hidden, not a system file and not a directory.
Since each directory entry is 32 bytes and the directory in USFAT can utilize only 1 sector for directory entries, this gives us 256 bytes / 32 bytes = 8 files under a directory at most.
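A minimal sketch of decoding one of the 8 entries from a raw 256-byte directory sector is shown below. The struct and function names are hypothetical; USFAT.h declares the real types. It also assumes the 2-byte and 4-byte fields are stored in the host's byte order, which this handout does not state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DIR_ENTRY_SIZE = 32, SECTOR_SIZE = 256 };

/* Hypothetical in-memory view of one directory entry. */
typedef struct {
    char     name[11];     /* offsets 0..10, "8+3" format, no terminator */
    uint8_t  attr;         /* offset 11 */
    uint16_t start_sector; /* offsets 26..27 (data sector number) */
    uint32_t file_size;    /* offsets 28..31 (bytes) */
} dir_entry;

/* Decodes entry number `index` (0..7) of a raw directory sector.
 * ASSUMPTION: multi-byte fields use the host's byte order. */
void parse_dir_entry(const uint8_t *sector, int index, dir_entry *out)
{
    assert(sector != NULL && out != NULL);
    assert(index >= 0 && index < SECTOR_SIZE / DIR_ENTRY_SIZE);

    const uint8_t *raw = sector + index * DIR_ENTRY_SIZE;
    memcpy(out->name, raw, 11);
    out->attr = raw[11];
    memcpy(&out->start_sector, raw + 26, 2);
    memcpy(&out->file_size, raw + 28, 4);
}
```

Copying field by field with memcpy() avoids relying on struct padding matching the on-disk layout.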
1.3 USFAT "Media", Library Calls and Utility Program
There are a number of “disk image” files provided for this lab, e.g. 4files.img, empty.img, etc. Each file represents a complete USFAT file system. You can imagine that they represent simulated storage media like a hard disk, etc.
A large number of library calls are provided so that you can focus on “high level” file system functionalities. In the common/ directory, take a look at the USFAT.h header file, which defines all important system parameters and the available library calls. Essentially, “low level” functionalities that deal with reading / writing information from / to the media, e.g. sector / FAT reading / writing, etc., are available for use.
In addition, a “debug inspector” program, known as USFATI (USFAT Inspector), is also available so that you can view the raw content on a USFAT media easily. Instructions for setting up the inspector are given in Section 2.
Section 2. Exercises for USFAT
2.1 Directory structure of the skeleton code
There is one additional folder common/ with the following files:
Filename        | Purpose
USFAT.h         | USFAT header file with all key definitions and declarations.
USFAT_Util.c    | Implementation of all USFAT library functions.
USFAT_Insepct.c | The debug inspector utility program. Compiles into the “USFATI” executable.
Various *.img   | Backup copies of all USFAT disk images. In exercise 3, your program will modify the USFAT disk image, so if you ever need to “reset” the disk images, copy the backup over.
makefile        | For compiling the USFATI debug inspector as mentioned above.
reset.sh        | A simple script file to copy the backup images to the exercise directories.
Preparation:
1. Go into the common/ folder and type “make” to produce the USFATI executable.
2. Enable the “reset.sh” script file by “chmod 700 reset.sh”
3. Execute the “reset.sh” script file with “./reset.sh”. This copies a fresh set of disk images to the exercise directories. Use this step whenever you need to reset your disk images.
2.2 Exercise 2
Main task: Display the file content of a file under the root directory.
The main function is already written for you. The main function will repeatedly print the directory content of the root directory (i.e. similar to a “ls”), then prompt the user for a file to display (i.e. similar to a “cat” / “less” command). Your task is to implement the function “read_file( FAT_RUNTIME* rt, char filename[])” which returns:
0 if the file with filename cannot be found under the root directory.
1 if the operation is successful.
This function attempts to locate the directory entry for the file filename, then reads all data sectors of this file and prints them to the screen. Note: use the print_as_text() function when you need to print out the content of a file data sector. This ensures your output format is exactly the same as ours to facilitate checking.
Several key criteria:
Entire content of the file should be shown (duh!). This requires you to follow the “linked sector chain” by traversing in the FAT……
Note that the last sector may not be full! You need to print out only the valid content. (hint: use file size…..).
You are allowed to define as many helper functions as you need.
You can add / change the parameter(s) of the read_file() function if needed.
The main function should not be changed except the function call to read_file() can be modified with new parameters if you change them.
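The first two criteria (follow the chain, and trim the last sector using the file size) can be sketched as below. This is only an illustration under simplifying assumptions: the FAT and the whole data region are taken to be in memory already, and the bytes are copied into a buffer instead of being printed. In your actual solution you would read sectors through the provided library calls and print with print_as_text(). All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SECTOR_SIZE = 256, FAT_ENTRIES = 128 };

/* Copies the valid bytes of a file into `out` by walking its sector chain.
 * `data` holds the data region in memory, indexed by data sector number.
 * Returns the number of bytes copied, or -1 if the chain ends (or leaves
 * the valid range 0x0000..0x007F) before `file_size` bytes were found. */
long collect_file(const uint16_t *fat, const uint8_t *data,
                  uint16_t start, uint32_t file_size, uint8_t *out)
{
    assert(fat != NULL && data != NULL && out != NULL);

    uint32_t remaining = file_size;
    uint16_t current = start;
    long copied = 0;

    while (remaining > 0) {
        if (current >= FAT_ENTRIES)
            return -1; /* END flag or invalid link reached too early */

        /* Every sector is full except possibly the last one. */
        uint32_t chunk = remaining < (uint32_t)SECTOR_SIZE
                             ? remaining : (uint32_t)SECTOR_SIZE;
        memcpy(out + copied, data + (size_t)current * SECTOR_SIZE, chunk);

        copied += chunk;
        remaining -= chunk;
        current = fat[current]; /* follow the linked sector chain */
    }
    return copied;
}
```

Driving the loop by the remaining byte count rather than by the END flag is one possible design: the last sector is trimmed automatically, and the loop always terminates even if the FAT is corrupted.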
Sample Output (using 4files.img, user input in bold, file content in red):
Filename       Attr          Start      Size
--------------------------------------
fat.txt        01 <file>  [0x0067]     1563
mystery.abc    01 <file>  [0x003a]     1092
hello.c        01 <file>  [0x007e]       74
rain.txt       01 <file>  [0x0042]    12194
Read File ("DONE" to quit) > hello.c
#include <stdio.h>
int main()
{
    printf("Hello World!\n");
    return 0;
}
[Note that only 74 bytes of "hello.c" are valid out of 256 bytes in the sector.]
Filename       Attr          Start      Size
--------------------------------------
fat.txt        01 <file>  [0x0067]     1563
mystery.abc    01 <file>  [0x003a]     1092
hello.c        01 <file>  [0x007e]       74
rain.txt       01 <file>  [0x0042]    12194
Read File ("DONE" to quit) > hi.txt
"hi.txt" not found!
[There is no "hi.txt" in the root directory.]
Filename       Attr          Start      Size
--------------------------------------
fat.txt        01 <file>  [0x0067]     1563
mystery.abc    01 <file>  [0x003a]     1092
hello.c        01 <file>  [0x007e]       74
rain.txt       01 <file>  [0x0042]    12194
Read File ("DONE" to quit) > mystery.abc
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
...... <Some file content omitted to save space> ......
    }
    fclose(fat_rt.media_f);
    return 0;
}
[The entire "mystery.abc" is printed.]
Filename       Attr          Start      Size
--------------------------------------
fat.txt        01 <file>  [0x0067]     1563
mystery.abc    01 <file>  [0x003a]     1092
hello.c        01 <file>  [0x007e]       74
rain.txt       01 <file>  [0x0042]    12194
Read File ("DONE" to quit) > DONE
To aid your checking, the original files "fat.txt", "mystery.abc", "hello.c" and "rain.txt" (as well as a few other text files) are included in the exercise folder.
2.2 Exercise 3
Main task: Import a normal file into the USFAT file system.
Similar to exercise 2, the main function is already coded for you. You only need to provide the implementation of the import_file() function. This function returns:
o -1: if there is any error. The full list of errors is given below.
OR
o Non-negative value: the actual number of bytes copied over.
To facilitate explanation, let us assume we make the following call using the given function prototype:
import_file( &runtime, "example.txt", 25 );
The function should perform the following checks:
Ensure the normal file “example.txt” can be opened. You can assume the filename given by the user follows the “8+3” filename restriction.
Ensure the root directory of the USFAT media does not have another file with the same filename as “example.txt”, as filenames should be unique under a directory.
Ensure the root directory is not full, i.e. has fewer than 8 files currently.
If any of the above fails, the function returns “-1”.
Once the checks are all cleared, the function will attempt to copy “example.txt” into the data sectors.
The function will try to use the first sector as specified (data sector 25 in this example). If the sector is free, copying can start there. Otherwise, you should check subsequent data sectors (e.g. 26, 27, 28 …..) and wrap around if needed. If there are no free data sectors, the function terminates and returns -1.
Once copying starts, a data sector chain needs to be constructed if you need more than one data sector. The logic for getting the next sector is the same: look for the free data sector in the subsequent indices and wrap around if needed. Remember to modify the FAT entries accordingly as you move along.
Copying stops when i) the input file, e.g. “example.txt”, has been copied fully OR ii) there are no more free data sectors. Remember to “terminate” your sector chain by setting the END flag in the FAT.
The function will also add a new directory entry with the right information into the root directory.
Don’t forget to flush the FAT into the actual USFAT media.
Finally, the function returns the total number of bytes copied to the caller.
Note that this exercise changes the disk image. Use “reset.sh” to restore the images if needed.
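The sector search and chain construction described above can be sketched on an in-memory FAT as follows. This is only an outline of the FAT bookkeeping under stated assumptions: the function names are hypothetical, and copying the file data, adding the directory entry and flushing the FAT to the media are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FAT_ENTRIES = 128 };
#define FAT_FREE 0xFFFA
#define FAT_END  0xFFFF

/* Returns the first free data sector at or after `from`, wrapping around
 * to sector 0 if needed; returns -1 when no sector is free. */
int find_free_sector(const uint16_t *fat, int from)
{
    assert(fat != NULL && from >= 0 && from < FAT_ENTRIES);

    for (int step = 0; step < FAT_ENTRIES; step++) {
        int candidate = (from + step) % FAT_ENTRIES;
        if (fat[candidate] == FAT_FREE)
            return candidate;
    }
    return -1;
}

/* Links up to `wanted` free sectors into one chain, searching from `from`.
 * Returns how many sectors were allocated (possibly fewer than wanted);
 * *first receives the first sector of the chain, or -1 if none was free. */
int allocate_chain(uint16_t *fat, int from, int wanted, int *first)
{
    assert(fat != NULL && first != NULL);

    int allocated = 0;
    int previous = -1;
    *first = -1;

    while (allocated < wanted) {
        int sector = find_free_sector(fat, from);
        if (sector < 0)
            break;                    /* disk is full */

        fat[sector] = FAT_END;        /* the newest sector ends the chain */
        if (previous >= 0)
            fat[previous] = (uint16_t)sector; /* link it to the chain */
        else
            *first = sector;

        previous = sector;
        from = (sector + 1) % FAT_ENTRIES;
        allocated++;
    }
    return allocated;
}
```

Marking each newly taken sector as END immediately keeps the chain properly terminated at every step, including the case where the disk fills up partway through a file.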
Hints and Tips:
Browse the library calls. You have many helpful functions there to keep your pain in check….
Define useful helper functions to reduce code spaghetti…
The sample solution is about 120+ lines (with newlines, debug code and comments included).
You can use your exercise 2 program to check whether the files are imported properly.
Use the utility program USFATI to monitor the changes of sectors.
Sample Output (using empty.img, user input in bold, notable key info in red).
The empty.img is an empty USFAT media with only the root directory taking up data sector 5 at the beginning.
Filename       Attr          Start      Size
--------------------------------------
Import File ("DONE" to quit) > hi.c
Start sector (in Hex) > 0x0
Import "hi.c" to [0x0000] Data Sector...FAILED!
["hi.c" doesn’t exist.]
Filename       Attr          Start      Size
--------------------------------------
Import File ("DONE" to quit) > hello.c
Start sector (in Hex) > 0x5
Import "hello.c" to [0x0005] Data Sector...Written 74 bytes.
Filename       Attr          Start      Size
--------------------------------------
hello.c        01 <file>  [0x0006]       74
[Data sector 5 is occupied, so next available sector (i.e. 6) is used.]
Import File ("DONE" to quit) > hello.c
Start sector (in Hex) > 0x4A
Import "hello.c" to [0x004a] Data Sector...FAILED!
Filename       Attr          Start      Size
--------------------------------------
hello.c        01 <file>  [0x0006]       74
[There is already a "hello.c" -> import failed.]
Import File ("DONE" to quit) > fat.txt
Start sector (in Hex) > 0x2f
Import "fat.txt" to [0x002f] Data Sector...Written 1563 bytes.
Filename       Attr          Start      Size
--------------------------------------
hello.c        01 <file>  [0x0006]       74
fat.txt        01 <file>  [0x002f]     1563
Import File ("DONE" to quit) > mystery.abc
Start sector (in Hex) > 0x0
Import "mystery.abc" to [0x0000] Data Sector...Written 1092 bytes.
Filename       Attr          Start      Size
--------------------------------------
hello.c        01 <file>  [0x0006]       74
fat.txt        01 <file>  [0x002f]     1563
mystery.abc    01 <file>  [0x0000]     1092
[Both "fat.txt" and "mystery.abc" are imported fully. You can verify their file size.]
Import File ("DONE" to quit) > alice.txt
Start sector (in Hex) > 0x4A
Import "alice.txt" to [0x004a] Data Sector...Written 30720 bytes.
Filename       Attr          Start      Size
--------------------------------------
hello.c        01 <file>  [0x0006]       74
fat.txt        01 <file>  [0x002f]     1563
mystery.abc    01 <file>  [0x0000]     1092
alice.txt      01 <file>  [0x004a]    29184
[The USFAT disk is almost full at this point and can only store 29,184 bytes out of the full 177,428 bytes for "alice.txt".]
Import File ("DONE" to quit) > DONE
The FAT table (use USFATI to inspect) should look like the following afterwards:
Several notable observations:
"txt" is in sector 6, where the FAT entry is indicated with the END flag as it occupies only 1 sector.
"mystery.abc" starts from sector 0. Follow the linked sector list to understand the requirement better (use the adjacent sector if possible, otherwise search forward for a free sector).
2.3 And beyond….
As exercise 3 is “a bit” challenging, I have decided to drop the bonus questions :-(. However, I hope you have the curiosity (and time) to explore further. Several things you can try (in increasing order of insanity):
1. Expand the directory to use multiple sectors. This removes the limit of 8 files per directory. Your code in ex2 can help.
2. Implement subdirectories. (Rather straightforward actually.)
3. With (2), extend ex2 and ex3 to support subdirectories, i.e. read a file with a full path such as “/dir1/dir2/example.txt”, import a file into a deeper directory structure, etc.
Reference:
"Design of the FAT file system" (Very good read – Much deeper than you'll need) https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system
"Alice in the wonderland." (IP free!), by Lewis Carroll, Project Gutenberg version.
"There Will Come Soft Rains", by Ray Bradbury