CS 3210 - Lab5 - Files and Directories

This lab is all about files and directories.

The chapters corresponding to this lab are Chapter 10 (Files) and Chapter 11 (Directories) in your book.

There is a tenet of the Unix Philosophy that "Everything is a File". Understanding just that will help you digest everything in these notes.

All About Files

Firstly, we'll talk about how to use files on a UNIX system. Chapter 10 in your book corresponds to these notes. Be sure to review the terms on pgs. 141-143. We'll go over all of these in class.

File Permissions

Each file on a UNIX system has permissions assigned to it that regulate who is allowed to access it. For an example of this, look in your book on p. 145. Here's what you see if you type ls -l at the command line:

-rw-rw-r--    1 markw    markw        3481 Jun 12 12:35 index.html

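There are 10 "slots" in that first column. The first slot gives the file type (- for a regular file, d for a directory, l for a symbolic link). The remaining nine are three groups of three (read, write, execute) for the file's owner, its group, and everyone else, in that order. In the listing above, the owner and the group can read and write index.html, and everyone else can only read it.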

You can change a file's permissions at the command line with the chmod command. It has a man page, and an example is given in your book on p. 145. We'll show how to do that in class.

There is likewise an accompanying chmod syscall (type man 2 chmod to see this one), which allows you to programmatically change permissions. The man page shows the "easy" way to use chmod. If you haven't figured it out already, most of the system calls we've reviewed have an accompanying userspace program by the same name.

Another thing you can do is set your umask to apply default permissions to a file. This is discussed in your book on p. 149. Some examples of umask are given on p. 150. We'll talk about how this works (including a review of how to count in octal).

Lab Activity: Everybody set your umask to a value that will make new files non-world readable, and non-world writable by default. After you've tested it, put this command in your ~/.bash_profile file.

There are also chown and chgrp commands (with accompanying syscalls) that allow you to change user and group ownership.

Basic Operations

There are two ways to programmatically perform operations on files: (1) the low-level syscalls that use file descriptors (numbers), described in your book beginning on p. 151, and (2) the higher-level standard I/O library functions that use the FILE* ADT.

Many of the most commonly used UNIX commands center on doing things with files. The following table spells it out. All of these things have man pages.

Operation                    Shell                         Low-level syscall             Higher-level function
create / open a new file     touch                         creat() p. 152, or open()     fopen()
read a file                  cat                           read() p. 154                 fread()
write / append to a file     echo > filename (replaces),   write() p. 154                fwrite()
                             echo >> filename (appends)
seeking                      usually grep, actually        lseek() p. 158                fseek(), ftell()
going back to the beginning  just re-cat it                lseek() again                 rewind()
renaming a file              mv oldname newname            rename()                      (just use rename())
deleting a file              rm                            unlink() (see next section)   (just use unlink())
closing a file               happens automatically         close()                       fclose()

The above table is by no means a complete list, but it hits on the high points.

Links, Hard and Soft

In addition to creating files, you can also create links. What's a link? Good question. Before we answer it, we need to know a little more about the way a UNIX system stores files on disk.

Inodes

There are basically three layers involved in file storage:

Files, complete with filenames, directories, etc. (top level)
inodes, where all files are referenced by number (middle level)
physical medium, where files are magnetic charges (bottom level)

The middle layer (inodes) affords us a great deal of flexibility. Because files are tracked by number, numerous different filenames can all "point" to the same number. So, one way of thinking of a link is two filenames that point to the same inode. (The nearest thing in Windows parlance is a "Shortcut", although a shortcut behaves more like the symbolic links described later in this section.)

Your book talks about how to get inode information on p. 162. Example code (from your book): statsamp.c.

Hard Links

The first kind of link is a hard link. This is a file that points to the same inode. You can use the ln command to create a link. Here's an example:

$ touch file1
$ ls -l
total 0
-rw-rw-r--    1 markw    markw           0 Jun 12 13:58 file1
$ ln file1 file2
$ ls -l
total 0
-rw-rw-r--    2 markw    markw           0 Jun 12 13:58 file1
-rw-rw-r--    2 markw    markw           0 Jun 12 13:58 file2

Note that the number in the second column changed from 1 to 2. This is because the underlying inode has 2 references to it. Linking a new file ("file3") would bump up the reference count to 3, and so on.

We can see the inode numbers of the file by passing the -i flag to ls:

$ ls -i *
 970821 file1   970821 file2
$ 

Note that they are the same because they refer to the same underlying file. Changes made to one are immediately seen in the other.

This is why you use the unlink() syscall to remove a file. As far as the OS is concerned, "deleting" a file is just the same as removing a link to its inode. Once all the links are gone (and no process still has the file open), the inode beneath is freed and the disk space is reclaimed for use by other files.

The link() syscall can be used to programmatically make a hard link. (man 2 link) (See p. 176 in your book)

Symbolic Links

The other kind of a link is a symbolic link, also called a soft link or, most commonly, a symlink. These are made with ln -s and differ significantly from hard links.

$ touch file1
$ ls -l
total 0
-rw-rw-r--    1 markw    markw           0 Jun 12 14:07 file1
$ ln -s file1 file2
$ ls -l
total 0
-rw-rw-r--    1 markw    markw           0 Jun 12 14:07 file1
lrwxrwxrwx    1 markw    markw           5 Jun 12 14:07 file2 -> file1
$ ls -i *
 970821 file1   970822 file2@
$ 

Note that the reference count for file1 was not increased by making a symlink. This is because a whole new file was created for file2, containing the path to the file it references. You can see that the two have different inode numbers with the ls -i command.

In addition to ls -l there is also a readlink userspace program that you can use to dereference a symlink.

If you remove file1 at this point, you will create a dangling link. (See p. 177)

The symlink() syscall can be used to programmatically make a symbolic link. (man 2 symlink) (Book, p. 177)

All About Directories

There are just a couple of things you need to know about directories. This section corresponds to Chapter 11 in your book.

Directory Basics

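For starters, you have two "special" directories: . (a single dot), which refers to the current directory, and .. (two dots), which refers to its parent.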

The following table sums up other directory operations, how to do them both in the shell and in C.

Operation                    Shell                            Syscall
get the current directory    pwd                              getcwd() p. 187
change directory             cd                               chdir() p. 189, also fchdir() p. 189
make a new directory         mkdir                            mkdir() p. 191
removing a directory         rmdir, also rm -rf (dangerous)   rmdir() p. 191
read a directory's contents  ls                               opendir(), readdir(), closedir() p. 191-193

The following source code shows how to read the contents of a directory: dircontents.c

Globbing

Q: What the heck is globbing?

A: "Globbing" is basically another way of saying "using wildcards" in directory listings. Some common examples are using the star (*) to match any string and the question mark (?) to match a single character. See p. 193 in your book.

Suffice it to say that there are several glob*() functions that you can call to use wildcards in directory listings. They are described in your book on p. 195-199. I'm content if you just know what the word "globbing" means.

An example program that shows how to glob: globit.c. (from your book on p. 197-199)

The Magic of chroot()

Your book describes the chroot() syscall on p. 190. There is also a userspace program by the same name that does the same thing. Basically what this does is make another directory appear to be the root directory. In order for this to work, you need to have the bare minimum of system programs and libraries available in the chroot-ed directory.

The reason why this syscall exists is to create a "quarantined" place where some program will live, like an ftp daemon. If someone enters a chrooted environment, they will be unable to cd into other parts of the system, preventing them from poking around, rattling doors and windows. In other words, it improves the security on a system to have a potentially dangerous process running in a chrooted environment.

All About Pipes

Pipes are one of the most innovative and useful features of a UNIX system. Basically, they allow you to send the output from one command as input to a second command. As you would guess, a pipe is a FIFO structure (first in, first out).

Using Anonymous Pipes in the Shell

Anonymous pipes are made in the shell by using the pipe (|) character between commands. Page 141 in your book shows a short example. Here's another one. (We'll go over this step-by-step in class.)

$ find | grep index | wc -l
     7

The first command recursively lists all files in this directory and any subdirectories. The output is sent to the grep command which lists only those lines that contain the substring "index". The output from grep is then sent to the word count (wc) program which lists the number of lines found (7).

This kind of construct is called a pipeline. It consists of a program that generates output at the head (left) and one or more filter programs afterward. A filter program is one that can read from standard input and write to standard output. Thus, the output of each left-hand command becomes the input to the command on its right, just like water flowing through a pipe.

Using Named Pipes in the Shell

You can programmatically make a named pipe with mknod() (see p. 173). There is also a userspace program by the same name, and one called mkfifo that is specially designed for making named pipes. The following shows an example.

$ mkfifo myfifo
$ ls -l myfifo 
prw-rw-r--    1 markw    markw           0 Jun 12 15:21 myfifo|
$ echo "one" > myfifo &
[1] 25710
$ echo "two" > myfifo &
[2] 25711
$ echo "three" > myfifo &
[3] 25712
$ cat myfifo
one
two
three
[1]   Done                    echo "one" >myfifo
[2]-  Done                    echo "two" >myfifo
[3]+  Done                    echo "three" >myfifo
$

As we can see from the long listing (ls -l), a "p" character is placed at the beginning of the list of file permissions so we know that this is a special file. GNU ls will also append a "|" character to the end of the filename to give you further indication. Otherwise, it looks like any normal file.

Note that I appended an ampersand (&) to the first three echo commands. This is because they block (do not return) until something opens the fifo for reading (we'll go over this in class). When I cat the contents of myfifo, it prints all the stuff that we queued up in there with the echo commands. The "Done" lines report that the background tasks are finished.

You could alternatively do this with two terminals. Your book on p. 207 shows a diagram of this.

Using Anonymous Pipes in C: pipe()

Your book describes how to make anonymous pipes on p. 182. Like anonymous pipes in the shell made with '|', these do not have a presence on the file system. Suffice it to say that the pipe() syscall creates a pipe with two sides: one side for writing and one side for reading. This program shows an example of a process that creates a pipe and then forks a child process. The child writes a string and the parent reads it: strpipe.c

Water does not flow in two directions through a pipe. If you want two-way read-write communication, you need to open two pipes. This example shows a program similar to the first that opens two pipes and then forks a child which execs gdb. The parent then writes commands to the child process through one pipe and reads the output back through the other: pipegdb.c

I hope that after looking at these examples, you appreciate just how much the single character '|' buys you when you're in the shell.

The Easy Way to Do Pipes in C: popen()

Your book on p. 119 describes the easy way to read from a pipe by using popen(). Suffice it to say that this gives you the benefits of the pipe() syscall behind a nicer, higher-level interface: popen() hands you a FILE* that you read or write like any other.

An example program from your book that opens a pipe for writing to the basic calculator ("bc"): calc.c

Another example that opens a pipe for reading from the "ls" command (an easier way to do globbing): popenglob.c

Note that, just as with the pipe() syscall, pipes created with popen() are unidirectional: you can read or write, but not both. (Water does not flow in two directions at once.)

Further Reading

Unix Programming, which covers a variety of topics, including file handling.

FIFOs by Beej.

Your book, of course. Chapter 9.

Have a look at the Midnight Commander program on gautama (type mc to get it). This is a little text-based file manager program. This is a good example of a program that someone could clone for a term project.

Speaking of "filter" programs, you might want to check out some of the "speak" filters installed on gautama, like jethro, cockne, eleet, chef, upside-down, and fudd.

And Now, Lab5

This will be a lab on files, specifically popen().

Pre-requisite: Make a lab5 Directory

After logging into the lab server, change into the unix directory by typing cd unix and then make a directory called 'lab5' by typing mkdir lab5. Then change into the lab5 directory by typing cd lab5.

All of the files you make for this lab belong in the lab5 directory. Don't put them anywhere else.

Description / Requirements

For this lab, I want you to write a program that finds out the name servers for a host on the Internet. Follow these steps:

Save this in a file called getns.c ("get name servers"). The whole file should be about 40 or so lines long. Be sure to have a Makefile as well.

Here is a shell script that accomplishes the same thing:

#!/bin/sh
whois $1 | sed -n -e '/Domain servers/,$p'

Output

If you follow the directions above, you should get some output that looks like this:

$ ./getns cnn.com
   Domain servers in listed order:

    TWDNS-01.NS.AOL.COM          149.174.213.151
    TWDNS-02.NS.AOL.COM          152.163.239.216
    TWDNS-03.NS.AOL.COM          205.188.146.88
    TWDNS-04.NS.AOL.COM          64.12.147.120

If your output looks like that, you probably did the program correctly.

Files

Here is a list of the files created in this lab: getns.c and Makefile.

Please turn both of these files in with your name, lab # and class # at the top.