CS 3210 - Lab6 - Advanced File Handling

This lab is all about more interesting things that you can do with files, especially managing the sharing of files between two or more running processes. The chapter corresponding to this lab is Chapter 12. As in previous weeks, the example programs we use can be found in the /tmp directory on the gautama server.

Multiplexing with select()

Beginning on p. 205, your book discusses the problem of trying to read from more than one file at a time. Here, we'll look at the problem and the solution, following the examples on pages 205-214 pretty closely.

Blocked Reading (Bad)

As we observed last time, when you try to read from a pipe, it blocks until something is written into the other end of the pipe. This can create some problems, especially if you have more than one pipe to read from.

Example Program: mpx-blocks.c - This program reads from two pipes, p1 and p2. You must write something into p1 before you write something into p2 or else the program doesn't work. Bad solution.

Non-Blocking Reading (Better)

An improvement is to programmatically tell the OS to open the file in non-blocking mode. When there is no data to be read, the read() call will not block; it returns immediately with an error (errno is set to EAGAIN).

Example Program: mpx-nonblock.c - This program opens both pipes in non-blocking mode. This means that it will toggle back and forth between the two, looking for something to read. Note that while this is an improvement (you don't have to type something into p1's window first) it is still problematic because it consumes so many system resources (type 'top' in a fourth window and look at the CPU usage). This is called busy looping and it's a sub-optimal solution.
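To see what "returns immediately" looks like in code, here is a minimal sketch. It is not mpx-nonblock.c: it assumes an anonymous pipe made with pipe() instead of the named pipes p1 and p2, and it switches the descriptor to non-blocking mode with fcntl() after the fact rather than passing O_NONBLOCK to open(). The names set_nonblocking() and nonblock_demo() are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch an already-open descriptor to non-blocking mode.
   Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical demo: read from an empty non-blocking pipe, then write
   two bytes and read again. Returns 0 if both reads behave as described. */
int nonblock_demo(void)
{
    int p[2], empty_ok, data_ok = 0;
    char buf[16];
    ssize_t n;

    if (pipe(p) == -1)
        return -1;
    if (set_nonblocking(p[0]) == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Nothing has been written yet: read() fails right away with EAGAIN
       instead of putting the process to sleep. */
    n = read(p[0], buf, sizeof buf);
    empty_ok = (n == -1 && errno == EAGAIN);

    /* Once data is in the pipe, the same call returns it. */
    if (write(p[1], "hi", 2) == 2) {
        n = read(p[0], buf, sizeof buf);
        data_ok = (n == 2 && buf[0] == 'h' && buf[1] == 'i');
    }

    close(p[0]);
    close(p[1]);
    return (empty_ok && data_ok) ? 0 : -1;
}
```

A program that calls read() like this in a loop over two descriptors never sleeps, which is exactly the busy looping that shows up in 'top'.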

What we really need is some way to block on both files until something is available.

Reading with select (Best)

The best way to read from more than one file is to use select(), which lets you specify a set of file descriptors to watch. The advantage of doing it this way is that the process blocks inside the OS (and so consumes almost no CPU time) until at least one of those descriptors has data to read.

Example Program: mpx-select.c - Same as above, but note that the CPU consumption of this program (as seen in 'top') is much nicer.
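The core of the select() approach can be sketched as follows. This is not the mpx-select.c listing: wait_for_input() and select_demo() are hypothetical names, and the demo uses two anonymous pipes as stand-ins for p1 and p2.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until fd_a or fd_b has data to read. Returns the ready
   descriptor (fd_a wins if both are ready), or -1 on error. */
int wait_for_input(int fd_a, int fd_b)
{
    fd_set readfds;
    int maxfd = fd_a > fd_b ? fd_a : fd_b;

    assert(fd_a >= 0 && fd_b >= 0 && maxfd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);

    /* A NULL timeout means: sleep in the kernel until something is readable. */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) == -1)
        return -1;

    if (FD_ISSET(fd_a, &readfds))
        return fd_a;
    if (FD_ISSET(fd_b, &readfds))
        return fd_b;
    return -1;
}

/* Hypothetical demo: write only into the second pipe and check that
   select() reports that one. Returns 0 on success. */
int select_demo(void)
{
    int a[2], b[2], ok = 0;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    if (write(b[1], "x", 1) == 1)
        ok = (wait_for_input(a[0], b[0]) == b[0]);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ok ? 0 : -1;
}
```

Note that select() overwrites the fd_set it is given, so a real read loop has to rebuild the set with FD_ZERO and FD_SET before every call.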

Using select() is how forking servers such as Apache (web server) are written.

An Alternative: poll()

An alternative to select() is the poll() syscall which does basically the same thing, but allows you to specify exactly which events you'd like to listen for (input, output, errors, invalid requests). As an added bonus, it also features a somewhat cleaner interface (all the FD_* macros are gone, for example).
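For comparison, here is the same kind of wait written with poll(). This is only a sketch under the same assumptions as before (anonymous pipes, made-up function names poll_for_input() and poll_demo()); it asks for POLLIN only, and a real program would also look at POLLHUP and POLLERR in revents.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for input on fd_a or fd_b.
   Returns 0 if fd_a is readable, 1 if fd_b is readable,
   and -1 on timeout or error. */
int poll_for_input(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2];
    int i;

    assert(fd_a >= 0 && fd_b >= 0);

    fds[0].fd = fd_a;
    fds[0].events = POLLIN;   /* the event we are interested in */
    fds[0].revents = 0;
    fds[1].fd = fd_b;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) <= 0)
        return -1;            /* 0 means timeout, -1 means error */

    for (i = 0; i < 2; i++)
        if (fds[i].revents & POLLIN)
            return i;
    return -1;                /* only error or hangup events were reported */
}

/* Hypothetical demo: poll two empty pipes, then write into the second
   one and poll again. Returns 0 if both results are as expected. */
int poll_demo(void)
{
    int a[2], b[2], idle, ready = -1;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    idle = poll_for_input(a[0], b[0], 0);          /* nothing to read yet */
    if (write(b[1], "x", 1) == 1)
        ready = poll_for_input(a[0], b[0], 1000);  /* second pipe is ready */

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return (idle == -1 && ready == 1) ? 0 : -1;
}
```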

Memory Mapping with mmap()

UNIX allows you to map files into memory (or create anonymous files in memory) with the mmap() syscall. A list of various uses of mmap() is given in your book on p. 215.

Mapping a Whole File Into Memory

The primary purpose of mmap() is to fill a region of memory with the contents of a file. This enables you to perform operations on a file by manipulating memory rather than manipulating the file. The following two examples demonstrate how this works:

Example Program: cat.c - A small implementation of cat which uses the read() syscall to read the contents of a file and the write() syscall to write the contents of the file out to stdout. (All very logical.) Note that to run this program you must type:

./cat < filename
where filename is substituted with the name of an actual file (cat.c will work). Note that you must use the < (redirection) operator to read the file because this version of cat only reads from stdin.

Example Program: map-cat.c (p. 220) - This variation on cat mmap's the whole file into a buffer in memory and then writes the buffer to stdout. To run this program you must type:

./map-cat filename
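The idea behind a mapped cat can be sketched like this. It is not the book's map-cat.c listing: map_cat() and map_cat_demo() are hypothetical names, the mapping is made read-only and private, and the demo sends the output into a pipe instead of stdout so that the result can be checked.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy the file at path to out_fd through a read-only mapping.
   Returns the number of bytes copied, or -1 on error. */
long map_cat(const char *path, int out_fd)
{
    struct stat sb;
    char *region;
    long done = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {        /* mmap() rejects a length of 0 */
        close(fd);
        return 0;
    }

    region = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close() */
    if (region == MAP_FAILED)
        return -1;

    /* The file contents are now ordinary memory; write() may be partial. */
    while (done < (long)sb.st_size) {
        ssize_t n = write(out_fd, region + done, sb.st_size - done);
        if (n == -1) {
            munmap(region, sb.st_size);
            return -1;
        }
        done += n;
    }

    munmap(region, sb.st_size);
    return done;
}

/* Hypothetical demo: create a small temp file, "cat" it into a pipe,
   and compare what comes out. Returns 0 on success. */
int map_cat_demo(void)
{
    char path[] = "/tmp/map-cat-demoXXXXXX";
    char buf[16];
    int p[2], ok = 0;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    if (write(fd, "hello\n", 6) == 6 && pipe(p) == 0) {
        ok = map_cat(path, p[1]) == 6
          && read(p[0], buf, sizeof buf) == 6
          && memcmp(buf, "hello\n", 6) == 0;
        close(p[0]);
        close(p[1]);
    }
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

To behave like ./map-cat filename, a main() would call map_cat(argv[1], STDOUT_FILENO).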

Note the following points:

Anonymous Mappings

In addition to mapping actual files that exist on disk, you can also create anonymous mappings. This just allocates a big hunk of memory that you can use for whatever purpose. Have you ever wondered how malloc() works? Now you know.

To make an anonymous mapping, call mmap like this:

mmap(NULL, 1, PROT_READ | PROT_WRITE,
	MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN, -1, 0);

The man page or your book on pgs. 216-219 explains all the arguments and the various values you can pass. Here's the anonymous mapping, but with more explanation:

mmap(

	NULL, /* starting address: we don't care where it starts so we pass NULL
	to allow the operating system to pick an address for us (p. 216) */

	1, /* size of the file to map: it's going to grow, so just start with a
	small size (one byte) (p. 217) */

	PROT_READ | PROT_WRITE, /* access flags: we want to be able to both read
	and write to this memory, but not execute its contents (hence, we
	do not pass PROT_EXEC) (p. 217) */

	MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN, /* flags: (p. 218) we want
	(1) an anonymous mapping, (2) we don't want any other process to be able
	to use this mapping (the alternative is "shared"; with no file behind
	the mapping, a shared anonymous mapping would only be visible to child
	processes created by fork()), (3) we want this memory region to grow as
	we put more stuff in it; it will grow to the size of one page */

	-1, /* file descriptor to map: Linux ignores this when we make an
	anonymous mapping, but some systems require -1 here, so -1 is the
	portable choice */

	0 /* file offset: where we begin mapping the file. 0 means "at the
	beginning" */
	
	);
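Here is a small sketch of an anonymous mapping in use. It deliberately differs from the call above in two ways, both of them assumptions made for the illustration: it leaves out MAP_GROWSDOWN and it asks for a whole page up front (using sysconf(_SC_PAGESIZE) to find the page size). The names anon_map() and anon_map_demo() are made up.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS in glibc under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the OS for len bytes of private, zero-filled memory that is not
   backed by any file. Returns NULL on failure. */
void *anon_map(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical demo: map one page, check that it starts out zeroed,
   store a string in it, and unmap it. Returns 0 on success. */
int anon_map_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = anon_map(len);
    int ok;

    if (buf == NULL)
        return -1;

    /* Fresh anonymous pages always read as zero. */
    ok = (buf[0] == 0 && buf[len - 1] == 0);

    strcpy(buf, "Stuff");
    ok = ok && strcmp(buf, "Stuff") == 0;

    if (munmap(buf, len) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```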

Other mmap() Odds and Ends

(For ESL students: "Odds and Ends" is an English idiom that means "Miscellaneous Details")

Unmapping: To remove a memory mapping, you can call munmap() (see p. 221). Note that you do not actually have to call this as all mapped regions are reclaimed by the OS when the program exits (or replaces itself with a new program via a call to exec), but manually unmapping is considered a "best practices" sort of thing.

Synchronizing: If you have a file mapped into memory and want to sync the contents of the file to disk, call msync() (p. 221).

Memory Locking: If you want to lock regions of memory into RAM so that they are never paged out to swap, you can call mlock() (p. 222).
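The unmapping and synchronizing calls can be sketched together. This is an illustration only: write_through_mapping() and msync_demo() are hypothetical names, and the demo assumes a temp file in /tmp. The mapping is MAP_SHARED, because changes made through a MAP_PRIVATE mapping are never carried back to the file.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp(), ftruncate(), pread() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite the first len bytes of the file behind fd through a shared
   mapping, flush the change to the file, and remove the mapping.
   The file must already be at least len bytes long.
   Returns 0 on success, -1 on failure. */
int write_through_mapping(int fd, const char *data, size_t len)
{
    char *region;
    int rc;

    assert(fd >= 0 && data != NULL && len > 0);

    region = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED)
        return -1;

    memcpy(region, data, len);           /* "writing the file" is a memcpy */
    rc = msync(region, len, MS_SYNC);    /* wait until the data is written back */
    if (munmap(region, len) == -1)
        rc = -1;
    return rc;
}

/* Hypothetical demo: update a temp file through a mapping, then read it
   back with an ordinary pread(). Returns 0 on success. */
int msync_demo(void)
{
    char path[] = "/tmp/msync-demoXXXXXX";
    char buf[5];
    int ok;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;

    ok = ftruncate(fd, 5) == 0           /* give the file a size to map */
      && write_through_mapping(fd, "hello", 5) == 0
      && pread(fd, buf, 5, 0) == 5
      && memcmp(buf, "hello", 5) == 0;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```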

File Locking

One common problem when sharing a file between two or more processes is when two processes try to write to the file at the same time. Such a situation produces unpredictable results, and can alter a file in an unexpected and damaging way. To solve this problem we need to lock other processes from accessing the file. There are two ways to do this.

We're going to just touch on this stuff briefly.

The Easy Hack: Drop a Lockfile

The simplest way to coordinate access between multiple processes is to have them all try to open a lockfile (which is not the same as the file they will actually be updating) before they write to a data file, like so:

int fd = open("lockfile", O_WRONLY | O_CREAT | O_EXCL, 0644); /* fails with EEXIST if the lockfile is already there */

Example Program: easylock.c - This program uses a lockfile. Start two instances in two different windows at roughly the same time and observe the results.
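A minimal sketch of the lockfile idea follows. It is not easylock.c: acquire_lockfile() and release_lockfile() are hypothetical helpers, and the sketch assumes that every cooperating process agrees on the same lockfile path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to take the lock by creating the lockfile.
   Returns 0 if we now hold the lock, 1 if someone else holds it,
   and -1 on any other error. */
int acquire_lockfile(const char *path)
{
    int fd;

    assert(path != NULL);

    /* O_CREAT | O_EXCL makes "check that it is missing, then create it"
       a single atomic step, so two processes cannot both succeed. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return errno == EEXIST ? 1 : -1;

    close(fd);   /* the file's existence is the lock, not the descriptor */
    return 0;
}

/* Give the lock up again by removing the lockfile.
   Returns 0 on success, -1 on failure. */
int release_lockfile(const char *path)
{
    assert(path != NULL);
    return unlink(path);
}
```

A process that gets 1 back would typically sleep for a moment and try again. If the lock holder crashes before calling release_lockfile(), the lockfile stays behind and everyone else waits forever.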

Another example of how to use a lockfile to coordinate access is given in your book on pgs. 224-225.

The easy way, while nice and simple, has some disadvantages. These are spelled out in your book in a bulleted list at the bottom of p. 225.

The Harder But Better Way: Use fcntl()

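Your book on p. 226 describes the fcntl() syscall and how it can be used to lock portions of a file. This is how database programs could implement table / row locking. This is a better approach because the operating system keeps track of the locks, lets you lock just part of a file, and releases a process's locks automatically when it exits, so a crashed program cannot leave a stale lock behind. Note that these locks are advisory by default: they only stop programs that also ask for a lock with fcntl(), so a badly-behaved program that skips the locking step can still stomp all over a file.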

Example Program: lock.c - This program uses fcntl() to acquire locks. Try running the program in two different windows and observe the behavior.
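Region locking with fcntl() can be sketched as follows. This is not lock.c: lock_region() and fcntl_lock_demo() are hypothetical names, and the demo uses fork() to get a second process, since a process never conflicts with its own locks. It uses F_SETLK, which fails immediately when the region is taken; F_SETLKW would wait instead.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to place a write lock on len bytes starting at offset start.
   Does not wait: returns 0 on success, or -1 if another process
   already holds a conflicting lock (or on any other error). */
int lock_region(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    assert(fd >= 0);

    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;   /* l_start is measured from the start of the file */
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical demo: the parent locks bytes 0-9, then a child checks
   that an overlapping lock is refused while a lock on a different
   region is granted. Returns 0 on success. */
int fcntl_lock_demo(void)
{
    char path[] = "/tmp/fcntl-demoXXXXXX";
    int status = -1;
    pid_t pid = -1;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;

    if (lock_region(fd, 0, 10) == 0) {
        pid = fork();
        if (pid == 0) {
            /* Child: fcntl() locks are not inherited across fork(). */
            int refused = lock_region(fd, 5, 10) == -1;
            int allowed = lock_region(fd, 100, 10) == 0;
            _exit(refused && allowed ? 0 : 1);
        }
        if (pid > 0)
            waitpid(pid, &status, 0);
    }

    close(fd);   /* closing the descriptor drops the parent's lock */
    unlink(path);
    return (pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```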

This page from the UNIX programming FAQ spells out in fairly clear terms how you can use fcntl(). It includes some source code examples that you might find useful.

Further Study

How to Manage Multiple Connections from the UNIX programming FAQ. Includes some examples of how to use both select() and poll().

Memory Mapped Files by Beej.

How do I lock a file from the UNIX programming FAQ.

File Locking by Beej.

Your book, Chapter 12

For example programs on Gautama, you could look at nearly any daemon, which will likely use a fork() / select() approach for listening to multiple connections.

Stuff We Ignored

The things in this chapter that we didn't touch on are:

Lab6

This will be a lab on mmap().

Pre-requisite: Make a lab6 Directory

Make a subdirectory called 'lab6' underneath the unix directory in your home dir. All of the files you make for this lab belong in the lab6 directory. Don't put them anywhere else.

Background: Dealing with Strings in C

When you make a string in C, you can't put in a string that is larger than the declared size. Example:

char str[6];
strcpy(str, "hello world!"); /* trouble! you just overwrote memory! */

One way around this is to make a pointer that points to a literal string that is stored in the executable (you can read it, but you must not modify it), like so:

const char *str = "hello world!";

If, however, you don't know what the string is going to be, you can use the strdup() library call:

char *str = strdup("hello world!"); /* returns malloc'ed memory */
...
free(str); /* necessary, otherwise you leak memory */

(Dealing with strings in C can be kind of a pain.)

Description / Requirements

For this lab, I want you to write a program that will create a string that will automatically grow as needed to accommodate larger and larger strings. One way to test it is to use code like this:

strcpy(mystr, "Stuff");
printf("strlen(mystr) after strcpy = %zu\n", strlen(mystr)); /* strlen() returns a size_t, so use %zu */
strcat(mystr, " and more stuff");
printf("strlen(mystr) after strcat = %zu\n", strlen(mystr));

Save this in a file called anonmmap.c. The whole file should be about 40 or so lines long, including whitespace and curly braces. Be sure to have a Makefile as well.

Output

I'd encourage you to use lots of printf calls in this program so you can see what's going on. The two lines that I really want to see are:

Largest size string: ????
getpagesize() returns: ????

The ???? question marks will, of course, be replaced with actual numbers in the program you write. Feel free to output a whole lot more stuff than just that, though.

Word Of Warning

Please note that while this is a cute trick, you shouldn't use this for all your strings because you can end up consuming a whole lot of memory. Also, there are occasions where just having an immutable pointer to a literal string will do the job just fine.

Files

Here is a list of the files created in this lab.

Please turn in just the anonmmap.c file with your name, lab # and class # at the top.