CS 3210 - Lab4 - Processes

This lab is all about the UNIX process model and spawning new processes.

The chapter corresponding to this lab is Chapter 9 in your book.

You'll find that this lab is a little different from the previous ones, where I held your hand with step-by-step instructions. This time, I'm going to show you how all the pieces work and then let you put them together on your own. I'll give you a description of what I'd like to see, along with some sample output, and let you write a program that does it. Yes, I'll answer questions if you have them.

All About Processes

This chapter is fairly long and covers a lot of theory. We're going to skim some of the sections, either because they're addressed in later chapters or because I just don't feel like covering them. What follows are the high points.

Many of the things we cover here will remind you of things you learned in the Operating Systems class, so this will be a good refresher.

What's A Process? What's A Thread?

A process is a program that is executing. Its memory image lives in RAM (or is paged in from / out to disk by the VM subsystem), and it carries context information such as the instruction pointer, stack pointer, and registers. Your book gives a bulleted list of the things a process contains on p. 93.

A thread is an individual stream of execution within a process (threads are often called "lightweight processes"). Like a process, a thread has its own instruction pointer, stack, and registers; unlike a process, it shares its address space (global variables, heap, open files) with the other threads in the same process. These are explained in your book on p. 94.

The Linux approach is to make threads simply a special case of processes, since processes are already plenty lightweight. (p. 94)

Process Information

At the command line, type ps, which is short for "process status". (Unix folks are notoriously lazy typists.) This shows you a listing of the processes that you are running. Try adding the 'u' ("user"), 'f' ("forest" / "fork"), 'a' ("all"), and 'x' (include processes with no controlling terminal, such as daemons) flags to get more information, e.g. ps faux. We'll talk about these in class. Have a look at the man page as well (man ps). The top program is likewise instructive. (Note: a 'top' clone would be a good term project idea.)

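All of the information about processes is found in a virtual file system mounted on /proc: each process gets a directory named after its PID (e.g. /proc/1234), and /proc/self refers to whichever process is looking at it. Play around in there for a bit and see what you find.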

Some terms you should definitely know are: pid, parent, child, zombie, orphan, and init. These are all explained in your book on p. 95.

Note also that processes have a user ID (uid) and group ID (gid). The superuser is root, with a UID of 0 (standard on all UNIX systems). You can see your own UID/GID by typing id at the command line, and you can retrieve them programmatically with the getuid() and getgid() system calls (hereafter syscalls, discussed in your book in Chapter 8). Every process also has its own process ID, returned by getpid(), and a parent process ID, returned by getppid(). Here's a little program that uses getpid() and getppid(): pid.c

Note that I #included the file unistd.h, which contains the declaration for getpid(), and the file sys/types.h, which defines the pid_t type. "But how are we supposed to know to do that?" Easy: look at the man page (man getpid).

Accessing Environment Variables

In the debugging lab we used environment variables to enable options in Electric Fence and checkergcc. Now we'll have a look at how you can access environment variables programmatically. There are two ways to do this.

The first way is to declare a third argument (after argc and argv) passed to main() in your program as in this example: env-arg.c

Betcha you didn't know you could do that.

The second way is to declare an extern char **environ variable at global scope in your program. This is discussed in your book on pp. 103-105. Example: env-var.c.

The output for this program is the same as in the previous example. (Try it, you'll see!) One advantage to this approach is that the environment variables can be accessed anywhere, not just in main().

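Note that the variable must be called environ or it won't work, because that is the name of the variable the C library actually defines. The extern keyword means that the variable is defined somewhere else (here, in the C library), and your declaration just tells the compiler its name and type. Depending on your system, unistd.h may declare it for you (glibc does when _GNU_SOURCE is defined), but POSIX says you should declare it yourself. "But how are we supposed to know to do that?" Man pages to the rescue again (man environ). Handy things, those man pages.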

Process Resource Usage / Limits

Your book on p. 105 talks about process resource usage. We're not going to hit on this too hard; you can just skim over it. Suffice it to say that every process uses system resources (surprise!), which you can inspect programmatically and via the top / ps commands.

One of the reasons why resource limits were introduced was to counter the ever-popular "fork bomb" that ankle-biters were fond of writing. Here's an example: fork-bomb.c

For a list of current limits, type ulimit -a at the command line. Your book also mentions these on p. 107. For online docs, type help ulimit (ulimit is built into bash).

Making New Processes

Most systems you are used to take more of a "spawn" approach. You call a "create" function and tell it what the new task should run: either a program (Windows' CreateProcess()) or a function pointer where a new thread starts executing (Windows' CreateThread(), or pthread_create() in the pthreads library).

Unix takes a different approach. Rather than spawning a new process from scratch, you call the fork() system call, which creates a duplicate of the calling process that looks just like the original. Think of it as a tree branch splitting in two. See p. 109 in your book for an explanation. The "child" process inherits nearly everything the parent has, including a copy of its variables and its open files. See the man page (man fork) for more. Here's an example program that shows a simple fork (similar to the one in your book on p. 110): fork.c.

Note that the fork() system call has the unique trait of being a function that returns twice, once in the parent and once in the child.

Linux generalized fork() with the clone() system call (mentioned on p. 138), which lets the caller specify, through flags, exactly which resources (memory, open files, signal handlers, and so on) the child shares with its parent. Typically, this is not a function that you call directly. Rather, it is a syscall used by user-space thread libraries, such as pthreads.

There is also a vfork() system call, but it is obsolete and you don't need to use it. It was invented in the days before UNIX OSes implemented "copy-on-write" virtual memory, which makes fork() cheap by letting parent and child share memory pages until one of them writes to a page.

Waiting For Your Children to Die

Having forked a new process, you can wait for your child processes to exit with any of the wait* family of system calls. Your book describes a number of them on pp. 110-112. One of the most useful is waitpid(), which allows you to wait for a specific process to exit. Here's an example: waitpid.c. (Note that this example shows that you can call a syscall with the wrong number of parameters, but you'll get unpredictable results.)

Different Ways to Kill Yourself and Others

You can end your own process by calling the exit() library function or the _exit() syscall. Your book describes them on p. 115. The big difference between the two is that exit() calls the functions registered via atexit() and flushes stdio buffers before terminating, whereas _exit() terminates immediately without doing either. Some examples: exit.c, _exit.c.

Note that you can use atexit() to perform the kind of cleanup work that you normally do in a destructor in object-oriented languages (such as C++).

To kill another process, you can use the kill() syscall, mentioned on p. 116. Despite its name, kill() sends a signal, which may or may not terminate the target process. There are also userspace programs called kill and killall that you can use to kill running processes that you look up in the ps listing.

When a process is killed by certain signals (a segmentation fault, for example), it may leave behind a core file (called a "core dump"), as long as the core file size limit (ulimit -c) isn't 0. See your book on p. 117. These core files can be very handy when debugging, as we mentioned in the debugging lab.

Ways to Launch Other Programs

A common reason for calling fork() is so that the child can launch another program with one of the exec* system calls. Your book discusses the exec* family on pp. 112-115. I've often found execlp() the most useful. Example program: execlp.c.

Note that the last printf line is never printed. This is because a successful call to any of the exec* functions replaces the current process image with a brand new one; exec returns only if it fails.

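For those of you interested in quick-and-dirty ways to launch other programs, there's the system() library function. (It isn't a syscall itself; it runs the command through /bin/sh in a child process and waits for it.) It's mentioned in your book on p. 118. Example program: system.c.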

Note how the output of system.c differs from the output of execlp.c.

Further Reading

Process Control from the UNIX programming FAQ.

A fork() Primer from the inimitable "Beej".

Your book, of course. Chapter 9.

Stuff We Ignored

The things in this chapter that we didn't touch on are:

I am mentioning them here so that you know you do not need to study them for the test. (Except for popen(), which we'll cover later.)

Lab4

This will be a lab on processes.

Pre-requisite: Make a lab4 Directory

After logging into the lab server, change into the unix directory by typing cd unix and then make a directory called 'lab4' by typing mkdir lab4. Then change into the lab4 directory by typing cd lab4.

All of the files you make for this lab belong in the lab4 directory. Don't put them anywhere else.

Description / Requirements

For this lab, I want you to write a program that uses fork() to split into two processes, a parent and a child.

In the child process, do the following:

In the parent process do the following:

Save this in a file called process.c. Be sure to have a Makefile as well.

Output

If you follow the directions above, you should get some output that looks like this:

PARENT: pid = 10063
CHILD: my pid = 10064
CHILD: my parent's pid = 10063
CHILD: Sleeping...
PARENT: my child's pid = 10064
PARENT: Waiting for the child to exit...
CHILD: Done sleeping...
PARENT: child is dead.

If your output looks more or less like that, you probably wrote the program correctly. Please note that your output will probably not look exactly like mine, because the OS may schedule the processes differently and will assign different process IDs every time you run the program.

Files

Here is a list of the files created in this lab.

Please turn both of these files in with your name, lab # and class # at the top.