CS 3210 - Lab 13 - Using the Berkely `db` Library

Today we'll learn how to use the Berkely db library to store key-value pairs. If you need to save data that involves a unique identifier (like a "primary key" in a database) and some associated data, the Berkely db is for you. Those of you that are doing address book / recipie / movie listing -type projects might consider using a Berkely db.

The chapter correponding to this lab is Chapter 23.

All the programs that use Berkely DBs need to have the following options passed to gcc:

-I/usr/include/db1 tells the compiler to look in the include directory that has the db.h file, which we will #include at the top of all our programs.
-ldb1 tells the linker to link with version 1 of the DB library (libdb1.so).

First, Some Theory

Abstract Data Types and You: Partners in Freedom

Before the term "Object-Oriented Programming" was coined, programmers used Abstract Data Types (or "ADTs") to make code more organized and maintainable. Basically put, an ADT is a structure (i.e. a C struct - usually typedef'ed) that you use with a collection of related routines. Perhaps the best example that we have seen so far is the FILE * data type that you pass to all the higher-level file routines (fopen, fclose, fread, fwrite, etc.).

At the bottom of p. 429 you can see the DB struct that is used in all of the library access routines. Something you will note is that the struct contains a number of function pointers, which you use for all the library access routines. You can think of these as "methods", to use an O-O term.

What's a Hash? A B-Tree? A RECNO?

Page 430 in your book describes the three types of databases that you can create. Summary:

hashes: "Hash Tables" simple key / value pairs. Good comparison: an array is a list indexed by a number, a hash is an list indexed by a string.
btrees: "Binary Trees" efficiently ordered, indexed data, also treated as key / value pairs. This db type allows you to traverse to the previous record as well as the next. It also features more sophisticated cursor support. (You folks that work with databases will know what I'm talking about.)
recno: "Record Number" simple indexing with records numbered 1 to N. Removing one record re-numbers all the following records. This db type allows you to traverse to the previous record as well as the next.

For further enlightenment on these, look at the man pages for db(3), hash(3), btree(3), and recno(3). In the labs we do today, we'll only use hashes. Left as an exersize for the interested student to experiment with other DB formats.

Example Program: hash - This Perl program shows an example of how hashes work using key-value pairs. Note the program outputs key / value pairs in a different order than the way we set it. This is because hashes order things however they want. Berkely DB files do the same thing. If you want your data ordered, use btree or recno.

And Now, The Berkely DB

Opening and Closing

To open a db file, you use the dbopen method, described in your book on p. 431. You could think of this as the "constructor" in O-O terms. This is the only db function that is not accessed via a function pointer.

Example Program: dbopen.c - An example program from your book on p. 432. It just opens and closes a db. Observe how the file test.db gets created in the current working directory. You can use the file command to see what kind of file it is.

Adding / Editing a Record: `put`

Both adding and editing a record is done with the put accessor, described in your book on p. 436. If you set a value for a key that already exists, the old value will be overwritten. You can specify R_NOOVERWRITE as the flags value if you don't want to overwrite the value.

Example Program: dbput.c - A program that uses the put method to add the same values that we added in hash.

Iterating Through All Records: `seq`

To list all the records in a db you use the seq accessor, described in your book on p. 434. There are a variety of flags that you can pass seq to control how you will iterate through the records. The only ones that work with hashes are R_FIRST (get the first record) and R_NEXT (gets successive records).

Example Program: dbseq.c - Iterates through all the records in the test db. (You should run dbput.c before running this one.)

Retrieving One Record: `get`

To get just one, specific record you can use get passing it the key of the desired record. See p. 435. Note that the flags parameter is unused.

Example Program: dbget.c - Uses get to retrieve just "Fred" from the db. Also, tries to retrieve a non-existent "Barney" record so you can see it fail.

Removing a Record: `del`

You can remove a record with del, described in your book on p. 436. (If you're using a BTREE db, you can delete by cursor).

Example Program: dbdel.c - Uses del to remove the Dino record. Also, tries to remove a non-existent "Barney" record so you can see it fail. You can re-run dbput.c to re-add the removed recurd.

A Few Extras: `fd` & `sync`

Page 433 shows two extra accessors.

First, fd which returns a file descriptor; you would use this primarilly for locking (using flock / fcntl), as the db library does not support any locking by itself.

Next, sync which commits changes to disk. DB files are memory-mapped for more efficient access and synchronize periodically. You can use sync to force synchronization to occurr.

What if I have more than one "value" to associate with the data?

One approach: Just smoosh all the values together in a single string, using some kind of character for the delimiter. IOW, you'd have a value string that looks like:

	"value1:value2:value3:value4"

Then, when you read the key / value(s), you can just use strtok (or similar) to split them up.

Another approach: The data used for keys and values are just void * types, so you can actually pass whole structs in for either one.

Further Study

Sleepycat Software - Company that provides commercial support for Berkely DB versions 1, 2, and 3. They also host the official Berkely db documentation.

The man pages for db(3), hash(3), btree(3), and recno(3).

Berkely DB by Sleepycat Software Inc. on Amazon.

Your book, Chapter 23

The Ultimate Flintsones Site includes an episode guide

Lab 13

For this lab, you will make a program that adds, deletes, retrieves and lists records in a Berkely DB database, depending on which command line arguments are passed. A summary of the flags are:

-a KEY VALUE: add a new record
-d KEY: delete a record
-g KEY: get a single record
-l list all records

Some Steps to Help You

Check the number of command-line arguments that were passed. If no arguments were passed (argc == 1), print a help message that shows which options are available (use the -a, -d, -r, -l text above) and exit.
If they passed at least one command-line arg, open the database "lab.db". (Call dbopen)
If they passed "-a" as the first command-line arg (argv[1] is "-a" -- use strcmp), make sure there are two more command-line args (argc == 4) and use them as the key / value pair to add. (Call db->put)
If they passed "-d" as the first command-line arg (argv[1] is "-d"), make sure there is one more command-line arg (argc == 3) and use the next arg as the key to delete. (Call db->del)
If they passed "-g" as the first command-line arg (argv[1] is "-g"), make sure there is one more command-line arg (argc == 3) and use the next arg as the key to retrieve. (Call db->get)
If they passed "-l" as the first command-line arg (argv[1] is "-l"), list all of the records in the database. (Call db->seq)
If they passed any other flag for argv[1], print an error message ("unrecognized option").
At the end of the program, close the database "lab.db". (Call db->close)

Save this in a file called dblab.c. The whole file should be about 80 or so lines long, but you will be able to re-use a lot of the code from the previous examples. You can do everything in just the main funciton, you don't need any other functions. (Some of you will want to complicate things by using lots of support functions. Fine, whatever...) Be sure to have a Makefile as well. Put them both in a lab13/ directory.

Output

If you follow the directions above, you can test your program as follows. Note that every line beginning with a $ is something you type at the command-line.

$ ./dblab 
usage: dblab [option]
	-a KEY VALUE: add a new record
	-d KEY: delete a record
	-g KEY: get a single record
	-l list all records
$ ./dblab -l
$ ./dblab -a Fred husband
$ ./dblab -l
Fred = husband
$ ./dblab -a Wilma wife  
$ ./dblab -l
Wilma = wife
Fred = husband
$ ./dblab -a Dino pet  
$ ./dblab -l
Wilma = wife
Dino = pet
Fred = husband
$ ./dblab -g Dino    
Dino = pet
$ ./dblab -d Dino
$ ./dblab -l
Wilma = wife
Fred = husband
$ ./dblab -g Dino
No record for "Dino"
$ ./dblab -d Dino
No such record "Dino"

If your output looks like that, you probably did the program correctly.

Files

Here is a list of the files created in this lab.

Makefile
dblab.c
lab.db (Created by your program.)

Please turn in just the dblab.c file with your name, lab # and class # at the top.

CS 3210 - Lab 13 - Using the Berkely db Library