<$BlogRSDUrl$>

Reflections on a Wandering Life.....

Sunday, April 10, 2011

Complex Problems, Simple Solutions 

Lush on a Sunday evening. I'm supposed to be studying my Chinese lesson, but I'm sitting here staring into space. It's a database problem. I am setting up a schema for my research at the National Library of China. Ordinarily one would think that the ISBN would be a good unique identifier for each book. But there are many old books at the National Library that do not have ISBN numbers. But they all have library bar codes, and the bar codes are simple and straightforward (not a combination of alpha and numeric values). So it's really easy to use the bar code as an identifier. The column data type can be set to "number," and you don't have to worry about how many characters need to be reserved for the column.

But I also read books that are not at the Natural Library. Almost all of them would have ISBN numbers, but they would not, of course, have National Library bar codes. So at first, I planned to set up two separate schemas. The problem with that is that every time I want to look for something, I have to do two separate searches. Of course, I could write the search code so that it joins the two queries together and searches both schemas at the same time. But since the schemas are identical in every respect except the bar code, it seems that there should be a way to combine them.

The solution I came up with is painfully simple. It goes with out saying that books from outside the library would need to use the ISBN as an identifier, since they don't have a library bar code. But even for books from the library, I want to have a column for the ISBN number, because I want to have access to that information for each book (in case I want to order a used copy online or something). So there will be a column for the bar code and a column for the ISBN number. The fields in the ISBN column can be null for those books in the library that do not have an ISBN, since the ISBN column is not used to join tables. But the fields in the "barcode" column cannot be left null for those books from outside the library that do not have a bar code, because the "barcode" column is the primary key, and a primary key column would not function properly if it contained null values. You would have lots of data anomalies. It would be more accurate to say that if the primary key contains null values, it is not a primary key, because the requirement of a primary key is that it uniquely identifies each record, which means that it cannot have null or non-unique values.

The solution is to use the ISBN as the bar code for those books that do not have a bar code. Solves the problem, and if I ever did want to separate them later, it would be easy to surgically remove those records:

select * from booklist where isbn=barcode;

It never ceases to amaze me how much time and thinking it takes me to come up with the most obvious simple solution.

OK, enough of the musings of a database guy. If you don't understand what you just read, just forget it and use Excel. Excel is not as versatile as a relational database, of course, but it is a good flat file database, and if you don't have too many records, you can live with the inefficiency. My reason for using a relational database comes down to scalability. Years ago, when I was doing a study of the Civil War, I was using an old DOS program called, "Electric Desk." I had accumulated some 1400 records. When you get that many, it's kinda hard to take a look at the whole table (what we call a "full table scan"). Electric Desk worked pretty well, but it did not follow the rules of normalization, and in the process of moving to Arizona, I lost the floppies I had it all stored on. I guess it doesn't matter much, because I don't even have a drive that would read them anymore, and, as far as I know, the software that would implement the data is now obsolete. I don't want that to happen again, so this time I determined to set up a relational database that would be both scalable, as to size and adaptable to future changes in software. Got the job done, thanks to God and E. F. Codd.

This page is powered by Blogger. Isn't yours?