Introduction
Many people use Python because, like other scripting languages, it is a portable, platform-independent and general-purpose language that can perform the same tasks as the database-centric, proprietary 4GL tools supplied by database vendors. Like 4GL tools, Python lets you write programs that access, display and update the information in the database with minimal effort. Unlike many 4GLs, Python also gives you a variety of other capabilities, such as parsing HTML, making socket connections and encrypting data.
Possible applications for the Python DB-API include:
- Many Web sites construct pages on the fly to display information requested by the user, such as selections from the products offered in a catalogue or from the documents available in a library. Doing this requires CGI scripts that can extract the desired information from a database and render it as HTML.
- An Intranet application might use the Tkinter module and the DB-API to provide a graphical user interface for browsing through a local database, such as accounts receivable or a customer list.
- Python programs can be used to analyze data by computing statistical properties of the data.
- Python programs can form a testing framework for programs that modify the database, in order to verify that some integrity constraint is maintained.
To solve the problem, a Special Interest Group (or SIG) for databases was formed. People interested in using Python for a given application form a SIG of their own: they meet on an Internet mailing list where they discuss the topic, exchange ideas, write code (and debug it), and eventually produce a finished product. (Sounds a lot like the development process for the Linux kernel, doesn't it?)
After some discussion, the Database SIG produced a specification for a consistent interface to relational databases -- the DB-API. Thanks to this specification, there's only one interface to learn. Porting code to use a different database product is much simpler, often requiring the change of only a few lines.
The database modules written before the Database SIG are still around and don't match the specification -- the mSQL module is the most commonly used one. These modules will eventually be rewritten to comply with the DB-API; it's just a matter of the maintainers finding the time to do it.
Relational Databases
A relational database is made up of one or more tables. Each table is divided into columns and rows. A column contains items of a similar type, such as customer IDs or prices, and a row contains a single data item, with a value for each column. A single row is also called a tuple or a relation, which is where the term "relational database" originates.For an example database, we'll use a small table designed to
track the attendees for a series of seminars. (See listing 1.) The
Seminars table lists the seminars being offered; an example row is
(1, Python Programming, 200, 15)
. Each row contains a
unique identifying ID number (1, in this case), the seminar's title
(Python Programming), its cost ($200), and the number of places
still open (15). The Attendees table lists the name of each
attendee, the seminar that he or she wishes to attend, and whether
the fee has been paid. If someone wants to attend more than one
seminar, there will be more than one row with that person's name,
with each row having a different seminar number and payment
status.
ID | title | cost | places_left |
---|---|---|---|
1 | Python Programming | 200 | 15 |
2 | Intro to Linux | 100 | 32 |
3 | Socket Programming | 475 | 7 |
4 | Web Commerce | 300 | 26 |
name | seminar | paid |
---|---|---|
Albert | 1 | no |
Albert | 4 | no |
Beth | 1 | yes |
Chuck | 2 | no |
Dale | 4 | yes |
Elaine | 3 | yes |
Elaine | 1 | yes |
Felix | 4 | no |
The examples below will use the soliddb module, which supports accessing SOLID databases from Python. SOLID is a product from Solidtech that was reviewed by Bradley Willson in LJ, September, 1997. I'm not trying to cover CGI or Tkinter programming, so only the commands to access the database are presented here, in the same manner as if typed directly into the Python interpreter.
Getting Started
To begin, the program must first import the appropriate Python module for connecting to the database product being used. By convention, all database modules compliant with the DB-API have names that end in 'db': e.g., soliddb, and oracledb.The next step is to create an object that represents a database connection. The object has the same name as the module. The information required to open a connection, and its format, varies for different databases. Usually it includes a user name and password, and some indication of how to find the database server, such as a TCP/IP hostname. If you're using the free trial version of SOLID, Unix pipes are the only method available to connect to the server, so the code is:
>>> import soliddb >>> db = soliddb.soliddb('UPipe SOLID', 'amk', 'mypassword'); >>> db <Solid object at 809bf10>
Cursor Objects
Next, you should create a cursor object. A cursor object acts as a handle for a given SQL query; it allows retrieval of one or more rows of the result, until all the matching rows have been processed. For simple applications that do not need more than one query at a time, it's not necessary to use a cursor object since database objects support all the same methods as cursor objects. We'll deliberately use cursor objects in the following example. (For more on beginning SQL, see At The Forge by Reuven Lerner in LJ, Octoboer, 1997.)
Cursor objects provide an execute()
method that
accepts a string containing an SQL statement to be performed. This,
in turn, causes the database server to create a set of rows that
match the query.
The results are retrieved by calling a method whose name begins
with 'fetch', which returns one or more matching rows, or None if
there are no more rows to retrieve. The fetchone()
method always returns a single row, while fetchmany()
returns a small number of rows, and fetchall()
returns
all the rows that match. For example, to list all the seminars
being offered:
>>>cursor = db.cursor(); >>> # List all the seminars >>> cursor.execute('select * from Seminars') >>> cursor.fetchall() [(4, 'Web Commerce', 300.0, 26), (1, 'Python Programming', 200.0, 15), (3, 'Socket Programming', 475.0, 7), (2, 'Intro to Linux', 100.0, 32), ]A row is represented as a Python tuple, so the first row returned is
(4, 'Web Commerce', 300.0, 26)
. Notice that the
rows aren't returned in sorted order; to do that, the query has to
be slightly different (just add 'order by ID' to the SQL query).
Because they return multiple rows, the fetchmany()
and
fetchall()
methods return a list of tuples. It's also
possible to manually iterate through the results using the
fetchone()
method and looping until it returns
None
, as in this example which lists all the attendees
for seminar 1:
>>> cursor.execute('select * from Attendees where seminar=1') >>> while (1): ... attendee = cursor.fetchone() ... if attendee == None: break ... print attendee ... ('Albert', 1, 'no') ('Beth', 1, 'yes') ('Elaine', 1, 'yes')
SQL also lets you write queries that operate on multiple tables, as in this query, which lists the seminars that Albert will be attending.
>>> cursor.execute("""select Seminars.title ... from Seminars, Attendees ... where Attendees.name = 'Albert' ... and Seminars.ID = Attendees.seminar""") >>> cursor.fetchall() [('Python Programming',), ('Web Commerce',)]
Now that we can get information out of the database, it's time
to start modifying it by adding new information. Changes are made
by using the SQL insert and update statements. Just like queries,
the SQL command is passed to the execute()
method of a
cursor object.
Transactions
Before showing how to add information, there's one subtlety to be noted that occurs when a task requires several different SQL commands to complete. Consider adding an attendee to a given seminar. This requires two steps. In the first step, a row must be added to the Attendees table giving the person's name, the ID of the seminar they'll be attending, and whether or not they've paid. In the second step, theplaces_left
value for their
seminar should be decreased by one, since there's room for one less
person. SQL has no way to combine two commands, so this requires
two execute()
calls. But what if something happens and
the second command isn't executed?--because the computer crashed,
the network died, or there was a typo in the Python program? The
database is now inconsistent: an attendee has been added, but the
places_left
column for that seminar is now wrong.
Most databases offer transactions as a solution for this problem. A transaction is a group of commands, such that either all of them are executed, or none of them are. Programs can issue several SQL commands as part of a transaction, and then commit them, (i.e. tell the database to apply all these changes simultaneously"). Alternatively, the program can decide that something's wrong, and roll back the transaction, which says "Forget all these changes."
For databases that support transactions, the Python interface
silently starts a transaction when the cursor is created. The
commit()
method commits the updates made using that
cursor, and the rollback()
method discards them. Each
method then starts a new transaction. Some databases don't have
transactions, but simply apply all changes as they're executed. On
these databases, commit()
does nothing, but you should
still call it in order to be compatible with those databases that
do support transactions.
Listing 2 is a Python function tries to get all this right by committing the transaction once both operations have been performed.
def addAttendee(name, seminar, paid): # Fetch the number of places left, and the cost cursor.execute('select places_left from Seminars where ID=%s' % (seminar),) (places_left, )=cursor.fetchone() # If there if places_left==0: raise ValueError, 'No more places left in seminar'+str(seminar) # Add a row to the Attendees table cursor.execute("insert into Attendees values (?, ?, ?)", (name, seminar, paid) ) # Decrease the places_left count of the selected seminar cursor.execute("update Seminars set places_left = %i where ID=%s" % (places_left-1, seminar) ) # Commit the changes db.commit()
Calling this function is simple:
addAttendee('George', 4, 'yes')
We can verify that the change was performed by checking the listing for seminar #4, and listing its attendees. This produces the following output:
Seminars: 4 'Web Commerce' 300.0 25 Attendees: Albert 4 no Dale 4 yes Felix 4 no George 4 yesNote that this function is still buggy if more than one process or thread tries to execute it at the same time. Database programming can be potentially quite complex. With this standardized interface, it's not difficult to write all kinds of Python programs that act as easy-to-use front ends to a database.
References
Solidtech Web site: http://www.solidtech.com
DB-SIG Web page: http://www.python.org/sigs/db-sig/
C.J. Date, "An Introduction to Database Systems", 1995: Addison-Wesley, ISBN 0-201-54329-X.