|
Tools for data storage
In addition to lists
and tuples,
Python's dictionary
data type is widely used and may be
adequate if persistence is not required. Check out the recommended
documentation referenced via the PYTHON menu on the left.
Relational Databases (DBs)
The python.org
website has a section
devoted to DBs and contains links to general database references. Joel
Shprentz' 1997 paper, Persistent Storage of Python Objects
in Relational
Databases is a readable
introduction to Python and relational
databases, even if it is somewhat dated.
Name |
anydbm,
dbm, gdbm and shelve
- Native Python DB interface modules. |
Description |
Standard
Python includes a
dictionary/file-like interface to operating-system database management
tools such as dbm or gdbm. "import anydbm" imports whichever tool is
appropriate to the OS and software version. Use Python's built-in help
command to find out more. Shelves are really just dbm files that
automatically serialise objects into character streams when they are
transferred to and from the dbm file. This serialisation process is
called pickling. Once either shelf or dbm files are open, they are
processed as if they were in-memory Python dictionaries. |
Warnings |
dbm
files only allow data (both keys and
values) to be stored as character strings. Shelves can store
arbitrary data, but the keys must be strings. |
Comments |
Useful
when persistence is required,
the dataset is not too large, and fast, complex queries are unnecessary.
When memory is inadequate for data storage, or access using
these built-in Python tools is too slow or too limited, specialised database tools
are necessary. Interfaces to MySQL seem to be the most complete. |
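
The dictionary-style access described above can be sketched as follows. This is a minimal illustration, assuming Python 3, where the old anydbm module is available as the dbm package; the file names and the stored values are invented for the example.

```python
import dbm
import os
import shelve
import tempfile

# Work in a throwaway directory so the example leaves nothing behind.
workdir = tempfile.mkdtemp()

# Plain dbm file: keys and values are stored as bytes
# (str objects are encoded on the way in).
dbm_path = os.path.join(workdir, "plain_store")
with dbm.open(dbm_path, "c") as db:  # "c" creates the file if it is missing
    db["colour"] = "blue"

with dbm.open(dbm_path, "r") as db:
    colour = db["colour"].decode("utf-8")  # values come back as bytes

# Shelf: keys must be strings, but values may be any picklable object.
shelf_path = os.path.join(workdir, "object_store")
with shelve.open(shelf_path) as shelf:
    shelf["run42"] = {"temps": [20.5, 21.0], "site": "A"}

with shelve.open(shelf_path) as shelf:
    record = shelf["run42"]  # unpickled back into a dictionary

print(colour, record["site"])
```

The dbm half shows why the string-only warning matters: anything other than text has to be encoded by hand, whereas the shelf pickles the nested dictionary automatically.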
Name |
MySQLdb |
Description |
MySQLdb
is a thread-compatible
Python Application Programming Interface (API) to the popular MySQL
database server. The MySQL C API has been encapsulated in an
object-oriented way. The only MySQL data structures which are
implemented are the "MYSQL" (database connection handle) and
"MYSQL_RES" (result handle) types. MySQLdb assumes you are familiar
with MySQL: it is really just a "dumb" client interface which passes a
restricted set of MySQL commands and receives the results of those
queries from MySQL servers which are either installed on your computer
or accessible via the internet. Usage details and examples are
available in the documentation with the download. |
Principal
reference |
sourceforge
mySQL site |
License |
GPL, Python
License (CNRI Python License), Zope
Public License |
Documentation |
sourceforge
mySQL document section |
Downloads |
Local
Server software MySQLserver |
Version |
5.0 |
Warnings |
Familiarity
with MySQL is a
necessity. Make sure you get the right version of the MySQL server for
your hardware and operating system. The following OSX applications,
available from the MySQL site, are also useful: MySQL Administrator,
MySQL Query Browser and (perhaps less useful) MySQL Workbench. For
OSX, there is also a System Preference tool for starting and stopping
the MySQL server and a dashboard tool for monitoring MySQL database
activity. |
Comments |
MySQLdb
is a basic tool. It is really
just a way of using Python to send SQL to the MySQL database server. A
more abstracted interface would be more powerful.
If you find typing MySQLdb all the time annoying, use "import MySQLdb
as mysqldb".
The MySQL.com site emphasises the commercial server (called MySQL
Enterprise). The MySQL Community Server is free and perfectly adequate
if third-party DB support is not needed. |
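
The "send SQL, fetch results" pattern can be sketched without a running MySQL server. MySQLdb follows the Python DB-API 2.0 convention (connect, cursor, execute, fetch), so the sketch below uses the standard-library sqlite3 module as a stand-in driver. The table and column names are invented for the example. With MySQLdb, the connection would instead come from MySQLdb.connect(...) with your own host and credentials, and the placeholder would be %s rather than ?.

```python
import sqlite3  # stand-in DB-API driver; NOT MySQLdb itself

# Open a connection and a cursor, as with any DB-API 2.0 module.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Plain SQL strings are passed straight through to the database engine.
cur.execute("CREATE TABLE samples (name TEXT, depth_m REAL)")
cur.executemany(
    "INSERT INTO samples (name, depth_m) VALUES (?, ?)",
    [("core-1", 12.5), ("core-2", 30.0)],
)
conn.commit()

# Parameters are passed separately so the driver can quote them safely.
cur.execute("SELECT name, depth_m FROM samples WHERE depth_m > ?", (20.0,))
deep = cur.fetchall()  # list of row tuples

cur.close()
conn.close()
print(deep)
```

Every query is still hand-written SQL, which is the limitation the comments above point to.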
Name |
SQLObject |
Description |
SQLObject
is an Object Relational
Manager (ORM) for providing an
object interface to a database, with
tables as classes, rows as instances, and columns as attributes.
SQLObject includes a Python object-based query language that makes SQL
more abstract, and provides substantial database independence for
applications. |
Principal
reference |
sqlobject.org |
Documentation |
There
is an active
community
and a useful beginner's tutorial. |
Downloads |
sqlobject.org
download |
Version |
0.9 |
Dependencies |
MySQLserver-
see above. |
Warnings |
None |
Comments |
A
suitable abstraction of relational
database technology to make it object-oriented. Large user group; well
supported.
There is a (simpler) sourceforge project named ForgetSQL
which
appears to be older, unfinished and perhaps unsupported.
Even more general is SQLAlchemy,
which we have not extensively evaluated. It may be
overkill, but it is connected through to Twisted and can apparently be
used with Oracle databases.
|
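
To make the "tables as classes, rows as instances, columns as attributes" idea concrete, here is a deliberately tiny hand-rolled sketch. It is not SQLObject's API: the Person class, its select_by method and the in-memory SQLite database are all hypothetical, chosen only so the example runs with the standard library.

```python
import sqlite3

# One shared in-memory database for the toy mapping.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)


class Person:
    """Hypothetical toy mapping: the class is the table, an instance is a row."""

    columns = ("name", "age")

    def __init__(self, name, age, row_id=None):
        self.name, self.age, self.id = name, age, row_id
        if row_id is None:
            # A brand-new object becomes a new row.
            cur = conn.execute(
                "INSERT INTO person (name, age) VALUES (?, ?)", (name, age)
            )
            self.id = cur.lastrowid

    @classmethod
    def select_by(cls, **criteria):
        """Turn keyword arguments into a WHERE clause and return instances."""
        for column in criteria:
            if column not in cls.columns:
                raise ValueError(f"unknown column: {column}")
        sql = "SELECT id, name, age FROM person"
        if criteria:
            sql += " WHERE " + " AND ".join(f"{c} = ?" for c in criteria)
        rows = conn.execute(sql, tuple(criteria.values())).fetchall()
        return [cls(name, age, row_id=row_id) for row_id, name, age in rows]


Person("Ada", 36)
Person("Grace", 45)
matches = Person.select_by(name="Grace")
print(matches[0].name, matches[0].age)
```

The calling code never writes SQL. A real ORM such as SQLObject generalises this so that the same classes can sit on top of different database back ends.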
Name |
PyTables |
Description |
PyTables
is a hierarchical database package designed to efficiently manage very
large amounts of data. PyTables is built on top of the HDF5 library and
the numarray package. It features an object-oriented interface that,
combined with natural naming and C code generated from Pyrex sources,
makes it a fast yet extremely easy-to-use tool for interactively saving
and retrieving large amounts of data. PyTables differs from the PyHL
interface to HDF5 (see below) in that PyTables has a completely
object-oriented interface, rather than the more function-oriented
approach of PyHL. There is a professional version, PyTablesPro, for
faster cross-object searching. |
Principal
reference |
PyTables
website pyTablesPro |
License |
PyTables
is
free for use under BSD
terms
|
Documentation |
The PyTablesMoin
FAQ is a good place to start.
An article entitled Portable Data with PyTables, published in the April
2008 issue of Python magazine
(Volume 2 Issue 4) gives a good idea of how PyTables is helping users
to work with HDF5 files.
|
Downloads |
SVN
and easy_install from the
PyTables website. |
Version |
2.1
|
Dependencies |
|
Warnings |
None |
Comments |
The Hierarchical
Data Format library, version 5 (HDF5), is a versatile, mature scientific
software library designed at the NCSA supercomputing facility for the fast,
flexible storage of enormous amounts of data. It provides a robust way
to store data, organized by name in a tree-like fashion. With
HDF5, extremely large datasets (hundreds of gigabytes in size) are
organized in a filesystem-like hierarchy using containers called
"groups" and accessed using the traditional POSIX /path/to/resource
syntax.
For more details, see the
hdfgroup website.
|
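
The group hierarchy and path-style addressing can be pictured with nested dictionaries. The sketch below is only a conceptual analogy in plain Python. It uses neither PyTables nor HDF5, and the resolve helper, group names and values are invented for illustration.

```python
def resolve(root, path):
    """Walk a POSIX-style path through nested dictionaries ("groups")."""
    node = root
    for part in path.strip("/").split("/"):
        if part:  # skip the empty piece produced by the root path "/"
            node = node[part]
    return node


# Dictionaries play the role of groups; lists play the role of datasets.
store = {
    "experiment": {
        "run1": {"temperature": [20.5, 21.0, 21.4]},
        "run2": {"temperature": [19.8]},
    }
}

run1_temps = resolve(store, "/experiment/run1/temperature")
print(run1_temps)
```

In HDF5 the same kind of lookup reaches data stored on disk, so a dataset far larger than memory can be addressed by name without loading the whole file.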
Name |
PyHL |
Description |
PyHL
allows the user to work with HDF5 at a high level. It is actually a
wrapper around HL-HDF, but with some additional high-level
functionality. As with HL-HDF, it is up to the user to define appropriate
ways of representing data and to use the building blocks available in
PyHL to store the data in HDF5. PyHL is pronounced "pile",
which is an appropriate description of a hierarchy ...
|
Principal
reference |
on
hdfgroup website
|
Documentation |
On the hdfgroup website
|
Downloads |
from hdfgroup
|
Version |
2.1 |
Dependencies |
|
Warnings |
Current
development status unknown. PyTables (above) has been more extensively
tested and is the preferred API to HDF5 |
Comments |
Copyright
© 2000, 2001, 2002 by the Swedish Meteorological and
Hydrological Institute (SMHI), Norrköping, Sweden.
|
Suggestions and/or contributions are most welcome: Please contact us
through the forum.
|