Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries -- dictionaries, encyclopedias, gazetteers, etc. -- are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of curated databases designed to support scientific research. The value of these databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area.
Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.
Peter Buneman is Professor of Database Systems in the School of Informatics at the University of Edinburgh. His work in computer science has focused mainly on databases and programming languages, specifically: database semantics, approximate information, query languages, types for databases, data integration, bioinformatics and semistructured data. He has recently worked on issues associated with scientific databases such as data provenance, archiving and annotation. In addition, he has made contributions to graph theory and to the mathematics of phylogeny. He has served on numerous program committees, editorial boards and working groups, and has been program chair for ACM SIGMOD, ACM PODS, VLDB and ICDT. He is a fellow of the Royal Society of Edinburgh, a fellow of the ACM and the recipient of a Royal Society Wolfson Merit Award. He is research director of the UK Digital Curation Centre.