Text Databases and Document Management: Theory and Practice

Gobinda G. Chowdhury (Senior Lecturer, Department of Computer & Information Sciences, University of Strathclyde)

Program: electronic library and information systems

ISSN: 0033-0337

Article publication date: 1 June 2002

Citation

Chowdhury, G.G. (2002), "Text Databases and Document Management: Theory and Practice", Program: electronic library and information systems, Vol. 36 No. 2, pp. 133-134. https://doi.org/10.1108/prog.2002.36.2.133.4

Publisher: Emerald Group Publishing Limited

Copyright © 2002, MCB UP Limited


With the rapid increase in the volume and variety of information on the Internet and intranets, the need for sophisticated tools and techniques for managing information has greatly increased. This book addresses some issues related to textual data management. It has eight chapters contributed by a total of 12 authors. These chapters address various issues related to document management, such as mark‐up languages, document and database connectivity, metadata, text indexing and categorisation, thesauri for query expansion, and intra‐ and inter‐organisational document management issues.

Chapter 1 addresses issues related to mark‐up languages. It begins with a basic introduction to the evolution of mark‐up languages, including XML and the document object model (DOM). It then discusses how these technologies, together with modern database technologies, can be used in e‐commerce applications. Chapter 2 reports a survey of the relevance ranking mechanisms currently used by Web search engines. It also discusses some recent experiments aimed at measuring the performance of Web search engines, and finally examines a method for reducing the retrieval of non‐relevant documents. Chapter 3 addresses the issues of metadata, Dublin Core (DC) in particular, for managing corporate digital documents. It points out the need to change the connotation of some DC fields to make the metadata scheme suitable for handling digital documents created within organisations. Chapter 4 investigates the usefulness of n‐grams for document indexing in text categorisation. Chapter 5 focuses on the tools used in an information retrieval environment to support users in their selection of terms for query expansion. It proposes that automatically generated, category‐specific associative thesauri can be a useful tool to support query expansion. This chapter mainly reports work in progress within the Eurosearch Project, whose main purpose is the design and implementation of a European federation of various search engines working in a multilingual, multi‐document environment. Chapter 6 presents a database language, quite similar to the relational database language, that provides the basis for the management of marked‐up documents. Chapter 7, the longest chapter in the book at 63 pages, discusses a cooperative document management model in a multidisciplinary healthcare environment. The last chapter deals with inter‐organisational document management issues. It introduces a method for improving the retrieval of documents created in organisational processes by collecting metadata about the context of the documents. Each chapter has a list of references, and the book has a subject index.

Thus, the book addresses some topical issues related to textual document management. However, an introductory chapter describing the objective of the book and the chapters, and showing the links among the various topics addressed in the book, would have been very useful. In its absence, the book reads like a collection of articles addressing some relevant, but somewhat unrelated, issues of document management. Three of the eight chapters, namely chapters 4, 5 and 6, are rather technical, requiring some mathematical knowledge and/or computer background. Chapters 2, 4 and 5 have abstracts, while the others do not. Another shortcoming of the book is the index. It is not only short, but also contains some unusual entries, such as "reuse", "relevance of pages", "search and retrieval", and so on.

To sum up, the book contains some interesting discussions with pointers to some exciting research projects in different areas of document management, and hence it may be a good source of information for researchers in this field. However, the relatively high price may make it less attractive to many readers.
