Kristina Voigt and Gerhard Welzl
Abstract
Scientific information is increasingly buried under the proliferation of commercial sites on the Internet, which makes valuable chemistry sites and chemical databases difficult to find. This paper introduces a number of databases selected by the authors. These sites fall into three groups: databases that used to be available only through commercial hosts; databases that are available commercially but are partly accessible free of charge on the Internet; and databases of topical concern, e.g. chemical weapons. All the databases mentioned can be found in a structured format in the DAIN Metadatabase of Internet Resources for Environmental Chemicals, which is explained in this paper. An important further step out of the information labyrinth is the evaluation of the content of data sources for chemicals. Approaches have been made to analyse chemical databases by applying discrete mathematical methods and multivariate statistics.
Kristina Voigt, Gerhard Welzl and Gerda Rediske
Abstract
Constantly expanding chemical and environmental information sources increase the need for descriptive statistical analysis. This paper gives a comparative evaluation of data sources, i.e. online databases, databases on CD‐ROM and Internet resources in the field of environmental chemicals. The evaluation is based on information in three metadatabases for environmental chemicals: DADB‐Metadatabase of Online Databases, DACD‐Metadatabase of CD‐ROMs, DAIN‐Metadatabase of Internet Resources. A data matrix of 50 environmental and chemical descriptors found in DADB, DACD and DAIN is analysed and a technique is applied to transform the data set into a data matrix of a more homogeneous structure. This method is based on algorithms for solving the so‐called travelling salesman problem. Two different ways of analysing the data set are applied and the results are compared. Also, media combination patterns are identified and discussed. For most descriptors the information depth is higher in commercial online databases and databases on CD‐ROM than in free Internet resources. Exceptions, e.g. some health‐related parameters which have a higher percentage in Internet resources, are identified and explained.
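The idea of reordering a data matrix with a travelling-salesman-style method can be illustrated with a minimal sketch. The abstract does not say which algorithm the authors used, so the nearest-neighbour heuristic, the Hamming distance between rows, and the toy 0/1 matrix below are all assumptions chosen for illustration, not the paper's method or data. Rows are treated as "cities", and a short tour places similar rows next to each other, which gives the matrix a more homogeneous block structure.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two equally long rows differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_order(rows: List[Sequence[int]], start: int = 0) -> List[int]:
    """Greedy TSP heuristic: always move to the closest unvisited row.

    This is an assumed stand-in for whatever TSP algorithm the paper applies.
    Ties are broken by the lower row index so the result is deterministic.
    """
    unvisited = set(range(len(rows)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = rows[order[-1]]
        nxt = min(unvisited, key=lambda i: (hamming(last, rows[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def path_length(rows: List[Sequence[int]], order: Sequence[int]) -> int:
    """Sum of distances between consecutive rows in the given ordering."""
    return sum(hamming(rows[a], rows[b]) for a, b in zip(order, order[1:]))


# Hypothetical toy matrix: rows = data sources, columns = descriptors (1 = covered).
matrix = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

order = nearest_neighbour_order(matrix)
reordered = [matrix[i] for i in order]

print("row order:", order)
print("path length before:", path_length(matrix, range(len(matrix))))
print("path length after: ", path_length(matrix, order))
```

The same routine can be applied to the transposed matrix to reorder the columns (descriptors). A nearest-neighbour tour is not guaranteed to be optimal; it is used here only because it is short and easy to follow.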