Search results

1 – 2 of 2

Article

Publication date: 1 December 2001

Automatic vs manual categorisation of documents in Spanish

Carlos G. Figuerola, Angel Zazo Rodríguez and José Luis Alonso Berrocal

Downloads

307

Abstract

Automatic categorisation can be understood as a learning process in which a program recognises the characteristics that distinguish each category or class from the others, i.e. the characteristics a document must have in order to belong to that category. As yet, few experiments have been carried out with documents in Spanish. Here we show how pattern vectors that capture the characteristics of each class or category of documents can be built using techniques based on those applied to query expansion by relevance feedback; we also report the results of applying these techniques to a collection of documents in Spanish. The same collection was categorised manually, and the results of the two procedures were compared.
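The idea of a per-category pattern vector can be illustrated with a minimal sketch. It assumes a Rocchio-style formulation borrowed from relevance feedback, raw term frequencies, and cosine similarity; the abstract does not specify the weighting scheme, so the function names, the `beta` and `gamma` parameters, and the toy Spanish sentences are all hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector for a lower-cased, whitespace-tokenised text."""
    return Counter(text.lower().split())


def build_pattern(positives, negatives, beta=1.0, gamma=0.25):
    """Rocchio-style pattern vector (an assumed formulation):
    weighted mean of in-class documents minus weighted mean of the rest."""
    pattern = Counter()
    for doc in positives:
        for term, weight in tf_vector(doc).items():
            pattern[term] += beta * weight / len(positives)
    for doc in negatives:
        for term, weight in tf_vector(doc).items():
            pattern[term] -= gamma * weight / len(negatives)
    # Keep only terms that still characterise the class positively.
    return {term: w for term, w in pattern.items() if w > 0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorise(doc, patterns):
    """Assign the document to the category whose pattern vector is most similar."""
    vec = tf_vector(doc)
    return max(patterns, key=lambda cat: cosine(vec, patterns[cat]))


# Toy, hand-labelled training data (hypothetical).
training = {
    "deportes": ["el equipo gana el partido", "el jugador marca un gol"],
    "economia": ["el banco sube los tipos", "la bolsa cae y el banco pierde"],
}

patterns = {
    cat: build_pattern(
        docs,
        [d for other, ds in training.items() if other != cat for d in ds],
    )
    for cat, docs in training.items()
}

print(categorise("el partido termina con un gol", patterns))
```

Comparing the labels produced this way with labels assigned by hand to the same documents is the kind of automatic-versus-manual comparison the abstract describes.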

Details

Journal of Documentation, vol. 57 no. 6

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 1 May 2006

An evaluation of conflation accuracy using finite‐state transducers

Carmen Galvez and Félix de Moya‐Anegón

Downloads

519

Abstract

Purpose

To evaluate the accuracy of conflation methods based on finite‐state transducers (FSTs).

Design/methodology/approach

Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, and very few on conflation performance. The normalization process we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus to merge term variants into canonical lemmatized forms. Conflation performance was evaluated with an adaptation of the recall and precision measures, based on accuracy and coverage rather than on actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm.
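Evaluating conflation directly, rather than through retrieval, can be sketched as follows. The abstract does not give the exact adaptation of recall and precision, so the definitions here are assumptions: coverage is the share of word forms the conflator analyses at all, and accuracy is the share of analysed forms mapped to the correct lemma. The gold standard, the dictionary lookup (a stand-in for an FST-backed lexicon), and the naive suffix stripper (a stand-in for a stemmer, not the Porter algorithm) are all hypothetical.

```python
def evaluate_conflation(conflate, gold):
    """Score a conflation function against a gold standard.

    conflate: callable mapping a surface form to a canonical form,
              or to None when the form is left unanalysed.
    gold:     dict mapping each surface form to its correct lemma.
    Returns (accuracy, coverage) under the assumed definitions above.
    """
    analysed = 0
    correct = 0
    for form, lemma in gold.items():
        output = conflate(form)
        if output is None:
            continue  # unanalysed form: lowers coverage, not accuracy
        analysed += 1
        if output == lemma:
            correct += 1
    coverage = analysed / len(gold) if gold else 0.0
    accuracy = correct / analysed if analysed else 0.0
    return accuracy, coverage


# Hypothetical gold standard: Spanish surface forms and their lemmas.
gold = {"casas": "casa", "niños": "niño", "cantaba": "cantar", "luces": "luz"}

# Stand-in for a dictionary-based lemmatizer: precise but incomplete.
lexicon = {"casas": "casa", "niños": "niño"}


def strip_s(form):
    """Stand-in for a rule-based stemmer: always answers, often wrongly."""
    return form[:-1] if form.endswith("s") else form


print(evaluate_conflation(lexicon.get, gold))
print(evaluate_conflation(strip_s, gold))
```

On this toy data the lexicon lookup is accurate on everything it analyses but leaves half of the forms untouched, while the suffix stripper covers every form at lower accuracy.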

Findings

The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms.

Originality/value

The report outlines the potential of transducers in their application to normalization processes.

Details

Journal of Documentation, vol. 62 no. 3

Type: Research Article

ISSN: 0022-0418
