Search results

1 – 2 of 2
Article
Publication date: 14 June 2013

Yousuke Watanabe, Hidetaka Kamigaito and Haruo Yokota

Abstract

Purpose

Office documents are widely used in daily activities, and their number keeps increasing, so the demand for sophisticated search over office documents is growing. Recent office document file formats are based on a package of multiple XML files. These XML files include not only body text but also page structure data and style data. The purpose of this paper is to utilize these data to find similar office documents.
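
To make the "package of multiple XML files" concrete, the following minimal Python sketch lists the XML parts inside such a package using only the standard library. It assumes the package is a ZIP archive, as is the case for docx, xlsx and pptx files. The helper names `list_xml_parts` and `make_demo_package` are hypothetical, and the demo package is a tiny stand-in rather than a valid office document.

```python
import io
import zipfile


def list_xml_parts(package):
    """Return the sorted names of the .xml parts in a ZIP-based package.

    `package` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(".xml"))


def make_demo_package():
    """Build a tiny in-memory stand-in for a .docx file (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Body text and style data live in separate XML parts.
        zf.writestr("word/document.xml", "<document><p>Hello</p></document>")
        zf.writestr("word/styles.xml", "<styles><style>Heading1</style></styles>")
        # Non-XML parts (e.g. images) are skipped by list_xml_parts.
        zf.writestr("docProps/thumbnail.jpeg", b"\xff\xd8")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for part in list_xml_parts(make_demo_package()):
        print(part)
```

Passing the path of a real docx, xlsx or pptx file to `list_xml_parts` should work the same way, since `zipfile.ZipFile` accepts both paths and file-like objects.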

Design/methodology/approach

The authors propose SOS, a similarity search method based on the structures and styles of office documents. SOS needs to compute similarity values between multiple pairs of XML files included in the office documents. The authors also propose LAX+, an algorithm that calculates a similarity value for a pair of XML files by extending an existing XML leaf node clustering algorithm.
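
The general idea of comparing two office documents part by part can be sketched as follows. This is not the authors' SOS or LAX+: the abstract does not give their details, so the sketch substitutes a plain Jaccard similarity over (path, text) pairs of leaf elements and an unweighted average across parts. All function names, the feature choice and the averaging rule are assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def leaf_features(xml_bytes):
    """Collect (path, text) pairs for every leaf element of an XML document."""
    features = set()

    def walk(node, path):
        path = path + "/" + node.tag
        children = list(node)
        if not children:
            features.add((path, (node.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_bytes), "")
    return features


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def package_similarity(pkg_a, pkg_b):
    """Average per-part similarity over all XML parts of two packages.

    Assumption: a part present in only one package contributes 0.
    """
    with zipfile.ZipFile(pkg_a) as za, zipfile.ZipFile(pkg_b) as zb:
        names_a = {n for n in za.namelist() if n.endswith(".xml")}
        names_b = {n for n in zb.namelist() if n.endswith(".xml")}
        all_names = names_a | names_b
        if not all_names:
            return 0.0
        total = 0.0
        for name in names_a & names_b:
            total += jaccard(leaf_features(za.read(name)),
                             leaf_features(zb.read(name)))
        return total / len(all_names)


def make_package(parts):
    """Build an in-memory ZIP package from a {part name: XML string} dict."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    styles = "<s><style>Bold</style></s>"
    doc_a = make_package({"word/document.xml": "<d><p>Hello</p><p>World</p></d>",
                          "word/styles.xml": styles})
    doc_b = make_package({"word/document.xml": "<d><p>Hello</p><p>There</p></d>",
                          "word/styles.xml": styles})
    print(package_similarity(doc_a, doc_b))
```

A per-part breakdown like this is what lets body text, structure and style each contribute to the final score. A weighted combination would be a natural refinement, but the weights would be a further assumption.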

Findings

SOS and LAX+ are evaluated experimentally on three types of office documents (docx, xlsx and pptx). The results of LAX+ and SOS are better than those of the existing algorithms.

Originality/value

Existing text‐based search engines do not take the structure and style of documents into account. SOS can find similar documents by calculating similarities between multiple XML files corresponding to body text, structures and styles.

Article
Publication date: 1 April 1987

Lorna Cullen

Abstract

In this second part of the report on Printed Circuit World Convention IV held at the Tokyo Prince Hotel, Tokyo, from 3–5 June 1987, a general synopsis of the content of the papers presented in the eighteen technical sessions will be given. As three sessions were run in parallel throughout the 2½‐day conference, and therefore not all presentations were heard by those reporting on the technical programme, a number of them have been briefly summarised from the Convention Proceedings.

Details

Circuit World, vol. 14 no. 1
Type: Research Article
ISSN: 0305-6120
