DETECTING APPROXIMATELY DUPLICATE BIBLIOGRAPHIC RECORDS WITH TEXT ALGORITHMS: EXPERIENCE OF CREATING A UNION CATALOGUE OF LIBRARIES AT THE WARSAW UNIVERSITY OF TECHNOLOGY

GRZEGORZ PŁOSZAJSKI

Abstract

The paper describes a fault-tolerant method of detecting duplicate bibliographic records in catalogues. The method is based on text algorithms: it suggests candidate duplicates to librarians, who make the final decision. The method was applied to four library catalogues at the Warsaw University of Technology, each of which was compared with the catalogue of the main library. The process of joining the catalogues was conducted differently for non-duplicate records and for duplicate ones. Thanks to this method, a significant portion of the records in the catalogues of the joining libraries were identified as duplicates before the catalogues were merged. The algorithms proved helpful in assuring high quality of information.

Keywords:

duplicate record resolution, n-grams, text algorithms
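
The abstract does not specify the algorithms used, so the following is only a minimal sketch of how n-gram comparison can tolerate typing errors when matching records. It assumes character trigrams and the Dice coefficient as the similarity measure; the function names, the threshold of 0.8, and the sample titles are hypothetical and are not taken from the paper. In keeping with the abstract, the sketch only proposes candidate pairs and leaves the decision to a person.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a normalised, padded string."""
    # Lower-case and collapse whitespace so trivial differences do not matter.
    normalised = " ".join(text.lower().split())
    # Pad with spaces so that word beginnings and endings form n-grams too.
    padded = " " * (n - 1) + normalised + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=3):
    """Dice coefficient of the n-gram sets of two strings (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = len(ga) + len(gb)
    if total == 0:
        return 1.0  # two empty n-gram sets are treated as identical
    return 2 * len(ga & gb) / total


def suggest_duplicates(candidates, reference, threshold=0.8):
    """Pair each candidate title with its best reference match above threshold.

    The result is a list of suggestions for a librarian to confirm or reject,
    not a final decision. The threshold value is an assumption.
    """
    suggestions = []
    for cand in candidates:
        best = max(reference, key=lambda ref: dice_similarity(cand, ref),
                   default=None)
        if best is None:
            continue
        score = dice_similarity(cand, best)
        if score >= threshold:
            suggestions.append((cand, best, score))
    return suggestions


if __name__ == "__main__":
    # Hypothetical titles: one joining-library record contains a typo.
    main_catalogue = ["Introduction to algorithms", "Linear algebra"]
    joining_catalogue = ["Intoduction to algorithms", "Quantum mechanics"]
    for cand, match, score in suggest_duplicates(joining_catalogue,
                                                 main_catalogue):
        print(f"{cand!r} may duplicate {match!r} (similarity {score:.2f})")
```

A single missing letter changes only the few trigrams that overlap it, so the misspelt title still scores well above the threshold, while an unrelated title shares almost no trigrams with any reference record. A real catalogue comparison would likely combine several fields (author, title, year, publisher) rather than titles alone.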

Details

Issue: Vol. 7 No. 2 (2003)
Section: Research article
Published: 2003-06-30
Licence: This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors

GRZEGORZ PŁOSZAJSKI

Warsaw University of Technology, Main Library, Faculty of Electronics and Information Technology, Politechniki 1, 00-661 Warsaw, Poland
