Automatic Disambiguation of Author Names in Bibliographic Repositories (Synthesis Lectures on Information Concepts, Retrieval, and S) by Anderson A. Ferreira, Marcos André Gonçalves, Alberto H. F. Laender
Overview of the book Automatic Disambiguation of Author Names in Bibliographic Repositories (Synthesis Lectures on Information Concepts, Retrieval, and S)
This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories and that occurs when an author publishes works under distinct names or when distinct authors publish works under similar names. This problem may have a number of causes, including the lack of standards and common practices and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories, such as search, browsing, and recommendation, may be severely affected by author name ambiguity. The focus of the book is on automatic methods, since manual solutions do not scale to the size of current repositories or to the speed at which they are updated. Accordingly, we provide a broad view of the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that have received over 900 citations so far, according to Google Scholar. We start by discussing the issues that motivate the problem (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized overview of the literature on the topic (Chapter 3). We then organize, summarize, and integrate the efforts of our own group on developing solutions for the problem, which have historically produced state-of-the-art results (at the time they were proposed) in terms of disambiguation quality. Thus, Chapter 4 covers HHC (Heuristic-based Clustering), an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship.
Then, Chapter 5 describes SAND (Self-training Author Name Disambiguator), and Chapter 6 presents two incremental author name disambiguation methods, namely INDi (Incremental Unsupervised Name Disambiguation) and INC (Incremental Nearest Cluster). Finally, Chapter 7 provides an overview of recent author name disambiguation methods that take new approaches, such as graph-based representations, alternative predefined similarity functions, visualization facilities, and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names, used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation from the last decade, hoping to consolidate a solid basis for future developments in the field.