The largest collection in UWSpace, the Library-hosted repository of Waterloo research, is the university’s ever-growing body of electronic theses and dissertations. Because these are added to the collection by their authors rather than by library staff, their metadata vary in quality, and the subject keyword vocabulary is uncontrolled.
In this presentation, Jordan Hale and Larisa Smyk will discuss their semi-automated approach to cleaning up the subject index, from the technical constraints posed by the DSpace repository platform to the special considerations of working within a STEM environment. They will also demonstrate how OpenRefine provides a starting point for tidying up this research collection. OpenRefine is a freely available application for data wrangling, that is, data cleanup and transformation into other formats.
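To give a sense of what semi-automated keyword cleanup can look like, the sketch below groups subject keywords that differ only in case, punctuation, accents, or word order. It is a simplified approximation of the "fingerprint" key-collision idea that OpenRefine offers for clustering, not the presenters' actual workflow; the function names `fingerprint` and `cluster_keywords` and the sample keywords are hypothetical and chosen for illustration only.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(keyword):
    """Return a normalized key for a keyword (simplified, illustrative)."""
    # Trim, lowercase, and decompose accented characters.
    text = unicodedata.normalize("NFKD", keyword.strip().lower())
    # Drop anything that is not plain ASCII (e.g. combining accents).
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, keeping word characters and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Sort and de-duplicate tokens so word order does not matter.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_keywords(keywords):
    """Group keywords sharing a fingerprint; return groups with 2+ variants."""
    groups = defaultdict(set)
    for kw in keywords:
        groups[fingerprint(kw)].add(kw)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


if __name__ == "__main__":
    # Hypothetical sample of author-supplied subject keywords.
    sample = [
        "Machine Learning",
        "machine learning",
        "learning, machine",
        "Graphene",
    ]
    for cluster in cluster_keywords(sample):
        print(cluster)
```

Each resulting cluster is only a candidate for merging. In a STEM vocabulary, terms that look alike may still be distinct, so a person would need to review every group before any change is applied to the repository.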