Data Rescue: No Pain, No Gain - 'Rescuing' Historical Data USSR Meteorological Data

Citation:

Sookoo, N. N., Persaud, B. D., Bocaniov, S., Szigeti, K., Van Wychen, W., & Van Cappellen, P. (2019). Data Rescue: No Pain, No Gain - 'Rescuing' Historical Data USSR Meteorological Data. AGU Fall Meeting, San Francisco. Retrieved from https://agu2019fallmeeting-agu.ipostersessions.com/default.aspx?s=EC-06-59-A6-E7-07-03-BA-4B-FB-BC-92-63-E5-0E-5B

Abstract:

Recent years have seen an acceleration of climate-change-related impacts across the world, ranging from amplified melting of Greenland’s ice sheet to freezing Arctic temperatures in US cities. The key to unraveling long-term climate change trends is the availability of reliable, high-resolution meteorological time series. However, before electronic data storage and distribution via the internet, historical climate records were often sparsely distributed and difficult for researchers to access. Here, we report on the serendipitous “discovery” of 20th-century paper-based meteorological records (1950s to 1990s) from the former Union of Soviet Socialist Republics (USSR). In 2017, the Canadian Cryospheric Information Network/Polar Data Catalogue (CCIN/PDC) acquired the records. To prevent deterioration of the books, CCIN/PDC, the Global Water Futures (GWF) Program and the Davis Centre Library joined forces to sort, document and digitize the data and make them publicly available to researchers. The process began in May 2019 with sorting the various files and developing a metadata spreadsheet listing the titles of the data series, the date and location of the observations, the type of data presentation, and the climate variables. We established that the collection contained fifty different series and five major series, amounting to 2172 booklets with data displayed as maps and tables. We initially decided to process the maps, but this proved troublesome: in particular, the scanner for the large maps malfunctioned daily, and the digitizing had to be done manually. We therefore shifted focus to processing tabular solar radiation data. Several Optical Character Recognition (OCR) software packages were sourced and tested for their ability to recognize characters correctly and to export the data reliably. Based on our efforts so far, we designed workflows for processing the maps and tables.
After data extraction is complete, the data and their metadata will be uploaded to and stored on the CCIN/PDC data portal. Despite the difficulties faced during the data rescue project, there have been successes, and lessons have been learned. Most importantly, the data are yielding new information from an era when global environmental change was becoming recognized as a major challenge facing science and society.

Last updated on 07/02/2020