This year's symposium will consist of talks by David R. Cheriton and by Faculty Fellowship recipients Ihab Ilyas and M. Tamer Özsu.
Posters by Cheriton Graduate Student Scholarship recipients will be on display in the Great Hall, Davis Centre from 10:00 am to 3:00 pm.
Schedule of the Day (Tentative)
Time | Description |
---|---|
10:00am - 3:00pm | DC Great Hall - Poster Session |
10:30am | DC 1302 - Mark Giesbrecht - Welcome & Opening Remarks |
10:45am - 11:30am | DC 1302 - David Cheriton - HICAMP Bitmap: Space-efficient updatable bitmap index for in-memory databases |
12:00pm - 1:00pm | DC 1301 - Lunch |
3:00pm - 3:45pm | DC 1302 - Ihab Ilyas - Data Cleaning from Theory to Practice. Abstract: With decades of research on the various aspects of data cleaning, multiple technical challenges have been tackled and interesting results have been published in many research papers. Example quality problems include missing values, functional dependency violations and duplicate records. Unfortunately, very little success can be claimed in adopting any of these results in practice. Businesses and enterprises are building silos of home-grown data curation solutions under various names, often referred to as ETL layers in the business intelligence stack. The impedance mismatch between the challenges faced in industry and the challenges tackled in research papers explains to a large extent the growing gap between the two worlds. In this talk I claim that being pragmatic in developing data cleaning solutions does not necessarily mean being unprincipled or ad hoc. I discuss a subset of these practical challenges, including data ownership, human involvement, and holistic data quality concerns. This new set of challenges often hinders current research proposals from being adopted in the real world. I also go through a quick overview of the approach we use at Tamr (a data curation startup) to tackle these challenges. |
3:45pm - 4:30pm | DC 1302 - M. Tamer Özsu - Web Data Management in the RDF Age. Abstract: Web data management has been a topic of interest for many years, during which a number of different modelling approaches have been tried. The latest of these approaches is to use RDF (Resource Description Framework), which seems to provide a real opportunity for querying at least some of the web data systematically. RDF has been proposed by the World Wide Web Consortium (W3C) for modelling Web objects as part of developing the "semantic web". W3C has also proposed SPARQL as the query language for accessing RDF data repositories. The publication of Linked Open Data (LOD) on the Web has gained tremendous momentum over the last number of years, and this provides a new opportunity to accomplish web data integration. A number of approaches have been proposed for running SPARQL queries over RDF-encoded Web data: data warehousing, SPARQL federation, and live linked query execution. In this talk, I will review these approaches with particular emphasis on some of our research within the context of the gStore project (joint project with Prof. Lei Zou of Peking University and Prof. Lei Chen of Hong Kong University of Science and Technology), the chameleon-db project (joint work with Günes Aluç, Dr. Olaf Hartig, and Prof. Khuzaima Daudjee of University of Waterloo), and live linked query execution (joint work with Dr. Olaf Hartig). |
Previous Symposiums