2016 technical reports

CS-2016-01
Title	Providing Serializability for Pregel-like Graph Processing Systems
Authors	Minyang Han and Khuzaima Daudjee
Date	February 1, 2016
Report	Providing Serializability for Pregel-like Graph Processing Systems (PDF)

CS-2016-02
Title	Distributed Data Deduplication
Authors	Xu Chu, Ihab Ilyas and Paraschos Koutris
Abstract	Data deduplication refers to the process of identifying tuples in a relation that refer to the same real-world entity. The complexity of the problem is inherently quadratic with respect to the number of tuples, since a similarity value must be computed for every pair of tuples. In order to avoid comparing tuple pairs that are obviously non-duplicates, matching algorithms use blocking techniques that divide the tuples into blocks and compare only tuples within the same block. However, even with the use of blocking, data deduplication remains a costly problem for large datasets. In this paper, we show how to further speed up data deduplication by leveraging parallelism in a shared-nothing computing environment. Our main contribution is a distribution strategy, called Dis-Dedup, that minimizes the maximum workload across all worker nodes and provides strong theoretical guarantees. We demonstrate the effectiveness of our proposed strategy by performing extensive experiments on both synthetic datasets with varying block size distributions and real-world datasets.
Date	February 1, 2016
Report	Distributed Data Deduplication (PDF)

CS-2016-03
Title	On Referring Expressions in Information Systems derived from Conceptual Models
Authors	Alexander Borgida, David Toman and Grant Weddell
Abstract	We apply recent work on referring expression types to the issue of identification in conceptual modelling. In particular, we consider how such types yield a separation of concerns in a setting where an information system based on a conceptual schema is to be mapped to a relational schema plus SQL queries. We start from a simple object-centered representation (as in semantic data models), where naming is not an issue because everything is self-identified (possibly using surrogates). We then allow the analyst to attach to every class a preferred "referring expression type", and to specify uniqueness constraints in the form of generalized functional dependencies. We show (1) how a number of well-formedness conditions concerning an assignment of referring expressions can be efficiently diagnosed, and (2) how a concrete relational schema and SQL queries over this schema are derived from a combination of the conceptual schema and queries over it, once identification issues have been separately resolved as above.
Date	April 28, 2016
Report	On Referring Expressions in Information Systems derived from Conceptual Models (PDF)

CS-2016-04
Title	Feature-Oriented Modelling in BIP: A Case Study
Authors	Cecylia Bocovich and Joanne Atlee
Abstract	In this paper, we investigate the usage of Behaviour-Interaction-Priority version 2 (BIP2), a component-based modelling framework, for specifying feature-oriented systems. We evaluate BIP2 in the context of the Feature Interaction Problem and quantify the amount of work needed to add features to an existing system (i.e., in terms of rework to existing features, and work to identify and specify interactions). We present the results of a case study on a telephony system with five optional features where we found that the amount of work depends heavily on how features are interconnected. We identify a number of different strategies for interconnecting features, and propose one that reduces the amount of work and rework needed to add new features to an existing system.
Date	September 20, 2016
Report	Feature-Oriented Modelling in BIP: A Case Study (PDF)

CS-2016-05
Title	Improving Time-of-Use Electricity Pricing in Ontario
Authors	Adedamola Adepetu and Srinivasan Keshav
Abstract	Time-of-Use (ToU) electricity pricing is an electricity pricing scheme where consumers are charged at a rate that is dependent on the time of electricity consumption. This pricing scheme is often implemented to match the cost of generating and supplying electricity, and to make consumers defer appliance usage; this would reduce the daily electricity consumption peak, which can reduce both the cost of generation and the carbon footprint. We first critique the current ToU scheme in Ontario and make recommendations to improve it. Subsequently, we create an Agent-Based Model (ABM) to study ToU pricing and its effectiveness in reducing peak loads, which allows us to evaluate the benefit of our recommendations. We find that while ToU is effective in incentivizing load deferral, improvements can be made in the Ontario ToU scheme. Keywords: demand response, agent-based model, electricity pricing
Date	September 20, 2016
Report	Improving Time-of-Use Electricity Pricing in Ontario (PDF)