2003 technical reports

CS-2003-01
Title	Processing Sliding Window Multi-Joins in Continuous Queries
Authors	Lukasz Golab and M. Tamer Ozsu
Abstract	We study sliding window multi-join processing in continuous queries over data streams. Several algorithms are reported for performing continuous, incremental joins, under assumption that all the sliding windows fit in main memory. The algorithms include multi way incremental nested loop joins (NLJs) and multi-way incremental hash joins. We also propose join ordering heuristics to minimize the processing cost per unit time. We test a possible implementation of these algorithms and show that as expected, has joins are faster than NLJs for performing equi-joins, and that the overall processing cost is influenced by the strategies used to remove expired tuples from the sliding windows
Date	February 2003
Report	Processing Sliding Window Multi-Joins in Continuous Queries (PDF)

CS-2003-04
Title	A Roadmap to a Medeling Language for Multi-Agent Systems Engineering
Authors	Viviane Silva, Carlos Lucena, Paulo Alencar and Donald Cowan
Abstract	In this paper we present a conceptual framework called TAO that organizes the core concepts related to multi-agent systems and assumes that agents and objects co-exist as independent and unique abstractions. We then use the TAO framework to create the multi-agent system modelling language MAS-ML, which extends UML (Unified Modelling Language). MAS-ML is defined as a conservative extension of the UML metamodel, and includes agent-related notions that are part of the TAO conceptual framework while preserving all object-related concepts, which constitute the UML metamodel. Creating a modelling language for multi-agent systems is a complex task and we should consider following a process or roadmap similar to the one that led to the development of the UML and its various sub-models and supporting facilities.
Date	February 2003
Report	A Roadmap to a Medeling Language for Multi-Agent Systems Engineering (PDF)

CS-2003-06
Title	Towards Identifying Frequent Items in Sliding Windows
Authors	David DeHaan, Erik D. Demaine, Lukasz Golab, Alejandro Lopez-Ortiz and J. Ian Munro
Abstract	Queries that return a list of frequently occurring items are popular in the analysis of data streams such as real-time Internet traffic logs. While several resultsexist for computing frequent item queries using limited memory in the infinite stream model, none have been extended to the limited-memory sliding window model, which considers only the last N items that have arrived at any given time and forbids the storage of the entire window in memory. We present several algorithms for identifying frequent items in sliding windows, both under arbitrary distributions and assuming that each window conforms to a multinomial distribution. The former is a straightforward extension of existing algorithms and is shown to work well when tested on real-life TCP traffic logs. Our algorithms for the multinomial distribution are shown to outperform classical inference based on random sampling from the sliding window, but lose their accuracy as predictors of item frequencies when the underlying distribution is not multinomial.
Date	March 2003
Report	Towards Identifying Frequent Items in Sliding Windows (PDF)

CS-2003-10
Title	An Efficient Bounds Consistency Algorithm for the Global Cardinality Constraint
Authors	Claude-Guy Quimper, Peter van Beek, Alejandro Lopez-Ortiz, Alexander Golynski and Sayyed Bashir Sadjad
Abstract	Previous studies have demonstrated that designing special purpose constraint propagators can significantly improve the efficiency of a constraint programming approach. In this paper we present an efficient algorithm for bounds consistency propagation of the generalized cardinality constraint "gcc". Using a variety of benchmark and random problems, we show that our bounds consistency algorithm is competitive with and can dramatically outperform existing state-of-the-art commercial implementations of constraint propagators for the gcc. We also present a new algorithm for domain consistency propagation of the gcc which improves on the worst-case performance of the best previous algorithm for problems that occur often in applications.
Date	February 2003
Report	An Efficient Bounds Consistency Algorithm for the Global Cardinality Constraint (PS)		An Efficient Bounds Consistency Algorithm for the Global Cardinality Constraint (PDF)

CS-2003-21
Title	A Geometric B-Spline Over the Triangular Domain
Authors	Christopher K. Ingram
Abstract	For modelling curves, B-splines [3] are among the most versatile control schemes. However, scaling this technique to surface patches has proven to be a non-trivial endeavor. While a suitable scheme exists for rectangular patches in the form of tensor product B-splines, techniques involving the triangular domain are much less spectacular. The current cutting edge in triangular B-splines is the DMS-spline [2]. While the resulting surfaces possess high degrees of continuity, the control scheme is awkward and the evaluation is computationally expensive. A more fundamental problem is the construction bears little resemblance to the construction used for the B-Spline. This deciency leads to the central idea of the thesis; what happens if the simple blending functions found at the heart of the B-Spline construction are used over higher dimension domains? In this thesis I develop a geometric generalization of B-Spline curves over the triangular domain. This construction mimics the control point blending that occurs with uniform B-Splines. The construction preserves the simple control scheme and evaluation of B-Splines, without the immense computational requirements of DMSsplines. The result is a new patch control scheme, the G-Patch, possessing C0 continuity between adjacent patches.
Date	June 2003
Report	A Geometric B-Spline Over the Triangular Domain (PDF)

CS-2003-22
Title	Offset Surface Light Fields
Authors	Jason Ang
Abstract	For producing realistic images, reflection is an important visual effect. Reflections of the environment are important not only for highly reflective objects, such as mirrors, but also for more common objects such as brushed metals and glossy plastics. Generating these reflections accurately at real-time rates for interactive applications, however, is a difficult problem. Previous works in this area have made assumptions that sacrifice accuracy in order to preserve interactivity. I will present an algorithm that tries to handle reflection accurately in the general case for real-time rendering. The algorithm uses a database of prerendered environment maps to render both the original object itself and an additional bidirectional reflection distribution function (BRDF). The algorithm performs image-based rendering in reflection space in order to achieve accurate results. It also uses graphics processing unit (GPU) features to accelerate rendering.
Date	August 2003
Report	Offset Surface Light Fields (PDF)

CS-2003-24
Title	Evaluation of DBMSs Using XBench Benchmark
Authors	M. Tamer Ozsu and Benjamin B. Yao
Abstract	XML support is being added to existing database management systems (DBMSs) and native XML systems are being developed both in industry and in academia. The individual performance characteristics of these approaches as well as the relative performance of various systems is an ongoing concern. XBench is a family of XML benchmarks which recognizes that the XML data that DBMSs manage are quite varied and no one database schema and workload can properly capture this variety. Thus, the members of this benchmark family have been defined for capturing diverse application domains. In this report we briefly discuss the XBench XML benchmark and report on the relative performance of three commercial DBMSs: X-Hive, DB2 and SQL Server. We also describe the potential issues when storing XML documents into relational DBMSs.
Date	August 2003
Report	Evaluation of DBMSs Using XBench Benchmark (PDF)

CS-2003-25
Title	Investigations in Tree Locking for Compiled Database Applications
Authors	Heng YU and Grant Weddell
Abstract	We report on initial experiments in tree locking schemes for compiled database applications. Such applications have a repository style of architecture in which a collection of software modules or subsystems operate on a common database in terms of a predefined set of transaction types, and are very often at the core of embedded systems. Since the tree locking protocol is deadlock free, it becomes possible to decouple recovery mechanisms from concurrency control, a property that we believe is critical to the successful deployment of database technology to this new application area. Our experiments show that the performance of tree locking can compete with two phase locking for cases such asmthe above in which a great deal can be known at the time of system generation about workloads.
Date	September 2003
Report	Investigations in Tree Locking for Compiled Database Applications (PDF)

CS-2003-26
Title	A Study on Atmospheric Halo Visualization
Authors	Sung Min Hong and Gladimir Baranoski
Abstract	In this report, the atmospheric halo phenomena have been explained and current trends of the halo simulation have been described. Then A ray tracer based on the Monte Carlo method has been presented and demonstrated. The comparison of the simulation results with the photos taken for the same phenomena has shown that the ray tracer gives reasonable results. This ray tracer provides a very useful tool for simulating the halo phenomena and identifying atmospheric data like proportions of ice crystals and orientations. With some enhancements of the ray tracer, which are future works, the ray tracer is expected to give physically correct realistic halo images.
Date	September 2003
Report	A Study on Atmospheric Halo Visualization (PDF)

CS-2003-27
Title	Supporting Set-at-a-time Extensions for XML through DOM
Authors	Hai (Helena) Chen, Frank Tompa
Abstract	With the rapid growth of the web and e-commerce, W3C produced a new standard, XML, as a universal data representation format to facilitate information interchange and integration from heterogeneous systems. In order to process XML, documents need to be parsed and then users can access, manipulate, and retrieve the XML data easily. DOM is one of the major application programming interfaces, which provides the abstract, logical tree structure of an XML document. Though it is a very promising initiative in defining the standard interface for XML documents to access all data required by applications, significant benefit can result if some functions are extended. In this thesis, we want to support set-at-a-time extensions for XML through DOM. Through extending some powerful functions to get summary information from different groups of related document elements, and filter, extract and transform this information as a sequence of nodes, the extended DOM can reduce the communications overhead and response time between the client and the server and therefore provide applications with more convenience. Our work is to explore the ideas for set-at-a-time processing, define and implement some suitable methods, and code some query application examples query using the DOM and our extensions to make comparisons through a test system. Finally, future work will be given.
Date	September 2003
Report	Supporting Set-at-a-time Extensions for XML through DOM (PDF)

CS-2003-28
Title	On the Relevance of Self-Similarity in Network Traffic Prediction
Authors	Majid Ghaderi
Abstract	Self-similarity is an important characteristic of traffic in high speed networks which can not be captured by traditional traffic models. Traffic predictors based on non-traditional long-memory models are computationally more complex than traditional predictors based on short-memory models. Even online estimation of their parameters for actual traffic traces is not trivial work. Based on the observation that the Hurst parameter of real traffic traces rarely exceeds 0.85, which means that real traffic does not exhibit strong long-range dependence, and the fact that infinite history is not possible in practice, we propose to use a simple non-model-based minimum mean square error predictor. In this paper, we look at the problem of traffic prediction in the presence of self-similarity. We briefly describe a number of short-memory and long-memory stochastic traffic models and talk about non-model-based predictors, particularly minimum mean square error and its normalized version. Numerical results of our experimental comparison between the so-called fractional predictors and the simple minimum mean square error predictor show that this simple method can achieve accuracy within $5\%$ of the best fractional predictor while it is much simpler than any model-based predictor and is easily used in an on-line fashion.
Date	October 2003
Report	On the Relevance of Self-Similarity in Network Traffic Prediction (PS)		On the Relevance of Self-Similarity in Network Traffic Prediction (PDF)	Compressed PostScript: On the Relevance of Self-Similarity in Network Traffic Prediction (GZIP)

CS-2003-29
Title	On Indexing Sliding Windows over On-line Data Streams
Authors	Lukasz Golab,Shaveen Garg and M. Tamer Ozsu
Abstract	We consider indexing sliding windows in main memory over on-line data streams. Our proposed data structures and query semantics are based on a division of the sliding window into sub-windows. When a new sub-window fills up with newly arrived tuples, the oldest sub-window is evicted, indices are refreshed, and continuous queries are re-evaluated to reflect the new state of the window. By classifying relational operators according to their method of execution in the windowed scenario, we show that many useful operators require access to the entire window, motivating the need for two types of indices: those which provide a list of attribute values and their counts for answering set-valued queries, and those which provide direct access to tuples for answering attribute-valued queries. For the former, we evaluate the performance of linked lists, search trees, and hash tables as indexing structures, showing that the high costs of maintaining such structures over rapidly changing data are offset by the savings in query processing costs. For the latter, we propose novel ways of maintaining windowed ring indices, which we show to be much faster than conventional ring indices and more efficient than executing windowed queries without an index.
Date	September 2003
Report	On Indexing Sliding Windows over On-line Data Streams (PDF)

CS-2003-30
Title	Symbolic Representation and Retrieval of Moving Object Trajectories
Authors	Lei Chen, Vincent Oria, and M. Tamer Ozsu
Abstract	Similarity-based retrieval of moving object trajectory is useful to many applications, such as GPS system, sport and surveillance video analysis. However, due to sensor failures, errors in detection techniques, or different sampling rates, noises, local shifts and scales may appear in the trajectory records. Hence, it is difficult to design a robust and fast similarity measure for similarity-based retrieval in a large database. In this paper, we propose a normalized edit distance (NED) to measure the similarity between two trajectories. Compared to Euclidean distance, Dynamic Time Warping (DTW), and Longest Common Subsequences (LCSS), NED is more robust and accurate for trajectories contain noise and local time shifting. In order to improve the retrieval efficiency, we further convert the trajectories into a symbolic representation, called movement pattern strings, which encode both the movement direction and the movement distance information of the trajectories. The distances that are computed in a symbolic space are lower bounds of the distances of original trajectory data, which guarantees that no false dismissals will be introduced using movement pattern strings to retrieve trajectories. Furthermore, we define a modified frequency distance for frequency vectors that are obtained from movement pattern strings to reduce the dimensionality of movement pattern strings and computation cost of NED. The tests that we conducted indicate that the normalized edit distance is effective and the matching techniques are efficient.
Date	September 2003
Report	Symbolic Representation and Retrieval of Moving Object Trajectories (PDF)

CS-2003-33
Title	Generalized Infinite Undo and Speculative User Interfaces
Authors	Alexjandro Lopez-Ortiz
Abstract	We study the potential benefits of a computations model supporting an unlimited number of undo-like operations. We argue that, when implemented to its full generality this result s in a novel mode of human-computer interaction in which the user session is not linear in time. The model proposed is also more resilient in the presence of user or computer error. This resiliency can be exploited by a speculative user interface (SUI) which guesses user intentions. If the prediction is incorrect, the generalized infinite undo facility provides the ability to extemporaneously roll back the misprediction. We observe that this model is only now becoming a realistic possibility due to the current surplus of available CPU cycles on a modern desktop.
Date	September 2003
Report	Generalized Infinite Undo and Speculative User Interfaces (PS)		Generalized Infinite Undo and Speculative User Interfaces (PDF)

CS-2003-34
Title	Exploiting Statistics of Web Traces to Improve Caching Algorithms
Authors	Alexander Golynski, Alejandro L´opez-Ortiz and Ray Sweidan
Abstract	Web caching plays an important role in reducing network traffic, user delays, and server load. One important factor in describingthe stream of requests made to a given server is the popularity oflinks between accessed pages. For example, the probability of requesting page A increases if a user makes a request to a page which contains a link to page A. Furthermore, this probability depends on the amount of time elapsed since the request was made to the page containing the link and on popularity of the link. In this project we (1) analyze web access logs and determine the frequency distribution of access and mean life expectancy of correlations, (2) propose two new cache replacement policies that use the above distribution, and (3) evaluate the effectiveness of the new policies by comparing them to widely-used algorithms such as Greedy Dual-Size (GDS) and Greedy Dual-Frequency (GDSF).
Date	October 2003
Report	Exploiting Statistics of Web Traces to Improve Caching Algorithms (PS)		Exploiting Statistics of Web Traces to Improve Caching Algorithms (PDF)	Compressed PostScript: Exploiting Statistics of Web Traces to Improve Caching Algorithms (GZIP)

CS-2003-36
Title	Using Word Position in Documents for Topic Characterization
Authors	Authors: Reem K. Al-Halimi, Frank W. Tompa
Abstract	In this report we show how to use the position of words in a text for characterizing the topic of the text, and compare our method to measures that use frequency statistics that are independent of word position. We show that word position information produces words that are more suited for characterizing topics and at the same time relies on a vocabulary size that is as little as 10\%$ of the size used by the other measures.
Date	October 2003
Report	Using Word Position in Documents for Topic Characterization (PDF)

CS-2003-39
Title	Enforcing Domain Consistency on the extended Global Cardinality Constraint is NP- Hard
Authors	Claude-Guy Quimper
Abstract	In a global cardinality constraint, there is a set of variables X = {x_1, ... x_n} and a set of values D. Each variable x_i is associated to a domain dom(x_i) contained in D and each value v in D is associated to a cardinality set K(v). An assignment satisfies the extended global cardinality constraint (extended-GCC) if each variable x_i is instantiated to a value in its domain dom(x_i) and if each value v in D is assigned to k variables for some k in K(v). Extended-GCC differs from normal GCC in the fact that its sets of cardinality K(v) can be any set of values. In normal GCC these cardinality sets are restricted to intervals. We prove that enforcing domain consistency on the extended-GCC is NP-hard.
Date	November 2003
Report	Enforcing Domain Consistency on the extended Global Cardinality Constraint is NP- Hard (PS)		Enforcing Domain Consistency on the extended Global Cardinality Constraint is NP- Hard (PDF)

CS-2003-41
Title	Call Admission Control for Voice/Data Integration in Broadband Wireless Networks
Authors	Majid Ghaderi and Raouf Boutaba
Abstract	This paper addresses bandwidth allocation for an integrated voice/data broadband mobile wireless network. Specifically, we propose a new admission control scheme called EFGC, which is an extension of the well-known fractional guard channel scheme proposed for cellular networks supporting voice traffic. The main idea is to use two acceptance ratios, one for voice calls and the other for data calls in order to maintain the proportional service quality for voice and data traffic while guaranteeing a target handoff failure probability for voice calls. We describe two variations of the proposed scheme: EFGC-REST, a conservative approach which aims at preserving the proportional service quality by sacrificing the bandwidth utilization; and EFGC-UTIL, a greedy approach which achieves higher bandwidth utilization at the expense of increasing the handoff failure probability for voice calls. Extensive simulation results show that our schemes satisfy the hard constraints on handoff failure probability and service differentiation while maintaining a high bandwidth utilization.
Date	November 2003
Report	Call Admission Control for Voice/Data Integration in Broadband Wireless Networks (PS)		Call Admission Control for Voice/Data Integration in Broadband Wireless Networks (PDF)	Compressed PostScript: Call Admission Control for Voice/Data Integration in Broadband Wireless Networks (GZIP)

CS-2003-45
Title	Revisiting the Foundations of Subsurface Scattering
Authors	Gladimir V. G. Baranoski, Aravind Krishnaswamy and Bradley Kimmel
Abstract	Despite the significant advances in rendering, we are far from being able to automatically generate realistic and predictable images of organic materials such as plant and human tissues. Creating convincing pictures of these materials is usually accomplished by carefully adjusting rendering parameters. A key issue in this context is the simulation of subsurface scattering. Current algorithmic models usually rely on scattering approximations based on the use of phase functions, notably the Henyey-Greenstein phase function and its variations, which were not derived from biophysical principles of any organic material, and whose parameters have no biological meaning. In this report, we challenge the validity of this approach for organic materials. Initially, we present an original chronology of the use of these phase functions in tissue optics simulations, which highlights the pitfalls of this approach. We then demonstrate that a significant step toward predictive subsurface scattering simulations can be given by replacing it by a more efficient and accurate data oriented approach. Our investigation is supported by comparisons involving the original measured data that motivated the application of phase functions in tissue subsurface scattering simulations. We hope that the results of our investigation will help strengthen the biophysical basis required for the predictive rendering of organic materials.
Date	December 2003
Report	Revisiting the Foundations of Subsurface Scattering (PDF)