2003 technical reports

CS-2003-01
Title Processing Sliding Window Multi-Joins in Continuous Queries
Authors Lukasz Golab and M. Tamer Ozsu
Abstract We study sliding window multi-join processing in continuous queries over data streams. Several algorithms are reported for performing continuous, incremental joins, under assumption that all the sliding windows fit in main memory. The algorithms include multi way incremental nested loop joins (NLJs) and multi-way incremental hash joins. We also propose join ordering heuristics to minimize the processing cost per unit time. We test a possible implementation of these algorithms and show that as expected, has joins are faster than NLJs for performing equi-joins, and that the overall processing cost is influenced by the strategies used to remove expired tuples from the sliding windows
Date February 2003
Report Processing Sliding Window Multi-Joins in Continuous Queries (PDF)
CS-2003-04
Title A Roadmap to a Medeling Language for Multi-Agent Systems Engineering
Authors Viviane Silva, Carlos Lucena, Paulo Alencar and Donald Cowan
Abstract In this paper we present a conceptual framework called TAO that organizes the core concepts related to multi-agent systems and assumes that agents and objects co-exist as independent and unique abstractions. We then use the TAO framework to create the multi-agent system modelling language MAS-ML, which extends UML (Unified Modelling Language). MAS-ML is defined as a conservative extension of the UML metamodel, and includes agent-related notions that are part of the TAO conceptual framework while preserving all object-related concepts, which constitute the UML metamodel. Creating a modelling language for multi-agent systems is a complex task and we should consider following a process or roadmap similar to the one that led to the development of the UML and its various sub-models and supporting facilities.
Date February 2003
Report A Roadmap to a Medeling Language for Multi-Agent Systems Engineering (PDF)
CS-2003-06
Title Towards Identifying Frequent Items in Sliding Windows
Authors David DeHaan, Erik D. Demaine, Lukasz Golab, Alejandro Lopez-Ortiz and J. Ian Munro
Abstract Queries that return a list of frequently occurring items are popular in the analysis of data streams such as real-time Internet traffic logs. While several resultsexist for computing frequent item queries using limited memory in the infinite stream model, none have been extended to the limited-memory sliding window model, which considers only the last N items that have arrived at any given time and forbids the storage of the entire window in memory. We present several algorithms for identifying frequent items in sliding windows, both under arbitrary distributions and assuming that each window conforms to a multinomial distribution. The former is a straightforward extension of existing algorithms and is shown to work well when tested on real-life TCP traffic logs. Our algorithms for the multinomial distribution are shown to outperform classical inference based on random sampling from the sliding window, but lose their accuracy as predictors of item frequencies when the underlying distribution is not multinomial.
Date March 2003
Report Towards Identifying Frequent Items in Sliding Windows (PDF)
CS-2003-10
Title An Efficient Bounds Consistency Algorithm for the Global Cardinality Constraint
Authors Claude-Guy Quimper, Peter van Beek, Alejandro Lopez-Ortiz, Alexander Golynski and Sayyed Bashir Sadjad
Abstract Previous studies have demonstrated that designing special purpose constraint propagators can significantly improve the efficiency of a constraint programming approach. In this paper we present an efficient algorithm for bounds consistency propagation of the generalized cardinality constraint "gcc". Using a variety of benchmark and random problems, we show that our bounds consistency algorithm is competitive with and can dramatically outperform existing state-of-the-art commercial implementations of constraint propagators for the gcc. We also present a new algorithm for domain consistency propagation of the gcc which improves on the worst-case performance of the best previous algorithm for problems that occur often in applications.
Date February 2003
Report An Efficient Bounds Consistency Algorithm for the Global Cardinality Constraint (PS) An Efficient Bounds Consistency Algorithm for the Global Cardinality Constraint (PDF)
CS-2003-21
Title A Geometric B-Spline Over the Triangular Domain
Authors Christopher K. Ingram
Abstract For modelling curves, B-splines [3] are among the most versatile control schemes. However, scaling this technique to surface patches has proven to be a non-trivial endeavor. While a suitable scheme exists for rectangular patches in the form of tensor product B-splines, techniques involving the triangular domain are much less spectacular.
The current cutting edge in triangular B-splines is the DMS-spline [2]. While the resulting surfaces possess high degrees of continuity, the control scheme is awkward and the evaluation is computationally expensive. A more fundamental problem is the construction bears little resemblance to the construction used for the B-Spline. This deciency leads to the central idea of the thesis; what happens if the simple blending functions found at the heart of the B-Spline construction are used over higher dimension domains? In this thesis I develop a geometric generalization of B-Spline curves over the triangular domain. This construction mimics the control point blending that occurs with uniform B-Splines. The construction preserves the simple control scheme and evaluation of B-Splines, without the immense computational requirements of DMSsplines. The result is a new patch control scheme, the G-Patch, possessing C0 continuity between adjacent patches.
Date June 2003
Report A Geometric B-Spline Over the Triangular Domain (PDF)
CS-2003-22
Title Offset Surface Light Fields
Authors Jason Ang
Abstract For producing realistic images, reflection is an important visual effect. Reflections of the environment are important not only for highly reflective objects, such as mirrors, but also for more common objects such as brushed metals and glossy plastics. Generating these reflections accurately at real-time rates for interactive applications, however, is a difficult problem. Previous works in this area have made assumptions that sacrifice accuracy in order to preserve interactivity. I will present an algorithm that tries to handle reflection accurately in the general case for real-time rendering. The algorithm uses a database of prerendered environment maps to render both the original object itself and an additional bidirectional reflection distribution function (BRDF). The algorithm performs image-based rendering in reflection space in order to achieve accurate results. It also uses graphics processing unit (GPU) features to accelerate rendering.
Date August 2003
Report Offset Surface Light Fields (PDF)
CS-2003-24
Title Evaluation of DBMSs Using XBench Benchmark
Authors M. Tamer Ozsu and Benjamin B. Yao
Abstract XML support is being added to existing database management systems (DBMSs) and native XML systems are being developed both in industry and in academia. The individual performance characteristics of these approaches as well as the relative performance of various systems is an ongoing concern.

XBench is a family of XML benchmarks which recognizes that the XML data that DBMSs manage are quite varied and no one database schema and workload can properly capture this variety. Thus, the members of this benchmark family have been defined for capturing diverse application domains. In this report we briefly discuss the XBench XML benchmark and report on the relative performance of three commercial DBMSs: X-Hive, DB2 and SQL Server. We also describe the potential issues when storing XML documents into relational DBMSs.

Date August 2003
Report Evaluation of DBMSs Using XBench Benchmark (PDF)
CS-2003-25
Title Investigations in Tree Locking for Compiled Database Applications
Authors Heng YU and Grant Weddell
Abstract We report on initial experiments in tree locking schemes for compiled database applications. Such applications have a repository style of architecture in which a collection of software modules or subsystems operate on a common database in terms of a predefined set of transaction types, and are very often at the core of embedded systems. Since the tree locking protocol is deadlock free, it becomes possible to decouple recovery mechanisms from concurrency control, a property that we believe is critical to the successful deployment of database technology to this new application area. Our experiments show that the performance of tree locking can compete with two phase locking for cases such asmthe above in which a great deal can be known at the time of system generation about workloads.
Date September 2003
Report Investigations in Tree Locking for Compiled Database Applications (PDF)
CS-2003-26
Title A Study on Atmospheric Halo Visualization
Authors Sung Min Hong and Gladimir Baranoski
Abstract In this report, the atmospheric halo phenomena have been explained and current trends of the halo simulation have been described. Then A ray tracer based on the Monte Carlo method has been presented and demonstrated. The comparison of the simulation results with the photos taken for the same phenomena has shown that the ray tracer gives reasonable results. This ray tracer provides a very useful tool for simulating the halo phenomena and identifying atmospheric data like proportions of ice crystals and orientations. With some enhancements of the ray tracer, which are future works, the ray tracer is expected to give physically correct realistic halo images.
Date September 2003
Report A Study on Atmospheric Halo Visualization (PDF)
CS-2003-27
Title Supporting Set-at-a-time Extensions for XML through DOM
Authors Hai (Helena) Chen, Frank Tompa
Abstract With the rapid growth of the web and e-commerce, W3C produced a new standard, XML, as a universal data representation format to facilitate information interchange and integration from heterogeneous systems. In order to process XML, documents need to be parsed and then users can access, manipulate, and retrieve the XML data easily. DOM is one of the major application programming interfaces, which provides the abstract, logical tree structure of an XML document. Though it is a very promising initiative in defining the standard interface for XML documents to access all data required by applications, significant benefit can result if some functions are extended.

In this thesis, we want to support set-at-a-time extensions for XML through DOM. Through extending some powerful functions to get summary information from different groups of related document elements, and filter, extract and transform this information as a sequence of nodes, the extended DOM can reduce the communications overhead and response time between the client and the server and therefore provide applications
with more convenience. Our work is to explore the ideas for set-at-a-time processing, define and implement some suitable methods, and code some query application examples query using the DOM and our extensions to make comparisons through a test system. Finally, future work will be given.

Date September 2003
Report Supporting Set-at-a-time Extensions for XML through DOM (PDF)
CS-2003-28
Title On the Relevance of Self-Similarity in Network Traffic Prediction
Authors Majid Ghaderi
Abstract Self-similarity is an important characteristic of traffic in high speed networks which can not be captured by traditional traffic models. Traffic predictors based on non-traditional long-memory models are computationally more complex than traditional predictors based on short-memory models. Even online estimation of their parameters for actual traffic traces is not trivial work. Based on the observation that the Hurst parameter of real traffic traces rarely exceeds 0.85, which means that real traffic does not exhibit strong long-range dependence, and the fact that infinite history is not possible in practice, we propose to use a simple non-model-based minimum mean square error predictor.

In this paper, we look at the problem of traffic prediction in the presence of self-similarity. We briefly describe a number of short-memory and long-memory stochastic traffic models and talk about non-model-based predictors, particularly minimum mean square error and its normalized version. Numerical results of our experimental comparison between the so-called fractional predictors and the simple minimum mean square error predictor show that this simple method can achieve accuracy within $5\%$ of the best fractional predictor while it is much simpler than any model-based predictor and is easily used in an on-line fashion.

Date October 2003
Report On the Relevance of Self-Similarity in Network Traffic Prediction (PS) On the Relevance of Self-Similarity in Network Traffic Prediction (PDF) Compressed PostScript:
On the Relevance of Self-Similarity in Network Traffic Prediction (GZIP)
CS-2003-29
Title On Indexing Sliding Windows over On-line Data Streams
Authors Lukasz Golab,Shaveen Garg and M. Tamer Ozsu
Abstract We consider indexing sliding windows in main memory over on-line data streams. Our proposed data structures and query semantics are based on a division of the sliding window into sub-windows. When a new sub-window fills up with newly arrived tuples, the oldest sub-window is evicted, indices are refreshed, and continuous queries are re-evaluated to reflect the new state of the window. By classifying relational operators according to their method of execution in the windowed scenario, we show that many useful operators require access to the entire window, motivating the need for two types of indices: those which provide a list of attribute values and their counts for answering set-valued queries, and those which provide direct access to tuples for answering attribute-valued queries. For the former, we evaluate the performance of linked lists, search trees, and hash tables as indexing structures, showing that the high costs of maintaining such structures over rapidly changing data are offset by the savings in query processing costs. For the latter, we propose novel ways of maintaining windowed ring indices, which we show to be much faster than conventional ring indices and more efficient than executing windowed queries without an index.
Date September 2003
Report On Indexing Sliding Windows over On-line Data Streams (PDF)
CS-2003-30
Title Symbolic Representation and Retrieval of Moving Object Trajectories
Authors Lei Chen, Vincent Oria, and M. Tamer Ozsu
Abstract Similarity-based retrieval of moving object trajectory is useful to many applications, such as GPS system, sport and surveillance video analysis. However, due to sensor failures, errors in detection techniques, or different sampling rates, noises, local shifts and scales may appear in the trajectory records. Hence, it is difficult to design a robust and fast similarity measure for similarity-based retrieval in a large database. In this paper, we propose a normalized edit distance (NED) to measure the similarity between two trajectories. Compared to Euclidean distance, Dynamic Time Warping (DTW), and Longest Common Subsequences (LCSS), NED is more robust and accurate for trajectories contain noise and local time shifting. In order to improve the retrieval efficiency, we further convert the trajectories into a symbolic representation, called movement pattern strings, which encode both the movement direction and the movement distance information of the trajectories. The distances that are computed in a symbolic space are lower bounds of the distances of original trajectory data, which guarantees that no false dismissals will be introduced using movement pattern strings to retrieve trajectories. Furthermore, we define a modified frequency distance for frequency vectors that are obtained from movement pattern strings to reduce the dimensionality of movement pattern strings and computation cost of NED. The tests that we conducted indicate that the normalized edit distance is effective and the matching techniques are efficient.
Date September 2003
Report Symbolic Representation and Retrieval of Moving Object Trajectories (PDF)
CS-2003-33
Title Generalized Infinite Undo and Speculative User Interfaces
Authors Alexjandro Lopez-Ortiz
Abstract We study the potential benefits of a computations model supporting an unlimited number of undo-like operations. We argue that, when implemented to its full generality this result s in a novel mode of human-computer interaction in which the user session is not linear in time. The model proposed is also more resilient in the presence of user or computer error. This resiliency can be exploited by a speculative user interface (SUI) which guesses user intentions. If the prediction is incorrect, the generalized infinite undo facility provides the ability to extemporaneously roll back the misprediction. We observe that this model is only now becoming a realistic possibility due to the current surplus of available CPU cycles on a modern desktop.
Date September 2003
Report Generalized Infinite Undo and Speculative User Interfaces (PS) Generalized Infinite Undo and Speculative User Interfaces (PDF)
CS-2003-34
Title Exploiting Statistics of Web Traces to Improve Caching Algorithms
Authors Alexander Golynski, Alejandro LĀ“opez-Ortiz and Ray Sweidan
Abstract Web caching plays an important role in reducing network traffic, user delays, and server load. One important factor in describingthe stream of requests made to a given server is the popularity oflinks between accessed pages. For example, the probability of requesting page A increases if a user makes a request to a page which contains a link to page A. Furthermore, this probability depends on the amount of time elapsed since the request was made to the page containing the link and on popularity of the link. In this project we (1) analyze web access logs and determine the frequency distribution of access and mean life expectancy of correlations, (2) propose two new cache replacement policies that use the above distribution, and (3) evaluate the effectiveness of the new policies by comparing them to widely-used algorithms such as Greedy Dual-Size (GDS) and Greedy Dual-Frequency (GDSF).
Date October 2003
Report Exploiting Statistics of Web Traces to Improve Caching Algorithms (PS) Exploiting Statistics of Web Traces to Improve Caching Algorithms (PDF) Compressed PostScript:
Exploiting Statistics of Web Traces to Improve Caching Algorithms (GZIP)
CS-2003-36
Title Using Word Position in Documents for Topic Characterization
Authors Authors: Reem K. Al-Halimi, Frank W. Tompa
Abstract In this report we show how to use the position of words in a text for characterizing the topic of the text, and compare our method to measures that use frequency statistics that are independent of word position. We show that word position information produces words that are more suited for characterizing topics and at the same time relies on a vocabulary size that is as little as 10\%$ of the size used by the other measures.
Date October 2003
Report Using Word Position in Documents for Topic Characterization (PDF)
CS-2003-39
Title Enforcing Domain Consistency on the extended Global Cardinality Constraint is NP- Hard
Authors Claude-Guy Quimper
Abstract In a global cardinality constraint, there is a set of variables X = {x_1, ... x_n} and a set of values D. Each variable x_i is associated to a domain dom(x_i) contained in D and each value v in D is associated to a cardinality set K(v). An assignment satisfies the extended global cardinality constraint (extended-GCC) if each variable x_i is instantiated to a value in its domain dom(x_i) and if each value v in D is assigned to k variables for some k in K(v). Extended-GCC differs from normal GCC in the fact that its sets of cardinality K(v) can be any set of values. In normal GCC these cardinality sets are restricted to intervals. We prove that enforcing domain consistency on the extended-GCC is NP-hard.
Date November 2003
Report Enforcing Domain Consistency on the extended Global Cardinality Constraint is NP- Hard (PS) Enforcing Domain Consistency on the extended Global Cardinality Constraint is NP- Hard (PDF)
CS-2003-41
Title Call Admission Control for Voice/Data Integration in Broadband Wireless Networks
Authors Majid Ghaderi and Raouf Boutaba
Abstract This paper addresses bandwidth allocation for an integrated voice/data broadband mobile wireless network. Specifically, we propose a new admission control scheme called EFGC, which is an extension of the well-known fractional guard channel scheme proposed for cellular networks supporting voice traffic. The main idea is to use two acceptance ratios, one for voice calls and the other for data calls in order to maintain the proportional service quality for voice and data traffic while guaranteeing a target handoff failure probability for voice calls. We describe two variations of the proposed scheme: EFGC-REST, a conservative approach which aims at preserving the proportional service quality by sacrificing the bandwidth utilization; and EFGC-UTIL, a greedy approach which achieves higher bandwidth utilization at the expense of increasing the handoff failure probability for voice calls. Extensive simulation results show that our schemes satisfy the hard constraints on handoff failure probability and service differentiation while maintaining a high bandwidth utilization.
Date November 2003
Report Call Admission Control for Voice/Data Integration in Broadband Wireless Networks (PS) Call Admission Control for Voice/Data Integration in Broadband Wireless Networks (PDF) Compressed PostScript:
Call Admission Control for Voice/Data Integration in Broadband Wireless Networks (GZIP)
CS-2003-45
Title Revisiting the Foundations of Subsurface Scattering
Authors Gladimir V. G. Baranoski, Aravind Krishnaswamy and Bradley Kimmel
Abstract Despite the significant advances in rendering, we are far from being able to automatically generate realistic and predictable images of organic materials such as plant and human tissues. Creating convincing pictures of these materials is usually accomplished by carefully adjusting rendering parameters. A key issue in this context is the simulation of subsurface scattering. Current algorithmic models usually rely on scattering approximations based on the use of phase functions, notably the Henyey-Greenstein phase function and its variations, which were not derived from biophysical principles of any organic material, and whose parameters have no biological meaning. In this report, we challenge the validity of this approach for organic materials. Initially, we present an original chronology of the use of these phase functions in tissue optics simulations, which highlights the pitfalls of this approach. We then demonstrate that a significant step toward predictive subsurface scattering simulations can be given by replacing it by a more efficient and accurate data oriented approach. Our investigation is supported by comparisons involving the original measured data that motivated the application of phase functions in tissue subsurface scattering simulations. We hope that the results of our investigation will help strengthen the biophysical basis required for the predictive rendering of organic materials.
Date December 2003
Report Revisiting the Foundations of Subsurface Scattering (PDF)