The Database Seminar Series provides a forum for presentation and discussion of interesting and current database issues. It complements our internal database meetings by bringing in external colleagues. The talks that are scheduled for 2001-2002 are below, and more will be listed as we get confirmations. Please send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC (Davis Centre) 1304. Coffee will be served 30 minutes before the talk.
We will try to post the presentation notes, whenever that is possible. Please click on the presentation title to access these notes (usually in pdf format).
Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
Anthony J. Bonner |
Qiang Zhu |
Sylvia Osborn |
Jose Blakeley |
Paul Cotton |
Pat Martin |
H.V. Jagadish |
Daniel Barbara |
Philip Bernstein |
Ouri Wolfson |
Derick Wood |
Raymond Ng |
24 September 2001, 11:00 AM
Title: | Large-Scale Physical Genome Mapping (PDF) |
Speaker: | Anthony J. Bonner, University of Toronto |
Abstract: |
A
major
step
in
understanding
the
genome
of
an
organism
is
constructing
a
physical
map,
that
is,
an
assignment
of
DNA
fragments
to
their
locations
on
the
genome.
Building
complete
maps
of
large
genomes
often
requires
integrating
many
kinds
of
experimental
data,
each
with
its
own
forms
of
noise,
experimental
error
and
anomalies,
and
with
subtle
relationships
between
the
different
forms
of
data.
Unfortunately,
the
quantity
and
complexity
of
the
data
greatly
complicate
the
map-assembly
process,
limiting
the
effectiveness
and
flexibility
of
many
map-assembly
algorithms.
This talk outlines the biology of physical genome mapping, the computational problems involved, and our approach to solving them. This approach is based on abstracting the experimental data as a graph. Intuitively, each node in the graph represents a point on the genome, and each edge represents evidence that two points are close together on a chromosome. Many forms of physical mapping data can be abstracted in this way. The result is that genome map assembly becomes largely a problem of graph manipulation. Graph algorithms and graph visualization play key roles in our approach to the map-assembly problem. |
Bio: | Dr. Bonner is an Associate Professor of Computer Science at the University of Toronto, where he has been a faculty member since 1991. He received his M.S. and Ph.D. in Computer Science from Rutgers University in 1990 and 1991, respectively, and his B.Sc. in Mathematics and Physics from the University of Toronto in 1977. He has held visiting positions at the University of Pennsylvania and at INRIA-Rocquencourt. From 1977 to 1983, he worked in signal processing and underwater acoustics at the Defence Research Establishment Atlantic, in Nova Scotia, Canada. He is the author of over 50 publications and presentations. |
15 October 2001, 11:00 AM
Title: | Local Cost Estimation for Global Query Optimization in a Multidatabase System (PDF) |
Speaker: | Qiang Zhu, The University of Michigan - Dearborn |
Abstract: | A multidatabase system (MDBS) integrates data from pre-existing autonomous local databases managed by heterogeneous database management systems, such as ORACLE, DB2 and ObjectStore, in a distributed environment. A user can issue a global query on the MDBS to retrieve data from multiple local databases. The user does not need to know where the data is stored and how the result is obtained. How to process such a global query efficiently is the task of global query optimization. There are a number of challenges for performing global query optimization in an MDBS. Among them, a crucial one is that some local optimization information, such as local cost models, may not be available to the global query optimizer because of local autonomy. To perform global query optimization, methods for estimating necessary local cost parameters of an autonomous local database system at the global level are required. Research on exploring such methods has attracted a number of researchers in the database area. In this talk, we will present several our techniques to tackle this challenge, including (1) the query sampling method, which drives local cost models for an MDBS based on observed costs of sample queries; (2) the qualitative approach, which extends the query sampling method for a dynamic multidatabase environment by incorporating a qualitative variable into a cost model; (3) the fractional analysis, which estimates a query cost by analyzing its fractions for a gradually-changing multidatabase environment; and (4) the probabilistic approach, which estimates query costs based on Markov chain theory for a rapidly-changing multidatabase environment. Our experimental results demonstrate that this set of techniques is quite promising in solving the local query cost estimation problem for various multidatabase environments. |
Bio: | Dr. Qiang Zhu is an Associate Professor at The University of Michigan - Dearborn. He received his Ph.D. in Computer Science from the University of Waterloo in 1995. He also holds an M.Sc. degree from the McMaster University in Canada and an M.Eng. degree from the Southeast University in China. Dr. Zhu is a principal investigator for a number of database research projects funded by sources including NSF and IBM at The University of Michigan. He has over 40 research publications in various journals and conference proceedings. Dr. Zhu has served as a program committee member and session/workshop chair for a number of international conferences. His current research interests include query optimization for advanced database systems, multidatabase systems, Web-based database technology, and data mining. |
12 November 2001, 11:00 AM
Title: | Database Applications of Role-Based Access Control (PDF) |
Speaker: | Sylvia Osborn, University of Western Ontario |
Abstract: | Role-Based Access Control (RBAC) has been receiving a lot of attention in the last ten years as an alternative to more traditional forms of access control, namely discretionary and mandatory access control. We will introduce RBAC models, and show how they are and could be used for relational databases. As well, we will show some algorithms for role graphs, and describe how they can be used to integrate security information when two databases or systems are being integrated. |
Bio: | Sylvia Osborn received her PhD from the University of Waterloo in 1978. Since 1977 she has been a faculty member in the Computer Science Department at the University of Western Ontario in London. Her research interests are in role-based access control, object-oriented databases, and database integration. |
3 December 2001, 11:00 AM
Title: | Programmability in the SQL Server DBMS (PDF) |
Speaker: | Jose Blakeley, Microsoft Corp. |
Abstract: | The SQL Server product is broadening the set of programming languages that can be used by database developers to write business logic in the form of functions, procedures, triggers, and types. A key component of this work is hosting the .NET Common Language Runtime inside the SQL Server process. This talk will provide an overview of the relevant design decisions in hosting the .NET Runtime and their impact on performance, scalability, security, and robustness of the server. I will also describe the new features, such as functions and types, exposed through the SQL language. |
Bio: | José Blakeley joined Microsoft in 1994. He served as an architect for the OLE DB data access interfaces during 1995-1998. He is currently an architect in the SQL Server product working on server-side programmability and extensibility. José has authored several conference and journal articles and book chapters on design aspects of relational and object database management systems, and data access. Before joining Microsoft, José was a member of the technical staff with Texas Instruments where he developed an object database management system. He received a Ph.D. in Computer Science from the University of Waterloo in 1987. |
1 February 2002, 10:00 AM (Please note the special date and time)
Title: | W3C XML Query WG: A Status Report (PDF) |
Speaker: | Paul Cotton, Microsoft Canada |
Abstract: |
The
World
Wide
Web
Consortium
(W3C)
XML
Query
Working
Group
[1]
was
chartered
in
September
1999
to
develop
a
query
language
for
XML
[2]
documents.
The
goal
of
the
XML
Query
Working
Group
is
to
produce
a
formal
data
model
for
XML
documents
with
Namespaces
[3]
based
on
the
XML
Infoset
[4]
and
XML
Schemas
[5,
6,
7],
a
formal
semantics
for
a
set
of
query
operators
on
that
data
model,
and
then
a
query
language
with
a
concrete
canonical
syntax
based
on
the
proposed
operators.
The WG recently produced a revision of the XQuery specifications [8-11] which are now aligned with the first draft of XPath 2.0 [12]. This talk will provide an update on the current status of XQuery and XPath. This update will include current information on the status of the XML Query Data Model, Formal Semantics, and the XQuery and XPath languages. The talk will also include a list of the important issues and questions still in front of the XML Query WG.
[1]
http://www.w3.org/XML/Query |
Bio: |
Paul
Cotton
joined
Microsoft
in
May,
2000
and
is
currently
Program
Manager
of
XML
Standards.
Paul
telecommutes
to
his
Redmond
job
from
his
home
in
Ottawa,
Canada.
Paul has been participating in the W3C XML Activity since mid-1998 when he became IBM's prime representative on the XML Linking and Infoset Working Groups. Paul has been chairperson of the XML Query Working Group and a member of the XML Coordination Group since September 1999. Paul was elected to the W3C Advisory Board in June 2000 and in December 2001 was elected to the first W3C Technical Architecture Group. Paul is also Microsoft's alternate on the XML Protocol Working Group which is working on SOAP 1.2. Paul graduated with a M.Math in Computer Science from the University of Waterloo in 1974. |
11 February 2002, 11:30 AM (Please note the time change)
Title: | Automatically Tuning DBMS Buffer Pools |
Speaker: | Pat Martin, Queens University |
Abstract: |
The
tasks
of
configuring
and
tuning
large
database
management
systems
(DBMSs)
have
always
been
both
complex
and
time-consuming.
They
require
knowledge
of
the
characteristics
of
the
system,
the
data,
and
the
workload.
The
increasing
diversity
of
the
data
and
the
workloads
handled
by
today's
systems
is
making
manual
tuning
by
database
administrators
almost
impossible.
Self-managing
DBMSs
attempt
to
solve
this
problem
by
shifting
responsibility
for
tuning
and
configuration
onto
the
system
itself.
In this talk I will first present a self-tuning algorithm, called the Dynamic Reconfiguration algorithm (DRF), for managing the buffer pools in a DBMS and discuss the results of a set of experiments to investigate the performance of the algorithm. I will then propose a set of principles for a self-managing DBMSs and suggest directions for moving towards the goal of self-management. |
Bio: | Patrick Martin is a Professor in the Department of Computing and Information Science at Queen's University and a Visiting Scientist with the Centre for Advanced Studies at the IBM Toronto Laboratory. His research interests include self-managing database systems and web caching. |
11 March 2002, 11:00 AM
Title: | Timber: A Native XML Database Management System (PDF) |
Speaker: | H.V. Jagadish, University of Michigan |
Abstract: | The eXtensible Markup Language (XML) has recently become very popular as a representation format for a wide variety of data. Large repositories of XML data have begun to emerge. The effective management of XML in a database thus becomes a pressing issue. A central challenge in this regard is the complex and heterogeneous structure of XML data. In this talk, I will discuss the design of Timber, a native XML database management system that we are building at the University of Michigan. |
Bio: |
H.
V.
Jagadish
is
a
Professor
of
Computer
Science
and
Engineering
at
the
University
of
Michigan
in
Ann
Arbor.
After
earning
his
PhD
from
Stanford
in
1985,
he
spent
over
a
decade
at
AT&T
Bell
Laboratories
in
Murray
Hill,
N.J.,
eventually
becoming
head
of
AT&T
Labs
database
research
department
at
the
Shannon
Laboratory
in
Florham
Park,
N.J.
He
has
also
served
as
a
Professor
at
the
University
of
Illinois
in
Urbana-Champaign.
Professor Jagadish is well-known for his broad-ranging research on databases, and has over 80 major papers and 20 patents. He is currently the founding editor of the ACM SIGMOD Digital Review. Among many professional positions he has held, he has previously been an Associate Editor for the ACM Transactions on Database Systems (1992-1995) and Program Chair of the ACM SIGMOD annual conference (1996). |
8 April 2002, 11:00 AM
Title: | Clustering by Impact: Scalable, Incremental Clustering of Data Streams (PDF) |
Speaker: | Daniel Barbara, George Mason University |
Abstract: | As organizations tend to store and analyze more and more data, the need to design and implement data mining algorithms that are incremental and can deal with continuous data streams is pressing. In this talk we present two clustering algorithms that meet that need. They cluster new points by measuring the impact these points have on existing cluster's properties. In particular, Fractal Clustering deals with numerical domains and uses the impact over the fractal dimension of the existing clusters to decide where to place a new point. Its counterpart, Entropy Clustering, is designed to deal with categorical domains and it uses the impact over the entropy of the existing clusters to perform clustering. We will show results of using these techniques over real and synthetic data sets, as well as show the techniques' relationship to the well-known principle of Minimum Description Length. We will also show how these techniques are well-suited to track the evolution of clusters in data streams, due both to the concise representations of the clustering models they produce, and incremental nature of the algorithms. |
Bio: | Daniel Barbara is an Associate Professor in the Information and Software Engineering Department, George Mason University. He got his Ph.D. in Computer Science at Princeton University in 1985, and has held positions in Panasonic Laboratories and Bell Communication Research as a Senior Scientist. He is the author of over 60 publications in international conferences and research journals. Dr. Barbara's current interest are in the areas of Data Warehousing and Data Mining. |
29 April 2002, 11:00 AM
Title: | Generic Model Management - A Database Infrastructure for Schema Manipulation (PDF) |
Speaker: | Philip Bernstein, Microsoft Research |
Abstract: | Despite 30 years of research on database support for engineering applications, such applications remain complicated and hard to build. To improve this situation by an order of magnitude, a much higher level API is needed. We present such an interface, called Model Management. Its objects are models and mappings. By "model," we mean a complex structure that represents a design artifact, such as a relational schema, XML schema, object-oriented interface, UML model, web-site map, or software configuration. By "mapping," we mean an explicit representation of connections or transformations between two models. The main operations of Model Management are match, merge, diff, and compose. We explain how these operations can be used to solve classical meta data management problems and sketch a system architecture to implement them. |
Bio: | Phil Bernstein is a researcher at Microsoft Corporation. Over the past 25 years, he has been a product architect at Microsoft and at Digital Equipment Corp., a professor at Harvard University and Wang Institute of Graduate Studies, and a VP Software at Sequoia Systems. During that time, he has published over 100 articles on the theory and implementation of database systems, and coauthored three books, the latest of which is Principles of Transaction Processing for the System Professional (Morgan Kaufmann, 1997). He holds a B.S. from Cornell University and a Ph.D. from University of Toronto. A summary of his current research on meta data management can be found at http://www.research.microsoft.com/~philbe. |
10 June 2002, 11:00 AM
Title: | Location Management in Moving Objects Databases |
Speaker: | Ouri Wolfson, University of Illinois at Chicago |
Abstract: |
Consider
applications
of
a
database
that
models
information
about
moving
objects
and
their
location.
For
example,
given
a
database
representing
the
location
of
taxi-cabs,
a
typical
query
may
be:
retrieve
the
free
cabs
that
are
currently
within
1
mile
of
a
customer
at
33
N.
Michigan
Ave..
Military
applications
utilizing
moving
objects
databases
arise
in
the
context
of
the
digital
battlefield,
and
civilian
ones
arise
in
transportation
systems
and
in
systems
that
track
mobile
computers
for
providing
context
awareness.
Currently, moving objects database applications are being developed in an ad hoc fashion. Database Management System (DBMS) technology provides a potential foundation upon which to build these applications, however, there is a critical set of needed capabilities that are lacking in existing DBMS's. These include support for continuously changing data, for integrated spatial and temporal information, and for uncertainty management. The objective of our DOMINO project is to build an envelope containing these capabilities on top of existing DBMS's. In this talk I will describe the problems addressed by the project, and our proposed solutions. |
Bio: | Ouri Wolfson's main research interests are in database systems, distributed systems, transaction processing, and mobile computing. He received his B.A. degree in mathematics, and his Ph.D. degree in computer science from Courant Institute of Mathematical Sciences, New York University. He is currently a Professor in the Department of Computer Science at the University of Illinois at Chicago, where he directs the Databases and Mobile Computing Laboratory, and the newly established Mobile Information Systems Research Center. He served as a consultant to Argonne National Laboratory, to the US Army Research Laboratories, and to the Center of Excellence in Space Data and Information Sciences at NASA. He is also the founder of Mobitrac, a high-tech startup company specializing in infrastructure software for location based services and products. Before joining the University of Illinois he has been on the computer science faculty at the Technion and Columbia University, and he has been a Member of Technical Staff at Bell Laboratories. Ouri Wolfson authored over ninety publications in leading journals and conference proceedings. He is a Fellow of the Association of Computing Machinery, an editor of the ACM/URSI/Baltzer Wireless Networks Journal, a Member of the ACM SIGMOD Digital Review Editorial Board and a guest editor of the ACM/Baltzer Journal on Special Topics in Mobile Networks. He is also the 2001 recipient of the UIC College of Engineering Faculty Research Award. He is currently serving as a National Lecturer for the Association of Computing Machinery professional society. He participated in numerous conferences (including ACM-SIGMOD, VLDB, PODS, ICDE, NGITS, ICDCS, MOBIDATA, DOOD, SSD, GIS, PDIS, CIKM) as a program committee member, keynote speaker, session chairman, and panelist. Most recently he was the keynote speaker at the Second International Conference on Mobile Data Management (MDM '2001), and is the program committee co-chair of the Third International Conference on Mobile Data Management (MDM 2002). He was also the General co-Chairman of the IEEE Knowledge and Data Engineering Exchange Workshop, and he serves on the Advisory Committee of the NSF Center of Research Excellence in Science and Technology, at Florida A&M University. His work has been funded by the National Science Foundation, Air Force Office of Scientific Research, Defense Advanced Research Projects Agency, NATO, US Army, the New York State Science and Technology Foundation, Hughes Research Laboratories, and Informix Co. |
8 July 2002, 11:00 AM
Title: | Caterpillars, T-Graphs and Context |
Speaker: | Derick Wood, Hong Kong University of Science and Technology |
Abstract: |
I
will
present
two
different
ways
that
we
developed
for
specifying
context
in
documents.
The
first,
caterpillar
expressions,
leads
to
very
nice
theoretical
questions/problems;
the
second,
T-graphs,
supports
a
70%
solution
that
is
suitable
for
most
contexts
that
we
need
(it
was
used
in
Designer).
I will compare the two methods and summarize their positive and negative aspects. |
Bio: |
Professor
Wood
received
his
BSc
and
PhD
degrees
from
the
University
of
Leeds,
England,
in
1963
and
1968,
respectively.
He
was
a
Postdoctoral
Fellow
at
the
Courant
Institute,
New
York
University,
from
1968
to
1970
before
joining
the
Unit
of
Computer
Science
at
McMaster
University
in
1970.
He
was
Chair
of
Computer
Science
from
1979
to
1982.
From
1982
to
1992
he
was
a
Professor
in
the
Department
of
Computer
Science,
University
of
Waterloo.
For three years he served as Director of the Data Structuring Group. Before joining HKUST in 1995, he was a Professor in the Department of Computer Science, University of Western Ontario. He has published widely in a number of research areas and has written two textbooks, "Theory of Computation," published by John Wiley, and "Data Structures, Algorithms, and Performance," published by Addison-Wesley. |
2 August 2002, 11:00 AM
Title: | Robust Space Transformations for Distance-based Operations (PDF) |
Speaker: | Raymond Ng, University of British Columbia |
Abstract: |
For
many
database
operations,
such
as
nearest
neighbor
search,
distance-based
clustering
and
outlier
detection,
there
is
an
underlying
$k$-D
data
space
in
which
each
tuple/object
is
represented
as
a
point
in
the
space.
We
observe
that
in
the
presence
of
variability,
correlation,
outliers
and/or
differing
scales,
we
may
get
unintuitive
results
if
an
inappropriate
space
is
used.
The fundamental question addressed in this paper is: ``What then is an appropriate space?''. We propose using a robust space transformation called the Donoho-Stahel estimator. In the first half of the paper, we show the key properties of the estimator. Of particular importance to database applications is the stability property, which says that in spite of frequent updates, the estimator does not (a) change much, (b) lose its usefulness, or (c) require re-computation. In the second half, we focus on the computation of the estimator for high-dimensional databases. We develop randomized algorithms and evaluate how well they perform empirically. The bottom-line is that the Donoho-Stahel transformation, which possesses desirable properties for database operations, can be computed efficiently as well. |
Bio: | Raymond Ng received his B.Sc.(Hons) degree in Computer Science from the University of British Columbia in 1984, and his M.Math. degree in Computer Science from the University of Waterloo in 1986, and his Ph.D. degree in Computer Science from the University of Maryland, College Park, in 1992. He is a full professor at the University of British Columbia. His areas of research include data mining, bioinformatics, image databases, and multimedia systems. He has published numerous conference and journal papers on these topics. He is one of the associate editors for the VLDB journal and the IEEE TKDE journal. He is a program co-chair of the 2002 SIGKDD Conference, and has served as a member of program committees for many premier conferences, including ACM SIGMOD, VLDB, SIGKDD and ACM PODS. |