The Database Seminar Series provides a forum for presentation and discussion of interesting and current database issues. It complements our internal database meetings by bringing in external colleagues. The talks scheduled for 2005-2006 are listed below, and more will be added as we receive confirmations. Please send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC (Davis Centre) 1304. Coffee will be served 30 minutes before the talk.
We will post presentation notes whenever possible. Please click on a presentation title to access these notes (usually in PDF format).
Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
19 September 2005, 11:00 AM
Title: | Building a MetaQuerier and Beyond: A Trilogy of Search, Integration, and Mining for Web Information Access |
Speaker: | Kevin Chang, University of Illinois, Urbana-Champaign |
Abstract: |
While the Web has become the ultimate information repository, several major barriers have hindered today's search engines from unleashing the Web's promise. Toward tackling the dual challenges of accessing both the deep and the surface Web, I will present our "trilogy" of pursuit. To begin with, from search to integration: as the Web has deepened dramatically, much information is now hidden on the "deep Web," behind the query interfaces of numerous searchable databases. Our 2004 survey estimated 450,000 online databases and 1,258,000 query interfaces. We thus believe that search must resort to integration: to enable access to the deep Web, we are building the MetaQuerier at UIUC for both finding and querying such online databases. Further, from integration to mining: toward large scale integration, to tackle the critical issue of dynamic semantics discovery, we observe a key insight: while the deep Web challenges us with its large scale, the challenge itself presents a unique opportunity. We believe that integration must resort to mining, tackling deep semantics by holistically exploring the shallow syntactic and statistical regularities hidden across a large scale of sources. Finally, from mining back to search? Beyond the MetaQuerier, such holistic mining is equally crucial for the dual challenge of semantics discovery on the surface Web. We believe such mining must resort to search, and propose to build holistic analysis into a next-generation search engine by demonstrating our initial solutions. Project URL: http://metaquerier.cs.uiuc.edu |
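The abstract's observation that semantics can be mined "holistically" from regularities across many query interfaces can be illustrated with a toy sketch. The heuristic below is a speculative simplification with invented attribute data, not the MetaQuerier's actual algorithm: two frequent attributes that never co-occur on the same form are flagged as candidate synonyms.

```python
from itertools import combinations

# Toy input (invented): each deep-Web query interface is the set of
# attribute labels extracted from its search form.
interfaces = [
    {"author", "title", "subject"},
    {"writer", "title", "subject"},
    {"author", "title", "keyword"},
    {"writer", "title", "keyword"},
    {"title", "subject", "keyword"},
]

def frequency(attr):
    return sum(attr in form for form in interfaces) / len(interfaces)

def cooccurrence(a, b):
    return sum(a in form and b in form for form in interfaces) / len(interfaces)

# Heuristic: two attributes that are each common, yet never appear on the
# same form, likely name the same concept (a form rarely asks twice).
attrs = sorted(set().union(*interfaces))
for a, b in combinations(attrs, 2):
    if min(frequency(a), frequency(b)) >= 0.4 and cooccurrence(a, b) == 0:
        print(f"candidate synonyms: {a} <-> {b}")  # author <-> writer
```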
Bio: |
Kevin Chen-Chuan Chang is an Assistant Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received a PhD in Electrical Engineering in 2001 from Stanford University. His research interests are in large scale information access, with emphasis on Web information integration and top-k ranked query processing. He is the recipient of an NSF CAREER Award in 2002, an NCSA Faculty Fellow Award in 2003, and IBM Faculty Awards in 2004 and 2005. URL: http://www-faculty.cs.uiuc.edu/~kcchang/ |
4 October 2005, 11:00 AM; MC 5136 (Please note special location)
Title: | SMaestro: Second Generation Storage Infrastructure Management |
Speaker: | Kaladhar Voruganti, IBM Almaden Research Center |
Abstract: | Storage management has now become the largest component of the overall cost of owning storage subsystems. A key reason for this high cost is the limit on the amount of storage that can be managed by a single system administrator. This limit stems from the set of complex storage management tasks that a system administrator has to perform, such as storage provisioning, performance bottleneck evaluation, planning for future growth, backup/restore, security violation analysis, and interaction with application, network, and database system administrators. Thus, many storage vendors have introduced storage management tools that aim to increase the amount of storage a single system administrator can manage by automating many of these tasks. However, most existing storage management products can generally be classified as first generation products that provide basic monitoring and workflow-based action support. These tools generally lack analysis and planning functionality. The objective of this talk is to present the trends in the planning and analysis area of storage management, with specific emphasis on open research problems. |
Bio: | Kaladhar Voruganti received his BSc in Computer Engineering and PhD in Computing Science from the University of Alberta in Canada. For the past six years he has been working as a research staff member at the IBM Almaden Research lab in San Jose, California. He is currently leading a multi-site research team that is working on storage management planning tools. Kaladhar has received an Outstanding Technical Achievement award for his contributions to the IBM iSCSI storage controller, and another Outstanding Technical Achievement award for his contributions to IBM storage management products. The IBM iSCSI target controller received the Most Innovative Product award at the Storage 2001 and Interop 2001 conferences. In the past Kaladhar published in leading database conferences; currently he is actively publishing in leading storage systems conferences and has received three IBM Bravo awards for his publication efforts. |
17 October 2005, 11:00 AM
Title: | Learning in Query Optimization |
Speaker: | Volker Markl, IBM Almaden Research Center |
Abstract: |
Database systems let users specify queries in a declarative language like SQL. Most modern DBMS optimizers rely upon a cost model to choose the best query execution plan (QEP) for any given query. For complex queries involving many predicates and/or operations, cost estimates are heavily dependent upon the optimizer's estimates for the number of rows that will result at each step of the QEP. These estimates, in turn, rely upon statistics on the database and modeling assumptions that may or may not be true for a given database. In the first part of our talk, we present research on learning in query optimization that has been carried out at the IBM Almaden Research Center. We introduce LEO, DB2's LEarning Optimizer, as a comprehensive way to repair incorrect statistics and cardinality estimates of a query execution plan. By monitoring executed queries, LEO compares the optimizer's estimates with actuals at each step in a QEP, and computes adjustments to cost estimates and statistics that may be used during the current and future query optimizations. LEO introduces a feedback loop to query optimization that enhances the available information on the database where the most queries have occurred, allowing the optimizer to actually learn from its past mistakes. In the second part of the talk, we describe how the knowledge gleaned by LEO is exploited consistently in a query optimizer, by adjusting the optimizer's model and by maximizing information entropy. |
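To make the feedback-loop idea concrete, here is a minimal sketch, not LEO's actual implementation: after a query executes, the observed cardinality at a plan step is compared with the optimizer's estimate, and the resulting correction factor is remembered for future optimizations. The predicate string and numbers are illustrative.

```python
# A minimal sketch (not IBM's implementation) of the feedback idea behind a
# learning optimizer: compare estimated vs. actual cardinalities observed at
# runtime and keep per-predicate adjustment factors for future optimizations.

adjustments = {}  # predicate signature -> multiplicative correction factor

def record_feedback(predicate, estimated_rows, actual_rows):
    """After a query runs, store how far off the estimate was."""
    if estimated_rows > 0:
        adjustments[predicate] = actual_rows / estimated_rows

def adjusted_estimate(predicate, estimated_rows):
    """At optimization time, repair the model's estimate with past feedback."""
    return estimated_rows * adjustments.get(predicate, 1.0)

# Example: the model assumed 1% selectivity (1,000 of 100,000 rows) for
# city = 'San Jose', but execution observed 12,000 qualifying rows.
record_feedback("city = 'San Jose'", estimated_rows=1_000, actual_rows=12_000)
print(adjusted_estimate("city = 'San Jose'", 1_000))  # -> 12000.0
```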
Bio: |
Dr. Markl has been working at IBM's Almaden Research Center in San Jose, USA since 2001, conducting research in query optimization, indexing, and self-managing databases. Volker Markl is spearheading the LEO project, an effort on autonomic computing with the goal of creating a self-tuning optimizer for DB2 UDB. He is also the Almaden chair for the IBM Data Management Professional Interest Community (PIC). From January 1997 to December 2000, Dr. Markl worked for the Bavarian Research Center for Knowledge-Based Systems (FORWISS) in Munich, Germany as deputy research group manager, leading the MISTRAL and MDA projects and thereby cooperating with SAP AG, NEC, Hitachi, Teijin Systems Technology, GfK, and Microsoft Research. His MDA project, jointly with TransAction Software, developed the relational database management system TransBase HyperCube, which was awarded the European IST Prize 2001 by EUROCASE and the European Commission. Dr. Markl also initiated and coordinated the EDITH EU IST project investigating the physical clustering of multiple hierarchies and its applications to GIS and data warehousing, which is now being carried out by FORWISS and several partners from Germany, Italy, Greece, and Poland. Volker Markl is a graduate of the Technische Universität München, where he earned a Master's degree in Computer Science in 1995. He completed his PhD in 1999 under the supervision of Rudolf Bayer. His dissertation on "Relational Query Processing Using a Multidimensional Access Technique" was honored "with distinction" by the German Computer Society (Gesellschaft für Informatik). He also earned a degree in Business Administration from the University of Hagen, Germany in 1995. Since 1996, Volker Markl has published more than 30 reviewed papers at prestigious scientific conferences and in journals, filed more than 10 patents, and has been an invited speaker at many universities and companies. Dr. Markl is a member of the German Computer Society (GI) as well as the Special Interest Group on Management of Data of the Association for Computing Machinery (ACM SIGMOD). He also serves as a program committee member and reviewer for several international conferences and journals, including SIGMOD, ICDE, VLDB, TKDE, TODS, IS, and the Computer Journal. His main research interests are autonomic computing, query processing, and query optimization, but also include applications like data warehousing, electronic commerce, and pervasive computing. Dr. Markl's earlier professional experience includes work as a software engineer for a virology laboratory, as part of his military service; lecturer for software-engineering courses at the University of Applied Sciences in Augsburg, Germany and for programming and communications at the Technische Universität München; and consultant for a forwarding agency. He was awarded a fellowship by Siemens AG, Munich, and also worked as an international intern with Benefit Panel Services, Los Angeles. |
24 October 2005, 11:00 AM
Title: | Approximate Joins: Concepts and Techniques (PDF) |
Speaker: | Divesh Srivastava, AT&T Labs-Research |
Abstract: |
The quality of the data residing in information repositories and databases degrades for a multitude of reasons. In the presence of data quality errors, a central problem is to identify all pairs of entities (tuples) in two sets of entities that are approximately the same. This operation has been studied through the years and is known under various names, including record linkage, entity identification, entity reconciliation, and approximate join. The objective of this talk is to provide an overview of key research results and techniques used for approximate joins. This is joint work with Nick Koudas. |
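As a concrete illustration of one common approximate-join technique (a hedged sketch, not the specific algorithms surveyed in the talk), the snippet below joins two small lists of strings whose Jaccard token-set similarity exceeds a threshold. Real systems add q-gram signatures or indexes to avoid the quadratic nested loop; the data here is invented.

```python
# A minimal sketch of a similarity-threshold approximate join: match records
# whose string representations have Jaccard token-set similarity above a
# threshold.

def tokens(s):
    return set(s.lower().split())

def jaccard(a, b):
    inter = tokens(a) & tokens(b)
    union = tokens(a) | tokens(b)
    return len(inter) / len(union)

left = ["AT&T Labs Research", "Univ. of Waterloo"]
right = ["AT&T Labs - Research", "University of Waterloo", "IBM Almaden"]

THRESHOLD = 0.5
for l in left:
    for r in right:
        if jaccard(l, r) >= THRESHOLD:
            print(f"match: {l!r} ~ {r!r}")
```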
Bio: |
Divesh Srivastava is the head of the Database Research Department at AT&T Labs-Research. He received his B.Tech. in Computer Science & Engineering from the Indian Institute of Technology, Bombay, India, and his Ph.D. in Computer Sciences from the University of Wisconsin, Madison, USA. His current research interests include XML databases and IP network data management. |
10 November 2005, 11:00 AM
Title: | The Role of Document Structure in Querying, Scoring and Evaluating XML Full-Text Search (PDF) |
Speaker: | Sihem Amer-Yahia, AT&T Labs-Research |
Abstract: |
A key benefit of XML is its ability to represent a mix of structured and text data. We discuss the interplay of structured information and keyword search in three aspects of XML search: query design, scoring methods, and query evaluation. In query design, existing languages for XML evolved from simple keyword search to queries combining sophisticated conditions on structure, à la XPath and XQuery, with complex full-text search primitives, such as the use of ontologies and keyword proximity distance, à la XQuery Full-Text. In XML scoring, methods range from pure IR tf*idf to approximating and scoring both structure and keyword conditions. In evaluating XML search, document structure has been used to identify meaningful XML fragments to be returned as answers to keyword queries. This discussion is based on published and ongoing work between AT&T Labs and UBC, the U. of Toronto, Cornell U., Rutgers U., the U. of Waterloo, and UCSD. |
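To make the scoring discussion concrete, here is a minimal sketch of element-level tf*idf, the pure-IR end of the spectrum the abstract mentions. The element paths and text are invented for illustration, and real XML scoring also weighs structural conditions.

```python
import math
from collections import Counter

# Treat each XML element's text content as a "document" and score it
# against the keyword query with plain tf*idf.
elements = {
    "/book[1]/title": "xml query processing",
    "/book[1]/abstract": "processing full-text queries over xml documents",
    "/book[2]/title": "relational databases",
}

def score(query_terms, text, n_elements, doc_freq):
    tf = Counter(text.split())
    s = 0.0
    for t in query_terms:
        if doc_freq[t]:  # skip terms absent from the collection
            s += tf[t] * math.log(n_elements / doc_freq[t])
    return s

terms = ["xml", "queries"]
df = Counter(t for text in elements.values() for t in set(text.split()))
for path, text in elements.items():
    print(path, round(score(terms, text, len(elements), df), 3))
```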
Bio: | Sihem Amer-Yahia is a Senior Technical Specialist at AT&T Labs Research. She received her Ph.D. degree from the University of Paris XI-Orsay and INRIA. She has been working on various aspects of XML query processing. More recently, she has focused on XML full-text search. Sihem is a co-editor of the XQuery Full-Text language specification and use cases published in September 2005 by the W3C Full-Text Task Force. She is the main developer of GalaTex, a conformance implementation of XQuery Full-Text. |
14 November 2005, 11:00 AM
Title: | MobiEyes: Distributed Processing of Moving Queries over Moving Objects (PDF) |
Speaker: | Ling Liu, Georgia Institute of Technology |
Abstract: |
With the growing popularity and availability of mobile communications, our ability to stay connected while on the move is becoming a reality, rather than the science fiction it was just a decade ago. An important research challenge for modern location-based services is the scalable processing of location monitoring requests over a large collection of mobile objects. The centralized architecture, though studied extensively in the literature, creates intolerable performance problems as the number of mobile objects grows. In this talk, we present a distributed architecture and a suite of optimization techniques for scalable processing of continuously moving location queries. Moving location queries can be viewed as standing location tracking requests that continuously monitor the locations of mobile objects of interest and return a subset of the mobile objects when certain conditions are met. We describe the design of a distributed location monitoring architecture through MobiEyes, a distributed real-time location monitoring system for mobile environments. The main idea behind the MobiEyes distributed architecture is to promote a careful partition of a real-time location monitoring task into an optimal coordination of server-side and client-side processing. Such a partition allows the location of a moving object to be computed with a high degree of precision using a small number of location updates, or no updates at all, thus providing highly scalable and more cost-effective location monitoring services. Concretely, the MobiEyes distributed architecture not only encourages careful utilization of the rapidly growing computational power available on various mobile devices, such as cell phones, handhelds, and GPS devices, but also endorses a strong coordination agreement between the mobile objects and the server. Such an agreement supports varying location update rates for different mobile users at different times, and advocates the exploitation of location prediction and location inference to further constrain resource/bandwidth consumption while maintaining satisfactory precision of location information. A set of optimization techniques is used to further limit the amount of computation handled by the mobile objects and to enhance the overall performance and system utilization of MobiEyes. Important metrics to validate the proposed architecture and optimizations include messaging cost, server load, and the amount of computation at individual mobile objects. Our experimental results show that the MobiEyes approach can lead to significant savings in terms of server load and messaging cost when compared to solutions relying on central processing of location information at the server. If time permits, at the end of my talk I will also give an overview of location privacy protection in LBS. |
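A minimal sketch of the client-side principle described above (a simplification, not the actual MobiEyes protocol): the server assigns each object a monitoring region, and the object stays silent until it leaves that region, saving location-update messages. Names and coordinates are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Region:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

class MobileObject:
    def __init__(self, region: Region):
        self.region = region  # assigned by the server

    def on_position_fix(self, x, y):
        """Called locally on each GPS fix; report only on region exit."""
        if not self.region.contains(x, y):
            return ("UPDATE", x, y)   # message sent to the server
        return None                   # no communication needed

obj = MobileObject(Region(0, 0, 10, 10))
for fix in [(1, 1), (5, 9), (11, 3)]:
    msg = obj.on_position_fix(*fix)
    if msg:
        print("sent to server:", msg)  # only the (11, 3) fix is reported
```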
Bio: | Ling Liu is currently an associate professor at the College of Computing at Georgia Tech. She directs the research programs of the Distributed Data Intensive Systems Lab, examining research issues and technical challenges in building scalable and secure distributed data-intensive systems. Her current research interests include performance, scalability, security, and privacy issues in networked computing systems and applications, in particular mobile location-based services and distributed enterprise computing systems. She has published over 150 international journal and conference articles. She has served as PC chair of several IEEE conferences, including co-PC chair of the IEEE 2006 International Conference on Data Engineering (ICDE 06) and vice chair of the Internet Computing track of the IEEE 2006 International Conference on Distributed Computing (ICDCS 06), and is on the editorial boards of several international journals, serving as an associate editor of IEEE Transactions on Knowledge and Data Engineering (TKDE), the International Journal of Very Large Databases (VLDBJ), and the International Journal of Web Service Research. Most of Dr. Liu's recent research has been sponsored by NSF, DoE, DARPA, IBM, and HP. |
5 December 2005, 11:00 AM
Title: | Implementing XQuery 1.0: The Story of Galax (PDF) |
Speaker: | Mary Fernández, AT&T Labs - Research |
Abstract: |
XQuery 1.0 and its sister language XPath 2.0 have set a fire underneath database vendors and researchers alike. More than thirty commercial and research XQuery implementations are listed on the XML Query working group home page. Galax (www.galaxquery.org) is an open-source, general-purpose XQuery engine, designed to be complete, efficient, and extensible. During Galax's development, we have focused on each of these three requirements in turn, while never losing sight of the other two. In this talk, I will describe how these requirements have impacted Galax's evolution and our own research interests. Along the way, I will show how Galax's architecture supports these three requirements. Galax is joint work with Jérôme Siméon, IBM T.J. Watson Research Center. |
Bio: | Mary Fernandez is Principal Technical Staff at AT&T Labs - Research. Her research interests include data integration, Web-site implementation and management, domain-specific languages, and their interactions. She is a member of the W3C XML Query Language Working Group, co-editor of several of the XQuery W3C working drafts, and is a principal designer and implementor of Galax, a complete, open-source implementation of XQuery (www.galaxquery.org). Mary is also an associate editor of ACM Transactions on Database Systems and serves on the advisory board of MentorNet (www.mentornet.net), an e-mentoring network for women in engineering and science. |
16 January 2006, 11:00 AM, MC 5136 (Please note room change)
Title: |
Discovering Interesting Subsets of Data in Cube Space |
Speaker: | Raghu Ramakrishnan, University of Wisconsin - Madison |
Abstract: | Data Cubes have been widely studied and implemented, and so we researchers shouldn't be thinking about them anymore, right? Wrong. In this talk, I'll try to convince you that the multidimensional model of data ("cube" sounds so much cooler) provides the right perspective for addressing many challenging tasks, including dealing with imprecision, mining for interesting subsets of data, analysis of historical stream data, and world peace. The talk will touch upon results from a couple of VLDB 2005 papers, and some recent ongoing work. |
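As a toy illustration of "cube space" (invented data and a deliberately naive interestingness test, not the techniques from the papers), each combination of dimension values, with '*' meaning "all", defines a candidate subset, and we flag cells whose average deviates strongly from the overall mean.

```python
from itertools import product

rows = [  # (region, product, sales) -- made-up fact table
    ("east", "widget", 10), ("east", "gadget", 90),
    ("west", "widget", 20), ("west", "gadget", 25),
]

regions = {"east", "west", "*"}
products = {"widget", "gadget", "*"}

def members(region, prod):
    return [s for r, p, s in rows
            if region in (r, "*") and prod in (p, "*")]

overall_avg = sum(s for *_, s in rows) / len(rows)

# Flag cube cells whose average sales deviate strongly from the overall mean.
for region, prod in product(regions, products):
    cell = members(region, prod)
    if cell:
        avg = sum(cell) / len(cell)
        if abs(avg - overall_avg) > 20:
            print(f"interesting cell ({region}, {prod}): avg={avg}")
```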
Bio: |
Raghu Ramakrishnan is Professor of Computer Sciences at the University of Wisconsin-Madison, and was founder and CTO of QUIQ, a company that pioneered collaborative customer support (acquired by Kanisa). His research is in the area of database systems, with a focus on data retrieval, analysis, and mining. He and his group have developed scalable algorithms for clustering, decision-tree construction, and itemset counting, and were among the first to investigate mining of continuously evolving stream data. His work on query optimization and deductive databases has found its way into several commercial database systems, and his work on extending SQL to deal with queries over sequences has influenced the design of window functions in SQL:1999. He is Chair of ACM SIGMOD, on the Board of Directors of ACM SIGKDD and the Board of Trustees of the VLDB Endowment, an associate editor of ACM Transactions on Database Systems, and was previously editor-in-chief of the Journal of Data Mining and Knowledge Discovery and the Database area editor of the Journal of Logic Programming. Dr. Ramakrishnan is a Fellow of the Association for Computing Machinery (ACM), and has received several awards, including a Packard Foundation Fellowship, an NSF Presidential Young Investigator Award, and an ACM SIGMOD Contributions Award. He has authored over 100 technical papers and written the widely-used text "Database Management Systems" (WCB/McGraw-Hill), now in its third edition (with J. Gehrke). |
13 February 2006, 11:00 AM
Title: | Racer - Optimizing in ExpTime and Beyond: Lessons Learnt and Challenges Ahead |
Speaker: | Volker Haarslev, Concordia University |
Abstract: |
In February 2004 the Web Ontology Language (OWL) was adopted by the W3C as a recommendation and emerged as a core standard for knowledge representation on the web. The sublanguage OWL-DL is a notational variant of the well-known description logic SHOIN(Dn-), which has decidable inference problems but is also known to be NExpTime-complete. The availability of OWL-DL caused significant interest in OWL-compliant assertional description logic reasoners. Racer was the first highly optimized assertional reasoner for the very expressive (ExpTime-complete) description logic SHIQ(D-), which covers most of OWL-DL with the exception of so-called nominals. In this talk I will briefly introduce description logics / OWL-DL and the associated inference services. Afterward I will discuss the architecture of the description logic reasoner Racer and highlight selected tableau optimization techniques, with emphasis on assertional reasoning and its relationship to database technology. Several recently devised optimization techniques were introduced due to requirements from semantic web applications that relate huge amounts of (incomplete) data to ontological information. I will conclude my presentation with an outlook on OWL 1.1 and on ongoing and future description logic research, such as explanation of reasoning, adding uncertainty, and database support in Racer Pro. The research on Racer is joint work with Ralf Moeller, Hamburg University of Technology. |
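For readers unfamiliar with description logics, the toy sketch below shows the simplest possible flavor of terminological reasoning: computing all subsumers of a class by transitive closure over atomic is-a axioms. A tableau reasoner like Racer handles vastly more expressive logics (SHIQ(D-)) than this; class names here are invented.

```python
# A minimal sketch, far simpler than a real DL tableau reasoner: derive the
# subsumption hierarchy for plain "A is-a B" axioms by transitive closure.

axioms = {("GradStudent", "Student"), ("Student", "Person"),
          ("Professor", "Person")}

def subsumers(cls):
    result, frontier = set(), {cls}
    while frontier:
        nxt = {sup for (sub, sup) in axioms if sub in frontier}
        frontier = nxt - result
        result |= nxt
    return result

print(subsumers("GradStudent"))  # {'Student', 'Person'}
```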
Bio: |
Dr. Haarslev obtained his doctoral degree from the University of Hamburg, Germany, specializing in user interface design. His early research work was in compilers, interfaces, and visual languages. His current work is in automated reasoning, especially description logics, which play important roles in database technology and Internet technology. For databases, description logics allow the integration of heterogeneous data sources. For Internet technology, description logics are the logical foundation of the web ontology language (OWL) and form the basis of the semantic web, the emerging next generation of the World Wide Web. Dr. Haarslev is internationally regarded for his substantial research contributions in the fields of visual language theory and description logics. He is a principal architect of the description logic and OWL reasoner Racer, which can be considered a key component of the emerging semantic web. Dr. Haarslev holds the position of Associate Professor in the Department of Computer Science and Software Engineering at Concordia University. He leads a research group working on automated reasoning and related database technology in the context of the semantic web. Dr. Haarslev is also a cofounder of the company Racer Systems, which develops and distributes Racer Pro, the commercial successor of Racer. |
17 April 2006, 11:00 AM
Title: | Entity Resolution in Relational Data (PDF) |
Speaker: | Lise Getoor, University of Maryland |
Abstract: |
A key challenge for data mining is tackling the problem of mining richly structured datasets, where the objects are linked in some way. Links among the objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture with traditional statistical models. Recently there has been a surge of interest in this area, fueled largely by interest in web and hypertext mining, but also by interest in mining social networks, security and law enforcement data, bibliographic citations, and epidemiological records. In this talk, I'll begin with a short overview of this newly emerging research area. Then, I will describe some of my group's recent work on link-based classification and entity resolution in relational domains. I'll spend the majority of the time describing our work on entity resolution. I'll describe the framework and algorithms that we have developed, present results on several real-world datasets, and discuss our work on making the algorithms scalable. Joint work with students Indrajit Bhattacharya, Mustafa Bilgic, Louis Licamele, and Prithviraj Sen. |
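A toy sketch of the relational intuition behind collective entity resolution (invented data, similarity functions, and thresholds, not the group's actual algorithms): two author references are merged only when their names are similar and they share co-author links.

```python
refs = {  # reference id -> (name string, set of co-author names)
    1: ("J. Smith", {"A. Jones", "B. Lee"}),
    2: ("John Smith", {"A. Jones", "B. Lee"}),
    3: ("J. Smith", {"C. Wu"}),          # different collaborators
}

def name_sim(a, b):
    # crude attribute similarity: shared surname token
    return a.split()[-1] == b.split()[-1]

def relational_sim(ca, cb):
    # Jaccard overlap of the co-author (link) neighborhoods
    return len(ca & cb) / max(len(ca | cb), 1)

for i in refs:
    for j in refs:
        if i < j:
            (na, ca), (nb, cb) = refs[i], refs[j]
            if name_sim(na, nb) and relational_sim(ca, cb) >= 0.5:
                print(f"merge refs {i} and {j}: {na!r} / {nb!r}")
# Only refs 1 and 2 merge; ref 3 shares the name but not the links.
```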
Bio: | Prof. Lise Getoor is an assistant professor in the Computer Science Department at the University of Maryland, College Park. She received her PhD from Stanford University in 2001. Her current work includes research on link mining, statistical relational learning, and representing uncertainty in structured and semi-structured data. Her work in these areas has been supported by NSF, NGA, KDD, ARL, and DARPA. In July 2004, she co-organized the third in a series of successful workshops on statistical relational learning, http://www.cs.umd/srl2004. She has published numerous articles in machine learning, data mining, database, and AI forums. She is a member of the AAAI Executive Council, is on the editorial boards of the Machine Learning Journal and JAIR, and has served on numerous program committees including AAAI, ICML, IJCAI, KDD, SIGMOD, UAI, VLDB, and WWW. |
15 May 2006, 11:00 AM, MC 5136 (Please note room change)
Title: | Nile: Data Streaming in Practice |
Speaker: | Walid Aref, Purdue University |
Abstract: |
Emerging data streaming applications pose new challenges to database management systems. In this talk, I will focus on two applications, namely mobile object applications and phenomena detection and tracking applications. I will highlight the new challenges that these applications raise and how we address them in the context of Nile, a data stream management system being developed at Purdue. In particular, I will present new features of Nile, including incremental evaluation of continuous queries, supporting "predicate windows" using views, and stream query processing with relevance feedback. I will demonstrate the use and performance gains of these features in the context of the above two applications. Finally, I will talk about ongoing research in Nile and directions for future research. |
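One core streaming primitive the talk touches on is incremental evaluation of continuous queries. The sketch below (an illustration of the general idea, not Nile's implementation) maintains a sliding-window average by updating a running sum rather than recomputing over the whole window for each arriving tuple.

```python
from collections import deque

class SlidingWindowAvg:
    """Continuous average over the last `size` tuples, updated incrementally."""

    def __init__(self, size):
        self.size, self.window, self.total = size, deque(), 0.0

    def insert(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:      # expire the oldest tuple
            self.total -= self.window.popleft()
        return self.total / len(self.window)  # current answer of the query

q = SlidingWindowAvg(size=3)
for reading in [10, 20, 30, 40]:
    print(q.insert(reading))  # 10.0, 15.0, 20.0, 30.0
```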
Bio: | Walid G. Aref is a professor of computer science at Purdue. His research interests are in developing database technologies for emerging applications, e.g., spatial, spatio-temporal, multimedia, bioinformatics, and sensor databases. He is also interested in indexing, data mining, and geographic information systems (GIS). Professor Aref's research has been supported by the National Science Foundation, Purdue Research Foundation, CERIAS, Panasonic, and Microsoft Corp. In 2001, he received the CAREER Award from the National Science Foundation and in 2004, he received a Purdue University Faculty Scholar award. Professor Aref is a member of Purdue's Discovery Park Bindley Bioscience and Cyber Centers. He is on the editorial board of the VLDB Journal, a senior member of the IEEE, and a member of the ACM. |
5 June 2006, 11:00 AM
Title: | Data Mining using Fractals and Power Laws (PDF) |
Speaker: | Christos Faloutsos, CMU |
Abstract: |
What patterns can we find in bursty web traffic? On the web, or on the internet graph itself? How about the distributions of galaxies in the sky, or the distribution of a company's customers in geographical space? How long should we expect a nearest-neighbor search to take, when there are 100 attributes per patient or customer record? The traditional assumptions (uniformity, independence, Poisson arrivals, Gaussian distributions) often fail miserably. Should we give up trying to find patterns in such settings? Self-similarity, fractals and power laws are extremely successful in describing real datasets (coast-lines, river basins, stock prices, brain surfaces, communication-line noise, to name a few). We show some old and new successes, involving modeling of graph topologies (internet, web and social networks); modeling galaxy and video data; dimensionality reduction; and more. |
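As a worked illustration of the power-law theme (synthetic data, not from the talk): if y ≈ C·x^a, the points fall on a line in log-log space, and the slope of a least-squares fit recovers the exponent.

```python
import math

xs = [1, 2, 4, 8, 16, 32]
ys = [1000, 250, 62.5, 15.625, 3.906, 0.977]  # roughly y = 1000 * x^-2

lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
n = len(xs)
mean_x, mean_y = sum(lx) / n, sum(ly) / n

# ordinary least squares on the log-log points
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly)) \
        / sum((a - mean_x) ** 2 for a in lx)
print(round(slope, 2))  # close to the exponent -2
```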
Bio: | Christos Faloutsos holds a Ph.D. degree in Computer Science from the University of Toronto, Canada. He is currently a professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), seven "best paper" awards, and four teaching awards. He has published over 130 refereed articles, one monograph, and holds four patents. His research interests include data mining, fractals, indexing in multimedia and bio-informatics databases, and database performance. |
10 July 2006, 11:00 AM
Title: | Dynamic Programming for Join Ordering Revisited (PDF) |
Speaker: | Guido Moerkotte, University of Mannheim |
Abstract: | Two approaches to derive dynamic programming algorithms for constructing join trees are described in the literature. We show analytically and experimentally that these two variants exhibit vastly diverging runtime behaviors for different query graphs. More specifically, each variant is superior to the other for one kind of query graph (chain or clique), but fails for the other. Moreover, neither of them handles star queries well. This motivates us to derive an algorithm that is superior to the two existing algorithms because it adapts to the search space implied by the query graph. |
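For readers unfamiliar with the technique, here is a minimal sketch of dynamic programming over subsets of relations (the general idea behind variants such as DPsize and DPsub, not the paper's adaptive algorithm). Cardinalities, selectivities, and the cost model are toy values.

```python
from itertools import combinations

card = {"R": 1000, "S": 100, "T": 10}                 # base-table sizes
sel = {frozenset("RS"): 0.01, frozenset("ST"): 0.1}   # join selectivities

def selectivity(a, b):
    # product of selectivities of all join predicates between the two sides
    s = 1.0
    for x in a:
        for y in b:
            s *= sel.get(frozenset([x, y]), 1.0)
    return s

def connected(a, b):
    return any(frozenset([x, y]) in sel for x in a for y in b)

rels = sorted(card)
# subset -> (output rows, total cost, plan); cost of a join = its output rows
best = {frozenset([r]): (card[r], 0.0, r) for r in rels}

for size in range(2, len(rels) + 1):
    for subset in map(frozenset, combinations(rels, size)):
        for k in range(1, size):
            for left in map(frozenset, combinations(sorted(subset), k)):
                right = subset - left
                if left in best and right in best and connected(left, right):
                    rows = best[left][0] * best[right][0] * selectivity(left, right)
                    cost = best[left][1] + best[right][1] + rows
                    if subset not in best or cost < best[subset][1]:
                        plan = f"({best[left][2]} JOIN {best[right][2]})"
                        best[subset] = (rows, cost, plan)

print(best[frozenset(rels)])  # cheapest plan joining R, S, and T
```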
Bio: | From 1981 to 1987 Guido Moerkotte studied computer science at the Universities of Dortmund, Massachusetts, and Karlsruhe. The University of Karlsruhe awarded him a Diploma (1987), a doctorate (1989), and a postdoctoral lecture qualification (1994). In 1994 he became an associate professor at the RWTH Aachen. Since 1996 he has held a full professor position at the University of Mannheim, where he heads the database research group. His research interests include databases and their applications, query optimization, and XML databases. Guido Moerkotte has (co-)authored more than 100 publications and three books. |
26 July 2006, 11:00 AM
Title: | A System for Data, Uncertainty, and Lineage (PDF) |
Speaker: | Jennifer Widom, Stanford University |
Abstract: |
Trio is a new type of database system that manages uncertainty and lineage of data as first-class concepts, along with the data itself. Uncertainty and lineage arise in a variety of data-intensive applications, including scientific and sensor data management, data cleaning and integration, and information extraction systems. This talk will survey our recent and current work in the Trio project: the extended-relational "ULDB" model upon which the Trio system is based, Trio's SQL-based query language (TriQL) including formal and operational semantics, a selection of new theoretical challenges and results, Trio's initial prototype implementation, and our planned research directions.
Trio web site: http://www-db.stanford.edu/trio/ |
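To give a flavor of what "uncertainty and lineage as first-class concepts" can mean (a speculative toy inspired by the ULDB description above, not Trio's actual model or syntax): each tuple carries mutually exclusive alternatives with confidences, and a derived tuple records which source alternatives produced it.

```python
saw = {  # witness sightings: tuple id -> alternatives (value, confidence)
    "s1": [(("Amy", "Honda"), 0.6), (("Amy", "Toyota"), 0.4)],
}
drives = {  # registration data
    "d1": [(("Jim", "Toyota"), 1.0)],
}

# Derived "suspects" table: join sightings with registrations on car make,
# multiplying confidences and keeping lineage to the source alternatives.
suspects = []
for sid, s_alts in saw.items():
    for i, ((witness, car1), p1) in enumerate(s_alts):
        for did, d_alts in drives.items():
            for j, ((driver, car2), p2) in enumerate(d_alts):
                if car1 == car2:
                    suspects.append({
                        "driver": driver,
                        "conf": p1 * p2,
                        "lineage": {(sid, i), (did, j)},
                    })

print(suspects)
# e.g. [{'driver': 'Jim', 'conf': 0.4, 'lineage': {('s1', 1), ('d1', 0)}}]
```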
Bio: | Jennifer Widom is a Professor in the Computer Science and Electrical Engineering Departments at Stanford University. She received her Bachelor's degree from the Indiana University School of Music in 1982 and her Computer Science Ph.D. from Cornell University in 1987. She was a Research Staff Member at the IBM Almaden Research Center before joining the Stanford faculty in 1993. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering, was a Guggenheim Fellow, and has served on a variety of program committees, advisory boards, and editorial boards. |