The Database Seminar Series provides a forum for presentation and discussion of interesting and current database issues. It complements our internal database meetings by bringing in external colleagues. The talks that are scheduled for 2002-2003 are below, and more will be listed as we get confirmations. Please send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC (Davis Centre) 1304. Coffee will be served 30 minutes before the talk.
We will try to post the presentation notes, whenever that is possible. Please click on the presentation title to access these notes (usually in pdf format).
Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
Klaus Dittrich |
Yelena Yesha |
Jai Shanmugasundaram |
Guozhu Dong |
Michael Kifer |
Björn Þór Jónsson |
Michael Franklin |
Luis Gravano |
Aidong Zhang |
Wolfgang Lehner |
Ricardo Baeza-Yates |
23 September 2002, 11:00 AM
Title: | SINGAPORE: Towards flexible querying of heterogeneous data sources (PDF) |
Speaker: | Klaus R. Dittrich, University of Zurich |
Abstract: | Data available on-line today is spread across heterogeneous data sources such as traditional databases or repositories of various forms containing unstructured and semistructured data. The merely "technical" availability of data is not at all sufficient for making meaningful use of existing information, and thus the problem of effectively and efficiently accessing and querying heterogeneous data is an important research issue. One popular approach is to integrate the data sources and offer users an a priori defined global schema. Alternatively, some approaches provide tools that let users define the query schema themselves. We propose a new approach where heterogeneous sources can be queried through a unified interface and underlying sources are integrated by means of a query language only. We present extensions to OQL that allow users to query structurally heterogeneous data (i.e., structured, semistructured, and unstructured data) alike, and to integrate data on the fly. We also present some details of query preprocessing and show how techniques from database and information retrieval systems can be combined. |
Bio: | Prof. Klaus Dittrich received his diploma degree (M.Sc.) in Computer Science from the University of Karlsruhe. He earned his Ph.D. in 1982 at the IPD (Institute for Program Structures and Data Organization). In 1984 he spent a year as a post-doctoral fellow at the IBM Almaden Research Center. He was head of the database department at the FZI Research Center for Information Technologies at the University of Karlsruhe from 1985 to 1989. Since 1989 he has been a Professor of Computer Science at the University of Zurich and head of the Database Technology Research Group. He spent a sabbatical at Stanford University, USA and Hewlett Packard Labs, USA (1996), and was a guest professor at Aalborg University, Denmark (1999). He is a member of and the current president of SI (Swiss Informaticians Society), and former president of IPEG (interuniversitäre Partnerschaft für Erdbeobachtung und Geoinformatik). He is also the secretary of the VLDB Endowment (Very Large Data Base Endowment Inc.). Until 1997 he was a member of the SIGMOD Advisory Committee. |
4 October 2002, 2:00 PM (Note the special time)
Title: | Profile Driven Data Management for Pervasive Environments (PDF) |
Speaker: | Yelena Yesha, University of Maryland, Baltimore County |
Abstract: | The past few years have seen significant work in mobile data management, typically based on the client/proxy/server model. Mobile/wireless devices are treated as clients that are data consumers only, while data sources are on servers that typically reside on the wired network. With the advent of "pervasive computing" environments, an alternative scenario arises where mobile devices gather and exchange data not just from wired sources, but also from their ethereal environment and from one another. This is accomplished using ad-hoc connectivity engendered by Bluetooth-like systems. In this new scenario, mobile devices become both data consumers and producers. We describe the new data management challenges that this scenario introduces. We describe the design and present a prototype implementation of our framework, MoGATU, which addresses these challenges. An important component of our approach is to treat each device as an autonomous entity with its own "goals" and "beliefs", expressed using a semantically rich language. We have implemented this framework over a combined Bluetooth and ad-hoc 802.11 network with clients running on a variety of mobile devices. We present experimental results that validate our approach and measure system performance. |
Bio: | Yelena Yesha received the B.Sc. degree in Computer Science from York University, Toronto, Canada in 1984, and the M.Sc. and Ph.D. degrees in Computer and Information Science from The Ohio State University in 1986 and 1989, respectively. Since 1989 she has been with the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County, where she is presently a Verizon Professor. In addition, from December 1994 through August 1999 Dr. Yesha served as the Director of the Center of Excellence in Space Data and Information Sciences at NASA. Her research interests are in the areas of distributed databases, distributed systems, mobile computing, digital libraries, electronic commerce, and trusted information systems. She has published 8 books and over 100 refereed articles in these areas. Dr. Yesha was a program chair and general co-chair of the ACM International Conference on Information and Knowledge Management and a member of the program committees of many prestigious conferences. She is a member of the editorial boards of the Very Large Databases Journal and the IEEE Transactions on Knowledge and Data Engineering, and is editor-in-chief of the International Journal of Digital Libraries. During 1994, Dr. Yesha was the Director of the Center for Applied Information Technology at the National Institute of Standards and Technology. Dr. Yesha is a senior member of IEEE and a member of the ACM. |
21 October 2002, 11:00 AM
Title: | Bridging Relational Technology and XML (PDF) |
Speaker: | Jayavel Shanmugasundaram, Cornell University |
Abstract: | XML has emerged as the standard data-exchange format for Internet-based business applications. These applications introduce a new set of data management requirements involving XML. However, for the foreseeable future, a significant amount of business data will continue to be stored in relational database systems. Thus, a bridge is needed to satisfy the requirements of these new XML-based applications while still leveraging relational database technology. In this talk, we shall describe the design and implementation of a middleware system that we believe achieves this goal. In particular, we shall describe a general framework for creating XML views of relational data, querying XML views, and storing and querying XML documents using a relational database system. Some of the interesting features of the system architecture are that it (a) provides users with a single XML query language for creating and querying XML views of relational data, (b) evaluates queries efficiently by pushing most computation down to the relational database engine, (c) allows users to query seamlessly over relational data and meta-data, and (d) allows users to write queries that span XML documents and XML views of relational data. |
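To make the idea of pushing computation down to the relational engine concrete, here is a minimal Python sketch using the standard-library `sqlite3` and `xml.etree.ElementTree` modules. It is not the middleware described in the talk: the `customer` table, the helper `publish_as_xml`, and the flat one-element-per-row XML shape are illustrative assumptions. Selection and projection run inside the database as SQL; only the final tagging into XML happens in the middleware layer.

```python
import sqlite3
import xml.etree.ElementTree as ET


def publish_as_xml(conn, sql, params=(), root_tag="result", row_tag="row"):
    """Run `sql` in the relational engine and tag the result rows as XML.

    Hypothetical helper: filtering and projection happen in SQL; Python only
    wraps each result row in an element with one child per column.
    """
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]  # column names of the result
    root = ET.Element(root_tag)
    for row in cur:
        row_el = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "Arlington"), (3, "Alan", "London")],
)

# Hypothetical XML-level request "customers in London", pushed down as SQL.
view = publish_as_xml(
    conn,
    "SELECT id, name FROM customer WHERE city = ? ORDER BY id",
    ("London",),
    root_tag="customers",
    row_tag="customer",
)
print(ET.tostring(view, encoding="unicode"))
```

A real system would translate a query written against the XML view into such SQL automatically; here that translation is done by hand.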
Bio: | Jayavel Shanmugasundaram is an Assistant Professor in the Department of Computer Science at Cornell University. He received his Ph.D. degree from the University of Wisconsin at Madison, a master's degree from the University of Massachusetts at Amherst, and a bachelor's degree from the Regional Engineering College at Tiruchirappalli, India. Shanmugasundaram's research interests include Internet data management, database systems, and query processing in emerging system architectures. He is the author of several publications and patents, and his research ideas have been implemented in commercial data management products. |
4 November 2002, 11:00 AM
Title: | Mining Knowledge about Changes, Differences, and Trends (PDF) |
Speaker: | Guozhu Dong, Wright State University |
Abstract: | Knowledge about changes, differences, and trends is very useful. For example, companies wish to identify important temporal changes and trends in customer purchase behavior, so that they can adjust their business priorities. Medical researchers wish to identify differences in gene group interactions between normal cell tissues and cancer cell tissues, so that they can discover better treatments for cancer. We discuss some recent results on mining such knowledge. We are concerned with transactional data, relational data, and data cubes. We consider emerging patterns that capture differences and changes between a pair of datasets, gradient patterns in a data cube that capture similar cells with large differences in measure values, and multidimensional, multi-level trends in sets of time series in a data cube context. We discuss mining algorithms and ways to use the patterns. |
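Emerging patterns are commonly defined through the growth rate of an itemset's support from one dataset to another. The sketch below assumes that definition (growth rate = support in the second dataset divided by support in the first; infinite when the itemset is absent from the first dataset but present in the second; zero when absent from both) and finds every itemset whose growth rate reaches a threshold `rho`. The toy transactions and the brute-force enumeration are illustrative only; practical miners avoid enumerating all itemsets.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def growth_rate(itemset, d1, d2):
    """Growth rate of `itemset` from dataset d1 to dataset d2."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return float("inf") if s2 > 0 else 0.0
    return s2 / s1


def emerging_patterns(d1, d2, rho, max_size=2):
    """Brute force: every itemset of up to `max_size` items with growth rate >= rho."""
    items = sorted(set().union(*d1, *d2))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            rate = growth_rate(frozenset(combo), d1, d2)
            if rate >= rho:
                found[combo] = rate
    return found


# Hypothetical transactions: d1 = earlier period, d2 = later period.
d1 = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}]
d2 = [{"a", "b"}, {"a", "b", "c"}, {"b"}, {"a", "b"}]
print(emerging_patterns(d1, d2, rho=2))  # → {('b',): 2.0, ('a', 'b'): 3.0}
```

Here the pair {a, b} rises from 1 of 4 transactions to 3 of 4, a growth rate of 3, even though item a alone has the same support in both datasets.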
Bio: | Guozhu Dong is an associate professor at Wright State University, USA. He received his PhD from the University of Southern California in 1988. He previously taught at the University of Melbourne and Flinders University, both in Australia, and consulted for Lucent Bell Labs and LIT Singapore. His main research interests are in the areas of databases, data mining, and bioinformatics. He has published over 80 articles in these areas. He has served on numerous program committees, including ICDE, ICDM, ICDT, PODS, SIGKDD, and VLDB. He is a program co-chair of the International Conference on Web-Age Information Management (2003), and is on the editorial board of the International Journal of Information Technology. |
2 December 2002, 11:00 AM
Title: | FLORA-2: Programming with Logic and Objects (PDF) |
Speaker: | Michael Kifer, SUNY at Stony Brook |
Abstract: | This talk is about a marriage of object-based and logic-based paradigms for programming knowledge-intensive applications. The product of this marriage is FLORA-2, which is both a seamless integration of Frame Logic, HiLog and Transaction Logic in a single formalism, and an implementation that adds important pragmatic extensions. Together they make a powerful knowledge programming language. Frame Logic relates to the object-oriented data model as classical predicate calculus relates to the relational data model. HiLog adds meta-programming, and Transaction Logic adds dynamics to the mix. Although FLORA-2 has been released only in alpha form, it is already very usable and has a following of dedicated users in the areas of information integration, the semantic web, information systems design, agent building, etc. |
Bio: | Michael Kifer is a Professor with the Department of Computer Science, State University of New York at Stony Brook (USA). He received his Ph.D. in Computer Science in 1985 from the Hebrew University of Jerusalem, Israel, and the M.S. degree in Mathematics in 1976 from Moscow University, Russia. Dr. Kifer's interests include database systems, knowledge representation, and Web information systems. He has published two textbooks and numerous articles in these areas. In 1999 and 2002 he received ACM SIGMOD "Test of Time" awards for his work on object-oriented database languages. |
21 January 2003, 1:00 PM, MC 5136 (Please note special date, time and place)
Title: | Practical Considerations for Semantic Cache Management (PDF) |
Speaker: | Björn Þór Jónsson, Reykjavik University |
Abstract: | The emergence of query-based on-line data services and e-commerce applications has prompted much recent research on data caching. This talk describes semantic caching, a caching architecture for such applications that caches the results of selection queries. Unlike most previous approaches to caching query results, data is not replicated in the semantic cache, thus improving the utility of the cache. Furthermore, partial results are re-used, reducing network traffic. The focus of the talk is on two performance studies using a prototype implementation that connects to a commercial relational server. The first study focuses on relatively simple selection workloads and demonstrates several intrinsic benefits of semantic caching, including low overhead, insensitivity to the physical layout of the database, reduced network traffic, the ability to answer some queries without contacting the server, and the ability to incorporate application knowledge in replacement decisions. The second study focuses on complex selection workloads. It demonstrates that, despite the increased complexity of cache management, semantic caching works well in a wide range of network-constrained environments. |
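The split between what the cache can answer and what must go to the server can be illustrated for the simplest case: selections on a single numeric attribute, where each cached region is a half-open interval. This is a hypothetical one-dimensional sketch, not the prototype from the talk; the function `split_query` returns a probe part (answered locally) and remainder parts (sent to the server as new selection queries).

```python
def split_query(query, cached_regions):
    """Split a range selection [lo, hi) against cached half-open intervals.

    Returns (probe, remainder): probe intervals can be answered from the
    cache; remainder intervals must be fetched from the server.
    """
    lo, hi = query
    probe, remainder = [], []
    cursor = lo  # everything below cursor has already been assigned
    for c_lo, c_hi in sorted(cached_regions):
        if c_hi <= cursor or c_lo >= hi:
            continue  # region does not overlap the unassigned part of the query
        if c_lo > cursor:
            remainder.append((cursor, c_lo))  # gap before this cached region
        end = min(c_hi, hi)
        probe.append((max(c_lo, cursor), end))
        cursor = end
        if cursor >= hi:
            break
    if cursor < hi:
        remainder.append((cursor, hi))  # tail not covered by any cached region
    return probe, remainder


cache = [(0, 20), (30, 40)]  # hypothetical cached selections on one attribute
print(split_query((10, 50), cache))  # → ([(10, 20), (30, 40)], [(20, 30), (40, 50)])
print(split_query((5, 15), cache))   # → ([(5, 15)], [])
```

In the second call the remainder is empty, so the query can be answered without contacting the server; in the first, only the two uncovered gaps travel over the network.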
Bio: | Dr. Björn Þór Jónsson is an associate professor in the School of Computer Science at Reykjavík University, Iceland. His research focuses on database caching architectures and multimedia database systems, in particular image and text databases. He has taught classes on database theory and application, database tuning and advanced database systems. Björn received his Ph.D. degree in Computer Science from the University of Maryland, College Park in 1999. The subject of his thesis was "Application-Oriented Buffering and Caching Techniques". |
14 March 2003, 2:00 PM (Please note special date and time)
Title: | TelegraphCQ: Continuous Dataflow Processing for an Uncertain World (PDF) |
Speaker: | Michael Franklin, University of California, Berkeley |
Abstract: | Increasingly pervasive networks are leading towards a world where data is constantly in motion. In such a world, conventional techniques for query processing, which were developed under the assumption of a far more static and predictable computational environment, will not be sufficient. In response to this need, the Telegraph project at Berkeley has developed a suite of novel technologies for continuously adaptive query processing. We are currently building the next generation Telegraph system, called TelegraphCQ, which is focused on meeting the challenges that arise in handling large numbers of continuous queries over high-volume, highly-variable data streams. In this talk, I will describe the TelegraphCQ system architecture and its underlying technology, and report on our ongoing implementation effort leveraging the PostgreSQL open source code base. I will also discuss our overall research agenda, including related projects on high-volume XML filtering and query processing in ad hoc sensor networks. |
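One ingredient of handling large numbers of continuous queries is sharing predicate evaluation across them rather than testing each query separately for every arriving tuple. The sketch below illustrates that idea for a hypothetical set of queries of the form `value > constant` on one stream attribute: the constants are kept sorted, so one binary search per tuple finds every satisfied query. The class name `GroupedFilter` and its interface are assumptions for illustration, not TelegraphCQ code.

```python
import bisect


class GroupedFilter:
    """Shared evaluation of many continuous queries of the form `value > constant`.

    Hypothetical illustration: constants are kept sorted so a single binary
    search per arriving tuple finds every satisfied query.
    """

    def __init__(self):
        self._constants = []  # sorted predicate constants
        self._query_ids = []  # query ids, aligned with self._constants

    def add_query(self, query_id, constant):
        # Insert while keeping both lists sorted by constant.
        i = bisect.bisect_left(self._constants, constant)
        self._constants.insert(i, constant)
        self._query_ids.insert(i, query_id)

    def matches(self, value):
        # Every constant strictly below `value` satisfies `value > constant`.
        i = bisect.bisect_left(self._constants, value)
        return self._query_ids[:i]


gf = GroupedFilter()
gf.add_query("q1", 10)
gf.add_query("q2", 50)
gf.add_query("q3", 30)

for reading in (5, 40, 100):  # a tiny stand-in for an incoming stream
    print(reading, gf.matches(reading))
```

The cost per tuple grows with the logarithm of the number of registered queries plus the number of matches, instead of linearly with the number of queries.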
Bio: | Michael Franklin is an Associate Professor of Computer Science at the University of California, Berkeley. His research focuses on the architecture and performance of distributed databases and information systems. He received his Ph.D. from the University of Wisconsin, Madison in 1993. Previously, he was on the faculty at the University of Maryland, College Park, where he led projects on adaptive query processing and data dissemination. He served as Program Chair for the 2002 ACM SIGMOD Conference and is currently an Editor of ACM Transactions on Database Systems, Vice Chair of the SIGMOD Advisory Board, and a member of the Board of Trustees of the VLDB Endowment. He is also a technology advisor to the Mayfield Fund and sits on the technology advisory boards of several companies. |
14 April 2003, 11:00 AM
Title: | Hidden-Web Databases: Classification and Search (PDF) |
Speaker: | Luis Gravano, Columbia University |
Abstract: | Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Hence traditional search engines do not index this valuable information. One way to facilitate access to "hidden-web" databases is through commercial Yahoo!-like directories, which organize these databases manually into categories that users can browse. In this talk, I will describe a technique to automate the classification of hidden-web databases. Our technique adaptively probes the databases with queries derived from document classifiers, without retrieving any documents. A large-scale experimental evaluation over 130 real web databases indicates that our technique produces highly accurate database classification results using, on average, fewer than 200 queries of four words or fewer to classify a database. An alternative way to facilitate access to hidden-web databases is through "metasearchers," which provide a unified query interface to search many databases at once. For efficiency, a critical task for a metasearcher is the selection of the most promising databases to search for a query, a task that typically relies on statistical summaries of the database contents. In this talk, I will also describe a recent technique to derive content summaries from hidden-web databases. We exploit our probing-based classification algorithm to adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. We can then build content summaries from these topically-focused document samples. A large-scale experimental evaluation over a variety of databases indicates that our new content-summary construction technique is efficient and produces more accurate summaries than those from previously proposed strategies. |
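The probe-and-count idea can be sketched under stated assumptions: the number of matches a database reports for each category's probe queries has already been collected (`db_counts`); the coverage of a category is that count; the specificity of a child category is its parent's specificity times the child's share of the coverage among its siblings; and a database is pushed down into every child whose coverage and specificity reach the thresholds `tau_c` and `tau_s`. The category tree, counts, and thresholds below are hypothetical, and this is a simplified rendering rather than the speakers' implementation.

```python
def classify(db_counts, hierarchy, tau_c, tau_s, node="Root", spec=1.0):
    """Return the categories a database is assigned to (simplified sketch).

    db_counts: category -> number of matches the database reported for that
               category's probe queries (assumed to be collected already).
    hierarchy: category -> list of child categories.
    """
    children = hierarchy.get(node, [])
    total = sum(db_counts.get(c, 0) for c in children)
    assigned = []
    if total > 0:
        for child in children:
            coverage = db_counts.get(child, 0)
            child_spec = spec * coverage / total  # share of matches among siblings
            if coverage >= tau_c and child_spec >= tau_s:
                # Push the database down and keep classifying below this child.
                assigned.extend(
                    classify(db_counts, hierarchy, tau_c, tau_s, child, child_spec)
                )
    # If no child qualifies, the database stays at the current node.
    return assigned or [node]


hierarchy = {"Root": ["Health", "Sports", "Computers"], "Health": ["Diseases", "Fitness"]}
counts = {"Health": 900, "Sports": 50, "Computers": 50, "Diseases": 800, "Fitness": 100}
print(classify(counts, hierarchy, tau_c=100, tau_s=0.5))  # → ['Diseases']
```

Only match counts are needed, which is consistent with the abstract's point that classification happens without retrieving any documents.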
Bio: | Luis Gravano has been on the faculty of the Computer Science Department, Columbia University since September 1997, where he has been an associate professor since July 2002. From January through August 2001, Luis was a Senior Research Scientist at Google (while on leave from Columbia University). He received his Ph.D. degree in Computer Science from Stanford University in 1997. He also received an M.S. degree from Stanford University in 1994 and a B.S. degree from the Escuela Superior Latinoamericana de Informatica (ESLAI), Argentina in 1990. Luis is an associate editor of the ACM Transactions on Information Systems, as well as database program chair for the upcoming ACM CIKM 2004. Luis is also a recipient of a CAREER award from the National Science Foundation. This talk describes work performed jointly with Panos Ipeirotis (Columbia) and Mehran Sahami (Stanford/Google). |
12 May 2003, 11:00 AM
Title: | Bioinformatics: Gene Expression Data Analysis (PDF) |
Speaker: | Aidong Zhang, University at Buffalo |
Abstract: | DNA microarray technology provides a broad snapshot of the state of the cell by measuring the expression levels of thousands of genes simultaneously. It has already had a significant impact on the field of bioinformatics and poses a unique challenge: information in gene expression matrices is special in that the sample space and the gene space have very different dimensionality, and the data can be studied in either space. While most previous studies focus on clustering either genes or samples, it is interesting to ask whether we can partition the complete set of samples into exclusive groups (called phenotypes) and find a set of informative genes that manifest the phenotypes. Mining phenotypes and informative genes can give biologists valuable information for understanding the roles of genes and the phenotype structure of samples. In this talk, I will describe new techniques that simultaneously mine phenotypes and informative genes from gene expression data. These techniques integrate statistics, data mining, and machine learning methods in a unique fashion to achieve optimal solutions. |
Bio: | Aidong Zhang is a Professor in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She received her Ph.D. degree in computer science from Purdue University, West Lafayette, Indiana, in 1994. Her research interests include bioinformatics, multimedia systems, content-based image retrieval, geographical information systems, and data mining. She serves on the editorial boards of ACM Multimedia Systems, the International Journal of Multimedia Tools and Applications, the International Journal of Distributed and Parallel Databases, and ACM SIGMOD DiSC (Digital Symposium Collection). She was co-chair of the technical program committee for ACM Multimedia 2001. Dr. Zhang is a recipient of the National Science Foundation CAREER award and the SUNY Chancellor's Research Recognition award. |
7 July 2003, 11:00 AM
Title: | Database Support for Data Mining Applications |
Speaker: | Wolfgang Lehner, Technische Universität Dresden |
Abstract: | Database support for data mining has become an important research topic. Especially for large, high-dimensional data volumes, comprehensive support from the database side is necessary. In this talk I will focus on the data-intensive subproblem of aggregating high-dimensional data in all possible low-dimensional projections (for instance, estimating low-dimensional histograms), which occurs in several established data mining techniques. I will argue that existing OLAP SQL extensions are insufficient for high-dimensional data and propose a new SQL operator, which fits seamlessly into the set of existing OLAP group-by operators. The new SQL operator is presented from a SQL language as well as from an implementation point of view. Different methods implementing the operator will be outlined and discussed in the context of the prototype implementation within the Postgres database engine. Performance studies show that the operator yields a large speedup (up to a factor of 10) over existing methods provided by commercially available database systems. |
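To see why a dedicated operator can pay off, note that a full CUBE over d dimensions defines 2^d groupings, while the low-dimensional histograms described here need only the C(d, k) groupings of size k; for d = 20 and k = 2 that is 190 groupings instead of 2^20 = 1,048,576. The naive Python sketch below computes exactly those k-dimensional counts, one grouping per subset of dimensions. It illustrates the result such an operator produces, not the operator's implementation, and the data are made up.

```python
from collections import Counter
from itertools import combinations


def k_dim_projections(rows, dims, k):
    """Count tuples in every k-dimensional projection of `rows`.

    One grouping per subset of k dimensions, i.e. C(len(dims), k) groupings,
    each a Counter from projected value tuples to frequencies.
    """
    result = {}
    for subset in combinations(range(len(dims)), k):
        names = tuple(dims[i] for i in subset)
        result[names] = Counter(tuple(row[i] for i in subset) for row in rows)
    return result


rows = [(0, 1, 1), (0, 1, 0), (1, 1, 0)]  # made-up data with dimensions a, b, c
for names, hist in k_dim_projections(rows, ("a", "b", "c"), k=2).items():
    print(names, dict(hist))
```

Each inner `Counter` corresponds to one `GROUP BY` over a pair of dimensions; this sketch scans the data once per grouping, which is the kind of repeated work a shared operator can avoid.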
Bio: | Please see http://wwwdb.inf.tu-dresden.de |
31 July 2003, 11:00 AM; DC 1302 (Please note change of regular place)
Title: | Mining the Web: Search Engines (PDF) |
Speaker: | Ricardo Baeza-Yates, University of Chile |
Abstract: | The Web grows and evolves faster than we would like or expect, imposing scalability and relevance problems on Web search engines. In this talk we present how mining Web data and usage logs can improve a search engine in several ways: page ranking, indices, and interfaces. As a corollary we show several interesting relations among different Web characteristics: structure, dynamics, "quality", etc. Our results help us understand not only technical issues but also social ones, as the Web is the collaborative work of many people, a few publishing and all of them querying. |
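Link structure is one of the kinds of Web data that can feed page ranking. As an illustration only, here is the classic PageRank power iteration in Python; the abstract does not say which ranking method the talk uses, and the damping factor 0.85, the iteration count, and the three-page link graph are assumptions. Pages without out-links spread their rank uniformly, so the scores keep summing to 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical three-page graph
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C ranks highest because it receives links from both other pages; a production engine would combine such link-based scores with the usage-log and content signals the abstract mentions.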
Bio: | Ricardo Baeza-Yates obtained a Ph.D. in CS at the University of Waterloo, Canada, in 1989. In 1992 he was elected president of the Chilean Computer Science Society (SCCC), serving until 1995, and was elected again for 1997-98. In 1993 he received the Organization of American States award for young researchers in exact sciences. In 1994 he received the award for the best engineering research of the previous four years from the Institute of Engineers of Chile. In 1997, with two Brazilian colleagues, he obtained the COMPAQ prize for the best Brazilian research article in CS. He was recently elected to the IEEE CS Board of Governors for the period 2002-04. In 2002 he was appointed to the Chilean Academy of Sciences, the first person from computer science to achieve this position in Chile. Currently he is a professor at the CS department of the University of Chile, where he was chair from 1993 to 1995. He is also director of the Center for Web Research, a project funded by the Millennium Scientific Initiative. His research interests include information retrieval, algorithms, and information visualization. He is co-author of the book Modern Information Retrieval, published in 1999 by Addison-Wesley, co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991, and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992. |