The Database Seminar Series provides a forum for presentation and discussion of interesting and current database issues. It complements our internal database meetings by bringing in external colleagues. The talks that are scheduled for 2004-2005 are below, and more will be listed as we get confirmations. Please send your suggestions to M. Tamer Özsu.
Unless otherwise noted, all talks will be in room DC (Davis Centre) 1304. Coffee will be served 30 minutes before the talk.
We will try to post the presentation notes whenever possible. Please click on a presentation title to access the notes (usually in PDF format).
The Database Seminar Series is supported by iAnywhere Solutions, A Sybase Company.
C. Mohan
Jim Gray
Gustavo Alonso
Moshe Vardi
Dennis Shasha
Elisa Bertino
Minos N. Garofalakis
Jeffrey F. Naughton
Cristiana Amza
Gerhard Weikum
Patrick Valduriez
Esther Pacitti
20 September 2004, 11:00 AM
Title: | DBCache: A Project on Database Caching Support for Web Applications (PDF) |
Speaker: | C. Mohan, IBM Almaden Research Lab. |
Abstract: | In this talk, I will discuss IBM Almaden's DBCache project, whose goal is to support database caching for use with web-based applications. I will introduce a new database object called Cache Table that enables persistent caching of the full or partial content of a remote database table. The content of a cache table is either defined declaratively and populated in advance at setup time, or determined dynamically and populated on demand at query execution time. Dynamic cache tables exploit the characteristics of typical transactional web applications with a high volume of short transactions, simple equality predicates, and 3-4-way joins. Based on federated query processing capabilities, we developed a set of new technologies for database caching: cache tables, "Janus" (two-headed) query execution plans, cache constraints, and asynchronous cache population methods. Our solution supports transparent caching both at the edge of content delivery networks and in the middle tier of an enterprise application infrastructure, improving the response time, throughput, and scalability of transactional web applications. This is joint work with M. Altinel, C. Bornhoevd, S. Krishnamurthy, H. Pirahesh, and B. Reinwald. The paper on which this talk is based was published in the proceedings of the 29th International Conference on Very Large Data Bases (VLDB). |
Bio: | Please go here for the bio. |
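The demand-populated cache table described in the abstract can be pictured with a small sketch. This is not DBCache itself, only a hypothetical illustration of the miss-then-populate pattern; the names CacheTable and backend_lookup are invented for the example.

    # Hypothetical sketch of a demand-populated cache table (not the DBCache code).
    # A miss falls through to the remote table and populates the local cache, so
    # later equality-predicate lookups on the same key are answered locally.
    class CacheTable:
        def __init__(self, backend_lookup):
            self.rows = {}                        # locally cached rows, keyed by predicate value
            self.backend_lookup = backend_lookup  # stand-in for a query to the remote database

        def get(self, key):
            if key in self.rows:                  # cache hit: answer locally
                return self.rows[key]
            row = self.backend_lookup(key)        # cache miss: route to the backend database
            self.rows[key] = row                  # populate on demand at query execution time
            return row

    remote = {"cust42": ("Alice", "alice@example.com")}
    customers = CacheTable(remote.get)
    print(customers.get("cust42"))  # first call goes to the backend
    print(customers.get("cust42"))  # second call is served from the local cache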
4 October 2004, 4:30 PM, DC 1350 (This is an SCS Distinguished Lecture; please note the special time and place)
Title: | On-Line Science: The World-Wide Telescope as a Prototype for the New Computational Science |
Speaker: | Jim Gray, Microsoft Bay Area Research Center |
Abstract: | Computational science has historically meant simulation; but, there is an increasing role for analysis and mining of online scientific data. As a case in point, half of the world's astronomy data is public. The astronomy community is putting all that data on the Internet so that the Internet becomes the world's best telescope: it has the whole sky, in many bands, and in detail as good as the best 2-year-old telescopes. It is useable by all astronomers everywhere. This is the vision of the Virtual Observatory -- also called the World Wide Telescope (WWT). As one step along that path I have been working with the Sloan Digital Sky Survey and CalTech to federate their data in web services on the Internet, and to make it easy to ask questions of the database (see http://skyserver.sdss.org and http://skyquery.net). This talk explains the rationale for the WWT, discusses how we designed the database, and talks about some data mining tasks. It also describes computer science challenges of publishing, federating, and mining scientific data, and argues that XML web services are key to federating diverse data sources. |
Bio: | Jim Gray is part of Microsoft's research group. His work focuses on databases and transaction processing. Jim is active in the research community, is an ACM, NAE, NAS, and AAAS Fellow, and received the ACM Turing Award for his work on transaction processing. He edits a series of books on data management, and has been active in building online databases like http://TerraService.net and http://SkyServer.sdss.org. |
18 October 2004, 11:00 AM
Title: | Database replication for commodity database services (PDF) |
Speaker: | Gustavo Alonso, ETH Zürich |
Abstract: |
Database replication is a complex topic that has attracted many researchers but where there are comparatively few working systems. In this talk I will review several projects in our group that aim at implementing feasible replication solutions that combine performance and consistency. The talk will review database replication as a first step in identifying the problems that need to be addressed. This introduction will be followed by a review of commercial systems and related work to describe the current state of the art. Then the talk will present Ganymed, our most recent replication platform. Ganymed uses a lazy replication strategy that nevertheless appears to be eager to the clients using the replicated database. Unlike other middleware-based approaches, Ganymed has a very small footprint and imposes no restrictions on how transactions are submitted to the system, how the data is partitioned, or the type of queries involved. Ganymed is being used to scale out commercial engines (Oracle, DB2) by replicating their contents using open-source databases (PostgreSQL). It is also the basis for a number of ongoing projects on autonomic scale-out and on providing database services as commodity services. In the talk I will also discuss these projects and how the approach taken in Ganymed is a fundamental stepping stone to achieving these goals. |
Bio: | Gustavo Alonso is a professor in the Department of Computer Science at the Swiss Federal Institute of Technology in Zurich (ETHZ). Gustavo Alonso is from Madrid, Spain, where he completed his undergraduate studies in Telecommunications Engineering at the Madrid Technical University (UPM-ETSIT) in 1989. As a Fulbright student, he did his graduate studies in computer science (M.S. 1992, Ph.D. 1994) at the University of California at Santa Barbara. After graduating, he was a visiting scientist at the IBM Almaden Research Laboratory in San Jose, California, where he worked within the Exotica project in areas such as workflow management and transaction processing. In September 1995 he joined ETH, where he has since led several projects in databases, workflow management, replication, and advanced applications. Currently, Gustavo Alonso leads the Information and Communication Systems Research Group. The research interests of the group include Web Services, grid and cluster computing, databases, workflow management, scientific applications of database and workflow technology (for geographic, astronomical, and biochemical data), pervasive computing, and dynamic aspect-oriented programming. Gustavo Alonso is co-author of a recently published book on Web Services (Springer Verlag, Berlin 2004, ISBN 3-540-44008-9) and has participated in numerous conferences, panels, and projects related to the topic. He also regularly works as an independent consultant in enterprise application integration, Web Services, and middleware projects. |
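To illustrate the "lazy yet apparently eager" behaviour the abstract above describes, here is a toy scheduler sketch: updates commit at a master and propagate lazily, and each read is routed only to a replica that has already applied that client's last write. This is an assumption-laden sketch of the general pattern, not the Ganymed implementation.

    # Illustrative sketch of lazy replication that looks eager to clients
    # (in the spirit of the abstract above; not the actual Ganymed code).
    class Scheduler:
        def __init__(self, n_replicas):
            self.version = 0                         # global commit counter at the master
            self.replica_version = [0] * n_replicas  # how far each replica has applied

        def write(self):
            self.version += 1                        # commit at the master
            return self.version                      # client remembers its session version

        def propagate(self, replica, upto):
            self.replica_version[replica] = upto     # lazy, background refresh

        def read(self, session_version):
            # Route the read to any replica that has applied the client's last
            # write; to the client this is indistinguishable from eager replication.
            for r, v in enumerate(self.replica_version):
                if v >= session_version:
                    return r
            return None  # no replica is fresh enough yet: the read must wait

    s = Scheduler(2)
    v = s.write()
    print(s.read(v))      # None: no replica has applied the write yet
    s.propagate(0, v)
    print(s.read(v))      # 0: replica 0 has caught up and may serve the read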
22 November 2004, 11:00 AM
Title: | A Call to Regularity (PDF) |
Speaker: | Moshe Vardi, Rice University |
Abstract: |
From the mid-1980s to the mid-1990s, a major theme in database theory was the study of Datalog, the language of logic programs without function symbols. Unfortunately, most decision problems involving Datalog turned out to be undecidable. Furthermore, well-behaved fragments of Datalog turned out to be rather restrictive and unnatural. Surprisingly, a natural and quite general fragment of Datalog did emerge in the mid-1990s, in the context of semistructured data. This fragment is the class of regular queries, whose basic element is the regular path query. In this talk I will describe the class of regular queries and its well-behavedness in the context of view-based query processing. |
Bio: | Moshe Y. Vardi is the George Professor in Computational Engineering and Director of the Computer and Information Technology Institute at Rice University. He chaired the Computer Science Department at Rice University from January 1994 until June 2002. Prior to joining Rice in 1993, he was at the IBM Almaden Research Center, where he managed the Mathematics and Related Computer Science Department. His research interests include database systems, computational-complexity theory, multi-agent systems, and design specification and verification. Vardi received his Ph.D. from the Hebrew University of Jerusalem in 1981. He is the author or co-author of over 250 technical papers, as well as a book titled "Reasoning about Knowledge". Vardi is the recipient of three IBM Outstanding Innovation Awards and a co-winner of the 2000 Gödel Prize. He holds honorary doctorates from the University of Saarland, Germany, and the University of Orleans, France. Vardi is an editor of several international journals and the president of the International Federation of Computational Logicians. He is a Fellow of the Association for Computing Machinery, the American Association for the Advancement of Science, and the American Association for Artificial Intelligence. He was designated a Highly Cited Researcher by the Institute for Scientific Information, and was elected as a member of the US National Academy of Engineering and the European Academy of Sciences. |
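For readers unfamiliar with regular path queries: such a query selects graph nodes connected to a source by a path whose edge labels spell a word in a regular language. Below is a minimal sketch of the textbook evaluation strategy, searching the product of the graph and an automaton; the graph, the expression a*b, and its automaton are invented for illustration.

    # Sketch: evaluate the regular path query a*b over an edge-labeled graph
    # by breadth-first search on (graph node, automaton state) pairs.
    from collections import deque

    graph = {1: [("a", 2)], 2: [("a", 3), ("b", 4)], 3: [("b", 4)]}

    # NFA for a*b: state 0 --a--> 0, state 0 --b--> 1 (accepting)
    delta = {(0, "a"): 0, (0, "b"): 1}
    accepting = {1}

    def rpq(source):
        seen, answers = {(source, 0)}, set()
        queue = deque([(source, 0)])
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add(node)          # reached via a path matching a*b
            for label, succ in graph.get(node, []):
                nxt = delta.get((state, label))
                if nxt is not None and (succ, nxt) not in seen:
                    seen.add((succ, nxt))
                    queue.append((succ, nxt))
        return answers

    print(rpq(1))  # nodes reachable from 1 along a*b: {4}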
17 January 2005, 11:00 AM
Title: | Privacy-Preserving Piracy Prevention |
Speaker: | Dennis Shasha, New York University |
Abstract: |
Software, whether computer programs, films, books, or DVDs, is currently easy to steal. It can be copied, modified, and run in spite of the best protections available. The net result is that software authors and performing artists are often not paid by the consumers of their work. Efforts to combat this theft have been based on technologies that don't work and that entail the threat of criminal penalties and gross invasions of privacy.
The Shield system is a flexible framework that can prevent most piracy, preserve privacy, and allow even more flexibility than current systems provide (e.g., no regional zones for DVDs). The framework requires that the operating system and hardware protect the piracy detection software, which performs episodic communications with a security infrastructure. The protocols are designed to prevent anyone from learning which software a user device has, even when the user device attempts to use pirated software. Joint work with Michael Rabin. |
Bio: |
Dennis Shasha is a professor of computer science at the Courant Institute of NYU, where he works with biologists on pattern discovery for microarrays, combinatorial design, and network inference; and with physicists and financial people on algorithms for time series. Other areas of interest include database tuning, tree and graph matching, and cryptographic file systems. In his spare time, he has written three books of puzzles, a biography of great computer scientists, and technical books about database tuning, biological pattern recognition, and time series. He also writes the puzzle column for Scientific American and Dr. Dobb's Journal.
After graduating from Yale in 1977, he worked for IBM designing circuits and microcode for the 3090. He completed his Ph.D. at Harvard in 1984 (thesis advisor: Nat Goodman). Since he enjoys typing, he has written a few books: a professional reference book, Database Tuning: principles, experiments, and troubleshooting techniques (2002, Morgan Kaufmann); three books about a mathematical detective named Dr. Ecco, entitled The Puzzling Adventures of Dr. Ecco (1988, Freeman; republished in 1998 by Dover), Codes, Puzzles, and Conspiracy (1992, Freeman), and Dr. Ecco's Cyberpuzzles (2002, W. W. Norton); a book of biographies about great computer scientists called Out of Their Minds: the lives and discoveries of 15 great computer scientists (1995, Copernicus/Springer-Verlag); and Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications (1999, Oxford University Press). In addition, he has co-authored thirty journal papers, forty conference papers, and six patents. |
21 February 2005, 11:00 AM, DC 1302 (Please note the special place)
Title: | Purpose-Based Database Access Control for Privacy Protection |
Speaker: | Elisa Bertino, Cerias, Purdue University |
Abstract: |
As privacy becomes a major concern for both consumers and enterprises, many privacy-protecting access control models have been proposed. Privacy protection, however, cannot be achieved by traditional access control models; the main reason is that the nature of privacy does not fit into them. While traditional access control policies mainly concern which users perform which action on which data object, privacy policies instead concern which data object is used for what purpose(s). For example, a typical privacy policy such as "we will collect and use customer identifiable information for billing purposes and to anticipate and resolve problems with your service" does not state who can access the customer information, but states that the information can be accessed for the purposes of billing, customer service, and possibly some analysis. Another difficulty of privacy protection is that definitions of privacy vary from individual to individual. In this talk we explore issues related to annotating data with purpose-related metadata and to using these metadata to support new forms of access control.
This is joint work with Ninghui Li and Ji-Won Byun (CERIAS, Purdue University). |
Bio: |
Elisa Bertino is professor of Computer Science and of Electrical and Computer Engineering (joint appointment) at Purdue University and serves as Research Director of CERIAS. She is also a faculty member at the Department of Computer Science and Communication of the University of Milan, where she is the director of the DB&SEC laboratory. She has been a visiting researcher at the IBM Research Laboratory (now Almaden) in San Jose, at the Microelectronics and Computer Technology Corporation, at Rutgers University, and at Telcordia Technologies.
Her main research interests include security, privacy, database systems, object-oriented technology, and multimedia systems. In those areas, Prof. Bertino has published more than 250 papers in all major refereed journals and in proceedings of international conferences and symposia. Her research has been funded by several entities and organizations in both the USA and Europe, including NSF, the European Union under the 5th and 6th Research Program Frameworks, IBM, Telcordia, Microsoft, and Italian Telecom. She is a co-author of the books "Object-Oriented Database Systems - Concepts and Architectures" (Addison-Wesley, 1993), "Indexing Techniques for Advanced Database Systems" (Kluwer Academic Publishers, 1997), and "Intelligent Database Systems" (Addison-Wesley, 2001). She is a co-editor-in-chief of the Very Large Data Bases (VLDB) Journal and a member of the advisory board of the IEEE Transactions on Knowledge and Data Engineering. She also serves on the editorial boards of several scientific journals, including IEEE Internet Computing, ACM Transactions on Information and System Security, IEEE Transactions on Dependable and Secure Computing, the Journal of Computer Security, Data & Knowledge Engineering, the International Journal of Cooperative Information Systems, and Science of Computer Programming. She has been a consultant to several Italian companies on data management systems and applications and has given several courses to industry. She is involved in several projects sponsored by the EU. She has served as a Program Committee member of several international conferences, such as ACM SIGMOD, VLDB, and ACM OOPSLA; as Program Co-Chair of the 1998 IEEE International Conference on Data Engineering (ICDE); as program chair of the 2000 European Conference on Object-Oriented Programming (ECOOP 2000); as program chair of the 7th ACM Symposium on Access Control Models and Technologies (SACMAT 2002); and as program chair of the 2004 International Conference on Extending Database Technology (EDBT 2004). Elisa Bertino is a Fellow of the IEEE and a Fellow of the ACM, and has been named a Golden Core Member for her service to the IEEE Computer Society. She received the 2002 IEEE Computer Society Technical Achievement Award "for outstanding contributions to database systems and database security and advanced data management systems". |
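The purpose-based model sketched in the abstract above can be illustrated with a toy example (hypothetical, not the authors' actual formalization): data items carry intended-purpose metadata, requests carry an access purpose, and access is granted when the access purpose falls under an intended purpose in a purpose hierarchy.

    # Toy sketch of purpose-based access control (illustrative only).
    # A purpose tree: an access purpose is allowed if it equals, or descends
    # from, one of the intended purposes attached to the data item.
    parent = {"billing": "admin", "customer-service": "admin",
              "problem-resolution": "customer-service",
              "marketing": None, "admin": None}

    def implied_by(access_purpose, intended):
        p = access_purpose
        while p is not None:
            if p == intended:
                return True
            p = parent.get(p)   # walk up the purpose hierarchy
        return False

    def allowed(access_purpose, intended_purposes):
        return any(implied_by(access_purpose, ip) for ip in intended_purposes)

    phone_number = {"value": "555-0100",
                    "intended": {"billing", "customer-service"}}  # purpose metadata

    print(allowed("problem-resolution", phone_number["intended"]))  # True
    print(allowed("marketing", phone_number["intended"]))           # False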
11 April 2005, 11:30 AM, DC 1302 (Please note the special time and place)
Title: | Sketching Massive Distributed Data Streams |
Speaker: | Minos N. Garofalakis, Bell Labs |
Abstract: |
There is growing interest in algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure.
In this talk, I will give an overview of some key small-space pseudo-random sketching techniques for effective query processing over streaming data. The focus will be on some of our own efforts in the area, including our very recent work on communication-efficient continuous query tracking over distributed data streams. |
Bio: | Minos Garofalakis is a Member of Technical Staff in the Internet Management Research Department of Bell Labs, Lucent Technologies. He received his PhD in Computer Science from the University of Wisconsin-Madison in 1998. His current research interests lie in the areas of data streaming, data reduction and approximate query processing, network management, and XML databases. Minos has served on a number of program committees for international database and data-mining conferences, including ACM SIGMOD, ACM PODS, VLDB, and ACM SIGKDD; he also currently serves as an Associate Editor for the IEEE Data Engineering Bulletin. |
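As one concrete example of the small-space pseudo-random sketches the abstract above alludes to, here is a minimal Count-Min sketch, a well-known streaming synopsis (the talk's own techniques may differ): item frequencies are folded into a few hashed counter rows, and a point query returns the row minimum, which never underestimates.

    # Minimal Count-Min sketch: a small-space randomized synopsis for
    # approximate frequency queries over a stream (illustrative example).
    import random

    class CountMin:
        def __init__(self, width, depth, seed=7):
            rnd = random.Random(seed)
            self.width = width
            self.seeds = [rnd.getrandbits(32) for _ in range(depth)]
            self.table = [[0] * width for _ in range(depth)]

        def _cell(self, seed, item):
            return hash((seed, item)) % self.width

        def add(self, item, count=1):
            for row, seed in zip(self.table, self.seeds):
                row[self._cell(seed, item)] += count

        def estimate(self, item):
            # Never underestimates; the overestimate is bounded by collisions.
            return min(row[self._cell(seed, item)]
                       for row, seed in zip(self.table, self.seeds))

    cm = CountMin(width=256, depth=4)
    for packet_src in ["10.0.0.1"] * 90 + ["10.0.0.2"] * 10:
        cm.add(packet_src)
    print(cm.estimate("10.0.0.1"))  # close to the true count of 90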
9 May 2005, 11:00 AM, DC 1302 (Please note the special place)
Title: | Progress Indicators and Future Directions |
Speaker: | Jeffrey F. Naughton, University of Wisconsin |
Abstract: | This talk will consist of two largely disjoint portions - the first will be a discussion of some recent work we have done on providing progress indicators for SQL queries. Many applications provide progress indicators (think of file downloads and the like), but providing similar functionality for SQL queries is perhaps surprisingly challenging and interesting, with close ties to query optimization. In the second part of the talk, time permitting, I will speculate on future directions for DBMS research. |
Bio: | Jeffrey F. Naughton earned a bachelor's degree in Mathematics from the University of Wisconsin-Madison, and a PhD in Computer Science from Stanford University. He served as a faculty member in the Computer Science Department of Princeton University before moving to the University of Wisconsin-Madison, where he is currently Professor of Computer Science. He has published over 100 technical papers, received the National Science Foundation's Presidential Young Investigator award in 1991, and was named an ACM Fellow in 2002. |
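A toy version of a SQL progress indicator might look as follows; this is a hypothetical sketch, not the speaker's estimator. The optimizer's cardinality estimates serve as a work budget, observed tuple counts measure work done, and the budget is revised when an operator overruns its estimate, which is where the ties to query optimization appear.

    # Toy SQL-query progress indicator: optimizer cardinality estimates give
    # a total work budget; observed tuple counts give work done so far.
    class ProgressIndicator:
        def __init__(self, estimated_tuples_per_op):
            self.estimate = dict(estimated_tuples_per_op)  # op -> estimated tuples
            self.done = {op: 0 for op in self.estimate}

        def observe(self, op, tuples):
            self.done[op] += tuples
            # If an operator exceeds its estimate, revise the budget upward;
            # such re-estimation is where query optimization enters the picture.
            if self.done[op] > self.estimate[op]:
                self.estimate[op] = self.done[op]

        def percent(self):
            total = sum(self.estimate.values())
            return 100.0 * sum(self.done.values()) / total

    p = ProgressIndicator({"scan": 1000, "join": 500})
    p.observe("scan", 600)
    print(round(p.percent()))   # 40
    p.observe("join", 900)      # estimate was too low; budget grows to 900
    print(round(p.percent()))   # about 79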
6 June 2005, 11:00 AM, DC 3313 (Please note the special place)
Title: | Distributed Versioning Techniques for Scaling the Back-end Database Tier in Dynamic Content Server Clusters |
Speaker: | Cristiana Amza, University of Toronto |
Abstract: |
In this presentation I will introduce and compare two novel database replication techniques based on distributed versioning for dynamic scaling and availability in the database tier of dynamic content web servers. Our techniques provide both throughput scaling and strong data consistency (1-copy serializability) for the database tier. It has generally been believed that this combination is hard to achieve through replication because of the growth of the number of conflicts. As a result, state-of-the-art dynamic content web servers rely on a single very expensive database supercomputer to satisfy the volume of requests.
We exploit the different distributed data versions that naturally come about as a result of asynchronous data replication in order to increase concurrency by running conflicting transactions in parallel on different replicas. The two distributed versioning variations I will present have different concurrency versus transparency trade-offs. The first replication scheme avoids making any changes to the database or web server to support replication by implementing concurrency control and conflict avoidance in an intermediate scheduler tier. In the second scheme, the role of the scheduler tier is minimal, while the database tier is modified to closely integrate replication with fine-grained concurrency control. Our evaluation of distributed versioning replication uses the TPC-W e-commerce benchmark. For small clusters of up to eight database replicas, we perform measurements of a web site implementing the TPC-W specification. We use simulation to extend our measurement results to larger clusters. We show close-to-ideal scaling for the most common e-commerce workload, the shopping mix, up to eight database replicas in the experiments, and good scaling up to 60 database replicas in simulation. |
Bio: | Cristiana Amza obtained her Ph.D. in Computer Science from Rice University in May 2003, her M.S. in Computer Science also from Rice University in 1997 and her B.S. in Computer Engineering from Bucharest Polytechnic Institute in 1991. Her Ph.D. thesis work focused on scaling dynamic content web servers using database replication on a commodity cluster. Dr. Amza joined the University of Toronto in October 2003 as an Assistant Professor in the Department of Electrical and Computer Engineering, where she is continuing her research on dynamic content servers with a new emphasis on automatic reconfiguration for performance tuning and failure transparency. Due to the potential commercial applications of her research, she is currently collaborating with the IBM Toronto Lab Tivoli and DB2 High Availability teams through a recently initiated IBM CAS program. |
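A toy sketch of the scheduler-based variant described in the abstract above (illustrative only, not the actual system): the scheduler hands out per-table version numbers, and a replica may serve a read only once it has applied all writes up to the version the read was assigned.

    # Toy sketch of scheduler-based distributed versioning (illustrative).
    class VersionScheduler:
        def __init__(self, tables):
            self.latest = {t: 0 for t in tables}   # last version assigned per table

        def begin_write(self, table):
            self.latest[table] += 1                # serialize writers per table
            return self.latest[table]

        def begin_read(self, table):
            return self.latest[table]              # read the newest assigned version

    class Replica:
        def __init__(self, tables):
            self.applied = {t: 0 for t in tables}  # versions applied locally

        def can_serve(self, table, version):
            # 1-copy serializability: only serve reads the replica is current for.
            return self.applied[table] >= version

    sched = VersionScheduler(["orders"])
    r1 = Replica(["orders"])
    v = sched.begin_write("orders")      # a client writes orders -> version 1
    print(r1.can_serve("orders", sched.begin_read("orders")))  # False: not yet applied
    r1.applied["orders"] = v             # replication applies the write
    print(r1.can_serve("orders", sched.begin_read("orders")))  # True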
20 June 2005, 11:00 AM
Title: | Efficient Top-k Queries for XML Information Retrieval |
Speaker: | Gerhard Weikum, Max-Planck-Institut für Informatik |
Abstract: |
Non-schematic XML data that comes from many different sources and inevitably exhibits heterogeneous structures and annotations (i.e., XML tags) cannot be adequately searched using database query languages like XPath or XQuery. Often, queries either return too many or too few results. Rather, the ranked-retrieval paradigm is called for, with relaxable search conditions, various forms of similarity predicates on tags and contents, and quantitative relevance scoring.
The method of choice for top-k similarity queries is the TA family of threshold algorithms. Large disk-resident index lists for content terms and tag names strongly suggest using the TA variant with sorted access only, for disk I/O efficiency. This method maintains score intervals for top-k candidates and performs a conservative threshold test based on upper bounds of candidate scores. The talk will present the TA-prob method, which applies a probabilistic threshold test for fast approximation of top-k results. It is based on score predictors that compute convolutions of score distributions in the yet-to-be-scanned tails of index lists. This method provides an order-of-magnitude improvement in run-time at high and probabilistically controllable levels of precision and recall relative to the TA baseline. TA-prob has been further enhanced to cope with ontology-based query expansion; by using a new incremental merge technique over index lists for expansion terms, it is much more efficient than the traditional techniques that expand queries before execution starts. TA-prob is also the major building block in the TopX search engine for XML data. TopX supports path conditions for all XPath axes as exact constraints, and content term conditions, as well as ontology-based relaxation of terms and tag names, as similarity conditions for ranked retrieval. |
Bio: |
Gerhard Weikum is a Research Director at the Max-Planck Institute of Computer Science in Saarbruecken, Germany, where he is leading the research group on databases and information systems. He is also the spokesperson of the International Max-Planck Research School (IMPRS) for Computer Science. He received his diploma and doctoral degrees from the University of Darmstadt, Germany, in 1982 and 1986.
Dr. Weikum's recent working areas include implementation, optimization, and self-organization aspects of distributed information systems such as peer-to-peer systems, dependable workflow management, and intelligent search and organization of semi-structured data on the Web and in intranets. Earlier he held positions at the University of the Saarland in Saarbruecken, Germany, at ETH Zurich, Switzerland, and with MCC in Austin, Texas, and he was a visiting senior researcher at Microsoft Research in Redmond, Washington. Dr. Weikum has published more than 100 refereed articles in conferences and journals. He has written a textbook on Transactional Information Systems, jointly authored with Gottfried Vossen and published by Morgan Kaufmann. A paper on self-tuning load control for transaction processing systems that appeared in the 1992 VLDB conference received the 2002 VLDB 10-Year Award for the most influential paper of the conference ten years earlier. Dr. Weikum has served on the editorial boards of various journals, including ACM TODS, the premier scientific journal of the database research community, and IEEE-CS TKDE. He is president of the board of trustees of the VLDB Endowment, the steering committee of the International Conference on Very Large Data Bases, one of the two leading international conferences on database and information systems research. He has served on committees of numerous conferences in various roles, including as General Co-Chair of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR) in 2005, as PC Chair of the ACM SIGMOD International Conference on Management of Data in 2004, as PC Co-Chair of the IEEE-CS International Conference on Data Engineering in 2000, and as organizer of several Dagstuhl Seminars. Dr. Weikum was an invited tutorial speaker at the 5th International Conference on Database Theory (ICDT) in 1995, and an invited keynote speaker at the 6th International Conference on Extending Database Technology (EDBT) in 1998 and the 23rd International Conference on Conceptual Modeling in 2004, as well as at the French, German, Indian, and Italian database conferences. |
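The bookkeeping behind the sorted-access-only TA variant mentioned in the abstract above can be sketched in a few lines. This is a simplified NRA-style top-1 illustration with an invented two-term example; TA-prob replaces the conservative stop test below with a probabilistic one.

    # Simplified top-1 with sorted access over descending index lists,
    # illustrating the conservative threshold test described in the abstract.
    lists = {
        "t1": [("a", 0.9), ("b", 0.8), ("c", 0.1)],
        "t2": [("b", 0.9), ("a", 0.5), ("c", 0.4)],
    }

    def top1():
        seen = {}        # item -> {term: score seen so far}
        depth = 0
        while depth < max(len(l) for l in lists.values()):
            for term, lst in lists.items():           # one round of sorted accesses
                if depth < len(lst):
                    item, score = lst[depth]
                    seen.setdefault(item, {})[term] = score
            depth += 1
            # Upper bound for unseen scores: the score at the current scan depth.
            frontier = {t: (l[depth][1] if depth < len(l) else 0.0)
                        for t, l in lists.items()}
            def worst(it): return sum(seen[it].values())
            def best(it): return sum(seen[it].get(t, frontier[t]) for t in lists)
            cand = max(seen, key=worst)
            # Conservative stop: no candidate, seen or unseen, can still beat
            # the current best candidate's worst-case (lower-bound) score.
            if (all(worst(cand) >= best(o) for o in seen if o != cand)
                    and worst(cand) >= sum(frontier.values())):
                return cand, worst(cand)
        cand = max(seen, key=worst)
        return cand, worst(cand)

    print(top1())   # ('b', 1.7): b dominates once both of its scores are seen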
11 July 2005, 11:00 AM
Title: | Top-N Query Processing in an Unstructured P2P System |
Speaker: | Patrick Valduriez, INRIA-Rennes and Lina, University of Nantes |
Abstract: |
Top-n queries are attractive for users in P2P systems with very large numbers of peers but difficult to support efficiently. Solutions that rely on centralized statistics no longer apply in the general case. In this talk, we describe a fully distributed algorithm for executing Top-n queries in an unstructured P2P system. This work is done in the context of APPA (Atlas P2P Architecture), a P2P data management system we are building.
We present our algorithm for the case of unstructured systems, thus with minimal assumptions, so it could also apply to other P2P systems (DHT, super-peer). Our algorithm requires no global information, does not depend on the existence of certain peers, and addresses the volatility of peers during query execution. We validated our algorithm through implementation on a 64-node cluster and through simulation using the BRITE topology generator and SimJava. Our performance evaluation shows that our algorithm has logarithmic scale-up and significantly improves Top-n query response time by exploiting P2P parallelism. Joint work with Reza Akbarinia and Esther Pacitti. |
Bio: |
Patrick Valduriez is Director of Research at INRIA, the national research center for computer science in France, and scientific director of the Atlas research group (a joint team between INRIA and LINA, University of Nantes) pursuing research in data management in distributed systems. He received his Ph.D. degree and Doctorat d'Etat in Computer Science from the University Pierre et Marie Curie (Paris 6) in 1981 and 1985, respectively. Between 1999 and 2002, he was a professor of computer science at University Paris 6, on leave from INRIA. From 1989 to 1999, he led the Rodin group working on distributed database and multimedia technology. From 1995 to 2000, he also led Dyade, the R&D joint venture between Bull and INRIA to foster technology transfer in Internet-related areas (50 people). In Dyade, he contributed to the design and the transfer of the Disco (Distributed Information Search Component) prototype to the KelKoo startup. From 1985 to 1989, he was a senior scientist at Microelectronics and Computer Technology Corp. (MCC) in Austin, Texas, where he participated in the design and implementation of the Bubba parallel database system. |
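The flavour of a fully distributed top-n computation can be conveyed with a toy sketch (illustrative only, not the APPA algorithm): each peer computes a local top-n over its own data and results are merged hop by hop on the way back to the query originator; since each item lives at one peer, the global top-n is contained in the union of the local ones.

    # Toy sketch of distributed top-n over an unstructured P2P overlay
    # (illustrative only): each peer returns its local top-n, and answers
    # are merged hop by hop on the way back to the query originator.
    import heapq

    peers = {
        "p1": {"x": 0.9, "y": 0.2},
        "p2": {"z": 0.7, "w": 0.6},
        "p3": {"v": 0.8, "u": 0.1},
    }
    neighbours = {"p1": ["p2"], "p2": ["p3"], "p3": []}

    def query(peer, n, seen=None):
        seen = seen or set()
        seen.add(peer)
        local = heapq.nlargest(n, peers[peer].items(), key=lambda kv: kv[1])
        merged = list(local)
        for nb in neighbours[peer]:         # forward to not-yet-visited neighbours
            if nb not in seen:
                merged += query(nb, n, seen)
        # keep only the best n while bubbling results back toward the origin
        return heapq.nlargest(n, merged, key=lambda kv: kv[1])

    print(query("p1", 2))   # [('x', 0.9), ('v', 0.8)]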
25 July 2005, 11:00 AM, DC 1302 (Please note the different location)
Title: | Preventive Replication in a Database Cluster |
Speaker: | Esther Pacitti, Université de Nantes |
Abstract: | In a database cluster, preventive replication can provide strong consistency without the limitations of synchronous replication. However, the original proposal [9] assumes full replication and has performance limitations. In this talk, we address these two limitations in order to scale up to large cluster configurations. Our first contribution is a refreshment algorithm that reduces the delay introduced by the algorithm and prevents inconsistencies for partially replicated databases. Our second contribution is an optimization that improves transaction throughput. We describe the implementation of our algorithm in our RepDB* prototype over a cluster of 64 nodes running PostgreSQL. Our experimental results using the TPC-C benchmark show that our algorithm has excellent scale-up and speed-up. |
Bio: | Esther Pacitti is associate professor at the University of Nantes. She obtained her Ph.D. degree in 1998 from PUC-Rio, Brazil. Her current research interests include data management in distributed systems, in particular, replication and query processing in cluster and P2P systems. |
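The preventive idea can be sketched under simple assumptions (this is a simplification, not the RepDB* code): transactions are timestamped at their origin, multicast to the replicas, and each replica delays execution by the assumed maximum message delay, then executes in timestamp order, so conflicting serialization orders are prevented rather than detected.

    # Simplified sketch of preventive refreshment: delay each transaction by
    # the assumed maximum message delay, then run strictly in timestamp order.
    import heapq

    MAX_DELAY = 0.2   # assumed bound on network delay plus clock skew (seconds)

    class Replica:
        def __init__(self):
            self.pending = []                    # min-heap ordered by origin timestamp

        def receive(self, timestamp, txn):
            heapq.heappush(self.pending, (timestamp, txn))

        def runnable(self, now):
            # A transaction is safe once no earlier-stamped transaction can
            # still be in flight, i.e. after MAX_DELAY has elapsed.
            ready = []
            while self.pending and self.pending[0][0] + MAX_DELAY <= now:
                ready.append(heapq.heappop(self.pending)[1])
            return ready                         # executed in global timestamp order

    r = Replica()
    r.receive(1.00, "T1: update stock")
    r.receive(0.95, "T2: update stock")          # arrived later, but stamped earlier
    print(r.runnable(now=1.30))                  # ['T2: update stock', 'T1: update stock']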