Georgia Institute of Technology
Real-Time Streaming String-Matching
Abstract: One of the most fundamental processes in computing is finding all instances of a pattern P, of length m, in an unpreprocessed text T, of length n. A number of algorithms have been published, including methods that work in real time (i.e., constant time per symbol), with constant additional space and without random access, and even on a finite automaton with one head reading T and 5 heads reading P.
A key area of application is in inspecting messages as they “stream by,” perhaps through a switch in the Internet. Under this streaming model neither the pattern nor parts of the text are stored and only small amount of space is available. Simple information theoretic arguments imply that no deterministic algorithm is possible in the streaming model and one can only hope for a randomized algorithm which makes errors with small probability. The algorithms mentioned above are not streaming since they store the pattern.
In 2009 Porat and Porat described a surprising result: a randomized O(n log m) streaming algorithm that uses O(log m) space. Each space unit is a register of O(log n) bits. The pattern is first preprocessed and certain information is computed and stored in O(log m) space. Porat and Porat do not give a streaming algorithm for the preprocessing and it is not clear whether such an algorithm is possible for their preprocessing. In addition their randomized algorithm may have two sided errors with small probability; it may miss occurrences of the pattern as well as report occurrences that do not occur.
We design two algorithms for the problem. The first one, like Porat and Porat, is an O(n log m) time streaming randomized algorithms using O(log m) space. The preprocessing of our algorithm is a trivial real-time streaming algorithm. Our algorithm allows with small probability only one sided (false positive) errors, which are unavoidable. In particular, it never misses an occurrence. We then modify this algorithm to obtain a real-time O(log m) space streaming algorithm that uses a real time streaming preprocessing and allows only one sided errors. Porat and Porat used their algorithm as a black box to obtain streaming algorithms for the 1-mismatch and k-mismatch string matching problem. Using our algorithm we shave an O(log m) factor (an order of magnitude) from their time bounds.
This work is joint with Danny Breslauer.
Biography: Zvi Galil was born in Tel-Aviv, Israel. He earned his BS and MS degrees in Applied Mathematics from Tel Aviv University, both summa cum laude. He then obtained a PhD in Computer Science from Cornell University. After a post-doctorate in IBM's Thomas J. Watson research center, he returned to Israel and joined the faculty of Tel-Aviv University. He served as the chair of the Computer Science department in 1979-1982. In 1982 he joined the faculty of Columbia University. He served as the chair of the Computer Science Department in 1989-1994 and as dean of The Fu Foundation School of Engineering & Applied Science in 1995-2007.
Galil was appointed Julian Clarence Levi Professor of Mathematical Methods and Computer Science in 1987, and Morris and Alma A. Schapiro Dean of Engineering in 1995. In 2007 Galil returned to Tel Aviv University and served as president. In July 2010 he became The John P. Imlay, Jr. Dean of Computing at Georgia Tech.
Galil's research areas have been the design and analysis of algorithms, complexity, cryptography and experimental design. In 1983-1987 he served as chairman of ACM SIGACT, the Special Interest Group of Algorithms and Computation Theory.