Seminar in Data Structures (0368-3386-01) Fall 08/09

The seminar will focus mainly on indexing text and compressed text. In the last third we will cover some papers on simplex range searching.

 

In the first part on strings we will use two main references as well as several recent papers. The two main references are

1.      Algorithms on Strings, Trees and Sequences / D. Gusfield

2.      Compressed full-text indexes / G. Navarro and V. Makinen

In the second part some of the material is in the chapter of the classical book on computation geometry by de Berg, van Kreveld, Overmars, and Schwarzkopf: Computational geometry: Algorithms and Applications.

Tentative schedule

Indexing text

·         The algorithm of McCreight for constructing the suffix tree. The original paper is  A Space-Economical Suffix Tree Construction Algorithm / McCreight JACM 23, 1976.  Presentation by Aviv Avitan.

·         Udi Manber and Gene Myers (1991). "Suffix arrays: a new method for on-line string searches". SIAM Journal on Computing, Volume 22, Issue 5 (October 1993), pp. 935-948. Presentation by Yael  Schneebaum.

·         Rank and select in binary and general sequences. Based on Sections 6.1, 6.2, and 6.3 from ref (2) above.  Presentation by Omer Cohen

·         Introduction to the Burrows-Wheeler transform and compression algorithms based on it. Here are three good sources. 1)  The original technical report of Burrows and Wheeler: A block-sorting lossless data compression algorithm. 2) A paper by Fenwick: Burrows Wheeler Compression: Principles and Reflections. 3) A paper by Landau, Verbin, and myself: A Simpler Analysis of Burrows-Wheeler Based Compression. Presentation by Inna Katz

·         Compressed suffix arrays. Sections 7 and 8 of ref (2) above and the original papers mentioned there. In particular this all started when the first draft of the paper compressed suffix arrays and suffix trees with applications to text indexing and string matching  by Grossi and Vitter appeared.  Presentation by Michal Ofir

·         Backwards search and the FM-index. Section 9 of ref (2) above and the original papers mentioned there. The main original paper here is Indexing compressed text by Ferragina and Manzini. Presentation by Yuval Rikover

·         Boosting textual compression in optimal linear time by Ferrangina, Giancarlo, Manzini, Sciortino, 2005.  Presentation by  Maor Itzkovitch

Simplex range searching

·         Cutting tree and the cutting lemma. The proof of the cutting lemma can be found in the book by Matousek: Lectures on Discrete Geometry, the rest is in Section 16.3 of the computational geometry book mentioned above.  Presentation by Amir Mograbi

·         Partition trees. The paper Efficient partition trees by Matousek and Sections 16.1 and 16.2 of ref (2).  Presentation by Benny Schlesinger and Omer Tavori

·         The paper Reporting points in halfspaces by Matousek.

·         Cutting hyperplanes for Divide-and-Conquer by Chazelle.