Substring selectivity estimation. H. V. Jagadish, Raymond T. Ng and Divesh Srivastava. With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees as the basic framework for substring selectivity estimation. We present a novel technique to obtain a good estimate for a given substring matching query, called MO (for Maximal Overlap), that estimates the selectivity of a query based on all maximal substrings of the query in the pruned count-suffix tree. We show that MO is provably better than the (independence-based) substring selectivity estimation technique proposed by Krishnan et al. [KVI96], called KVI, under the natural assumption that strings exhibit the so-called ``short memory'' property. We complement our analysis with an experiment, using a real AT&T data set, that demonstrates that MO is substantially superior to KVI in the quality of the estimate. Finally, we develop and analyze two selectivity estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given pruned count-suffix tree. We show that KVI, MO, MOC and MOLC illustrate an interesting tradeoff between estimation accuracy and computational efficiency.