Despite the popularity of storing mathematical objects on the web, support for searching for mathematical expressions remains extremely limited. Conventional retrieval systems are inadequate for mathematical expressions because they are not tuned for text that has complex structure but only a few distinct terms. Surprisingly, current approaches to retrieving mathematical information do not include a formal definition of the similarity between two expressions, and thus fail to find many relevant documents. In this paper, we present steps toward bringing best practices from modern information retrieval to mathematics retrieval. We first review the encodings of mathematical expressions currently found on the web and present the results of our efforts to create an experimental testbed. We then formally define the similarity between two mathematical expressions and state the problem of searching for similar mathematical expressions.
