Oblivious Subspace Embedding

In an earlier post, we saw how the Johnson-Lindenstrauss Transform can reduce the dimensionality of a set of m points while preserving their pairwise distances. That's neat, but what if we need to preserve the pairwise distance between all (infinitely many) points in an entire subspace? This post introduces the concept of subspace embedding, which answers that question.

Epsilon Nets

A common technique in arguments in sketching is to appeal to the concept of an Epsilon Net. This article explains what that is and why it always exists.

Concentration of Sub-Exponential Random Variables

This post reviews the definition of sub-exponential random variables and discusses useful tail bounds (Bernstein) for this family.

Concentration of Sub-Gaussian Random Variables

This post reviews basic tail bounds (Markov, Chebyshev, Chernoff, Hoeffding) and explains the concept of sub-Gaussian random variables.

Johnson-Lindenstrauss Transform

The JL Lemma offers a simple and elegant machinery to reduce the dimensionality of a fixed set of high-dimensional points while approximately preserving their pairwise Euclidean distance. This post explains the concept and describes simple randomized constructions of such a transformation.

Schatten Norm Approximation

Randomized Numerical Linear Algebra is a gorgeous field that embraces chaos and whose results often seem wonderfully magical. My very first post on this subject explains a beautiful algorithm to approximate Schatten norms of a matrix.

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

This article describes a novel retrieval algorithm developed for collections of sparse vectors. This work, which is published in the proceedings of SIGIR'24, is the culmination of a joint collaboration with my brilliant colleagues Rossano Venturini (of the University of Pisa), Cosimo Rulli and Franco Maria Nardini (both of the Italian National Research Council, ISTI-CNR). Our proposed algorithm marks a seismic shift in retrieval over learnt sparse embeddings of text collections in accuracy and speed. We are all equally proud of this paper!

Foundations of Vector Retrieval

A monograph devoted to the theoretical foundations of nearest neighbor search, with a discussion of key data structures and algorithms from this vast literature.