11 September 2024
After more than a decade in the tech industry, I'm returning to academia. Here's why.
Concentration of Martingale Difference Sequences
9 September 2024
Sub-exponentiality is an important property, as we saw in earlier posts. We learnt, for example, that sums of independent sub-exponential random variables are themselves sub-exponential, allowing us to extend concentration results to sums. In this post, I'll review another important result that allows us to apply the same kinds of bounds (such as Hoeffding's inequality) to functions of independent random variables, under certain conditions.
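To preview the flavor of result involved, here is a small simulation (my own illustration, not code from the post, with arbitrarily chosen parameters) that checks the Azuma-Hoeffding bound on a simple ±1 random walk, a canonical martingale with bounded differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 100, 20000, 25

# S_n, the sum of n independent +/-1 steps, is a martingale with
# differences bounded by 1; Azuma-Hoeffding then gives
# P(|S_n| >= t) <= 2 * exp(-t^2 / (2n)).
steps = rng.choice([-1, 1], size=(trials, n))
S = steps.sum(axis=1)
empirical = np.mean(np.abs(S) >= t)
azuma_bound = 2 * np.exp(-t**2 / (2 * n))
```

The empirical tail probability stays below the bound, as the inequality promises.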
25 August 2024
In an earlier post, we saw how the Johnson-Lindenstrauss Transform can reduce the dimensionality of a set of m points while preserving their pairwise distances. That's neat, but what if we need to preserve the pairwise distances between all (infinitely many) points in an entire subspace? This post introduces the concept of subspace embedding, which answers that question.
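To make the idea concrete, here is a small numerical sketch (my own, with arbitrarily chosen dimensions): a Gaussian sketching matrix preserves the norm of every vector in a low-dimensional subspace simultaneously, which we can verify by checking that all singular values of the sketched basis are close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 10, 200   # ambient dim, subspace dim, sketch size

# Orthonormal basis U for a random d-dimensional subspace of R^n.
U, _ = np.linalg.qr(rng.normal(size=(n, d)))

# Gaussian sketch: k x n matrix with N(0, 1/k) entries.
S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))

# S embeds the subspace with distortion eps iff every singular value
# of S @ U lies in [1 - eps, 1 + eps]: then ||S U x|| approximates
# ||U x|| = ||x|| for all x in the subspace at once.
sv = np.linalg.svd(S @ U, compute_uv=False)
eps = max(sv.max() - 1.0, 1.0 - sv.min())
```

With k on the order of d, the distortion eps is small with high probability, even though the subspace contains infinitely many vectors.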
25 August 2024
A common technique in sketching arguments is to appeal to the concept of an Epsilon Net. This article explains what that is and why one always exists.
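As a toy illustration of the definition (my own sketch, not from the article): an epsilon-net for the unit circle is a finite set of points such that every point on the circle lies within epsilon of some net point, and a uniform angular grid suffices.

```python
import numpy as np

# Build an epsilon-net for the unit circle: net points spaced so that
# adjacent points are at most eps apart in Euclidean (chord) distance.
eps = 0.1
m = int(np.ceil(2 * np.pi / (2 * np.arcsin(eps / 2))))
angles = 2 * np.pi * np.arange(m) / m
net = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Every point on the circle is then within eps of some net point;
# spot-check with random points.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=1000)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
nearest = np.linalg.norm(pts[:, None, :] - net[None, :, :], axis=2).min(axis=1)
covered = bool((nearest <= eps).all())
```

Note the net is finite (here a few dozen points) even though it covers an uncountable set, which is precisely what makes nets useful in union-bound arguments.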
Concentration of Sub-Exponential Random Variables
23 August 2024
This post reviews the definition of sub-exponential random variables and discusses useful tail bounds (Bernstein) for this family.
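For a taste of the kind of bound discussed, here is a small simulation (my own sketch; the sub-exponential parameters (nu, b) = (2, 4) for a squared standard Gaussian are a standard textbook fact, and the other constants are arbitrary) comparing a Bernstein-type tail bound against the empirical tail of a chi-squared sum:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, delta = 100, 20000, 0.4

# X = Z^2 with Z ~ N(0, 1) is sub-exponential with parameters
# (nu, b) = (2, 4). Bernstein's inequality then gives
# P(|sum_i X_i - n| >= n * delta) <= 2 * exp(-(n/8) * min(delta^2, delta)).
Z = rng.normal(size=(trials, n))
S = (Z**2).sum(axis=1)
empirical = np.mean(np.abs(S - n) >= n * delta)
bernstein_bound = 2 * np.exp(-(n / 8) * min(delta**2, delta))
```

The bound exhibits the characteristic two-regime behavior: Gaussian-like (delta squared) for small deviations, exponential (linear in delta) for large ones.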
Concentration of Sub-Gaussian Random Variables
15 August 2024
This post reviews basic tail bounds (Markov, Chebyshev, Chernoff, Hoeffding) and explains the concept of sub-Gaussian random variables.
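As a quick illustration (my own sketch, not from the post, with arbitrary constants): bounded random variables are sub-Gaussian, so Hoeffding's inequality bounds the tail of their sample mean, which we can compare against simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 200, 20000, 0.1

# For X_i i.i.d. uniform on [0, 1] (bounded, hence sub-Gaussian),
# Hoeffding's inequality gives
# P(|X_bar - 1/2| >= t) <= 2 * exp(-2 * n * t^2).
samples = rng.random((trials, n))
empirical = np.mean(np.abs(samples.mean(axis=1) - 0.5) >= t)
hoeffding_bound = 2 * np.exp(-2 * n * t**2)
```

The empirical tail is far below the bound here, since Hoeffding uses only boundedness and ignores the variance of the summands.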
Johnson-Lindenstrauss Transform
12 August 2024
The JL Lemma offers simple and elegant machinery to reduce the dimensionality of a fixed set of high-dimensional points while approximately preserving their pairwise Euclidean distances. This post explains the concept and describes simple randomized constructions of such a transformation.
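A minimal sketch of one such randomized construction (my own illustration, with arbitrarily chosen dimensions): project with a dense Gaussian matrix scaled by 1/sqrt(k) and check that all pairwise distances are roughly preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 300   # number of points, original dim, target dim

X = rng.normal(size=(n, d))

# Dense Gaussian projection with N(0, 1/k) entries preserves
# squared Euclidean norms in expectation.
S = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
Y = X @ S

# Maximum relative distortion over all pairwise distances.
iu = np.triu_indices(n, 1)
D_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)[iu]
D_proj = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)[iu]
distortion = np.abs(D_proj / D_orig - 1.0).max()
```

Notably, the target dimension needed for a given distortion depends on the number of points (logarithmically) but not on the original dimension d.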
29 July 2024
Randomized Numerical Linear Algebra is a gorgeous field that embraces chaos and whose results often seem wonderfully magical. My very first post on this subject explains a beautiful algorithm to approximate Schatten norms of a matrix.
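To hint at the flavor of such algorithms (the sketch below is my own illustration, not necessarily the construction from the post): for even p, the p-th power of the Schatten-p norm equals tr((AᵀA)^(p/2)), which Hutchinson's estimator can approximate using only matrix-vector products with A and Aᵀ, never forming the singular values explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, samples = 100, 4, 500      # p must be even for this estimator
A = rng.normal(size=(n, n))

# Exact Schatten-p norm via SVD: (sum_i sigma_i^p)^(1/p).
sigma = np.linalg.svd(A, compute_uv=False)
exact = (sigma**p).sum() ** (1 / p)

# Matrix-free estimate: ||A||_p^p = tr((A^T A)^(p/2)) for even p,
# and Hutchinson's estimator approximates that trace with
# Rademacher probe vectors.
ests = np.empty(samples)
for s in range(samples):
    z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe
    v = z
    for _ in range(p // 2):
        v = A.T @ (A @ v)                 # apply A^T A repeatedly
    ests[s] = z @ v
approx = ests.mean() ** (1 / p)
rel_err = abs(approx - exact) / exact
```

The estimate lands within a few percent of the exact norm while touching A only through products, which is the point when A is huge or implicit.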
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations
11 July 2024
This article describes a novel retrieval algorithm developed for collections of sparse vectors. This work, published in the proceedings of SIGIR'24, is the culmination of a collaboration with my brilliant colleagues Rossano Venturini (of the University of Pisa), and Cosimo Rulli and Franco Maria Nardini (both of the Italian National Research Council, ISTI-CNR). Our proposed algorithm marks a seismic shift in the accuracy and speed of retrieval over learnt sparse embeddings of text collections. We are all equally proud of this paper!
Foundations of Vector Retrieval
1 May 2024
A monograph devoted to the theoretical foundations of nearest neighbor search, with a discussion of key data structures and algorithms from this vast literature.