11 September 2024

After more than a decade in the tech industry, I'm returning to academia. Here's why.

Concentration of Martingale Difference Sequences

09 September 2024

Sub-exponentiality is an important property, as we saw in earlier posts. We learnt, for example, that sums of independent sub-exponential random variables are themselves sub-exponential, allowing us to extend concentration results to sums. In this post, I'll review another important result that allows us to apply the same tail bounds (such as Hoeffding's inequality) to *functions* of independent random variables, under certain conditions.
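For context, the workhorse bound in this setting is the Azuma-Hoeffding inequality: if D_1, ..., D_n is a martingale difference sequence with |D_i| ≤ c_i almost surely, then

```latex
\mathbb{P}\left( \Bigl| \sum_{i=1}^{n} D_i \Bigr| \ge t \right)
  \;\le\; 2 \exp\!\left( - \frac{t^2}{2 \sum_{i=1}^{n} c_i^2} \right).
```

Since the increments of the Doob martingale of a (suitably bounded) function of independent random variables form such a sequence, this is one route to Hoeffding-style bounds for functions.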

25 August 2024

In an earlier post, we saw how the Johnson-Lindenstrauss Transform can reduce the dimensionality of a set of m points while preserving their pairwise distances. That's neat, but what if we need to preserve the pairwise distances between all (infinitely many) points in an entire subspace? This post introduces the concept of subspace embedding, which answers that question.
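A sketch of the definition, in one common parameterization: a matrix S ∈ R^(k×d) is an ε-subspace embedding for a subspace E ⊆ R^d if

```latex
(1 - \varepsilon)\, \lVert x \rVert_2 \;\le\; \lVert S x \rVert_2 \;\le\; (1 + \varepsilon)\, \lVert x \rVert_2
\qquad \text{for all } x \in E.
```

(Some authors state the guarantee on squared norms instead; the two forms differ only in constants.)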

25 August 2024

A common technique in sketching arguments is to appeal to the concept of an epsilon net. This article explains what that is and why it always exists.
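For reference, the standard definition: a subset N ⊆ T of a metric space is an ε-net of T if

```latex
\forall\, t \in T \quad \exists\, n \in N :\quad d(t, n) \le \varepsilon.
```

A volumetric argument shows, for example, that the unit Euclidean sphere in R^d admits an ε-net of size at most (1 + 2/ε)^d.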

Concentration of Sub-Exponential Random Variables

23 August 2024

This post reviews the definition of sub-exponential random variables and discusses useful tail bounds (Bernstein) for this family.
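In one common parameterization (constants vary across texts): if X is (ν, α)-sub-exponential with mean μ, then

```latex
\mathbb{P}\bigl( |X - \mu| \ge t \bigr)
  \;\le\; 2 \exp\!\left( - \tfrac{1}{2} \min\!\left( \frac{t^2}{\nu^2},\ \frac{t}{\alpha} \right) \right),
```

that is, Gaussian-like decay for moderate deviations and exponential decay in the far tail.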

Concentration of Sub-Gaussian Random Variables

15 August 2024

This post reviews basic tail bounds (Markov, Chebyshev, Chernoff, Hoeffding) and explains the concept of sub-Gaussian random variables.
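As a reminder, Hoeffding's inequality in its usual form: for independent random variables X_i with X_i ∈ [a_i, b_i],

```latex
\mathbb{P}\left( \Bigl| \sum_{i=1}^{n} (X_i - \mathbb{E} X_i) \Bigr| \ge t \right)
  \;\le\; 2 \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
```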

Johnson-Lindenstrauss Transform

12 August 2024

The JL Lemma offers simple and elegant machinery to reduce the dimensionality of a fixed set of high-dimensional points while approximately preserving their pairwise Euclidean distances. This post explains the concept and describes simple randomized constructions of such a transformation.
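A minimal sketch of the simplest such construction, a Gaussian random projection (the dimensions and seed below are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 200  # number of points, original dimension, target dimension

X = rng.normal(size=(n, d))               # a fixed set of high-dimensional points
S = rng.normal(size=(d, k)) / np.sqrt(k)  # Gaussian JL map, scaled to preserve norms in expectation
Y = X @ S                                 # the reduced-dimension points

# Pairwise Euclidean distances are approximately preserved.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
distortion = proj / orig  # close to 1 with high probability
```

The lemma guarantees that a target dimension logarithmic in the number of points (and polynomial in 1/ε) suffices to preserve all pairwise distances up to a 1 ± ε factor.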

29 July 2024

Randomized Numerical Linear Algebra is a gorgeous field that embraces chaos and whose results often seem wonderfully magical. My very first post on this subject explains a beautiful algorithm to approximate Schatten norms of a matrix.
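The approximation algorithm itself is in the post; for orientation, here is the quantity in question, the Schatten p-norm, computed exactly from the singular values (a small numpy sketch; the helper name is mine):

```python
import numpy as np

def schatten_norm(A, p):
    """The Schatten p-norm of A: the l_p norm of its singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))

A = np.random.default_rng(1).normal(size=(5, 3))
# p = 2 recovers the Frobenius norm; p = 1 the nuclear norm.
```

Computing this exactly requires a full SVD, which is precisely the cost that randomized approximation methods aim to avoid on large matrices.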

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

11 July 2024

This article describes a novel retrieval algorithm developed for collections of sparse vectors. The work, published in the proceedings of SIGIR '24, is the culmination of a collaboration with my brilliant colleagues Rossano Venturini (University of Pisa), and Cosimo Rulli and Franco Maria Nardini (both of the Italian National Research Council, ISTI-CNR). Our proposed algorithm marks a seismic shift in the accuracy and speed of retrieval over learnt sparse embeddings of text collections. We are all equally proud of this paper!

Foundations of Vector Retrieval

01 May 2024

A monograph devoted to the theoretical foundations of nearest neighbor search, with a discussion of key data structures and algorithms from this vast literature.
