
a problem of moments

admin — Sun, 07 Oct 2012 07:59:39 +0000

We would like to prove the following fact:

For any non-negative random variable \(X\) having finite first and second moments, \(\mathbb P(X>0) \ge (\mathbb EX)^2/\mathbb EX^2\).

The proof isn’t difficult. Here are three different ones.

Proof 1. We already know from Jensen’s inequality that \(\mathbb E f(X) \ge f(\mathbb E X)\) if \(f\) is convex. Taking \(f(x)=x^2\) gives \((\mathbb EX)^2/\mathbb EX^2 \le 1\) for any \(X\) with \(\mathbb EX^2>0\). The trick to sharpen the bound to \(\le \mathbb P(X>0)\) is to note that the probability mass at \(X=0\) contributes nothing to any moment. Assume \(\mathbb P(X>0)>0\) (otherwise \(X=0\) almost surely and there is nothing to prove). If \(F_X(t)\) is the distribution function of \(X\), define a random variable \(Y\) that is \(X\) without the mass at zero, that is, \(X\) conditioned on \(X>0\), with distribution function \(F_Y(t)=(F_X(t)-F_X(0))/(1-F_X(0))\) for \(t\ge 0\). Then Jensen’s gives \((\mathbb EY)^2/\mathbb EY^2 \le 1\). However, \(\mathbb EY = \mathbb EX / (1-F_X(0)) = \mathbb EX / \mathbb P(X>0)\), and \(\mathbb EY^2 = \mathbb EX^2 / (1-F_X(0)) = \mathbb EX^2 / \mathbb P(X>0)\), so \((\mathbb EX)^2/\mathbb EX^2 = [(\mathbb EY)^2 \mathbb P(X>0)^2] / [\mathbb EY^2 \mathbb P(X>0)] \le \mathbb P(X>0)\). \(\blacksquare\)

The statement would also work for non-positive \(X\), of course; and an analogous statement can be made for arbitrary \(X\) comparing \(\mathbb P(X\ne 0)\) with some combination of moments for the positive and negative parts of \(X\).

Proof 2. This fact can also be proved by an application of the Cauchy-Schwarz inequality. Fix a probability space \((\Omega, \mathcal F, \mathbb P)\). The space of real-valued random variables with finite second moment, \(L_2(\Omega)=\{X:\Omega \to \mathbb R \mid \mathbb EX^2<\infty\}\), with the inner product \(\langle X,Y\rangle_{L_2(\Omega)}=\mathbb E XY\) and induced norm \(\Vert X\Vert_{L_2(\Omega)}=\sqrt{\mathbb EX^2}\), is a Hilbert space (modulo \(L_2\) equivalence). Given this, let us apply Cauchy-Schwarz to the two random variables \(X\) and \(\mathbf 1_{X>0}\):

\(\langle X, \mathbf 1_{X>0} \rangle^2 \le \Vert X \Vert^2 \Vert \mathbf 1_{X>0} \Vert^2\), by Cauchy-Schwarz
\((\mathbb E X \mathbf 1_{X>0})^2 \le \mathbb EX^2 \mathbb E\mathbf 1_{X>0}\), specializing to \(L_2(\Omega)\) and using \(\mathbf 1_{X>0}^2 = \mathbf 1_{X>0}\)
\((\mathbb E X)^2 \le \mathbb EX^2 \mathbb P(X>0)\), by noting that \(X = X \mathbf 1_{X>0}\) since \(X \ge 0\). \(\blacksquare\)

This is a special case of something called the Paley-Zygmund inequality. I didn’t know such a thing existed.
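As a sanity check on the statement itself, here is a minimal Python sketch that evaluates both sides exactly for a small finite distribution. The function name `second_moment_bound` and the example distribution are made up for illustration; exact rational arithmetic is used so no rounding is involved.

```python
from fractions import Fraction


def second_moment_bound(pmf):
    """Return (P(X > 0), (E X)^2 / E X^2) for a finite pmf {value: probability}."""
    assert all(x >= 0 for x in pmf), "X must be non-negative"
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    p_positive = sum(p for x, p in pmf.items() if x > 0)
    m1 = sum(x * p for x, p in pmf.items())      # first moment E X
    m2 = sum(x * x * p for x, p in pmf.items())  # second moment E X^2
    return p_positive, m1 * m1 / m2


# Hypothetical example: mass 1/2 at 0, 1/4 at 1, 1/4 at 3.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
p_positive, ratio = second_moment_bound(pmf)
print(p_positive, ratio)  # 1/2 2/5
```

Equality holds when \(X\) is constant on the event \(X>0\), which matches the equality case of Cauchy-Schwarz in Proof 2.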

Proof 3. This one only proves the case of non-negative integer-valued \(X\). It is well known that for such random variables, \(\mathbb EX = \sum_{k=0}^\infty \mathbb P(X>k) = \mathbb P(X>0)+\mathbb P(X>1)+\cdots\). Basically \(\mathbb P(X=1)\) is counted once, \(\mathbb P(X=2)\) is counted twice, and so on. The analogous thing can be derived for \(\mathbb EX^2\), except now we need to count in squares. Happily we also know that squares accumulate by odd integers, i.e. \(n^2=1+3+5+\cdots+(2n-1)\), so \(\mathbb EX^2 = \sum_{k=0}^\infty (2k+1) \mathbb P(X>k) = \mathbb P(X>0)+3\mathbb P(X>1)+5\mathbb P(X>2)+\cdots\).

Let’s simplify the notation a bit. Put \(q_k=\mathbb P(X>k)\), then \(q_0\ge q_1\ge q_2 \ge \cdots\). We just need to prove that \(q_0\ge (q_0+q_1+q_2+\cdots)^2 / (q_0+3q_1+5q_2+\cdots)\), which is to say, \((q_0+q_1+q_2+\cdots)^2 \le q_0(q_0+3q_1+5q_2+\cdots)\). Both series converge because the first and second moments are finite, so this just requires some accounting. On the left hand side, \((q_0+q_1+q_2+\cdots)^2\) expands to \(q_0^2+(q_1^2+2q_0q_1)+(q_2^2+2q_0q_2+2q_1q_2)+\cdots = Q_0+Q_1+Q_2+\cdots\), where \(Q_k \triangleq q_k^2 + 2 \sum_{i=0}^{k-1} q_iq_k \le (2k+1)q_0q_k \triangleq R_k\). The inequality holds because \(q_i \le q_0\) for every \(i\), so each of the \(2k+1\) products in \(Q_k\) is at most \(q_0q_k\). But \(R_0+R_1+R_2+\cdots\) is exactly the right hand side. So the left hand sum is dominated by the right hand sum. \(\blacksquare\)
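The bookkeeping in Proof 3 can be checked on a concrete case. The sketch below is illustrative only: the distribution and the helper `tail_probabilities` are hypothetical. It verifies the two tail-sum identities and the term-by-term domination \(Q_k \le R_k\) with exact fractions.

```python
from fractions import Fraction


def tail_probabilities(pmf):
    """q[k] = P(X > k) for k = 0..max(X), for a pmf on non-negative integers."""
    top = max(pmf)
    return [sum(p for x, p in pmf.items() if x > k) for k in range(top + 1)]


# Hypothetical integer-valued distribution.
pmf = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 6), 4: Fraction(1, 6)}
q = tail_probabilities(pmf)

m1 = sum(x * p for x, p in pmf.items())      # E X
m2 = sum(x * x * p for x, p in pmf.items())  # E X^2

# Tail-sum identities: E X = sum q_k and E X^2 = sum (2k+1) q_k.
assert m1 == sum(q)
assert m2 == sum((2 * k + 1) * qk for k, qk in enumerate(q))

# Q_k collects the terms of (sum q)^2 whose largest index is k; R_k bounds it.
Q = [q[k] ** 2 + 2 * sum(q[i] * q[k] for i in range(k)) for k in range(len(q))]
R = [(2 * k + 1) * q[0] * q[k] for k in range(len(q))]
assert all(Qk <= Rk for Qk, Rk in zip(Q, R))
assert sum(Q) == m1 ** 2 and sum(R) == q[0] * m2
```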

With some real analysis, this proof could be made to work for random variables that are not discrete, but it might also turn into a special case of Proof 1. In any case, it’s interesting in its own right.

data structure problem

admin — Fri, 09 Mar 2012 04:50:41 +0000

Another problem by fakalin.

A data structure has the entropy bound if all queries have amortized time \(O(\sum_k p_k \log 1/p_k)\), where \(p_k\) is the fraction of the time that key \(k\) is queried. It has the working-set property if the time to search for an element \(x_i\) is \(O(\log t_i)\), where \(t_i\) is the number of elements queried since the last access to \(x_i\). Prove that the working-set property implies the entropy bound.

This isn’t really a data structure problem, per se.

The general intuition here is that, if the waiting time between two queries to a key \(k\) is \(t(k)\), then key \(k\) ends up taking up about a \(p_k = 1/t(k)\) fraction of the queries, and therefore the average query time is about \(\sum_{k\in K} \frac{1}{t(k)} \log t(k)\).

While this is exactly true for evenly spaced-out queries, the general case only requires a slight modification using any of the rudimentary convex inequalities such as:

Jensen’s inequality: If \(f\) is a convex function and \(X\) is a random variable, then \(\mathbb{E}f(X)\ge f(\mathbb{E}X)\).

(The proof just uses the definition of what a convex function is. Furthermore, if we recognize that \(-\log x\) is a convex function, then we get \(\mathbb{E}\log(X)\le \log(\mathbb{E}X)\), which restates the well-known fact that the geometric mean is less than or equal to the arithmetic mean of a collection of positive real numbers.)

Now, let \(N\) be the total number of queries. Let \(n(k)\) be the number of queries on key \(k\). Let \(t_i(k)\) be the time between the \(i\)th query on key \(k\) and the previous query on the same key. Let \(\bar{a}(k)\) be the average query time for looking up key \(k\); the working-set property guarantees \(\bar{a}(k)< C \sum_{i=1}^{n(k)} \log t_i(k) / n(k)\) for some \(C>0\) (*). Let \(\bar{t}(k) = \sum_{i=1}^{n(k)} t_i(k) / n(k)\) be the average time between queries for key \(k\).

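Note that we must have \(\bar{t}(k) \le N/n(k)\): the gaps between successive queries on key \(k\) are disjoint stretches of the query sequence, so \(\sum_{i=1}^{n(k)} t_i(k) \le N\). Furthermore, \(p_k = n(k)/N\) by definition, hence \(\bar{t}(k) \le 1/p_k\) (**). The average query time over all keys is therefore: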

\(\sum_k \bar{a}(k) n(k) / N\)
\(< \sum_k [C \sum_{i=1}^{n(k)} \log t_i(k) / n(k)] [n(k) / N]\), by (*)
\(\le C \sum_k [\log \sum_{i=1}^{n(k)} t_i(k) / n(k)] [n(k) / N]\), by Jensen’s inequality
\(= C \sum_k [\log \bar{t}(k)] p_k\)
\(\le C \sum_k p_k \log 1/p_k\), by (**)

∎
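The chain of inequalities can be checked empirically. The following Python sketch is only an illustration: it drops the constant \(C\), takes logs base 2, and assumes that the first access to a key counts its gap from the start of the sequence (so that the gaps for one key still sum to at most \(N\)). The function names are made up for this example.

```python
import math
import random
from collections import Counter


def average_log_gap(queries):
    """Mean of log2(t) over all queries, t = time since the previous query on the same key."""
    last_seen = {}
    total = 0.0
    for position, key in enumerate(queries, start=1):
        # Assumption: a first access measures its gap from the start (position 0).
        gap = position - last_seen.get(key, 0)
        total += math.log2(gap)
        last_seen[key] = position
    return total / len(queries)


def empirical_entropy(queries):
    """sum_k p_k log2(1/p_k), with p_k the observed query frequency of key k."""
    n = len(queries)
    return sum((c / n) * math.log2(n / c) for c in Counter(queries).values())


random.seed(0)
# Hypothetical skewed workload over five keys.
queries = random.choices("abcde", weights=[8, 4, 2, 1, 1], k=1000)
print(average_log_gap(queries), empirical_entropy(queries))
```

The first printed number should never exceed the second, whatever the query sequence, which is the content of the proof above.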

a polynomial problem

admin — Mon, 22 Nov 2010 21:52:13 +0000

The latest problem from fakalin. Took some wrong turns and hints to solve it…

Given a polynomial \(f: \mathbb{Z}\to \mathbb{Z}\) with positive integer coefficients, how many evaluations of \(f\) does it take to obtain the polynomial?

(A polynomial \(f: \mathbb{R}\to \mathbb{R}\) with real coefficients takes its degree plus 1 evaluations to specify, which, if it held in this case, would render the answer unbounded since the degree is not known in advance. But the correct answer in this case is surprisingly small.)

It takes only 2 evaluations. Suppose in the following that \(b\) is a positive integer. Let us note that a polynomial \(f(b) = a_n b^n + \cdots + a_0 b^0\) specifies essentially a base \(b\) representation of the number \(f(b)\), in that \(a_n a_{n-1} \ldots a_0\) is an expansion of \(f(b)\) in base \(b\). The only problem is that this expansion is non-unique, since some \(a_j\) may be \(\ge b\).

However, it is not possible for any \(a_j \ge f(b) + 1\), since for all \(j\), \(f(b) \ge a_j b^j \ge a_j\) because the coefficients are non-negative and \(b \ge 1\). Then take \(B = f(b) + 1\). Now every \(a_j < B\), so we are guaranteed that \(a_n a_{n-1} \ldots a_0\) is the unique (and canonical) base \(B\) expansion of \(f(B)\), from which the polynomial coefficients can be read off immediately.

So the two evaluations are at \(f(b)\) and \(f(B=f(b)+1)\).

Example: \(f(b) = 3b^2 + 2b + 1\). Evaluate at, e.g., \(b=1\) to get \(f(1) = 6\). Then evaluate at \(B=f(1)+1=7\) to get \(f(7)=162=321_{7}\).
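The procedure is short enough to write down. Here is a minimal Python sketch, assuming the polynomial is available as a black-box callable on integers and taking \(b=1\) for the first evaluation; `recover_coefficients` is a name chosen for this example.

```python
def recover_coefficients(f):
    """Recover [a_0, a_1, ..., a_n] of a polynomial with non-negative
    integer coefficients from two evaluations of the black box f."""
    bound = f(1)      # first evaluation: the sum of coefficients, >= every a_j
    base = bound + 1  # B = f(1) + 1 is strictly larger than every coefficient
    value = f(base)   # second evaluation
    coefficients = []
    while value > 0:
        # Peel off base-B digits, lowest order first.
        value, digit = divmod(value, base)
        coefficients.append(digit)
    return coefficients


# The example above: f(b) = 3b^2 + 2b + 1.
f = lambda b: 3 * b**2 + 2 * b + 1
print(recover_coefficients(f))  # [1, 2, 3]
```

The coefficients come out lowest order first, so `[1, 2, 3]` corresponds to the digits \(321_7\) read right to left.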