<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Some stuff &#187; sum</title>
	<atom:link href="http://blog.yhuang.org/?feed=rss2&#038;tag=sum" rel="self" type="application/rss+xml" />
	<link>https://blog.yhuang.org</link>
	<description>here.</description>
	<lastBuildDate>Wed, 27 Aug 2025 08:50:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>a problem of moments</title>
		<link>https://blog.yhuang.org/?p=934</link>
		<comments>https://blog.yhuang.org/?p=934#comments</comments>
		<pubDate>Sun, 07 Oct 2012 07:59:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mathbb]]></category>
		<category><![CDATA[mathbf]]></category>
		<category><![CDATA[proof]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[Vert]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=934</guid>
		<description><![CDATA[We would like to prove the following fact: For any non-negative random variable having finite first and second moments, . The proof isn&#8217;t difficult. Here are three different ones. Proof 1. We already know from Jensen&#8217;s inequality that if is convex. This gives for any . The trick to make it be is to note [...]]]></description>
			<content:encoded><![CDATA[<p>We would like to prove the following fact:</p>
<p>For any non-negative random variable \(X\) having finite first and second moments, \(\mathbb P(X>0) \ge (\mathbb EX)^2/\mathbb EX^2\).</p>
<p>The proof isn&#8217;t difficult. Here are three different ones.<br />
<span id="more-934"></span><br />
<strong>Proof 1.</strong> We already know from Jensen&#8217;s inequality that \(\mathbb E f(X) \ge f(\mathbb E X)\) if \(f\) is convex. This gives \((\mathbb EX)^2/\mathbb EX^2 \le 1\) for any \(X\). The trick to sharpen this to \(\le \mathbb P(X>0)\) is to note that the probability mass at \(X=0\) contributes nothing to any moment. In particular, if \(F_X(t)\) is the distribution function of \(X\), then define a random variable \(Y\) that is \(X\) without the probability at zero, that is, distributed according to the distribution function \(F_Y(t)=(F_X(t)-F_X(0))/(1-F_X(0))\). Then Jensen&#8217;s gives \((\mathbb EY)^2/\mathbb EY^2 \le 1\). However, \(\mathbb EY = \mathbb EX / (1-F_X(0)) = \mathbb EX / \mathbb P(X>0)\), and \(\mathbb EY^2 = \mathbb EX^2 / (1-F_X(0)) = \mathbb EX^2 / \mathbb P(X>0)\), so \((\mathbb EX)^2/\mathbb EX^2 = [(\mathbb EY)^2 \mathbb P(X>0)^2] / [\mathbb EY^2 \mathbb P(X>0)] \le \mathbb P(X>0)\). \(\blacksquare\)</p>
<p>The statement would also work for non-positive \(X\), of course; and an analogous statement can be made for arbitrary \(X\) comparing \(\mathbb P(X\ne 0)\) with some combination of moments for the positive and negative parts of \(X\).</p>
<p><strong>Proof 2.</strong> Apparently this can also be proved by an application of the Cauchy-Schwarz inequality. Fix a probability space \((\Omega, \mathcal F, \mathbb P)\). The space of real-valued random variables with finite second moment, \(L_2(\Omega)=\{X:\Omega \to \mathbb R \mid \mathbb EX^2 < \infty\}\), with the inner product \(\langle X,Y\rangle_{L_2(\Omega)}=\mathbb E XY\) and induced norm \(\Vert X\Vert_{L_2(\Omega)}=\sqrt{\mathbb EX^2}\), is a Hilbert space (modulo almost-sure equivalence). Given this, let us apply Cauchy-Schwarz to the two random variables \(X\) and \(\mathbf 1_{X>0}\):</p>
\(\langle X, \mathbf 1_{X>0} \rangle^2 \le \Vert X \Vert^2 \Vert \mathbf 1_{X>0} \Vert^2\), by Cauchy-Schwarz<br />
\((\mathbb E X \mathbf 1_{X>0})^2 \le \mathbb EX^2 \mathbb E\mathbf 1_{X>0}\), specializing to \(L_2(\Omega)\)<br />
\((\mathbb E X)^2 \le \mathbb EX^2 \mathbb P(X>0)\), by noting that \(X = X \mathbf 1_{X>0}\).  \(\blacksquare\)
<p>This is a special case of something called the <a href="http://en.wikipedia.org/wiki/Paley%E2%80%93Zygmund_inequality">Paley-Zygmund inequality</a>. I didn&#8217;t know such a thing existed.</p>
<p><strong>Proof 3.</strong> This one only proves the discrete case. It is well known that for non-negative integer-valued random variables \(X\), \(\mathbb EX = \sum_{k=0}^\infty \mathbb P(X>k) = \mathbb P(X>0)+\mathbb P(X>1)+\cdots\). Basically \(\mathbb P(X=1)\) is counted once, \(\mathbb P(X=2)\) is counted twice, and so on. The analogous thing can be derived for \(\mathbb EX^2\), except now we need to count in squares. Happily we also know that squares accumulate by odd integers, i.e. \(n^2=1+3+5+\cdots+(2n-1)\), so \(\mathbb EX^2 = \sum_{k=0}^\infty (2k+1) \mathbb P(X>k) = \mathbb P(X>0)+3\mathbb P(X>1)+5\mathbb P(X>2)+\cdots\).</p>
<p>Let&#8217;s simplify the notation a bit. Put \(q_k=\mathbb P(X>k)\), then \(q_0\ge q_1\ge q_2 \ge \cdots\). We just need to prove that \(q_0\ge (q_0+q_1+q_2+\cdots)^2 / (q_0+3q_1+5q_2+\cdots)\), which is to say, \((q_0+q_1+q_2+\cdots)^2 \le q_0(q_0+3q_1+5q_2+\cdots)\). The two sides both have limits, so this just requires some accounting. On the left hand side, \((q_0+q_1+q_2+\cdots)^2\) expands to \(q_0^2+(q_1^2+2q_0q_1)+(q_2^2+2q_0q_2+2q_1q_2)+\cdots = Q_0+Q_1+Q_2+\cdots\), where \(Q_k \triangleq q_k^2 + 2 \sum_{i=0}^{k-1} q_iq_k \le (2k+1)q_0q_k \triangleq R_k\). But \(R_0+R_1+R_2+\cdots\) is exactly the right hand side. So the left hand sum is dominated by the right hand sum.  \(\blacksquare\)</p>
<p>With some real analysis, this proof could be made to work for random variables that are not discrete, but it might also turn into a special case of Proof 1. In any case, it&#8217;s interesting in its own right.</p>
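<p>As a sanity check (not part of the original argument), the inequality can be verified exactly on arbitrary finite discrete distributions using rational arithmetic; the cross-multiplied form \((\mathbb EX)^2 \le \mathbb EX^2\, \mathbb P(X>0)\) avoids dividing by zero:</p>

```python
import random
from fractions import Fraction

def check(dist):
    """dist: list of (value, probability) pairs for a non-negative discrete X.
    Returns True iff (EX)^2 <= EX^2 * P(X > 0), the claimed inequality."""
    p_pos = sum(p for x, p in dist if x > 0)
    ex = sum(x * p for x, p in dist)
    ex2 = sum(x * x * p for x, p in dist)
    return ex * ex <= ex2 * p_pos

random.seed(0)
for _ in range(1000):
    vals = [0] + [random.randint(1, 10) for _ in range(4)]
    weights = [random.randint(0, 5) for _ in vals]
    total = sum(weights) or 1
    dist = [(v, Fraction(w, total)) for v, w in zip(vals, weights)]
    assert check(dist)
```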
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=934</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>problem of strings</title>
		<link>https://blog.yhuang.org/?p=848</link>
		<comments>https://blog.yhuang.org/?p=848#comments</comments>
		<pubDate>Wed, 07 Mar 2012 05:19:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[binom]]></category>
		<category><![CDATA[consideration]]></category>
		<category><![CDATA[matter]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[sum]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=848</guid>
		<description><![CDATA[This is a problem via fakalin. You have 10 pieces of string, each with two ends. You randomly pick two ends of string (possibly from the same string, possibly from different ones) and tie them together, creating either a longer piece of string or a loop. You keep doing this until you run out of [...]]]></description>
			<content:encoded><![CDATA[<p>This is a problem via fakalin.</p>
<blockquote><p>You have 10 pieces of string, each with two ends. You randomly pick two ends of string (possibly from the same string, possibly from different ones) and tie them together, creating either a longer piece of string or a loop. You keep doing this until you run out of free ends.</p>
<p>What is the expected number of loops you end up with?</p></blockquote>
<p><span id="more-848"></span><br />
Things to note are the following:</p>
<p>- Once a loop is made from a string, it is removed from further consideration.<br />
- Picking two ends from the same string immediately makes a loop.<br />
- Picking two ends from different strings makes a longer string.</p>
<p>So in the end, no matter which two ends are picked, we have one fewer open string than we started with.</p>
<p>Let \(f(n)\) be the expected number of loops with \(n\) open strings. We know \(f(1)=1\). With \(n\) strings, the probability of picking two ends from the same string is \(n/\binom{2n}{2}\). So:</p>
\(f(n) = n/\binom{2n}{2} (f(n-1) + 1) + (1 - n/\binom{2n}{2}) f(n-1)\)<br />
\(= f(n-1) + n/\binom{2n}{2} = f(n-1) + n / [2n (2n - 1) / 2] = f(n-1) + 1 / (2n-1)\)
<p>That is, \(f(n) = \sum_{i=1}^n 1/(2i-1)\), and \(f(10) = 31037876/14549535 \approx 2.133\).</p>
<p>As \(n\to\infty\) this sum diverges; in fact \(f(n)\) grows like \(\frac{1}{2}\log n\), since \(\sum_{i=1}^n 1/(2i-1) = H_{2n} - \frac{1}{2}H_n\) where \(H_n\) is the \(n\)th harmonic number.</p>
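<p>The recurrence can be checked with a quick simulation (a sketch, not from the original post; the union-find bookkeeping is just one way of tracking which string each free end currently belongs to):</p>

```python
import random
from fractions import Fraction

def simulate(n, rng):
    """Tie uniformly random pairs of free ends until none remain; count loops."""
    parent = list(range(n))              # union-find over string ids
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    ends = [i for i in range(n) for _ in (0, 1)]   # two free ends per string
    loops = 0
    while ends:
        a = ends.pop(rng.randrange(len(ends)))
        b = ends.pop(rng.randrange(len(ends)))
        ra, rb = find(a), find(b)
        if ra == rb:
            loops += 1                   # both ends of the same (merged) string
        else:
            parent[ra] = rb              # different strings: merge into one
    return loops

exact = float(sum(Fraction(1, 2 * i - 1) for i in range(1, 11)))
rng = random.Random(0)
avg = sum(simulate(10, rng) for _ in range(20000)) / 20000
assert abs(avg - exact) < 0.05
```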
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=848</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>tensors</title>
		<link>https://blog.yhuang.org/?p=655</link>
		<comments>https://blog.yhuang.org/?p=655#comments</comments>
		<pubDate>Sun, 16 Oct 2011 12:33:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[basis]]></category>
		<category><![CDATA[component elements]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[overbrace]]></category>
		<category><![CDATA[space]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[vector products]]></category>
		<category><![CDATA[vector space]]></category>
		<category><![CDATA[vector spaces]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=655</guid>
		<description><![CDATA[This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here I took some notes. Tensors are sums of &#8220;products&#8221; of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the tensor product. In the Cartesian product of vector spaces , the set [...]]]></description>
			<content:encoded><![CDATA[<p>This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here I took some notes.</p>
<p>Tensors are sums of &#8220;products&#8221; of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the <strong>tensor product</strong>. In the Cartesian product of vector spaces \(V\times W\), the set elements are tuples like \((v,w)\) where \(v\in V, w\in W\). A tensor product \(v\otimes w\) is obtained by tupling the component <strong>bases</strong> rather than the component elements. If \(V\) has basis \(\{e_i\}_{i\in\{1,&#8230;,M\}}\) and \(W\) has basis \(\{f_j\}_{j\in\{1,&#8230;,N\}}\), then take \(\{(e_i,f_j)\}_{i\in\{1,&#8230;,M\},j\in\{1,&#8230;,N\}}\) as the basis of the <strong>tensor product space</strong> \(V\otimes W\). Then define the tensor product \(v\otimes w\) as</p>
<p>(1) \(\sum_{i,j} v_i w_j (e_i,f_j) \in V\otimes W\),</p>
<p>if \(v=\sum_i v_i e_i\) and \(w=\sum_j w_j f_j\). The entire tensor product space \(V\otimes W\) is defined as sums of these tensor products</p>
<p>(2) \(\{\sum_k v_k\otimes w_k | v_k\in V, w_k\in W\}\).</p>
<p>So tensors in a given basis can be represented as multidimensional arrays.</p>
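<p>In coordinates, definition (1) is just the table of products of coefficients. A minimal sketch (hypothetical code, with plain Python lists standing in for the arrays):</p>

```python
def tensor_product(v, w):
    """Coefficients of v ⊗ w in the basis {(e_i, f_j)}: entry (i, j) is
    v_i * w_j, exactly the sum in (1)."""
    return [[vi * wj for wj in w] for vi in v]

v = [1, 2]          # v = 1*e_1 + 2*e_2          (M = 2)
w = [3, 4, 5]       # w = 3*f_1 + 4*f_2 + 5*f_3  (N = 3)
assert tensor_product(v, w) == [[3, 4, 5], [6, 8, 10]]

# bilinearity: (v1 + v2) ⊗ w = v1 ⊗ w + v2 ⊗ w, checked entrywise
v1, v2 = [1, 0], [0, 2]
lhs = tensor_product([a + b for a, b in zip(v1, v2)], w)
rhs = [[x + y for x, y in zip(r1, r2)]
       for r1, r2 in zip(tensor_product(v1, w), tensor_product(v2, w))]
assert lhs == rhs
```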
<p>\(V\otimes W\) is also a vector space, with \(MN\) basis dimensions (c.f. \(V\times W\) with \(M+N\) basis dimensions). But additionally, it has internal multilinear structure due to the fact that it is made of component vector spaces, namely:</p>
<p>\((v_1+v_2)\otimes w = v_1\otimes w + v_2\otimes w\)<br />
\(v\otimes (w_1+w_2) = v\otimes w_1 + v\otimes w_2\)<br />
\(\alpha (v\otimes w) = (\alpha v)\otimes w = v\otimes (\alpha w)\)<br />
<span id="more-655"></span><br />
Higher-order (n-th order) tensor products \(v_1\otimes v_2\otimes \cdots \otimes v_n\) are obtained by chaining in the obvious way, likewise for higher-order tensor product spaces \(V_1\otimes V_2\otimes \cdots \otimes V_n\). With this, <strong>concatenation</strong> of tensors is also defined: if \(S_{i_1,&#8230;,i_m} \in V_1\otimes \cdots \otimes V_m\) and \(T_{i_{m+1},&#8230;,i_n} \in V_{m+1}\otimes \cdots \otimes V_n\), then \(S_{i_1,&#8230;,i_m}\otimes T_{i_{m+1},&#8230;,i_n} = Z_{i_1,&#8230;,i_n} \in V_1\otimes \cdots \otimes V_n\). In other words, the indices are appended. This is essentially the <strong>Kronecker product</strong>, which generalizes the <strong>outer product</strong>.</p>
<p>However, usually when tensors are mentioned, the tensor product spaces under discussion are already specialized to those generated from a single base vector space \(V\) and its dual space \(V^*\), rather than from a collection of arbitrary vector spaces. In such a space \(P(m,n) = \overbrace{V\otimes \cdots \otimes V}^{m} \otimes \overbrace{V^*\otimes \cdots \otimes V^*}^{n}\), the component spaces (and their bases, indices, etc.) naturally belong to two groups, those from \(V\) are called <strong>contravariant</strong>, those from \(V^*\) are called <strong>covariant</strong>,  and an (m,n)-tensor from \(P(m,n)\) is written \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}\).</p>
<p>This specialization allows the <strong>contraction</strong> of tensors to be defined. A contraction basically chooses one covariant vector component and one contravariant vector component from a tensor and applies the former as a functional on the latter, e.g., contracting \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n} = v_{i_1}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_1} \otimes \cdots \otimes v^*_{j_n}\) on the pair of indices \(i_1\) and \(j_1\) gives \(Z^{i_2,&#8230;,i_m}_{j_2,&#8230;,j_n} = v^*_{j_1}(v_{i_1}) (v_{i_2}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_2} \otimes \cdots \otimes v^*_{j_n})\). \(v^*_{j_1}(v_{i_1})\) of course is an <strong>inner product</strong> that sums over the dimensions of the paired components. Contraction generalizes the <strong>trace</strong> operator. Combined with concatenation, this defines a <strong>tensor multiplication</strong>, such that if \(S^{r,i_2,&#8230;,i_m}_{s,j_2,&#8230;,j_n}\in P(m,n)\) and \(T^{s,k_2,&#8230;,k_p}_{r,l_2,&#8230;,l_q}\in P(p,q)\), then \(S^{r,i_2,&#8230;,i_m}_{s,j_2,&#8230;,j_n}T^{s,k_2,&#8230;,k_p}_{r,l_2,&#8230;,l_q}\) is the contraction of \(S^{r,i_2,&#8230;,i_m}_{s,j_2,&#8230;,j_n}\otimes T^{s,k_2,&#8230;,k_p}_{r,l_2,&#8230;,l_q}\) on all common indices that can be paired, e.g. \(r,s\). This is the so-called <strong>Einstein notation</strong>, and generalizes <strong>matrix multiplication</strong>.</p>
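<p>As an illustration (my sketch, with (1,1)-tensors stored as plain Python matrices), concatenating two (1,1)-tensors and contracting the paired indices reproduces matrix multiplication, and contracting a single (1,1)-tensor on its own index pair reproduces the trace:</p>

```python
def contract_product(S, T):
    """Tensor multiplication of two (1,1)-tensors stored as matrices:
    concatenate S ⊗ T, then contract S's covariant index with T's
    contravariant index, i.e. Z^i_l = S^i_s T^s_l (summing over s)."""
    return [[sum(S[i][s] * T[s][l] for s in range(len(T)))
             for l in range(len(T[0]))]
            for i in range(len(S))]

S = [[1, 2], [3, 4]]
T = [[5, 6], [7, 8]]
assert contract_product(S, T) == [[19, 22], [43, 50]]   # ordinary matrix product

# contracting a (1,1)-tensor on its own index pair is the trace
trace = sum(S[i][i] for i in range(len(S)))
assert trace == 5
```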
<p>The distinction of \(V\) vs. \(V^*\) also manifests in the <strong>change-of-basis</strong> rules for tensors, which inherit from the change-of-basis rules of the component vector spaces, which are:</p>
<ul>
<li><strong>contravariant</strong> change-of-basis rule: If \(B = [b_1\ b_2\ \cdots\ b_M]\) is a change-of-basis matrix, with the new basis \(\{b_i\}_{i\in \{1,&#8230;,M\}}\) written in the old basis as columns, then for a vector written in the old basis \(v\in V\) and the same vector written in the new basis \(\tilde{v}\in V\), \(v = B\tilde{v}\). Therefore, we have
<p>(3) \(v \mapsto \tilde{v} = B^{-1}v\).</li>
<li><strong>covariant</strong> change-of-basis rule: If additionally \(a^T\in V^*\) is a functional written in the old basis and \(\tilde{a}^T\in V^*\) is the same functional written in the new basis, then \(\forall v\in V: a^T v = \tilde{a}^T \tilde{v} = \tilde{a}^T B^{-1}v\). Therefore, we have
<p>(4) \(a^T \mapsto \tilde{a}^T = a^T B\).</li>
</ul>
<p>Combining (1), (3), and (4), it can be shown that an (m,n)-tensor \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}\) has the change-of-basis rule \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n} \mapsto (B^{-1})^{i_1}_{k_1}\cdots (B^{-1})^{i_m}_{k_m} B^{l_1}_{j_1}\cdots B^{l_n}_{j_n}\, T^{k_1,&#8230;,k_m}_{l_1,&#8230;,l_n}\): each contravariant index transforms by \(B^{-1}\) as in (3), and each covariant index by \(B\) as in (4).</p>
<p>Okay, so what&#8217;s the point of these tensors? Basically, an (m,n)-tensor \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}\in P(m,n)\) represents a multilinear input-output relationship that takes \(n\) vectors as input and produces \(m\) vectors as output. If used &#8220;canonically&#8221; on an input \(X^{j_1,&#8230;,j_n}\in P(n,0)\), you get \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}X^{j_1,&#8230;,j_n} = Y^{i_1,&#8230;,i_m}\in P(m,0)\) as output. The contravariant input gets contracted with the covariant parts of the transformation tensor, and these drive the contravariant parts of the transformation tensor to produce the contravariant output.</p>
<p><img src="wp-content/uploads/images/tensor.png" /><br />
(System diagrams: <strong>Rank</strong> is the minimal number of simple tensor-product terms whose sum gives the tensor. On the left, a rank-1 tensor transformation; on the right, a rank-\(K\) one.)</p>
<p>An example is linear transformations, which are (1,1)-tensors (1 vector in, 1 vector out). In array representation these would just be matrices. Any rank-\(K\) linear transformation \(T^v_a\) is decomposable into a \(K\)-term tensor \(\sum_{k=1}^K v_k\otimes a_k\), but 1-term (1,1)-tensors are outer products, so this is the matrix \(\sum_{k=1}^K v_k a^T_k\), and \(Y^v = T^v_a X^a\) is just \(y = \sum_k v_k (a^T_k x)\).</p>
<p>Most other &#8220;multilinear&#8221; operations on vectors (inner product, cross product, wedge product, determinant) can be written as tensors. For example, the inner product operation (2 vectors in, &#8220;0&#8221; vectors out, i.e. scalar) is the (0,2) \(N\)-term <strong>Kronecker tensor</strong> \(\delta_{i_1 i_2}=\sum_{k=1}^N e^*_k\otimes e^*_k\) where \(\{e^*_k\}_{k\in\{1,&#8230;,N\}}\) are the standard basis of \(V^*\).</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=655</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>t-mobile prepaid optimization</title>
		<link>https://blog.yhuang.org/?p=471</link>
		<comments>https://blog.yhuang.org/?p=471#comments</comments>
		<pubDate>Sat, 28 May 2011 02:50:52 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[constraint]]></category>
		<category><![CDATA[min]]></category>
		<category><![CDATA[odd occasions]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[prepaid mobile phones]]></category>
		<category><![CDATA[prepaid phones]]></category>
		<category><![CDATA[purch]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[Table]]></category>
		<category><![CDATA[temporary visitors]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=471</guid>
		<description><![CDATA[t-Mobile has these tiered refill cards for their prepaid mobile phones. The pricing table is here and reproduced below: $10 for 30 minutes, expires in 90 days $25 for 130 minutes, expires in 90 days $40 for 208 minutes, expires in 90 days $50 for 400 minutes, expires in 90 days $100 for 1000 minutes, [...]]]></description>
			<content:encoded><![CDATA[<p>t-Mobile has these tiered refill cards for their prepaid mobile phones. The pricing table is <a href="http://www.callingmart.com/products/wireless/ProductDetail.aspx?ID=35&#038;AspxAutoDetectCookieSupport=1">here</a> and reproduced below:</p>
<p><strong>$10</strong> for 30 minutes, expires in 90 days<br />
<strong>$25</strong> for 130 minutes, expires in 90 days<br />
<strong>$40</strong> for 208 minutes, expires in 90 days<br />
<strong>$50</strong> for 400 minutes, expires in 90 days<br />
<strong>$100</strong> for 1000 minutes, expires in 365 days</p>
<p>So which card should you buy? You could calculate a per minute cost and conclude that $100 for 1000 minutes is the most economical (plus it doesn&#8217;t expire for the longest time). Wrong!</p>
<p>It depends on how much you use the phone. The fact that the minutes expire makes the prepaid plan a <em>virtual monthly plan</em> in the regime where you do not use 1000 or more minutes per year, which is highly likely for people who choose prepaid phones to begin with (e.g. temporary visitors, odd occasions, emergencies, etc.). The constraint in that case is the expiration, not the number of minutes. If you blindly purchased $100 refills one after another, you&#8217;d have more and more unused minutes piling up. Sure, you could still use them, but even at $0.10/min. it is expensive compared to a straight monthly plan if you really mean to call that much. Of course you don&#8217;t, so now what?<br />
<span id="more-471"></span><br />
The trick is time-sharing. (Never thought this phrase would pop up in this context.) Let&#8217;s re-write the table in terms of how much you get for $1, both minutes of call, and days of non-expiry:</p>
<p><strong>$10:</strong> 3 min., 9 days / $1<br />
<strong>$25:</strong> 5.2 min., 3.6 days / $1<br />
<strong>$40:</strong> 5.2 min., 2.25 days / $1<br />
<strong>$50:</strong> 8 min., 1.8 days  / $1<br />
<strong>$100:</strong> 10 min., 3.65 days / $1</p>
<p>We see that the $25, $40, and $50 refills are <em>good for nothing</em>! Why would anyone buy those? A rational person should only buy the $10 and $100 refills in some combination: $10 for when the account is about to expire but there are plenty of minutes, and $100 for when running low on minutes. The &#8220;proof&#8221; is as follows:</p>
<p>We really care about paying the lowest per minute cost for the minutes <em>actually used</em>. To that end, if we divide the purchase between the \(N\) refill options by the weights \(w_i\), \(i=1,&#8230;,N\), and every $1 of the \(i\)th refill option pays for \(m_i\) minutes and \(d_i\) days, then, we want</p>
<p>maximize \(\sum_{i=1}^N w_i d_i\) (equivalently, maximize \(\sum_{i=1}^N w_i m_i\))<br />
subject to \(\sum_{i=1}^N w_i m_i / \sum_{i=1}^N w_i d_i = L\)<br />
and \(\sum_{i=1}^N w_i=1\)</p>
<p><img src="wp-content/uploads/images/tmobile1.png" align="right" />where \(L\) is the minutes per day that we know we use. We don&#8217;t even need to solve this explicitly. The plot shows that every point in the pentagonal region below the red line is achievable with $1, and for any given constraint \(L\), the outer boundary on the red line itself solves the optimization (i.e. is the most economical), and this is done by using only the $10 and $100 refills. Here we assumed infinitely divisible refills. Following the heuristic above for when to buy which refill, though, we track our average \(L\) by construction, so we always sit at the right operating point.</p>
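<p>The &#8220;good for nothing&#8221; claim can be double-checked in a few lines (a sketch; in fact simple Pareto domination by a single other card already rules out the middle tiers, before any mixing is needed):</p>

```python
# price: (minutes, days of validity), per the table above
cards = {10: (30, 90), 25: (130, 90), 40: (208, 90), 50: (400, 90), 100: (1000, 365)}

# per-dollar rates: price -> (minutes/$, days/$)
rates = {p: (m / p, d / p) for p, (m, d) in cards.items()}

# a refill is useless if some other single refill matches or beats it on BOTH rates
dominated = {p for p, (m, d) in rates.items()
             if any(m2 >= m and d2 >= d and (m2, d2) != (m, d)
                    for m2, d2 in rates.values())}
assert dominated == {25, 40, 50}     # only the $10 and $100 refills survive
```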
<p><img src="wp-content/uploads/images/tmobile2.png" align="right" />The same analysis can be carried over to the &#8220;gold rewards&#8221; tier, which you get when you purchase the $100 refill and keep the account from expiring year after year (this is what you should do anyway, so even better). The new plot is different but the conclusion is the same, though the $50 refill looks competitive this time.</p>
<p>(For reference, the monthly cost of such a &#8220;virtual monthly plan&#8221; ranges from an incredible $0.82/mo. for 3 min./mo. &#8212; keeping the account active, basically &#8212; to $8.22/mo. for 82 min./mo. For more than 82 min./mo., the cost goes up at a rate of $0.10/min. of course. Unfortunately, you cannot buy &#8220;negative&#8221; refills, otherwise you could do better even in that regime.)</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=471</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>different kind of coupon collector&#8217;s problem</title>
		<link>https://blog.yhuang.org/?p=283</link>
		<comments>https://blog.yhuang.org/?p=283#comments</comments>
		<pubDate>Mon, 11 Oct 2010 09:18:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[call]]></category>
		<category><![CDATA[column sums]]></category>
		<category><![CDATA[math 1]]></category>
		<category><![CDATA[math types]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[partition functions]]></category>
		<category><![CDATA[pigeonhole principle]]></category>
		<category><![CDATA[problem]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[theory]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=283</guid>
		<description><![CDATA[The classic coupon collector&#8217;s problem asks for the number of samples it takes to collect all coupon types from a sequence of coupons in which each of the types of coupons has an unlimited number of copies. Here is a different kind of problem: if there are limited copies of each of the coupon types [...]]]></description>
			<content:encoded><![CDATA[<p>The classic coupon collector&#8217;s problem asks for the number of samples it takes to collect all coupon types from a sequence of coupons in which each of the \(k\) types of coupons has an unlimited number of copies.</p>
<p>Here is a different kind of problem: if there are limited copies of each of the \(k\) coupon types (say, \(d\) copies), how many samples does it take to collect \(t\) coupon types?</p>
<p><span id="more-283"></span><br />
By the pigeonhole principle, we have two extremes: from \(n=0\) to \(n=t-1\) samples, it is impossible to collect \(t\) coupon types; and for \(n>(t-1)d\) samples, it is impossible not to. For numbers of samples in between, the probability that fewer than \(t\) types have been collected decreases monotonically. Call this probability \(P(n)\). The expected number of samples it takes to collect \(t\) types is then</p>
\(\sum_{n=0}^{(t-1)d} P(n)\)
<p>Finding \(P(n)\) for \(n\ge t\) doesn&#8217;t appear easy, though the combinatorics could be written down (in theory), maybe involving partition functions, Stirling numbers, or whatever. But the question reduces to the following, which somebody must have solved: given a \(d \times k\) 0-1 matrix in which \(n\) 1s are placed randomly, what is the probability that exactly \(z\) columns are all-zero (have column sum 0)?</p>
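<p>For what it&#8217;s worth, the probability admits a finite inclusion-exclusion expression (my sketch, assuming the \(n\) 1s occupy \(n\) distinct cells chosen uniformly at random), though hardly a closed form:</p>

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def p_zero_columns(d, k, n, z):
    """P(exactly z of the k columns are empty) when n ones are placed in n
    distinct cells of a d x k matrix, uniformly at random.
    Inclusion-exclusion: pick the z empty columns, then count placements in
    the remaining (k - z) columns that leave none of those columns empty."""
    ways = sum((-1) ** j * comb(k - z, j) * comb((k - z - j) * d, n)
               for j in range(k - z + 1))
    return Fraction(comb(k, z) * ways, comb(d * k, n))

# brute-force check on a small case
d, k, n = 2, 3, 3
cells = [(r, c) for r in range(d) for c in range(k)]
for z in range(k + 1):
    count = sum(1 for chosen in combinations(cells, n)
                if sum(all((r, c) not in chosen for r in range(d))
                       for c in range(k)) == z)
    assert p_zero_columns(d, k, n, z) == Fraction(count, comb(d * k, n))
```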
<p>These papers give some approximations:</p>
<p>http://arxiv.org/PS_cache/arxiv/pdf/0806/0806.1480v3.pdf</p>
<p>http://cs.anu.edu.au/~bdm/papers/irregular.pdf</p>
<p>Does anybody know the actual answer?</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=283</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>red-blue cross problem</title>
		<link>https://blog.yhuang.org/?p=185</link>
		<comments>https://blog.yhuang.org/?p=185#comments</comments>
		<pubDate>Sat, 11 Apr 2009 06:10:47 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[collinear]]></category>
		<category><![CDATA[face]]></category>
		<category><![CDATA[finite number]]></category>
		<category><![CDATA[line segment]]></category>
		<category><![CDATA[line segments]]></category>
		<category><![CDATA[pair]]></category>
		<category><![CDATA[segment]]></category>
		<category><![CDATA[segment lengths]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[supposition]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=185</guid>
		<description><![CDATA[Here is a problem described to me by fakalin. Given n red points and n blue points, no three of which are collinear, prove that there exists a pairing of red and blue points such that the line segments connecting each pair do not intersect. The solution is straightforward though it took a while to [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a problem described to me by fakalin. Given n red points and n blue points, no three of which are collinear, prove that there exists a pairing of red and blue points such that the line segments connecting each pair do not intersect.</p>
<p><span id="more-185"></span></p>
<p>The solution is straightforward though it took a while to identify after it was already staring me in the face.</p>
<p>With each pairing is associated a total length (sum of line segment lengths). Since there are a finite number of pairings, there is a minimum length pairing. We claim this pairing must be one that satisfies the problem statement.</p>
<p>Suppose it were not, then there are some 2 red and 2 blue points such that the pairs&#8217; connecting line segments cross. Then uncross the two pairs. This can be done because the points are not collinear. By uncrossing, the sum of the pairs&#8217; segment lengths strictly decreases and therefore the total length also decreases. This contradicts the supposition. Therefore the claim is true.</p>
<p>However, note that a pairing that satisfies the statement need not be minimum total length. (The minimum total length doesn&#8217;t even need to be unique.) Nevertheless, an algorithm for reaching a solution is to start with any pairing, then identify any crossed pairs and uncross them. During this process, crossed pairs count may even increase for a time, but total length always decreases strictly at each step. Therefore, the algorithm will terminate, either when a valid pairing is reached, or when the minimum total length is reached, which also gives a valid pairing.</p>
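<p>The algorithm is easy to sketch in code (a hypothetical implementation; the orientation test is the standard way to detect a proper crossing between two segments):</p>

```python
import random

def crossing(p, q, r, s):
    """True iff segments pq and rs properly cross (points in general position)."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (orient(p, q, r) * orient(p, q, s) < 0 and
            orient(r, s, p) * orient(r, s, q) < 0)

def uncross_all(red, blue):
    """Start from an arbitrary pairing and keep uncrossing crossed pairs.
    Each swap strictly decreases total segment length, so this terminates."""
    match = list(range(len(red)))        # red[i] is paired with blue[match[i]]
    changed = True
    while changed:
        changed = False
        for i in range(len(red)):
            for j in range(i + 1, len(red)):
                if crossing(red[i], blue[match[i]], red[j], blue[match[j]]):
                    match[i], match[j] = match[j], match[i]
                    changed = True
    return match

rng = random.Random(1)
red = [(rng.random(), rng.random()) for _ in range(8)]
blue = [(rng.random(), rng.random()) for _ in range(8)]
m = uncross_all(red, blue)
assert all(not crossing(red[i], blue[m[i]], red[j], blue[m[j]])
           for i in range(8) for j in range(i + 1, 8))
```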
<p>An extended question is, given the valid pairing, can a non-self-intersecting red-blue alternating path connecting all the points be found? I believe the answer is yes.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=185</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paradox of the risk premium</title>
		<link>https://blog.yhuang.org/?p=194</link>
		<comments>https://blog.yhuang.org/?p=194#comments</comments>
		<pubDate>Wed, 25 Jun 2008 03:44:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aggregate risk]]></category>
		<category><![CDATA[diversified portfolio]]></category>
		<category><![CDATA[diversified portfolios]]></category>
		<category><![CDATA[excess return]]></category>
		<category><![CDATA[explanation]]></category>
		<category><![CDATA[portfolio]]></category>
		<category><![CDATA[premium]]></category>
		<category><![CDATA[return]]></category>
		<category><![CDATA[risk premium]]></category>
		<category><![CDATA[sum]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=194</guid>
		<description><![CDATA[I mentioned this to an officemate years ago once, but I never was quite satisfied with the explanation that there is a certain amount of excess return built into the price of a company&#8217;s stock as its risk premium, but which can be diversified away through some portfolio of disparate stocks. If, merely by holding [...]]]></description>
			<content:encoded><![CDATA[<p>I mentioned this to an officemate years ago once, but I never was quite satisfied with the explanation that there is a certain amount of excess return built into the price of a company&#8217;s stock as its risk premium, but which can be diversified away through some portfolio of disparate stocks.</p>
<p>If, merely by holding a diversified portfolio, the aggregate risk can be reduced, then one would expect the return demanded for such a portfolio to be less than the sum of the returns of the individual stocks. Yet this is not the case. So we must assume the portfolio is overperforming, the individual stocks are underperforming, or both. On the other hand, it would seem that the risk premium of any individual stock should be arbitraged away by people holding the most diversified portfolios containing it, so it is strange that an individual stock could retain an undiversified risk premium.</p>
<p>Thus it must not, at least not fully. Its return (and the price it sells for) is also determined by the availability for sale of other not-fully-correlated stocks and their characteristics, even ones that have no material effect on the company&#8217;s performance. This is a sure sign of value arising from demand for the instrument unrelated to its underlying; the stock would seem to be mispriced when considered individually. Perhaps an equilibrium will be struck, but it is still paradoxical to speak of the risk premium of an individual stock as a property of that stock alone.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=194</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
