
die throwing problem

admin — Thu, 14 Sep 2017 03:21:41 +0000

Here’s a link to a subtle probability problem.

You throw a die until you get a 6. What is the expected number of throws (including the throw that gives the 6), conditioned on the event that all throws gave even numbers?

The “obvious” answer is incorrect.

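The correct answer can be brute-forced by computing probabilities of this sort.
If $s$ is a sequence of throws of length $n$, then
$P_n\triangleq P(s \text{ ends in 6 and all throws in } s \text{ are even}) = (2/6)^{n-1} (1/6)$,
since each of the first $n-1$ throws must be a 2 or a 4 and the last must be a 6.
(The total probability over all $n$ is therefore $Z=(1/6)/(1-2/6)=(1/6)/(4/6)=1/4$.)
The expected length of $s$ is $\sum_{n=1}^\infty n (P_n / Z)$
$= \sum_{n=1}^\infty n (2/6)^{n-1} (1/6) / (1/4) = \sum_{n=1}^\infty n (2/6)^{n-1} (2/3) = (2/3)\cdot\frac{1}{(1-1/3)^2} = 3/2$,
using $\sum_{n=1}^\infty n x^{n-1} = 1/(1-x)^2$ for $|x|<1$.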
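A quick numerical check of the derivation, as a sketch: the series is evaluated exactly with Python's `fractions` module (truncated after a finite number of terms, an assumption of the sketch), and a rejection-sampling simulation rolls until a 6 and discards any run that contains an odd number. The function names are illustrative.

```python
import random
from fractions import Fraction

def series_expected_length(terms=200):
    """Partial sum of sum_n n * P_n / Z with P_n = (2/6)^(n-1) (1/6) and Z = 1/4."""
    Z = Fraction(1, 4)
    return sum(n * Fraction(2, 6) ** (n - 1) * Fraction(1, 6) / Z
               for n in range(1, terms + 1))

def simulated_expected_length(accepted=50_000, seed=1):
    """Roll until a 6; keep the run only if every roll was even (rejection sampling)."""
    rng = random.Random(seed)
    total_throws = kept = 0
    while kept < accepted:
        throws = 0
        while True:
            face = rng.randint(1, 6)
            throws += 1
            if face % 2 == 1:      # an odd throw: this run fails the condition
                break
            if face == 6:          # every throw so far was even: accept the run
                total_throws += throws
                kept += 1
                break
    return total_throws / kept

print(float(series_expected_length()))   # 1.5 (the truncated tail is far below float precision)
print(simulated_expected_length())       # close to 1.5, not 3
```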

This is interesting because prematurely conditioning each roll on the 3 even sides of the die produces an (incorrect) expected length of 3.

There is already an elegant and convincing derivation of the correct answer, attributed to Paul Cuff, if you follow the links from the original article, but here is an intuitive explanation of why the incorrect answer overestimates the expected length:

Notice that in $P_n$ the probability $1/6$ of obtaining the final 6 is a constant factor that cancels in the normalization by $Z$, so it has no effect on the expected length; the expected length depends entirely on the shape of $P_n$ as a function of $n$. The “incorrect” reasoning amounts to the sum $\sum_{n=1}^\infty n (2/3)^{n-1} (1/3) = 3$, and the difference lies in the $(2/3)^{n-1}$ vs. $(2/6)^{n-1}$ term. So the existence of 1, 3, and 5 matters: they make longer prefixes of 2s and 4s even less likely, because more of the longer sequences that would otherwise have been valid end up containing a 1, 3, or 5 and get thrown away.
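To see that only the shape of $P_n$ matters, the sketch below (helper name hypothetical, series truncated at a finite number of terms) normalizes weights $c\,r^{n-1}$ for a ratio $r$ and an arbitrary constant $c$. The constant cancels; $r=2/6$ gives about 1.5 and $r=2/3$ gives about 3.

```python
def normalized_mean_length(ratio, scale, terms=300):
    """Mean of n under weights scale * ratio**(n - 1), n = 1..terms, after normalization."""
    weights = [scale * ratio ** (n - 1) for n in range(1, terms + 1)]
    return sum(n * w for n, w in enumerate(weights, start=1)) / sum(weights)

print(normalized_mean_length(2 / 6, 1 / 6))   # correct conditioning: about 1.5
print(normalized_mean_length(2 / 6, 1.0))     # same shape, different constant: still about 1.5
print(normalized_mean_length(2 / 3, 1 / 3))   # "premature" conditioning on {2, 4, 6}: about 3
```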

Note: There is also a good discussion on Quora.

bounding overlaps

admin — Mon, 21 Jan 2013 06:35:45 +0000

A Venn diagram gives a schematic view of joint counts on a set of $n$ categories, e.g. $c(S_1^n=s_1^n)$ where $s_i\in\{0,1\}$. Each “patch” of the diagram corresponds to one of $2^n$ possible values of $s_1^n$.

If we have the total count $C\triangleq \sum_{s_j: j\in \{1,…,n\}} c(S_1^n=s_1^n)$, then we can take the counts as probabilities by normalizing with $p(S_1^n=s_1^n)=c(S_1^n=s_1^n)/C$.

Suppose we are given only the singleton marginals $p(S_i=1)\triangleq \sum_{s_j: j\in \{1,…,n\}\backslash i} p(S_1^n=s_1^n)$, where the sum is taken with $s_i=1$ fixed. Then we can bound the other marginals by imposing the universal constraints that probabilities lie between 0 and 1 and sum to 1.

To get the bounds, we solve a linear programming problem. The standard form is $\min_x f^T x \ \ \mathrm{s.t.}\ Ax \leq b$. Here, $f$ is a length-$2^n$ 0/1 vector that selects the $s_1^n$-sequences making up the marginal probability we want to bound. $A$ is $[I_{2^n\times 2^n}; B; \mathbf{1}_{1\times 2^n}]$, where the columns correspond to all possible $s_1^n$-sequences and the rows of $B$ are the $n$ masks (each with half of its entries equal to 1) selecting the sequences counted in the $n$ singleton marginals. Finally, let $u=[\mathbf{1}_{2^n\times1}; p(S_1=1);\dots;p(S_n=1); 1]$ and $l=[\mathbf{0}_{2^n\times1}; p(S_1=1);\dots;p(S_n=1); 1]$.

Hence, to get the lower bound on the desired marginal, we solve $\min_x f^T x \ \ \mathrm{s.t.}\ A x \leq u\ \mathrm{and}\ -A x \leq -l$. Likewise, for the upper bound, we compute $-\min_x (-f^T x) \ \ \mathrm{s.t.}\ A x \leq u\ \mathrm{and}\ -A x \leq -l$.

For example, $n=2$, $p(S_1=1)=0.6$, $p(S_2=1)=0.5$, then we can bound $p(S_1=1,S_2=1)$ by $0.1\le p(S_1=1,S_2=1)\le 0.5$.

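Of course, we can add additional constraints when we know them. In some categorization settings, we know that $p(S_1^n=0)=0$ (the all-zero pattern has probability 0), that is, everything must belong to at least one category. Then it would appear the doublet marginal $p(S_i=1,S_j=1)$ satisfies
$\max \{0, p(S_i=1)+p(S_j=1)-1\}\le p(S_i=1,S_j=1)$
$\le \min \{(\sum_{k=1}^n p(S_k=1))-1, p(S_i=1), p(S_j=1)\}$,
where the first term of the upper bound holds because $\sum_{k=1}^n p(S_k=1)$ is the expected number of categories per item, which is at least 1 for every item and at least 2 for items in both category $i$ and category $j$.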
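The LP can be sketched without a solver library for small $n$. This swaps in a technique of my own choosing rather than a general-purpose LP solver: it writes the constraints in equality form ($Bx$ equal to the marginals, $\mathbf{1}^T x = 1$, $x \ge 0$, which implies $x \le 1$, matching the $u$/$l$ constraints above) and enumerates every basic feasible solution, since the optimum of a bounded LP is attained at one. That is only practical for a handful of categories. The names `solve_exact`, `bound_joint`, and the `forbid_empty` option (which drops the all-zero pattern to encode $p(S_1^n=0)=0$) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def solve_exact(M, rhs):
    """Solve the square system M x = rhs exactly (Gauss-Jordan over Fractions); None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def bound_joint(marginals, target, forbid_empty=False):
    """Lower and upper bounds on P(S_i = 1 for all i in target), given only P(S_i = 1).

    Variables are the probabilities of the patterns s_1^n (the Venn-diagram patches).
    Constraints: B x = marginals, 1^T x = 1, x >= 0. Raises ValueError if infeasible.
    """
    n = len(marginals)
    patterns = [s for s in product((0, 1), repeat=n) if any(s) or not forbid_empty]
    A = [[s[i] for s in patterns] for i in range(n)] + [[1] * len(patterns)]
    b = [Fraction(m) for m in marginals] + [Fraction(1)]
    f = [1 if all(s[i] for i in target) else 0 for s in patterns]   # selects the target patches
    values = []
    for basis in combinations(range(len(patterns)), len(A)):
        x_basis = solve_exact([[row[c] for c in basis] for row in A], b)
        if x_basis is not None and all(x >= 0 for x in x_basis):   # basic feasible solution
            values.append(sum(f[c] * x for c, x in zip(basis, x_basis)))
    return min(values), max(values)

print(bound_joint(["0.6", "0.5"], (0, 1)))   # (Fraction(1, 10), Fraction(1, 2))
print(bound_joint(["0.6", "0.5", "0.3"], (0, 1), forbid_empty=True))
```

With $n=2$ this reproduces the $0.1$ to $0.5$ example. In the second call the upper bound drops to $\sum_k p(S_k=1)-1 = 0.4$, in line with the doublet bound for the everything-belongs-somewhere case.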

extrinsic bias in the prediction market

admin — Wed, 31 Oct 2012 05:55:54 +0000

People have proposed using price signals from prediction markets to estimate the odds of certain events. On Intrade right now, you can buy contracts for the two outcomes of the 2012 US Presidential Election. Each contract expires at \$10 if the event occurs or \$0 if it doesn’t. For example, “Barack Obama wins” contracts are \$6.33 a pop right now, while “Mitt Romney wins” contracts go for \$3.65. On the page, these are taken directly as probabilities, because it is assumed that the gamble is zero-sum.

Specifically, if $p$ and $\bar{p}=1-p$ are respectively the probabilities of two complementary events, and $a$ and $b$ are respectively the prices of contracts on them, which can be bought and sold freely, then no-arbitrage imposes that $-a-b+10 = 0$ and statistical no-arbitrage imposes $-\bar{p}a +p(10-a) = 0$ and $-pb +\bar{p}(10-b) = 0$. Solving indeed gives the prices $a=10p$ and $b=10\bar{p}$.

However, this isn’t the end of the story.

The prediction market isn’t a closed system. Event outcomes are correlated with other payoffs outside of it. For instance, the election outcome has personal income tax consequences for certain individuals. While playing in the prediction market has no expected gain or loss, its contracts can diversify just such an external payoff to reduce its variance.

Suppose the total tax exposure of an Obama presidency is $T_o$ and that of a Romney presidency is $T_r$, and the probabilities of the two winning are respectively $p$ and $\bar{p}$. Then the expected payoff is $-pT_o -\bar{p}T_r$, while the variance is $p\bar{p}(T_o-T_r)^2$.

Without loss of generality, assume $T_o > T_r$. Then we can reduce the variance of the payoff by buying “Obama wins” contracts at normalized price $q=a/10$. Say we buy enough contracts to collect $N$ if they expire in the money. The payoff becomes $-T_o +(1-q)N$ with probability $p$ and $-T_r -qN$ with probability $\bar{p}$. If the contracts are priced for no arbitrage as before ($q=p$), the expected payoff is $-pT_o +p\bar{p}N -\bar{p}T_r -\bar{p}pN = -pT_o -\bar{p}T_r$, as before. However, since the two outcomes now differ by $T_o-T_r-N$, the variance is $p\bar{p}(T_o-T_r-N)^2$, which is a decrease for any $N\in (0,2(T_o-T_r))$, with the aggregate (i.e. hedged) payoff becoming completely deterministic for $N=T_o-T_r$. This is the point of maximal utility gain for a risk-averse hedger. One ends up “pre-paying” a portion of the potential additional tax burden in exchange for immunity from the election outcome.
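The mean and variance formulas can be checked exactly with rational arithmetic. The probability and tax exposures below are hypothetical numbers, not from the post, and `hedged_payoff` is an illustrative helper. With the no-arbitrage price $q=p$, the mean does not depend on $N$, and the variance matches $p\bar{p}(T_o-T_r-N)^2$, vanishing at $N=T_o-T_r$.

```python
from fractions import Fraction

def hedged_payoff(p, T_o, T_r, N, q=None):
    """Mean and variance of the hedger's payoff after buying 'Obama wins' contracts paying N.

    q is the normalized contract price; by default it is the no-arbitrage price q = p.
    """
    q = p if q is None else q
    obama = -T_o + (1 - q) * N      # pay tax T_o, contracts pay N, cost q*N (probability p)
    romney = -T_r - q * N           # pay tax T_r, contracts expire worthless (probability 1 - p)
    mean = p * obama + (1 - p) * romney
    var = p * (obama - mean) ** 2 + (1 - p) * (romney - mean) ** 2
    return mean, var

# Hypothetical numbers: p = 0.63, T_o = 5000, T_r = 3000.
p, T_o, T_r = Fraction(63, 100), 5000, 3000
for N in (0, 1000, 2000, 3000, 4000):
    mean, var = hedged_payoff(p, T_o, T_r, N)
    print(N, float(mean), float(var))
```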

The fact that hedgers exist and are biased in one direction means that the normalized price of a contract may no longer be exactly the probability of its expiring in the money. The imbalance in the market caused by risk aversion should add an insurance premium to the price of “Obama wins” contracts. Of course, reality is more complicated, since not all individuals face the same tax burdens under the two outcomes. If the number of risk-averse hedgers is small, then the no-arbitrage assumption may still approximately hold.

compactness

admin — Thu, 27 Sep 2012 04:10:52 +0000

I swear the concept of compactness was invented to remedy the shortcomings of closedness. Compact sets are closed (in Hausdorff spaces and therefore metric spaces), so compactness is stricter than closedness. It evidently patches some feebleness in the definition of closedness to make it more useful.

Closedness of a set in a metric space (“includes all limit points”), by the sound of it, really wants to be something akin to “has solid boundaries.” But it isn’t. The problem is that the existence of limit points depends on the embedding space. If the embedding space lacks those limit points, then a set in it can be technically closed even though it isn’t really “like” other closed sets. For example, the set $\mathbb R$ in space $(\mathbb R, d_{\text{Eucl.}})$ is closed, because the space has no point called $\infty$.

The first stab at “has solid boundaries” was to get rid of infinities by adding boundedness (“can be covered by some ball”) to the condition. Now the set $\mathbb R$ in space $(\mathbb R, d_{\text{Eucl.}})$ is closed yet not bounded, so it doesn’t have solid boundaries. Excellent. Upon further inspection, however, this doesn’t address the underlying issue. For instance, the set $\mathbb Q \cap [0,1]$ in space $(\mathbb Q, d_{\text{Eucl.}})$ is closed and bounded, but it’s still porous at all the irrational numbers. It doesn’t have solid boundaries.

The second stab, then, was to replace closedness with completeness (“contains all Cauchy limits”). Completeness gets rid of the dependency on the embedding space and uses points in the set itself to define limits. This, along with boundedness, takes care of the above two examples. But there is still a deeply unsatisfying outcome: completeness and boundedness both depend on the metric and therefore are affected by rescaling. First, let’s rewrite the boundedness condition equivalently as “can be covered by a finite number of $r$-balls for a fixed $r$”. The set $\mathbb Z$ in space $(\mathbb Z, d_{\text{Disc.}}^1)$, where $d_{\text{Disc.}}^k(x,y)=k$ if $x\neq y$ and $0$ otherwise, is complete and bounded if we choose $r>1$, yet under a rescaling, say by replacing $d_{\text{Disc.}}^1$ with $d_{\text{Disc.}}^{r+\epsilon}$, the set is no longer bounded.

So finally, a third stab was to require sets to be bounded for all rescalings at once, by replacing boundedness with total boundedness (“can be covered by a finite collection of $r$-balls for every $r>0$”).

This last definition, complete and totally bounded, turns out to be equivalent to compactness (“every open cover has a finite subcover”) in metric spaces. And it shows. We’ve managed to drop everything that depends on the embedding space and the metric scaling. What remains are the topological properties of the set under the open balls generated by the metric.

In some sense, what we get out of compactness (and really wanted from the outset) is the notion that a set is well contained regardless of scaling. This is what it means to have solid boundaries. It has to be a topological property of the set itself. More interestingly, because we can rescale the set however we want (provided the rescaling is non-singular), we can turn infinitesimals into infinities and vice versa. Small-scale infinitesimals (limiting sequences) and large-scale infinities (unbounded sequences) are really the same thing. For instance, the set $\{1/n\}_{n\in \mathbb Z_{>0}}$ under the metric $d_{\text{Eucl.}}$ has an unincluded Cauchy limit at 0 and is thus not complete, but if we rescale the underlying space by $1/x$ at $x$, then it is basically $\mathbb Z_{>0}$ under the metric $d_{\text{Disc.}}^1$, with no Cauchy limits to include, and is complete. Similarly, the first set is totally bounded while the second is not. This scale-dependence is awkward. Apparently, completeness and total boundedness take care of the small scale and the large scale separately, but in the end they are all infinities and should be treated the same. The true distinction of consequence is between finitude (solid boundaries, well contained) and infinitude (no solid boundaries, ill contained), a distinction that compactness identifies.

One of the immediately intuitive notions coming from this understanding of compactness is the theorem that “the image of a compact set under a continuous map is also compact” (just a rescaling, after all). We can restate this as: a set with solid boundaries still has solid boundaries under a continuous rescaling of the space. Other nice properties, about limits being attained on compact sets and about functions on compact sets, follow from the same intuition of their having solid boundaries. It also makes sense that the intersection of a compact set with a closed set is compact (well containment only needs to be ensured once), and therefore, in a Hausdorff space, the intersection of any number of compact sets is compact. In some sense, compact sets are a much more useful dual to open sets than closed sets, even under their topological definition (“complement is open”), ever were.

tensors

admin — Sun, 16 Oct 2011 12:33:29 +0000

This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here are some notes.

Tensors are sums of “products” of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the tensor product. In the Cartesian product of vector spaces $V\times W$, the set elements are tuples like $(v,w)$ where $v\in V, w\in W$. A tensor product $v\otimes w$ is obtained by tupling the component bases rather than the component elements. If $V$ has basis $\{e_i\}_{i\in\{1,…,M\}}$ and $W$ has basis $\{f_j\}_{j\in\{1,…,N\}}$, then take $\{(e_i,f_j)\}_{i\in\{1,…,M\},j\in\{1,…,N\}}$ as the basis of the tensor product space $V\otimes W$. Then define the tensor product $v\otimes w$ as

(1) $\sum_{i,j} v_i w_j (e_i,f_j) \in V\otimes W$,

if $v=\sum_i v_i e_i$ and $w=\sum_j w_j f_j$. The entire tensor product space $V\otimes W$ is defined as sums of these tensor products

(2) $\{\sum_k v_k\otimes w_k | v_k\in V, w_k\in W\}$.

So tensors in a given basis can be represented as multidimensional arrays.

$V\otimes W$ is also a vector space, with $MN$ basis dimensions (cf. $V\times W$ with $M+N$ basis dimensions). But additionally, it has internal multilinear structure because it is built from component vector spaces, namely:

$(v_1+v_2)\otimes w = v_1\otimes w + v_2\otimes w$
$v\otimes (w_1+w_2) = v\otimes w_1 + v\otimes w_2$
$\alpha (v\otimes w) = (\alpha v)\otimes w = v\otimes (\alpha w)$

Higher-order (n-th order) tensor products $v_1\otimes v_2\otimes \cdots \otimes v_n$ are obtained by chaining in the obvious way, and likewise for higher-order tensor product spaces $V_1\otimes V_2\otimes \cdots \otimes V_n$. With this, concatenation of tensors is also defined: if $S_{i_1,…,i_m} \in V_1\otimes \cdots \otimes V_m$ and $T_{i_{m+1},…,i_n} \in V_{m+1}\otimes \cdots \otimes V_n$, then $S_{i_1,…,i_m}\otimes T_{i_{m+1},…,i_n} = Z_{i_1,…,i_n} \in V_1\otimes \cdots \otimes V_n$. In other words, the indices are appended. This is essentially the Kronecker product, which generalizes the outer product.
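In a fixed basis, tensors can be stored as arrays. A minimal sketch, assuming a representation of my own choosing (a dict from index tuples to coefficients, with hypothetical helper names `vec`, `otimes`, `add`, `scale`), shows definition (1), index appending for higher orders, and the three multilinearity rules on small integer examples.

```python
def vec(components):
    """Order-1 tensor as a dict from 1-tuples of indices to components."""
    return {(i,): c for i, c in enumerate(components)}

def otimes(S, T):
    """Tensor product: concatenate the index tuples and multiply the entries."""
    return {i + j: S[i] * T[j] for i in S for j in T}

def add(S, T):
    """Entrywise sum of two tensors with the same index set."""
    return {k: S[k] + T[k] for k in S}

def scale(alpha, S):
    """Multiply every entry by the scalar alpha."""
    return {k: alpha * x for k, x in S.items()}

v, w = vec([1, 2]), vec([3, 4, 5])
vw = otimes(v, w)                 # M*N = 6 entries, vw[(i, j)] = v_i * w_j
print(len(vw), vw[(1, 2)])        # 6 10
vwu = otimes(vw, vec([1, -1]))    # order 3: the index tuples are appended

v2 = vec([7, -3])
print(otimes(add(v, v2), w) == add(otimes(v, w), otimes(v2, w)))   # True: linear in the first slot
print(otimes(scale(3, v), w) == scale(3, vw) == otimes(v, scale(3, w)))   # True
```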

However, usually when tensors are mentioned, the tensor product spaces under discussion are already specialized to those generated from a single base vector space $V$ and its dual space $V^*$, rather than from a collection of arbitrary vector spaces. In such a space $P(m,n) = \overbrace{V\otimes \cdots \otimes V}^{m} \otimes \overbrace{V^*\otimes \cdots \otimes V^*}^{n}$, the component spaces (and their bases, indices, etc.) naturally belong to two groups, those from $V$ are called contravariant, those from $V^*$ are called covariant, and an (m,n)-tensor from $P(m,n)$ is written $T^{i_1,…,i_m}_{j_1,…,j_n}$.

This specialization allows the contraction of tensors to be defined. A contraction basically chooses one covariant vector component and one contravariant vector component from a tensor and applies the former as a functional on the latter, e.g., contracting $T^{i_1,…,i_m}_{j_1,…,j_n} = v_{i_1}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_1} \otimes \cdots \otimes v^*_{j_n}$ on the pair of indices $i_1$ and $j_1$ gives $Z^{i_2,…,i_m}_{j_2,…,j_n} = v^*_{j_1}(v_{i_1}) (v_{i_2}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_2} \otimes \cdots \otimes v^*_{j_n})$. $v^*_{j_1}(v_{i_1})$ of course is an inner product that sums over the dimensions of the paired components. Contraction generalizes the trace operator. Combined with concatenation, this defines a tensor multiplication, such that if $S^{r,i_2,…,i_m}_{s,j_2,…,j_n}\in P(m,n)$ and $T^{s,k_2,…,k_p}_{r,l_2,…,l_q}\in P(p,q)$, then $S^{r,i_2,…,i_m}_{s,j_2,…,j_n}T^{s,k_2,…,k_p}_{r,l_2,…,l_q}$ is the contraction of $S^{r,i_2,…,i_m}_{s,j_2,…,j_n}\otimes T^{s,k_2,…,k_p}_{r,l_2,…,l_q}$ on all common indices that can be paired, e.g. $r,s$. This is the so-called Einstein notation, and generalizes matrix multiplication.
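Contraction and the Einstein-style product can be sketched with the same kind of dict representation (an assumption; the helpers `otimes`, `contract`, `matrix` are hypothetical). Plain arrays do not record which indices are contravariant and which are covariant, so pairing an upper with a lower index is left to the caller. Contracting a matrix on its two indices gives the trace, and contracting the middle two indices of $S\otimes T$ gives the matrix product.

```python
def otimes(S, T):
    """Tensor product: concatenate index tuples, multiply entries."""
    return {i + j: S[i] * T[j] for i in S for j in T}

def contract(T, p, q):
    """Sum the entries of T whose indices at positions p and q agree, dropping both positions."""
    out = {}
    for idx, x in T.items():
        if idx[p] == idx[q]:
            rest = tuple(k for pos, k in enumerate(idx) if pos not in (p, q))
            out[rest] = out.get(rest, 0) + x
    return out

def matrix(rows):
    """(1,1)-tensor T^i_j stored as {(i, j): entry}."""
    return {(i, j): x for i, row in enumerate(rows) for j, x in enumerate(row)}

A = matrix([[1, 2], [3, 4]])
B = matrix([[5, 6], [7, 8]])
print(contract(A, 0, 1))              # trace: {(): 5}
print(contract(otimes(A, B), 1, 2))   # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```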

The distinction of $V$ vs. $V^*$ also manifests in the change-of-basis rules for tensors, which inherit from the change-of-basis rules of the component vector spaces, which are:

contravariant change-of-basis rule: If $B = [b_1\ b_2\ \cdots\ b_M]$ is a change-of-basis matrix, with the new basis $\{b_i\}_{i\in \{1,…,M\}}$ written in the old basis as columns, then for a vector written in the old basis $v\in V$ and the same vector written in the new basis $\tilde{v}\in V$, $v = B\tilde{v}$. Therefore, we have
(3) $v \mapsto \tilde{v} = B^{-1}v$.
covariant change-of-basis rule: If additionally $a^T\in V^*$ is a functional written in the old basis and $\tilde{a}^T\in V^*$ is the same functional written in the new basis, then $\forall v\in V: a^T v = \tilde{a}^T \tilde{v} = \tilde{a}^T B^{-1}v$. Therefore, we have
(4) $a^T \mapsto \tilde{a}^T = a^T B$.

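Combining (1), (3), and (4), it can be shown that an (m,n)-tensor $T^{i_1,…,i_m}_{j_1,…,j_n}$ has the change-of-basis rule $T^{i_1,…,i_m}_{j_1,…,j_n} \mapsto \tilde{T}^{i_1,…,i_m}_{j_1,…,j_n} = (B^{-1})^{i_1}_{k_1}\cdots (B^{-1})^{i_m}_{k_m}\, B^{l_1}_{j_1}\cdots B^{l_n}_{j_n}\, T^{k_1,…,k_m}_{l_1,…,l_n}$, with summation over the repeated indices $k_1,…,k_m,l_1,…,l_n$: one factor of $B^{-1}$ per contravariant index, as in (3), and one factor of $B$ per covariant index, as in (4).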
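Rules (3) and (4), and the transformation of a (1,1)-tensor, can be checked on a 2-dimensional example. The basis matrix, vector, functional, and linear map below are arbitrary hypothetical values, and exact arithmetic via `fractions` avoids rounding. For a (1,1)-tensor, the rule reduces to $\tilde T = B^{-1} T B$.

```python
from fractions import Fraction as F

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def dot(a, v):
    return sum(x * y for x, y in zip(a, v))

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

B = [[F(2), F(1)], [F(1), F(1)]]   # hypothetical new basis b1 = (2, 1), b2 = (1, 1) as columns
Binv = inv2(B)
v = [F(3), F(5)]                   # a vector, old basis
a = [F(4), F(-1)]                  # a functional (row vector), old basis
T = [[F(1), F(2)], [F(0), F(3)]]   # a (1,1)-tensor, i.e. a linear map, old basis

v_new = matvec(Binv, v)                                            # rule (3): v -> B^{-1} v
a_new = [sum(a[k] * B[k][j] for k in range(2)) for j in range(2)]  # rule (4): a^T -> a^T B
T_new = matmul(matmul(Binv, T), B)                                 # one B^{-1}, one B

print(dot(a, v), dot(a_new, v_new))   # the pairing a^T v is basis-independent
```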

Okay, so what’s the point of these tensors? Basically, an (m,n)-tensor $T^{i_1,…,i_m}_{j_1,…,j_n}\in P(m,n)$ represents a multilinear input-output relationship that takes $n$ vectors as input and produces $m$ vectors as output. If used “canonically” on an input $X^{j_1,…,j_n}\in P(n,0)$, you get $T^{i_1,…,i_m}_{j_1,…,j_n}X^{j_1,…,j_n} = Y^{i_1,…,i_m}\in P(m,0)$ as output. The contravariant input gets contracted with the covariant parts of the transformation tensor, and these drive the contravariant parts of the transformation tensor to produce the contravariant output.

(System diagrams: on the left, a rank-1 tensor transformation; on the right, a rank-$K$ one. The rank of a tensor is the minimal number of terms needed to write it as a sum of tensor products of vectors.)

An example is linear transformations, which are (1,1)-tensors (1 vector in, 1 vector out). In array representation these are just matrices. Any rank-$K$ linear transformation $T^v_a$ is decomposable into a $K$-term tensor $\sum_{k=1}^K v_k\otimes a_k$; since 1-term (1,1)-tensors are outer products, this is the matrix $\sum_{k=1}^K v_k a^T_k$, so $Y^v = T^v_a X^a$ is just $y = \sum_k v_k (a^T_k x)$.
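A small check that a (1,1)-tensor written as a sum of $K$ outer products acts as $y = \sum_k v_k (a_k^T x)$. The vectors $v_k$, covectors $a_k$, and input $x$ are hypothetical values with $K=2$ in a 2-dimensional space; the resulting matrix happens to have rank 2.

```python
def outer(v, a):
    """1-term (1,1)-tensor v (x) a as the matrix v a^T."""
    return [[vi * aj for aj in a] for vi in v]

def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matvec(M, x):
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

vs = [[1, 2], [0, 1]]        # hypothetical v_1, v_2
funcs = [[1, 1], [2, -1]]    # hypothetical covectors a_1, a_2
T = outer(vs[0], funcs[0])
for v, a in zip(vs[1:], funcs[1:]):
    T = madd(T, outer(v, a))   # T = sum_k v_k a_k^T

x = [3, 4]
y_matrix = matvec(T, x)                                            # y = T x
coeffs = [sum(ai * xi for ai, xi in zip(a, x)) for a in funcs]    # a_k^T x
y_terms = [sum(c * v[i] for c, v in zip(coeffs, vs)) for i in range(len(x))]
print(y_matrix, y_terms)   # [7, 16] [7, 16]
```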

Most other “multilinear” operations on vectors (inner product, cross product, wedge product, determinant) can be written as tensors. For example, the (standard) inner product operation (2 vectors in, “0” vectors out, i.e. a scalar) is the (0,2) $N$-term Kronecker tensor $\delta_{i_1 i_2}=\sum_{k=1}^N e^*_k\otimes e^*_k$, where $\{e^*_k\}_{k\in\{1,…,N\}}$ is the standard basis of $V^*$.
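The inner product as a (0,2)-tensor can be sketched the same way: build $\delta = \sum_k e^*_k\otimes e^*_k$ from standard dual basis covectors and feed it two vectors. The dict representation and helper names are illustrative assumptions; in this basis $\delta$ is the identity array, and applying it gives the ordinary dot product.

```python
def otimes(S, T):
    """Tensor product: concatenate index tuples, multiply entries."""
    return {i + j: S[i] * T[j] for i in S for j in T}

def dual_basis(k, N):
    """Standard basis covector e*_k as {(i,): 1 if i == k else 0}."""
    return {(i,): int(i == k) for i in range(N)}

def apply_02(T, x, y):
    """Feed two vectors to a (0,2)-tensor: sum_{i,j} T_ij x^i y^j."""
    return sum(t * x[i] * y[j] for (i, j), t in T.items())

N = 3
delta = {}
for k in range(N):   # delta = sum_k e*_k (x) e*_k
    for idx, val in otimes(dual_basis(k, N), dual_basis(k, N)).items():
        delta[idx] = delta.get(idx, 0) + val

print(apply_02(delta, [1, 2, 3], [4, 5, 6]))   # 32, the dot product
```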

a card problem

admin — Sun, 26 Sep 2010 05:33:15 +0000

Here is a problem quoted from fakalin. A full deck of cards has 52 cards. Suppose 10 of them were face up and 42 were face down. You are in a dark room holding the deck. How do you rearrange the deck into two subdecks so that they have the same number of cards facing up?

The cards can be flipped. First, the specific numbers don’t matter; they don’t even have to be even. The answer is simple.

Say there are $n$ cards face up in the full deck. If you just divide the deck without flipping cards, then however you divide it, one subdeck will have some $m\in\{0,…,n\}$ cards up, and the other subdeck will have the complementary $n-m$ cards up. So take any $n$ cards out of the original deck as subdeck A, and flip them over. If subdeck A had $m$ cards up originally, it now has $n-m$ cards up. Among the remaining cards, forming subdeck B, there were already $n-m$ cards up. So the two subdecks have an equal number of cards up.
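A quick simulation of the trick, assuming a deck represented as a list of booleans (True means face up); the helper name `split_deck` is illustrative. It takes the first $n$ cards (any $n$ would do, since in the dark you cannot choose), flips them, and checks that both subdecks show the same number of face-up cards, including for an odd $n$.

```python
import random

def split_deck(deck, n_up):
    """deck: list of booleans, True = face up. Flip the first n_up cards to form subdeck A."""
    A = [not card for card in deck[:n_up]]   # any n_up cards would do; flip each one
    B = deck[n_up:]
    return A, B

rng = random.Random(0)
for n_up in (10, 7):                  # 10 as in the puzzle; 7 shows parity doesn't matter
    for _ in range(1000):
        deck = [True] * n_up + [False] * (52 - n_up)
        rng.shuffle(deck)             # the dark room: positions are unknown
        A, B = split_deck(deck, n_up)
        assert sum(A) == sum(B)
print("equal face-up counts in every trial")
```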

road path problem

admin — Fri, 19 Feb 2010 21:45:17 +0000

Suppose there is a straight road, infinitely long at both ends, located 1 unit from your starting location in an unknown direction. Find the most efficient path to reach the road, and the worst-case total length of this path.

The trivial but inefficient way is to go 1 unit in some direction, then trace the circumference of the unit-radius circle. The road will surely be found this way, but the path length is $1+2\pi$, which can be improved upon.

Every possible position for the road corresponds to a tangent line to the unit circle centered at the starting location. Therefore, this is a problem about finding a shortest path that touches all tangent lines to a unit circle. To make this more concrete, let us transform the space into polar form as below.

We see that in polar form, the unit circle becomes the line $r=1$ and each tangent line touching the circle at angle $\theta_0$ becomes the graph $r=\sec(\theta-\theta_0)$. The trivial solution corresponds to a path that traces $r=0$ to $r=1$ at $\theta=0$, followed by $\theta=0$ to $\theta=2\pi$ at $r=1$. This is not efficient.

In fact, any continuous path starting at $r=0$ “reaches the road” (i.e. touches every tangent line) as long as it touches $r=\sec(\theta)$ somewhere on $\theta\in [0,\pi/2)$ and somewhere on $\theta\in (3\pi/2,2\pi]$, and it stays above $r=1$ between those two points. And up to shifts in $\theta$, these are the only possible solutions.* Note that the path doesn’t need to be closed, like the trivial path was. So we are left to find the shortest such path (e.g. the one in green above).

Whereas in rectangular form the shortest path between two points is a straight line, in polar form the shortest path between two points lies along a scaled secant graph $r=d\sec(\theta-\theta_0)$, the polar form of a straight line at distance $d$ from the origin. It is a fairly easy matter to show that the shortest path must take the following form:

1. Start from $(r,\theta)=(0,\phi)$ for some $\phi\in [0,\pi/2)$;
2. Drop onto the graph $r=\sec(\theta-2\phi)$, between the points $(r,\theta)=(\sec(\phi),\phi)$ and $(r,\theta)=(1,2\phi)$;
3. Take the path $r=1$ from $\theta=2\phi$ to some $\theta=2\pi-2\psi$;
4. Drop onto the graph $r=\sec(\theta-(2\pi-2\psi))$, between the points $(r,\theta)=(1,2\pi-2\psi)$ and $(r,\theta)=(\sec(2\pi-\psi),2\pi-\psi)$.

Finally, we need to solve two minimizations, which will find the best $\phi$ and $\psi$. $\psi$ can actually be solved by inspection, but formally:

$\min_{\psi\in[0,\pi/2)} \tan(\psi) + \pi - 2\psi$
solution at $\sec^2{\psi} = 2 \Rightarrow \psi=\pi/4$

$\min_{\phi\in[0,\pi/2)} \sec(\phi) + \tan(\phi) + \pi - 2\phi$
solution at $\frac{\sin(\phi)}{\cos^2(\phi)} + \sec^2(\phi) = 2 \Rightarrow \sin(\phi)=1/2 \Rightarrow \phi=\pi/6$.
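The two one-dimensional minimizations can be double-checked numerically. This is a minimal sketch using golden-section search (my choice of method, not the post's), which is valid here because both objectives are convex on $[0,\pi/2)$; the search interval $[0, 1.5]$ is an assumption that keeps away from the singularity at $\pi/2$.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2          # inverse golden ratio, about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            b = d                       # the minimum lies in [a, d]
        else:
            a = c                       # the minimum lies in [c, b]
    return (a + b) / 2

def sec(t):
    return 1 / math.cos(t)

psi = golden_section_min(lambda s: math.tan(s) + math.pi - 2 * s, 0.0, 1.5)
phi = golden_section_min(lambda s: sec(s) + math.tan(s) + math.pi - 2 * s, 0.0, 1.5)
length = math.tan(psi) + math.pi - 2 * psi + sec(phi) + math.tan(phi) + math.pi - 2 * phi
print(psi, math.pi / 4)
print(phi, math.pi / 6)
print(length, 1 + math.sqrt(3) + 7 * math.pi / 6)
```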

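The solved path has total length $\tan(\pi/4) + \pi - \pi/2 + \sec(\pi/6) + \tan(\pi/6) + \pi - \pi/3 = 1+\sqrt{3}+7\pi/6 \approx 6.397$ (using $\sec(\pi/6)+\tan(\pi/6) = 2/\sqrt{3}+1/\sqrt{3} = \sqrt{3}$), and is plotted below. This is about 12% shorter than the trivial path, whose length is $1+2\pi \approx 7.283$.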
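As a sanity check on the solved path (not a proof, since it samples finitely many road directions), the sketch below builds the path from steps 1 to 4 in Cartesian coordinates, with the arc approximated by chords, and tests every road on a grid of directions. It relies on the observation that the road tangent to the unit circle at angle $t$ is the line $x\cos t + y\sin t = 1$, and a continuous path starting from the origin meets it exactly when some point of the path has $x\cos t + y\sin t \ge 1$. The sample counts, tolerance, and function names are arbitrary choices.

```python
import math

def solved_path(phi=math.pi / 6, psi=math.pi / 4, arc_samples=2000):
    """Vertices of the four-step path in Cartesian coordinates, starting at the origin."""
    def polar(r, t):
        return (r * math.cos(t), r * math.sin(t))
    pts = [(0.0, 0.0), polar(1 / math.cos(phi), phi)]         # step 1: straight out at angle phi
    t0, t1 = 2 * phi, 2 * math.pi - 2 * psi
    for k in range(arc_samples + 1):                          # step 2 ends at (1, 2*phi); step 3 is r = 1
        pts.append(polar(1.0, t0 + (t1 - t0) * k / arc_samples))
    pts.append(polar(1 / math.cos(psi), 2 * math.pi - psi))   # step 4: leave along a tangent line
    return pts

def path_length(pts):
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def reaches_every_road(pts, directions=720, tol=1e-6):
    """Check, on a grid of angles t, that some vertex has x cos t + y sin t >= 1 (up to tol)."""
    for k in range(directions):
        t = 2 * math.pi * k / directions
        c, s = math.cos(t), math.sin(t)
        if max(x * c + y * s for x, y in pts) < 1 - tol:
            return False
    return True

pts = solved_path()
print(path_length(pts), 1 + math.sqrt(3) + 7 * math.pi / 6)   # chords shorten the arc very slightly
print(reaches_every_road(pts))                                # True
print(reaches_every_road(pts[: len(pts) // 2]))               # False: stopping halfway misses roads
```

Checking only the vertices is enough for the polyline, because a linear function on a segment attains its maximum at an endpoint.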

* Technically, we still need to prove that paths that re-enter the unit circle (go below $r=1$) are never better. One can reason about why such paths can be improved upon by replacing the path segments that re-enter the unit circle, but a proof seems to need some global properties of the path. With the reader comment (below) about reflecting the starting point, the best path actually becomes a function $r(\theta)$ of the variable $\theta$, which almost certainly is a necessary condition for efficient paths. Then we can impose other constraints, like $\forall \theta_0\ \exists \theta: r(\theta) \ge \sec(\theta-\theta_0)$, which says the path function should not be dominated by any secant function (equivalent to crossing all tangent lines to the unit circle). This may be a direction toward a complete proof.

Windows 7 Math Input Panel!

admin — Tue, 06 Oct 2009 14:44:22 +0000

Somehow the ability to turn handwritten math into MathML escaped my attention as a Windows 7 feature when I tried the first beta. Finally! I’ve been waiting for this since forever… Wonder what took so long.

Next up are music notation and graphics in general*. The ultimate goal of a handwriting recognizer is of course similar to that of OCR: to turn one rendering of a piece of art (for lack of a better word), text included, into another. Specifically, it should rectify all the rendering into a “typeset” form. It should intelligently recognize a host of objects with its own Visio-like templates: if I draw a resistor, it should pull out a nice schematic rendering of a resistor. If I draw a rectangle and select “rectify”, it should make a rectangle with straight edges.

It is in some sense harder than OCR, since the handwriting can vary a lot, but in another sense easier, since the user is by definition actively interacting with the process. Combined with a rational input device (not a mouse, and not a pen that is but a mouse emulator**), this can become the realistic next-generation human-computer interface, and the tablet will take off as one of the next-generation form factors (as it almost inevitably must, probably in a merged tablet/ebook form). I’m not too sure what human-computer interface people are doing these days, but last I checked they were working on fanciful stuff divorced from actual use. If it were up to me, I’d explore the limits of the pen input device first; simple things often yield greater and more surprising results.

* Now this is something I’ve been ranting about for years as one of those “obvious” advances that should have happened but haven’t, though it is also possible that technology has only now caught up enough to implement these things. Certainly the math input idea isn’t new at all. Somebody wrote an MEng thesis on it 10 years ago, and I’m sure he wasn’t the first one. The general input problem is more interesting, though.

** A useful pen device would forsake the god-awful mouse model and move along the lines of some pens nowadays: with a digital eraser and pressure sensing (these two already exist), but also with a scroll wheel to select menus or choose options, and with some feedback, either mechanical or like a small display. The latter can be useful, for instance, to show the color of the pen or some other such property. There is no reason why a pen cannot be as useful as a mouse, indeed more useful (the mouse has become an uncomfortable and unnatural inner glove). The pen form factor has been tried for thousands of years and its usefulness should not be doubted, disdain for archaism be damned.

An improvised dialogue on Wolfram|Alpha

admin — Sat, 16 May 2009 23:36:50 +0000

[18:01] fakalin: hey
[18:01] fakalin: did you try wolfram alpha
[18:02] me: waht
[18:02] fakalin: what
[18:02] fakalin: jeez you’re out of touch
[18:02] fakalin: http://www.wolframalpha.com/

[18:04] me: what is this
[18:04] me: i don’t get it
[18:04] fakalin: what’s it like living under a rock
[18:04] fakalin: ask it something
[18:04] fakalin: anything
[18:04] me: it’s down
[18:04] fakalin: oh
[18:04] fakalin: lol
[18:04] fakalin: read up on it
[18:06] me: but what’s the idea, these are all pre computed?
[18:06] fakalin: here it comes
[18:06] fakalin: HERE IT COMES
[18:06] fakalin: http://lmgtfy.com/?q=wolfram+alpha
[18:08] me: hmm
[18:08] me: but what does this have to do with the web
[18:09] fakalin: um
[18:09] me: except that the users are “on the web”
[18:09] fakalin: the fact that you can use it using this thing called a ‘web browser’
[18:09] me: that’s weak
[18:09] me: i’m asking this because it is compared to google
[18:09] fakalin: it’s useful for some things i guess
[18:10] me: that’s a weak statement
[18:10] fakalin: you’re weak
[18:10] me: in fact it can’t get waker
[18:10] me: weaker
[18:12] me: “Alpha does not answer natural language queries — you have to ask questions in a particular syntax, or various forms of abbreviated notation.”
[18:12] me: ok
[18:13] me: “and it computes answers, it doesn’t merely look them up in a big database.”
[18:13] me: i don’t see a clear distinction
[18:13] me: even SQL can compute
[18:13] fakalin: it’s basically a bunch of expert systems
[18:13] fakalin: lol
[18:13] fakalin: so your argument is ‘even sql can do stuff’
[18:14] me: i’m quoting an article
[18:15] me: not impressed
[18:15] fakalin: yawn
[18:15] me: wolfman is overreaching
[18:15] me: again
[18:15] me: “Stephen showed me many interesting examples — for example, Wolfram Alpha was able to solve novel numeric sequencing problems, calculus problems, and could answer questions about the human genome too.”
[18:16] me: this sounds like mathematica
[18:16] fakalin: no shit sherlock
[18:16] me: but then
[18:16] fakalin: it’s like mathematica on the web with some added modules
[18:16] me: “It was also able to compute answers to questions about many other kinds of topics (cooking, people, economics, etc.). ”
[18:16] me: see, this is too vague\
[18:16] fakalin: you’re too vague
[18:16] me: i guess i’ll have to wait for it to resurrect
[18:16] me: and try myself
[18:16] fakalin: keep trying
[18:16] fakalin: it just came out yesterday
[18:16] fakalin: sometimes it works
[18:17] me: haha
[18:18] me: i’d like to see ONE example where something non trivial in a non-scientific field is “computed”
[18:21] me: ok
[18:22] me: anagrams
[18:22] me: that’s one example i guess
[18:22] fakalin: lol what
[18:22] me: and maybe decoding stuff
[18:22] fakalin: what are you blathering about
[18:22] me: an example of computing a non scientific topic
[18:22] fakalin: who cares about that
[18:22] fakalin: you and your straw men
[18:22] me: lol
[18:22] fakalin: unit conversions nigga
[18:23] me: that’s mat
[18:23] me: math
[18:23] fakalin: lol
[18:23] fakalin: news flash
[18:23] fakalin: computers can do math
[18:23] me: dude, i know that
[18:23] me: that’s mathematica
[18:23] me: i’m trying to figure out how this wolframalpha is new
[18:23] fakalin: simple
[18:23] me: the thing is, if you know the word “anagram”, you can find an anagramming tool
[18:23] fakalin: find a website on the internet like it
[18:24] me: sure, it’s more convenient
[18:24] me: but it’s not a one stop shop
[18:24] me: which google is
[18:24] me: like i said, you can find an “anagramming” tool
[18:24] me: if you want to do anagrams
[18:24] me: same with just about any other “tagged” computation type
[18:25] me: it’d be more interesting if wolfram alpha computes things for which tools haven’t been built ….
[18:25] me: this is a bit problematic
[18:26] me: see, google can work because everybody writes in some language it indexes
[18:26] me: however, on the web, people do not write computational tools solely in mathematica
[18:26] me: if they did, then wolfram alpha can just glom onto all of those
[18:26] me: but as is, they only have wolfram researchers to write stuff
[18:30] fakalin: lol
[18:30] me: you see my point though, right?
[18:31] fakalin: that wolfram alpha just packages things together and offers nothing new?
[18:31] fakalin: could say the same about google
[18:31] fakalin: you just like shitting on new things
[18:31] fakalin: communist hates change
[18:32] me: i’m saying wolfram alpha is like yahoo
[18:32] me: unlike google
[18:33] fakalin: except yahoo sux
[18:33] fakalin: lolzz
[18:33] me: my point is, they depend on experts

Is this true?

admin — Sat, 07 Mar 2009 21:41:39 +0000

So this thing on Wikipedia

http://en.wikipedia.org/wiki/Noisy-channel_coding_theorem

could have left it at the classical statement of the theorem with bullet #1. Then it goes on to say:

2. If a probability of bit error $p_b$ is acceptable, rates up to $R(p_b)$ are achievable, where

$R(p_b) = \frac{C}{1-H_2(p_b)}$.

3. For any $p_b$, rates greater than $R(p_b)$ are not achievable.

I have never seen this before. At first glance, this seems questionable, as Fano’s converse gives $P_e^{(n)} \ge 1 - \frac{1}{nR} - \frac{C}{R}$, which as $n\to\infty$ becomes $R \le \frac{C}{1-P_e}$; setting $P_e = p_b$, the quoted rate $\frac{C}{1-H_2(p_b)}$ is at least this large, because $H_2(p_e) \ge p_e$ for $p_e \in [0,0.5]$. So it must mean whatever is used to code this is not going to be a long block code.

One example where this is true is the binary symmetric channel with uncoded transmission. But I’m not so sure what the achievability scheme is in general, although I have some ideas; it may involve quantizing the excess codewords to the nearest zero-error codewords. As for the converse, I have no idea.
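A numeric check of the binary symmetric channel example, assuming crossover probability $p=0.1$ (an arbitrary choice). Uncoded transmission has rate 1 and bit error $p$, and the quoted formula gives exactly $R(p) = (1-H_2(p))/(1-H_2(p)) = 1$. For contrast, the sketch also evaluates the large-$n$ form of Fano's bound, $C/(1-P_e)$, with $P_e$ set to $p$ as if it were a block error probability; it comes out well below 1, which is the tension with Fano's converse.

```python
import math

def H2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                       # example BSC crossover probability
C = 1 - H2(p)                 # BSC capacity in bits per channel use
R_quoted = C / (1 - H2(p))    # the quoted R(p_b) with p_b = p
R_fano = C / (1 - p)          # large-n Fano limit if p were a block error probability
print(C, R_quoted, R_fano)    # R_quoted is 1 (the uncoded rate); R_fano is about 0.59
```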

In terms of the statement, it is really unclear what is meant by “bit error”. In the classical statement, a message from a large alphabet is coded into some $X^n \in \mathcal{X}^n$ where $\mathcal{X}$ is the channel input alphabet. After decoding, $X^n$ is either found correctly, or it is in error. There is no “bit” in here. Even if $X$ is binary, is the bit error the received (uncooked) bit error? Or is it the decoded (cooked) bit error? Why should the decoded bit error matter, isn’t that a codebook artifact? Or is it the bit error in the original message, if the original message is to be represented by a bit-stream? But that is also entirely arbitrary.

Anyway I’d like a clarification from someone or a reference.