<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Some stuff &#187; number</title>
	<atom:link href="http://blog.yhuang.org/?feed=rss2&#038;tag=number" rel="self" type="application/rss+xml" />
	<link>https://blog.yhuang.org</link>
	<description>here.</description>
	<lastBuildDate>Wed, 27 Aug 2025 08:50:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>problem of strings</title>
		<link>https://blog.yhuang.org/?p=848</link>
		<comments>https://blog.yhuang.org/?p=848#comments</comments>
		<pubDate>Wed, 07 Mar 2012 05:19:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[binom]]></category>
		<category><![CDATA[consideration]]></category>
		<category><![CDATA[matter]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[sum]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=848</guid>
		<description><![CDATA[This is a problem via fakalin. You have 10 pieces of string, each with two ends. You randomly pick two ends of string (possibly from the same string, possibly from different ones) and tie them together, creating either a longer piece of string or a loop. You keep doing this until you run out of [...]]]></description>
			<content:encoded><![CDATA[<p>This is a problem via fakalin.</p>
<blockquote><p>You have 10 pieces of string, each with two ends. You randomly pick two ends of string (possibly from the same string, possibly from different ones) and tie them together, creating either a longer piece of string or a loop. You keep doing this until you run out of free ends.</p>
<p>What is the expected number of loops you end up with?</p></blockquote>
<p><span id="more-848"></span><br />
Things to note are the following:</p>
<p>- Once a loop is made from a string, it is removed from further consideration.<br />
- Picking two ends from the same string immediately makes a loop.<br />
- Picking two ends from different strings makes a longer string.</p>
<p>So in the end, no matter which two ends are picked, we have one fewer open string than we started with.</p>
<p>Let \(f(n)\) be the expected number of loops with \(n\) open strings. We know \(f(1)=1\). With \(n\) strings, the probability of picking two ends from the same string is \(n/\binom{2n}{2}\). So:</p>
\(f(n) = n/\binom{2n}{2} (f(n-1) + 1) + (1 - n/\binom{2n}{2}) f(n-1)\)<br />
\(= f(n-1) + n/\binom{2n}{2} = f(n-1) + n / [2n (2n - 1) / 2] = f(n-1) + 1 / (2n-1)\)
<p>That is, \(f(n) = \sum_{i=1}^n 1/(2i-1)\), so \(f(10) = 31037876/14549535 \approx 2.1333\).</p>
<p>As \(n\) grows, this sum diverges: \(\sum_{i=1}^n 1/(2i-1) = H_{2n} - \frac{1}{2}H_n\), so \(f(n)\) grows like \(\frac{1}{2}\log n\).</p>
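<p>As a sanity check, here is a quick Monte Carlo sketch in Python (the representation of strings as labeled ends is mine) that simulates the tying process directly and compares it against the formula:</p>
<pre><code>import random
from fractions import Fraction

def simulate_loops(n_strings=10):
    # each open segment has two free ends; label each end by its segment id
    ends = [s for s in range(n_strings) for _ in range(2)]
    next_id = n_strings
    loops = 0
    while ends:
        i, j = random.sample(range(len(ends)), 2)  # two distinct ends
        a, b = ends[i], ends[j]
        for k in sorted((i, j), reverse=True):
            ends.pop(k)
        if a == b:
            loops += 1  # both ends of the same segment: a loop closes
        else:
            # two segments merge; relabel their surviving ends with a fresh id
            new = next_id
            next_id += 1
            ends[:] = [new if e in (a, b) else e for e in ends]
    return loops

exact = sum(Fraction(1, 2 * i - 1) for i in range(1, 11))
trials = 100_000
avg = sum(simulate_loops() for _ in range(trials)) / trials
print(f"exact f(10) = {exact} = {float(exact):.4f}, simulated = {avg:.4f}")
</code></pre>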
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=848</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>toward a synthetic universal instrument</title>
		<link>https://blog.yhuang.org/?p=830</link>
		<comments>https://blog.yhuang.org/?p=830#comments</comments>
		<pubDate>Wed, 15 Feb 2012 01:40:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[acoustic instruments]]></category>
		<category><![CDATA[CCRMA]]></category>
		<category><![CDATA[instrument]]></category>
		<category><![CDATA[instruments]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[physical modeling]]></category>
		<category><![CDATA[space]]></category>
		<category><![CDATA[wood]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=830</guid>
		<description><![CDATA[The Roland line of &#8220;SuperNATURAL&#8221; digital pianos claims to produce a more natural sound by combining the two primary methods of synthesizing instruments, namely: acoustic modeling of real instruments, and recording samples from them. The two methods are different enough that, even if both converge to the true output as more sophistication is put to [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.roland.com/piano/SuperNATURAL_Piano.html">Roland line of &#8220;SuperNATURAL&#8221; digital pianos</a> claims to produce a more natural sound by combining the two primary methods of synthesizing instruments, namely: acoustic modeling of real instruments, and recording samples from them. The two methods are different enough that, even if both converge to the true output as more sophistication is put to bear, they are rather difficult to merge together.</p>
<p>The history of synthesized instruments has ping-ponged between the two methods. First there was FM synthesis, which used analog function generation based on the simplest physical models of standing waves (harmonics, etc.). This allowed distinguishing between major instrument groups but sounded decidedly fake. Then people recorded acoustic instruments and looped/interpolated between samples &#8212; much better, but storage constraints placed limits on what could be processed, and there was never any dynamism. Then it was back to physical modeling, this time using waveguides to determine how various inputs like bowing on strings or blowing into pipes dynamically affect output sound (I think it started at <a href="https://ccrma.stanford.edu/">CCRMA</a>). This gave really good expressivity &#8212; but again sounded fake. And so back to wave-samples. For the last 15 years or so, especially as storage became cheap, it appears that the dumbest, brute-force method of using a large enough number of samples and ad-hoc adjustments to their decay and reverb characteristics became dominant. For ensemble instruments with little individual personality, it was actually superb. The trouble was always with solo instruments.<br />
<span id="more-830"></span><br />
One has to ask the question though &#8212; is sampling really that dumb?</p>
<p>In other words, is physical modeling ever going to achieve reality? Sure, with more processing power, more realistic models could be built. But let&#8217;s be serious: the reason acoustic instruments sound real is likely that they are built exactly the way they are. They are already made (physically) as simple as they can be &#8212; e.g. just some metal pipes or a piece of string &#8212; so that a &#8220;realistic&#8221; model may well turn out to be the acoustic object itself, with all of its mechanical physics&#8230;</p>
<p>Instead of trying to reverse engineer the box that takes mechanical input into sound output, we may as well embrace sampling by taking a huge number of input and output pairs. But we should do it intelligently. The entire space shouldn&#8217;t be sampled evenly or by hunches. It should be more like what they do at CCRMA where they attempt to sample the various independent modes of an instrument &#8212; striking it with a mallet at different locations to get impulse responses, for example.</p>
<p>Beyond this, we should consider decomposing sound reproduction differently. The sound generation process need not be fully decoupled at the point of digital waves, that then need to be reproduced by generic membrane-based speakers. Perhaps we should instead take acoustic objects that span various modes of the output space &#8212; things like a large piece of wood, a small piece of wood, some metal pipe, some pins, whatever minimal set we could derive (may even be a weird basis set of synthetically shaped physical objects) &#8212; then drive them like specialized &#8220;speakers.&#8221; Likewise, we do the same modal decomposition with input interfaces &#8212; a pressure-resistive mouthpiece, a coupled string, etc.</p>
<p>Much like in color science, there should be the concept of gamut that a particular output device can cover and that a particular input device can pick up. Then we should match real sounds from real instruments with the inputs/outputs we get on our input/output devices by applying the appropriate digital transforms, which we get by (mathematically) solving the inverse problem from measurements, rather than trying to come up with the physics from first principles.</p>
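<p>For what it&#8217;s worth, here is a minimal sketch of the measure-and-invert idea in Python/NumPy. The toy &#8220;instrument&#8221; and every name in it are mine, and a linear FIR fit stands in for whatever transform a real system would need:</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)

# toy "instrument": an unknown linear system with impulse response h_true
h_true = np.array([1.0, 0.5, 0.25, 0.125])

def instrument(x):
    return np.convolve(x, h_true)[: len(x)]

# measure input/output pairs (excitation in, recorded sound out)
x = rng.standard_normal(2048)
y = instrument(x)

# solve the inverse problem from measurements alone: recover the
# transform by least squares on delayed copies of the input, no physics
taps = 8
X = np.column_stack([np.concatenate([np.zeros(k), x[: len(x) - k]])
                     for k in range(taps)])
h_est, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(h_est, 3))  # first 4 taps match h_true, the rest are ~0
</code></pre>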
<p>What we get is then a <u>universal instrument</u> that should reproduce acoustic instruments (and many other realizable timbres) accurately. As a bonus, it has the benefit of allowing interchangeable &#8220;playing&#8221; styles upon swapping inputs (cf. keytar, melodica).</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=830</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>dialectics and truth-finding</title>
		<link>https://blog.yhuang.org/?p=809</link>
		<comments>https://blog.yhuang.org/?p=809#comments</comments>
		<pubDate>Fri, 10 Feb 2012 06:12:53 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dialectic]]></category>
		<category><![CDATA[Hegelian]]></category>
		<category><![CDATA[Hegelian dialectic]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[opposing viewpoints]]></category>
		<category><![CDATA[refutation]]></category>
		<category><![CDATA[sense]]></category>
		<category><![CDATA[Subject]]></category>
		<category><![CDATA[truth]]></category>
		<category><![CDATA[viewpoints]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=809</guid>
		<description><![CDATA[When one is presented with some subject on which there are several viewpoints, and exhorted to look at things &#8220;dialectically,&#8221; one might ask what this means. Wikipedia says of classical dialectic that the point is to generate either a refutation of one viewpoint or a synthesis &#8212; to reach a better conclusion. But it doesn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>When one is presented with some subject on which there are several viewpoints, and exhorted to look at things &#8220;dialectically,&#8221; one might ask what this means.</p>
<p><a href="http://en.wikipedia.org/wiki/Dialectic">Wikipedia says</a> of classical dialectic that the point is to generate either a refutation of one viewpoint or a synthesis &#8212; to reach a better conclusion. But it doesn&#8217;t say what form the better conclusion is in. Similarly, it says of Hegelian dialectic that the point is to reach a synthesis by combining the common points of a thesis and an antithesis.</p>
<p>These models of truth-finding appear to be rather limited. Besides being specialized, in some sense, to two opposing viewpoints (or, even if we extend them, a finite number), they are restricted to finding truth only in the intersection or union or some other simple-minded method of synthesis. I argue for a more general way to model truth-finding. This is inspired by engineering, as usual.<br />
<span id="more-809"></span><br />
The truth in this case is like an object that lives in some high dimensional space. The viewpoints, any number of them, are projections of the truth onto different subspaces. Some subspaces may be larger than others even. The point is each &#8220;view&#8221; will see something different out of the same object.</p>
<p>Truth-finding is to combine the viewpoints in the sense of inverting the projections, so as to recover the original object. This is not a simple or even uniquely determined process, but there is probably a most parsimonious way to do it, &agrave; la estimation. It has the benefit of giving not only the complex and rich object in its original form (or some approximation of it), but also the capability to support projection onto any number of other subspaces or viewpoints not present in the original argument. <em>This</em> is what dialectics cannot provide.</p>
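<p>Here is a minimal sketch of the inversion in Python/NumPy (the dimensions and the least-squares notion of &#8220;most parsimonious&#8221; are mine): each viewpoint is a projection of an unknown object onto a subspace, and stacking the projections into one estimation problem recovers the object.</p>
<pre><code>import numpy as np

rng = np.random.default_rng(1)
truth = rng.standard_normal(6)  # the "object" living in 6 dimensions

# three viewpoints: orthogonal projections onto random subspaces of
# different sizes (each view sees something different of the same object)
views = []
for dim in (2, 3, 4):
    Q, _ = np.linalg.qr(rng.standard_normal((6, dim)))
    P = Q @ Q.T  # projection matrix onto the subspace
    views.append((P, P @ truth))

# truth-finding: invert all the projections jointly by stacking them
# into one least-squares problem
P_stack = np.vstack([P for P, _ in views])
obs = np.concatenate([v for _, v in views])
recovered, *_ = np.linalg.lstsq(P_stack, obs, rcond=None)

print(np.round(truth, 3))
print(np.round(recovered, 3))  # matches when the views jointly span the space
</code></pre>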
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=809</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>how to summarize a long tail?</title>
		<link>https://blog.yhuang.org/?p=406</link>
		<comments>https://blog.yhuang.org/?p=406#comments</comments>
		<pubDate>Sun, 01 May 2011 11:36:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Avoid]]></category>
		<category><![CDATA[big earthquake]]></category>
		<category><![CDATA[catastrophic disaster]]></category>
		<category><![CDATA[disaster]]></category>
		<category><![CDATA[disaster weather]]></category>
		<category><![CDATA[law of large numbers]]></category>
		<category><![CDATA[Natural]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[Weather]]></category>
		<category><![CDATA[weather disasters]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=406</guid>
		<description><![CDATA[Where to Live to Avoid a Natural Disaster Weather disasters and quakes: who’s most at risk? The analysis below, by Sperling’s Best Places, a publisher of city rankings, is an attempt to assess a combination of those risks in 379 American metro areas &#8230; and take(s) into account the relative infrequency of quakes, compared with [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>Where to Live to Avoid a Natural Disaster<br />
Weather disasters and quakes: who’s most at risk? The analysis below, by Sperling’s Best Places, a publisher of city rankings, is an attempt to assess a combination of those risks in 379 American metro areas &#8230; and take(s) into account the relative infrequency of quakes, compared with weather events and floods.</p></blockquote>
<p><img src="wp-content/uploads/images/01safe-custom1.gif" alt="http://graphics8.nytimes.com/images/2011/05/01/weekinreview/01safe/01safe-custom1.gif" width="600" /></p>
<p>I don&#8217;t know what exact metric they used here, but it seems to be more or less the expected value of &#8220;disaster points&#8221; accumulated during a unit period, for lack of a better description. Here is the problem with these expected-value-based metrics trying to summarize very different distributions: the variance matters! One catastrophic disaster isn&#8217;t quite the same as several smaller disasters costing the same number of disaster points. Yet if humanity &#8220;maximized&#8221; survival using this map, it would move to the West Coast and then go extinct in the next big earthquake. Even imposing a convex utility function on the distribution isn&#8217;t entirely satisfying. When it comes to decision making, tail risk is in a different bucket than central risk. Somehow an important aspect of the decision making process isn&#8217;t captured by &#8220;soft&#8221; metrics.<br />
<span id="more-406"></span><br />
But let&#8217;s go back a bit. The reason why distributions are summarized by a single number is often based on the notion of repeated experiments such that the law of large numbers kicks in. For a single sample path, repetition is not relevant for truly rare events, so expected value and other metrics that depend on repetition should be discarded immediately. It isn&#8217;t clear what a reasonable replacement is though. If something like a simple survival threshold were more appropriate (let&#8217;s say you could survive a two sigma lifetime event and no larger), people certainly don&#8217;t behave as such, since there would be no solution: there is always a catastrophic risk, say, of the universe ending. It may be necessary to treat three different severity regimes separately: soft metrics for the &#8220;mundane&#8221; and multiply occurring central risks, thresholding for the &#8220;rare&#8221; but &#8220;statistically identifiable&#8221; tail risks, and disregard for the &#8220;never occurred and unknown&#8221; acts of god.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=406</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>poor man&#8217;s bandwidth control</title>
		<link>https://blog.yhuang.org/?p=305</link>
		<comments>https://blog.yhuang.org/?p=305#comments</comments>
		<pubDate>Fri, 18 Feb 2011 03:02:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[802 11b]]></category>
		<category><![CDATA[bandwidth control]]></category>
		<category><![CDATA[consumer broadband]]></category>
		<category><![CDATA[contention]]></category>
		<category><![CDATA[control]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[layer]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[poor man]]></category>
		<category><![CDATA[works like a charm]]></category>

		<guid isPermaLink="false">http://allegro.mit.edu/~zong/wpress/?p=305</guid>
		<description><![CDATA[Consumer broadband modems and routers typically cannot deal with a large number of connections from multiple users, and the cheap firmware has no settings to restrict the bandwidth of each user. This is when you resort to the physical layer to help you out. Solution: Restrict wifi to the really slow 802.11b, maybe even slower, [...]]]></description>
			<content:encoded><![CDATA[<p>Consumer broadband modems and routers typically cannot deal with a large number of connections from multiple users, and the cheap firmware has no settings to restrict the bandwidth of each user. This is when you resort to the physical layer to help you out.</p>
<p>Solution: Restrict wifi to the really slow 802.11b, maybe even slower, and let the physical layer radio contention fake the effects of fair bandwidth control. Works like a charm, no more dropped connections.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=305</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>a polynomial problem</title>
		<link>https://blog.yhuang.org/?p=291</link>
		<comments>https://blog.yhuang.org/?p=291#comments</comments>
		<pubDate>Mon, 22 Nov 2010 21:52:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[answer]]></category>
		<category><![CDATA[expansion]]></category>
		<category><![CDATA[integer]]></category>
		<category><![CDATA[integer coefficients]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[mathbb]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[number math]]></category>
		<category><![CDATA[polynomial coefficients]]></category>
		<category><![CDATA[positive integer]]></category>

		<guid isPermaLink="false">http://allegro.mit.edu/~zong/wpress/?p=291</guid>
		<description><![CDATA[The latest problem from fakalin. Took some wrong turns and hints to solve it&#8230; Given a polynomial with positive integer coefficients, how many evaluations of does it take to obtain the polynomial? (An polynomial with real coefficients would take the number of degrees plus 1 to specify, which, if it held in this case, would [...]]]></description>
			<content:encoded><![CDATA[<p>The latest problem from fakalin. Took some wrong turns and hints to solve it&#8230;</p>
<p>Given a polynomial \(f: \mathbb{Z}\to \mathbb{Z}\) with positive integer coefficients, how many evaluations of \(f\) does it take to obtain the polynomial?</p>
<p>(A polynomial \(f: \mathbb{R}\to \mathbb{R}\) with real coefficients takes degree plus 1 evaluations to specify, which, if it held in this case, would render the answer unbounded. But the correct answer here is surprisingly small.)<br />
<span id="more-291"></span><br />
It takes only 2 evaluations. Suppose in the following that \(b>0\). Let us note that a polynomial value \(f(b) = a_n b^n + \dots + a_0 b^0\) essentially specifies a base \(b\) representation of the number \(f(b)\), in that \(a_n a_{n-1} \dots a_0\) is an expansion of \(f(b)\) in base \(b\). The only problem is that this expansion is non-unique, since some \(a_j\) may be \(\ge b\).</p>
<p>However, no \(a_j\) can be \(\ge f(b) + 1\), since for all \(j\), \(f(b) \ge a_j b^j \ge a_j\) by the positivity of the coefficients and the assumption on \(b\). So take \(B = f(b) + 1\). Now we are guaranteed that \(a_n a_{n-1} \dots a_0\) is the unique (and canonical) base \(B\) expansion of \(f(B)\), from which the polynomial coefficients can be read off immediately.</p>
<p>So the two evaluations are \(f(b)\) and \(f(B)\), where \(B = f(b)+1\).</p>
<p>Example: \(f(b) = 3b^2 + 2b + 1\). Evaluate at, e.g., \(b=1\) to get \(f(1) = 6\). Then evaluate at \(B=f(1)+1=7\) to get \(f(7)=162=321_{7}\).</p>
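<p>The procedure is mechanical enough to write down (a quick Python transcription of the argument above):</p>
<pre><code>def recover(f):
    """Recover positive integer coefficients of f from 2 evaluations."""
    B = f(1) + 1  # f(1) bounds every coefficient, so base-B digits are canonical
    v = f(B)
    coeffs = []  # a_0, a_1, ..., a_n
    while v > 0:
        v, digit = divmod(v, B)
        coeffs.append(digit)
    return coeffs

f = lambda b: 3 * b**2 + 2 * b + 1
print(recover(f))  # [1, 2, 3], i.e. f(b) = 3b^2 + 2b + 1
</code></pre>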
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=291</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>different kind of coupon collector&#8217;s problem</title>
		<link>https://blog.yhuang.org/?p=283</link>
		<comments>https://blog.yhuang.org/?p=283#comments</comments>
		<pubDate>Mon, 11 Oct 2010 09:18:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[call]]></category>
		<category><![CDATA[column sums]]></category>
		<category><![CDATA[math 1]]></category>
		<category><![CDATA[math types]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[partition functions]]></category>
		<category><![CDATA[pigeonhole principle]]></category>
		<category><![CDATA[problem]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[theory]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=283</guid>
		<description><![CDATA[The classic coupon collector&#8217;s problem asks for the number of samples it takes to collect all coupon types from a sequence of coupons in which each of the types of coupons has an unlimited number of copies. Here is a different kind of problem: if there are limited copies of each of the coupon types [...]]]></description>
			<content:encoded><![CDATA[<p>The classic coupon collector&#8217;s problem asks for the number of samples it takes to collect all coupon types from a sequence of coupons in which each of the \(k\) types of coupons has an unlimited number of copies.</p>
<p>Here is a different kind of problem: if there are limited copies of each of the \(k\) coupon types (say, \(d\) copies), how many samples does it take to collect \(t\) coupon types?</p>
<p><span id="more-283"></span><br />
		By the pigeonhole principle, we have two extremes: from \(n=0\) to \(n=t-1\) samples, it is impossible to collect \(t\) coupon types; and for \(n>(t-1)d\) samples, it is impossible not to. For sample counts in between, there is some monotonically decreasing probability that \(t\) types have <strong>not</strong> yet been collected. Let&#8217;s call this probability \(P(n)\) (so \(P(n)=1\) for \(n<t\)). The expected number of samples it takes to collect \(t\) types is then</p>
\(\sum_{n=0}^{(t-1)d} P(n)\)
<p>Finding \(P(n)\) for \(n\ge t\) doesn&#8217;t appear easy, though the combinatorics could be written down (in theory), maybe involving partition functions, Stirling numbers, or whatever. But the question reduces to the following, which somebody must have solved: given a \(d \times k\) 0-1 matrix in which \(n\) 1&#8242;s are placed randomly, what is the probability that exactly \(z\) columns have zero column sums? (Here \(t\) types remain uncollected after \(n\) samples exactly when more than \(k-t\) columns are zero.)</p>
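<p>Lacking a closed form, here is a quick Monte Carlo sketch in Python (the parameters \(k=20\), \(d=5\), \(t=10\) are arbitrary choices of mine) that estimates \(P(n)\) and the expected sample count:</p>
<pre><code>import random

def samples_to_collect(k=20, d=5, t=10):
    """Draw without replacement from k types x d copies until t types seen."""
    pool = [typ for typ in range(k) for _ in range(d)]
    random.shuffle(pool)
    seen = set()
    for n, coupon in enumerate(pool, start=1):
        seen.add(coupon)
        if len(seen) == t:
            return n
    raise AssertionError("unreachable: n > (t-1)*d forces t types")

trials = 100_000
draws = [samples_to_collect() for _ in range(trials)]
print("E[samples] =", sum(draws) / trials)
# P(n) = probability that t types are NOT yet collected after n samples
for n in (10, 20, 30, 40):
    print(f"P({n}) = {sum(dr > n for dr in draws) / trials:.4f}")
</code></pre>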
<p>These papers give some approximations:</p>
<p>http://arxiv.org/PS_cache/arxiv/pdf/0806/0806.1480v3.pdf</p>
<p>http://cs.anu.edu.au/~bdm/papers/irregular.pdf</p>
<p>Does anybody know the actual answer?</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=283</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>wire matching problem</title>
		<link>https://blog.yhuang.org/?p=179</link>
		<comments>https://blog.yhuang.org/?p=179#comments</comments>
		<pubDate>Thu, 02 Apr 2009 21:13:36 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[binom]]></category>
		<category><![CDATA[binomial]]></category>
		<category><![CDATA[math 1]]></category>
		<category><![CDATA[math log]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[ohm meter]]></category>
		<category><![CDATA[pair]]></category>
		<category><![CDATA[problem]]></category>
		<category><![CDATA[wire]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=179</guid>
		<description><![CDATA[Here&#8217;s an interesting and, shall I say, practical problem brought up at group meeting today: If you have say, n identical looking wires strung across a long distance between point A and point B, and your job is to label all wires with just an ohm-meter (to measure continuity, i.e. short circuit or open circuit), [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s an interesting and, shall I say, practical problem brought up at group meeting today: if you have, say, \(n\) identical-looking wires strung across a long distance between point A and point B, and your job is to label all the wires with just an ohm-meter (to measure continuity, i.e. short circuit or open circuit), how many trips between A and B does it take?</p>
<p>Indeed, the only way to use an ohm-meter for this purpose is by tying wires together.  Clearly the number of trips shouldn&#8217;t be larger than \(\log n\) or so, simply by tying various halves of the set of wires at one end and going to the other end to measure. But there are other kinds of coding that do better.<br />
<span id="more-179"></span></p>
<p>First, some examples.</p>
<p>Clearly, \(n=2\) is impossible to label.</p>
<p>For \(n=\binom{k}{2}\) for some \(k\) (call these binomial numbers), the number of trips between A and B is no larger than 2. Here is how. At point A, partition the \(n\) wires into groups \(P_1,P_2,P_3,&#8230;,P_{k-1}\) of \(1,2,3,&#8230;,k-1\) wires each (note that \(1+2+\dots+(k-1)=\binom{k}{2}\)) and tie the wires in each group together. Go to point B and recover the partition set memberships by measuring continuity between pairs of wires; since the group sizes are distinct, each group is identified by its size. (Thus A has communicated set membership to B.) Now, based on this side information, repartition the wires at B into groups \(Q_1,Q_2,Q_3,&#8230;,Q_{k-1}\), again of \(1,2,3,&#8230;,k-1\) wires each, but do this by taking 1 wire each from \(P_1,P_2,&#8230;,P_{k-1}\) to form \(Q_{k-1}\), taking 1 wire each from \(P_2,P_3,&#8230;,P_{k-1}\) to form \(Q_{k-2}\), and so on. Go back to point A, undo all the original ties, and recover the new partitions. The membership pair \((P_i,Q_j)\) uniquely identifies a wire, so all wires are fully labeled at both A and B. This establishes the main ideas of this class of partitioning solutions. It is also easy to see that at least 2 trips are needed for any \(n>1\), since sets of the same size, and wires within one set, cannot be distinguished without other information.</p>
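<p>Here is a small Python sketch of the two-trip partition scheme (the helper names and indexing are mine), checking that the \((P_i,Q_j)\) memberships identify every wire:</p>
<pre><code>from itertools import count

def partition_scheme(m):
    """n = 1 + 2 + ... + m wires; groups P_1..P_m of sizes 1..m tied at A."""
    wire = count()
    P = {i: [next(wire) for _ in range(i)] for i in range(1, m + 1)}
    Q = {j: [] for j in range(1, m + 1)}
    for i, group in P.items():
        # the p-th wire of P_i joins Q_{m-p}, so P_i feeds Q_m, ..., Q_{m-i+1}
        for p, w in enumerate(group):
            Q[m - p].append(w)
    return P, Q

m = 4  # n = 10 wires, the binomial number C(5,2)
P, Q = partition_scheme(m)
assert sorted(len(g) for g in Q.values()) == list(range(1, m + 1))
# every (P_i, Q_j) membership pair pins down a unique wire
pairs = {(i, j) for i, gp in P.items() for j, gq in Q.items()
         for w in gp if w in gq}
assert len(pairs) == sum(range(1, m + 1))
print(P)
print(Q)
</code></pre>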
<p>For \(n\) that are sums of unique binomial numbers (call these doubly binomial numbers), a two-phase procedure can label the wires in no more than 3 trips. The first phase is to partition the wires into clusters of unique binomial sizes \(b_1,b_2,&#8230;,b_l\). Go to point B and identify the clusters by the unique sizes of the continuity sets. Then follow the previous technique to identify the wires within each cluster. You can generalize this by further nesting: 4 trips for \(n\) that are sums of unique doubly binomial numbers (i.e. triply binomial numbers), 5 trips for quadruply binomial numbers, etc.</p>
<p>In essence, this is a generalization of binary partitioning to partitioning into many sets, and more sophisticated partition-based codes can probably be found for all \(n\) on a case-by-case basis; however, these are not always optimal. For one, the multi-phase procedure is arbitrary and probably inefficient. And other kinds of coding that wipe away received information in the wire configuration in order to send new information (even if some information remains undecoded, e.g. sets of the same size yet to be distinguished) lose efficiency.</p>
<p>The better solution that solves the problem in the optimal 2 trips (A to B to A) uses clever feedback in the coding. The idea is based on the fact that if you chain all wire segments together in series (like Christmas lights), then the location of any single break in the chain can be determined by measuring continuity from one terminus. Here it is:</p>
<p>So suppose \(n\) is odd. At point A, arbitrarily label 1 wire to be &#8220;the left terminus&#8221; or node 0. Tie the remaining wires into \((n-1)/2\) pairs. Go to point B. By the coding at A, the single wire and the collection of wire pairs can be distinguished, so node 0 at B is established. Then arbitrarily choose a pair and a node in the pair, and connect it to node 0. This becomes node 1, and the remaining node of the pair becomes node 2. Choose another pair and a node in the pair and connect it to node 2; this becomes node 3 and the other of the pair node 4, and so on, until all wire segments are connected in series. This configuration becomes the feedback information. Go back to point A. Now we use the feedback to search for all nodes from 1 to \(n-1\). This is easy to do: just make a cut at any connection at A and measure continuity between node 0 and all other nodes. Say \(s\) nodes read short circuit and the rest open circuit. Then immediately it is known that the cut is between nodes \(s\) and \(s+1\). This can be repeated at all cuts so all nodes are labeled.</p>
<p>Suppose \(n>2\) is even. Then arbitrarily choose two wires to be the termini. Connect the remaining into pairs. Go to B and identify the two single wires. Choose one to be the left terminus, i.e. node 0, and the other the right terminus, i.e. node \(n-1\). Then follow the previous connection procedure for the \(n-1\) nodes (an odd number) not including node \(n-1\). Go back to A and identify the right terminus as the only one of the two single wires that is open circuit with all other nodes; that gets labeled node \(n-1\) and is removed from consideration. The other single wire is labeled node 0, and the same procedure as before labels the remaining nodes.</p>
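<p>The cut-and-count step is easy to convince yourself of in code (a toy Python check; the chain representation is mine):</p>
<pre><code>from collections import defaultdict

def shorted_count(n, cut):
    """Nodes 0..n-1 chained in series; undo the tie (cut, cut+1) and count
    how many other nodes still read short circuit against node 0."""
    adj = defaultdict(list)
    for i in range(n - 1):
        if i != cut:  # the one connection we cut
            adj[i].append(i + 1)
            adj[i + 1].append(i)
    reach, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in reach:
                reach.add(v)
                stack.append(v)
    return len(reach) - 1  # exclude node 0 itself

n = 9
for cut in range(n - 1):
    assert shorted_count(n, cut) == cut  # s shorts => cut between s and s+1
print("the count of shorted nodes locates the cut exactly")
</code></pre>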
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=179</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Polya&#8217;s urn, martingale, CLT</title>
		<link>https://blog.yhuang.org/?p=101</link>
		<comments>https://blog.yhuang.org/?p=101#comments</comments>
		<pubDate>Tue, 19 Feb 2008 04:36:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ball]]></category>
		<category><![CDATA[CLT]]></category>
		<category><![CDATA[extreme sports]]></category>
		<category><![CDATA[martingale]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[polya]]></category>
		<category><![CDATA[problem]]></category>
		<category><![CDATA[proportion]]></category>
		<category><![CDATA[stable system]]></category>
		<category><![CDATA[urns]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=101</guid>
		<description><![CDATA[A problem described here what do you expect to happen if you had 2 urns with a ball in each one&#8230;. and you placed balls in them&#8230;. choosing an urn with a probability proportional to the number of balls in each urn&#8230;.. intuitively, you&#8217;d expect a rich getting richer kind of thing&#8230;.. or those with [...]]]></description>
			<content:encoded><![CDATA[<p>A problem described <a href="http://convexset.blogspot.com/2006/10/polyas-urns.html">here</a></p>
<blockquote><p>
what do you expect to happen if you had 2 urns with a ball in each one&#8230;. and you placed balls in them&#8230;. choosing an urn with a probability proportional to the number of balls in each urn&#8230;..</p>
<p>intuitively, you&#8217;d expect a rich getting richer kind of thing&#8230;.. or those with more balls getting even ballsier&#8230;. [see extreme sports.... those ppl are crazy... and get crazier] but the amazing thing is that you get a uniform distribution&#8230;.. it&#8217;s not even a distribution that peaks at the two ends where there&#8217;s 1 ball and n-1 balls in the other&#8230;.</p></blockquote>
<p>Well, it&#8217;s not that counter-intuitive. Certainly, if you begin with exactly 1 ball in each urn, you end up with a uniform distribution on all outcomes, but the uniformity is kind of an artifact. In fact, the proportion of total balls in one designated urn is a martingale process, which means the proportion of balls any number of time steps later is expected to be the same as the starting proportion. So, if you do not start with an equal number of balls in the two urns, there is no way that the long-run outcome will be uniformly distributed across the possibilities, because that would give a mean proportion value of 1/2.</p>
<p>Furthermore, this means that indeed there is a rich getting richer &#8220;kind of thing&#8221; going on, <em>if you deviate from equal proportion by chance</em>. This is a weak effect that happens when the balls are few, but it doesn&#8217;t mean that the half-half case is somehow a counter-intuitive stable system. The paths that deviate early will indeed tend to go to the rails. It&#8217;s just that there are so many central paths that you aren&#8217;t likely to deviate proportionally unless it happens early.</p>
<p>What&#8217;s more interesting is that there <em>are</em> so many central paths. If you began with 2 and 2 balls, or 10 and 10 balls, you don&#8217;t get uniform outcomes, but a centrally peaked distribution around equal proportions. Indeed, if the number of balls is large, then addition of balls to urns one at a time doesn&#8217;t even move the drawing probabilities given by existing balls, and so it is reasonable to expect something like the central limit theorem to kick in.</p>
<p>So this is a case of deviation instability fighting the central tendencies of large numbers. To see this, keep the total number of balls the same and move a ball from the urn not chosen to the urn chosen, instead of adding a ball to the urn chosen (hence removing the latter cause from the &#8220;fight&#8221;); then the process will clearly go to the rails.</p>
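<p>All three behaviors show up in a few lines of simulation (Python; the step counts and histogram bucketing are mine):</p>
<pre><code>import random
from collections import Counter

def polya(a=1, b=1, steps=100, move=False):
    """Polya urn: add a ball to the chosen urn, or, with move=True, steal
    one from the other urn so the total stays fixed."""
    for _ in range(steps):
        if random.random() < a / (a + b):
            a, b = a + 1, b - 1 if move else b
        else:
            a, b = a - 1 if move else a, b + 1
        if move and (a == 0 or b == 0):
            break  # absorbed at a rail
    return a / (a + b)

for label, kw in [("start 1+1  ", dict(a=1, b=1)),
                  ("start 10+10", dict(a=10, b=10)),
                  ("move balls ", dict(a=10, b=10, move=True))]:
    props = [polya(**kw) for _ in range(20_000)]
    hist = Counter(round(p, 1) for p in props)
    print(label, [hist.get(round(c * 0.1, 1), 0) for c in range(11)])
</code></pre>
<p>Starting 1+1 gives a roughly flat histogram, starting 10+10 a central peak, and the move-a-ball variant piles up at the rails.</p>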
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=101</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>8 perfect shuffles</title>
		<link>https://blog.yhuang.org/?p=84</link>
		<comments>https://blog.yhuang.org/?p=84#comments</comments>
		<pubDate>Sat, 29 Dec 2007 09:50:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[52 cards]]></category>
		<category><![CDATA[equiv]]></category>
		<category><![CDATA[half]]></category>
		<category><![CDATA[high school math]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[index positions]]></category>
		<category><![CDATA[math 2]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[mod]]></category>
		<category><![CDATA[number]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=84</guid>
		<description><![CDATA[It is said that 8 perfect shuffles of a standard deck of 52 cards results in an unshuffled deck. Why is that? Is it true only for 52 cards? What kind of shuffle is implied by this statement? Yesterday, we (high school math buddies) made pretty good sense of it. Let&#8217;s first index positions in [...]]]></description>
			<content:encoded><![CDATA[<p>It is said that 8 perfect shuffles of a standard deck of 52 cards results in an unshuffled deck. Why is that? Is it true only for 52 cards? What kind of shuffle is implied by this statement? Yesterday, we (high school math buddies) made pretty good sense of it.<br />
<span id="more-84"></span><br />
Let&#8217;s first index positions in a standard deck by 0,1,&#8230;,51.</p>
<p>When we say &#8220;perfect shuffle,&#8221; it means to cut the deck in half, then interleave the cards from the two halves. There are actually two choices here, depending on which card we put on top. We can send card 0 to 0, card 26 to 1, card 1 to 2, &#8230;, card 25 to 50, card 51 to 51 (we&#8217;ll call this Shuffle 1), or we can send card 26 to 0, card 0 to 1, card 27 to 2, &#8230;, card 51 to 50, card 25 to 51 (we&#8217;ll call this Shuffle 2). Shuffle 1 is what we deal with, because the statement doesn&#8217;t work for Shuffle 2. But note that Shuffle 2 is just Shuffle 1 followed by exchanges of pairs.</p>
<p>So, the effect of shuffling on the first half of the deck (\(n \in \{ 0, \dots ,25 \}\)) is to send card \(n\) to \(2n\). Notice that a perfect shuffle looks the same if we flip the deck around, so the effect on the second half of the deck (\(n \in \{26, \dots ,51 \}\)) is to send card \(n\) to \(R(2R(n))\), where \(R(n) = 51-n\) is the operation that gives the reversed index of card \(n\).</p>
<p>\(R(2R(n)) = 51 - 2(51-n) = 2n - 51\). Thus each shuffle sends card \(n\) to \(2n\), except that sometimes we subtract 51 from the answer. (That &#8220;sometimes&#8221; is when card \(n\) is in the second half of the deck.) The point is, no matter what, card \(n\) is again sent to something in the range 0,&#8230;,51, so in fact it is sent to \(2n\ (mod\ 51)\) (if we ignore card 51, which is basically 0, and is always sent to itself).</p>
<table border="1" cellpadding="10" style="margin: 2 2 2 2; border-collapse: collapse; border-style: solid; border-color: #365873;">
<tr>
<td>
If we shuffle \(s\) times, we get that card \(n\) is sent to \(2^s n\ (mod\ 51)\).
</td>
</tr>
</table>
<p>Immediately, we see that 8 shuffles works because \(2^8 n \equiv n\ (mod\ 51)\), and </p>
<table border="1" cellpadding="10" style="margin: 2 2 2 2; border-collapse: collapse; border-style: solid; border-color: #365873;">
<tr>
<td>
In general, \(s\) shuffles brings a deck of size \(d\) back to its original order if and only if \((2^s - 1) n \equiv 0\ (mod\ d-1)\) for all \(n\in \{0,\dots,d-1\}\), that is, if and only if \(d-1\) divides \(2^s - 1\).
</td>
</tr>
</table>
<p>The last statement is because if \((2^s -1) n \equiv 0\ (mod\ d-1)\) for \(n=1\), then it must be true for all \(n>1\). (The case \(n=0\) is trivial.)</p>
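<p>The doubling map is easy to verify directly (a few lines of Python; the encoding of Shuffle 1 is mine):</p>
<pre><code>def perfect_shuffle(deck):
    """Shuffle 1 (the out-shuffle): cut in half, interleave, top card stays."""
    half = len(deck) // 2
    out = []
    for a, b in zip(deck[:half], deck[half:]):
        out += [a, b]
    return out

deck = list(range(52))
once = perfect_shuffle(deck)
# every card n lands at position 2n mod 51 (card 51 stays put)
assert all(once[2 * n % 51] == n for n in range(51)) and once[51] == 51

d = deck
for s in range(1, 9):
    d = perfect_shuffle(d)
print(d == deck)  # True: 8 shuffles restore the order, since 51 | 2^8 - 1
</code></pre>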
<p>There are more interesting observations. If \(s\) shuffles brings a deck to its original order, then surely \(ks\) shuffles does that, too. And indeed, if \(d-1\) divides \(2^s - 1\), then it also divides \(2^{ks} - 1\), which factors as \((2^s - 1)(2^{(k-1)s} + 2^{(k-2)s} + \cdots + 2^s + 1)\).</p>
<p>In another observation, we can explicitly write the card position after \(s\) shuffles as</p>
<p>\(2(\dots 2(2n - 51 b_1(n)) - 51 b_2(n) \dots) - 51 b_s(n)\ \ \ \ (*)\)</p>
<p>where \(b_i(n)\) is an indicator bit that is 1 if, entering the \(i\)th shuffle, the card is in the second half of the deck, and 0 otherwise.</p>
<p>Expanding \((*)\), we get \(2^s n - 51 (2^{s-1} b_1(n) + 2^{s-2} b_2(n) + \cdots + 2^1 b_{s-1}(n) + 2^0 b_s(n))\)</p>
<p>Now, define \(b(n) \equiv 2^{s-1} b_1(n) + 2^{s-2} b_2(n) + \cdots + 2^1 b_{s-1}(n) + 2^0 b_s(n)\). Clearly, \(\overline{b_1(n) b_2(n) \dots b_s(n)}\) is the binary expansion of \(b(n)\).</p>
<p>So \(s\) shuffles brings the deck back to its original order if and only if \(2^s n - 51 b(n) = n\) for all \(n\), i.e. \(b(n) = \frac{2^s - 1}{51}n\). But we know that 51 divides \(2^s - 1\) in our example, so \(\frac{2^s - 1}{51}\) is an integer, and in general</p>
<table border="1" cellpadding="10" style="margin: 2 2 2 2; border-collapse: collapse; border-style: solid; border-color: #365873;">
<tr>
<td>
If \(s\) shuffles brings a deck of size \(d\) to identity, then the integer given by \(\frac{2^s - 1}{d - 1} \equiv b(1)\) defines the function \(b(n)\) for all \(n\) by \(b(n) = n b(1)\), where the binary expansion of \(b(n)\) gives which half the card \(n\) is in for all \(s\) shuffles.
</td>
</tr>
</table>
<p>In the case of \(s=8\) and \(d=52\), \(b(1) = 5\).</p>
<p>Now we can also find the minimum number of shuffles to return various deck sizes to identity, by looking at the factors of \(2^s-1\):</p>
<table>
<tr>
<td>\(s\):</td>
<td>\(d\) for which \(s\) is minimal</td>
</tr>
<tr>
<td>1:</td>
<td>2</td>
</tr>
<tr>
<td>2:</td>
<td>4</td>
</tr>
<tr>
<td>3:</td>
<td>8</td>
</tr>
<tr>
<td>4:</td>
<td>6, 16</td>
</tr>
<tr>
<td>5:</td>
<td>32</td>
</tr>
<tr>
<td>6:</td>
<td>10, 22, 64</td>
</tr>
<tr>
<td>7:</td>
<td>128</td>
</tr>
<tr>
<td>8:</td>
<td>18, 52, 86, 256</td>
</tr>
</table>
<p>etc&#8230;</p>
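<p>This table is mechanical to extend: the minimal \(s\) for deck size \(d\) is just the multiplicative order of 2 mod \(d-1\). A quick Python sketch:</p>
<pre><code>def min_shuffles(d):
    """Smallest s with (d-1) | 2^s - 1, i.e. the order of 2 mod d-1."""
    m = d - 1
    s, pow2 = 1, 2
    while pow2 % m != 1:
        s += 1
        pow2 = pow2 * 2 % m
    return s

table = {1: [2]}  # d=2 is trivial: one shuffle already restores both cards
for d in range(4, 257, 2):  # even deck sizes, so that d-1 is odd
    table.setdefault(min_shuffles(d), []).append(d)
for s in range(1, 9):
    print(s, table.get(s, []))  # s=8 -> [18, 52, 86, 256]
</code></pre>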
<p>Of course, every deck of size \(d\) can be returned to its original order in <em>some</em> number of shuffles: at most \(d!\) (and if fewer, then the number of shuffles divides \(d!\)), because the group of permutations of \(d\) elements has size \(d!\). This knowledge is of no practical utility, since \(d!\) is huge, but it does imply that any odd positive integer \(d-1\) divides \(2^{d!} - 1\). And it highlights the fact that some deck sizes are very special: any \(d\) that is a power of 2 takes only \(\log_2 d\) shuffles, surely; but especially 52 &#8212; for such a large deck size that isn&#8217;t a power of 2, the fact that it takes only 8 shuffles is quite amazing.</p>
<p>I do wonder if there are any &#8220;bad&#8221; deck sizes \(d\) that do take up to \(d!\) shuffles to return to the original order. A search of the <a href="http://www.garlic.com/~wedgingt/mersenne.html">factors of Mersenne numbers</a> may be in order.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=84</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
