Some stuff » binom

problem of strings

admin — Wed, 07 Mar 2012 05:19:49 +0000

This is a problem via fakalin.

You have 10 pieces of string, each with two ends. You randomly pick two ends of string (possibly from the same string, possibly from different ones) and tie them together, creating either a longer piece of string or a loop. You keep doing this until you run out of free ends.

What is the expected number of loops you end up with?

Things to note are the following:

- Once a loop is made from a string, it is removed from further consideration.
- Picking two ends from the same string immediately makes a loop.
- Picking two ends from different strings makes a longer string.

So in the end, no matter which two ends are picked, we have one fewer open string than we started with.

Let \(f(n)\) be the expected number of loops with \(n\) open strings. We know \(f(1)=1\). With \(n\) strings, the probability of picking two ends from the same string is \(n/\binom{2n}{2}\). So:

\(f(n) = n/\binom{2n}{2} (f(n-1) + 1) + (1 – n/\binom{2n}{2}) f(n-1)\)
\(= f(n-1) + n/\binom{2n}{2} = f(n-1) + n / [2n (2n - 1) / 2] = f(n-1) + 1 / (2n-1)\)

That is, \(f(n) = \sum_{i=1}^n 1/(2i-1)\), \(f(10) = 1841/863\).

For large \(n\), this sum does not converge, in fact it is obvious that \(f(n)\) grows like \(\log n\).

nchoosetwo and collaborative ranking

admin — Tue, 01 Feb 2011 17:55:02 +0000

Walking around campus these days, there are cryptic-looking things like

\(\binom{n}{2}\mathrm{.com}\) and \(\binom{n}{2} \ni \binom{i}{u}\)

obviously referring to a dating site — currently it’s restricted to MIT and Harvard students. This one tries on an idea that I’ve heard discussed numerous times in different contexts, but apparently nobody went and did it in all these years. Instead of running a matching algorithm, it asks third parties (i.e. matchmakers) as well as the interested parties themselves to suggest matches. The thing that is supposed to keep this low-risk is anonymity: a match isn’t revealed until the two primary parties involved mutually accept or their lists intersect.

As with all things that involve anonymity, this asks for trollish and antisocial behavior. I’ve already registered three aliases on moira for exactly this purpose — ok, ok, so they’ve suppressed that antic after people raised concerns, though these and other ramifications should have perhaps been worked through a bit more carefully pre-launch.

The spam potential remains. A matchmaker’s identity isn’t revealed unless both people accept her suggestion, so pranks and insults can be conducted to an extent. One way around this may be grafting social graph data onto the system for collaborative filtering (if they manage to obtain such data…). And if they do, perhaps the suggestions of more closely related people should weigh more, along with those of successful matchmakers. Perhaps there should even be more weight if multiple matchmakers concur. This is extremely intriguing, because eliminating spam is equivalent to predicting who is a likely match, and collaborative filtering for this problem is an unexplored direction.

The more fundamental question is why such a site is even necessary.

Ostensibly, there is a gain over the serial nature of asking in person, due to the ability to make more informative decisions by using data you don’t have or cannot socially obtain in the open. If anonymity compels people to provide more preference information into the system than they would otherwise do, then everybody is better off. This is the positive aspect.

Nevertheless, if we wanted to use full information, why not run a global algorithm? If humans had no feelings, they could just make a list ranking whom they liked in order, and let a computer take care of allocation, almost like housing assignments. But alas, despite the rationality and efficiency of this obvious method, real humans appreciate neither the results nor the implications of it. Nobody likes to ponder the idea of not being #1. So something like nchoosetwo is a compromise, and hides the negative aspect of knowing too much: hurt feelings. But now the site becomes very dangerous. Under the cover of anonymity, the site is collecting ranking information about people from every action on the site. One particular situation in which an explicit rank order is elicited is where there are multiple matches. By your choice, you reveal to the system that “A is better than B.” Is this something the site should know? Not to mention that proposing matches in parallel, when combined with side information from the subsequently unfolding real world, also leaks preference information to an anonymous matchmaker. Those are much more dangerous information than who your Facebook friends are…

So far though, this site has embarrassingly few features. When you have a mutual match, all it does is to print:

Mutual crush! How about a date?

Tellingly, you can’t remove this.

wire matching problem

admin — Thu, 02 Apr 2009 21:13:36 +0000

Here’s an interesting and, shall I say, practical problem brought up at group meeting today: If you have say, n identical looking wires strung across a long distance between point A and point B, and your job is to label all wires with just an ohm-meter (to measure continuity, i.e. short circuit or open circuit), how many trips between A and B does it take?

Indeed, the only way to use an ohm-meter for this purpose is by tying wires together. Clearly the number of trips shouldn’t be larger than \(\log n\) or so, simply by tying various halves of the set of wires at one end and going to the other end to measure. But there are other kinds of coding that do better.

First, some examples.

Clearly, \(n=2\) is impossible to label.

For \(n=\binom{k}{2}\) for some \(k\) (call these binomial numbers), the number of trips between A and B is no larger than 2. Here is how. At point A, partition the \(n\) wires into groups \(P_1,P_2,P_3,…,P_k\) of \(1,2,3,…,k\) wires each and tie the wires in each group together. Go to point B and recover the partition set memberships by measuring continuity between pairs of wires. (Thus A has communicated set membership to B.) Now based on this side information, repartition the wires at B into groups \(Q_1,Q_2,Q_3,…,Q_k\) again of \(1,2,3,…,k\) wires each, but do this by taking 1 wire each from \(P_1,P_2,…,P_k\) to form \(Q_k\), take 1 wire each from \(P_2,P_3,…,P_k\) to form \(Q_{k-1}\), and so on. Go back to point A, undo all the original ties, and recover the new partitions. The membership pair \((P_i,Q_j)\) uniquely identifies a wire so all wires are fully labeled at both A and B. This establishes the main ideas of this class of partitioning solutions. It is also easy to see that at least 2 trips need to be made for any \(n>1\) since sets of the same size and wires within one set cannot be distinguishable without other information.

For \(n\) that are sums of unique binomial numbers (call these doubly binomial numbers), a two-phase procedure can label the wires in no more than 3 trips. The first phase is to partition the wires into clusters of unique binomial number sizes \(b_1,b_2,…,b_l\). Go to point B and identify the clusters by the unique sizes of the continuity sets. Then follow the previous technique to identify the wires within each cluster. And, you can generalize this by further nesting so you get 4 trips for \(n\) that are sums of unique doubly binomial numbers i.e. triply binomial numbers, 5 trips for quadruply binomial numbers, etc.

In essence, this is a generalization of binary partitioning to partitioning into many sets, and more sophisticated partition based codes can probably be found for all \(n\) on a case-by-case basis, however, these are not always optimal. For one, the multi-phase procedure is arbitrary and probably inefficient. And other kinds of coding that wipes away received information in the wire configuration in order to send new information (even if some information remains undecoded, e.g. sets of the same size yet to be distinguished) lose efficiency.

The better solution that solves the problem in the optimal 2 trips (A to B to A) uses clever feedback in the coding. The idea is based on the fact that if you chain all wire segments together in series (like Christmas lights), then the location of any single break in the chain can be determined by measuring continuity from one terminus. Here it is:

So suppose \(n\) is odd. At point A, arbitrarily label 1 wire to be “the left terminus” or node 0. Tie the remaining wires into \((n-1)/2\) pairs. Go to point B. By the coding at A, the single wire and collection of wire pairs can be distinguished. So node 0 at B is established. Then arbitrarily choose a pair and a node in the pair, and connect it to node 0. This becomes node 1, the remaining node of the pair becomes node 2. Choose another pair and a node in the pair and connect it to node 2, this becomes node 3 and the other of the pair node 4, and so on, until all wire segments are connected in series. This configuration becomes the feedback information. Go back to point A. Now we use the feedback to search for all nodes from 1 to \(n-1\). This is easy to do: just make a cut at any connection at A and measure continuity between node 0 and all other nodes. Say there are some s nodes that read short circuit and the rest open circuit. Then immediately it is known that the cut is between nodes \(s\) and \(s+1\). This can be repeated at all cuts so all nodes are labeled.

Suppose \(n>2\) is even. Then arbitrarily choose two wires to be the termini. Connect the remaining into pairs. Go to B and identify the two single wires. Choose one to be the left terminus, i.e. node 0 and the other one the right terminus, i.e. node \(n-1\). Then follow the previous connection procedure for the odd number of n-1 nodes not including node \(n-1\). Go back to A and identify the right terminus as the only one of two single wires that is open circuit with all other nodes; that gets labeled node \(n-1\). Then remove it from consideration. The other single wire is labeled node 0, and the same procedure is applied as previously to label the remaining nodes.