NSF graduate student fellowship

I was spammed by the NSF multiple times to fill out some silly little survey on their graduate student fellowship program (GRFP), so I got annoyed and did it. I call it a silly little survey because I suspect no learning will occur from it, where by “learning” I mean “corrective action.” The cynic in me suspects these surveys are to prove whatever they want to prove — in this case, that the program is “working.” I, however, don’t believe the program is working at a core level.
(Read the article)

on eating crabs

One might think it cruel to boil and eat those crabs taken from the sea that had been alive but a moment ago, roe and all, but they do not give up so easily even in perfect death, tearing your meat up with sharp spines just as you try to tear them up, prying your teeth back just as you try to pry them open with teeth. Furiously you ponder the caloric implications of this meal, but you push on, till they are a broken bony pile of discarded refuse, and you have asserted the order of nature, with bloody fingers and injured tongue.

The travails of defending carnivorism against unrelenting crabs are crueler than the ease of aborting a family tree of them.

data structure problem

Another problem by fakalin.

A data structure has the entropy bound if all queries have amortized time \(O(\sum_k p_k \log 1/p_k)\), where \(p_k\) is the fraction of the time that key \(k\) is queried. It has the working-set property if the time to search for an element \(x_i\) is \(O(\log t_i)\), where \(t_i\) is the number of elements queried since the last access to \(x_i\). Prove that the working-set property implies the entropy bound.

This isn’t really a data structure problem, per se.
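For what it’s worth, here is the sketch I have in mind (glossing over the first access to each key and the distinction between counting all queries versus distinct elements since the last access, which affect only lower-order terms). Fix a key \(k\) that accounts for \(n_k = p_k n\) of the \(n\) queries, and let \(g_1, \dots, g_{n_k}\) be the working-set parameters of its accesses. The stretches between consecutive accesses to \(k\) are disjoint pieces of the query sequence, so \(\sum_j g_j \le n\), and by concavity of the logarithm (Jensen’s inequality),

\[ \sum_j \log g_j \;\le\; n_k \log\!\Big(\tfrac{1}{n_k}\sum_j g_j\Big) \;\le\; n_k \log\frac{n}{n_k} \;=\; n_k \log\frac{1}{p_k}. \]

Summing over all keys, the total work is \(O(\sum_k n_k \log 1/p_k) = n \cdot O(\sum_k p_k \log 1/p_k)\), which is the entropy bound amortized over the \(n\) queries.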
(Read the article)

problem of strings

This is a problem via fakalin.

You have 10 pieces of string, each with two ends. You randomly pick two ends of string (possibly from the same string, possibly from different ones) and tie them together, creating either a longer piece of string or a loop. You keep doing this until you run out of free ends.

What is the expected number of loops you end up with?
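A quick Monte Carlo sanity check is easy to write; this sketch (the representation of ends and the trial count are my own choices) tracks, for every free end, which end currently sits at the other extreme of its string:

import random

def tie_until_done(n_strings=10):
    """One trial: tie two random free ends together until no free ends
    remain; return the number of closed loops formed."""
    # Ends 2*i and 2*i+1 belong to string i; partner[e] is the end at the
    # other extreme of e's (possibly already-lengthened) string.
    partner = {}
    for i in range(n_strings):
        partner[2 * i], partner[2 * i + 1] = 2 * i + 1, 2 * i
    loops = 0
    while partner:
        a, b = random.sample(sorted(partner), 2)  # two distinct free ends
        if partner[a] == b:                       # the two ends of one string: a loop closes
            loops += 1
        else:                                     # two different strings merge into one
            pa, pb = partner[a], partner[b]
            partner[pa], partner[pb] = pb, pa
        del partner[a], partner[b]
    return loops

if __name__ == "__main__":
    trials = 200_000
    est = sum(tie_until_done() for _ in range(trials)) / trials
    print(f"estimated E[loops] = {est:.3f}")

The estimate should land near 2.13, consistent with the closed form \(\sum_{k=1}^{10} \frac{1}{2k-1}\): when \(k\) strings remain there are \(2k\) free ends, the next tie closes a loop with probability \(1/(2k-1)\), and every tie reduces the string count by one.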

(Read the article)

two inductive problems

Terence Tao quotes an (apparently) widely known problem, briefly paraphrased:

There is an island upon which 1000 people with various eye colors live. 900 have brown eyes, and 100 have blue. Each resident can (and does) see the eye colors of all the other residents, but is forbidden by custom from trying to discover their own (no talking about it, etc.). If (and only if) a resident does somehow discover their own eye color, they commit suicide the following day for all to witness.

One day a visitor unaware of island customs comes to the island and announces (truthfully) to everyone: “I see at least one blue-eyed person among you.” What happens next?

One might be tempted to say that nothing happens, since every islander already sees either 99 or 100 blue-eyed people, so the visitor seemingly brought no new information.
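The inductive answer can be checked mechanically on a small instance. Here is a possible-worlds simulation of my own (four islanders standing in for a thousand; the “other” color is there so that brown-eyed residents can never pin down their own color):

from itertools import product

COLORS = ("blue", "brown", "other")  # "other" stands in for any third eye color

def knows_own_color(world, agent, worlds):
    """True if `agent` can deduce their own eye color in `world`: every world
    still considered possible that matches what they see (everyone else's
    colors) assigns them the same color."""
    matches = [w for w in worlds
               if all(w[j] == world[j] for j in range(len(world)) if j != agent)]
    return len({w[agent] for w in matches}) == 1

def day_of_deduction(actual):
    n = len(actual)
    # Common knowledge after the visitor's announcement: at least one blue.
    worlds = {w for w in product(COLORS, repeat=n) if "blue" in w}
    day = 0
    while True:
        day += 1
        knowers = {i for i in range(n) if knows_own_color(actual, i, worlds)}
        if knowers:
            return day, knowers
        # Publicly observed: nobody acted today.  That eliminates every world
        # in which somebody would already have known their own color.
        worlds = {w for w in worlds
                  if not any(knows_own_color(w, i, worlds) for i in range(n))}

if __name__ == "__main__":
    print(day_of_deduction(("blue", "blue", "brown", "brown")))
    # -> (2, {0, 1}): the two blue-eyed islanders deduce their color on day 2;
    #    the same simulation with k blue-eyed islanders returns day k.

In the model, the new information shows up as the shrinking set of worlds: the announcement and each uneventful day eliminate possibilities, even though no single observation is news to any one islander.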
(Read the article)

toward a synthetic universal instrument

The Roland line of “SuperNATURAL” digital pianos claims to produce a more natural sound by combining the two primary methods of synthesizing instruments: acoustic modeling of real instruments, and recording samples from them. The two methods are different enough that, even if both converge to the true output as more sophistication is brought to bear, they are rather difficult to merge.

The history of synthesized instruments has ping-ponged between the two methods. First there was FM synthesis, which used analog function generation based on the simplest physical models of standing waves (harmonics, etc.). This allowed distinguishing between major instrument groups but sounded decidedly fake. Then people recorded acoustic instruments and looped/interpolated between samples — much better, but storage constraints placed limits on what could be processed, and there was never any dynamism. Then it was back to physical modeling, this time using waveguides to determine how various inputs like bowing on strings or blowing into pipes dynamically affect the output sound (I think it started at CCRMA). This gave really good expressivity — but again sounded fake. And so back to wave-samples. For the last 15 years or so, especially with the cheapening of storage, it appears that the dumbest, brute-force method of using a large enough number of samples, with ad-hoc adjustments to their decay and reverb characteristics, became dominant. For ensemble instruments with little individual personality, it was actually superb. The trouble was always with solo instruments.
(Read the article)

a list of problems for finance

The system [of finance] is too complex to be run on error-strewn hunches and gut feelings, but current mathematical models don’t represent reality adequately. The entire system is poorly understood and dangerously unstable. The world economy desperately needs a radical overhaul and that requires more mathematics, not less.

This article in the Guardian is a little late to the party and has an intentionally misleading headline, but brings up some points that are usually too esoteric to survive in print:

Any mathematical model of reality relies on simplifications and assumptions. The Black-Scholes equation was based on arbitrage pricing theory, in which both drift and volatility are constant. This assumption is common in financial theory, but it is often false for real markets. The equation also assumes that there are no transaction costs, no limits on short-selling and that money can always be lent and borrowed at a known, fixed, risk-free interest rate. Again, reality is often very different.
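For concreteness (my gloss, not the article’s): the model assumes the underlying price follows geometric Brownian motion with constant drift and volatility, \(dS = \mu S\,dt + \sigma S\,dW\), and under the frictionless-market assumptions listed above the option value \(V(S,t)\) satisfies

\[ \frac{\partial V}{\partial t} + \tfrac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0, \]

with \(r\) a known, constant risk-free rate. Every parameter in it is taken as constant and observable, which is precisely where the quoted criticism bites.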

There are more false assumptions like Gaussianity of log-returns, complete markets, martingale price paths, etc., but these are merely technical complaints, which can be patched (as many are doing). The real issue is, as the author notes, “… instability is common in economic models … mainly because of the poor design of the financial system.” Namely, the models fail to account for behavioral effects that feed back into the system, and this gives rise to rather more fundamental issues whose resolution would require the “radical overhaul” alluded to in the opening quotation. There are some problems that could be tackled in this area.
(Read the article)

dialectics and truth-finding

When one is presented with some subject on which there are several viewpoints, and exhorted to look at things “dialectically,” one might ask what this means.

Wikipedia says of classical dialectic that the point is to generate either a refutation of one viewpoint or a synthesis — to reach a better conclusion. But it doesn’t say what form the better conclusion takes. Similarly, it says of Hegelian dialectic that the point is to reach a synthesis by combining the common points of a thesis and an antithesis.

These models of truth-finding appear to be rather limited. Besides being specialized, in some sense, to a pair of opposing viewpoints (or, with extension, finitely many), they restrict truth-finding to the intersection, the union, or some other simple-minded method of synthesis. I argue for a more general way to model truth-finding. This is inspired by engineering, as usual.
(Read the article)

wget

This is a useful reference. It seems wget can do recursive downloads while cURL cannot:

You want to download all the GIFs from an HTTP directory. `wget http://host/dir/*.gif` doesn’t work, since HTTP retrieval does not support globbing. In that case, use:

wget -r -l1 --no-parent -A.gif http://host/dir/

It is a bit of a kludge, but it works.

`-r -l1` means to retrieve recursively (See section Recursive Retrieval), with maximum depth of 1.
`--no-parent` means that references to the parent directory are ignored (See section Directory-Based Limits), and
`-A.gif` means to download only the GIF files. `-A "*.gif"` would have worked too.

language context, ambiguity and inference

This article on today’s MIT front page sketches an argument that I’ve been thinking about for a while, ever since the IBM Watson Jeopardy contest — that natural language processing is hopeless to the extent that there is additional context (side information) not visible to the machine.

Many prominent linguists … have argued that language is … poorly designed for communication [and] … point to the existence of ambiguity: In a system optimized for conveying information…, they argue, each word would have just one meaning. … Now, a group of MIT cognitive scientists has turned this idea on its head … [T]hey claim that ambiguity actually makes language more efficient, by allowing for the reuse of short, efficient sounds that listeners can easily disambiguate with the help of context.

Although this is just talking about homophonic ambiguity at the word level, the same applies to all levels of language, including full messages whose decoding requires potentially deep context.
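A toy calculation makes the information-theoretic reading concrete (the six-utterance “corpus” below is entirely made up for illustration): the listener’s residual uncertainty about the meaning is whatever is left after conditioning on everything they can see, and a machine that cannot see the context is stuck with the larger of the two numbers.

from collections import Counter, defaultdict
from math import log2

# Toy corpus of (context, intended meaning, spoken word).  The short word
# "bank" is reused for two meanings; context always resolves which is meant.
corpus = [
    ("finance", "bank:institution", "bank"),
    ("finance", "bank:institution", "bank"),
    ("river",   "bank:riverside",   "bank"),
    ("river",   "bank:riverside",   "bank"),
    ("finance", "loan",             "loan"),
    ("river",   "fish",             "fish"),
]

def entropy(labels):
    counts, n = Counter(labels), len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())

def conditional_entropy(labels, conditions):
    groups = defaultdict(list)
    for label, cond in zip(labels, conditions):
        groups[cond].append(label)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

meanings = [m for _, m, _ in corpus]
words    = [w for _, _, w in corpus]
both     = [(w, c) for c, _, w in corpus]

print("H(meaning | word)          =", conditional_entropy(meanings, words))  # ~0.67 bits
print("H(meaning | word, context) =", conditional_entropy(meanings, both))   # 0 bits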
(Read the article)
