toward a synthetic universal instrument
The Roland line of “SuperNATURAL” digital pianos claims to produce a more natural sound by combining the two primary methods of synthesizing instruments: acoustic modeling of real instruments, and playback of samples recorded from them. The two methods are different enough that, even if both converge to the true output as more sophistication is brought to bear, they are rather difficult to merge.
The history of synthesized instruments has ping-ponged between the two methods. First there was analog and FM synthesis, which generated tones from simple oscillator functions, in the spirit of the simplest physical models of standing waves (harmonics, etc.). This allowed distinguishing between major instrument groups but sounded decidedly fake. Then people recorded acoustic instruments and looped or interpolated between samples. That was much better, but storage constraints limited what could be stored and processed, and there was never any dynamism. Then it was back to physical modeling, this time using waveguides to determine how inputs like bowing a string or blowing into a pipe dynamically affect the output sound (I think it started at CCRMA). This gave really good expressivity, but again sounded fake. And so back to wave samples. For the last 15 years or so, especially as storage got cheaper, the dumbest, brute-force method appears to have become dominant: use a large enough number of samples and make ad-hoc adjustments to their decay and reverb characteristics. For ensemble instruments with little individual personality, it was actually superb. The trouble was always with solo instruments.
One has to ask the question though — is sampling really that dumb?
In other words, is physical modeling ever going to achieve realism? Sure, with more processing power, more realistic models could be built. But let’s be serious: the reason acoustic instruments sound real is likely that they are built exactly the way they are. They are already made (physically) as simple as they can be, e.g. just some metal pipes or a piece of string, so that a “realistic” model may well turn out to be the acoustic object itself, with all of its mechanical physics…
Instead of trying to reverse engineer the box that takes mechanical input into sound output, we may as well embrace sampling by taking a huge number of input and output pairs. But we should do it intelligently. The entire space shouldn’t be sampled evenly or by hunches. It should be more like what they do at CCRMA where they attempt to sample the various independent modes of an instrument — striking it with a mallet at different locations to get impulse responses, for example.
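To make the impulse-response idea concrete, here is a minimal sketch in Python. It assumes the instrument body behaves like a linear system, so that the sound produced by any excitation is that excitation convolved with the measured response. The function names, mode frequencies, and decay times are made up for illustration; a real response would come from a recording of a mallet strike.

```python
import numpy as np

def synth_impulse_response(freqs_hz, decays_s, amps, sr=8000, dur=0.5):
    """Hypothetical stand-in for a measured impulse response:
    a sum of exponentially decaying sinusoidal modes."""
    t = np.arange(int(sr * dur)) / sr
    h = np.zeros_like(t)
    for f, tau, a in zip(freqs_hz, decays_s, amps):
        h += a * np.exp(-t / tau) * np.sin(2 * np.pi * f * t)
    return h

def render(excitation, impulse_response):
    """Linear-system assumption: output = excitation convolved with the response."""
    return np.convolve(excitation, impulse_response)

sr = 8000
# Made-up modes; in practice h would be recorded at one strike location.
h = synth_impulse_response([220.0, 440.0, 660.0], [0.3, 0.2, 0.1], [1.0, 0.5, 0.25], sr=sr)

# Two strikes: a full-strength hit, then a softer hit 50 samples later.
excitation = np.zeros(100)
excitation[0] = 1.0
excitation[50] = 0.5

out = render(excitation, h)
```

Each strike location would get its own measured response, and how to move between locations (switching, cross-fading, or something smarter) is left open here.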
Beyond this, we should consider decomposing sound reproduction differently. The sound generation process need not be fully decoupled at the point of digital waveforms, which then have to be reproduced by generic membrane-based speakers. Perhaps we should instead take acoustic objects that span various modes of the output space (things like a large piece of wood, a small piece of wood, some metal pipe, some pins, whatever minimal set we could derive, maybe even a weird basis set of synthetically shaped physical objects) and drive them like specialized “speakers.” Likewise, we do the same modal decomposition with input interfaces: a pressure-resistive mouthpiece, a coupled string, etc.
Much like in color science, there should be a concept of gamut: the range of sounds that a particular output device can cover, and the range of gestures that a particular input device can pick up. Then we should match real sounds from real instruments with the inputs/outputs we get on our input/output devices by applying the appropriate digital transforms, which we get by (mathematically) solving the inverse problem from measurements, rather than trying to come up with the physics from first principles.
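One way to picture the inverse problem is as a least-squares fit, sketched below in Python. It assumes each output object has a measured response (here a real-valued stand-in for a spectrum, filled with random numbers) and that responses add linearly when several objects are driven together. The matrix G, the function name, and all sizes are hypothetical. The leftover residual works as a crude gamut check: near zero means the target sound is reachable with this set of objects, large means it is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_drivers = 64, 4

# Hypothetical measurements: column j is the response obtained by
# driving output object j alone with a unit signal.
G = rng.standard_normal((n_bins, n_drivers))

def fit_drive_weights(G, target):
    """Solve min_w ||G w - target|| and report how much of the target is missed."""
    w, *_ = np.linalg.lstsq(G, target, rcond=None)
    residual = np.linalg.norm(G @ w - target)
    return w, residual

# A target that lies inside the "gamut" (an exact mix of the measured responses).
true_w = np.array([0.8, -0.2, 0.5, 0.1])
in_gamut = G @ true_w
w, res = fit_drive_weights(G, in_gamut)

# A target with a component that the objects cannot produce.
out_of_gamut = in_gamut + rng.standard_normal(n_bins)
w2, res2 = fit_drive_weights(G, out_of_gamut)
```

Nothing in the fit refers to the physics of the objects; it only uses what was measured, which is the point of solving the problem backwards.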
What we get is then a universal instrument that should reproduce acoustic instruments (and many other realizable timbres) accurately. As a bonus, it has the benefit of allowing interchangeable “playing” styles upon swapping inputs (cf. keytar, melodica).