<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Some stuff &#187; math math</title>
	<atom:link href="http://blog.yhuang.org/?feed=rss2&#038;tag=math-math" rel="self" type="application/rss+xml" />
	<link>https://blog.yhuang.org</link>
	<description>here.</description>
	<lastBuildDate>Wed, 27 Aug 2025 08:50:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>tensors</title>
		<link>https://blog.yhuang.org/?p=655</link>
		<comments>https://blog.yhuang.org/?p=655#comments</comments>
		<pubDate>Sun, 16 Oct 2011 12:33:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[basis]]></category>
		<category><![CDATA[component elements]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[overbrace]]></category>
		<category><![CDATA[space]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[vector products]]></category>
		<category><![CDATA[vector space]]></category>
		<category><![CDATA[vector spaces]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=655</guid>
		<description><![CDATA[This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here I took some notes. Tensors are sums of &#8220;products&#8221; of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the tensor product. In the Cartesian product of vector spaces , the set [...]]]></description>
			<content:encoded><![CDATA[<p>This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here I took some notes.</p>
<p>Tensors are sums of &#8220;products&#8221; of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the <strong>tensor product</strong>. In the Cartesian product of vector spaces \(V\times W\), the set elements are tuples like \((v,w)\) where \(v\in V, w\in W\). A tensor product \(v\otimes w\) is obtained by tupling the component <strong>bases</strong> rather than the component elements. If \(V\) has basis \(\{e_i\}_{i\in\{1,\dots,M\}}\) and \(W\) has basis \(\{f_j\}_{j\in\{1,\dots,N\}}\), then take \(\{(e_i,f_j)\}_{i\in\{1,\dots,M\},j\in\{1,\dots,N\}}\) as the basis of the <strong>tensor product space</strong> \(V\otimes W\). Then define the tensor product \(v\otimes w\) as</p>
<p>(1) \(\sum_{i,j} v_i w_j (e_i,f_j) \in V\otimes W\),</p>
<p>if \(v=\sum_i v_i e_i\) and \(w=\sum_j w_j f_j\). The entire tensor product space \(V\otimes W\) is defined as sums of these tensor products</p>
<p>(2) \(\{\sum_k v_k\otimes w_k | v_k\in V, w_k\in W\}\).</p>
<p>So tensors in a given basis can be represented as multidimensional arrays.</p>
<p>\(V\otimes W\) is also a vector space, of dimension \(MN\) (cf. \(V\times W\), of dimension \(M+N\)). But additionally, it has internal multilinear structure because it is built from component vector spaces, namely:</p>
<p>\((v_1+v_2)\otimes w = v_1\otimes w + v_2\otimes w\)<br />
\(v\otimes (w_1+w_2) = v\otimes w_1 + v\otimes w_2\)<br />
\(\alpha (v\otimes w) = (\alpha v)\otimes w = v\otimes (\alpha w)\)<br />
<span id="more-655"></span><br />
Higher-order (\(n\)-th order) tensor products \(v_1\otimes v_2\otimes \cdots \otimes v_n\) are obtained by chaining in the obvious way, and likewise for higher-order tensor product spaces \(V_1\otimes V_2\otimes \cdots \otimes V_n\). With this, <strong>concatenation</strong> of tensors is also defined: if \(S_{i_1,\dots,i_m} \in V_1\otimes \cdots \otimes V_m\) and \(T_{i_{m+1},\dots,i_n} \in V_{m+1}\otimes \cdots \otimes V_n\), then \(S_{i_1,\dots,i_m}\otimes T_{i_{m+1},\dots,i_n} = Z_{i_1,\dots,i_n} \in V_1\otimes \cdots \otimes V_n\). In other words, the indices are appended. This is essentially the <strong>Kronecker product</strong>, which generalizes the <strong>outer product</strong>.</p>
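<p>As a quick sanity check of the index-appending picture (a sketch I am adding here, not from the post proper; in a fixed basis a tensor is just a multidimensional array of components):</p>

```python
import numpy as np

# Components of v in V (M = 2) and w in W (N = 3), in some fixed bases.
v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0, 5.0])

# v ⊗ w as in (1): a 2-dimensional array over the basis pairs (e_i, f_j).
T = np.multiply.outer(v, w)          # T[i, j] = v[i] * w[j], shape (2, 3)

# Concatenation appends indices: a third factor adds a third axis.
u = np.array([6.0, 7.0])
Z = np.multiply.outer(T, u)          # Z[i, j, k] = v[i] * w[j] * u[k]

# Flattened to a vector of length M*N, the outer product is the Kronecker product.
assert T.shape == (2, 3) and Z.shape == (2, 3, 2)
assert np.allclose(T.reshape(-1), np.kron(v, w))
```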
<p>However, usually when tensors are mentioned, the tensor product spaces under discussion are already specialized to those generated from a single base vector space \(V\) and its dual space \(V^*\), rather than from a collection of arbitrary vector spaces. In such a space \(P(m,n) = \overbrace{V\otimes \cdots \otimes V}^{m} \otimes \overbrace{V^*\otimes \cdots \otimes V^*}^{n}\), the component spaces (and their bases, indices, etc.) naturally belong to two groups: those from \(V\) are called <strong>contravariant</strong>, those from \(V^*\) are called <strong>covariant</strong>, and an (m,n)-tensor from \(P(m,n)\) is written \(T^{i_1,\dots,i_m}_{j_1,\dots,j_n}\).</p>
<p>This specialization allows the <strong>contraction</strong> of tensors to be defined. A contraction basically chooses one covariant vector component and one contravariant vector component from a tensor and applies the former as a functional on the latter, e.g., contracting \(T^{i_1,\dots,i_m}_{j_1,\dots,j_n} = v_{i_1}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_1} \otimes \cdots \otimes v^*_{j_n}\) on the pair of indices \(i_1\) and \(j_1\) gives \(Z^{i_2,\dots,i_m}_{j_2,\dots,j_n} = v^*_{j_1}(v_{i_1}) (v_{i_2}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_2} \otimes \cdots \otimes v^*_{j_n})\). \(v^*_{j_1}(v_{i_1})\) is of course an <strong>inner product</strong> that sums over the dimensions of the paired components. Contraction generalizes the <strong>trace</strong> operator. Combined with concatenation, this defines a <strong>tensor multiplication</strong>, such that if \(S^{r,i_2,\dots,i_m}_{s,j_2,\dots,j_n}\in P(m,n)\) and \(T^{s,k_2,\dots,k_p}_{r,l_2,\dots,l_q}\in P(p,q)\), then \(S^{r,i_2,\dots,i_m}_{s,j_2,\dots,j_n}T^{s,k_2,\dots,k_p}_{r,l_2,\dots,l_q}\) is the contraction of \(S^{r,i_2,\dots,i_m}_{s,j_2,\dots,j_n}\otimes T^{s,k_2,\dots,k_p}_{r,l_2,\dots,l_q}\) on all common indices that can be paired, e.g. \(r,s\). This is the so-called <strong>Einstein notation</strong>, and generalizes <strong>matrix multiplication</strong>.</p>
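<p>In array terms, contraction and Einstein-notation multiplication are exactly what NumPy&#8217;s <code>einsum</code> computes (a sketch of my own; it tacitly identifies \(V^*\) with \(V\), which is harmless in an orthonormal basis):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# A (1,1)-tensor in a basis is a matrix; contracting its contravariant
# index against its covariant index is the trace.
A = rng.standard_normal((4, 4))
assert np.isclose(np.einsum('ii->', A), np.trace(A))

# Tensor multiplication = concatenate (outer product), then contract the
# paired indices; for two (1,1)-tensors this is matrix multiplication.
B = rng.standard_normal((4, 4))
assert np.allclose(np.einsum('is,sj->ij', A, B), A @ B)

# A higher-order contraction: a (2,1)-tensor applied to a (1,0)-tensor.
T = rng.standard_normal((3, 3, 3))   # T^{i1 i2}_{j1}
x = rng.standard_normal(3)           # X^{j1}
Y = np.einsum('abj,j->ab', T, x)     # Y^{i1 i2}, shape (3, 3)
assert Y.shape == (3, 3)
```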
<p>The distinction of \(V\) vs. \(V^*\) also manifests in the <strong>change-of-basis</strong> rules for tensors, which inherit from the change-of-basis rules of the component vector spaces, which are:</p>
<ul>
<li><strong>contravariant</strong> change-of-basis rule: If \(B = [b_1\ b_2\ \cdots\ b_M]\) is a change-of-basis matrix, with the new basis \(\{b_i\}_{i\in \{1,&#8230;,M\}}\) written in the old basis as columns, then for a vector written in the old basis \(v\in V\) and the same vector written in the new basis \(\tilde{v}\in V\), \(v = B\tilde{v}\). Therefore, we have
<p>(3) \(v \mapsto \tilde{v} = B^{-1}v\).</li>
<li><strong>covariant</strong> change-of-basis rule: If additionally \(a^T\in V^*\) is a functional written in the old basis and \(\tilde{a}^T\in V^*\) is the same functional written in the new basis, then \(\forall v\in V: a^T v = \tilde{a}^T \tilde{v} = \tilde{a}^T B^{-1}v\). Therefore, we have
<p>(4) \(a^T \mapsto \tilde{a}^T = a^T B\).</li>
</ul>
<p>Combining with (1), it can be shown that, for a change-of-basis tensor \(B = {B^{-1}}^{i_1}_{i_1}\cdots {B^{-1}}^{i_m}_{i_m}B^{j_1}_{j_1}\cdots B^{j_n}_{j_n}\), an (m,n)-tensor \(T^{i_1,\dots,i_m}_{j_1,\dots,j_n}\) has the change-of-basis rule \(T^{i_1,\dots,i_m}_{j_1,\dots,j_n} \mapsto BT^{i_1,\dots,i_m}_{j_1,\dots,j_n}\).</p>
<p>Okay, so what&#8217;s the point of these tensors? Basically, an (m,n)-tensor \(T^{i_1,\dots,i_m}_{j_1,\dots,j_n}\in P(m,n)\) represents a multilinear input-output relationship that takes \(n\) vectors as input and produces \(m\) vectors as output. If used &#8220;canonically&#8221; on an input \(X^{j_1,\dots,j_n}\in P(n,0)\), you get \(T^{i_1,\dots,i_m}_{j_1,\dots,j_n}X^{j_1,\dots,j_n} = Y^{i_1,\dots,i_m}\in P(m,0)\) as output. The contravariant input gets contracted with the covariant parts of the transformation tensor, and these drive the contravariant parts of the transformation tensor to produce the contravariant output.</p>
<p><img src="wp-content/uploads/images/tensor.png" /><br />
(System diagrams: <strong>Rank</strong> is the minimal number of terms in a tensor. On the left, a rank-1 tensor transformation; on the right, a rank-\(K\) one.)</p>
<p>An example is linear transformations, which are (1,1)-tensors (1 vector in, 1 vector out). In array representation these would just be matrices. Any rank-\(K\) linear transformation \(T^v_a\) is decomposable into a \(K\)-term tensor \(\sum_{k=1}^K v_k\otimes a_k\), but 1-term (1,1)-tensors are outer products, so this is the matrix \(\sum_{k=1}^K v_k a^T_k\), and \(Y^v = T^v_a X^a\) is just \(y = \sum_k v_k (a^T_k x)\).</p>
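<p>This decomposition is easy to see in array form (my own sketch, with arbitrary example data):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 2, 5

# A rank-K linear transformation as a K-term sum of outer products v_k a_k^T.
V = rng.standard_normal((K, n))      # rows are the v_k
A = rng.standard_normal((K, n))      # rows are the a_k
T = sum(np.outer(v, a) for v, a in zip(V, A))
assert np.linalg.matrix_rank(T) == K

# y = T x is just sum_k v_k (a_k^T x).
x = rng.standard_normal(n)
assert np.allclose(T @ x, sum(v * (a @ x) for v, a in zip(V, A)))
```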
<p>Most other &#8220;multilinear&#8221; operations on vectors (inner product, cross product, wedge product, determinant) can be written as tensors. For example, the inner product operation (2 vectors in, &#8220;0&#8221; vectors out, i.e. scalar) is the (0,2) \(N\)-term <strong>Kronecker tensor</strong> \(\delta_{i_1 i_2}=\sum_{k=1}^N e^*_k\otimes e^*_k\) where \(\{e^*_k\}_{k\in\{1,\dots,N\}}\) are the standard basis of \(V^*\).</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=655</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>a polynomial problem</title>
		<link>https://blog.yhuang.org/?p=291</link>
		<comments>https://blog.yhuang.org/?p=291#comments</comments>
		<pubDate>Mon, 22 Nov 2010 21:52:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[answer]]></category>
		<category><![CDATA[expansion]]></category>
		<category><![CDATA[integer]]></category>
		<category><![CDATA[integer coefficients]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[mathbb]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[number math]]></category>
		<category><![CDATA[polynomial coefficients]]></category>
		<category><![CDATA[positive integer]]></category>

		<guid isPermaLink="false">http://allegro.mit.edu/~zong/wpress/?p=291</guid>
		<description><![CDATA[The latest problem from fakalin. Took some wrong turns and hints to solve it&#8230; Given a polynomial with positive integer coefficients, how many evaluations of does it take to obtain the polynomial? (An polynomial with real coefficients would take the number of degrees plus 1 to specify, which, if it held in this case, would [...]]]></description>
			<content:encoded><![CDATA[<p>The latest problem from fakalin. Took some wrong turns and hints to solve it&#8230;</p>
<p>Given a polynomial \(f: \mathbb{Z}\to \mathbb{Z}\) with positive integer coefficients, how many evaluations of \(f\) does it take to obtain the polynomial?</p>
<p>(A polynomial \(f: \mathbb{R}\to \mathbb{R}\) with arbitrary real coefficients would take its degree plus 1 evaluations to specify, which, if it held in this case, would render the answer unbounded. But the correct answer in this case is surprisingly small.)<br />
<span id="more-291"></span><br />
It takes only 2 evaluations. Suppose in the following that \(b\) is a positive integer. Let us note that a polynomial \(f(b) = a_n b^n + \dots + a_0 b^0\) essentially specifies a base-\(b\) representation of the number \(f(b)\), in that \(a_n a_{n-1} \dots a_0\) is an expansion of \(f(b)\) in base \(b\). The only problem is that this expansion may be non-unique, since any \(a_j\) may be \(\ge b\).</p>
<p>However, no \(a_j\) can be \(\ge f(b) + 1\), since \(f(b) \ge a_j\) for all \(j\) by the problem statement and the assumption on \(b\). Then take \(B = f(b) + 1\). Now we are guaranteed that \(a_n a_{n-1} \dots a_0\) is the unique (and canonical) base-\(B\) expansion of \(f(B)\), from which the polynomial coefficients are immediately obtained.</p>
<p>So the two evaluations are \(f(b)\) and \(f(B)\), where \(B=f(b)+1\).</p>
<p>Example: \(f(b) = 3b^2 + 2b + 1\). Evaluate at, e.g., \(b=1\) to get \(f(1) = 6\). Then evaluate at \(B=f(1)+1=7\) to get \(f(7)=162=321_{7}\).</p>
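<p>The whole procedure fits in a few lines of Python (a sketch; <code>recover_coefficients</code> is a name of my choosing):</p>

```python
def recover_coefficients(f):
    """Recover the coefficients of a polynomial with positive integer
    coefficients from just two evaluations."""
    s = f(1)       # f(1) is the sum of the coefficients, so every a_j <= f(1)
    B = s + 1      # hence the base-B expansion of f(B) is the canonical one
    n = f(B)
    digits = []    # digits[j] will be a_j
    while n > 0:
        n, a = divmod(n, B)
        digits.append(a)
    return digits

# Example from the post: f(b) = 3b^2 + 2b + 1, with f(1) = 6 and f(7) = 162 = 321_7.
assert recover_coefficients(lambda b: 3 * b**2 + 2 * b + 1) == [1, 2, 3]
```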
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=291</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>minimax vs. maximin</title>
		<link>https://blog.yhuang.org/?p=160</link>
		<comments>https://blog.yhuang.org/?p=160#comments</comments>
		<pubDate>Fri, 13 Feb 2009 12:16:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[arg min]]></category>
		<category><![CDATA[forall]]></category>
		<category><![CDATA[left hand side]]></category>
		<category><![CDATA[lemma]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[max]]></category>
		<category><![CDATA[min]]></category>
		<category><![CDATA[minimax]]></category>
		<category><![CDATA[multivariable functions]]></category>
		<category><![CDATA[proof]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=160</guid>
		<description><![CDATA[An elementary, nice lemma relating to the optimization of multivariable functions says that the smallest &#8220;big thing&#8221; is still bigger than the biggest &#8220;small thing&#8221;, in other words, . The proof is almost trivial. Let optimize the left hand side and let optimize the right hand side. Now note two facts: for any given , [...]]]></description>
			<content:encoded><![CDATA[<p>An elementary, nice lemma relating to the optimization of multivariable functions says that the smallest &#8220;big thing&#8221; is still bigger than the biggest &#8220;small thing&#8221;, in other words,</p>
<p>\(\min_x \max_y f(x,y) \ge \max_y \min_x f(x,y)\).<br />
<span id="more-160"></span></p>
<p>The proof is almost trivial. Let \((x^*, y^*)\) optimize the left-hand side and let \((x_*, y_*)\) optimize the right-hand side. Now note two facts: for any given \(x\), \(f(x, \arg\max_y f(x,y)) \ge f(x,y)\ \forall y\); for any given \(y\), \(f(x, y) \ge f(\arg\min_x f(x,y), y)\ \forall x\).</p>
<p>So in particular, \(f(x^*, y^*) \ge f(x^*, y_*)\) and \(f(x^*, y_*) \ge f(x_*, y_*)\). So putting everything together, \(f(x^*, y^*) \ge f(x_*,y_*)\). </p>
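<p>For a finite \(f\), represented as a payoff matrix \(F[x,y]\), the lemma is easy to spot-check numerically (a sketch I am adding, not part of the proof):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Check min_x max_y F >= max_y min_x F on many random payoff matrices.
for _ in range(1000):
    F = rng.standard_normal((5, 7))      # rows indexed by x, columns by y
    minimax = F.max(axis=1).min()        # min over x of the row maxima
    maximin = F.min(axis=0).max()        # max over y of the column minima
    assert minimax >= maximin
```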
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=160</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>uniform by three</title>
		<link>https://blog.yhuang.org/?p=137</link>
		<comments>https://blog.yhuang.org/?p=137#comments</comments>
		<pubDate>Sat, 22 Nov 2008 08:08:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[conventional solution]]></category>
		<category><![CDATA[curve]]></category>
		<category><![CDATA[frac]]></category>
		<category><![CDATA[independent random variables]]></category>
		<category><![CDATA[insight]]></category>
		<category><![CDATA[int]]></category>
		<category><![CDATA[ln x]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[problem]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=137</guid>
		<description><![CDATA[Here is a problem recently described to me. Apparently there is a more elegant solution (which may give more insight), but I don&#8217;t see it yet. The problem: are independent random variables uniformly distributed over [0,1]. What is the distribution of ? The conventional solution is to find the distribution of first, then of . [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a problem recently described to me. Apparently there is a more elegant solution (which may give more insight), but I don&#8217;t see it yet.</p>
<p>The problem: \(X, Y, Z\) are independent random variables uniformly distributed over [0,1]. What is the distribution of \((XY)^Z\)?<br />
<span id="more-137"></span><br />
The conventional solution is to find the distribution of \(XY\) first, then of \((XY)^Z\).</p>
<p><img src="wp-content/uploads/images/xy.png" align="left"/><br />
The distribution of \(XY\) can be derived from its CDF \(F_{XY}(k) = P(XY \le k)\), which is the total shaded area shown. The red curve is the function \(y=\frac{k}{x}\). This area is thus given by:</p>
<p>\(\int_k^1 \frac{k}{x}\, dx + k = k \ln(x) \big\vert_k^1 + k = -k \ln(k) + k\), for \(0 < k \le 1\).</p>
<p>The PDF is the derivative of the above:</p>
<p>\(f_{XY}(k) = -\ln(k)\), for \(0 < k \le 1\). The density is not well defined at \(k=0\).</p>
<p><img src="wp-content/uploads/images/xyz.png" align="left"/><br />
The second part is to find the distribution in question. Here, the red curve is the function \(z=\frac{\ln(k)}{\ln(xy)}\). The CDF \(F_{(XY)^Z}(k) = P((XY)^Z \le k)\) is the total area to the left of the red curve. A column slice shaded in blue carries the probability per unit of \(z\) as labeled. Thus, the CDF is:</p>
<p>\(\int_0^k f_{XY}(v) \left(1-\frac{\ln(k)}{\ln(v)}\right) dv = \int_0^k -\ln(v) \left(1-\frac{\ln(k)}{\ln(v)}\right) dv\)<br />
\( = \int_0^k -\ln(v)\, dv + \int_0^k \ln(k)\, dv\)<br />
\( = \left(-v \ln(v) + v\right) \big\vert_{\downarrow 0}^k + k \ln(k) = -k \ln(k) + k + k \ln(k) = k\), for \(0 < k \le 1\).</p>
<p>For \(k=0\), we can fill in 0, because the minimum value of \((XY)^Z\) is 0, so it is the minimum of the support of its CDF.</p>
<p>So amazingly, \((XY)^Z\) has the CDF of (and is) a uniform distribution. How does that happen? What&#8217;s the intuition?</p>
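<p>A quick Monte Carlo check of this surprising fact (a sketch; the sample size and tolerances are arbitrary choices of mine):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, y, z = rng.random(n), rng.random(n), rng.random(n)
samples = (x * y) ** z

# The empirical CDF should match the uniform CDF F(k) = k.
for k in (0.1, 0.25, 0.5, 0.9):
    assert abs((samples <= k).mean() - k) < 0.005
```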
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=137</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>triangular pursuit</title>
		<link>https://blog.yhuang.org/?p=123</link>
		<comments>https://blog.yhuang.org/?p=123#comments</comments>
		<pubDate>Fri, 03 Oct 2008 13:01:53 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[boundary conditions]]></category>
		<category><![CDATA[delta]]></category>
		<category><![CDATA[edge length]]></category>
		<category><![CDATA[equilateral]]></category>
		<category><![CDATA[equilateral triangle abc]]></category>
		<category><![CDATA[incenter]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[sqrt]]></category>
		<category><![CDATA[theta]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=123</guid>
		<description><![CDATA[Here&#8217;s a problem posed to me by a friend: Consider an equilateral triangle ABC with edge length 1. At each vertex is an object that is capable of movement at exactly speed 1. Beginning at time 0, each of the three objects moves toward its initial adjacent neighbor object, as in a game of pursuit. [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a problem posed to me by a friend:</p>
<p>Consider an equilateral triangle ABC with edge length 1. At each vertex is an object that is capable of movement at exactly speed 1. Beginning at time 0, each of the three objects moves toward its initial adjacent neighbor object, as in a game of pursuit. Of course, by symmetry, the objects will meet at the incenter of ABC. The question: how far will they have traveled?<br />
<span id="more-123"></span><br />
<img src="wp-content/uploads/images/triangle.png" align="right" /> The paths followed by the objects at first seem non-trivial, but in fact turn out to be nice. We will come back to this, but for the problem, the exact paths are irrelevant. We just need to find the amount of time it takes the objects to meet. The key part of the solution is to realize that, due to symmetry, the three objects always define an equilateral triangle at any time. Furthermore, their direction of motion is always along the edges of such a triangle, at speed v=1.</p>
<p>What we have is then a shrinking and rotating equilateral triangle that eventually decays to a point. Let us characterize this process.</p>
<p>At time \(t\), let \(s(t)\) be the length traveled by one of the objects. Let \(l(t)\) be the edge length of the triangle. Let \(A(t)\) be the area of the triangle. Then we have several relationships:</p>
<ul>
<li>\(\frac{ds}{dt}=1\)</li>
<li>\(A(t)=\frac{\sqrt{3}}{4} l(t)^2\), and \(\frac{dA}{dt} = \frac{\sqrt{3}}{2} l(t) \frac{dl}{dt}\)</li>
<li>By geometry, \(A(t+\Delta t) - A(t) = -3 \left(l(t) - \frac{ds}{dt} \Delta t\right) \frac{\sqrt{3}}{4} \frac{ds}{dt} \Delta t + O({\Delta t}^2)\), thus \(\frac{dA}{dt} = -3 \frac{\sqrt{3}}{4} l(t)\)</li>
</ul>
<p>Combining: \(-\frac{3}{4}\sqrt{3} l(t) = \frac{\sqrt{3}}{2} l(t) \frac{dl}{dt}\)<br />
\(\frac{dl}{dt} = -\frac{3}{2}\). Integrating with the boundary conditions \(l(0) = 1\) and \(l(t_f) = 0\), we get \(-\frac{3}{2}t_f = -1\), \(t_f = \frac{2}{3}\), and since the speed is 1, \(s(t_f) = t_f = \frac{2}{3}\). This is how far each object travels.</p>
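<p>The answer is easy to corroborate with a crude simulation of the pursuit itself (a sketch; the step size and stopping tolerance are arbitrary):</p>

```python
import numpy as np

# Three objects at the vertices of a unit equilateral triangle.
pos = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [0.5, np.sqrt(3) / 2]])
dt, traveled = 1e-5, 0.0
while np.linalg.norm(pos[0] - pos[1]) > 1e-3:
    d = np.roll(pos, -1, axis=0) - pos                  # each chases the next
    pos += d / np.linalg.norm(d, axis=1, keepdims=True) * dt
    traveled += dt                                      # speed is 1, so ds = dt
assert abs(traveled - 2 / 3) < 2e-3
```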
<p>Finally, since the process is self-similar at every step along the path, the path of the object must be a <a href="http://en.wikipedia.org/wiki/Logarithmic_spiral">logarithmic spiral</a>. Furthermore, since the linear speed on the path is constant, the angular speed must be exponentially increasing (in angle) toward the center of the spiral.</p>
<p><img src="wp-content/uploads/images/spiral.png" align="right" /> To solve for this spiral explicitly, let \(r = \frac{\sqrt{3}}{3} l\) be the distance from an object to the center (the circumradius). Then \(\frac{dr}{dt} = \frac{\sqrt{3}}{3} \frac{dl}{dt} = -\frac{\sqrt{3}}{2}\), and with the boundary condition \(r(0) = \frac{\sqrt{3}}{3}\), we get \(r(t) = \frac{\sqrt{3}}{3} - \frac{\sqrt{3}}{2}t\).<br />
Write the spiral as \(r(\theta) = \frac{\sqrt{3}}{3} \exp(-k(\theta - \theta_0))\). Its arc length from \(\theta_0\) to the center is \(\int_{\theta_0}^\infty \sqrt{r^2 + \left(\frac{dr}{d\theta}\right)^2}\, d\theta = \frac{\sqrt{1+k^2}}{k} \cdot \frac{\sqrt{3}}{3}\). Setting this equal to the travel distance \(\frac{2}{3}\) found above gives \(k = \sqrt{3}\).<br />
Finally, combining \(r(\theta)\) with \(r(t)\), \(\theta(t) = \theta_0 - \frac{1}{\sqrt{3}} \ln\left(1 - \frac{3}{2}t\right)\), which indeed winds around infinitely many times, at exponentially increasing angular speed, as \(t \to \frac{2}{3}\).</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=123</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>8 perfect shuffles</title>
		<link>https://blog.yhuang.org/?p=84</link>
		<comments>https://blog.yhuang.org/?p=84#comments</comments>
		<pubDate>Sat, 29 Dec 2007 09:50:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[52 cards]]></category>
		<category><![CDATA[equiv]]></category>
		<category><![CDATA[half]]></category>
		<category><![CDATA[high school math]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[index positions]]></category>
		<category><![CDATA[math 2]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[mod]]></category>
		<category><![CDATA[number]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=84</guid>
		<description><![CDATA[It is said that 8 perfect shuffles of a standard deck of 52 cards results in an unshuffled deck. Why is that? Is it true only for 52 cards? What kind of shuffle is implied by this statement? Yesterday, we (high school math buddies) made pretty good sense of it. Let&#8217;s first index positions in [...]]]></description>
			<content:encoded><![CDATA[<p>It is said that 8 perfect shuffles of a standard deck of 52 cards results in an unshuffled deck. Why is that? Is it true only for 52 cards? What kind of shuffle is implied by this statement? Yesterday, we (high school math buddies) made pretty good sense of it.<br />
<span id="more-84"></span><br />
Let&#8217;s first index positions in a standard deck by 0,1,&#8230;,51.</p>
<p>When we say &#8220;perfect shuffle,&#8221; it means to cut the deck in half, then interleave the cards from the two halves. There are actually two choices here, depending on which card we put on top. We can send card 0 to 0, card 26 to 1, card 1 to 2, &#8230;, card 25 to 50, card 51 to 51 (we&#8217;ll call this Shuffle 1), or we can send card 26 to 0, card 0 to 1, card 27 to 2, &#8230;, card 51 to 50, card 25 to 51 (we&#8217;ll call this Shuffle 2). Shuffle 1 is what we deal with, because the statement doesn&#8217;t work for Shuffle 2. But note that Shuffle 2 is just Shuffle 1 followed by exchanges of pairs.</p>
<p>So, the effect of shuffling on the first half of the deck (\(n \in \{ 0, \dots ,25 \}\)) is to send card \(n\) to \(2n\). Notice that a perfect shuffle looks the same if we flip the deck around, so the effect on the second half of the deck (\(n \in \{26, \dots ,51 \}\)) is to send card \(n\) to \(R(2R(n))\), where \(R(n) = 51-n\) is the operation that gives the reversed index of card \(n\).</p>
<p>\(R(2R(n)) = 51 - 2(51-n) = 2n - 51\). Thus each shuffle sends card \(n\) to \(2n\), except that sometimes, we subtract 51 from the answer. (That &#8220;sometimes&#8221; is when the card \(n\) is in the second half of the deck.) The point is, no matter what, card \(n\) is again sent to something in the range 0,&#8230;,51, so in fact it is sent to \(2n \pmod{51}\) (ignoring card 51, which behaves like 0 and is always sent to itself).</p>
<table border="1" cellpadding="10" style="margin: 2 2 2 2; border-collapse: collapse; border-style: solid; border-color: #365873;">
<tr>
<td>
If we shuffle \(s\) times, we get that card \(n\) is sent to \(2^s n \pmod{51}\).
</td>
</tr>
</table>
<p>Immediately, we see that 8 shuffles works because \(2^8 = 256 \equiv 1 \pmod{51}\), so \(2^8 n \equiv n \pmod{51}\), and </p>
<table border="1" cellpadding="10" style="margin: 2 2 2 2; border-collapse: collapse; border-style: solid; border-color: #365873;">
<tr>
<td>
In general, \(s\) shuffles brings a deck of size \(d\) back to its original order if and only if \((2^s - 1) n \equiv 0 \pmod{d-1}\) for all \(n\in \{0,\dots,d-1\}\), that is, if and only if \(d-1\) divides \(2^s - 1\).
</td>
</tr>
</table>
<p>The last statement is because if \((2^s - 1) n \equiv 0 \pmod{d-1}\) for \(n=1\), then it must be true for all \(n>1\). (The case \(n=0\) is trivial.)</p>
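<p>Both the \(2n \bmod 51\) rule and the 8-shuffle claim are easy to verify directly (a sketch; <code>perfect_shuffle</code> is my name for Shuffle 1):</p>

```python
def perfect_shuffle(deck):
    """Shuffle 1: cut the deck in half and interleave, keeping the top card on top."""
    half = len(deck) // 2
    out = []
    for a, b in zip(deck[:half], deck[half:]):
        out += [a, b]
    return out

deck = list(range(52))

# One shuffle sends card n to position 2n mod 51 (card 51 stays put).
once = perfect_shuffle(deck)
assert all(once[2 * n % 51] == n for n in range(51)) and once[51] == 51

# Eight shuffles restore the original order, since 2^8 = 256 ≡ 1 (mod 51).
d = deck
for _ in range(8):
    d = perfect_shuffle(d)
assert d == deck
```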
<p>There are more interesting observations. If \(s\) shuffles brings a deck to its original order, then surely \(ks\) shuffles does that, too. And indeed, if \(d-1\) divides \(2^s - 1\), then it also divides \(2^{ks} - 1\), which factors as \((2^s - 1)(2^{(k-1)s} + 2^{(k-2)s} + \cdots + 2^s + 1)\).</p>
<p>In another observation, we can explicitly write the card position after \(s\) shuffles as</p>
<p>\(2(\dots 2(2n - 51 b_1(n)) - 51 b_2(n) \dots) - 51 b_s(n)\ \ \ \ (*)\)</p>
<p>where \(b_i(n)\) is an indicator bit that is 1 if, entering the \(i\)th shuffle, the card is in the second half of the deck, and 0 otherwise.</p>
<p>Expanding \((*)\), we get \(2^s n - 51 (2^{s-1} b_1(n) + 2^{s-2} b_2(n) + \cdots + 2^1 b_{s-1}(n) + 2^0 b_s(n))\).</p>
<p>Now, define \(b(n) \equiv 2^{s-1} b_1(n) + 2^{s-2} b_2(n) + \cdots + 2^1 b_{s-1}(n) + 2^0 b_s(n)\). Clearly, \(\overline{b_1(n) b_2(n) \dots b_s(n)}\) is the binary expansion of \(b(n)\).</p>
<p>So \(s\) shuffles brings the deck back to its original order if and only if \(2^s n - 51 b(n) = n\) for all \(n\), i.e. \(b(n) = \frac{2^s - 1}{51}n\). But we know that 51 divides \(2^s - 1\) in our example, so \(\frac{2^s - 1}{51}\) is an integer, and in general</p>
<table border="1" cellpadding="10" style="margin: 2 2 2 2; border-collapse: collapse; border-style: solid; border-color: #365873;">
<tr>
<td>
If \(s\) shuffles brings a deck of size \(d\) to identity, then the integer \(b(1) = \frac{2^s - 1}{d - 1}\) defines the function \(b(n)\) for all \(n\) by \(b(n) = n b(1)\), where the binary expansion of \(b(n)\) gives which half card \(n\) is in for each of the \(s\) shuffles.
</td>
</tr>
</table>
<p>In the case of \(s=8\) and \(d=52\), \(b(1) = 5\).</p>
<p>Now we can also find the minimum number of shuffles to return various deck sizes to identity, by looking at the factors of \(2^s-1\):</p>
<table>
<tr>
<td>\(s\):</td>
<td>\(d\) for which \(s\) is minimal</td>
</tr>
<tr>
<td>1:</td>
<td>2</td>
</tr>
<tr>
<td>2:</td>
<td>4</td>
</tr>
<tr>
<td>3:</td>
<td>8</td>
</tr>
<tr>
<td>4:</td>
<td>6, 16</td>
</tr>
<tr>
<td>5:</td>
<td>32</td>
</tr>
<tr>
<td>6:</td>
<td>10, 22, 64</td>
</tr>
<tr>
<td>7:</td>
<td>128</td>
</tr>
<tr>
<td>8:</td>
<td>18, 52, 86, 256</td>
</tr>
</table>
<p>etc&#8230;</p>
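<p>The table above can be reproduced by computing, for each even \(d\), the multiplicative order of 2 modulo \(d-1\) (a sketch; <code>min_shuffles</code> is my name for it):</p>

```python
def min_shuffles(d):
    """Smallest s with 2^s ≡ 1 (mod d-1), i.e. the multiplicative order
    of 2 mod d-1; assumes d is even, so that d-1 is odd."""
    if d == 2:
        return 1
    s, r = 1, 2 % (d - 1)
    while r != 1:
        r = 2 * r % (d - 1)
        s += 1
    return s

assert min_shuffles(52) == 8

# The row for s = 8: deck sizes up to 256 needing exactly 8 shuffles.
assert [d for d in range(2, 257, 2) if min_shuffles(d) == 8] == [18, 52, 86, 256]
```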
<p>Of course, every deck of size \(d\) can be returned to its original order in <em>some</em> number of shuffles: at most \(d!\) (and if fewer, then the number of shuffles divides \(d!\)). This is because the permutations of \(d\) elements form a group of size \(d!\), and the order of any element divides the group order. This knowledge is of no practical utility, since \(d!\) is huge, but it does imply that any odd positive integer \(d-1\) divides \(2^{d!} - 1\). And it highlights the fact that some deck sizes are very special: any \(d\) that is a power of 2 only takes \(\log_2 d\) shuffles, surely; but 52 stands out: for such a large deck size that isn&#8217;t a power of 2, the fact that it only takes 8 shuffles is quite amazing.</p>
<p>I do wonder if there are any &#8220;bad&#8221; deck sizes \(d\) that do take up to \(d!\) shuffles to return to the original order. A search of the <a href="http://www.garlic.com/~wedgingt/mersenne.html">factors of Mersenne numbers</a> may be in order.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=84</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>matrix factorization</title>
		<link>https://blog.yhuang.org/?p=8</link>
		<comments>https://blog.yhuang.org/?p=8#comments</comments>
		<pubDate>Thu, 02 Nov 2006 11:15:47 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[matrix]]></category>
		<category><![CDATA[matrix math]]></category>
		<category><![CDATA[neq]]></category>
		<category><![CDATA[substitute]]></category>
		<category><![CDATA[triangular]]></category>
		<category><![CDATA[upper triangular matrices]]></category>
		<category><![CDATA[upper triangular matrix]]></category>
		<category><![CDATA[vector math]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=8</guid>
		<description><![CDATA[This question was floating around: can a determinant-1 diagonal matrix be factored into a product of triangular matrices with 1-diagonals, and how? At least for the 2-by-2 matrix, this is answered in the affirmative here (Section 8.3). Here is a recapitulation. First, we note that the product of two lower triangular matrices is lower triangular, and [...]]]></description>
			<content:encoded><![CDATA[<p>This question was floating around: can a determinant-1 diagonal matrix be factored into a product of triangular matrices with 1-diagonals, and how?</p>
<p>At least for the 2-by-2 matrix, this is answered in the affirmative <a href="http://mit.edu/chungc/Public/main.pdf">here</a> (Section 8.3). Here is a recapitulation.<br />
<span id="more-8"></span><br />
First, we note that the product of two lower triangular matrices is lower triangular, and the product of two upper triangular matrices is also upper triangular. So if the decomposition is possible, then we should also be able to write the decomposition as an alternating product of lower and upper triangular matrices. Since the identity matrix is both lower and upper triangular, let&#8217;s just assume we begin with a lower triangular matrix, that is, is there a decomposition</p>
<p>\({\bf A} = {\bf L}_1 {\bf U}_1 \dots {\bf L}_k {\bf U}_k\)</p>
<p>where \({\bf L}_i\) and \({\bf U}_i\) are lower and upper triangular matrices, respectively, and \({\bf A}\) is diagonal.<br />
<!--more--></p>
<p>Consider the case of a 2-by-2 matrix \({\bf A}={\bf diag}(\lambda, 1/\lambda)\), \(\lambda\neq1\). In general, a triangular matrix transforms coordinate bases in a restricted way that takes place in a subspace, and for a 2-by-2 matrix this is especially simple. Under transformation by a lower triangular matrix with 1-diagonal, any vector is changed only in its second component: the standard basis vector \(e_1=(1,0)^T\) picks up a second component, while \(e_2=(0,1)^T\) is unchanged. Analogously, under transformation by an upper triangular matrix with 1-diagonal, any vector is changed only in its first component: \(e_1\) is unchanged, while \(e_2\) picks up a first component. A diagonal matrix, on the other hand, scales the standard basis vectors.</p>
<p>Put together, we see that if the decomposition is possible, then in general, a product of at least three nontrivial lower-upper matrices is required to transform \(e_1\) to \((\lambda, 0)^T\) as well as to transform \(e_2\) to \((0, 1/\lambda)^T\). {some stuff here} On the other hand, since the right-most transformation does nothing to one of the standard vectors (but does change the other one), at least four nontrivial lower-upper matrices are required to correctly transform both standard vectors.</p>
<p>But any general 2-by-2 determinant-1 matrix can be written as a product of no more than four triangular matrices:</p>
<p>Let \({\bf A}=\left( \begin{array}{cc}x &#038; y \\ z &#038; w \end{array}\right)\). Let \({\bf L}_1=\left( \begin{array}{cc}1 &#038; 0 \\ a &#038; 1 \end{array}\right)\), \({\bf U}_1=\left( \begin{array}{cc}1 &#038; b \\ 0 &#038; 1 \end{array}\right)\), \({\bf L}_2=\left( \begin{array}{cc}1 &#038; 0 \\ c &#038; 1 \end{array}\right)\), and \({\bf U}_2=\left( \begin{array}{cc}1 &#038; d \\ 0 &#038; 1 \end{array}\right)\).</p>
<p>Multiplying out, \(({\bf L}_1{\bf U}_1)({\bf L}_2{\bf U}_2)=\left( \begin{array}{cc}1 &#038; b\\a &#038; 1+ab\end{array}\right)\left( \begin{array}{cc}1 &#038; d\\c &#038; 1+cd\end{array}\right)\) gives the following constraints:</p>
<p>(0): \(xw-yz=1\)<br />
(1): \(1+bc=x\)<br />
(2): \(b+d+bcd=y\)<br />
(3): \(a+c+abc=z\)<br />
(4): \(ad+1+ab+cd+abcd=w\)</p>
<p>(5): substitute (1) into (2) \(b=y-xd\)<br />
For \(x\neq1\):<br />
(6): substitute (5) into (1) \(c=\frac{x-1}{y-xd}\)<br />
(7): substitute (0), (5), (6) into (3) \(a=\frac{yz-xzd-x+1}{x(y-xd)}=\frac{w-zd-1}{y-xd}\)<br />
(8): substitute (0), (5), (6), (7) into (4): tautological, (4) is automatically satisfied for all \(d\) s.t. \(y\neq xd\).<br />
For \(x=1\):<br />
\(b=0\) or \(c=0\). W.l.o.g., put \(b=0\). Then \(d=y\). (3) and (4) then become linear in \(a\) and \(c\). Any \(a\) and \(c\) that sum to \(z\) work.</p>
<p>So it takes exactly four triangular factors for a diagonal 2-by-2 determinant-1 non-identity matrix.</p>
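<p>The construction above is easy to check numerically (a sketch; <code>lu_factors</code> is a hypothetical helper name of mine, and the free parameter \(d\) must be chosen with \(y \neq xd\)):</p>

```python
import numpy as np

def lu_factors(x, y, z, w, d):
    """The four 1-diagonal triangular factors derived above, for the
    case x != 1 (assumes xw - yz = 1 and y != x*d)."""
    b = y - x * d              # from (5)
    c = (x - 1) / b            # from (6)
    a = (w - z * d - 1) / b    # from (0) and (3)
    L1 = np.array([[1.0, 0.0], [a, 1.0]])
    U1 = np.array([[1.0, b], [0.0, 1.0]])
    L2 = np.array([[1.0, 0.0], [c, 1.0]])
    U2 = np.array([[1.0, d], [0.0, 1.0]])
    return L1, U1, L2, U2

# The diagonal case: A = diag(3, 1/3); here y = 0, so pick d = 1.
A = np.diag([3.0, 1.0 / 3.0])
L1, U1, L2, U2 = lu_factors(*A.ravel(), 1.0)
assert np.allclose(L1 @ U1 @ L2 @ U2, A)

# A generic determinant-1 matrix, with d = 0.
M = np.array([[2.0, 3.0], [1.0, 2.0]])       # det = 1
assert np.allclose(np.linalg.multi_dot(lu_factors(2, 3, 1, 2, 0)), M)
```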
<p>For larger matrices, whether it is possible to decompose, and if so, how to do so, are not clear. Group properties of \(SL_n(\mathbb R)\) may be of use. This will be revisited.</p>
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=8</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
