<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Some stuff &#187; vector space</title>
	<atom:link href="http://blog.yhuang.org/?feed=rss2&#038;tag=vector-space" rel="self" type="application/rss+xml" />
	<link>https://blog.yhuang.org</link>
	<description>here.</description>
	<lastBuildDate>Wed, 27 Aug 2025 08:50:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>implicit function problem</title>
		<link>https://blog.yhuang.org/?p=1359</link>
		<comments>https://blog.yhuang.org/?p=1359#comments</comments>
		<pubDate>Wed, 14 May 2014 19:50:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[coding theory]]></category>
		<category><![CDATA[Differential geometry]]></category>
		<category><![CDATA[implicit function]]></category>
		<category><![CDATA[Linear code]]></category>
		<category><![CDATA[Manifold]]></category>
		<category><![CDATA[Tangent bundle]]></category>
		<category><![CDATA[vector space]]></category>

		<guid isPermaLink="false">http://allegro.mit.edu/~zong/wpress/?p=1359</guid>
		<description><![CDATA[From fakalin. If \(F(x, y, z)\) is a function of 3 variables, and the relation \(F(x, y, z) = 0\) defines each of the variables in terms of the other two, namely \(x = f(y, z)\), \(y = g(x, z)\), and \(z = h(x, y)\), then show that: \(\left(\frac{\partial x}{\partial y}\right) \left(\frac{\partial y}{\partial z}\right) \left(\frac{\partial z}{\partial x}\right) = -1\). One may obtain a vague feeling that e.g. &#8220;\(\frac{\partial x}{\partial y}\)&#8221; (more properly \(\frac{\partial f}{\partial y}\)) should be somewhat related to \(\left(\frac{\partial F}{\partial x}\right)^{-1} \frac{\partial F}{\partial y}\). Indeed, a version of the [...]]]></description>
			<content:encoded><![CDATA[<p>From fakalin. If \(F(x, y, z)\) is a function of 3 variables, and the relation \(F(x, y, z) = 0\) defines each of the variables in terms of the other two, namely \(x = f(y, z)\), \(y = g(x, z)\), and \(z = h(x, y)\), then show that:</p>
\(\left(\frac{\partial x}{\partial y}\right) \left(\frac{\partial y}{\partial z}\right) \left(\frac{\partial z}{\partial x}\right) = -1\).<br />
<span id="more-1359"></span><br />
One may obtain a vague feeling that e.g. &#8220;\(\frac{\partial x}{\partial y}\)&#8221; (more properly \(\frac{\partial f}{\partial y}\)) should be somewhat related to \((\frac{\partial F}{\partial x})^{-1} \frac{\partial F}{\partial y}\). Indeed, <a href="https://en.wikipedia.org/wiki/Triple_product_rule">a version</a> of the <a href="http://en.wikipedia.org/wiki/Implicit_function_theorem">Implicit Function Theorem</a> will provide that \(\frac{\partial f}{\partial y}(y,z) = (-1)\left(\frac{\partial F}{\partial x}(f(y,z),y,z)\right)^{-1} \frac{\partial F}{\partial y}(f(y,z),y,z)\), and similarly for the other terms, thus immediately solving the problem. However, it does not give much insight into where the \((-1)\) term comes from.</p>
<p>Instead we look at the problem as two ways of defining a manifold, in this case a 2-manifold in 3-space. The explicit definition (picking one w.l.o.g.), \(\{(x,y,z): x = f(y,z), y = \textbf{id}(y), z = \textbf{id}(z)\}\), we shall call its <strong>generator form</strong> (or perhaps, &#8220;tangent form&#8221;). The implicit definition \(\{(x,y,z): F(x,y,z)=c\}\) we shall call its <strong>constraint form</strong> (or perhaps, &#8220;co-tangent&#8221; form).</p>
<p>Recall that a differential manifold has a <a href="https://en.wikipedia.org/wiki/Tangent_space#Formal_definitions">tangent space</a> at each point, which can be obtained from both definitions. Specialized to this problem, we linearize around \((x,y,z)\) by taking derivatives. Thus the generator form defines the tangent space directly by its bases:</p>
<p>\(\left[\begin{array}{c}dx\\dy\\dz\end{array}\right] =<br />
\left[\begin{array}{cc}<br />
\frac{\partial f}{\partial y} &#038; \frac{\partial f}{\partial z} \\<br />
1 &#038; 0 \\<br />
0 &#038; 1 \end{array} \right]<br />
\left[\begin{array}{c}dy\\dz\end{array}\right]<br />
\)</p>
<p>(analogously for \(g\) and \(h\)), while the constraint form defines the tangent space by its normal:</p>
\(\left[\begin{array}{ccc}<br />
\frac{\partial F}{\partial x} &#038; \frac{\partial F}{\partial y} &#038; \frac{\partial F}{\partial z}<br />
\end{array} \right]<br />
\left[\begin{array}{c}dx\\dy\\dz\end{array}\right] = 0<br />
\)
<p>From orthogonality, we obtain \(\frac{\partial F}{\partial x} \frac{\partial f}{\partial y} + \frac{\partial F}{\partial y} = 0\) and \(\frac{\partial F}{\partial x}\frac{\partial f}{\partial z} + \frac{\partial F}{\partial z}= 0\), and likewise for the other cases. Each of \(\frac{\partial f}{\partial y} = -\left(\frac{\partial F}{\partial x}\right)^{-1}\frac{\partial F}{\partial y}\), \(\frac{\partial g}{\partial z} = -\left(\frac{\partial F}{\partial y}\right)^{-1}\frac{\partial F}{\partial z}\), and \(\frac{\partial h}{\partial x} = -\left(\frac{\partial F}{\partial z}\right)^{-1}\frac{\partial F}{\partial x}\) carries a factor of \((-1)\); multiplying the three, the partials of \(F\) cancel and \((-1)^3 = -1\) remains, so we see where the \((-1)\) term comes from.</p>
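<p>As a sanity check, one can verify the triple product numerically on a concrete surface (a hypothetical example, not from the problem statement): take \(F(x,y,z) = x + y^2 + z^3 - 1\), solve explicitly for each variable, and estimate the three partial derivatives by central differences.</p>

```python
# Numeric check of (dx/dy)(dy/dz)(dz/dx) = -1 for a sample F.
# F(x, y, z) = x + y^2 + z^3 - 1, so each variable solves explicitly:
#   x = f(y, z) = 1 - y^2 - z^3
#   y = g(x, z) = sqrt(1 - x - z^3)
#   z = h(x, y) = (1 - x - y^2)^(1/3)

def f(y, z):
    return 1 - y**2 - z**3

def g(x, z):
    return (1 - x - z**3) ** 0.5

def h(x, y):
    return (1 - x - y**2) ** (1 / 3)

def central_diff(fn, t, step=1e-6):
    """Central finite-difference approximation of d(fn)/dt at t."""
    return (fn(t + step) - fn(t - step)) / (2 * step)

# A point on the surface F = 0: y = z = 0.5 gives x = 0.625.
y0, z0 = 0.5, 0.5
x0 = f(y0, z0)

dx_dy = central_diff(lambda y: f(y, z0), y0)   # z held fixed
dy_dz = central_diff(lambda z: g(x0, z), z0)   # x held fixed
dz_dx = central_diff(lambda x: h(x, y0), x0)   # y held fixed

product = dx_dy * dy_dz * dz_dx
print(product)  # close to -1
```

<p>Each factor is individually nonzero here (\(-1\), \(-3/4\), \(-4/3\)), yet the product is forced to \(-1\).</p>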
<p>What is more, the fact that we used the terms &#8220;generator&#8221; and &#8220;constraint&#8221; is evocative of linear codes, and tangent spaces are indeed linear subspace codes. Note that the generator form has the &#8220;generator matrix&#8221;</p>
\(\left[\begin{array}{c}<br />
\nabla f \\<br />
I_2\end{array}\right]\)
<p>while the constraint form, if we divide through by \(\frac{\partial F}{\partial x}\), has the &#8220;constraint matrix&#8221;</p>
\(\left[\begin{array}{cc}<br />
I_1 &#038; -\nabla f<br />
\end{array}\right]\)
<p>which is exactly the <a href="http://en.wikipedia.org/wiki/Parity-check_matrix#Creating_a_parity_check_matrix">relationship</a> between systematic generator and parity-check matrices of such codes.</p>
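<p>The systematic generator / parity-check relationship is easy to verify numerically. A minimal sketch (the values for \(\nabla f\) are hypothetical):</p>

```python
import numpy as np

# Systematic "generator matrix" of the tangent space: columns span it.
grad_f = np.array([2.0, -3.0])                # hypothetical [df/dy, df/dz]
G = np.vstack([grad_f, np.eye(2)])            # shape (3, 2): [[grad_f], [I_2]]

# "Constraint matrix" (parity check): its row is normal to the tangent space.
H = np.hstack([np.eye(1), -grad_f[None, :]])  # shape (1, 3): [I_1, -grad_f]

# H G = 0: every generated tangent vector satisfies the constraint.
print(H @ G)  # [[0., 0.]]
```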
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=1359</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>tensors</title>
		<link>https://blog.yhuang.org/?p=655</link>
		<comments>https://blog.yhuang.org/?p=655#comments</comments>
		<pubDate>Sun, 16 Oct 2011 12:33:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[basis]]></category>
		<category><![CDATA[component elements]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[math math]]></category>
		<category><![CDATA[overbrace]]></category>
		<category><![CDATA[space]]></category>
		<category><![CDATA[sum]]></category>
		<category><![CDATA[vector products]]></category>
		<category><![CDATA[vector space]]></category>
		<category><![CDATA[vector spaces]]></category>

		<guid isPermaLink="false">http://scripts.mit.edu/~zong/wpress/?p=655</guid>
		<description><![CDATA[This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here I took some notes. Tensors are sums of &#8220;products&#8221; of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the tensor product. In the Cartesian product of vector spaces \(V\times W\), the set [...]]]></description>
			<content:encoded><![CDATA[<p>This has been a confusing topic, with half a dozen Wikipedia pages on the subject. Here I took some notes.</p>
<p>Tensors are sums of &#8220;products&#8221; of vectors. There are different kinds of vector products. The one used to build tensors is, naturally, the <strong>tensor product</strong>. In the Cartesian product of vector spaces \(V\times W\), the set elements are tuples like \((v,w)\) where \(v\in V, w\in W\). A tensor product \(v\otimes w\) is obtained by tupling the component <strong>bases</strong> rather than the component elements. If \(V\) has basis \(\{e_i\}_{i\in\{1,&#8230;,M\}}\) and \(W\) has basis \(\{f_j\}_{j\in\{1,&#8230;,N\}}\), then take \(\{(e_i,f_j)\}_{i\in\{1,&#8230;,M\},j\in\{1,&#8230;,N\}}\) as the basis of the <strong>tensor product space</strong> \(V\otimes W\). Then define the tensor product \(v\otimes w\) as</p>
<p>(1) \(\sum_{i,j} v_i w_j (e_i,f_j) \in V\otimes W\),</p>
<p>if \(v=\sum_i v_i e_i\) and \(w=\sum_j w_j f_j\). The entire tensor product space \(V\otimes W\) is defined as sums of these tensor products</p>
<p>(2) \(\{\sum_k v_k\otimes w_k | v_k\in V, w_k\in W\}\).</p>
<p>So tensors in a given basis can be represented as multidimensional arrays.</p>
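<p>In coordinates, definition (1) is just the outer product of the coefficient arrays, and (2) says a general tensor is a sum of such arrays. A minimal sketch (arbitrary sample vectors):</p>

```python
import numpy as np

v = np.array([1.0, 2.0])          # coefficients of v in basis {e_i}, M = 2
w = np.array([3.0, 4.0, 5.0])     # coefficients of w in basis {f_j}, N = 3

# v (x) w in the basis {(e_i, f_j)}: an M-by-N array with entries v_i * w_j.
vw = np.einsum('i,j->ij', v, w)
print(vw.shape)                   # (2, 3)

# A general element of V (x) W is a sum of such outer products:
v2 = np.array([0.0, 1.0])
w2 = np.array([1.0, 0.0, 0.0])
t = np.outer(v, w) + np.outer(v2, w2)
```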
<p>\(V\otimes W\) is also a vector space, with \(MN\) basis dimensions (c.f. \(V\times W\) with \(M+N\) basis dimensions). But additionally, it has internal multilinear structure due to the fact that it is made of component vector spaces, namely:</p>
<p>\((v_1+v_2)\otimes w = v_1\otimes w + v_2\otimes w\)<br />
\(v\otimes (w_1+w_2) = v\otimes w_1 + v\otimes w_2\)<br />
\(\alpha (v\otimes w) = (\alpha v)\otimes w = v\otimes (\alpha w)\)<br />
<span id="more-655"></span><br />
Higher-order (n-th order) tensor products \(v_1\otimes v_2\otimes \cdots \otimes v_n\) are obtained by chaining in the obvious way, likewise for higher-order tensor product spaces \(V_1\otimes V_2\otimes \cdots \otimes V_n\). With this, <strong>concatenation</strong> of tensors are also defined, i.e. \(S_{i_1,&#8230;i_m} \in V_1\otimes \cdots \otimes V_m\) and \(T_{i_{m+1},&#8230;,i_n} \in V_{m+1}\otimes \cdots \otimes V_n\), then \(S_{i_1,&#8230;,i_m}\otimes T_{i_{m+1},&#8230;,i_n} = Z_{i_1,&#8230;,i_n} \in V_1\otimes \cdots \otimes V_n\). In other words, the indices are appended. This is essentially the <strong>Kronecker product</strong>, which generalizes the <strong>outer product</strong>.</p>
<p>However, usually when tensors are mentioned, the tensor product spaces under discussion are already specialized to those generated from a single base vector space \(V\) and its dual space \(V^*\), rather than from a collection of arbitrary vector spaces. In such a space \(P(m,n) = \overbrace{V\otimes \cdots \otimes V}^{m} \otimes \overbrace{V^*\otimes \cdots \otimes V^*}^{n}\), the component spaces (and their bases, indices, etc.) naturally belong to two groups, those from \(V\) are called <strong>contravariant</strong>, those from \(V^*\) are called <strong>covariant</strong>,  and an (m,n)-tensor from \(P(m,n)\) is written \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}\).</p>
<p>This specialization allows the <strong>contraction</strong> of tensors to be defined. A contraction basically chooses one covariant vector component and one contravariant vector component from a tensor and applies the former as a functional on the latter, e.g., contracting \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n} = v_{i_1}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_1} \otimes \cdots \otimes v^*_{j_n}\) on the pair of indices \(i_1\) and \(j_1\) gives \(Z^{i_2,&#8230;,i_m}_{j_2,&#8230;,j_n} = v^*_{j_1}(v_{i_1}) (v_{i_2}\otimes \cdots \otimes v_{i_m} \otimes v^*_{j_2} \otimes \cdots \otimes v^*_{j_n})\). \(v^*_{j_1}(v_{i_1})\) of course is an <strong>inner product</strong> that sums over the dimensions of the paired components. Contraction generalizes the <strong>trace</strong> operator. Combined with concatenation, this defines a <strong>tensor multiplication</strong>, such that if \(S^{r,i_2,&#8230;,i_m}_{s,j_2,&#8230;,j_n}\in P(m,n)\) and \(T^{s,k_2,&#8230;,k_p}_{r,l_2,&#8230;,l_q}\in P(p,q)\), then \(S^{r,i_2,&#8230;,i_m}_{s,j_2,&#8230;,j_n}T^{s,k_2,&#8230;,k_p}_{r,l_2,&#8230;,l_q}\) is the contraction of \(S^{r,i_2,&#8230;,i_m}_{s,j_2,&#8230;,j_n}\otimes T^{s,k_2,&#8230;,k_p}_{r,l_2,&#8230;,l_q}\) on all common indices that can be paired, e.g. \(r,s\). This is the so-called <strong>Einstein notation</strong>, and generalizes <strong>matrix multiplication</strong>.</p>
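<p>Einstein notation is implemented directly by <code>numpy.einsum</code>, which makes the claims above concrete: contracting a (1,1)-tensor on its index pair is the trace, and concatenating then contracting on the shared index is matrix multiplication. A minimal sketch:</p>

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)             # a (1,1)-tensor in array form
B = np.arange(9.0).reshape(3, 3)[::-1].copy()

# Contracting a (1,1)-tensor on its only index pair is the trace:
print(np.einsum('ii->', A) == np.trace(A))   # True

# Concatenate then contract on the shared index j: matrix multiplication.
print(np.allclose(np.einsum('ij,jk->ik', A, B), A @ B))  # True
```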
<p>The distinction of \(V\) vs. \(V^*\) also manifests in the <strong>change-of-basis</strong> rules for tensors, which inherit from the change-of-basis rules of the component vector spaces, which are:</p>
<ul>
<li><strong>contravariant</strong> change-of-basis rule: If \(B = [b_1\ b_2\ \cdots\ b_M]\) is a change-of-basis matrix, with the new basis \(\{b_i\}_{i\in \{1,&#8230;,M\}}\) written in the old basis as columns, then for a vector written in the old basis \(v\in V\) and the same vector written in the new basis \(\tilde{v}\in V\), \(v = B\tilde{v}\). Therefore, we have
<p>(3) \(v \mapsto \tilde{v} = B^{-1}v\).</li>
<li><strong>covariant</strong> change-of-basis rule: If additionally \(a^T\in V^*\) is a functional written in the old basis and \(\tilde{a}^T\in V^*\) is the same functional written in the new basis, then \(\forall v\in V: a^T v = \tilde{a}^T \tilde{v} = \tilde{a}^T B^{-1}v\). Therefore, we have
<p>(4) \(a^T \mapsto \tilde{a}^T = a^T B\).</li>
</ul>
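<p>Rules (3) and (4) can be verified numerically; the key invariant is that the pairing \(a^T v\) does not depend on the basis. A minimal sketch (random basis and vectors):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))            # change-of-basis matrix; columns = new basis
while abs(np.linalg.det(B)) < 1e-6:    # ensure it is invertible
    B = rng.normal(size=(3, 3))

v = rng.normal(size=3)                 # a vector in old-basis coordinates
a = rng.normal(size=3)                 # a functional (row vector), old basis

v_new = np.linalg.solve(B, v)          # contravariant rule (3): v -> B^{-1} v
a_new = a @ B                          # covariant rule (4):    a^T -> a^T B

# The pairing a^T v is basis-independent:
print(np.allclose(a @ v, a_new @ v_new))  # True
```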
<p>Combining (1) with rules (3) and (4), it can be shown that an (m,n)-tensor \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}\) has the change-of-basis rule \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n} \mapsto \tilde{T}^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n} = {(B^{-1})}^{i_1}_{k_1}\cdots {(B^{-1})}^{i_m}_{k_m}\, B^{l_1}_{j_1}\cdots B^{l_n}_{j_n}\, T^{k_1,&#8230;,k_m}_{l_1,&#8230;,l_n}\), i.e. one factor of \(B^{-1}\) per contravariant index and one factor of \(B\) per covariant index.</p>
<p>Okay, so what&#8217;s the point of these tensors? Basically, an (m,n)-tensor \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}\in P(m,n)\) represents a multilinear input-output relationship that takes \(n\) vectors as input and produces \(m\) vectors as output. If used &#8220;canonically&#8221; on an input \(X^{j_1,&#8230;,j_n}\in P(n,0)\), you get \(T^{i_1,&#8230;,i_m}_{j_1,&#8230;,j_n}X^{j_1,&#8230;,j_n} = Y^{i_1,&#8230;,i_m}\in P(m,0)\) as output. The contravariant input gets contracted with the covariant parts of the transformation tensor, and these drive the contravariant parts of the transformation tensor to produce the contravariant output.</p>
<p><img src="wp-content/uploads/images/tensor.png" /><br />
(System diagrams: <strong>Rank</strong> is the minimal number of terms in a tensor. On the left, a rank-1 tensor transformation; on the right, a rank-\(K\) one.)</p>
<p>An example is linear transformations, which are (1,1)-tensors (1 vector in, 1 vector out). In array representation these would just be matrices. Any rank-\(K\) linear transformation \(T^v_a\) is decomposable into a \(K\)-term tensor \(\sum_{k=1}^K v_k\otimes a_k\), but 1-term (1,1)-tensors are outer products, so this is the matrix \(\sum_{k=1}^K v_k a^T_k\), and \(Y^v = T^v_a X^a\) is just \(y = \sum_k v_k (a^T_k x)\).</p>
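<p>The rank-\(K\) decomposition of a (1,1)-tensor is concretely a sum of outer products. A minimal sketch (a hypothetical rank-2 map on \(\mathbb{R}^3\)):</p>

```python
import numpy as np

# A rank-2 (1,1)-tensor: T = v1 (x) a1 + v2 (x) a2.
v1, a1 = np.array([1.0, 0.0, 1.0]), np.array([2.0, 1.0, 0.0])
v2, a2 = np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 3.0])
T = np.outer(v1, a1) + np.outer(v2, a2)

x = np.array([1.0, 2.0, 3.0])

# Applying T is y = sum_k v_k (a_k . x):
y_direct = T @ x
y_terms = v1 * (a1 @ x) + v2 * (a2 @ x)
print(np.allclose(y_direct, y_terms))   # True
print(np.linalg.matrix_rank(T))         # 2
```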
<p>Most other &#8220;multilinear&#8221; operations on vectors (inner product, cross product, wedge product, determinant) can be written as tensors. For example, the inner product operation (2 vectors in, &#8220;0&#8221; vectors out, i.e. scalar) is the (0,2) \(N\)-term <strong>Kronecker tensor</strong> \(\delta_{i_1 i_2}=\sum_{k=1}^N e^*_k\otimes e^*_k\) where \(\{e^*_k\}_{k\in\{1,&#8230;,N\}}\) are the standard basis of \(V^*\).</p>
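<p>In array form the Kronecker tensor is just the identity array, and feeding it two contravariant inputs recovers the dot product. A minimal sketch:</p>

```python
import numpy as np

N = 4
# Kronecker tensor delta_{ij} = sum_k e*_k (x) e*_k: the identity array.
delta = sum(np.outer(e, e) for e in np.eye(N))

x = np.arange(1.0, 5.0)   # [1, 2, 3, 4]
y = np.arange(5.0, 9.0)   # [5, 6, 7, 8]

# Contracting the (0,2)-tensor against two vectors gives x . y:
print(np.isclose(np.einsum('ij,i,j->', delta, x, y), x @ y))  # True
```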
]]></content:encoded>
			<wfw:commentRss>https://blog.yhuang.org/?feed=rss2&#038;p=655</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
