Section 6.2 Orthogonal projections
Subsection 6.2.1 Orthogonal complements
Definition 6.2.1.
Let \(S\) be a subspace of \(\mathbb{R}^n\text{.}\) The orthogonal complement of \(S\text{,}\) written as \(S^\perp\text{,}\) is the set of vectors that are orthogonal to all vectors in \(S\text{.}\) That is, for any vector \(\vec{w}\text{,}\) we have that \(\vec{w}\) is in \(S^\perp\) if and only if \(\vec{w} \cdot \vec{v} = 0\) for every \(\vec{v}\) in \(S\text{.}\)
Before we give an example of finding \(S^\perp\) for a given \(S\) it is useful to list some properties of orthogonal complements; part (4) of the following theorem is especially useful.
Theorem 6.2.2.
Let \(S\) be a subspace of \(\mathbb{R}^n\text{.}\) Then:
(1) \(S^\perp\) is also a subspace of \(\mathbb{R}^n\text{.}\)
(2) \((S^\perp)^\perp = S\text{.}\)
(3) The only vector that is in both \(S\) and \(S^\perp\) is \(\vec{0}\text{.}\)
(4) If \(S = \SpanS(\vec{v_1}, \ldots, \vec{v_k})\) then for any vector \(\vec{w}\text{,}\) the vector \(\vec{w}\) is in \(S^\perp\) if and only if \(\vec{w} \cdot \vec{v_j} = 0\) for all \(j\text{.}\)
Example 6.2.3.
Let \(L\) be the line in \(\mathbb{R}^2\) described by \(y = 2x\text{.}\) Then \(L\) is a subspace of \(\mathbb{R}^2\text{.}\) To describe \(L^\perp\) it is useful to notice that \(L = \SpanS\left(\begin{bmatrix}1\\2\end{bmatrix}\right)\text{.}\) Thus, by Theorem 6.2.2, a vector \(\begin{bmatrix}x\\y\end{bmatrix}\) is in \(L^\perp\) if and only if \(\begin{bmatrix}x\\y\end{bmatrix} \cdot \begin{bmatrix}1\\2\end{bmatrix} = 0\text{.}\) Expanding out this dot product we get the equation \(x+2y=0\text{,}\) or equivalently, \(y = -\frac{1}{2}x\text{.}\)
Thus the orthogonal complement of the line \(y=2x\) in \(\mathbb{R}^2\) is the line \(y=-\frac{1}{2}x\text{,}\) as you might have expected from your prior knowledge of lines in \(\mathbb{R}^2\text{.}\)
Example 6.2.4.
Let \(P\) be the plane \(x-2y+3z=0\) in \(\mathbb{R}^3\text{.}\) Then \(P = \SpanS\left(\begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}-3\\0\\1\end{bmatrix}\right)\text{.}\) Therefore the vectors in \(P^\perp\) are those \(\begin{bmatrix}x\\y\\z\end{bmatrix}\) where \(\begin{bmatrix}x\\y\\z\end{bmatrix}\cdot\begin{bmatrix}2\\1\\0\end{bmatrix} = 0 = \begin{bmatrix}x\\y\\z\end{bmatrix}\cdot\begin{bmatrix}-3\\0\\1\end{bmatrix}\text{.}\) Expanding the dot product gives two equations:
\[2x + y = 0 \qquad \text{and} \qquad -3x + z = 0\text{.}\]
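If we write these equations as \(y = -2x\) and \(z=3x\text{,}\) we see that \(P^\perp\) consists of the vectors of the form:
\[\begin{bmatrix}x\\-2x\\3x\end{bmatrix} = x\begin{bmatrix}1\\-2\\3\end{bmatrix}\text{.}\]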
That is, we have shown that \(P^\perp\) is the line with direction vector \(\begin{bmatrix}1\\-2\\3\end{bmatrix}\text{.}\) From our work in Section 2.3 we can see that \(\begin{bmatrix}1\\-2\\3\end{bmatrix}\) is the normal vector of our plane \(P\text{.}\)
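As a quick sanity check of Example 6.2.4, the following minimal sketch tests the claimed direction vector against the two spanning vectors of \(P\text{,}\) which is all that part (4) of Theorem 6.2.2 requires. The helper name `dot` is our own choice for illustration and is not notation from the text.

```python
def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Spanning vectors of the plane P: x - 2y + 3z = 0 (from Example 6.2.4).
spanning = [[2, 1, 0], [-3, 0, 1]]
# Claimed direction vector of the line P^perp.
normal = [1, -2, 3]

# It is enough to test orthogonality against the spanning vectors of P.
dots = [dot(normal, s) for s in spanning]
print(dots)  # [0, 0]
```

Both dot products vanish, so the vector is orthogonal to every vector in \(P\text{.}\)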
The two examples above let us see the connection between orthogonal complements and the material we saw about lines and planes in Section 2.3. Specifically, in \(\mathbb{R}^3\) the orthogonal complement of a plane through \(\vec{0}\) is the line through \(\vec{0}\) that is normal to the plane, and vice versa. When we studied lines and planes we noticed that a plane in \(\mathbb{R}^3\) has two direction vectors, while it has only one normal direction. Another way of saying that is that we noticed that if \(P\) is a plane in \(\mathbb{R}^3\) then \(\dim(P) + \dim(P^\perp) = 3\text{.}\) In Theorem 6.2.14 we will show that these observations are not just applicable to lines and planes in \(\mathbb{R}^3\text{,}\) but in fact apply to any subspace of any \(\mathbb{R}^n\text{.}\)
In Section 4.6 we saw that every matrix brings with it several important subspaces, namely the row space, the column space, and the null space. These subspaces are related to each other through orthogonality relations. We first need a lemma stating the connection between matrix multiplication and the dot product, then we will be ready for the theorem about subspaces associated to matrices.
Lemma 6.2.5.
Let \(A\) be an \(m \times n\) matrix with rows \(A_1, \ldots, A_m\) (written as column vectors), and let \(\vec{v}\) be a vector in \(\mathbb{R}^n\text{.}\) Then
\[A\vec{v} = \begin{bmatrix}A_1 \cdot \vec{v} \\ A_2 \cdot \vec{v} \\ \vdots \\ A_m \cdot \vec{v}\end{bmatrix}\text{.}\]
Proof.
Let \(\vec{v} = \begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix}\text{,}\) and let \(A = \begin{bmatrix}a_{1,1} \amp a_{1, 2} \amp \cdots \amp a_{1,n} \\ a_{2, 1} \amp a_{2,2} \amp \cdots \amp a_{2,n} \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ a_{m,1} \amp a_{m,2} \amp \cdots \amp a_{m,n}\end{bmatrix}\text{.}\) For any \(j\text{,}\) with \(1 \leq j \leq m\text{,}\) the \(j\)th row of \(A\text{,}\) written as a column vector, is thus \(A_j = \begin{bmatrix} a_{j,1} \\ a_{j,2} \\ \vdots \\ a_{j,n}\end{bmatrix}\text{.}\)
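By definition of matrix multiplication,
\[A\vec{v} = \begin{bmatrix}a_{1,1}v_1 + a_{1,2}v_2 + \cdots + a_{1,n}v_n \\ a_{2,1}v_1 + a_{2,2}v_2 + \cdots + a_{2,n}v_n \\ \vdots \\ a_{m,1}v_1 + a_{m,2}v_2 + \cdots + a_{m,n}v_n\end{bmatrix}\text{,}\]
and the \(j\)th component of \(A\vec{v}\) is therefore
\[a_{j,1}v_1 + a_{j,2}v_2 + \cdots + a_{j,n}v_n = A_j \cdot \vec{v}\text{.}\]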
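The lemma can be illustrated numerically. The sketch below computes \(A\vec{v}\) from the entrywise definition and, separately, the dot product of each row with \(\vec{v}\text{;}\) the matrix and vector are hypothetical values chosen only for illustration, and the function names are our own.

```python
def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector, entry by entry."""
    return [sum(A[j][i] * v[i] for i in range(len(v))) for j in range(len(A))]

# Hypothetical 2 x 3 matrix and vector in R^3, chosen only for illustration.
A = [[1, 2, 1],
     [2, 0, -3]]
v = [4, -1, 2]

product = mat_vec(A, v)               # A v from the definition
row_dots = [dot(row, v) for row in A]  # A_j . v for each row A_j

print(product)   # [4, 2]
print(row_dots)  # [4, 2]
```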
Theorem 6.2.6.
For any matrix \(A\text{:}\)
\(\row(A)^\perp = \NullSp(A)\text{.}\)
\(\col(A)^\perp = \NullSp(A^t)\text{.}\)
Proof.
Suppose that \(A\) is an \(m \times n\) matrix with rows \(A_1, \ldots, A_m\) (written as column vectors), so \(\row(A) = \SpanS(A_1, \ldots, A_m)\text{.}\) Consider any vector \(\vec{v}\) in \(\mathbb{R}^n\text{.}\) We have \(\vec{v}\) in \(\row(A)^\perp\) if and only if \(A_j \cdot \vec{v} = 0\) for every \(j\) (by Theorem 6.2.2), and by Lemma 6.2.5 this happens if and only if \(A\vec{v} = \vec{0}\text{,}\) i.e., if and only if \(\vec{v}\) is in \(\NullSp(A)\text{.}\) This proves that \(\row(A)^\perp = \NullSp(A)\text{.}\)
For the second statement, applying the first statement to \(A^t\) we have \(\col(A)^\perp = \row(A^t)^\perp = \NullSp(A^t)\text{.}\)
Example 6.2.7.
Let \(S = \SpanS\left(\begin{bmatrix}1\\2\\1\end{bmatrix}, \begin{bmatrix}2\\0\\-3\end{bmatrix}\right)\text{.}\) Find a basis for \(S^\perp\text{.}\)
Solution. We could use exactly the technique that we used in Example 6.2.4, but now we have a more efficient way. Let \(A = \begin{bmatrix}1 \amp 2 \amp 1 \\ 2 \amp 0 \amp -3\end{bmatrix}\text{.}\) Then \(S = \row(A)\text{,}\) so \(S^\perp = \row(A)^\perp = \NullSp(A)\text{.}\) We already know how to find a basis for \(\NullSp(A)\text{:}\) just row reduce!
\[\begin{bmatrix}1 \amp 2 \amp 1 \\ 2 \amp 0 \amp -3\end{bmatrix} \to \begin{bmatrix}1 \amp 0 \amp -3/2 \\ 0 \amp 1 \amp 5/4\end{bmatrix}\text{.}\]
Therefore vectors in \(\NullSp(A)\) have the form
\[\begin{bmatrix}\frac{3}{2}t \\ -\frac{5}{4}t \\ t\end{bmatrix} = t\begin{bmatrix}3/2\\-5/4\\1\end{bmatrix}\text{,}\]
and so \(\left\{\begin{bmatrix}3/2\\-5/4\\1\end{bmatrix}\right\}\) is a basis for \(S^\perp\text{.}\) If you don't like fractions we could always re-scale our answer and obtain the basis \(\left\{\begin{bmatrix}6\\-5\\4\end{bmatrix}\right\}\) instead.
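The computation in Example 6.2.7 can be reproduced with exact arithmetic. The following is a minimal sketch, assuming a straightforward Gauss-Jordan elimination; the function names `rref` and `null_space_basis` are our own and are not taken from the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns), using exact fractions."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the row that will receive the next pivot
    for c in range(len(M[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot_row = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot_row is None:
            continue  # no pivot in this column, so it is a free column
        M[r], M[pivot_row] = M[pivot_row], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]          # scale the pivot to 1
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def null_space_basis(rows):
    """One basis vector of the null space per free column, read off from the RREF."""
    R, pivots = rref(rows)
    ncols = len(R[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[f] = Fraction(1)                      # set the free variable to 1
        for row_index, p in enumerate(pivots):
            vec[p] = -R[row_index][f]             # solve for each pivot variable
        basis.append(vec)
    return basis

# The matrix whose rows span S in Example 6.2.7.
A = [[1, 2, 1],
     [2, 0, -3]]
basis = null_space_basis(A)
print(basis)  # [[Fraction(3, 2), Fraction(-5, 4), Fraction(1, 1)]]
```

The single basis vector agrees with the one found by hand, and multiplying it by \(4\) gives the fraction-free alternative.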
Subsection 6.2.2 Projections and the orthogonal decomposition
In Section 2.2 we worked out a formula for the orthogonal projection of a vector \(\vec{v}\) on a vector \(\vec{w}\text{,}\) and described \(\proj_{\vec{w}}(\vec{v})\) as the closest vector to \(\vec{v}\) in the direction of \(\vec{w}\text{,}\) or the component of \(\vec{v}\) lying along \(\vec{w}\text{.}\) A slight rephrasing is that \(\proj_{\vec{w}}(\vec{v})\) is the closest vector to \(\vec{v}\) that is in \(\SpanS(\vec{w})\text{.}\) This way of looking at it opens the door to asking, for any subspace \(S\text{,}\) what is the closest vector to \(\vec{v}\) that lies in \(S\text{?}\)
Definition 6.2.8.
Let \(S\) be a subspace of \(\mathbb{R}^n\text{,}\) and let \(\{\vec{w_1}, \ldots, \vec{w_k}\}\) be an orthogonal basis for \(S\text{.}\) For any vector \(\vec{v}\) in \(\mathbb{R}^n\text{,}\) the orthogonal projection of \(\vec{v}\) on \(S\) is defined to be
\[\proj_S(\vec{v}) = \proj_{\vec{w_1}}(\vec{v}) + \cdots + \proj_{\vec{w_k}}(\vec{v})\text{.}\]
The component of \(\vec{v}\) orthogonal to \(S\) is defined to be
\[\perr_S(\vec{v}) = \vec{v} - \proj_S(\vec{v})\text{.}\]
Note 6.2.9.
From the formula it appears that \(\proj_S(\vec{v})\) depends on the orthogonal basis for \(S\) that we choose. Fortunately that is not the case - no matter which orthogonal basis you pick for \(S\) you will always get the same answer for \(\proj_S(\vec{v})\) for each given \(\vec{v}\text{.}\)
Note 6.2.10.
For the formula defining \(\proj_S(\vec{v})\) to work, we must use an orthogonal basis for \(S\text{:}\) if we tried to use the same formula with a basis that is not orthogonal, it would not correctly calculate \(\proj_S(\vec{v})\text{.}\) Fortunately, we already know how to produce an orthogonal basis for a subspace out of an arbitrary basis (Theorem 6.1.9).
While we won't prove it, it is a fact that \(\proj_S(\vec{v})\) is the closest vector in \(S\) to \(\vec{v}\text{,}\) in the sense that if \(\vec{w}\) is any vector in \(S\) then \(\norm{\vec{v} - \proj_S(\vec{v})} \leq \norm{\vec{v} - \vec{w}}\text{.}\)
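The two notes above can be seen in a small experiment. The sketch below assumes that \(\proj_S(\vec{v})\) is computed as the sum of the projections of \(\vec{v}\) on each vector of an orthogonal basis, with \(\proj_{\vec{w}}(\vec{v}) = \frac{\vec{v}\cdot\vec{w}}{\vec{w}\cdot\vec{w}}\vec{w}\text{.}\) The subspace (the \(xy\)-plane in \(\mathbb{R}^3\)), the vector, and the function names are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def proj_vector(w, v):
    """Projection of v on the nonzero vector w: ((v . w) / (w . w)) w."""
    scale = Fraction(dot(v, w), dot(w, w))
    return [scale * x for x in w]

def proj_subspace(basis, v):
    """Sum of the projections of v on each basis vector.

    This equals proj_S(v) only when `basis` is an orthogonal basis for S.
    """
    total = [Fraction(0)] * len(v)
    for w in basis:
        total = [t + p for t, p in zip(total, proj_vector(w, v))]
    return total

v = [3, 4, 5]  # hypothetical vector; S is the xy-plane in R^3

# Two different orthogonal bases for S give the same answer (Note 6.2.9).
print(proj_subspace([[1, 0, 0], [0, 1, 0]], v) == [3, 4, 0])   # True
print(proj_subspace([[1, 1, 0], [1, -1, 0]], v) == [3, 4, 0])  # True

# A basis for S that is not orthogonal gives a wrong answer (Note 6.2.10).
print(proj_subspace([[1, 0, 0], [1, 1, 0]], v) == [3, 4, 0])   # False
```

In the last case the formula returns \(\begin{bmatrix}13/2\\7/2\\0\end{bmatrix}\text{,}\) which lies in \(S\) but is not the closest vector in \(S\) to \(\vec{v}\text{.}\)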
Lemma 6.2.11.
For any subspace \(S\text{,}\) and any \(\vec{v}\text{,}\) the vector \(\perr_S(\vec{v})\) is in \(S^\perp\text{.}\)
Proof.
Suppose that \(\vec{w_1}, \ldots, \vec{w_k}\) form an orthogonal basis for \(S\text{.}\) To show that \(\perr_S(\vec{v})\) is in \(S^\perp\) it suffices (by Theorem 6.2.2) to prove that \(\perr_S(\vec{v}) \cdot \vec{w_j} = 0\) for every \(j\text{.}\) Notice that for any \(i\) the vector \(\proj_{\vec{w_i}}(\vec{v})\) is a scalar multiple of \(\vec{w_i}\text{.}\) Since \(\vec{w_j} \perp \vec{w_i}\) when \(i \neq j\) this also implies that \(\vec{w_j} \perp \proj_{\vec{w_i}}(\vec{v})\) for all \(i \neq j\text{.}\) Now we calculate:
\[\begin{aligned}\perr_S(\vec{v}) \cdot \vec{w_j} \amp = \left(\vec{v} - \proj_{\vec{w_1}}(\vec{v}) - \cdots - \proj_{\vec{w_k}}(\vec{v})\right) \cdot \vec{w_j} \\ \amp = \vec{v} \cdot \vec{w_j} - \proj_{\vec{w_j}}(\vec{v}) \cdot \vec{w_j} \\ \amp = \vec{v} \cdot \vec{w_j} - \frac{\vec{v} \cdot \vec{w_j}}{\vec{w_j} \cdot \vec{w_j}}\left(\vec{w_j} \cdot \vec{w_j}\right) \\ \amp = 0\text{.}\end{aligned}\]
Theorem 6.2.12. Orthogonal Decomposition Theorem.
Suppose that \(S\) is a subspace of \(\mathbb{R}^n\) and \(\vec{v}\) is a vector in \(\mathbb{R}^n\text{.}\) Then it is possible to express \(\vec{v}\) as the sum of a vector in \(S\) and a vector in \(S^\perp\text{,}\) and moreover the only way to do so is \(\vec{v} = \proj_S(\vec{v}) + \perr_S(\vec{v})\text{.}\)
Proof.
It follows from Definition 6.2.8 and Lemma 6.2.11 that \(\proj_S(\vec{v})\) is in \(S\) and \(\perr_S(\vec{v})\) is in \(S^\perp\text{,}\) and
\[\proj_S(\vec{v}) + \perr_S(\vec{v}) = \proj_S(\vec{v}) + \left(\vec{v} - \proj_S(\vec{v})\right) = \vec{v}\text{,}\]
so we have proved that this expression does work. It remains to be proved that there is no other way to write \(\vec{v}\) as the sum of a vector in \(S\) and a vector in \(S^\perp\text{.}\)
Suppose that \(\vec{v} = \vec{w_1} + \vec{z_1}\) and \(\vec{v} = \vec{w_2} + \vec{z_2}\text{,}\) where \(\vec{w_1}\) and \(\vec{w_2}\) are in \(S\) and \(\vec{z_1}\) and \(\vec{z_2}\) are in \(S^\perp\text{.}\) Then \(\vec{w_1} - \vec{w_2} = \vec{z_2} - \vec{z_1}\text{.}\) The vector on the left side of this equation, \(\vec{w_1}-\vec{w_2}\text{,}\) is the difference of two vectors in \(S\text{,}\) so since \(S\) is a subspace we have that \(\vec{w_1} - \vec{w_2}\) is in \(S\text{.}\) A similar reasoning shows that \(\vec{z_2} - \vec{z_1}\) is in \(S^\perp\text{.}\) But since \(\vec{w_1} - \vec{w_2} = \vec{z_2} - \vec{z_1}\) this means that \(\vec{w_1} - \vec{w_2}\) is in both \(S\) and \(S^\perp\text{.}\) By Theorem 6.2.2 we conclude that \(\vec{w_1} - \vec{w_2} = \vec{0}\text{,}\) so \(\vec{w_1} = \vec{w_2}\text{.}\) Similarly, \(\vec{z_1} = \vec{z_2}\text{.}\)
Example 6.2.13.
Let \(P\) be the plane in \(\mathbb{R}^3\) with general equation \(2x+y-3z=0\text{,}\) and let \(\vec{v} = \begin{bmatrix}1\\2\\3\end{bmatrix}\text{.}\) Write \(\vec{v}\) as the sum of a vector in \(P\) and a vector orthogonal to \(P\text{.}\)
Solution. By Theorem 6.2.12 the only way to answer this question is with \(\vec{v} = \proj_P(\vec{v}) + \perr_P(\vec{v})\text{.}\) To find \(\proj_P(\vec{v})\) we need to start with an orthogonal basis for \(P\text{.}\) To find one, we start by re-writing the equation for \(P\) as \(y=-2x+3z\text{.}\) Substituting this into a vector \(\begin{bmatrix}x\\y\\z\end{bmatrix}\) shows us that \(\left\{\begin{bmatrix}1\\-2\\0\end{bmatrix}, \begin{bmatrix}0\\3\\1\end{bmatrix}\right\}\) is a basis for \(P\text{.}\) Sadly, this basis is not an orthogonal basis, so we use the Gram-Schmidt Algorithm to produce an orthogonal basis. We get \(\vec{w_1} = \begin{bmatrix}1\\-2\\0\end{bmatrix}\) and
\[\vec{w_2} = \begin{bmatrix}0\\3\\1\end{bmatrix} - \proj_{\vec{w_1}}\left(\begin{bmatrix}0\\3\\1\end{bmatrix}\right) = \begin{bmatrix}0\\3\\1\end{bmatrix} - \frac{-6}{5}\begin{bmatrix}1\\-2\\0\end{bmatrix} = \begin{bmatrix}6/5\\3/5\\1\end{bmatrix}\text{.}\]
As usual, rescaling is harmless, so we multiply by \(5\) and take \(\vec{w_2} = \begin{bmatrix}6\\3\\5\end{bmatrix}\text{.}\)
Now we can use the definition of \(\proj_P(\vec{v})\) to calculate:
\[\proj_P(\vec{v}) = \proj_{\vec{w_1}}(\vec{v}) + \proj_{\vec{w_2}}(\vec{v}) = \frac{-3}{5}\begin{bmatrix}1\\-2\\0\end{bmatrix} + \frac{27}{70}\begin{bmatrix}6\\3\\5\end{bmatrix} = \begin{bmatrix}12/7\\33/14\\27/14\end{bmatrix}\text{.}\]
Here we must be careful - it would be incorrect to rescale this vector. We are just stuck with the fractions. Finally, we calculate:
\[\perr_P(\vec{v}) = \vec{v} - \proj_P(\vec{v}) = \begin{bmatrix}1\\2\\3\end{bmatrix} - \begin{bmatrix}12/7\\33/14\\27/14\end{bmatrix} = \begin{bmatrix}-5/7\\-5/14\\15/14\end{bmatrix}\text{.}\]
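We now have the desired way of writing \(\begin{bmatrix}1\\2\\3\end{bmatrix}\) as a sum of a vector in \(P\) (the first vector below) and a vector in \(P^\perp\) (the second vector below).
\[\begin{bmatrix}1\\2\\3\end{bmatrix} = \begin{bmatrix}12/7\\33/14\\27/14\end{bmatrix} + \begin{bmatrix}-5/7\\-5/14\\15/14\end{bmatrix}\text{.}\]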
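The whole of Example 6.2.13 can be checked with exact arithmetic. The sketch below assumes the classical Gram-Schmidt procedure without normalization, followed by the sum-of-projections formula for \(\proj_P(\vec{v})\text{;}\) the function names are our own. It keeps the unscaled second Gram-Schmidt vector rather than multiplying by \(5\text{,}\) which does not change the projection.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def proj_vector(w, v):
    """Projection of v on the nonzero vector w: ((v . w) / (w . w)) w."""
    scale = Fraction(dot(v, w), dot(w, w))
    return [scale * x for x in w]

def gram_schmidt(basis):
    """Classical Gram-Schmidt without normalization; returns an orthogonal basis."""
    orthogonal = []
    for b in basis:
        w = [Fraction(x) for x in b]
        for u in orthogonal:
            # Subtract the component of b along each earlier orthogonal vector.
            w = [wi - pi for wi, pi in zip(w, proj_vector(u, b))]
        orthogonal.append(w)
    return orthogonal

def proj_subspace(orthogonal_basis, v):
    """Sum of the projections of v on the vectors of an orthogonal basis."""
    total = [Fraction(0)] * len(v)
    for w in orthogonal_basis:
        total = [t + p for t, p in zip(total, proj_vector(w, v))]
    return total

v = [1, 2, 3]
# Basis for the plane 2x + y - 3z = 0 found in the example.
w1, w2 = gram_schmidt([[1, -2, 0], [0, 3, 1]])
projection = proj_subspace([w1, w2], v)
perp = [a - b for a, b in zip(v, projection)]

print(w2)          # entries 6/5, 3/5, 1
print(projection)  # entries 12/7, 33/14, 27/14
print(perp)        # entries -5/7, -5/14, 15/14
```

As a further check, `perp` is \(-\frac{5}{14}\) times the normal vector \(\begin{bmatrix}2\\1\\-3\end{bmatrix}\) of \(P\text{,}\) as expected for a vector in \(P^\perp\text{.}\)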
The next result is very closely related to the Rank-Nullity Theorem. In fact, we will give two proofs: One using the Rank-Nullity Theorem, and the other using the Orthogonal Decomposition Theorem.
Theorem 6.2.14.
Let \(S\) be a subspace of \(\mathbb{R}^n\text{.}\) Then \(\dim(S) + \dim(S^\perp) = n\text{.}\)
Proof.
This will be the proof using the Rank-Nullity Theorem.
Suppose that \(\vec{v_1}, \ldots, \vec{v_k}\) form a basis for \(S\text{.}\) Let \(A\) be the matrix with rows \(\vec{v_1}, \ldots, \vec{v_k}\text{,}\) so \(A\) is a \(k \times n\) matrix. Then \(S = \row(A)\text{,}\) so \(\dim(S) = \dim(\row(A)) = \rank(A)\) by Theorem 4.6.5. Also, by Theorem 6.2.6, \(S^\perp = \row(A)^\perp = \NullSp(A)\text{,}\) so \(\dim(S^\perp) = \nullity(A)\text{.}\) Thus, by Theorem 4.6.16 we get
\[\dim(S) + \dim(S^\perp) = \rank(A) + \nullity(A) = n\text{.}\]
Proof.
This will be the proof using the Orthogonal Decomposition Theorem.
Let \(\{\vec{v_1}, \ldots, \vec{v_k}\}\) be an orthogonal basis for \(S\text{,}\) and let \(\{\vec{w_1}, \ldots, \vec{w_\ell}\}\) be an orthogonal basis for \(S^\perp\) (we know that these exist by the Gram-Schmidt Algorithm). We claim that \(\{\vec{v_1}, \ldots, \vec{v_k},\vec{w_1}, \ldots, \vec{w_\ell}\}\) is a basis for \(\mathbb{R}^n\text{.}\) Notice that once we prove this claim we are done, since then we get \(n = k+\ell = \dim(S) + \dim(S^\perp)\text{.}\)
The set \(\{\vec{v_1}, \ldots, \vec{v_k},\vec{w_1}, \ldots, \vec{w_\ell}\}\) is an orthogonal set, and it does not contain \(\vec{0}\text{,}\) so by Theorem 6.1.5 it is a linearly independent set.
Given any vector \(\vec{x}\) in \(\mathbb{R}^n\text{,}\) by the Orthogonal Decomposition Theorem we can write \(\vec{x} = \proj_{S}(\vec{x}) + \perr_{S}(\vec{x})\text{,}\) with \(\proj_S(\vec{x})\) in \(S\) and \(\perr_S(\vec{x})\) in \(S^\perp\text{.}\) Since \(\proj_S(\vec{x})\) is a linear combination of \(\vec{v_1}, \ldots, \vec{v_k}\) and \(\perr_S(\vec{x})\) is a linear combination of \(\vec{w_1}, \ldots, \vec{w_\ell}\text{,}\) this means that \(\vec{x} \in \SpanS(\vec{v_1}, \ldots, \vec{v_k}, \vec{w_1}, \ldots, \vec{w_\ell})\text{.}\) Therefore \(\SpanS(\vec{v_1}, \ldots, \vec{v_k}, \vec{w_1}, \ldots, \vec{w_\ell}) = \mathbb{R}^n\text{,}\) which together with the linear independence shown above proves the claim.
Exercises 6.2.3 Exercises
1.
Consider the plane \(P\) with general equation \(2x-3y+z=0\text{.}\) Let \(\vec{v} = \begin{bmatrix}3\\4\\1\end{bmatrix}\text{.}\) Find \(\operatorname{perp}_P(\vec{v})\text{.}\)
2.
Let \(W\) be the subspace of \(\mathbb{R}^4\) consisting of vectors \(\begin{bmatrix}x\\y\\z\\w\end{bmatrix}\) satisfying \(x+y+z=0\) and \(x=3w\text{.}\) Find a basis for \(W^\perp\text{.}\)
3.
Suppose that \(S\) is a subspace of \(\mathbb{R}^n\text{,}\) and let \(T : \mathbb{R}^n \to \mathbb{R}^n\) be defined by \(T(\vec{v}) = \operatorname{proj}_S(\vec{v})\text{.}\) Prove that \(T\) is a linear transformation.
4.
Suppose that \(S\) is a subspace of \(\mathbb{R}^n\text{.}\) Show that for every vector \(\vec{v}\) in \(\mathbb{R}^n\text{,}\)