Section 6.1 Orthogonality
You might have noticed that in the past few chapters we have not used the dot product for anything. In this chapter we explore how the new tools we have been developing interact with the dot product. We also return to our previous convention that "scalar" means "real number".
You might want to take a moment to review the definition of the dot product, and the notion of orthogonality of vectors, from Section 2.2.
Subsection 6.1.1 Orthogonal and orthonormal bases
Definition 6.1.1.
Let \(X\) be a finite collection of vectors in \(\mathbb{R}^n\text{,}\) say the vectors in \(X\) are \(\vec{v_1}, \ldots, \vec{v_k}\text{.}\)
The collection \(X\) is called an orthogonal set if whenever \(i \neq j\) we have \(\vec{v_i} \cdot \vec{v_j} = 0\text{.}\)
The collection \(X\) is called an orthonormal set if it is an orthogonal set, and for every \(i\) we also have \(\norm{\vec{v_i}} = 1\text{.}\)
Example 6.1.2.
The set consisting of \(\begin{bmatrix}1\\1\\1\end{bmatrix}\text{,}\) \(\begin{bmatrix}1\\0\\-1\end{bmatrix}\text{,}\) and \(\begin{bmatrix}0\\1\\-1\end{bmatrix}\) is not an orthogonal set, because the dot product of the last two vectors is not \(0\text{.}\)
The set consisting of \(\begin{bmatrix}1\\1\\1\end{bmatrix}\text{,}\) \(\begin{bmatrix}-2\\1\\1\end{bmatrix}\text{,}\) and \(\begin{bmatrix}0\\-1\\1\end{bmatrix}\) is an orthogonal set. To verify this you need to calculate the dot product of every pair of vectors listed here, and see that they all give \(0\text{.}\) This set is not orthonormal, because \(\norm{\begin{bmatrix}1\\1\\1\end{bmatrix}} = \sqrt{3} \neq 1\text{.}\)
The set consisting of \(\begin{bmatrix}1/\sqrt{2} \\ -1/\sqrt{2}\end{bmatrix}\) and \(\begin{bmatrix}1/\sqrt{2} \\ 1/\sqrt{2}\end{bmatrix}\) is an orthonormal set. To verify this you need to check that it is an orthogonal set, and that each vector in the set has length \(1\text{,}\) both of which are true.
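The checks in Example 6.1.2 can also be carried out mechanically. The following is a minimal sketch in plain Python; the helper names `dot`, `is_orthogonal_set`, and `is_orthonormal_set`, and the tolerance `tol`, are illustrative choices and not part of the text. Because \(1/\sqrt{2}\) is stored as a floating-point number, the comparisons are made up to a small tolerance rather than exactly.

```python
from math import isclose, sqrt

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal_set(vectors, tol=1e-12):
    """True if every pair of distinct vectors has dot product 0 (up to tol)."""
    return all(
        isclose(dot(vectors[i], vectors[j]), 0.0, abs_tol=tol)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )

def is_orthonormal_set(vectors, tol=1e-12):
    """True if the set is orthogonal and every vector has length 1 (up to tol)."""
    return is_orthogonal_set(vectors, tol) and all(
        isclose(dot(v, v), 1.0, abs_tol=tol) for v in vectors
    )

# The three sets from Example 6.1.2.
first = [(1, 1, 1), (1, 0, -1), (0, 1, -1)]
second = [(1, 1, 1), (-2, 1, 1), (0, -1, 1)]
r = 1 / sqrt(2)
third = [(r, -r), (r, r)]

print(is_orthogonal_set(first))                                # False
print(is_orthogonal_set(second), is_orthonormal_set(second))   # True False
print(is_orthonormal_set(third))                               # True
```

Checking \(\vec{v} \cdot \vec{v} = 1\) is equivalent to checking \(\norm{\vec{v}} = 1\text{,}\) and avoids taking a square root.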
Note 6.1.3.
If \(\vec{v_1}, \ldots, \vec{v_k}\) form an orthogonal set of vectors, and none of them is \(\vec{0}\text{,}\) then \(\frac{1}{\norm{\vec{v_1}}}\vec{v_1}, \ldots, \frac{1}{\norm{\vec{v_k}}}\vec{v_k}\) form an orthonormal set with the same span as \(\vec{v_1}, \ldots, \vec{v_k}\text{.}\)
Definition 6.1.4.
Let \(S\) be a subspace of \(\mathbb{R}^n\text{.}\) An orthogonal basis for \(S\) is a basis for \(S\) that is also an orthogonal set. An orthonormal basis for \(S\) is a basis for \(S\) that is also an orthonormal set.
As this chapter develops we will see that if we have a subspace of \(\mathbb{R}^n\) then it is often more convenient to work with an orthonormal basis for that subspace than it is to work with an arbitrary basis. In principle, to check that \(B\) is an orthogonal basis for a subspace \(S\) we need to check three things:
(1) \(B\) is an orthogonal set,
(2) \(B\) is a linearly independent set,
(3) \(\SpanS(B) = S\text{.}\)
Fortunately, our next result tells us that if we check (1), and also check that none of the vectors in \(B\) is \(\vec{0}\text{,}\) then we do not need to check (2) separately.
Theorem 6.1.5.
Suppose that \(\vec{v_1}, \ldots, \vec{v_k}\) form an orthogonal set of vectors in \(\mathbb{R}^n\text{.}\) Then \(\vec{v_1}, \ldots, \vec{v_k}\) are linearly independent if and only if \(\vec{v_j} \neq \vec{0}\) for all \(j\text{.}\)
Proof.
Suppose first that \(\vec{v_1}, \ldots, \vec{v_k}\) are linearly independent. Then \(\vec{v_j} \neq \vec{0}\) for all \(j\) by Example 3.2.3.
Now for the harder direction, suppose that \(\vec{v_j} \neq \vec{0}\) for all \(j\text{.}\) Suppose that we have scalars \(c_1, \ldots, c_k\) such that \(c_1\vec{v_1} + \cdots + c_k\vec{v_k} = \vec{0}\text{.}\) Taking the dot product with \(\vec{v_1}\) on both sides of the equation we get:
\[
\begin{aligned}
0 = \vec{0} \cdot \vec{v_1} &= (c_1\vec{v_1} + c_2\vec{v_2} + \cdots + c_k\vec{v_k}) \cdot \vec{v_1} \\
&= c_1(\vec{v_1} \cdot \vec{v_1}) + c_2(\vec{v_2} \cdot \vec{v_1}) + \cdots + c_k(\vec{v_k} \cdot \vec{v_1}) \\
&= c_1\norm{\vec{v_1}}^2 + 0 + \cdots + 0 \\
&= c_1\norm{\vec{v_1}}^2.
\end{aligned}
\]
In the calculation above the terms that became \(0\) did so because of the assumption that \(\vec{v_1}, \ldots, \vec{v_k}\) form an orthogonal set. Now \(\vec{v_1} \neq \vec{0}\) implies that \(\norm{\vec{v_1}}^2 \neq 0\text{,}\) so we must have \(c_1 = 0\text{.}\) Repeating this argument using the dot product with \(\vec{v_2}\) instead of \(\vec{v_1}\) shows that \(c_2=0\text{,}\) and so on. Thus \(c_1=c_2=\cdots = c_k = 0\text{,}\) as required to show that \(\vec{v_1}, \ldots, \vec{v_k}\) are linearly independent.
Recall that if \(\vec{v_1}, \ldots, \vec{v_k}\) form a basis for a subspace \(S\text{,}\) and if \(\vec{w}\) is a vector in \(S\text{,}\) then there are unique scalars \(c_1, \ldots, c_k\) such that \(\vec{w} = c_1\vec{v_1} + \cdots + c_k\vec{v_k}\text{.}\) In general, finding \(c_1, \ldots, c_k\) involves solving a system of linear equations. When the basis is orthogonal there is an easier way.
Theorem 6.1.6.
Let \(S\) be a subspace of \(\mathbb{R}^n\text{,}\) and suppose that \(\vec{v_1}, \ldots, \vec{v_k}\) form an orthogonal basis for \(S\text{.}\) Then for any vector \(\vec{w}\) in \(S\text{,}\) the unique way to write \(\vec{w} = c_1\vec{v_1} + \cdots + c_k\vec{v_k}\) is with, for each \(j\text{,}\) the value \(c_j = \frac{\vec{w} \cdot \vec{v_j}}{\vec{v_j} \cdot \vec{v_j}}\text{.}\)
Proof.
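The idea of the proof is very similar to the proof of Theorem 6.1.5. Suppose that we have written \(\vec{w} = c_1\vec{v_1} + \cdots + c_k\vec{v_k}\text{.}\) If we consider any \(j\text{,}\) and take the dot product of both sides of the equation with \(\vec{v_j}\text{,}\) then we get:
\[
\vec{w} \cdot \vec{v_j} = (c_1\vec{v_1} + \cdots + c_k\vec{v_k}) \cdot \vec{v_j} = c_1(\vec{v_1} \cdot \vec{v_j}) + \cdots + c_k(\vec{v_k} \cdot \vec{v_j}) = c_j(\vec{v_j} \cdot \vec{v_j}).
\]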
All of the other terms in the calculation are \(0\) because the vectors \(\vec{v_1}, \ldots, \vec{v_k}\) are assumed to be orthogonal. Now since \(\vec{v_1}, \ldots, \vec{v_k}\) are a basis for \(S\) they must be linearly independent, and in particular none of these vectors can be \(\vec{0}\text{.}\) Thus \(\vec{v_j} \cdot \vec{v_j} \neq 0\text{,}\) so we can divide to conclude \(c_j = \frac{\vec{w} \cdot \vec{v_j}}{\vec{v_j} \cdot \vec{v_j}}\text{,}\) as desired.
Example 6.1.7.
Let \(\vec{v_1} = \begin{bmatrix}1\\0\\1\\0\end{bmatrix}\text{,}\) \(\vec{v_2} = \begin{bmatrix}1\\1\\-1\\0\end{bmatrix}\text{,}\) and \(\vec{v_3} = \begin{bmatrix}0\\0\\0\\1\end{bmatrix}\text{,}\) and let \(S = \SpanS(\vec{v_1}, \vec{v_2}, \vec{v_3})\text{.}\) Since \(\{\vec{v_1}, \vec{v_2}, \vec{v_3}\}\) is an orthogonal set of non-zero vectors, and it certainly spans \(S\text{,}\) by Theorem 6.1.5 it is an orthogonal basis for \(S\text{.}\)
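Now consider the vector \(\vec{w} = \begin{bmatrix}1\\-2\\5\\2\end{bmatrix}\text{.}\) For the purposes of this example, take it as a given fact that \(\vec{w}\) is in \(S\text{.}\) Since \(\vec{w}\) is in \(S\) it is possible to write
\[
\vec{w} = c_1\vec{v_1} + c_2\vec{v_2} + c_3\vec{v_3}.
\]
Without using anything about orthogonality we can find \(c_1, c_2, c_3\) by converting this vector equation into a system of four linear equations in variables \(c_1, c_2, c_3\text{,}\) and then solving. Doing that, we have
\[
\left[\begin{array}{ccc|c} 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & -2 \\ 1 & -1 & 0 & 5 \\ 0 & 0 & 1 & 2 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]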
Thus we see that \(c_1 = 3\text{,}\) \(c_2 = -2\text{,}\) and \(c_3 = 2\text{.}\)
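On the other hand, by Theorem 6.1.6,
\[
c_1 = \frac{\vec{w} \cdot \vec{v_1}}{\vec{v_1} \cdot \vec{v_1}} = \frac{6}{2} = 3, \qquad c_2 = \frac{\vec{w} \cdot \vec{v_2}}{\vec{v_2} \cdot \vec{v_2}} = \frac{-6}{3} = -2, \qquad c_3 = \frac{\vec{w} \cdot \vec{v_3}}{\vec{v_3} \cdot \vec{v_3}} = \frac{2}{1} = 2.
\]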
We obtained the same answer using very different techniques. Which one is more appropriate in a particular situation depends on what data is given. In this example both options work well.
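The formula \(c_j = \frac{\vec{w} \cdot \vec{v_j}}{\vec{v_j} \cdot \vec{v_j}}\) from Theorem 6.1.6 is short enough to compute directly. Here is a minimal sketch using exact rational arithmetic from Python's standard library. The function name `orthogonal_coordinates` is an illustrative choice, and the function assumes, without checking, that the basis it receives is an orthogonal set of non-zero vectors with integer entries.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_coordinates(w, basis):
    """Coefficients c_j = (w . v_j) / (v_j . v_j), as in Theorem 6.1.6.

    Assumes (without checking) that `basis` is an orthogonal set of
    non-zero integer vectors and that w lies in its span.
    """
    return [Fraction(dot(w, v), dot(v, v)) for v in basis]

# Data from Example 6.1.7.
v1, v2, v3 = (1, 0, 1, 0), (1, 1, -1, 0), (0, 0, 0, 1)
w = (1, -2, 5, 2)

coeffs = orthogonal_coordinates(w, [v1, v2, v3])
print([str(c) for c in coeffs])  # ['3', '-2', '2']

# Rebuild c_1 v_1 + c_2 v_2 + c_3 v_3 and compare it with w.
rebuilt = tuple(
    sum(c * v[i] for c, v in zip(coeffs, (v1, v2, v3))) for i in range(len(w))
)
print(rebuilt == w)  # True
```

The final comparison is worth keeping: the theorem applies only to vectors in \(S\text{,}\) so if the rebuilt vector differed from \(\vec{w}\) that would show \(\vec{w}\) is not in \(S\text{.}\)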
Note 6.1.8.
If our basis \(\vec{v_1}, \ldots, \vec{v_k}\) is not just an orthogonal basis, but is actually an orthonormal basis, then the denominators in Theorem 6.1.6 are all \(1\text{,}\) which makes the formulas for the coefficients in that case even simpler.
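As a small illustration of this note, the sketch below uses the orthonormal set from Example 6.1.2, which is an orthonormal basis for \(\mathbb{R}^2\text{,}\) and computes each coefficient as a single dot product. The vector `w` is a hypothetical choice made only for this illustration, and floating-point results are compared up to a tolerance.

```python
from math import isclose, sqrt

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

r = 1 / sqrt(2)
u1, u2 = (r, -r), (r, r)   # orthonormal basis for R^2 (from Example 6.1.2)
w = (3.0, 1.0)             # hypothetical vector, chosen for illustration

# With an orthonormal basis the denominators are 1, so c_j = w . u_j.
c1, c2 = dot(w, u1), dot(w, u2)

# Rebuild c1*u1 + c2*u2 and compare with w, entry by entry.
rebuilt = [c1 * a + c2 * b for a, b in zip(u1, u2)]
print(all(isclose(x, y) for x, y in zip(rebuilt, w)))  # True
```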
Subsection 6.1.2 The Gram-Schmidt algorithm
Given a subspace \(S\) of \(\mathbb{R}^n\text{,}\) we know from Theorem 3.4.10 that there is a basis for \(S\text{.}\) The next theorem gives us a method for transforming a basis for \(S\) into an orthogonal, or even orthonormal, basis for \(S\text{.}\) We omit the proof, though you are encouraged to look back at the formulas in the algorithm after you read about orthogonal decompositions in Section 6.2.
Theorem 6.1.9. Gram-Schmidt Algorithm.
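Let \(\vec{v_1}, \ldots, \vec{v_k}\) be a linearly independent collection of vectors in \(\mathbb{R}^n\text{.}\) Define new vectors as follows:
\[
\begin{aligned}
\vec{w_1} &= \vec{v_1} \\
\vec{w_2} &= \vec{v_2} - \frac{\vec{v_2} \cdot \vec{w_1}}{\vec{w_1} \cdot \vec{w_1}}\vec{w_1} \\
\vec{w_3} &= \vec{v_3} - \frac{\vec{v_3} \cdot \vec{w_1}}{\vec{w_1} \cdot \vec{w_1}}\vec{w_1} - \frac{\vec{v_3} \cdot \vec{w_2}}{\vec{w_2} \cdot \vec{w_2}}\vec{w_2} \\
&\;\;\vdots \\
\vec{w_k} &= \vec{v_k} - \frac{\vec{v_k} \cdot \vec{w_1}}{\vec{w_1} \cdot \vec{w_1}}\vec{w_1} - \cdots - \frac{\vec{v_k} \cdot \vec{w_{k-1}}}{\vec{w_{k-1}} \cdot \vec{w_{k-1}}}\vec{w_{k-1}}
\end{aligned}
\]
Then \(\vec{w_1}, \ldots, \vec{w_k}\) form an orthogonal set of vectors, and for every \(j\) we have
\[
\SpanS(\vec{w_1}, \ldots, \vec{w_j}) = \SpanS(\vec{v_1}, \ldots, \vec{v_j}).
\]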
In particular, if \(\vec{v_1}, \ldots, \vec{v_k}\) form a basis for a subspace \(S\text{,}\) then \(\vec{w_1}, \ldots, \vec{w_k}\) form an orthogonal basis for \(S\text{,}\) and \(\frac{1}{\norm{\vec{w_1}}}\vec{w_1}, \ldots, \frac{1}{\norm{\vec{w_k}}}\vec{w_k}\) form an orthonormal basis for \(S\text{.}\)
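The algorithm translates almost line by line into code. The sketch below assumes the standard form of the Gram-Schmidt step, in which each \(\vec{w_j}\) is obtained from \(\vec{v_j}\) by subtracting \(\frac{\vec{v_j} \cdot \vec{w_i}}{\vec{w_i} \cdot \vec{w_i}}\vec{w_i}\) for every previously computed \(\vec{w_i}\text{.}\) The function name `gram_schmidt` and the three input vectors are illustrative choices, not taken from the text. Exact fractions are used so that the orthogonality check at the end is exact.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthogonal list w_1, ..., w_k built from v_1, ..., v_k.

    Assumes the input vectors are linearly independent. If they are not,
    some w_i is the zero vector and a later step divides by zero.
    """
    ws = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for prev in ws:
            # Subtract the component of v in the direction of prev.
            coeff = dot(v, prev) / dot(prev, prev)
            w = [wi - coeff * pi for wi, pi in zip(w, prev)]
        ws.append(w)
    return ws

# Hypothetical linearly independent vectors in R^3, for illustration only.
ws = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for w in ws:
    print([str(x) for x in w])
# ['1', '1', '0']
# ['1/2', '-1/2', '1']
# ['-2/3', '2/3', '2/3']

# Every pair of distinct output vectors has dot product exactly 0.
print(all(dot(ws[i], ws[j]) == 0 for i in range(3) for j in range(i + 1, 3)))  # True
```

Normalizing is left out on purpose: dividing by \(\norm{\vec{w_j}}\) usually introduces square roots, which cannot be represented exactly as fractions.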
Example 6.1.10.
Let \(P\) be the plane in \(\mathbb{R}^3\) with general equation \(2x-y-z=0\text{.}\) Find an orthonormal basis for \(P\text{.}\)
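Solution. We start by finding any basis for \(P\text{.}\) To do this, we solve for one of the variables, say writing \(z = 2x-y\text{.}\) Thus vectors on the plane have the form
\[
\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}x\\y\\2x-y\end{bmatrix} = x\begin{bmatrix}1\\0\\2\end{bmatrix} + y\begin{bmatrix}0\\1\\-1\end{bmatrix}.
\]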
Let \(\vec{v_1} = \begin{bmatrix}1\\0\\2\end{bmatrix}\) and \(\vec{v_2} = \begin{bmatrix}0\\1\\-1\end{bmatrix}\text{.}\) So far we have that \(\{\vec{v_1}, \vec{v_2}\}\) is a basis for \(P\text{.}\)
Now we apply the Gram-Schmidt algorithm. It begins by setting \(\vec{w_1} = \vec{v_1} = \begin{bmatrix}1\\0\\2\end{bmatrix}\text{.}\) Next, we have
\[
\vec{w_2} = \vec{v_2} - \frac{\vec{v_2} \cdot \vec{w_1}}{\vec{w_1} \cdot \vec{w_1}}\vec{w_1} = \begin{bmatrix}0\\1\\-1\end{bmatrix} - \frac{-2}{5}\begin{bmatrix}1\\0\\2\end{bmatrix} = \begin{bmatrix}2/5\\1\\-1/5\end{bmatrix}.
\]
Now \(\{\vec{w_1}, \vec{w_2}\}\) is an orthogonal basis for \(P\text{.}\) It's slightly annoying to work with \(\vec{w_2}\) in its current form, and replacing a vector by a non-zero scalar multiple of that vector doesn't change anything about orthogonality, so there is no harm in replacing our original \(\vec{w_2}\) by a multiple of it to clear out fractions. We therefore instead choose to take \(\vec{w_2} = \begin{bmatrix}2\\5\\-1\end{bmatrix}\text{.}\)
Finally, to obtain an orthonormal basis, just divide each vector in an orthogonal basis by its length. We have \(\norm{\vec{w_1}} = \sqrt{\vec{w_1} \cdot \vec{w_1}} = \sqrt{5}\) and \(\norm{\vec{w_2}} = \sqrt{30}\text{,}\) so our final orthonormal basis is
\[
\left\{ \frac{1}{\sqrt{5}}\begin{bmatrix}1\\0\\2\end{bmatrix}, \frac{1}{\sqrt{30}}\begin{bmatrix}2\\5\\-1\end{bmatrix} \right\}.
\]
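An answer like this is easy to sanity-check numerically. The sketch below takes the orthogonal vectors \(\begin{bmatrix}1\\0\\2\end{bmatrix}\) and \(\begin{bmatrix}2\\5\\-1\end{bmatrix}\) from the example, confirms that both satisfy \(2x-y-z=0\) and are orthogonal to each other, and then normalizes them. The variable names are illustrative, and the normalized vectors are floating-point approximations, so they are compared up to a tolerance.

```python
from math import isclose, sqrt

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

w1 = (1, 0, 2)
w2 = (2, 5, -1)
normal = (2, -1, -1)   # coefficients of the plane equation 2x - y - z = 0

# Both vectors lie on the plane, and they are orthogonal to each other.
print(dot(normal, w1), dot(normal, w2), dot(w1, w2))  # 0 0 0

# Divide each vector by its length to get unit vectors.
u1 = [x / sqrt(dot(w1, w1)) for x in w1]
u2 = [x / sqrt(dot(w2, w2)) for x in w2]

print(isclose(dot(u1, u1), 1.0))                  # True
print(isclose(dot(u2, u2), 1.0))                  # True
print(isclose(dot(u1, u2), 0.0, abs_tol=1e-12))   # True
```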
Exercises 6.1.3 Exercises
1.
Determine whether the following set of vectors is orthogonal. If it is orthogonal, determine whether it is also orthonormal.
2.
Determine whether the following set of vectors is orthogonal. If it is orthogonal, determine whether it is also orthonormal.
3.
Determine whether the following set of vectors is orthogonal. If it is orthogonal, determine whether it is also orthonormal.
4.
Suppose \(B = \{\mathbf{u_1},\mathbf{u_2}, \mathbf{u_3}\} \) is an orthogonal basis of \(\mathbb{R}^3 \text{.}\) We have been told that
5.
Consider the set of vectors given by