Section 4.3 Matrix multiplication
Subsection 4.3.1 Composition of functions
Definition 4.3.1.
Let \(f : \mathbb{R}^n \to \mathbb{R}^m\) and \(g : \mathbb{R}^m \to \mathbb{R}^k\) be functions (not necessarily linear transformations). The composition of \(f\) and \(g\) is the function \(g \circ f : \mathbb{R}^n \to \mathbb{R}^k\) defined by \((g \circ f)(\vec{v}) = g(f(\vec{v}))\text{.}\)
Note 4.3.2.
Be careful! Given functions \(f\) and \(g\text{,}\) there are two ways to compose them: \(g\circ f\) (which computes \(f\) first and then \(g\)) and \(f \circ g\) (which computes \(g\) first and then \(f\)). It can happen that neither of these makes sense, or one makes sense but not the other, or both make sense, depending on the domains and codomains of the functions involved. Moreover, even if both \(f\circ g\) and \(g \circ f\) are defined, they are often not the same function.
The next two examples illustrate these warnings.
Example 4.3.3.
Suppose that \(f : \mathbb{R}^2 \to \mathbb{R}^3\) is defined by \(f\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+y\\y\\-y\end{bmatrix}\) and \(g : \mathbb{R}^3 \to \mathbb{R}^3\) is defined by \(g\left(\begin{bmatrix}x\\y\\z\end{bmatrix}\right) = \begin{bmatrix}x\\x+y\\x+y+z\end{bmatrix}\text{.}\) Then \(g \circ f : \mathbb{R}^2 \to \mathbb{R}^3\) is given by
\((g \circ f)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = g\left(\begin{bmatrix}x+y\\y\\-y\end{bmatrix}\right) = \begin{bmatrix}x+y\\(x+y)+y\\(x+y)+y+(-y)\end{bmatrix} = \begin{bmatrix}x+y\\x+2y\\x+y\end{bmatrix}\text{.}\)
In this example the composition in the other order, \(f \circ g\text{,}\) does not make sense. This is because the output of \(g\) is in \(\mathbb{R}^3\text{,}\) but \(f\) accepts inputs from \(\mathbb{R}^2\text{,}\) so the expression \(f\left(g\left(\begin{bmatrix}x\\y\\z\end{bmatrix}\right)\right) = f\left(\begin{bmatrix}x\\x+y\\x+y+z\end{bmatrix}\right)\) is meaningless.
Example 4.3.4.
Let \(f : \mathbb{R}^2 \to \mathbb{R}^2\) and \(g : \mathbb{R}^2 \to \mathbb{R}^2\) be defined by \(f\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+y\\x-y\end{bmatrix}\) and \(g\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}y\\2x\end{bmatrix}\text{.}\) In this case both \(g \circ f\) and \(f \circ g\) are defined. If we calculate them, we get:
\((g \circ f)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = g\left(\begin{bmatrix}x+y\\x-y\end{bmatrix}\right) = \begin{bmatrix}x-y\\2x+2y\end{bmatrix}\)
and
\((f \circ g)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = f\left(\begin{bmatrix}y\\2x\end{bmatrix}\right) = \begin{bmatrix}2x+y\\y-2x\end{bmatrix}\text{.}\)
As we can see, the formulas are different.
To be absolutely certain that they are really different (so that there is no clever way of rewriting the formulas to make them the same) we should pick a specific vector to plug into both and see that we get different answers. There are lots of different choices here, so we just have to pick one. For instance, \((g \circ f)\left(\begin{bmatrix}1\\0\end{bmatrix}\right) = \begin{bmatrix}1\\2\end{bmatrix}\) and \((f \circ g)\left(\begin{bmatrix}1\\0\end{bmatrix}\right) = \begin{bmatrix}2\\-2\end{bmatrix}\text{.}\)
One final caution: There are some choices of vector where \(g\circ f\) and \(f\circ g\) do give the same answer (for instance, this happens with \(\begin{bmatrix}0\\0\end{bmatrix}\)). That's OK, but we still have \(g \circ f \neq f \circ g\) because they disagree on at least one input.
You might have noticed that in the previous two examples we started with functions \(f\) and \(g\) that were linear transformations, and their compositions (when they were defined) were also linear transformations. In fact, this is always the case, and is a powerful way of building new linear transformations out of old ones.
Theorem 4.3.5.
Let \(T : \mathbb{R}^n \to \mathbb{R}^m\) and \(S : \mathbb{R}^m \to \mathbb{R}^k\) be linear transformations. Then \(S \circ T : \mathbb{R}^n \to \mathbb{R}^k\) is a linear transformation.
Proof.
Given any vectors \(\vec{v}, \vec{w}\) in \(\mathbb{R}^n\text{,}\) we use the fact that both \(T\) and \(S\) are linear to calculate:
\((S \circ T)(\vec{v}+\vec{w}) = S(T(\vec{v}+\vec{w})) = S(T(\vec{v})+T(\vec{w})) = S(T(\vec{v}))+S(T(\vec{w})) = (S \circ T)(\vec{v}) + (S \circ T)(\vec{w})\text{.}\)
Similarly, given a vector \(\vec{v}\) in \(\mathbb{R}^n\) and a scalar \(c\text{,}\) we have:
\((S \circ T)(c\vec{v}) = S(T(c\vec{v})) = S(cT(\vec{v})) = cS(T(\vec{v})) = c(S \circ T)(\vec{v})\text{.}\)
Thus \(S \circ T\) preserves both addition and scalar multiplication, so it is a linear transformation.
Example 4.3.6.
Let \(T : \mathbb{R}^2 \to \mathbb{R}^2\) be the transformation that takes a vector \(\vec{v}\text{,}\) rotates it counterclockwise by \(\pi/6\text{,}\) doubles the result, and then applies the orthogonal projection in the direction of \(\begin{bmatrix}3\\-1\end{bmatrix}\text{.}\) Then we can write \(T = \proj_{\begin{bmatrix}3\\-1\end{bmatrix}} \circ S \circ R_{\pi/6}\text{,}\) where \(S(\vec{v}) = 2\vec{v}\) and \(R_{\pi/6}\) is the rotation counterclockwise by \(\pi/6\text{.}\) Since all three of the terms in the composition are things that we already know are linear transformations, it follows from Theorem 4.3.5 that \(T\) is also a linear transformation.
Notice that using Theorem 4.3.5, together with some functions that we already know are linear transformations, has saved us from a fairly unpleasant direct verification of the properties in Definition 4.1.2!
Subsection 4.3.2 Matrix multiplication
Suppose that \(T : \mathbb{R}^n \to \mathbb{R}^m\) and \(S : \mathbb{R}^m \to \mathbb{R}^k\) are linear transformations. By Theorem 4.3.5 we know that \(S \circ T : \mathbb{R}^n \to \mathbb{R}^k\) is also a linear transformation. Following our maxim that everything we could want to know about a linear transformation is encoded in its standard matrix, we are led to ask how we could use \([T]\) and \([S]\) to calculate \([S \circ T]\text{.}\) For the operations in Section 4.2 it was relatively straightforward to translate from linear transformations to matrices. With composition the translation is less obvious, as the next example illustrates.
Example 4.3.7.
Let \(T : \mathbb{R}^2 \to \mathbb{R}^3\) and \(S : \mathbb{R}^3 \to \mathbb{R}^4\) be defined by \(T\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+2y \\ 2x+y \\ x-y\end{bmatrix}\) and \(S\left(\begin{bmatrix}x\\y\\z\end{bmatrix}\right) = \begin{bmatrix}x-y+z \\ x+y \\ y-z \\ 2y+3z\end{bmatrix}\text{.}\) Then the composition \(S \circ T\) makes sense, and we calculate a formula for it:
\((S \circ T)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = S\left(\begin{bmatrix}x+2y\\2x+y\\x-y\end{bmatrix}\right) = \begin{bmatrix}(x+2y)-(2x+y)+(x-y)\\(x+2y)+(2x+y)\\(2x+y)-(x-y)\\2(2x+y)+3(x-y)\end{bmatrix} = \begin{bmatrix}0\\3x+3y\\x+2y\\7x-y\end{bmatrix}\text{.}\)
Now if we compute the standard matrices (Definition 4.1.8), we find:
\([T] = \begin{bmatrix}1 \amp 2 \\ 2 \amp 1 \\ 1 \amp -1\end{bmatrix}\text{,}\) \([S] = \begin{bmatrix}1 \amp -1 \amp 1 \\ 1 \amp 1 \amp 0 \\ 0 \amp 1 \amp -1 \\ 0 \amp 2 \amp 3\end{bmatrix}\text{,}\) and \([S \circ T] = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\text{.}\)
It is far from obvious how to combine \([T]\) and \([S]\) to obtain \([S \circ T]\text{!}\)
Looking directly at the matrices in the previous example, it does not seem that there is an obvious relationship between the numbers in the three matrices. There is a relationship, but to see it we need to look at the matrices through the lens of Theorem 4.1.13.
Suppose that \(T : \mathbb{R}^n \to \mathbb{R}^m\) and \(S : \mathbb{R}^m \to \mathbb{R}^k\) are linear transformations. Let's calculate the first column of \([S \circ T]\text{.}\) By Definition 4.1.8 the first column is \((S \circ T)(\vec{e_1})\text{,}\) so that is what we calculate. Let \(\vec{T_1}\) be the first column of \([T]\text{,}\) and recall from the definition of \([T]\) that \(\vec{T_1} = T(\vec{e_1})\text{.}\) As we do the following computation, we use Theorem 4.1.13 to change from using transformations to using matrices.
\((S \circ T)(\vec{e_1}) = S(T(\vec{e_1})) = S(\vec{T_1}) = [S]\vec{T_1}\text{.}\)
What we have just shown is that the first column of \([S \circ T]\) is the product of \([S]\) with the first column of \([T]\text{.}\) Repeating the calculation using \(\vec{e_2}\) similarly shows that the second column of \([S \circ T]\) is \([S]\) times the second column of \([T]\text{.}\) These calculations motivate a definition.
Definition 4.3.8.
Let \(A\) be a \(k \times m\) matrix and let \(B\) be an \(m \times n\) matrix. The product \(AB\) is the \(k \times n\) matrix where, for each \(i\) with \(1 \leq i \leq n\text{,}\) the \(i\)th column of \(AB\) is the product of the matrix \(A\) with the vector that is the \(i\)th column of \(B\text{.}\)
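It is sometimes helpful to unpack Definition 4.3.8 into an entry-by-entry description. Writing \(A = (a_{ij})\) and \(B = (b_{ij})\text{,}\) and using the usual formula for the product of a matrix with a vector, the entry of \(AB\) in row \(i\) and column \(j\) is
\((AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}\text{.}\)
That is, each entry of \(AB\) is obtained by combining one row of \(A\) with one column of \(B\text{.}\)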
Example 4.3.9.
Let \(A = \begin{bmatrix}1 \amp 2 \\ 3 \amp 4 \\ 5 \amp 6\end{bmatrix}\) and \(B = \begin{bmatrix}0 \amp 1 \\ 2 \amp 3\end{bmatrix}\text{.}\) Then the first column of \(AB\) is
\(A\begin{bmatrix}0\\2\end{bmatrix} = \begin{bmatrix}4\\8\\12\end{bmatrix}\)
and the second column is
\(A\begin{bmatrix}1\\3\end{bmatrix} = \begin{bmatrix}7\\15\\21\end{bmatrix}\text{.}\)
Therefore
\(AB = \begin{bmatrix}4 \amp 7 \\ 8 \amp 15 \\ 12 \amp 21\end{bmatrix}\text{.}\)
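As an aside for readers who like to verify computations by computer, here is a minimal sketch in Python (assuming the NumPy library is available) that checks the product in Example 4.3.9:

import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[0, 1],
              [2, 3]])

# The @ operator performs matrix multiplication.
print(A @ B)
# [[ 4  7]
#  [ 8 15]
#  [12 21]]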
We constructed the definition of matrix multiplication precisely to match up with composition of linear transformations, and in the discussion leading up to the definition we essentially proved that our definition was the right one to make the following theorem true.
Theorem 4.3.10.
Let \(T : \mathbb{R}^n\to\mathbb{R}^m\) and \(S : \mathbb{R}^m\to\mathbb{R}^k\) be linear transformations. Then \([S \circ T] = [S][T]\text{.}\)
Example 4.3.11.
We return to the transformations from Example 4.3.7. In that example we did the composition by hand, and then used the definition of the standard matrix to find:
\([S \circ T] = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\text{.}\)
With Theorem 4.3.10 we can now get the same answer for \([S \circ T]\) without having to actually perform the composition, because
\([S][T] = \begin{bmatrix}1 \amp -1 \amp 1 \\ 1 \amp 1 \amp 0 \\ 0 \amp 1 \amp -1 \\ 0 \amp 2 \amp 3\end{bmatrix}\begin{bmatrix}1 \amp 2 \\ 2 \amp 1 \\ 1 \amp -1\end{bmatrix} = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\text{.}\)
Even better, from here we can get back the formula for the composition if we want it:
\((S \circ T)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = [S \circ T]\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0\\3x+3y\\x+2y\\7x-y\end{bmatrix}\text{.}\)
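Along the same lines, a short Python/NumPy sketch can illustrate Theorem 4.3.10 for the matrices of this example: applying \(T\) and then \(S\) to a vector gives the same result as multiplying by the single matrix \([S][T]\text{.}\)

import numpy as np

# Standard matrices of T and S from Example 4.3.7.
T = np.array([[1, 2],
              [2, 1],
              [1, -1]])
S = np.array([[1, -1, 1],
              [1, 1, 0],
              [0, 1, -1],
              [0, 2, 3]])

v = np.array([5, -2])                              # an arbitrary test vector in R^2
print(S @ T)                                       # the matrix [S o T]
print(np.array_equal(S @ (T @ v), (S @ T) @ v))    # True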
Example 4.3.12.
Fix two angles \(\theta\) and \(\phi\text{,}\) and let \(R_\theta, R_\phi : \mathbb{R}^2 \to \mathbb{R}^2\) be the counterclockwise rotations by \(\theta\) and \(\phi\text{,}\) respectively. Geometrically, you can see that rotating a vector first by \(\theta\) and then by \(\phi\) is the same as rotating the vector by \(\phi+\theta\text{,}\) and so \(R_\phi \circ R_\theta = R_{\phi +\theta}\text{.}\) By Theorem 4.3.10 we therefore have
\([R_\phi][R_\theta] = [R_\phi \circ R_\theta] = [R_{\phi+\theta}]\text{.}\)
In Example 4.1.15 we calculated the standard matrix for a rotation, so we can write out both sides of the above equation explicitly:
\(\begin{bmatrix}\cos\phi \amp -\sin\phi \\ \sin\phi \amp \cos\phi\end{bmatrix}\begin{bmatrix}\cos\theta \amp -\sin\theta \\ \sin\theta \amp \cos\theta\end{bmatrix} = \begin{bmatrix}\cos(\phi+\theta) \amp -\sin(\phi+\theta) \\ \sin(\phi+\theta) \amp \cos(\phi+\theta)\end{bmatrix}\text{.}\)
Now if we calculate the matrix multiplication on the left side, we get:
\(\begin{bmatrix}\cos\phi\cos\theta - \sin\phi\sin\theta \amp -\cos\phi\sin\theta - \sin\phi\cos\theta \\ \sin\phi\cos\theta + \cos\phi\sin\theta \amp \cos\phi\cos\theta - \sin\phi\sin\theta\end{bmatrix} = \begin{bmatrix}\cos(\phi+\theta) \amp -\sin(\phi+\theta) \\ \sin(\phi+\theta) \amp \cos(\phi+\theta)\end{bmatrix}\text{.}\)
For these two matrices to be equal they must have equal corresponding entries. In particular, if we look at the top left entries we recover the addition formula for cosine, and if we look at the bottom left entries we get the addition formula for sine:
\(\cos(\phi+\theta) = \cos\phi\cos\theta - \sin\phi\sin\theta \quad \text{and} \quad \sin(\phi+\theta) = \sin\phi\cos\theta + \cos\phi\sin\theta\text{.}\)
Notice how little trigonometry we needed in order to get these identities! All we used was the calculations in Example 4.1.15, the fact that the composition of two rotations is the rotation by the sum of the angles, and some very general facts about composition of linear transformations.
Subsection 4.3.3 Properties of matrix multiplication
You might find it a bit surprising that we have called the operation \(AB\) multiplication of matrices, when it arises from composition of functions. The full explanation of why this is an appropriate name is somewhat beyond the scope of this course, but in this section we will show that matrix multiplication does have many of the same properties that multiplication has for numbers. However, we will also see that there are some ways in which matrix multiplication is quite different from multiplication of numbers - when doing matrix algebra involving multiplication you must always be careful! We start with the properties that behave in the expected way.
Theorem 4.3.13.
Suppose that \(A, B, C\) are matrices, and \(k\) is a scalar. Then the following equations are true whenever they make sense (that is, whenever all of the operations used in the equation are defined).
\(\displaystyle A(B+C) = AB+AC\)
\(\displaystyle (B+C)A = BA+CA\)
\(\displaystyle (AB)C = A(BC)\)
\(\displaystyle k(AB) = (kA)B = A(kB)\)
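These properties are not hard to believe, and it can be reassuring to spot-check them on concrete matrices. The following Python/NumPy sketch (using randomly chosen integer matrices of compatible sizes) checks the first, third, and fourth properties:

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(3, 4))   # same size as B, so B + C is defined
D = rng.integers(-5, 5, size=(4, 2))
k = 7

print(np.array_equal(A @ (B + C), A @ B + A @ C))   # A(B+C) = AB+AC
print(np.array_equal((A @ B) @ D, A @ (B @ D)))     # (AB)D = A(BD)
print(np.array_equal(k * (A @ B), (k * A) @ B))     # k(AB) = (kA)B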
Multiplication by zero matrices acts as you expect.
Theorem 4.3.14.
If \(A\) is an \(m \times k\) matrix then for any \(n\text{,}\) we have \(0_{n\times m}A = 0_{n \times k}\) and \(A0_{k\times n} = 0_{m \times n}\text{.}\)
There is also a matrix that acts similarly to the number \(1\) for multiplication.
Definition 4.3.15.
For any natural number \(n\text{,}\) the \(n \times n\) identity matrix is the \(n \times n\) matrix \(I_n\) that has a \(1\) in each entry of the main diagonal and \(0\) in each other entry.
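For example, \(I_2 = \begin{bmatrix}1 \amp 0 \\ 0 \amp 1\end{bmatrix}\) and \(I_3 = \begin{bmatrix}1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1\end{bmatrix}\text{.}\)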
Theorem 4.3.16.
If \(A\) is an \(m \times n\) matrix then \(I_mA = A\) and \(AI_n = A\text{.}\)
Now we turn to several examples that show how matrix multiplication can behave differently from multiplication of numbers. Perhaps the most striking thing is that, very often, \(AB\neq BA\) - we say that matrix multiplication is non-commutative. When it does happen that \(AB=BA\) for some matrices \(A\) and \(B\) we say that \(A\) and \(B\) commute.
Example 4.3.17.
If \(A = \begin{bmatrix}1 \amp 2 \\ 3 \amp 4\end{bmatrix}\) and \(B = \begin{bmatrix}1 \amp 1 \\ 2 \amp 0\end{bmatrix}\) then \(AB = \begin{bmatrix}5 \amp 1 \\ 11 \amp 3\end{bmatrix}\) and \(BA = \begin{bmatrix}4 \amp 6 \\ 2 \amp 4\end{bmatrix}\text{.}\)
Example 4.3.18.
Let \(A = \begin{bmatrix}0 \amp 1 \\ 0 \amp 0\end{bmatrix}\) and \(B = \begin{bmatrix}0 \amp 0 \\ 0 \amp 1\end{bmatrix}\text{.}\) Notice that \(A \neq 0_{2 \times 2}\) and \(B \neq 0_{2 \times 2}\text{,}\) yet \(BA = 0_{2 \times 2}\text{!}\) Perhaps even more surprisingly, \(AB = \begin{bmatrix}0 \amp 1 \\ 0 \amp 0\end{bmatrix} \neq 0_{2 \times 2}\text{.}\)
It is a good exercise to find the linear transformations with matrices \(A\) and \(B\text{,}\) and then try to understand geometrically what the calculations in this example mean.
Definition 4.3.19.
If \(A\) is an \(n \times n\) matrix then we define \(A^0 = I_n\text{,}\) \(A^1 = A\text{,}\) \(A^2 = AA\text{,}\) \(A^3 = AAA\text{,}\) etc.
Notice that if \(A\) is not square, that is, if it is \(m\times n\) with \(m \neq n\text{,}\) then \(AA\) does not make sense. For this reason we only define powers of \(A\) when \(A\) is square.
Example 4.3.20.
Let \(A = \begin{bmatrix}1 \amp 0 \\ 0 \amp 1\end{bmatrix}\text{,}\) \(B = \begin{bmatrix}-1 \amp 0 \\ 0 \amp -1\end{bmatrix}\text{,}\) \(C = \begin{bmatrix}-1 \amp 0 \\ 0 \amp 1\end{bmatrix}\text{,}\) and \(D = \begin{bmatrix}2 \amp 3 \\ -1 \amp -2\end{bmatrix}\text{.}\) Then
\(A^2 = B^2 = C^2 = D^2 = I_2\text{.}\)
This example shows that the equation \(X^2 = I_2\text{,}\) where \(X\) is a \(2 \times 2\) matrix, has more than two solutions. It turns out that this equation actually has infinitely many solutions! Compare this with the situation of the equation \(x^2=1\) when \(x\) is a number.
Example 4.3.21.
Even if a matrix has no entries that are \(0\text{,}\) products with that matrix can still yield the zero matrix. For example, \(\begin{bmatrix}-2 \amp 1 \\ -4 \amp 2\end{bmatrix}^2 = \begin{bmatrix}0 \amp 0 \\ 0\amp 0\end{bmatrix}\text{.}\)
The non-commutativity of matrix multiplication means that we need to be careful when doing certain algebraic manipulations. For example, if \(A\) and \(B\) are \(n \times n\) matrices, then:
\((A+B)^2 = (A+B)(A+B) = A^2 + AB + BA + B^2\text{,}\)
but because we often have \(AB \neq BA\) we cannot simplify further. In particular, there are square matrices \(A\) and \(B\) for which \((A+B)^2 \neq A^2+2AB+B^2\text{.}\)
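For instance, with the matrices \(A\) and \(B\) of Example 4.3.18 a direct computation (worth carrying out yourself) gives \((A+B)^2 = \begin{bmatrix}0 \amp 1 \\ 0 \amp 1\end{bmatrix}\text{,}\) while \(A^2+2AB+B^2 = \begin{bmatrix}0 \amp 2 \\ 0 \amp 1\end{bmatrix}\text{.}\)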
Finally, we come to the transpose. Here the correct interaction between products and transposes is not exactly what you might have hoped for, but it's also not too unpleasant.
Theorem 4.3.22.
Suppose that \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times k\) matrix. Then \((AB)^t = B^tA^t\text{.}\)
Example 4.3.23.
Let \(A = \begin{bmatrix}2 \amp 1 \\ -1 \amp 0 \\ 3 \amp 2\end{bmatrix}\) and \(B = \begin{bmatrix}1 \amp 2 \\ 3 \amp 4\end{bmatrix}\text{.}\) Then
\(AB = \begin{bmatrix}5 \amp 8 \\ -1 \amp -2 \\ 9 \amp 14\end{bmatrix}\text{,}\)
so
\((AB)^t = \begin{bmatrix}5 \amp -1 \amp 9 \\ 8 \amp -2 \amp 14\end{bmatrix}\text{.}\)
On the other hand,
\(B^tA^t = \begin{bmatrix}1 \amp 3 \\ 2 \amp 4\end{bmatrix}\begin{bmatrix}2 \amp -1 \amp 3 \\ 1 \amp 0 \amp 2\end{bmatrix} = \begin{bmatrix}5 \amp -1 \amp 9 \\ 8 \amp -2 \amp 14\end{bmatrix} = (AB)^t\text{,}\)
as Theorem 4.3.22 predicts.
In this example the product \(A^tB^t\) is not even defined. Even in examples where \(A^tB^t\) is defined, we often have \((AB)^t \neq A^tB^t\text{.}\)
Theorem 4.3.24.
Let \(A\) be any matrix. Then \(A^tA\) is a symmetric matrix.
Proof.
First, observe that if \(A\) is \(m \times n\) then \(A^t\) is \(n \times m\text{,}\) so \(A^tA\) makes sense (and is \(n \times n\)). Now to see that \(A^tA\) is symmetric, we just calculate its transpose using Theorem 4.3.22:
\((A^tA)^t = A^t(A^t)^t = A^tA\text{.}\)
Since \(A^tA\) equals its own transpose, it is symmetric.
The theorem above is mostly presented as an illustration of a proof using matrix algebra. The theorem says that \(A^tA\) is always symmetric. It is interesting to ask whether or not the converse is true, that is, whether every symmetric matrix \(B\) can be "factored" as \(B = A^tA\) for some \(A\text{.}\) The answer turns out to be "yes", as long as we allow the entries of \(A\) to be complex numbers. The proof that this "factoring" is possible will appear much later, in Corollary 6.4.7.
Exercises 4.3.4 Exercises
1.
Consider the matrices: \(A = \begin{bmatrix} 1 \amp 2 \amp 3\\ 2 \amp 1 \amp 7 \end{bmatrix},
B = \begin{bmatrix} 3 \amp -1 \amp 2 \\ -3 \amp 2 \amp 1 \end{bmatrix},
C = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 1 \end{bmatrix},
D = \begin{bmatrix} -1 \amp 2 \\ 2 \amp -3 \end{bmatrix}, \) and \(E = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.\) Find the following if possible. If it is not possible, explain why.
(a) \(-3A\) (b) \(3B - A\) (c) \(AC\) (d) \(CB\) (e) \(AE\) (f) \(EA\)
2.
Let \(A\) and \(B\) be \(n \times n\) matrices for which the system of equations \(A\mathbf{x} = \mathbf{0}\) and \(B\mathbf{x} = \mathbf{0}\) each only have the trivial solution \(\mathbf{x} = \mathbf{0}\text{.}\) Show that the system \(AB \mathbf{x} = \mathbf{0}\) has only the trivial solution.
3.
In each case either show the statement is true or give an example showing that it is false. (We assume that the matrices are of the correct size so that all matrix products which we consider make sense.)
(a) If \(A^2 = I\text{,}\) then \(A = I\text{.}\)
(b) If \(AB = A\text{,}\) then \(B = I\text{.}\)
(c) If \(A\) is square, then \((A^t)^3 = (A^3)^t\text{.}\)
(d) If \(A\) is symmetric, then \(I + A\) is symmetric.
(e) If \(AB = AC\) and \(A \ne 0\text{,}\) then \(B = C\text{.}\)
(f) If \(A \ne 0\text{,}\) then \(A^2 \ne 0\text{.}\)
(g) If \(A\) commutes with \(A+B\text{,}\) then \(A\) commutes with \(B\text{.}\)
Hint. See Hint 4.2.4.1.3 for the definition of the transpose and Hint 4.3.4.1.1 for the definition of matrix multiplication. Look back into [4], Chapter 2, Theorem 2.1.1, for some properties of matrix operations. A square matrix \(A\) is called symmetric if \(A=A^t\text{.}\) Two square matrices \(A,B\) of the same size are said to commute if \(AB=BA\text{.}\)
Answer. False. False. True. True. False. False. True.
4.
In each case either show the statement is true or give an example showing that it is false. (We assume that the matrices are of the correct size so that all matrix products which we consider make sense.)
(a) If \(B\) has a row of zeros, so also does \(AB\) for all \(A\text{.}\)
(b) If \(A\) has a row of zeros, so also does \(AB\) for all \(B\text{.}\)
(c) If \(B\) has a column of zeros, so also does \(AB\) for all \(A\text{.}\)
(d) If \(AB\) has a column of zeros for some \(A\neq 0\text{,}\) then so also does \(B\text{.}\)
(e) If \(AB\) has a row of zeros for some \(A\neq 0\text{,}\) so also does \(B\text{.}\)
Answer. False. True. True. False. False.
5.
Let \(A = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}\) and \(B = \begin{bmatrix} 1 \amp 2 \\ 1 \amp k \end{bmatrix}\text{.}\) Is it possible to find \(k\) such that \(AB = BA\text{?}\) If so, what should \(k \) equal?
6.
Find the matrix of the linear transformation that reflects every vector in \(\mathbb{R}^2 \) about the \(x\)-axis and then rotates by an angle of \(\frac{\pi}{4}. \)
7.
Find the matrix of the linear transformation that rotates every vector in \(\mathbb{R}^2 \) by an angle of \(\frac{\pi}{6} \) and then reflects about the \(x\)-axis, followed by a reflection about the \(y\)-axis.