Section 4.3 Matrix multiplication
Subsection 4.3.1 Composition of functions
Definition 4.3.1.
Let \(f : \mathbb{R}^n \to \mathbb{R}^m\) and \(g : \mathbb{R}^m \to \mathbb{R}^k\) be functions (not necessarily linear transformations). The composition of \(f\) and \(g\) is the function \(g \circ f : \mathbb{R}^n \to \mathbb{R}^k\) defined by \((g \circ f)(\vec{v}) = g(f(\vec{v}))\text{.}\)
Note 4.3.2.
Be careful! Given functions \(f\) and \(g\text{,}\) there are two ways to compose them: \(g\circ f\) (which computes \(f\) first and then \(g\)) and \(f \circ g\) (which computes \(g\) first and then \(f\)). It can happen that neither of these makes sense, or one makes sense but not the other, or both make sense, depending on the domains and codomains of the functions involved. Moreover, even if both \(f\circ g\) and \(g \circ f\) are defined, they are often not the same function.
The next two examples illustrate these warnings.
Example 4.3.3.
Suppose that \(f : \mathbb{R}^2 \to \mathbb{R}^3\) is defined by \(f\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+y\\y\\-y\end{bmatrix}\) and \(g : \mathbb{R}^3 \to \mathbb{R}^3\) is defined by \(g\left(\begin{bmatrix}x\\y\\z\end{bmatrix}\right) = \begin{bmatrix}x\\x+y\\x+y+z\end{bmatrix}\text{.}\) Then \(g \circ f : \mathbb{R}^2 \to \mathbb{R}^3\) is given by
\((g \circ f)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = g\left(\begin{bmatrix}x+y\\y\\-y\end{bmatrix}\right) = \begin{bmatrix}x+y\\(x+y)+y\\(x+y)+y+(-y)\end{bmatrix} = \begin{bmatrix}x+y\\x+2y\\x+y\end{bmatrix}\text{.}\)
In this example the composition in the other order, \(f \circ g\text{,}\) does not make sense. This is because the output of \(g\) is in \(\mathbb{R}^3\text{,}\) but \(f\) accepts inputs from \(\mathbb{R}^2\text{,}\) so the expression \(f\left(g\left(\begin{bmatrix}x\\y\\z\end{bmatrix}\right)\right) = f\left(\begin{bmatrix}x\\x+y\\x+y+z\end{bmatrix}\right)\) is meaningless.
Example 4.3.4.
Let \(f : \mathbb{R}^2 \to \mathbb{R}^2\) and \(g : \mathbb{R}^2 \to \mathbb{R}^2\) be defined by \(f\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+y\\x-y\end{bmatrix}\) and \(g\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}y\\2x\end{bmatrix}\text{.}\) In this case both \(g \circ f\) and \(f \circ g\) are defined. If we calculate them, we get:
\((g \circ f)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = g\left(\begin{bmatrix}x+y\\x-y\end{bmatrix}\right) = \begin{bmatrix}x-y\\2x+2y\end{bmatrix}\)
and
\((f \circ g)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = f\left(\begin{bmatrix}y\\2x\end{bmatrix}\right) = \begin{bmatrix}2x+y\\y-2x\end{bmatrix}\text{.}\)
As we can see, the formulas are different.
To be absolutely certain that they are really different (so that there is no clever way of rewriting the formulas to make them the same) we should pick a specific vector to plug into both and see that we get different answers. There are lots of different choices here, so we just have to pick one. For instance, \((g \circ f)\left(\begin{bmatrix}1\\0\end{bmatrix}\right) = \begin{bmatrix}1\\2\end{bmatrix}\) and \((f \circ g)\left(\begin{bmatrix}1\\0\end{bmatrix}\right) = \begin{bmatrix}2\\-2\end{bmatrix}\text{.}\)
One final caution: There are some choices of vector where \(g\circ f\) and \(f\circ g\) do give the same answer (for instance, this happens with \(\begin{bmatrix}0\\0\end{bmatrix}\)). That's OK, but we still have \(g \circ f \neq f \circ g\) because they disagree on at least one input.
You might have noticed that in the previous two examples we started with functions \(f\) and \(g\) that were linear transformations, and their compositions (when they were defined) were also linear transformations. In fact, this is always the case, and is a powerful way of building new linear transformations out of old ones.
Theorem 4.3.5.
Let \(T : \mathbb{R}^n \to \mathbb{R}^m\) and \(S : \mathbb{R}^m \to \mathbb{R}^k\) be linear transformations. Then \(S \circ T : \mathbb{R}^n \to \mathbb{R}^k\) is a linear transformation.
Proof.
Given any vectors \(\vec{v}, \vec{w}\) in \(\mathbb{R}^n\text{,}\) we use the fact that both \(T\) and \(S\) are linear to calculate:
\((S \circ T)(\vec{v}+\vec{w}) = S(T(\vec{v}+\vec{w})) = S(T(\vec{v})+T(\vec{w})) = S(T(\vec{v}))+S(T(\vec{w})) = (S \circ T)(\vec{v}) + (S \circ T)(\vec{w})\text{.}\)
Similarly, given a vector \(\vec{v}\) in \(\mathbb{R}^n\) and a scalar \(c\text{,}\) we have:
\((S \circ T)(c\vec{v}) = S(T(c\vec{v})) = S(cT(\vec{v})) = cS(T(\vec{v})) = c(S \circ T)(\vec{v})\text{.}\)
Thus \(S \circ T\) preserves both addition and scalar multiplication, so it is a linear transformation.
Example 4.3.6.
Let \(T : \mathbb{R}^2 \to \mathbb{R}^2\) be the transformation that takes a vector \(\vec{v}\text{,}\) rotates it counterclockwise by \(\pi/6\text{,}\) doubles the result, and then applies the orthogonal projection in the direction of \(\begin{bmatrix}3\\-1\end{bmatrix}\text{.}\) Then we can write \(T = \proj_{\begin{bmatrix}3\\-1\end{bmatrix}} \circ S \circ R_{\pi/6}\text{,}\) where \(S(\vec{v}) = 2\vec{v}\) and \(R_{\pi/6}\) is the rotation counterclockwise by \(\pi/6\text{.}\) Since all three of the terms in the composition are things that we already know are linear transformations, it follows from Theorem 4.3.5 that \(T\) is also a linear transformation.
Notice that using Theorem 4.3.5, together with some functions that we already know are linear transformations, has saved us from a fairly unpleasant direct verification of the properties in Definition 4.1.2!
Subsection 4.3.2 Matrix multiplication
Suppose that \(T : \mathbb{R}^n \to \mathbb{R}^m\) and \(S : \mathbb{R}^m \to \mathbb{R}^k\) are linear transformations. By Theorem 4.3.5 we know that \(S \circ T : \mathbb{R}^n \to \mathbb{R}^k\) is also a linear transformation. Following our maxim that everything we could want to know about a linear transformation is encoded in its standard matrix, we are led to ask how we could use \([T]\) and \([S]\) to calculate \([S \circ T]\text{.}\) For the operations in Section 4.2 it was relatively straightforward to translate from linear transformations to matrices. With composition the translation is less obvious, as the next example illustrates.
Example 4.3.7.
Let \(T : \mathbb{R}^2 \to \mathbb{R}^3\) and \(S : \mathbb{R}^3 \to \mathbb{R}^4\) be defined by \(T\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = \begin{bmatrix}x+2y \\ 2x+y \\ x-y\end{bmatrix}\) and \(S\left(\begin{bmatrix}x\\y\\z\end{bmatrix}\right) = \begin{bmatrix}x-y+z \\ x+y \\ y-z \\ 2y+3z\end{bmatrix}\text{.}\) Then the composition \(S \circ T\) makes sense, and we calculate a formula for it:
\((S \circ T)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = S\left(\begin{bmatrix}x+2y\\2x+y\\x-y\end{bmatrix}\right) = \begin{bmatrix}(x+2y)-(2x+y)+(x-y)\\(x+2y)+(2x+y)\\(2x+y)-(x-y)\\2(2x+y)+3(x-y)\end{bmatrix} = \begin{bmatrix}0\\3x+3y\\x+2y\\7x-y\end{bmatrix}\text{.}\)
Now if we compute the standard matrices (Definition 4.1.8), we find:
\([T] = \begin{bmatrix}1 \amp 2 \\ 2 \amp 1 \\ 1 \amp -1\end{bmatrix}\text{,}\) \([S] = \begin{bmatrix}1 \amp -1 \amp 1 \\ 1 \amp 1 \amp 0 \\ 0 \amp 1 \amp -1 \\ 0 \amp 2 \amp 3\end{bmatrix}\text{,}\) and \([S \circ T] = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\text{.}\)
It is far from obvious how to combine \([T]\) and \([S]\) to obtain \([S \circ T]\text{!}\)
Looking directly at the matrices in the previous example, it does not seem that there is an obvious relationship between the numbers in the three matrices. There is a relationship, but to see it we need to look at the matrices through the lens of Theorem 4.1.13.
Suppose that \(T : \mathbb{R}^n \to \mathbb{R}^m\) and \(S : \mathbb{R}^m \to \mathbb{R}^k\) are linear transformations. Let's calculate the first column of \([S \circ T]\text{.}\) By Definition 4.1.8 the first column is \((S \circ T)(\vec{e_1})\text{,}\) so that is what we calculate. Let \(\vec{T_1}\) be the first column of \([T]\text{,}\) and recall from the definition of \([T]\) that \(\vec{T_1} = T(\vec{e_1})\text{.}\) As we do the following computation, we use Theorem 4.1.13 to change from using transformations to using matrices.
\((S \circ T)(\vec{e_1}) = S(T(\vec{e_1})) = S(\vec{T_1}) = [S]\vec{T_1}\text{.}\)
What we have just shown is that the first column of \([S \circ T]\) is the product of \([S]\) with the first column of \([T]\text{.}\) Repeating the calculation using \(\vec{e_2}\) similarly shows that the second column of \([S \circ T]\) is \([S]\) times the second column of \([T]\text{.}\) These calculations motivate a definition.
Definition 4.3.8.
Let \(A\) be a \(k \times m\) matrix and let \(B\) be an \(m \times n\) matrix. The product \(AB\) is the \(k \times n\) matrix where, for each \(i\) with \(1 \leq i \leq n\text{,}\) the \(i\)th column of \(AB\) is the product of the matrix \(A\) with the vector that is the \(i\)th column of \(B\text{.}\)
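It is sometimes helpful to unpack Definition 4.3.8 into an entry-by-entry description. Writing \(A = (a_{ij})\) and \(B = (b_{ij})\text{,}\) and using the usual formula for the product of a matrix with a vector, the entry of \(AB\) in row \(i\) and column \(j\) is
\((AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}\text{.}\)
That is, each entry of \(AB\) is obtained by combining one row of \(A\) with one column of \(B\text{.}\)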
Example 4.3.9.
Let \(A = \begin{bmatrix}1 \amp 2 \\ 3 \amp 4 \\ 5 \amp 6\end{bmatrix}\) and \(B = \begin{bmatrix}0 \amp 1 \\ 2 \amp 3\end{bmatrix}\text{.}\) Then the first column of \(AB\) is
\(A\begin{bmatrix}0\\2\end{bmatrix} = \begin{bmatrix}4\\8\\12\end{bmatrix}\)
and the second column is
\(A\begin{bmatrix}1\\3\end{bmatrix} = \begin{bmatrix}7\\15\\21\end{bmatrix}\text{.}\)
Therefore
\(AB = \begin{bmatrix}4 \amp 7 \\ 8 \amp 15 \\ 12 \amp 21\end{bmatrix}\text{.}\)
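As an aside for readers who like to verify computations by computer, here is a minimal sketch in Python (assuming the NumPy library is available) that checks the product in Example 4.3.9:

import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[0, 1],
              [2, 3]])

# The @ operator performs matrix multiplication.
print(A @ B)
# [[ 4  7]
#  [ 8 15]
#  [12 21]]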
We constructed the definition of matrix multiplication precisely to match up with composition of linear transformations, and in the discussion leading up to the definition we essentially proved that our definition was the right one to make the following theorem true.
Theorem 4.3.10.
Let \(T : \mathbb{R}^n\to\mathbb{R}^m\) and \(S : \mathbb{R}^m\to\mathbb{R}^k\) be linear transformations. Then \([S \circ T] = [S][T]\text{.}\)
Example 4.3.11.
We return to the transformations from Example 4.3.7. In that example we did the composition by hand, and then used the definition of the standard matrix to find:
\([S \circ T] = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\text{.}\)
With Theorem 4.3.10 we can now get the same answer for \([S \circ T]\) without having to actually perform the composition, because
\([S][T] = \begin{bmatrix}1 \amp -1 \amp 1 \\ 1 \amp 1 \amp 0 \\ 0 \amp 1 \amp -1 \\ 0 \amp 2 \amp 3\end{bmatrix}\begin{bmatrix}1 \amp 2 \\ 2 \amp 1 \\ 1 \amp -1\end{bmatrix} = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\text{.}\)
Even better, from here we can get back the formula for the composition if we want it:
\((S \circ T)\left(\begin{bmatrix}x\\y\end{bmatrix}\right) = [S \circ T]\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0 \amp 0 \\ 3 \amp 3 \\ 1 \amp 2 \\ 7 \amp -1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0\\3x+3y\\x+2y\\7x-y\end{bmatrix}\text{.}\)
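Along the same lines, a short Python/NumPy sketch can illustrate Theorem 4.3.10 for the matrices of this example: applying \(T\) and then \(S\) to a vector gives the same result as multiplying by the single matrix \([S][T]\text{.}\)

import numpy as np

# Standard matrices of T and S from Example 4.3.7.
T = np.array([[1, 2],
              [2, 1],
              [1, -1]])
S = np.array([[1, -1, 1],
              [1, 1, 0],
              [0, 1, -1],
              [0, 2, 3]])

v = np.array([5, -2])                              # an arbitrary test vector in R^2
print(S @ T)                                       # the matrix [S o T]
print(np.array_equal(S @ (T @ v), (S @ T) @ v))    # True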
Example 4.3.12.
Fix two angles \(\theta\) and \(\phi\text{,}\) and let \(R_\theta, R_\phi : \mathbb{R}^2 \to \mathbb{R}^2\) be the counterclockwise rotations by \(\theta\) and \(\phi\text{,}\) respectively. Geometrically, you can see that rotating a vector first by \(\theta\) and then by \(\phi\) is the same as rotating the vector by \(\phi+\theta\text{,}\) and so \(R_\phi \circ R_\theta = R_{\phi +\theta}\text{.}\) By Theorem 4.3.10 we therefore have
\([R_\phi][R_\theta] = [R_\phi \circ R_\theta] = [R_{\phi+\theta}]\text{.}\)
In Example 4.1.15 we calculated the standard matrix for a rotation, so we can write out both sides of the above equation explicitly:
\(\begin{bmatrix}\cos\phi \amp -\sin\phi \\ \sin\phi \amp \cos\phi\end{bmatrix}\begin{bmatrix}\cos\theta \amp -\sin\theta \\ \sin\theta \amp \cos\theta\end{bmatrix} = \begin{bmatrix}\cos(\phi+\theta) \amp -\sin(\phi+\theta) \\ \sin(\phi+\theta) \amp \cos(\phi+\theta)\end{bmatrix}\text{.}\)
Now if we calculate the matrix multiplication on the left side, we get:
\(\begin{bmatrix}\cos\phi\cos\theta - \sin\phi\sin\theta \amp -\cos\phi\sin\theta - \sin\phi\cos\theta \\ \sin\phi\cos\theta + \cos\phi\sin\theta \amp \cos\phi\cos\theta - \sin\phi\sin\theta\end{bmatrix} = \begin{bmatrix}\cos(\phi+\theta) \amp -\sin(\phi+\theta) \\ \sin(\phi+\theta) \amp \cos(\phi+\theta)\end{bmatrix}\text{.}\)
For these two matrices to be equal they must have equal corresponding entries. In particular, if we look at the top left entries we recover the addition formula for cosine, and if we look at the bottom left entries we get the addition formula for sine:
\(\cos(\phi+\theta) = \cos\phi\cos\theta - \sin\phi\sin\theta \quad \text{and} \quad \sin(\phi+\theta) = \sin\phi\cos\theta + \cos\phi\sin\theta\text{.}\)
Notice how little trigonometry we needed in order to get these identities! All we used was the calculations in Example 4.1.15, the fact that the composition of two rotations is the rotation by the sum of the angles, and some very general facts about composition of linear transformations.
Subsection 4.3.3 Properties of matrix multiplication
You might find it a bit surprising that we have called the operation \(AB\) multiplication of matrices, when it arises from composition of functions. The full explanation of why this is an appropriate name is somewhat beyond the scope of this course, but in this section we will show that matrix multiplication does have many of the same properties that multiplication has for numbers. However, we will also see that there are some ways in which matrix multiplication is quite different from multiplication of numbers - when doing matrix algebra involving multiplication you must always be careful! We start with the properties that behave in the expected way.
Theorem 4.3.13.
Suppose that \(A, B, C\) are matrices, and \(k\) is a scalar. Then the following equations are true whenever they make sense (that is, whenever all of the operations used in the equation are defined).
\(\displaystyle A(B+C) = AB+AC\)
\(\displaystyle (B+C)A = BA+CA\)
\(\displaystyle (AB)C = A(BC)\)
\(\displaystyle k(AB) = (kA)B = A(kB)\)
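These properties are not hard to believe, and it can be reassuring to spot-check them on concrete matrices. The following Python/NumPy sketch (using randomly chosen integer matrices of compatible sizes) checks the first, third, and fourth properties:

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(3, 4))   # same size as B, so B + C is defined
D = rng.integers(-5, 5, size=(4, 2))
k = 7

print(np.array_equal(A @ (B + C), A @ B + A @ C))   # A(B+C) = AB+AC
print(np.array_equal((A @ B) @ D, A @ (B @ D)))     # (AB)D = A(BD)
print(np.array_equal(k * (A @ B), (k * A) @ B))     # k(AB) = (kA)B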
Multiplication by zero matrices acts as you expect.
Theorem 4.3.14.
If \(A\) is an \(m \times k\) matrix then for any \(n\text{,}\) we have \(0_{n\times m}A = 0_{n \times k}\) and \(A0_{k\times n} = 0_{m \times n}\text{.}\)
There is also a matrix that acts similarly to the number \(1\) for multiplication.
Definition 4.3.15.
For any natural number \(n\text{,}\) the \(n \times n\) identity matrix is the \(n \times n\) matrix \(I_n\) that has a \(1\) in each entry of the main diagonal and \(0\) in each other entry.
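For example, \(I_2 = \begin{bmatrix}1 \amp 0 \\ 0 \amp 1\end{bmatrix}\) and \(I_3 = \begin{bmatrix}1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1\end{bmatrix}\text{.}\)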
Theorem 4.3.16.
If \(A\) is an \(m \times n\) matrix then \(I_mA = A\) and \(AI_n = A\text{.}\)
Now we turn to several examples that show how matrix multiplication can behave differently from multiplication of numbers. Perhaps the most striking thing is that, very often, \(AB\neq BA\) - we say that matrix multiplication is non-commutative. When it does happen that \(AB=BA\) for some matrices \(A\) and \(B\) we say that \(A\) and \(B\) commute.
Example 4.3.17.
If \(A = \begin{bmatrix}1 \amp 2 \\ 3 \amp 4\end{bmatrix}\) and \(B = \begin{bmatrix}1 \amp 1 \\ 2 \amp 0\end{bmatrix}\) then \(AB = \begin{bmatrix}5 \amp 1 \\ 11 \amp 3\end{bmatrix}\) and \(BA = \begin{bmatrix}4 \amp 6 \\ 2 \amp 4\end{bmatrix}\text{.}\)
Example 4.3.18.
Let \(A = \begin{bmatrix}0 \amp 1 \\ 0 \amp 0\end{bmatrix}\) and \(B = \begin{bmatrix}0 \amp 0 \\ 0 \amp 1\end{bmatrix}\text{.}\) Notice that \(A \neq 0_{2 \times 2}\) and \(B \neq 0_{2 \times 2}\text{,}\) yet \(BA = 0_{2 \times 2}\text{!}\) Perhaps even more surprisingly, \(AB = \begin{bmatrix}0 \amp 1 \\ 0 \amp 0\end{bmatrix} \neq 0_{2 \times 2}\text{.}\)
It is a good exercise to find the linear transformations with matrices \(A\) and \(B\text{,}\) and then try to understand geometrically what the calculations in this example mean.
Definition 4.3.19.
If \(A\) is an \(n \times n\) matrix then we define \(A^0 = I_n\text{,}\) \(A^1 = A\text{,}\) \(A^2 = AA\text{,}\) \(A^3 = AAA\text{,}\) etc.
Notice that if \(A\) is not square, that is, if it is \(m\times n\) with \(m \neq n\text{,}\) then \(AA\) does not make sense. For this reason we only define powers of \(A\) when \(A\) is square.
Example 4.3.20.
Let \(A = \begin{bmatrix}1 \amp 0 \\ 0 \amp 1\end{bmatrix}\text{,}\) \(B = \begin{bmatrix}-1 \amp 0 \\ 0 \amp -1\end{bmatrix}\text{,}\) \(C = \begin{bmatrix}-1 \amp 0 \\ 0 \amp 1\end{bmatrix}\text{,}\) and \(D = \begin{bmatrix}2 \amp 3 \\ -1 \amp -2\end{bmatrix}\text{.}\) Then
\(A^2 = B^2 = C^2 = D^2 = I_2\text{.}\)
This example shows that the equation \(X^2 = I_2\text{,}\) where \(X\) is a \(2 \times 2\) matrix, has more than two solutions. It turns out that this equation actually has infinitely many solutions! Compare this with the situation of the equation \(x^2=1\) when \(x\) is a number.
Example 4.3.21.
Even if a matrix has no entries that are \(0\text{,}\) products with that matrix can still yield the zero matrix. For example, \(\begin{bmatrix}-2 \amp 1 \\ -4 \amp 2\end{bmatrix}^2 = \begin{bmatrix}0 \amp 0 \\ 0\amp 0\end{bmatrix}\text{.}\)
The non-commutativity of matrix multiplication means that we need to be careful when doing certain algebraic manipulations. For example, if \(A\) and \(B\) are \(n \times n\) matrices, then:
\((A+B)^2 = (A+B)(A+B) = A^2 + AB + BA + B^2\text{,}\)
but because we often have \(AB \neq BA\) we cannot simplify further. In particular, there are square matrices \(A\) and \(B\) for which \((A+B)^2 \neq A^2+2AB+B^2\text{.}\)
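For instance, with the matrices \(A\) and \(B\) of Example 4.3.18 a direct computation (worth carrying out yourself) gives \((A+B)^2 = \begin{bmatrix}0 \amp 1 \\ 0 \amp 1\end{bmatrix}\text{,}\) while \(A^2+2AB+B^2 = \begin{bmatrix}0 \amp 2 \\ 0 \amp 1\end{bmatrix}\text{.}\)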
Finally, we come to the transpose. Here the correct interaction between products and transposes is not exactly what you might have hoped for, but it's also not too unpleasant.
Theorem 4.3.22.
Suppose that \(A\) is an \(m \times n\) matrix and \(B\) is an \(n \times k\) matrix. Then \((AB)^t = B^tA^t\text{.}\)
Example 4.3.23.
Let \(A = \begin{bmatrix}2 \amp 1 \\ -1 \amp 0 \\ 3 \amp 2\end{bmatrix}\) and \(B = \begin{bmatrix}1 \amp 2 \\ 3 \amp 4\end{bmatrix}\text{.}\) Then
\(AB = \begin{bmatrix}5 \amp 8 \\ -1 \amp -2 \\ 9 \amp 14\end{bmatrix}\text{,}\)
so
\((AB)^t = \begin{bmatrix}5 \amp -1 \amp 9 \\ 8 \amp -2 \amp 14\end{bmatrix}\text{.}\)
On the other hand,
\(B^tA^t = \begin{bmatrix}1 \amp 3 \\ 2 \amp 4\end{bmatrix}\begin{bmatrix}2 \amp -1 \amp 3 \\ 1 \amp 0 \amp 2\end{bmatrix} = \begin{bmatrix}5 \amp -1 \amp 9 \\ 8 \amp -2 \amp 14\end{bmatrix} = (AB)^t\text{,}\)
as Theorem 4.3.22 predicts.
In this example the product \(A^tB^t\) is not even defined. Even in examples where \(A^tB^t\) is defined, we often have \((AB)^t \neq A^tB^t\text{.}\)
Theorem 4.3.24.
Let \(A\) be any matrix. Then \(A^tA\) is a symmetric matrix.
Proof.
First, observe that if \(A\) is \(m \times n\) then \(A^t\) is \(n \times m\text{,}\) so \(A^tA\) makes sense (and is \(n \times n\)). Now to see that \(A^tA\) is symmetric, we just calculate its transpose using Theorem 4.3.22:
\((A^tA)^t = A^t(A^t)^t = A^tA\text{.}\)
Since \(A^tA\) equals its own transpose, it is symmetric.
The theorem above is mostly presented as an illustration of a proof using matrix algebra. The theorem says that \(A^tA\) is always symmetric. It is interesting to ask whether or not the converse is true, that is, whether every symmetric matrix \(B\) can be "factored" as \(B = A^tA\) for some \(A\text{.}\) The answer turns out to be "yes", as long as we allow the entries of \(A\) to be complex numbers. The proof that this "factoring" is possible will appear much later, in Corollary 6.4.7.
Exercises 4.3.4 Exercises
1.
Consider the matrices: \(A = \begin{bmatrix} 1 \amp 2 \amp 3\\ 2 \amp 1 \amp 7 \end{bmatrix},
B = \begin{bmatrix} 3 \amp -1 \amp 2 \\ -3 \amp 2 \amp 1 \end{bmatrix},
C = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 1 \end{bmatrix},
D = \begin{bmatrix} -1 \amp 2 \\ 2 \amp -3 \end{bmatrix}, \) and \(E = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.\) Find the following if possible. If it is not possible, explain why.
(a) \(-3A\) (b) \(3B - A\) (c) \(AC\) (d) \(CB\) (e) \(AE\) (f) \(EA\)
2.
Let \(A\) and \(B\) be \(n \times n\) matrices for which the system of equations \(A\mathbf{x} = \mathbf{0}\) and \(B\mathbf{x} = \mathbf{0}\) each only have the trivial solution \(\mathbf{x} = \mathbf{0}\text{.}\) Show that the system \(AB \mathbf{x} = \mathbf{0}\) has only the trivial solution.
3.
In each case either show the statement is true or give an example showing that it is false. (We assume that the matrices are of the correct size so that all matrix products which we consider make sense.)
(a) If \(A^2 = I\text{,}\) then \(A = I\text{.}\)
(b) If \(AB = A\text{,}\) then \(B = I\text{.}\)
(c) If \(A\) is square, then \((A^t)^3 = (A^3)^t\text{.}\)
(d) If \(A\) is symmetric, then \(I + A\) is symmetric.
(e) If \(AB = AC\) and \(A \ne 0\text{,}\) then \(B = C\text{.}\)
(f) If \(A \ne 0\text{,}\) then \(A^2 \ne 0\text{.}\)
(g) If \(A\) commutes with \(A+B\text{,}\) then \(A\) commutes with \(B\text{.}\)
Hint. See Hint 4.2.4.1.3 for the definition of the transpose and Hint 4.3.4.1.1 for the definition of matrix multiplication. Look back into [4], Chapter 2, Theorem 2.1.1, for some properties of matrix operations. A square matrix \(A\) is called symmetric if \(A=A^t\text{.}\) Two square matrices \(A,B\) of the same size are said to commute if \(AB=BA\text{.}\)
Answer. False. False. True. True. False. False. True.
4.
In each case either show the statement is true or give an example showing that it is false. (We assume that the matrices are of the correct size so that all matrix products which we consider make sense.)
(a) If \(B\) has a row of zeros, so also does \(AB\) for all \(A\text{.}\)
(b) If \(A\) has a row of zeros, so also does \(AB\) for all \(B\text{.}\)
(c) If \(B\) has a column of zeros, so also does \(AB\) for all \(A\text{.}\)
(d) If \(AB\) has a column of zeros for some \(A\neq 0\text{,}\) then so also does \(B\text{.}\)
(e) If \(AB\) has a row of zeros for some \(A\neq 0\text{,}\) so also does \(B\text{.}\)
Answer. False. True. True. False. False.
5.
Let \(A = \begin{bmatrix} 1 \amp 2 \\ 3 \amp 4 \end{bmatrix}\) and \(B = \begin{bmatrix} 1 \amp 2 \\ 1 \amp k \end{bmatrix}\text{.}\) Is it possible to find \(k\) such that \(AB = BA\text{?}\) If so, what should \(k \) equal?
6.
Find the matrix of the linear transformation that reflects every vector in \(\mathbb{R}^2 \) about the \(x\)-axis and then rotates by an angle of \(\frac{\pi}{4}. \)
7.
Find the matrix of the linear transformation that rotates every vector in \(\mathbb{R}^2 \) by an angle of \(\frac{\pi}{6} \) and then reflects about the \(x\)-axis, followed by a reflection about the \(y\)-axis.