Section 1.2 Gauss-Jordan elimination
Subsection 1.2.1 The method of elimination
In the previous section we saw examples of solving systems of linear equations using substitution. Our goal is now to study a more efficient method of solving systems of linear equations, the method of elimination.
The central observation is that there are some things we can do to the equations in a system of linear equations that can make the equations easier to work with, while not changing the solutions.
Definition 1.2.1.
Two systems of linear equations in the same variables are called equivalent if they have exactly the same solutions; that is, if a point is a solution to one system then it is a solution to the other, and if a point is not a solution to one of the systems then it is not a solution to the other.
It turns out that to change a system into an equivalent system there are only three things that we will need to do:
Switch the order in which we write two equations.
Multiply an equation by a non-zero constant.
Add a multiple of one equation to another equation, replacing the equation that we are adding into.
Fact 1.2.2.
Applying any of the three operations above to a system of linear equations produces a new system of linear equations that is equivalent to the original system.
The idea of elimination is to repeatedly use the three operations above until we reach a new system of equations that is easier to solve. The name elimination comes from the fact that our strategy is to eliminate variables from the equations so that (for example) the first variable ends up appearing only in the first equation. The best way to understand the process is through an example.
Example 1.2.3.
Consider this system of equations:
If we add \((-1)\) times the first equation to the second equation, and replace the second equation by the result, we obtain the following system:
Notice that only the second equation has changed. Now let's add \(2\) times the first equation to the third equation. We get:
We've made progress - the variable \(x\) now appears only in the first equation! To continue, let's multiply the second equation by \(-1/3\text{:}\)
Now we'll add \(-4\) times the second equation to the third equation.
Next, we eliminate \(y\) from the first equation by subtracting \(2\) times the second equation.
To clean things up a bit, we'll multiply the third equation by \(-3\text{.}\)
Now we've accomplished something really useful! Since the system we've arrived at has the same solutions as the original system, we can see from here that any solution to the original system has \(z=-10\text{.}\) We could now substitute \(z=-10\) into the first two equations, or we can continue with elimination. For the purposes of this example, let's continue with elimination. For our next move, we'll add \(2/3\) times the third equation to the second equation.
Finally, we'll subtract \(\frac{1}{3}\) times the third equation from the first.
After all of these steps we now have a system of equations that is very easy to solve - in fact, it just tells us that the solution is \(x=5, y=-6, z=-10\text{.}\) From Fact 1.2.2 we know that our new system has the same solutions as the original system, so we conclude that the only solution to our original system
is \(x=5, y=-6, z=-10\text{.}\)
Subsection 1.2.2 Elimination using matrices
The method of elimination is very powerful, but it gets quite tedious to write out all of the steps in the process. In Subsection 1.1.2 we saw that we can keep track of a system of equations as an augmented matrix, with the rows of the augmented matrix representing the equations. Augmented matrices give us a convenient way to keep track of our work when we use elimination to solve systems. Corresponding to the three operations on equations discussed above, we have:
Definition 1.2.4.
The elementary row operations on a matrix are:
Swap the order of two rows. When swapping row \(i\) with row \(j\) we often write \(R_i \leftrightarrow R_j\text{.}\)
Multiply a row by a non-zero constant. When we multiply row \(i\) by the constant \(c\) we often write \(cR_i\text{.}\)
Add a multiple of one row into another, replacing the row we are adding into. When we add \(c\) times row \(i\) into row \(j\) we often write \(R_j + cR_i\text{.}\) Instead of something like \(R_3+(-2)R_1\) we usually write \(R_3-2R_1\text{.}\)
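To make these operations concrete, here is a minimal sketch in Python (our own illustration, not notation from the text). We represent a matrix as a list of rows, and the function names swap_rows, scale_row, and add_multiple are ours.

def swap_rows(M, i, j):
    """R_i <-> R_j: swap rows i and j (0-indexed)."""
    M[i], M[j] = M[j], M[i]

def scale_row(M, i, c):
    """c R_i: multiply every entry of row i by the non-zero constant c."""
    assert c != 0, "scaling a row by 0 is not an elementary row operation"
    M[i] = [c * x for x in M[i]]

def add_multiple(M, j, c, i):
    """R_j + c R_i: add c times row i into row j, replacing row j."""
    M[j] = [x + c * y for x, y in zip(M[j], M[i])]

Notice that each operation replaces exactly one row, which mirrors how we will record row operations in the examples below.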
Definition 1.2.5.
Suppose that \(A\) and \(B\) are two matrices with the same number of rows and the same number of columns. We say that \(A\) and \(B\) are row equivalent if there is a finite sequence of elementary row operations that transforms \(A\) into \(B\text{.}\) The process of using elementary row operations to turn one matrix into another matrix is called row reduction.
Fact 1.2.6.
Suppose that \(S_1\) and \(S_2\) are two systems of linear equations, with corresponding augmented matrices \(A_1\) and \(A_2\text{.}\) The systems \(S_1\) and \(S_2\) are equivalent if and only if \(A_1\) and \(A_2\) are row-equivalent.
With these definitions in place, we now re-do Example 1.2.3 using the notation of augmented matrices.
Example 1.2.7.
Consider again this system of equations:
Here is how we write the process of solving this system using elementary row operations. The sequence of operations is exactly the same one that we used in Example 1.2.3, so the only thing that is different this time is how we record what we're doing.
The matrices in this calculation are all row-equivalent, so Fact 1.2.6 tells us that all of these augmented matrices represent systems with exactly the same set of solutions.
In particular, our original system (which corresponds to the first augmented matrix) has the same solutions as the system corresponding to the final augmented matrix. That final system is \(x=5, y=-6, z=-10\text{,}\) so once again we have found the unique solution to our original system.
Subsection 1.2.3 Gauss-Jordan elimination and reduced row echelon form
In Example 1.2.7 we used a sequence of row operations to transform the augmented matrix of a linear system into the augmented matrix of a much simpler linear system. We could have used the three elementary row operations in any order, and stopped at any time, and we still would have had a system equivalent to our original one. In this section we answer two questions: How can we recognize when we have reached the simplest possible form of a system, and what sequence of row operations will get us there? We start with the question of how to tell when we should stop doing row operations.
Definition 1.2.8.
In any row of any matrix, the first non-zero entry of the row is called the leading entry of the row.
A matrix is in row echelon form if it has all of the following properties:
If there are any rows where every entry is \(0\) then those rows occur below any rows with a non-zero entry.
In any row where not every entry is \(0\text{,}\) if we look at the leading entry then every number below it in the same column or any column to the left of it is \(0\text{.}\)
Notice that the definition of row echelon form does not require our matrix to be augmented - the definition is the same regardless of whether or not there is a vertical line separating parts of the matrix. We will often (but not always!) be interested in row echelon form for augmented matrices; when determining if an augmented matrix is in row echelon form or not you can simply ignore the vertical line.
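To see how these conditions might be tested mechanically, here is a small Python sketch (again our own illustration). It uses the equivalent condition that, going down the rows, the leading entries move strictly to the right and any zero rows come last.

def leading_index(row):
    """Column index of the leading (first non-zero) entry of a row, or None for a zero row."""
    return next((j for j, entry in enumerate(row) if entry != 0), None)

def is_ref(M):
    """Check whether M (a list of rows) is in row echelon form."""
    last = -1                  # column of the previous row's leading entry
    seen_zero_row = False
    for row in M:
        j = leading_index(row)
        if j is None:
            seen_zero_row = True
            continue
        if seen_zero_row:
            return False       # a non-zero row sits below a zero row
        if j <= last:
            return False       # this leading entry is not strictly to the right of the one above
        last = j
    return True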
Example 1.2.9.
Here are some matrices that are in row echelon form.
\(\displaystyle \matr{cc|c}{3 \amp 2 \amp 1 \\ 0 \amp -1 \amp 3 \\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0}\)
\(\displaystyle \matr{ccc|c}{1 \amp 2 \amp 3 \amp 0 \\ 0 \amp 0 \amp 1 \amp 1 \\ 0 \amp 0 \amp 0 \amp 1}\)
\(\displaystyle \matr{cc}{0 \amp 1 \\ 0 \amp 0}\)
Example 1.2.10.
Here are some matrices that are not in row echelon form.
\(\displaystyle \matr{cc|c}{5 \amp 2 \amp 1 \\ 2 \amp -1 \amp 3}\)
\(\displaystyle \matr{ccc|c}{1 \amp 2 \amp 3 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 1 }\)
\(\displaystyle \matr{cc}{1 \amp 1 \\ 1 \amp 2 \\ 0 \amp 0}\)
We will see some uses for row echelon form later on. For our current purposes it isn't quite strict enough, so we introduce the following concept.
Definition 1.2.11.
A matrix is in reduced row echelon form if it is in row echelon form and it also satisfies the following additional properties:
If a row has any non-zero entries then the leading entry of that row is \(1\text{.}\)
In any row where not every entry is \(0\text{,}\) if we look at the leading entry then every number both above and below it in the same column is \(0\text{.}\)
As with row echelon form, the definition of reduced row echelon form is the same regardless of whether or not the matrix is augmented.
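Continuing the earlier sketch (and reusing leading_index and is_ref from above), a test for reduced row echelon form only has to add the two extra conditions from Definition 1.2.11.

def is_rref(M):
    """Check whether M (a list of rows) is in reduced row echelon form."""
    if not is_ref(M):
        return False
    for i, row in enumerate(M):
        j = leading_index(row)
        if j is None:
            continue           # zero rows impose no further conditions
        if row[j] != 1:
            return False       # the leading entry must be 1
        if any(M[k][j] != 0 for k in range(len(M)) if k != i):
            return False       # every other entry in that column must be 0
    return True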
Example 1.2.12.
Here are some matrices that are in reduced row echelon form.
\(\displaystyle \matr{cc|c}{1 \amp 0 \amp -3\\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0}\)
\(\displaystyle \matr{ccc|c}{1 \amp 0 \amp 3 \amp 0 \\ 0 \amp 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 1}\)
\(\displaystyle \matr{cc}{0 \amp 1 \\ 0 \amp 0}\)
\(\displaystyle \matr{ccc}{1 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1}\)
Example 1.2.13.
Here are some matrices that are not in reduced row echelon form.
\(\displaystyle \matr{cc|c}{5 \amp 2 \amp 1 \\ 2 \amp -1 \amp 3}\)
\(\displaystyle \matr{ccc|c}{2 \amp 0 \amp 1 \amp 3 \\ 0 \amp 1 \amp 0 \amp 1 \\ 0 \amp 0 \amp 0 \amp 0}\)
\(\displaystyle \matr{ccc}{1 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1}\)
Fact 1.2.14.
If \(A\) is any matrix then there is a unique matrix \(B\) such that \(B\) is in reduced row echelon form and \(A\) is row equivalent to \(B\text{.}\) We call this matrix the reduced row echelon form of \(A\), and sometimes denote it by \(\RREF(A)\text{.}\)
Note 1.2.15.
The reduced row echelon form of a matrix \(A\) is generally not the same matrix as \(A\text{;}\) indeed, \(\RREF(A) = A\) holds only when \(A\) itself is already in reduced row echelon form. Performing row operations produces a matrix that is row equivalent to the original matrix, but in general not equal to it!
Fact 1.2.16.
Suppose that \(A\) and \(B\) are matrices. Then \(A\) is row equivalent to \(B\) if and only if \(\RREF(A) = \RREF(B)\text{.}\)
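Fact 1.2.16 also gives a practical way to test row equivalence by machine. Here is a hedged sketch using the SymPy library (its Matrix.rref method computes the reduced row echelon form; the function name row_equivalent is ours).

from sympy import Matrix

def row_equivalent(A, B):
    """Apply Fact 1.2.16: same size and the same reduced row echelon form."""
    A, B = Matrix(A), Matrix(B)
    return A.shape == B.shape and A.rref()[0] == B.rref()[0]

print(row_equivalent([[1, 1], [0, 2]], [[2, 2], [1, 3]]))  # True: both reduce to the identity
print(row_equivalent([[1, 1], [2, 2]], [[1, 0], [0, 1]]))  # False: the RREFs differ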
If we start with an augmented matrix \(A\) then \(\RREF(A)\) is the simplest form we can reach using row operations; that is, \(\RREF(A)\) represents the simplest system of linear equations that has the same solutions as the system represented by \(A\text{.}\) So when we are faced with a system of linear equations our goal should be to row reduce its augmented matrix until we reach reduced row echelon form. If we choose our row operations at random it is very unlikely that we will get to reduced row echelon form. Fortunately, there is an algorithm (the Gauss-Jordan algorithm) to help us decide which operations to use.
Since we are not planning to prove Fact 1.2.14 we will also not give an entirely precise statement of the Gauss-Jordan algorithm. Instead, we give an informal description that will be adequate for working with examples. Here is the algorithm:
Find the first column (from the left) of the matrix that has a non-zero entry in it. Make that non-zero entry into a \(1\text{,}\) and move it to the top row.
Use the \(1\) you obtained in the previous step to make the rest of that column be \(0\text{.}\)
Find the next column (further to the right from the previous one used) that has a non-zero entry in a position other than the first row. Make that entry into a \(1\text{,}\) and move it to the second row.
Use the \(1\) you obtained in the previous step to make the rest of that column be \(0\text{.}\)
Find the next column (further to the right from the previous one used) that has a non-zero entry in a position other than the first two rows. Make that entry into a \(1\text{,}\) and move it to the third row.
Use the \(1\) you obtained in the previous step to make the rest of that column be \(0\text{.}\)
Repeat this process until you cannot continue.
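The steps above translate directly into code. Here is an informal Python sketch of the Gauss-Jordan algorithm (our own illustration; exact fractions are used so that no rounding occurs):

from fractions import Fraction

def gauss_jordan(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a non-zero entry in this column
        r = next((k for k in range(pivot_row, rows) if A[k][col] != 0), None)
        if r is None:
            continue                                   # nothing to do in this column
        A[pivot_row], A[r] = A[r], A[pivot_row]        # move that entry up (a row swap)
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]   # make the leading entry a 1
        for k in range(rows):                          # make the rest of the column 0
            if k != pivot_row and A[k][col] != 0:
                c = A[k][col]
                A[k] = [x - c * y for x, y in zip(A[k], A[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return A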
Fact 1.2.17.
If we start from a matrix \(A\) and follow the Gauss-Jordan algorithm we will always end at \(\RREF(A)\text{.}\)
There are always many different sequences of row operations that take \(A\) to \(\RREF(A)\text{,}\) and sometimes you may find a sequence that is shorter than the one given by the Gauss-Jordan algorithm. The benefit of the Gauss-Jordan algorithm is simply that it provides a sequence that is guaranteed to take you from \(A\) to \(\RREF(A)\text{.}\)
Example 1.2.18.
Take a moment to verify that in Example 1.2.7 the steps we followed were exactly the steps of the Gauss-Jordan algorithm.
Example 1.2.19.
Let \(A = \matr{ccc|c}{0 \amp 1 \amp 1 \amp 0 \\ 2 \amp 0 \amp 4 \amp 2}\text{.}\) To find \(\RREF(A)\) we can follow the Gauss-Jordan algorithm.
We've arrived at a matrix in reduced row echelon form, so we stop.
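If you would like to check a computation like this by machine, a computer algebra system will do it. For instance, assuming the SymPy library is available:

from sympy import Matrix

A = Matrix([[0, 1, 1, 0],
            [2, 0, 4, 2]])
R, pivot_columns = A.rref()
print(R)               # Matrix([[1, 0, 2, 1], [0, 1, 1, 0]])
print(pivot_columns)   # (0, 1)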
Example 1.2.20.
Let \(B = \matr{ccc}{1 \amp 1 \amp 1 \\ 1 \amp 1 \amp 3 \\ 0 \amp 0 \amp 2}\text{.}\) To find \(\RREF(B)\) we follow Gauss-Jordan.
Therefore \(\RREF(B) = \matr{ccc}{1 \amp 1 \amp 0 \\ 0 \amp 0 \amp 1 \\ 0 \amp 0 \amp 0}\text{.}\)
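As a check, the gauss_jordan sketch from earlier in this section reproduces this answer:

B = [[1, 1, 1],
     [1, 1, 3],
     [0, 0, 2]]
print(gauss_jordan(B))   # rows [1, 1, 0], [0, 0, 1], [0, 0, 0] (as exact fractions), matching RREF(B)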
Exercises 1.2.4 Exercises
1.
Find the solution of each of the following systems of linear equations using the Gauss-Jordan algorithm and augmented matrices.
\(\displaystyle x =- 2, y = -1 \)
\(\displaystyle x = \frac{1}{5}, y = -\frac{2}{5} \)
\(\displaystyle x = \frac{2}{17}, y = \frac{7}{17}\)
\(\displaystyle x = -17, y = 13 \)
First, we write the augmented matrix for the system of linear equations.
\begin{align*} \matr{cc|c}{1\amp -3\amp 1\\ 2 \amp -7 \amp 3} \end{align*}
There is already a 1 in the top left entry, so we use that 1 to make the rest of the first column become 0 by subtracting 2 times row 1 from row 2. We obtain
\begin{align*} \matr{cc|c}{1\amp -3\amp 1\\ 0\amp -1 \amp 1} \end{align*}
Now we move to the second column. We want a 1 in the second row of the second column, so we multiply row 2 by \((-1)\) and obtain
\begin{align*} \matr{cc|c}{1\amp -3\amp 1\\ 0\amp 1 \amp -1} \end{align*}
Now we use the 1 in row 2, column 2 to make the rest of column 2 be all 0. To do this, we add 3 times row 2 to row 1, and get
\begin{align*} \matr{cc|c}{1\amp 0\amp -2\\ 0\amp 1 \amp -1} \end{align*}
At this point we see that the solution to the system is \(x = -2, y = -1 \text{.}\) For the remaining parts of this question we will show the row operations in the same manner as we did in Example 1.2.20.
\begin{align*} \matr{cc|c}{1\amp -2\amp 1\\ 3 \amp 4 \amp -1} \amp \to_{R_2 - 3R_1}\matr{cc|c}{1 \amp -2 \amp 1 \\ 0 \amp 10 \amp -4}\\ \amp \to_{(1/10)R_2}\matr{cc|c}{1 \amp -2 \amp 1 \\ 0 \amp 1 \amp -4/10}\\ \amp \to_{R_1+2R_2}\matr{cc|c}{1 \amp 0 \amp 2/10 \\ 0 \amp 1 \amp -4/10} \end{align*}
Translating back to equations, we find that the solution to the system is \(x = \frac{1}{5}, y = -\frac{2}{5} \text{.}\)
The augmented matrix of the next system is given by
\begin{align*} \matr{cc|c}{2\amp -3\amp -1\\ 3 \amp 4 \amp 2}. \end{align*}
Now we follow the Gauss-Jordan algorithm.
\begin{align*} \matr{cc|c}{2\amp -3\amp -1\\ 3 \amp 4 \amp 2} \amp \to_{(1/2)R_1}\matr{cc|c}{1 \amp -3/2 \amp -1/2 \\ 3 \amp 4 \amp 2}\\ \amp \to_{R_2-3R_1}\matr{cc|c}{1 \amp -3/2 \amp -1/2 \\ 0 \amp 17/2 \amp 7/2}\\ \amp \to_{(2/17)R_2}\matr{cc|c}{1 \amp -3/2 \amp -1/2 \\ 0 \amp 1 \amp 7/17}\\ \amp \to_{R_1+(3/2)R_2}\matr{cc|c}{1 \amp 0 \amp 2/17 \\ 0 \amp 1 \amp 7/17} \end{align*}
Therefore, the solution to the system is \(x = \frac{2}{17} , y = \frac{7}{17} \text{.}\)
The augmented matrix of the final system is given by
\begin{align*} \matr{cc|c}{3\amp 4\amp 1\\ 4 \amp 5 \amp -3}. \end{align*}
Now we follow Gauss-Jordan:
\begin{align*} \matr{cc|c}{3\amp 4\amp 1\\ 4 \amp 5 \amp -3} \amp \to_{(1/3)R_1} \matr{cc|c}{1 \amp 4/3 \amp 1/3 \\ 4 \amp 5 \amp -3}\\ \amp \to_{R_2-4R_1}\matr{cc|c}{1 \amp 4/3 \amp 1/3 \\ 0 \amp -1/3 \amp -13/3}\\ \amp \to_{-3R_2}\matr{cc|c}{1 \amp 4/3 \amp 1/3 \\ 0 \amp 1 \amp 13}\\ \amp \to_{R_1-(4/3)R_2}\matr{cc|c}{1 \amp 0 \amp -17 \\ 0 \amp 1 \amp 13} \end{align*}
Therefore, the solution to the system is \(x = -17 , y = 13 \text{.}\)
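As an aside, the four answers above can be double-checked by machine. Here is a sketch assuming the SymPy library is available; since each system has a unique solution, the last column of the RREF of the augmented matrix is that solution.

from sympy import Matrix

augmented_matrices = [
    [[1, -3, 1], [2, -7, 3]],
    [[1, -2, 1], [3, 4, -1]],
    [[2, -3, -1], [3, 4, 2]],
    [[3, 4, 1], [4, 5, -3]],
]
for aug in augmented_matrices:
    print(Matrix(aug).rref()[0][:, -1].T)
# prints Matrix([[-2, -1]]), Matrix([[1/5, -2/5]]), Matrix([[2/17, 7/17]]), Matrix([[-17, 13]])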
2.
We will see in Chapter 2 that a single linear equation in three variables represents a plane in three-dimensional space. Do the three planes, \(x+y-3z=2\text{,}\) \(2x+y+z=1\text{,}\) and \(3x + 2y-2z =0\) have a common point of intersection? If so, find one and if not, tell why there is no such point. Note that a point of intersection is a point that satisfies all three equations.
3.
Find a quadratic \(a +bx +cx^2\) such that the graph of \(y = a + bx + cx^2\) contains the points \((-1,6)\text{,}\) \((2,0)\text{,}\) and \((3,2)\text{.}\)
4.
Which of the following matrices are in reduced row-echelon form? Which are in row-echelon form?
\(\displaystyle \begin{bmatrix} 1\amp -1 \amp 2\\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \end{bmatrix}\)
\(\displaystyle \begin{bmatrix} 1 \amp 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 1 \amp 3 \\ 0 \amp 1 \amp 4 \amp -2 \end{bmatrix}\)
\(\displaystyle \begin{bmatrix}1\amp -2 \amp 3 \amp 0 \\ 0 \amp 0 \amp 0 \amp 1 \end{bmatrix}\)
\(\displaystyle \begin{bmatrix} 1\amp 0 \amp 0 \amp 3 \amp 1 \\ 0 \amp 0 \amp 0 \amp 1 \amp 1\\ 0 \amp 0 \amp 0 \amp 0 \amp 1 \end{bmatrix}\)
\(\displaystyle \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1\end{bmatrix}\)
\(\displaystyle \begin{bmatrix} 0 \amp 0 \amp 1 \\ 0 \amp 0 \amp 1 \\ 0 \amp 0 \amp 1 \end{bmatrix}\)
Neither in REF nor in RREF. Neither in REF nor in RREF. In RREF. In REF, but not in RREF. In REF, but not in RREF. Neither in REF nor in RREF.
The first matrix is not in REF: the zero row is not at the bottom.
The second matrix is not in REF: the leading entry of the second row has a non-zero entry below and to the left of it.
The third matrix is in RREF as it satisfies all of the requirements of Definition 1.2.11.
The fourth matrix is in REF, but not in RREF: the fourth column, which contains a leading \(1\) in the second row, has another non-zero entry (\(3\)).
The fifth matrix is in REF, but not in RREF: the second column, which contains a leading \(1\) in the second row, has another non-zero entry (\(1\)).
The sixth matrix is not in REF: the leading entry of the second row (\(1\) in the \((2, 3)\)-entry) is not strictly to the right of the leading entry of the first row (\(1\) in the \((1, 3)\)-entry). In particular, the matrix is not in RREF.
5.
Give two distinct echelon form versions of this matrix.
6.
Row reduce the following matrix to obtain a row-echelon form. Then continue to obtain the reduced row-echelon form.
7.
List the reduced row-echelon forms possible for each size.
\(\displaystyle 2 \times 2\)
\(\displaystyle 2 \times 3\)
\(\displaystyle 3 \times 2\)
\(\displaystyle 3 \times 3\)
8.
Prove that each of the three row operations is reversible. More precisely, if the matrix \(B\) is obtained from \(A\) by application of a single row operation, show that there is a single row operation that will transform \(B\) back into \(A\text{.}\)
The three operations, stated precisely, are:
Swap the locations of row \(i\) with row \(j\text{:}\) \(R_{i} \leftrightarrow R_{j}\text{.}\)
Multiply each entry of a single row \(i\) by a nonzero quantity \(\alpha\text{:}\) \(\alpha R_{i}\text{.}\)
Multiply each entry of one row \(R_{i}\) by some quantity \(\alpha\text{,}\) and add these values to the entries in the same columns of a second row \(R_{j}\) (where \(i\neq j\)). Leave the first row the same after this operation (i.e. \(R_{i}\) remains unchanged), but replace the second row by the new values (i.e. row \(j\) is new): \(R_{j} + \alpha R_{i}\text{.}\)
9.
Prove Fact 1.2.16. That is, show that \(A\) is row equivalent to \(B\) if and only if \(\RREF(A) = \RREF(B)\text{.}\)
Suppose that \(A\) is row equivalent to \(B\text{.}\) This means that there is a sequence of row operations that will take us from \(A\) to \(B\text{.}\) Another sequence of row operations (such as the ones from the Gauss-Jordan algorithm) will take us from \(B\) to \(\RREF(B)\text{.}\) Put together, this gives us a sequence starting at \(A\) and ending at \(\RREF(B)\text{,}\) so \(A\) is row equivalent to \(\RREF(B)\text{.}\) Now \(\RREF(B)\) is in reduced row echelon form, and Fact 1.2.14 says that the only matrix row equivalent to \(A\) that is in reduced row echelon form is \(\RREF(A)\text{,}\) so we must have \(\RREF(A) = \RREF(B)\text{.}\)
For the other direction, suppose that \(\RREF(A) = \RREF(B)\text{.}\) We have a sequence of row operations taking \(A\) to \(\RREF(A) = \RREF(B)\text{.}\) We have another sequence taking us from \(B\) to \(\RREF(B)\text{,}\) and Exercise 1.2.4.8 tells us that we can reverse each of these operations, which means that we can get a sequence taking us from \(\RREF(B)\) back to \(B\text{.}\) Put together this gives us a sequence from \(A\) to \(B\text{,}\) so \(A\) is row equivalent to \(B\text{.}\)