
Matrix Operations and NumPy Basics

A summary of matrix addition, scalar multiplication, matrix multiplication, element-wise multiplication, and NumPy operations

I am continuing with the Machine Learning Basics course. In the previous post, I looked at the big picture of what machine learning is and why linear algebra is needed. This time, I worked directly through matrix operations. I studied matrices in university but remember little of it, so I approached this part as if I were learning the concepts from scratch.

The first topic was matrix addition. To add two matrices, their sizes must be exactly the same. For example, a 2 × 2 matrix and another 2 × 2 matrix can be added, but a 2 × 2 matrix and a 2 × 3 matrix cannot. In other words, both the number of rows and the number of columns must match so that elements in the same positions can be added together.
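As a quick check of this rule, here is a minimal sketch in NumPy. The matrices and variable names are my own examples, not from the course. Note that NumPy also supports broadcasting, which accepts some shape combinations beyond the strict mathematical rule, but a 2 × 2 and a 2 × 3 matrix is not one of them.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])        # shape (2, 2)
B = np.array([[10, 20], [30, 40]])    # shape (2, 2)
C = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# Same shape: elements in the same positions are added.
total = A + B
print(total)
# [[11 22]
#  [33 44]]

# Different shapes: NumPy refuses and raises ValueError.
try:
    A + C
    addition_failed = False
except ValueError as err:
    addition_failed = True
    print("Cannot add:", err)
```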

Next, I learned scalar multiplication. Here, a scalar means a single number, not a vector or a matrix. When a matrix is multiplied by the scalar 2, each element inside the matrix is multiplied by 2.

2 × [1 2]  =  [2 4]
    [3 4]     [6 8]

The concept itself is simple, but it is a basic operation used when adjusting the values of an entire matrix by a certain ratio.
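In NumPy, this can be written by multiplying an array by a plain number. A small sketch, with variable names I chose for illustration:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Every element of M is multiplied by the scalar 2.
doubled = 2 * M
print(doubled)
# [[2 4]
#  [6 8]]
```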

The most important topic in this part was matrix multiplication. Unlike matrix addition, matrix multiplication is not possible just because the two matrices have the same size. Given two matrices A and B, the following shape condition must be satisfied.

Size of A: m × n
Size of B: n × p

In other words, the number of columns in the first matrix A must be the same as the number of rows in the second matrix B.

The resulting matrix follows the number of rows from the first matrix and the number of columns from the second matrix.

(m × n) × (n × p) = (m × p)

For example, if A is a 2 × 3 matrix and B is a 3 × 4 matrix, then multiplying A and B is possible, and the result is a 2 × 4 matrix.

(2 × 3) × (3 × 4) = (2 × 4)
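The shape rule can be sketched as a small helper. The function `matmul_shape` is a hypothetical name I made up for illustration; it is not part of NumPy. The last lines compare it against what NumPy actually produces for arrays of those shapes.

```python
import numpy as np

def matmul_shape(shape_a, shape_b):
    """Return the shape of A @ B for two 2-D shapes, or raise ValueError."""
    m, n = shape_a
    n_rows_b, p = shape_b
    if n != n_rows_b:
        raise ValueError(
            f"columns of A ({n}) must equal rows of B ({n_rows_b})"
        )
    return (m, p)

expected = matmul_shape((2, 3), (3, 4))
print(expected)  # (2, 4)

# Cross-check with NumPy using placeholder arrays of zeros.
actual = (np.zeros((2, 3)) @ np.zeros((3, 4))).shape
print(actual == expected)  # True
```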

The order also matters in matrix multiplication. When multiplying ordinary numbers, 2 × 3 and 3 × 2 produce the same result, but this is not generally true for matrices.

Suppose we have the following two matrices, A and B.

A = [1 2 3]
    [4 5 6]

Size of A: 2 × 3

B = [ 1  2  3  4]
    [ 5  6  7  8]
    [ 9 10 11 12]

Size of B: 3 × 4

First, if we calculate AB, it looks like this.

AB =

[1 2 3]   [ 1  2  3  4]
[4 5 6] × [ 5  6  7  8]
          [ 9 10 11 12]

In this case, the number of columns in the first matrix A is 3, and the number of rows in the second matrix B is also 3.

Number of columns in A: 3
Number of rows in B: 3

So each row of A can be paired with each column of B.

For example, the first value in the resulting matrix is calculated by multiplying the first row of A with the first column of B and then adding the results.

First row of A: [1 2 3]
First column of B: [1]
                   [5]
                   [9]

Calculation:
1×1 + 2×5 + 3×9
= 1 + 10 + 27
= 38
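The same single entry can be reproduced in NumPy by taking the first row of A and the first column of B. This is only a sketch to confirm the arithmetic above; the variable names are my own.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

first_row_of_a = A[0]        # [1 2 3]
first_col_of_b = B[:, 0]     # [1 5 9]

# Multiply matching elements and add them up: 1*1 + 2*5 + 3*9
top_left = int(np.dot(first_row_of_a, first_col_of_b))
print(top_left)  # 38
```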

Calculating all values in the same way gives the following result.

AB =

[1 2 3]   [ 1  2  3  4]
[4 5 6] × [ 5  6  7  8]
          [ 9 10 11 12]

=

[ 38  44  50  56]
[ 83  98 113 128]

So the result of AB is a 2 × 4 matrix.

(2 × 3) × (3 × 4) = (2 × 4)
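To convince myself that every entry follows the same row-times-column rule, the whole product can be written out with explicit loops and compared with NumPy's `@` operator. This is a teaching sketch, not how one would normally compute a product in practice.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

rows, inner = A.shape   # 2, 3
_, cols = B.shape       # 4

# Entry (i, j) is row i of A paired with column j of B.
manual = np.zeros((rows, cols), dtype=A.dtype)
for i in range(rows):
    for j in range(cols):
        manual[i, j] = sum(A[i, k] * B[k, j] for k in range(inner))

print(manual)
# [[ 38  44  50  56]
#  [ 83  98 113 128]]

print(np.array_equal(manual, A @ B))  # True
```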

On the other hand, if we try to calculate BA, it has the following form.

BA =

[ 1  2  3  4]   [1 2 3]
[ 5  6  7  8] × [4 5 6]
[ 9 10 11 12]

Here, the number of columns in the first matrix B is 4, while the number of rows in the second matrix A is 2.

Number of columns in B: 4
Number of rows in A: 2

Matrix multiplication works by pairing a row from the first matrix with a column from the second matrix. However, in BA, a row of B has 4 numbers, while a column of A has only 2 numbers.

First row of B: [1 2 3 4]

First column of A: [1]
                   [4]

If we try to multiply and add these values, the number of elements does not match.

1×1 + 2×4 + 3×? + 4×?

There are no matching values for the last two numbers, so the calculation cannot continue.

Therefore, BA cannot be calculated.

(3 × 4) × (2 × 3)

Number of columns in the first matrix B: 4
Number of rows in the second matrix A: 2

→ 4 and 2 do not match, so the multiplication is not possible

So even with the same A and B, AB may be defined while BA is not. Because matrix multiplication requires the number of columns in the first matrix to match the number of rows in the second, the order of multiplication determines whether the operation is possible and what shape the result has. Even when both AB and BA can be calculated, the two results are generally different.
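Both effects of ordering can be checked in NumPy. In this sketch, A and B are the matrices from the example above, while P and Q are two extra 2 × 2 matrices that I picked for illustration, so that both orders are defined.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])               # 2 × 3
B = np.arange(1, 13).reshape(3, 4)      # 3 × 4, values 1..12

# BA: columns of B (4) do not match rows of A (2), so NumPy raises ValueError.
try:
    B @ A
    ba_defined = True
except ValueError as err:
    ba_defined = False
    print("BA is not defined:", err)

# For square matrices both orders are defined, but the results differ.
P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])
print(P @ Q)
# [[19 22]
#  [43 50]]
print(Q @ P)
# [[23 34]
#  [31 46]]
```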

Another concept I learned was element-wise multiplication. As the name suggests, this operation multiplies elements in the same positions.

For example, if matrices A and B are given as follows,

A = [1 2]
    [3 4]

B = [5 6]
    [7 8]

the result of element-wise multiplication is:

[1*5  2*6]   =   [ 5  12]
[3*7  4*8]       [21  32]

Since this operation multiplies elements in the same positions, the two matrices must have the same size, just like matrix addition.

In the later part of the lesson, I also practiced using NumPy. In NumPy, the operator you choose determines which kind of multiplication is performed.

A * B

The code above means element-wise multiplication.

On the other hand, the code below means matrix multiplication.

A @ B

So in NumPy, * and @ both look like multiplication operators, but they represent different operations.

A simple example looks like this.

import numpy as np

A = np.array([
    [1, 2],
    [3, 4]
])

B = np.array([
    [5, 6],
    [7, 8]
])

print(A * B)
# [[ 5 12]
#  [21 32]]

print(A @ B)
# [[19 22]
#  [43 50]]

A * B is the result of multiplying elements in the same positions, while A @ B is calculated according to the rules of matrix multiplication. When reading or writing NumPy code, I will need to clearly distinguish between these two operations.

In this lesson, I learned that it is important not only to understand the matrix operations themselves, but also to distinguish under what conditions each operation is possible. Matrix addition and element-wise multiplication require the two matrices to have the same size, while matrix multiplication requires the number of columns in the first matrix to match the number of rows in the second matrix.
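The difference in conditions shows up clearly with non-square matrices. In this sketch, which reuses the 2 × 3 and 3 × 4 matrices from the earlier example, `@` succeeds because the inner sizes match, while `*` fails because the two shapes are not the same (and cannot be broadcast together).

```python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)     # [[1 2 3], [4 5 6]]
B = np.arange(1, 13).reshape(3, 4)    # values 1..12 in 3 rows

# Matrix multiplication: columns of A (3) match rows of B (3).
product = A @ B
print(product.shape)  # (2, 4)

# Element-wise multiplication: shapes (2, 3) and (3, 4) differ.
try:
    A * B
    elementwise_ok = True
except ValueError:
    elementwise_ok = False
print(elementwise_ok)  # False
```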

I am still at the stage of learning basic operations, but since machine learning often represents data as vectors and matrices, these seem like concepts that will continue to appear later.