First Steps in Machine Learning and Why Linear Algebra Matters

Following my review of core Python syntax, I have finally dived into the “Machine Learning Basics” course. This chapter covered everything from the relationship between Artificial Intelligence, Machine Learning, and Deep Learning, to one of the fundamental algorithms in supervised learning: k-NN (k-Nearest Neighbors). It also introduced matrices and vectors from linear algebra, the backbone of machine learning. Rather than just memorizing formulas, this was a great opportunity to rethink why these concepts are so powerful and efficient from a developer’s perspective.

First, we clarified the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Although these terms are often used interchangeably, they form a nested hierarchy. AI is the broadest umbrella category; machine learning sits inside it, and deep learning, which stacks multiple neural network layers, is a subset of machine learning. Modern deep learning has advanced rapidly by training these deep layers on very large datasets, discovering intricate patterns that would be impractical for humans to hardcode explicitly. As Tom Mitchell famously defined it, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. In other words, the experience is the data, the task is what the system is trying to do, and the performance measure is how we score it.

┌───────────────────────────────┐
│          Artificial           │
│         Intelligence          │
│  ┌─────────────────────────┐  │
│  │    Machine Learning     │  │
│  │  ┌───────────────────┐  │  │
│  │  │   Deep Learning   │  │  │
│  │  └───────────────────┘  │  │
│  └─────────────────────────┘  │
└───────────────────────────────┘

The lecture focused heavily on ‘Supervised Learning’, which is the most widely adopted and practical type of machine learning. The core premise of supervised learning is providing the program with both the raw data and its corresponding ‘answer’ (Label). Supervised learning is generally split into two types of problems: ‘Classification’, which predicts discrete categories—such as whether an email is spam or not—and ‘Regression’, which predicts continuous numerical values, like forecasting apartment prices based on historical data. On the other hand, ‘Unsupervised Learning’ provides no labels, leaving the program to discover the underlying patterns and structures of the data on its own.

To tackle classification, we explored k-NN (k-Nearest Neighbors), one of the most intuitive algorithms available. For instance, in a Titanic survival prediction dataset, if you are given a specific passenger’s ticket price and age, the algorithm identifies the k closest neighboring data points to that individual. If k=5, it checks the 5 nearest passengers; if the majority of them survived, the algorithm predicts that this passenger survived as well. In short, it is a highly intuitive yet powerful algorithm that relies on a “majority vote” of the surrounding data.
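The majority vote described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the course's implementation: the passenger values and labels are made up, the helper name knn_predict is hypothetical, and Euclidean distance is assumed as the distance measure. It also skips feature scaling, which matters in practice when fare and age live on different ranges.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data: each row is one passenger as [fare, age].
X_train = np.array([
    [7.25, 22.0],
    [71.28, 38.0],
    [7.92, 26.0],
    [53.10, 35.0],
    [8.05, 35.0],
    [51.86, 54.0],
    [8.46, 30.0],
])
# Made-up labels for illustration: 1 = survived, 0 = did not survive.
y_train = np.array([0, 1, 1, 1, 0, 0, 0])


def knn_predict(X, y, query, k=5):
    """Predict a label for `query` by majority vote of its k nearest rows in X."""
    # Euclidean distance from the query to every training point at once.
    distances = np.linalg.norm(X - query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbors and return the most common one.
    votes = Counter(y[nearest].tolist())
    return votes.most_common(1)[0][0]


query = np.array([60.0, 40.0])  # hypothetical passenger: fare 60, age 40
print(knn_predict(X_train, y_train, query, k=5))  # 0
print(knn_predict(X_train, y_train, query, k=3))  # 1
```

With this toy data the prediction flips between k=5 and k=3, a small reminder that k is a choice you have to tune rather than a detail.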

Calculating these ‘distances’ between data points and managing countless variables requires a solid foundation in mathematics, specifically linear algebra, calculus, and probability & statistics. In particular, when handling thousands or millions of data points, ‘Matrices’ from linear algebra are essential. Bundling the whole dataset into a single matrix not only keeps the code clean but also lays the data out in a form that computers (especially GPUs) can process with massively parallel operations instead of one element at a time. It has been a while since I studied matrices back in college, so it felt great to dust off those old memories and re-learn them in this context.
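To make the "one matrix instead of many loops" point concrete, here is a small sketch that computes the distance from one query point to every data point twice: once with plain Python loops and once as a single NumPy expression. The data is hypothetical, and Euclidean distance is an assumption on my part.

```python
import numpy as np

# Hypothetical data: 4 points with 2 features each, one point per row.
points = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
    [1.0, 1.0],
])
query = np.array([0.0, 0.0])

# Loop version: visit every point and every feature by hand.
loop_distances = []
for p in points:
    total = 0.0
    for a, b in zip(p, query):
        total += (a - b) ** 2
    loop_distances.append(total ** 0.5)

# Matrix version: subtract the query from every row at once (broadcasting),
# square, sum across each row, then take the square root.
vec_distances = np.sqrt(((points - query) ** 2).sum(axis=1))

print(np.allclose(loop_distances, vec_distances))  # True
```

Both versions give the same numbers; the second one hands the whole matrix to NumPy in one step, which is the shape of computation that parallel hardware handles well.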

However, studying vectors in this course raised a personal question: vectors can be written as either row vectors or column vectors, so why do so many machine learning textbooks and academic papers default to ‘Column Vectors’ as the standard? Digging into it out of curiosity, I found it comes down to how ‘Linear Transformations’ are conventionally written as matrix multiplication. (What does that even mean?)

In standard mathematics, when we write a function f(x), the input variable x sits on the right. To map this concept to matrix multiplication as Ax, the vector x multiplied on the right of matrix A must be vertical, that is, a column vector, for the dimensions to align: an m×n matrix times an n×1 vector gives an m×1 vector. Treating each data point as a column vector also scales cleanly: stacking several data points side by side as the columns of a matrix X lets a single product AX transform all of them at once. If you flip it and build everything around row vectors (written as xA), chained transformations read left to right, the reverse of the nested function notation f(g(x)), so the formulas no longer line up with how we write functions. Either convention works as long as it is applied consistently, but the column form keeps the weight matrix multiplications in neural network equations looking like ordinary function application.
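Here is a quick NumPy check of the two conventions. The matrices A and B are arbitrary examples I picked (a scaling and a 90-degree rotation), not something from the course. The column form reads like nested functions; the row form needs the transposed matrices in the opposite order to produce the same numbers.

```python
import numpy as np

A = np.array([[2, 0],
              [0, 3]])   # scales the first coordinate by 2, the second by 3
B = np.array([[0, -1],
              [1, 0]])   # rotates 90 degrees counterclockwise

# Column-vector convention: x has shape (2, 1) and sits on the right.
x = np.array([[1],
              [1]])
col_result = A @ (B @ x)         # like f(g(x)): apply B first, then A
print(col_result.tolist())       # [[-2], [3]]

# Row-vector convention: x has shape (1, 2) and sits on the left.
x_row = x.T
row_result = x_row @ B.T @ A.T   # same steps, transposed and in reversed order
print(row_result.tolist())       # [[-2, 3]]

# Several data points stacked as columns are transformed in one product.
X = np.array([[1, 2, 0],
              [1, 0, 5]])        # three points, one per column
print((A @ X).tolist())          # [[2, 4, 0], [3, 0, 15]]
```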

Finally, we went through a brief hands-on session with NumPy, the essential Python library for manipulating these matrices and vectors. Implementing matrix operations with native Python lists requires clunky nested loops, but NumPy lets you create matrices and perform addition or scalar multiplication in a single expression that mirrors standard algebraic notation.

import numpy as np

# Create a 2x3 matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Scalar multiplication (multiply every element by 2)
result = A * 2
print(result)
# [[ 2  4  6]
#  [ 8 10 12]]
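For comparison, here is a sketch of matrix addition done both ways, with nested loops over native lists and with NumPy. The second matrix is made up for the example.

```python
import numpy as np

A_list = [[1, 2, 3], [4, 5, 6]]
B_list = [[10, 20, 30], [40, 50, 60]]  # made-up second matrix

# Native lists: walk the rows, then the elements inside each row.
list_sum = []
for row_a, row_b in zip(A_list, B_list):
    new_row = []
    for a, b in zip(row_a, row_b):
        new_row.append(a + b)
    list_sum.append(new_row)

# NumPy: element-wise addition is just the + operator.
A = np.array(A_list)
B = np.array(B_list)
numpy_sum = A + B
print(numpy_sum)
# [[11 22 33]
#  [44 55 66]]
```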

This lesson felt like a reset before going deeper into machine learning. I am still at the beginning, but clarifying the relationship between AI, machine learning, deep learning, supervised learning, and the math underneath makes the next steps feel easier to follow.