How Computers See: Part I

Computer Science and Artificial Intelligence

info.viefield@gmail.com

Summary

Computers don’t see like humans; they translate light into numbers, detect patterns, and learn through neural networks. From pixels to perception, machine vision interprets the world, enabling tasks from self-driving cars to medical analysis.

The Illusion of Vision

When you look at the world, seeing feels instant. Your ability to recognize shapes, colors, and motion seems effortless and hardwired.

Humans and countless other life forms share this impressive ability.

Computers, on the other hand, don’t have it so easy. They don’t have eyes and lack the biological machinery we use to make sense of visual scenes.

Yet, somehow, they see!

They navigate roads, sort objects on factory lines, analyze medical scans, and even unlock your phone through facial recognition.

So, what’s really going on here?
How are computers able to see?

This explainer unpacks how computers use mathematical principles to simulate the biological process of “seeing”.

Step One: Turning Light into Numbers

First, every digital image starts as a grid of tiny squares called pixels. Each pixel’s value comes from the readings of light-sensitive sensor elements in a device (e.g., a digital camera), which output numbers based on the intensity of the light reaching them at the moment of capture.
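As a rough sketch of that last step, assume an 8-bit sensor whose readings have already been normalized to the range 0.0 (dark) to 1.0 (saturated). Turning light into a pixel value is then just clipping, scaling, and rounding. The `quantize` helper is hypothetical; real cameras also apply steps such as gamma correction and color processing, which are skipped here.

```python
def quantize(intensity: float, bits: int = 8) -> int:
    """Map a normalized light reading (0.0 = dark, 1.0 = saturated) to an integer pixel level."""
    max_level = 2 ** bits - 1                    # 255 for an 8-bit sensor
    clamped = min(max(intensity, 0.0), 1.0)      # readings beyond the sensor's range are clipped
    return round(clamped * max_level)            # scale, then round to a whole level


readings = [0.0, 0.25, 0.5, 1.0, 1.3]
print([quantize(r) for r in readings])           # [0, 64, 128, 255, 255]
```

Note that the overexposed reading (1.3) ends up at the same value as a fully saturated one: once light exceeds what the sensor can measure, the extra detail is simply lost.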

A computer doesn’t perceive cats, trees, and faces like we do. It sees numbers and the patterns they make.

Here is a simple case study:

From left to right: Interpreting an image from pixels to numbers

The image above shows how every pixel holds its own value. Depending on the sensor, these values can represent a pixel’s brightness, color, depth, or even infrared intensity.

For the computer, an image is simply a matrix of values. Imagine a spreadsheet with thousands or millions of cells.

That’s what it sees!

When you show a computer a photograph, its “first impression” is basically:

“Here is a 1920 × 1080 array of numbers!”
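To make the “spreadsheet” picture concrete, here is a minimal plain-Python sketch: a hypothetical 4 × 4 grayscale image stored as a list of rows, plus the arithmetic for a full-HD color photo. It assumes the common layout of three values (red, green, blue) per pixel, so a 1920 × 1080 photo holds about 6.2 million numbers rather than 2 million.

```python
# A hypothetical 4 x 4 grayscale image: each number is one pixel's brightness (0 = black, 255 = white).
tiny_image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [30,  30,  200, 200],
    [30,  30,  200, 200],
]
height = len(tiny_image)        # number of rows
width = len(tiny_image[0])      # number of columns
print(f"{width} x {height} grid, top-right pixel = {tiny_image[0][width - 1]}")


def count_values(width: int, height: int, channels: int = 3) -> int:
    """How many numbers an image holds, assuming `channels` values per pixel (3 for RGB)."""
    return width * height * channels


print(count_values(1920, 1080))               # 6220800 numbers for a full-HD RGB photo
print(count_values(1920, 1080, channels=1))   # 2073600 for a grayscale one
```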

And what does it do with all those numbers?

Well, that’s entirely up to the instructions it’s programmed to execute.

Step Two: Extracting Patterns from Chaos

To make good use of these raw pixel values, computers search for patterns.

This starts with simple mathematical guesses:

Edges → Are there sharp changes in pixel values?
Corners → Where do “edges” meet?
Textures → Are there repeated patterns of pixel values?
Color blobs → Are there areas with similar pixel values?
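The “color blobs” question can be sketched with connected-component labeling: group neighboring pixels that pass the same test, then count the groups. This minimal version assumes a grayscale image and uses a single brightness threshold as a stand-in for “similar pixel values”; the function name and threshold are illustrative choices, not a standard API.

```python
from collections import deque


def count_blobs(image, threshold):
    """Count connected regions of pixels brighter than `threshold` (up/down/left/right neighbors)."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] > threshold and not seen[r][c]:
                blobs += 1                      # found a new, unvisited bright region
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:                    # flood-fill the whole region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs


image = [
    [200, 210, 10,  10,  10],
    [205, 220, 10,  10,  190],
    [10,  10,  10,  180, 185],
    [10,  10,  10,  10,  10],
]
print(count_blobs(image, threshold=128))  # 2: one bright patch top-left, one on the right
```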

These guesses are a computer’s first attempt at making sense of our world.

Try squinting your eyes until the objects around you look blurred.

Can you see that the fine details of these objects disappear, and all you can make out are the shapes of their outlines?

That’s similar to what computers do at this stage.

Edge filter example: horizontal changes and vertical changes
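An edge filter like the one pictured can be sketched in plain Python with the classic Sobel kernels: one responds to horizontal changes in brightness (left to right), the other to vertical changes (top to bottom). This is a minimal sketch, assuming a grayscale image and no padding at the borders. The figure does not say which filter it used, so Sobel is an assumption here.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]   # horizontal changes: left-to-right brightness differences
SOBEL_Y = [[-1, -2, -1],
           [0,  0,  0],
           [1,  2,  1]]  # vertical changes: top-to-bottom brightness differences


def filter2d(image, kernel):
    """Slide the kernel over the image (no padding) and return the weighted sum at each position.
    Strictly this is cross-correlation, which most vision libraries and CNNs call "convolution"."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


# A dark left half and a bright right half: one vertical edge down the middle.
image = [[0, 0, 100, 100] for _ in range(4)]
print(filter2d(image, SOBEL_X))  # [[400, 400], [400, 400]]: strong horizontal change
print(filter2d(image, SOBEL_Y))  # [[0, 0], [0, 0]]: no vertical change at all
```

The horizontal-change filter fires all along the boundary, while the vertical-change filter stays at zero because every row is identical.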

Step Three: Learning What Matters (Convolutional Thinking)

Modern computer vision goes beyond rigid, pre-defined rules. Instead of being told exactly which patterns to look for, computers learn what matters from examples.

This learning often happens in the layers of a Convolutional Neural Network (CNN).

You can think of this as a cascade of filters, with each layer responding to more complex patterns than the one before. In a typical image classifier, the progression looks roughly like this:

1st layer: tiny edges, color streaks
2nd layer: corners, curves
3rd layer: textures, simple shapes
4th layer: object parts like wheels, eyes, leaves
Final layers: complete objects like cars, faces, and dogs
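One reason deeper layers can respond to larger structures is the receptive field: the patch of input pixels that can influence a single output value. The sketch below applies the standard recurrence for stacked layers, assuming each layer is described only by a kernel size and a stride. The layer stacks in the example are hypothetical, not taken from any particular network.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels, per side) after each layer.
    `layers` is a list of (kernel_size, stride) pairs, applied in order."""
    field = 1  # before any layer, one value "sees" exactly one input pixel
    jump = 1   # distance, in input pixels, between neighboring values at the current layer
    sizes = []
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # the kernel reaches (kernel - 1) extra steps of size `jump`
        jump *= stride                # striding spreads later values further apart
        sizes.append(field)
    return sizes


print(receptive_fields([(3, 1)] * 5))           # [3, 5, 7, 9, 11]
print(receptive_fields([(3, 1), (2, 2)] * 3))   # [3, 4, 8, 10, 18, 22]
```

Five stacked 3 × 3 layers already let one value depend on an 11 × 11 region of the input, and interleaving stride-2 downsampling (as pooling layers do) makes the region grow much faster.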

The computer builds up visual understanding piece by piece, much as children learn to identify objects by noticing recurring shapes.

This is the essence of modern computer vision:

Learning from examples instead of relying solely on hard-coded instructions.
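Here is a toy version of “learning what to look for”. To keep it short, it assumes a one-dimensional row of pixels and a filter of just three numbers. Instead of hard-coding an edge detector, the sketch starts with an all-zero filter and adjusts it by gradient descent until its outputs match examples produced by a hidden “teacher” filter. All names and settings (`TRUE_FILTER`, the learning rate, the step count) are illustrative; a real CNN learns many two-dimensional filters at once through backpropagation on labeled images.

```python
import random

TRUE_FILTER = [-1.0, 0.0, 1.0]  # hypothetical hidden "teacher": a simple left-to-right edge detector


def apply_filter(row, weights):
    """Slide a filter along one row of pixels (no padding)."""
    n = len(weights)
    return [sum(w * row[i + k] for k, w in enumerate(weights))
            for i in range(len(row) - n + 1)]


def make_examples(count, length, seed=0):
    """Random pixel rows (brightness scaled to 0..1) paired with the teacher filter's output."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        row = [rng.random() for _ in range(length)]
        examples.append((row, apply_filter(row, TRUE_FILTER)))
    return examples


def learn_filter(examples, steps=300, lr=0.5):
    """Fit three filter weights by plain gradient descent on mean squared error."""
    weights = [0.0, 0.0, 0.0]  # start with a filter that detects nothing
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        n_windows = 0
        for row, target in examples:
            for i, (pred, y) in enumerate(zip(apply_filter(row, weights), target)):
                err = pred - y
                for k in range(3):
                    grads[k] += 2 * err * row[i + k]  # derivative of err**2 with respect to weight k
                n_windows += 1
        weights = [w - lr * g / n_windows for w, g in zip(weights, grads)]
    return weights


learned = learn_filter(make_examples(count=20, length=10))
print([round(w, 3) for w in learned])  # close to TRUE_FILTER
```

The learned weights end up very close to `[-1, 0, 1]`, even though that filter never appears inside `learn_filter`: it is recovered from the examples alone.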

Beyond CNNs: A New Way to See (Vision Transformers)

More recently, a new architecture emerged: the Vision Transformer (ViT), introduced in 2020.

Instead of scanning images with small filters, a ViT cuts the image into patches, a bit like slicing a picture into puzzle pieces.
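The patch-cutting step can be sketched in plain Python. The example assumes a grayscale image whose sides are exact multiples of the patch size and flattens each patch into a single list of numbers, roughly the form a ViT starts from. For scale, the original ViT used 16 × 16 patches, so a 224 × 224 image becomes 14 × 14 = 196 patches. A real ViT then multiplies each flattened patch by a learned matrix and adds a position embedding, which this sketch leaves out.

```python
def to_patches(image, patch_size):
    """Cut an H x W image into non-overlapping patch_size x patch_size pieces.
    Each piece is flattened row by row; pieces are ordered left to right, top to bottom."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 image, pixels numbered 0..15
patches = to_patches(image, 2)
print(len(patches))   # 4
print(patches[0])     # [0, 1, 4, 5]: the top-left 2 x 2 piece
```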

It then turns each patch into a list of numbers (a vector) and processes the sequence of patches much as a language model processes a sequence of words.

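Using a mechanism called self-attention, each patch can be compared with every other patch, letting the model capture relationships across the entire image. Given enough training data, this can lead to better recognition, especially in complex scenes.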
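Paying attention “across the entire image” is what self-attention does. Below is a minimal plain-Python sketch of scaled dot-product self-attention under a simplifying assumption: each patch’s vector serves directly as its query, key, and value. A real ViT first multiplies the vectors by learned query, key, and value matrices and runs several attention heads in parallel. The patch vectors in the example are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = the input vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Compare this patch with every patch, near or far, via a scaled dot product.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        all_weights.append(weights)
        # The new representation is a weighted mix of all patch vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs, all_weights


patch_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # first and last patches look alike
outputs, attn = self_attention(patch_vectors)
print([round(w, 3) for w in attn[0]])  # roughly [0.401, 0.198, 0.401]
```

The first and last patches have identical vectors, so the first patch gives them equal weight and more weight than the middle patch. Nothing in the computation depends on how far apart the patches sit in the image, which is what makes attention global.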

It’s a different way of “seeing” that’s more global and less rigid.
