What is Computer Vision AI?

Computers don't see the world the way humans do, so how do they recognize what is in images and videos?

The technology that makes this possible is called Computer Vision.

Computer Vision is a technology that allows computers to process and analyze visual data like images and videos. It’s used in many fields, from autonomous vehicles to security systems.

In this lesson, we’ll explore the concept of computer vision, review common use cases, and try a simple hands-on activity.

You don't need to fully understand all the code yet! For now, simply observe how the code runs and interacts with visual data in computer vision. Press Shift + Enter to run the code. 😊

How Computer Vision Works

For computers to understand images, they first need to convert them into digital data.

This is typically done using pixel-based processing and vectorization methods.

1. Digitization of Pixel-Based Images

To process an image, a computer first converts it into a grid of pixels, each representing a small part of the overall image.

Pixel

A pixel is the smallest unit of a digital image, and each pixel stores a color value.

Color images represent colors using RGB (Red, Green, Blue) values. In the common 8-bit format, each of the three channels ranges from 0 to 255.

Example of RGB Color Values

RGB(255, 0, 0) = Red

RGB(0, 255, 0) = Green

RGB(0, 0, 255) = Blue

RGB(255, 255, 0) = Yellow

RGB(0, 0, 0) = Black

RGB(255, 255, 255) = White
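As a rough illustration of the values above, a tiny image can be written as a grid of (R, G, B) tuples in plain Python. The names `COLORS`, `image`, and `to_hex` are made up for this sketch. Real libraries usually store the same numbers in arrays, and OpenCV in particular orders the channels as BGR rather than RGB.

```python
# Named colors from the list above, as (R, G, B) tuples in the 0-255 range.
COLORS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

# A 3-pixel-wide, 2-pixel-tall image: a list of rows, each row a list of pixels.
image = [
    [COLORS["red"], COLORS["green"], COLORS["blue"]],
    [COLORS["yellow"], COLORS["black"], COLORS["white"]],
]


def to_hex(pixel):
    """Format an (R, G, B) pixel as a web-style hex string."""
    r, g, b = pixel
    return f"#{r:02x}{g:02x}{b:02x}"


height = len(image)
width = len(image[0])
print(f"{width}x{height} image")
for row in image:
    print(" ".join(to_hex(p) for p in row))
```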

Resolution

Resolution refers to the number of pixels in an image, and higher resolution allows for more detailed image representation.

For example, an image with a resolution of 1920x1080 is 1920 pixels wide and 1080 pixels tall, for a total of 2,073,600 pixels. This lets it capture finer detail than a lower-resolution image such as 640x480.
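The arithmetic behind these numbers is easy to check. The short sketch below uses a made-up helper, `pixel_count`, and assumes 3 bytes per pixel (one byte per RGB channel) to estimate the uncompressed size.

```python
def pixel_count(width, height):
    """Total number of pixels in a width x height image."""
    return width * height


full_hd = pixel_count(1920, 1080)
vga = pixel_count(640, 480)

print(full_hd)        # 2073600
print(vga)            # 307200
print(full_hd / vga)  # 6.75, so the larger image has 6.75 times as many pixels

# Assumption: 8-bit RGB, so 3 bytes per pixel before any compression.
BYTES_PER_PIXEL = 3
print(full_hd * BYTES_PER_PIXEL)  # 6220800 bytes
```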

2. Vectorization of Object-Based Images

Vector images represent visuals using mathematical coordinates and geometric shapes like lines, curves, and polygons.

Because a vector image is redrawn from these shapes at any size, it stays sharp when resized. Images in the SVG (Scalable Vector Graphics) format are a common example.

Pixel-based images such as JPG or PNG, by contrast, have a fixed number of pixels, so enlarging them makes the individual pixels visible and the image looks blocky or blurry.
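The difference can be seen in a small sketch. Enlarging a pixel grid can only repeat the pixels that already exist, while enlarging a shape just multiplies its coordinates. The function names `upscale_nearest` and `scale_shape` are invented for this example, and the nearest-neighbor approach shown here is only one of several ways to enlarge a pixel image.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a pixel grid by repeating each pixel `factor` times in both directions."""
    out = []
    for row in pixels:
        wide_row = [p for p in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


def scale_shape(points, factor):
    """Enlarge a vector shape by multiplying every coordinate by `factor`."""
    return [(x * factor, y * factor) for x, y in points]


# A 2x2 pixel image: 1 = dark, 0 = light.
raster = [
    [1, 0],
    [0, 1],
]
big = upscale_nearest(raster, 3)
for row in big:
    print("".join("#" if p else "." for p in row))  # each pixel becomes a 3x3 block

# A triangle described by its corner coordinates.
triangle = [(0, 0), (4, 0), (2, 3)]
print(scale_shape(triangle, 3))  # [(0, 0), (12, 0), (6, 9)]
```

The enlarged grid contains no new detail, only bigger blocks, whereas the scaled triangle is still exactly three corner points that can be drawn with sharp edges.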

In computer vision, vector-like descriptions of an image, such as outlines and contours, appear in tasks like edge detection and object recognition.

Key Terminology in Computer Vision

Let's summarize the core concepts that are essential to understanding computer vision.

1. Image Preprocessing

Preprocessing prepares an image for analysis through steps such as adjusting brightness, applying filters, and removing noise (unnecessary information).
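Two of these steps can be sketched in plain Python on a small grayscale image, where each pixel is a single 0-255 value. The helpers `adjust_brightness` and `mean_filter_3x3` are written just for this example. A mean filter is only one simple way to reduce noise, and libraries such as OpenCV provide faster, more capable versions.

```python
def adjust_brightness(gray, delta):
    """Add `delta` to every pixel, clamping the result to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in gray]


def mean_filter_3x3(gray):
    """Replace each pixel with the average of its 3x3 neighborhood.

    Pixels on the border use only the neighbors that exist.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbors = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(neighbors) // len(neighbors))
        out.append(row)
    return out


# A flat gray image with one bright "noise" pixel in the middle.
gray = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 100],
]

brighter = adjust_brightness(gray, 50)
smoothed = mean_filter_3x3(gray)

print(brighter[0][0], brighter[1][1])  # 150 255 (the bright pixel is clamped)
print(smoothed[1][1])                  # 117 (the noise pixel is pulled toward its neighbors)
```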

2. Object Detection

Identifies and locates specific objects such as cars, people, and animals within an image.

Deep learning models like YOLO and Faster R-CNN are representative object detection models.
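To make "identifies and locates" concrete, the sketch below shows one plausible way a detector's results could be represented: a label, a confidence score, and a bounding box. The `Detection` class, the numbers, and the 0.5 threshold are all made up for illustration. The actual output format differs between models such as YOLO and Faster R-CNN.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # what the object is
    confidence: float  # how sure the model is, from 0.0 to 1.0
    box: tuple         # where it is: (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections, threshold=0.5):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]


def box_area(box):
    """Area of a bounding box in pixels."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


# Hypothetical values standing in for a real model's output.
detections = [
    Detection("car", 0.92, (40, 60, 200, 180)),
    Detection("person", 0.81, (220, 50, 260, 170)),
    Detection("dog", 0.30, (10, 10, 30, 25)),
]

for d in keep_confident(detections):
    print(d.label, d.confidence, box_area(d.box))
```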

3. Image Classification

Analyzes images to classify them into specific categories.

For instance, when millions of product photos must be checked for defects, an image classification model can sort out the defective items automatically, which greatly improves productivity.
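The idea of "image in, category out" can be shown with a deliberately simple toy. The rule below labels a grayscale photo as defective when its average brightness falls below a threshold, on the assumption that a scratch shows up as dark pixels. This hand-written rule, the threshold of 200, and the sample data are all invented for the example. Real classification models learn their rules from many labeled images.

```python
def mean_brightness(gray):
    """Average pixel value of a grayscale image (a list of rows)."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def classify(gray, threshold=200):
    """Toy rule: bright, uniform parts are "ok"; darker ones are "defective"."""
    return "ok" if mean_brightness(gray) >= threshold else "defective"


# Hypothetical 2x2 product photos.
photos = {
    "part_001": [[250, 250], [250, 250]],  # clean surface
    "part_002": [[250, 40], [250, 250]],   # one dark scratch pixel
}

# Sort the photos into one list per category.
sorted_parts = {"ok": [], "defective": []}
for name, gray in photos.items():
    sorted_parts[classify(gray)].append(name)

print(sorted_parts)
```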

4. Semantic Segmentation

A technique that assigns each pixel in an image to a specific category, helping the AI understand the content of the image.

For example, the result is often displayed by coloring each category differently, so that cars, roads, and pedestrians are easy to tell apart.
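A segmentation result can be pictured as a second grid, the same size as the image, holding one category number per pixel. The sketch below assumes a hypothetical three-class setup (`CLASS_NAMES`, `CLASS_COLORS`) and a hand-written mask in place of a real model's output, then colors the mask and counts the pixels in each category.

```python
# Hypothetical categories and display colors (R, G, B).
CLASS_NAMES = {0: "road", 1: "car", 2: "pedestrian"}
CLASS_COLORS = {0: (128, 128, 128), 1: (0, 0, 255), 2: (255, 0, 0)}

# One category number per pixel for a 4x3 image.
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 0, 0, 0],
]


def colorize(mask):
    """Turn a grid of category numbers into a grid of RGB colors."""
    return [[CLASS_COLORS[c] for c in row] for row in mask]


def count_pixels(mask):
    """Count how many pixels belong to each category."""
    counts = {}
    for row in mask:
        for c in row:
            name = CLASS_NAMES[c]
            counts[name] = counts.get(name, 0) + 1
    return counts


colored = colorize(mask)
print(colored[1][1])       # (255, 0, 0), the pedestrian pixel shown in red
print(count_pixels(mask))  # {'road': 7, 'car': 4, 'pedestrian': 1}
```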

5. Keypoint Detection

A technique used to find important points in an image that describe the shape or structure of an object.

For example, on a human face, it can detect key points like the eyes, nose, and mouth. These points help the AI track the position and movement of the face.
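Once keypoints are available, tracking movement comes down to comparing their coordinates between frames. In the sketch below, the keypoint names and positions are made up and stand in for what a keypoint detection model might return for two consecutive video frames.

```python
import math

# Hypothetical (x, y) keypoints for the same face in two video frames.
frame_1 = {
    "left_eye": (120, 90),
    "right_eye": (160, 90),
    "nose": (140, 115),
    "mouth": (140, 140),
}
frame_2 = {
    "left_eye": (123, 94),
    "right_eye": (163, 94),
    "nose": (143, 119),
    "mouth": (143, 144),
}


def displacement(before, after):
    """How far each keypoint moved in x and y between two frames."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in before
    }


def mean_distance(before, after):
    """Average straight-line distance moved across all keypoints."""
    return sum(math.dist(before[n], after[n]) for n in before) / len(before)


print(displacement(frame_1, frame_2)["nose"])  # (3, 4)
print(mean_distance(frame_1, frame_2))         # 5.0
```

Here every point shifts by the same amount, which suggests the whole face moved. If only the mouth points changed, that would instead suggest a change of expression.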

Essential Library for Computer Vision - OpenCV

A representative library for implementing computer vision in Python is OpenCV.

OpenCV is an open-source library, written mainly in C++ and available in Python through the cv2 module, that provides tools for image processing and object detection. It is widely used in computer vision applications and projects.

Computer vision is utilized in various fields like autonomous driving, healthcare, security, industrial automation, and augmented reality, and it continues to evolve to recognize images faster and more accurately.

In the next lesson, we will explore Natural Language Processing, which is one of the core areas of AI.

Quiz

`OpenCV` is a library frequently used in the field of computer vision.

True

False