Module 1: Introduction to Computer Vision
1. What is Computer Vision?
Definition: Computer Vision is the field of study focused on enabling machines to process and interpret visual information from the world, much as humans do with their visual system.
- It enables automated tasks such as detection, classification, tracking, and recognition.
- Often powered by deep learning and image processing techniques.
- Used in diverse domains: healthcare, security, retail, robotics, and self-driving cars.
2. Key Concepts and Components
- Pixels: Smallest unit of an image; each pixel contains intensity/color information.
- Color spaces: RGB, HSV, Grayscale - various ways to represent color information.
- Image formats: JPG, PNG, TIFF - these affect quality, compression, and storage requirements.
- Resolution: The number of pixels along the width and height of an image.
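The concepts above can be made concrete with a minimal NumPy sketch (the luma weights used for grayscale conversion are the standard ITU-R BT.601 values; the tiny 2x2 image is an illustrative assumption):

```python
import numpy as np

# A tiny 2x2 RGB image: each pixel is an (R, G, B) triple of 8-bit intensities.
img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red, green
    [[0, 0, 255], [128, 128, 128]],  # blue, mid-gray
], dtype=np.uint8)

# Resolution = pixel counts along height and width.
height, width = img.shape[:2]
print(f"Resolution: {width}x{height}")  # → Resolution: 2x2

# Convert RGB to grayscale using the standard BT.601 luma weights.
weights = np.array([0.299, 0.587, 0.114])
gray = (img @ weights).astype(np.uint8)
print(gray)  # one intensity value per pixel instead of three
```

Note how a single grayscale pixel carries only intensity, while an RGB pixel needs three channels; this is why grayscale is often used as a cheaper input for edge detection.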
3. Applications of Computer Vision
- Autonomous Driving (lane and object detection)
- Medical Imaging (e.g., tumor detection, retinal scan analysis)
- Surveillance (e.g., person tracking, facial recognition)
- Retail (e.g., customer counting, shelf monitoring)
- OCR and historical document digitization
4. Image Processing vs Image Understanding
Image Processing: Basic transformations to improve or analyze an image (e.g., filters, enhancement).
Image Understanding: Extracting semantic meaning from visual data (e.g., classifying an image as a cat).
Image Processing → Feature Extraction → Image Understanding
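The three-stage flow above can be sketched on a toy example (a NumPy-only sketch; the noisy step image, box filter, and decision threshold are all illustrative assumptions, not a real classifier):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy image: a noisy vertical edge down the middle (left half dark, right half bright).
img = np.hstack([np.zeros((16, 8)), np.ones((16, 8))])
img += rng.normal(0, 0.05, img.shape)

# 1. Image processing: smooth with a horizontal 3-tap box filter to reduce noise.
smooth = (img[:, :-2] + img[:, 1:-1] + img[:, 2:]) / 3.0

# 2. Feature extraction: mean horizontal gradient magnitude as a simple edge feature.
grad = np.abs(np.diff(smooth, axis=1))
edge_strength = grad.mean()

# 3. Image understanding: a toy semantic decision derived from the feature.
label = "contains an edge" if edge_strength > 0.01 else "uniform"
print(label)  # → contains an edge
```

Each stage consumes the previous one's output: raw pixels are cleaned up, summarized into a feature, and only then mapped to a semantic label.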
5. Basic Equation of Image Formation
g(x, y) = f(x, y) * h(x, y) + n(x, y)
Where:
- f(x, y): original image (ideal signal)
- h(x, y): point spread function (PSF) of the camera optics
- n(x, y): additive noise (e.g., sensor noise)
- g(x, y): observed image captured by the camera
- *: denotes 2-D convolution, not pointwise multiplication
This equation is a basic model of digital image formation that accounts for optical blur and sensor noise.
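The model can be simulated directly in NumPy (a minimal sketch: the 3x3 box-blur PSF and the Gaussian noise level are illustrative assumptions; real optics have more complex PSFs):

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x, y): a synthetic "ideal" image -- a bright square on a dark background.
f = np.zeros((32, 32))
f[12:20, 12:20] = 1.0

# h(x, y): point spread function, here a normalized 3x3 box blur.
h = np.ones((3, 3)) / 9.0

# g = f * h: convolve with the PSF using zero padding at the borders
# (the box kernel is symmetric, so correlation equals convolution here).
pad = np.pad(f, 1)
g = np.zeros_like(f)
for dy in range(3):
    for dx in range(3):
        g += h[dy, dx] * pad[dy:dy + 32, dx:dx + 32]

# n(x, y): additive Gaussian sensor noise.
n = rng.normal(0.0, 0.05, f.shape)

# g(x, y) observed by the camera: blurred ideal image plus noise.
g_observed = g + n
```

Pixels deep inside the square stay at full intensity after blurring, while pixels at its border are averaged with the dark background, which is exactly the edge-softening effect the PSF term models.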
6. Visual Perception Pipeline
Diagram (conceptual):
Scene → Camera Sensor → Digital Image → Preprocessing → Feature Extraction → Inference → Output
7. Real-World Example: Autonomous Vehicles
Self-driving cars use computer vision to interpret their surroundings.
- Input: Real-time camera feed from road
- CV Tasks: Detect lanes, traffic signs, pedestrians
- Output: Decisions for navigation, braking, or acceleration
8. Timeline of Key Milestones
- 1960s–80s: Rule-based and symbolic vision (edge detection, Hough transform)
- 1990s: Hand-crafted features (SIFT, SURF, HOG)
- 2012: Deep learning revolution (AlexNet on ImageNet)
- Now: Transformers, self-supervised learning, multimodal vision
9. Hands-on Practice (Colab Compatible)
OpenCV Canny Edge Detection on a user-uploaded image:
!pip install opencv-python-headless matplotlib
import cv2
from google.colab.patches import cv2_imshow
from google.colab import files

# Upload an image and grab its filename
uploaded = files.upload()
img_path = next(iter(uploaded))

# Load and show the original (OpenCV loads images in BGR channel order)
img = cv2.imread(img_path)
if img is None:
    raise ValueError(f"Could not read image: {img_path}")
cv2_imshow(img)

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2_imshow(gray)

# Gaussian blur to suppress noise before edge detection
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection (100 and 200 are the lower/upper hysteresis thresholds)
edges = cv2.Canny(blur, 100, 200)
cv2_imshow(edges)

# Save the result
cv2.imwrite("canny_edges.jpg", edges)
10. Assignment
Objective: Understand foundational CV concepts, run code, and reflect on vision applications.
- 1. Write a 1-page summary: What is computer vision?
- 2. List and describe 3 real-world applications of CV
- 3. Run the edge detection code and upload your image + results
- 4. (Optional) Sketch and explain a vision pipeline
Due: End of Week 1