Module 1: Introduction to Computer Vision
1. What is Computer Vision?
Definition: Computer Vision is the field of study focused on enabling machines to process and interpret visual information from the world, much as humans do with their visual system.
- It enables automated tasks such as detection, classification, tracking, and recognition.
- Often powered by deep learning and image processing techniques.
- Used in diverse domains: healthcare, security, retail, robotics, and self-driving cars.
2. Key Concepts and Components
- Pixels: Smallest unit of an image; each pixel contains intensity/color information.
- Color spaces: RGB, HSV, Grayscale - various ways to represent color information.
- Image formats: JPG, PNG, TIFF - these affect quality, compression, and storage requirements.
- Resolution: The number of pixels along the width and height of an image.
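The concepts above can be made concrete with a minimal NumPy sketch (the luma weights used for grayscale conversion are the standard ITU-R BT.601 values; the tiny 2x2 image is an illustrative assumption):

```python
import numpy as np

# A tiny 2x2 RGB image: each pixel is an (R, G, B) triple of 8-bit intensities.
img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red, green
    [[0, 0, 255], [128, 128, 128]],  # blue, mid-gray
], dtype=np.uint8)

# Resolution = pixel counts along height and width.
height, width = img.shape[:2]
print(f"Resolution: {width}x{height}")  # → Resolution: 2x2

# Convert RGB to grayscale using the standard BT.601 luma weights.
weights = np.array([0.299, 0.587, 0.114])
gray = (img @ weights).astype(np.uint8)
print(gray)  # one intensity value per pixel instead of three
```

Note how a single grayscale pixel carries only intensity, while an RGB pixel needs three channels; this is why grayscale is often used as a cheaper input for edge detection.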
3. Applications of Computer Vision
- Autonomous Driving (lane and object detection)
- Medical Imaging (e.g., tumor detection, retinal scan analysis)
- Surveillance (e.g., person tracking, facial recognition)
- Retail (e.g., customer counting, shelf monitoring)
- OCR and historical document digitization
4. Image Processing vs Image Understanding
Image Processing: Basic transformations to improve or analyze an image (e.g., filters, enhancement).
Image Understanding: Extracting semantic meaning from visual data (e.g., classifying an image as a cat).
Image Processing → Feature Extraction → Image Understanding
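The three-stage flow above can be sketched on a toy example (a NumPy-only sketch; the noisy step image, box filter, and decision threshold are all illustrative assumptions, not a real classifier):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy image: a noisy vertical edge down the middle (left half dark, right half bright).
img = np.hstack([np.zeros((16, 8)), np.ones((16, 8))])
img += rng.normal(0, 0.05, img.shape)

# 1. Image processing: smooth with a horizontal 3-tap box filter to reduce noise.
smooth = (img[:, :-2] + img[:, 1:-1] + img[:, 2:]) / 3.0

# 2. Feature extraction: mean horizontal gradient magnitude as a simple edge feature.
grad = np.abs(np.diff(smooth, axis=1))
edge_strength = grad.mean()

# 3. Image understanding: a toy semantic decision derived from the feature.
label = "contains an edge" if edge_strength > 0.01 else "uniform"
print(label)  # → contains an edge
```

Each stage consumes the previous one's output: raw pixels are cleaned up, summarized into a feature, and only then mapped to a semantic label.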
5. Basic Equation of Image Formation
g(x, y) = f(x, y) * h(x, y) + n(x, y)
Where:
- f(x, y): original image (ideal signal)
- h(x, y): point spread function (PSF) of the camera optics
- n(x, y): additive noise (e.g., sensor noise)
- g(x, y): observed image captured by the camera
- *: denotes 2-D convolution, not pointwise multiplication
This equation is a basic model of digital image formation that accounts for optical blur and sensor noise.
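The model can be simulated directly in NumPy (a minimal sketch: the 3x3 box-blur PSF and the Gaussian noise level are illustrative assumptions; real optics have more complex PSFs):

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x, y): a synthetic "ideal" image -- a bright square on a dark background.
f = np.zeros((32, 32))
f[12:20, 12:20] = 1.0

# h(x, y): point spread function, here a normalized 3x3 box blur.
h = np.ones((3, 3)) / 9.0

# g = f * h: convolve with the PSF using zero padding at the borders
# (the box kernel is symmetric, so correlation equals convolution here).
pad = np.pad(f, 1)
g = np.zeros_like(f)
for dy in range(3):
    for dx in range(3):
        g += h[dy, dx] * pad[dy:dy + 32, dx:dx + 32]

# n(x, y): additive Gaussian sensor noise.
n = rng.normal(0.0, 0.05, f.shape)

# g(x, y) observed by the camera: blurred ideal image plus noise.
g_observed = g + n
```

Pixels deep inside the square stay at full intensity after blurring, while pixels at its border are averaged with the dark background, which is exactly the edge-softening effect the PSF term models.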
6. Visual Perception Pipeline
Diagram (conceptual):
Scene → Camera Sensor → Digital Image → Preprocessing → Feature Extraction → Inference → Output
7. Real-World Example: Autonomous Vehicles
Self-driving cars use computer vision to interpret their surroundings.
- Input: Real-time camera feed from road
- CV Tasks: Detect lanes, traffic signs, pedestrians
- Output: Decisions for navigation, braking, or acceleration
8. Timeline of Key Milestones
- 1960s–80s: Rule-based and symbolic vision (edge detection, Hough transform)
- 1990s: Hand-crafted features (SIFT, SURF, HOG)
- 2012: Deep learning revolution (AlexNet on ImageNet)
- Now: Transformers, self-supervised learning, multimodal vision
9. Hands-on Practice (Colab Compatible)
OpenCV Canny Edge Detection on a user-uploaded image:
!pip install opencv-python-headless matplotlib
import cv2
from google.colab.patches import cv2_imshow
from google.colab import files

# Upload an image and grab its filename
uploaded = files.upload()
img_path = next(iter(uploaded))

# Load and show the original (OpenCV loads images in BGR channel order)
img = cv2.imread(img_path)
if img is None:
    raise ValueError(f"Could not read image: {img_path}")
cv2_imshow(img)

# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2_imshow(gray)

# Gaussian blur to suppress noise before edge detection
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection (100 and 200 are the lower/upper hysteresis thresholds)
edges = cv2.Canny(blur, 100, 200)
cv2_imshow(edges)

# Save the result
cv2.imwrite("canny_edges.jpg", edges)
10. Assignment
Objective: Understand foundational CV concepts, run code, and reflect on vision applications.
- 1. Write a 1-page summary: What is computer vision?
- 2. List and describe 3 real-world applications of CV
- 3. Run the edge detection code and upload your image + results
- 4. (Optional) Sketch and explain a vision pipeline
Due: End of Week 1