Exploring Computer Vision

A few months ago, I started exploring computer vision with Python, both for personal projects and for real-world applications. It has been an exciting learning experience, full of challenges and discoveries.

Tools and Libraries Used

Throughout my journey, I’ve utilized several powerful Python libraries and frameworks to bring my ideas to life:

  • OpenCV: A versatile library for image and video processing.

  • MediaPipe: A framework for building perception pipelines, ideal for hand tracking and gesture recognition.

  • YOLO (You Only Look Once): A family of object detection models designed to process images in real time.

These tools, all usable from Python, together with various online resources, let me build working projects while deepening my understanding of computer vision.

What I’ve Built

Here are some of the projects I’ve developed using these libraries:

  1. Facial Recognition and Tracking

    • Developed systems that can detect and track faces in real time, leveraging OpenCV and YOLO.

  2. Gesture-Based Microcontroller Control

    • Implemented a system where hand gestures control a microcontroller, enabling interactive hardware manipulation.

  3. Object Detection

    • Created applications that detect and classify objects in video streams using YOLO's pre-trained models, and fine-tuned those models on custom datasets.

  4. Gesture-Based Drawing

    • Designed a program that lets users draw in a virtual environment by tracking hand movements with MediaPipe.
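To give a flavor of the gesture projects above, here is a minimal sketch of how a hand pose might be turned into a command. It assumes the 21-point hand landmark layout that MediaPipe reports (fingertips at indices 8, 12, 16, 20 and their PIP joints at 6, 10, 14, 18) and takes plain (x, y) pairs, so it runs without a camera. The function names and the command strings are hypothetical, chosen for illustration rather than taken from my project code.

```python
# Landmark indices follow MediaPipe's 21-point hand model.
FINGER_TIPS = (8, 12, 16, 20)  # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)  # the PIP (middle) joint of each of those fingers


def count_raised_fingers(landmarks):
    """Count non-thumb fingers whose tip is above its PIP joint.

    `landmarks` is a sequence of 21 (x, y) pairs in image coordinates,
    where y grows downward, so "above" means a smaller y value.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return sum(
        1
        for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
        if landmarks[tip][1] < landmarks[pip][1]
    )


def gesture_to_command(landmarks):
    """Map a finger count to a command string (hypothetical mapping)."""
    commands = {0: "STOP", 1: "DRAW", 2: "ERASE"}
    return commands.get(count_raised_fingers(landmarks), "IDLE")
```

In a real application, the landmark list would come from the hand tracker on each frame, and the resulting command would either be sent to the microcontroller or used to switch drawing modes. The tip-above-joint test is deliberately simple: it ignores the thumb and assumes an upright hand.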

The Process

Most of these projects were built through self-learning, tackling one challenge at a time. I focused not just on running pre-trained models but also on understanding how they work internally. That understanding let me modify their behavior, optimize performance, and tailor a solution to each project.
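One example of the kind of inner working worth understanding is non-maximum suppression (NMS), a post-processing step that many object detectors, including many YOLO versions, use to discard overlapping duplicate boxes. The sketch below is a plain-Python greedy NMS written for illustration, not the code of any particular YOLO release. The box format (x1, y1, x2, y2) and the 0.5 default threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Raising `iou_threshold` keeps more overlapping boxes, which helps in crowded scenes. Lowering it removes duplicates more aggressively.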

I also organized my code into reusable classes to keep it clean and maintainable, ensuring that future projects build upon a solid foundation.
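As a rough illustration of that structure, the sketch below chains small processing steps behind a common interface. The class names (`FrameProcessor`, `Pipeline`, `FlipHorizontal`, `Threshold`) are hypothetical and do not reflect my actual project layout. Frames are represented as nested lists of grayscale values so the example runs without any image library.

```python
class FrameProcessor:
    """Base class for one processing step: takes a frame, returns a frame."""

    def process(self, frame):
        raise NotImplementedError


class FlipHorizontal(FrameProcessor):
    """Mirror each row, as is often done for webcam previews."""

    def process(self, frame):
        return [list(reversed(row)) for row in frame]


class Threshold(FrameProcessor):
    """Binarize a grayscale frame: values >= cutoff become 255, others 0."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def process(self, frame):
        return [[255 if px >= self.cutoff else 0 for px in row] for row in frame]


class Pipeline:
    """Run a frame through a fixed sequence of FrameProcessor steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, frame):
        for step in self.steps:
            frame = step.process(frame)
        return frame


pipeline = Pipeline([FlipHorizontal(), Threshold(128)])
result = pipeline.run([[0, 200], [255, 10]])
```

With this layout, a new project can reuse existing steps and add only the step that is specific to it, such as a detector or a gesture classifier.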