Vision Tracker | AI-Powered Smart Tracking Camera
by Mukesh_Sankhla in Circuits > Cameras
In this project, we’re going to build a smart pan-and-tilt tracking system for the HuskyLens V2 — turning it into an interactive AI vision module that can physically follow what it sees.
The HuskyLens V2 is already a powerful standalone AI camera with built-in models for face recognition, object tracking, hand gesture detection, pose estimation, and more. But what if it could do more than just detect? What if it could react and move in real time?
That’s exactly what we’re building.
Using two servo motors and an ESP32-C6, we’ll create a smooth pan-and-tilt mechanism that allows the HuskyLens V2 to physically track a target. As the AI detects a face, object, hand, or pose, the ESP32-C6 reads the tracking data and adjusts the servo angles accordingly — keeping the subject centered in the frame.
This project extends the capabilities of HuskyLens beyond simple detection and brings it into the physical world. Instead of just identifying objects, the system actively follows them — opening possibilities for:
- Smart surveillance systems
- Interactive robots
- Auto-tracking cameras
- AI-powered turrets
- Smart classroom or lab demonstrations
By the end of this tutorial, you’ll have a fully functional AI-powered tracking camera system that combines embedded systems, servo control, and real-time machine vision — perfect for makers, robotics enthusiasts, and AI explorers.
Let’s get started!
Supplies
1× HuskyLens V2 — Vision AI camera module
1× ESP32-C6 — Microcontroller for servo control and HuskyLens communication
2× DSS-M15S Servo Motors — For pan and tilt motion
About HuskyLens V2
The HuskyLens V2 is an embedded AI vision module designed for makers and projects that require real-time visual intelligence. Unlike traditional cameras that simply capture images, HuskyLens V2 processes vision tasks on-device using its built-in AI capabilities — there’s no need for an external computer or cloud connection.
Key Technical Features
- Dual-core Kendryte K230 Processor – Provides dedicated AI acceleration with up to 6 TOPS performance for efficient edge inference.
- 1 GB LPDDR4 Memory – Ensures smooth model execution and real-time processing.
- 8 GB eMMC 5.1 Storage – Stores models, configurations, and logs without external storage.
- 2 MP Camera Sensor – High-quality image capture optimized for AI detection and tracking.
- 20+ Built-in AI Models – Ready-to-use vision algorithms, including:
  - Face detection and recognition
  - Object tracking
  - Hand gesture detection
  - Pose estimation
  - Color detection
  - Tag recognition, and more
- Custom Model Support – Ability to train your own AI models and deploy them directly to the device.
- Interactive Touch Display – 2.4″ touchscreen UI for easy configuration and real-time feedback.
- Communication Interfaces – USB-C for direct connection, UART/I²C for microcontroller integration, and optional Wi-Fi module support.
HuskyLens V2 runs AI inference locally — meaning video frames are processed on the device itself using optimized neural network models. When a target (e.g., face or object) is detected, HuskyLens calculates positional data such as X/Y coordinates relative to the frame center. This data can be shared with a microcontroller like the ESP32-C6 over UART or I²C, enabling real-time motion control — which is exactly what we’ll use for this pan-and-tilt tracking setup.
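To make that positional data concrete, here is a minimal, hardware-independent sketch of the first calculation the ESP32-C6 performs: turning a detected target's center coordinates into an offset from the frame center. It assumes a 320×240 coordinate frame (the resolution reported by the original HuskyLens; check the V2 documentation for the exact values it reports), and the names are my own, not part of any HuskyLens library.

```cpp
#include <cassert>

// Assumed coordinate frame; verify against the HuskyLens V2 docs.
const int FRAME_WIDTH  = 320;
const int FRAME_HEIGHT = 240;

struct Offset {
    int dx; // positive: target is right of frame center
    int dy; // positive: target is below frame center
};

// Given the center of a detected bounding box (as reported by the
// camera), return its pixel offset from the frame center.
Offset offsetFromCenter(int xCenter, int yCenter) {
    return { xCenter - FRAME_WIDTH / 2, yCenter - FRAME_HEIGHT / 2 };
}
```

A target reported at (160, 120) is already centered, so both offsets are zero; anything else gives the error the servos must correct.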
CAD & 3D Printing
To build a clean and compact tracking system, I designed the complete pan-and-tilt assembly in Autodesk Fusion 360. The design focuses on stability, proper weight distribution, and easy assembly while keeping wiring neatly managed inside the structure.
The mechanism consists of four main parts:
Part 1 – Pan Motor Mount
- This part securely holds the pan servo motor in place. It ensures proper alignment and provides a rigid base so the entire system rotates smoothly without wobble.
Part 2 – Pan Base
- The pan base attaches directly to the pan motor horn. It rotates along with the motor and includes a dedicated mounting provision for the tilt servo motor. This creates the second axis of movement.
Part 3 – Bottom Housing
This is the structural foundation of the system. It serves multiple purposes:
- Houses the ESP32-C6
- Holds the pan motor securely
- Provides internal space to tuck in extra wires for clean cable management
- Includes a precise cutout for the Type-C port of the ESP32-C6 for easy programming and power access
Part 4 – Tilt Arm
The tilt mechanism is designed as a two-part assembly:
- It mounts directly onto the tilt servo horn
- The top section holds the HuskyLens V2, secured using two screws
This structure keeps the camera stable while allowing smooth vertical motion.
All parts were printed on my Bambu Lab P1S using Black PLA.
You can:
- Download the ready-to-print STL files and print them directly
- Or download the original Fusion 360 file to modify dimensions, adapt to different servos, or customize the mounting system based on your requirements
Pan Motor Assembly
- Take Part-1 (Pan Motor Mount) and one DSS-M15S servo motor.
- Remove the small screws from the servo casing.
- Insert the servo motor into Part-1 so it sits firmly in the mounting slot.
- Reinstall and tighten the screws to close the servo casing.
- Use 4x mounting screws to secure the servo firmly to Part-1.
- Check that the servo is properly aligned and does not move inside the mount.
Circuit Connection
Now let’s connect everything together as shown in the diagram.
1. Servo Connections (Pan & Tilt)
Both servos have three wires:
- Red → VCC (5V / VIN)
- Black/Brown → GND
- Yellow/Orange → Signal
Pan Servo
- Signal → GPIO 5 of the Beetle ESP32-C6
- VCC → VIN
- GND → GND
Tilt Servo
- Signal → GPIO 4 of the Beetle ESP32-C6
- VCC → VIN
- GND → GND
Important: Make sure all GNDs are connected together (ESP32 + both servos + HuskyLens).
2. HuskyLens V2 Connection (I2C Mode)
On the HuskyLens V2:
- Go to Settings → Protocol Type
- Select I2C
Now connect using the 4-pin Gravity cable:
- Green (SDA) → GPIO 19 (ESP32-C6 SDA)
- Blue (SCL) → GPIO 20 (ESP32-C6 SCL)
- Red (VCC) → VIN
- Black (GND) → GND
3. Power
- The ESP32-C6 can be powered using the Type-C port.
- VIN powers both servos and the HuskyLens.
If using high-torque servos, consider using an external 5V supply for stable operation.
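For reference while coding, the wiring above can be collected into a small pin map at the top of the sketch. These constants simply restate the connections listed in this step; the constant names are my own convention.

```cpp
// Pin assignments for the Beetle ESP32-C6, matching the wiring above.
const int PIN_PAN_SERVO  = 5;   // pan servo signal (yellow/orange)
const int PIN_TILT_SERVO = 4;   // tilt servo signal (yellow/orange)
const int PIN_I2C_SDA    = 19;  // HuskyLens Gravity cable, green wire
const int PIN_I2C_SCL    = 20;  // HuskyLens Gravity cable, blue wire
```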
Route all wires through the opening in Part-3 (bottom housing) before final assembly.
- Place the ESP32-C6 inside its dedicated slot in Part-3.
- Align the Type-C port with the side opening of the housing.
- Use 2× M2 screws to secure the ESP32-C6 firmly in place.
Pan Base Assembly
1. Initialize the Servos
Before closing the assembly, we need to center both servos.
- Connect the ESP32-C6 to your PC using the Type-C cable.
- Open the Arduino IDE (or your preferred environment).
- Upload the servo initialization code.
This code moves both the pan and tilt servos to their center position.
⚠️ Centering the servos before mechanical assembly is important. It ensures proper alignment and prevents limited rotation or strain after mounting.
Wait until both servos move to their center position.
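The initialization code itself ships with the project files; as a sanity check on what "center" means, here is the angle-to-pulse-width conversion most servo libraries perform internally. This assumes the common 500–2500 µs pulse range over 0–180° (verify the endpoints against the DSS-M15S datasheet; libraries such as ESP32Servo let you configure them explicitly), so a 90° center position corresponds to a 1500 µs pulse.

```cpp
#include <cassert>

// Map a servo angle (0-180 deg) to a pulse width in microseconds,
// assuming the common 500-2500 us range. Out-of-range angles are
// clamped so the servo is never driven past its limits.
int angleToPulseUs(int angleDeg) {
    const int MIN_US = 500, MAX_US = 2500;
    if (angleDeg < 0)   angleDeg = 0;
    if (angleDeg > 180) angleDeg = 180;
    return MIN_US + (long)(MAX_US - MIN_US) * angleDeg / 180;
}
```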
2. Attach the Bottom Housing (Part-3)
- Take Part-3 (Bottom Housing).
- Carefully route the wires inside if not already positioned.
- Snap Part-3 onto the existing pan motor assembly.
Make sure it:
- Fully covers the exposed pan motor section
- Sits flush without pinching wires
- Aligns properly with the Type-C opening
Once snapped in place, the base structure is complete and ready for the tilt assembly in the next step.
Tilt Motor Assembly
1. Install the Tilt Servo
- Take Part-2 (Pan Base with Tilt Mount).
- Insert the tilt servo motor into the dedicated slot in Part-2.
- Use 4 screws to firmly secure the servo in place.
2. Attach the Pan Base to the Pan Servo
- Take the assembled base (with centered pan servo).
- Align Part-2 with the pan servo horn.
- Carefully place it onto the servo shaft.
Tilt Assembly
1. Install the Servo Horn
- Take the tilt arm main part.
- Insert the disc servo horn into the provided slot inside the part.
- Use 2 screws to secure the horn firmly in place.
- Make sure the horn sits flat and does not move.
2. Attach to the Tilt Servo
- Ensure the tilt servo is still in the center (90°) position.
- Align the tilt arm with the servo shaft.
- Carefully place it onto the servo spline.
- Insert and tighten the center screw to lock it in place.
⚠️ Make sure the arm is straight before tightening, so the movement range is balanced.
3. Attach the Second Tilt Arm Part
- Take the second tilt arm piece.
- Align it with the mounted tilt arm section.
- Use 2 screws to secure both parts together (as shown in the images).
Once tightened, check the movement manually to ensure smooth tilt motion without obstruction.
The tilt mechanism is now complete and ready for mounting the HuskyLens V2.
HuskyLens Mount Assembly
1. Connect the Gravity Cable
- Take the Gravity I2C cable coming from the ESP32-C6.
- Connect it to the HuskyLens V2 Gravity port.
- Ensure the connector is fully inserted and properly aligned.
- Gently route the cable so it does not interfere with tilt movement.
2. Mount the HuskyLens V2
- Place the HuskyLens V2 onto the top mounting holes of the tilt arm.
- Align the mounting holes.
- Insert 2× M3 screws through the bracket.
- Tighten the screws securely, but do not over-tighten.
Code and Working
1. Upload the Tracking Code
- Open Arduino IDE on your PC.
- Copy and paste the provided tracking code into a new sketch.
- Go to Tools → Board → Select “Beetle ESP32-C6”.
- Select the correct COM Port from Tools → Port.
- Click Upload.
Wait for the code to compile and upload successfully.
2. How It Works
Once the code is uploaded:
- Power on the system.
- Open any AI mode on the HuskyLens V2 — for example:
  - Face Tracking
  - Object Tracking
  - Hand Gesture
  - Pose Detection
As soon as a target is detected:
- The HuskyLens V2 sends bounding box and tracking data (X, Y coordinates and ID) to the ESP32-C6 over I2C.
- The ESP32-C6 processes this data and calculates how far the object is from the center of the frame.
- Based on the position difference, it adjusts the pan and tilt servo angles.
This keeps the detected subject centered in real time.
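The "position difference → servo angle" step can be sketched as a simple proportional controller with a dead-band and angle clamping. This is a hardware-independent model of the logic rather than the exact code from the sketch; the gain, dead-band, and angle limits are illustrative values you would tune on the real hardware, and the sign of the correction depends on how your servos are oriented.

```cpp
#include <cassert>

// One proportional tracking update for a single axis (pan or tilt).
// error: pixel offset of the target from the frame center
// angle: current servo angle in degrees (updated in place)
// All constants below are illustrative and need tuning on hardware.
void trackAxis(int error, float &angle) {
    const int   DEAD_BAND = 10;    // ignore small errors (pixels)
    const float GAIN      = 0.05f; // degrees per pixel of error
    const float MIN_ANGLE = 10.0f, MAX_ANGLE = 170.0f;

    if (error > -DEAD_BAND && error < DEAD_BAND) return; // close enough
    angle -= GAIN * error; // move opposite to the error to re-center
    if (angle < MIN_ANGLE) angle = MIN_ANGLE;
    if (angle > MAX_ANGLE) angle = MAX_ANGLE;
}
```

The dead-band stops the servos from jittering when the target is already near the center, and the clamp keeps the mechanism inside its safe mechanical range.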
The system now acts as a fully AI-powered smart tracking camera — capable of physically following faces, objects, hands, or poses using real-time vision data.
Face Recognition
Face Recognition detects faces, shows facial key points (eyes, nose, mouth), and can learn & recognize multiple people.
- White box → Face detected
- Colored box + ID + % → Face recognized
  - Example: Face ID1 97%
  - ID1 = First learned face
  - 97% = Confidence level
RGB Status Light (Back Side):
- 🔵 Blue → Face detected
- 🟡 Yellow → Learning face
- 🟢 Green → Recognized face
How to Learn a Face
- Select Face Recognition mode
- Align face inside white box
- Make sure center crosshair is inside box
- Press Button-A (top-right)
- Face saved as ID1, ID2, etc.
Important Parameters (Quick Guide)
- Forget IDs → Deletes all learned faces
- Multi-Face Acceleration → Faster tracking (slightly lower accuracy)
- Detection Threshold
  - Low → Detects easily (may detect false faces)
  - High → Strict detection
- Recognition Threshold
  - Low → Easy matching (more false matches)
  - High → Strict matching
- NMS Threshold → Removes duplicate overlapping boxes
- Face Features → Show/hide key points
- Set Name → Assign custom name to ID
- Display Name → Show/hide name on screen
- Restore Defaults → Reset everything
- Export Model → Save learned faces (up to 5 models)
- Import Model → Load saved faces into another HuskyLens
Export / Import (Quick Flow)
Export:
Face Recognition → Export Model → Select model number → Save
Import:
Copy .json & .bin files → Paste into new device → Import Model → Select same number
Object Recognition
The Object Recognition feature of HUSKYLENS 2 can identify 80 preset types of objects.
During detection, it automatically frames the target object and displays its name along with a confidence score.
Object Tracking
Object Tracking allows you to learn and track one custom object at a time.
How to Learn an Object
- Enter Object Tracking mode
- Align camera with target object
- Touch & drag on screen to frame the object
- Release to complete learning
Tracking Result
When the learned object appears:
- Screen shows a colored bounding box
- Displays: Obj: ID1 66%
Where:
- Obj = Default name
- ID1 = First learned object
- 66% = Confidence level
Color Recognition
This function enables detection, learning, recognition, and tracking of specified colors.
Object Classification
- Classifies objects into 1000 predefined categories
- Uses built-in AI model (no training required)
- Displays object name + confidence level
- Example: Laptop 92%
⚠ Unlike Object Recognition:
- Does NOT show bounding boxes
- Does NOT provide object position (no X–Y coordinates)
Best for: Identifying what an object is, not where it is.
Self-Learning
This feature lets you capture multi-angle images of an object, so the device can learn and later recognize any custom object.
Instance Segmentation
- Detects objects and outlines their exact shape (contours)
- Assigns a unique mask to each detected object
- Helps in measuring object area and shape
- Supports up to 80 object categories
- Categories are the same as Object Recognition mode
Unlike simple bounding boxes, this feature marks the precise boundary of each object for better visual understanding.
Hand Gesture Recognition
- Detects the palm and 21 key points
- Shows all finger joints in real-time
- Supports learning, recognizing, and tracking gestures
21 Key Points Include:
- 1 Wrist
- 4 joints per finger:
  - Thumb
  - Index finger
  - Middle finger
  - Ring finger
  - Little finger
(Each finger has: root, first joint, second joint, fingertip)
Perfect for gesture control and interactive AI projects
Pose Recognition
- Detects human body in the image
- Identifies and plots 17 body key points
- Can learn, recognize, and track different poses
- Detects multiple people at the same time
- Works from different angles
- Can predict some hidden (occluded) joints
17 Key Points Include:
- Nose
- Eyes (Left & Right)
- Ears (Left & Right)
- Shoulders (Left & Right)
- Elbows (Left & Right)
- Wrists (Left & Right)
- Hips (Left & Right)
- Knees (Left & Right)
- Ankles (Left & Right)
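If you ever parse pose data on the microcontroller, it helps to have the key-point indices in code. The 17 points listed above match the widely used COCO ordering; I am assuming HuskyLens follows that order, so verify the indices against the official protocol documentation before relying on them.

```cpp
#include <cassert>
#include <cstring>

// Assumed COCO-style ordering of the 17 pose key points; verify
// against the HuskyLens V2 protocol documentation before use.
const char *POSE_KEYPOINTS[17] = {
    "nose",
    "left_eye",      "right_eye",
    "left_ear",      "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow",    "right_elbow",
    "left_wrist",    "right_wrist",
    "left_hip",      "right_hip",
    "left_knee",     "right_knee",
    "left_ankle",    "right_ankle",
};
```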
Great for fitness tracking, gesture control, and smart interaction projects.
License Plate Recognition
- Detects vehicle license plates in the scene
- Displays the plate number on screen
- Supports learning specific license plates
- Can recognize and track learned plates
- Shows ID and confidence level
Useful for parking systems, smart gates, and access control.
Character Recognition (OCR)
- Uses OCR (Optical Character Recognition)
- Detects Chinese and English text on screen
- Displays the recognized text content
- Can learn, recognize, and track characters
- All detected text areas are shown with bounding boxes
- Only the text block closest to the center crosshair is recognized
- Recognized text appears at the top-left of the box
Useful for smart readers, label scanning, and text-based automation projects
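The "closest text block to the crosshair wins" rule is easy to reproduce if you ever post-process multiple detections yourself. A minimal sketch (the struct and function names are my own; squared distance is used so no square root is needed):

```cpp
#include <cassert>

struct Box { int x, y; }; // center of a detected text block

// Return the index of the box whose center is nearest to (cx, cy),
// or -1 if there are no boxes.
int closestToCrosshair(const Box *boxes, int count, int cx, int cy) {
    int best = -1;
    long bestDist = -1;
    for (int i = 0; i < count; i++) {
        long dx = boxes[i].x - cx, dy = boxes[i].y - cy;
        long d = dx * dx + dy * dy;
        if (best < 0 || d < bestDist) { best = i; bestDist = d; }
    }
    return best;
}
```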
Line Tracking
- Detects lines with strong color contrast from the background
- Marks different detected paths with different colors
- Works in real-time
- Ideal for line-following robots and path tracking projects
Perfect for smart cars and autonomous navigation
Face Emotion Recognition
- Recognizes 7 facial expressions:
  - Anger
  - Disgust
  - Fear
  - Happiness
  - Neutral
  - Sadness
  - Surprise
- Expressions are pre-trained at factory
- No manual learning required
- Displays detected emotion on screen
Great for emotion-based AI interaction projects.
QR Code Recognition
- Detects and reads QR codes
- Displays the encoded information on screen
- Can learn and recognize specific QR codes
- Supports tracking detected QR codes
- Allows users to assign custom names
Perfect for smart login systems, inventory tracking, and interactive projects
Barcode Recognition
- Detects barcodes in the scene
- Displays the encoded information
- Supports learning and recognizing specific barcodes
- Can track detected barcodes
- Allows custom naming of barcodes
Useful for billing systems, inventory management, and automation projects
RTSP Video Streaming
- Streams live video wirelessly using RTSP
- View camera output on mobile, laptop, or PC
- Works with RTSP-supported apps (e.g., VLC)
- See real-time AI detection results on other devices
Perfect for monitoring, demos, and remote AI projects
HUSKYLENS 2 Microscope Lens Module
The HUSKYLENS 2 Microscope Lens Module is a special attachment designed to convert the HUSKYLENS 2 into a smart digital microscope.
- Uses a 6mm focal length lens
- Provides 30× magnification
- Equipped with 2MP GC2093 sensor
- Can detect details as small as ~3 μm
This allows the device to not only observe microscopic objects but also identify them using AI.
It expands HUSKYLENS 2 from a vision sensor to an AI-powered microscope system.
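Those numbers also let you estimate the field of view: if the GC2093 reports its native 1920 × 1080 pixels and the optics resolve roughly 3 µm per pixel (my reading of the ~3 µm figure above, treated here as an approximation), the visible area is about 5.8 mm × 3.2 mm. A quick back-of-the-envelope check:

```cpp
#include <cassert>

// Rough field-of-view estimate, assuming the stated ~3 um smallest
// resolvable detail corresponds to about one pixel.
double fieldOfViewMm(int pixels, double umPerPixel) {
    return pixels * umPerPixel / 1000.0; // um -> mm
}
```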
3D Printed Microscope Stand for HUSKYLENS 2
I designed a 3D printed microscope stand specially for HUSKYLENS 2.
- Provides stable vertical mounting
- Maintains proper focus distance
- Ideal for microscopic observation and inspection
Since HUSKYLENS 2 supports custom model installation, you can:
- Train your own AI image model
- Deploy it directly on the device
- Perform AI-based microscopic analysis
This enables powerful applications like:
- Smart biology experiments
- Automated micro quality inspection
- AI-powered STEM learning
The stand transforms HUSKYLENS 2 into a complete AI Microscopy System.
Conclusion
In this project, we transformed the HUSKYLENS 2 from a powerful AI camera into a physically interactive tracking system.
By integrating two servo motors with the ESP32-C6, we built a smooth pan-and-tilt mechanism that allows the camera not just to detect — but to react and follow in real time.
The system now:
- Reads AI detection data
- Adjusts servo angles dynamically
- Keeps the subject centered automatically
What started as a vision module has become a complete AI-powered tracking platform.
This project demonstrates how embedded systems, servo control, and machine vision can work together to bridge the gap between digital intelligence and physical movement.
With this foundation, you can expand into:
- Smart surveillance
- Interactive robotics
- Auto-tracking cameras
- AI-based classroom demos
- Advanced maker and research projects
You’ve now built more than just a tracker — you’ve built a system where AI doesn’t just see… it moves.
Keep building. Keep experimenting.