
Lesson 6.7: Vision Processing

🎯 What You’ll Learn

By the end of this lesson you will be able to:

  • Explain how AprilTag detection works and why it’s used in FRC
  • Describe the vision processing pipeline from camera image to robot pose
  • Understand the difference between 2D targeting and 3D pose estimation
  • Compare Limelight and PhotonVision as vision processing platforms
  • Identify how latency affects vision measurements and how to compensate

Why Vision Processing?

In Unit 5, you learned that odometry (Lesson 5.11) tracks the robot’s position using wheel encoders and the gyro. Odometry is fast and smooth, but it drifts — small errors accumulate over time. After driving across the field and back, your odometry might be off by 10-30 cm.

Vision processing gives the robot a way to correct that drift. By detecting known landmarks on the field (AprilTags), the robot can calculate its actual position and correct the odometry estimate.

Vision also enables:

  • Auto-alignment — automatically aiming at a scoring target
  • Game piece detection — finding notes, cones, or cubes on the field
  • Multi-target tracking — seeing multiple AprilTags simultaneously for better accuracy

AprilTags in FRC

Since 2023, FRC fields have included AprilTags — square black-and-white markers placed at known positions around the field. Each tag has a unique ID and a known 3D position.

How AprilTag Detection Works

  1. Camera captures an image — the camera sees the field, including any visible AprilTags
  2. Image processing finds the tag — the vision system detects the tag’s four corners in the image
  3. Pose estimation calculates position — using the known tag size and camera calibration, the system calculates the 3D position and orientation of the tag relative to the camera
  4. Robot pose is computed — knowing where the camera is on the robot and where the tag is on the field, the system calculates the robot’s field position
Camera Image → Tag Detection → Camera-to-Tag Transform → Robot Pose on Field

What You Get from a Detection

Each AprilTag detection provides:

| Data | What It Means |
|---|---|
| Tag ID | Which tag was detected (maps to a known field position) |
| tx, ty | Horizontal and vertical angle to the tag center (2D targeting) |
| Distance | How far the camera is from the tag |
| 3D Pose | Full 6-DOF position and orientation of the tag relative to the camera |
| Ambiguity | How ambiguous the pose solution is (lower is better) |

2D Targeting vs 3D Pose Estimation

There are two fundamentally different ways to use vision data:

2D Targeting

Use the tag’s position in the camera image to aim at it. This is simpler and works well for:

  • Aiming a turret at a scoring target
  • Aligning to a game piece
  • Simple distance estimation
// Get the horizontal angle to the target, in degrees, from the
// Limelight's NetworkTables entry
double tx = limelight.getEntry("tx").getDouble(0.0);
// Turn the turret toward the target (the sign of tx depends on your
// turret's angle convention)
turret.setAngle(turret.getAngle() + tx);

3D Pose Estimation

Use the tag’s 3D position to calculate the robot’s position on the field. This is more complex but enables:

  • Correcting odometry drift
  • Knowing your exact field position for autonomous
  • Multi-tag fusion for higher accuracy
// Get robot pose from vision
Pose2d visionPose = getVisionEstimate();
// Feed it to the pose estimator
poseEstimator.addVisionMeasurement(visionPose, timestamp);

Most competitive teams use both — 3D pose estimation for localization and 2D targeting for fine alignment.


Your robot's odometry says it's at (2.0, 3.0) meters on the field, but vision processing detects an AprilTag and calculates the robot is actually at (2.15, 2.85) meters. What should the robot do?


Vision Processing Platforms

Two platforms dominate FRC vision processing:

Limelight

Limelight is a self-contained vision processing camera. It handles everything — camera, processing, and networking — in one unit.

| Feature | Details |
|---|---|
| Setup | Plug in power and Ethernet, configure via web interface |
| Processing | On-device — no additional coprocessor needed |
| AprilTag support | Built-in MegaTag and MegaTag2 pipelines |
| NetworkTables | Publishes results automatically to NT |
| Latency | Typically 20-40ms pipeline latency |
| Multi-tag | MegaTag2 fuses multiple tags for better accuracy |

Limelight communicates with your robot code through NetworkTables. You read values like tx, ty, tv (target visible), and botpose (robot pose from AprilTags).
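As a hedged sketch of the reading side: botpose arrives as a double array, and robot code typically unpacks a few fields from it. The array ordering assumed below ([x, y, z, roll, pitch, yaw, ...], in meters and degrees) should be checked against the Limelight documentation for your firmware version; the class and method names here are invented for illustration.

```java
// Hedged sketch: unpack a Limelight "botpose" NetworkTables array into the
// fields a pose estimator needs. Ordering is assumed to be
// [x (m), y (m), z (m), roll (deg), pitch (deg), yaw (deg), ...];
// verify against the Limelight docs for your firmware.
public class BotposeParser {
    public static double[] unpack(double[] botpose) {
        if (botpose == null || botpose.length < 6) {
            return null; // no valid detection published
        }
        double x = botpose[0];      // field X, meters
        double y = botpose[1];      // field Y, meters
        double yawDeg = botpose[5]; // robot heading, degrees
        return new double[] { x, y, yawDeg };
    }
}
```

Returning null for a short or missing array matters in practice: when no tag is visible, the Limelight may publish an empty array, and feeding a stale or zeroed pose into the estimator is worse than skipping the update.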

PhotonVision

PhotonVision is open-source vision processing software that runs on a coprocessor (Raspberry Pi, Orange Pi, etc.) with a USB camera.

| Feature | Details |
|---|---|
| Setup | Install on a coprocessor, connect USB camera, configure via web interface |
| Processing | On the coprocessor — you choose the hardware |
| AprilTag support | Built-in AprilTag pipeline with multi-tag PnP |
| Code integration | Java library with PhotonCamera and PhotonPoseEstimator classes |
| Latency | Depends on hardware — typically 20-50ms |
| Multi-tag | Multi-tag PnP for improved accuracy |

PhotonVision integrates directly into your Java code:

PhotonCamera camera = new PhotonCamera("front-camera");
var result = camera.getLatestResult();
if (result.hasTargets()) {
    PhotonTrackedTarget target = result.getBestTarget();
    double yaw = target.getYaw();
}
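When several tags are visible, teams often reject high-ambiguity solves before trusting a result. Below is a minimal sketch of that filtering using plain Java stand-ins rather than the real PhotonTrackedTarget class; the 0.2 cutoff is an assumed starting point, not an official recommendation.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class TargetSelector {
    // Toy stand-in for a tracked target: {tagId, ambiguity}. PhotonVision's
    // PhotonTrackedTarget carries these values (and more); this sketch only
    // shows the selection logic, not the real API.
    public static Optional<double[]> bestTarget(List<double[]> targets,
                                                double maxAmbiguity) {
        return targets.stream()
                .filter(t -> t[1] < maxAmbiguity) // drop ambiguous solves
                .min(Comparator.comparingDouble((double[] t) -> t[1]));
    }
}
```

The payoff of filtering first: a single confidently-solved tag usually gives a better pose correction than averaging in a flipped or ambiguous solution.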

Which Should You Use?

| Consideration | Limelight | PhotonVision |
|---|---|---|
| Ease of setup | Easier — all-in-one | More setup — separate hardware |
| Cost | Higher ($400+) | Lower ($50-100 for Pi + camera) |
| Flexibility | Fixed hardware | Choose your camera and coprocessor |
| Multi-camera | Multiple Limelights | Multiple cameras on one coprocessor |
| Open source | No | Yes |

Both are excellent choices. Many top teams use Limelight for its simplicity; others prefer PhotonVision for its flexibility and cost.


The Vision Pipeline

Regardless of platform, the vision pipeline follows the same steps:

1. Image Capture

The camera captures frames at 30-90 FPS depending on resolution and hardware.

2. Tag Detection

The vision processor finds AprilTags in the image. This involves:

  • Converting to grayscale
  • Finding quadrilateral shapes
  • Decoding the tag ID from the pattern
  • Locating the four corner pixels precisely

3. Pose Estimation (SolvePnP)

Using the known tag size (6.5 inches in FRC), the camera’s calibration parameters, and the detected corner positions, the system solves the Perspective-n-Point (PnP) problem to calculate the 3D transform from camera to tag.
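The full PnP solve is beyond a short example, but the role of tag size and camera calibration already shows up in a one-dimensional pinhole-camera estimate. This is a hedged sketch, assuming a roughly head-on view and a focal length in pixels obtained from calibration — not the method the vision processors actually use.

```java
// Simplified pinhole-camera distance estimate (not the full PnP solve):
// a tag of known physical size that appears smaller in the image is
// proportionally farther away. Assumes the tag is viewed roughly head-on
// and focalLengthPx comes from camera calibration.
public class TagDistance {
    public static double estimateMeters(double tagSizeMeters,
                                        double focalLengthPx,
                                        double tagWidthPx) {
        // similar triangles: tagWidthPx / focalLengthPx = tagSizeMeters / distance
        return tagSizeMeters * focalLengthPx / tagWidthPx;
    }
}
```

PnP generalizes this idea: with four corners instead of one width, it recovers not just distance but the tag's full orientation relative to the camera.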

4. Field Localization

The camera-to-tag transform is combined with:

  • The known tag position on the field (from the FRC field layout)
  • The known camera position on the robot (camera mount offset)

This produces the robot’s position on the field.
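In 2D, that combination is just a chain of pose compositions: fieldToRobot = fieldToTag composed with the inverse of cameraToTag, composed with the inverse of robotToCamera. WPILib's Pose2d and Transform2d classes do this for you; the plain-Java sketch below only exposes the underlying math, with each pose written as {x, y, headingRadians}.

```java
public class FieldLocalization2D {
    // A pose/transform is {x, y, thetaRad}. compose(a, b) applies b in a's frame.
    static double[] compose(double[] a, double[] b) {
        double cos = Math.cos(a[2]), sin = Math.sin(a[2]);
        return new double[] {
            a[0] + cos * b[0] - sin * b[1],
            a[1] + sin * b[0] + cos * b[1],
            a[2] + b[2]
        };
    }

    static double[] inverse(double[] t) {
        double cos = Math.cos(t[2]), sin = Math.sin(t[2]);
        return new double[] {
            -(cos * t[0] + sin * t[1]),
            -(-sin * t[0] + cos * t[1]),
            -t[2]
        };
    }

    // fieldToRobot = fieldToTag * inv(cameraToTag) * inv(robotToCamera)
    public static double[] robotPose(double[] fieldToTag,
                                     double[] cameraToTag,
                                     double[] robotToCamera) {
        return compose(compose(fieldToTag, inverse(cameraToTag)),
                       inverse(robotToCamera));
    }
}
```

For example, a tag at field position (5, 0) facing the robot, seen 1 m straight ahead by a camera mounted at the robot's center, places the robot at (4, 0) facing the tag.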

5. Latency Compensation

Vision processing takes time (20-50ms). By the time the robot receives the pose estimate, it has already moved. The pose estimator compensates by applying the vision measurement at the timestamp when the image was captured, not when the result arrived.


Multi-Camera Fusion

Many competitive teams run multiple cameras to improve accuracy and coverage:

| Camera Position | What It Sees | Benefit |
|---|---|---|
| Front-facing | Tags ahead of the robot | Good for driving toward scoring positions |
| Rear-facing | Tags behind the robot | Good for backing into positions |
| Side-facing | Tags to the left/right | Wider field of view |

When multiple cameras see tags simultaneously, the pose estimates can be fused for higher accuracy. Each camera provides an independent measurement, and the pose estimator blends them based on confidence.
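One simple way to picture that blending is an inverse-variance weighted average, where each camera's standard deviation encodes its confidence. This is a hedged sketch of the idea only — WPILib's pose estimator does the equivalent continuously inside its Kalman-style update, not as a one-shot average.

```java
public class VisionFusion {
    // Inverse-variance weighted blend of two independent measurements of the
    // same quantity (e.g. the robot's field X from two cameras). A smaller
    // standard deviation means more trust in that camera's measurement.
    public static double fuse(double a, double stdDevA,
                              double b, double stdDevB) {
        double wA = 1.0 / (stdDevA * stdDevA);
        double wB = 1.0 / (stdDevB * stdDevB);
        return (wA * a + wB * b) / (wA + wB);
    }
}
```

With equal confidence the result is the midpoint; as one camera's standard deviation grows (a distant or partially occluded tag), the fused estimate slides toward the other camera's measurement.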


Your robot has two cameras — one facing forward and one facing backward. During autonomous, the robot is driving forward and only the rear camera can see AprilTags. What happens to localization?


Latency: The Hidden Challenge

Vision latency is the time between when the camera captures an image and when your robot code receives the processed result. Typical latencies:

| Source | Latency |
|---|---|
| Camera exposure | 5-15ms |
| Image transfer | 2-5ms |
| Processing (tag detection + PnP) | 10-30ms |
| NetworkTables transfer | 1-5ms |
| Total | 20-50ms |

At 3 m/s, a 40ms latency means the robot has moved 12 cm since the image was captured. If you apply the vision pose at the current time instead of the capture time, you introduce a 12 cm error.
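The back-of-envelope calculation is just speed times latency, which makes it easy to check your own robot's numbers:

```java
public class LatencyError {
    // Position error introduced by applying a vision pose at the current
    // time instead of the capture time: error = speed * latency.
    public static double meters(double speedMps, double latencySeconds) {
        return speedMps * latencySeconds;
    }
}
```

At full swerve speed (around 4-5 m/s for many robots), the same 40ms of latency costs 16-20 cm, which is why latency compensation matters most when moving fast.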

How Latency Compensation Works

WPILib’s SwerveDrivePoseEstimator handles this automatically:

poseEstimator.addVisionMeasurement(
    visionPose,
    captureTimestamp // When the image was taken, not when the result arrived
);

The pose estimator rewinds its internal state to the capture timestamp, applies the vision correction, and then replays all odometry updates that happened since then. This produces a much more accurate result than applying the correction at the current time.
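A toy one-dimensional sketch of that rewind-and-replay idea is below. The real estimator works in 2D and blends the vision measurement with its prior rather than fully trusting it; the class and method names here are invented for illustration.

```java
import java.util.List;

public class LatencyCompensator1D {
    // Toy 1-D model of rewind-and-replay: jump back to the capture time,
    // take the vision measurement there, then replay every odometry delta
    // recorded after that timestamp to return to the present.
    // Each entry in stampedDeltas is {timestampSeconds, deltaMeters}.
    public static double fuse(double visionPoseAtCapture,
                              List<double[]> stampedDeltas,
                              double captureTimeSeconds) {
        double pose = visionPoseAtCapture;
        for (double[] entry : stampedDeltas) {
            if (entry[0] > captureTimeSeconds) {
                pose += entry[1]; // movement that happened after the image
            }
        }
        return pose;
    }
}
```

This is why the estimator must keep a short buffer of timestamped odometry updates: without the history, there would be nothing to replay.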


Connecting to Your Team’s Code

Your team’s vision setup likely involves:

  1. Camera configuration — Limelight web interface or PhotonVision dashboard
  2. NetworkTables reading — getting vision data in your robot code
  3. Pose estimation — feeding vision data to the pose estimator in your drivetrain

Look at CommandSwerveDrivetrain.java for how your team integrates vision data. Search for:

  • addVisionMeasurement — where vision poses are fed to the estimator
  • Limelight or PhotonCamera — how vision data is read
  • getLatency or timestamp — how latency compensation is handled

Checkpoint: Vision Processing
Explain the vision processing pipeline from camera image to robot pose in 4-5 steps. Then: your robot's auto routine consistently ends up 15cm to the right of where it should be. The odometry looks correct at the start but drifts during the routine. How could vision processing help fix this?

Vision Pipeline:

  1. Camera captures an image containing AprilTags
  2. Vision processor detects the tags and identifies their corner positions in the image
  3. SolvePnP calculates the 3D transform from camera to each detected tag
  4. Using the known tag positions on the field and camera position on the robot, the system calculates the robot’s field pose
  5. The pose is sent to the robot code with a timestamp for latency-compensated fusion with odometry

Fixing the auto drift: Vision processing can correct the odometry drift during the auto routine. By detecting AprilTags while driving, the pose estimator blends the (drifting) odometry with (accurate) vision measurements. This keeps the robot’s position estimate accurate throughout the routine, so path following stays on target. The 15cm drift would be corrected each time the camera sees a tag, preventing the error from accumulating.


Key Terms

📖 All terms below are also in the full glossary for quick reference.

| Term | Definition |
|---|---|
| AprilTag | A square black-and-white fiducial marker with a unique ID, placed at known positions on the FRC field for robot localization |
| Vision Pipeline | The sequence of image capture, tag detection, pose estimation, and field localization that converts camera images into robot position data |
| SolvePnP | The Perspective-n-Point algorithm that calculates a 3D transform from 2D image points and known 3D object points |
| Latency Compensation | Applying vision measurements at the timestamp when the image was captured rather than when the result was received, accounting for processing delay |
| Multi-Tag Fusion | Using detections of multiple AprilTags simultaneously to produce a more accurate pose estimate than any single tag alone |
| Limelight | A self-contained vision processing camera for FRC with built-in AprilTag detection and NetworkTables integration |
| PhotonVision | Open-source vision processing software for FRC that runs on a coprocessor with USB cameras |

What’s Next?

You now understand how vision gives your robot eyes. In Lesson 6.8: Pose Estimation and Localization, you’ll dive deeper into how the pose estimator fuses odometry and vision data using a Kalman filter — and how to tune the standard deviations that control how much the robot trusts each data source.