Lesson 6.7: Vision Processing
🎯 What You’ll Learn
By the end of this lesson you will be able to:
- Explain how AprilTag detection works and why it’s used in FRC
- Describe the vision processing pipeline from camera image to robot pose
- Understand the difference between 2D targeting and 3D pose estimation
- Compare Limelight and PhotonVision as vision processing platforms
- Identify how latency affects vision measurements and how to compensate
Why Vision Processing?
In Unit 5, you learned that odometry (Lesson 5.11) tracks the robot’s position using wheel encoders and the gyro. Odometry is fast and smooth, but it drifts — small errors accumulate over time. After driving across the field and back, your odometry might be off by 10-30 cm.
Vision processing gives the robot a way to correct that drift. By detecting known landmarks on the field (AprilTags), the robot can calculate its actual position and correct the odometry estimate.
Vision also enables:
- Auto-alignment — automatically aiming at a scoring target
- Game piece detection — finding notes, cones, or cubes on the field
- Multi-target tracking — seeing multiple AprilTags simultaneously for better accuracy
AprilTags in FRC
Since 2023, FRC fields have included AprilTags — square black-and-white fiducial markers, each with a unique ID and a known 3D position on the field.
How AprilTag Detection Works
- Camera captures an image — the camera sees the field, including any visible AprilTags
- Image processing finds the tag — the vision system detects the tag’s four corners in the image
- Pose estimation calculates position — using the known tag size and camera calibration, the system calculates the 3D position and orientation of the tag relative to the camera
- Robot pose is computed — knowing where the camera is on the robot and where the tag is on the field, the system calculates the robot’s field position
Camera Image → Tag Detection → Camera-to-Tag Transform → Robot Pose on Field

What You Get from a Detection
Each AprilTag detection provides:
| Data | What It Means |
|---|---|
| Tag ID | Which tag was detected (maps to a known field position) |
| tx, ty | Horizontal and vertical angle to the tag center (2D targeting) |
| Distance | How far the camera is from the tag |
| 3D Pose | Full 6-DOF position and orientation of the tag relative to the camera |
| Ambiguity | How ambiguous the pose solution is (lower is better) |
2D Targeting vs 3D Pose Estimation
There are two fundamentally different ways to use vision data:
2D Targeting
Use the tag’s position in the camera image to aim at it. This is simpler and works well for:
- Aiming a turret at a scoring target
- Aligning to a game piece
- Simple distance estimation
```java
// Get the horizontal angle to the target
double tx = limelight.getEntry("tx").getDouble(0.0);
// Use tx to aim the turret
turret.setAngle(turret.getAngle() + tx);
```

3D Pose Estimation
Use the tag’s 3D position to calculate the robot’s position on the field. This is more complex but enables:
- Correcting odometry drift
- Knowing your exact field position for autonomous
- Multi-tag fusion for higher accuracy
```java
// Get the robot pose from vision
Pose2d visionPose = getVisionEstimate();
// Feed it to the pose estimator
poseEstimator.addVisionMeasurement(visionPose, timestamp);
```

Most competitive teams use both — 3D pose estimation for localization and 2D targeting for fine alignment.
Your robot's odometry says it's at (2.0, 3.0) meters on the field, but vision processing detects an AprilTag and calculates the robot is actually at (2.15, 2.85) meters. What should the robot do?
Vision Processing Platforms
Two platforms dominate FRC vision processing:
Limelight
Limelight is a self-contained vision processing camera. It handles everything — camera, processing, and networking — in one unit.
| Feature | Details |
|---|---|
| Setup | Plug in power and Ethernet, configure via web interface |
| Processing | On-device — no additional coprocessor needed |
| AprilTag support | Built-in MegaTag and MegaTag2 pipelines |
| NetworkTables | Publishes results automatically to NT |
| Latency | Typically 20-40ms pipeline latency |
| Multi-tag | MegaTag2 fuses multiple tags for better accuracy |
Limelight communicates with your robot code through NetworkTables. You read values like tx, ty, tv (target visible), and botpose (robot pose from AprilTags).
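A minimal sketch of reading those values in robot code, assuming the default table name "limelight":

```java
import edu.wpi.first.networktables.NetworkTable;
import edu.wpi.first.networktables.NetworkTableInstance;

NetworkTable limelight = NetworkTableInstance.getDefault().getTable("limelight");

boolean hasTarget = limelight.getEntry("tv").getDouble(0.0) == 1.0; // target visible?
double tx = limelight.getEntry("tx").getDouble(0.0); // horizontal offset, degrees
double ty = limelight.getEntry("ty").getDouble(0.0); // vertical offset, degrees

// botpose is a double array: x, y, z (meters), roll, pitch, yaw (degrees);
// newer firmware appends additional latency and tag-count fields
double[] botpose = limelight.getEntry("botpose").getDoubleArray(new double[6]);
```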
PhotonVision
PhotonVision is open-source vision processing software that runs on a coprocessor (Raspberry Pi, Orange Pi, etc.) with a USB camera.
| Feature | Details |
|---|---|
| Setup | Install on a coprocessor, connect USB camera, configure via web interface |
| Processing | On the coprocessor — you choose the hardware |
| AprilTag support | Built-in AprilTag pipeline with multi-tag PnP |
| Code integration | Java library with PhotonCamera and PhotonPoseEstimator classes |
| Latency | Depends on hardware — typically 20-50ms |
| Multi-tag | Multi-tag PnP for improved accuracy |
PhotonVision integrates directly into your Java code:
```java
import org.photonvision.PhotonCamera;
import org.photonvision.targeting.PhotonTrackedTarget;

PhotonCamera camera = new PhotonCamera("front-camera");
var result = camera.getLatestResult();
if (result.hasTargets()) {
    PhotonTrackedTarget target = result.getBestTarget();
    double yaw = target.getYaw();
}
```

Which Should You Use?
| Consideration | Limelight | PhotonVision |
|---|---|---|
| Ease of setup | Easier — all-in-one | More setup — separate hardware |
| Cost | Higher ($400+) | Lower ($50-100 for Pi + camera) |
| Flexibility | Fixed hardware | Choose your camera and coprocessor |
| Multi-camera | Multiple Limelights | Multiple cameras on one coprocessor |
| Open source | No | Yes |
Both are excellent choices. Many top teams use Limelight for its simplicity; others prefer PhotonVision for its flexibility and cost.
The Vision Pipeline
Regardless of platform, the vision pipeline follows the same steps:
1. Image Capture
The camera captures frames at 30-90 FPS depending on resolution and hardware.
2. Tag Detection
The vision processor finds AprilTags in the image. This involves:
- Converting to grayscale
- Finding quadrilateral shapes
- Decoding the tag ID from the pattern
- Locating the four corner pixels precisely
3. Pose Estimation (SolvePnP)
Using the known tag size (6.5 inches in FRC), the camera’s calibration parameters, and the detected corner positions, the system solves the Perspective-n-Point (PnP) problem to calculate the 3D transform from camera to tag.
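To make that concrete, here is an illustrative sketch of the PnP call using OpenCV's Java bindings, which ship with WPILib. The corner pixel coordinates and camera intrinsics below are made-up placeholders; a real pipeline gets them from the tag detector and camera calibration:

```java
import org.opencv.calib3d.Calib3d;
import org.opencv.core.*;

double s = 0.1651; // 6.5 in tag edge length, in meters

// The tag's four corners in the tag's own frame (z = 0 plane);
// the ordering must match the detected image corners below
MatOfPoint3f objectPoints = new MatOfPoint3f(
    new Point3(-s / 2,  s / 2, 0), new Point3( s / 2,  s / 2, 0),
    new Point3( s / 2, -s / 2, 0), new Point3(-s / 2, -s / 2, 0));

// The same corners as detected in the image (placeholder pixel values)
MatOfPoint2f imagePoints = new MatOfPoint2f(
    new Point(310, 220), new Point(410, 225),
    new Point(405, 330), new Point(305, 325));

// Camera intrinsics from calibration: focal lengths fx, fy and optical center cx, cy
Mat cameraMatrix = Mat.eye(3, 3, CvType.CV_64F);
cameraMatrix.put(0, 0, 600.0); cameraMatrix.put(1, 1, 600.0);
cameraMatrix.put(0, 2, 320.0); cameraMatrix.put(1, 2, 240.0);

Mat rvec = new Mat(); // camera-to-tag rotation comes out here
Mat tvec = new Mat(); // camera-to-tag translation comes out here
// Empty distortion coefficients = assume an undistorted image
Calib3d.solvePnP(objectPoints, imagePoints, cameraMatrix, new MatOfDouble(), rvec, tvec);
```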
4. Field Localization
The camera-to-tag transform is combined with:
- The known tag position on the field (from the FRC field layout)
- The known camera position on the robot (camera mount offset)
This produces the robot’s position on the field.
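A sketch of that chain using WPILib's geometry classes and its ComputerVisionUtil helper; the tag ID and camera mount offset here are placeholder values:

```java
import edu.wpi.first.apriltag.AprilTagFieldLayout;
import edu.wpi.first.math.ComputerVisionUtil;
import edu.wpi.first.math.geometry.*;

/** Turns a camera-to-tag transform from SolvePnP into a field-relative robot pose. */
Pose3d estimateRobotPose(AprilTagFieldLayout layout, int tagId, Transform3d cameraToTag) {
    // Camera mount offset on the robot (placeholder: 30 cm forward, 20 cm up)
    Transform3d robotToCamera =
        new Transform3d(new Translation3d(0.30, 0.0, 0.20), new Rotation3d());
    // The tag's known pose on the field, from the official layout file
    Pose3d fieldToTag = layout.getTagPose(tagId).orElseThrow();
    // Chain the transforms: field -> tag -> camera -> robot
    return ComputerVisionUtil.objectToRobotPose(fieldToTag, cameraToTag, robotToCamera);
}
```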
5. Latency Compensation
Vision processing takes time (20-50ms). By the time the robot receives the pose estimate, it has already moved. The pose estimator compensates by applying the vision measurement at the timestamp when the image was captured, not when the result arrived.
Multi-Camera Fusion
Many competitive teams run multiple cameras to improve accuracy and coverage:
| Camera Position | What It Sees | Benefit |
|---|---|---|
| Front-facing | Tags ahead of the robot | Good for driving toward scoring positions |
| Rear-facing | Tags behind the robot | Good for backing into positions |
| Side-facing | Tags to the left/right | Wider field of view |
When multiple cameras see tags simultaneously, the pose estimates can be fused for higher accuracy. Each camera provides an independent measurement, and the pose estimator blends them based on confidence.
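A sketch of what that can look like in code, using a hypothetical VisionEstimate record and assuming the same poseEstimator as elsewhere in this lesson; the key idea is that every camera's estimate goes into the same estimator, each with its own timestamp:

```java
import java.util.List;
import java.util.Optional;
import edu.wpi.first.math.geometry.Pose2d;

// Hypothetical container for one camera's output: a field pose plus capture time
record VisionEstimate(Pose2d pose, double timestampSeconds) {}

void fuseCameras(List<Optional<VisionEstimate>> latestEstimates) {
    // Feed every valid estimate to the shared pose estimator; it blends them
    // with odometry using each measurement's timestamp
    for (Optional<VisionEstimate> estimate : latestEstimates) {
        estimate.ifPresent(e ->
            poseEstimator.addVisionMeasurement(e.pose(), e.timestampSeconds()));
    }
}
```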
Your robot has two cameras — one facing forward and one facing backward. During autonomous, the robot is driving forward and only the rear camera can see AprilTags. What happens to localization?
Latency: The Hidden Challenge
Vision latency is the time between when the camera captures an image and when your robot code receives the processed result. Typical latencies:
| Source | Latency |
|---|---|
| Camera exposure | 5-15ms |
| Image transfer | 2-5ms |
| Processing (tag detection + PnP) | 10-30ms |
| NetworkTables transfer | 1-5ms |
| Total | 20-50ms |
At 3 m/s, a 40ms latency means the robot has moved 12 cm since the image was captured. If you apply the vision pose at the current time instead of the capture time, you introduce a 12 cm error.
How Latency Compensation Works
WPILib’s SwerveDrivePoseEstimator handles this automatically:
```java
poseEstimator.addVisionMeasurement(
    visionPose,
    captureTimestamp // When the image was taken, not when the result arrived
);
```

The pose estimator rewinds its internal state to the capture timestamp, applies the vision correction, and then replays all odometry updates that happened since then. This produces a much more accurate result than applying the correction at the current time.
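If you compute the capture timestamp yourself from a Limelight, one common approach (assuming firmware that publishes the "tl" pipeline-latency and "cl" capture-latency entries, both in milliseconds) is to back-date the current time:

```java
import edu.wpi.first.networktables.NetworkTableInstance;
import edu.wpi.first.wpilibj.Timer;

var limelight = NetworkTableInstance.getDefault().getTable("limelight");
// Pipeline latency ("tl") plus capture latency ("cl"), both reported in ms
double latencyMs = limelight.getEntry("tl").getDouble(0.0)
    + limelight.getEntry("cl").getDouble(0.0);
// Back-date the measurement to when the image was actually captured
double captureTimestamp = Timer.getFPGATimestamp() - latencyMs / 1000.0;
```

PhotonVision results carry their own timestamp via result.getTimestampSeconds(), so no manual subtraction is needed there.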
Connecting to Your Team’s Code
Your team’s vision setup likely involves:
- Camera configuration — Limelight web interface or PhotonVision dashboard
- NetworkTables reading — getting vision data in your robot code
- Pose estimation — feeding vision data to the pose estimator in your drivetrain
Look at:
- `addVisionMeasurement` — where vision poses are fed to the estimator
- `Limelight` or `PhotonCamera` — how vision data is read
- `getLatency` or `timestamp` — how latency compensation is handled
Vision Pipeline:
- Camera captures an image containing AprilTags
- Vision processor detects the tags and identifies their corner positions in the image
- SolvePnP calculates the 3D transform from camera to each detected tag
- Using the known tag positions on the field and camera position on the robot, the system calculates the robot’s field pose
- The pose is sent to the robot code with a timestamp for latency-compensated fusion with odometry
Fixing the auto drift: Vision processing can correct the odometry drift during the auto routine. By detecting AprilTags while driving, the pose estimator blends the (drifting) odometry with (accurate) vision measurements. This keeps the robot's position estimate accurate throughout the routine, so path following stays on target. The 15 cm drift would be corrected each time the camera sees a tag, preventing the error from accumulating.
Key Terms
📖 All terms below are also in the full glossary for quick reference.
| Term | Definition |
|---|---|
| AprilTag | A square black-and-white fiducial marker with a unique ID, placed at known positions on the FRC field for robot localization |
| Vision Pipeline | The sequence of image capture, tag detection, pose estimation, and field localization that converts camera images into robot position data |
| SolvePnP | The Perspective-n-Point algorithm that calculates a 3D transform from 2D image points and known 3D object points |
| Latency Compensation | Applying vision measurements at the timestamp when the image was captured rather than when the result was received, accounting for processing delay |
| Multi-Tag Fusion | Using detections of multiple AprilTags simultaneously to produce a more accurate pose estimate than any single tag alone |
| Limelight | A self-contained vision processing camera for FRC with built-in AprilTag detection and NetworkTables integration |
| PhotonVision | Open-source vision processing software for FRC that runs on a coprocessor with USB cameras |
What’s Next?
You now understand how vision gives your robot eyes. In Lesson 6.8: Pose Estimation and Localization, you’ll dive deeper into how the pose estimator fuses odometry and vision data using a Kalman filter — and how to tune the standard deviations that control how much the robot trusts each data source.