Vision SDK basics

This guide builds on the information provided in the Overview, including definitions of important terms and answers to common developer questions.

What is “classification”?

Classification is the process by which an algorithm identifies the presence of a feature in an image.

For example, the Vision SDK classifies whether there are certain road signs in a given image.

What is “detection”?

Detection is like classification except instead of only identifying whether a given feature is present, a detection algorithm also identifies where in the image the feature occurred.

For example, the Vision SDK detects vehicles in each image, and indicates where it sees them with bounding boxes.

The Vision SDK supports the following detection classes: cars (or trucks), bicycles/motorcycles, pedestrians, traffic lights, traffic signs, and construction cones.
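The relationship between classification and detection can be sketched in a few lines. This is an illustration only: the `Detection` type, its field names, and the `is_present` helper are hypothetical and are not part of the Vision SDK's API. A detection carries a label plus a bounding box, while a classification-style answer only says whether a label is present.

```python
from dataclasses import dataclass


# Hypothetical type for illustration; not the Vision SDK's API.
@dataclass
class Detection:
    label: str   # detection class, e.g. "car" or "pedestrian"
    x_min: int   # left edge of the bounding box, in pixels
    y_min: int   # top edge
    x_max: int   # right edge
    y_max: int   # bottom edge


def is_present(detections, label):
    """Classification-style answer: is the feature in the image at all?"""
    return any(d.label == label for d in detections)


# Made-up detections for a single frame.
frame = [
    Detection("car", 40, 120, 200, 220),
    Detection("pedestrian", 310, 100, 350, 210),
]

print(is_present(frame, "car"))            # True
print(is_present(frame, "traffic light"))  # False

# Detection additionally reports where each object is.
for d in frame:
    print(d.label, (d.x_min, d.y_min, d.x_max, d.y_max))
```

The length of `frame` would differ from image to image, depending on how many objects appear.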

What is “segmentation”?

Segmentation is the process by which each pixel in an image is assigned to a category, or “class”.

For example, the Vision SDK analyzes each frame of road imagery and paints the pixels different colors corresponding to their underlying classes.

The Vision SDK supports the following segmentation classes: cars (or trucks), road surfaces, line markups, non-drivable flat surfaces (such as sidewalks), markup on road surface (dashed lines, double yellow lines, other markups), crosswalks, car hood and other static parts (stickers, phone holder), and other objects.
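As a rough sketch of what "painting" a segmentation result means, the snippet below maps a small grid of class IDs to RGB colors. The class IDs, the palette, and the `paint` function are assumptions made for illustration; the Vision SDK's actual classes, colors, and output format may differ.

```python
# Hypothetical class IDs and colors, chosen for illustration only.
ROAD, CAR, OTHER = 0, 1, 2
PALETTE = {
    ROAD: (128, 64, 128),
    CAR: (0, 0, 142),
    OTHER: (0, 0, 0),
}


def paint(mask):
    """Replace every pixel's class ID with the RGB color for that class."""
    return [[PALETTE[class_id] for class_id in row] for row in mask]


# A tiny 3x4 "image" in which every pixel already has a class ID.
mask = [
    [OTHER, OTHER, CAR, CAR],
    [ROAD, ROAD, CAR, CAR],
    [ROAD, ROAD, ROAD, ROAD],
]

colored = paint(mask)
print(colored[0][2])  # (0, 0, 142), the color assigned to CAR
```

Every pixel receives exactly one class, so the output always has the same dimensions as the input mask.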

What is the difference between detection and segmentation?

Detection identifies discrete objects (for example, individual vehicles). The number of detections in an image changes from one image to the next, depending on what appears.

Segmentation goes pixel by pixel and assigns each pixel to a category. For a given segmentation model, the same number of pixels is classified and colored in every image. Features from segmentation can be any shape describable by a 2D pixel grid, while features from object detection are indicated with boxes defined by the four pixels at their corners.
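To make the shape difference concrete, the sketch below takes an L-shaped segmentation region and computes the tightest axis-aligned box around it. The `bounding_box` helper and the sample mask are hypothetical, written only to contrast the two representations; they do not describe how the Vision SDK produces its detections.

```python
def bounding_box(mask, class_id):
    """Return (x_min, y_min, x_max, y_max) enclosing all pixels of class_id,
    or None if the class does not appear in the mask."""
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == class_id
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# An L-shaped region of class 1 inside a 4x4 mask.
mask = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

box = bounding_box(mask, 1)
region_pixels = sum(row.count(1) for row in mask)
box_pixels = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

print(box)            # (1, 0, 3, 2)
print(region_pixels)  # 5
print(box_pixels)     # 9
```

The region covers 5 pixels while its box covers 9, which shows how a box only approximates a shape that a pixel mask can follow exactly.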

Can the Vision SDK read all road signs?

The latest version of the Vision SDK recognizes over 200 of the most common road signs today, including speed limits, regulatory signs (merges, turn restrictions, no passing, etc.), warning signs (traffic signal ahead, bicycle crossing, narrow road, etc.), and many others.

The Vision SDK does not read individual letters or words on signs, but rather learns to recognize each sign type holistically. As a result, it generally cannot interpret guide signs (for example, “Mariposa St. Next Exit”).

What are the requirements for camera calibration?

Because the Vision SDK is designed to work with an arbitrary mounting position, it needs a short period for camera calibration when it is first initialized. After camera calibration is complete, the device will be able to accurately gauge the locations of other objects in the driving scene.

AR navigation, Safety mode, and some Vision events require camera calibration. Camera calibration is a complicated process that requires data from several sensors, can take 20-30 seconds, and should happen while the vehicle is in motion. Once calibration is complete, the Vision SDK will automatically adjust to vibrations and changes in orientation while driving.


Your device will not be able to calibrate without being mounted.

Will the Vision SDK drain my battery?

The Vision SDK consumes CPU, GPU, and other resources to process road imagery on the fly. As with any other navigation or video application, we recommend keeping your device plugged in if you are going to use it for extended periods of time.

Will my device get hot if I run the Vision SDK for a long time?

Mobile devices get warmer over time when they are exposed to direct sunlight, and the on-board AI consumes a significant amount of compute resources, but we have not run into any heat issues with moderate-to-heavy use.
