Vision SDK basics
This guide builds on the information provided in the Overview and includes definitions of important terms and answers to common developer questions.
Classification is the process by which an algorithm identifies the presence of a feature in an image.
For example, the Vision SDK determines whether certain road signs are present in a given image.
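To illustrate what a classification result answers (presence, not location), here is a minimal sketch that assumes a classifier produces one confidence score per class and treats a class as present when its score meets a threshold. The class names, the scores, and the 0.5 threshold are hypothetical and are not Vision SDK values or APIs.

```python
from typing import Dict, List

# Hypothetical cut-off for deciding that a class is "present"; not an SDK value.
PRESENCE_THRESHOLD = 0.5


def present_classes(scores: Dict[str, float], threshold: float = PRESENCE_THRESHOLD) -> List[str]:
    """Return the names of classes whose score meets the threshold, sorted by name.

    A classifier only answers *whether* each class appears in the image,
    so the result says nothing about where the sign is.
    """
    return sorted(name for name, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Hypothetical per-class scores for one frame.
    frame_scores = {"speed_limit_50": 0.91, "no_passing": 0.12, "stop": 0.67}
    print(present_classes(frame_scores))  # ['speed_limit_50', 'stop']
```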
Detection is similar to classification, except that instead of only identifying whether a given feature is present, a detection algorithm also identifies where in the image the feature occurs.
For example, the Vision SDK detects vehicles in each image, and indicates where it sees them with bounding boxes.
The Vision SDK supports the following detection classes: cars (or trucks), bicycles/motorcycles, pedestrians, traffic lights, traffic signs, and construction cones.
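A detection adds a location to each result. The sketch below assumes a simple, hypothetical representation: a label, a confidence score, and an axis-aligned bounding box in pixel coordinates. The type and function names (`BoundingBox`, `Detection`, `detections_with_label`) are illustrative and are not part of the Vision SDK API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates, from (left, top) to (right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        # Clamp at zero so a malformed box never reports a negative area.
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)

    def center(self) -> Tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


@dataclass(frozen=True)
class Detection:
    label: str          # hypothetical label string, e.g. "car" or "pedestrian"
    confidence: float   # hypothetical score in [0, 1]
    box: BoundingBox    # where in the image the object was found


def detections_with_label(detections: List[Detection], label: str) -> List[Detection]:
    """Keep only the detections of one class, e.g. every car in a frame."""
    return [d for d in detections if d.label == label]


if __name__ == "__main__":
    # One hypothetical frame; another frame could contain any number of detections.
    frame = [
        Detection("car", 0.93, BoundingBox(100, 220, 260, 330)),
        Detection("pedestrian", 0.81, BoundingBox(400, 180, 440, 300)),
        Detection("car", 0.77, BoundingBox(520, 240, 600, 300)),
    ]
    cars = detections_with_label(frame, "car")
    print(len(cars))              # 2
    print(cars[0].box.area())     # 17600
    print(cars[0].box.center())   # (180.0, 275.0)
```

Representing the box by two opposite corners is a common design choice, since the other two corners of an axis-aligned rectangle follow from them.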
Segmentation is the process by which each pixel in an image is assigned to one of several categories, or “classes”.
For example, the Vision SDK analyzes each frame of road imagery and paints each pixel a color corresponding to its underlying class.
The Vision SDK supports the following segmentation classes: cars (or trucks), road surfaces, line markings, non-drivable flat surfaces (such as sidewalks), markings on the road surface (dashed lines, double yellow lines, and other markings), crosswalks, the car hood and other static parts (stickers, phone holders), and other objects.
Detection identifies discrete objects (for example, individual vehicles), so the number of detections varies from one image to the next, depending on what appears in the scene.
Segmentation goes pixel by pixel and assigns each pixel to a category. For a given segmentation model, the same number of pixels is classified and colored in every image. Features from segmentation can take any shape that a 2D pixel grid can describe, while features from object detection are marked with rectangular bounding boxes defined by their corner pixels.
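The contrast can be made concrete with a segmentation mask, sketched here as a hypothetical 2D grid in which each cell holds the class ID assigned to that pixel. The class IDs and names below are illustrative assumptions, not the SDK's actual encoding. Because every pixel receives exactly one class, the per-class counts always add up to the frame size, no matter what the scene contains.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class IDs for illustration; the SDK's real class encoding may differ.
CLASS_NAMES = {0: "other", 1: "road", 2: "car", 3: "crosswalk"}

Mask = List[List[int]]  # mask[row][col] holds the class ID assigned to that pixel


def total_pixels(mask: Mask) -> int:
    """Number of pixels in the mask; fixed for a given model input size."""
    return sum(len(row) for row in mask)


def class_pixel_counts(mask: Mask) -> Dict[str, int]:
    """Count how many pixels were assigned to each class, ordered by class ID."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {CLASS_NAMES[class_id]: n for class_id, n in sorted(counts.items())}


if __name__ == "__main__":
    # Two hypothetical 3x4 frames with different scene content.
    frame_a = [
        [0, 0, 2, 2],
        [1, 1, 2, 2],
        [1, 1, 1, 1],
    ]
    frame_b = [
        [0, 0, 0, 0],
        [1, 3, 3, 1],
        [1, 1, 1, 1],
    ]
    for frame in (frame_a, frame_b):
        counts = class_pixel_counts(frame)
        # Every pixel gets exactly one class, so the counts always sum to the frame size.
        print(counts, sum(counts.values()), total_pixels(frame))
    # {'other': 2, 'road': 6, 'car': 4} 12 12
    # {'other': 4, 'road': 6, 'crosswalk': 2} 12 12
```

Note that the "car" region in `frame_a` is just a set of pixels; it could take any shape, whereas a detection would summarize the same car with a single rectangle.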
The latest version of the Vision SDK recognizes over 200 of today's most common road signs, including speed limits, regulatory signs (merges, turn restrictions, no passing, etc.), warning signs (traffic signal ahead, bicycle crossing, narrow road, etc.), and many others.
The Vision SDK does not read individual letters or words on signs, but rather learns to recognize each sign type holistically. As a result, it generally cannot interpret guide signs (for example, “Mariposa St. Next Exit”).
Because the Vision SDK is designed to work with an arbitrary mounting position, it needs a short camera calibration period when it is first initialized. After camera calibration is complete, the device can accurately gauge the locations of other objects in the driving scene.
AR navigation, Safety mode, and some Vision events require camera calibration. Camera calibration is a complicated process that requires data from several sensors, can take 20 to 30 seconds, and should happen while the vehicle is in motion. Once calibration is complete, the Vision SDK automatically adjusts to vibrations and changes in orientation while driving.
Your device cannot calibrate unless it is mounted.
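An app built on these rules would typically keep calibration-dependent features off until calibration finishes and tell the driver what is still needed. The sketch below models that app-side gating under stated assumptions: `CalibrationStatus`, its `progress` and `is_mounted` fields, and the feature keys are all hypothetical and are not Vision SDK types or APIs; the SDK performs the calibration itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationStatus:
    """Hypothetical calibration snapshot; not a Vision SDK type."""
    progress: float    # assumed to run from 0.0 (not started) to 1.0 (complete)
    is_mounted: bool   # assumed to come from the app, e.g. a user confirmation


# Hypothetical feature keys for the features the text says need calibration.
CALIBRATION_DEPENDENT = {"ar_navigation", "safety_mode"}


def is_calibrated(status: CalibrationStatus) -> bool:
    # An unmounted device cannot calibrate, so it is never treated as calibrated.
    return status.is_mounted and status.progress >= 1.0


def feature_enabled(feature: str, status: CalibrationStatus) -> bool:
    """Calibration-dependent features stay off until calibration completes."""
    if feature in CALIBRATION_DEPENDENT:
        return is_calibrated(status)
    return True


def calibration_hint(status: CalibrationStatus) -> str:
    """User-facing message explaining what calibration still needs."""
    if not status.is_mounted:
        return "Mount your device to start calibration."
    if status.progress < 1.0:
        return f"Calibrating ({status.progress:.0%}). Keep driving."
    return "Calibration complete."


if __name__ == "__main__":
    status = CalibrationStatus(progress=0.4, is_mounted=True)
    print(feature_enabled("ar_navigation", status))   # False
    print(calibration_hint(status))                   # Calibrating (40%). Keep driving.
    status = CalibrationStatus(progress=1.0, is_mounted=True)
    print(feature_enabled("ar_navigation", status))   # True
```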
The Vision SDK consumes CPU, GPU, and other resources to process road imagery on the fly. As with any other navigation or video application, we recommend keeping your device plugged in if you plan to use it for extended periods.
Mobile devices get warmer over time when exposed to direct sunlight, and the on-board AI also consumes a significant amount of compute resources, but we have not run into any heat issues with moderate-to-heavy use.