The Intel RealSense Tracking Camera T265, designed for positioning and maneuvering mobile robots and other portable systems, includes an inertial measurement unit (IMU) that enables developers to create solutions with advanced tracking capabilities; paired with the RealSense Depth Camera D435i, it adds depth sensing as well. Intel introduced the D435i in 2018 and the T265 early this year.
For robots, drones and other autonomous mobile devices to eventually interact independently and intelligently with their environments, they must track their locations as they move, navigating unfamiliar spaces while discovering, monitoring and avoiding stationary and moving obstacles in real time.
Moving toward that goal, the T265 includes two fisheye lens sensors, an IMU and an Intel Movidius Myriad 2 vision processing unit (VPU), a system-on-chip designed for image processing and computer vision at very high performance per watt.
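One way to see this hardware from the host side is to enumerate the device's sensors and stream profiles with Intel's librealsense SDK; the sketch below uses the pyrealsense2 Python bindings and assumes the SDK is installed and a camera is connected.

import pyrealsense2 as rs

# List connected RealSense devices and the sensors/streams each one exposes.
# On the T265 this typically includes a pose stream, two fisheye image streams
# and the gyro/accel streams from the onboard IMU.
ctx = rs.context()
for dev in ctx.query_devices():
    print("Device:", dev.get_info(rs.camera_info.name),
          "S/N:", dev.get_info(rs.camera_info.serial_number))
    for sensor in dev.query_sensors():
        print("  Sensor:", sensor.get_info(rs.camera_info.name))
        for profile in sensor.get_stream_profiles():
            print("    Stream:", profile.stream_type(), profile.format(), profile.fps(), "fps")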
Visual simultaneous localization and mapping (V-SLAM) algorithms run directly on the VPU with very low latency. The T265 has demonstrated less than 1% closed-loop drift under intended use conditions and offers sub-6-millisecond latency between a movement and its reflection in the reported pose.
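The pose computed onboard can be streamed to a host over USB; a minimal sketch, again assuming pyrealsense2 and a connected T265, prints the translation and the tracker confidence reported with each pose sample.

import pyrealsense2 as rs

# Stream the 6DoF pose computed on the VPU and print position updates.
pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.pose)  # pose samples are produced at 200 Hz on the T265
pipe.start(cfg)
try:
    for _ in range(200):
        frames = pipe.wait_for_frames()
        pose = frames.get_pose_frame()
        if pose:
            data = pose.get_pose_data()
            t = data.translation
            print("x=%.3f y=%.3f z=%.3f m, tracker confidence=%d"
                  % (t.x, t.y, t.z, data.tracker_confidence))
finally:
    pipe.stop()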
The RealSense device measures about 4 x 1 x 0.5 inches (108 mm x 24.5 mm x 12.5 mm), weighs around two ounces (55 g) and draws 1.5 watts to operate the entire system, including the cameras, IMU and VPU. Its spatial sensing and tracking capabilities are based on technology developed by RealityCap, which Intel acquired in 2015.
The camera performs inside-out tracking: it does not depend on any external sensors to understand its environment. Tracking is based on information gathered from the two fisheye cameras, each with a 163-degree (±5 degrees) field of view and capturing images at 30 frames per second. The wide field of view of each sensor keeps points of reference visible to the system for a relatively long time, even when the device is moving quickly.
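The fisheye images themselves can also be pulled from the device, for example to feed a separate vision pipeline; a minimal sketch, assuming pyrealsense2 and NumPy:

import numpy as np
import pyrealsense2 as rs

# Grab one synchronized pair of frames from the two fisheye imagers
# (stream index 1 is the left imager, 2 is the right).
pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.fisheye, 1)
cfg.enable_stream(rs.stream.fisheye, 2)
pipe.start(cfg)
try:
    frames = pipe.wait_for_frames()
    left = np.asanyarray(frames.get_fisheye_frame(1).get_data())
    right = np.asanyarray(frames.get_fisheye_frame(2).get_data())
    print("left image:", left.shape, "right image:", right.shape)
finally:
    pipe.stop()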
Visual-Inertial Odometry. A key strength of visual-inertial odometry is that the sensors complement each other. The images from the camera are supplemented by data from the onboard IMU, which includes a gyroscope and accelerometer. The aggregated data from these sensors is fed into the SLAM algorithms.
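The raw IMU samples that feed this fusion can be streamed alongside the images; the following sketch, assuming pyrealsense2, reads gyroscope and accelerometer samples as motion frames.

import pyrealsense2 as rs

# Stream raw IMU data; gyro and accel samples arrive as motion frames.
pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.gyro)
cfg.enable_stream(rs.stream.accel)
pipe.start(cfg)
try:
    for _ in range(100):
        frames = pipe.wait_for_frames()
        gyro = frames.first_or_default(rs.stream.gyro)
        accel = frames.first_or_default(rs.stream.accel)
        if gyro and accel:
            g = gyro.as_motion_frame().get_motion_data()   # angular velocity, rad/s
            a = accel.as_motion_frame().get_motion_data()  # linear acceleration, m/s^2
            print("gyro (%.3f, %.3f, %.3f)  accel (%.3f, %.3f, %.3f)"
                  % (g.x, g.y, g.z, a.x, a.y, a.z))
finally:
    pipe.stop()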
The algorithm identifies sets of salient features in the environment, such as the corner of a room or of an object, that can be recognized over time and used to infer the device's changing position relative to those points.
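For intuition only, this kind of feature tracking can be illustrated with off-the-shelf tools; the sketch below matches ORB features between two hypothetical consecutive frames (frame_t0.png and frame_t1.png) using OpenCV, and is a conceptual stand-in rather than the camera's proprietary V-SLAM pipeline.

import cv2

# Detect and match salient features between two consecutive frames (conceptual only).
img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print("tracked %d feature correspondences between frames" % len(matches))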
The visual information prevents the long-term drift from the inertial measurements that would otherwise degrade position accuracy. The IMU operates at a higher frequency than the cameras, allowing the algorithm to respond more quickly to changes in the device's position. A map of visual features and their positions is built up over time. During re-localization, the camera uses features it has seen before to recognize when it has returned to a familiar place. The camera can locate its point of origin with an error margin of less than one percent.
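Mapping and re-localization can be toggled, and the per-pose confidence inspected, through the SDK; the option names below are the librealsense pose-sensor options, and their availability on a given firmware is an assumption of this sketch.

import pyrealsense2 as rs

# Enable on-device mapping and relocalization before streaming starts
# (these options are assumed not to be changeable while the sensor is streaming).
ctx = rs.context()
dev = ctx.query_devices()[0]
for sensor in dev.query_sensors():
    if sensor.supports(rs.option.enable_relocalization):
        sensor.set_option(rs.option.enable_mapping, 1)
        sensor.set_option(rs.option.enable_relocalization, 1)

pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.pose)
pipe.start(cfg)
try:
    pose = pipe.wait_for_frames().get_pose_frame()
    if pose:
        data = pose.get_pose_data()
        # Confidence levels: 0 = failed, 1 = low, 2 = medium, 3 = high.
        print("tracker confidence:", data.tracker_confidence,
              "mapper confidence:", data.mapper_confidence)
finally:
    pipe.stop()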
Drone testing demonstrated that, in both cases, the tracking and position data generated by the camera correlated closely with the data provided by GPS. This supports the viability of using it for navigation in areas where GPS is not available, such as under a bridge or inside an industrial structure.