SIFT SLAM Vision Details
MIT 16.412J Spring 2004
Vikash K. Mansinghka
Outline

• Lightning Summary
• Black Box Model of SIFT SLAM Vision System
• Challenges in Computer Vision
• What these challenges mean for visual SLAM
• How SIFT extracts candidate landmarks
• How landmarks are tracked in SIFT SLAM
• Alternative vision-based SLAM systems
• Open questions
Lightning Summary

• Motivation: SLAM without modifying the environment
• Landmark candidates are extracted by the SIFT process
• Candidates matched between cameras to get 3D positions (see the triangulation sketch below)
• Candidates pruned according to consistency w/ robot's expectations
• Survivors sent off for statistical processing
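The "matched between cameras to get 3D positions" step comes down to standard stereo triangulation. The sketch below is illustrative, not code from the system in the slides: it assumes a rectified horizontal camera pair with known focal length f (in pixels), baseline b (in meters), and hypothetical principal-point coordinates (cx, cy).

```python
# Illustrative stereo triangulation (standard pinhole geometry, not the
# system's own code). Assumes a rectified horizontal camera pair;
# f, b, cx, cy are hypothetical calibration values.

def triangulate(f, b, col_left, col_right, row, cx, cy):
    """Return the (x, y, z) position of a matched feature, or None."""
    disparity = col_left - col_right   # pixels; must be positive
    if disparity <= 0:
        return None                    # geometrically impossible match; reject
    z = f * b / disparity              # depth falls off as 1 / disparity
    x = (col_left - cx) * z / f        # lateral offset from the optical axis
    y = (row - cy) * z / f             # vertical offset from the optical axis
    return (x, y, z)
```

With the trinocular Triclops rig, the third camera adds a second baseline, so each match can in principle be cross-checked against an independent disparity measurement.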
Review of Robot Specifications

• Triclops 3-camera “stereo” vision system
• Odometry system which produces [p, q, θ]
• Center camera is “reference”
Black Box Model of Vision System

• For now, based on black magic (SIFT). Produces landmarks.
• Assume landmarks globally indexed by i.
• Per frame inputs:
  – [p, q, θ] - odometry input (x, z, bearing deltas)
  – List of (i, xi) - new landmark pos (from SLAM)
• Per frame output is a list of (i, x̂i, xi, ri, ci) for each visible landmark i (see the interface sketch below), where:
  – x̂i is its measured 3D pos (w.r.t. camera pos)
  – xi is its map 3D pos (w.r.t. initial robot pos), if it isn't new
  – (ri, ci) is its pixel coordinates in the center camera
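A minimal sketch of this per-frame interface in Python. The type and field names are hypothetical (the slide specifies only the tuple contents), and the body of the vision call is deliberately omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z)

@dataclass
class OdometryDelta:
    p: float       # change in x
    q: float       # change in z
    theta: float   # change in bearing

@dataclass
class LandmarkObservation:
    i: int                 # global landmark index
    x_measured: Vec3       # x̂i: measured 3D pos, w.r.t. camera pos
    x_map: Optional[Vec3]  # xi: map 3D pos, w.r.t. initial robot pos; None if new
    r: int                 # pixel row in the center (reference) camera
    c: int                 # pixel column in the center camera

def process_frame(odom: OdometryDelta,
                  new_landmarks: Dict[int, Vec3]) -> List[LandmarkObservation]:
    """The vision black box: new_landmarks feeds back the (i, xi) pairs
    assigned by SLAM; the return value holds one observation per visible
    landmark. Body omitted -- this sketch pins down the interface only."""
    raise NotImplementedError
```

The Optional map position captures the slide's "if it isn't new" caveat: a freshly extracted landmark has a measurement but no map entry yet.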
Challenges in Computer Vision

• Intuitively appealing ≠ computationally realizable
• Stable feature extraction is hard; results rarely general
• Extracted features are sparse
• Matching requires exponential time
• Matches are often wrong
Implications for Visual SLAM

• Hard to reliably find landmarks
• Really Hard to reliably find landmarks
• Really Really Hard to reliably find landmarks
• Data association is slow and unreliable
• False matches introduce substantial errors
• Accurate probabilistic models unavailable
Remarks on SIFT Approach

• For visual SLAM, landmarks must be identifiable across:
  – Large changes in distance
  – Small changes in view direction
  – (Bonus) Changes in illumination
• Solution:
  – Produce “scale-invariant” image representation
  – Extract points with associated scale information
  – Use matcher empirically capable of handling small displacements (see the gating sketch below)
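The slides do not spell out the matcher, so the following is a hedged sketch of one plausible gating scheme: accept a candidate match between consecutive frames only if the keypoint's pixel displacement, scale ratio, and orientation change are all small. Every threshold here is an illustrative assumption, not a value from the system.

```python
# Hypothetical per-keypoint match gating (the slides leave the matcher
# unspecified). All thresholds are illustrative assumptions.

import math

def is_consistent(kp_prev, kp_curr,
                  max_shift_px=20.0, max_scale_ratio=1.3, max_dtheta=0.35):
    """kp_* are (row, col, scale, orientation) tuples for one keypoint."""
    r0, c0, s0, o0 = kp_prev
    r1, c1, s1, o1 = kp_curr
    shift = math.hypot(r1 - r0, c1 - c0)                 # pixel displacement
    scale_ratio = max(s0, s1) / min(s0, s1)              # always >= 1
    dtheta = abs((o1 - o0 + math.pi) % (2 * math.pi) - math.pi)  # wrap to [-pi, pi]
    return (shift <= max_shift_px
            and scale_ratio <= max_scale_ratio
            and dtheta <= max_dtheta)
```

A real matcher would also exploit the stereo geometry across the three cameras; this sketch shows only the per-keypoint attribute gating.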
The Scale-Invariant Feature Transform

• Described in Lowe, IJCV 2004 (preprint; use Google)
• Four stages:
  – Scale-space extrema extraction (sketched below)
  – Keypoint pruning and localization (not used in SLAM)
  – Orientation assignment
  – Keypoint descriptor (not used in SLAM)
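A compact sketch of the first stage, following the difference-of-Gaussians construction from Lowe's paper: blur the image at a ladder of scales, subtract adjacent levels, and keep pixels that are extrema of their 3x3x3 space-and-scale neighborhood. The parameter values (sigma0, k, threshold) are illustrative, not Lowe's tuned settings, and the input is assumed grayscale with values in [0, 1].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigma0=1.6, k=2 ** 0.5, num_scales=5, threshold=0.03):
    """Difference-of-Gaussians extrema (illustrative parameters).
    image: 2D grayscale array with values in [0, 1]."""
    image = image.astype(np.float64)
    # Blur at a geometric ladder of scales, then take adjacent differences.
    blurred = [gaussian_filter(image, sigma0 * k ** i) for i in range(num_scales)]
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(num_scales - 1)])
    # A pixel is a candidate if it is the max (or min) of its 3x3x3
    # neighborhood across space and scale, and its response is strong.
    local_max = maximum_filter(dog, size=3) == dog
    local_min = minimum_filter(dog, size=3) == dog
    strong = np.abs(dog) > threshold
    scale_idx, rows, cols = np.where((local_max | local_min) & strong)
    # Keep interior DoG levels only: each extremum needs a level above and below.
    keep = (scale_idx > 0) & (scale_idx < dog.shape[0] - 1)
    return list(zip(rows[keep], cols[keep], scale_idx[keep]))
```

Lowe's full method then interpolates extrema to sub-pixel accuracy and prunes low-contrast and edge responses, i.e. the "keypoint pruning and localization" stage that the slides note is not used in SLAM.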
Lightning Introduction to Scale Space

• Motivation:
  – Objects can be recognized at many levels of detail
  – Large distances correspond to low l.o.d.
  – Different kinds of information are available at each level (formalized below)
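Scale space has a standard formalization, found in the Lowe paper cited above: convolve the image with Gaussians of increasing width σ, where each σ picks out one level of detail.

```latex
% Gaussian scale space of an image I (Lowe, IJCV 2004): larger \sigma
% discards finer structure, i.e. gives a lower level of detail.
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y),
\qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/(2\sigma^2)}
```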