Projects:

Global Hypothesis Generation for 6D Object Pose Estimation

DSAC – Differentiable RANSAC for Camera Localization

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image

Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images

Pose Estimation of Kinematic Chain Instances via Object Coordinate Regression

6-DOF Model Based Tracking via Object Coordinate Regression

Learning 6D Object Pose Estimation using 3D Object Coordinates


 

Global Hypothesis Generation for 6D Object Pose Estimation:

Authors:

Frank Michel, Alexander Kirillov, Eric Brachmann, Alexander Krull, Stefan Gumhold, Bogdan Savchynskyy, Carsten Rother

Abstract:

This paper addresses the task of estimating the 6D pose of a known 3D object from a single RGB-D image. Most modern approaches solve this task in three steps: i) Compute local features; ii) Generate a pool of pose-hypotheses; iii) Select and refine a pose from the pool. This work focuses on the second step. While all existing approaches generate the hypotheses pool via local reasoning, e.g. RANSAC or Hough-voting, we are the first to show that global reasoning is beneficial at this stage. In particular, we formulate a novel fully-connected Conditional Random Field (CRF) that outputs a very small number of pose-hypotheses. Despite the potential functions of the CRF being non-Gaussian, we give a new and efficient two-step optimization procedure, with some guarantees for optimality. We utilize our global hypotheses generation procedure to produce results that exceed state-of-the-art for the challenging “Occluded Object Dataset”.
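The pairwise terms of such a fully-connected CRF encode whether two correspondences can support the same rigid pose. As an illustration only (not the paper's actual potentials), a rigid transform preserves pairwise distances, so two (camera point, predicted object coordinate) correspondences are geometrically compatible only if their distances agree between the two spaces:

```python
import numpy as np

def pairwise_consistent(cam_i, cam_j, obj_i, obj_j, tol=0.01):
    """Illustrative geometric check in the spirit of pairwise CRF terms:
    two correspondences can belong to the same rigid pose only if the
    distance between the camera-space points matches the distance between
    the predicted object coordinates (tol is a hypothetical threshold)."""
    d_cam = np.linalg.norm(cam_i - cam_j)
    d_obj = np.linalg.norm(obj_i - obj_j)
    return abs(d_cam - d_obj) < tol
```

A labelling in which all pairwise checks pass yields a small, consistent set of correspondences from which a pose hypothesis can be computed.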

Publication:

F. Michel, A. Kirillov, E. Brachmann, A. Krull, S. Gumhold, B. Savchynskyy, C. Rother, “Global Hypothesis Generation for 6D Object Pose Estimation”, CVPR 2017 (spotlight). [pdf]


 

DSAC – Differentiable RANSAC for Camera Localization:

Authors:

Eric Brachmann¹, Alexander Krull¹, Sebastian Nowozin², Jamie Shotton², Frank Michel¹, Stefan Gumhold¹, Carsten Rother¹

¹ TU Dresden, ² Microsoft

Abstract:

RANSAC is an important algorithm in robust optimization and a central building block for many computer vision applications. In recent years, traditionally hand-crafted pipelines have been replaced by deep learning pipelines, which can be trained in an end-to-end fashion. However, RANSAC has so far not been used as part of such deep learning pipelines, because its hypothesis selection procedure is non-differentiable. In this work, we present two different ways to overcome this limitation. The most promising approach is inspired by reinforcement learning, namely to replace the deterministic hypothesis selection by a probabilistic selection for which we can derive the expected loss w.r.t. to all learnable parameters. We call this approach DSAC, the differentiable counterpart of RANSAC. We apply DSAC to the problem of camera localization, where deep learning has so far failed to improve on traditional approaches. We demonstrate that by directly minimizing the expected loss of the output camera poses, robustly estimated by RANSAC, we achieve an increase in accuracy. In the future, any deep learning pipeline can use DSAC as a robust optimization component.

Overview:

Differentiable camera localization pipeline. Given an RGB image, we let a CNN with parameters w predict 2D-3D correspondences, so-called scene coordinates. From these, we sample minimal sets of four scene coordinates and create a pool of hypotheses h. For each hypothesis, we create an image of reprojection errors, which is scored by a second CNN with parameters v. We select a hypothesis probabilistically according to the score distribution. The selected pose is then refined.
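The probabilistic selection step can be sketched in a few lines (plain NumPy, not the paper's implementation): the hypothesis scores are turned into a softmax distribution, and the training objective becomes the expected task loss under that distribution, which, unlike an argmax selection, is differentiable with respect to the scores and hence the scoring CNN's parameters v.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())  # shift for numerical stability
    return e / e.sum()

def expected_pose_loss(scores, losses):
    """DSAC-style objective: select hypothesis i with probability
    softmax(scores)[i] and minimize the expectation of the task loss.
    scores and losses are one value per pose hypothesis."""
    return float(softmax(scores) @ losses)
```

Sharpening the score distribution concentrates the expectation on the best-scored hypothesis, so minimizing this objective trains the scorer to rank low-loss hypotheses highly.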

Code:

Our code is available via GitHub. We also provide a package of trained CNNs for the 7-Scenes data set.

Publication:

E. Brachmann, A. Krull, S. Nowozin, J. Shotton, F. Michel, S. Gumhold, C. Rother, “DSAC – Differentiable RANSAC for Camera Localization”, CVPR 2017 (oral). [pdf]


 

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image:

Authors:

Eric Brachmann, Frank Michel, Alexander Krull, Michael Ying Yang, Stefan Gumhold, Carsten Rother

Abstract:

In recent years, the task of estimating the 6D pose of object instances and complete scenes, i.e. camera localization, from a single input image has received considerable attention. Consumer RGB-D cameras have made this feasible, even for difficult, texture-less objects and scenes. In this work, we show that a single RGB image is sufficient to achieve visually convincing results. Our key concept is to model and exploit the uncertainty of the system at all stages of the processing pipeline. The uncertainty comes in the form of continuous distributions over 3D object coordinates and discrete distributions over object labels. We give three technical contributions. Firstly, we develop a regularized, auto-context regression framework which iteratively reduces uncertainty in object coordinate and object label predictions. Secondly, we introduce an efficient way to marginalize object coordinate distributions over depth. This is necessary to deal with missing depth information. Thirdly, we utilize the distributions over object labels to detect multiple objects simultaneously with a fixed budget of RANSAC hypotheses. We tested our system for object pose estimation and camera localization on commonly used data sets. We see a major improvement over competing systems.
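The third contribution, a fixed hypothesis budget shared across several objects, can be illustrated with a simple allocation sketch. The proportional rule below is a hypothetical stand-in, not the paper's scheme: it splits the budget according to each object's mass in the predicted label distributions.

```python
import numpy as np

def allocate_budget(label_mass, budget):
    """Illustrative split of a fixed RANSAC hypothesis budget across
    objects, proportional to each object's aggregate mass in the
    predicted label distributions (label_mass is hypothetical evidence)."""
    p = np.asarray(label_mass, dtype=float)
    p = p / p.sum()
    alloc = np.floor(p * budget).astype(int)
    # hand out any remaining hypotheses to the most probable objects
    for i in np.argsort(-p)[: budget - alloc.sum()]:
        alloc[i] += 1
    return alloc
```

Objects with little label support receive few hypotheses, so the total RANSAC cost stays constant no matter how many objects are searched for.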

Spotlight Video:

Results:

Teaser

(Left) Result of our method. The pose of the lamp is estimated sufficiently well for augmented reality. (Right) Result of a state-of-the-art system [Krull et al., ICCV 2015] that uses an RGB-D input image. The pose is less well suited for augmented reality.

2x2_Objects

Pose estimation from an RGB image. Four objects, partially overlaid with 3D models with the estimated pose.

supp_chess

Camera localization results. We show the estimated camera path (green) for one complete image sequence. The ground truth camera path (orange) is also shown for comparison.

Code:

We provide the source code of our method under the BSD License. The following package contains code, a short documentation and configuration files with parameter settings used in the experiments of the paper: CVPR16 Code

In order to test the code, we provide the following dummy data package, which contains some images of the dataset of Hinterstoisser et al. in the format our code requires: Dummy Data (for the original, complete dataset visit: http://campar.in.tum.de/Main/StefanHinterstoisser)

Please cite our CVPR16 paper if you use our code in your own work.

Publication:

E. Brachmann, F. Michel, A. Krull, M. Y. Yang, S. Gumhold, C. Rother, “Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image”, CVPR 2016. [pdf] [supplement]


 

Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images:

Authors:

Alexander Krull, Eric Brachmann, Frank Michel, Michael Ying Yang, Stefan Gumhold, Carsten Rother

Abstract:

Analysis-by-synthesis has been a successful approach for many tasks in computer vision, such as 6D pose estimation of an object in an RGB-D image which is the topic of this work. The idea is to compare the observation with the output of a forward process, such as a rendered image of the object of interest in a particular pose. Due to occlusion or complicated sensor noise, it can be difficult to perform this comparison in a meaningful way. We propose an approach that “learns to compare”, while taking these difficulties into account. This is done by describing the posterior density of a particular object pose with a convolutional neural network (CNN) that compares observed and rendered images. The network is trained with the maximum likelihood paradigm. We observe empirically that the CNN does not specialize to the geometry or appearance of specific objects. It can be used with objects of vastly different shapes and appearances, and in different backgrounds. Compared to state-of-the-art, we demonstrate a significant improvement on two different datasets which include a total of eleven objects, cluttered background, and heavy occlusion.
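Given per-hypothesis energies from such a learned comparator, a posterior over a discrete set of candidate poses follows from a Gibbs distribution. The sketch below is a generic illustration of that last step (the energies here would come from a hypothetical comparison network, not this paper's CNN):

```python
import numpy as np

def pose_posterior(energies):
    """Turn comparison energies E(observed, rendered(pose)) into a
    posterior over a discrete set of candidate poses via p ∝ exp(-E)."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()))  # shift by the minimum for stability
    return w / w.sum()
```

Lower energy (a better observed/rendered match) yields higher posterior probability, which is exactly the quantity a maximum-likelihood training objective can operate on.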

Spotlight Video:

Results:

qr1
qr2

Publication:

A. Krull, E. Brachmann, F. Michel, M. Y. Yang, S. Gumhold, C. Rother: “Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images”, Supplementary Material, ICCV 2015.


 

Pose Estimation of Kinematic Chain Instances via Object Coordinate Regression:

Authors:

Frank Michel, Alexander Krull, Eric Brachmann, Michael Ying Yang, Stefan Gumhold, Carsten Rother

Abstract:

In this paper, we address the problem of one-shot pose estimation of articulated objects from an RGB-D image. In particular, we consider object instances with the topology of a kinematic chain, i.e. assemblies of rigid parts connected by prismatic or revolute joints. This object type occurs often in daily life, for instance in the form of furniture or electronic devices. Instead of treating each object part separately, we use the relationship between the parts of the kinematic chain and propose a new minimal pose sampling approach. This enables us to create a pose hypothesis for a kinematic chain consisting of K parts by sampling K 3D-3D point correspondences. To assess the quality of our method, we gathered a large dataset containing four objects and 7000+ annotated RGB-D frames. On this dataset we achieve considerably better results than a modified state-of-the-art pose estimation system for rigid objects.
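The building block underneath such hypothesis sampling is rigid alignment from 3D-3D correspondences. The standard Kabsch/Umeyama solution is sketched below; note it needs at least three correspondences for a single unconstrained rigid part, whereas the paper's minimal sampler gets by with one correspondence per part by exploiting the chain's joint constraints.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) with R @ P[i] + t ≈ Q[i],
    estimated from Nx3 arrays of 3D-3D correspondences (standard Kabsch
    algorithm, N >= 3 non-collinear points)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)            # cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp
```

Each sampled minimal set yields one pose hypothesis per part, which is then scored and refined as in standard hypothesize-and-verify pipelines.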

Results:

Result_laptop Result_Cabinet
Result_Cupboard Result_Train

Dataset:

We created a dataset of four different kinds of kinematic chains which differ in the number and type of joints. The objects are a laptop with a hinged lid (one revolute joint), a cabinet with a door and drawer (one revolute and one prismatic joint), a cupboard with one movable drawer (one prismatic joint) and a toy train consisting of four parts (four revolute joints). It can be downloaded here. See the readme for more information.

Publication:

F. Michel, A. Krull, E. Brachmann, M. Y. Yang, S. Gumhold, C. Rother: “Pose Estimation of Kinematic Chain Instances via Object Coordinate Regression“, Supplementary Material, Extended Abstract, BMVC 2015.


 

6-DOF Model Based Tracking via Object Coordinate Regression:

Authors:

Alexander Krull, Frank Michel, Eric Brachmann, Stefan Gumhold, Stephan Ihrke, Carsten Rother

Abstract:

This work investigates the problem of 6-Degrees-Of-Freedom (6-DOF) object tracking from RGB-D images, where the object is rigid and a 3D model of the object is known. As in many previous works, we utilize a Particle Filter (PF) framework. In order to have a fast tracker, the key aspect is to design a clever proposal distribution which works reliably even with a small number of particles. To achieve this we build on a recently developed state-of-the-art system for single image 6D pose estimation of known 3D objects, using the concept of so-called 3D object coordinates. The idea is to train a random forest that regresses the 3D object coordinates from the RGB-D image. Our key technical contribution is a two-way procedure to integrate the random forest predictions in the proposal distribution generation. This has many practical advantages, in particular better generalization ability with respect to occlusions, changes in lighting and fast-moving objects. We demonstrate experimentally that we exceed state-of-the-art on a given, public dataset. To raise the bar in terms of fast-moving objects and object occlusions, we also create a new dataset, which will be made publicly available.
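The particle filter skeleton underlying such a tracker is generic: propose new states, reweight by the observation likelihood, then resample. In the sketch below both the proposal and the likelihood are abstract placeholders; in the tracker the proposal would be informed by the random forest predictions and the state would be a 6-DOF pose rather than a scalar.

```python
import numpy as np

def pf_step(particles, weights, propose, likelihood, rng):
    """One generic particle-filter iteration: propose, reweight, resample.
    propose and likelihood are caller-supplied stand-ins for the tracker's
    forest-informed proposal distribution and observation model."""
    particles = np.array([propose(p, rng) for p in particles])
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # multinomial resampling keeps the particle count fixed
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

A well-designed proposal concentrates particles in high-likelihood regions, which is why few particles suffice and the tracker stays fast.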

Results:

Dataset:

The dataset can be downloaded here. See the readme for more information.

Publication:

A. Krull, F. Michel, E. Brachmann, S. Gumhold, S. Ihrke, C. Rother: “6-DOF Model Based Tracking via Object Coordinate Regression”, Supplementary Material, ACCV 2014.


 

Learning 6D Object Pose Estimation using 3D Object Coordinates:

Authors:

Eric Brachmann, Alexander Krull, Frank Michel, Stefan Gumhold, Jamie Shotton, Carsten Rother

Abstract:

This work addresses the problem of estimating the 6D pose of specific objects from a single RGB-D image. We present a flexible approach that can deal with generic objects, both textured and texture-less. The key new concept is a learned, intermediate representation in the form of a dense 3D object coordinate labelling paired with a dense class labelling. We are able to show that for a common dataset with texture-less objects, where template-based techniques are suitable and state of the art, our approach is slightly superior in terms of accuracy. We also demonstrate the benefits of our approach, compared to template-based techniques, in terms of robustness with respect to varying lighting conditions. Towards this end, we contribute a new ground truth dataset with 10k images of 20 objects, each captured under three different lighting conditions. We demonstrate that our approach scales well with the number of objects and is capable of running fast.
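With dense object coordinate predictions available, a pose hypothesis can be verified by checking how many predictions it explains. The inlier count below is a deliberately simplified stand-in for the paper's energy function (the 3D-3D error metric and the threshold tau are illustrative choices):

```python
import numpy as np

def count_inliers(R, t, obj_coords, cam_points, tau=0.02):
    """Score a pose hypothesis (R, t) by the number of predicted object
    coordinates (Nx3) that map to within tau metres of the corresponding
    camera-space points (Nx3). A simplified verification step, not the
    paper's full energy."""
    pred = obj_coords @ R.T + t
    return int((np.linalg.norm(pred - cam_points, axis=1) < tau).sum())
```

The hypothesis with the best score is kept and refined on its inlier set, as in a standard RANSAC verification loop.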

Overview:

Results:

benchmarking image

Two qualitative pose estimation results on our dataset. Left: Input RGB-D frames with estimated pose displayed as blue bounding box, ground truth pose as green bounding box. Right: Object coordinate prediction of one tree. The upper inlay shows the ground truth object coordinates. The lower inlay shows for each pixel the best object coordinate prediction of all trees with respect to ground truth.

Datasets:

We make available our own dataset, background images we used during training, and additional annotations for the dataset of Hinterstoisser et al.[1].

20 Objects Light Dataset:

RGB-D images and ground truth poses for 20 textured and texture-less objects, each recorded under three different lighting conditions. See the readme for more information.

Download Dataset (4.4GB)

Background Dataset:

This dataset contains RGB-D images of different, cluttered office backgrounds. They were used to represent the background class when training our random forest. See the readme for more information.

Download Dataset (1.1GB)

Occlusion Dataset:

NOTE: Below you will find the version of the occlusion dataset as it was used in our ECCV14 paper. However, we have released a reworked version of the dataset here: Occlusion Challenge. The reworked version contains all data (images, poses, 3D models of objects), and some annotation errors have been corrected. We advise using the reworked version of the dataset.

This dataset contains additional annotations of occluded objects for the dataset of Hinterstoisser et al.[1]. See the readme for more information.

Download Dataset (2.6MB)

Software:

We provide pre-compiled binaries for Linux (Ubuntu 12.04). See the readme for more information.

Download Binaries (2.8MB)

Publication:

E. Brachmann, A. Krull, F. Michel, S. Gumhold, J. Shotton, and C. Rother: “Learning 6D Object Pose Estimation using 3D Object Coordinates”, Supplementary Material, ECCV 2014.

References:

[1] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, N. Navab, “Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes”, ACCV 2012