About

The computer vision and machine perception group at Saarland University focuses on 3D computer vision and lifelong learning, with an emphasis on building machine perception algorithms that are not rigid but able to adapt to their environment and evolve. The group connects to surrounding interdisciplinary fields such as fundamental machine learning, reinforcement learning, natural language processing and privacy. The group is headed by Eddy Ilg, who is known for his contributions to computer vision on optical flow and his work on AR technology at Meta.

Saarland University is surrounded by the Saarland Informatics Campus, one of the leading locations for computer science in Germany and Europe, with five strong research institutes and three collaborating university departments. The campus holds 28 ERC Grants and counts 7 Gottfried Wilhelm Leibniz Prize winners. The group is English-speaking and collaborates with many renowned researchers, both within Saarland Informatics Campus and internationally.

News

[6/7/2022]
Eddy Ilg was awarded the professorship at Saarland University.
[5/25/2022]
Eddy Ilg became an ELLIS member.
[5/8/2022]
The group has an open Hiwi Position.
[5/8/2022]
Tom Fischer and Nikhil Paliwal are joining the group for their Master's thesis.
[4/27/2022]
The group became a Continual AI Unit.
[4/10/2022]
The group is hiring! See here.
[3/2/2022]
NinjaDesc has been accepted to CVPR 2022.
[2/28/2022]
Our paper on explicit radiance field reconstruction without deep learning is available on arXiv.
[7/7/2020]
Our paper on tight learned inertial odometry was accepted to RAL/IROS.
[7/3/2020]
Our paper on deep local shapes was accepted to ECCV.

People

Prof. Eddy Ilg

Eddy Ilg

I lead the CVMP group and became a professor at Saarland University in June 2022.

Previously, I worked in the US at Meta Reality Labs on project LiveMaps, which aims to build a complete machine perception stack for augmented reality, ranging from hardware to mapping and localization as well as 3D scene understanding, up to photorealistic 3D reconstruction. This work gave me a very broad perspective on AI and an understanding of the challenges involved in building AI systems.

I obtained my PhD from the University of Freiburg and am well known for my work on optical flow, which led to a paradigm shift and first established deep learning approaches in the field. Furthermore, I am known for my work in related areas on the estimation of disparities, motion and occlusion boundaries, occlusions, as well as uncertainties. Please have a look at my thesis.

As a fun fact, during high school I wrote an operating system and won the prize for the best work in software engineering at the German young researchers' federal contest. I am generally a very curious and creative person; in my free time I like to act, play piano, go running and do woodworking. During my studies I was supported by a scholarship from the German Academic Scholarship Foundation for extraordinarily talented students with a broad horizon.

My current research interests lie in bringing AI systems to the next level and using deep learning in combination with 3D computer vision, computer graphics and NLP. My vision is to combine research from these areas to build AI systems that are not rigid and can evolve over time.

During my PhD I had the chance to lead a small team, and a strong team atmosphere is very important to me. Research is fundamentally about moving forward into the unknown, and I personally find it invaluable to have many diverse perspectives on one's work. Therefore I support working in small teams with an intensive discussion culture and a healthy team atmosphere. To my students I can serve as a mentor in all aspects of their career, from theory to engineering and from academia to industry.


Tom Fischer

Tom Fischer

Starting in June, I will write my Master's thesis with Professor Ilg in the newly founded Computer Vision and Machine Perception group.

I did my Bachelor's degree in Cybersecurity from 2016 to 2020 at Saarland University with a focus on cryptography and secure machine learning. Since 2020 I have been pursuing my Master's in computer science. My research interests lie at the intersection of computer vision and deep learning, where I want to design transparent and robust deep learning solutions for computer vision problems.

For my thesis, I am investigating if and how it is possible to extend state-of-the-art deep learning approaches for optical flow with explicit diffusion processes. We hope that using the well-understood theory of diffusion modelling lets us design more explainable and stable networks that solve the optical flow problem efficiently.

When I am not staring at a computer screen, I like to cook and experiment with different recipes and cuisines. I love eating out with friends and I am always looking for restaurant recommendations!


Nikhil Paliwal

Nikhil Paliwal

I am a Master's student at Saarland University in the Data Science and Artificial Intelligence program. I have a background in self-supervision, model compression techniques and deep learning. I did my Bachelor's in Communication and Computer Engineering at the LNM Institute of Information Technology, India, from 2015 to 2019, where I received the Chairman's gold medal for the best overall performance in my graduating batch.

I am aiming to make deep learning more accessible by targeting the need for large amounts of supervised labels, the computational resource requirements, and catastrophic interference in continual systems. In my Master's thesis I will focus on the topic of knowledge transfer with continual learning.

In my leisure time, I enjoy activities such as football and swimming. I also enjoy science fiction, reading novels and watching movies.

Job Postings

Open PhD Student Positions
in Computer Vision and Continual Learning

Saarland University is one of the leading locations for computer science in Germany and Europe. The research group on computer vision and machine perception is headed by Eddy Ilg with close ties to MPI Saarbrücken and to Meta and currently has two open positions for PhD students.

While machine learning is omnipresent, a strong limitation of conventional machine learning systems is that they are trained on fixed datasets and are not able to adapt. The goal of your PhD will be to create state-of-the-art 2D and 3D machine perception algorithms that can learn incrementally and evolve. This is a topic that is highly relevant for the future of AI and connects to 2D and 3D vision, natural language processing, reinforcement learning and fundamental research. You will be working in an interdisciplinary setting embedded in the Saarland Informatics Campus, collaborating with leading researchers in the field and building international relationships.

Requirements:

Please include the following information with your application:

And send the application to .

You can find more information and news at cvmp.cs.uni-saarland.de.

Eddy Ilg is known for his contributions to computer vision on optical flow and his work on AR technology at Meta. Saarland University is surrounded by the Saarland Informatics Campus with five strong research institutes and three collaborating university departments. It offers a dynamic and stimulating research environment and holds 28 ERC Grants and 7 Gottfried Wilhelm Leibniz Prize winners. You will be working in a diverse, English-speaking environment with international students and collaborate with many renowned professors, both within Saarland Informatics Campus and internationally.

Please share this post!

#phd #phdposition #phdstudent #postdoc #machinelearning #deeplearning #computervision #machineperception #artificialintelligence #computergraphics #saarlanduniversity #saarlandinformaticscampus #hiring #research #ai #technology #university #augmentedreality

Open PostDoc Position
in Computer Vision and Continual Learning

Saarland University is one of the leading locations for computer science in Germany and Europe. The research group on computer vision and machine perception is headed by Eddy Ilg with close ties to MPI Saarbrücken and to Meta. The group currently has an open position for a PostDoc that will be co-affiliated with MPI Saarbrücken with the groups of Christian Theobalt and Bernt Schiele. Optionally there is the opportunity to co-found a startup in an area with groundbreaking potential.

While machine learning is omnipresent, a strong limitation of conventional machine learning systems is that they are trained on fixed datasets and are not able to adapt. Although continual learning addresses this shortcoming, it is still at a very early stage and not yet well established in computer vision. The group is looking for a PostDoc who will lead the effort to advance 2D and 3D machine perception algorithms so that they can learn incrementally and evolve, which is highly relevant for the future of AI. You will be working in an interdisciplinary setting between computer vision, 3D reconstruction, NLP and fundamental machine learning. You will be co-supervising PhD students, contributing to the direction of the group and supporting the group in building international relationships.

Requirements:

Please include the following information with your application:

And send the application to .

You can find more information and news at cvmp.cs.uni-saarland.de.

Eddy Ilg is known for his contributions to computer vision on optical flow and his work on AR technology at Meta. Saarland University is surrounded by the Saarland Informatics Campus with five strong research institutes and three collaborating university departments. It offers a dynamic and stimulating research environment and holds 28 ERC Grants and 7 Gottfried Wilhelm Leibniz Prize winners. You will be working in a diverse, English-speaking environment with international students and collaborate with many renowned professors, both within Saarland Informatics Campus and internationally.

Please share this post!

#phd #phdposition #phdstudent #postdoc #machinelearning #deeplearning #computervision #machineperception #artificialintelligence #computergraphics #saarlanduniversity #saarlandinformaticscampus #hiring #research #ai #technology #university #augmentedreality

Hiwi Position

Starting from June 20th, we are looking for a Hiwi who can support the group in setting up a machine learning Python environment. You will implement components that make it easy to train and analyze neural networks, including data loading, multi-GPU training and dispatching jobs on the cluster. The job offers the opportunity to gain expertise from Eddy Ilg, who has implemented many deep learning projects and can share industry experience in this field.

Requirements:

Please send applications to .

Thesis Offerings

Optical Flow Estimation with Deep Learning and Explicit Diffusion

Bachelor’s or Master’s thesis co-supervised with Joachim Weickert:

Traditional correspondence estimation between a pair of images is usually formulated with a data term and a smoothness term. While the data term is modeled in modern deep learning approaches with the help of cost volumes, the smoothness term is not modeled at all.

The thesis will investigate introducing novel operations into deep networks that enable them to make use of explicit diffusion. The desired outcome is a conference paper. Please contact for more information.
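As a rough illustration of the kind of operation in question, a single explicit homogeneous diffusion step on a 2D field can be written as follows. This is a minimal numpy sketch only; the function and variable names are illustrative and not part of the thesis:

```python
import numpy as np

def diffusion_step(u, tau=0.2):
    """One explicit homogeneous diffusion step on a 2D field u, discretizing
    du/dt = laplace(u); tau <= 0.25 keeps the explicit scheme stable."""
    p = np.pad(u, 1, mode="edge")  # replicate boundary values (Neumann)
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + tau * lap

# Repeated diffusion steps act like a smoothness term: they regularize a
# noisy flow component toward a smooth field.
rng = np.random.default_rng(0)
u0 = 1.0 + 0.5 * rng.standard_normal((16, 16))
u = u0.copy()
for _ in range(100):
    u = diffusion_step(u)
print(u.var() < u0.var())  # diffusion reduces the variance of the field
```

In a network, such a step would be applied to intermediate flow estimates, with the diffusivity potentially predicted by learned layers.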

Publications

2022

ERF: Explicit Radiance Field Reconstruction From Scratch

Samir Aroudj, Steven Lovegrove, Eddy Ilg, Tanner Schmidt, Michael Goesele and Richard Newcombe

arXiv Paper 2022

We propose a novel explicit dense 3D reconstruction approach that processes a set of images of a scene with sensor poses and calibrations and estimates a photo-real digital model. One of the key innovations is that the underlying volumetric representation is completely explicit in contrast to neural network-based (implicit) alternatives. We encode scenes explicitly using clear and understandable mappings of optimization variables to scene geometry and their outgoing surface radiance. We represent them using hierarchical volumetric fields stored in a sparse voxel octree. Robustly reconstructing such a volumetric scene model with millions of unknown variables from registered scene images only is a highly non-convex and complex optimization problem. To this end, we employ stochastic gradient descent (Adam) which is steered by an inverse differentiable renderer.

We demonstrate that our method can reconstruct models of high quality that are comparable to state-of-the-art implicit methods. Importantly, we do not use a sequential reconstruction pipeline, where individual steps suffer from incomplete or unreliable information from previous stages, but start our optimizations from uninformed initial solutions with scene geometry and radiance that are far off from the ground truth. We show that our method is general and practical. It does not require a highly controlled lab setup for capturing, but allows for reconstructing scenes with a vast variety of objects, including challenging ones such as outdoor plants or furry toys. Finally, our reconstructed scene models are versatile thanks to their explicit design. They can be edited interactively, which is computationally too costly for implicit alternatives.
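A heavily simplified toy version of the underlying idea, with made-up names: pixels observe weighted sums of explicit per-voxel radiance values, the weight matrix standing in for the differentiable renderer, and gradient descent on the photometric error recovers the explicit scene variables from an uninformed initialization (the paper uses Adam on a sparse voxel octree with millions of unknowns):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 8))   # toy ray/voxel rendering weights
radiance_gt = rng.random(8)        # ground-truth explicit voxel radiances
pixels = W @ radiance_gt           # "captured" pixel values

radiance = np.zeros(8)             # uninformed initial solution
lr = 0.2
for _ in range(2000):
    residual = W @ radiance - pixels                 # rendered minus observed
    radiance -= lr * 2.0 * W.T @ residual / len(pixels)  # photometric gradient step
```

After optimization, the explicit radiances match the ground truth, which is the sense in which the representation stays clear and directly editable.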

NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning

Tony Ng, Hyo Jin Kim, Vincent Lee, Daniel DeTone, Tsun-Yi Yang, Tianwei Shen, Eddy Ilg, Vassileios Balntas, Krystian Mikolajczyk and Chris Sweeney

Conference on Computer Vision and Pattern Recognition (CVPR) 2022

In the light of recent analyses on privacy-concerning scene revelation from visual descriptors, we develop descriptors that conceal the input image content. In particular, we propose an adversarial learning framework for training visual descriptors that prevent image reconstruction, while maintaining the matching accuracy. We let a feature encoding network and image reconstruction network compete with each other, such that the feature encoder tries to impede the image reconstruction with its generated descriptors, while the reconstructor tries to recover the input image from the descriptors. The experimental results demonstrate that the visual descriptors obtained with our method significantly deteriorate the image reconstruction quality with minimal impact on correspondence matching and camera localization performance.
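The adversarial objective can be sketched in a few lines. This is a toy formulation with illustrative names; the actual networks and losses in the paper differ:

```python
import numpy as np

def matching_loss(desc_a, desc_b):
    # Descriptors of corresponding points should stay close (matching accuracy).
    return np.mean((desc_a - desc_b) ** 2)

def reconstruction_loss(reconstructed, image):
    # The reconstructor minimizes this; the encoder tries to maximize it.
    return np.mean((reconstructed - image) ** 2)

def encoder_loss(desc_a, desc_b, reconstructed, image, lam=0.1):
    # Adversarial trade-off: keep descriptors matchable while making the
    # reconstructor's job harder (note the minus sign on the privacy term).
    return matching_loss(desc_a, desc_b) - lam * reconstruction_loss(reconstructed, image)

img = np.ones(4)
desc = np.zeros(3)
good_rec, bad_rec = img.copy(), np.zeros(4)  # perfect vs. failed reconstruction
# A failed reconstruction lowers the encoder's loss:
print(encoder_loss(desc, desc, bad_rec, img) < encoder_loss(desc, desc, good_rec, img))
```

Training alternates between the two players until the descriptors remain matchable but no longer reveal the image content.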

2021

Mitigating Reverse Engineering Attacks on Local Feature Descriptors

Deeksha Dangwal, Vincent T. Lee, Hyo Jin Kim, Tianwei Shen, Meghan Cowan, Rajvi Shah, Caroline Trippel, Brandon Reagen, Timothy Sherwood, Vasileios Balntas, Armin Alaghi and Eddy Ilg

British Machine Vision Conference (BMVC) 2021

As autonomous driving and augmented reality evolve, a practical concern is data privacy. In particular, these applications rely on localization based on user images. The widely adopted technology uses local feature descriptors, which are derived from the images and it was long thought that they could not be reverted back. However, recent work has demonstrated that under certain conditions reverse engineering attacks are possible and allow an adversary to reconstruct RGB images. This poses a potential risk to user privacy. We take this a step further and model potential adversaries using a privacy threat model. Subsequently, we show under controlled conditions a reverse engineering attack on sparse feature maps and analyze the vulnerability of popular descriptors including FREAK, SIFT and SOSNet. Finally, we evaluate potential mitigation techniques that select a subset of descriptors to carefully balance privacy reconstruction risk while preserving image matching accuracy; our results show that similar accuracy can be obtained when revealing less information.

2020

Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction

Rohan Chabra, Jan Eric Lenssen, Eddy Ilg, Tanner Schmidt, Julian Straub, Steven Lovegrove and Richard Newcombe

European Conference on Computer Vision (ECCV) 2020

Efficiently reconstructing complex and intricate surfaces at scale is a long-standing goal in machine perception. To address this problem we introduce Deep Local Shapes (DeepLS), a deep shape representation that enables encoding and reconstruction of high-quality 3D shapes without prohibitive memory requirements. DeepLS replaces the dense volumetric signed distance function (SDF) representation used in traditional surface reconstruction systems with a set of locally learned continuous SDFs defined by a neural network, inspired by recent work such as DeepSDF. Unlike DeepSDF, which represents an object-level SDF with a neural network and a single latent code, we store a grid of independent latent codes, each responsible for storing information about surfaces in a small local neighborhood. This decomposition of scenes into local shapes simplifies the prior distribution that the network must learn, and also enables efficient inference. We demonstrate the effectiveness and generalization power of DeepLS by showing object shape encoding and reconstructions of full scenes, where DeepLS delivers high compression, accuracy, and local shape completion.
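The decomposition into local shapes can be sketched as follows, in a 1-D toy with a made-up linear function standing in for the shared neural decoder:

```python
import numpy as np

cell_size = 1.0
latent_codes = {0: np.array([0.3]), 1: np.array([-0.2])}  # one code per local cell

def decoder(code, local_x):
    # Stand-in for the single shared decoder network: a fixed linear map.
    return float(code[0] + 0.5 * local_x)

def query_sdf(x):
    cell = int(np.floor(x / cell_size))  # which local shape owns x
    local_x = x / cell_size - cell       # coordinate inside that cell
    return decoder(latent_codes[cell], local_x)

print(query_sdf(0.4))  # decoded from cell 0's code: 0.3 + 0.5*0.4 = 0.5
```

Each cell's code only has to explain its small neighborhood, which is what simplifies the prior the network must learn and keeps inference local and efficient.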

TLIO: Tight Learned Inertial Odometry

Wenxin Liu, David Caruso, Eddy Ilg, Jing Dong, Anastasios I. Mourikis, Kostas Daniilidis, Vijay Kumar and Jakob Engel

IEEE Robotics and Automation Letters 2020

In this work we propose a tightly-coupled Extended Kalman Filter framework for IMU-only state estimation. Strap-down IMU measurements provide relative state estimates based on the IMU kinematic motion model. However, the integration of measurements is sensitive to sensor bias and noise, causing significant drift within seconds. Recent research by Yan et al. (RoNIN) and Chen et al. (IONet) showed the capability of using trained neural networks to obtain accurate 2D displacement estimates from segments of IMU data, and obtained good position estimates by concatenating them. This paper demonstrates a network that regresses 3D displacement estimates and their uncertainty, giving us the ability to tightly fuse the relative state measurement into a stochastic cloning EKF to solve for pose, velocity and sensor biases. We show that our network, trained with pedestrian data from a headset, can produce statistically consistent measurements and uncertainties to be used as the update step in the filter, and that the tightly-coupled system outperforms velocity integration approaches in position estimates, and the AHRS attitude filter in orientation estimates.
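At its core, the tight fusion is a Kalman measurement update in which the network's regressed uncertainty plays the role of the measurement noise. A heavily simplified scalar sketch (the actual filter is a stochastic cloning EKF over pose, velocity and sensor biases):

```python
def kalman_update(x, P, z, R):
    """Scalar Kalman measurement update: fuse a prediction (x, P) with a
    measurement z of variance R. The network's regressed uncertainty is
    used as R, which is what makes tight fusion possible."""
    K = P / (P + R)            # Kalman gain
    x_new = x + K * (z - x)    # corrected state
    P_new = (1 - K) * P        # reduced covariance
    return x_new, P_new

# IMU-integrated position drifted to 1.3 (variance 0.4); the network's
# displacement measurement says 1.0 with high confidence (variance 0.1).
x, P = kalman_update(1.3, 0.4, 1.0, 0.1)
print(round(x, 3), round(P, 3))  # state is pulled strongly toward the measurement
```

A confident network measurement (small R) yields a large gain and strongly corrects the drifting IMU integration; an uncertain one is mostly ignored.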

Domain Adaptation of Learned Features for Visual Localization

Sungyong Baik, Hyo Jin Kim, Tianwei Shen, Eddy Ilg, Kyoung Mu Lee and Christopher Sweeney

British Machine Vision Conference (BMVC) 2020

We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons. Recent learned local features based on deep neural networks have shown superior performance over classical hand-crafted local features. However, in a real-world scenario, there often exists a large domain gap between training and target images, which can significantly degrade the localization accuracy. While existing methods utilize a large amount of data to tackle the problem, we present a novel and practical approach, where only a few examples are needed to reduce the domain gap. In particular, we propose a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization. The experimental results demonstrate the superior performance over baselines, while using a scarce number of training examples from the target domain.

2019

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Osama Makansi, Eddy Ilg, Özgün Çiçek and Thomas Brox

Conference on Computer Vision and Pattern Recognition (CVPR) 2019

Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at the best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real scenario, where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach triggers good estimates of multimodal distributions and avoids mode collapse.
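The winner-takes-all idea can be sketched in a few lines (illustrative names; the paper applies this to sampled future-location hypotheses before fitting a mixture):

```python
import numpy as np

def winner_takes_all_loss(hypotheses, ground_truth):
    """Only the best of the K predicted samples is penalized, which lets the
    remaining hypotheses spread out and cover other modes of the future."""
    errors = np.sum((hypotheses - ground_truth) ** 2, axis=1)
    return errors.min()

hyps = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]])  # K=3 sampled futures
gt = np.array([1.1, 0.9])
print(winner_takes_all_loss(hyps, gt))  # only the closest sample, [1, 1], counts
```

Because only the winner receives gradient, different hypotheses specialize on different outcomes, avoiding the mode collapse of directly trained mixture components.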

2018

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy and Thomas Brox

Int. Journal of Computer Vision (IJCV) 2018

The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.

FusionNet and AugmentedFlowNet: Selective Proxy Ground Truth for Training on Unlabeled Images

Osama Makansi, Eddy Ilg and Thomas Brox

arXiv Paper 2018

Recent work has shown that convolutional neural networks (CNNs) can be used to estimate optical flow with high quality and fast runtime. This makes them preferable for real-world applications. However, such networks require very large training datasets. Engineering the training data is difficult and/or laborious. This paper shows how to augment a network trained on an existing synthetic dataset with large amounts of additional unlabelled data. In particular, we introduce a selection mechanism to assemble from multiple estimates a joint optical flow field, which outperforms that of all input methods. The latter can be used as proxy-ground-truth to train a network on real-world data and to adapt it to specific domains of interest. Our experimental results show that the performance of networks improves considerably, both, in cross-domain and in domain-specific scenarios. As a consequence, we obtain state-of-the-art results on the KITTI benchmarks.

Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation

Eddy Ilg, Tonmoy Saikia, Margret Keuper and Thomas Brox

European Conference on Computer Vision (ECCV) 2018

Occlusions play an important role in optical flow and disparity estimation, since matching costs are not available in occluded areas and occlusions indicate motion boundaries. Moreover, occlusions are relevant for motion segmentation and scene flow estimation. In this paper, we present an efficient learning-based approach to estimate occlusion areas jointly with optical flow or disparities. The estimated occlusions and motion boundaries clearly improve over the state of the art. Moreover, we present networks with state-of-the-art performance on the popular KITTI benchmark and good generic performance. Making use of the estimated occlusions, we also show improved results on motion segmentation and scene flow estimation.

Uncertainty Estimates and Multi-Hypotheses Networks for Optical Flow

Eddy Ilg, Özgün Çiçek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter and Thomas Brox

European Conference on Computer Vision (ECCV) 2018

Optical flow estimation can be formulated as an end-to-end supervised learning problem, which yields estimates with a superior accuracy-runtime tradeoff compared to alternative methodology. In this paper, we make such networks estimate their local uncertainty about the correctness of their prediction, which is vital information when building decisions on top of the estimations. For the first time we compare several strategies and techniques to estimate uncertainty in a large-scale computer vision task like optical flow estimation. Moreover, we introduce a new network architecture and loss function that enforce complementary hypotheses and provide uncertainty estimates efficiently with a single forward pass and without the need for sampling or ensembles. We demonstrate the quality of the uncertainty estimates, which is clearly above previous confidence measures on optical flow and allows for interactive frame rates.
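One common way to make a network estimate its own uncertainty, closely related to the strategies compared in the paper, is a Gaussian negative log-likelihood loss with a predicted log-variance. A minimal sketch with illustrative names:

```python
import numpy as np

def nll_loss(pred, log_var, target):
    """Gaussian negative log-likelihood (up to a constant): large errors are
    forgiven where the network admits high variance, but claiming high
    variance everywhere is penalized by the log term."""
    return 0.5 * (np.exp(-log_var) * (pred - target) ** 2 + log_var)

# Same prediction error of 1.0, reported with low vs. high uncertainty:
confident = nll_loss(0.0, np.log(0.1), 1.0)  # overconfident: heavy penalty
honest    = nll_loss(0.0, np.log(2.0), 1.0)  # admits uncertainty: cheaper
print(confident > honest)
```

Trained this way, per-pixel variances become meaningful confidence estimates that downstream decisions can build on, all within a single forward pass.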

2017

End-to-End Learning of Video Super-Resolution with Motion Compensation

Osama Makansi, Eddy Ilg and Thomas Brox

German Conference on Pattern Recognition (GCPR) 2017

Learning approaches have shown great success in the task of super-resolving an image given a low-resolution input. Video super-resolution aims to additionally exploit the information from multiple images. Typically, the images are related via optical flow and consecutive image warping. In this paper, we provide an end-to-end video super-resolution network that, in contrast to previous works, includes the estimation of optical flow in the overall network architecture. We analyze the usage of optical flow for video super-resolution and find that common off-the-shelf image warping does not allow video super-resolution to benefit much from optical flow. We instead propose an operation for motion compensation that performs warping from low to high resolution directly. We show that with this network configuration, video super-resolution can benefit from optical flow, and we obtain state-of-the-art results on the popular test sets. We also show that the processing of whole images rather than independent patches is responsible for a large increase in accuracy.

DeMoN: Depth and Motion Network for Learning Monocular Stereo

Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy and Thomas Brox

Conference on Computer Vision and Pattern Recognition (CVPR) 2017

In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and motion, but additionally surface normals, optical flow between the images and confidence of the matching. A crucial component of the approach is a training loss based on spatial relative differences. Compared to traditional two-frame structure from motion methods, results are more accurate and more robust. In contrast to the popular depth-from-single-image networks, DeMoN learns the concept of matching and, thus, better generalizes to structures not seen during training.

Lucid Data Dreaming for Object Tracking

Anna Khoreva, Rodrigo Benenson, Eddy Ilg, Thomas Brox and Bernt Schiele

Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) 2017

Convolutional networks reach top quality in pixel-level object tracking but require a large amount of training data (1k to 10k) to deliver such results. We propose a new training strategy which achieves state-of-the-art results across three evaluation datasets while using 20 to 100 times less annotated data than competing methods. Our approach is suitable for both single and multiple object tracking. Instead of using large training sets in the hope of generalizing across domains, we generate in-domain training data using the provided annotation on the first frame of each video to synthesize ("lucid dream") plausible future video frames. In-domain per-video training data allows us to train high-quality appearance- and motion-based models, as well as tune the post-processing stage. This approach makes it possible to reach competitive results even when training from only a single annotated frame, without ImageNet pre-training. Our results indicate that using a larger training set is not automatically better, and that for the tracking task a smaller training set that is closer to the target domain is more effective. This changes the mindset regarding how many training samples and how much general "objectness" knowledge are required for the object tracking task.

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy and Thomas Brox

Conference on Computer Vision and Pattern Recognition (CVPR) 2017

The FlowNet demonstrated that optical flow estimation can be cast as a learning problem. However, the state of the art with regard to the quality of the flow has still been defined by traditional methods. Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods. In this paper, we advance the concept of end-to-end learning of optical flow and make it work really well. The large improvements in quality and speed are caused by three major contributions: first, we focus on the training data and show that the schedule of presenting data during training is very important. Second, we develop a stacked architecture that includes warping of the second image with intermediate optical flow. Third, we elaborate on small displacements by introducing a subnetwork specializing on small motions. FlowNet 2.0 is only marginally slower than the original FlowNet but decreases the estimation error by more than 50%. It performs on par with state-of-the-art methods, while running at interactive frame rates. Moreover, we present faster variants that allow optical flow computation at up to 140fps with accuracy matching the original FlowNet.
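The warping used in the stacked architecture is backward warping of the second image with the intermediate flow, so that the next network in the stack only has to estimate the residual flow. A minimal sketch (nearest-neighbor sampling for brevity; the networks use differentiable bilinear sampling):

```python
import numpy as np

def warp(image, flow):
    """Backward-warp `image` with `flow`: output(x) = image(x + flow(x)),
    sampled nearest-neighbor with clamped borders."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

# A bright pixel shifted one column to the right maps back onto its
# original location when warped with the correct flow (u=1, v=0):
img2 = np.zeros((4, 4)); img2[1, 2] = 1.0
flow = np.zeros((4, 4, 2)); flow[..., 0] = 1.0
warped = warp(img2, flow)
print(warped[1, 1])  # 1.0: the second image is aligned with the first
```

With the images pre-aligned this way, the stacked network sees only the remaining error of the intermediate flow, which is what drives the large quality gains.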

2016

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy and Thomas Brox

Conference on Computer Vision and Pattern Recognition (CVPR) 2016

Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluation of scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.

2015

FlowNet: Learning Optical Flow with Convolutional Networks

Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers and Thomas Brox

Int. Conference on Computer Vision (ICCV) 2015

Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks CNNs succeeded at. In this paper we construct CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a large synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.
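The correlation layer mentioned above compares feature vectors at displaced image locations, producing one matching-cost map per displacement. A minimal numpy sketch of the idea (names and the displacement convention are illustrative):

```python
import numpy as np

def correlation(f1, f2, max_disp=1):
    """Correlation layer sketch: dot products between each feature vector in
    f1 (HxWxC) and spatially displaced feature vectors in f2."""
    out = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            shifted = np.roll(f2, (dy, dx), axis=(0, 1))  # displace f2
            out.append(np.sum(f1 * shifted, axis=2))      # per-pixel dot product
    return np.stack(out, axis=2)  # one correlation map per displacement

f1 = np.random.default_rng(2).random((5, 5, 3))
vol = correlation(f1, f1)
print(vol.shape)  # (5, 5, 9): a 3x3 displacement neighborhood per pixel
```

The resulting cost volume gives the network explicit matching evidence, which is the key difference between the two compared architectures.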

2014

Reconstruction of Rigid Body Models from Motion Distorted Laser Range Data Using Optical Flow

Eddy Ilg, Rainer Kuemmerle, Wolfram Burgard and Thomas Brox

Int. Conference on Robotics and Automation (ICRA) 2014

The setup of tilting a 2D laser range finder up and down is a widespread strategy to acquire 3D point clouds. This setup requires that the scene be static while the robot takes a 3D scan. If an object moves through the scene during the measurement process and these movements are not taken into account, the resulting model will be distorted. This paper presents an approach to reconstruct the 3D model of a moving rigid object from the inconsistent set of 2D measurements with the help of a camera. Our approach utilizes optical flow in the camera images to estimate the motion in the image plane, and point-line constraints to compensate for the missing information about the motion in depth. We combine multiple sweeps and/or views into a single consistent model using a point-to-plane ICP approach and optimize single sweeps by smoothing the resulting trajectory. Experiments obtained in real outdoor scenarios with moving cars demonstrate that our approach yields accurate models.

Contact

E-Mail:
Phone: available in June
Address: available in June
Google Maps

Imprint

Please contact for questions. The author is not responsible for the content of external pages.