Event-based Fusion for Motion Deblurring with Cross-modal Attention

ECCV'2022 Oral

Lei Sun1,2, Christos Sakaridis2, Jingyun Liang2, Qi Jiang1, Kailun Yang3, Peng Sun1, Yaozu Ye1, Kaiwei Wang1, Luc Van Gool2
1Zhejiang University, 2ETH Zurich, 3KIT, 4KU Leuven

EFNet restores a blurry image using high-temporal-resolution events.

Abstract

Traditional frame-based cameras inevitably suffer from motion blur due to long exposure times. As a kind of bio-inspired camera, the event camera records the intensity changes in an asynchronous way with high temporal resolution, providing valid image degradation information within the exposure time.

In this work, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network. To effectively fuse event and image features, we design an event-image cross-modal attention module applied at multiple levels of our network, which allows the network to focus on relevant features from the event branch and filter out noise.

We also introduce a novel symmetric cumulative event representation specifically for image deblurring, as well as an event mask gated connection between the two stages of our network, which helps avoid information loss. At the dataset level, to foster event-based motion deblurring and to facilitate evaluation on challenging real-world images, we introduce the Real Event Blur (REBlur) dataset, captured with an event camera in an illumination-controlled optical laboratory.

Our Event Fusion Network (EFNet) sets the new state of the art in motion deblurring, surpassing both the prior best-performing image-based method and all event-based methods with public implementations on the GoPro dataset (by up to 2.47 dB) and on our REBlur dataset, even in extremely blurry conditions.

Modules


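EICA (event-image cross-modal attention) uses a multi-head attention mechanism to fuse event and image features. We transpose the feature maps so that attention is computed across channels rather than across pixels, which reduces the computational complexity from O(h²w²) to O(c²) for high-resolution event-based image deblurring.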
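The channel-wise ("transposed") attention idea can be illustrated with a minimal NumPy sketch. It is not the EFNet implementation: the function name `channel_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the choice of taking queries from the image branch and keys/values from the event branch, and the scaling factor are all assumptions made for illustration. The point is only that the attention map has size (c/heads) × (c/heads) per head, independent of the image resolution.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(img_feat, evt_feat, w_q, w_k, w_v, num_heads=2):
    """Hypothetical cross-modal attention computed over channels.

    img_feat, evt_feat: arrays of shape (c, h, w).
    w_q, w_k, w_v: (c, c) matrices standing in for learned 1x1 projections.
    Returns the fused features (c, h, w) and the attention maps
    of shape (num_heads, c // num_heads, c // num_heads).
    """
    c, h, w = img_feat.shape
    assert evt_feat.shape == (c, h, w) and c % num_heads == 0
    d = c // num_heads
    img = img_feat.reshape(c, h * w)
    evt = evt_feat.reshape(c, h * w)

    # Assumption: queries come from the image, keys and values from the events.
    q = (w_q @ img).reshape(num_heads, d, h * w)
    k = (w_k @ evt).reshape(num_heads, d, h * w)
    v = (w_v @ evt).reshape(num_heads, d, h * w)

    # Transposed attention: (d x hw) @ (hw x d) gives a d x d map per head,
    # instead of an hw x hw map as in spatial attention.
    scale = 1.0 / np.sqrt(h * w)  # illustrative scaling, not from the source
    attn = softmax(q @ k.transpose(0, 2, 1) * scale, axis=-1)
    out = attn @ v  # (num_heads, d, hw)
    return out.reshape(c, h, w), attn
```

For an 8-channel feature map of size 16 × 16 with two heads, each attention map in this sketch is 4 × 4, whereas spatial attention over the same input would need a 256 × 256 map.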

EMGC (event mask gated connection) selectively connects certain regions of the feature maps of the first stage of our network to the second stage. This is inspired by the observation that regions in which events occur are more severely degraded in the blurry image. We binarize the events and use the resulting "event mask" to guide the restoration.
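A rough NumPy sketch of the masking idea follows. The helper names `event_mask` and `mask_gated_connection`, the binarization rule (a pixel is active if any event fired there), and the additive gating are assumptions chosen for illustration; the exact wiring used in EFNet may differ.

```python
import numpy as np

def event_mask(event_tensor):
    """Binarize an event tensor of shape (bins, h, w) into an (h, w) mask.

    Assumption: a pixel is marked 1 if any event of either polarity
    occurred there during the exposure, and 0 otherwise.
    """
    return (np.abs(event_tensor).sum(axis=0) > 0).astype(np.float32)

def mask_gated_connection(stage1_feat, stage2_feat, mask):
    """Pass stage-1 features to stage 2 only where the mask is active.

    stage1_feat, stage2_feat: arrays of shape (c, h, w).
    mask: array of shape (h, w) with values in {0, 1}.
    Assumption: the gated features are simply added to the stage-2 features.
    """
    return stage2_feat + stage1_feat * mask[None, :, :]
```

Taking the absolute value before summing keeps a pixel active even when positive and negative events at that pixel would otherwise cancel.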

Event Representation


The Symmetric Cumulative Event Representation (SCER) feeds the asynchronous events to our network and encodes information about blur in the image by accumulating events at symmetric endpoints.
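One possible reading of this accumulation scheme is sketched below in NumPy. The function name `scer`, its parameters, the channel ordering, and the sign convention are assumptions for illustration and are not taken from the EFNet code. The sketch splits the exposure into `2 * n_half` equal intervals and, for each k, sums event polarities over windows that grow from the middle of the exposure towards both ends.

```python
import numpy as np

def scer(t, x, y, p, t_start, t_end, height, width, n_half=3):
    """Hypothetical symmetric cumulative event representation.

    t, x, y, p: 1-D arrays of event timestamps, pixel coordinates and
    polarities (+1 / -1). Returns an array of shape (2 * n_half, height, width).
    Channels are ordered from the widest left window to the widest right window.
    """
    t = np.asarray(t, dtype=np.float64)
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    p = np.asarray(p, dtype=np.float32)

    t_mid = 0.5 * (t_start + t_end)
    delta = (t_end - t_start) / (2 * n_half)
    out = np.zeros((2 * n_half, height, width), dtype=np.float32)

    for k in range(1, n_half + 1):
        # Window growing backwards in time from the middle of the exposure.
        left = (t >= t_mid - k * delta) & (t < t_mid)
        # Window growing forwards in time from the middle of the exposure.
        right = (t >= t_mid) & (t <= t_mid + k * delta)
        # np.add.at accumulates correctly when several events hit one pixel.
        np.add.at(out[n_half - k], (y[left], x[left]), p[left])
        np.add.at(out[n_half + k - 1], (y[right], x[right]), p[right])
    return out
```

With `n_half=2` and an exposure of [0, 1], the four channels in this sketch cover the windows [0, 0.5), [0.25, 0.5), [0.5, 0.75] and [0.5, 1], so the outermost channels summarize all events on each side of the mid-exposure time.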

Qualitative Results

GoPro

Columns: blurry image, each channel in SCER, result.

REBlur

Columns: blurry image, each channel in SCER, result.

BibTeX

@inproceedings{sun2022event,
      author = {Sun, Lei and Sakaridis, Christos and Liang, Jingyun and Jiang, Qi and Yang, Kailun and Sun, Peng and Ye, Yaozu and Wang, Kaiwei and Van Gool, Luc},
      title = {Event-Based Fusion for Motion Deblurring with Cross-modal Attention},
      booktitle = {European Conference on Computer Vision (ECCV)},
      year = {2022}
}