Laneformer: Object-Aware Row-Column Transformers for Lane Detection

Jianhua Han; Xiajun Deng; Xinyue Cai; Zhen Yang; Hang Xu; Chunjing Xu; Xiaodan Liang

doi:10.1609/aaai.v36i1.19961

Authors

Jianhua Han Huawei Noah's Ark Lab
Xiajun Deng Sun Yat-sen University
Xinyue Cai Huawei Noah's Ark Lab
Zhen Yang Huawei Noah's Ark Lab
Hang Xu Huawei Noah's Ark Lab
Chunjing Xu Huawei Noah's Ark Lab
Xiaodan Liang Sun Yat-sen University

DOI:

https://doi.org/10.1609/aaai.v36i1.19961

Keywords:

Computer Vision (CV)

Abstract

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic in visual perception for autonomous driving. The dominant paradigms rely on purely CNN-based architectures, which often fail to incorporate the relations among long-range lane points and the global context induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by recent advances of the transformer encoder-decoder architecture in various vision tasks, we design a new end-to-end Laneformer architecture that adapts conventional transformers to better capture the shape and semantic characteristics of lanes, with minimal latency overhead. First, coupled with deformable pixel-wise self-attention in the encoder, Laneformer introduces two new self-attention operations, row and column self-attention, to efficiently mine point context along the lane shapes. Second, motivated by the observation that surrounding objects affect the prediction of lane segments, Laneformer further includes detected object instances as extra inputs to the multi-head attention blocks in the encoder and decoder, facilitating lane point detection by sensing semantic context. Specifically, the bounding box locations of objects are added to the Key module to provide interaction with each pixel and query, while the ROI-aligned features are inserted into the Value module. Extensive experiments demonstrate that Laneformer achieves state-of-the-art performance on the CULane benchmark, with an F1 score of 77.1%. We hope our simple and effective Laneformer will serve as a strong baseline for future research on self-attention models for lane detection.
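
The two mechanisms named in the abstract, row/column self-attention and object tokens entering the Key and Value inputs, can be illustrated with a small sketch. This is not the authors' implementation. It assumes identity query/key/value projections instead of learned ones, a single head, no positional encoding, and no deformable pixel-wise attention. The function names (`attend`, `row_attention`, `column_attention`) and the arguments `obj_keys` and `obj_values` are hypothetical; `obj_keys` stands in for bounding box locations that have already been embedded to the channel width, and `obj_values` for ROI-aligned features.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention over plain lists of vectors.
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def row_attention(fmap, obj_keys=(), obj_values=()):
    # Pixels in the same row attend to one another. Assumption: object
    # tokens are simply appended to the keys and values of every row.
    return [attend(row, list(row) + list(obj_keys), list(row) + list(obj_values))
            for row in fmap]

def column_attention(fmap, obj_keys=(), obj_values=()):
    # Same idea along columns: gather each column, attend, transpose back.
    height, width = len(fmap), len(fmap[0])
    cols = [[fmap[r][c] for r in range(height)] for c in range(width)]
    attended = [attend(col, col + list(obj_keys), col + list(obj_values))
                for col in cols]
    return [[attended[c][r] for c in range(width)] for r in range(height)]

# Toy 2 x 3 feature map with 2 channels and one hypothetical object token.
fmap = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]]
obj_keys = [[0.2, 0.8]]    # stand-in for an embedded bounding box location
obj_values = [[0.3, 0.7]]  # stand-in for an ROI-aligned feature
rows = row_attention(fmap, obj_keys, obj_values)
cols = column_attention(fmap, obj_keys, obj_values)
```

Restricting each pixel to its own row or column means the attention cost per pixel grows with W or H rather than with H x W, which is one plausible reading of how the row and column operations keep the overhead low. How the two outputs are combined with the pixel-wise attention is not stated in the abstract and is left out here.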
