RAP-SAM: Towards Real-Time All-Purpose Segment Anything

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Advanced by transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in realtime, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus, this work explores a new real-time segmentation setting, named all-purpose segmentation in real-time, to transfer VFMs in real-time deployment. It contains three different tasks, including interactive segmentation, panoptic segmentation, and video segmentation. We aim to use one model to achieve the above tasks in real-time. We first benchmark several strong baselines. Then, we present Real-Time All Purpose SAM (RAP-SAM). It contains an efficient encoder and an efficient decoupled decoder to perform prompt-driven decoding. Moreover, we further explore different training strategies and tuning methods to boost co-training performance further. Our code and model are available at https://github.com/xushilin1/RAP-SAM/.

Related collections

Author and article information

Journal

Publication date Created: 18 January 2024

Article

ArXiV ID: 2401.10228

SO-VID: 230a1252-5fb5-433f-8f70-447f23fd728d

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

History

Custom metadata

Comments Project Page: https://xushilin1.github.io/rap_sam/

Categories cs.CV

ScienceOpen disciplines: Computer vision & Pattern recognition

Data availability:

ScienceOpen disciplines: Computer vision & Pattern recognition

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Read this article at

Abstract

Related collections

Recursive Rule based Visual Categorization

Author and article information

Journal

Article

History

Custom metadata

Comments

Comment on this article

Similar content 230