Updated on 2026.06.11
NeurIPS 2025
| Keyword | Title & Abstract | Authors | Links |
|---|---|---|---|
| sim2real | **EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation** We introduce EnerVerse, a generative robotics foundation model that constructs and interprets embodied spaces. EnerVerse employs a chunk-wise autoregressive video diffusion framework to predict future embodied spaces from instructions, enhanced by a sparse context memory for long-term reasoning. To model the 3D robotics world, we adopt a multi-view video representation, providing rich… | Guanghui Ren Team | OpenReview |
| sim2real | **Taming generative video models for zero-shot optical flow extraction** Extracting optical flow from videos remains a core computer vision problem. Motivated by the recent success of large general-purpose models, we ask whether frozen self-supervised video models trained only to predict future frames can be prompted, without fine-tuning, to output flow. Prior attempts to read out depth or illumination from video generators required fine-tuning; that strategy is… | Daniel LK Yamins Team | OpenReview |
| sim2real | **SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer** Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe, however difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain… | Andreas Krause Team | OpenReview |
| tactile | **Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies** Palpation, the use of touch in medical examination, is almost exclusively performed by humans. We investigate a proof of concept for an artificial palpation method based on self-supervised learning. Our key idea is that an encoder-decoder framework can learn a representation from a sequence of tactile measurements that contains all the relevant information about the palpated object. We… | Aviv Tamar Team | OpenReview |
| sim2real | **RFMPose: Generative Category-level Object Pose Estimation via Riemannian Flow Matching** We introduce RFMPose, a novel generative framework for category-level 6D object pose estimation that learns deterministic pose trajectories through Riemannian Flow Matching (RFM). Existing discriminative approaches struggle with multi-hypothesis predictions (e.g., symmetry ambiguities) and often require specialized network architectures. RFMPose advances this paradigm through three key… | Jiming Chen Team | OpenReview |
| sim2real | **DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer** Training Reinforcement Learning (RL) controllers in simulation offers cost-efficiency and safety advantages. However, the resultant policies often suffer significant performance degradation during real-world deployment due to the reality gap. Previous works like System Identification (Sys-Id) have attempted to bridge this discrepancy by improving simulator fidelity, but encounter challenges… | Chunlin Chen Team | OpenReview |
| sim2real | **URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model** Constructing accurate digital twins of articulated objects is essential for robotic simulation training and embodied AI world model building, yet historically requires painstaking manual modeling or multi-stage pipelines. In this work, we propose URDF-Anything, an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an… | Shanghang Zhang Team | OpenReview |
| tactile | **Enhancing Tactile-based Reinforcement Learning for Robotic Control** Achieving safe, reliable real-world robotic manipulation requires agents to evolve beyond vision and incorporate tactile sensing to overcome sensory deficits and reliance on idealised state information. Despite its potential, the efficacy of tactile sensing in reinforcement learning (RL) remains inconsistent. We address this by developing self-supervised learning (SSL) methodologies to more… | Sethu Vijayakumar Team | OpenReview |
| learnedcontrol | **KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills** Humanoid robots are promising to acquire various skills by imitating human behaviors. However, existing algorithms are only capable of tracking smooth, low-speed human motions, even with delicate reward and curriculum design. This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing through multi-steps motion… | Xuelong Li Team | OpenReview |
| learnedcontrol | **Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning** Humans exhibit diverse and expressive whole-body movements. However, attaining human-like whole-body coordination in humanoid robots remains challenging, as conventional approaches that mimic whole-body motions often neglect the distinct roles of upper and lower body. This oversight leads to computationally intensive policy learning and frequently causes robot instability and falls during… | Xuelong Li Team | OpenReview |
| tactile | **Universal Visuo-Tactile Video Understanding for Embodied Interaction** Tactile perception is essential for embodied agents to understand the physical attributes of objects that cannot be determined through visual inspection alone. While existing methods have made progress in visual and language modalities for physical understanding, they fail to effectively incorporate tactile information that provides crucial haptic feedback for real-world interaction. In this… | Wenbo Ding Team | OpenReview |
| tactile | **Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper** Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the critical role of tactile feedback in precise manipulation. We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse,… | Yunzhu Li Team | OpenReview |
| tactile | **Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain** Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent… | Aran Nayebi Team | OpenReview |
| learnedcontrol | **AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation** Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still face challenges in coordinating mobile base and manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error… | Shanghang Zhang Team | OpenReview |
| sim2real | **TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer** Illumination and texture rerendering are critical dimensions for world-to-world transfer, which is valuable for applications including sim2real and real2real visual data scaling up for embodied AI. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. Nevertheless, these models are predominantly… | Zhaoxiang Zhang Team | OpenReview |
| sim2real | **Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning** Multi-agent reinforcement learning (MARL), as a thriving field, explores how multiple agents independently make decisions in a shared dynamic environment. Due to environmental uncertainties, policies in MARL must remain robust to tackle the sim-to-real gap. We focus on robust two-player zero-sum Markov games (TZMGs) in offline settings, specifically on tabular robust TZMGs (RTZMGs). We propose a… | Xinyu Li Team | OpenReview |
| sim2real, learnedcontrol | **From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots** Achieving general agile whole-body control on humanoid robots remains a major challenge due to diverse motion demands and data conflicts. While existing frameworks excel in training single motion-specific policies, they struggle to generalize across highly varied behaviors due to conflicting control requirements and mismatched data distributions. In this work, we propose BumbleBee (BB), an… | Zongqing Lu Team | OpenReview |
| sim2real | **DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy** Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation… | Hao Dong Team | OpenReview |
| tactile | **Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation** Tactile sensing is crucial for achieving human-level robotic capabilities in manipulation tasks. As a promising solution, Vision-based Tactile Sensors (VBTSs) offer high spatial resolution and cost-effectiveness, but present unique challenges in robotics for their complex physical characteristics and visual signal processing requirements. The lack of efficient and accurate simulation tools for… | Siyuan Huang Team | OpenReview |
| tactile | **RAPID Hand: Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Embodied Intelligence** This paper addresses the scarcity of low-cost but high-dexterity platforms for collecting real-world multi-fingered robot manipulation data towards generalist robot autonomy. To achieve it, we propose the RAPID Hand, a co-optimized hardware and software platform where the compact 20-DoF hand, robust whole-hand perception, and high-DoF teleoperation interface are jointly designed. Specifically,… | Hui Cheng Team | OpenReview |