Updated on 2026.06.11

NeurIPS 2025

Keyword Title & Abstract Authors Links
sim2real EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
We introduce EnerVerse, a generative robotics foundation model that constructs and interprets embodied spaces. EnerVerse employs a chunk-wise autoregressive video diffusion framework to predict future embodied spaces from instructions, enhanced by a sparse context memory for long-term reasoning. To model the 3D robotics world, we adopt a multi-view video representation, providing rich…
Guanghui Ren Team OpenReview
sim2real Taming generative video models for zero-shot optical flow extraction
Extracting optical flow from videos remains a core computer vision problem. Motivated by the recent success of large general-purpose models, we ask whether frozen self-supervised video models trained only to predict future frames can be prompted, without fine-tuning, to output flow. Prior attempts to read out depth or illumination from video generators required fine-tuning; that strategy is…
Daniel LK Yamins Team OpenReview
sim2real SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe but difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain…
Andreas Krause Team OpenReview
tactile Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies
Palpation, the use of touch in medical examination, is almost exclusively performed by humans. We investigate a proof of concept for an artificial palpation method based on self-supervised learning. Our key idea is that an encoder-decoder framework can learn a representation from a sequence of tactile measurements that contains all the relevant information about the palpated object. We…
Aviv Tamar Team OpenReview
sim2real RFMPose: Generative Category-level Object Pose Estimation via Riemannian Flow Matching
We introduce RFMPose, a novel generative framework for category-level 6D object pose estimation that learns deterministic pose trajectories through Riemannian Flow Matching (RFM). Existing discriminative approaches struggle with multi-hypothesis predictions (e.g., symmetry ambiguities) and often require specialized network architectures. RFMPose advances this paradigm through three key…
Jiming Chen Team OpenReview
sim2real DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
Training Reinforcement Learning (RL) controllers in simulation offers cost-efficiency and safety advantages. However, the resultant policies often suffer significant performance degradation during real-world deployment due to the reality gap. Previous works like System Identification (Sys-Id) have attempted to bridge this discrepancy by improving simulator fidelity, but encounter challenges…
Chunlin Chen Team OpenReview
sim2real URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model
Constructing accurate digital twins of articulated objects is essential for robotic simulation training and embodied AI world model building, yet historically requires painstaking manual modeling or multi-stage pipelines. In this work, we propose URDF-Anything, an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an…
Shanghang Zhang Team OpenReview
tactile Enhancing Tactile-based Reinforcement Learning for Robotic Control
Achieving safe, reliable real-world robotic manipulation requires agents to evolve beyond vision and incorporate tactile sensing to overcome sensory deficits and reliance on idealised state information. Despite its potential, the efficacy of tactile sensing in reinforcement learning (RL) remains inconsistent. We address this by developing self-supervised learning (SSL) methodologies to more…
Sethu Vijayakumar Team OpenReview
learnedcontrol KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
Humanoid robots show promise for acquiring various skills by imitating human behaviors. However, existing algorithms are only capable of tracking smooth, low-speed human motions, even with delicate reward and curriculum design. This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing through multi-step motion…
Xuelong Li Team OpenReview
learnedcontrol Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Humans exhibit diverse and expressive whole-body movements. However, attaining human-like whole-body coordination in humanoid robots remains challenging, as conventional approaches that mimic whole-body motions often neglect the distinct roles of upper and lower body. This oversight leads to computationally intensive policy learning and frequently causes robot instability and falls during…
Xuelong Li Team OpenReview
tactile Universal Visuo-Tactile Video Understanding for Embodied Interaction
Tactile perception is essential for embodied agents to understand the physical attributes of objects that cannot be determined through visual inspection alone. While existing methods have made progress in visual and language modalities for physical understanding, they fail to effectively incorporate tactile information that provides crucial haptic feedback for real-world interaction. In this…
Wenbo Ding Team OpenReview
tactile Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the critical role of tactile feedback in precise manipulation. We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse,…
Yunzhu Li Team OpenReview
tactile Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent…
Aran Nayebi Team OpenReview
learnedcontrol AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still face challenges in coordinating mobile base and manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error…
Shanghang Zhang Team OpenReview
sim2real TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
Illumination and texture re-rendering are critical dimensions for world-to-world transfer, which is valuable for applications including scaling up sim2real and real2real visual data for embodied AI. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. Nevertheless, these models are predominantly…
Zhaoxiang Zhang Team OpenReview
sim2real Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning
Multi-agent reinforcement learning (MARL), as a thriving field, explores how multiple agents independently make decisions in a shared dynamic environment. Due to environmental uncertainties, policies in MARL must remain robust to tackle the sim-to-real gap. We focus on robust two-player zero-sum Markov games (TZMGs) in offline settings, specifically on tabular robust TZMGs (RTZMGs). We propose a…
Xinyu Li Team OpenReview
sim2real learnedcontrol From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Achieving general agile whole-body control on humanoid robots remains a major challenge due to diverse motion demands and data conflicts. While existing frameworks excel in training single motion-specific policies, they struggle to generalize across highly varied behaviors due to conflicting control requirements and mismatched data distributions. In this work, we propose BumbleBee (BB), an…
Zongqing Lu Team OpenReview
sim2real DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation…
Hao Dong Team OpenReview
tactile Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Tactile sensing is crucial for achieving human-level robotic capabilities in manipulation tasks. As a promising solution, Vision-based Tactile Sensors (VBTSs) offer high spatial resolution and cost-effectiveness, but present unique challenges in robotics due to their complex physical characteristics and visual signal processing requirements. The lack of efficient and accurate simulation tools for…
Siyuan Huang Team OpenReview
tactile RAPID Hand: Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Embodied Intelligence
This paper addresses the scarcity of low-cost but high-dexterity platforms for collecting real-world multi-fingered robot manipulation data towards generalist robot autonomy. To achieve this, we propose the RAPID Hand, a co-optimized hardware and software platform where the compact 20-DoF hand, robust whole-hand perception, and high-DoF teleoperation interface are jointly designed. Specifically,…
Hui Cheng Team OpenReview