Updated on 2026.06.11
Dexterous
| Publish Date | Title & Abstract | Authors | Links | |
|---|---|---|---|---|
| 2026-06-09 | TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation<br>Dexterous Manipulation Tactile<br>Contact-rich manipulation requires robots to continuously perceive and regulate evolving physical interactions under dynamic contact transitions or complex surface geometries. Recent imitation learning methods improve contact-aware control by incorporating tactile or force feedback, but they rarely model the asymmetric spatiotemporal roles of global force and local tactile sensing. To address… | Wenchao Ding Team | ArXiv | |
| 2026-06-09 | JOIN: Anchor-Grasp-Conditioned Joining via Opposition, Inference, and Navigation for Bimanual Assistive Manipulation<br>Dexterous<br>Assistive mobility and manipulation platforms have received increasing attention as a means of restoring independence to individuals with disabilities. While effective for many basic activities of daily living (ADLs), a significant percentage of everyday tasks such as opening a jar, pouring a liquid, lifting a tray, or basic meal preparation, is fundamentally bimanual and remains out of reach for… | Taşkın Padır Team | ArXiv | |
| 2026-06-09 | A Resurgent Analytic Framework for Indicial Umbral Calculus via Mellin-Barnes and Borel-Laplace Theories<br>Dexterous<br>Indicial umbral calculus offers an effective operational framework for manipulating transcendental functions, yet its analytic foundations have long remained only partially understood. In this work, we provide a rigorous analytic realisation of the theory grounded in Mellin-Barnes integrals, Borel-Laplace summation, and resurgent analysis. By elevating umbral operators from formal algebraic… | Roberto Ricci | ArXiv | |
| 2026-06-09 | WorldOlympiad: Can Your World Model Survive a Triathlon?<br>Dexterous<br>We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or short-term temporal coherence, they provide limited insight into whether generated videos obey physical rules, preserve coherent 3D structure, and sustain… | Bohan Zhuang Team | ArXiv / Web | |
| 2026-06-09 | The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models<br>Dexterous<br>This study investigates cross-lingual distributional skew (the Shibboleth Effect) in frontier large language models (LLMs) subjected to sustained adversarial conditions. We develop a multi-agent geopolitical wargame, the Cerulean Sea Crisis, a synthetic maritime territorial dispute designed to mirror the structural dynamics of Eastern Mediterranean conflicts. Six frontier models (GPT-4o, Llama-4,… | Hakan Mehmetcik | ArXiv | |
| 2026-06-09 | Task Robustness via Re-Labelling Vision-Action Robot Data<br>Dexterous Manipulation VLA<br>The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces Task Robustness via Re-Labelling… | Glen Berseth Team | ArXiv / Web | |
| 2026-06-09 | Degeneracy and trajectory control of spin eigenmodes excited by fs-optical pulses in a nearly compensated ferrimagnet<br>Dexterous<br>We investigate optically excited spin dynamics in a uniaxial ferrimagnet near the magnetization compensation point under a magnetic field applied along the magnetic anisotropy axis. Experiment and numerical modeling reveal an unusual regime where the frequencies of two spin eigenmodes approach each other and become highly field sensitive. The modes, corresponding to opposite rotations of the Neel… | D. O. Ignatyeva Team | ArXiv | |
| 2026-06-09 | MV-Actor: Aligning Multi-View Semantics and Spatial Awareness for Bimanual Manipulation<br>Dexterous Manipulation<br>Robotic manipulation has been widely applied in industrial scenarios. Compared with single-arm manipulation, bimanual manipulation is equipped with multiple cameras to capture information from different viewpoints. However, existing multi-view policies encode each view independently or fuse view features shallowly, resulting in limited sharing semantic perception and unreliable spatial awareness…. | You Yang Team | ArXiv | |
| 2026-06-09 | A single-step lithography process for reconfigurable SiN photonics with TiN heaters and Al interconnects<br>Dexterous<br>Thermo-optic phase shifters are key building blocks in Silicon and Silicon Nitride-based reconfigurable photonic integrated circuits. They enable manipulating the phase of an optical signal by means of electrically-driven heating of an optical waveguide. Conventional fabrication schemes typically require dedicated lithographic steps to separately define the resistive heaters, the current… | Mher Ghulinyan Team | ArXiv | |
| 2026-06-09 | LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination<br>Dexterous VLA<br>Vision-Language-Action (VLA) models achieve strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. This assumption often fails in realistic settings, where occlusion makes manipulation partially observable. In this paper, we study *scene-induced occlusion* as a fundamental challenge for VLA models and introduce… | Zhongyu Wei Team | ArXiv | |
| 2026-06-09 | IMPACT: Learning Internal-Model Predictive Control for Forceful Robotic Manipulation<br>Dexterous Manipulation Tactile<br>Real-world robotic manipulation tasks often involve forceful interactions with the environment, such as using tools of varying weights, transporting objects with different masses, and performing contact-rich tasks like table wiping. Previous learning-based approaches typically employ imitation learning policies that output target end-effector poses tracked by low-level impedance controllers. In… | Yilun Du Team | ArXiv / Web | |
| 2026-06-09 | Overview of ESDD2: Environment-Aware Speech and Sound Deepfake Detection Challenge<br>Dexterous<br>The Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2), held in conjunction with ICME 2026, evaluated systems for five component-level audio spoofing detection, where speech and environmental sounds may be manipulated independently or jointly. After the challenge concludes, we analyze the final leaderboard and summarize effective design choices from the top-performing… | Ming Li Team | ArXiv | |
| 2026-06-09 | Hand-centric Human-to-Robot Trajectory Transfer from Video Demonstrations via Open-World Contact Localization<br>Dexterous Manipulation<br>Learning from human video demonstrations remains challenging due to noisy hand-object interactions, unseen objects with partial observation, and cross-embodiment discrepancy. To address these challenges, we present *HOWTransfer* (Hand-Object Open-World Transfer), a hand-centric framework that distills human demonstrations into contact-aware, taxonomy-informed,… | Rania Rayyes Team | ArXiv | |
| 2026-06-09 | UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data<br>Dexterous<br>Dexterous hands are essential for fine-grained manipulation, but their hardware designs vary substantially across embodiments. Differences in kinematics, joint definitions, and degrees of freedom make it difficult to define a shared state representation compared with parallel grippers. As a result, dexterous-hand data remains fragmented and difficult to use for joint training. In this work, we… | Yu-Gang Jiang Team | ArXiv | |
| 2026-06-09 | ManiSplat: Manipulation Trajectory Synthesis from Monocular Video via Decoupled 3D Gaussian Splatting<br>Dexterous<br>Reconstructing dynamic and interactive 3D scenes from real-world observations remains a fundamental challenge in computer vision and robotics. While recent advances in 3D Gaussian Splatting have enabled high-fidelity static reconstruction, extending it to interactive environments with articulated robots and manipulable objects remains difficult due to complex contact interactions and abrupt pose… | Gaoang Wang Team | ArXiv | |
| 2026-06-09 | snaproot: Decentralized File Integrity Verification Using Blockchain-Anchored Cryptographic Hashing<br>Dexterous<br>The rapid growth of digital content has made reliable integrity verification increasingly important. Existing solutions rely either on centralized authorities, which introduce trust dependencies and single points of failure, or on decentralized storage systems that incur prohibitive resource overhead. In this paper, we present snaproot, a lightweight system that implements the hash-anchoring… | Tarkan Yavas Team | ArXiv | |
| 2026-06-09 | Dexterous Point Policy: Learning Point-based Dexterous Hand Policies from Human Demonstrations<br>Dexterous Manipulation VLA<br>Robotic foundation models pre-trained on human demonstration videos have shown promise, but a significant embodiment gap remains when the resulting policies are deployed on real robots. A common remedy is to fine-tune these models on robot-specific demonstrations. However, robot data collection can be prohibitively expensive and time-consuming, which is particularly acute in dexterous… | Jinwoo Shin Team | ArXiv | |
| 2026-06-09 | Anomalous mobility edges and extended-localized transition in a quasiperiodic emitter-cavity array<br>Dexterous<br>The manipulation of localization in quasiperiodic systems by mobility edges or localization transition holds significant physical importance. In this letter, we demonstrated that the dissipation can induce the emergence of anomalous mobility edges and extended-localized transition in emitter-cavity arrays controlled by quasiperiodic potentials. Specifically, we observe that the localization… | X. X. Yi Team | ArXiv | |
| 2026-06-09 | VeriSpace: Spatially Grounded Action Verification for Vision-Language-Action Models<br>Dexterous Manipulation VLA<br>Vision-language-action (VLA) models have shown strong promise for robotic manipulation, but their reliability at test time remains limited by one-shot action prediction, where even small action errors can cause grasp failure, collision, or incorrect task progression. A natural alternative is to equip VLA systems with test-time verification, allowing multiple candidate actions to be proposed and… | Jing Liu Team | ArXiv | |
| 2026-06-09 | UMI-Bench 1.0: An Open and Reproducible Real-World Benchmark for Tabletop Robotic Manipulation with UMI Data<br>Dexterous<br>Real-robot evaluation is essential for understanding whether learned manipulation policies can operate reliably outside curated demonstrations. This need is particularly pressing for Universal Manipulation Interface (UMI)-style policies, whose performance depends on the coupling between wrist-view observations, action representation, data collection, and physical deployment. Existing real-world… | Yan Ding Team | ArXiv | |
| 2026-06-08 | MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models<br>Dexterous Manipulation VLA<br>Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. However, most VLA models rely primarily on the current observation and therefore struggle with long-horizon, temporally dependent tasks. Cognitive science suggests that humans rely on working memory to buffer short-lived context, the hippocampal… | Gao Huang Team | ArXiv / Web | |
| 2026-06-08 | iMaC: Translating Actions into Motion and Contact Images for Embodied World Models<br>Dexterous<br>Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodied frameworks rely on low-dimensional structured action vectors (e.g., joint angles and end-effector poses), which suffer from limited expressive capacity, poor generalization across diverse embodiments, and unnatural dynamic modeling for… | Haibin Yan Team | ArXiv / Web | |
| 2026-06-08 | AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing<br>Dexterous Manipulation<br>World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch to model near-term frame variations that are redundant and weakly informative…. | Yao Mu Team | ArXiv / Web | |
| 2026-06-08 | SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps<br>Dexterous<br>Human hand-object interactions encode functional intent, but direct transfer to robotic hands often fails under morphology, contact, and reachability constraints. We present SynManDex, a synthetic pipeline that uses generated human pre-grasps as affordance-aware proposals and resolves the final contacts with robot-native optimization. SynManDex samples object-conditioned digital human pre-grasps,… | Yao Mu Team | ArXiv | |
| 2026-06-08 | AetheRock: An Arm-Worn Robot Teaching System for Force-Guided Vision-Tactile Learning<br>Dexterous Manipulation Tactile<br>Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges due to the incompatible assembly of tactile and force sensors in handheld or wearable devices. To address these limitations, we first introduce AetheRock for gripper-force, vision, and tactile data collection, which is an arm-worn device featuring a modular and… | Yong-Lu Li Team | ArXiv | |
| 2026-06-08 | Difference-Aware Retrieval Policies for Imitation Learning<br>Dexterous Manipulation<br>Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a… | Abhishek Gupta Team | ArXiv / Web | |
| 2026-06-08 | Human-Centred Risk Mitigation for AI-Mediated Information Manipulation: A SOCMINT Framework Based on Information Manipulation Sets<br>Dexterous<br>AI-mediated information manipulation increasingly takes the form of social cyber attacks that target trust, attention, credibility, reputation, and decision-making rather than only technical infrastructures or isolated false contents. Existing defensive approaches often oscillate between incident-level analysis, which fragments campaigns into weak signals, and attribution-first analysis, which… | Antonio Scala | ArXiv | |
| 2026-06-08 | Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models<br>Dexterous Manipulation VLA<br>Vision-Language-Action (VLA) models have demonstrated impressive end-to-end performance across a variety of robotic manipulation tasks. However, these policies offer no guarantees against collisions with task-irrelevant objects in the scene. Existing safety filters sidestep this problem by querying a vision-language model (VLM) to identify obstacles and their locations. This, however, is too slow… | Nader Sehatbakhsh Team | ArXiv | |
| 2026-06-08 | ProbeAct: Probe-Guided Training-Free Failure Recovery in Vision-Language-Action Models<br>Dexterous Manipulation VLA<br>Vision-Language-Action (VLA) models demonstrate strong performance on language-conditioned robotic manipulation within their training distribution, yet their generalization capabilities remain fundamentally limited. They lack the robustness required to handle perturbations, frequently failing when confronted with lighting changes, altered camera viewpoints, or small initial-state… | Nader Sehatbakhsh Team | ArXiv | |
| 2026-06-08 | What Makes Synthetic Speech Sound Sarcastic? A Prosody-Controlled Perception Study<br>Dexterous<br>Prosody plays a central role in sarcasm perception, yet previous studies have relied on naturally produced speech that lacks fine-grained control over individual acoustic dimensions. As prosodic cues co-vary in natural data, isolating their independent contributions remains challenging. We introduce a controlled framework using neural text-to-speech (TTS) with prompt-based prosodic conditioning… | Matt Coler Team | ArXiv | |
| 2026-06-08 | BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling<br>Dexterous HF-Hot 🔥 HF#46<br>As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python scripts. Here, we introduce BrainSurgery, a tool for robust and reproducible… | Peter Schneider-Kamp Team | ArXiv | |
| 2026-06-08 | What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks<br>Dexterous<br>Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content. We show that this discrepancy creates a fundamental perceptual mismatch: content that is readily recognized as harmful by… | Yuan Hong Team | ArXiv | |
| 2026-06-08 | Physics-Aware Sparse Learning and Selective Online Adaptation for Euler-Lagrange Robot Dynamics<br>Dexterous<br>Accurate dynamics models are essential for model-based robotic control, yet nominal Euler–Lagrange models often become inaccurate in the presence of payload variation, unmodeled coupling, friction, aerodynamic effects, and changing operating conditions. Most learning-based correction methods improve prediction accuracy by introducing a single additive residual, but do not preserve the internal… | Wei Pan Team | ArXiv | |
| 2026-06-08 | ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies<br>Dexterous Manipulation VLA Sim2Real<br>Vision-language-action (VLA) policies provide strong priors for language-conditioned manipulation, but remain brittle in off-nominal states requiring targeted recovery. We propose ReCoVLA – a failure-conditioned residual recovery framework that keeps a pretrained VLA policy frozen, uses an external vision-language model (VLM) to infer the failure mode and recovery stage, and compiles a… | Toshiaki Koike-Akino Team | ArXiv | |
| 2026-06-08 | DexPIE: Stable Dexterous Policy Improvement from Real-World Experience<br>Dexterous Manipulation<br>Dexterous manipulation presents substantial challenges for imitation learning due to its high-dimensional action space and complex contact-rich dynamics. Policies trained purely from demonstrations often suffer from compounding errors during deployment and require large amounts of expert data to achieve reliable performance. To move beyond the limitations of demonstration data, in this work, we… | Yaonan Wang Team | ArXiv / Web | |
| 2026-06-08 | I Was Scrolling and Then I Saw a Pregnant Strawberry<br>Dexterous<br>AI minidramas (also known as fruit dramas) are short, algorithmically distributed generative AI video series featuring anthropomorphized characters that have recently emerged as a widespread phenomenon on social media platforms. This paper argues that despite their seemingly innocuous aesthetic, these videos reproduce deeply gendered narrative structures in which female characters are… | Piera Riccio | ArXiv | |
| 2026-06-08 | CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control<br>Dexterous Manipulation VLA<br>Vision-language-action models have shown strong promise for robot manipulation, yet raw language is primarily needed to specify task intent rather than to be repeatedly processed during high-frequency low-level execution. Motivated by this separation, we propose a cerebello-thalamic-inspired vision-action model (CT-VAM) for efficient task-conditioned visuomotor control. CT-VAM acts as a compact… | Jiahu Qin Team | ArXiv | |
| 2026-06-08 | ContextShift: A Controlled Benchmark for Context Dependence in Object Detection<br>Dexterous<br>Modern object detectors achieve strong performance on standard benchmarks, yet their robustness to contextual variation remains insufficiently understood. Prior evaluations largely rely on aggregate metrics such as AP on uncontrolled distribution shifts, which can obscure how performance degrades under context change. We introduce ContextShift, a controlled benchmark that systematically… | Ohad Ben-Shahar Team | ArXiv | |
| 2026-06-08 | $ω$-EVA: Envision, Verify, and Act with Latent Interactive World Models<br>Dexterous<br>Embodied policies typically map current observations directly to actions, leaving candidate-action consequences implicit. World models provide predictive supervision, representations, or external simulation, but rarely let a policy inspect the imagined consequence of its own proposal before acting. We introduce $ω$-EVA, a latent interactive world model that realizes an Envision–Verify–Act loop… | Alois Knoll Team | ArXiv | |
| 2026-06-08 | Dense Force Estimation with an Event-based Optical Tactile Sensor<br>Dexterous Tactile<br>Humans rely on spatially dense, geometry and force-aware tactile feedback at high temporal resolution for dexterous manipulation. While vision-based tactile sensors enable dense force estimation, they are limited by camera frame rates, motion blur, and data bandwidth. Event-based optical tactile sensors offer an attractive alternative with microsecond temporal resolution and low motion blur, but… | Valentina Cavinato Team | ArXiv | |
| 2026-06-05 | Agentopia: Long-Term Life Simulation and Learning in Agent Societies<br>Dexterous<br>Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural question: whether LLMs can learn from such simulated social experience to better understand and replicate human behavior. However, prior agent society simulations typically operate at the scale of days, limiting the depth of social interactions and long-term… | Yunzhe Tao Team | ArXiv | |
| 2026-06-05 | Affordance-Based Hierarchical Reinforcement Learning for Quadruped Pedipulation<br>Dexterous<br>The object manipulation capabilities of quadruped robots is an open research challenge. While previous studies have focused on low-level policy learning, task execution still relies on expert-designed high-level trajectories. Autonomous selection of both an affordable interaction point on the target object and an affordable robot base pose removes the need for pre-designed trajectories. This… | Cagri Kilic Team | ArXiv | |
| 2026-06-05 | Simulation-Driven Imitation Learning for Biosignals-Free Shared-Autonomy Prosthetic Grasping<br>Dexterous Manipulation Sim2Real<br>Biosignals-free shared-autonomy control of upper-limb prosthetic hands aims to enable natural and low-effort manipulation without relying on EMG or other physiological signals. Recent imitation-learning-based approaches have shown promising results, but their scalability is limited by the cost and variability of collecting large amounts of real-world human demonstration data. In this work, we… | Xianta Jiang Team | ArXiv | |
| 2026-06-05 | Spline Policy: A Structured Representation for Robot Policies<br>Dexterous Manipulation VLA<br>Modern imitation-learning policies for robot manipulation often represent actions as fixed-resolution action chunks, which are simple and effective but expose limited geometric and temporal structure before execution. This paper studies Spline Policy (SP), a structured representation that replaces action chunks with spline parameters while keeping the policy backbone unchanged. The predicted… | Sylvain Calinon Team | ArXiv | |
| 2026-06-05 | RhinoVLA Technical Report<br>Dexterous Manipulation VLA<br>Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection operators, computation grows linearly with the number of input tokens when model dimensions are fixed. Motivated by… | Yuxi Liu Team | ArXiv | |
| 2026-06-05 | Vacuum fluctuation induced quantum resource harvesting in triple-layer graphene<br>Dexterous<br>We examine the non-Markovian dynamics and the generation of quantum coherence and entanglement within a triple-layer graphene (TLG) system embedded in a planar microcavity. Using time-dependent perturbation theory, we derive an exact analytic solution for the system and demonstrate how the confined electromagnetic field mediates quantum correlations between the graphene layers. We employ three… | Rachid Ahl Laamara Team | ArXiv | |
| 2026-06-05 | CAPE: Contrastive Action-conditioned Parallel Encoding for Embodied Planning<br>Dexterous<br>Embodied agents need to predict the future consequences of candidate actions in order to plan effectively before execution. Existing visual dynamics models learn by reconstructing future visual states or rolling out dense latent representations, which spreads learning capacity across visually salient but planning-irrelevant content rather than the action-conditioned changes that drive… | Zhengping Che Team | ArXiv | |
| 2026-06-05 | When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations<br>Dexterous<br>Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt perturbations, both lexical and syntactic, posing serious risks in safety-critical clinical applications. In this study, we conduct a systematic sensitivity analysis… | Mahdi Alkaeed | ArXiv | |
| 2026-06-05 | Resonance-induced frequency splitting and evanescent modes at temporal interfaces in elastic metamaterials<br>Dexterous<br>Temporal interfaces, defined by abrupt changes in material properties, break temporal translational symmetry and enable wave phenomena fundamentally different from those at spatial interfaces. Unlike spatial scattering, temporal scattering preserves momentum rather than energy, leading to instantaneous frequency shifts governed by the dispersion relations on either side of the interface. Existing… | Gengkai Hu Team | ArXiv | |
| 2026-06-05 | Adversarial Creation and Detection of AI-Generated Social Bot Content<br>Dexterous<br>The convergence of large language models and social bots allows malicious actors to manipulate the information ecosystem by generating human-like content at scale. Existing models for detecting AI-generated content often fail in the wild, primarily due to the lack of ground-truth data. We address this gap through an adversarial methodology that models the impersonation of real social media users… | Filippo Menczer Team | ArXiv | |
| 2026-06-05 | Robotic Policy Adaptation via Weight-Space Meta-Learning<br>Dexterous Manipulation VLA HF-Hot<br>Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale. We… | Luca Franco Team | ArXiv | |
| 2026-06-05 | Coarse-to-Control: Action-Token Planning for Vision-Language-Action Models<br>Dexterous VLA<br>Most vision-language-action (VLA) models map observations directly to actions without explicit intermediate planning, which limits performance on long-horizon tasks where early mistakes compound. We propose Coarse-to-Control, a plan-execute VLA that introduces planning natively in the action-token space. The key idea is to let the policy first predict a compact sequence of coarse action tokens… | Yu-Gang Jiang Team | ArXiv | |
| 2026-06-05 | LARA: Latent Action Representation Alignment for Vision-Language-Action Models<br>Dexterous Manipulation VLA<br>Visual-language action (VLA) models enable robots to predict actions directly from observations and language instructions, but their performance depends on large-scale, high-quality data and is limited by the scarcity of real-world robot action datasets. To facilitate VLA model learning with abundant unlabeled human videos, Latent Action Models (LAM) learn latent action representations from… | Siyuan Huang Team | ArXiv | |
| 2026-06-05 | Detecting Temporally Localized Manipulations in Authentic Video Streams<br>Dexterous<br>The rapid advancement of video editing and generative artificial intelligence technologies has made realistic video manipulation increasingly accessible. Although existing datasets have significantly advanced research in deepfake detection, object removal, and video inpainting, they do not adequately model scenarios in which a short manipulated segment is inserted into an otherwise authentic… | Ibrahim Delibasoglu Team | ArXiv | |
| 2026-06-05 | Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning<br>Dexterous<br>World Action Models (WAMs) offer a promising approach to embodied intelligence, yet existing methods rely heavily on video prediction as action priors and lack adaptive multimodal reasoning, limiting their effectiveness on long-horizon, complex tasks. We observe that WAMs require different multimodal reasoning modes under different execution contexts: textual reasoning is essential during task… | Yong Li Team | ArXiv | |
| 2026-06-05 | A Multi-Operator Mixed-Reality Interface for Multi-Robot Control and Coordination: Co-Located and Private Workspace Collaboration<br>Dexterous Manipulation<br>Multi-operator control of robot teams requires not only access to the same mission information, but also mechanisms for maintaining shared awareness and preventing conflicting interventions. Building on our previous HORUS interface (Holistic Operational Reality for Unified Systems) we present a mixed-reality interface that extends single-operator multi-robot supervision to collaborative… | Carmine Tommaso Recchiuto Team | ArXiv | |
| 2026-06-05 | Task Editing for Generalizable 3D Visuomotor Policy Learning<br>Dexterous Manipulation<br>3D visuomotor policies offer a promising direction for complex robotic manipulation, as depth maps and point clouds provide rich geometric information for spatial reasoning. However, their success often depends on large-scale real-world demonstrations, which are costly and time-consuming to collect. To this end, existing methods commonly use demonstration generation strategies to improve data… | Wei-Shi Zheng Team | ArXiv | |
| 2026-06-05 | The Sound of Malware: A Memory Forensics Approach for Android Malware Analysis via Audio Signals<br>Dexterous<br>Android malware analysis is currently facing increasing challenges in achieving robust classification and detecting stealth attacks. Modern threats employ advanced evasion strategies such as code obfuscation, dynamic loading, packing, and even steganographic manipulation of traditional static and dynamic features. These techniques reduce the effectiveness of signature-based systems and degrade… | Giorgio Giacinto Team | ArXiv | |
| 2026-06-05 | GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios<br>Dexterous Manipulation<br>Generative policies provide expressive and multimodal action distributions, making them attractive for reinforcement learning (RL) in complex continuous-control tasks. Among them, flow-based policies are especially appealing because they generate actions through deterministic transport maps. However, applying such generative policies to likelihood-based on-policy learning remains limited by the… | Ye Shi Team | ArXiv | |
| 2026-06-05 | T-GMP: Terrain-conditioned Generative Motion Priors for Versatile and Natural Humanoid Locomotion DexterousAchieving both anthropomorphic naturalness and robust terrain traversal remains a fundamental challenge in humanoid locomotion. Existing Reinforcement Learning (RL) approaches typically rely on fixed motion priors, limiting their adaptability to varying environments. We propose Terrain-conditioned Generative Motion Priors (T-GMP), a module that captures a terrain-conditioned latent motion… |
Fenghua He Team | ArXiv | |
| 2026-06-04 | HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers Dexterous LearnedControlFor a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and… |
Aaron Ames Team | ArXiv | |
| 2026-06-04 | TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies Dexterous Manipulation VLARobot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from… |
Mingyu Ding Team | ArXiv | |
| 2026-06-04 | Superconducting triode effect in a quantum-dot Josephson junction with a biased top gate DexterousNon-reciprocal supercurrents enable non-dissipative rectification, holding great promise for superconducting electronics. Conventionally, this non-reciprocity, termed the superconducting diode effect, requires the simultaneous breaking of time-reversal and parity symmetries. Here, we propose a superconducting triode effect in an asymmetric quantum-dot Josephson junction coupled to an additional… |
X. C. Xie Team | ArXiv | |
| 2026-06-04 | CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments DexterousMulti-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents’ ability to coordinate through text-based channels much as human teams do. Yet recent study suggests that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground,… |
Bingsheng Yao Team | ArXiv | |
| 2026-06-04 | Robustness of Entanglement Manipulation for almost i.i.d. sources DexterousWe study the robustness of asymptotic entanglement manipulation beyond the exact i.i.d. regime, focusing on Mazzola–Sutter–Renner (MSR) almost i.i.d. sources, which allow a sublinear number of deviations from a tensor-power structure. For pure MSR sources along a bipartite reference state $\lvert φ\rangle_{AB}$, we prove that the entanglement concentration rate is robust: every rate below the entropy… |
Nilanjana Datta | ArXiv | |
| 2026-06-04 | HomeWorld: A Unified Floorplan-to-Furnished Framework for Generating Controllable, Densely Interactive Whole-Home Scenes DexterousIndoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-based generation challenging. Existing methods often rely on hand-crafted rules or focus on isolated sub-tasks (e.g., floorplan synthesis or single-room furnishing), producing whole-home scenes that lack global coherence, realism, and… |
Hongsheng Li Team | ArXiv | |
| 2026-06-04 | WebMCP Tool Surface Poisoning: Runtime Manipulation Attacks on LLM Agents DexterousWebMCP is a newly emerging protocol that enables websites to expose tools directly to AI agents, bypassing traditional user interfaces and introducing new security risks. The dynamic exposure of agent-accessible tools in WebMCP expands the attack surface of web sessions, especially when third-party scripts are involved. In this study, we identify a new potential threat, termed Mid-Session Tool… |
Kuo-Hui Yeh Team | ArXiv | |
| 2026-06-04 | A framework for low-overhead quantum fault tolerance via spacetime lifting DexterousFault-tolerant quantum computation is inherently a spacetime problem, requiring not merely good static quantum error-correcting codes but also low-overhead protocols for protecting and manipulating encoded quantum information over time. Fault complexes provide a homological framework for treating such protocols as single spacetime objects. In this work, we initiate the study of low-overhead fault… |
Zi-Wen Liu Team | ArXiv | |
| 2026-06-04 | VOLT: Vision and Language Trajectory Segmentation for Faster-than-Demonstration Policies Dexterous ManipulationHumans often take longer to demonstrate a task than a robot would need to execute it. Rather than learning to replicate the demonstration at the same pace, many industrial and practical applications require robots to perform tasks as quickly as possible. In this paper, we investigate several hypotheses for learning policies that operate faster-than-demonstrations. Our experiments show that the… |
Siddarth Jain Team | ArXiv | |
| 2026-06-04 | DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions DexterousGUI agents - vision-based models that control desktops, web browsers, and mobile devices through graphical user interfaces - promise to automate a wide range of digital tasks. While million-scale datasets have enabled substantial progress on click-grounding, drag grounding (e.g. drag-and-drop, swipe, highlight) data remains an order of magnitude smaller and current models fall short on complex… |
Ronan Riochet Team | ArXiv | |
| 2026-06-04 | Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness DexterousFactual sycophancy occurs when a language model abandons a correct, verifiable answer under social pressure. Because a flip occurs only when pressure toward a false answer exceeds the model’s neutral preference for the truth, flip rates conflate two mechanisms: the strength of that baseline preference (truth margin), and how far pressure shifts it (manipulation sensitivity). We decompose factual… |
Walter Daelemans Team | ArXiv | |
| 2026-06-04 | Synthetic Data Generation and Vision-based Wrinkle and Keypoint Detection for Bimanual Cloth Manipulation Dexterous ManipulationRobotic manipulation of textiles remains challenging because continuous deformation and self-occlusions hinder the robust visual perception required to estimate the cloth’s state. To address the lack of annotated real-world data, we developed a Blender-based synthetic pipeline exporting auto-annotated keypoints, and combined manually labeled renders with real-world data to train a wrinkle… |
Atal Anil Kumar Team | ArXiv | |
| 2026-06-04 | Multi-Resolution Tactile Imitation Learning for Contact-Rich Robotic Manipulation Dexterous Manipulation TactileTouch sensing is beneficial for solving a wide variety of manipulation tasks. While there exists a wide range of tactile sensors with different properties, exploiting the fusion of multiple heterogeneous tactile sensors to improve manipulation learning remains underexplored. We present Multi-Resolution Tactile Sensing (MiTaS), a representation framework that leverages multiple tactile sensors… |
Georgia Chalvatzaki Team | ArXiv | |
| 2026-06-04 | Robust Ensemble of Selectively Strengthened and Augmented Predictors DexterousEvasion attacks present a significant challenge to the robustness of machine learning (ML)-based classifiers, particularly in critical applications such as fraud detection and cybersecurity. Although existing defense mechanisms are effective in some settings, they often suffer from limited generalizability and do not systematically improve model robustness across diverse attack scenarios. To… |
Mehran Ebrahimi Team | ArXiv | |
| 2026-06-04 | TAM: Torque Adaptation Module for Robust Motion Transfer in Manipulation Dexterous Sim2RealA policy tuned for one robot often behaves differently on another, whether due to the sim-to-real gap, unknown payloads, or the differing dynamics of two instances of the same robot. In contact-rich, dynamic manipulation, even small motion discrepancies can result in failure to track reference motion, since they disrupt the timing and modes of contact. Common remedies, such as domain… |
Dieter Fox Team | ArXiv | |
| 2026-06-04 | ActiveMimic: Egocentric Video Pretraining with Active Perception DexterousEgocentric human video offers a scalable alternative to robot data for pretraining, yet models pretrained on such video consistently underperform those pretrained on robot data. We attribute this gap to a missing signal, the active perception behavior in egocentric videos, where humans continuously reposition their viewpoint during manipulation, inducing camera motion that standard pipelines… |
Yu-Gang Jiang Team | ArXiv / Web | |
| 2026-06-04 | Deep reinforcement learning with spatial and temporal awareness for active boundary control of buoyancy-driven convection DexterousDeep reinforcement learning (DRL) applied to thermal convection control consistently produces *degenerate actuation*: wall-temperature policies whose outputs are saturated, pseudo-random, or spatially incoherent. Two compounding deficiencies are responsible: multilayer-perceptron policies that discard spatial flow structure, and memoryless policies that cannot distinguish self-induced flow… |
Alfredo Pinelli Team | ArXiv | |
| 2026-06-04 | AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding Dexterous Manipulation VLAVision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise perception–action mappings. To address this challenge, we propose **AffordanceVLA**, a unified… |
Yingcong Chen Team | ArXiv / Web | |
| 2026-06-04 | RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing DexterousImage safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against user-style malicious image editing remains underexplored. Such behaviors are highly prevalent in daily scenarios but difficult to fully reproduce. To explore this vulnerability, we introduce RedEdit, a novel black-box red-teaming agent that… |
Li Liu Team | ArXiv | |
| 2026-06-04 | MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation Dexterous Manipulation LearnedControlWe present MotionDisco, a framework that discovers contact-rich, long-horizon humanoid loco-manipulation motions from scratch, without relying on teleoperation or motion retargeting from human demonstrations. This is challenging because the space of possible contact interactions grows combinatorially with the task horizon and the number of objects in the scene. MotionDisco enables rapid discovery… |
Majid Khadiv Team | ArXiv | |
| 2026-06-03 | GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors Dexterous Sim2Real LearnedControlScaling humanoid loco-manipulation requires robot-compatible demonstrations across diverse objects, whole-body motions, and scene geometries, but teleoperation and motion capture are difficult to scale because each collection depends on physical setups, instrumented actors, and robot operation. We present GRAIL, a digital generation pipeline that remains fully virtual until deployment: it… |
Ye Yuan Team | ArXiv / Web | |
| 2026-06-03 | X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation DexterousRigorous evaluation of learning-based robotic systems is an essential prerequisite for deployment. However, real-world test data is expensive to gather; moreover, in a typical iterative development context, data gathered from the latest policy is necessarily limited in scale. This motivates evaluation methodologies that make use of heterogeneous data sources, including simulation, historical… |
Marco Pavone Team | ArXiv | |
| 2026-06-03 | InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space DexterousLanguage-guided photo retouching aims to adjust color and tone while preserving geometry and texture. Recently, diffusion-based retouching shows a superior visual quality, but often struggles with both fidelity issues due to its generative nature and efficiency because of its iterative sampling process. In this work, we propose an efficient and fidelity-preserving retouching method using… |
Tianfan Xue Team | ArXiv | |
| 2026-06-03 | Non-obvious Manipulability in the Additively Separable Group Activity Selection Problem DexterousIn this work, we study the additively separable Group Activity Selection Problem (AS-GASP) in an imperfect information setting, where agents have private preferences over activities and weights over other agents. Our goal is to design mechanisms that assign agents to activities based on their declared preferences and weights, with the objective of maximizing social welfare while ensuring truthful… |
Giovanna Varricchio Team | ArXiv | |
| 2026-06-03 | Small-angle solution scattering: from fundamental theory to practical approximations DexterousSmall-angle scattering (SAS) is widely used in structural biology, soft matter, and colloidal science to probe molecular structures in solution. SAS rests on a single physical principle: wave interference from a distribution of scatterers, averaged over orientations. Yet the theoretical foundations of SAS are spread across the literature, often based on differing notation, definitions, and… |
Jochen S. Hub Team | ArXiv | |
| 2026-06-03 | Potential-Guided Flow Matching for Vision-Language-Action Policy Improvement Dexterous VLALarge vision-language-action (VLA) policies are increasingly trained as conditional generative models over action chunks. Yet deployment produces mixed-quality experience (successful demonstrations, partial completions, recoverable mistakes, and failures) that is difficult to use with standard imitation. Full behavior cloning (BC) imitates failures, filtered BC discards useful sub-trajectories, and… |
Gang Wang Team | ArXiv | |
| 2026-06-03 | CLIF: Cross-layer LEO-ISL Fingerprinting for Physical and Network Attack Detection in Dense LEO Constellations DexterousLow-Earth Orbit (LEO) mega-constellations such as Starlink by SpaceX and Kuiper by Amazon rely on optical Inter-Satellite Links (ISLs) for autonomous mesh routing to provide low-latency telecommunication, Internet of Things (IoT), and security services globally. As commercial operators and governments deploy increasingly dense constellations and form multi-operator peering coalitions, ISL… |
Biplab Sikdar Team | ArXiv | |
| 2026-06-03 | DIST-FL: Enhancing Security for TEE-based Aggregation in Federated Learning DexterousTrusted Execution Environments (TEEs)-aided federated learning protocols emerge as promising solutions to counter server-side adversaries and ensure the trustworthiness of the server. In this paper, we dissect existing protocols and demonstrate that server-side adversaries can still manipulate client selection and replay aggregation to compromise system robustness and privacy, by exploiting TEE… |
Yinqian Zhang Team | ArXiv | |
| 2026-06-03 | AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety DexterousAs AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark dataset of human-AI companion conversations annotated with fine-grained safety risk categories. The dataset contains 2,123 real-world Replika conversations collected… |
TengTeng Ma Team | ArXiv | |
| 2026-06-03 | M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking Dexterous Sim2Real LearnedControlBuilding a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot joint trajectories, whereas manipulation requires precise end-effector… |
Shengbo Eben Li Team | ArXiv | |
| 2026-06-03 | HapTile: A Haptic-Informed Vision-Tactile-Language-Action Dataset for Contact-Rich Imitation Learning Dexterous VLA TactileDespite the importance of tactile sensing for reliable manipulation, most existing Vision-Language-Action (VLA) datasets remain vision-only, and those that do incorporate tactile information typically lack the joint combination of task diversity, language conditioning, and action trajectories. Furthermore, existing teleoperation pipelines rarely provide haptic feedback to the operator, despite… |
Shan Luo Team | ArXiv | |
| 2026-06-03 | Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation? Dexterous HF-HotVideo generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window onto this question: if a model has… |
Mike Zheng Shou Team | ArXiv | |
| 2026-06-03 | Spatial Deformation Mechanism of Meta-Atom Coupling and Scaling DexterousMetasurfaces enable precise manipulation of light-matter interactions, and meta-atom coupling and scaling dominate their resonant properties and functional responses. Conventionally, coupled-mode theory (CMT), coupled dipole theory (CDT) and full-wave simulation are widely adopted to analyze such coupling effects. Nevertheless, CMT and CDT are essentially phenomenological theories. Although… |
Lei Liang Team | ArXiv | |
| 2026-06-03 | A model-free approach to control barrier functions for higher-order systems DexterousControl barrier functions (CBFs) are a widely applied modular tool to ensure safe operation of nonlinear dynamical control systems. However, for their construction accurate knowledge of the system dynamics is typically needed. This requirement was recently alleviated for relative-degree-one systems using techniques from prescribed performance control (PPC) or funnel control (FC). This article… |
Karl Worthmann Team | ArXiv | |
| 2026-06-03 | VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI data for VLA Training Dexterous VLAUniversal Manipulation Interface (UMI) enables scalable real-world robot data collection without hardware-specific teleoperation, yet leveraging UMI data to train large-scale Vision-Language-Action (VLA) models remains fundamentally challenging. We identify two critical mismatches: wrist-mounted fisheye views, with severe radial distortion and local gripper-centric perspectives, are… |
Xuelong Li Team | ArXiv | |
| 2026-06-03 | Hybrid Adversarial Defence for Natural Language Understanding Tasks DexterousLarge Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence framework that combines entropy-based models, designed to reduce hallucinations, with uncertainty-based models and geometric-based models, designed to reduce vulnerability… |
Stuart E. Middleton Team | ArXiv | |
| 2026-06-03 | Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization DexterousRecent advances in generative image editing have improved the realism and controllability of localized image manipulation, raising new challenges for image manipulation detection and localization (IMDL). However, existing IMDL benchmarks still have limitations in visual realism, manipulation diversity, and generator coverage, making it difficult to reflect recent trends in image manipulation. To… |
Jungong Han Team | ArXiv | |
| 2026-06-03 | Arbitrary manipulation of nuclear spins in hexagonal boron nitride DexterousDue to its localized nature and controllability, the negatively charged boron vacancy centers (V$_\text{B}^-$) in hexagonal boron nitride (hBN) are a promising spin platform for accessing its neighboring nuclei with potential for performing quantum computational tasks. However, the methods of utilizing and manipulating the nuclear spins are still lacking. In this work, we propose a protocol for… |
Mehdi Abdi Team | ArXiv | |
| 2026-06-03 | 3DThinkVLA: Endowing Vision-Language-Action Models with Latent 3D Priors via 3D-Thinking-Guided Co-training Dexterous VLAWe propose a 3D-thinking-guided co-training framework that enables vision-language-action (VLA) models to perform 3D spatial reasoning implicitly during action prediction. Our core insight is that 3D geometry perception and 3D spatial reasoning are distinct capabilities that can be disentangled and injected at different feature hierarchies. During training, three tightly coupled components work… |
Weihao Yuan Team | ArXiv | |
| 2026-06-03 | Input-to-State Stable Bundle Koopman Neural ODEs for Learning Controlled Dynamics under Environmental Constraints DexterousWe propose ISS-BKNO, a unified framework that integrates Koopman operator identification, Neural ordinary differential equations (ODEs), fiber bundle geometry, and input-to-state stability (ISS) certification. Unlike prior approaches that address stability, extrinsic inputs, or environmental constraints in isolation, the proposed framework simultaneously learns controlled nonlinear dynamics while… |
Lin Feng | ArXiv |