Dexterous

Updated on 2026.06.11

Publish Date	Title & Abstract	Authors	Links
2026-06-09	TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation `Dexterous` `Manipulation` `Tactile` Contact-rich manipulation requires robots to continuously perceive and regulate evolving physical interactions under dynamic contact transitions or complex surface geometries. Recent imitation learning methods improve contact-aware control by incorporating tactile or force feedback, but they rarely model the asymmetric spatiotemporal roles of global force and local tactile sensing. To address…	Wenchao Ding Team	ArXiv
2026-06-09	JOIN: Anchor-Grasp-Conditioned Joining via Opposition, Inference, and Navigation for Bimanual Assistive Manipulation `Dexterous` Assistive mobility and manipulation platforms have received increasing attention as a means of restoring independence to individuals with disabilities. While effective for many basic activities of daily living (ADLs), a significant percentage of everyday tasks such as opening a jar, pouring a liquid, lifting a tray, or basic meal preparation, is fundamentally bimanual and remains out of reach for…	Taşkın Padır Team	ArXiv
2026-06-09	A Resurgent Analytic Framework for Indicial Umbral Calculus via Mellin-Barnes and Borel-Laplace Theories `Dexterous` Indicial umbral calculus offers an effective operational framework for manipulating transcendental functions, yet its analytic foundations have long remained only partially understood. In this work, we provide a rigorous analytic realisation of the theory grounded in Mellin-Barnes integrals, Borel-Laplace summation, and resurgent analysis. By elevating umbral operators from formal algebraic…	Roberto Ricci	ArXiv
2026-06-09	WorldOlympiad: Can Your World Model Survive a Triathlon? `Dexterous` We introduce WorldOlympiad, a benchmark for diagnosing video-based world models across physical faithfulness, geometric consistency, and interaction fidelity. While existing benchmarks often focus on visual quality, semantic alignment, or short-term temporal coherence, they provide limited insight into whether generated videos obey physical rules, preserve coherent 3D structure, and sustain…	Bohan Zhuang Team	ArXiv / Web
2026-06-09	The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models `Dexterous` This study investigates cross-lingual distributional skew (the Shibboleth Effect) in frontier large language models (LLMs) subjected to sustained adversarial conditions. We develop a multi-agent geopolitical wargame, the Cerulean Sea Crisis, a synthetic maritime territorial dispute designed to mirror the structural dynamics of Eastern Mediterranean conflicts. Six frontier models (GPT-4o, Llama-4,…	Hakan Mehmetcik	ArXiv
2026-06-09	Task Robustness via Re-Labelling Vision-Action Robot Data `Dexterous` `Manipulation` `VLA` The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces Task Robustness via Re-Labelling…	Glen Berseth Team	ArXiv / Web
2026-06-09	Degeneracy and trajectory control of spin eigenmodes excited by fs-optical pulses in a nearly compensated ferrimagnet `Dexterous` We investigate optically excited spin dynamics in a uniaxial ferrimagnet near the magnetization compensation point under a magnetic field applied along the magnetic anisotropy axis. Experiment and numerical modeling reveal an unusual regime where the frequencies of two spin eigenmodes approach each other and become highly field sensitive. The modes, corresponding to opposite rotations of the Neel…	D. O. Ignatyeva Team	ArXiv
2026-06-09	MV-Actor: Aligning Multi-View Semantics and Spatial Awareness for Bimanual Manipulation `Dexterous` `Manipulation` Robotic manipulation has been widely applied in industrial scenarios. Compared with single-arm manipulation, bimanual manipulation is equipped with multiple cameras to capture information from different viewpoints. However, existing multi-view policies encode each view independently or fuse view features shallowly, resulting in limited sharing semantic perception and unreliable spatial awareness….	You Yang Team	ArXiv
2026-06-09	A single-step lithography process for reconfigurable SiN photonics with TiN heaters and Al interconnects `Dexterous` Thermo-optic phase shifters are key building blocks in Silicon and Silicon Nitride-based reconfigurable photonic integrated circuits. They enable manipulating the phase of an optical signal by means of electrically-driven heating of an optical waveguide. Conventional fabrication schemes typically require dedicated lithographic steps to separately define the resistive heaters, the current…	Mher Ghulinyan Team	ArXiv
2026-06-09	LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination `Dexterous` `VLA` Vision-Language-Action (VLA) models achieve strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. This assumption often fails in realistic settings, where occlusion makes manipulation partially observable. In this paper, we study scene-induced occlusion as a fundamental challenge for VLA models and introduce…	Zhongyu Wei Team	ArXiv
2026-06-09	IMPACT: Learning Internal-Model Predictive Control for Forceful Robotic Manipulation `Dexterous` `Manipulation` `Tactile` Real-world robotic manipulation tasks often involve forceful interactions with the environment, such as using tools of varying weights, transporting objects with different masses, and performing contact-rich tasks like table wiping. Previous learning-based approaches typically employ imitation learning policies that output target end-effector poses tracked by low-level impedance controllers. In…	Yilun Du Team	ArXiv / Web
2026-06-09	Overview of ESDD2: Environment-Aware Speech and Sound Deepfake Detection Challenge `Dexterous` The Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2), held in conjunction with ICME 2026, evaluated systems for five component-level audio spoofing detection, where speech and environmental sounds may be manipulated independently or jointly. After the challenge concludes, we analyze the final leaderboard and summarize effective design choices from the top-performing…	Ming Li Team	ArXiv
2026-06-09	Hand-centric Human-to-Robot Trajectory Transfer from Video Demonstrations via Open-World Contact Localization `Dexterous` `Manipulation` Learning from human video demonstrations remains challenging due to noisy hand-object interactions, unseen objects with partial observation, and cross-embodiment discrepancy. To address these challenges, we present HOWTransfer (Hand-Object Open-World Transfer), a hand-centric framework that distills human demonstrations into contact-aware, taxonomy-informed,…	Rania Rayyes Team	ArXiv
2026-06-09	UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data `Dexterous` Dexterous hands are essential for fine-grained manipulation, but their hardware designs vary substantially across embodiments. Differences in kinematics, joint definitions, and degrees of freedom make it difficult to define a shared state representation compared with parallel grippers. As a result, dexterous-hand data remains fragmented and difficult to use for joint training. In this work, we…	Yu-Gang Jiang Team	ArXiv
2026-06-09	ManiSplat: Manipulation Trajectory Synthesis from Monocular Video via Decoupled 3D Gaussian Splatting `Dexterous` Reconstructing dynamic and interactive 3D scenes from real-world observations remains a fundamental challenge in computer vision and robotics. While recent advances in 3D Gaussian Splatting have enabled high-fidelity static reconstruction, extending it to interactive environments with articulated robots and manipulable objects remains difficult due to complex contact interactions and abrupt pose…	Gaoang Wang Team	ArXiv
2026-06-09	snaproot: Decentralized File Integrity Verification Using Blockchain-Anchored Cryptographic Hashing `Dexterous` The rapid growth of digital content has made reliable integrity verification increasingly important. Existing solutions rely either on centralized authorities, which introduce trust dependencies and single points of failure, or on decentralized storage systems that incur prohibitive resource overhead. In this paper, we present snaproot, a lightweight system that implements the hash-anchoring…	Tarkan Yavas Team	ArXiv
2026-06-09	Dexterous Point Policy: Learning Point-based Dexterous Hand Policies from Human Demonstrations `Dexterous` `Manipulation` `VLA` Robotic foundation models pre-trained on human demonstration videos have shown promise, but a significant embodiment gap remains when the resulting policies are deployed on real robots. A common remedy is to fine-tune these models on robot-specific demonstrations. However, robot data collection can be prohibitively expensive and time-consuming, which is particularly acute in dexterous…	Jinwoo Shin Team	ArXiv
2026-06-09	Anomalous mobility edges and extended-localized transition in a quasiperiodic emitter-cavity array `Dexterous` The manipulation of localization in quasiperiodic systems by mobility edges or localization transition holds significant physical importance. In this letter, we demonstrated that the dissipation can induce the emergence of anomalous mobility edges and extended-localized transition in emitter-cavity arrays controlled by quasiperiodic potentials. Specifically, we observe that the localization…	X. X. Yi Team	ArXiv
2026-06-09	VeriSpace: Spatially Grounded Action Verification for Vision-Language-Action Models `Dexterous` `Manipulation` `VLA` Vision-language-action (VLA) models have shown strong promise for robotic manipulation, but their reliability at test time remains limited by one-shot action prediction, where even small action errors can cause grasp failure, collision, or incorrect task progression. A natural alternative is to equip VLA systems with test-time verification, allowing multiple candidate actions to be proposed and…	Jing Liu Team	ArXiv
2026-06-09	UMI-Bench 1.0: An Open and Reproducible Real-World Benchmark for Tabletop Robotic Manipulation with UMI Data `Dexterous` Real-robot evaluation is essential for understanding whether learned manipulation policies can operate reliably outside curated demonstrations. This need is particularly pressing for Universal Manipulation Interface (UMI)-style policies, whose performance depends on the coupling between wrist-view observations, action representation, data collection, and physical deployment. Existing real-world…	Yan Ding Team	ArXiv
2026-06-08	MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models `Dexterous` `Manipulation` `VLA` Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. However, most VLA models rely primarily on the current observation and therefore struggle with long-horizon, temporally dependent tasks. Cognitive science suggests that humans rely on working memory to buffer short-lived context, the hippocampal…	Gao Huang Team	ArXiv / Web
2026-06-08	iMaC: Translating Actions into Motion and Contact Images for Embodied World Models `Dexterous` Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodied frameworks rely on low-dimensional structured action vectors (e.g., joint angles and end-effector poses), which suffer from limited expressive capacity, poor generalization across diverse embodiments, and unnatural dynamic modeling for…	Haibin Yan Team	ArXiv / Web
2026-06-08	AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing `Dexterous` `Manipulation` World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch to model near-term frame variations that are redundant and weakly informative….	Yao Mu Team	ArXiv / Web
2026-06-08	SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps `Dexterous` Human hand-object interactions encode functional intent, but direct transfer to robotic hands often fails under morphology, contact, and reachability constraints. We present SynManDex, a synthetic pipeline that uses generated human pre-grasps as affordance-aware proposals and resolves the final contacts with robot-native optimization. SynManDex samples object-conditioned digital human pre-grasps,…	Yao Mu Team	ArXiv
2026-06-08	AetheRock: An Arm-Worn Robot Teaching System for Force-Guided Vision-Tactile Learning `Dexterous` `Manipulation` `Tactile` Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges due to the incompatible assembly of tactile and force sensors in handheld or wearable devices. To address these limitations, we first introduce AetheRock for gripper-force, vision, and tactile data collection, which is an arm-worn device featuring a modular and…	Yong-Lu Li Team	ArXiv
2026-06-08	Difference-Aware Retrieval Policies for Imitation Learning `Dexterous` `Manipulation` Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a…	Abhishek Gupta Team	ArXiv / Web
2026-06-08	Human-Centred Risk Mitigation for AI-Mediated Information Manipulation: A SOCMINT Framework Based on Information Manipulation Sets `Dexterous` AI-mediated information manipulation increasingly takes the form of social cyber attacks that target trust, attention, credibility, reputation, and decision-making rather than only technical infrastructures or isolated false contents. Existing defensive approaches often oscillate between incident-level analysis, which fragments campaigns into weak signals, and attribution-first analysis, which…	Antonio Scala	ArXiv
2026-06-08	Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models `Dexterous` `Manipulation` `VLA` Vision-Language-Action (VLA) models have demonstrated impressive end-to-end performance across a variety of robotic manipulation tasks. However, these policies offer no guarantees against collisions with task-irrelevant objects in the scene. Existing safety filters sidestep this problem by querying a vision-language model (VLM) to identify obstacles and their locations. This, however, is too slow…	Nader Sehatbakhsh Team	ArXiv
2026-06-08	ProbeAct: Probe-Guided Training-Free Failure Recovery in Vision-Language-Action Models `Dexterous` `Manipulation` `VLA` Vision-Language-Action (VLA) models demonstrate strong performance on language-conditioned robotic manipulation within their training distribution, yet their generalization capabilities remain fundamentally limited. They lack the robustness required to handle perturbations, frequently failing when confronted with lighting changes, altered camera viewpoints, or small initial-state…	Nader Sehatbakhsh Team	ArXiv
2026-06-08	What Makes Synthetic Speech Sound Sarcastic? A Prosody-Controlled Perception Study `Dexterous` Prosody plays a central role in sarcasm perception, yet previous studies have relied on naturally produced speech that lacks fine-grained control over individual acoustic dimensions. As prosodic cues co-vary in natural data, isolating their independent contributions remains challenging. We introduce a controlled framework using neural text-to-speech (TTS) with prompt-based prosodic conditioning…	Matt Coler Team	ArXiv
2026-06-08	BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling `Dexterous` `HF-Hot` 🔥 HF#46 As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python scripts. Here, we introduce BrainSurgery, a tool for robust and reproducible…	Peter Schneider-Kamp Team	ArXiv
2026-06-08	What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks `Dexterous` Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content. We show that this discrepancy creates a fundamental perceptual mismatch: content that is readily recognized as harmful by…	Yuan Hong Team	ArXiv
2026-06-08	Physics-Aware Sparse Learning and Selective Online Adaptation for Euler-Lagrange Robot Dynamics `Dexterous` Accurate dynamics models are essential for model-based robotic control, yet nominal Euler–Lagrange models often become inaccurate in the presence of payload variation, unmodeled coupling, friction, aerodynamic effects, and changing operating conditions. Most learning-based correction methods improve prediction accuracy by introducing a single additive residual, but do not preserve the internal…	Wei Pan Team	ArXiv
2026-06-08	ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies `Dexterous` `Manipulation` `VLA` `Sim2Real` Vision-language-action (VLA) policies provide strong priors for language-conditioned manipulation, but remain brittle in off-nominal states requiring targeted recovery. We propose ReCoVLA – a failure-conditioned residual recovery framework that keeps a pretrained VLA policy frozen, uses an external vision-language model (VLM) to infer the failure mode and recovery stage, and compiles a…	Toshiaki Koike-Akino Team	ArXiv
2026-06-08	DexPIE: Stable Dexterous Policy Improvement from Real-World Experience `Dexterous` `Manipulation` Dexterous manipulation presents substantial challenges for imitation learning due to its high-dimensional action space and complex contact-rich dynamics. Policies trained purely from demonstrations often suffer from compounding errors during deployment and require large amounts of expert data to achieve reliable performance. To move beyond the limitations of demonstration data, in this work, we…	Yaonan Wang Team	ArXiv / Web
2026-06-08	I Was Scrolling and Then I Saw a Pregnant Strawberry `Dexterous` AI minidramas (also known as fruit dramas) are short, algorithmically distributed generative AI video series featuring anthropomorphized characters that have recently emerged as a widespread phenomenon on social media platforms. This paper argues that despite their seemingly innocuous aesthetic, these videos reproduce deeply gendered narrative structures in which female characters are…	Piera Riccio	ArXiv
2026-06-08	CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control `Dexterous` `Manipulation` `VLA` Vision-language-action models have shown strong promise for robot manipulation, yet raw language is primarily needed to specify task intent rather than to be repeatedly processed during high-frequency low-level execution. Motivated by this separation, we propose a cerebello-thalamic-inspired vision-action model (CT-VAM) for efficient task-conditioned visuomotor control. CT-VAM acts as a compact…	Jiahu Qin Team	ArXiv
2026-06-08	ContextShift: A Controlled Benchmark for Context Dependence in Object Detection `Dexterous` Modern object detectors achieve strong performance on standard benchmarks, yet their robustness to contextual variation remains insufficiently understood. Prior evaluations largely rely on aggregate metrics such as AP on uncontrolled distribution shifts, which can obscure how performance degrades under context change. We introduce ContextShift, a controlled benchmark that systematically…	Ohad Ben-Shahar Team	ArXiv
2026-06-08	$ω$-EVA: Envision, Verify, and Act with Latent Interactive World Models `Dexterous` Embodied policies typically map current observations directly to actions, leaving candidate-action consequences implicit. World models provide predictive supervision, representations, or external simulation, but rarely let a policy inspect the imagined consequence of its own proposal before acting. We introduce $ω$-EVA, a latent interactive world model that realizes an Envision–Verify–Act loop…	Alois Knoll Team	ArXiv
2026-06-08	Dense Force Estimation with an Event-based Optical Tactile Sensor `Dexterous` `Tactile` Humans rely on spatially dense, geometry and force-aware tactile feedback at high temporal resolution for dexterous manipulation. While vision-based tactile sensors enable dense force estimation, they are limited by camera frame rates, motion blur, and data bandwidth. Event-based optical tactile sensors offer an attractive alternative with microsecond temporal resolution and low motion blur, but…	Valentina Cavinato Team	ArXiv
2026-06-05	Agentopia: Long-Term Life Simulation and Learning in Agent Societies `Dexterous` Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural question: whether LLMs can learn from such simulated social experience to better understand and replicate human behavior. However, prior agent society simulations typically operate at the scale of days, limiting the depth of social interactions and long-term…	Yunzhe Tao Team	ArXiv
2026-06-05	Affordance-Based Hierarchical Reinforcement Learning for Quadruped Pedipulation `Dexterous` The object manipulation capabilities of quadruped robots is an open research challenge. While previous studies have focused on low-level policy learning, task execution still relies on expert-designed high-level trajectories. Autonomous selection of both an affordable interaction point on the target object and an affordable robot base pose removes the need for pre-designed trajectories. This…	Cagri Kilic Team	ArXiv
2026-06-05	Simulation-Driven Imitation Learning for Biosignals-Free Shared-Autonomy Prosthetic Grasping `Dexterous` `Manipulation` `Sim2Real` Biosignals-free shared-autonomy control of upper-limb prosthetic hands aims to enable natural and low-effort manipulation without relying on EMG or other physiological signals. Recent imitation-learning-based approaches have shown promising results, but their scalability is limited by the cost and variability of collecting large amounts of real-world human demonstration data. In this work, we…	Xianta Jiang Team	ArXiv
2026-06-05	Spline Policy: A Structured Representation for Robot Policies `Dexterous` `Manipulation` `VLA` Modern imitation-learning policies for robot manipulation often represent actions as fixed-resolution action chunks, which are simple and effective but expose limited geometric and temporal structure before execution. This paper studies Spline Policy (SP), a structured representation that replaces action chunks with spline parameters while keeping the policy backbone unchanged. The predicted…	Sylvain Calinon Team	ArXiv
2026-06-05	RhinoVLA Technical Report `Dexterous` `Manipulation` `VLA` Vision-Language-Action (VLA) models have shown strong potential for robotic manipulation, but real-time deployment on edge hardware remains challenging. In this work, we identify VLM visual and context tokens as a major source of deployment latency: for GEMM-dominated projection operators, computation grows linearly with the number of input tokens when model dimensions are fixed. Motivated by…	Yuxi Liu Team	ArXiv
2026-06-05	Vacuum fluctuation induced quantum resource harvesting in triple-layer graphene `Dexterous` We examine the non-Markovian dynamics and the generation of quantum coherence and entanglement within a triple-layer graphene (TLG) system embedded in a planar microcavity. Using time-dependent perturbation theory, we derive an exact analytic solution for the system and demonstrate how the confined electromagnetic field mediates quantum correlations between the graphene layers. We employ three…	Rachid Ahl Laamara Team	ArXiv
2026-06-05	CAPE: Contrastive Action-conditioned Parallel Encoding for Embodied Planning `Dexterous` Embodied agents need to predict the future consequences of candidate actions in order to plan effectively before execution. Existing visual dynamics models learn by reconstructing future visual states or rolling out dense latent representations, which spreads learning capacity across visually salient but planning-irrelevant content rather than the action-conditioned changes that drive…	Zhengping Che Team	ArXiv
2026-06-05	When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations `Dexterous` Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt perturbations, both lexical and syntactic, posing serious risks in safety-critical clinical applications. In this study, we conduct a systematic sensitivity analysis…	Mahdi Alkaeed	ArXiv
2026-06-05	Resonance-induced frequency splitting and evanescent modes at temporal interfaces in elastic metamaterials `Dexterous` Temporal interfaces, defined by abrupt changes in material properties, break temporal translational symmetry and enable wave phenomena fundamentally different from those at spatial interfaces. Unlike spatial scattering, temporal scattering preserves momentum rather than energy, leading to instantaneous frequency shifts governed by the dispersion relations on either side of the interface. Existing…	Gengkai Hu Team	ArXiv
2026-06-05	Adversarial Creation and Detection of AI-Generated Social Bot Content `Dexterous` The convergence of large language models and social bots allows malicious actors to manipulate the information ecosystem by generating human-like content at scale. Existing models for detecting AI-generated content often fail in the wild, primarily due to the lack of ground-truth data. We address this gap through an adversarial methodology that models the impersonation of real social media users…	Filippo Menczer Team	ArXiv
2026-06-05	Robotic Policy Adaptation via Weight-Space Meta-Learning `Dexterous` `Manipulation` `VLA` `HF-Hot` Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale. We…	Luca Franco Team	ArXiv
2026-06-05	Coarse-to-Control: Action-Token Planning for Vision-Language-Action Models `Dexterous` `VLA` Most vision-language-action (VLA) models map observations directly to actions without explicit intermediate planning, which limits performance on long-horizon tasks where early mistakes compound. We propose Coarse-to-Control, a plan-execute VLA that introduces planning natively in the action-token space. The key idea is to let the policy first predict a compact sequence of coarse action tokens…	Yu-Gang Jiang Team	ArXiv
2026-06-05	LARA: Latent Action Representation Alignment for Vision-Language-Action Models `Dexterous` `Manipulation` `VLA` Visual-language action (VLA) models enable robots to predict actions directly from observations and language instructions, but their performance depends on large-scale, high-quality data and is limited by the scarcity of real-world robot action datasets. To facilitate VLA model learning with abundant unlabeled human videos, Latent Action Models (LAM) learn latent action representations from…	Siyuan Huang Team	ArXiv
2026-06-05	Detecting Temporally Localized Manipulations in Authentic Video Streams `Dexterous` The rapid advancement of video editing and generative artificial intelligence technologies has made realistic video manipulation increasingly accessible. Although existing datasets have significantly advanced research in deepfake detection, object removal, and video inpainting, they do not adequately model scenarios in which a short manipulated segment is inserted into an otherwise authentic…	Ibrahim Delibasoglu Team	ArXiv
2026-06-05	Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning `Dexterous` World Action Models (WAMs) offer a promising approach to embodied intelligence, yet existing methods rely heavily on video prediction as action priors and lack adaptive multimodal reasoning, limiting their effectiveness on long-horizon, complex tasks. We observe that WAMs require different multimodal reasoning modes under different execution contexts: textual reasoning is essential during task…	Yong Li Team	ArXiv
2026-06-05	A Multi-Operator Mixed-Reality Interface for Multi-Robot Control and Coordination: Co-Located and Private Workspace Collaboration `Dexterous` `Manipulation` Multi-operator control of robot teams requires not only access to the same mission information, but also mechanisms for maintaining shared awareness and preventing conflicting interventions. Building on our previous HORUS interface (Holistic Operational Reality for Unified Systems) we present a mixed-reality interface that extends single-operator multi-robot supervision to collaborative…	Carmine Tommaso Recchiuto Team	ArXiv
2026-06-05	Task Editing for Generalizable 3D Visuomotor Policy Learning `Dexterous` `Manipulation` 3D visuomotor policies offer a promising direction for complex robotic manipulation, as depth maps and point clouds provide rich geometric information for spatial reasoning. However, their success often depends on large-scale real-world demonstrations, which are costly and time-consuming to collect. To this end, existing methods commonly use demonstration generation strategies to improve data…	Wei-Shi Zheng Team	ArXiv
2026-06-05	The Sound of Malware: A Memory Forensics Approach for Android Malware Analysis via Audio Signals `Dexterous` Android malware analysis is currently facing increasing challenges in achieving robust classification and detecting stealth attacks. Modern threats employ advanced evasion strategies such as code obfuscation, dynamic loading, packing, and even steganographic manipulation of traditional static and dynamic features. These techniques reduce the effectiveness of signature-based systems and degrade…	Giorgio Giacinto Team	ArXiv
2026-06-05	GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios `Dexterous` `Manipulation` Generative policies provide expressive and multimodal action distributions, making them attractive for reinforcement learning (RL) in complex continuous-control tasks. Among them, flow-based policies are especially appealing because they generate actions through deterministic transport maps. However, applying such generative policies to likelihood-based on-policy learning remains limited by the…	Ye Shi Team	ArXiv
2026-06-05	T-GMP: Terrain-conditioned Generative Motion Priors for Versatile and Natural Humanoid Locomotion `Dexterous` Achieving both anthropomorphic naturalness and robust terrain traversal remains a fundamental challenge in humanoid locomotion. Existing Reinforcement Learning (RL) approaches typically rely on fixed motion priors, limiting their adaptability to varying environments. We propose Terrain-conditioned Generative Motion Priors (T-GMP), a module that captures a terrain-conditioned latent motion…	Fenghua He Team	ArXiv
2026-06-04	HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers `Dexterous` `LearnedControl` For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and…	Aaron Ames Team	ArXiv
2026-06-04	TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies `Dexterous` `Manipulation` `VLA` Robot manipulation alternates between low-risk transit phases that call for fast execution and high-risk contact stages that demand slow, precise motion. Yet existing Vision-Language-Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV-cache reuse, or reinforcement learning only shift the policy from…	Mingyu Ding Team	ArXiv
2026-06-04	Superconducting triode effect in a quantum-dot Josephson junction with a biased top gate `Dexterous` Non-reciprocal supercurrents enable non-dissipative rectification, holding great promise for superconducting electronics. Conventionally, this non-reciprocity, termed the superconducting diode effect, requires the simultaneous breaking of time-reversal and parity symmetries. Here, we propose a superconducting triode effect in an asymmetric quantum-dot Josephson junction coupled to an additional…	X. C. Xie Team	ArXiv
2026-06-04	CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments `Dexterous` Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents’ ability to coordinate through text-based channels much as human teams do. Yet recent studies suggest that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground,…	Bingsheng Yao Team	ArXiv
2026-06-04	Robustness of Entanglement Manipulation for almost i.i.d. sources `Dexterous` We study the robustness of asymptotic entanglement manipulation beyond the exact i.i.d. regime, focusing on Mazzola–Sutter–Renner (MSR) almost i.i.d. sources, which allow a sublinear number of deviations from a tensor-power structure. For pure MSR sources along a bipartite reference state $|φ\rangle_{AB}$, we prove that the entanglement concentration rate is robust: every rate below the entropy…	Nilanjana Datta	ArXiv
2026-06-04	HomeWorld: A Unified Floorplan-to-Furnished Framework for Generating Controllable, Densely Interactive Whole-Home Scenes `Dexterous` Indoor scene generation is crucial for robot simulation and modern interior design. However, complex layouts together with scarce 3D scene data make learning-based generation challenging. Existing methods often rely on hand-crafted rules or focus on isolated sub-tasks (e.g., floorplan synthesis or single-room furnishing), producing whole-home scenes that lack global coherence, realism, and…	Hongsheng Li Team	ArXiv
2026-06-04	WebMCP Tool Surface Poisoning: Runtime Manipulation Attacks on LLM Agents `Dexterous` WebMCP is a newly emerging protocol that enables websites to expose tools directly to AI agents, bypassing traditional user interfaces and introducing new security risks. The dynamic exposure of agent-accessible tools in WebMCP expands the attack surface of web sessions, especially when third-party scripts are involved. In this study, we identify a new potential threat, termed Mid-Session Tool…	Kuo-Hui Yeh Team	ArXiv
2026-06-04	A framework for low-overhead quantum fault tolerance via spacetime lifting `Dexterous` Fault-tolerant quantum computation is inherently a spacetime problem, requiring not merely good static quantum error-correcting codes but also low-overhead protocols for protecting and manipulating encoded quantum information over time. Fault complexes provide a homological framework for treating such protocols as single spacetime objects. In this work, we initiate the study of low-overhead fault…	Zi-Wen Liu Team	ArXiv
2026-06-04	VOLT: Vision and Language Trajectory Segmentation for Faster-than-Demonstration Policies `Dexterous` `Manipulation` Humans often take longer to demonstrate a task than a robot would need to execute it. Rather than learning to replicate the demonstration at the same pace, many industrial and practical applications require robots to perform tasks as quickly as possible. In this paper, we investigate several hypotheses for learning policies that operate faster-than-demonstrations. Our experiments show that the…	Siddarth Jain Team	ArXiv
2026-06-04	DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions `Dexterous` GUI agents - vision-based models that control desktops, web browsers, and mobile devices through graphical user interfaces - promise to automate a wide range of digital tasks. While million-scale datasets have enabled substantial progress on click-grounding, drag grounding (e.g. drag-and-drop, swipe, highlight) data remains an order of magnitude smaller and current models fall short on complex…	Ronan Riochet Team	ArXiv
2026-06-04	Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness `Dexterous` Factual sycophancy occurs when a language model abandons a correct, verifiable answer under social pressure. Because a flip occurs only when pressure toward a false answer exceeds the model’s neutral preference for the truth, flip rates conflate two mechanisms: the strength of that baseline preference (truth margin), and how far pressure shifts it (manipulation sensitivity). We decompose factual…	Walter Daelemans Team	ArXiv
2026-06-04	Synthetic Data Generation and Vision-based Wrinkle and Keypoint Detection for Bimanual Cloth Manipulation `Dexterous` `Manipulation` Robotic manipulation of textiles remains challenging because continuous deformation and self-occlusions hinder the robust visual perception required to estimate the cloth’s state. To address the lack of annotated real-world data, we developed a Blender-based synthetic pipeline exporting auto-annotated keypoints, and combined manually labeled renders with real-world data to train a wrinkle…	Atal Anil Kumar Team	ArXiv
2026-06-04	Multi-Resolution Tactile Imitation Learning for Contact-Rich Robotic Manipulation `Dexterous` `Manipulation` `Tactile` Touch sensing is beneficial for solving a wide variety of manipulation tasks. While there exists a wide range of tactile sensors with different properties, exploiting the fusion of multiple heterogeneous tactile sensors to improve manipulation learning remains underexplored. We present Multi-Resolution Tactile Sensing (MiTaS), a representation framework that leverages multiple tactile sensors…	Georgia Chalvatzaki Team	ArXiv
2026-06-04	Robust Ensemble of Selectively Strengthened and Augmented Predictors `Dexterous` Evasion attacks present a significant challenge to the robustness of machine learning (ML)-based classifiers, particularly in critical applications such as fraud detection and cybersecurity. Although existing defense mechanisms are effective in some settings, they often suffer from limited generalizability and do not systematically improve model robustness across diverse attack scenarios. To…	Mehran Ebrahimi Team	ArXiv
2026-06-04	TAM: Torque Adaptation Module for Robust Motion Transfer in Manipulation `Dexterous` `Sim2Real` A policy tuned for one robot often behaves differently on another, whether due to the sim-to-real gap, unknown payloads, or the differing dynamics of two instances of the same robot. In contact-rich, dynamic manipulation, even small motion discrepancies can result in failure to track reference motion, since they disrupt the timing and modes of contact. Common remedies, such as domain…	Dieter Fox Team	ArXiv
2026-06-04	ActiveMimic: Egocentric Video Pretraining with Active Perception `Dexterous` Egocentric human video offers a scalable alternative to robot data for pretraining, yet models pretrained on such video consistently underperform those pretrained on robot data. We attribute this gap to a missing signal, the active perception behavior in egocentric videos, where humans continuously reposition their viewpoint during manipulation, inducing camera motion that standard pipelines…	Yu-Gang Jiang Team	ArXiv / Web
2026-06-04	Deep reinforcement learning with spatial and temporal awareness for active boundary control of buoyancy-driven convection `Dexterous` Deep reinforcement learning (DRL) applied to thermal convection control consistently produces degenerate actuation: wall-temperature policies whose outputs are saturated, pseudo-random, or spatially incoherent. Two compounding deficiencies are responsible: multilayer-perceptron policies that discard spatial flow structure, and memoryless policies that cannot distinguish self-induced flow…	Alfredo Pinelli Team	ArXiv
2026-06-04	AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding `Dexterous` `Manipulation` `VLA` Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise perception–action mappings. To address this challenge, we propose AffordanceVLA, a unified…	Yingcong Chen Team	ArXiv / Web
2026-06-04	RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing `Dexterous` Image safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against user-style malicious image editing remains underexplored. Such behaviors are highly prevalent in daily scenarios but difficult to fully reproduce. To explore this vulnerability, we introduce RedEdit, a novel black-box red-teaming agent that…	Li Liu Team	ArXiv
2026-06-04	MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation `Dexterous` `Manipulation` `LearnedControl` We present MotionDisco, a framework that discovers contact-rich, long-horizon humanoid loco-manipulation motions from scratch, without relying on teleoperation or motion retargeting from human demonstrations. This is challenging because the space of possible contact interactions grows combinatorially with the task horizon and the number of objects in the scene. MotionDisco enables rapid discovery…	Majid Khadiv Team	ArXiv
2026-06-03	GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors `Dexterous` `Sim2Real` `LearnedControl` Scaling humanoid loco-manipulation requires robot-compatible demonstrations across diverse objects, whole-body motions, and scene geometries, but teleoperation and motion capture are difficult to scale because each collection depends on physical setups, instrumented actors, and robot operation. We present GRAIL, a digital generation pipeline that remains fully virtual until deployment: it…	Ye Yuan Team	ArXiv / Web
2026-06-03	X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation `Dexterous` Rigorous evaluation of learning-based robotic systems is an essential prerequisite for deployment. However, real-world test data is expensive to gather; moreover, in a typical iterative development context, data gathered from the latest policy is necessarily limited in scale. This motivates evaluation methodologies that make use of heterogeneous data sources, including simulation, historical…	Marco Pavone Team	ArXiv
2026-06-03	InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space `Dexterous` Language-guided photo retouching aims to adjust color and tone while preserving geometry and texture. Recently, diffusion-based retouching shows a superior visual quality, but often struggles with both fidelity issues due to its generative nature and efficiency because of its iterative sampling process. In this work, we propose an efficient and fidelity-preserving retouching method using…	Tianfan Xue Team	ArXiv
2026-06-03	Non-obvious Manipulability in the Additively Separable Group Activity Selection Problem `Dexterous` In this work, we study the additively separable Group Activity Selection Problem (AS-GASP) in an imperfect information setting, where agents have private preferences over activities and weights over other agents. Our goal is to design mechanisms that assign agents to activities based on their declared preferences and weights, with the objective of maximizing social welfare while ensuring truthful…	Giovanna Varricchio Team	ArXiv
2026-06-03	Small-angle solution scattering: from fundamental theory to practical approximations `Dexterous` Small-angle scattering (SAS) is widely used in structural biology, soft matter, and colloidal science to probe molecular structures in solution. SAS rests on a single physical principle: wave interference from a distribution of scatterers, averaged over orientations. Yet the theoretical foundations of SAS are spread across the literature, often based on differing notation, definitions, and…	Jochen S. Hub Team	ArXiv
2026-06-03	Potential-Guided Flow Matching for Vision-Language-Action Policy Improvement `Dexterous` `VLA` Large vision-language-action (VLA) policies are increasingly trained as conditional generative models over action chunks. Yet deployment produces mixed-quality experience (successful demonstrations, partial completions, recoverable mistakes, and failures) that is difficult to use with standard imitation. Full behavior cloning (BC) imitates failures, filtered BC discards useful sub-trajectories, and…	Gang Wang Team	ArXiv
2026-06-03	CLIF: Cross-layer LEO-ISL Fingerprinting for Physical and Network Attack Detection in Dense LEO Constellations `Dexterous` Low-Earth Orbit (LEO) mega-constellations such as Starlink by SpaceX and Kuiper by Amazon rely on optical Inter-Satellite Links (ISLs) for autonomous mesh routing to provide low-latency telecommunication, Internet of Things (IoT), and security services globally. As commercial operators and governments deploy increasingly dense constellations and form multi-operator peering coalitions, ISL…	Biplab Sikdar Team	ArXiv
2026-06-03	DIST-FL: Enhancing Security for TEE-based Aggregation in Federated Learning `Dexterous` Trusted Execution Environments (TEEs)-aided federated learning protocols emerge as promising solutions to counter server-side adversaries and ensure the trustworthiness of the server. In this paper, we dissect existing protocols and demonstrate that server-side adversaries can still manipulate client selection and replay aggregation to compromise system robustness and privacy, by exploiting TEE…	Yinqian Zhang Team	ArXiv
2026-06-03	AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety `Dexterous` As AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark dataset of human-AI companion conversations annotated with fine-grained safety risk categories. The dataset contains 2,123 real-world Replika conversations collected…	TengTeng Ma Team	ArXiv
2026-06-03	M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking `Dexterous` `Sim2Real` `LearnedControl` Building a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot joint trajectories, whereas manipulation requires precise end-effector…	Shengbo Eben Li Team	ArXiv
2026-06-03	HapTile: A Haptic-Informed Vision-Tactile-Language-Action Dataset for Contact-Rich Imitation Learning `Dexterous` `VLA` `Tactile` Despite the importance of tactile sensing for reliable manipulation, most existing Vision-Language-Action (VLA) datasets remain vision-only, and those that do incorporate tactile information typically lack the joint combination of task diversity, language conditioning, and action trajectories. Furthermore, existing teleoperation pipelines rarely provide haptic feedback to the operator, despite…	Shan Luo Team	ArXiv
2026-06-03	Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation? `Dexterous` `HF-Hot` Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window onto this question: if a model has…	Mike Zheng Shou Team	ArXiv
2026-06-03	Spatial Deformation Mechanism of Meta-Atom Coupling and Scaling `Dexterous` Metasurfaces enable precise manipulation of light-matter interactions, and meta-atom coupling and scaling dominate their resonant properties and functional responses. Conventionally, coupled-mode theory (CMT), coupled dipole theory (CDT) and full-wave simulation are widely adopted to analyze such coupling effects. Nevertheless, CMT and CDT are essentially phenomenological theories. Although…	Lei Liang Team	ArXiv
2026-06-03	A model-free approach to control barrier functions for higher-order systems `Dexterous` Control barrier functions (CBFs) are a widely applied modular tool to ensure safe operation of nonlinear dynamical control systems. However, for their construction accurate knowledge of the system dynamics is typically needed. This requirement was recently alleviated for relative-degree-one systems using techniques from prescribed performance control (PPC) or funnel control (FC). This article…	Karl Worthmann Team	ArXiv
2026-06-03	VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI data for VLA Training `Dexterous` `VLA` Universal Manipulation Interface (UMI) enables scalable real-world robot data collection without hardware-specific teleoperation, yet leveraging UMI data to train large-scale Vision-Language-Action (VLA) models remains fundamentally challenging. We identify two critical mismatches: wrist-mounted fisheye views, with severe radial distortion and local gripper-centric perspectives, are…	Xuelong Li Team	ArXiv
2026-06-03	Hybrid Adversarial Defence for Natural Language Understanding Tasks `Dexterous` Large Language Models (LLMs) are vulnerable both to hallucination and adversarial manipulation. Although these problems are closely related, existing defences typically address them separately. We investigate a hybrid defence framework that combines entropy-based models, designed to reduce hallucinations, with uncertainty-based models and geometric-based models, designed to reduce vulnerability…	Stuart E. Middleton Team	ArXiv
2026-06-03	Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization `Dexterous` Recent advances in generative image editing have improved the realism and controllability of localized image manipulation, raising new challenges for image manipulation detection and localization (IMDL). However, existing IMDL benchmarks still have limitations in visual realism, manipulation diversity, and generator coverage, making it difficult to reflect recent trends in image manipulation. To…	Jungong Han Team	ArXiv
2026-06-03	Arbitrary manipulation of nuclear spins in hexagonal boron nitride `Dexterous` Due to its localized nature and controllability, the negatively charged boron vacancy centers (V$_\text{B}^-$) in hexagonal boron nitride (hBN) are a promising spin platform for accessing its neighboring nuclei with potential for performing quantum computational tasks. However, the methods of utilizing and manipulating the nuclear spins are still lacking. In this work, we propose a protocol for…	Mehdi Abdi Team	ArXiv
2026-06-03	3DThinkVLA: Endowing Vision-Language-Action Models with Latent 3D Priors via 3D-Thinking-Guided Co-training `Dexterous` `VLA` We propose a 3D-thinking-guided co-training framework that enables vision-language-action (VLA) models to perform 3D spatial reasoning implicitly during action prediction. Our core insight is that 3D geometry perception and 3D spatial reasoning are distinct capabilities that can be disentangled and injected at different feature hierarchies. During training, three tightly coupled components work…	Weihao Yuan Team	ArXiv
2026-06-03	Input-to-State Stable Bundle Koopman Neural ODEs for Learning Controlled Dynamics under Environmental Constraints `Dexterous` We propose ISS-BKNO, a unified framework that integrates Koopman operator identification, Neural ordinary differential equations (ODEs), fiber bundle geometry, and input-to-state stability (ISS) certification. Unlike prior approaches that address stability, extrinsic inputs, or environmental constraints in isolation, the proposed framework simultaneously learns controlled nonlinear dynamics while…	Lin Feng	ArXiv