Mobile-VTON: High-Fidelity On-Device Virtual Try-On

CVPR 2026

¹University of Sydney ²MBZUAI ³University of Melbourne ⁴Google

*Equal Contribution †Corresponding Author

Abstract

Virtual try-on (VTON) has recently achieved impressive visual fidelity, but most existing systems require uploading personal photos to cloud-based GPUs, raising privacy concerns and limiting on-device deployment. To address this, we present Mobile-VTON, a high-quality, privacy-preserving framework that enables fully offline virtual try-on on commodity mobile devices using only a single user image and a garment image. Mobile-VTON introduces a modular TeacherNet-GarmentNet-TryonNet (TGT) architecture that integrates knowledge distillation, garment-conditioned generation, and garment alignment into a unified pipeline optimized for on-device efficiency. Within this framework, we propose a Feature-Guided Adversarial (FGA) Distillation strategy that combines teacher supervision with adversarial learning to better match real-world image distributions. GarmentNet is trained with a trajectory-consistency loss to preserve garment semantics across diffusion steps, while TryonNet uses latent concatenation and lightweight cross-modal conditioning to enable robust garment-to-person alignment without large-scale pretraining. By combining these components, Mobile-VTON achieves high-fidelity generation with low computational overhead. Experiments on VITON-HD and DressCode at 1024 × 768 show that it matches or outperforms strong server-based baselines while running entirely offline. These results demonstrate that high-quality VTON is not only feasible but also practical on-device, offering a secure solution for real-world applications.
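The abstract describes FGA Distillation as combining teacher supervision with adversarial learning, but does not give the objective itself. The following is only a minimal sketch of how such a combination is commonly written: a feature-matching term between student and teacher features plus a non-saturating generator loss on discriminator logits. The function names, the mean-squared-error feature term, the softplus-based adversarial term, and the `adv_weight` hyperparameter are all assumptions made for illustration and are not taken from the paper.

```python
import math


def feature_matching_loss(student_feats, teacher_feats):
    """Mean squared error between flattened student and teacher features.

    Assumption: the teacher supervision is modelled as a plain MSE term.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared) / len(squared)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_adversarial_loss(disc_logits):
    """Non-saturating generator loss: mean of softplus(-logit).

    Assumption: the discriminator outputs one raw logit per sample,
    where larger values mean "looks real".
    """
    return sum(softplus(-z) for z in disc_logits) / len(disc_logits)


def combined_distillation_loss(student_feats, teacher_feats, disc_logits,
                               adv_weight=0.1):
    """Weighted sum of the two terms; adv_weight is a hypothetical setting."""
    distill = feature_matching_loss(student_feats, teacher_feats)
    adversarial = generator_adversarial_loss(disc_logits)
    return distill + adv_weight * adversarial


if __name__ == "__main__":
    # Toy values only; real features would come from network activations.
    loss = combined_distillation_loss([0.2, 0.4], [0.1, 0.5], [0.3, -0.2])
    print(f"combined loss: {loss:.4f}")
```

In this sketch the feature term pulls the student toward the teacher, while the adversarial term rewards outputs that the discriminator scores as real; how the actual method balances or structures these terms is not stated in the abstract.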

BibTeX

@misc{wan2026textscmobilevtonhighfidelityondevicevirtual,
  title={\textsc{Mobile-VTON}: High-Fidelity On-Device Virtual Try-On},
  author={Zhenchen Wan and Ce Chen and Runqi Lin and Jiaxin Huang and Tianxi Chen and Yanwu Xu and Tongliang Liu and Mingming Gong},
  year={2026},
  eprint={2603.00947},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.00947},
}

On-Device Inference Demo

Qualitative virtual try-on results on the VITON-HD In-the-Wild test set. Mobile-VTON generates high-fidelity results with precise garment alignment and skin texture preservation.

Qualitative virtual try-on results on the DressCode test set. Mobile-VTON generates high-fidelity results with precise garment alignment and skin texture preservation.

Qualitative virtual try-on results on the VITON-HD test set. Mobile-VTON generates high-fidelity results with precise garment alignment and skin texture preservation.
