Project Description
I aim to build an advanced, research-grade Virtual Try-On (VTON) engine that can realistically place tops (e.g., shirts, blouses), bottoms (e.g., pants, skirts), and full outfits (e.g., dresses, suits) onto a human model shown in a single 2D photograph. The workflow should centre on state-of-the-art deep generative techniques (diffusion models, flow matching, and transformer-based architectures) so that the final renders look photo-realistic, preserve garment texture, and respect body pose and occlusion.
The system will be trained and fine-tuned on a curated dataset I already possess, and it must accept JPEG and PNG uploads at inference time. Clean, modular PyTorch (or equivalent) code, a reproducible training pipeline, and inference scripts that run on a single high-end GPU are expected.
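To make the JPEG/PNG requirement concrete, here is a minimal sketch of an upload check that inspects the file's leading signature bytes rather than trusting the extension. The function names (sniff_image_format, validate_upload) are hypothetical and not part of any existing codebase; the sketch uses only the Python standard library and assumes that any other format should be rejected before it reaches the model.

```python
from typing import Optional

# Leading signature ("magic") bytes of the two accepted formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature
JPEG_SIGNATURE = b"\xff\xd8\xff"       # SOI marker plus start of the next marker


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" based on the leading bytes, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def validate_upload(path: str) -> str:
    """Hypothetical helper: reject uploads that are neither JPEG nor PNG."""
    with open(path, "rb") as f:
        header = f.read(len(PNG_SIGNATURE))  # 8 bytes covers both signatures
    fmt = sniff_image_format(header)
    if fmt is None:
        raise ValueError(f"unsupported image format: {path}")
    return fmt
```

A signature check only screens out obviously wrong files; a real pipeline would still need to decode the image fully and handle corrupt data.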
Deliverables
• End-to-end source code with clear comments
• Pre-trained model checkpoints and weights
• A short technical report explaining architecture choices, training schedule, and evaluation metrics
• Demo notebook or web stub that accepts a user image plus a clothing image and returns the composite
Acceptance criteria: demo outputs should pass a side-by-side realism test against ground-truth photos for at least 90% of a 50-image validation set, i.e., at least 45 of the 50 images.
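The threshold can be checked mechanically once each validation image has a pass/fail verdict. The sketch below assumes one boolean judgement per image; how a judgement is produced (human raters, a metric, or both) is not specified in this brief, and the function name meets_acceptance is hypothetical.

```python
def meets_acceptance(judgements, required_percent=90):
    """Return True if at least required_percent of the judgements are passes.

    judgements: one boolean per validation image (True = passed the
    side-by-side realism test). Integer arithmetic avoids float rounding.
    """
    total = len(judgements)
    if total == 0:
        raise ValueError("no judgements supplied")
    passed = sum(1 for j in judgements if j)
    return passed * 100 >= required_percent * total


# Example with the 50-image validation set from the acceptance criteria:
# 45 passes meets the 90% bar, 44 does not.
print(meets_acceptance([True] * 45 + [False] * 5))
print(meets_acceptance([True] * 44 + [False] * 6))
```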
If you have published or shipped work using diffusion or transformer VTON approaches, that practical insight would be invaluable as we iterate toward production quality.