Project Description
I want to build a ChatGPT-style application that works exclusively with images and imposes no built-in content filters. The core capability is image analysis, specifically object detection and recognition, with an immediate focus on identifying people and faces in any photo a user uploads.
You’re free to choose the tech stack, but I expect a modern computer-vision framework (PyTorch, TensorFlow, or a YOLO-family model) behind a concise, well-documented API, so the system can later expand to other tasks such as classification or captioning. Fast server-side inference is important: cloud GPU deployment or an optimized on-prem setup is acceptable as long as latency stays low.
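As a rough illustration of the kind of API response I have in mind, here is a minimal Python sketch. The class names, field names, and the `task` field are my own assumptions rather than a required schema, and the detector itself is deliberately left out. The point is that a generic `task` plus a list of labeled results leaves room for classification or captioning later.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Detection:
    """One detected object (hypothetical schema, for illustration only)."""
    label: str                                # e.g. "person" or "face"
    confidence: float                         # score in the range 0.0 to 1.0
    box: tuple[float, float, float, float]    # x_min, y_min, x_max, y_max in pixels


@dataclass
class DetectionResponse:
    """Response envelope; `task` lets other task types reuse the same shape."""
    task: str
    image_width: int
    image_height: int
    detections: list[Detection]


def filter_by_confidence(response: DetectionResponse, threshold: float) -> DetectionResponse:
    """Return a copy that keeps only detections scoring at or above `threshold`."""
    kept = [d for d in response.detections if d.confidence >= threshold]
    return DetectionResponse(response.task, response.image_width, response.image_height, kept)


def to_json(response: DetectionResponse) -> str:
    """Serialize the response; tuples become JSON arrays."""
    return json.dumps(asdict(response))


# Example payload with made-up values, not real model output.
example = DetectionResponse(
    task="detection",
    image_width=640,
    image_height=480,
    detections=[
        Detection("person", 0.91, (12.0, 30.0, 200.0, 460.0)),
        Detection("face", 0.42, (60.0, 40.0, 120.0, 110.0)),
    ],
)
payload = to_json(filter_by_confidence(example, threshold=0.5))
```

The front-end would only need the `label`, `confidence`, and `box` fields to draw overlays; masks could be added as an optional field if the chosen model supports them.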
Please include:
• A clean front-end where users drop an image and instantly see bounding boxes or masks around detected faces and people, plus confidence scores.
• Source code, model weights, and a brief README that explains local setup and any additional training steps.
• A short test suite or demo script that demonstrates detection accuracy on at least a small labeled sample set.
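For the accuracy check, something along these lines would be enough as a starting point. This is only a sketch under my own assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, predictions are already sorted by descending confidence, and a prediction counts as correct when its intersection-over-union (IoU) with an unmatched ground-truth box is at least 0.5. The function names and the threshold are illustrative, not requirements.

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, ground_truth, iou_threshold=0.5):
    """Greedily match each prediction to its best unmatched ground-truth box.

    Assumes `predicted` is sorted by descending confidence, so stronger
    predictions claim ground-truth boxes first.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box matches at most once
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data: one correct detection and one false positive.
predicted_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
labeled_boxes = [(0, 0, 10, 10)]
precision, recall = precision_recall(predicted_boxes, labeled_boxes)
```

Reporting precision and recall per class (people and faces separately) on the sample set would make each milestone easy to verify.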
I’m open to milestone suggestions. As long as progress is demonstrated in usable increments, the budget breakdown across milestones can be flexible.