Project Description
I want to put together a quick-and-dirty paper prototype that proves one thing: textual instructions can be fed into an LLM, interpreted, and turned into accurate anatomical motion that I can later visualise in my own tooling (Copilot, 3D JavaScript, etc.). The workflow I have in mind is simple on paper yet technically layered. First, the LLM receives short, structured cues (“Raise left arm to 90°”, “Rotate torso 15° right”, etc.). It must parse those sentences, identify the key actions, and output a sequence of joint-level positions. The prototype should then pass those positions to a lightweight Vision-AI / pose-estimation module so I can preview the motion as stick-figure frames and sanity-check anatomical correctness.
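As a rough illustration of the "joint-level positions" step, here is a minimal sketch that turns one joint angle into 2D stick-figure coordinates and interpolates a few frames. Everything in it is an assumption made for illustration: the segment lengths, the function names (`arm_positions`, `interpolate_frames`), the frontal-plane 2D simplification, and the convention that 0° means the arm hangs straight down.

```python
import math

# Hypothetical segment lengths in metres (illustrative, not measured data).
UPPER_ARM = 0.30
FOREARM = 0.25


def arm_positions(shoulder_xy, abduction_deg, side="left"):
    """Return shoulder, elbow and wrist (x, y) for a straight arm raised sideways.

    Assumed convention: y points up, 0 deg = arm hanging down,
    90 deg = arm horizontal. The left arm extends toward +x.
    """
    sign = 1.0 if side == "left" else -1.0
    theta = math.radians(abduction_deg)
    dx, dy = sign * math.sin(theta), -math.cos(theta)
    sx, sy = shoulder_xy
    elbow = (sx + UPPER_ARM * dx, sy + UPPER_ARM * dy)
    wrist = (elbow[0] + FOREARM * dx, elbow[1] + FOREARM * dy)
    return {"shoulder": (sx, sy), "elbow": elbow, "wrist": wrist}


def interpolate_frames(start_deg, end_deg, n_frames):
    """Linearly interpolate a joint angle across n_frames (n_frames >= 2)."""
    step = (end_deg - start_deg) / (n_frames - 1)
    return [start_deg + i * step for i in range(n_frames)]


# "Raise left arm to 90°" as five stick-figure frames.
frames = [arm_positions((0.2, 1.4), a) for a in interpolate_frames(0, 90, 5)]
```

Each entry in `frames` is a plain dictionary of joint coordinates, so it can be handed to whatever preview layer is chosen.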
Because this is an early prototype, I’m not asking for a polished UI: sketched screens, Figma wireframes, or a Jupyter notebook demo will do. What matters is demonstrating the end-to-end logic: text in, pose data out, and a quick visual confirmation that the pose is biomechanically plausible.
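One simple way to back up the visual plausibility check is a range-of-motion test on the generated joint angles. The sketch below is only an assumption about how that could look: the joint names, the `check_plausible` helper, and the limit values are illustrative placeholders, not clinical reference data.

```python
# Illustrative range-of-motion limits in degrees (placeholder values,
# to be replaced with figures agreed for the project).
ROM_LIMITS_DEG = {
    "left_shoulder_abduction": (0, 180),
    "right_shoulder_abduction": (0, 180),
    "torso_rotation": (-45, 45),
}


def check_plausible(pose):
    """Return a list of (joint, angle, limits) for every out-of-range joint.

    Joints without a known limit are reported with limits=None so they
    are not silently accepted.
    """
    violations = []
    for joint, angle in pose.items():
        limits = ROM_LIMITS_DEG.get(joint)
        if limits is None:
            violations.append((joint, angle, None))
            continue
        low, high = limits
        if not low <= angle <= high:
            violations.append((joint, angle, limits))
    return violations


ok = check_plausible({"left_shoulder_abduction": 90, "torso_rotation": 15})
bad = check_plausible({"torso_rotation": 60})
```

An empty list means every joint is inside its assumed range; anything else can be flagged in the preview.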
Deliverables
• A runnable or shareable prototype (paper, wireframe, or notebook) proving the pipeline from textual cue to pose output.
• LLM prompt template and parsing code that extracts the joint actions.
• Minimal visual preview (stick figure, skeleton overlay, or similar) driven by the generated pose data.
• Short README documenting assumptions, libraries, and how to extend the prototype into a production system.
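For the prompt template and parsing deliverable, something along these lines could serve as a starting point. It is a sketch under stated assumptions: the JSON schema (`joint`, `axis`, `angle_deg`), the template wording, and the helper names are all hypothetical, and the call to the model itself is deliberately left out so the example does not depend on any particular provider.

```python
import json

# Hypothetical prompt template; the output schema is an assumption.
PROMPT_TEMPLATE = """You convert movement cues into joint actions.
Reply with JSON only: a list of objects with the keys
"joint" (string), "axis" (string) and "angle_deg" (number).
Cue: {cue}"""

REQUIRED_KEYS = {"joint", "axis", "angle_deg"}


def build_prompt(cue):
    """Fill the template with one textual cue."""
    return PROMPT_TEMPLATE.format(cue=cue)


def parse_actions(reply_text):
    """Parse a model reply into a list of validated joint actions."""
    # Models sometimes wrap JSON in extra prose, so keep only the
    # span from the first "[" to the last "]".
    start, end = reply_text.find("["), reply_text.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON list found in reply")
    actions = json.loads(reply_text[start:end + 1])
    for action in actions:
        if not isinstance(action, dict):
            raise ValueError(f"expected an object, got {action!r}")
        missing = REQUIRED_KEYS - action.keys()
        if missing:
            raise ValueError(f"action is missing keys: {sorted(missing)}")
        action["angle_deg"] = float(action["angle_deg"])
    return actions


prompt = build_prompt("Raise left arm to 90°")
# Example reply text written by hand for illustration, not real model output.
reply = 'Here you go:\n[{"joint": "left_shoulder", "axis": "abduction", "angle_deg": 90}]'
actions = parse_actions(reply)
```

Failing loudly on malformed replies keeps the chain transparent: a bad parse shows up as an error instead of a silently wrong pose.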
Acceptance criteria
• Given five sample cues I provide, the system returns joint coordinates that match each instruction within a tolerance we agree on.
• Visual preview clearly reflects those coordinates so misalignments are obvious at a glance.
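The tolerance criterion above could be checked automatically with a small comparison helper. This is a minimal sketch, assuming poses are dictionaries of joint name to coordinate tuple and that the tolerance is a Euclidean distance in the same units; the function name `within_tolerance` and the sample numbers are hypothetical.

```python
import math


def within_tolerance(predicted, expected, tol):
    """Return the joints that miss their expected position by more than tol.

    The result maps joint name to the Euclidean error, or to None when
    the joint is absent from the prediction. An empty dict means a pass.
    """
    failures = {}
    for joint, target in expected.items():
        if joint not in predicted:
            failures[joint] = None
            continue
        error = math.dist(predicted[joint], target)
        if error > tol:
            failures[joint] = error
    return failures


# Illustrative values only.
expected = {"wrist": (0.75, 1.40)}
predicted = {"wrist": (0.76, 1.40)}
loose = within_tolerance(predicted, expected, tol=0.02)
strict = within_tolerance(predicted, expected, tol=0.005)
```

The same failure dictionary can drive the preview, for example by highlighting the joints it lists.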
You’re free to lean on OpenAI, Hugging Face, MediaPipe, Three.js, or any other familiar tools, as long as the chain remains transparent and reproducible. If you enjoy tinkering with LLM parsing and quick Vision AI hacks, this should be a fun sprint.