Python Music Video Automation Suite


Budget: AUD 250–750 | Client: Unknown | Posted: 20d ago | Status: new

Required Skills

Python, Software Architecture, Transcription, Machine Learning (ML), C++ Programming, Arduino, Video Production, Video Editing, Audio Processing, Automation

Project Description

I’m building a fully automated pipeline that turns any .wav file into a finished, high-quality MP4 music video without the per-video fees of online generators like Neural Frames. My workflow needs to live in a clean Anaconda environment and rely on the latest releases of Flux together with Hunyuan-Video in GGUF format as the core engines.

Here’s the flow I want the script to cover:

• Ingest a user-supplied .wav
• Detect its BPM automatically and segment it into logical beats
• Transcribe the vocals, generate editable lyrics, and let me tweak or overwrite them before render time
• For roughly 65 beat-aligned segments, build a text prompt list (also editable) that feeds Flux + Hunyuan-Video to create matching frames
• Stitch the generated frames and original audio into a single MP4, perfectly synced
• Run headless so I can batch 1,000+ tracks with simple CLI commands; no GUI polish required, just stability, logging, and clear config files

Key expectations

• Python 3.11+ in Anaconda, modularized so models are easy to update
• Output video must be 1080p or higher, H.264 in an MP4 container
• External dependencies such as ffmpeg, whisper, librosa (or your preferred audio library) and any model weights should auto-download or be documented in the environment.yml
• A short README and sample run script that processes one demo song end-to-end

Acceptance

I’ll run the suite on a fresh machine, point it at a test .wav, adjust a couple of generated prompts, and get a synchronized MP4 with no manual video editing. If that passes and the code is clearly organized for scaling, the job is complete.

**Project Title:** AI Music Video Generator (Desktop Software – Python)

---

## Overview

Develop a **desktop application** that generates full-length AI music videos from a WAV audio file and lyrics input. The software must provide **fine-grained manual control over every scene**, while also supporting AI-assisted automation.
The goal is to replicate and exceed tools like Neural Frames by allowing:

* Scene-by-scene control
* Frame-by-frame prompting
* AI image and video generation
* Full synchronization with music and lyrics

---

## Core Functional Requirements

### 1. Audio Input & Analysis

* Import `.wav` audio files
* Automatically analyze:
  * BPM (tempo)
  * Beat structure
  * Song sections (intro, verse, chorus, drop)
* Generate a **timeline of scenes** based on audio segmentation

---

### 2. Lyrics Integration & Sync

* Input lyrics manually (paste text)
* Automatically align lyrics with timestamps using speech-to-text alignment
* Display lyrics synced to timeline

---

### 3. Scene Timeline Editor

* Visual timeline of the entire song
* Split into 50–100 scenes (auto + manual override)
* Each scene:
  * Clickable
  * Plays only that segment of audio
* User can:
  * Adjust scene duration
  * Merge/split scenes
  * Reorder scenes

---

### 4. Scene Prompt System

Each scene must allow:

* Image prompt input (what to generate visually)
* Motion prompt input (how it animates)
* Ability to preview and edit prompts per scene

---

### 5. Image Generation Engine

* Generate **high-quality images per scene**
* Support modern models (e.g. FLUX / Stable Diffusion-class)
* Batch or single-frame generation
* Save images per scene

---

### 6. Video Generation Engine

* Convert generated images into animated video clips
* Support:
  * Camera movement (zoom, pan, rotation)
  * Motion prompts
* Generate short clips (2–6 seconds per scene)

---

### 7. Clip Management System

* Store all generated clips per scene
* Allow:
  * Regenerate clips
  * Replace clips
  * Preview clips individually

---

### 8. Final Video Assembly

* Automatically stitch all clips together
* Ensure:
  * Correct timing
  * Smooth transitions
* Overlay original WAV audio
* Export final video (MP4)

---

### 9. Playback System

* Preview:
  * Individual scenes
  * Full video
* Sync playback with audio

---

## Technical Requirements

### Language & Environment

* Python (Anaconda-compatible)
* Must run locally on Windows

### GUI Framework

* PySide6 (Qt-based modern interface)

### AI Integration

* Must support integration with:
  * Image generation models
  * Video generation models
* Modular backend (models can be swapped)

### Video Processing

* FFmpeg integration required

---

## File & Project Structure

Each project should store:

* Audio file
* Lyrics
* Scene data (JSON)
* Generated images
* Generated clips
* Final output

---

## Advanced Features (Preferred)

* Beat-synced cuts and transitions
* AI-assisted prompt generation
* Style consistency across scenes
* Character continuity (optional)
* GPU acceleration (CUDA support)

---

## Deliverables

* Fully working desktop application
* Clean, maintainable Python code
* Installation/setup instructions
* Ability to run locally without cloud dependency (preferred)

---

## Notes for Developer

* This is not a simple generator; it is a **production tool**
* User control over scenes is critical
* Performance and stability are important
* Modular design is required for future upgrades

---

## Objective

Create a tool that enables a single user to generate professional-quality AI music videos with full creative control, combining automation with manual direction.
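---

To make the "timeline of scenes" and "Scene data (JSON)" requirements more concrete, here is a minimal sketch of one way to group detected beats into beat-aligned scenes and serialize them. It assumes beat timestamps in seconds are already available (for example from `librosa.beat.beat_track` with `units="time"`). The `Scene` fields and the `build_scenes` helper are hypothetical names chosen for illustration and are not part of this brief.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Scene:
    # Hypothetical per-scene record; field names are illustrative only.
    index: int
    start: float  # seconds
    end: float    # seconds
    image_prompt: str = ""
    motion_prompt: str = ""


def build_scenes(beat_times, duration, target_scenes=65):
    """Group consecutive beats into roughly `target_scenes` beat-aligned scenes."""
    if not beat_times:
        return [Scene(0, 0.0, duration)]
    # Beats per scene (at least one), so every interior cut lands on a beat.
    step = max(1, round(len(beat_times) / target_scenes))
    interior = [t for t in beat_times[::step] if 0.0 < t < duration]
    cuts = [0.0] + interior + [duration]
    return [Scene(i, a, b) for i, (a, b) in enumerate(zip(cuts, cuts[1:]))]


# Synthetic 120 BPM beat grid over a 60-second track (one beat every 0.5 s).
beats = [i * 0.5 for i in range(1, 120)]
scenes = build_scenes(beats, duration=60.0, target_scenes=30)

# Editable scene file: prompts start empty and can be filled in or overwritten.
scenes_json = json.dumps([asdict(s) for s in scenes], indent=2)
```

Storing the scenes as plain JSON keeps the prompts hand-editable before render time. A manual merge, split, or reorder then only has to rewrite this list.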
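---

For the final assembly step (stitch clips, overlay the original WAV, export H.264 MP4 at 1080p), the sketch below builds an FFmpeg command using the concat demuxer. It is only one possible approach. It assumes all scene clips share the same codec parameters, uses hard cuts instead of the requested smooth transitions, and expects file names without single quotes. The helper names and the example file names are hypothetical.

```python
from pathlib import Path


def concat_list_text(clips):
    """Return the body of an ffmpeg concat-demuxer list file, in scene order."""
    return "".join(f"file '{Path(c).as_posix()}'\n" for c in clips)


def build_ffmpeg_cmd(list_path, audio_path, out_path, height=1080):
    """Build (but do not run) an ffmpeg argv that muxes clips with the WAV."""
    return [
        "ffmpeg", "-y",
        # Input 0: concatenated scene clips, read from the list file.
        "-f", "concat", "-safe", "0", "-i", str(list_path),
        # Input 1: the original audio track.
        "-i", str(audio_path),
        # Take video from the clips and audio from the WAV.
        "-map", "0:v:0", "-map", "1:a:0",
        # Scale to the target height; -2 keeps the aspect ratio with an even width.
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        # Stop at the shorter stream so video and audio end together.
        "-shortest",
        str(out_path),
    ]


# Hypothetical file names for a three-scene project.
clips = [f"clips/scene_{i:03d}.mp4" for i in range(3)]
list_text = concat_list_text(clips)
cmd = build_ffmpeg_cmd("clips.txt", "song.wav", "final.mp4")
```

In a real run, `list_text` would be written to `clips.txt` and `cmd` passed to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell quoting problems when batching many tracks.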
