Local AI Avatar for Latin American Spanish

Budget: USD 750–1500

Required Skills

Python, Audio Services, Music, Voice Talent, Unity, AI Text-to-speech, AI Chatbot Development, AI Development

Project Description

I am looking for a freelance developer or team to create a local AI avatar system with real-time voice interaction and facial/lip synchronization for Latin American Spanish. Currently, we already have a basic avatar that can display responses, but it does not speak or animate facial movements naturally.

The goal is to build an avatar that can:
- Speak directly using AI-generated voice (TTS)
- Synchronize mouth/facial movements with speech
- Simulate realistic modulation using at least the 5 main vowel mouth shapes (visemes/phonemes)
- Run locally (offline or local server environment)
- Allow flexible integration with different AI providers
- Work primarily in Latin American Spanish

Main requirements:

• Local execution
The system must run locally using CPU/GPU resources. Cloud dependence should be minimal or optional.

• Spanish language support
The avatar must work primarily in Latin American Spanish, including:
- natural Latin American Spanish voice generation
- correct Spanish phonetic lip synchronization
- proper pronunciation and modulation
- support for conversational Spanish interaction
Chilean Spanish support is a plus, but not mandatory.

• Lip sync / facial animation
The avatar should animate while speaking, including:
- mouth movement synchronization
- basic facial animation
- blinking / idle movements preferred
At minimum, the avatar should support vowel-based mouth shapes so it visually appears to modulate speech naturally in Spanish.
Possible technologies are open to proposal: Unity, Unreal Engine, Three.js, WebGL, Live2D, NVIDIA Audio2Face, Oculus LipSync, Rhubarb Lip Sync, or similar alternatives.

• AI integration flexibility
The conversational AI provider is not fixed. The architecture should allow easy replacement/integration of APIs such as: Grok, OpenAI, Claude, Gemini, local LLMs, custom APIs.
We will later modify the backend/API ourselves, so modular architecture is important.

• Audio pipeline
Ideally the system should support:
- microphone input
- speech-to-text
- AI response generation
- text-to-speech
- synchronized avatar playback

Deliverables:
- Fully functional prototype
- Source code
- Basic installation documentation
- Modular architecture
- Local deployment instructions

Preferred experience:
- AI avatars
- lip sync systems
- facial animation
- TTS/STT
- Unity/Unreal
- real-time rendering
- local AI systems

Optional future features:
- multiple avatars
- emotions
- streaming integration
- facial recognition
- body animation
- camera integration

Please include:
- technologies you would use
- estimated timeline
- previous related work/demo if available
- approximate budget estimate for MVP development
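
To illustrate what is meant by a swappable AI provider and vowel-based mouth shapes, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a required design: the names `ChatProvider`, `EchoProvider`, `VisemeCue`, `vowel_cues` and `run_turn` are hypothetical, the provider is an offline stand-in rather than a real API client, and the timing simply spreads characters evenly over the audio duration. A real implementation would take phoneme timings from the TTS engine or a lip sync tool, and would handle cases this sketch ignores, such as the silent "u" in "que".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical interface: any LLM backend only needs a reply() method."""

    def reply(self, user_text: str) -> str: ...


class EchoProvider:
    """Offline stand-in for a real provider (cloud API or local LLM)."""

    def reply(self, user_text: str) -> str:
        return f"Dijiste: {user_text}"


# The five Spanish vowels (accented forms included) mapped to five mouth shapes.
VOWEL_VISEMES = {
    "a": "A", "á": "A",
    "e": "E", "é": "E",
    "i": "I", "í": "I",
    "o": "O", "ó": "O",
    "u": "U", "ú": "U", "ü": "U",
}


@dataclass
class VisemeCue:
    start: float  # seconds from the start of the audio clip
    viseme: str   # "A", "E", "I", "O", "U" or "REST"


def vowel_cues(text: str, duration: float) -> list[VisemeCue]:
    """Naive timeline: characters are spread evenly over `duration`.

    Vowels get their mouth shape, everything else maps to "REST",
    and consecutive identical shapes are merged into one cue.
    """
    cues: list[VisemeCue] = []
    if not text:
        return cues
    for index, char in enumerate(text):
        viseme = VOWEL_VISEMES.get(char.lower(), "REST")
        if cues and cues[-1].viseme == viseme:
            continue  # same shape as the previous cue, nothing to add
        cues.append(VisemeCue(start=index * duration / len(text), viseme=viseme))
    return cues


def run_turn(
    provider: ChatProvider, user_text: str, seconds_per_char: float = 0.07
) -> tuple[str, list[VisemeCue]]:
    """One conversational turn: text in, answer plus mouth-shape cues out.

    The speech duration is estimated from the text length (an assumption);
    a real pipeline would use the duration of the generated TTS audio.
    """
    answer = provider.reply(user_text)
    return answer, vowel_cues(answer, len(answer) * seconds_per_char)


if __name__ == "__main__":
    answer, cues = run_turn(EchoProvider(), "hola")
    print(answer)
    for cue in cues:
        print(f"{cue.start:.2f}s {cue.viseme}")
```

Because `run_turn` depends only on the `reply()` method, replacing `EchoProvider` with a wrapper around any other backend would not require changes to the lip sync side.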
