Scalable On-Prem LLM RAG Deployment

Pending
💰 USD 10–20 👤 Unknown 🕒 18d ago status: new
Cloud Computing Infrastructure Architecture DevOps Large Language Model Whisper AI Retrieval-Augmented Generation (RAG)
I need a solution for an LLM (selected by the client) plus RAG, deployed on our own server (recommended by the freelancer), that scales automatically to 1,000 or more simultaneous conversations. Instances/pods should be added and removed automatically to save costs. For now this means online dedicated servers or cloud only; later, a hybrid of on-premises GPU servers plus online servers.

Currently, additional information about our users is stored only in PostgreSQL. We want to give users the option to talk with the RAG data and the LLM model. The system should also count usage and store in our database when each conversation started and finished. If there is a better solution for talking with the data, I am open to it.

In the future, I would like to add voice: sending audio to this server and getting audio back, in addition to text. Please share the price and timeline for text only and for text + voice, including full support and documentation.
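A minimal sketch of how the usage-tracking and autoscaling requirements could fit together, assuming Python with the standard-library `sqlite3` module standing in for the existing PostgreSQL database. The table `conversations`, its columns (including `tokens_used` as the usage measure), the helper functions, and the capacity figures (50 live conversations per pod, 1 to 40 pods) are all hypothetical illustrations, not part of the request.

```python
import math
import sqlite3
from datetime import datetime, timezone

# Assumption: sqlite3 stands in for the client's PostgreSQL database.
# Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     INTEGER NOT NULL,
    started_at  TEXT    NOT NULL,
    finished_at TEXT,
    tokens_used INTEGER NOT NULL DEFAULT 0
)
"""


def now_iso() -> str:
    """Current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


def start_conversation(db: sqlite3.Connection, user_id: int) -> int:
    """Record when a conversation starts and return its id."""
    cur = db.execute(
        "INSERT INTO conversations (user_id, started_at) VALUES (?, ?)",
        (user_id, now_iso()),
    )
    db.commit()
    return cur.lastrowid


def add_usage(db: sqlite3.Connection, conversation_id: int, tokens: int) -> None:
    """Accumulate usage (hypothetically measured in tokens) for a conversation."""
    db.execute(
        "UPDATE conversations SET tokens_used = tokens_used + ? WHERE id = ?",
        (tokens, conversation_id),
    )
    db.commit()


def finish_conversation(db: sqlite3.Connection, conversation_id: int) -> None:
    """Record when a conversation finishes; a second call has no effect."""
    db.execute(
        "UPDATE conversations SET finished_at = ? "
        "WHERE id = ? AND finished_at IS NULL",
        (now_iso(), conversation_id),
    )
    db.commit()


def active_conversations(db: sqlite3.Connection) -> int:
    """Count conversations that have started but not finished."""
    (count,) = db.execute(
        "SELECT COUNT(*) FROM conversations WHERE finished_at IS NULL"
    ).fetchone()
    return count


def desired_replicas(
    active: int, per_replica: int = 50, min_replicas: int = 1, max_replicas: int = 40
) -> int:
    """Pods needed so that each serves at most `per_replica` live conversations,
    clamped to [min_replicas, max_replicas]."""
    needed = math.ceil(active / per_replica)
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    ids = [start_conversation(db, user_id=u) for u in range(120)]
    add_usage(db, ids[0], 350)
    finish_conversation(db, ids[0])
    print(active_conversations(db))                    # 119
    print(desired_replicas(active_conversations(db)))  # 3
    print(desired_replicas(1000))                      # 20
```

In a Kubernetes deployment, the scaling decision itself would more typically be delegated to a Horizontal Pod Autoscaler driven by a custom metric such as active conversations; `desired_replicas` only illustrates the arithmetic. Conversations that are never explicitly closed would also need a timeout so they stop counting as active.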
↗ View on Freelancer