Project Description
I need a solution for an LLM (selected by the client) + RAG, deployed on our own server (recommended by the freelancer), with automatic scaling to 1,000 or more simultaneous conversations. Instances/pods should be added and removed automatically to save costs. For now this means only online dedicated servers/clouds; later it will be a hybrid of on-premises GPU servers and online servers.
Currently, additional information about users is stored only in PostgreSQL. We want to give users the option to talk with the RAG data and the LLM.
The system should also count usage and record in our database when each conversation started and finished.
If you can recommend a better solution for talking with the data, I am open to it.
In the future I would like to add the ability to send voice to this server and receive voice back (in addition to text).
Please share the price and timeline for two options, text only and text + voice, each including full support and documentation.