Install Steam
login
|
language
简体中文 (Simplified Chinese)
繁體中文 (Traditional Chinese)
日本語 (Japanese)
한국어 (Korean)
ไทย (Thai)
Български (Bulgarian)
Čeština (Czech)
Dansk (Danish)
Deutsch (German)
Español - España (Spanish - Spain)
Español - Latinoamérica (Spanish - Latin America)
Ελληνικά (Greek)
Français (French)
Italiano (Italian)
Bahasa Indonesia (Indonesian)
Magyar (Hungarian)
Nederlands (Dutch)
Norsk (Norwegian)
Polski (Polish)
Português (Portuguese - Portugal)
Português - Brasil (Portuguese - Brazil)
Română (Romanian)
Русский (Russian)
Suomi (Finnish)
Svenska (Swedish)
Türkçe (Turkish)
Tiếng Việt (Vietnamese)
Українська (Ukrainian)
Report a translation problem








I beg of you
And models/libraries have advanced a lot as well.
Exllama2 is nuts for speed, if you can load the whole model in VRAM it can answer in less than 5 seconds (but requieres modern nvidia gpus)
Llamacpp is a bit slower but can work anywhere (even without gpu but it will be slow) and use some system memory if vram is not enough.
Models deppends on what are you searching for, i have found out that nowadays there are models that can have a nice conversation with the user but they still narrate things and break sometimes.
If anyone of you is still really interested, send me a message anywhere and we can try to make it real
Does ExLLaMa have a server? I tried to move from llama.cpp to exl as soon as got powerful enough gpu but couldnt find it
https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API
ZERO cost. You can run one instance almost everywhere.