List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

Mistral Small

By Mistral AI

With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between (Mistral NeMo 12B)[/mistralai/mistral-nemo] and (Mistral Large 2)[/mistralai/mistral-large], providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish.

Release Date

10 Jan 2024

Context Size

32K

Mistral Medium

By Mistral AI

This is Mistral AI's closed-source, medium-sided model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it compares with many of the flagship models of other companies.

Release Date

10 Jan 2024

Context Size

32K

Bagel 34B v0.2

By Jon Durbin

An experimental fine-tune of [Yi 34b 200k](/models/01-ai/yi-34b-200k) using [bagel](https://github.com/jondurbin/bagel). This is the version of the fine-tune before direct preference optimization (DPO) has been applied. DPO performs better on benchmarks, but this version is likely better for creative writing, roleplay, etc.

Release Date

05 Jan 2024

Context Size

200K

Nous: Hermes 2 Yi 34B

By Nous Research

Nous Hermes 2 Yi 34B was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high quality data from open datasets across the AI landscape. Nous-Hermes 2 on Yi 34B outperforms all Nous-Hermes & Open-Hermes models of the past, achieving new heights in all benchmarks for a Nous Research LLM as well as surpassing many popular finetunes.

Release Date

02 Jan 2024

Context Size

4.10K

Noromaid Mixtral 8x7B Instruct

By NeverSleep

This model was trained for 8h(v1) + 8h(v2) + 12h(v3) on customized modified datasets, focusing on RP, uncensoring, and a modified version of the Alpaca prompting (that was already used in LimaRP), which should be at the same conversational level as ChatLM or Llama2-Chat without adding any additional special tokens.

Release Date

02 Jan 2024

Context Size

Mistral: Mistral 7B Instruct v0.2

By Mistral AI

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of [Mistral 7B Instruct](/modelsmistralai/mistral-7b-instruct-v0.1), with the following changes: - 32k context window (vs 8k context in v0.1) - Rope-theta = 1e6 - No Sliding-Window Attention

Release Date

28 Dec 2023

Context Size

32.77K

Dolphin 2.6 Mixtral 8x7B 🐬

By Cognitive Computations

This is a 16k context fine-tune of [Mixtral-8x7b](/models/mistralai/mixtral-8x7b). It excels in coding tasks due to extensive training with coding data and is known for its obedience, although it lacks DPO tuning. The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at [erichartford.com/uncensored-models](https://erichartford.com/uncensored-models). #moe #uncensored

Release Date

21 Dec 2023

Context Size

32.77K

RWKV v5 World 3B

By RWKV

[RWKV](https://wiki.rwkv.com) is an RNN (recurrent neural network) with transformer-level performance. It aims to combine the best of RNNs and transformers - great performance, fast inference, low VRAM, fast training, "infinite" context length, and free sentence embedding. RWKV-5 is trained on 100+ world languages (70% English, 15% multilang, 15% code). RWKV 3B models are provided for free, by Recursal.AI, for the beta period. More details [here](https://substack.recursal.ai/p/public-rwkv-3b-model-via-openrouter). #rnn

Release Date

10 Dec 2023

Context Size

10K

RWKV v5 3B AI Town

By recursal

This is an [RWKV 3B model](/models/rwkv/rwkv-5-world-3b) finetuned specifically for the [AI Town](https://github.com/a16z-infra/ai-town) project. [RWKV](https://wiki.rwkv.com) is an RNN (recurrent neural network) with transformer-level performance. It aims to combine the best of RNNs and transformers - great performance, fast inference, low VRAM, fast training, "infinite" context length, and free sentence embedding. RWKV 3B models are provided for free, by Recursal.AI, for the beta period. More details [here](https://substack.recursal.ai/p/public-rwkv-3b-model-via-openrouter). #rnn

Release Date

10 Dec 2023

Context Size

10K

Mistral: Mixtral 8x7B Instruct

By Mistral AI

Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe

Release Date

10 Dec 2023

Context Size

32.77K

StripedHyena Nous 7B

By Together

This is the chat model variant of the [StripedHyena series](/models?q=stripedhyena) developed by Together in collaboration with Nous Research. StripedHyena uses a new architecture that competes with traditional Transformers, particularly in long-context data processing. It combines attention mechanisms with gated convolutions for improved speed, efficiency, and scaling. This model marks a significant advancement in AI architecture for sequence modeling tasks.

Release Date

09 Dec 2023

Context Size

32.77K

StripedHyena Hessian 7B (base)

By Together

This is the base model variant of the [StripedHyena series](/models?q=stripedhyena), developed by Together. StripedHyena uses a new architecture that competes with traditional Transformers, particularly in long-context data processing. It combines attention mechanisms with gated convolutions for improved speed, efficiency, and scaling. This model marks an advancement in AI architecture for sequence modeling tasks.

Release Date

09 Dec 2023

Context Size

32.77K

Psyfighter v2 13B

By KoboldAI

The v2 of [Psyfighter](/models/jebcarter/psyfighter-13b) - a merged model created by the KoboldAI community members Jeb Carter and TwistedShadows, made possible thanks to the KoboldAI merge request service. The intent was to add medical data to supplement the model's fictional ability with more details on anatomy and mental states. This model should not be used for medical advice or therapy because of its high likelihood of pulling in fictional data. It's a merge between: - [KoboldAI/LLaMA2-13B-Tiefighter](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter) - [Doctor-Shotgun/cat-v1.0-13b](https://huggingface.co/Doctor-Shotgun/cat-v1.0-13b) - [Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged](https://huggingface.co/Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged). #merge

Release Date

08 Dec 2023

Context Size

4.10K

Nous: Hermes 2 Vision 7B (alpha)

By Nous Research

This vision-language model builds on innovations from the popular [OpenHermes-2.5](/models/teknium/openhermes-2.5-mistral-7b) model, by Teknium. It adds vision support, and is trained on a custom dataset enriched with function calling This project is led by [qnguyen3](https://twitter.com/stablequan) and [teknium](https://twitter.com/Teknium1). #multimodal

Release Date

07 Dec 2023

Context Size

4.10K

MythoMist 7B

By Gryphe

From the creator of [MythoMax](/models/gryphe/mythomax-l2-13b), merges a suite of models to reduce word anticipation, ministrations, and other undesirable words in ChatGPT roleplaying data. It combines [Neural Chat 7B](/models/intel/neural-chat-7b), Airoboros 7b, [Toppy M 7B](/models/undi95/toppy-m-7b), [Zepher 7b beta](/models/huggingfaceh4/zephyr-7b-beta), [Nous Capybara 34B](/models/nousresearch/nous-capybara-34b), [OpenHeremes 2.5](/models/teknium/openhermes-2.5-mistral-7b), and many others. #merge

Release Date

07 Dec 2023

Context Size

32.77K

Yi 6B (base)

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This is the base 6B parameter model.

Release Date

07 Dec 2023

Context Size

4.10K

Yi 34B Chat

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This 34B parameter model has been instruct-tuned for chat.

Release Date

07 Dec 2023

Context Size

4.10K

Yi 34B (base)

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This is the base 34B parameter model.

Release Date

07 Dec 2023

Context Size

4.10K

Cinematika 7B (alpha)

By OpenRouter

This model is under development. Check the [OpenRouter Discord](https://discord.gg/fVyRaUDgxW) for updates.

Release Date

06 Dec 2023

Context Size

Nous: Capybara 7B

By Nous Research

The Capybara series is a collection of datasets and models made by fine-tuning on data created by Nous, mostly in-house. V1.9 uses unalignment techniques for more consistent and dynamic control. It also leverages a significantly better foundation model, [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1).

Release Date

05 Dec 2023

Context Size

8.19K

Psyfighter 13B

By Jeb Carter

A merge model based on [Llama-2-13B](/models/meta-llama/llama-2-13b-chat) and made possible thanks to the compute provided by the KoboldAI community. It's a merge between: - [KoboldAI/LLaMA2-13B-Tiefighter](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter) - [chaoyi-wu/MedLLaMA_13B](https://huggingface.co/chaoyi-wu/MedLLaMA_13B) - [Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged](https://huggingface.co/Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged). #merge

Release Date

29 Nov 2023

Context Size

4.10K

OpenChat 3.5 7B

By OpenChat

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels. - For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/models/openchat/openchat-7b). - For OpenChat fine-tuned on Llama 8B, check out [OpenChat 8B](/models/openchat/openchat-8b). #open-source

Release Date

28 Nov 2023

Context Size

8.19K

Noromaid 20B

By NeverSleep

A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge. #merge #uncensored

Release Date

26 Nov 2023

Context Size

8.19K

Neural Chat 7B v3.1

By Intel

A fine-tuned model based on [mistralai/Mistral-7B-v0.1](/models/mistralai/mistral-7b-instruct-v0.1) on the open source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca), aligned with DPO algorithm. For more details, refer to the blog: [The Practice of Supervised Fine-tuning and Direct Preference Optimization on Habana Gaudi2](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).

Release Date

25 Nov 2023

Context Size

4.10K

Anthropic: Claude v2

By Anthropic

Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.

Release Date

22 Nov 2023

Context Size

200K

Anthropic: Claude v2.1

By Anthropic

Release Date

22 Nov 2023

Context Size

200K

Anthropic: Claude Instant v1.1

By Anthropic

Anthropic's model for low-latency, high throughput text generation. Supports hundreds of pages of text.

Release Date

22 Nov 2023

Context Size

100K

OpenHermes 2.5 Mistral 7B

By Teknium

A continuation of [OpenHermes 2 model](/models/teknium/openhermes-2-mistral-7b), trained on additional code datasets. Potentially the most interesting finding from training on a good ratio (est. of around 7-14% of the total dataset) of code instruction was that it has boosted several non-code benchmarks, including TruthfulQA, AGIEval, and GPT4All suite. It did however reduce BigBench benchmark score, but the net gain overall is significant.

Release Date

20 Nov 2023

Context Size

4.10K

LLaVA 13B

By Haotian Liu

LLaVA is a large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities and setting a new state-of-the-art accuracy on Science QA. #multimodal

Release Date

16 Nov 2023

Context Size

2.05K

Nous: Capybara 34B

By Nous Research

This model is trained on the Yi-34B model for 3 epochs on the Capybara dataset. It's the first 34B Nous model and first 200K context length Nous model.

Release Date

15 Nov 2023

Context Size

200K

Showing page 24 of 26 with 762 models total