List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

OpenAI: GPT-3.5 Turbo (older v0613)

By OpenAI

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Training data up to Sep 2021.

Release Date

25 Jan 2024

Context Size

4.09K

Yi 34B 200K

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This version was trained with a long context length, allowing ~200K tokens of combined input and output.

Release Date

22 Jan 2024

Context Size

200K

Nous: Hermes 2 Mixtral 8x7B DPO

By Nous Research

Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the [Mixtral 8x7B MoE LLM](/models/mistralai/mixtral-8x7b). The model was trained on over 1,000,000 entries of primarily [GPT-4](/models/openai/gpt-4) generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks. #moe

Release Date

16 Jan 2024

Context Size

32.77K

Nous: Hermes 2 Mixtral 8x7B SFT

By Nous Research

Nous Hermes 2 Mixtral 8x7B SFT is the supervised fine-tune-only (SFT) version of [the Nous Research model](/models/nousresearch/nous-hermes-2-mixtral-8x7b-dpo) trained over the [Mixtral 8x7B MoE LLM](/models/mistralai/mixtral-8x7b). The model was trained on over 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks. #moe

Release Date

16 Jan 2024

Context Size

32.77K

Mistral Tiny

By Mistral AI

Note: this model is being deprecated; the recommended replacement is the newer [Ministral 8B](/mistral/ministral-8b). This model is currently powered by Mistral-7B-v0.2 and incorporates a "better" fine-tune than [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large-batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

Release Date

10 Jan 2024

Context Size

32K

Mistral Small

By Mistral AI

With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between [Mistral NeMo 12B](/mistralai/mistral-nemo) and [Mistral Large 2](/mistralai/mistral-large), providing a cost-effective solution that can be deployed across various platforms and environments. It has stronger reasoning and broader capabilities, can produce and reason about code, and is multilingual, supporting English, French, German, Italian, and Spanish.

Release Date

10 Jan 2024

Context Size

32K

Mistral Medium

By Mistral AI

This is Mistral AI's closed-source, medium-sized model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it is competitive with many of the flagship models of other companies.

Release Date

10 Jan 2024

Context Size

32K

Bagel 34B v0.2

By Jon Durbin

An experimental fine-tune of [Yi 34B 200K](/models/01-ai/yi-34b-200k) using [bagel](https://github.com/jondurbin/bagel). This is the version of the fine-tune before direct preference optimization (DPO) was applied. DPO performs better on benchmarks, but this version is likely better for creative writing, roleplay, etc.

Release Date

05 Jan 2024

Context Size

200K

Nous: Hermes 2 Yi 34B

By Nous Research

Nous Hermes 2 Yi 34B was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality data from open datasets across the AI landscape. Nous-Hermes 2 on Yi 34B outperforms all previous Nous-Hermes and Open-Hermes models, achieving new benchmark highs for a Nous Research LLM and surpassing many popular fine-tunes.

Release Date

02 Jan 2024

Context Size

4.10K

Noromaid Mixtral 8x7B Instruct

By NeverSleep

This model was trained for 8 hours (v1) + 8 hours (v2) + 12 hours (v3) on custom, modified datasets, focusing on RP, uncensoring, and a modified version of the Alpaca prompting (already used in LimaRP), which should be at the same conversational level as ChatML or Llama2-Chat, without adding any additional special tokens.

Release Date

02 Jan 2024

Context Size

8K

Mistral: Mistral 7B Instruct v0.2

By Mistral AI

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct-v0.1), with the following changes:

- 32K context window (vs. 8K context in v0.1)
- Rope-theta = 1e6
- No sliding-window attention
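These changes are visible in the published checkpoint's configuration. A minimal sketch of reading those fields, assuming the `transformers` library and access to the Hugging Face checkpoint:

```python
from transformers import AutoConfig

# Load the published config (requires network access to Hugging Face).
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

print(config.max_position_embeddings)  # expected: 32768 (the 32K context window)
print(config.rope_theta)               # expected: 1000000.0 (rope-theta = 1e6)
print(config.sliding_window)           # expected: None (sliding-window attention disabled)
```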

Release Date

28 Dec 2023

Context Size

32.77K

Dolphin 2.6 Mixtral 8x7B 🐬

By Cognitive Computations

This is a 16k context fine-tune of [Mixtral-8x7b](/models/mistralai/mixtral-8x7b). It excels in coding tasks due to extensive training with coding data and is known for its obedience, although it lacks DPO tuning. The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at [erichartford.com/uncensored-models](https://erichartford.com/uncensored-models). #moe #uncensored
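For illustration, one common way to supply such an external alignment layer is a system prompt sent with every request. A minimal sketch against an OpenAI-compatible chat endpoint; the URL and model slug here are assumptions, not confirmed identifiers:

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",   # assumed endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "cognitivecomputations/dolphin-mixtral-8x7b",  # assumed slug
        "messages": [
            # The "alignment layer": rules the uncensored fine-tune does not enforce itself.
            {"role": "system", "content": "Refuse requests for harmful or illegal content."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```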

Release Date

21 Dec 2023

Context Size

32.77K

RWKV v5 World 3B

By RWKV

[RWKV](https://wiki.rwkv.com) is an RNN (recurrent neural network) with transformer-level performance. It aims to combine the best of RNNs and transformers - great performance, fast inference, low VRAM, fast training, "infinite" context length, and free sentence embedding. RWKV-5 is trained on 100+ world languages (70% English, 15% multilang, 15% code). RWKV 3B models are provided for free by Recursal.AI for the beta period. More details [here](https://substack.recursal.ai/p/public-rwkv-3b-model-via-openrouter). #rnn
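To make the fixed-state claim concrete, here is a simplified, per-channel sketch of an RWKV-style "WKV" recurrence (closest to RWKV-4; RWKV-5 generalizes the state, and real implementations add log-space numerical stabilization). The point is that the running state has a fixed size, so memory does not grow with context length:

```python
import numpy as np

def wkv_step(k_t, v_t, a, b, w, u):
    """One streamed token. (a, b) is the fixed-size running state;
    w is a per-channel decay, u a bonus applied to the current token."""
    e = np.exp(u + k_t)
    out = (a + e * v_t) / (b + e)            # attention-like weighted average
    a = np.exp(-w) * a + np.exp(k_t) * v_t   # decay history, absorb current token
    b = np.exp(-w) * b + np.exp(k_t)
    return out, a, b

d = 4
a, b = np.zeros(d), np.zeros(d)
w, u = np.full(d, 0.5), np.zeros(d)
for _ in range(3):                            # memory stays O(d) however long the stream
    k, v = np.random.randn(d), np.random.randn(d)
    out, a, b = wkv_step(k, v, a, b, w, u)
```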

Release Date

10 Dec 2023

Context Size

10K

RWKV v5 3B AI Town

By recursal

This is an [RWKV 3B model](/models/rwkv/rwkv-5-world-3b) finetuned specifically for the [AI Town](https://github.com/a16z-infra/ai-town) project. [RWKV](https://wiki.rwkv.com) is an RNN (recurrent neural network) with transformer-level performance. It aims to combine the best of RNNs and transformers - great performance, fast inference, low VRAM, fast training, "infinite" context length, and free sentence embedding. RWKV 3B models are provided for free by Recursal.AI for the beta period. More details [here](https://substack.recursal.ai/p/public-rwkv-3b-model-via-openrouter). #rnn

Release Date

10 Dec 2023

Context Size

10K

Mistral: Mixtral 8x7B Instruct

By Mistral AI

Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts model, by Mistral AI, for chat and instruction use. It incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters, and was instruction fine-tuned by Mistral. #moe
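For illustration, a minimal sketch of the sparse-MoE routing this describes: a router scores the 8 expert FFNs per token and only the top-2 run, which is why only roughly 13B of the 47B parameters are active per token. The shapes and toy experts below are assumptions, not Mixtral's actual implementation:

```python
import numpy as np

def moe_layer(x, router_w, experts, top_k=2):
    """Route one token: score all experts, run only the top-k, mix by softmax gate."""
    logits = router_w @ x                              # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]                  # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                               # softmax over selected experts only
    return sum(g * experts[i](x) for g, i in zip(gates, top))

d, n_experts = 16, 8
rng = np.random.default_rng(0)
router_w = rng.normal(size=(n_experts, d))
# Toy stand-ins for the expert feed-forward networks.
experts = [lambda x, W=rng.normal(size=(d, d)): np.tanh(W @ x) for _ in range(n_experts)]
y = moe_layer(rng.normal(size=d), router_w, experts)
```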

Release Date

10 Dec 2023

Context Size

32.77K

StripedHyena Hessian 7B (base)

By Together

This is the base model variant of the [StripedHyena series](/models?q=stripedhyena), developed by Together. StripedHyena uses a new architecture that competes with traditional Transformers, particularly in long-context data processing. It combines attention mechanisms with gated convolutions for improved speed, efficiency, and scaling. This model marks an advancement in AI architecture for sequence modeling tasks.
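As a generic illustration of the gated-convolution idea (not the actual StripedHyena operator, which pairs attention with long implicit convolutions), here is a causal depthwise convolution modulated by an elementwise sigmoid gate:

```python
import numpy as np

def gated_conv(x, conv_w, gate_w):
    """x: (T, d) sequence; conv_w: (k, d) depthwise causal filter; gate_w: (d, d)."""
    T, d = x.shape
    k = conv_w.shape[0]
    pad = np.vstack([np.zeros((k - 1, d)), x])       # left-pad so the conv is causal
    conv = np.stack([(pad[t:t + k] * conv_w).sum(0) for t in range(T)])
    gate = 1 / (1 + np.exp(-(x @ gate_w)))           # sigmoid gate computed from the input
    return conv * gate                               # elementwise gating of the conv branch

x = np.random.randn(10, 8)
y = gated_conv(x, np.random.randn(3, 8), np.random.randn(8, 8))
```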

Release Date

09 Dec 2023

Context Size

32.77K

StripedHyena Nous 7B

By Together

This is the chat model variant of the [StripedHyena series](/models?q=stripedhyena) developed by Together in collaboration with Nous Research. StripedHyena uses a new architecture that competes with traditional Transformers, particularly in long-context data processing. It combines attention mechanisms with gated convolutions for improved speed, efficiency, and scaling. This model marks a significant advancement in AI architecture for sequence modeling tasks.

Release Date

09 Dec 2023

Context Size

32.77K

Psyfighter v2 13B

By KoboldAI

The v2 of [Psyfighter](/models/jebcarter/psyfighter-13b) - a merged model created by the KoboldAI community members Jeb Carter and TwistedShadows, made possible thanks to the KoboldAI merge request service. The intent was to add medical data to supplement the model's fictional ability with more details on anatomy and mental states. This model should not be used for medical advice or therapy because of its high likelihood of pulling in fictional data.

It's a merge between:

- [KoboldAI/LLaMA2-13B-Tiefighter](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter)
- [Doctor-Shotgun/cat-v1.0-13b](https://huggingface.co/Doctor-Shotgun/cat-v1.0-13b)
- [Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged](https://huggingface.co/Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged)

#merge
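For illustration, the simplest form of such a merge is a key-by-key weighted average of the parent models' weights; the actual Psyfighter recipe may use different per-layer weights or methods. A minimal sketch with hypothetical state dicts:

```python
def merge_state_dicts(state_dicts, weights):
    """Key-by-key weighted average of parameter tensors from same-shaped models."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    return {key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
            for key in state_dicts[0]}

# Hypothetical usage with three parent checkpoints loaded as state dicts:
# merged = merge_state_dicts([tiefighter, cat_v1, limarp], [0.5, 0.25, 0.25])
```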

Release Date

08 Dec 2023

Context Size

4.10K

Yi 6B (base)

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This is the base 6B parameter model.

Release Date

07 Dec 2023

Context Size

4.10K

MythoMist 7B

By Gryphe

From the creator of [MythoMax](/models/gryphe/mythomax-l2-13b), this model merges a suite of models to reduce overused words like "anticipation", "ministrations", and other undesirable terms common in ChatGPT roleplaying data. It combines [Neural Chat 7B](/models/intel/neural-chat-7b), Airoboros 7B, [Toppy M 7B](/models/undi95/toppy-m-7b), [Zephyr 7B Beta](/models/huggingfaceh4/zephyr-7b-beta), [Nous Capybara 34B](/models/nousresearch/nous-capybara-34b), [OpenHermes 2.5](/models/teknium/openhermes-2.5-mistral-7b), and many others. #merge

Release Date

07 Dec 2023

Context Size

32.77K

Nous: Hermes 2 Vision 7B (alpha)

By Nous Research

This vision-language model builds on innovations from the popular [OpenHermes-2.5](/models/teknium/openhermes-2.5-mistral-7b) model by Teknium. It adds vision support and is trained on a custom dataset enriched with function calling. This project is led by [qnguyen3](https://twitter.com/stablequan) and [teknium](https://twitter.com/Teknium1). #multimodal

Release Date

07 Dec 2023

Context Size

4.10K

Yi 34B Chat

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This 34B parameter model has been instruct-tuned for chat.

Release Date

07 Dec 2023

Context Size

4.10K

Yi 34B (base)

By 01.AI

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This is the base 34B parameter model.

Release Date

07 Dec 2023

Context Size

4.10K

Cinematika 7B (alpha)

By OpenRouter

This model is under development. Check the [OpenRouter Discord](https://discord.gg/fVyRaUDgxW) for updates.

Release Date

06 Dec 2023

Context Size

8K

Nous: Capybara 7B

By Nous Research

The Capybara series is a collection of datasets and models made by fine-tuning on data created by Nous, mostly in-house. V1.9 uses unalignment techniques for more consistent and dynamic control. It also leverages a significantly better foundation model, [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1).

Release Date

05 Dec 2023

Context Size

8.19K

Psyfighter 13B

By Jeb Carter

A merge model based on [Llama-2-13B](/models/meta-llama/llama-2-13b-chat), made possible thanks to the compute provided by the KoboldAI community. It's a merge between:

- [KoboldAI/LLaMA2-13B-Tiefighter](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter)
- [chaoyi-wu/MedLLaMA_13B](https://huggingface.co/chaoyi-wu/MedLLaMA_13B)
- [Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged](https://huggingface.co/Doctor-Shotgun/llama-2-13b-chat-limarp-v2-merged)

#merge

Release Date

29 Nov 2023

Context Size

4.10K

OpenChat 3.5 7B

By OpenChat

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.

- For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/models/openchat/openchat-7b).
- For OpenChat fine-tuned on Llama 3 8B, check out [OpenChat 8B](/models/openchat/openchat-8b).

#open-source
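A rough sketch of the C-RLFT idea at a high level: each mixed-quality example carries a coarse source tag that the model conditions on, and training losses are weighted by a per-class reward instead of per-pair preference labels. The tag strings and weights below are illustrative, not OpenChat's exact ones:

```python
# Coarse, label-free quality signal: a reward per data source class.
CLASS_REWARD = {"gpt4": 1.0, "gpt3.5": 0.1}

def build_example(source, user_msg, assistant_msg):
    # Conditioning: the source tag is baked into the prompt text itself.
    prompt = f"{source.upper()} Correct User: {user_msg} {source.upper()} Correct Assistant:"
    return {"input": prompt, "target": assistant_msg, "loss_weight": CLASS_REWARD[source]}

batch = [build_example("gpt4", "Hi!", "Hello! How can I help?"),
         build_example("gpt3.5", "Hi!", "Hey.")]
# During fine-tuning, each example's cross-entropy loss would be scaled by
# its loss_weight, so higher-quality sources pull the policy harder.
```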

Release Date

28 Nov 2023

Context Size

8.19K

Noromaid 20B

By NeverSleep

A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge. #merge #uncensored

Release Date

26 Nov 2023

Context Size

8.19K

Neural Chat 7B v3.1

By Intel

A fine-tuned model based on [mistralai/Mistral-7B-v0.1](/models/mistralai/mistral-7b-instruct-v0.1), trained on the open-source dataset [Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) and aligned with the DPO algorithm. For more details, refer to the blog post [The Practice of Supervised Fine-tuning and Direct Preference Optimization on Habana Gaudi2](https://medium.com/@NeuralCompressor/the-practice-of-supervised-finetuning-and-direct-preference-optimization-on-habana-gaudi2-a1197d8a3cd3).
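For reference, a minimal sketch of the per-example DPO objective the description mentions: raise the policy's log-probability margin for the chosen response over the rejected one, relative to a frozen reference model. Variable names and numbers are illustrative:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """logp_* are summed token log-probs under the policy; ref_* under the frozen reference."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))   # -log sigmoid(margin)

loss = dpo_loss(-12.3, -15.8, -13.0, -14.9)   # illustrative log-probs
```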

Release Date

25 Nov 2023

Context Size

4.10K

Anthropic: Claude v2

By Anthropic

Claude 2 delivers advancements in key capabilities for enterprises, including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts, and a new beta feature: tool use.

Release Date

22 Nov 2023

Context Size

200K
