List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

NeverSleep: Llama 3 Lumimaid 8B

By NeverSleep

The NeverSleep team is back with a Llama 3 8B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary. To enhance its overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Release Date

04 May 2024

Context Size

24.58K

Snowflake: Arctic Instruct

By Snowflake

Arctic is a dense-MoE hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research team. Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating. To read more about this model's release, [click here](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/).
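The quoted totals are self-consistent; here is a quick back-of-the-envelope check, using only the rounded figures from the description above:

```python
# Back-of-the-envelope check of Arctic's quoted parameter counts,
# using the rounded figures from the description above.
dense = 10e9                          # 10B dense transformer trunk
n_experts, expert_size = 128, 3.66e9  # residual 128x3.66B MoE MLP
top_k = 2                             # top-2 gating

total = dense + n_experts * expert_size   # ~478.5B, quoted as 480B total
active = dense + top_k * expert_size      # ~17.3B, quoted as 17B active
print(f"total ~ {total/1e9:.0f}B, active ~ {active/1e9:.1f}B")
```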

Release Date

30 Apr 2024

Context Size

4.10K

Fireworks: FireLLaVA 13B

By Fireworks AI

A blazing-fast vision-language model, FireLLaVA quickly understands both text and images. It achieves strong results on chat evaluations and was designed to mimic multimodal GPT-4. It is the first commercially permissive open-source LLaVA model, trained entirely on instruction-following data generated by open-source LLMs.

Release Date

26 Apr 2024

Context Size

4.10K

Lynn: Llama 3 Soliloquy 8B v2

By Lynn

Soliloquy-L3 v2 is a fast, highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base, rich literary expression, and support for up to 24k context length. It outperforms existing ~13B models, delivering enhanced roleplaying capabilities. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Release Date

22 Apr 2024

Context Size

24.58K

Fimbulvetr 11B v2

By Sao10K

Creative writing model, routed with permission. It's fast, it keeps the conversation going, and it stays in character. If you submit a raw prompt, you can use Alpaca or Vicuna formats.

Release Date

21 Apr 2024

Context Size

8.19K

Meta: Llama 3 8B Instruct

By Meta Llama

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Release Date

18 Apr 2024

Context Size

8.19K

Meta: Llama 3 70B Instruct

By Meta Llama

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Release Date

18 Apr 2024

Context Size

8.19K

Mistral: Mixtral 8x22B Instruct

By Mistral AI

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:

- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish

See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe
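As a rough illustration of how a catalog entry like this maps to a request, here is a minimal sketch of querying the model through an OpenAI-compatible chat completions endpoint; the base URL and model slug below are assumptions inferred from this listing's routing links, not confirmed values:

```python
# Minimal sketch: calling Mixtral 8x22B Instruct via an OpenAI-compatible
# gateway. The base_url and model slug are assumptions, not confirmed here.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # assumed gateway endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="mistralai/mixtral-8x22b-instruct",  # assumed slug, mirrors the /models link
    messages=[{"role": "user", "content": "Summarize the Mixtral 8x22B architecture."}],
)
print(resp.choices[0].message.content)
```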

Release Date

17 Apr 2024

Context Size

65.54K

WizardLM-2 8x22B

By Microsoft

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe

Release Date

16 Apr 2024

Context Size

65.53K

WizardLM-2 7B

By Microsoft

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest, and it achieves performance comparable to leading open-source models 10x its size. It is a finetune of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct), using the same technique as [WizardLM-2 8x22B](/models/microsoft/wizardlm-2-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).

Release Date

16 Apr 2024

Context Size

32K

Zephyr 141B-A35B

By Hugging Face H4

Zephyr 141B-A35B is a Mixture of Experts (MoE) model with 141B total parameters and 35B active parameters. It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b), fine-tuned on a mix of publicly available synthetic datasets. #moe

Release Date

12 Apr 2024

Context Size

65.54K

Mistral: Mixtral 8x22B (base)

By Mistral AI

Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts of 22 billion parameters each, with each token routed to 2 experts at a time. It was released via [X](https://twitter.com/MistralAI/status/1777869263778291896). #moe
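For intuition, here is a simplified, hypothetical sketch of top-2 expert routing in a Mixtral-style MoE layer (toy dimensions, linear "experts"; not the actual implementation):

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Route a token vector x through the top-k experts, weighted by a softmax."""
    scores = x @ router_w                              # one routing logit per expert
    top = np.argsort(scores)[-k:]                      # indices of the top-k experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                           # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8                                   # 8 experts, as in Mixtral 8x22B
experts = [lambda x, W=rng.standard_normal((d, d)) / d: x @ W  # toy linear experts
           for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts))
y = moe_forward(rng.standard_normal(d), router_w, experts)
print(y.shape)                                         # (16,), same width as the input
```

Only the two selected experts run per token, which is why active parameters (and compute) stay far below the total parameter count.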

Release Date

10 Apr 2024

Context Size

65.54K

OpenAI: GPT-4 Turbo

By OpenAI

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.
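As a sketch of the features named above, here is a vision request combined with JSON mode via the OpenAI Python client; the image URL is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode, now usable with vision
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image as JSON with keys 'objects' and 'scene'."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
        ],
    }],
)
print(resp.choices[0].message.content)  # a JSON string
```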

Release Date

09 Apr 2024

Context Size

128K

Google: Gemini 1.5 Pro

By Google

Google's latest multimodal model, supports image and video[0] in text or chat prompts. Optimized for language tasks including:

- Code generation
- Text generation
- Text editing
- Problem solving
- Recommendations
- Information extraction
- Data extraction or generation
- AI agents

Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

[0]: Video input is not available through OpenRouter at this time.

Release Date

09 Apr 2024

Context Size

2M

Cohere: Command R+

By Cohere

Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG). It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).

Release Date

04 Apr 2024

Context Size

128K

Cohere: Command R+ (04-2024)

By Cohere

Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG). It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).

Release Date

02 Apr 2024

Context Size

128K

Databricks: DBRX 132B Instruct

By Databricks

DBRX is a new open-source large language model developed by Databricks. At 132B total parameters, it outperforms existing open-source LLMs like Llama 2 70B and [Mixtral-8x7b](/models/mistralai/mixtral-8x7b) on standard industry benchmarks for language understanding, programming, math, and logic. It uses a fine-grained mixture-of-experts (MoE) architecture, with 36B parameters active on any input, and was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. See the launch announcement and benchmark results [here](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm). #moe
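To make "fine-grained" concrete, a tiny combinatorial sketch: Mixtral's 8-choose-2 routing is stated in this listing, while the 16-choose-4 layout for DBRX is an assumed figure used here for illustration:

```python
from math import comb

# Hypothetical comparison of expert granularity. Mixtral's 8-choose-2 routing
# comes from this listing; DBRX's 16-choose-4 is an assumed figure for the sketch.
layouts = {
    "Mixtral 8x7B (coarse)": (8, 2),
    "DBRX (fine-grained)": (16, 4),
}
for name, (n_experts, n_active) in layouts.items():
    # More, smaller experts multiply the router's possible expert combinations.
    print(f"{name}: {comb(n_experts, n_active)} possible expert subsets per token")
```

More, smaller experts give the router many more ways to specialize per token at a similar active-parameter budget.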

Release Date

29 Mar 2024

Context Size

32.77K

Midnight Rose 70B

By Sophosympatheia

A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It tends to produce lengthy output by default and is sophosympatheia's best creative-writing merge so far. Descending from earlier versions of Midnight Rose and [Wizard Tulu Dolphin 70B](https://huggingface.co/sophosympatheia/Wizard-Tulu-Dolphin-70B-v1.0), it inherits the best qualities of each.

Release Date

22 Mar 2024

Context Size

4.10K

Cohere: Command R

By Cohere

Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Read the launch post [here](https://txt.cohere.com/command-r/). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).

Release Date

14 Mar 2024

Context Size

128K

Cohere: Command

By Cohere

Command is an instruction-following conversational model that performs language tasks with high quality, more reliably, and with a longer context than Cohere's base generative models. Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).

Release Date

14 Mar 2024

Context Size

4.10K

Anthropic: Claude 3 Haiku

By Anthropic

Claude 3 Haiku is Anthropic's fastest and most compact model, built for near-instant responsiveness and quick, accurate performance on targeted tasks. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku). #multimodal

Release Date

13 Mar 2024

Context Size

200K

Anthropic: Claude 3 Opus

By Anthropic

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family). #multimodal

Release Date

05 Mar 2024

Context Size

200K

Anthropic: Claude 3 Sonnet

By Anthropic

Claude 3 Sonnet offers an ideal balance of intelligence and speed for enterprise workloads: maximum utility at a lower price, dependable, and balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family). #multimodal

Release Date

05 Mar 2024

Context Size

200K

Cohere: Command R (03-2024)

By Cohere

Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Read the launch post [here](https://txt.cohere.com/command-r/). Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).

Release Date

02 Mar 2024

Context Size

128K

Mistral Large

By Mistral AI

This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.

Release Date

26 Feb 2024

Context Size

128K

Google: Gemma 7B

By Google

Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Release Date

22 Feb 2024

Context Size

8.19K

Nous: Hermes 2 Mistral 7B DPO

By Nous Research

This is the flagship 7B Hermes model, a Direct Preference Optimization (DPO) finetune of [Teknium/OpenHermes-2.5-Mistral-7B](/models/teknium/openhermes-2.5-mistral-7b). It shows improvement across the board on all benchmarks tested: AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA. The model prior to DPO was trained on 1,000,000 instructions/chats of GPT-4 quality or better, primarily synthetic data along with other high-quality datasets.
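Since the card leans on DPO, here is a minimal sketch of the standard DPO objective (the generic published formulation, not NousResearch's actual training code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss over summed per-response log-probabilities."""
    # How much more the policy likes each response than the frozen reference.
    chosen_logratio = policy_chosen_lp - ref_chosen_lp
    rejected_logratio = policy_rejected_lp - ref_rejected_lp
    # -log sigmoid(beta * margin): minimized when the chosen response wins.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with made-up log-probabilities for one preference pair.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # small positive scalar; shrinks as the margin grows
```

No reward model or sampling loop is needed: the preference signal is folded directly into this supervised-style loss, which is what makes DPO cheap relative to RLHF.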

Release Date

21 Feb 2024

Context Size

8.19K

Meta: CodeLlama 70B Instruct

By Meta Llama

Code Llama is a family of large language models for code. This one is based on [Llama 2 70B](/models/meta-llama/llama-2-70b-chat) and provides zero-shot instruction-following ability for programming tasks.

Release Date

30 Jan 2024

Context Size

2.05K

RWKV v5: Eagle 7B

By recursal

Eagle 7B is trained on 1.1 trillion tokens across 100+ world languages (70% English, 15% multilingual, 15% code).

- Built on the [RWKV-v5](/models?q=rwkv) architecture (a linear transformer with 10-100x+ lower inference cost)
- Ranks as the world's greenest 7B model (per token)
- Outperforms all 7B-class models in multilingual benchmarks
- Approaches Falcon (1.5T), LLaMA2 (2T), and Mistral (>2T?) levels of performance in English evals
- Trades blows with MPT-7B (1T) in English evals
- All while being an ["Attention-Free Transformer"](https://www.isattentionallyouneed.com/)

Eagle 7B models are provided for free by [Recursal.AI](https://recursal.ai) for the beta period, until the end of March 2024. Find out more [here](https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers). [rnn](/models?q=rwkv)
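A schematic illustration of the "linear transformer" claim, counting operations only (this is not RWKV's actual kernels, just the asymptotic contrast):

```python
# Why a linear-recurrence model scales better than attention: attention
# touches all pairs of positions, a recurrent state update touches each
# position once. Illustrative operation counts only.
def attention_cost(n, d):
    return n * n * d        # O(n^2 * d): every token attends to every token

def recurrence_cost(n, d):
    return n * d            # O(n * d): one state update per token

for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n, 64) / recurrence_cost(n, 64)
    print(f"n={n:>7}: attention/recurrence ~ {ratio:,.0f}x")
```

The gap grows linearly with sequence length, which is where the "10-100x+ lower inference cost" figure comes from at long contexts.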

Release Date

29 Jan 2024

Context Size

10K

OpenAI: GPT-4 Turbo Preview

By OpenAI

The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023. **Note:** heavily rate limited by OpenAI while in preview.
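A brief sketch of two of the preview features via the OpenAI Python client, reproducible outputs (`seed`) and parallel function calling (`tools`); the weather tool is invented for illustration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    seed=42,  # best-effort reproducibility across identical requests
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    messages=[{"role": "user", "content": "Weather in Paris and Tokyo?"}],
)
# With parallel function calling, both cities can come back in a single turn.
print([c.function.name for c in resp.choices[0].message.tool_calls or []])
```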

Release Date

25 Jan 2024

Context Size

128K
