List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

DeepSeek: R1 Distill Qwen 14B

By DeepSeek

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include:

- AIME 2024 pass@1: 69.7
- MATH-500 pass@1: 93.9
- CodeForces Rating: 1481

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Release Date

29 Jan 2025

Context Size

131.07K

Perplexity: Sonar Reasoning

By Perplexity

Sonar Reasoning is a reasoning model provided by Perplexity based on [DeepSeek R1](/deepseek/deepseek-r1). It allows developers to utilize long chain of thought with built-in web search. Sonar Reasoning is uncensored and hosted in US datacenters.

Release Date

29 Jan 2025

Context Size

127K

Perplexity: Sonar

By Perplexity

Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features optimized for speed.

Release Date

27 Jan 2025

Context Size

127.07K

Liquid: LFM 7B

By Liquid

LFM-7B is a new best-in-class language model designed for exceptional chat capabilities, including in languages like Arabic and Japanese. Powered by the Liquid Foundation Model (LFM) architecture, it exhibits unique features like a low memory footprint and fast inference speed. LFM-7B is the world’s best-in-class multilingual language model in English, Arabic, and Japanese. See the [launch announcement](https://www.liquid.ai/lfm-7b) for benchmarks and more info.

Release Date

25 Jan 2025

Context Size

32.77K

Liquid: LFM 3B

By Liquid

Liquid's LFM 3B delivers incredible performance for its size. It positions itself in first place among 3B-parameter transformers, hybrids, and RNN models. It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller. LFM-3B is the ideal choice for mobile and other edge text-based applications. See the [launch announcement](https://www.liquid.ai/liquid-foundation-models) for benchmarks and more info.

Release Date

25 Jan 2025

Context Size

32.77K

DeepSeek: R1 Distill Llama 70B

By DeepSeek

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633

The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

Release Date

23 Jan 2025

Context Size

131.07K

DeepSeek: R1

By DeepSeek

DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120). MIT licensed: Distill & commercialize freely!

Release Date

20 Jan 2025

Context Size

163.84K

MiniMax: MiniMax-01

By MiniMax

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens. The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model. To read more about the release, see: https://www.minimaxi.com/en/news/minimax-01-series-2

Release Date

15 Jan 2025

Context Size

1M

Mistral: Codestral 2501

By Mistral AI

[Mistral](/mistralai)'s cutting-edge language model for coding. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation. Learn more on their blog post: https://mistral.ai/news/codestral-2501/

Release Date

14 Jan 2025

Context Size

256K

Microsoft: Phi 4

By Microsoft

[Microsoft Research](/microsoft)'s Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. For more information, please see the [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905).

Release Date

10 Jan 2025

Context Size

16.38K

Sao10K: Llama 3.1 70B Hanami x1

By Sao10K

This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).

Release Date

08 Jan 2025

Context Size

16K

DeepSeek: DeepSeek V3

By deepseek-ai

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models. For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).

Release Date

26 Dec 2024

Context Size

131.07K

Sao10K: Llama 3.3 Euryale 70B

By Sao10K

Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).

Release Date

18 Dec 2024

Context Size

131.07K

Inflatebot: Mag Mell R1 12B

By inflatebot

Mag Mell is a merge of pre-trained language models created using mergekit, based on [Mistral Nemo](/mistralai/mistral-nemo). It is a great roleplay and storytelling model which combines the best parts of many other models to be a general purpose solution for many use cases. Intended to be a general purpose "Best of Nemo" model for any fictional, creative use case. Mag Mell is composed of 3 intermediate parts:

- Hero (RP, trope coverage)
- Monk (Intelligence, groundedness)
- Deity (Prose, flair)

Release Date

18 Dec 2024

Context Size

32K

OpenAI: o1

By OpenAI

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

Release Date

17 Dec 2024

Context Size

200K

EVA Llama 3.33 70B

By EVA-UNIT-01

EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of [Llama-3.3-70B-Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct) on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model. This model was built with Llama by Meta.

Release Date

16 Dec 2024

Context Size

16.38K

xAI: Grok 2 Vision 1212

By xAI

Grok 2 Vision 1212 advances image-based AI with stronger visual comprehension, refined instruction-following, and multilingual support. From object recognition to style analysis, it empowers developers to build more intuitive, visually aware applications. Its enhanced steerability and reasoning establish a robust foundation for next-generation image solutions. To read more about this model, check out [xAI's announcement](https://x.ai/blog/grok-1212).

Release Date

15 Dec 2024

Context Size

32.77K

xAI: Grok 2 1212

By xAI

Grok 2 1212 introduces significant enhancements to accuracy, instruction adherence, and multilingual support, making it a powerful and flexible choice for developers seeking a highly steerable, intelligent model.

Release Date

15 Dec 2024

Context Size

131.07K

Cohere: Command R7B (12-2024)

By Cohere

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps. Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).

Release Date

14 Dec 2024

Context Size

128K

Google: Gemini 2.0 Flash Experimental

By Google

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Release Date

11 Dec 2024

Context Size

1.05M

Meta: Llama 3.3 70B Instruct (free)

By Meta Llama

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)

Release Date

06 Dec 2024

Context Size

131.07K

Amazon: Nova Lite 1.0

By Amazon

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.

Release Date

05 Dec 2024

Context Size

300K

Amazon: Nova Micro 1.0

By Amazon

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.

Release Date

05 Dec 2024

Context Size

128K

Amazon: Nova Pro 1.0

By Amazon

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and in analyzing financial documents. **NOTE**: Video input is not supported at this time.

Release Date

05 Dec 2024

Context Size

300K

Qwen: QwQ 32B Preview

By Qwen

QwQ-32B-Preview is an experimental research model focused on AI reasoning capabilities developed by the Qwen Team. As a preview release, it demonstrates promising analytical abilities while having several important limitations:

1. **Language Mixing and Code-Switching**: The model may mix languages or switch between them unexpectedly, affecting response clarity.
2. **Recursive Reasoning Loops**: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
3. **Safety and Ethical Considerations**: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
4. **Performance and Benchmark Limitations**: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

Release Date

28 Nov 2024

Context Size

32.77K

Google: Gemini Experimental 1121

By Google

Experimental release (November 21st, 2024) of Gemini.

Release Date

21 Nov 2024

Context Size

40.96K

EVA Qwen2.5 72B

By EVA-UNIT-01

EVA Qwen2.5 72B is a roleplay and storywriting specialist model. It's a full-parameter finetune of Qwen2.5-72B on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.

Release Date

21 Nov 2024

Context Size

32K

OpenAI: GPT-4o (2024-11-20)

By OpenAI

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Release Date

20 Nov 2024

Context Size

128K

Mistral Large 2411

By Mistral AI

Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large) released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411). It provides a significant upgrade on the previous [Mistral Large 24.07](/mistralai/mistral-large-2407), with notable improvements in long context understanding, a new system prompt, and more accurate function calling.

Release Date

19 Nov 2024

Context Size

131.07K
