List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

Baidu: ERNIE 4.5 VL 28B A3B

By Baidu

A multimodal Mixture-of-Experts chat model with 28B total parameters and 3B activated per token, delivering strong text and vision understanding through a heterogeneous MoE structure with modality-isolated routing. Built on scaling-efficient infrastructure for high-throughput training and inference, it uses advanced post-training techniques including SFT, DPO, UPO, and RLVR alignment, and supports a 131K context length for cross-modal reasoning and generation.
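
As a conceptual illustration of what "modality-isolated routing" means (a toy sketch, not ERNIE's actual implementation), each token is routed only among experts of its own modality:

```python
# Toy sketch of modality-isolated MoE routing: text tokens only ever reach
# text experts, vision tokens only ever reach vision experts.
from dataclasses import dataclass

@dataclass
class Token:
    modality: str  # "text" or "vision"
    value: str

EXPERTS = {
    "text": ["text_expert_0", "text_expert_1"],
    "vision": ["vision_expert_0", "vision_expert_1"],
}

def route(token: Token) -> str:
    """Pick an expert from the pool matching the token's modality."""
    pool = EXPERTS[token.modality]
    # Stand-in for a learned gating network: any deterministic choice will do here.
    return pool[sum(map(ord, token.value)) % len(pool)]

for tok in [Token("text", "hello"), Token("vision", "patch_17")]:
    print(tok.modality, "->", route(tok))
```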

Release Date

12 Aug 2025

Context Size

30K

Z.ai: GLM 4.5V

By Z.ai

GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B total parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses. Reasoning behavior can be toggled via the `reasoning.enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)
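
For illustration, a minimal sketch of toggling this behavior through OpenRouter's OpenAI-style chat completions endpoint; the `z-ai/glm-4.5v` slug and the exact shape of the `reasoning` object are assumptions, so check the linked docs before relying on them:

```python
# Minimal sketch: toggling GLM-4.5V's thinking mode via OpenRouter.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "z-ai/glm-4.5v",  # assumed slug
        "messages": [{"role": "user", "content": "Describe the chart in this image."}],
        "reasoning": {"enabled": True},  # set False for the fast, non-thinking mode
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```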

Release Date

11 Aug 2025

Context Size

65.54K

AI21: Jamba Mini 1.7

By AI21

Jamba Mini 1.7 is a compact and efficient member of the Jamba open model family, incorporating key improvements in grounding and instruction-following while maintaining the benefits of the SSM-Transformer hybrid architecture and 256K context window. Despite its compact size, it delivers accurate, contextually grounded responses and improved steerability.

Release Date

08 Aug 2025

Context Size

256K

AI21: Jamba Large 1.7

By AI21

Jamba Large 1.7 is the latest model in the Jamba open family, offering improvements in grounding, instruction-following, and overall efficiency. Built on a hybrid SSM-Transformer architecture with a 256K context window, it delivers more accurate, contextually grounded responses and better steerability than previous versions.

Release Date

08 Aug 2025

Context Size

256K

OpenAI: GPT-5 Chat

By OpenAI

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

Release Date

07 Aug 2025

Context Size

128K

OpenAI: GPT-5

By OpenAI

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reduced hallucination and sycophancy, along with better performance in coding, writing, and health-related tasks.
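
As a hedged sketch of what requesting deeper reasoning might look like through the OpenAI Python SDK (the `gpt-5` identifier and the `reasoning_effort` values are assumptions, not taken from this listing):

```python
# Hedged sketch: asking GPT-5 for more deliberate reasoning on a hard task.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-5",            # assumed identifier
    reasoning_effort="high",  # assumed values, e.g. "minimal", "low", "medium", "high"
    messages=[
        {"role": "user", "content": "Think hard about this: prove that sqrt(2) is irrational."}
    ],
)
print(completion.choices[0].message.content)
```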

Release Date

07 Aug 2025

Context Size

400K

OpenAI: GPT-5 Mini

By OpenAI

GPT-5 Mini is a compact version of GPT-5, designed to handle lighter-weight reasoning tasks. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. GPT-5 Mini is the successor to OpenAI's o4-mini model.

Release Date

07 Aug 2025

Context Size

400K

OpenAI: GPT-5 Nano

By OpenAI

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger counterparts, it retains key instruction-following and safety features. It is the successor to GPT-4.1-nano and offers a lightweight option for cost-sensitive or real-time applications.

Release Date

07 Aug 2025

Context Size

400K

OpenAI: gpt-oss-120b (free)

By OpenAI

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
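
A minimal sketch of the OpenAI-style function calling mentioned above, assuming the open weights are served behind an OpenAI-compatible endpoint such as a local vLLM server; the base URL, model identifier, and `get_weather` tool are illustrative assumptions:

```python
# Hedged sketch: OpenAI-style tool calling against a self-hosted gpt-oss-120b.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed identifier
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```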

Release Date

05 Aug 2025

Context Size

131.07K

OpenAI: gpt-oss-20b (free)

By OpenAI

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

Release Date

05 Aug 2025

Context Size

131.07K

Anthropic: Claude Opus 4.1

By Anthropic

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning.
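
A hedged sketch of requesting extended thinking through the Anthropic Python SDK; the `claude-opus-4-1` identifier and the token budgets are assumptions chosen to stay within the 64K thinking limit noted above:

```python
# Hedged sketch: extended thinking with a capped thinking budget.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-1",  # assumed identifier
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # must stay below max_tokens
    messages=[{"role": "user", "content": "Refactor this module and explain each change step by step."}],
)

# The response interleaves "thinking" and "text" content blocks; print only the text.
for block in message.content:
    if block.type == "text":
        print(block.text)
```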

Release Date

05 Aug 2025

Context Size

200K

Horizon Beta

By OpenRouter

This is a cloaked model provided to the community to gather feedback. It is an improved version of [Horizon Alpha](/openrouter/horizon-alpha). Note: It’s free to use during this testing period, and prompts and completions are logged by the model creator for feedback and training.

Release Date

01 Aug 2025

Context Size

256K

Mistral: Codestral 2508

By Mistral AI

Mistral's cutting-edge coding model, released at the end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction, and test generation. [Blog Post](https://mistral.ai/news/codestral-25-08)
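
A minimal sketch of the fill-in-the-middle use case, assuming Mistral's FIM completions endpoint and the `codestral-2508` identifier (both assumptions; see the blog post for the authoritative API):

```python
# Hedged sketch: complete a function body given the code before and after the cursor.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/fim/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "codestral-2508",  # assumed identifier
        "prompt": "def is_palindrome(s: str) -> bool:\n    ",  # code before the cursor
        "suffix": "\n\nprint(is_palindrome('racecar'))",       # code after the cursor
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```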

Release Date

01 Aug 2025

Context Size

256K

Qwen: Qwen3 Coder 30B A3B Instruct

By Qwen

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the Qwen3 architecture, it supports a native context length of 256K tokens (extendable to 1M with YaRN) and performs strongly in tasks involving function calls, browser use, and structured code completion. The model is optimized for instruction following without “thinking mode” and integrates well with OpenAI-compatible tool-use formats.
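
Extending the native window toward 1M typically requires opting in to YaRN; a hedged sketch of adding a `rope_scaling` block to a locally downloaded checkpoint's `config.json` (the field names and factor follow Qwen's published pattern but are assumptions for this specific checkpoint, so verify against the model card):

```python
# Hedged sketch: extend the context window via YaRN by editing config.json.
import json
from pathlib import Path

config_path = Path("Qwen3-Coder-30B-A3B-Instruct/config.json")  # assumed local checkpoint directory
config = json.loads(config_path.read_text())

config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,  # roughly 256K native -> ~1M tokens
    "original_max_position_embeddings": 262144,
}

config_path.write_text(json.dumps(config, indent=2))
```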

Release Date

31 Jul 2025

Context Size

160K

Horizon Alpha

By OpenRouter

This was a cloaked model provided to the community to gather feedback. It has been deprecated; see [Horizon Beta](/openrouter/horizon-beta). Note: It was free to use during its testing period, and prompts and completions were logged by the model creator for feedback and training.

Release Date

30 Jul 2025

Context Size

256K

Qwen: Qwen3 30B A3B Instruct 2507

By Qwen

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and agentic tool use. Post-trained on instruction data, it demonstrates competitive performance across reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench) benchmarks. It outperforms its non-instruct variant on subjective and open-ended tasks while retaining strong factual and coding performance.

Release Date

29 Jul 2025

Context Size

262.14K

Z.ai: GLM 4.5

By Z.ai

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128K tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options: a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses. Users can control the reasoning behavior with the `reasoning.enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Release Date

25 Jul 2025

Context Size

131.07K

Z.ai: GLM 4.5 Air (free)

By Z.ai

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behavior with the `reasoning.enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

Release Date

25 Jul 2025

Context Size

131.07K

Qwen: Qwen3 235B A22B Thinking 2507

By Qwen

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It always operates in reasoning mode, emitting its reasoning before a closing </think> tag, and is designed for high-token outputs (up to 81,920 tokens) in challenging domains. The model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release is the most capable open-weight variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.

Release Date

25 Jul 2025

Context Size

131.07K

Z.ai: GLM 4 32B

By Z.ai

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the THUDM models.

Release Date

24 Jul 2025

Context Size

128K

Qwen: Qwen3 Coder 480B A35B (free)

By Qwen

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts). Pricing for the Alibaba endpoints varies by context length; once a request exceeds 128K input tokens, the higher pricing tier applies.

Release Date

23 Jul 2025

Context Size

262K

ByteDance: UI-TARS 7B

By ByteDance

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it builds on the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. The model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Release Date

22 Jul 2025

Context Size

128K

Google: Gemini 2.5 Flash Lite

By Google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.
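
Besides the OpenRouter parameter linked above, a hedged sketch of enabling thinking directly through the google-genai Python SDK (the model name and `thinking_budget` value are assumptions):

```python
# Hedged sketch: opt in to thinking for a single Flash-Lite request.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # assumed identifier
    contents="Plan a three-step migration from REST to gRPC.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=512)  # 0 keeps thinking disabled
    ),
)
print(response.text)
```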

Release Date

22 Jul 2025

Context Size

1.05M

Qwen: Qwen3 235B A22B Instruct 2507

By Qwen

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

Release Date

21 Jul 2025

Context Size

262.14K

Switchpoint Router

By Switchpoint AI

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you always benefit from the industry's newest models without changing your workflow. This model is configured for a simple, flat rate per response here on OpenRouter. It's powered by the full routing engine from [Switchpoint AI](https://www.switchpoint.dev).

Release Date

11 Jul 2025

Context Size

131.07K

MoonshotAI: Kimi K2 0711

By Moonshot AI

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

Release Date

11 Jul 2025

Context Size

131.07K
