List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

OpenAI: GPT-5.4 Mini

OpenAI: GPT-5.4 Mini

By OpenAI

GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments. The model is designed for production environments that require a balance of capability and efficiency, making it well suited for chat applications, coding assistants, and agent workflows that operate at scale. GPT-5.4 mini delivers reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency.

Release Date

17 Mar 2026

Context Size

400K

Mistral: Mistral Small 4

Mistral: Mistral Small 4

By Mistral AI

Mistral Small 4 is the next major release in the Mistral Small family, unifying the capabilities of several flagship Mistral models into a single system. It combines strong reasoning from Magistral, multimodal understanding from Pixtral, and agentic coding capabilities from Devstral, enabling one model to handle complex analysis, software development, and visual tasks within the same workflow.

Release Date

16 Mar 2026

Context Size

262.14K

Perplexity: Embed V1 4B

Perplexity: Embed V1 4B

By Perplexity

pplx-embed-v1 -4B is one of Perplexity's state-of-the-art text embedding models built for real-world, web-scale retrieval. pplx-embed-v1 is optimized for standard dense text retrieval with the 4B parameter model maximizing retrieval quality.

Release Date

16 Mar 2026

Context Size

32K

Perplexity: Embed V1 0.6B

Perplexity: Embed V1 0.6B

By Perplexity

pplx-embed-v1-0.6B is one of Perplexity's state-of-the-art text embedding models built for real-world, web-scale retrieval. pplx-embed-v1 is optimized for standard dense text retrieval with the 0.6B parameter model targeting lightweight, low-latency embedding generation.

Release Date

16 Mar 2026

Context Size

32K

Z.ai: GLM 5 Turbo

Z.ai: GLM 5 Turbo

By Z.ai

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows involving long execution chains, with improved complex instruction decomposition, tool use, scheduled and persistent execution, and overall stability across extended tasks.

Release Date

15 Mar 2026

Context Size

202.75K

xAI: Grok 4.20 Multi-Agent Beta

xAI: Grok 4.20 Multi-Agent Beta

By xAI

Grok 4.20 Multi-Agent Beta is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior: - low / medium: 4 agents - high / xhigh: 16 agents

Release Date

12 Mar 2026

Context Size

2M

xAI: Grok 4.20 Beta

xAI: Grok 4.20 Beta

By xAI

Grok 4.20 Beta is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherance, delivering consistently precise and truthful responses. Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)

Release Date

12 Mar 2026

Context Size

2M

Hunter Alpha

Hunter Alpha

By OpenRouter

Hunter Alpha is a 1 Trillion parameter + 1M token context frontier intelligence model built for agentic use. It excels at long-horizon planning, complex reasoning, and sustained multi-step task execution, with the reliability and instruction-following precision that frameworks like OpenClaw need. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Release Date

11 Mar 2026

Context Size

1.05M

Healer Alpha

Healer Alpha

By OpenRouter

Healer Alpha is a frontier omni-modal model with vision, hearing, reasoning, and action capabilities. It brings the full power of agentic intelligence into the real world: natively perceiving visual and audio inputs, reasoning across modalities, and executing complex multi-step tasks with precision and reliability. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Release Date

11 Mar 2026

Context Size

262.14K

NVIDIA: Nemotron 3 Super (free)

NVIDIA: Nemotron 3 Super (free)

By nvidia

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Release Date

11 Mar 2026

Context Size

262.14K

NVIDIA: Nemotron 3 Super

NVIDIA: Nemotron 3 Super

By Nvidia

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

Release Date

11 Mar 2026

Context Size

262.14K

ByteDance Seed: Seed-2.0-Lite

ByteDance Seed: Seed-2.0-Lite

By bytedance-seed

Seed-2.0-Lite is a versatile, cost‑efficient enterprise workhorse that delivers strong multimodal and agent capabilities while offering noticeably lower latency, making it a practical default choice for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it's an ideal choice for deployment at scale with minimal latency.

Release Date

10 Mar 2026

Context Size

262.14K

Qwen: Qwen3.5-9B

Qwen: Qwen3.5-9B

By Qwen

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.

Release Date

10 Mar 2026

Context Size

262.14K

OpenAI: GPT-5.4 Pro

OpenAI: GPT-5.4 Pro

By OpenAI

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving.

Release Date

05 Mar 2026

Context Size

1.05M

OpenAI: GPT-5.4

OpenAI: GPT-5.4

By OpenAI

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.

Release Date

05 Mar 2026

Context Size

1.05M

Inception: Mercury 2

Inception: Mercury 2

By inception

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).

Release Date

04 Mar 2026

Context Size

128K

OpenAI: GPT-5.3 Chat

OpenAI: GPT-5.3 Chat

By OpenAI

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow.

Release Date

03 Mar 2026

Context Size

128K

Google: Gemini 3.1 Flash Lite Preview

Google: Gemini 3.1 Flash Lite Preview

By Google

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

Release Date

03 Mar 2026

Context Size

1.05M

ByteDance Seed: Seed-2.0-Mini

ByteDance Seed: Seed-2.0-Mini

By bytedance-seed

Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.

Release Date

26 Feb 2026

Context Size

262.14K

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

By Google

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)

Release Date

26 Feb 2026

Context Size

65.54K

Qwen: Qwen3.5-35B-A3B

Qwen: Qwen3.5-35B-A3B

By Qwen

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.

Release Date

25 Feb 2026

Context Size

262.14K

Qwen: Qwen3.5-27B

Qwen: Qwen3.5-27B

By Qwen

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

Release Date

25 Feb 2026

Context Size

262.14K

Qwen: Qwen3.5-122B-A10B

Qwen: Qwen3.5-122B-A10B

By Qwen

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

Release Date

25 Feb 2026

Context Size

262.14K

Qwen: Qwen3.5-Flash

Qwen: Qwen3.5-Flash

By Qwen

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.

Release Date

25 Feb 2026

Context Size

1M

LiquidAI: LFM2-24B-A2B

LiquidAI: LFM2-24B-A2B

By Liquid

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per token, it delivers high-quality generation while maintaining low inference costs. The model fits within 32 GB of RAM, making it practical to run on consumer laptops and desktops without sacrificing capability.

Release Date

25 Feb 2026

Context Size

32.77K

Google: Gemini 3.1 Pro Preview Custom Tools

Google: Gemini 3.1 Pro Preview Custom Tools

By Google

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.

Release Date

25 Feb 2026

Context Size

1.05M

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

By nvidia

The Llama Nemotron Embed VL 1B V2 embedding model is optimized for multimodal question-answering retrieval. The model can embed 'documents' in the form of image, text, or image and text...

Release Date

25 Feb 2026

Context Size

131.07K

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

By Nvidia

The Llama Nemotron Embed VL 1B V2 embedding model is optimized for multimodal question-answering retrieval. The model can embed 'documents' in the form of image, text, or image and text combined. Documents can be retrieved given a user query in text form. The model supports images containing text, tables, charts, and infographics.

Release Date

25 Feb 2026

Context Size

131.07K

OpenAI: GPT-5.3-Codex

OpenAI: GPT-5.3-Codex

By OpenAI

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work. Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.

Release Date

24 Feb 2026

Context Size

400K

AionLabs: Aion-2.0

AionLabs: Aion-2.0

By Aion Labs

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.

Release Date

23 Feb 2026

Context Size

131.07K

Showing page 4 of 25 with 737 models total