List of All LLM Models

Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.

Baidu Qianfan: CoBuddy (free)

Baidu Qianfan: CoBuddy (free)

By baidu

CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency, with native support for tool...

Release Date

06 May 2026

Context Size

131.07K

Google: Chirp 3

Google: Chirp 3

By Google

Chirp 3 is Google's latest multilingual speech-to-text model. It offers enhanced transcription accuracy across 24 GA languages and 77+ preview languages, with support for automatic language detection, automatic punctuation, and a built-in denoiser for cleaner audio processing.

Release Date

05 May 2026

Context Size

0

OpenAI: GPT-4o Mini Transcribe

OpenAI: GPT-4o Mini Transcribe

By OpenAI

GPT-4o Mini Transcribe is OpenAI's smaller, cost-efficient speech-to-text model built on GPT-4o Mini audio capabilities. It's priced per token (input and output), making it suitable for high-volume transcription workflows that benefit from token-level billing transparency at a lower cost point.

Release Date

01 May 2026

Context Size

128K

OpenAI: Whisper Large V3 Turbo

OpenAI: Whisper Large V3 Turbo

By OpenAI

Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for speed and cost efficiency. It supports transcription across 99+ languages with a 12% word error rate, and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. Achieves real-time speed factors up to 216x, making it well-suited for latency-sensitive and high-throughput transcription workloads.

Release Date

01 May 2026

Context Size

0

OpenAI: Whisper Large V3

OpenAI: Whisper Large V3

By OpenAI

Whisper Large V3 is OpenAI's open-source automatic speech recognition model offering both audio transcription and translation. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With 1,550M parameters, it achieves a 10.3% word error rate and is well-suited for noise-robust, multilingual transcription in demanding conditions. Supports timestamp granularities at word and segment levels.

Release Date

01 May 2026

Context Size

0

xAI: Grok 4.3

xAI: Grok 4.3

By xAI

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual accuracy. Reasoning can be configured between none/low/medium/high (default low) effort levels. It supports a 1 million token context window with no output token limit, making it well-suited for long-document analysis, deep research, and multi-step agentic tasks. Pricing is tiered: requests exceeding 200k total tokens are billed at a higher rate.

Release Date

30 Apr 2026

Context Size

1M

IBM: Granite 4.1 8B

IBM: Granite 4.1 8B

By ibm-granite

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks including tool calling, retrieval-augmented generation (RAG), code generation with fill-in-the-middle support, text summarization, classification, and extraction. The model handles 12 languages (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese) and implements OpenAI-compatible tool calling. Released under the Apache 2.0 license.

Release Date

30 Apr 2026

Context Size

131.07K

Mistral: Mistral Medium 3.5

Mistral: Mistral Medium 3.5

By Mistral AI

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex multi-step reasoning. It is particularly strong at reliable multi-tool calling and long-horizon tasks, with a 256K context window, configurable reasoning effort per request, and a custom vision encoder that handles variable image sizes and aspect ratios. Self-hostable on as few as four GPUs and available under open weights.

Release Date

30 Apr 2026

Context Size

262.14K

Kling: Video v3.0 Pro

Kling: Video v3.0 Pro

By kwaivgi

Kling v3.0 Pro is Kuaishou's premium video generation model, offering higher visual quality than the Standard tier. It supports text-to-video and image-to-video workflows, with first-frame and last-frame control for precise scene composition. Clips range from 3 to 15 seconds in 16:9, 9:16, or 1:1 aspect ratios. Native audio generation is available as an option.

Release Date

29 Apr 2026

Context Size

0

Kling: Video v3.0 Standard

Kling: Video v3.0 Standard

By kwaivgi

Kling v3.0 Standard is a video generation model from Kuaishou. It supports text-to-video and image-to-video workflows, with first-frame and last-frame control for guided scene composition. Clips range from 3 to 15 seconds in 16:9, 9:16, or 1:1 aspect ratios. Native audio generation is available as an option.

Release Date

29 Apr 2026

Context Size

0

Owl Alpha

Owl Alpha

By OpenRouter

Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution. Compatible with Claude Code, OpenClaw, and other mainstream productivity tools. Note: Prompts and completions may be logged by the provider and used to improve the model.

Release Date

28 Apr 2026

Context Size

1.05M

NVIDIA: Nemotron 3 Nano Omni (free)

NVIDIA: Nemotron 3 Nano Omni (free)

By Nvidia

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and audio inputs and produces text output, enabling agents to perceive and reason across modalities in a single inference loop. Built on a hybrid MoE Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning versus separate vision + speech pipelines. It supports up to 300K context length and a 16,384 reasoning budget, with extended thinking enabled via reasoning.enabled on OpenRouter.

Release Date

28 Apr 2026

Context Size

256K

NVIDIA: Nemotron 3 Nano Omni (free)

NVIDIA: Nemotron 3 Nano Omni (free)

By nvidia

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Release Date

28 Apr 2026

Context Size

256K

Poolside: Laguna XS.2 (free)

Poolside: Laguna XS.2 (free)

By poolside

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering a 128K context window and up to 8K output tokens. Quantized to fp8 for fast, cost-efficient agentic coding workflows. Laguna XS.2 is designed for software engineering and agentic coding use cases, and you are responsible for confirming that it is appropriate for your intended application. Laguna XS.2 is subject to the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt), and should be used consistently with Poolside’s [Acceptable Use Policy](https://poolside.ai/legal/acceptable-use-policy). We advise against circumventing Laguna XS.2 safety guardrails without implementing substantially equivalent mitigations appropriate for your use case. Please report security vulnerabilities or safety concerns to security@poolside.ai

Release Date

28 Apr 2026

Context Size

262.14K

Poolside: Laguna XS.2 (free)

Poolside: Laguna XS.2 (free)

By poolside

Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...

Release Date

28 Apr 2026

Context Size

262.14K

Poolside: Laguna M.1 (free)

Poolside: Laguna M.1 (free)

By poolside

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K context window and up to 8K output tokens. Quantized to fp8 for efficient inference. By using this model, you agree to Poolside’s [End User License Agreement](https://poolside.ai/legal/eula)

Release Date

28 Apr 2026

Context Size

262.14K

Poolside: Laguna M.1 (free)

Poolside: Laguna M.1 (free)

By poolside

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...

Release Date

28 Apr 2026

Context Size

262.14K

OpenAI: Whisper 1

OpenAI: Whisper 1

By OpenAI

Whisper is OpenAI's open-source automatic speech recognition model, available via API as `whisper-1`. It supports transcription and translation across 50+ languages from audio files up to 25 MB. Accepts formats including mp3, mp4, wav, and webm. Priced per minute of audio duration, billed to the nearest second.

Release Date

27 Apr 2026

Context Size

0

OpenAI: GPT-4o Transcribe

OpenAI: GPT-4o Transcribe

By OpenAI

GPT-4o Transcribe is OpenAI's high-quality speech-to-text model built on GPT-4o audio capabilities. It's priced per token (input and output), making it suitable for workflows that benefit from token-level billing transparency.

Release Date

27 Apr 2026

Context Size

128K

Qwen: Qwen3.5 Plus 2026-04-20

Qwen: Qwen3.5 Plus 2026-04-20

By Qwen

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This is an updated version of Qwen3.5 Plus with tiered pricing above 256K tokens.

Release Date

27 Apr 2026

Context Size

1M

Qwen: Qwen3.6 Flash

Qwen: Qwen3.6 Flash

By Qwen

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in above 256K tokens. Prompt caching is supported, with both explicit cache read and cache creation pricing.

Release Date

27 Apr 2026

Context Size

1M

Qwen: Qwen3.6 35B A3B

Qwen: Qwen3.6 35B A3B

By Qwen

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated DeltaNet linear attention with standard gated attention layers, enabling efficient inference at a fraction of the compute cost. The model supports a 262K token native context window (extensible to 1M via YaRN) and accepts text, image, and video inputs. It includes integrated thinking mode with reasoning traces preserved across multi-turn conversations, function calling, and structured output. Released under the Apache 2.0 license.

Release Date

27 Apr 2026

Context Size

262.14K

Qwen: Qwen3.6 Max Preview

Qwen: Qwen3.6 Max Preview

By Qwen

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and long-context reasoning, supporting a 262K token context window. The model includes an integrated thinking mode that preserves reasoning traces across multi-turn conversations and supports structured output and function calling. Access is available exclusively through the Alibaba Cloud Model Studio and Qwen Studio APIs; no open weights are provided.

Release Date

27 Apr 2026

Context Size

262.14K

Qwen: Qwen3.6 27B

Qwen: Qwen3.6 27B

By Qwen

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs — and supports a 262,144-token context window. The model is designed for agentic coding and reasoning tasks, with particular strength in repository-level code comprehension, front-end development workflows, and multi-step problem solving. It includes a built-in thinking mode for extended reasoning and preserves thinking context across conversation history. Qwen3.6 27B supports 201 languages and dialects and is released under the Apache 2.0 license.

Release Date

27 Apr 2026

Context Size

262.14K

OpenAI: GPT-5.5 Pro

OpenAI: GPT-5.5 Pro

By OpenAI

GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, and is designed for long-horizon problem solving, agentic coding, and precise execution across multi-step workflows.

Release Date

24 Apr 2026

Context Size

1.05M

OpenAI: GPT-5.5

OpenAI: GPT-5.5

By OpenAI

GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling large-scale reasoning, coding, and multimodal workflows within a single system.

Release Date

24 Apr 2026

Context Size

1.05M

DeepSeek: DeepSeek V4 Pro

DeepSeek: DeepSeek V4 Pro

By DeepSeek

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across knowledge, math, and software engineering benchmarks. Built on the same architecture as DeepSeek V4 Flash, it introduces a hybrid attention system for efficient long-context processing. Reasoning efforts `high` and `xhigh` are supported; `xhigh` maps to max reasoning. It is well suited for complex workloads such as full-codebase analysis, multi-step automation, and large-scale information synthesis, where both capability and efficiency are critical.

Release Date

24 Apr 2026

Context Size

1.05M

DeepSeek: DeepSeek V4 Flash (free)

DeepSeek: DeepSeek V4 Flash (free)

By DeepSeek

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance. The model includes hybrid attention for efficient long-context processing. Reasoning efforts `high` and `xhigh` are supported; `xhigh` maps to max reasoning. It is well suited for applications such as coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are important.

Release Date

24 Apr 2026

Context Size

1.05M

DeepSeek: DeepSeek V4 Flash (free)

DeepSeek: DeepSeek V4 Flash (free)

By deepseek

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

Release Date

24 Apr 2026

Context Size

1.05M

Google: Gemini 3.1 Flash TTS Preview

Google: Gemini 3.1 Flash TTS Preview

By Google

Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google, and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output across 70+ languages — nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence, alongside a "director's chair" workflow in Google AI Studio for defining per-character Audio Profiles and scene-level context. It supports up to two speakers with independent voice and style configuration per speaker, outputs PCM audio at 24 kHz / 16-bit mono, and automatically watermarks all output with SynthID. Context window is 32k tokens.

Release Date

24 Apr 2026

Context Size

8.19K

Showing page 2 of 26 with 762 models total