List of All LLM Models
Discover and compare 500+ large language models with real-time rankings, benchmarks, and community votes.
Baidu Qianfan: CoBuddy (free)
By baidu
CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency, with native support for tool...
Release Date
06 May 2026
Context Size
131.07K
Google: Chirp 3
By Google
Chirp 3 is Google's latest multilingual speech-to-text model. It offers enhanced transcription accuracy across 24 GA languages and 77+ preview languages, with support for automatic language detection, automatic punctuation, and a built-in denoiser for cleaner audio processing.
Release Date
05 May 2026
Context Size
0
OpenAI: GPT-4o Mini Transcribe
By OpenAI
GPT-4o Mini Transcribe is OpenAI's smaller, cost-efficient speech-to-text model built on GPT-4o Mini audio capabilities. It's priced per token (input and output), making it suitable for high-volume transcription workflows that benefit from token-level billing transparency at a lower cost point.
Release Date
01 May 2026
Context Size
128K

OpenAI: Whisper Large V3 Turbo
By OpenAI
Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for speed and cost efficiency. It supports transcription across 99+ languages with a 12% word error rate, and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. Achieves real-time speed factors up to 216x, making it well-suited for latency-sensitive and high-throughput transcription workloads.
Release Date
01 May 2026
Context Size
0

OpenAI: Whisper Large V3
By OpenAI
Whisper Large V3 is OpenAI's open-source automatic speech recognition model offering both audio transcription and translation. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With 1,550M parameters, it achieves a 10.3% word error rate and is well-suited for noise-robust, multilingual transcription in demanding conditions. Supports timestamp granularities at word and segment levels.
Release Date
01 May 2026
Context Size
0

xAI: Grok 4.3
By xAI
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual accuracy. Reasoning can be configured between none/low/medium/high (default low) effort levels. It supports a 1 million token context window with no output token limit, making it well-suited for long-document analysis, deep research, and multi-step agentic tasks. Pricing is tiered: requests exceeding 200k total tokens are billed at a higher rate.
Release Date
30 Apr 2026
Context Size
1M

IBM: Granite 4.1 8B
By ibm-granite
Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks including tool calling, retrieval-augmented generation (RAG), code generation with fill-in-the-middle support, text summarization, classification, and extraction. The model handles 12 languages (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese) and implements OpenAI-compatible tool calling. Released under the Apache 2.0 license.
Release Date
30 Apr 2026
Context Size
131.07K

Mistral: Mistral Medium 3.5
By Mistral AI
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex multi-step reasoning. It is particularly strong at reliable multi-tool calling and long-horizon tasks, with a 256K context window, configurable reasoning effort per request, and a custom vision encoder that handles variable image sizes and aspect ratios. Self-hostable on as few as four GPUs and available under open weights.
Release Date
30 Apr 2026
Context Size
262.14K

Kling: Video v3.0 Pro
By kwaivgi
Kling v3.0 Pro is Kuaishou's premium video generation model, offering higher visual quality than the Standard tier. It supports text-to-video and image-to-video workflows, with first-frame and last-frame control for precise scene composition. Clips range from 3 to 15 seconds in 16:9, 9:16, or 1:1 aspect ratios. Native audio generation is available as an option.
Release Date
29 Apr 2026
Context Size
0

Kling: Video v3.0 Standard
By kwaivgi
Kling v3.0 Standard is a video generation model from Kuaishou. It supports text-to-video and image-to-video workflows, with first-frame and last-frame control for guided scene composition. Clips range from 3 to 15 seconds in 16:9, 9:16, or 1:1 aspect ratios. Native audio generation is available as an option.
Release Date
29 Apr 2026
Context Size
0
Owl Alpha
By OpenRouter
Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution. Compatible with Claude Code, OpenClaw, and other mainstream productivity tools. Note: Prompts and completions may be logged by the provider and used to improve the model.
Release Date
28 Apr 2026
Context Size
1.05M

NVIDIA: Nemotron 3 Nano Omni (free)
By Nvidia
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and audio inputs and produces text output, enabling agents to perceive and reason across modalities in a single inference loop. Built on a hybrid MoE Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning versus separate vision + speech pipelines. It supports up to 300K context length and a 16,384 reasoning budget, with extended thinking enabled via reasoning.enabled on OpenRouter.
Release Date
28 Apr 2026
Context Size
256K
NVIDIA: Nemotron 3 Nano Omni (free)
By nvidia
NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...
Release Date
28 Apr 2026
Context Size
256K

Poolside: Laguna XS.2 (free)
By poolside
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering a 128K context window and up to 8K output tokens. Quantized to fp8 for fast, cost-efficient agentic coding workflows. Laguna XS.2 is designed for software engineering and agentic coding use cases, and you are responsible for confirming that it is appropriate for your intended application. Laguna XS.2 is subject to the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt), and should be used consistently with Poolside’s [Acceptable Use Policy](https://poolside.ai/legal/acceptable-use-policy). We advise against circumventing Laguna XS.2 safety guardrails without implementing substantially equivalent mitigations appropriate for your use case. Please report security vulnerabilities or safety concerns to security@poolside.ai
Release Date
28 Apr 2026
Context Size
262.14K
Poolside: Laguna XS.2 (free)
By poolside
Laguna XS.2 is the second-generation model in the XS size class from [Poolside](https://poolside.ai), their efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...
Release Date
28 Apr 2026
Context Size
262.14K

Poolside: Laguna M.1 (free)
By poolside
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K context window and up to 8K output tokens. Quantized to fp8 for efficient inference. By using this model, you agree to Poolside’s [End User License Agreement](https://poolside.ai/legal/eula)
Release Date
28 Apr 2026
Context Size
262.14K
Poolside: Laguna M.1 (free)
By poolside
Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...
Release Date
28 Apr 2026
Context Size
262.14K
OpenAI: Whisper 1
By OpenAI
Whisper is OpenAI's open-source automatic speech recognition model, available via API as `whisper-1`. It supports transcription and translation across 50+ languages from audio files up to 25 MB. Accepts formats including mp3, mp4, wav, and webm. Priced per minute of audio duration, billed to the nearest second.
Release Date
27 Apr 2026
Context Size
0
OpenAI: GPT-4o Transcribe
By OpenAI
GPT-4o Transcribe is OpenAI's high-quality speech-to-text model built on GPT-4o audio capabilities. It's priced per token (input and output), making it suitable for workflows that benefit from token-level billing transparency.
Release Date
27 Apr 2026
Context Size
128K

Qwen: Qwen3.5 Plus 2026-04-20
By Qwen
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This is an updated version of Qwen3.5 Plus with tiered pricing above 256K tokens.
Release Date
27 Apr 2026
Context Size
1M

Qwen: Qwen3.6 Flash
By Qwen
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in above 256K tokens. Prompt caching is supported, with both explicit cache read and cache creation pricing.
Release Date
27 Apr 2026
Context Size
1M

Qwen: Qwen3.6 35B A3B
By Qwen
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated DeltaNet linear attention with standard gated attention layers, enabling efficient inference at a fraction of the compute cost. The model supports a 262K token native context window (extensible to 1M via YaRN) and accepts text, image, and video inputs. It includes integrated thinking mode with reasoning traces preserved across multi-turn conversations, function calling, and structured output. Released under the Apache 2.0 license.
Release Date
27 Apr 2026
Context Size
262.14K

Qwen: Qwen3.6 Max Preview
By Qwen
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and long-context reasoning, supporting a 262K token context window. The model includes an integrated thinking mode that preserves reasoning traces across multi-turn conversations and supports structured output and function calling. Access is available exclusively through the Alibaba Cloud Model Studio and Qwen Studio APIs; no open weights are provided.
Release Date
27 Apr 2026
Context Size
262.14K

Qwen: Qwen3.6 27B
By Qwen
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs — and supports a 262,144-token context window. The model is designed for agentic coding and reasoning tasks, with particular strength in repository-level code comprehension, front-end development workflows, and multi-step problem solving. It includes a built-in thinking mode for extended reasoning and preserves thinking context across conversation history. Qwen3.6 27B supports 201 languages and dialects and is released under the Apache 2.0 license.
Release Date
27 Apr 2026
Context Size
262.14K
OpenAI: GPT-5.5 Pro
By OpenAI
GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, and is designed for long-horizon problem solving, agentic coding, and precise execution across multi-step workflows.
Release Date
24 Apr 2026
Context Size
1.05M
OpenAI: GPT-5.5
By OpenAI
GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling large-scale reasoning, coding, and multimodal workflows within a single system.
Release Date
24 Apr 2026
Context Size
1.05M

DeepSeek: DeepSeek V4 Pro
By DeepSeek
DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across knowledge, math, and software engineering benchmarks. Built on the same architecture as DeepSeek V4 Flash, it introduces a hybrid attention system for efficient long-context processing. Reasoning efforts `high` and `xhigh` are supported; `xhigh` maps to max reasoning. It is well suited for complex workloads such as full-codebase analysis, multi-step automation, and large-scale information synthesis, where both capability and efficiency are critical.
Release Date
24 Apr 2026
Context Size
1.05M
DeepSeek: DeepSeek V4 Flash (free)
By DeepSeek
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance. The model includes hybrid attention for efficient long-context processing. Reasoning efforts `high` and `xhigh` are supported; `xhigh` maps to max reasoning. It is well suited for applications such as coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are important.
Release Date
24 Apr 2026
Context Size
1.05M
DeepSeek: DeepSeek V4 Flash (free)
By deepseek
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...
Release Date
24 Apr 2026
Context Size
1.05M
Google: Gemini 3.1 Flash TTS Preview
By Google
Gemini 3.1 Flash TTS Preview is a text-to-speech model from Google, and a substantial generational step up from Gemini 2.5 Flash TTS. It takes text input and produces audio output across 70+ languages — nearly 3× the language coverage of its predecessor. The headline addition is a system of 200+ inline audio tags (e.g. `[whispers]`, `[laughs]`, `[excited]`) that let developers steer delivery, emotion, and pacing mid-sentence, alongside a "director's chair" workflow in Google AI Studio for defining per-character Audio Profiles and scene-level context. It supports up to two speakers with independent voice and style configuration per speaker, outputs PCM audio at 24 kHz / 16-bit mono, and automatically watermarks all output with SynthID. Context window is 32k tokens.
Release Date
24 Apr 2026
Context Size
8.19K
Showing page 2 of 26 with 762 models total