Best LLMs for Roleplay 2026 | Creative Roleplaying AI Rankings
Top LLMs for roleplay, character simulation, storytelling, and creative interactions. Real-time leaderboard focused on creativity and consistency.

DeepSeek: DeepSeek V3.2
by DeepSeek
•131.07K tokens
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)


DeepSeek: DeepSeek V4 Flash
by DeepSeek
•1.05M tokens
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance. The model includes hybrid attention for efficient long-context processing. Reasoning efforts `high` and `xhigh` are supported; `xhigh` maps to max reasoning. It is well suited for applications such as coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are important.


Z.ai: GLM 4.5 Air (free)
by Z.ai
•131.07K tokens
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

4

xAI: Grok 4.1 Fast
by xAI
2M tokens
5

DeepSeek: DeepSeek V4 Pro
by DeepSeek
1.05M tokens
6
Google: Gemini 2.5 Flash Lite
by Google
1.05M tokens
7
Google: Gemma 4 31B (free)
by Google
262.14K tokens
8
Google: Gemini 3 Flash Preview
by Google
1.05M tokens
9

OpenAI: gpt-oss-120b (free)
by OpenAI
131.07K tokens