Which model should I use?

The answer to this question depends on:

How you use the tools
Your budget
Your style preferences

There’s no one-size-fits-all recommendation, so we suggest experimenting with a few models to see what works for you.

Popular Models

The model du jour changes frequently, and it’s easy to get caught up in an endless quest to find the ideal model. We suggest trying a few popular models and then just writing.

A good starting point is OpenRouter’s model rankings.

Roleplay model rankings

Recommendations

Here’s a non-exhaustive list of models that are commonly recommended for creative writing. I only include models that I’ve used personally:

OpenRouter

Claude Sonnet 4.5: State-of-the-art model that excels at brainstorming. Has some guardrails.
DeepSeek V3: One of the best mostly-uncensored models, with a very low cost for its size.
GLM 4.6: A new model that gives good results at a lower price point than Claude Sonnet 4.5.
WizardLM 2: A comparatively older model with virtually no censorship.

Local Models

Gemma 3 Abliterated (local): A completely uncensored model small enough to run on high-end GPUs and M-series MacBooks
Tiefighter (local): A small local model fine-tuned for roleplay and creative writing
Midnight Miqu (local): An uncensored local model fine-tuned for storytelling. Too large for most consumer hardware.

Model Quirks

Due to differences in how models are trained, you may find that some models produce unexpected results. You can usually work around this by tuning system or user prompts.

For example, if you find your model frequently adds new chapter headers to the story text, you can update your system prompt:

Use the system and user prompts to adjust model behavior
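
As a rough illustration of this kind of adjustment, the sketch below appends one extra instruction to a system prompt. The helper name `add_instruction`, the starting prompt, and the wording of the instruction are all hypothetical; they are not taken from the application's defaults, and the phrasing that works best will vary by model.

```python
def add_instruction(system_prompt: str, instruction: str) -> str:
    """Append an instruction to a system prompt, skipping it if already present."""
    base = system_prompt.rstrip()
    if instruction in base:
        return base
    return f"{base}\n\n{instruction}" if base else instruction


# Hypothetical starting prompt and instruction, for illustration only.
system_prompt = "You are a skilled fiction co-writer."
NO_HEADERS = "Do not add chapter headers or titles. Continue the story text only."

updated_prompt = add_instruction(system_prompt, NO_HEADERS)
print(updated_prompt)
```

Keeping each workaround as a separate, short instruction makes it easier to remove again when you switch to a model that doesn't have the quirk.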

Text vs. Chat Completions

tl;dr

You will get better results from chat models like Claude Sonnet by using the “Chat” completion type.

You can configure this in the AI -> Write -> Model settings menu.

Use the Chat completion type with chat models like Claude Sonnet

Text Completions

The Completions API (often referred to as “Text Completions”) is simple: the input to the model is a block of text, and the model returns the “continuation” of that text. OpenAI now treats its Completions API as legacy, but the format is still widely used, especially for role-play and fiction writing.

This style of request is a good fit for a writing assistant, since the model is expected to output the text that immediately follows the prompt.

For example:

Continue writing the story from where it leaves off:

She turned her attention to the shop itself, scanning the shadows of the
cluttered room for any sign of movement. It took a few moments for her eyes to
adjust to the darkness,

The model is expected to generate the tokens that best continue this block of text, which in this case is a natural continuation of the story.
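
A minimal sketch of how the prompt above might be packaged as a text completion request body. The `model`, `prompt`, and `max_tokens` fields follow the OpenAI-style Completions format; the model name `example/text-model` and the helper `build_completion_request` are placeholders made up for this example, and no request is actually sent.

```python
import json


def build_completion_request(model: str, prompt: str, max_tokens: int = 200) -> dict:
    """Build the JSON body for an OpenAI-style text completion request."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


story = (
    "She turned her attention to the shop itself, scanning the shadows of the "
    "cluttered room for any sign of movement. It took a few moments for her eyes to "
    "adjust to the darkness,"
)

# The whole input is one block of text: an instruction followed by the story so far.
prompt = "Continue writing the story from where it leaves off:\n\n" + story

# "example/text-model" is a placeholder, not a real model identifier.
request_body = build_completion_request("example/text-model", prompt)
print(json.dumps(request_body, indent=2))
```

Note that the prompt ends mid-sentence on purpose: the model picks up exactly where the text stops.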

Prompt Formats

Completion prompts are freeform, but some models have specific expectations about how the prompt is formatted. For example, Pygmalion’s models are trained to understand tags that separate system, user, and model text.

Here’s the example from their documentation:

<|system|>This is a text adventure game. Describe the scenario to the user and
give him three options to pick from on each turn.<|user|>Start!<|model|>

Since LLMs are most commonly used as chat assistants, many models are trained on a prompt format that can be parsed into alternating user/assistant messages.

The problem is that there are a lot of models with a lot of different prompt formats.
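
To make the problem concrete, here is a sketch of the conversion a client would have to write for just one format: turning a list of role-tagged messages into the tag-delimited prompt shown above. The tag strings come from that example; the function name `render_tagged_prompt` and the mapping of the "assistant" role to `<|model|>` are assumptions made for illustration. Every other prompt format would need its own version of this function.

```python
# Role-to-tag mapping for the tag-delimited format shown above.
# Mapping "assistant" to <|model|> is an assumption made for this sketch.
ROLE_TAGS = {"system": "<|system|>", "user": "<|user|>", "assistant": "<|model|>"}


def render_tagged_prompt(messages: list[dict]) -> str:
    """Flatten chat-style messages into a single tagged text prompt."""
    parts = [ROLE_TAGS[m["role"]] + m["content"] for m in messages]
    # End with an open model tag so the model writes the next turn.
    parts.append(ROLE_TAGS["assistant"])
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": (
            "This is a text adventure game. Describe the scenario to the user and "
            "give him three options to pick from on each turn."
        ),
    },
    {"role": "user", "content": "Start!"},
]

tagged_prompt = render_tagged_prompt(messages)
print(tagged_prompt)
```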

Chat Completions

OpenAI introduced the “Chat Completions” API, a structured request format that takes input as a list of messages. This means that a client doesn’t need to convert requests into a bespoke format for every model the API supports.

A chat completion request looks like this:

system: You are a helpful assistant.
user: What is a Nanaimo bar?
assistant: A Nanaimo bar is a dessert named after the city of Nanaimo, British Columbia.
user: What are the ingredients?

The model will respond with an assistant message:

assistant: A Nanaimo bar consists of three layers:
* A wafer, nut, and coconut crumb base
* Custard icing in the middle
* A layer of chocolate ganache on top
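
The exchange above can be sketched as a chat completion request body. The `model` and `messages` fields and the `role`/`content` keys follow the OpenAI-style Chat Completions format; the model name `example/chat-model` and the helpers `build_chat_request` and `append_reply` are placeholders invented for this example, and no request is actually sent.

```python
import json


def build_chat_request(model: str, messages: list[dict]) -> dict:
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {"model": model, "messages": messages}


def append_reply(messages: list[dict], reply_text: str) -> list[dict]:
    """Return a new history with the assistant's reply added, ready for the next turn."""
    return messages + [{"role": "assistant", "content": reply_text}]


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a Nanaimo bar?"},
    {
        "role": "assistant",
        "content": "A Nanaimo bar is a dessert named after the city of Nanaimo, British Columbia.",
    },
    {"role": "user", "content": "What are the ingredients?"},
]

# "example/chat-model" is a placeholder, not a real model identifier.
chat_body = build_chat_request("example/chat-model", messages)
print(json.dumps(chat_body, indent=2))

# After the model answers, its reply is appended to the history for the next request.
history = append_reply(messages, "A Nanaimo bar consists of three layers: ...")
```

Because the roles are explicit in the request, the same message list can be sent to any chat model without rewriting it into a model-specific text format.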

Some models are trained specifically to work with chat prompts, and only support chat completion requests. Anthropic’s family of models is a notable example. As of this writing, Claude Sonnet 4.5 is one of the best models available for writing prose.