LLM Module
Converts Reddit posts into dramatic text message conversations by calling Ollama's /api/chat endpoint directly with native constrained JSON generation.
LlmService
`rewriteAsConversation(post: RedditPost): Promise<Result<Conversation, LlmError>>`
Sends a Reddit post to Ollama with a system prompt instructing it to rewrite the content as a two-person iMessage conversation. Uses Ollama's format parameter for token-level constrained JSON output.
Features:
- Direct Ollama `/api/chat` calls (no OpenAI SDK or LiteLLM proxy)
- Native constrained generation via Ollama's `format` parameter with a JSON schema
- Automatic retry (2 attempts) with escalating JSON schema feedback
- Thinking field fallback — extracts JSON from `thinking` when `content` is empty (Qwen3.5 quirk)
- Strips `<think>` tags as a safety net
- Validates output against a Zod schema
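A minimal sketch of what the constrained `/api/chat` call could look like. The schema object and the function name `chatWithSchema` are illustrative assumptions, not the module's actual identifiers; only the endpoint, the `format` parameter, and the `num_predict`/`temperature` options come from this document.

```typescript
// Illustrative sketch — not the module's real implementation.
interface OllamaChatResponse {
  message: { content: string; thinking?: string };
}

// JSON schema mirroring the Conversation type below (assumed shape).
const conversationJsonSchema = {
  type: 'object',
  properties: {
    leftName: { type: 'string' },
    rightName: { type: 'string' },
    hookText: { type: 'string' },
    messages: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          sender: { enum: ['left', 'right'] },
          text: { type: 'string' },
        },
        required: ['sender', 'text'],
      },
    },
  },
  required: ['leftName', 'rightName', 'messages'],
};

async function chatWithSchema(
  baseUrl: string,
  model: string,
  system: string,
  user: string,
): Promise<OllamaChatResponse['message']> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Abort after 10 minutes, matching LLM_TIMEOUT_MS.
    signal: AbortSignal.timeout(600_000),
    body: JSON.stringify({
      model,
      stream: false,
      // Passing a JSON schema as `format` turns on Ollama's token-level
      // constrained generation.
      format: conversationJsonSchema,
      options: { num_predict: 8192, temperature: 0.8 },
      messages: [
        { role: 'system', content: system },
        { role: 'user', content: user },
      ],
    }),
  });
  const data = (await res.json()) as OllamaChatResponse;
  return data.message;
}
```

The caller would still run the returned `content` through Zod, since schema-constrained decoding guarantees structure but not field semantics.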
Types
Conversation
```typescript
interface Conversation {
  leftName: string;   // Descriptive relationship label (e.g. "My Boss", "Best Friend")
  rightName: string;  // Always "Me" — viewer perspective
  hookText?: string;  // TikTok scroll-stopping hook line
  messages: ConversationMessage[];
}
```

Name conventions:
- `leftName` uses short descriptive relationship labels (2-3 words): "My Boss", "My Ex", "Best Friend", "My Mom", "My Roommate"
- `rightName` is always "Me" — the viewer is the right-side protagonist (blue bubbles)
- `hookText` is a TikTok scroll-stopper starting with "When...", "POV:", "The moment...", or a direct shocking statement
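An invented example following these conventions — the conversation content is illustrative, not output from the module:

```typescript
interface ConversationMessage {
  sender: 'left' | 'right';
  text: string;
}

interface Conversation {
  leftName: string;
  rightName: string;
  hookText?: string;
  messages: ConversationMessage[];
}

// Sample data made up for illustration; only the shape and naming
// rules come from the module docs.
const example: Conversation = {
  leftName: 'My Boss',  // short relationship label (left bubbles)
  rightName: 'Me',      // always the viewer (right/blue bubbles)
  hookText: 'POV: your boss texts you at 2am',
  messages: [
    { sender: 'left', text: 'are you awake' },
    { sender: 'right', text: 'it is 2am. why' },
  ],
};
```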
ConversationMessage
```typescript
interface ConversationMessage {
  sender: 'left' | 'right';
  text: string;
}
```

Configuration
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API base URL |
| `LLM_MODEL` | `qwen3.5:9b` | Ollama model name (colon format) |
| `LLM_MAX_TOKENS` | `8192` | Max output tokens (`num_predict`) |
| `LLM_TEMPERATURE` | `0.8` | Sampling temperature |
| `LLM_TIMEOUT_MS` | `600000` | Request timeout (10 min) |
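The table above translates into a small config object. This is a sketch assuming the documented defaults; the object shape and the name `llmConfig` are illustrative:

```typescript
// Read configuration from the environment, falling back to the
// documented defaults (assumed config shape).
const llmConfig = {
  baseUrl: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
  model: process.env.LLM_MODEL ?? 'qwen3.5:9b',
  maxTokens: Number(process.env.LLM_MAX_TOKENS ?? 8192),     // sent as num_predict
  temperature: Number(process.env.LLM_TEMPERATURE ?? 0.8),
  timeoutMs: Number(process.env.LLM_TIMEOUT_MS ?? 600_000),  // 10 minutes
};
```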
LLM Setup
The module calls Ollama directly — no proxy layer needed.
- Install Ollama
- Pull the model: `ollama pull qwen3.5:9b`
- Ollama runs on `http://localhost:11434` by default
TIP
The module uses Ollama's native `format` parameter for constrained JSON generation. This provides token-level grammar enforcement, guaranteeing valid JSON structure without relying on prompt engineering alone.
WARNING
Qwen3.5 with verbose system prompts may route all output to its `thinking` field and leave `content` empty. The service handles this automatically via the thinking field fallback plus retry, but keep the system prompt concise if you modify it.
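The fallback described above can be sketched as two small pure functions. The names `stripThinkTags` and `extractJson` are illustrative, not the service's actual API:

```typescript
// Assumed shape of an Ollama chat response message.
interface OllamaMessage {
  content: string;
  thinking?: string;
}

// Safety net: remove any <think>…</think> blocks the model leaked
// into its visible output.
function stripThinkTags(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}

// Prefer `content`; when it is empty (the Qwen3.5 quirk), try to
// recover a JSON object from `thinking` instead.
function extractJson(message: OllamaMessage): string | null {
  const content = stripThinkTags(message.content);
  if (content.length > 0) return content;
  const thinking = message.thinking ?? '';
  const match = thinking.match(/\{[\s\S]*\}/); // outermost {...}
  return match ? match[0] : null;
}
```

Whatever string this returns would still be parsed and validated against the Zod schema, with the retry path taken when either step fails.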