
# LLM Module

Converts Reddit posts into dramatic text-message conversations by calling Ollama's `/api/chat` endpoint directly with native constrained JSON generation.

## LlmService

### `rewriteAsConversation(post: RedditPost): Promise<Result<Conversation, LlmError>>`

Sends a Reddit post to Ollama with a system prompt instructing it to rewrite the content as a two-person iMessage conversation. Uses Ollama's `format` parameter for token-level constrained JSON output.
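
For illustration, the request might look like the following sketch. The endpoint, `format` field, and `options` keys are Ollama's documented chat API; the schema object, system prompt text, and `RedditPost` field access are assumptions, not the module's actual internals.

```typescript
// Sketch of a constrained-generation request to Ollama's chat API.
// SYSTEM_PROMPT, post fields, and conversationJsonSchema are illustrative names.
const res = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'qwen3.5:9b',
    stream: false,
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },              // rewrite-as-conversation instructions
      { role: 'user', content: `${post.title}\n\n${post.body}` },
    ],
    // Passing a JSON schema here enables token-level constrained generation.
    format: conversationJsonSchema,
    options: {
      num_predict: 8192, // LLM_MAX_TOKENS
      temperature: 0.8,  // LLM_TEMPERATURE
    },
  }),
});
const data = await res.json();
// data.message.content should now be schema-conforming JSON text.
```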

Features:

- Direct Ollama `/api/chat` calls (no OpenAI SDK or LiteLLM proxy)
- Native constrained generation via Ollama's `format` parameter with a JSON schema
- Automatic retry (2 attempts) with escalating JSON schema feedback
- Thinking-field fallback — extracts JSON from `thinking` when `content` is empty (a Qwen3.5 quirk; see the sketch after this list)
- Strips `<think>` tags as a safety net
- Validates output against a Zod schema
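
A rough sketch of how these safeguards might compose; the helper names (`callOllama`, `conversationSchema`, `ok`/`err`) are hypothetical stand-ins for the module's actual internals.

```typescript
// Illustrative composition of retry, thinking-field fallback, <think> stripping,
// and Zod validation. All helper names are hypothetical.
async function generateWithSafeguards(post: RedditPost): Promise<Result<Conversation, LlmError>> {
  for (let attempt = 1; attempt <= 2; attempt++) {
    // On the second attempt, callOllama would append escalating schema feedback.
    const reply = await callOllama(post, attempt);
    // Qwen3.5 quirk: content can come back empty with the JSON routed to `thinking`.
    let raw = reply.message.content || reply.message.thinking || '';
    // Safety net: strip any <think>...</think> blocks that leaked into content.
    raw = raw.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
    try {
      const parsed = conversationSchema.safeParse(JSON.parse(raw));
      if (parsed.success) return ok(parsed.data); // Zod-validated output
    } catch {
      // JSON.parse failed; fall through to the next attempt.
    }
  }
  return err({ kind: 'invalid_output' });
}
```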

## Types

### Conversation

```typescript
interface Conversation {
  leftName: string;   // Descriptive relationship label (e.g. "My Boss", "Best Friend")
  rightName: string;  // Always "Me" — viewer perspective
  hookText?: string;  // TikTok scroll-stopping hook line
  messages: ConversationMessage[];
}
```

Name conventions (an example follows the list):

- `leftName` uses short descriptive relationship labels (2-3 words): "My Boss", "My Ex", "Best Friend", "My Mom", "My Roommate"
- `rightName` is always "Me" — the viewer is the right-side protagonist (blue bubbles)
- `hookText` is a TikTok scroll-stopper starting with "When...", "POV:", "The moment...", or a direct shocking statement
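
A hypothetical object that follows these conventions (all values invented for illustration):

```typescript
const example: Conversation = {
  leftName: 'My Boss',   // short relationship label (2-3 words)
  rightName: 'Me',       // always the viewer's side
  hookText: 'POV: your boss texts you at 2am',
  messages: [
    { sender: 'left', text: 'are you awake' },
    { sender: 'right', text: 'It is 2am. What happened?' },
  ],
};
```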

### ConversationMessage

```typescript
interface ConversationMessage {
  sender: 'left' | 'right';
  text: string;
}
```
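
Since outputs are validated against a Zod schema, that schema presumably mirrors these two interfaces. A minimal sketch, where any constraints beyond the interface shapes (minimum lengths, the `"Me"` literal) are assumptions drawn from the conventions above:

```typescript
import { z } from 'zod';

const conversationMessageSchema = z.object({
  sender: z.enum(['left', 'right']),
  text: z.string().min(1),
});

const conversationSchema = z.object({
  leftName: z.string().min(1),     // e.g. "My Boss"
  rightName: z.literal('Me'),      // always the viewer
  hookText: z.string().optional(), // TikTok hook line
  messages: z.array(conversationMessageSchema).min(1),
});

// The interfaces above could also be derived from the schema instead of duplicated:
type Conversation = z.infer<typeof conversationSchema>;
```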

## Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API base URL |
| `LLM_MODEL` | `qwen3.5:9b` | Ollama model name (colon format) |
| `LLM_MAX_TOKENS` | `8192` | Max output tokens (`num_predict`) |
| `LLM_TEMPERATURE` | `0.8` | Sampling temperature |
| `LLM_TIMEOUT_MS` | `600000` | Request timeout (10 min) |
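
These variables might be read along the following lines (a sketch; the config object shape is illustrative):

```typescript
// Illustrative config loader applying the defaults from the table above.
const config = {
  baseUrl: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
  model: process.env.LLM_MODEL ?? 'qwen3.5:9b',
  maxTokens: Number(process.env.LLM_MAX_TOKENS ?? 8192),    // sent as num_predict
  temperature: Number(process.env.LLM_TEMPERATURE ?? 0.8),
  timeoutMs: Number(process.env.LLM_TIMEOUT_MS ?? 600_000), // 10 minutes
};
```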

## LLM Setup

The module calls Ollama directly — no proxy layer needed.

1. Install Ollama
2. Pull the model: `ollama pull qwen3.5:9b`
3. Ollama runs on `http://localhost:11434` by default
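
To sanity-check the setup from code, one option is to query Ollama's `/api/tags` endpoint, which lists the models that have been pulled (a sketch; the check itself is illustrative):

```typescript
// Verify Ollama is reachable and the configured model has been pulled.
async function checkOllama(baseUrl = 'http://localhost:11434', model = 'qwen3.5:9b'): Promise<void> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable at ${baseUrl}`);
  const { models } = (await res.json()) as { models: { name: string }[] };
  if (!models.some((m) => m.name === model)) {
    throw new Error(`Model ${model} not found; run: ollama pull ${model}`);
  }
}
```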

::: tip
The module uses Ollama's native `format` parameter for constrained JSON generation. This provides token-level grammar enforcement, guaranteeing valid JSON structure without relying on prompt engineering alone.
:::

::: warning
Qwen3.5 with verbose system prompts may route all of its output to the `thinking` field and leave `content` empty. The service handles this automatically via the thinking-field fallback and retry, but keep the system prompt concise if you modify it.
:::
