Completion & Chat
Completion
llm/complete
Send a single prompt string and get a completion back.
;; Simple completion
(llm/complete "Say hello in 5 words" {:max-tokens 50})
With options:
(llm/complete "Explain monads"
  {:model "claude-haiku-4-5-20251001"
   :max-tokens 200
   :temperature 0.3
   :system "You are a Haskell expert."})
llm/stream
Stream a completion, printing chunks as they arrive.
(llm/stream "Tell me a story" {:max-tokens 200})
With a callback function:
(llm/stream "Tell me a story"
  (fn (chunk) (display chunk))
  {:max-tokens 200})
llm/stream returns the full accumulated response string once streaming finishes, so you can show the live stream and keep the final text:
(define story
  (llm/stream "Tell me a story" (fn (c) (display c)) {:max-tokens 200}))
;; `story` is the complete text after the stream ends.
Chat
llm/chat
Send a list of messages and get a response. Supports system, user, and assistant messages.
(llm/chat
  (list (message :system "You are a helpful assistant.")
        (message :user "What is Lisp? One sentence."))
  {:max-tokens 100})
When you pass :tools, llm/chat runs the tool-execution loop for you (see Tools & Agents). Two options bound it: :tool-mode :none lets the model see the tools but never auto-executes them, and :max-tool-rounds N caps the loop (default 10).
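To make the two bounds concrete, here is a rough sketch of a bounded tool-execution loop. It is written in Python for illustration only and is not Sema's implementation. The names `run_tool_loop`, `fake_model`, the `"auto"` default mode, and the reply shape are all hypothetical, and what Sema actually returns when the round cap is hit is an assumption here.

```python
def run_tool_loop(model_step, tools, tool_mode="auto", max_tool_rounds=10):
    """Illustrative bounded tool loop (hypothetical, not Sema's code).

    model_step(history) returns either {"text": ...} for a final answer
    or {"tool": name, "args": {...}} to request a tool call.
    """
    history = []
    for _ in range(max_tool_rounds):
        reply = model_step(history)
        # A final answer, or tool mode "none": hand the reply back untouched,
        # so a requested tool call is visible but never executed.
        if "tool" not in reply or tool_mode == "none":
            return reply
        result = tools[reply["tool"]](**reply["args"])
        history.append({"tool": reply["tool"], "result": result})
    # Assumption: once the cap is reached, stop instead of looping forever.
    return {"text": None, "stopped": "max-tool-rounds"}


def fake_model(history):
    """Stand-in model: asks for the `add` tool twice, then answers."""
    if len(history) < 2:
        return {"tool": "add", "args": {"a": len(history), "b": 1}}
    return {"text": "done after %d tool calls" % len(history)}


TOOLS = {"add": lambda a, b: a + b}
```

With the default cap the stand-in model finishes after two tool calls. With `max_tool_rounds=1` the loop stops early, and with `tool_mode="none"` the first tool request comes back unexecuted.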
Multi-Modal Chat
Send messages that include images alongside text using message/with-image.
;; Load an image and ask the LLM about it
(define img (file/read-bytes "photo.jpg"))
(define msg (message/with-image :user "Describe this image." img))
(llm/chat (list msg))
Combine with regular messages:
(llm/chat
  (list (message :system "You are an image analyst.")
        (message/with-image :user "What text is in this image?" (file/read-bytes "doc.png"))))
The image must be a bytevector. Media type (PNG, JPEG, GIF, WebP, PDF) is detected automatically from magic bytes. See Vision Extraction for structured data extraction from images.
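For readers unfamiliar with magic-byte detection, the idea can be sketched as below. This is a Python illustration using the standard, publicly documented file signatures for the five formats listed. It is not Sema's actual detection code, and the function name `detect_media_type` is hypothetical.

```python
from typing import Optional


def detect_media_type(data: bytes) -> Optional[str]:
    """Guess a media type from the leading 'magic' bytes of a file.

    Hypothetical helper for illustration; returns None when no
    known signature matches.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    return None
```

Because detection looks at content rather than the file name, a PNG saved as `photo.jpg` would still be identified as PNG under this scheme.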
llm/send
Send a prompt value (composed from prompt expressions) to the LLM.
(define review-prompt
  (prompt
    (system "You are a code reviewer. Be concise.")
    (user "Review this function.")))
(llm/send review-prompt {:max-tokens 200})
Options
All completion and chat functions accept an options map with these keys:
| Key | Description |
|---|---|
| :model | Model name (e.g. "claude-haiku-4-5-20251001") |
| :max-tokens | Maximum tokens in response |
| :temperature | Sampling temperature (0.0–1.0) |
| :system | System prompt (for llm/complete) |
| :reasoning-effort | Reasoning effort for thinking models (see below) |
| :tools | List of tool values (see Tools & Agents) |
| :timeout | Per-call HTTP timeout in milliseconds (network providers; non-streaming) |
| :tags / :metadata | Observability tags/metadata (see Backend Compatibility) |
Reasoning effort
:reasoning-effort controls how much a reasoning/thinking model deliberates before answering. It takes a keyword or string: :minimal, :low, :medium, :high, :none, or :xhigh. It is a single portable option — Sema maps it to each provider's native control, so the same code works everywhere:
(llm/complete "Prove that sqrt(2) is irrational."
  {:model "gpt-5.4-mini" :reasoning-effort :high :max-tokens 4000})
| Provider | Mapped to |
|---|---|
| OpenAI | native reasoning_effort (gpt-5 / o-series) |
| Anthropic | extended thinking — effort sets the thinking budget_tokens (and raises max_tokens above it; temperature is forced to default while thinking) |
| Gemini | thinkingConfig.thinkingBudget (:none/:minimal disable thinking) |
Models and providers that don't support reasoning effort ignore the option (no-op). It is also accepted by llm/chat and per-run on agent/run ({:reasoning-effort :high}).