LLM Primitives

Sema's differentiating feature: LLM operations are first-class language primitives with prompts, conversations, tools, and agents as native data types.

Setup

Set one or more API keys as environment variables:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
# or any other supported provider
```

Sema auto-detects and configures all available providers on startup. Use --no-llm to skip auto-configuration.
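
As a rough illustration of what environment-based auto-detection can look like, here is a minimal Python sketch. It is not Sema's implementation: the function `detect_providers`, the `PROVIDER_ENV_VARS` table, and the `no_llm` parameter are hypothetical names, and only the two environment variable names and the `--no-llm` flag come from the text above.

```python
import os

# Hypothetical lookup table: provider name -> environment variable.
# Only these two variable names appear in the setup instructions.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def detect_providers(env=None, no_llm=False):
    """Return the providers whose API key is set to a non-empty value.

    `no_llm` stands in for a flag such as --no-llm: when true, detection
    is skipped entirely and no provider is configured.
    """
    if no_llm:
        return []
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.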

See Provider Management for the full list of supported providers and configuration options.

Features

Completion & Chat

Simple completions, multi-message chat, and streaming responses.

Prompts & Messages

Prompts as composable s-expressions, message construction, and prompt inspection.

Conversations

Persistent, immutable conversation state with automatic LLM round-trips.
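
To make "immutable conversation state" concrete, the following Python sketch shows the general pattern: each turn returns a new conversation value and leaves the old one untouched. The `Message` and `Conversation` classes, the `say` method, and the `complete` callback are hypothetical names for illustration and are not Sema's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    content: str


@dataclass(frozen=True)
class Conversation:
    # A tuple keeps the history itself immutable.
    messages: tuple = ()

    def add(self, role, content):
        """Return a new conversation with one more message appended."""
        return Conversation(self.messages + (Message(role, content),))

    def say(self, text, complete):
        """Append a user turn, call the model, and append its reply.

        `complete` is any callable that takes the message history and
        returns the assistant's reply as a string (assumed interface).
        """
        with_user = self.add("user", text)
        reply = complete(with_user.messages)
        return with_user.add("assistant", reply)
```

Because earlier values are never modified, a conversation can be branched from any point simply by calling `say` twice on the same value.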

Tools & Agents

Define tools the LLM can invoke, and build agents with system prompts, tools, and multi-turn loops.

Embeddings & Similarity

Generate embeddings (as bytevectors), compute cosine similarity, and access embedding dimensions.

Structured Extraction

Extract structured data from text and images, classify inputs, and work with multi-modal content.

Vector Store & Math

In-memory vector store for semantic search, plus vector math utilities (cosine similarity, dot product, normalize, distance).
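
The vector math named above follows standard definitions. The Python sketch below shows them on plain lists of floats, plus a tiny in-memory nearest-neighbour lookup; the function names and the dictionary-based `search` are illustrative assumptions and do not reflect Sema's function names or storage format.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in a]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (na * nb)


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(store, query, k=1):
    """Return the keys of the k stored vectors most similar to `query`.

    `store` is a plain dict mapping a key to its vector.
    """
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(item[1], query),
        reverse=True,
    )
    return [key for key, _ in ranked[:k]]
```

A linear scan like this is adequate for small in-memory collections; larger collections usually call for an approximate index.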

Caching

In-memory LLM response caching for iterative development and deduplication.
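
One common way to build an in-memory response cache is to key each response on a hash of the prompt and its parameters. The Python sketch below shows that idea under stated assumptions: `ResponseCache`, its `complete` method, and the wrapped `call` function are hypothetical, and the text above does not say how Sema derives its cache keys.

```python
import hashlib
import json


class ResponseCache:
    """Wrap a completion function and reuse results for identical requests."""

    def __init__(self, call):
        self._call = call      # underlying completion function (assumed)
        self._store = {}       # cache key -> response
        self.hits = 0
        self.misses = 0

    def _key(self, prompt, params):
        # Parameters must be JSON-serializable; sort_keys makes the key
        # independent of keyword-argument order.
        payload = json.dumps([prompt, params], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, prompt, **params):
        key = self._key(prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call(prompt, **params)
        self._store[key] = response
        return response
```

Including the parameters in the key means the same prompt sent to a different model, or with a different temperature, is treated as a separate request.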

Cassettes (Record & Replay)

Record real LLM/agent responses to a file once, then replay them deterministically for keyless, offline tests and reproducible demos.
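
The record-and-replay pattern can be sketched in a few lines of Python. This is an illustration of the general technique only: the `Cassette` class, its JSON file layout, and the prompt-as-key lookup are assumptions, not Sema's cassette format.

```python
import json
import os


class Cassette:
    """Record responses to a JSON file, then replay them without a provider."""

    def __init__(self, path, call=None):
        self.path = path
        self.call = call       # real completion function; None means replay-only
        self.entries = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.entries = json.load(f)

    def complete(self, prompt):
        # Replay a recorded response if one exists for this prompt.
        if prompt in self.entries:
            return self.entries[prompt]
        if self.call is None:
            raise KeyError(f"no recorded response for prompt: {prompt!r}")
        # Otherwise call the real provider once and persist the result.
        response = self.call(prompt)
        self.entries[prompt] = response
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f, indent=2)
        return response
```

In replay-only mode no API key or network access is needed, and an unrecorded prompt fails loudly instead of silently reaching a live provider.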

Resilience & Retry

Fallback provider chains, rate limiting, generic retry with exponential backoff, and convenience functions (llm/summarize, llm/compare).
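
Retry with exponential backoff and a fallback chain are both small, general patterns. The Python sketch below shows one way to write each; the function names, default delays, and the callable-per-provider interface are hypothetical and are not taken from Sema.

```python
import time


def retry(fn, attempts=3, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call fn until it succeeds, waiting base_delay * factor**n between tries.

    The final failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * factor ** attempt)


def with_fallback(providers, prompt):
    """Try each provider in order and return the first successful response.

    `providers` is a list of callables that take a prompt (assumed interface).
    """
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

Injecting `sleep` as a parameter lets tests verify the delay schedule without actually waiting. Production code would typically also add jitter and retry only on errors known to be transient.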

Provider Management

Auto-configuration, runtime provider switching, custom providers, and OpenAI-compatible endpoints.

Cost Tracking & Budgets

Usage tracking, budget enforcement, and batch/parallel operations.

Observability (OpenTelemetry)

Built-in, standards-compliant OpenTelemetry tracing and metrics for every LLM and agent run, with no manual instrumentation. Each completion and tool call is traced automatically (invoke_agent → chat → execute_tool) with tokens, cost, and latency, and can be exported to any OTLP backend or to a JSONL file. Off by default, and zero-cost when off.

  • Tracing & Metrics: the GenAI spans and metrics, sessions, privacy controls, and embedding Sema in your own app.
  • Backend Compatibility: use SEMA_OTEL_COMPAT to also label the data with the attribute names that some tools expect (Arize Phoenix, Langfuse, Traceloop, LangSmith), so those tools can read it too. Most other tools work with no extra setup.