---
url: 'https://sema-lang.com/docs/stdlib.md'
---

# Standard Library

Sema ships with a **comprehensive standard library** of built-in functions across many modules, covering everything from string manipulation and file I/O to HTTP requests, regex, and cryptographic hashing.

## Naming Conventions

Sema's stdlib follows consistent naming patterns:

| Pattern           | Convention            | Example                                |
| ----------------- | --------------------- | -------------------------------------- |
| `module/function` | Slash-namespaced      | `string/trim`, `file/read`, `math/gcd` |
| `legacy-name`     | Scheme compat aliases | `string-append` → `string/append`      |
| `type->type`      | Arrow conversions     | `vector->list`, `list->vector`         |
| `predicate?`      | Predicate suffix      | `null?`, `list?`, `even?`              |

### Naming aliases

Several functions are registered under both a legacy (Scheme-style) name and a canonical slash-namespaced or `predicate?` name (Decision #24). Both forms are kept for backward compatibility; new code should prefer the canonical form on the right.

| Legacy name          | Canonical alias                                    |
| -------------------- | -------------------------------------------------- |
| `any`                | `any?`                                             |
| `every`              | `every?`                                           |
| `time-ms`            | `time/now-ms`                                      |
| `hash-map`           | `map/new`                                          |
| `promise-forced?`    | `async/forced?`                                    |
| `tools->routes`      | `route/from-tools`                                 |
| `make-bytevector`    | `bytevector/make` (also `bytevector/new`)          |
| `bytevector-length`  | `bytevector/length`                                |
| `bytevector-u8-ref`  | `bytevector/u8-ref` (also `bytevector/ref`)        |
| `bytevector-u8-set!` | `bytevector/u8-set!` (also `bytevector/set!`)      |
| `bytevector-copy`    | `bytevector/copy`                                  |
| `bytevector-append`  | `bytevector/append`                                |
| `bytevector->list`   | `bytevector/to-list`                               |
| `list->bytevector`   | `bytevector/from-list` (also `list/to-bytevector`) |

Predicates (`bytevector?` etc.) and the bare `bytevector` varargs constructor keep their short canonical names; predicates always stay un-namespaced.
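
As a small illustration of the alias rule, the sketch below calls one function under both of its names. It assumes, following the Naming Conventions table, that `string-append` is registered as the legacy alias of `string/append` and that both names resolve to the same function. The result shown is the one documented for `string/append`.

```sema
;; Canonical, slash-namespaced form (preferred in new code)
(string/append "a" "b" "c")   ; => "abc"

;; Legacy Scheme-style alias; assumed to behave identically
(string-append "a" "b" "c")
```

Using the canonical form throughout a codebase keeps call sites searchable by module prefix, while the legacy names let existing Scheme-style code run unchanged.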
## Quick Reference ### [Math & Arithmetic](./math) | Function | Description | | ------------------------------------------------------------------------------ | ------------------------- | | `+`, `-`, `*`, `/`, `mod` | Basic arithmetic | | `<`, `>`, `<=`, `>=`, `=` | Comparison | | `abs`, `min`, `max`, `pow`, `sqrt`, `log` | Numeric utilities | | `floor`, `ceil`, `round`, `truncate` | Rounding | | `sin`, `cos`, `math/tan` | Trigonometry | | `math/asin`, `math/acos`, `math/atan`, `math/atan2` | Inverse trig | | `math/sinh`, `math/cosh`, `math/tanh` | Hyperbolic | | `math/exp`, `math/log10`, `math/log2` | Exponential & logarithmic | | `math/gcd`, `math/lcm`, `math/quotient`, `math/remainder` | Integer math | | `math/random`, `math/random-int` | Random numbers | | `math/clamp`, `math/sign`, `math/lerp`, `math/map-range` | Interpolation & clamping | | `math/degrees->radians`, `math/radians->degrees` | Angle conversion | | `even?`, `odd?`, `positive?`, `negative?`, `zero?` | Numeric predicates | | `math/nan?`, `math/infinite?` | Float predicates | | `pi`, `e`, `math/infinity`, `math/nan` | Constants | | `bit/and`, `bit/or`, `bit/xor`, `bit/not`, `bit/shift-left`, `bit/shift-right` | Bitwise operations | ### [Strings & Characters](./strings) | Function | Description | | ----------------------------------------------------------------------------------- | ------------------------- | | `string/append`, `string/length`, `string/ref`, `string/slice` | Core string ops | | `str`, `format` | Conversion & formatting | | `string/split`, `string/join`, `string/trim` | Split, join, trim | | `string/upper`, `string/lower`, `string/capitalize`, `string/title-case` | Case conversion | | `string/contains?`, `string/starts-with?`, `string/ends-with?` | Search predicates | | `string/replace`, `string/index-of`, `string/last-index-of`, `string/reverse` | Manipulation | | `string/chars`, `string/repeat`, `string/pad-left`, `string/pad-right` | Utilities | | `string/map`, `string/number?`, 
`string/empty?` | Higher-order & predicates | | `string/after`, `string/before`, `string/between`, `string/take` | Slicing & extraction | | `string/chop-start`, `string/chop-end`, `string/ensure-start`, `string/ensure-end` | Prefix & suffix | | `string/wrap`, `string/unwrap`, `string/remove` | Wrapping & removal | | `string/replace-first`, `string/replace-last` | Targeted replacement | | `string/snake-case`, `string/kebab-case`, `string/camel-case`, `string/pascal-case` | Case conversion | | `string/headline`, `string/words` | Headline & word splitting | | `char/to-integer`, `integer/to-char`, `char/alphabetic?`, ... | Character operations | | `string/to-number`, `number/to-string`, `string/to-symbol`, ... | Type conversions | ### [Lists](./lists) | Function | Description | | ----------------------------------------------------------------------- | ----------------------------- | | `list`, `cons`, `car`, `cdr`, `first`, `rest` | Construction & access | | `cadr`, `caddr`, `last`, `nth` | Positional access | | `length`, `append`, `reverse`, `range` | Basic operations | | `map`, `filter`, `foldl`, `foldr`, `reduce`, `flat-map` | Higher-order functions | | `sort`, `sort-by`, `apply`, `for-each` | Ordering & application | | `take`, `drop`, `flatten`, `flatten-deep`, `zip`, `partition` | Sublists | | `member`, `any`, `every`, `list/index-of`, `list/unique`, `list/dedupe` | Searching | | `list/group-by`, `list/interleave`, `list/chunk`, `frequencies` | Grouping | | `list/sum`, `list/min`, `list/max` | Aggregation | | `list/shuffle`, `list/pick` | Random | | `list/repeat`, `make-list`, `iota` | Construction | | `list/split-at`, `list/take-while`, `list/drop-while` | Splitting | | `assoc`, `assq`, `assv` | Association lists | | `interpose` | Interleaving | | `list/reject`, `list/find`, `list/sole` | Filtering & searching | | `list/pluck`, `list/key-by` | Map extraction | | `list/avg`, `list/median`, `list/mode` | Statistics | | `list/diff`, `list/intersect`, 
`list/duplicates` | Set operations | | `list/sliding`, `list/page`, `list/cross-join` | Windowing & pagination | | `list/pad`, `list/join`, `list/times` | Padding, joining & generation | | `tap` | Utility | ### [Vectors](./vectors) | Function | Description | | ------------------------------ | --------------- | | `vector` | Create a vector | | `vector->list`, `list->vector` | Conversion | ### [Maps & HashMaps](./maps) | Function | Description | | -------------------------------------------------- | ---------------------------- | | `map/new`, `get`, `assoc`, `dissoc`, `merge` | Core map ops | | `keys`, `vals`, `contains?`, `count` | Inspection | | `map/entries`, `map/from-entries` | Entry conversion | | `map/map-vals`, `map/map-keys`, `map/filter` | Higher-order | | `map/select-keys`, `map/update` | Selection & update | | `map/sort-keys`, `map/except`, `map/zip` | Sorting, exclusion & zipping | | `hashmap/new`, `hashmap/get`, `hashmap/assoc`, ... | HashMap operations | ### [Predicates & Type Checking](./predicates) | Function | Description | | ----------------------------------------------------------------- | --------------------- | | `null?`, `nil?`, `empty?`, `list?`, `pair?` | Collection predicates | | `number?`, `integer?`, `float?`, `string?`, `symbol?`, `keyword?` | Type predicates | | `char?`, `record?`, `bytevector?`, `bool?`, `fn?` | More type predicates | | `map?`, `vector?` | Container predicates | | `promise?`, `promise-forced?` | Promise predicates | | `eq?`, `=`, `zero?`, `even?`, `odd?`, `positive?`, `negative?` | Equality & numeric | | `prompt?`, `message?`, `conversation?`, `tool?`, `agent?` | LLM type predicates | ### [File I/O & Paths](./file-io) | Function | Description | | ------------------------------------------------------------------------------------------------------------- | ---------------------------- | | `display`, `println`, `pprint`, `print`, `io/print-error`, `io/println-error`, `newline`, `io/read-line`, `io/read-stdin`, 
`io/eof?`, `io/flush` | Console I/O | | `file/read`, `file/write`, `file/append` | File read/write | | `file/read-bytes`, `file/write-bytes` | Binary file I/O | | `file/read-lines`, `file/write-lines` | Line-based I/O | | `file/for-each-line`, `file/fold-lines` | Streaming line I/O | | `file/delete`, `file/rename`, `file/copy` | File management | | `file/exists?`, `file/is-file?`, `file/is-directory?`, `file/is-symlink?` | File predicates | | `file/list`, `file/mkdir`, `file/info` | Directory operations | | `file/glob` | File globbing | | `path/join`, `path/dirname`, `path/basename`, `path/extension`, `path/absolute` | Path manipulation | | `path/ext`, `path/stem`, `path/dir`, `path/filename`, `path/absolute?` | Path predicates & components | ### [PDF Processing](./pdf) | Function | Description | | ------------------------ | ----------------------------------------------------- | | `pdf/extract-text` | Extract all text from a PDF | | `pdf/extract-text-pages` | Extract text per page (returns list) | | `pdf/page-count` | Get number of pages | | `pdf/metadata` | Get metadata map (`:title`, `:author`, `:pages`, ...) 
| ### [HTTP & JSON](./http-json) | Function | Description | | ------------------------------------------------------------------ | ------------------ | | `http/get`, `http/post`, `http/put`, `http/delete`, `http/request` | HTTP methods | | `json/encode`, `json/encode-pretty`, `json/decode` | JSON serialization | ### [Web Server](./web-server) | Function | Description | | -------------------------------------------------------------------------------- | ---------------------- | | `http/serve` | Start an HTTP server | | `http/router` | Data-driven routing | | `http/ok`, `http/created`, `http/no-content`, `http/not-found`, `http/error` | JSON response helpers | | `http/redirect` | HTTP redirect | | `http/html`, `http/text` | Content-type responses | | `http/file` | Serve a file from disk | | `http/stream` | SSE streaming | | `http/websocket` | WebSocket connections | ### [Regex](./regex) | Function | Description | | --------------------------------------------------- | ----------------------- | | `regex/match?`, `regex/match`, `regex/find-all` | Matching | | `regex/replace`, `regex/replace-all`, `regex/split` | Replacement & splitting | ### [CSV, Crypto & Encoding](./csv) | Function | Description | | --------------------------------------------- | --------------- | | `csv/parse`, `csv/parse-maps`, `csv/encode` | CSV operations | | `uuid/v4` | UUID generation | | `base64/encode`, `base64/decode` | Base64 encoding | | `base64/encode-bytes`, `base64/decode-bytes` | Binary Base64 | | `hash/sha256`, `hash/md5`, `hash/hmac-sha256` | Hashing | ### [Date & Time](./datetime) | Function | Description | | --------------------------- | -------------------- | | `time/now`, `time-ms` | Current time | | `time/format`, `time/parse` | Formatting & parsing | | `time/date-parts` | Date decomposition | | `time/add`, `time/diff` | Arithmetic | | `sleep` | Delay execution | ### [System](./system) | Function | Description | | ----------------------------------------------------------- | 
--------------------- | | `env`, `sys/env-all`, `sys/set-env` | Environment variables | | `sys/args`, `sys/cwd`, `sys/platform`, `sys/os`, `sys/arch` | System info | | `sys/pid`, `sys/tty`, `sys/which`, `sys/elapsed` | Process info | | `sys/interactive?`, `sys/hostname`, `sys/user` | Session info | | `sys/home-dir`, `sys/temp-dir` | Directory paths | | `sys/term-size` | Terminal size (Unix) | | `sys/on-signal`, `sys/check-signals` | Signal hooks (Unix) | | `shell` | Run shell commands | | `exit` | Exit process | ### [Serial Ports](./serial) | Function | Description | | ---------------------------------------------------------- | ---------------------------------------- | | `serial/list` | List available device paths | | `serial/open`, `serial/close` | Open/close a port (returns int handle) | | `serial/write`, `serial/read-line` | Raw I/O | | `serial/send` | Write line + read JSON response | ### [Bytevectors](./bytevectors) | Function | Description | | -------------------------------------------------------------- | ----------------- | | `bytevector`, `bytevector/new` | Construction | | `bytevector/length`, `bytevector/ref`, `bytevector/set!` | Access & mutation | | `bytevector/copy`, `bytevector/append` | Copy & append | | `bytevector/to-list`, `list/to-bytevector` | List conversion | | `utf8/to-string`, `string/to-utf8` | String conversion | ### [Streams](./streams) | Function | Description | | --------------------------------------------------------------------- | ------------------------ | | `stream/from-string`, `stream/from-bytes`, `stream/byte-buffer` | In-memory streams | | `stream/open-input`, `stream/open-output` | File streams | | `stream/read`, `stream/read-byte`, `stream/read-line`, `stream/read-all` | Reading | | `stream/write`, `stream/write-byte`, `stream/write-string` | Writing | | `stream/close`, `stream/flush`, `stream/copy` | Control | | `stream?`, `stream/readable?`, `stream/writable?`, `stream/available?` | Predicates | | `stream/type`, 
`stream/to-bytes`, `stream/to-string` | Introspection & extraction | | `*stdin*`, `*stdout*`, `*stderr*` | Standard I/O globals | | `with-stream` | Resource management macro | ### [Concurrency](./concurrency) | Function | Description | | --------------------------------------------------------------------- | ------------------------ | | `async/spawn`, `async/await`, `async/all`, `async/race` | Async task management | | `async/resolved`, `async/rejected` | Pre-settled promises | | `async/run`, `async/sleep`, `async/timeout` | Scheduler control & deadlines | | `async/cancel`, `async/cancelled?` | Cancellation | | `async/promise?`, `async/resolved?`, `async/rejected?`, `async/pending?` | Promise predicates | | `channel/new`, `channel/send`, `channel/recv`, `channel/try-recv` | Channel operations | | `channel/close` | Channel lifecycle | | `channel?`, `channel/closed?`, `channel/empty?`, `channel/full?`, `channel/count` | Channel predicates | ### [Records](./records) | Function | Description | | -------------------- | -------------------- | | `define-record-type` | Define a record type | | `record?` | Record predicate | | `type` | Get record type tag | ### [Terminal Styling](./terminal) | Function | Description | | ---------------------------------------------------------------- | ----------------------------------- | | `term/bold`, `term/red`, `term/green`, ... 
| Individual style functions | | `term/style` | Apply multiple styles with keywords | | `term/rgb` | 24-bit true color | | `term/strip` | Remove ANSI escape codes | | `term/spinner-start`, `term/spinner-stop`, `term/spinner-update` | Animated spinners | | `io/tty-raw!`, `io/tty-restore!` | Raw-mode TTY (Unix) | | `io/read-key`, `io/read-key-timeout` | Per-keystroke input (Unix) | ### [Text Processing](./text-processing) | Function | Description | | ------------------------------------------------------------------------- | ------------------------------------------ | | `text/chunk`, `text/chunk-by-separator`, `text/split-sentences` | Text chunking | | `text/clean-whitespace`, `text/strip-html` | Text cleaning | | `text/truncate`, `text/word-count`, `text/trim-indent` | Text utilities | | `text/excerpt`, `text/normalize-newlines` | Excerpt extraction & newline normalization | | `prompt/template`, `prompt/render` | Prompt templates | | `document/create`, `document/text`, `document/metadata`, `document/chunk` | Document metadata | ### [SQLite](./sqlite) | Function | Description | | -------------------------------- | ------------------------------ | | `db/open`, `db/open-memory` | Open file or in-memory database | | `db/exec`, `db/exec-batch` | Execute statements | | `db/query`, `db/query-one` | Query rows as maps | | `db/last-insert-id` | Last inserted rowid | | `db/tables` | List tables | | `db/close` | Close connection | ### [Typed Arrays](./typed-arrays) | Function | Description | | --------------------------------------------------------- | ---------------------- | | `f64-array`, `i64-array` | Create from values | | `f64-array/make`, `i64-array/make` | Create with fill | | `f64-array/range`, `i64-array/range` | Create from range | | `f64-array/from-list`, `i64-array/from-list` | Convert from list | | `f64-array/ref`, `i64-array/ref` | Index access | | `f64-array/set!`, `i64-array/set!` | Set element (CoW) | | `f64-array/length`, `i64-array/length` | Length | | 
`f64-array/sum`, `i64-array/sum` | Fast sum | | `f64-array/dot` | Dot product | | `f64-array/map`, `i64-array/map` | Map over elements | | `f64-array/fold`, `i64-array/fold` | Fold over elements | | `f64-array?`, `i64-array?` | Type predicates | ### [Context](./context) | Function | Description | | ----------------------------------------------------------------- | ---------------------------------------- | | `context/set`, `context/get`, `context/has?` | Core key-value context | | `context/remove`, `context/pull`, `context/all` | Retrieval & cleanup | | `context/merge`, `context/clear` | Bulk operations | | `context/with` | Scoped overrides (auto-restores on exit) | | `context/push`, `context/stack`, `context/pop` | Named stacks | | `context/set-hidden`, `context/get-hidden`, `context/has-hidden?` | Hidden (non-logged) context | ### [Key-Value Store](./kv-store) | Function | Description | | ------------------------------- | ------------------------------ | | `kv/open`, `kv/close` | Open/close a JSON-backed store | | `kv/get`, `kv/set`, `kv/delete` | CRUD operations | | `kv/keys` | List all keys | ### [TOML](./toml) | Function | Description | | ------------------------------- | -------------------- | | `toml/decode` | Decode TOML to Sema | | `toml/encode` | Encode Sema to TOML | ### [Playground & WASM](./playground) | Function | Description | | --------------------- | ------------------------------------------------------ | | `web/user-agent` | Browser user agent string (WASM only) | | `web/user-agent-data` | Structured browser info map (Chromium only, WASM only) | --- --- url: 'https://sema-lang.com/docs/stdlib/math.md' --- # Math & Arithmetic ## Domain & error policy Sema's numeric error behavior follows one rule, split by type: * **Integer division or modulo by zero raises an error.** `(/ 1 0)`, `(modulo 7 0)`, and `(mod 7 0)` all raise (`division by zero` / `modulo by zero`). Integers have no infinity or NaN to return, so the failure surfaces where it happens. 
* **Floating-point follows IEEE 754** — overflow and undefined real-domain results return `inf`, `-inf`, or `NaN` instead of raising: ```sema (/ 1.0 0) ; => inf (/ -1.0 0) ; => -inf (/ 0.0 0.0) ; => NaN (sqrt -1) ; => NaN (log 0) ; => -inf (log -1) ; => NaN (pow 0 0) ; => 1 (pow 2 -1) ; => 0.5 ``` This matches the hardware and mainstream numeric languages, so `NaN` propagates and `inf` accumulates rather than forcing error handling around every operation. If you need to reject these, test with `math/nan?` / `math/infinite?` explicitly. > **Integer overflow wraps** (two's-complement) — Sema does not yet have arbitrary-precision integers, so e.g. `(+ 9223372036854775807 1)` wraps to a negative number rather than raising or promoting. (See ADR #64.) ## Basic Arithmetic ### `+` Add numbers together. Accepts any number of arguments. ```sema (+ 1 2 3) ; => 6 (+ 10) ; => 10 (+) ; => 0 ``` ### `-` Subtract numbers. With one argument, negates. With multiple, subtracts left to right. ```sema (- 10 3) ; => 7 (- 10 3 2) ; => 5 (- 5) ; => -5 ``` ### `*` Multiply numbers together. ```sema (* 4 5) ; => 20 (* 2 3 4) ; => 24 (*) ; => 1 ``` ### `/` Divide numbers. Returns a float when the division is not exact (so `(/ 10 3)` is `3.3333...`, not `3`). For truncated integer division use [`math/quotient`](#math-quotient). ```sema (/ 10 2) ;; => 5 (/ 10 3) ;; => 3.3333333333333335 (/ 10.0 3) ;; => 3.3333333333333335 ``` ### `mod` Modulo (remainder after division). ```sema (mod 10 3) ; => 1 (mod 7 2) ; => 1 ``` ## Comparison ### `<` Less than. Supports chaining. ```sema (< 1 2) ; => #t (< 1 2 3) ; => #t (< 3 2) ; => #f ``` ### `>` Greater than. ```sema (> 3 2) ; => #t (> 1 2) ; => #f ``` ### `<=` Less than or equal. ```sema (<= 1 2) ; => #t (<= 2 2) ; => #t ``` ### `>=` Greater than or equal. ```sema (>= 3 2) ; => #t (>= 2 2) ; => #t ``` ### `=` Numeric equality. ```sema (= 1 1) ; => #t (= 1 2) ; => #f ``` ## Numeric Utilities ### `abs` Absolute value. 
```sema (abs -5) ; => 5 (abs 3) ; => 3 (abs -3.14) ; => 3.14 ``` ### `min` Return the smallest of 1 or more numbers (the no-arg case errors). ```sema (min 1 2 3) ;; => 1 (min 5) ;; => 5 (min) ;; error: Arity error: min expects 1+ args, got 0 ``` ### `max` Return the largest of 1 or more numbers (the no-arg case errors). ```sema (max 1 2 3) ;; => 3 (max 5) ;; => 5 (max) ;; error: Arity error: max expects 1+ args, got 0 ``` ### `pow` Raise a number to a power. ```sema (pow 2 10) ; => 1024 (pow 3 3) ; => 27 ``` ### `sqrt` Square root. ```sema (sqrt 16) ; => 4.0 (sqrt 2) ; => 1.4142... ``` ### `log` Natural logarithm. ```sema (log 1) ; => 0.0 (log 100) ; => 4.605... ``` ### `floor` Round down to nearest integer. ```sema (floor 3.7) ; => 3 (floor -2.3) ; => -3 ``` ### `ceil` Round up to nearest integer. ```sema (ceil 3.2) ; => 4 (ceil -2.7) ; => -2 ``` ### `round` Round to nearest integer. ```sema (round 3.5) ; => 4 (round 3.4) ; => 3 ``` ### `math/round-to` Round to `places` decimal places, returning a float (where `round` only rounds to a whole integer). ```sema (math/round-to 3.14159 2) ; => 3.14 (math/round-to 0.46666 3) ; => 0.467 ``` ### `math/format-fixed` Format a number as a fixed-decimal **string**, padding trailing zeros to `places` digits — for money/metrics display where `math/round-to` (a float, which drops trailing zeros) isn't enough. ```sema (math/format-fixed 1.2 3) ; => "1.200" (math/format-fixed 3.14159 2) ; => "3.14" ``` ## Trigonometry ### `sin` Sine (argument in radians). ```sema (sin 0) ; => 0.0 (sin pi) ; => ~0.0 ``` ### `cos` Cosine (argument in radians). ```sema (cos 0) ; => 1.0 (cos pi) ; => -1.0 ``` ### `math/tan` Tangent (argument in radians). ```sema (math/tan 0) ; => 0.0 (math/tan (/ pi 4)); => ~1.0 ``` ### `math/asin` Inverse sine. Returns radians. ```sema (math/asin 1) ; => ~1.5707 (π/2) (math/asin 0) ; => 0.0 ``` ### `math/acos` Inverse cosine. Returns radians. 
```sema
(math/acos 0)  ; => ~1.5707 (π/2)
(math/acos 1)  ; => 0.0
```

### `math/atan`

Inverse tangent. Returns radians.

```sema
(math/atan 1)  ; => ~0.7854 (π/4)
(math/atan 0)  ; => 0.0
```

### `math/atan2`

Two-argument inverse tangent. Returns the angle in radians between the positive x-axis and the point (x, y). As the examples show, the arguments are passed as `y` first, then `x`.

```sema
(math/atan2 1 1)   ; => ~0.7854 (π/4)
(math/atan2 0 -1)  ; => ~3.1416 (π)
```

## Hyperbolic Functions

### `math/sinh`

Hyperbolic sine.

```sema
(math/sinh 0)  ; => 0.0
(math/sinh 1)  ; => 1.1752...
```

### `math/cosh`

Hyperbolic cosine.

```sema
(math/cosh 0)  ; => 1.0
(math/cosh 1)  ; => 1.5430...
```

### `math/tanh`

Hyperbolic tangent.

```sema
(math/tanh 0)  ; => 0.0
(math/tanh 1)  ; => 0.7615...
```

## Exponential & Logarithmic

### `math/exp`

Euler's number raised to a power (e^x).

```sema
(math/exp 1)  ; => 2.71828...
(math/exp 0)  ; => 1.0
```

### `math/log10`

Base-10 logarithm.

```sema
(math/log10 100)   ; => 2.0
(math/log10 1000)  ; => 3.0
```

### `math/log2`

Base-2 logarithm.

```sema
(math/log2 8)     ; => 3.0
(math/log2 1024)  ; => 10.0
```

## Integer Math

### `math/gcd`

Greatest common divisor.

```sema
(math/gcd 12 8)   ; => 4
(math/gcd 15 10)  ; => 5
```

### `math/lcm`

Least common multiple.

```sema
(math/lcm 4 6)  ; => 12
(math/lcm 3 5)  ; => 15
```

### `math/quotient`

Integer quotient (truncated division).

```sema
(math/quotient 10 3)  ; => 3
(math/quotient 7 2)   ; => 3
```

### `math/remainder`

Remainder after truncated division.

```sema
(math/remainder 10 3)  ; => 1
(math/remainder 7 2)   ; => 1
```

## Random Numbers

### `math/random`

Return a random float between 0.0 (inclusive) and 1.0 (exclusive).

```sema
(math/random)  ; => 0.7291... (varies)
```

### `math/random-int`

Return a random integer in a range (inclusive on both ends).

```sema
(math/random-int 1 100)  ; => 42 (varies)
(math/random-int 0 9)    ; => 7 (varies)
```

## Interpolation & Clamping

### `math/clamp`

Clamp a value to a range.
```sema (math/clamp 15 0 10) ; => 10 (math/clamp -5 0 10) ; => 0 (math/clamp 5 0 10) ; => 5 ``` ### `math/sign` Return the sign of a number: -1, 0, or 1. ```sema (math/sign -5) ; => -1 (math/sign 0) ; => 0 (math/sign 42) ; => 1 ``` ### `math/lerp` Linear interpolation between two values. `(math/lerp a b t)` returns `a + (b - a) * t`. ```sema (math/lerp 0 100 0.5) ; => 50.0 (math/lerp 0 100 0.25) ; => 25.0 (math/lerp 10 20 0.0) ; => 10.0 ``` ### `math/map-range` Map a value from one range to another. `(math/map-range value in-min in-max out-min out-max)`. ```sema (math/map-range 5 0 10 0 100) ; => 50.0 (math/map-range 0.5 0 1 0 255) ; => 127.5 ``` ## Angle Conversion ### `math/degrees->radians` Convert degrees to radians. ```sema (math/degrees->radians 180) ; => 3.14159... (math/degrees->radians 90) ; => 1.5707... ``` ### `math/radians->degrees` Convert radians to degrees. ```sema (math/radians->degrees pi) ; => 180.0 (math/radians->degrees 1) ; => 57.295... ``` ## Numeric Predicates ### `even?` Test if an integer is even. ```sema (even? 4) ; => #t (even? 3) ; => #f ``` ### `odd?` Test if an integer is odd. ```sema (odd? 3) ; => #t (odd? 4) ; => #f ``` ### `positive?` Test if a number is positive. ```sema (positive? 1) ; => #t (positive? -1) ; => #f (positive? 0) ; => #f ``` ### `negative?` Test if a number is negative. ```sema (negative? -1) ; => #t (negative? 1) ; => #f ``` ### `zero?` Test if a number is zero. ```sema (zero? 0) ; => #t (zero? 1) ; => #f ``` ### `math/nan?` Test if a value is NaN (not a number). ```sema (math/nan? math/nan) ; => #t (math/nan? 42) ; => #f ``` ### `math/infinite?` Test if a value is infinite. ```sema (math/infinite? math/infinity) ; => #t (math/infinite? 42) ; => #f ``` ## Constants ### `pi` The mathematical constant π (3.14159...). ```sema pi ; => 3.141592653589793 ``` ### `e` Euler's number (2.71828...). ```sema e ; => 2.718281828459045 ``` ### `math/infinity` Positive infinity. 
```sema math/infinity ; => Inf ``` ### `math/nan` Not a number. ```sema math/nan ; => NaN ``` ## Scheme Aliases ### `modulo` Alias for `mod`. ```sema (modulo 10 3) ; => 1 ``` ### `expt` Alias for `pow` (Scheme name for exponentiation). ```sema (expt 2 10) ; => 1024 ``` ### `ceiling` Alias for `ceil`. ```sema (ceiling 3.2) ; => 4 ``` ### `truncate` Truncate toward zero. ```sema (truncate 3.7) ; => 3 (truncate -3.7) ; => -3 ``` ## Bitwise Operations ### `bit/and` Bitwise AND. ```sema (bit/and 5 3) ; => 1 (bit/and 15 9) ; => 9 ``` ### `bit/or` Bitwise OR. ```sema (bit/or 5 3) ; => 7 (bit/or 8 4) ; => 12 ``` ### `bit/xor` Bitwise XOR. ```sema (bit/xor 5 3) ; => 6 ``` ### `bit/not` Bitwise NOT (complement). ```sema (bit/not 5) ; => -6 ``` ### `bit/shift-left` Left bit shift. ```sema (bit/shift-left 1 4) ; => 16 (bit/shift-left 3 2) ; => 12 ``` ### `bit/shift-right` Right bit shift. ```sema (bit/shift-right 16 2) ; => 4 (bit/shift-right 8 1) ; => 4 ``` --- --- url: 'https://sema-lang.com/docs/stdlib/strings.md' --- # Strings & Characters ## Core String Operations ### `string/split` Split a string by a delimiter. ```sema (string/split "a,b,c" ",") ; => ("a" "b" "c") (string/split "hello world" " ") ; => ("hello" "world") ``` ### `string/lines` Split into lines on `\n` or `\r\n` (Clojure `split-lines` semantics). A trailing newline does not produce a final empty line — handy for processing logs, config, or file contents. Use `string/split` when you need a literal separator instead. ```sema (string/lines "a\nb\r\nc\n") ; => ("a" "b" "c") (string/lines "single") ; => ("single") ``` ### `string/join` Join a list of strings with a separator. ```sema (string/join '("a" "b" "c") ", ") ; => "a, b, c" (string/join '("x" "y") "-") ; => "x-y" ``` ### `string/trim` Remove whitespace from both ends. ```sema (string/trim " hello ") ; => "hello" (string/trim "\thello\n") ; => "hello" ``` ### `string/trim-left` Remove whitespace from the left. 
```sema (string/trim-left " hi") ; => "hi" ``` ### `string/trim-right` Remove whitespace from the right. ```sema (string/trim-right "hi ") ; => "hi" ``` ### `string/upper` Convert string to uppercase. ```sema (string/upper "hello") ; => "HELLO" ``` ### `string/lower` Convert string to lowercase. ```sema (string/lower "HELLO") ; => "hello" ``` ### `string/capitalize` Capitalize the first character. ```sema (string/capitalize "hello") ; => "Hello" ``` ### `string/title-case` Capitalize the first character of each word. ```sema (string/title-case "hello world") ; => "Hello World" ``` ### `string/contains?` Test if a string contains a substring. ```sema (string/contains? "hello" "ell") ; => #t (string/contains? "hello" "xyz") ; => #f ``` ### `string/starts-with?` Test if a string starts with a prefix. ```sema (string/starts-with? "hello" "he") ; => #t (string/starts-with? "hello" "lo") ; => #f ``` ### `string/ends-with?` Test if a string ends with a suffix. ```sema (string/ends-with? "hello" "lo") ; => #t (string/ends-with? "hello" "he") ; => #f ``` ### `string/replace` Replace all occurrences of a substring. ```sema (string/replace "hello" "l" "r") ; => "herro" (string/replace "aaa" "a" "b") ; => "bbb" ``` ### `string/index-of` Return the character index of the first occurrence of a substring, or `nil` if not found. ```sema (string/index-of "hello" "ll") ; => 2 (string/index-of "hello" "xyz") ; => nil ``` ### `string/last-index-of` Find the last occurrence of a substring. Returns the character index or `nil` if not found. ```sema (string/last-index-of "abcabc" "abc") ; => 3 (string/last-index-of "hello" "xyz") ; => nil ``` ### `string/chars` Convert a string to a list of characters. ```sema (string/chars "abc") ; => (#\a #\b #\c) ``` ### `string/repeat` Repeat a string N times. ```sema (string/repeat "ab" 3) ; => "ababab" (string/repeat "-" 5) ; => "-----" ``` ### `string/pad-left` Pad a string on the left to a given width. 
```sema
(string/pad-left "42" 5 "0")  ; => "00042"
(string/pad-left "hi" 5)      ; => "   hi"
```

### `string/pad-right`

Pad a string on the right to a given width.

```sema
(string/pad-right "hi" 5)      ; => "hi   "
(string/pad-right "42" 5 "0")  ; => "42000"
```

### `string/number?`

Test if a string represents a valid number.

```sema
(string/number? "42")     ; => #t
(string/number? "3.14")   ; => #t
(string/number? "hello")  ; => #f
```

### `string/empty?`

Test if a string is empty.

```sema
(string/empty? "")       ; => #t
(string/empty? "hello")  ; => #f
```

### `string/map`

Apply a character function to each character in a string, returning a new string.

```sema
(string/map char/upcase "hello")  ; => "HELLO"
```

### `string/reverse`

Reverse a string.

```sema
(string/reverse "hello")  ; => "olleh"
```

## Unicode & Encoding

### `string/byte-length`

Return the UTF-8 byte length of a string (as opposed to the character count from `string/length`). Useful for understanding the actual memory footprint: emoji and CJK characters use more bytes than ASCII.

```sema
(string/byte-length "hello")   ; => 5 (ASCII: 1 byte each)
(string/byte-length "héllo")   ; => 6 (é is 2 bytes in UTF-8)
(string/byte-length "日本語")  ; => 9 (CJK: 3 bytes each)
(string/byte-length "😀")      ; => 4 (emoji: 4 bytes)
```

Compare with `string/length`, which counts characters:

```sema
(string/length "😀")       ; => 1 (one character)
(string/byte-length "😀")  ; => 4 (four bytes)
```

### `string/codepoints`

Return a list of Unicode codepoint integers for each character in a string. This reveals the internal structure of composed characters and emoji sequences.
```sema (string/codepoints "ABC") ; => (65 66 67) (string/codepoints "é") ; => (233) (string/codepoints "😀") ; => (128512) ``` Emoji that appear as a single glyph are often multiple codepoints joined by Zero Width Joiner (U+200D = 8205): ```sema ;; 👨‍👩‍👦 is actually 👨 + ZWJ + 👩 + ZWJ + 👦 (string/codepoints "👨‍👩‍👦") ; => (128104 8205 128105 8205 128102) ;; 👋🏽 is 👋 + skin tone modifier (string/codepoints "👋🏽") ; => (128075 127997) ``` ### `string/from-codepoints` Construct a string from a list of Unicode codepoint integers. This is the inverse of `string/codepoints` and enables building emoji programmatically by combining codepoints. ```sema (string/from-codepoints (list 65 66 67)) ; => "ABC" (string/from-codepoints (list 233)) ; => "é" ``` Build emoji by combining people with ZWJ (8205): ```sema ;; Build a family: 👨 + ZWJ + 👩 + ZWJ + 👧 (string/from-codepoints (list 128104 8205 128105 8205 128103)) ;; => 👨‍👩‍👧 ;; Build a profession: 👩 + ZWJ + 💻 (string/from-codepoints (list 128105 8205 128187)) ;; => 👩‍💻 ;; Add skin tone: 👋 + modifier (string/from-codepoints (list 128075 127997)) ;; => 👋🏽 ;; Build flags from Regional Indicators (A=127462): (string/from-codepoints (list 127475 127476)) ;; => 🇳🇴 (NO = Norway) ``` Roundtrip any string through codepoints: ```sema (string/from-codepoints (string/codepoints "Hello 世界")) ;; => "Hello 世界" ``` ### `string/normalize` Normalize a string to a Unicode normalization form. Supported forms: `:nfc`, `:nfd`, `:nfkc`, `:nfkd` (as keywords or strings). 
* **NFC** — Canonical Decomposition, followed by Canonical Composition (most common) * **NFD** — Canonical Decomposition * **NFKC** — Compatibility Decomposition, followed by Canonical Composition * **NFKD** — Compatibility Decomposition ```sema ;; NFC: combine decomposed characters ;; e + combining acute accent → é (string/normalize "e\u0301" :nfc) ; => "é" ;; NFD: decompose composed characters (string/length (string/normalize "é" :nfd)) ; => 2 (e + combining accent) ;; NFKC/NFKD: compatibility decomposition (ligatures, etc.) (string/normalize "\uFB01" :nfkc) ; => "fi" (fi ligature → two letters) ;; String form names also work (string/normalize "e\u0301" "NFC") ; => "é" ``` ### `string/foldcase` Apply Unicode case folding to a string. Useful for case-insensitive comparisons and normalization. Uses full Unicode-aware lowercasing. ```sema (string/foldcase "HELLO") ; => "hello" (string/foldcase "Hello World") ; => "hello world" (string/foldcase "Straße") ; => "straße" (string/foldcase "ΩΜΕΓΑ") ; => "ωμεγα" ``` ### `string-ci=?` Case-insensitive string equality comparison. Compares two strings after applying case folding to both. ```sema (string-ci=? "Hello" "hello") ; => #t (string-ci=? "ABC" "abc") ; => #t (string-ci=? "CAFÉ" "café") ; => #t (string-ci=? "hello" "world") ; => #f ``` ## Scheme Compatibility Aliases These functions use legacy Scheme/R7RS naming conventions. They work identically to their modern equivalents and are kept for compatibility. Prefer the `string/` namespaced variants in new code. ### `string/append` Concatenate strings together. ```sema (string/append "hello" " " "world") ; => "hello world" (string/append "a" "b" "c") ; => "abc" ``` ### `string/length` Return the number of characters in a string. ```sema (string/length "hello") ; => 5 (string/length "") ; => 0 (string/length "héllo") ; => 5 (string/length "日本語") ; => 3 ``` ### `string/ref` Return the character at a given index. 
```sema (string/ref "hello" 0) ; => #\h (string/ref "hello" 4) ; => #\o ``` ### `string/slice` Extract a substring by start and end character index. ```sema (string/slice "hello" 1 3) ; => "el" (string/slice "hello" 0 5) ; => "hello" (string/slice "héllo" 1 2) ; => "é" ``` ### `str` Convert any value to its string representation. ```sema (str 42) ; => "42" (str #t) ; => "#t" (str '(1 2 3)) ; => "(1 2 3)" ``` ### `format` Format a string with `~a` placeholders. ```sema (format "~a is ~a" "Sema" "great") ; => "Sema is great" (format "~a + ~a = ~a" 1 2 3) ; => "1 + 2 = 3" ``` ## Characters Character literals are written with the `#\` prefix. ```sema #\a ; character literal #\space ; named character: space #\newline ; named character: newline #\tab ; named character: tab ``` ### `char/to-integer` Convert a character to its Unicode code point. ```sema (char/to-integer #\A) ; => 65 (char/to-integer #\a) ; => 97 ``` ### `integer/to-char` Convert a Unicode code point to a character. ```sema (integer/to-char 65) ; => #\A (integer/to-char 955) ; => #\λ ``` ### `char/alphabetic?` Test if a character is alphabetic. ```sema (char/alphabetic? #\a) ; => #t (char/alphabetic? #\5) ; => #f ``` ### `char/numeric?` Test if a character is numeric. ```sema (char/numeric? #\5) ; => #t (char/numeric? #\a) ; => #f ``` ### `char/whitespace?` Test if a character is whitespace. ```sema (char/whitespace? #\space) ; => #t (char/whitespace? #\a) ; => #f ``` ### `char/upper-case?` Test if a character is uppercase. ```sema (char/upper-case? #\A) ; => #t (char/upper-case? #\a) ; => #f ``` ### `char/upcase` Convert a character to uppercase. ```sema (char/upcase #\a) ; => #\A ``` ### `char/downcase` Convert a character to lowercase. ```sema (char/downcase #\Z) ; => #\z ``` ### `char/to-string` Convert a character to a single-character string. ```sema (char/to-string #\a) ; => "a" ``` ### `string/to-char` Convert a single-character string to a character. 
```sema
(string/to-char "a")   ; => #\a
```

## Character Comparison (R7RS)

### `char=?`

Character equality.

```sema
(char=? #\a #\a)   ; => #t
(char=? #\a #\b)   ; => #f
```

### `char<?`

Character less-than.

```sema
(char<? #\a #\b)   ; => #t
```

### `char>?`

Character greater-than.

```sema
(char>? #\b #\a)   ; => #t
```

### `char<=?`

Character less-than-or-equal.

```sema
(char<=? #\a #\b)   ; => #t
(char<=? #\a #\a)   ; => #t
```

### `char>=?`

Character greater-than-or-equal.

```sema
(char>=? #\b #\a)   ; => #t
```

### `char-ci=?`

Case-insensitive character equality.

```sema
(char-ci=? #\A #\a)   ; => #t
```

## Type Conversions

### `string/to-number`

Parse a string as a number.

```sema
(string/to-number "42")     ; => 42
(string/to-number "3.14")   ; => 3.14
```

### `number/to-string`

Convert a number to a string.

```sema
(number/to-string 42)     ; => "42"
(number/to-string 3.14)   ; => "3.14"
```

### `string/to-symbol`

Convert a string to a symbol.

```sema
(string/to-symbol "foo")   ; => foo
```

### `symbol/to-string`

Convert a symbol to a string.

```sema
(symbol/to-string 'foo)   ; => "foo"
```

### `string/to-keyword`

Convert a string to a keyword.

```sema
(string/to-keyword "name")   ; => :name
```

### `keyword/to-string`

Convert a keyword to a string.

```sema
(keyword/to-string :name)   ; => "name"
```

### `string/to-list`

Convert a string to a list of characters.

```sema
(string/to-list "abc")   ; => (#\a #\b #\c)
```

### `list->string`

Convert a list of characters to a string.

```sema
(list->string '(#\h #\i))   ; => "hi"
```

## Slicing & Extraction

### `string/after`

Everything after the first occurrence of a needle. Returns the original string if the needle is not found.

```sema
(string/after "hello@world.com" "@")   ; => "world.com"
(string/after "no-match" "@")          ; => "no-match"
```

### `string/after-last`

Everything after the last occurrence of a needle.

```sema
(string/after-last "a.b.c" ".")   ; => "c"
```

### `string/before`

Everything before the first occurrence of a needle.
```sema (string/before "hello@world.com" "@") ; => "hello" (string/before "no-match" "@") ; => "no-match" ``` ### `string/before-last` Everything before the last occurrence of a needle. ```sema (string/before-last "a.b.c" ".") ; => "a.b" ``` ### `string/between` Extract the portion between two delimiters. ```sema (string/between "[hello]" "[" "]") ; => "hello" (string/between "start:middle:end" "start:" ":end") ; => "middle" ``` ### `string/take` Take the first N characters (positive) or last N characters (negative). ```sema (string/take "hello" 3) ; => "hel" (string/take "hello" -2) ; => "lo" ``` ## Prefix & Suffix ### `string/chop-start` Remove a prefix if present, otherwise return unchanged. ```sema (string/chop-start "Hello World" "Hello ") ; => "World" (string/chop-start "Hello" "Bye") ; => "Hello" ``` ### `string/chop-end` Remove a suffix if present. ```sema (string/chop-end "file.txt" ".txt") ; => "file" (string/chop-end "file.txt" ".md") ; => "file.txt" ``` ### `string/ensure-start` Ensure a string starts with a prefix (adds it if missing). ```sema (string/ensure-start "/path" "/") ; => "/path" (string/ensure-start "path" "/") ; => "/path" ``` ### `string/ensure-end` Ensure a string ends with a suffix. ```sema (string/ensure-end "path" "/") ; => "path/" (string/ensure-end "path/" "/") ; => "path/" ``` ### `string/wrap` Wrap a string with left and right delimiters. ```sema (string/wrap "hello" "(" ")") ; => "(hello)" (string/wrap "hello" "**") ; => "**hello**" ``` ### `string/unwrap` Remove surrounding delimiters if both present. ```sema (string/unwrap "(hello)" "(" ")") ; => "hello" (string/unwrap "hello" "(" ")") ; => "hello" ``` ## Replacement ### `string/replace-first` Replace only the first occurrence of a substring. ```sema (string/replace-first "aaa" "a" "b") ; => "baa" ``` ### `string/replace-last` Replace only the last occurrence. ```sema (string/replace-last "aaa" "a" "b") ; => "aab" ``` ### `string/remove` Remove all occurrences of a substring. 
```sema (string/remove "hello world" "o") ; => "hell wrld" ``` ## Case Conversion ### `string/snake-case` Convert to snake\_case. ```sema (string/snake-case "helloWorld") ; => "hello_world" (string/snake-case "Hello World") ; => "hello_world" ``` ### `string/kebab-case` Convert to kebab-case. ```sema (string/kebab-case "helloWorld") ; => "hello-world" (string/kebab-case "Hello World") ; => "hello-world" ``` ### `string/camel-case` Convert to camelCase. ```sema (string/camel-case "hello_world") ; => "helloWorld" (string/camel-case "Hello World") ; => "helloWorld" ``` ### `string/pascal-case` Convert to PascalCase. ```sema (string/pascal-case "hello_world") ; => "HelloWorld" (string/pascal-case "hello world") ; => "HelloWorld" ``` ### `string/headline` Convert to Title Case headline. ```sema (string/headline "hello_world") ; => "Hello World" (string/headline "helloWorld") ; => "Hello World" ``` ### `string/words` Split a string into words (splits on non-alphanumeric boundaries). ```sema (string/words "hello_world") ; => ("hello" "world") (string/words "helloWorld") ; => ("hello" "World") (string/words "Hello World!") ; => ("Hello" "World") ``` --- --- url: 'https://sema-lang.com/docs/stdlib/lists.md' --- # Lists Lists are the fundamental data structure in Sema. They are built from cons pairs and support a rich set of operations. ## Construction & Access ### `list` Create a new list. ```sema (list 1 2 3) ; => (1 2 3) (list) ; => () (list "a" "b") ; => ("a" "b") ``` ### `cons` Prepend an element to a list. ```sema (cons 0 '(1 2 3)) ; => (0 1 2 3) (cons 1 '()) ; => (1) ``` ### `car` Return the first element of a list. ```sema (car '(1 2 3)) ; => 1 ``` ### `cdr` Return the rest of a list (everything after the first element). 
```sema (cdr '(1 2 3)) ; => (2 3) (cdr '(1)) ; => () ``` ::: details Where these names come from `car` and `cdr` are inherited from the [IBM 704](http://bitsavers.informatik.uni-stuttgart.de/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf) (1955), the machine Lisp was originally implemented on. The 704 stored cons cells in a single 36-bit word, with two 15-bit pointer fields: the **address** field (bits 21-35) pointed to the first element, and the **decrement** field (bits 3-17) pointed to the rest of the list. `car` stands for "Contents of the Address Register" and `cdr` for "Contents of the Decrement Register" — they were single hardware instructions that extracted these sub-fields. Sema also provides `first`/`rest` as more readable aliases. ::: ### `first` Alias for `car`. Return the first element. ```sema (first '(1 2 3)) ; => 1 ``` ### `rest` Alias for `cdr`. Return the rest of the list. ```sema (rest '(1 2 3)) ; => (2 3) ``` ### `cadr`, `caddr`, ... Compositions of `car` and `cdr`. Available: `caar`, `cadr`, `cdar`, `cddr`, `caaar`, `caadr`, `cadar`, `caddr`, `cdaar`, `cdadr`, `cddar`, `cdddr`. ```sema (cadr '(1 2 3)) ; => 2 (caddr '(1 2 3)) ; => 3 ``` ### `last` Return the last element of a list. ```sema (last '(1 2 3)) ; => 3 ``` ### `nth` Return the element at index N (zero-based). ```sema (nth '(10 20 30) 1) ; => 20 (nth '(10 20 30) 0) ; => 10 ``` ## Association Lists ### `assoc` Look up a key in an association list (list of pairs). Uses `equal?` comparison. ```sema (define alist '(("a" 1) ("b" 2) ("c" 3))) (assoc "b" alist) ; => ("b" 2) (assoc "z" alist) ; => #f ``` ### `assq` Like `assoc` but uses `eq?` comparison (pointer/symbol equality). ```sema (assq 'b '((a 1) (b 2))) ; => (b 2) ``` ### `assv` Like `assoc` but uses `eqv?` comparison (value equality for numbers). ```sema (assv 2 '((1 "one") (2 "two"))) ; => (2 "two") ``` ## Basic Operations ### `length` Return the number of elements in a list. 
```sema (length '(1 2 3)) ; => 3 (length '()) ; => 0 ``` ### `append` Concatenate lists. ```sema (append '(1 2) '(3 4)) ; => (1 2 3 4) (append '(1) '(2) '(3)) ; => (1 2 3) ``` ### `reverse` Reverse a list. ```sema (reverse '(1 2 3)) ; => (3 2 1) ``` ### `range` Generate a list of integers. With one argument, generates 0 to N-1. With two, generates from start to end-1. ```sema (range 5) ; => (0 1 2 3 4) (range 1 5) ; => (1 2 3 4) ``` ## Higher-Order Functions ### `map` Apply a function to each element of one or more lists. ```sema (map (fn (x) (* x x)) '(1 2 3)) ; => (1 4 9) (map + '(1 2 3) '(10 20 30)) ; => (11 22 33) ``` ### `filter` Return elements that satisfy a predicate. ```sema (filter even? '(1 2 3 4 5)) ; => (2 4) (filter string? '(1 "a" 2)) ; => ("a") ``` ### `foldl` Left fold. `(foldl f init list)` — accumulates from left to right. ```sema (foldl + 0 '(1 2 3 4 5)) ; => 15 (foldl cons '() '(1 2 3)) ; => (3 2 1) ``` ### `foldr` Right fold. `(foldr f init list)` — accumulates from right to left. ```sema (foldr cons '() '(1 2 3)) ; => (1 2 3) ``` ### `reduce` Like `foldl` but uses the first element as the initial value. ```sema (reduce + '(1 2 3 4 5)) ; => 15 ``` ### `for-each` Apply a function to each element for side effects. ```sema (for-each println '("a" "b" "c")) ;; prints: a, b, c (each on a new line) ``` ### `sort` Sort a list in ascending order. ```sema (sort '(3 1 4 1 5)) ; => (1 1 3 4 5) ``` ### `sort-by` Sort a list by a key function. ```sema (sort-by length '("bb" "a" "ccc")) ; => ("a" "bb" "ccc") (sort-by abs '(-3 1 -2)) ; => (1 -2 -3) ``` ### `flat-map` Map a function over a list and flatten the results by one level. ```sema (flat-map (fn (x) (list x (* x 10))) '(1 2 3)) ; => (1 10 2 20 3 30) ``` ### `apply` Apply a function to a list of arguments. ```sema (apply + '(1 2 3)) ; => 6 (apply max '(3 1 4)) ; => 4 ``` ## Sublists ### `take` Take the first N elements. 
```sema (take 3 '(1 2 3 4 5)) ; => (1 2 3) (take 10 '(1 2)) ; => (1 2) ``` ### `drop` Drop the first N elements. ```sema (drop 2 '(1 2 3 4 5)) ; => (3 4 5) ``` ### `list/take-last` Take the last N elements (the tail counterpart to `take`). Clamps to the list length. ```sema (list/take-last 2 '(1 2 3 4)) ; => (3 4) (list/take-last 9 '(1 2)) ; => (1 2) ``` ### `list/drop-last` Drop the last N elements (drops from the tail; the counterpart to `drop`). Clamps to empty. ```sema (list/drop-last 2 '(1 2 3 4)) ; => (1 2) (list/drop-last 9 '(1 2)) ; => () ``` ### `flatten` Flatten nested lists into a single list. ```sema (flatten '(1 (2 (3)) 4)) ; => (1 2 3 4) ``` ### `flatten-deep` Recursively flatten all nested lists. ```sema (flatten-deep '(1 (2 (3 (4))))) ; => (1 2 3 4) ``` ### `zip` Combine corresponding elements from two lists into pairs. ```sema (zip '(1 2 3) '("a" "b" "c")) ; => ((1 "a") (2 "b") (3 "c")) ``` ### `partition` Split a list into two lists based on a predicate. Returns a list of two lists: elements that satisfy the predicate and those that don't. ```sema (partition even? '(1 2 3 4 5)) ; => ((2 4) (1 3 5)) ``` ## Searching ### `member` Return the tail of the list starting from the first matching element. ```sema (member 3 '(1 2 3 4)) ; => (3 4) (member 9 '(1 2 3)) ; => #f ``` ### `list/contains?` Return `#t` if the list contains the element, else `#f`. Unlike `member` (which returns the Scheme-style tail or `#f`), this reads as a predicate. ```sema (list/contains? '(1 2 3) 2) ; => #t (list/contains? '(1 2 3) 9) ; => #f ``` ### `list/nth-or` Indexed access with a fallback: returns the element at `index`, or `default` when out of bounds (the safe counterpart to `nth`, which errors). ```sema (list/nth-or '(10 20 30) 1 :none) ; => 20 (list/nth-or '(10 20 30) 9 :none) ; => :none ``` ### `any` Test if any element satisfies a predicate. ```sema (any even? '(1 3 5 6)) ; => #t (any even? '(1 3 5)) ; => #f ``` ### `every` Test if all elements satisfy a predicate. 
```sema (every even? '(2 4 6)) ; => #t (every even? '(2 3 6)) ; => #f ``` ### `list/index-of` Return the index of the first occurrence of a value, or `nil` if not found. ```sema (list/index-of '(10 20 30) 20) ;; => 1 (list/index-of '(10 20 30) 99) ;; => nil ``` ### `list/unique` Remove duplicate elements, preserving order. ```sema (list/unique '(1 2 2 3 3 3)) ; => (1 2 3) ``` ### `list/dedupe` Remove consecutive duplicates from a list. ```sema (list/dedupe '(1 1 2 2 3 3 2)) ; => (1 2 3 2) ``` ## Grouping ### `list/group-by` Group elements by a function, returning a map. ```sema (list/group-by even? '(1 2 3 4 5)) ; => {#f (1 3 5) #t (2 4)} ``` ### `list/interleave` Interleave elements from two lists. ```sema (list/interleave '(1 2 3) '(a b c)) ; => (1 a 2 b 3 c) ``` ### `list/chunk` Split a list into chunks of a given size. ```sema (list/chunk 2 '(1 2 3 4 5)) ; => ((1 2) (3 4) (5)) (list/chunk 3 '(1 2 3 4 5 6)) ; => ((1 2 3) (4 5 6)) ``` ### `frequencies` Count occurrences of each element, returning a map. ```sema (frequencies '(a b a c b a)) ; => {a 3 b 2 c 1} ``` ### `interpose` Insert a separator between elements. ```sema (interpose ", " '("a" "b" "c")) ; => ("a" ", " "b" ", " "c") ``` ## Aggregation ### `list/sum` Sum all numbers in a list. ```sema (list/sum '(1 2 3 4 5)) ; => 15 ``` ### `list/min` Return the minimum value in a list. ```sema (list/min '(3 1 4 1 5)) ; => 1 ``` ### `list/max` Return the maximum value in a list. ```sema (list/max '(3 1 4 1 5)) ; => 5 ``` ## Random ### `list/shuffle` Return a randomly shuffled copy of a list. ```sema (list/shuffle '(1 2 3 4 5)) ; => (3 1 5 2 4) (varies) ``` ### `list/pick` Pick a random element from a list. ```sema (list/pick '(1 2 3 4 5)) ; => 3 (varies) ``` ## Construction ### `list/repeat` Create a list by repeating a value N times. ```sema (list/repeat 3 0) ; => (0 0 0) (list/repeat 4 "x") ; => ("x" "x" "x" "x") ``` ### `make-list` Alias for `list/repeat`. 
```sema (make-list 3 0) ; => (0 0 0) ``` ### `iota` Generate a list of numbers. `(iota count)`, `(iota count start)`, or `(iota count start step)`. ```sema (iota 5) ; => (0 1 2 3 4) (iota 3 10) ; => (10 11 12) (iota 4 0 2) ; => (0 2 4 6) ``` ## Splitting ### `list/split-at` Split a list at a given index, returning two lists. ```sema (list/split-at '(1 2 3 4 5) 3) ; => ((1 2 3) (4 5)) ``` ### `list/take-while` Take elements from the front while a predicate holds. ```sema (list/take-while (fn (x) (< x 4)) '(1 2 3 4 5)) ; => (1 2 3) ``` ### `list/drop-while` Drop elements from the front while a predicate holds. ```sema (list/drop-while (fn (x) (< x 4)) '(1 2 3 4 5)) ; => (4 5) ``` ## Filtering ### `list/reject` Return elements that do NOT satisfy a predicate (inverse of `filter`). ```sema (list/reject even? '(1 2 3 4 5)) ; => (1 3 5) ``` ### `list/find` Return the first element that satisfies a predicate, or `nil` if none found. ```sema (list/find even? '(1 3 4 5 6)) ; => 4 (list/find even? '(1 3 5)) ; => nil ``` ### `list/sole` Return the single element matching a predicate. Errors if zero or more than one match. ```sema (list/sole (fn (x) (> x 4)) '(1 2 3 4 5)) ; => 5 ``` ## Set Operations ### `list/diff` Return elements in the first list that are not in the second list. ```sema (list/diff '(1 2 3 4 5) '(3 4)) ; => (1 2 5) ``` ### `list/intersect` Return elements present in both lists. ```sema (list/intersect '(1 2 3 4 5) '(3 4 6)) ; => (3 4) ``` ### `list/duplicates` Return values that appear more than once in a list. ```sema (list/duplicates '(1 2 2 3 3 3 4)) ; => (2 3) ``` ## Extraction ### `list/pluck` Extract a specific key from each map in a list. ```sema (define people (list {:name "Alice" :age 30} {:name "Bob" :age 25})) (list/pluck :name people) ; => ("Alice" "Bob") ``` ### `list/key-by` Transform a list of maps into a map keyed by a function result. 
```sema (list/key-by (fn (p) (get p :id)) people) ; => map keyed by :id ``` ## Statistics ### `list/avg` Return the average of a numeric list. ```sema (list/avg '(2 4 6)) ; => 4.0 ``` ### `list/median` Return the statistical median. ```sema (list/median '(3 1 2)) ; => 2.0 (list/median '(1 2 3 4)) ; => 2.5 ``` ### `list/mode` Return the most frequent value. If tied, returns a list. ```sema (list/mode '(1 2 2 3 3 3)) ; => 3 (list/mode '(1 1 2 2)) ; => (1 2) ``` ## Windowing ### `list/sliding` Create a sliding window over a list. Optional step parameter. ```sema (list/sliding '(1 2 3 4 5) 2) ; => ((1 2) (2 3) (3 4) (4 5)) (list/sliding '(1 2 3 4 5 6) 2 3) ; => ((1 2) (4 5)) ``` ### `list/page` Paginate a list. `(list/page items page per-page)` — 1-indexed pages. ```sema (list/page (range 20) 1 5) ; => (0 1 2 3 4) (list/page (range 20) 2 5) ; => (5 6 7 8 9) ``` ### `list/cross-join` Cartesian product of two lists. ```sema (list/cross-join '(1 2) '(3 4)) ; => ((1 3) (1 4) (2 3) (2 4)) ``` ## Padding & Joining ### `list/pad` Pad a list to a target length with a fill value. ```sema (list/pad '(1 2 3) 5 0) ; => (1 2 3 0 0) ``` ### `list/join` Join list elements into a string. Optional final separator. ```sema (list/join '(1 2 3) ", ") ; => "1, 2, 3" (list/join '(1 2 3) ", " " and ") ; => "1, 2 and 3" ``` ## Generation ### `list/times` Generate a list by calling a function N times with the index (0-based). ```sema (list/times 5 (fn (i) (* i i))) ; => (0 1 4 9 16) ``` ## Utility ### `tap` Apply a side-effect function to a value, then return the original value. ```sema (tap 42 (fn (x) (println x))) ; prints 42, returns 42 ``` --- --- url: 'https://sema-lang.com/docs/stdlib/vectors.md' --- # Vectors Vectors are **indexed, immutable** collections written with square-bracket syntax. They're ideal when you want **O(1) indexed access** and a compact literal form. 
Most list functions also accept vectors, but some return lists even when passed a vector; see [Vectors vs Lists](#vectors-vs-lists) for details.

## Literal Syntax

```sema
[1 2 3]         ; vector of integers
["a" "b" "c"]   ; vector of strings
[]              ; empty vector
[1 [2 3] 4]     ; nested vectors
```

## Construction

### `vector`

Create a vector from its arguments.

```sema
(vector 1 2 3)     ; => [1 2 3]
(vector)           ; => []
(vector "a" "b")   ; => ["a" "b"]
```

## Predicates & Introspection

### `vector?`

Test whether a value is a vector.

```sema
(vector? [1 2 3])    ; => #t
(vector? '(1 2 3))   ; => #f
(vector? 42)         ; => #f
```

### `length` / `count` / `empty?`

Vectors participate in Sema's generic collection functions:

```sema
(length [10 20 30])   ; => 3
(count [10 20 30])    ; => 3
(empty? [])           ; => #t
(empty? [1])          ; => #f
```

## Indexed Access

### `nth`

Return the element at index `n` (zero-based). Works on both lists and vectors.

```sema
(nth [10 20 30] 0)   ; => 10
(nth [10 20 30] 2)   ; => 30
```

Out of bounds is an error:

```sema
(nth [10 20 30] 3)   ; => error: index 3 out of bounds (length 3)
```

::: tip
Use `first` for safe "index 0" access: it returns `nil` on empty sequences.
:::

### `first`

Return the first element of a vector (or list). Returns `nil` for empty vectors.

```sema
(first [1 2 3])   ; => 1
(first [])        ; => nil
```

### `rest`

Return everything after the first element. **Preserves type**: vector in, vector out.

```sema
(rest [1 2 3])   ; => [2 3]
(rest [])        ; => []
(rest [1])       ; => []
```

## Conversion

### `vector->list`

Convert a vector to a list.

```sema
(vector->list [1 2 3])   ; => (1 2 3)
(vector->list [])        ; => ()
```

### `list->vector`

Convert a list to a vector.

```sema
(list->vector '(1 2 3))   ; => [1 2 3]
(list->vector '())        ; => []
```

## Vectors vs Lists

Both lists and vectors work as "sequences", but they're optimized for different things.

### When to use a vector

* You need **fast indexed access** (`nth` is O(1) on vectors)
* You have a **fixed-size** structure (e.g. `[x y]`, `[start end]`, `[status body]`)
* You want compact literals for configuration-style data

### When to use a list

* You're building data incrementally with `cons`, `append`, or recursion
* You expect to process data head+tail style
* You want idiomatic Lisp code-as-data

### Return type behavior

Many sequence functions accept vectors but return lists:

| Function | Vector in → | Type preserved? |
|----------|-------------|-----------------|
| `rest` | vector out | ✅ Yes |
| `reverse` | vector out | ✅ Yes |
| `map` | list out | ❌ No |
| `filter` | list out | ❌ No |
| `append` | list out | ❌ No |

```sema
(map #(* % 2) [1 2 3])   ; => (2 4 6), a list!
(reverse [1 2 3])        ; => [3 2 1], a vector
(append [1 2] [3 4])     ; => (1 2 3 4), a list!
```

If you need a vector result after a transformation, convert at the end:

```sema
(->> [1 2 3]
     (map #(* % 2))
     (list->vector))   ; => [2 4 6]
```

## Destructuring

Sema supports **sequential destructuring** with a vector pattern in `let`, `define`, and function parameters. This works on both list and vector values.

### Exact destructuring

```sema
(let (([x y] [10 20]))
  (+ x y))   ; => 30

(let (([x y] '(10 20)))
  (+ x y))   ; => 30
```

### Rest destructuring with `&`

```sema
(let (([head second & tail] [1 2 3 4 5]))
  [head second tail])   ; => [1 2 (3 4 5)]
```

Note: `tail` is a **list**, not a vector.
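
Because the rest binding arrives as a list, code that wants a vector back has to convert it explicitly. A minimal sketch, using only `list->vector` and the destructuring form documented on this page; the expected result is inferred from those examples rather than checked against a particular Sema release:

```sema
;; Rest destructuring binds `tail` to a list, even when the input is a vector.
;; Convert it with list->vector when the rest should stay a vector.
(let (([head second & tail] [1 2 3 4 5]))
  [head second (list->vector tail)])   ; => [1 2 [3 4 5]]
```

The conversion is done once, at the point of use, which mirrors the "convert at the end" advice for `map` and `filter` results.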
### Nested destructuring

```sema
(let (([a [b c] d] [1 [2 3] 4]))
  (+ a b c d))   ; => 10
```

## Pattern Matching

`match` supports vector patterns:

```sema
(define (describe-point p)
  (match p
    ([0 0] "origin")
    ([x 0] (string/append "on x-axis at " (number/to-string x)))
    ([0 y] (string/append "on y-axis at " (number/to-string y)))
    ([x y] (string/append "point " (number/to-string x) ", " (number/to-string y)))))

(describe-point [0 0])    ; => "origin"
(describe-point '(5 0))   ; => "on x-axis at 5"
```

## Practical Examples

### Tuple-style returns

Vectors are great for fixed-arity return values:

```sema
(define (min-max xs)
  [(list/min xs) (list/max xs)])

(min-max '(3 1 4 1 5))   ; => [1 5]
```

### Chunking and re-vectorizing

```sema
(->> (range 10)
     (list/chunk 3)
     (map list->vector))   ; => ([0 1 2] [3 4 5] [6 7 8] [9])
```

## Performance

| Operation | Complexity |
|-----------|-----------|
| `nth` | O(1) |
| `first` | O(1) |
| `rest` | O(n), creates a new vector |
| `length` | O(1) |

Vectors are **immutable**: there is no `vector-set!`. To "update" a vector, construct a new one.

---

---
url: 'https://sema-lang.com/docs/stdlib/maps.md'
---

# Maps & HashMaps

Sema provides two map types: sorted **maps** (BTreeMap-backed, deterministic ordering) and **hashmaps** (for O(1) performance-critical lookups).

## Maps

Maps use curly-brace literal syntax with keyword keys:

```sema
{:name "Ada" :age 36}   ; map literal
{:a 1 :b 2 :c 3}        ; keywords as keys
```

Keywords are callable. When used as a function, a keyword looks up its value in a map:

```sema
(:name {:name "Ada" :age 36})   ; => "Ada"
```

### `map/new`

Create a map from key-value pairs.

```sema
(map/new :a 1 :b 2)   ; => {:a 1 :b 2}
```

### `get`

Look up a value by key. Works on both maps and hashmaps.

```sema
(get {:a 1 :b 2} :a)   ; => 1
(get {:a 1 :b 2} :z)   ; => nil
```

### `assoc`

Add or update a key-value pair, returning a new map.
```sema (assoc {:a 1} :b 2) ; => {:a 1 :b 2} (assoc {:a 1} :a 99) ; => {:a 99} ``` ### `dissoc` Remove a key, returning a new map. Works on both maps and hashmaps. ```sema (dissoc {:a 1 :b 2} :a) ; => {:b 2} (dissoc (hashmap/new :a 1 :b 2) :a) ; hashmap without :a ``` ### `merge` Merge multiple maps together. Later maps override earlier ones. Works on both maps and hashmaps — the result type matches the first argument. ```sema (merge {:a 1} {:b 2} {:c 3}) ; => {:a 1 :b 2 :c 3} (merge {:a 1} {:a 99}) ; => {:a 99} (merge (hashmap/new :a 1) {:b 2}) ; hashmap with :a and :b ``` ### `keys` Return the keys of a map as a list. ```sema (keys {:a 1 :b 2}) ; => (:a :b) ``` ### `vals` Return the values of a map as a list. ```sema (vals {:a 1 :b 2}) ; => (1 2) ``` ### `contains?` Test if a map contains a key. ```sema (contains? {:a 1} :a) ; => #t (contains? {:a 1} :b) ; => #f ``` ### `count` Return the number of key-value pairs. ```sema (count {:a 1 :b 2}) ; => 2 ``` ### `map/entries` Return the entries as a list of key-value pairs. ```sema (map/entries {:a 1 :b 2}) ; => ((:a 1) (:b 2)) ``` ### `map/from-entries` Create a map from a list of key-value pairs. ```sema (map/from-entries '((:a 1) (:b 2))) ; => {:a 1 :b 2} ``` ## Higher-Order Map Operations ### `map/map-vals` Apply a function to every value in a map. ```sema (map/map-vals (fn (v) (* v 2)) {:a 1 :b 2}) ; => {:a 2 :b 4} ``` ### `map/map-keys` Apply a function to every key in a map. ```sema (map/map-keys (fn (k) (string/to-keyword (string/upper (keyword/to-string k)))) {:a 1}) ; => {:A 1} ``` ### `map/filter` Filter entries by a predicate that takes key and value. ```sema (map/filter (fn (k v) (> v 1)) {:a 1 :b 2 :c 3}) ; => {:b 2 :c 3} ``` ### `map/select-keys` Select only the given keys from a map. ```sema (map/select-keys {:a 1 :b 2 :c 3} '(:a :c)) ; => {:a 1 :c 3} ``` ### `map/update` Update a value at a key by applying a function. 
```sema (map/update {:a 1} :a (fn (v) (+ v 10))) ; => {:a 11} ``` ## HashMaps For performance-critical workloads with many keys, use `hashmap` for O(1) lookups instead of the sorted `map`. ### `hashmap/new` Create a new hashmap from key-value pairs. ```sema (hashmap/new :a 1 :b 2 :c 3) ; create a hashmap (hashmap/new) ; empty hashmap ``` ### `hashmap/get` Look up a value in a hashmap. ```sema (hashmap/get (hashmap/new :a 1) :a) ; => 1 ``` ### `hashmap/assoc` Add a key-value pair to a hashmap. ```sema (hashmap/assoc (hashmap/new) :a 1) ; hashmap with :a 1 ``` ### `hashmap/to-map` Convert a hashmap to a sorted map. ```sema (hashmap/to-map (hashmap/new :b 2 :a 1)) ; => {:a 1 :b 2} ``` ### `hashmap/keys` Return the keys of a hashmap (unordered). ```sema (hashmap/keys (hashmap/new :a 1 :b 2)) ; => (:a :b) ``` ### `hashmap/contains?` Test if a hashmap contains a key. ```sema (hashmap/contains? (hashmap/new :a 1) :a) ; => #t ``` ### Generic Operations on HashMaps The generic functions `get`, `assoc`, `dissoc`, `keys`, `vals`, `merge`, `count`, `contains?`, and all `map/*` higher-order operations also work on hashmaps, preserving the hashmap type: ```sema (get (hashmap/new :a 1 :b 2) :a) ; => 1 (assoc (hashmap/new) :x 42) ; hashmap with :x 42 (dissoc (hashmap/new :a 1 :b 2) :a) ; hashmap without :a (merge (hashmap/new :a 1) {:b 2}) ; hashmap with :a and :b (count (hashmap/new :a 1 :b 2)) ; => 2 (map/map-vals (fn (v) (* v 2)) (hashmap/new :a 1)) ; hashmap with :a 2 (map/filter (fn (k v) (> v 1)) (hashmap/new :a 1 :b 2)) ; hashmap with :b ``` ### `map/sort-keys` Sort a map by its keys. Converts hashmaps to sorted maps. ```sema (map/sort-keys (hashmap/new :c 3 :a 1 :b 2)) ; => {:a 1 :b 2 :c 3} ``` ### `map/except` Remove specified keys from a map (inverse of `map/select-keys`). ```sema (map/except {:a 1 :b 2 :c 3} '(:b)) ; => {:a 1 :c 3} (map/except {:a 1 :b 2 :c 3} '(:a :c)) ; => {:b 2} ``` ### `map/zip` Create a map from a list of keys and a list of values. 
```sema
(map/zip '(:a :b :c) '(1 2 3)) ; => {:a 1 :b 2 :c 3}
```

## Nested Map Operations

### `map/get-in`

Access a value at a nested key path. Returns `nil` (or a default) if any key is missing.

```sema
(map/get-in {:a {:b {:c 42}}} [:a :b :c]) ; => 42
(map/get-in {:a {:b 1}} [:a :c]) ; => nil
(map/get-in {:a {:b 1}} [:a :c] "default") ; => "default"
```

### `map/assoc-in`

Set a value at a nested key path. Creates intermediate maps if they don't exist.

```sema
(map/assoc-in {:a {:b 1}} [:a :b] 42) ; => {:a {:b 42}}
(map/assoc-in {} [:a :b :c] 99) ; => {:a {:b {:c 99}}}
```

### `map/update-in`

Update a value at a nested key path by applying a function.

```sema
(map/update-in {:a {:b 10}} [:a :b] #(+ % 1)) ; => {:a {:b 11}}
```

### `map/deep-merge`

Recursively merge maps. Nested maps are merged rather than replaced. Non-map values in the overlay override the base.

```sema
(map/deep-merge {:a {:b 1 :c 2}} {:a {:b 99}}) ; => {:a {:b 99 :c 2}}
(map/deep-merge {:a {:b 1}} {:a 42}) ; => {:a 42}
(map/deep-merge {:a 1} {:b 2} {:c 3}) ; => {:a 1 :b 2 :c 3}
```

---

---
url: 'https://sema-lang.com/docs/stdlib/predicates.md'
---

# Predicates & Type Checking

Predicates return `#t` or `#f` and conventionally end with `?`.

## Emptiness Predicates

These three predicates overlap but are not interchangeable. `null?` returns `#t` for both `'()` and `nil` — it tests for "absence of a value or empty list". `nil?` is true only for the `nil` value itself (not for `'()`). `empty?` is the broadest: it accepts `nil`, strings, lists, vectors, maps, and other collections, returning `#t` when the value has no elements. Reach for `empty?` when you have a collection of any shape; reach for `nil?` when you specifically need to distinguish `nil` from `'()`.

### `null?`

Test if a value is the empty list or `nil`.

```sema
(null? '()) ;; => #t
(null? nil) ;; => #t
(null? '(1)) ;; => #f
```

### `nil?`

Test if a value is `nil` specifically (not the empty list).

```sema
(nil? nil) ;; => #t
(nil? '()) ;; => #f
(nil? 0) ;; => #f
```

### `empty?`

Test if a collection, string, or `nil` is empty. Accepts strings, lists, vectors, maps, and `nil`.

```sema
(empty? "") ;; => #t
(empty? '()) ;; => #t
(empty? nil) ;; => #t
(empty? "hello") ;; => #f
(empty? [1 2 3]) ;; => #f
```

## Collection Predicates

### `list?`

Test if a value is a list.

```sema
(list? '(1)) ; => #t
(list? 42) ; => #f
```

### `pair?`

Test if a value is a non-empty list (Scheme compatibility).

```sema
(pair? '(1 2)) ; => #t
(pair? '()) ; => #f
```

### `vector?`

Test if a value is a vector.

```sema
(vector? [1]) ; => #t
(vector? '(1)) ; => #f
```

### `map?`

Test if a value is a map.

```sema
(map? {:a 1}) ; => #t
(map? '()) ; => #f
```

## Numeric Predicates

### `number?`

Test if a value is a number (integer or float).

```sema
(number? 42) ; => #t
(number? 3.14) ; => #t
(number? "42") ; => #f
```

### `integer?`

Test if a value is an integer.

```sema
(integer? 42) ; => #t
(integer? 3.14) ; => #f
```

### `float?`

Test if a value is a floating-point number.

```sema
(float? 3.14) ; => #t
(float? 42) ; => #f
```

### `zero?`

Test if a number is zero.

```sema
(zero? 0) ; => #t
(zero? 1) ; => #f
```

### `even?`

Test if an integer is even.

```sema
(even? 4) ; => #t
(even? 3) ; => #f
```

### `odd?`

Test if an integer is odd.

```sema
(odd? 3) ; => #t
(odd? 4) ; => #f
```

### `positive?`

Test if a number is positive.

```sema
(positive? 1) ; => #t
(positive? -1) ; => #f
```

### `negative?`

Test if a number is negative.

```sema
(negative? -1) ; => #t
(negative? 1) ; => #f
```

## Type Predicates

### `string?`

Test if a value is a string.

```sema
(string? "hi") ; => #t
(string? 42) ; => #f
```

### `symbol?`

Test if a value is a symbol.

```sema
(symbol? 'x) ; => #t
(symbol? "x") ; => #f
```

### `keyword?`

Test if a value is a keyword.

```sema
(keyword? :k) ; => #t
(keyword? "k") ; => #f
```

### `char?`

Test if a value is a character.

```sema
(char? #\a) ; => #t
(char? "a") ; => #f
```

### `bool?`

Test if a value is a boolean. `boolean?` is an alias.

```sema
(bool? #t) ; => #t
(bool? 0) ; => #f
```

### `fn?`

Test if a value is a function. `procedure?` is an alias.

```sema
(fn? car) ; => #t
(fn? 42) ; => #f
```

### `record?`

Test if a value is a record instance.

```sema
(record? my-record) ; => #t
(record? 42) ; => #f
```

### `bytevector?`

Test if a value is a bytevector.

```sema
(bytevector? #u8()) ; => #t
(bytevector? '()) ; => #f
```

## Promise Predicates

### `promise?`

Test if a value is a promise (created with `delay`).

```sema
(promise? (delay 1)) ; => #t
(promise? 42) ; => #f
```

### `promise-forced?`

Test if a promise has been forced (evaluated).

```sema
(define p (delay (+ 1 2)))
(promise-forced? p) ; => #f
(force p)
(promise-forced? p) ; => #t
```

## Equality

### `eq?`

Test structural equality. `equal?` is an alias.

```sema
(eq? 'a 'a) ; => #t
(eq? '(1 2) '(1 2)) ; => #t
(eq? 1 2) ; => #f
```

### `=`

Numeric equality.

```sema
(= 1 1) ; => #t
(= 1 1.0) ; => #t
(= 1 2) ; => #f
```

## LLM Type Predicates

### `prompt?`

Test if a value is an LLM prompt.

```sema
(prompt? (prompt (user "hi"))) ; => #t
```

### `message?`

Test if a value is an LLM message.

```sema
(message? (message :user "hi")) ; => #t
```

### `conversation?`

Test if a value is a conversation.

```sema
(conversation? (conversation/new {})) ; => #t
```

### `tool?`

Test if a value is a tool definition.

```sema
(deftool my-tool "A test tool" {:x {:type :string}} (lambda (x) x))
(tool? my-tool) ; => #t
(tool? 42) ; => #f
```

### `agent?`

Test if a value is an agent.

```sema
(defagent my-agent {:system "test"})
(agent? my-agent) ; => #t
(agent? 42) ; => #f
```

---

---
url: 'https://sema-lang.com/docs/stdlib/bytevectors.md'
---

# Bytevectors

Bytevectors are sequences of unsigned 8-bit integers (0–255), useful for binary data and string encoding.
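As a quick orientation, the sketch below round-trips a string through its UTF-8 bytes. It uses only functions documented on this page, and the expected values follow from the individual examples for each function; the name `bytes` is just an illustrative binding.

```sema
;; Encode a string to bytes, inspect them, and decode back
(define bytes (string/to-utf8 "hi")) ; => #u8(104 105)
(bytevector/length bytes)            ; => 2
(bytevector/ref bytes 0)             ; => 104
(bytevector/to-list bytes)           ; => (104 105)
(utf8/to-string bytes)               ; => "hi"
```

Each function used here is described in its own section of this page.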
## Literal Syntax

```sema
#u8(1 2 3) ; bytevector literal
#u8() ; empty bytevector
#u8(255 0 128) ; arbitrary byte values
```

## Construction

### `bytevector`

Create a bytevector from byte values.

```sema
(bytevector 1 2 3) ; => #u8(1 2 3)
(bytevector) ; => #u8()
```

### `bytevector/new`

Create a bytevector of a given length, optionally filled with a value.

```sema
(bytevector/new 4) ; => #u8(0 0 0 0)
(bytevector/new 3 255) ; => #u8(255 255 255)
```

## Access & Mutation

### `bytevector/length`

Return the length of a bytevector.

```sema
(bytevector/length #u8(1 2 3)) ; => 3
(bytevector/length #u8()) ; => 0
```

### `bytevector/ref`

Return the byte at a given index.

```sema
(bytevector/ref #u8(10 20 30) 1) ; => 20
(bytevector/ref #u8(10 20 30) 0) ; => 10
```

### `bytevector/set!`

Set the byte at a given index. Uses copy-on-write — the original bytevector is unchanged.

```sema
(bytevector/set! #u8(1 2 3) 0 9) ; => #u8(9 2 3)
```

## Copy & Append

### `bytevector/copy`

Copy a slice of a bytevector. `(bytevector/copy bv start end)`.

```sema
(bytevector/copy #u8(1 2 3 4 5) 1 3) ; => #u8(2 3)
```

### `bytevector/append`

Concatenate bytevectors.

```sema
(bytevector/append #u8(1 2) #u8(3 4)) ; => #u8(1 2 3 4)
```

## List Conversion

### `bytevector/to-list`

Convert a bytevector to a list of integers.

```sema
(bytevector/to-list #u8(65 66)) ; => (65 66)
```

### `list/to-bytevector`

Convert a list of integers to a bytevector.

```sema
(list/to-bytevector '(1 2 3)) ; => #u8(1 2 3)
```

## String Conversion

### `utf8/to-string`

Decode a bytevector as a UTF-8 string.

```sema
(utf8/to-string #u8(104 105)) ; => "hi"
(utf8/to-string #u8(72 101 108)) ; => "Hel"
```

### `string/to-utf8`

Encode a string as a UTF-8 bytevector.
```sema
(string/to-utf8 "hi") ; => #u8(104 105)
(string/to-utf8 "Hello") ; => #u8(72 101 108 108 111)
```

---

---
url: 'https://sema-lang.com/docs/stdlib/typed-arrays.md'
---

# Typed Arrays

Typed arrays provide contiguous, unboxed numeric storage for performance-critical workloads. Unlike regular lists (which NaN-box every element), typed arrays store raw `f64` or `i64` values in a flat `Vec`, giving better cache locality and avoiding per-element boxing overhead.

Two types are available:

* **`f64-array`** — 64-bit floating-point arrays
* **`i64-array`** — 64-bit signed integer arrays

Both support copy-on-write mutation via `Rc::make_mut`.

## Construction

### `f64-array`

Create an f64 array from values.

```sema
(f64-array 1.0 2.5 3.7) ; => #f64(1 2.5 3.7)
(f64-array) ; => #f64()
```

### `i64-array`

Create an i64 array from values.

```sema
(i64-array 1 2 3) ; => #i64(1 2 3)
(i64-array) ; => #i64()
```

### `f64-array/make`

Create an f64 array of a given length, optionally filled with a value (default `0.0`).

```sema
(f64-array/make 5) ; => #f64(0 0 0 0 0)
(f64-array/make 3 1.5) ; => #f64(1.5 1.5 1.5)
```

### `i64-array/make`

Create an i64 array of a given length, optionally filled with a value (default `0`).

```sema
(i64-array/make 5) ; => #i64(0 0 0 0 0)
(i64-array/make 3 42) ; => #i64(42 42 42)
```

### `f64-array/range`

Create an f64 array from a numeric range. `(f64-array/range start end)` or `(f64-array/range start end step)`.

```sema
(f64-array/range 0 5) ; => #f64(0 1 2 3 4)
(f64-array/range 0 1 0.25) ; => #f64(0 0.25 0.5 0.75)
```

### `i64-array/range`

Create an i64 array from an integer range.

```sema
(i64-array/range 0 5) ; => #i64(0 1 2 3 4)
(i64-array/range 0 10 2) ; => #i64(0 2 4 6 8)
```

### `f64-array/from-list`

Convert a list of numbers to an f64 array.

```sema
(f64-array/from-list '(1 2 3)) ; => #f64(1 2 3)
```

### `i64-array/from-list`

Convert a list of integers to an i64 array.
```sema
(i64-array/from-list '(10 20 30)) ; => #i64(10 20 30)
```

## Access & Mutation

### `f64-array/ref` / `i64-array/ref`

Get the element at a given index.

```sema
(f64-array/ref (f64-array 1.0 2.0 3.0) 1) ; => 2.0
(i64-array/ref (i64-array 10 20 30) 0) ; => 10
```

### `f64-array/set!` / `i64-array/set!`

Set the element at a given index. Uses copy-on-write -- the original array is unchanged unless it has a single reference.

```sema
(f64-array/set! (f64-array 1.0 2.0 3.0) 1 9.9) ; => #f64(1 9.9 3)
(i64-array/set! (i64-array 10 20 30) 2 99) ; => #i64(10 20 99)
```

### `f64-array/length` / `i64-array/length`

Return the number of elements.

```sema
(f64-array/length (f64-array 1.0 2.0 3.0)) ; => 3
(i64-array/length (i64-array/make 10)) ; => 10
```

## Aggregation

### `f64-array/sum` / `i64-array/sum`

Sum all elements. Runs in a tight Rust loop with no boxing overhead.

```sema
(f64-array/sum (f64-array 1.0 2.0 3.0)) ; => 6.0
(i64-array/sum (i64-array 1 2 3 4 5)) ; => 15
```

### `f64-array/dot`

Compute the dot product of two f64 arrays (must be the same length).

```sema
(f64-array/dot (f64-array 1.0 2.0 3.0) (f64-array 4.0 5.0 6.0))
; => 32.0 (1*4 + 2*5 + 3*6)
```

## Higher-Order Functions

### `f64-array/map` / `i64-array/map`

Apply a function to each element, returning a new typed array. The callback must return the matching numeric type.

```sema
(f64-array/map (lambda (x) (* x 2.0)) (f64-array 1.0 2.0 3.0)) ; => #f64(2 4 6)
(i64-array/map (lambda (x) (* x x)) (i64-array 1 2 3 4)) ; => #i64(1 4 9 16)
```

### `f64-array/fold` / `i64-array/fold`

Fold over a typed array with an accumulator.

```sema
(f64-array/fold (lambda (acc x) (+ acc x)) 0.0 (f64-array 1.0 2.0 3.0)) ; => 6.0
(i64-array/fold (lambda (acc x) (max acc x)) 0 (i64-array 3 1 4 1 5)) ; => 5
```

## Type Predicates

### `f64-array?` / `i64-array?`

Test whether a value is a typed array.

```sema
(f64-array? (f64-array 1.0 2.0)) ; => #t
(f64-array? '(1.0 2.0)) ; => #f
(i64-array? (i64-array 1 2)) ; => #t
```

## Examples

### Embedding similarity with dot product

```sema
;; Compute cosine similarity between two embedding vectors
(define (cosine-similarity a b)
  (let ((dot (f64-array/dot a b))
        (mag-a (sqrt (f64-array/dot a a)))
        (mag-b (sqrt (f64-array/dot b b))))
    (/ dot (* mag-a mag-b))))

(define v1 (f64-array 1.0 0.0 0.0))
(define v2 (f64-array 0.707 0.707 0.0))
(cosine-similarity v1 v2) ; => ~0.707
```

### Numeric computation

```sema
;; Sum of squares of even numbers from 0 to 99
(define nums (i64-array/range 0 100))
(define evens (i64-array/map (lambda (x) (if (even? x) (* x x) 0)) nums))
(i64-array/sum evens) ; => 161700
```

---

---
url: 'https://sema-lang.com/docs/stdlib/file-io.md'
---

# File I/O & Paths

::: tip Sandbox capability
`file/*` functions require the `FS_READ` capability (for reads, listings, predicates) or `FS_WRITE` capability (for writes, deletes, renames, mkdir). They run unrestricted under `sema` by default, but are gated in sandboxed environments (e.g., the WASM playground). A sandboxed script that attempts to use them without the capability will receive an error.
:::

## Console I/O

### `display`

Print a value without a trailing newline.

```sema
(display "no newline")
(display 42)
```

### `println`

Print a value followed by a newline.

```sema
(println "with newline")
(println 42)
```

### `print`

Write values in read-syntax form (strings are quoted) like Scheme's `write`. No trailing newline. Use `display` for human-readable output without quotes.

```sema
(print "hello") ;; outputs: "hello"
(display "hello") ;; outputs: hello
```

### `io/print-error`

Print to stderr without a trailing newline.

```sema
(io/print-error "warning: something happened")
```

### `io/println-error`

Print to stderr with a trailing newline.

```sema
(io/println-error "error: file not found")
```

### `newline`

Print a newline character.

```sema
(newline)
```

### `io/read-line`

Read a line of input from stdin (trailing `\n` / `\r\n` stripped).
```sema
(define name (io/read-line))
```

Returns `nil` when stdin is closed (Ctrl-D in cooked mode, end of a piped file). Use this to distinguish "user pressed Enter on an empty line" (returns `""`) from "stdin is exhausted" (returns `nil`).

```sema
(let loop ()
  (let ((line (io/read-line)))
    (cond
      ((nil? line) (println "(eof)"))
      ((= line "") (loop)) ; blank line, keep reading
      (else (println "got: " line) (loop)))))
```

::: warning Breaking change in 1.14.0
Previously `io/read-line` returned `""` on both EOF and empty input, making them indistinguishable. It now returns `nil` on EOF. If you don't want to refactor for this, use `io/eof?` after the call instead.
:::

### `io/read-stdin`

Read all of stdin as a string (until EOF).

```sema
(define input (io/read-stdin))
```

### `io/eof?`

Return `#t` after any stdin read (`io/read-line`, `io/read-stdin`, `io/read-key`) has signalled EOF. Non-breaking alternative to checking `io/read-line` for `nil`.

```sema
(define line (io/read-line))
(when (io/eof?)
  (println "stdin closed"))
```

### `io/flush`

Flush stdout. Useful when writing a prompt without a trailing newline before reading input.

```sema
(display "name> ")
(io/flush)
(define name (io/read-line))
```

## File Operations

### `file/read`

Read the entire contents of a file as a string.

```sema
(file/read "data.txt") ; => "file contents..."
```

### `file/write`

Write a string to a file, overwriting any existing content.

```sema
(file/write "out.txt" "content")
```

### `file/append`

Append a string to a file.

```sema
(file/append "log.txt" "new line\n")
```

### `file/read-lines`

Read a file as a list of lines. Handles both `\n` and `\r\n` line endings. An empty file returns an empty list.

```sema
(file/read-lines "data.txt") ; => ("line 1" "line 2" "line 3")
(file/read-lines "empty.txt") ; => ()
```

### `file/write-lines`

Write a list of strings to a file, one per line.
```sema
(file/write-lines "out.txt" '("a" "b" "c"))
```

### `file/for-each-line`

Iterate over lines of a file, calling a function on each line. Memory-efficient for large files.

```sema
(file/for-each-line "data.txt"
  (fn (line) (println line)))
```

### `file/fold-lines`

Fold over lines of a file with an accumulator. Uses a 256KB buffer for high throughput on large files.

```sema
(file/fold-lines "data.csv"
  (fn (acc line) (+ acc 1))
  0) ; => number of lines
```

### `file/delete`

Delete a file.

```sema
(file/delete "tmp.txt")
```

### `file/rename`

Rename or move a file.

```sema
(file/rename "old.txt" "new.txt")
```

### `file/copy`

Copy a file.

```sema
(file/copy "src.txt" "dst.txt")
```

## Binary File I/O

### `file/read-bytes`

Read a file as a bytevector (binary data).

```sema
(file/read-bytes "image.png") ; => #u8(137 80 78 71 ...)
```

### `file/write-bytes`

Write a bytevector to a file.

```sema
(file/write-bytes "output.bin" my-bytes)
```

## File Predicates

### `file/exists?`

Test if a file or directory exists.

```sema
(file/exists? "data.txt") ; => #t or #f
```

### `file/is-file?`

Test if a path is a regular file.

```sema
(file/is-file? "data.txt") ; => #t
```

### `file/is-directory?`

Test if a path is a directory.

```sema
(file/is-directory? "src/") ; => #t
```

### `file/is-symlink?`

Test if a path is a symbolic link.

```sema
(file/is-symlink? "link") ; => #t or #f
```

## Directory Operations

### `file/list`

List entries in a directory.

```sema
(file/list "src/") ; => ("main.rs" "lib.rs" ...)
```

### `file/mkdir`

Create a directory.

```sema
(file/mkdir "new-dir")
```

### `file/glob`

Find files matching a glob pattern.

```sema
(file/glob "src/**/*.rs") ; => ("src/main.rs" "src/lib.rs" ...)
(file/glob "*.txt") ; => ("readme.txt" "notes.txt")
```

### `file/info`

Get file metadata. Returns a map with `:size`, `:modified`, and other keys.
```sema
(file/info "data.txt") ; => {:size 1234 :modified 1707955200 ...}
```

## Path Manipulation

### `path/join`

Join path components.

```sema
(path/join "src" "main.rs") ; => "src/main.rs"
(path/join "a" "b" "c.txt") ; => "a/b/c.txt"
```

### `path/dir`

Return the directory portion of a path. Returns `""` when the path has no parent component.

```sema
(path/dir "/a/b/c.txt") ;; => "/a/b"
(path/dir "foo") ;; => ""
```

`path/dirname` is a legacy alias for `path/dir` — same implementation, same return value.

### `path/filename`

Return the filename portion of a path. Returns `""` when there is no filename component (e.g. for `""`).

```sema
(path/filename "/a/b/c.txt") ;; => "c.txt"
(path/filename "plain.rs") ;; => "plain.rs"
```

`path/basename` is a legacy alias for `path/filename` — same implementation, same return value.

### `path/extension`

Return the file extension (without the dot). Returns `""` when the path has no extension.

```sema
(path/extension "file.rs") ;; => "rs"
(path/extension "file.tar.gz") ;; => "gz"
(path/extension "Makefile") ;; => ""
(path/extension ".hidden") ;; => ""
```

`path/ext` is a legacy alias for `path/extension` — same implementation, same return value.

::: warning Behavior change
Previous versions registered `path/dirname`, `path/basename`, and `path/extension` as independent functions that returned `nil` on the no-parent / no-filename / no-extension case. As of the current release, all six names share one implementation per concept and consistently return `""` (matching `path/dir`, `path/filename`, `path/ext`).
:::

### `path/absolute`

Return the absolute path.

```sema
(path/absolute ".") ; => "/full/path/to/current/dir"
```

### `path/stem`

Return the filename without extension.

```sema
(path/stem "file.rs") ; => "file"
(path/stem "archive.tar.gz") ; => "archive.tar"
```

### `path/absolute?`

Test if a path is absolute.

```sema
(path/absolute? "/usr/bin") ; => #t
(path/absolute? "relative") ; => #f
```

---

---
url: 'https://sema-lang.com/docs/stdlib/pdf.md'
---

# PDF Processing

Pure-Rust PDF text extraction, page counting, and metadata reading. No external tools required — works cross-platform including macOS, Linux, and Windows.

::: tip
These functions use the `pdf-extract` and `lopdf` Rust crates internally. They work with text-based PDFs. For scanned/image-only PDFs, consider using [`llm/extract-from-image`](../llm/extraction) with vision models instead.
:::

## Text Extraction

### `pdf/extract-text`

Extract all text from a PDF file, concatenated across all pages.

```sema
(pdf/extract-text "invoice.pdf")
; => "Invoice\nDate: 2025-01-15\nAmount: $50.00 USD\n..."

;; Clean up whitespace for LLM processing
(text/clean-whitespace (pdf/extract-text "invoice.pdf"))
; => "Invoice Date: 2025-01-15 Amount: $50.00 USD ..."
```

### `pdf/extract-text-pages`

Extract text from a PDF, returning a list of strings — one per page.

```sema
(pdf/extract-text-pages "report.pdf")
; => ("Page 1 content..." "Page 2 content..." "Page 3 content...")

;; Get text from a specific page
(nth (pdf/extract-text-pages "report.pdf") 0)
; => "Page 1 content..."

;; Process each page separately
(for-each
  (fn (page-text)
    (println (format "Page has ~a words" (text/word-count page-text))))
  (pdf/extract-text-pages "report.pdf"))
```

## Metadata

### `pdf/page-count`

Return the number of pages in a PDF.

```sema
(pdf/page-count "report.pdf") ; => 12
```

### `pdf/metadata`

Return a map of PDF metadata fields. Always includes `:pages`; other fields (`:title`, `:author`, `:subject`, `:creator`, `:producer`) are included when present in the PDF.
```sema
(pprint (pdf/metadata "document.pdf"))
; => {:author "John Doe"
;     :creator "LibreOffice Writer"
;     :pages 5
;     :producer "LibreOffice"
;     :title "Quarterly Report"}

;; Access individual fields
(get (pdf/metadata "document.pdf") :title) ; => "Quarterly Report"
(get (pdf/metadata "document.pdf") :pages) ; => 5
```

## Example: Receipt Processor

Combine PDF extraction with [LLM structured extraction](../llm/extraction) to build an intelligent document processor:

```sema
;; Extract text from a PDF invoice
(define text (text/clean-whitespace (pdf/extract-text "invoice.pdf")))
(define pages (pdf/page-count "invoice.pdf"))
(println (format "Extracted ~a chars from ~a page(s)" (string/length text) pages))

;; Use LLM to classify and extract structured data
(llm/auto-configure)
(define result
  (llm/extract
    {:isReceipt {:type :boolean :description "Is this a receipt or invoice?"}
     :vendor {:type :string :description "The seller/merchant name"}
     :amount {:type :string :description "Total amount with currency"}
     :date {:type :string :description "Invoice date in YYYY-MM-DD format"}}
    text))

(println (format "Vendor: ~a" (get result :vendor)))
(println (format "Amount: ~a" (get result :amount)))
```

See the full [GLaDOS receipt processor example](https://github.com/helgesverre/sema/blob/main/examples/glados-downloads.sema) for a complete implementation.

---

---
url: 'https://sema-lang.com/docs/stdlib/csv.md'
---

# CSV

Functions for parsing and encoding CSV (Comma-Separated Values) data. Sema uses the Rust [`csv`](https://docs.rs/csv) crate, which handles RFC 4180 edge cases like quoted fields, embedded commas, and newlines within fields.

::: tip Type mapping
All CSV values are returned as **strings**. Use `string/to-number`, `string/to-symbol`, etc. to convert fields to the types you need.
:::

## Parsing

### `csv/parse`

Parse a CSV string into a list of lists (rows of fields). No header processing — every row is returned as-is.
**Signature:** `(csv/parse csv-string) → list`

```sema
(csv/parse "a,b\n1,2\n3,4")
; => (("a" "b") ("1" "2") ("3" "4"))
```

Quoted fields with commas and newlines are handled correctly:

```sema
(csv/parse "name,bio\n\"Ada\",\"Mathematician, writer\"\n")
; => (("name" "bio") ("Ada" "Mathematician, writer"))
```

### `csv/parse-maps`

Parse a CSV string into a list of maps. The first row is used as headers, which become keyword keys in each map.

**Signature:** `(csv/parse-maps csv-string) → list`

```sema
(csv/parse-maps "name,age\nAda,36\nBob,25")
; => ({:age "36" :name "Ada"} {:age "25" :name "Bob"})
```

Access fields by keyword:

```sema
(define rows (csv/parse-maps "name,age\nAda,36\nBob,25"))
(:name (first rows)) ; => "Ada"
```

## Encoding

### `csv/encode`

Encode a list of lists (or vectors) into a CSV string. Each inner list/vector becomes one row. Non-string values are stringified automatically.

**Signature:** `(csv/encode rows) → string`

```sema
(csv/encode '(("a" "b") ("1" "2")))
; => "a,b\n1,2\n"
```

Numeric and other values are converted to strings:

```sema
(csv/encode '(("name" "score") ("Ada" 100)))
; => "name,score\nAda,100\n"
```

## Examples

### Round-trip example

```sema
(define csv-text "name,age\nAda,36\nBob,25\n")
(define parsed (csv/parse csv-text))
(csv/encode parsed)
; => "name,age\nAda,36\nBob,25\n"
```

### Pipeline: file → CSV → processing

```sema
;; Read a CSV file and extract a column
(define data (csv/parse-maps (file/read "users.csv")))
(map (lambda (row) (:name row)) data)
```

---

---
url: 'https://sema-lang.com/docs/stdlib/toml.md'
---

# TOML

Functions for encoding and decoding [TOML](https://toml.io/) data. Sema itself uses TOML for project configuration (`sema.toml`), making these functions useful for both general config parsing and meta-tooling.

## `toml/decode`

`(toml/decode toml-string)` → Sema value

Parse a TOML string into Sema data structures. Tables become maps with keyword keys, arrays become lists, and scalar types map to their native Sema equivalents.

```sema
(toml/decode "[package]\nname = \"my-app\"\nversion = \"1.0.0\"")
; => {:package {:name "my-app" :version "1.0.0"}}
```

### Nested Tables

TOML dotted keys and sub-tables are decoded into nested maps:

```sema
(toml/decode "
[server]
host = \"localhost\"
port = 8080

[server.tls]
enabled = true
cert = \"/path/to/cert.pem\"
")
; => {:server {:host "localhost" :port 8080 :tls {:cert "/path/to/cert.pem" :enabled #t}}}
```

### Arrays and Arrays of Tables

Plain arrays become lists. `[[double-bracket]]` arrays of tables become lists of maps:

```sema
(toml/decode "
colors = [\"red\", \"green\", \"blue\"]

[[fruits]]
name = \"apple\"
color = \"red\"

[[fruits]]
name = \"banana\"
color = \"yellow\"
")
; => {:colors ("red" "green" "blue")
;     :fruits ({:color "red" :name "apple"}
;              {:color "yellow" :name "banana"})}
```

### Inline Tables

Inline tables are decoded identically to standard tables:

```sema
(toml/decode "point = { x = 1, y = 2 }")
; => {:point {:x 1 :y 2}}
```

### Datetime Handling

TOML datetime values are converted to strings. This includes offset datetimes, local datetimes, local dates, and local times:

```sema
(toml/decode "created = 2024-01-15T10:30:00Z")
; => {:created "2024-01-15T10:30:00Z"}
```

### Error Handling

Invalid TOML throws a `SemaError`:

```sema
(toml/decode "invalid = ")
; => Error: toml/decode: ...
```

## `toml/encode`

`(toml/encode map)` → TOML string

Serialize a Sema map to a TOML string. The top-level value **must** be a map — passing any other type is an error.
```sema
(toml/encode {:package {:name "my-app" :version "1.0.0"}})
; => "[package]\nname = \"my-app\"\nversion = \"1.0.0\"\n"
```

### Nested Maps

Nested maps become TOML tables:

```sema
(toml/encode {:database {:host "localhost"
                         :port 5432
                         :credentials {:user "admin"
                                       :password "secret"}}})
```

### Error Handling

The top-level value must be a map:

```sema
(toml/encode "hello")
; => Error: toml/encode: top-level value must be a map
```

`nil` values cannot be encoded (TOML has no null):

```sema
(toml/encode {:key nil})
; => Error: toml/encode: cannot encode nil
```

Non-encodable types like functions and records throw errors:

```sema
(toml/encode {:callback println})
; => Error: toml/encode: cannot encode native-fn
```

## Type Mapping

### TOML → Sema (decoding)

| TOML Type | Sema Type | Example |
|-----------|-----------|---------|
| Table | map (keyword keys) | `{:key "val"}` |
| Array | list | `("a" "b" "c")` |
| String | string | `"hello"` |
| Integer | int | `42` |
| Float | float | `3.14` |
| Boolean | bool | `#t` / `#f` |
| Datetime | string | `"2024-01-15T10:30:00Z"` |

### Sema → TOML (encoding)

| Sema Type | TOML Type | Notes |
|-----------|-----------|-------|
| map / hashmap | Table | Keys converted via `key_to_string` |
| list / vector | Array | |
| string | String | |
| int | Integer | |
| float | Float | |
| bool | Boolean | |
| keyword | String | `:foo` → `"foo"` |
| symbol | String | `'foo` → `"foo"` |
| nil | ❌ Error | TOML has no null type |
| function / record | ❌ Error | Not representable in TOML |

## Practical Examples

### Reading a Config File

```sema
(define config (-> "config.toml" file/read toml/decode))

(println "Server:" (map/get-in config [:server :host])
         ":" (map/get-in config [:server :port]))
```

### Updating Config Values

```sema
(define config (-> "config.toml" file/read toml/decode))

;; Update the port and add a new setting
(define updated
  (-> config
      (map/assoc-in [:server :port] 9090)
      (map/assoc-in [:server :debug] true)))

(file/write "config.toml" (toml/encode updated))
```

### Round-Trip

```sema
(define config-str "
[server]
host = \"0.0.0.0\"
port = 3000

[server.cors]
origins = [\"https://example.com\"]
")

(define config (toml/decode config-str))
(define new-config (map/assoc-in config [:server :port] 8080))
(toml/encode new-config)
```

## TOML vs JSON

| | TOML | JSON |
|---|------|------|
| **Use case** | Configuration files | Data interchange |
| **Comments** | ✅ Yes | ❌ No |
| **Null type** | ❌ No | ✅ `null` |
| **Date/time** | ✅ Native | ❌ Strings only |
| **Top-level** | Must be a table | Any value |
| **Sema decode** | `toml/decode` | `json/decode` |
| **Sema encode** | `toml/encode` | `json/encode` |

::: tip sema.toml
Sema uses TOML for its own project configuration file (`sema.toml`). You can read and manipulate it programmatically:

```sema
(define project (-> "sema.toml" file/read toml/decode))
(println "Project:" (map/get-in project [:package :name]))
```
:::

---

---
url: 'https://sema-lang.com/docs/stdlib/http-json.md'
---

# HTTP & JSON

## HTTP

HTTP functions make synchronous requests and return a response map. All HTTP functions require the **network** sandbox capability.

::: tip Sandbox
HTTP functions are gated behind the `NETWORK` capability. They are available by default when running scripts with `sema`, but disabled in sandboxed environments (e.g., the WASM playground). A sandboxed script that attempts to use HTTP will receive an error.
:::

### Response Map

All HTTP functions return a map with three keys:

| Key        | Type   | Description                                          |
|------------|--------|------------------------------------------------------|
| `:status`  | int    | HTTP status code (e.g., `200`, `404`, `500`)         |
| `:headers` | map    | Response headers as keyword-keyed map                |
| `:body`    | string | Response body as a raw string                        |

```sema
(define resp (http/get "https://httpbin.org/get"))
(:status resp) ; => 200
(:headers resp) ; => {:content-type "application/json" :server "..." ...}
(:body resp) ; => "{\"args\": {}, ...}"
```

Headers are returned with keyword keys derived from the header name (e.g., `Content-Type` becomes `:content-type`). The body is always a raw string — use `json/decode` to parse JSON responses.

### Options Map

The `http/get`, `http/post`, `http/put`, `http/delete`, and `http/request` functions accept an optional **options map** with the following keys:

| Key        | Type | Description                                         |
|------------|------|-----------------------------------------------------|
| `:headers` | map  | Request headers (string or keyword keys both work)  |
| `:timeout` | int  | Request timeout in milliseconds                     |

```sema
;; Custom headers and timeout
(http/get "https://api.example.com/data"
  {:headers {"Authorization" "Bearer tok_abc123"
             "Accept" "application/json"}
   :timeout 5000})
```

### `http/get`

```
(http/get url)
(http/get url opts)
```

Make an HTTP GET request.

* **url** — string, the request URL
* **opts** — optional map with `:headers` and/or `:timeout`

```sema
;; Simple GET
(http/get "https://httpbin.org/get")

;; GET with custom headers
(http/get "https://api.example.com/users"
  {:headers {:authorization "Bearer my-token"}})
```

### `http/post`

```
(http/post url body)
(http/post url body opts)
```

Make an HTTP POST request.
* **url** — string, the request URL
* **body** — request body: a string (sent as-is) or a map (auto-encoded as JSON with `Content-Type: application/json`)
* **opts** — optional map with `:headers` and/or `:timeout`

```sema
;; POST with a map body (auto-JSON-encoded)
(http/post "https://httpbin.org/post" {:name "Ada" :age 36})

;; POST with string body and custom headers
(http/post "https://api.example.com/webhook"
  "raw payload"
  {:headers {"Content-Type" "text/plain"}})

;; POST with JSON body and auth
(http/post "https://api.example.com/users"
  {:name "Ada" :role "admin"}
  {:headers {"Authorization" "Bearer tok_abc123"}
   :timeout 10000})
```

### `http/put`

```
(http/put url body)
(http/put url body opts)
```

Make an HTTP PUT request. Behaves identically to `http/post` — map bodies are auto-JSON-encoded.

* **url** — string, the request URL
* **body** — request body (string or map)
* **opts** — optional map with `:headers` and/or `:timeout`

```sema
(http/put "https://api.example.com/users/42"
  {:name "Ada Lovelace" :role "admin"})
```

### `http/delete`

```
(http/delete url)
(http/delete url opts)
```

Make an HTTP DELETE request.

* **url** — string, the request URL
* **opts** — optional map with `:headers` and/or `:timeout`

```sema
(http/delete "https://api.example.com/users/42"
  {:headers {"Authorization" "Bearer tok_abc123"}})
```

### `http/request`

```
(http/request method url)
(http/request method url opts)
(http/request method url opts body)
```

Make an HTTP request with any method. Use this for methods not covered by the convenience functions (e.g., `PATCH`, `HEAD`).

* **method** — string, HTTP method (case-insensitive, converted to uppercase). Supported: `GET`, `POST`, `PUT`, `DELETE`, `PATCH`, `HEAD`
* **url** — string, the request URL
* **opts** — optional map with `:headers` and/or `:timeout`
* **body** — optional request body (string or map)

```sema
;; PATCH request
(http/request "PATCH" "https://api.example.com/users/42"
  {:headers {"Content-Type" "application/json"}}
  {:name "Updated Name"})

;; HEAD request (body will be empty)
(define resp (http/request "HEAD" "https://example.com"))
(:status resp) ; => 200
(:body resp) ; => ""
```

### Error Handling

Network errors (DNS failure, connection refused, timeout) throw a `SemaError::Io` error. Use `try`/`catch` to handle them:

```sema
;; Handle network errors
(try
  (http/get "https://unreachable.invalid")
  (catch e
    (println "Request failed:" e)))

;; Check status codes
(define resp (http/get "https://api.example.com/data"))
(cond
  ((= (:status resp) 200) (json/decode (:body resp)))
  ((= (:status resp) 404) (error "Not found"))
  ((>= (:status resp) 500) (error "Server error"))
  (else (error (format "Unexpected status: ~a" (:status resp)))))

;; Timeout handling
(try
  (http/get "https://slow-api.example.com/data" {:timeout 3000})
  (catch e
    (println "Request timed out or failed:" e)))
```

### Common Patterns

#### GET + JSON Decode Pipeline

```sema
;; Fetch JSON data and extract fields
(define data
  (-> (http/get "https://api.example.com/users/1")
      (:body)
      (json/decode)))

(:name data) ; => "Ada"
(:email data) ; => "ada@example.com"
```

#### POST with JSON Body and Auth Headers

```sema
(define resp
  (http/post "https://api.example.com/posts"
    {:title "Hello World" :body "Content here"}
    {:headers {"Authorization" "Bearer tok_abc123"
               "X-Request-Id" "req-001"}}))

(when (= (:status resp) 201)
  (println "Created:" (:body resp)))
```

#### Paginated API Requests

```sema
(define (fetch-all-pages base-url)
  (let loop ((page 1) (results '()))
    (define resp (http/get (format "~a?page=~a" base-url page)))
    (define data (json/decode (:body resp)))
    (define items (:items data))
    (if (empty? items)
      results
      (loop (+ page 1) (append results items)))))
```

***

## JSON

Functions for encoding Sema values to JSON strings and decoding JSON strings back into Sema values.

### Type Mapping

#### Encoding (Sema → JSON)

| Sema Type   | JSON Type | Notes                                      |
|-------------|-----------|--------------------------------------------|
| `int`       | number    | `42` → `42`                                |
| `float`     | number    | `3.14` → `3.14`. NaN/Infinity cause errors |
| `string`    | string    | `"hello"` → `"hello"`                      |
| `keyword`   | string    | `:name` → `"name"`                         |
| `symbol`    | string    | `'foo` → `"foo"`                           |
| `#t` / `#f` | boolean   | `#t` → `true`, `#f` → `false`              |
| `nil`       | null      | `nil` → `null`                             |
| list        | array     | `'(1 2 3)` → `[1, 2, 3]`                   |
| vector      | array     | `[1 2 3]` → `[1, 2, 3]`                    |
| map         | object    | `{:a 1}` → `{"a": 1}`                      |
| hashmap     | object    | Same as map                                |
| function    | *error*   | Cannot encode functions as JSON            |
| record      | *error*   | Cannot encode records as JSON              |

#### Decoding (JSON → Sema)

| JSON Type | Sema Type | Notes                                           |
|-----------|-----------|-------------------------------------------------|
| number    | int/float | Integers decode as `int`, decimals as `float`   |
| string    | string    | `"hello"` → `"hello"`                           |
| boolean   | bool      | `true` → `#t`, `false` → `#f`                   |
| null      | nil       | `null` → `nil`                                  |
| array     | list      | `[1, 2]` → `(1 2)`                              |
| object    | map       | Keys become keywords: `{"a": 1}` → `{:a 1}`     |

### `json/encode`

```
(json/encode value) → string
```

Encode a Sema value as a compact JSON string. Uses **strict** conversion — errors on values that cannot be represented in JSON (functions, records, NaN, Infinity).
* **value** — any JSON-encodable Sema value ```sema (json/encode 42) ; => "42" (json/encode "hello") ; => "\"hello\"" (json/encode #t) ; => "true" (json/encode nil) ; => "null" (json/encode '(1 2 3)) ; => "[1,2,3]" (json/encode [1 2 3]) ; => "[1,2,3]" (json/encode {:name "Ada" :age 36}) ; => "{\"age\":36,\"name\":\"Ada\"}" ``` Encoding errors: ```sema ;; NaN and Infinity cannot be represented in JSON (json/encode (/ 0.0 0.0)) ; Error: cannot encode NaN/Infinity as JSON ;; Functions cannot be encoded (json/encode println) ; Error: cannot encode native-fn as JSON ``` ### `json/encode-pretty` ``` (json/encode-pretty value) → string ``` Encode a Sema value as a pretty-printed JSON string with 2-space indentation. Same strict conversion rules as `json/encode`. * **value** — any JSON-encodable Sema value ```sema (json/encode-pretty {:name "Ada" :scores [95 87 92]}) ;; => ;; { ;; "name": "Ada", ;; "scores": [ ;; 95, ;; 87, ;; 92 ;; ] ;; } ``` ### `json/decode` ``` (json/decode json-string) → value ``` Decode a JSON string into a Sema value. JSON objects become maps with keyword keys, arrays become lists. See the [type mapping table](#decoding-json-sema) for full details. 
* **json-string** — a string containing valid JSON ```sema (json/decode "42") ; => 42 (json/decode "3.14") ; => 3.14 (json/decode "\"hello\"") ; => "hello" (json/decode "true") ; => #t (json/decode "null") ; => nil (json/decode "[1, 2, 3]") ; => (1 2 3) (json/decode "{\"name\": \"Ada\"}") ; => {:name "Ada"} ``` Decoding errors: ```sema ;; Invalid JSON throws an error (json/decode "not json") ; Error: json/decode: expected value at line 1 column 1 ;; Argument must be a string (json/decode 42) ; Error: type error: expected string, got int ``` ### JSON Roundtrips Values that survive an encode → decode roundtrip preserve their structure, though some types are normalized: ```sema ;; Vectors become lists after roundtrip (json/decode (json/encode [1 2 3])) ; => (1 2 3) ;; Keywords in maps are preserved (json/decode (json/encode {:a 1 :b 2})) ; => {:a 1 :b 2} ;; Nested structures work (define data {:users [{:name "Ada"} {:name "Bob"}] :count 2 :active #t}) (define roundtripped (json/decode (json/encode data))) (:count roundtripped) ; => 2 (:active roundtripped) ; => #t ``` ### Error Handling JSON encoding and decoding errors can be caught with `try`/`catch`: ```sema ;; Catch encoding errors (try (json/encode (/ 0.0 0.0)) (catch e (println "Encode failed:" e))) ;; Catch decoding errors (try (json/decode "invalid json {{{") (catch e (println "Decode failed:" e))) ``` --- --- url: 'https://sema-lang.com/docs/stdlib/web-server.md' --- # Web Server Sema includes a built-in HTTP server powered by [axum](https://github.com/tokio-rs/axum), with data-driven routing, middleware as function composition, SSE streaming, and WebSocket support. The server runs on a background thread with a Tokio runtime while keeping all Sema evaluation single-threaded — the same model as Node.js. 
## Quick Start ```sema (define (handler req) (http/ok {:message "Hello from Sema!"})) (http/serve handler {:port 3000}) ``` ```bash $ curl http://localhost:3000 {"message":"Hello from Sema!"} ``` ## Serving ### `http/serve` Start an HTTP server. Takes a handler function and an optional options map. The handler receives a request map and returns a response map. This function blocks — it becomes the server's run loop. ```sema (http/serve handler) (http/serve handler {:port 3000}) (http/serve handler {:port 8080 :host "127.0.0.1"}) ``` | Option | Default | Description | | ------- | ----------- | ------------------ | | `:port` | `3000` | TCP port to bind | | `:host` | `"0.0.0.0"` | Address to bind to | The handler is any function `(request-map -> response-map)`. This can be a plain function, a router, or a middleware-wrapped stack. ## Routing ### `http/router` Create a handler function from a list of route definitions. Each route is a vector of `[method pattern handler]`. ```sema (define routes [[:get "/" handle-home] [:get "/users/:id" handle-user] [:post "/users" handle-create] [:any "/echo" handle-echo]]) (define app (http/router routes)) (http/serve app {:port 3000}) ``` Supported methods: `:get`, `:post`, `:put`, `:patch`, `:delete`, `:any` (matches all methods), `:ws` (WebSocket upgrade), and `:static` (static file directory). Routes are matched top-to-bottom — first match wins. Unmatched routes return 404. ### Path Parameters Use `:param` syntax to capture path segments. Extracted values appear in the request's `:params` map. ```sema ;; Route: [:get "/users/:id" handle-user] ;; Request: GET /users/42 (define (handle-user req) (let ((id (:id (:params req)))) (http/ok {:user-id id}))) ; => {"user-id":"42"} ``` Multiple parameters work as expected: ```sema [:get "/users/:uid/posts/:pid" handler] ;; GET /users/1/posts/99 → {:uid "1" :pid "99"} ``` ### Wildcard Routes Use `*` to capture the rest of the path. 
```sema [:get "/files/*" handle-files] ;; GET /files/docs/readme.md → {:* "docs/readme.md"} ``` ## Request Map Every handler receives a request map with the following fields: ```sema {:method :get ; HTTP method as keyword :path "/users/42" ; Request path :headers {"content-type" "application/json" ...} ; Headers (string keys) :query {:search "term" :page "1"} ; Query params (keyword keys) :params {:id "42"} ; Route params (keyword keys) :body "{\"name\": \"Ada\"}" ; Raw body string :json {:name "Ada"}} ; Parsed JSON body (if applicable) ``` The `:json` field is automatically populated when the request has `Content-Type: application/json`. > **Request body limit.** Request bodies are capped at **16 MiB**. A larger body is rejected with `413 Payload Too Large` instead of being buffered into memory, so a client can't exhaust the server's memory with an oversized upload. ### Accessing Request Data ```sema ;; Method (:method req) ; => :get ;; Path (:path req) ; => "/users/42" ;; A specific header (get (:headers req) "authorization") ; => "Bearer ..." ;; Query parameter (:page (:query req)) ; => "2" ;; Route parameter (:id (:params req)) ; => "42" ;; JSON body field (:name (:json req)) ; => "Ada" ``` ## Response Map Handlers return a response map with `:status`, `:headers`, and `:body`: ```sema {:status 200 :headers {"content-type" "application/json"} :body "{\"message\": \"ok\"}"} ``` You can construct these by hand, but the response helpers below are more convenient. ## Response Helpers ### `http/ok` Return 200 with a JSON-encoded body. ```sema (pprint (http/ok {:message "success"})) ; => {:body "{"message":"success"}" ; :headers {"content-type" "application/json"} ; :status 200} (pprint (http/ok [1 2 3])) ; => {:body "[1,2,3]" :headers {"content-type" "application/json"} :status 200} ``` ### `http/created` Return 201 with a JSON-encoded body. ```sema (http/created {:id 42 :name "Ada"}) ``` ### `http/no-content` Return 204 with an empty body. 
```sema
(http/no-content)
```

### `http/not-found`

Return 404 with a JSON-encoded body.

```sema
(http/not-found {:error "User not found"})
```

### `http/error`

Return a custom status code with a JSON-encoded body.

```sema
(http/error 422 {:errors ["Invalid email" "Name required"]})
(http/error 503 {:error "Service unavailable"})
```

### `http/redirect`

Return a 302 redirect to a URL.

```sema
(http/redirect "https://example.com/login")
```

### `http/html`

Return 200 with `Content-Type: text/html`.

```sema
(http/html "

<h1>Hello</h1>
<p>Welcome to Sema.</p>

") ``` ### `http/text` Return 200 with `Content-Type: text/plain`. ```sema (http/text "OK") ``` ### `http/file` Return a file from disk with automatic MIME type detection. The file is read on the I/O thread (not the evaluator), so it handles binary files efficiently. ```sema (http/file "public/index.html") (http/file "data/report.pdf" "application/pdf") ; explicit content type ``` The path is resolved relative to the current working directory. If the file doesn't exist, an error is raised. The MIME type is guessed from the file extension (e.g. `.html` → `text/html`, `.css` → `text/css`, `.js` → `application/javascript`). ## Static File Serving ### `:static` Routes Serve an entire directory of static files using the `:static` route type in `http/router`. Files are served with automatic MIME types, cache headers, and path traversal protection. ```sema (define routes [[:static "/assets" "./public"] [:get "/*" handle-spa]]) (http/serve (http/router routes) {:port 3000}) ``` ```bash $ curl http://localhost:3000/assets/style.css body { color: red; } $ curl -I http://localhost:3000/assets/style.css Content-Type: text/css Cache-Control: public, max-age=3600 ``` The `:static` route takes a URL prefix and a directory path. Requests matching the prefix are mapped to files in the directory: * `GET /assets/style.css` → reads `./public/style.css` * `GET /assets/js/app.js` → reads `./public/js/app.js` * `GET /assets/` → reads `./public/index.html` (directory index) **Fallthrough**: If a file doesn't exist, the route does *not* match — the router continues to the next route. This enables SPA (single-page application) patterns where a catch-all route serves `index.html` for client-side routing: ```sema (define routes [[:static "/assets" "./dist/assets"] [:get "/*" (fn (_) (http/file "./dist/index.html"))]]) (http/serve (http/router routes) {:port 3000}) ``` **Security**: Path traversal attempts (e.g. `../etc/passwd`) are rejected with a 400 response. 
Only GET and HEAD methods are accepted. ## Middleware Middleware in Sema is just function composition — a function that takes a handler and returns a new handler. No special framework needed. ### Writing Middleware ```sema ;; Logging middleware (define (with-logging handler) (fn (req) (let ((resp (handler req))) (println (:method req) (:path req) "->" (:status resp)) resp))) ``` ```sema ;; CORS middleware (define (with-cors handler) (fn (req) (let ((resp (handler req))) (assoc resp :headers (merge (or (:headers resp) {}) {"access-control-allow-origin" "*" "access-control-allow-methods" "GET, POST, PUT, DELETE"}))))) ``` ```sema ;; Auth middleware (define (with-auth handler) (fn (req) (let ((token (get (:headers req) "authorization"))) (if token (handler req) (http/error 401 {:error "Unauthorized"}))))) ``` ### Composing Middleware Stack middleware by nesting function calls. The outermost middleware runs first. ```sema (define app (with-logging (with-cors (with-auth (http/router routes))))) (http/serve app {:port 3000}) ``` Or use the threading macro for a cleaner pipeline: ```sema (define app (-> (http/router routes) with-auth with-cors with-logging)) ``` ## SSE Streaming ### `http/stream` Return a Server-Sent Events stream. Takes a handler function that receives a `send` callback. ```sema (define (handle-events req) (http/stream (fn (send) (send "connected") (sleep 1000) (send "update 1") (sleep 1000) (send "update 2")))) ``` The stream stays open as long as the handler is running. When the handler returns, the stream closes. 
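As a rough sketch of that lifecycle, the handler below sends three events one second apart; once the loop finishes, the handler returns and the stream closes. `handle-countdown` is a hypothetical name used only for illustration, and the sketch assumes `send` expects a string, hence the `str` conversion.

```sema
;; Hypothetical example handler (not part of the stdlib):
;; emits "3", "2", "1", then returns, which ends the SSE stream.
(define (handle-countdown req)
  (http/stream
    (fn (send)
      (let loop ((n 3))
        (when (> n 0)
          (send (str n))   ; assumption: send takes a string
          (sleep 1000)     ; wait one second between events
          (loop (- n 1)))))))
```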
```sema ;; Route it like any other handler (define routes [[:get "/events" handle-events]]) ``` ```bash $ curl -N http://localhost:3000/events data: connected data: update 1 data: update 2 ``` ### Streaming LLM Responses SSE is particularly useful for streaming LLM completions to the browser: ```sema (define (handle-chat req) (http/stream (fn (send) (let ((prompt (:prompt (:json req)))) ;; Stream each token as an SSE event (llm/stream prompt (fn (token) (send token))))))) ``` ## WebSocket ### `http/websocket` Handle bidirectional WebSocket connections. Takes a handler function that receives a connection map with `:send`, `:recv`, and `:close` functions. ```sema (define (handle-ws conn) (let ((msg ((:recv conn)))) (when msg ((:send conn) (string/append "echo: " msg)) (handle-ws conn)))) ``` The connection map: | Key | Description | | -------- | -------------------------------------------------------- | | `:send` | `(send message)` — Send a string to the client | | `:recv` | `(recv)` — Block until a message arrives, `nil` on close | | `:close` | `(close)` — Close the connection | ### WebSocket Routes Use the `:ws` method in the router: ```sema (define routes [[:get "/api/status" handle-status] [:ws "/ws/chat" handle-ws]]) (http/serve (http/router routes) {:port 3000}) ``` ### Chat Room Example ```sema (define clients (atom '())) (define (broadcast msg) (for-each (fn (send) (send msg)) @clients)) (define (handle-ws conn) ;; Add this client's send function to the list (swap! clients (fn (lst) (cons (:send conn) lst))) ;; Read loop (let loop ((msg ((:recv conn)))) (when msg (broadcast msg) (loop ((:recv conn)))))) (define routes [[:ws "/chat" handle-ws]]) (http/serve (http/router routes) {:port 3000}) ``` ## Complete Examples ### REST API A JSON API with CRUD operations, middleware, and error handling. ```sema ;; In-memory data store (define db (atom {})) (define next-id (atom 0)) (define (gen-id) (swap! 
next-id (fn (n) (+ n 1)))
  @next-id)

;; Handlers
(define (list-users _)
  (http/ok (vals @db)))

(define (get-user req)
  (let ((id (:id (:params req)))
        (user (get @db id)))
    (if user
      (http/ok user)
      (http/not-found {:error "User not found"}))))

(define (create-user req)
  (let ((data (:json req))
        (id (str (gen-id)))
        (user (assoc data :id id)))
    (swap! db (fn (d) (assoc d id user)))
    (http/created user)))

(define (delete-user req)
  (let ((id (:id (:params req))))
    (swap! db (fn (d) (dissoc d id)))
    (http/no-content)))

;; Middleware
(define (with-json-errors handler)
  (fn (req)
    (let ((resp (handler req)))
      (if (map? resp)
        resp
        (http/error 500 {:error "Internal server error"})))))

(define (with-cors handler)
  (fn (req)
    (let ((resp (handler req)))
      (assoc resp :headers
        (merge (or (:headers resp) {})
          {"access-control-allow-origin" "*"
           "access-control-allow-methods" "GET, POST, DELETE"})))))

;; Routes
(define routes
  [[:get "/users" list-users]
   [:get "/users/:id" get-user]
   [:post "/users" create-user]
   [:delete "/users/:id" delete-user]])

;; Start
(define app
  (-> (http/router routes)
      with-json-errors
      with-cors))

(http/serve app {:port 3000})
```

### LLM-Powered API

An API endpoint that uses Sema's built-in LLM primitives to generate responses.

```sema
(define (handle-summarize req)
  (let ((text (:text (:json req))))
    (if text
      (http/ok {:summary (llm/complete (str "Summarize this:\n\n" text))})
      (http/error 400 {:error "Missing 'text' field"}))))

(define (handle-extract req)
  (let ((text (:text (:json req))))
    ;; llm/extract takes the schema first, then the text.
    (http/ok (llm/extract {:name "string" :date "string" :amount "number"} text))))

(define routes
  [[:post "/summarize" handle-summarize]
   [:post "/extract" handle-extract]
   [:get "/health" (fn (_) (http/ok {:status "up"}))]])

(http/serve (http/router routes) {:port 3000})
```

### HTML Application

Serve dynamic HTML pages.

```sema
(define (page title body)
  (http/html
    (str "<html><head><title>" title "</title>"
         "</head>"
         "<body>" body "</body></html>")))

(define (handle-home _)
  (page "Home" "

<h1>Welcome</h1>
<p>Built with Sema.</p>

")) (define (handle-greet req) (let ((name (or (:name (:params req)) "world"))) (page "Greeting" (str "

<h1>Hello, " name "!</h1>

")))) (define routes [[:get "/" handle-home] [:get "/greet/:name" handle-greet]]) (http/serve (http/router routes) {:port 3000}) ``` ### SPA with Static Assets Serve a single-page application with static assets and a catch-all for client-side routing. ```sema (define routes [[:get "/api/health" (fn (_) (http/ok {:status "up"}))] [:static "/assets" "./dist/assets"] [:get "/*" (fn (_) (http/file "./dist/index.html"))]]) (http/serve (http/router routes) {:port 3000}) ``` CSS, JS, and images under `./dist/assets/` are served with correct MIME types and cache headers. All other GET requests serve `index.html` for client-side routing. ## Architecture Notes * **Single-threaded evaluation**: All Sema code runs on the main thread. HTTP I/O runs on a background Tokio runtime. Requests are bridged via channels. * **Concurrency model**: Requests are processed sequentially by the evaluator. For LLM-backed services (where each request takes 1–5s of LLM latency), this is fine. For high-throughput APIs, consider a reverse proxy. * **Graceful shutdown**: Ctrl+C breaks the channel and the server exits cleanly. * **Sandbox-aware**: `http/serve` requires the `NETWORK` capability when running in sandbox mode. ## See Also * [HTTP Client & JSON](./http-json) — outbound HTTP requests and JSON encoding/decoding * [LLM Primitives](/docs/llm/) — building LLM-powered endpoints * [Key-Value Store](./kv-store) — persistent storage for server state --- --- url: 'https://sema-lang.com/docs/stdlib/system.md' --- # System ::: tip Sandbox capability Several `sys/*` functions are gated by sandbox capabilities: environment access (`env`, `sys/env-all`, `sys/set-env`) requires `ENV_READ` or `ENV_WRITE`, and process operations (`shell`, `sys/which`, signal hooks, `exit`) require `PROCESS`. They run unrestricted under `sema` by default but are restricted in sandboxed environments (e.g., the WASM playground). A sandboxed script that attempts to use them without the capability will receive an error. 
::: ## Environment Variables ### `env` Get the value of an environment variable. Returns `nil` if not set. ```sema (env "HOME") ; => "/Users/ada" (env "PATH") ; => "/usr/bin:/bin:..." (env "MISSING") ; => nil ``` ### `sys/env-all` Return all environment variables as a map. ```sema (sys/env-all) ; => {:HOME "/Users/ada" :PATH "..." ...} ``` ### `sys/set-env` Set an environment variable for the current process. ```sema (sys/set-env "KEY" "value") (env "KEY") ; => "value" ``` ## System Information ### `sys/args` Return the command-line arguments as a list. ```sema (sys/args) ; => ("sema" "script.sema" "--flag") ``` ### `sys/cwd` Return the current working directory. ```sema (sys/cwd) ; => "/current/dir" ``` ### `sys/platform` Return the platform name. ```sema (sys/platform) ; => "macos" / "linux" / "windows" ``` ### `sys/os` Return the operating system name. ```sema (sys/os) ; => "macos" ``` ### `sys/arch` Return the CPU architecture. ```sema (sys/arch) ; => "aarch64" / "x86_64" ``` ## Process Information ### `sys/pid` Return the current process ID. ```sema (sys/pid) ; => 12345 ``` ### `sys/tty` Return the TTY device path, or `nil` if not running in a terminal. ```sema (sys/tty) ; => "/dev/ttys003" or nil ``` ### `sys/which` Find the full path to an executable, or `nil` if not found. ```sema (sys/which "cargo") ; => "/Users/ada/.cargo/bin/cargo" (sys/which "nonexistent") ; => nil ``` ### `sys/elapsed` Return nanoseconds elapsed since the process started. ```sema (sys/elapsed) ; => 482937100 ``` ## Session Information ### `sys/interactive?` Test if stdin is a TTY (i.e., running interactively). ```sema (sys/interactive?) ; => #t in REPL, #f in scripts ``` ### `sys/hostname` Return the system hostname. ```sema (sys/hostname) ; => "my-machine" ``` ### `sys/user` Return the current username. ```sema (sys/user) ; => "ada" ``` ## Directory Paths ### `sys/home-dir` Return the user's home directory. 
```sema (sys/home-dir) ; => "/Users/ada" ``` ### `sys/temp-dir` Return the system temporary directory. ```sema (sys/temp-dir) ; => "/tmp" ``` ## Terminal ### `sys/term-size` Return the terminal's current size as a map `{:rows N :cols M}`, or `nil` when no controlling TTY is attached (e.g., when stdout is redirected to a file). Queries `ioctl(TIOCGWINSZ)` against stdout, then stderr, then stdin. ```sema (sys/term-size) ;; => {:rows 47 :cols 180} ``` Pair with `sys/on-signal :winch` to redraw on terminal resize: ```sema (define (redraw size) ;; ... layout for size ... ) (redraw (sys/term-size)) (sys/on-signal :winch (fn () (redraw (sys/term-size)))) ``` ::: warning Unix only Returns `nil` on Windows and any non-Unix target. ::: ## Signals Async-signal-safe handlers backed by atomic flags. Signal handlers themselves only flip a flag — your callbacks run later, in the main thread, when you call `sys/check-signals`. This keeps the single-threaded `Rc`-based runtime intact. ::: warning Unix only Signal hooks are no-ops on Windows. ::: ### `sys/on-signal` Register a callback for a signal. Multiple callbacks per signal are supported; they fire in registration order. Supported signals: | Keyword | Signal | Typical use | |----------|------------|--------------------------------------| | `:winch` | `SIGWINCH` | Terminal resize — redraw the UI | | `:int` | `SIGINT` | Ctrl-C — clean shutdown | | `:term` | `SIGTERM` | Termination request — clean shutdown | ```sema (sys/on-signal :int (fn () (println "interrupted, cleaning up") (exit 0))) ``` ### `sys/check-signals` Dispatch any pending signal callbacks. Call this from your event loop (typically right after `io/read-key` / `io/read-key-timeout` returns) so handlers run in a predictable place rather than asynchronously interrupting Sema code. 
```sema (let loop () (sys/check-signals) (let ((key (io/read-key-timeout 50))) (when key (handle-key key)) (loop))) ``` If no signals are pending, this is essentially free — it just checks three atomic booleans. ## Shell & Process Control ### `shell` Run a shell command. Returns a map with `:stdout`, `:stderr`, and `:exit-code`. A single-string command runs through the system shell (`sh -c` / `cmd /C`); passing extra arguments runs the command directly, without shell parsing. Requires the `PROCESS` capability. ```sema (shell "echo hello") ; => {:stdout "hello\n" :stderr "" :exit-code 0} (:stdout (shell "ls -la")) ; => "total 42\n..." (:exit-code (shell "false")) ; => 1 ``` ### `exit` Exit the process with a given status code. ```sema (exit 0) ; exit successfully (exit 1) ; exit with error ``` --- --- url: 'https://sema-lang.com/docs/stdlib/sqlite.md' --- # SQLite Sema includes built-in SQLite support via the `db/*` functions, backed by [rusqlite](https://docs.rs/rusqlite). Databases are opened by name (a logical handle) and can be either file-backed or in-memory. WAL mode and foreign keys are enabled by default. ::: tip `db/open` and `db/open-memory` require filesystem write capabilities (they are gated by `FS_WRITE`). ::: ## Opening & Closing ### `db/open` Open (or create) a SQLite database file. Returns a handle string for use in subsequent calls. Enables WAL journal mode and foreign keys automatically. ```sema ;; Open with path as handle (db/open "mydata.db") ; => "mydata.db" ;; Open with a named handle (db/open "mydb" "/path/to/data.db") ; => "mydb" ``` ### `db/open-memory` Open an in-memory SQLite database. Useful for tests, temporary data, and caching. ```sema (db/open-memory) ; handle is ":memory:" (db/open-memory "testdb") ; handle is "testdb" ``` ### `db/close` Close a database connection and release the handle. Returns `nil`. 
```sema (db/close "mydb") ``` ## Executing SQL ### `db/exec` Execute a SQL statement that modifies data (INSERT, UPDATE, DELETE, CREATE TABLE, etc.). Returns the number of affected rows as an integer. Supports parameterized queries. ```sema (db/exec "mydb" "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)") ; => 0 (db/exec "mydb" "INSERT INTO users (name, age) VALUES (?, ?)" "Alice" 30) ; => 1 (db/exec "mydb" "UPDATE users SET age = ? WHERE name = ?" 31 "Alice") ; => 1 ``` ### `db/exec-batch` Execute multiple SQL statements at once. **Static SQL only** — there is no parameter binding, so the entire string is run verbatim. Useful for schema setup and migrations. Returns `nil`. ::: danger SQL injection Never interpolate user-controlled input into the SQL string passed to `db/exec-batch` — doing so is a SQL injection vulnerability. For any value that comes from outside the program, use the parameterized [`db/exec`](#db-exec) (with `?` placeholders) instead, one statement at a time. ::: ```sema (db/exec-batch "mydb" " CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT); CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT); CREATE INDEX idx_posts_user ON posts(user_id); ") ``` ## Querying ### `db/query` Execute a SELECT query and return all results as a list of maps. Column names become keyword keys. Supports parameterized queries. ```sema (db/query "mydb" "SELECT * FROM users") ; => ({:id 1 :name "Alice" :age 31}) (db/query "mydb" "SELECT name, age FROM users WHERE age > ?" 25) ; => ({:age 31 :name "Alice"}) ``` ### `db/query-one` Execute a SELECT query and return only the first row as a map, or `nil` if no rows match. ```sema (db/query-one "mydb" "SELECT * FROM users WHERE name = ?" "Alice") ; => {:id 1 :name "Alice" :age 31} (db/query-one "mydb" "SELECT * FROM users WHERE name = ?" "Nobody") ; => nil ``` ## Utility ### `db/last-insert-id` Return the rowid of the last inserted row. 
```sema (db/exec "mydb" "INSERT INTO users (name, age) VALUES (?, ?)" "Bob" 25) (db/last-insert-id "mydb") ; => 2 ``` ### `db/tables` List all user-created tables in the database (excludes internal SQLite tables). Returns a list of strings. ```sema (db/tables "mydb") ; => ("posts" "tags" "users") ``` ## Type Mapping | Sema type | SQLite type | Notes | | ----------- | ----------- | ---------------------------- | | `nil` | NULL | | | Boolean | INTEGER | `#t` = 1, `#f` = 0 | | Integer | INTEGER | | | Float | REAL | | | String | TEXT | | | Bytevector | BLOB | | | Other | TEXT | Converted via `to-string` | SQLite values map back as: NULL to `nil`, INTEGER to int, REAL to float, TEXT to string, BLOB to bytevector. ## Examples ### Basic CRUD ```sema (db/open-memory "app") (db/exec "app" "CREATE TABLE todos (id INTEGER PRIMARY KEY, task TEXT, done INTEGER DEFAULT 0)") ;; Insert (db/exec "app" "INSERT INTO todos (task) VALUES (?)" "Buy groceries") (db/exec "app" "INSERT INTO todos (task) VALUES (?)" "Write docs") ;; Query (db/query "app" "SELECT * FROM todos WHERE done = 0") ; => ({:done 0 :id 1 :task "Buy groceries"} {:done 0 :id 2 :task "Write docs"}) ;; Update (db/exec "app" "UPDATE todos SET done = 1 WHERE id = ?" 
1) ;; Delete (db/exec "app" "DELETE FROM todos WHERE done = 1") (db/close "app") ``` ### Using with LLM extraction ```sema (db/open-memory "contacts") (db/exec "contacts" "CREATE TABLE people (name TEXT, email TEXT, company TEXT)") ;; Extract structured data from text and insert directly (define info (llm/extract {:name {:type :string} :email {:type :string} :company {:type :string}} "Contact Alice at alice@acme.com, she works at Acme Corp")) (db/exec "contacts" "INSERT INTO people (name, email, company) VALUES (?, ?, ?)" (:name info) (:email info) (:company info)) (db/query "contacts" "SELECT * FROM people") ; => ({:company "Acme Corp" :email "alice@acme.com" :name "Alice"}) (db/close "contacts") ``` --- --- url: 'https://sema-lang.com/docs/stdlib/kv-store.md' --- # Key-Value Store Sema includes a persistent, JSON-backed key-value store for storing structured data across sessions. Data is automatically flushed to disk on every write. ::: tip `kv/open`, `kv/set`, and `kv/delete` require filesystem write capabilities (they are gated by `FS_WRITE`). ::: ## How It Works **File path** — You control where data is stored via the second argument to `kv/open`. Relative paths resolve from the current working directory. The file is **not created until the first write** (`kv/set` or `kv/delete`). **Store names** — The first argument to `kv/open` is a logical handle used to reference the store in subsequent calls. Store names are scoped to the current process. Opening the same name twice replaces the previous handle. **Flushing** — Every `kv/set` and `kv/delete` rewrites the entire backing file immediately. `kv/close` also flushes. There is no separate manual flush — persistence is automatic. **JSON format** — The backing file is pretty-printed JSON, so you can inspect or edit it with any text editor. If an existing file contains malformed JSON, `kv/open` raises an error. 
**Supported value types:** | Sema | JSON | Notes | |------|------|-------| | `nil` | `null` | | | `#t` / `#f` | `true` / `false` | | | Integers | number | | | Floats | number | `NaN` and `Infinity` become `null` | | Strings | string | | | Lists | array | Recursive | | Maps (keyword keys) | object | Keys become strings | **Performance** — Each write rewrites the whole file. This is ideal for small-to-medium stores (config, caches, counters). For large datasets or high-frequency writes, consider using `file/write` directly. ## Functions ### `kv/open` Open (or create) a named KV store backed by a JSON file. If the file exists, its contents are loaded. Returns the store name. ```sema (kv/open "config" "/path/to/config.json") ; => "config" (kv/open "cache" "cache.json") ; relative to CWD ``` If the file doesn't exist yet, no file is created — that happens on the first `kv/set`. ### `kv/get` Get a value by key. Returns `nil` if the key doesn't exist. ```sema (kv/get "config" "api-key") ; => "sk-..." or nil ``` ### `kv/set` Set a key-value pair. The value is serialized as JSON. Returns the value. Flushes to disk immediately. ```sema (kv/set "config" "api-key" "sk-...") (kv/set "config" "retries" 3) (kv/set "config" "tags" '("a" "b" "c")) (kv/set "config" "user" {:name "Alice" :role "admin"}) ``` ### `kv/delete` Delete a key. Returns `#t` if the key existed, `#f` otherwise. Flushes to disk immediately. ```sema (kv/delete "config" "api-key") ; => #t (kv/delete "config" "api-key") ; => #f (already deleted) ``` ### `kv/keys` List all keys in the store. Returns a list of strings. ```sema (kv/keys "config") ; => ("api-key" "retries" "tags") ``` ### `kv/close` Close a store, flushing data and freeing the handle. Returns `nil`. ```sema (kv/close "config") ``` Data is safe even without calling `kv/close` (every write already flushes), but closing frees memory and releases the store name. 
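Because `kv/get` returns `nil` for a missing key, reads with a fallback can be wrapped in a small helper. The following is a minimal sketch built only from the functions described above; `kv-get-or` is a hypothetical name, not part of the stdlib, and the store name and file path are placeholders.

```sema
;; Hypothetical helper (not part of the stdlib): return `default`
;; when the key is missing from the store.
(define (kv-get-or store key default)
  (let ((value (kv/get store key)))
    (if (nil? value) default value)))

(kv/open "config" "app-config.json")
(kv-get-or "config" "theme" "dark")   ; stored value, or "dark" if unset
(kv/close "config")
```

Testing with `nil?` instead of wrapping the lookup in `or` should keep a stored `#f` from being replaced by the default. A key explicitly set to `nil` cannot be told apart from a missing key this way.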
## Examples ### Basic usage ```sema ;; Create a persistent store for caching API results (kv/open "cache" "api-cache.json") ;; Store some data (kv/set "cache" "user:123" {:name "Alice" :email "alice@example.com"}) (kv/set "cache" "user:456" {:name "Bob" :email "bob@example.com"}) ;; Retrieve it (kv/get "cache" "user:123") ; => {:email "alice@example.com" :name "Alice"} ;; List keys (kv/keys "cache") ; => ("user:123" "user:456") ;; Clean up (kv/delete "cache" "user:123") (kv/close "cache") ``` ### Application configuration with defaults ```sema (kv/open "config" "app-config.json") ;; Set defaults only if not already configured (when (nil? (kv/get "config" "theme")) (kv/set "config" "theme" "dark")) (when (nil? (kv/get "config" "max-retries")) (kv/set "config" "max-retries" 3)) ;; Use config values (def theme (kv/get "config" "theme")) (println (string/append "Using theme: " theme)) ``` On first run this creates `app-config.json` with defaults. On subsequent runs, existing values are preserved. ### Persistent run counter ```sema (kv/open "stats" "run-stats.json") ;; Increment run count across sessions (let ((runs (or (kv/get "stats" "run-count") 0))) (kv/set "stats" "run-count" (+ runs 1)) (kv/set "stats" "last-run" (time/format (time/now) "%Y-%m-%d %H:%M:%S"))) (println (string/append "Run #" (string (kv/get "stats" "run-count")))) (kv/close "stats") ``` ### Structured data with maps and lists ```sema (kv/open "contacts" "contacts.json") (kv/set "contacts" "alice" {:name "Alice" :email "alice@example.com" :tags '("admin" "dev")}) (kv/set "contacts" "bob" {:name "Bob" :email "bob@example.com" :tags '("dev")}) ;; Retrieve and destructure (def alice (kv/get "contacts" "alice")) (:name alice) ; => "Alice" (:tags alice) ; => ("admin" "dev") ;; List all contacts (for-each (fn (key) (println (:name (kv/get "contacts" key)))) (kv/keys "contacts")) (kv/close "contacts") ``` ## Tips * The backing file is human-readable JSON — you can inspect or hand-edit it between runs. 
* Store names are just logical handles. Choose descriptive names like `"config"`, `"cache"`, or `"sessions"`. * Use `kv/keys` with iteration for bulk operations like export or cleanup. * For write-heavy workloads on large datasets, consider writing JSON directly with `file/write` to avoid rewriting the entire file on each operation. --- --- url: 'https://sema-lang.com/docs/stdlib/serial.md' --- # Serial Ports Talk to microcontrollers, USB-CDC devices, and any UART over a host serial port. Wraps the cross-platform [`serialport`](https://crates.io/crates/serialport) crate. ::: warning Not available in WASM Serial ports require the host OS — this module is unavailable in the browser playground. ::: ::: tip Sandbox capability All `serial/*` functions require the `serial` capability. They are denied under `--sandbox=strict` and `--sandbox=all`. Allow with the default sandbox or explicitly opt in (see [CLI sandbox docs](../cli#sandbox)). ::: ## Connection Lifecycle ### `serial/list` List the available serial port device paths on the host. ```sema (serial/list) ;; macOS: ("/dev/tty.usbmodem1201" "/dev/tty.Bluetooth-Incoming-Port") ;; Linux: ("/dev/ttyUSB0" "/dev/ttyACM0") ``` ### `serial/open` ```sema (serial/open path baud) ; default 2000 ms read timeout (serial/open path baud timeout-ms) ``` Open a serial port and return an integer **handle** used by every other function in this module. Raises an error if the device is busy or doesn't exist; the message includes the path and baud rate as a hint. ```sema (define pico (serial/open "/dev/tty.usbmodem1201" 115200)) (define modem (serial/open "/dev/ttyUSB0" 9600 5000)) ; 5s read timeout ``` ### `serial/close` ```sema (serial/close handle) ``` Close the port and free the handle. Subsequent calls with that handle raise `invalid handle`. ## I/O ### `serial/write` ```sema (serial/write handle string) ``` Write a raw string to the port and flush. No newline appended — append `"\n"` yourself if your protocol expects it. 
```sema (serial/write modem "AT\r\n") ``` ### `serial/read-line` ```sema (serial/read-line handle) → string ``` Read until `\n`, then trim trailing `\r` / `\n` and return the line. Blocks until either a newline arrives or the port's read timeout elapses (configured at `serial/open` time); on timeout, it raises an error. ```sema (serial/read-line pico) ; => "ready" ``` ### `serial/send` ```sema (serial/send handle command) → parsed-json | nil ``` Convenience for line-oriented JSON protocols (such as the [sema-bridge](https://github.com/HelgeSverre/sema/tree/main/examples) firmware that ships with the Pico examples). Writes `command + "\n"`, flushes, reads one line back, and parses it as JSON. Returns `nil` if the response line is empty. ```sema (serial/send pico "{\"cmd\":\"led-on\",\"pin\":25}") ;; => {:ok #t} (serial/send pico "{\"cmd\":\"adc-read\",\"pin\":26}") ;; => {:ok #t :value 2048} ``` ## Example: Pico 2 LED control ```sema (define pico (serial/open "/dev/tty.usbmodem1201" 115200)) (println "bridge:" (serial/read-line pico)) ; "ready" (define (pico-cmd cmd) (let ((resp (serial/send pico cmd))) (when (not (get resp :ok)) (error (format "pico error: ~a" (get resp :error)))) resp)) (pico-cmd "{\"cmd\":\"led-on\",\"pin\":25}") (sleep 500) (pico-cmd "{\"cmd\":\"led-off\",\"pin\":25}") (serial/close pico) ``` See `examples/pico-blink.sema`, `pico-piano.sema`, `pico-jukebox.sema`, `pico-midi.sema`, and `pico-show.sema` for full demos. --- --- url: 'https://sema-lang.com/docs/stdlib/regex.md' --- # Regex Regular expression functions for pattern matching, searching, replacement, and splitting. Sema uses the Rust [`regex`](https://docs.rs/regex) engine. ::: warning Rust regex limitations Rust regex intentionally does **not** support features that require backtracking: * No lookahead / lookbehind (`(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`) If you need those, consider a multi-step approach using string functions. 
::: ## Regex Literals: `#"..."` Normal strings require double-escaping backslashes (`"\\d+"`). Sema's regex literal syntax avoids this: ```sema (regex/match? "\\d+" "abc123") ; normal string — needs \\ (regex/match? #"\d+" "abc123") ; regex literal — cleaner ``` Inside `#"..."`, backslashes are literal (no escape processing). The only special case is `\"` to insert a quote character. ::: tip Prefer `#"..."` for regex patterns. It's easier to read and avoids escaping mistakes. ::: ## Matching ### `regex/match?` Test if a pattern matches anywhere in a string. Returns `#t` or `#f`. ```sema (regex/match? #"\d+" "abc123") ; => #t (regex/match? #"\d+" "no digits") ; => #f (regex/match? #"^\d+$" "abc123") ; => #f (anchored — must match entire string) (regex/match? #"^\d+$" "123") ; => #t ``` ### `regex/match` Match a pattern and return match details as a map, or `nil` if no match. **Signature:** `(regex/match pattern text) → map | nil` The returned map contains: | Key | Value | |-----|-------| | `:match` | The full matched substring | | `:groups` | List of capture groups (group 1, 2, …) | | `:start` | Start byte offset in the input | | `:end` | End byte offset in the input | ```sema (regex/match #"(\d+)-(\w+)" "item-42-foo") ; => {:match "42-foo" :groups ("42" "foo") :start 5 :end 11} (regex/match #"xyz" "abc") ; => nil ``` Optional capture groups that don't participate in the match become `nil`: ```sema (regex/match #"(\d+)(?:-(\d+))?" "42") ; => {:match "42" :groups ("42" nil) :start 0 :end 2} ``` ::: info Byte offsets `:start` and `:end` are byte offsets (UTF-8). For ASCII text they match character indices, but for non-ASCII they may differ. ::: ### `regex/find-all` Find all non-overlapping matches of a pattern. ```sema (regex/find-all #"\d+" "a1b2c3") ; => ("1" "2" "3") (regex/find-all #"[A-Z]" "Hello World") ; => ("H" "W") ``` ## Replacement ### `regex/replace` Replace the **first** match of a pattern. 
**Signature:** `(regex/replace pattern replacement text) → string` ```sema (regex/replace #"\d+" "X" "a1b2c3") ; => "aXb2c3" ``` Capture group references (`$1`, `$2`, …) work in the replacement string: ```sema (regex/replace #"(\d+)-(\w+)" "$2:$1" "item-42-foo") ; => "item-foo:42" ``` Named capture groups also work: ```sema (regex/replace #"(?P<num>\d+)-(?P<word>\w+)" "$word:$num" "item-42-foo") ; => "item-foo:42" ``` ### `regex/replace-all` Replace **all** matches of a pattern. ```sema (regex/replace-all #"\d" "X" "a1b2") ; => "aXbX" (regex/replace-all #"\s+" " " "a  b   c") ; => "a b c" ``` ## Splitting ### `regex/split` Split a string by a regex delimiter. ```sema (regex/split #"," "a,b,c") ; => ("a" "b" "c") (regex/split #"\s+" "hello   world") ; => ("hello" "world") (regex/split #"[,;]" "a,b;c,d") ; => ("a" "b" "c" "d") ``` ## Supported Syntax Sema uses Rust regex syntax. Common constructs: | Pattern | Meaning | |---------|---------| | `.` | Any character (except newline by default) | | `\d`, `\w`, `\s` | Digit, word char, whitespace | | `\D`, `\W`, `\S` | Negated versions | | `+`, `*`, `?` | One+, zero+, optional | | `{m,n}` | Between m and n repetitions | | `^`, `$` | Start/end anchors | | `(...)` | Capture group | | `(?:...)` | Non-capturing group | | `(?P<name>...)` | Named capture group | | `[abc]`, `[^abc]` | Character class | | `a\|b` | Alternation | See the [Rust regex docs](https://docs.rs/regex) for the full reference. ## Escaping Guide ### Regex literals vs normal strings | Intent | Normal string | Regex literal | |--------|---------------|---------------| | One or more digits | `"\\d+"` | `#"\d+"` | | A literal dot | `"\\."` | `#"\."` | | A backslash | `"\\\\"` | `#"\\"` | ### Matching a literal `"` in a regex literal Inside `#"..."`, use `\"`: ```sema (regex/match? 
#"\"[^\"]+\"" "say \"hello\"") ; => #t ``` ## Regex vs String Functions Prefer string functions when possible — they're simpler and faster: | Need | String function | Regex equivalent | |------|----------------|------------------| | Contains? | `string/contains?` | `regex/match?` | | Starts with? | `string/starts-with?` | `regex/match?` with `^` | | Simple split | `string/split` | `regex/split` | | Simple replace | `string/replace` | `regex/replace` | Use regex when you need character classes, repetition, alternation, or capture groups. ## Practical Examples ### Validate an identifier ```sema (define (identifier? s) (regex/match? #"^[A-Za-z_][A-Za-z0-9_]*$" s)) (identifier? "foo_1") ; => #t (identifier? "1foo") ; => #f ``` ### Extract a number from text ```sema (define (extract-first-int s) (let ((m (regex/match #"\d+" s))) (if (nil? m) nil (:match m)))) (extract-first-int "x=42; y=9") ; => "42" ``` ### Normalize whitespace ```sema (regex/replace-all #"\s+" " " "a b\n\nc\t\t d") ; => "a b c d" ``` ### Parse key-value pairs ```sema (define (parse-kv line) (let ((m (regex/match #"^(\w+)\s*=\s*(.+)$" line))) (if (nil? m) nil (let ((groups (:groups m))) {:key (first groups) :value (first (rest groups))})))) (parse-kv "name = Alice") ; => {:key "name" :value "Alice"} ``` ### Find all email-like strings ```sema (regex/find-all #"[\w.+-]+@[\w-]+\.[\w.]+" "Contact ada@example.com or bob@test.org") ; => ("ada@example.com" "bob@test.org") ``` ## Performance Notes * Each function call **compiles the regex pattern** internally * For occasional use, this is fine * For hot loops, consider using `regex/find-all` once instead of many `regex/match?` calls * Rust regex guarantees **linear-time** matching — no catastrophic backtracking --- --- url: 'https://sema-lang.com/docs/stdlib/crypto.md' --- # Crypto & Encoding UUID generation, Base64 encoding, and cryptographic hashing. ## UUID ### `uuid/v4` Generate a random UUID v4 string. 
**Signature:** `(uuid/v4) → string` ```sema (uuid/v4) ; => "550e8400-e29b-41d4-a716-446655440000" (varies) ``` Each call returns a new unique identifier: ```sema (equal? (uuid/v4) (uuid/v4)) ; => #f ``` ## Base64 Encoding Functions for Base64 encoding and decoding of strings and binary data. Uses the standard Base64 alphabet (RFC 4648). ### `base64/encode` Encode a string to Base64. **Signature:** `(base64/encode string) → string` ```sema (base64/encode "hello") ; => "aGVsbG8=" (base64/encode "") ; => "" ``` ### `base64/decode` Decode a Base64 string back to a UTF-8 string. Errors if the decoded bytes are not valid UTF-8. **Signature:** `(base64/decode base64-string) → string` ```sema (base64/decode "aGVsbG8=") ; => "hello" ``` ### `base64/encode-bytes` Encode a bytevector to Base64. **Signature:** `(base64/encode-bytes bytevector) → string` ```sema (base64/encode-bytes #u8(104 101 108 108 111)) ; => "aGVsbG8=" ``` ### `base64/decode-bytes` Decode a Base64 string to a bytevector. Unlike `base64/decode`, this does not require valid UTF-8. **Signature:** `(base64/decode-bytes base64-string) → bytevector` ```sema (base64/decode-bytes "aGVsbG8=") ; => #u8(104 101 108 108 111) ``` ### Use cases **Data URIs:** ```sema (string/append "data:image/png;base64," (base64/encode-bytes (file/read-bytes "icon.png"))) ``` **API authentication (Basic Auth):** ```sema (define auth-header (string/append "Basic " (base64/encode (string/append username ":" password)))) ``` ## Hashing Cryptographic hash functions that return hex-encoded strings. ::: warning Security note **MD5** is cryptographically broken — do not use it for passwords, signatures, or any security-sensitive purpose. Use `hash/sha256` or `hash/hmac-sha256` instead. MD5 is still fine for checksums and non-security uses (cache keys, deduplication). ::: ### `hash/sha256` Compute the SHA-256 hash of a string. Returns a 64-character hex string. 
**Signature:** `(hash/sha256 string) → string` ```sema (hash/sha256 "hello") ; => "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824" ``` ### `hash/md5` Compute the MD5 hash of a string. Returns a 32-character hex string. **Signature:** `(hash/md5 string) → string` ```sema (hash/md5 "hello") ; => "5d41402abc4b2a76b9719d911017c592" ``` ### `hash/hmac-sha256` Compute an HMAC-SHA256 message authentication code. Returns a 64-character hex string. **Signature:** `(hash/hmac-sha256 key message) → string` ```sema (hash/hmac-sha256 "secret-key" "message") ; => "hex-encoded-hmac..." ``` **Webhook verification example:** ```sema ;; Verify a webhook signature from a provider (define (verify-webhook payload secret signature) (equal? (hash/hmac-sha256 secret payload) signature)) ``` --- --- url: 'https://sema-lang.com/docs/stdlib/datetime.md' --- # Date & Time All timestamps in Sema are **UTC Unix timestamps** — the number of seconds since January 1, 1970 00:00:00 UTC. Timestamps are floating-point numbers with millisecond fractional precision. ::: tip All `time/` functions operate in UTC. There is no timezone conversion support — if you need local time handling, compute the offset manually with `time/add`. ::: ## Current Time ### `time/now` Return the current time as a UTC Unix timestamp in seconds, with fractional milliseconds. ```sema (time/now) ; => 1707955200.123 ``` The integer part is seconds since the Unix epoch; the fractional part provides millisecond precision. ```sema (define now (time/now)) (println "Current timestamp: " now) ;; Extract just the seconds (truncate fractional part) (define whole-seconds (floor now)) ``` ### `time-ms` Return the current time as Unix milliseconds (integer). Defined in the system module but useful alongside datetime operations. ```sema (time-ms) ; => 1707955200123 ``` ## Formatting ### `time/format` Format a UTC Unix timestamp using a [strftime](#strftime-format-directives)-style format string. 
```sema (time/format timestamp format-string) ; => string ``` ```sema (define ts 1736943000.0) ; 2025-01-15 12:10:00 UTC (time/format ts "%Y-%m-%d") ; => "2025-01-15" (time/format ts "%H:%M:%S") ; => "12:10:00" (time/format ts "%Y-%m-%d %H:%M:%S") ; => "2025-01-15 12:10:00" (time/format ts "%A, %B %d, %Y") ; => "Wednesday, January 15, 2025" (time/format ts "%F") ; => "2025-01-15" (shorthand for %Y-%m-%d) (time/format ts "%T") ; => "12:10:00" (shorthand for %H:%M:%S) ``` ## Parsing ### `time/parse` Parse a date string into a UTC Unix timestamp using a [strftime](#strftime-format-directives)-style format string. The input is treated as a **UTC naive datetime**: no timezone information is expected or applied. ```sema (time/parse date-string format-string) ; => float (UTC timestamp) ``` ```sema (time/parse "2025-01-15 12:10:00" "%Y-%m-%d %H:%M:%S") ; => 1736943000.0 (time/parse "2025-01-15 00:00:00" "%Y-%m-%d %H:%M:%S") ; => 1736899200.0 (time/parse "15/01/2025 14:30:00" "%d/%m/%Y %H:%M:%S") ; => 1736951400.0 ``` ::: info The format string must provide enough directives to fully specify a date and time. Parsing with a date-only format such as `"%Y-%m-%d"` fails because the time components are missing, so always include time directives (e.g., `%H:%M:%S`). ::: ::: tip The wall-clock time in the string is **always interpreted as UTC**, regardless of any offset present. `time/parse` does not apply timezone offsets: `"2025-01-15 12:10:00"` always yields the UTC timestamp for 12:10:00 UTC. To work with another timezone, convert to UTC yourself: parse the wall-clock time, subtract that zone's UTC offset in seconds with `time/add` (for example, pass `-7200` for a time recorded at UTC+2), and then format and compute in UTC. 
::: **Roundtrip** — formatting a timestamp and parsing it back yields the original value: ```sema (define ts 1700000000.0) (define formatted (time/format ts "%Y-%m-%d %H:%M:%S")) (define parsed (time/parse formatted "%Y-%m-%d %H:%M:%S")) (= parsed ts) ; => #t ``` ::: warning `time/parse` returns whole seconds — sub-second precision from the original timestamp is lost when roundtripping through format/parse. ::: ## Date Decomposition ### `time/date-parts` Decompose a UTC Unix timestamp into a map of date/time components. ```sema (time/date-parts timestamp) ; => map ``` ```sema (define ts 1736943000.0) ; 2025-01-15 12:10:00 UTC (define parts (time/date-parts ts)) (get parts :year) ; => 2025 (get parts :month) ; => 1 (get parts :day) ; => 15 (get parts :hour) ; => 12 (get parts :minute) ; => 10 (get parts :second) ; => 0 (get parts :weekday) ; => "Wednesday" ``` The returned map contains these keys: | Key | Type | Description | Example | |-----|------|-------------|---------| | `:year` | integer | Four-digit year | `2025` | | `:month` | integer | Month (1–12) | `1` | | `:day` | integer | Day of month (1–31) | `15` | | `:hour` | integer | Hour (0–23) | `12` | | `:minute` | integer | Minute (0–59) | `10` | | `:second` | integer | Second (0–59) | `0` | | `:weekday` | string | Full weekday name | `"Wednesday"` | The `:weekday` value is the full English weekday name: `"Monday"`, `"Tuesday"`, `"Wednesday"`, `"Thursday"`, `"Friday"`, `"Saturday"`, `"Sunday"`. ## Arithmetic ### `time/add` Add seconds to a timestamp. Returns a new timestamp. Use negative values to subtract. 
```sema (time/add timestamp seconds) ; => float (timestamp) ``` ```sema (define ts 1736943000.0) ; 2025-01-15 12:10:00 UTC (time/add ts 3600) ; one hour later => 1736946600.0 (time/add ts 86400) ; one day later => 1737029400.0 (time/add ts -3600) ; one hour earlier => 1736939400.0 (time/add ts (* 7 86400)) ; one week later ``` Common durations in seconds: | Duration | Seconds | |----------|---------| | 1 minute | `60` | | 1 hour | `3600` | | 1 day | `86400` | | 1 week | `604800` | | 30 days | `2592000` | ### `time/diff` Compute the difference between two timestamps in seconds. Returns `t1 - t2` (the first argument minus the second). The result can be negative. ```sema (time/diff t1 t2) ; => float (seconds) ``` ```sema (define morning 1736935800.0) ; 2025-01-15 10:10:00 UTC (define afternoon 1736943000.0) ; 2025-01-15 12:10:00 UTC (time/diff afternoon morning) ; => 7200.0 (2 hours) (time/diff morning afternoon) ; => -7200.0 (negative — morning is earlier) (time/diff morning morning) ; => 0.0 ``` ::: tip `time/diff` returns a signed value: positive when `t1 > t2`, negative when `t1 < t2`. Use `abs` if you need the absolute elapsed time regardless of order. ::: ## Delay ### `sleep` Pause execution for a given number of milliseconds. Returns `nil`. ```sema (sleep milliseconds) ; => nil ``` ```sema (sleep 1000) ; sleep for 1 second (sleep 500) ; sleep for 500ms (sleep 0) ; yield (no-op pause) ``` Note that `sleep` takes **milliseconds** (not seconds), unlike the `time/` functions which work in seconds. ## strftime Format Directives The `time/format` and `time/parse` functions use [chrono strftime](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) format directives. 
Here are the most common ones: ### Date | Directive | Description | Example | |-----------|-------------|---------| | `%Y` | Four-digit year | `2025` | | `%m` | Month (zero-padded, 01–12) | `01` | | `%d` | Day of month (zero-padded, 01–31) | `15` | | `%e` | Day of month (space-padded) | `15` | | `%B` | Full month name | `January` | | `%b` | Abbreviated month name | `Jan` | | `%A` | Full weekday name | `Wednesday` | | `%a` | Abbreviated weekday name | `Wed` | | `%u` | Day of week (1=Monday, 7=Sunday) | `3` | | `%j` | Day of year (001–366) | `015` | | `%F` | ISO 8601 date (`%Y-%m-%d`) | `2025-01-15` | ### Time | Directive | Description | Example | |-----------|-------------|---------| | `%H` | Hour, 24-hour (zero-padded, 00–23) | `12` | | `%I` | Hour, 12-hour (zero-padded, 01–12) | `12` | | `%M` | Minute (zero-padded, 00–59) | `10` | | `%S` | Second (zero-padded, 00–59) | `00` | | `%p` | AM/PM | `PM` | | `%T` | Time (`%H:%M:%S`) | `12:10:00` | | `%R` | Short time (`%H:%M`) | `12:10` | ### Combined & Special | Directive | Description | Example | |-----------|-------------|---------| | `%c` | Locale date and time | `Wed Jan 15 12:10:00 2025` | | `%s` | Unix timestamp (seconds) | `1736943000` | | `%Z` | Timezone abbreviation | `UTC` | | `%%` | Literal `%` | `%` | ## Common Patterns ### Measuring elapsed time ```sema (define start (time/now)) ;; ... do some work ... 
(define end (time/now)) (define elapsed (time/diff end start)) (println (format "Took ~a seconds" elapsed)) ``` ### ISO 8601 formatting ```sema (define ts (time/now)) (time/format ts "%Y-%m-%dT%H:%M:%SZ") ; => "2025-01-15T12:10:00Z" (time/format ts "%F") ; => "2025-01-15" (date only) ``` ### Calculating "N days ago" ```sema (define now (time/now)) (define one-week-ago (time/add now (* -7 86400))) (define thirty-days-ago (time/add now (* -30 86400))) (println "One week ago: " (time/format one-week-ago "%Y-%m-%d")) ``` ### Formatting for display ```sema (define ts (time/now)) (time/format ts "%A, %B %d, %Y") ; => "Wednesday, January 15, 2025" (time/format ts "%I:%M %p") ; => "12:10 PM" (time/format ts "%b %d at %H:%M") ; => "Jan 15 at 12:10" ``` ### Checking the day of the week ```sema (define parts (time/date-parts (time/now))) (define day (get parts :weekday)) (if (or (= day "Saturday") (= day "Sunday")) (println "It's the weekend!") (println "It's a weekday.")) ``` ### Computing duration between dates ```sema (define start (time/parse "2025-01-01 00:00:00" "%Y-%m-%d %H:%M:%S")) (define end (time/parse "2025-03-15 00:00:00" "%Y-%m-%d %H:%M:%S")) (define diff-seconds (time/diff end start)) (define diff-days (/ diff-seconds 86400)) (println (format "~a days between dates" diff-days)) ``` ## Edge Cases ### Unix epoch ```sema (time/format 0.0 "%Y-%m-%d %H:%M:%S") ; => "1970-01-01 00:00:00" (time/date-parts 0.0) ; => {:day 1 :hour 0 :minute 0 :month 1 :second 0 :weekday "Thursday" :year 1970} ``` ### Negative timestamps (dates before 1970) ```sema (time/format -86400.0 "%Y-%m-%d") ; => "1969-12-31" (time/format -31536000.0 "%Y-%m-%d") ; => "1969-01-01" ``` ### Sub-second precision `time/now` returns millisecond fractional precision. `time/add` and `time/diff` preserve fractional seconds. However, `time/parse` returns whole seconds only. 
```sema (define ts (time/add 1736943000.0 0.5)) ; add 500ms (time/diff ts 1736943000.0) ; => 0.5 ``` --- --- url: 'https://sema-lang.com/docs/stdlib/context.md' --- # Context Sema provides an ambient context system — a key-value store that flows through your entire execution without explicit parameter passing. Inspired by [Laravel's Context](https://laravel.com/docs/12.x/context), it's designed for tracing, metadata propagation, and sharing configuration across deeply nested calls. Context data is automatically appended as metadata to log output (`log/info`, `log/warn`, `log/error`, `log/debug`). ## Core Functions ### `context/set` Set a key-value pair in the current context frame. ```sema (context/set :trace-id "abc-123") (context/set :user-id 42) ``` ### `context/get` Retrieve a value by key. Returns `nil` if the key doesn't exist. ```sema (context/get :trace-id) ; => "abc-123" (context/get :missing) ; => nil ``` ### `context/has?` Check if a key exists in the context. ```sema (context/has? :trace-id) ; => #t (context/has? :missing) ; => #f ``` ### `context/remove` Remove a key from all context frames. Returns the removed value, or `nil`. ```sema (context/set :temp "data") (context/remove :temp) ; => "data" (context/remove :temp) ; => nil (already gone) ``` ### `context/pull` Get a value and remove it in one step (identical to `context/remove`). ```sema (context/set :token "abc") (context/pull :token) ; => "abc" (context/has? :token) ; => #f ``` ### `context/all` Get all context as a merged map. ```sema (context/set :a 1) (context/set :b 2) (context/all) ; => {:a 1 :b 2} ``` ### `context/merge` Merge a map of key-value pairs into the current context. ```sema (context/merge {:trace-id "abc" :env "production" :version "1.0"}) (context/get :env) ; => "production" ``` ### `context/clear` Clear all context, resetting to an empty state. 
```sema (context/clear) (context/all) ; => {} ``` ## Scoped Overrides ### `context/with` Push a temporary context frame for the duration of a thunk. The frame is automatically popped when the thunk completes — even if it raises an error. ```sema (context/set :env "production") (context/with {:env "staging" :debug #t} (lambda () (context/get :env) ; => "staging" (context/get :debug))) ; => #t (context/get :env) ; => "production" (restored) (context/get :debug) ; => nil (gone) ``` Scopes nest naturally — inner values shadow outer ones: ```sema (context/set :a 1) (context/with {:b 2} (lambda () (context/with {:c 3} (lambda () (list (context/get :a) (context/get :b) (context/get :c)))))) ; => (1 2 3) ``` ::: warning Values set with `context/set` inside a `context/with` block are written to the inner frame and discarded when the scope exits. If you need a value to persist, set it before entering `context/with`. ::: ## Stacks Context stacks are ordered lists of values that you can push to and pop from. Unlike key-value context, stacks are **not scoped** by `context/with` — pushes persist across scope boundaries. ### `context/push` Append a value to a named stack. ```sema (context/push :breadcrumbs "login") (context/push :breadcrumbs "dashboard") (context/push :breadcrumbs "settings") ``` ### `context/stack` Get all values in a named stack as a list. ```sema (context/stack :breadcrumbs) ; => ("login" "dashboard" "settings") ``` ### `context/pop` Remove and return the last value from a stack. Returns `nil` if the stack is empty. ```sema (context/pop :breadcrumbs) ; => "settings" (context/stack :breadcrumbs) ; => ("login" "dashboard") ``` ## Hidden Context Hidden context stores values that are **not visible** via `context/get`, `context/all`, or log metadata. Use it for sensitive data like API keys or internal state. 
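Since hidden values never show up in `context/get` or `context/all`, a missing secret can easily go unnoticed until it is used. As a minimal sketch, the helper below fails loudly instead. `require-secret` is a hypothetical name, not part of the stdlib; it is built only from `context/has-hidden?` and `context/get-hidden`, plus the `error` and `format` calls used elsewhere in these docs, and it assumes `context/has-hidden?` returns `#f` for a key that was never set.

```sema
;; `require-secret` is a hypothetical helper, not part of the stdlib.
;; It returns the hidden value stored under `key`, or raises if none was set.
(define (require-secret key)
  (if (context/has-hidden? key)
      (context/get-hidden key)
      (error (format "missing hidden context key: ~a" key))))

(context/set-hidden :api-key "sk-secret-123")
(require-secret :api-key)  ; returns the stored secret
(context/get :api-key)     ; nil: hidden values are invisible to context/get
```

Routing every secret lookup through one helper like this keeps the hidden store the only place the value lives, so it cannot leak into log metadata by accident.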
### `context/set-hidden` ```sema (context/set-hidden :api-key "sk-secret-123") ``` ### `context/get-hidden` ```sema (context/get-hidden :api-key) ; => "sk-secret-123" (context/get :api-key) ; => nil (not visible in regular context) ``` ### `context/has-hidden?` ```sema (context/has-hidden? :api-key) ; => #t ``` ## Log Integration When context is non-empty, `log/info`, `log/warn`, `log/error`, and `log/debug` automatically append the context map as metadata: ```sema (context/set :trace-id "abc-123") (context/set :user-id 42) (log/info "Request processed") ``` Output: ``` [INFO] Request processed {:trace-id "abc-123" :user-id 42} ``` Hidden context is **not** included in log output. ## Examples ### Request tracing ```sema (context/set :request-id (uuid/v4)) (context/set :method "GET") (context/set :path "/api/users") (log/info "Request started") ; [INFO] Request started {:method "GET" :path "/api/users" :request-id "a1b2c3..."} ;; All downstream functions automatically include this context in their logs (process-request) ``` ### Pipeline breadcrumbs ```sema (define (process-document doc) (context/push :steps "parse") (let ((parsed (parse doc))) (context/push :steps "validate") (let ((valid (validate parsed))) (context/push :steps "transform") (transform valid)))) (process-document input) (context/stack :steps) ; => ("parse" "validate" "transform") ``` ### Scoped configuration ```sema ;; Set default model (context/set :model "claude-sonnet") ;; Override for a specific block (context/with {:model "gpt-5.5" :temperature 0.9} (lambda () ;; Code here sees the overridden values (context/get :model))) ; => "gpt-5.5" (context/get :model) ; => "claude-sonnet" ``` ## Function Reference | Function | Args | Description | | --------------------- | ----------- | --------------------------------- | | `context/set` | `key value` | Set a context value | | `context/get` | `key` | Get a value (or `nil`) | | `context/has?` | `key` | Check if key exists | | `context/remove` | `key` | 
Remove and return value | | `context/pull` | `key` | Get and remove (alias for remove) | | `context/all` | | Get all context as a map | | `context/merge` | `map` | Merge map into context | | `context/clear` | | Clear all context | | `context/with` | `map thunk` | Scoped override | | `context/push` | `key value` | Push to named stack | | `context/stack` | `key` | Get stack as list | | `context/pop` | `key` | Pop from named stack | | `context/set-hidden` | `key value` | Set hidden value | | `context/get-hidden` | `key` | Get hidden value | | `context/has-hidden?` | `key` | Check hidden key exists | --- --- url: 'https://sema-lang.com/docs/stdlib/terminal.md' --- # Terminal Styling Functions for styling terminal output with ANSI escape codes, true color, and animated spinners. All style functions take a string and return a new string wrapped in ANSI escape sequences. The styled text is reset after the content, so styles don't bleed into subsequent output. ::: tip Terminal output Styled output renders correctly in terminals that support ANSI escape codes. When piping or redirecting output (e.g., to a file), the raw escape sequences are included in the output. Use `term/strip` to produce clean text for non-terminal destinations. ::: ## Modifiers Modifier functions change how text is displayed without altering its color. ### `term/bold` Render text in **bold** (increased intensity). ```sema (term/bold "important") (println (term/bold "Warning: check your input")) ``` ### `term/dim` Render text with decreased intensity. ```sema (term/dim "less important") ``` ### `term/italic` Render text in *italic*. ```sema (term/italic "emphasis") ``` ### `term/underline` Render text with an underline. ```sema (term/underline "click here") ``` ### `term/inverse` Swap foreground and background colors. ```sema (term/inverse "highlighted") ``` ### `term/strikethrough` Render text with a ~~strikethrough~~. 
```sema (term/strikethrough "deprecated") ``` ## Colors Color functions set the foreground (text) color. ### `term/black` ```sema (term/black "dark text") ``` ### `term/red` ```sema (term/red "error message") ``` ### `term/green` ```sema (term/green "success") ``` ### `term/yellow` ```sema (term/yellow "warning") ``` ### `term/blue` ```sema (term/blue "info") ``` ### `term/magenta` ```sema (term/magenta "special") ``` ### `term/cyan` ```sema (term/cyan "highlight") ``` ### `term/white` ```sema (term/white "bright text") ``` ### `term/gray` ```sema (term/gray "muted text") ``` ## Combined Styles ### `term/style` Apply multiple styles at once using keywords. The first argument is the text, followed by one or more style keywords. ```sema (term/style "danger" :bold :red) (term/style "notice" :italic :yellow :underline) (term/style "subtle" :dim :gray) ``` Internally, `term/style` combines ANSI codes with `;` separators into a single escape sequence (e.g., `ESC[1;31m` for bold red), which is more efficient than nesting individual style functions. If called with no style keywords, the text is returned unstyled. 
```sema (term/style "plain text") ; => "plain text" (no ANSI codes) ``` An unknown keyword produces an error: ```sema (term/style "text" :blink) ; Error: unknown style keyword :blink ``` #### Style keyword reference | Keyword | Effect | ANSI Code | |------------------|----------------|-----------| | `:bold` | Bold | 1 | | `:dim` | Dim | 2 | | `:italic` | Italic | 3 | | `:underline` | Underline | 4 | | `:inverse` | Inverse | 7 | | `:strikethrough` | Strikethrough | 9 | | `:black` | Black text | 30 | | `:red` | Red text | 31 | | `:green` | Green text | 32 | | `:yellow` | Yellow text | 33 | | `:blue` | Blue text | 34 | | `:magenta` | Magenta text | 35 | | `:cyan` | Cyan text | 36 | | `:white` | White text | 37 | | `:gray` | Gray text | 90 | ### Composing Styles There are two ways to combine styles: **Using `term/style` (recommended):** produces a single escape sequence with combined codes. ```sema (term/style "alert" :bold :red :underline) ;; Produces: ESC[1;31;4m alert ESC[0m ``` **Nesting individual functions:** each function wraps the text in its own escape sequence. This works but produces more verbose output. ```sema (term/bold (term/red (term/underline "alert"))) ;; Produces: ESC[1m ESC[31m ESC[4m alert ESC[0m ESC[0m ESC[0m ``` Both approaches render identically in terminals, but `term/style` is cleaner. ## True Color ### `term/rgb` Apply 24-bit true color to text. Takes the text followed by red, green, and blue values (integers 0–255). ```sema (term/rgb "orange" 255 165 0) (term/rgb "coral" 255 127 80) (term/rgb "teal" 0 128 128) (term/rgb "hot pink" 255 105 180) ``` Uses the `ESC[38;2;r;g;bm` escape sequence format, which is supported by most modern terminals. ```sema ;; Build a gradient (for-each (lambda (i) (display (term/rgb "█" (* i 25) 50 (- 255 (* i 25))))) (range 11)) (println) ``` ## Stripping ANSI Codes ### `term/strip` Remove all ANSI escape sequences from a string, returning plain text. 
```sema (term/strip (term/bold "hello")) ; => "hello" (term/strip (term/style "hi" :red :bold)) ; => "hi" (term/strip (term/rgb "color" 255 0 0)) ; => "color" (term/strip "no codes here") ; => "no codes here" ``` This is useful when you need plain text for logging to files, comparisons, or passing to functions that don't understand ANSI codes: ```sema ;; Write clean text to a file, styled text to terminal (define msg (term/green "Build succeeded")) (println msg) ; styled on terminal (file/write "build.log" (term/strip msg)) ; clean in log file ``` ## Spinners Animated terminal spinners for indicating progress during long-running operations. Spinners use braille animation frames (`⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏`) cycling at 80ms intervals, and render to **stderr** so they don't interfere with stdout output. ### `term/spinner-start` Start a spinner with a message. Returns an integer spinner ID used to update or stop it. ```sema (define id (term/spinner-start "Loading data...")) ``` ### `term/spinner-update` Update the message displayed next to a running spinner. ```sema (term/spinner-update id "Processing records...") (term/spinner-update id "Almost done...") ``` ### `term/spinner-stop` Stop a running spinner and optionally display a final status line. The spinner line is cleared from the terminal before the final status is printed. **Without options** — just clears the spinner: ```sema (term/spinner-stop id) ``` **With options map** — displays a final symbol and text: ```sema (term/spinner-stop id {:symbol "✔" :text "Done"}) ``` The options map supports two keys: | Key | Type | Description | |-----------|--------|--------------------------------------| | `:symbol` | string | Symbol to display (e.g., `"✔"`, `"✗"`, `"⚠"`) | | `:text` | string | Final status message | Both keys are optional. The final line is printed to stderr as `symbol text`. ### Spinner Lifecycle Example ```sema ;; Start spinner (define spinner (term/spinner-start "Fetching data...")) ;; ... do some work ... 
(term/spinner-update spinner "Processing results...") ;; ... do more work ... (term/spinner-update spinner "Writing output...") ;; Stop with success indicator (term/spinner-stop spinner {:symbol "✔" :text "Complete"}) ``` Multiple spinners can run concurrently — each gets a unique ID: ```sema (define s1 (term/spinner-start "Task A...")) (define s2 (term/spinner-start "Task B...")) ;; ... work ... (term/spinner-stop s1 {:symbol "✔" :text "Task A done"}) (term/spinner-stop s2 {:symbol "✔" :text "Task B done"}) ``` ## Line Input Read whole lines from standard input (cooked mode — the terminal buffers a line until Enter). Useful for simple prompts and for piping data into a script. ### `io/read-line` Block until a full line is available on stdin and return it as a string (without the trailing newline). Returns `nil` at end of input. ```sema (define name (io/read-line)) (println (str "Hello, " name)) ``` ### `io/eof?` Return `#t` once stdin has hit end of input (set when `io/read-line` / `io/read-stdin` / `io/read-key` returns `nil`). Pair it with `io/read-line` to consume piped input line by line: ```sema (let loop () (let ((line (io/read-line))) (unless (io/eof?) (println (string/upper line)) (loop)))) ``` ## Raw-Mode Input Primitives for building interactive TUIs: per-keystroke input, EOF detection, and signal-aware event loops. **Unix only** — these functions are no-op stubs on Windows. In cooked mode (the default), the terminal driver buffers a whole line and only delivers it to your program when the user hits Enter. Raw mode disables that — every key press, including Ctrl-C and arrow keys, is delivered as it happens. Pair these with `sys/term-size` and `sys/on-signal` (in the [System](system) docs) to build full TUIs. ### `io/tty-raw!` Put stdin into raw mode. Returns an **integer restore-token** on success, or `nil` if stdin is not a TTY (e.g., when input is piped from a file). 
Always pair with `io/tty-restore!` so the user's shell isn't left in raw mode if your program crashes. ```sema (define tok (io/tty-raw!)) (when tok ;; ... read keys, draw UI ... (io/tty-restore! tok)) ``` ### `io/tty-restore!` Restore the TTY to cooked mode using the token returned by `io/tty-raw!`. ```sema (io/tty-restore! tok) ``` ### `io/read-key` Block until a single keypress arrives, then return a map describing it. Returns `nil` on EOF (after which `io/eof?` returns `#t`). ```sema (io/read-key) ;; => {:kind :char :char "a"} ``` The map's `:kind` field is one of: | `:kind` | Other keys | Meaning | |-----------|-------------------------|-------------------------------------------------| | `:char` | `:char` (string) | A printable character (UTF-8 multi-byte handled) | | `:ctrl` | `:char` (string) | Ctrl + letter (e.g., Ctrl-C → `{:kind :ctrl :char "c"}`) | | `:alt` | `:char` (string) | Alt/Meta + character (ESC + char sequence) | | `:key` | `:name` (keyword) | Named key — see table below | Named keys (`:kind :key`) currently emitted: `:enter` `:tab` `:backspace` `:esc` `:up` `:down` `:left` `:right` `:home` `:end` `:delete` `:page-up` `:page-down` `:f1` `:f2` `:f3` `:f4` CSI/SS3 escape sequences (arrow keys, F1–F4, Page Up/Down, Delete) and UTF-8 continuation bytes are decoded for you with a 20 ms continuation-byte window. F5–F12 and Insert use longer escape sequences that aren't decoded yet — they fall through as raw characters. ### `io/read-key-timeout` Like `io/read-key`, but returns `nil` after `timeout-ms` milliseconds with no input. Backed by `select(2)`, so it doesn't burn CPU. 
```sema (io/read-key-timeout 100) ; => key map, or nil after 100ms ``` Use this to drive an animation loop or to poll signals between renders: ```sema (let loop () (sys/check-signals) (let ((key (io/read-key-timeout 50))) (when key (handle-key key)) (loop))) ``` ### Minimal TUI skeleton Assumes interactive stdin — `io/tty-raw!` returns `nil` when stdin isn't a TTY, so guard with `when tok` if the program may run with input piped from a file. ```sema (define tok (io/tty-raw!)) (when tok (sys/on-signal :winch (fn () (redraw (sys/term-size)))) (sys/on-signal :int (fn () (io/tty-restore! tok) (exit 0))) (let loop () (sys/check-signals) (let ((key (io/read-key))) (cond ((nil? key) ; EOF (io/tty-restore! tok)) ((and (= (:kind key) :ctrl) (= (:char key) "c")) ; Ctrl-C (io/tty-restore! tok)) (else (handle-key key) (loop)))))) ``` ## Common Patterns ### Colored Log Levels ```sema (define (log-error msg) (println (term/style "✗ ERROR" :bold :red) " " msg)) (define (log-warn msg) (println (term/style "⚠ WARN " :bold :yellow) " " msg)) (define (log-info msg) (println (term/style "ℹ INFO " :bold :blue) " " msg)) (define (log-success msg) (println (term/style "✔ OK " :bold :green) " " msg)) (log-error "Connection refused") (log-warn "Retrying in 5s") (log-info "Connecting to server") (log-success "Connected") ``` ### CLI Status Output ```sema (define (print-step label detail) (println (term/style label :bold :cyan) " " (term/dim detail))) (print-step "Compile" "src/main.sema") (print-step "Link" "3 modules") (print-step "Write" "build/output") ``` ### Progress with Spinners ```sema (define steps '("Downloading" "Extracting" "Installing" "Configuring")) (define sp (term/spinner-start "Starting...")) (for-each (lambda (step) (term/spinner-update sp (string/append step "...")) (sleep 1000)) steps) (term/spinner-stop sp {:symbol "✔" :text "Installation complete"}) ``` ### Conditional Styling ```sema (define (color-status code) (cond ((< code 300) (term/green (number/to-string 
code))) ((< code 400) (term/yellow (number/to-string code))) (else (term/red (number/to-string code))))) (println "Status: " (color-status 200)) ; green "200" (println "Status: " (color-status 301)) ; yellow "301" (println "Status: " (color-status 404)) ; red "404" ``` --- --- url: 'https://sema-lang.com/docs/stdlib/playground.md' --- # Playground & WASM When running in the browser playground at [sema.run](https://sema.run), Sema executes as WebAssembly. Most stdlib functions work identically, but some behave differently due to browser sandbox constraints, and a few web-only functions are available. ## Web-Only Functions These functions are **only available in the WASM playground** — they access browser APIs that don't exist in the native CLI. ### `web/user-agent` Return the browser's `navigator.userAgent` string. Works in all browsers. ```sema (web/user-agent) ; => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ..." ``` ### `web/user-agent-data` Return structured browser information from `navigator.userAgentData`. Returns a map on Chromium-based browsers (Chrome, Edge, Opera), `nil` on Firefox and Safari. ```sema (web/user-agent-data) ; Chromium => {:mobile false :platform "macOS" :brands ("Chromium/120" "Google Chrome/120")} ; Firefox/Safari => nil ``` ::: tip `userAgentData` is the modern replacement for UA string parsing — it returns structured, reliable data instead of a messy string. However, it's Chromium-only. Use `web/user-agent` for cross-browser compatibility. 
:::

## WASM Behavior Differences

### System Information

System functions return web-appropriate values instead of OS-specific ones:

| Function           | Native                              | WASM                        |
| ------------------ | ----------------------------------- | --------------------------- |
| `sys/platform`     | `"macos"` / `"linux"` / `"windows"` | `"web"`                     |
| `sys/os`           | `"macos"`                           | `"web"`                     |
| `sys/arch`         | `"aarch64"` / `"x86_64"`            | `"wasm32"`                  |
| `sys/cwd`          | Current directory path              | `"/"`                       |
| `sys/interactive?` | `#t` in REPL                        | `#f`                        |
| `sys/pid`          | Process ID                          | `0`                         |
| `sys/elapsed`      | Nanoseconds since process start     | Nanoseconds since page load |
| `time-ms`          | `SystemTime` milliseconds           | `Date.now()` milliseconds   |

These always return `nil` in WASM: `sys/hostname`, `sys/user`, `sys/home-dir`, `sys/which`, `sys/tty`.

### File I/O (Virtual Filesystem)

File operations work against an **in-memory virtual filesystem** (VFS). Files persist for the duration of your session but are **lost on page reload**.

```sema
;; These all work in the playground
(file/write "/hello.txt" "Hello from WASM!")
(file/read "/hello.txt")        ; => "Hello from WASM!"
(file/exists? "/hello.txt")     ; => #t
(file/list "/")                 ; => ("hello.txt")
(file/mkdir "/mydir")
(file/is-directory? "/mydir")   ; => #t
```

All file functions are supported: `file/read`, `file/write`, `file/append`, `file/delete`, `file/rename`, `file/copy`, `file/exists?`, `file/list`, `file/mkdir`, `file/is-file?`, `file/is-directory?`, `file/is-symlink?`, `file/info`, `file/read-lines`, `file/write-lines`. Path functions (`path/join`, `path/dirname`, `path/basename`, `path/extension`, `path/absolute`) also work.

**Quotas**: The VFS enforces limits to prevent runaway memory usage — 1 MB per file, 16 MB total, and 256 files max. Exceeding these limits returns an error.

The `load` function reads from the VFS and evaluates the parsed expressions.
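
The write-then-`load` round trip described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes `load` takes a single VFS path argument (the signature is not spelled out here) and that definitions it evaluates are visible to later code. The file name `/greet.sema` and the function `greet` are hypothetical.

```sema
;; Write a small source file into the in-memory VFS.
;; "/greet.sema" and `greet` are hypothetical names used only for illustration.
(file/write "/greet.sema"
            "(define (greet name) (str \"Hello, \" name))")

;; Assumption: `load` accepts a path and evaluates each expression in the file.
(load "/greet.sema")

;; If the assumption holds, `greet` is now defined in the session.
(println (greet "WASM"))
```

Because the VFS is cleared on page reload, a file written this way has to be recreated before it can be loaded again in a new session.
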
### Terminal Styling

All `term/*` functions work but return text **without ANSI formatting** (since the browser has no terminal):

```sema
(term/bold "hello")            ; => "hello" (no bold applied)
(term/red "error")             ; => "error" (no color applied)
(term/style "hi" :bold :cyan)  ; => "hi"
```

### HTTP Functions

HTTP functions work in the playground via the browser's `fetch()` API. They return the same `{:status :headers :body}` map as the native CLI.

```sema
(define resp (http/get "https://httpbin.org/get"))
(:status resp)  ; => 200
(:body resp)    ; => "{\"args\": {}, ...}"

(http/post "https://httpbin.org/post" {:name "sema"})
; => {:status 200 :headers {...} :body "..."}
```

All HTTP functions are supported: `http/get`, `http/post`, `http/put`, `http/delete`, `http/request`.

::: warning CORS Restrictions
Browser security rules (CORS) may block requests to servers that don't include `Access-Control-Allow-Origin` headers. Public APIs like httpbin.org work fine. If you get a network error, the target server likely doesn't allow cross-origin requests.
:::

### Not Available in WASM

These functions are unavailable in the playground. Each returns an error when called, except `sleep`, which the table marks as a no-op:

| Function        | Reason                                       |
| --------------- | -------------------------------------------- |
| `shell`         | No subprocess execution in browser           |
| `exit`          | No process to exit                           |
| `io/read-line`  | No stdin in browser                          |
| `io/read-stdin` | No stdin in browser                          |
| `sleep`         | Cannot block the browser main thread (no-op) |

---

---
url: 'https://sema-lang.com/docs/stdlib/streams.md'
---

# Streams

Streams are first-class byte-oriented I/O handles for reading and writing data incrementally. They provide a unified interface across files, in-memory buffers, strings, and standard I/O — the same `stream/read` and `stream/write` work regardless of the underlying source.
```sema ;; Read a file line by line (with-stream (s (stream/open-input "data.txt")) (let loop ((line (stream/read-line s))) (when line (println line) (loop (stream/read-line s))))) ;; In-memory buffer (let ((buf (stream/byte-buffer))) (stream/write-string buf "hello") (stream/to-string buf)) ;; => "hello" ``` ## Creating Streams ### `stream/from-string` Create a read-only stream from a string's UTF-8 bytes. ```sema (define s (stream/from-string "hello world")) (stream/read-byte s) ;; => 104 (ASCII 'h') (stream/read s 5) ;; => #u8(101 108 108 111 32) ("ello ") ``` ### `stream/from-bytes` Create a readable stream from a bytevector. ```sema (define s (stream/from-bytes (bytevector 1 2 3))) (stream/read-byte s) ;; => 1 (stream/read-byte s) ;; => 2 ``` ### `stream/byte-buffer` Create a read/write in-memory buffer. Writes append to the buffer; reads consume from the current position. ```sema (define buf (stream/byte-buffer)) (stream/write buf (string->utf8 "hello")) (stream/to-string buf) ;; => "hello" ``` ### `stream/open-input` Open a file for reading. Returns a buffered input stream. Sandbox-gated (`FS_READ`). ```sema (define s (stream/open-input "data.csv")) (define contents (stream/read-all s)) (stream/close s) ``` ### `stream/open-output` Open (or create) a file for writing. Returns a buffered output stream. Sandbox-gated (`FS_WRITE`). ```sema (define s (stream/open-output "output.txt")) (stream/write-string s "hello world\n") (stream/close s) ``` ## Reading ### `stream/read` Read up to `n` bytes, returning a bytevector. Returns fewer bytes at EOF. ```sema (stream/read s 1024) ;; => bytevector (up to 1024 bytes) ``` ### `stream/read-byte` Read a single byte. Returns an integer 0–255, or `nil` at EOF. ```sema (stream/read-byte s) ;; => 65 (or nil at EOF) ``` ### `stream/read-line` Read until newline (`\n`), returning a string without the newline. Strips trailing `\r` for Windows line endings. Returns `nil` at EOF. 
```sema (stream/read-line s) ;; => "first line" (or nil) ``` ### `stream/read-all` Read the entire stream into a bytevector. ```sema (define data (stream/read-all s)) (utf8->string data) ; convert to string if text ``` ## Writing ### `stream/write` Write a bytevector. Returns the number of bytes written. ```sema (stream/write s (bytevector 72 101 108 108 111)) ;; => 5 ``` ### `stream/write-byte` Write a single byte (integer 0–255). ```sema (stream/write-byte s 10) ; write a newline ``` ### `stream/write-string` Write a string as UTF-8 bytes. Returns the number of bytes written. ```sema (stream/write-string s "hello") ;; => 5 ``` ## Control ### `stream/close` Close a stream, releasing the underlying resource. Double-close is a no-op. ```sema (stream/close s) (stream/close s) ; safe, does nothing ``` ### `stream/flush` Flush any buffered output to the underlying sink. ```sema (stream/flush s) ``` ### `stream/copy` Copy all bytes from one stream to another. Returns total bytes copied. ```sema (with-stream (in (stream/open-input "src.bin")) (with-stream (out (stream/open-output "dst.bin")) (stream/copy in out))) ;; => bytes copied ``` ## Introspection ### `stream?` Type predicate — returns `#t` if the value is a stream. ```sema (stream? (stream/byte-buffer)) ;; => #t (stream? 42) ;; => #f ``` ### `stream/readable?`, `stream/writable?` Check the direction of a stream. ```sema (stream/readable? (stream/from-string "x")) ;; => #t (stream/writable? (stream/from-string "x")) ;; => #f (stream/writable? (stream/byte-buffer)) ;; => #t ``` ### `stream/available?` Returns `#t` if data is ready to read without blocking. ```sema (stream/available? (stream/from-string "x")) ;; => #t (stream/available? (stream/from-string "")) ;; => #f ``` ### `stream/type` Returns a string describing the stream implementation. 
```sema (stream/type (stream/byte-buffer)) ;; => "byte-buffer" (stream/type (stream/from-string "x")) ;; => "string" (stream/type (stream/open-input "f.txt")) ;; => "file-input" (stream/type *stdout*) ;; => "stdout" ``` ## Extraction (Byte Buffers) ### `stream/to-bytes` Extract the accumulated contents of a byte-buffer stream as a bytevector. ```sema (let ((s (stream/byte-buffer))) (stream/write s (bytevector 1 2 3)) (stream/to-bytes s)) ;; => #u8(1 2 3) ``` ### `stream/to-string` Extract the contents of a byte-buffer stream as a UTF-8 string. ```sema (let ((s (stream/byte-buffer))) (stream/write-string s "hello") (stream/to-string s)) ;; => "hello" ``` ## Standard I/O Three global streams are available for console I/O: | Stream | Direction | Description | |--------|-----------|-------------| | `*stdin*` | Readable | Standard input | | `*stdout*` | Writable | Standard output | | `*stderr*` | Writable | Standard error | ```sema (stream/write-string *stdout* "prompt> ") (stream/flush *stdout*) (stream/write-string *stderr* "warning: something happened\n") ``` ## Resource Management ### `with-stream` Macro that binds a stream, executes the body, and automatically closes the stream on exit — even if an error is thrown. ```sema (with-stream (s (stream/open-input "data.txt")) (stream/read-all s)) ;; s is closed here, even if read-all threw an error ;; Write to a file (with-stream (out (stream/open-output "output.txt")) (stream/write-string out "line 1\n") (stream/write-string out "line 2\n")) ;; file is flushed and closed ``` ## Patterns ### Line-by-Line Processing ```sema (with-stream (s (stream/open-input "log.txt")) (let loop ((line (stream/read-line s)) (count 0)) (if (nil? 
line) count (loop (stream/read-line s) (+ count 1)))))
```

### Building a String Incrementally

```sema
(let ((buf (stream/byte-buffer)))
  (stream/write-string buf "{")
  (stream/write-string buf "\"key\": \"value\"")
  (stream/write-string buf "}")
  (stream/to-string buf))
;; => "{\"key\": \"value\"}"
```

### File Copy

```sema
(with-stream (in (stream/open-input "photo.jpg"))
  (with-stream (out (stream/open-output "backup.jpg"))
    (stream/copy in out)))
```

## Error Handling

Reading from a closed stream or writing to a read-only stream throws an error that can be caught with `try`/`catch`:

```sema
(try
  (let ((s (stream/from-string "x")))
    (stream/close s)
    (stream/read s 1))  ; throws "stream is closed"
  (catch e
    (println (str "Error: " e))))
```

---

---
url: 'https://sema-lang.com/docs/stdlib/concurrency.md'
---

# Concurrency

Cooperative async concurrency with promises and channels. Tasks run on the VM's cooperative scheduler, interleaving at yield points (channel operations, `await`, `sleep`).

## Scheduling guarantees

* **Spawn order is preserved.** When several tasks are simultaneously ready to run, the scheduler picks them in the order they were spawned. A pipeline of `(async (send-1)) (async (send-2)) (async (send-3))` followed by sequential receives yields `1 2 3`, not a reordered sequence.
* **Wake order is FIFO.** When a value becomes available on a channel, the longest-waiting receiver is woken first.
* **Cooperation, not parallelism.** Tasks interleave at yield points (channel ops, `await`, `sleep`). CPU-bound tasks without yield points run to completion before other tasks get a turn.

## Promises

### `async/spawn`

```sema
(async/spawn thunk) → async-promise
```

Spawn a zero-argument function as an async task. Returns a promise that resolves when the task completes.
```sema (define p (async/spawn (fn () (+ 1 2)))) (async/await p) ; => 3 ``` Usually called via the `async` special form: ```sema (define p (async (+ 1 2))) (await p) ; => 3 ``` ### `async/await` ```sema (async/await promise) → value ``` Wait for a promise to resolve. Inside an async task, yields to the scheduler. At the top level, runs the scheduler inline until the promise resolves. Raises an error if the promise was rejected. ### `async/all` ```sema (async/all promises) → list ``` Run all promises to completion and return a list of their results. Takes a list or vector of promises. ```sema (let ((p1 (async 10)) (p2 (async 20)) (p3 (async 30))) (async/all (list p1 p2 p3))) ; => (10 20 30) ``` ### `async/race` ```sema (async/race promises) → value ``` Return the value of the first promise to resolve. Takes a list or vector of promises. ### `async/resolved` ```sema (async/resolved value) → async-promise ``` Create an already-resolved promise wrapping `value`. ### `async/rejected` ```sema (async/rejected message) → async-promise ``` Create an already-rejected promise with `message`. ### `async/run` ```sema (async/run) ``` Run all pending async tasks to completion. ### `async/sleep` ```sema (async/sleep ms) ``` Inside an async task, yield for `ms` milliseconds on the scheduler's **virtual clock**. The clock only advances when every task is blocked, jumping to the nearest deadline — so a shorter sleep always wakes before a longer one, deterministically. The scheduler then waits the real time when it advances: on native via `thread::sleep`, and in the **browser playground** by running eval on a Web Worker that blocks on `Atomics.wait` (so a sleep really pauses while the page stays responsive). Browsers without cross-origin isolation fall back to advancing the clock instantly — durations still order tasks correctly, just without the real wait. Outside async, calls `thread::sleep` on native. Durations are capped at `86_400_000` ms (1 day). 
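
The virtual-clock ordering can be illustrated with a small race between two sleeping tasks. This is a sketch built only from functions documented on this page; it assumes that an `async` body with several forms yields the value of its last form, which is not stated explicitly above.

```sema
;; Two tasks sleep for different durations on the scheduler's virtual clock.
;; Assumption: the value of an `async` body is the value of its last form.
(let ((slow (async (async/sleep 200) :slow))
      (fast (async (async/sleep 50) :fast)))
  ;; async/race returns the value of the first promise to resolve.
  (async/race (list slow fast)))
```

Under the ordering rule described above, the 50 ms task should wake first regardless of spawn order, so the race would be expected to settle on the `fast` task's value.
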
### `async/timeout` ```sema (async/timeout ms promise) → value ``` Wait for `promise` to resolve, but raise an error if it takes longer than `ms` milliseconds. On expiry the target task **is cancelled** — and any in-flight offloaded I/O it holds is aborted for real (an HTTP connection is torn down, a subprocess is killed; LLM calls are best-effort — see [`async/cancel`](#async-cancel)). So a timed-out `http/get`/`shell` stops consuming resources immediately rather than running to completion in the background. ```sema (async/timeout 100 (async (do-slow-work))) ;; raises: async/timeout: operation timed out ``` A `ms = 0` (or very short) timeout still lets work that is **synchronously ready** finish — it only fires once the virtual clock actually reaches the deadline with the task still pending (i.e. the task had to block/wait). Durations are capped at `86_400_000` ms (1 day). ### `async/cancel` ```sema (async/cancel promise) → bool ``` Request cancellation of a spawned task. Returns `#t` if the call actually transitioned the promise into the `Cancelled` state, `#f` if there was nothing to cancel — the promise was already terminal (resolved, rejected, previously cancelled) or was never spawned in the first place (e.g. created via `async/resolved`). Cancellation never errors. The task transitions to `Cancelled`; subsequent `(await p)` raises `"async/await: task was cancelled"` (distinct from a normal rejection). **What actually gets aborted.** If the cancelled task is parked on offloaded I/O, the underlying work is aborted where the runtime allows it: * `http/*` — the in-flight request's future is dropped, **tearing down the connection** (no wasted round-trip). * `shell` — the subprocess is **killed** (`SIGKILL`), not left running in the background. * `llm/*` (`embed`, `complete`, `classify`, `extract`) — **best-effort**: the request runs on a blocking worker that can't be interrupted mid-call, so the in-flight call completes and its result is discarded. 
A multi-round caller stops issuing further rounds. ```sema (async/cancel (async/resolved 1)) ;; => #f (never spawned) (let ((p (async 42))) (await p) (async/cancel p)) ;; => #f (already resolved) (let ((p (async (async/sleep 100)))) (async/cancel p)) ;; => #t ``` ### `async/cancelled?` ```sema (async/cancelled? promise) → bool ``` `#t` if `promise` is in the `Cancelled` state — distinct from `async/rejected?`. Matches the state variant directly rather than the rejection message, so a user `(async/rejected "cancelled")` no longer aliases: ```sema (async/cancelled? (async/rejected "cancelled")) ;; => #f ``` ### Promise predicates The four predicates **partition** the terminal states: a promise is at most one of resolved / rejected / cancelled, and `pending?` is the complement of those three. | Function | Description | | --- | --- | | `(async/promise? x)` | Is `x` an async promise? | | `(async/resolved? p)` | Is promise `p` resolved? | | `(async/rejected? p)` | Is promise `p` rejected? (excludes cancelled) | | `(async/pending? p)` | Is promise `p` still pending? | | `(async/cancelled? p)` | Was promise `p` cancelled? | ### `async/pool-map` ```sema (async/pool-map f items n) → list ``` Map `f` over `items` with **bounded concurrency**: at most `n` calls run at once, results returned in input order. A semaphore (an `n`-capacity channel) gates how many tasks are in flight, so you can fan a large batch across a rate-limited resource without launching everything at once. The token is released on both success and error, so a failing item never deadlocks the pool. ```sema ;; Embed 10 000 chunks, but only 8 requests in flight at a time: (async/pool-map (fn (chunk) (llm/embed chunk)) chunks 8) ;; Fetch many URLs, 16 at a time: (async/pool-map (fn (u) (http/get u)) urls 16) ``` ### `async/map` ```sema (async/map f items) → list ``` Concurrent `map`: apply `f` to each item in its **own** task, results in input order. 
The unbounded sibling of `async/pool-map` (no cap — every item gets a task at once). Use `async/pool-map` when you need to limit how many run together. ```sema (async/map (fn (u) (http/get u)) urls) ; fetch every url concurrently (async/map (fn (i) (* i i)) '(1 2 3 4)) ; => (1 4 9 16) ``` ### `async/spawn-all` ```sema (async/spawn-all thunks) → list ``` Spawn a list of zero-arg functions concurrently and await them all, in input order — the ergonomic form of `(async/all (map (fn (th) (async/spawn th)) thunks))`. ```sema (async/spawn-all (list (fn () (http/get a)) (fn () (http/get b)))) ``` ## Concurrent I/O — what actually overlaps The scheduler's payoff is **latency overlap**: when several tasks each wait on I/O, the waits happen *simultaneously* instead of one after another. The blocking leaves below now yield to the scheduler while their work runs on a background runtime, so spawning them as tasks (via `async/spawn` + `async/all`, or `async/pool-map`) makes wall-clock approach `max(latency)` instead of `sum(latency)`: | Operation | Overlaps when spawned concurrently | | --- | --- | | `http/get` and the other `http/*` verbs | ✅ | | `shell` (subprocess) | ✅ | | `llm/embed` | ✅ | | `llm/complete`, `llm/classify`, `llm/extract` | ✅ | ```sema ;; Four independent LLM calls — concurrent, not serial: (async/all (map (fn (q) (async/spawn (fn () (llm/complete q)))) '("summarize A" "summarize B" "summarize C" "summarize D"))) ;; wall-clock ≈ one call, not four. ``` Outside a scheduler task (a plain top-level call) these run **synchronously**, byte-identical to before — the concurrency only engages inside `async`/`async/spawn`. Tasks still interleave at I/O boundaries on the single VM thread; this is cooperative concurrency, not parallel CPU execution. 
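
The difference between top-level and spawned calls can be sketched by running the same requests both ways. The URL list is a hypothetical choice (any slow endpoints would do), and no timings are claimed here; the example only shows the two call shapes.

```sema
;; Hypothetical URL list used only for illustration.
(define urls (list "https://httpbin.org/delay/1"
                   "https://httpbin.org/delay/1"
                   "https://httpbin.org/delay/1"))

;; Plain top-level map: each http/get runs synchronously,
;; so one request finishes before the next starts.
(define serial-statuses
  (map (fn (u) (:status (http/get u))) urls))

;; async/map: each URL gets its own task, so the waits can overlap.
(define concurrent-statuses
  (async/map (fn (u) (:status (http/get u))) urls))
```

Both forms return the statuses in input order; only the scheduling of the waits differs.
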
**Tracing nests across spawns.** Spans (`with-span`, the auto-instrumented `llm/*` spans) opened inside a spawned task nest under the spawning task's active span and share its trace — so `(with-span "batch" (async/map llm/complete prompts))` shows up as one connected tree in Jaeger/Phoenix/Langfuse (the `batch` span with the concurrent LLM spans beneath it), not a pile of disconnected single-span traces. Each task still keeps its own span stack, so concurrent spans never cross-contaminate. A spawn at the top level (no active span) starts its own trace. ## Channels Bounded FIFO channels for communication between async tasks. ### `channel/new` ```sema (channel/new) → channel ; capacity 1 (channel/new capacity) → channel ``` Create a bounded channel. Default capacity is 1. Capacity must be at least 1. ### `channel/send` ```sema (channel/send ch value) ``` Send a value to the channel. If the channel is full and inside an async task, yields until space is available. Outside async context, raises an error if full. Raises an error if the channel is closed. ### `channel/recv` ```sema (channel/recv ch) → value ``` Receive a value from the channel. If the channel is empty and inside an async task, yields until data is available. Outside async context, raises an error if empty. Returns `nil` if the channel is closed and empty. ### `channel/try-recv` ```sema (channel/try-recv ch) → value | nil ``` Non-blocking receive. Returns the next value or `nil` if the channel is empty. ### `channel/close` ```sema (channel/close ch) ``` Close the channel. Subsequent sends will error. Blocked receivers will wake with `nil`. ### Channel predicates | Function | Description | | --- | --- | | `(channel? x)` | Is `x` a channel? | | `(channel/closed? ch)` | Is the channel closed? | | `(channel/empty? ch)` | Is the channel buffer empty? | | `(channel/full? ch)` | Is the channel buffer at capacity? 
| | `(channel/count ch)` | Number of values in the buffer | ## Examples ### Producer/Consumer ```sema (let ((ch (channel/new 1))) (let ((producer (async (channel/send ch 10) (channel/send ch 20) (channel/send ch 30) (channel/close ch))) (consumer (async (let loop ((sum 0)) (let ((val (channel/recv ch))) (if (nil? val) sum (loop (+ sum val)))))))) (await consumer))) ; => 60 ``` ### Parallel computation ```sema (let ((p1 (async (fib 30))) (p2 (async (fib 31)))) (+ (await p1) (await p2))) ``` See [Scheduling guarantees](#scheduling-guarantees) above for the full ordering / cooperation rules. ## Async ops inside higher-order functions Stdlib higher-order functions like `for-each`, `map`, `filter`, `foldl`, `sort-by`, `apply`, `reduce`, `partition`, `any`, `every` can call **lambdas** that perform async operations (`channel/send`, `channel/recv`, `await`, `async/sleep`). The yield suspends inside the callback and resumes correctly: ```sema (let ((ch (channel/new 3))) (let ((producer (async (for-each (fn (n) (channel/send ch n)) (list 1 2 3 4 5 6 7)) (channel/close ch))) (consumer (async (let loop ((sum 0)) (let ((v (channel/recv ch))) (if (nil? v) sum (loop (+ sum v)))))))) (await consumer))) ;; => 28 ``` Yielding **native** functions (e.g., `channel/recv`, `async/sleep`) passed *directly* as the callback produce a clear error pointing to the workaround: ```sema ;; Error: yielding native passed directly to a higher-order function — wrap in a lambda (map channel/recv (list ch ch ch)) ;; Correct: wrap the native in a lambda (map (fn (c) (channel/recv c)) (list ch ch ch)) ``` --- --- url: 'https://sema-lang.com/docs/stdlib/records.md' --- # Records Records are **user-defined, named product types** created with the `define-record-type` special form. They provide constructors, type predicates, and field accessors. ::: tip Records vs Maps If you need an *open* data shape that's easy to serialize and manipulate generically, use [maps](./maps). 
If you want a *closed* domain type with a predicate and fixed fields, use records.
:::

## Defining Record Types

### `define-record-type`

Define a new record type, generating a constructor, predicate, and one accessor per field.

```sema
(define-record-type point
  (make-point x y)   ; constructor (positional args)
  point?             ; predicate
  (x point-x)        ; (field-name accessor-name)
  (y point-y))
```

General syntax:

```sema
(define-record-type <type-name>
  (<constructor> <field> ...)
  <predicate>
  (<field> <accessor>) ...)
```

### What Gets Defined

For the `point` example above:

| Binding      | Signature         | Purpose        |
|--------------|-------------------|----------------|
| `make-point` | `(x y) → point`   | Constructor    |
| `point?`     | `(value) → bool`  | Type predicate |
| `point-x`    | `(point) → value` | Field accessor |
| `point-y`    | `(point) → value` | Field accessor |

```sema
(define p (make-point 3 4))
(point? p)    ; => #t
(point? 42)   ; => #f
(point-x p)   ; => 3
(point-y p)   ; => 4
```

### Constructor Arity

The constructor is positional — its arity must match exactly:

```sema
(make-point 1 2)     ; ok
(make-point 1)       ; error: wrong arity
(make-point 1 2 3)   ; error: wrong arity
```

### Immutability

Sema records are immutable. To "update" a record, construct a new one:

```sema
(define (move-point p dx dy)
  (make-point (+ (point-x p) dx)
              (+ (point-y p) dy)))

(move-point (make-point 10 20) 5 -2)
; => a new point record with x=15, y=18
```

## Equality

Two records are `equal?` if they have the **same type** and their fields are pairwise `equal?`:

```sema
(define a (make-point 1 2))
(define b (make-point 1 2))
(define c (make-point 9 9))

(equal? a b)   ; => #t (same type, same fields)
(equal? a c)   ; => #f (same type, different fields)
```

Records of different types are never equal, even if they have the same field values.

## Introspection

### `record?`

Test if a value is any record instance (of any record type).

```sema
(record? (make-point 3 4))   ; => #t
(record? {:x 3 :y 4})        ; => #f
(record? 42)                 ; => #f
```

### `type`

Return the type of a value as a keyword.
For records, returns the record's type name: ```sema (type (make-point 3 4)) ; => :point (type [1 2 3]) ; => :vector (type {:a 1}) ; => :map ``` ## Records vs Maps Both model "structured data", but they serve different purposes. ### Use records when… * You want a **distinct type**: `person?`, `invoice?`, `token?` * Your data has a **fixed schema** enforced at construction * You want named field accessors and clear domain boundaries ### Use maps when… * You need easy **serialization** (JSON, TOML, etc.) * You want to add/remove keys dynamically * You want generic operations like `get`, `assoc`, `merge`, `keys`, `map/get-in`, `map/update-in` * You're interacting with external APIs ::: tip Common pattern **Maps at the boundary, records internally.** Parse/validate external maps into records early, and convert records back to maps for output. ::: ## Nested Records Records can contain any values, including other records: ```sema (define-record-type address (make-address line1 city country) address? (line1 address-line1) (city address-city) (country address-country)) (define-record-type user (make-user id name addr) user? (id user-id) (name user-name) (addr user-addr)) (define u (make-user 123 "Ada" (make-address "12 St James" "London" "UK"))) (user-name u) ; => "Ada" (address-city (user-addr u)) ; => "London" ``` ## Pattern Matching with Records Records don't have a dedicated pattern form, but you can use binding patterns with `when` guards: ```sema (define (describe v) (match v (p when (point? 
p) (string/append "point(" (number/to-string (point-x p)) ", " (number/to-string (point-y p)) ")")) (_ "not a point"))) (describe (make-point 3 4)) ; => "point(3, 4)" (describe {:x 3 :y 4}) ; => "not a point" ``` You can also match on `type`: ```sema (define (record-type-name v) (match (type v) (:point "a point") (:person "a person") (_ "something else"))) ``` ## Domain Modeling Example Use records to represent values that have been validated: ```sema (define-record-type email (make-email value) email? (value email-value)) (define (parse-email s) (if (regex/match? #".+@.+\..+" s) (make-email s) (error "invalid email"))) (define e (parse-email "ada@example.com")) (email? e) ; => #t (email-value e) ; => "ada@example.com" ``` ## Multiple Record Types ```sema (define-record-type color (make-color r g b) color? (r color-r) (g color-g) (b color-b)) (define-record-type person (make-person name age) person? (name person-name) (age person-age)) (define red (make-color 255 0 0)) (define ada (make-person "Ada" 36)) (color? red) ; => #t (person? ada) ; => #t (color? ada) ; => #f (color-r red) ; => 255 (person-name ada) ; => "Ada" (type red) ; => :color (type ada) ; => :person ``` ## Serialization Records are **not JSON-encodable** directly. If you need to serialize a record, convert it to a map first: ```sema (define (point->map p) {:x (point-x p) :y (point-y p)}) (json/encode (point->map (make-point 1 2))) ; => "{\"x\":1,\"y\":2}" ``` Similarly, when loading data from JSON or the KV store, convert maps to records after parsing. 
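
The reverse direction can be sketched in the same style. The helper name `map->point` below is hypothetical (it is not part of the stdlib), and the sketch assumes the `point` record type defined earlier on this page and an already-parsed map with `:x` and `:y` keys, read with the keyword-as-function form:

```sema
;; Hypothetical helper (not in the stdlib): build a point record from a parsed map.
;; Assumes the `point` record type defined earlier on this page.
(define (map->point m)
  (make-point (:x m) (:y m)))

(define p (map->point {:x 1 :y 2}))
(point? p)    ; => #t
(point-x p)   ; => 1
```

Checking the map before calling the constructor (for example, that both keys are present) keeps malformed input out of the record layer.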
## Tips & Edge Cases

* **Accessor type-checking:** calling `point-x` on a non-point value errors
* **Type tag:** the tag returned by `type` is derived from the record type name — `point` → `:point`
* **No generic field access:** you can't use `get` or keyword-as-function on records — use the generated accessors

---

---
url: 'https://sema-lang.com/docs/stdlib/text-processing.md'
---

# Text Processing

Sema includes utilities for text chunking, cleaning, prompt templates, and structured documents — building blocks for LLM pipelines.

## Text Chunking

### `text/chunk`

Recursively split text into chunks, trying natural boundaries (paragraphs, sentences, words) before hard-splitting. Takes text and an optional options map.

```sema
(text/chunk "Long text here...")
(text/chunk "Long text here..." {:size 500 :overlap 100})
```

Options: `:size` (default 1000), `:overlap` (default 200). Returns a list of strings.

### `text/chunk-by-separator`

Split text by a specific separator string.

```sema
(text/chunk-by-separator "a\nb\nc" "\n")   ; => ("a" "b" "c")
```

### `text/split-sentences`

Split text into sentences at `.`, `!`, `?` boundaries.

```sema
(text/split-sentences "Hello world. How are you? Fine.")
; => ("Hello world." "How are you?" "Fine.")
```

## Text Cleaning

### `text/clean-whitespace`

Collapse multiple whitespace characters (spaces, newlines, tabs) into single spaces.

```sema
(text/clean-whitespace "  hello   world  \n\n  foo  ")   ; => "hello world foo"
```

### `text/strip-html`

Remove HTML tags and decode common entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`, `&apos;`, `&nbsp;`).

```sema
(text/strip-html "<p>Hello <b>world</b></p>")   ; => "Hello world"
(text/strip-html "a &amp; b &lt; c")            ; => "a & b < c"
```

### `text/truncate`

Truncate text to a maximum length with a suffix. Takes text, max-length, and optional suffix (default `"..."`).

```sema
(text/truncate "hello world" 5)       ; => "he..."
(text/truncate "hello world" 8 "…")   ; => "hello w…"
(text/truncate "hi" 10)               ; => "hi"
```

### `text/word-count`

Count words in text (split by whitespace).

```sema
(text/word-count "hello world foo bar")   ; => 4
```

### `text/trim-indent`

Remove common leading indentation from all lines.

```sema
(text/trim-indent "    hello\n    world")   ; => "hello\nworld"
(text/trim-indent "  hello\n   world")      ; => "hello\n world"
```

### `text/excerpt`

Extract a snippet around a search term with omission markers. Case-insensitive search. Returns `nil` if query not found.

```sema
(text/excerpt "The quick brown fox jumps over the lazy dog" "fox" {:radius 10})
; => "...brown fox jumps ov..."

(text/excerpt "Hello world" "Hello")
; => "Hello world"

;; Custom omission marker
(text/excerpt "Long text here..." "text" {:radius 5 :omission "[…]"})
; => "[…]g text here[…]"
```

Options map (optional third argument):

* `:radius` — number of characters to show on each side (default: 100)
* `:omission` — marker string for truncated parts (default: `"..."`)

### `text/normalize-newlines`

Convert `\r\n` (Windows) and `\r` (old Mac) line endings to `\n` (Unix).

```sema
(text/normalize-newlines "line1\r\nline2\rline3")   ; => "line1\nline2\nline3"
```

## Prompt Templates

### `prompt/template`

Create a template string for use with `prompt/render`.

```sema
(define tmpl (prompt/template "Hello {{name}}, welcome to {{place}}."))
```

### `prompt/render`

Render a template by substituting `{{key}}` placeholders with values from a map. Missing keys are left as-is.

```sema
(prompt/render "Hello {{name}}, welcome to {{place}}."
  {:name "Alice" :place "Wonderland"})
; => "Hello Alice, welcome to Wonderland."

(prompt/render "Hello {{name}}, {{missing}}." {:name "Bob"})
; => "Hello Bob, {{missing}}."

;; Non-string values are stringified
(prompt/render "Count: {{n}}" {:n 42})
; => "Count: 42"
```

## Documents

Structured documents with metadata, designed for use with chunking and vector stores.

### `document/create`

Create a document map with `:text` and `:metadata`.

```sema
(document/create "Hello world" {:source "test.txt" :page 1})
; => {:metadata {:page 1 :source "test.txt"} :text "Hello world"}
```

### `document/text`

Extract the text from a document.

```sema
(document/text doc)   ; => "Hello world"
```

### `document/metadata`

Extract the metadata from a document.

```sema
(document/metadata doc)   ; => {:source "test.txt" :page 1}
```

### `document/chunk`

Chunk a document, preserving and extending metadata. Each chunk gets `:chunk-index` and `:total-chunks` added to its metadata.

```sema
(document/chunk
  (document/create "long text..." {:source "paper.pdf"})
  {:size 500})
; => ({:text "chunk 1..." :metadata {:source "paper.pdf" :chunk-index 0 :total-chunks 3}}
;     {:text "chunk 2..." :metadata {:source "paper.pdf" :chunk-index 1 :total-chunks 3}}
;     ...)
```

---

---
url: 'https://sema-lang.com/docs/llm.md'
---

# LLM Primitives

Sema's differentiating feature: LLM operations are first-class language primitives with prompts, conversations, tools, and agents as native data types.

## Setup

Set one or more API keys as environment variables:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
# or any other supported provider
```

Sema auto-detects and configures all available providers on startup. Use `--no-llm` to skip auto-configuration.

See [Provider Management](./providers.md) for the full list of supported providers and configuration options.

## Features

### [Completion & Chat](./completion.md)

Simple completions, multi-message chat, and streaming responses.

### [Prompts & Messages](./prompts.md)

Prompts as composable s-expressions, message construction, and prompt inspection.
### [Conversations](./conversations.md) Persistent, immutable conversation state with automatic LLM round-trips. ### [Tools & Agents](./tools-agents.md) Define tools the LLM can invoke, and build agents with system prompts, tools, and multi-turn loops. ### [Embeddings & Similarity](./embeddings.md) Generate embeddings (as bytevectors), compute cosine similarity, and access embedding dimensions. ### [Structured Extraction](./extraction.md) Extract structured data from text and images, classify inputs, and work with multi-modal content. ### [Vector Store & Math](./vector-store.md) In-memory vector store for semantic search, plus vector math utilities (cosine similarity, dot product, normalize, distance). ### [Caching](./caching.md) In-memory LLM response caching for iterative development and deduplication. ### [Cassettes (Record & Replay)](./cassettes.md) Record real LLM/agent responses to a file once, then replay them deterministically — keyless, offline tests and reproducible demos. ### [Resilience & Retry](./resilience.md) Fallback provider chains, rate limiting, generic retry with exponential backoff, and convenience functions (`llm/summarize`, `llm/compare`). ### [Provider Management](./providers.md) Auto-configuration, runtime provider switching, custom providers, and OpenAI-compatible endpoints. ### [Cost Tracking & Budgets](./cost.md) Usage tracking, budget enforcement, and batch/parallel operations. ### Observability (OpenTelemetry) Built-in, standards-compliant OpenTelemetry tracing + metrics for **every** LLM and agent run — no manual instrumentation. Each completion and tool call is auto-traced (`invoke_agent → chat → execute_tool`) with tokens, cost, and latency, exportable to any OTLP backend or a JSONL file. Off by default, zero-cost when off. * **[Tracing & Metrics](./observability.md)** — the GenAI spans and metrics, sessions, privacy controls, and embedding Sema in your own app. 
* **[Backend Compatibility](./otel-compat.md)** — label the data so tools that use their own attribute names (Arize Phoenix, Langfuse, Traceloop, LangSmith) read it too via `SEMA_OTEL_COMPAT`. Most other tools work with no extra setup. --- --- url: 'https://sema-lang.com/docs/llm/completion.md' --- # Completion & Chat ## Completion ### `llm/complete` Send a single prompt string and get a completion back. ```sema ;; Simple completion (llm/complete "Say hello in 5 words" {:max-tokens 50}) ``` With options: ```sema (llm/complete "Explain monads" {:model "claude-haiku-4-5-20251001" :max-tokens 200 :temperature 0.3 :system "You are a Haskell expert."}) ``` ### `llm/stream` Stream a completion, printing chunks as they arrive. ```sema (llm/stream "Tell me a story" {:max-tokens 200}) ``` With a callback function: ```sema (llm/stream "Tell me a story" (fn (chunk) (display chunk)) {:max-tokens 200}) ``` `llm/stream` **returns the full accumulated response string** once streaming finishes — so you can show the live stream *and* keep the final text: ```sema (define story (llm/stream "Tell me a story" (fn (c) (display c)) {:max-tokens 200})) ;; `story` is the complete text after the stream ends. ``` ## Chat ### `llm/chat` Send a list of messages and get a response. Supports system, user, and assistant messages. ```sema (llm/chat (list (message :system "You are a helpful assistant.") (message :user "What is Lisp? One sentence.")) {:max-tokens 100}) ``` When you pass `:tools`, `llm/chat` runs the tool-execution loop for you (see [Tools & Agents](./tools-agents)). Two options bound it: `:tool-mode :none` lets the model *see* the tools but never auto-executes them, and `:max-tool-rounds N` caps the loop (default 10). ### Multi-Modal Chat Send messages that include images alongside text using `message/with-image`. ```sema ;; Load an image and ask the LLM about it (define img (file/read-bytes "photo.jpg")) (define msg (message/with-image :user "Describe this image." 
img)) (llm/chat (list msg)) ``` Combine with regular messages: ```sema (llm/chat (list (message :system "You are an image analyst.") (message/with-image :user "What text is in this image?" (file/read-bytes "doc.png")))) ``` The image must be a bytevector. Media type (PNG, JPEG, GIF, WebP, PDF) is detected automatically from magic bytes. See [Vision Extraction](./extraction.md#vision-extraction) for structured data extraction from images. ### `llm/send` Send a prompt value (composed from `prompt` expressions) to the LLM. ```sema (define review-prompt (prompt (system "You are a code reviewer. Be concise.") (user "Review this function."))) (llm/send review-prompt {:max-tokens 200}) ``` ## Options All completion and chat functions accept an options map with these keys: | Key | Description | | -------------- | ------------------------------------------------------------- | | `:model` | Model name (e.g. `"claude-haiku-4-5-20251001"`) | | `:max-tokens` | Maximum tokens in response | | `:temperature` | Sampling temperature (0.0–1.0) | | `:system` | System prompt (for `llm/complete`) | | `:reasoning-effort` | Reasoning effort for thinking models — see below | | `:tools` | List of tool values (see [Tools & Agents](./tools-agents.md)) | | `:timeout` | Per-call HTTP timeout in **milliseconds** (network providers; non-streaming) | | `:tags` / `:metadata` | Observability tags/metadata — see [Backend Compatibility](./otel-compat.md) | ### Reasoning effort `:reasoning-effort` controls how much a reasoning/thinking model deliberates before answering. It takes a keyword or string: `:minimal`, `:low`, `:medium`, `:high`, `:none`, or `:xhigh`. It is a single **portable** option — Sema maps it to each provider's native control, so the same code works everywhere: ```sema (llm/complete "Prove that sqrt(2) is irrational." 
{:model "gpt-5.4-mini" :reasoning-effort :high :max-tokens 4000}) ``` | Provider | Mapped to | | --------- | ------------------------------------------------------------------------------------- | | OpenAI | native `reasoning_effort` (gpt-5 / o-series) | | Anthropic | extended **thinking** — effort sets the thinking `budget_tokens` (and raises `max_tokens` above it; `temperature` is forced to default while thinking) | | Gemini | `thinkingConfig.thinkingBudget` (`:none`/`:minimal` disable thinking) | Models and providers that don't support reasoning effort ignore the option (no-op). It is also accepted by `llm/chat` and per-run on `agent/run` (`{:reasoning-effort :high}`). --- --- url: 'https://sema-lang.com/docs/llm/tools-agents.md' --- # Tools & Agents ## Tools Tools let you define functions that the LLM can invoke during a conversation. The LLM sees the tool's name, description, and parameter schema, and can call it when appropriate. ### `deftool` Define a tool with a name, description, parameter schema, and handler function. ```sema (deftool lookup-capital "Look up the capital of a country" {:country {:type :string :description "Country name"}} (lambda (country) (cond ((= country "Norway") "Oslo") ((= country "France") "Paris") (else "Unknown")))) ``` ### Using Tools with Chat Pass tools to `llm/chat` — the LLM will call them automatically when needed. ```sema (llm/chat (list (message :user "What is the capital of Norway?")) {:tools (list lookup-capital) :max-tokens 100}) ``` ### Inspecting Tools ### `tool/name` ```sema (tool/name lookup-capital) ; => "lookup-capital" ``` ### `tool/description` ```sema (tool/description lookup-capital) ; => "Look up the capital..." ``` ### `tool/parameters` ```sema (tool/parameters lookup-capital) ; => {:country {:type :string ...}} ``` ### `tool?` ```sema (tool? lookup-capital) ; => #t ``` ## Agents Agents combine a system prompt, tools, and a multi-turn loop. They handle the back-and-forth of tool calls automatically. 
### `defagent` Define an agent with a system prompt, tools, model, and turn limit. ```sema (deftool get-weather "Get weather for a city" {:city {:type :string}} (lambda (city) (format "~a: 22°C, sunny" city))) (defagent weather-bot {:system "You are a weather assistant. Use the get-weather tool." :tools [get-weather] :model "claude-haiku-4-5-20251001" :max-turns 3}) ``` ### `agent/run` Run an agent with a user message. The agent loops, calling tools as needed, until it has a final answer or hits the turn limit. The two-argument form returns the final answer as a **string**: ```sema (agent/run weather-bot "What's the weather in Tokyo?") ; => "It's sunny, 22°C." ``` An optional third argument takes per-run options. **Passing an options map changes the return value** to a map with the final reply *and* the full message history: ```sema (define result (agent/run weather-bot "What's the weather in Tokyo?" {:reasoning-effort :high ; reasoning effort for this run (see Completion) :messages prior-history ; seed the loop with prior conversation :on-tool-call observe-tool})) ; observe each tool call — see below (:response result) ; => the final answer string (:messages result) ; => the full conversation (to continue or inspect) ``` **Observing tool calls.** `:on-tool-call` fires once when each tool starts and once when it ends. The event is a map — branch on `(:event e)`, the string `"start"` or `"end"`: ```sema (define (observe-tool e) (when (= (:event e) "end") (println (:tool e) "→" (:result e) (format "(~ams)" (:duration-ms e))))) ``` The event map carries `:event` (`"start"` / `"end"`), `:tool` (the tool name), and `:args`; on `"end"` it adds `:result` (a preview of the return value), `:error` (a boolean), and `:duration-ms`. **Error recovery.** A tool that throws, isn't found, or is called with arguments that don't match its declared schema does **not** abort the run — the error is fed back to the model as the tool result so it can correct itself and continue. 
The loop is bounded by `:max-turns` and aborts after 5 consecutive tool errors. ### Inspecting Agents ### `agent/name` ```sema (agent/name weather-bot) ; => "weather-bot" ``` ### `agent/system` ```sema (agent/system weather-bot) ; => "You are a weather assistant..." ``` ### `agent/tools` ```sema (agent/tools weather-bot) ; => list of tool values ``` ### `agent/model` ```sema (agent/model weather-bot) ; => "claude-haiku-4-5-20251001" ``` ### `agent/max-turns` ```sema (agent/max-turns weather-bot) ; => 3 ``` ### `agent?` ```sema (agent? weather-bot) ; => #t ``` --- --- url: 'https://sema-lang.com/docs/llm/conversations.md' --- # Conversations Conversations are immutable data structures that maintain chat history. Each operation returns a new conversation value — the original is never modified. This means you always re-bind the result: ```sema (define conv (conversation/new {:model "claude-haiku-4-5-20251001"})) (define conv (conversation/set-system conv "You are a concise tutor.")) (define conv (conversation/say conv "Explain closures in 2 bullets.")) (println (conversation/last-reply conv)) ;; Branch to explore a different direction (define alt (conversation/fork conv)) (define alt (conversation/say alt "Now explain with JavaScript examples.")) ;; conv is unchanged — alt is an independent conversation ``` ## Creating Conversations ### `conversation/new` Create a new conversation, optionally with a model. `(conversation/new)` is equivalent to `(conversation/new {})`. ```sema (define conv (conversation/new {:model "claude-haiku-4-5-20251001"})) (define conv (conversation/new)) ``` ## Interacting ### `conversation/say` Send a user message to the LLM and get a response. Returns a new conversation with both the user message and the assistant's reply appended. 
```sema (define conv (conversation/new {:model "claude-haiku-4-5-20251001"})) (define conv (conversation/say conv "Remember: the secret number is 7")) (define conv (conversation/say conv "What is the secret number?")) (conversation/last-reply conv) ; => "The secret number is 7." ``` With options: ```sema (define conv (conversation/say conv "Explain more" {:temperature 0.5 :max-tokens 500})) ``` ### `conversation/add-message` Manually add a message without making an LLM call. Useful for constructing conversation history programmatically. ```sema (define c (conversation/new)) (define c (conversation/add-message c :system "You are helpful.")) (define c (conversation/add-message c :user "hello")) (define c (conversation/add-message c :assistant "hi there")) ``` ### `conversation/say-as` Send a message with a different system prompt for one turn only. The override applies to the API call but doesn't change the conversation's stored system message. Accepts a system string or a prompt value. ```sema ;; With a prompt value — uses its system message for this turn (define argue-for (prompt (system "You argue IN FAVOR of Lisp."))) (define conv (conversation/new {:model "claude-sonnet-4-6"})) (define conv (conversation/say-as conv argue-for "Make your case.")) ;; With a plain string — treated as system content (define conv (conversation/say-as conv "You argue AGAINST Lisp." "Rebut the argument.")) ``` ## Inspecting ### `conversation/last-reply` Get the content of the last assistant message. ```sema (conversation/last-reply conv) ; => "The secret number is 7." ``` ### `conversation/messages` Get the full list of messages as message values. ```sema (conversation/messages conv) ; => list of message values (length (conversation/messages conv)) ; => 5 ``` ### `conversation/model` Get the model associated with the conversation. 
```sema (conversation/model conv) ; => "claude-haiku-4-5-20251001" ``` ## System Message ### `conversation/system` Get the system message content, or `nil` if none is set. ```sema (define c (conversation/add-message (conversation/new) :system "Be helpful.")) (conversation/system c) ; => "Be helpful." (conversation/system (conversation/new)) ; => nil ``` ### `conversation/set-system` Set or replace the system message. All existing system messages are replaced with one new one; other messages are preserved. ```sema (define c (conversation/set-system (conversation/new) "You are a code reviewer.")) (conversation/system c) ; => "You are a code reviewer." ``` ## Filtering & Transforming ### `conversation/filter` Keep only messages matching a predicate. Returns a new conversation. ```sema ;; Keep only user messages (define user-only (conversation/filter conv (fn (m) (= (message/role m) :user)))) ;; Remove system messages (define no-system (conversation/filter conv (fn (m) (not (= (message/role m) :system))))) ``` ### `conversation/map` Apply a function to each message, returning a list of results (not a conversation). ```sema ;; Extract all message contents (conversation/map conv message/content) ;; Build a summary with role prefixes (conversation/map conv (fn (m) (string/append "[" (keyword/to-string (message/role m)) "] " (message/content m)))) ``` ## Usage & Cost ### `conversation/token-count` Estimated token count for the conversation (heuristic: ~4 characters per token). ```sema (conversation/token-count conv) ; => 342 ``` ### `conversation/cost` Estimated input cost in dollars based on the conversation's model pricing. Returns `nil` if pricing is unavailable for the model. ```sema (conversation/cost conv) ; => 0.00034 (or nil) ``` ## Branching ### `conversation/fork` Create an independent copy of a conversation. Since conversations are immutable, forking lets you explore different directions from the same point. 
```sema (define conv (conversation/new {:model "claude-haiku-4-5-20251001"})) (define conv (conversation/say conv "Remember the number 7")) ;; Fork and take two different paths (define branch-a (conversation/say (conversation/fork conv) "What about Python?")) (define branch-b (conversation/say (conversation/fork conv) "What about Rust?")) ;; conv, branch-a, branch-b are all independent ``` ## Type Predicate ### `conversation?` Check if a value is a conversation. ```sema (conversation? conv) ; => #t (conversation? 42) ; => #f ``` --- --- url: 'https://sema-lang.com/docs/llm/prompts.md' --- # Prompts & Messages Prompts in Sema are composable data structures — not string templates. They are built from message expressions, and can be inspected, transformed, and composed before being sent to an LLM. The core idea: build small prompt pieces, compose them together, fill in template slots, and send the result. Everything is a value you can pass around, store, and introspect. ```sema ;; Build reusable prompt pieces (define safety (prompt (system "Follow policy. Refuse unsafe requests."))) (define domain (prompt (system "You are a senior Lisp developer."))) (define task (prompt (user "Review this function:\n\n{{code}}"))) ;; Compose, fill, and send (define p (prompt/concat safety domain task)) (define ready (prompt/fill p {:code "(define (f x) (+ x 1))"})) (llm/send ready {:max-tokens 300}) ``` ## Messages A message is a role–content pair. The role is a keyword: `:system`, `:user`, or `:assistant`. ### `message` Create a message with a role and content. ```sema (message :system "You are a helpful assistant.") (message :user "What is Lisp?") (message :assistant "Lisp is a family of programming languages.") ``` ### `message/role` Get the role of a message as a keyword. ```sema (message/role (message :user "hi")) ; => :user ``` ### `message/content` Get the text content of a message. 
```sema (message/content (message :user "hi")) ; => "hi" ``` ## Building Prompts ### `prompt` Build a prompt from message expressions. Inside `prompt`, use the shorthand constructors `(system ...)`, `(user ...)`, and `(assistant ...)` — these are equivalent to `(message :system ...)`, etc. ```sema (define review-prompt (prompt (system "You are a code reviewer. Be concise.") (user "Review this function."))) ``` ### `prompt/messages` Get the list of messages from a prompt. ```sema (prompt/messages my-prompt) ; => list of message values (length (prompt/messages my-prompt)) ; => 2 ``` ## Composing Prompts ### `prompt/append` Compose prompts by appending their messages together. Variadic — accepts 2 or more prompts. ```sema (define base (prompt (system "You are helpful."))) (define question (prompt (user "What is 2+2?"))) (define full (prompt/append base question)) ;; Three or more prompts (define safety (prompt (system "Be safe."))) (define full (prompt/append base safety question)) (llm/send full) ``` ### `prompt/concat` Alias for `prompt/append`. Use whichever name reads better in context. ```sema (define full (prompt/concat base-prompt safety-prompt domain-prompt)) ``` ## Templating ### `prompt/fill` Substitute `{{key}}` placeholders in all message contents using a map. Unfilled slots are left as-is, so you can partially fill a template and fill the rest later. ```sema (define template (prompt (system "You are a {{role}} reviewing {{language}} code.") (user "{{query}}"))) ;; Full fill (define filled (prompt/fill template {:role "expert" :language "Rust" :query "Explain this."})) ;; Partial fill — unfilled slots remain as {{...}} (define partial (prompt/fill template {:role "code reviewer"})) ;; partial still has {{language}} and {{query}} unfilled ``` ### `prompt/slots` Return a list of unfilled `{{slot}}` names as keywords. Duplicates are removed. 
```sema (prompt/slots template) ; => (:role :language :query) ;; After partial fill, only unfilled slots remain (prompt/slots (prompt/fill template {:role "expert"})) ;; => (:language :query) ;; After full fill, no slots remain (prompt/slots filled) ; => () ``` Use `prompt/slots` to validate that all required slots are filled before sending: ```sema (when (not (null? (prompt/slots my-prompt))) (error "unfilled slots remain")) ``` ## Modifying Prompts ### `prompt/set-system` Replace all system messages with a single new one. Non-system messages are preserved. ```sema (define p (prompt (system "old system") (user "hello"))) (define p2 (prompt/set-system p "new system instructions")) ;; p2 has: [(system "new system instructions"), (user "hello")] ``` ## Type Predicates ### `prompt?` Check if a value is a prompt. ```sema (prompt? review-prompt) ; => #t (prompt? 42) ; => #f ``` ### `message?` Check if a value is a message. ```sema (message? (message :user "hi")) ; => #t (message? "not a message") ; => #f ``` --- --- url: 'https://sema-lang.com/docs/llm/extraction.md' --- # Structured Extraction Extract structured data from unstructured text using LLM-powered schema-based extraction and classification. ## Extraction ### `llm/extract` Extract structured data from text according to a schema. The schema defines the expected fields and their types. ```sema (llm/extract {:vendor {:type :string} :amount {:type :number} :date {:type :string}} "I bought coffee for $4.50 at Blue Bottle on Jan 15, 2025") ; => {:amount 4.5 :date "2025-01-15" :vendor "Blue Bottle"} ``` The schema map specifies field names as keys and type descriptors as values. Supported types include `:string`, `:number`, `:boolean`, and `:list`/`:array`. A field value can be written two ways, and they behave differently: * **Descriptor map** — `{:amount {:type :number}}`. This form is **type-checked**, and supports `:optional` and a custom `:validate` predicate (below). Use it for any field you want validated. 
* **Bare type keyword** — `{:amount :number}` is shorthand, but the type is sent to the model only as an untyped hint — it is **not** validated. Reach for the descriptor map when correctness matters. Only `:type`, `:optional`, and `:validate` on a field descriptor affect behavior; a `:description` on a field is currently ignored by extraction (it isn't sent to the model). ### Options `llm/extract` accepts an optional third argument — an options map: ```sema (llm/extract schema text {:model "claude-haiku-4-5-20251001"}) ``` | Option | Type | Default | Description | | ----------- | ------- | ------- | -------------------------------------------------- | | `:model` | string | — | Override the default model | | `:validate` | boolean | `#t` | Validate response against the schema | | `:retries` | integer | `2` | Max retry attempts on validation failure | | `:reask?` | boolean | `#t` | Feed validation errors back to the LLM on retry | ### Schema Validation By default, the extracted result is validated against the schema: * All required schema keys must be present in the result * Types must match: `:string` → string, `:number` → integer or float, `:boolean` → boolean, `:list`/`:array` → list or vector ```sema (llm/extract {:name {:type :string} :age {:type :number}} "Alice is 30 years old") ; => {:age 30 :name "Alice"} ``` If validation fails, an error is raised with details about which fields didn't match. ### Optional Fields Mark fields as optional with `:optional #t`. Missing optional fields won't trigger validation errors: ```sema (llm/extract {:name {:type :string} :nickname {:type :string :optional #t}} "Her name is Ada Lovelace.") ; => {:name "Ada Lovelace"} ;; No error even though :nickname is missing ``` ### Custom Validation Predicates Use `:validate` on individual field specs to run a custom predicate after type checking. 
If the predicate returns falsy, the field fails validation and triggers a retry: ```sema (llm/extract {:amount {:type :number :validate #(> % 0)} :vendor {:type :string :validate #(> (string/length %) 0)}} "Invoice from Acme Corp for $42.50") ; => {:amount 42.5 :vendor "Acme Corp"} ``` Add `:message` to provide a human-readable error description. This message is fed back to the LLM in re-ask prompts, helping it correct its response: ```sema (llm/extract {:age {:type :number :validate #(and (>= % 0) (<= % 150)) :message "age must be between 0 and 150"}} "She is 30 years old.") ; => {:age 30} ``` Without `:message`, the default error text includes the field value: `"custom validation failed for value -5"`. ### Retry on Mismatch Validation failures automatically trigger retries (up to `:retries`, default 2). On each retry, the validation errors are fed back to the LLM to improve the next attempt. After exhausting retries, the final validation error is raised. ```sema (llm/extract {:items {:type :list} :total {:type :number :validate pos?}} "3 apples, 2 oranges, total 5 items") ``` Disable automatic retries with `{:retries 0}` or disable validation entirely with `{:validate #f}`. ## Classification ### `llm/classify` Classify text into one of a set of categories. Returns the matching keyword. ```sema (llm/classify (list :positive :negative :neutral) "This product is amazing!") ; => :positive ``` Pass a list of keyword labels and the text to classify. The LLM picks the best-matching label. An optional third options map takes `:model` — handy for using a cheap, fast model for classification: ```sema (llm/classify (list :spam :ham) text {:model "claude-haiku-4-5-20251001"}) ``` The return type follows the labels: a list of **keywords** classifies to a keyword, a list of **strings** to a string. ## Vision Extraction ### `llm/extract-from-image` Extract structured data from images using vision-capable LLMs. 
Accepts a schema, an image source (file path or bytevector), and optional options.

```sema
;; Extract from a file path
(llm/extract-from-image
  {:text :string :background_color :string}
  "assets/logo.png")
; => {:background_color "white" :text "Sema"}

;; Extract from a bytevector
(define img (file/read-bytes "invoice.jpg"))
(llm/extract-from-image
  {:invoice_number :string :date :string :total :string}
  img)
; => {:date "2025-03-15" :invoice_number "12345" :total "$139.96"}
```

Supported image formats (detected automatically via magic bytes): PNG, JPEG, GIF, WebP, PDF.

### Options

`llm/extract-from-image` accepts an optional third argument — an options map:

```sema
(llm/extract-from-image schema source {:model "gpt-5.5"})
```

| Option   | Type   | Default | Description                |
| -------- | ------ | ------- | -------------------------- |
| `:model` | string | —       | Override the default model |

## Multi-Modal Messages

### `message/with-image`

Create a message that includes both text and an image, for use with `llm/chat`.

```sema
(define img (file/read-bytes "photo.jpg"))
(define msg (message/with-image :user "What do you see?" img))
(llm/chat (list msg))
```

The image must be a bytevector (use `file/read-bytes` to load from disk). The media type is detected automatically.

You can combine image messages with regular messages:

```sema
(llm/chat
  (list
    (message :system "You are a helpful image analyst.")
    (message/with-image :user "Describe this chart." (file/read-bytes "chart.png"))))
```

### Provider Support

Vision features work with providers that support multi-modal input:

| Provider      | `llm/extract-from-image` | `message/with-image` |
| ------------- | ------------------------ | -------------------- |
| **Anthropic** | ✅                       | ✅                   |
| **OpenAI**    | ✅                       | ✅                   |
| **Gemini**    | ✅                       | ✅                   |
| **Ollama**    | ✅ (model-dependent)     | ✅ (model-dependent) |

For Ollama, use a vision-capable model like `gemma3:4b` or `llava`.
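
As a rough sketch of the Ollama case: the snippet below points Sema at a local Ollama server, makes it the active provider, and runs a vision extraction. It assumes Ollama is listening on its default address with `gemma3:4b` already available, and `assets/logo.png` is a placeholder path. `llm/configure` and `llm/set-default` are the provider-management functions described under Provider Management.

```sema
;; Assumption: a local Ollama server with the gemma3:4b model available.
(llm/configure :ollama {:host "http://localhost:11434"
                        :default-model "gemma3:4b"})
(llm/set-default :ollama)

;; Placeholder image path; swap in your own file.
(llm/extract-from-image
  {:text :string}
  "assets/logo.png")
```

Whether a given model can read images depends on the model itself, which is what the "model-dependent" note in the table refers to.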
---

---
url: 'https://sema-lang.com/docs/llm/providers.md'
---

# Provider Management

## Auto-Configuration

Sema auto-detects and configures all available providers from environment variables on startup. No manual setup is required — just set the API key for your provider.

### `llm/auto-configure`

Manually trigger auto-configuration (runs automatically on startup unless `--no-llm` is used).

```sema
(llm/auto-configure)
```

## Manual Configuration

### `llm/configure`

Manually configure a known provider with specific options.

```sema
(llm/configure :anthropic {:api-key "sk-..."})

;; Ollama with custom host
(llm/configure :ollama {:host "http://localhost:11434"
                        :default-model "llama3"})
```

### OpenAI-Compatible Providers

Any provider with an OpenAI-compatible API can be registered by passing `:api-key` and `:base-url` with any provider name — no custom code needed, just configuration.

```sema
;; Together AI
(llm/configure :together
  {:api-key (env "TOGETHER_API_KEY")
   :base-url "https://api.together.xyz/v1"
   :default-model "meta-llama/Llama-3-70b-chat-hf"})

;; Azure OpenAI
(llm/configure :azure
  {:api-key (env "AZURE_OPENAI_KEY")
   :base-url "https://my-resource.openai.azure.com/openai/deployments/gpt-4/v1"
   :default-model "gpt-4"})

;; Local vLLM / LiteLLM / text-generation-inference
(llm/configure :local
  {:api-key "not-needed"
   :base-url "http://localhost:8000/v1"
   :default-model "my-model"})

;; Once configured, use like any other provider
(llm/complete "Hello from Together!" {:model "meta-llama/Llama-3-70b-chat-hf"})
```

This works for any service that implements the OpenAI chat completions API: Together, Fireworks, Perplexity, Azure OpenAI, Anyscale, vLLM, LiteLLM, text-generation-inference, and others.

> **Sandbox note.** Local endpoints like `http://localhost:8000/v1` and Ollama on `localhost:11434` work normally in the REPL, CLI, and notebook.
When running **untrusted code under `--sandbox`**, a `:base-url`/`:host` pointing at a loopback or private address (`localhost`, `127.0.0.1`, `10.x`, `169.254.169.254`, …) is rejected to prevent SSRF. Run unsandboxed to use a local endpoint.

## Lisp-Defined Providers

For full control over request/response handling, you can define providers entirely in Sema using `llm/define-provider`. The provider's `:complete` function receives the request as a map and returns either a string or a response map.

### `llm/define-provider`

```sema
(llm/define-provider :name {:complete fn :default-model "..."})
```

**Parameters:**

* `:complete` — **(required)** A function that takes a request map and returns a response
* `:default-model` — Model name used when none is specified (default: `"default"`)

### Request Map

The `:complete` function receives a map with these keys:

| Key               | Type           | Description                        |
| ----------------- | -------------- | ---------------------------------- |
| `:model`          | string         | Model name                         |
| `:messages`       | list of maps   | Each has `:role` and `:content`    |
| `:max-tokens`     | integer or nil | Token limit                        |
| `:temperature`    | float or nil   | Sampling temperature               |
| `:system`         | string or nil  | System prompt                      |
| `:tools`          | list or nil    | Tool schemas (if tools are in use) |
| `:stop-sequences` | list or nil    | Stop sequences for generation      |

### Response Format

The function can return either:

* **A string** — used as the assistant's response content
* **A map** with optional keys:

| Key            | Type   | Default       |
| -------------- | ------ | ------------- |
| `:content`     | string | `""`          |
| `:role`        | string | `"assistant"` |
| `:model`       | string | request model |
| `:stop-reason` | string | `"end_turn"`  |
| `:usage`       | map    | zero tokens   |
| `:tool-calls`  | list   | empty list    |

The `:usage` map can contain `:prompt-tokens` and `:completion-tokens` (both integers).

The `:tool-calls` list contains maps with `:id` (string), `:name` (string), and `:arguments` (map).
This enables Lisp-defined providers to work with tool-calling agents.

### Examples

**Echo provider** — returns the user's message back:

```sema
(llm/define-provider :echo
  {:complete (fn (req)
    (string/append "Echo: "
      (:content (last (:messages req)))))
   :default-model "echo-v1"})

(llm/complete "hello")
;; => "Echo: hello"
```

**HTTP proxy** — forward to a custom API:

```sema
(llm/define-provider :my-api
  {:complete (fn (req)
    (define resp
      (json/decode
        (http/post "https://my-api.example.com/chat"
          {:headers {"Authorization" (string/append "Bearer " (env "MY_API_KEY"))
                     "Content-Type" "application/json"}
           :body (json/encode
                   {:model (:model req)
                    :prompt (:content (last (:messages req)))})})))
    {:content (:text resp)
     :usage {:prompt-tokens (:input-tokens resp)
             :completion-tokens (:output-tokens resp)}})
   :default-model "my-model-v2"})
```

**Mock provider for testing** — deterministic responses without API calls:

```sema
(define responses (list "First response" "Second response" "Third response"))
(define call-count (atom 0))

(llm/define-provider :mock
  {:complete (fn (req)
    (let ((i (deref call-count)))
      (swap! call-count (fn (n) (+ n 1)))
      (nth responses (mod i (length responses)))))
   :default-model "mock-v1"})

;; Now all llm/complete calls return deterministic values
(llm/complete "anything")  ;; => "First response"
(llm/complete "anything")  ;; => "Second response"
```

**Routing provider** — dispatch to different backends by model name:

```sema
(llm/configure :anthropic {:api-key (env "ANTHROPIC_API_KEY")})
(llm/configure :openai {:api-key (env "OPENAI_API_KEY")})

(llm/define-provider :router
  {:complete (fn (req)
    (let ((model (:model req)))
      (cond
        ((string/starts-with? model "claude")
         (begin
           (llm/set-default :anthropic)
           (llm/complete (:content (last (:messages req))) {:model model})))
        ((string/starts-with?
          model "gpt")
         (begin
           (llm/set-default :openai)
           (llm/complete (:content (last (:messages req))) {:model model})))
        (else (error (string/append "Unknown model: " model))))))
   :default-model "claude-sonnet-4-6"})
```

### Switching Between Providers

Lisp-defined providers integrate with the standard provider management functions:

```sema
(llm/define-provider :mock {:complete (fn (req) "mock response") :default-model "m1"})
(llm/configure :anthropic {:api-key (env "ANTHROPIC_API_KEY")})

(llm/set-default :mock)       ;; use mock
(llm/complete "test")         ;; => "mock response"

(llm/set-default :anthropic)  ;; switch to real API
(llm/complete "test")         ;; => real API response
```

## Runtime Provider Switching

### `llm/list-providers`

List all configured providers.

```sema
(llm/list-providers)  ; => (:anthropic :gemini :openai ...)
(llm/providers)       ; => same (alias)
```

### `llm/current-provider`

Get the currently active provider and model.

```sema
(llm/current-provider)  ; => {:name :anthropic :model "claude-sonnet-4-6"}
(llm/default-provider)  ; => same (alias)
```

### `llm/set-default`

Switch the active provider at runtime.

```sema
(llm/set-default :openai)
```

## Supported Providers

All providers are auto-configured from environment variables. Use `(llm/configure :provider {...})` for manual setup.

| Provider            | Type                  | Chat | Stream | Tools | Embeddings | Vision |
| ------------------- | --------------------- | ---- | ------ | ----- | ---------- | ------ |
| **Anthropic**       | Native                | ✅   | ✅     | ✅    | —          | ✅     |
| **OpenAI**          | Native                | ✅   | ✅     | ✅    | ✅         | ✅     |
| **Google Gemini**   | Native                | ✅   | ✅     | ✅    | —          | ✅     |
| **Ollama**          | Native (local)        | ✅   | ✅     | ✅    | —          | ✅ ²   |
| **Groq**            | OpenAI-compat         | ✅   | ✅     | ✅    | —          | —      |
| **xAI**             | OpenAI-compat         | ✅   | ✅     | ✅    | —          | —      |
| **Mistral**         | OpenAI-compat         | ✅   | ✅     | ✅    | —          | —      |
| **Moonshot**        | OpenAI-compat         | ✅   | ✅     | ✅    | —          | —      |
| **Jina**            | Embedding-only        | —    | —      | —     | ✅         | —      |
| **Voyage**          | Embedding-only        | —    | —      | —     | ✅         | —      |
| **Cohere**          | Embedding-only        | —    | —      | —     | ✅         | —      |
| *Any OpenAI-compat* | `llm/configure`       | ✅   | ✅     | ✅    | —          | ✅     |
| *Custom Lisp*       | `llm/define-provider` | ✅   | ¹      | ✅    | —          | —      |

¹ Streaming falls back to non-streaming (sends complete response as a single chunk).

² Vision requires a vision-capable model (e.g., `gemma3:4b`, `llava`).

### Default Models

When you don't pass `:default-model` to `llm/configure` (or pin `:model` on a call), each provider uses the following default. These are also what `llm/with-fallback` substitutes per provider when the body doesn't pin a model.

| Provider     | Default model                |
| ------------ | ---------------------------- |
| `:anthropic` | `claude-sonnet-4-6`          |
| `:openai`    | `gpt-5.5`                    |
| `:gemini`    | `gemini-3.5-flash`           |
| `:ollama`    | `gemma4`                     |
| `:groq`      | `llama-3.3-70b-versatile`    |
| `:xai`       | `grok-4.3`                   |
| `:mistral`   | `mistral-large-latest`       |
| `:moonshot`  | `kimi-k2.6`                  |

Override any of these per provider with `:default-model`, globally via `SEMA_CHAT_MODEL`, or per call with `:model`.
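
The three override levels can be sketched as follows. This is an illustration only, using model names that appear elsewhere in these docs. How `SEMA_CHAT_MODEL` interacts with a per-provider `:default-model` is not spelled out here, so the sketch shows each level on its own rather than implying a precedence between them.

```sema
;; 1. Per provider: calls on :openai that don't pin a model use this one.
(llm/configure :openai {:api-key (env "OPENAI_API_KEY")
                        :default-model "gpt-5-mini"})

;; 2. Globally: set SEMA_CHAT_MODEL in the environment before starting Sema,
;;    for example SEMA_CHAT_MODEL=gpt-5-mini in your shell.

;; 3. Per call: pin :model for a single request.
(llm/set-default :openai)
(llm/complete "Hello" {:model "gpt-5.5"})
```

Pinning `:model` on a call is the narrowest of the three and leaves the provider's configured default untouched for other calls.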

## Environment Variables

| Variable                  | Description                                           |
| ------------------------- | ----------------------------------------------------- |
| `ANTHROPIC_API_KEY`       | Anthropic API key                                     |
| `OPENAI_API_KEY`          | OpenAI API key                                        |
| `GROQ_API_KEY`            | Groq API key                                          |
| `XAI_API_KEY`             | xAI/Grok API key                                      |
| `MISTRAL_API_KEY`         | Mistral API key                                       |
| `MOONSHOT_API_KEY`        | Moonshot API key                                      |
| `GOOGLE_API_KEY`          | Google Gemini API key                                 |
| `OLLAMA_HOST`             | Ollama server URL (default: `http://localhost:11434`) |
| `JINA_API_KEY`            | Jina embeddings API key                               |
| `VOYAGE_API_KEY`          | Voyage embeddings API key                             |
| `COHERE_API_KEY`          | Cohere embeddings API key                             |
| `SEMA_CHAT_MODEL`         | Default chat model name                               |
| `SEMA_CHAT_PROVIDER`      | Preferred chat provider                               |
| `SEMA_EMBEDDING_MODEL`    | Default embedding model name                          |
| `SEMA_EMBEDDING_PROVIDER` | Preferred embedding provider                          |

---

---
url: 'https://sema-lang.com/docs/llm/cost.md'
---

# Cost Tracking & Budgets

## Usage Tracking

### `llm/last-usage`

Get token usage from the most recent LLM call.

```sema
(llm/last-usage)
; => {:prompt-tokens 42 :completion-tokens 15 :total-tokens 57
;     :cache-read-tokens 0 :cache-creation-tokens 0
;     :model "..." :cost-usd 0.0003}
```

### `llm/session-usage`

Get cumulative usage across all LLM calls in the current session.

```sema
(llm/session-usage)
; => {:prompt-tokens 1280 :completion-tokens 410 :total-tokens 1690
;     :cache-read-tokens 1024 :cache-creation-tokens 0 :cost-usd 0.012}
```

#### Prompt-cache tokens

`:cache-read-tokens` and `:cache-creation-tokens` report how many input tokens were served from (or written to) the provider's **prompt cache** — large savings when you repeat a stable prefix across calls.

* **OpenAI** and **Gemini** (2.5+) cache *implicitly*: send the same long prefix twice and the second call reports `:cache-read-tokens` automatically. Reads are a subset of `:prompt-tokens`.
* **Anthropic** reports `:cache-read-tokens` and `:cache-creation-tokens` *separately* from `:prompt-tokens` (caching there is opt-in via `cache_control`).
* Providers that don't report cache counts leave these at `0`.

> Cost is currently priced at the standard input rate; cached reads are reported
> for visibility but not yet discounted in `:cost-usd`.

### `llm/reset-usage`

Reset session usage counters.

```sema
(llm/reset-usage)
```

## Pricing Sources

Sema tracks LLM costs using pricing data from these sources, checked in this order:

1. **Custom pricing** — set via `(llm/set-pricing "model" input output)`, always wins
2. **Bundled price list** — a [models.dev](https://models.dev) snapshot (2,400+ models) that ships with Sema, so cost tracking works fully offline with no network calls
3. **Unknown** — if no source matches, cost tracking returns `nil` and budget enforcement is best-effort

The embedded snapshot is refreshed by maintainers with `make update-pricing` and shipped in patch releases. Prices are matched by model id, preferring the canonical first-party listing; when the serving provider is known (e.g. inside an `llm/with-fallback` chain), a reseller/gateway that lists the same model at a different rate is priced correctly.

### `llm/pricing-status`

Check the pricing source and the snapshot date.

```sema
(llm/pricing-status)
; => {:source "embedded" :updated-at "2026-06-18"}
```

## Budget Enforcement

> **Note:** If pricing is unknown for a model (not in any source), budget enforcement operates in best-effort mode — the call proceeds with a one-time warning. Use `(llm/set-pricing)` to set pricing for unlisted models.

### `llm/set-budget`

Set a spending limit (in dollars) for the session. LLM calls that would exceed the budget will fail.

```sema
(llm/set-budget 1.00)  ; set $1.00 spending limit
```

### `llm/budget-remaining`

Check current budget status.

```sema
(llm/budget-remaining)
; => {:limit 1.0 :spent 0.05 :remaining 0.95}
```

### `llm/with-budget`

Scoped budget — sets spending limits for the duration of a thunk, then restores the previous budget when done.

At least one of `:max-cost-usd` or `:max-tokens` is required. When both are provided, **whichever limit is hit first** triggers the error.

```sema
;; Cost-based budget
(llm/with-budget {:max-cost-usd 0.50} (lambda ()
  (llm/complete "Expensive operation")))

;; Token-based budget (useful when pricing is unknown or stale)
(llm/with-budget {:max-tokens 10000} (lambda ()
  (llm/complete "Limited tokens")))

;; Both limits — whichever is reached first stops execution
(llm/with-budget {:max-cost-usd 1.00 :max-tokens 50000} (lambda ()
  (llm/complete "Double-capped")
  (println (format "Budget: ~a" (llm/budget-remaining)))))
```

When a token budget is active, `llm/budget-remaining` includes `:token-limit`, `:tokens-spent`, and `:tokens-remaining` in addition to the cost fields.

#### Streaming and the budget

By default, budgets enforce on **non-streaming** calls (the spend is known after each call completes). A stream's cost isn't known until it ends, so streams aren't budget-gated unless you opt in with `:on-stream :pre-gate` — which refuses to **open** a stream once the scope's spend is already at the cap:

```sema
(llm/with-budget {:max-cost-usd 0.50 :on-stream :pre-gate}
  (lambda () (llm/stream "..." on-token)))  ; blocked at open once $0.50 is spent
```

A single in-flight stream can still push *past* the cap (you only learn its cost when it finishes), but the next call is blocked. Usage is tracked either way.

### `llm/clear-budget`

Remove the spending limit.

```sema
(llm/clear-budget)
```

### `llm/set-pricing`

Set custom pricing for a model (overrides the bundled price list). Costs are per million tokens.

```sema
(llm/set-pricing "my-model" 1.0 3.0)  ; $1.00/M input, $3.00/M output
```

## Batch & Parallel

### `llm/batch`

Send multiple prompts concurrently and collect all results.

```sema
(llm/batch (list "Translate 'hello' to French"
                 "Translate 'hello' to Spanish"
                 "Translate 'hello' to German"))
```

### `llm/pmap`

Map a function over items, sending all resulting prompts in parallel.

```sema
(llm/pmap
  (fn (word) (format "Define: ~a" word))
  '("serendipity" "ephemeral" "ubiquitous")
  {:max-tokens 50})
```

---

---
url: 'https://sema-lang.com/docs/llm/caching.md'
---

# Response Caching

Sema caches LLM responses so identical calls don't hit the API twice. The cache is **persistent**: responses are written to `~/.sema/cache/llm/` (one JSON file per entry, named by a SHA-256 key), so a re-run of a script — even in a new process — serves the answer recorded by an earlier run. An in-memory layer sits on top for the current session.

A call is a cache hit when its **model, temperature, system prompt, and full message list** all match a stored entry. `:max-tokens` and `:tools` are *not* part of the key.

Caching is **off by default** — turn it on for a block with `llm/with-cache`.

> For replay that you **commit and share** (deterministic tests, offline demos), see
> [Cassettes](./cassettes) instead. They're a different tool: a cassette stores a tape
> next to your code rather than in your personal cache dir, and the response cache is
> turned off inside `llm/with-cassette`.

## Cache scope

### `llm/with-cache`

Run a thunk with caching enabled for every LLM call inside it. The **options map comes first** when you pass one; with a single argument it's just the thunk. `:ttl` sets the time-to-live in seconds (default 3600). Previous cache settings are restored on exit.

```sema
;; thunk only
(llm/with-cache (lambda () (llm/complete "hello")))

;; with options — opts FIRST, then the thunk
(llm/with-cache {:ttl 7200} (lambda () (llm/complete "hello")))
```

A cache hit costs nothing: it makes no provider call, so it reports **zero** token usage and spends nothing against a [budget](./cost). The two calls below show a miss then a hit:

```sema
(llm/with-cache (lambda ()
  (llm/complete "what is 2+2?")   ; miss — calls the model, stores the answer
  (llm/complete "what is 2+2?")   ; hit — served from the cache, no API call
  (llm/cache-stats)))             ; => {:hits 1 :misses 1 :size 1}
```

## Inspection & debugging

### `llm/cache-key`

Generate the SHA-256 cache key for a prompt and options — handy for debugging why two calls do or don't share a cache entry. Takes a prompt string and an optional options map.

```sema
(llm/cache-key "hello" {:model "gpt-4" :temperature 0.5})
```

### `llm/cache-stats`

Returns `{:hits :misses :size}`. Note that `:size` counts only the entries loaded into memory **this session** — a cold start can serve hits from disk before `:size` reflects them.

```sema
(llm/cache-stats)  ; => {:hits 0 :misses 0 :size 0}
```

## Cache management

### `llm/cache-clear`

Clear cached responses — both the in-memory entries and the files in `~/.sema/cache/llm/`. Returns the number of entries cleared.

```sema
(llm/cache-clear)  ; => 0
```

---

---
url: 'https://sema-lang.com/docs/llm/resilience.md'
---

# Resilience & Retry

## Fallback Provider Chains

### `llm/with-fallback`

Wraps a thunk with a fallback chain of providers. If the LLM call fails with one provider, automatically tries the next provider in the list.

```sema
(llm/with-fallback [:anthropic :openai :groq]
  (lambda () (llm/complete "Hello")))
```

#### Model selection across the chain

Model ids are provider-specific (a Claude id is meaningless to OpenAI), so each chain entry resolves its own model:

* A **bare provider keyword** (e.g.
  `:anthropic`) uses that provider's [default model](./providers#default-models), or whatever you set via `(llm/configure :anthropic {:default-model "..."})`. This is the recommended form — leave the body's `(llm/complete ...)` **unpinned** so every provider gets a model id valid for itself.
* If the body pins a `:model`, that exact string is sent to **every** provider in the chain. That's fine for a homogeneous chain, but pinning a provider-specific id (e.g. a Claude model) will fail on any other provider it falls back to.

#### Per-provider model overrides

To target a different model per provider within a single chain, give chain entries as `[provider model]` pairs or `{:provider :model}` maps. A per-provider override **wins over any `:model` pinned in the body**:

```sema
;; Anthropic uses Opus, OpenAI uses GPT-5.5, Groq uses its default
(llm/with-fallback [[:anthropic "claude-opus-4-8"] [:openai "gpt-5.5"] :groq]
  (lambda () (llm/complete "Hello")))

;; Map form is equivalent and lets you omit :model to use the provider default
(llm/with-fallback [{:provider :anthropic :model "claude-opus-4-8"} {:provider :openai}]
  (lambda () (llm/complete "Hello")))
```

## Automatic Retry on Transient Errors

LLM calls (`llm/complete`, `llm/chat`, `agent/run`, and the fallback-chain path) **automatically retry transient failures** — no configuration needed:

* Retried: HTTP 429 (rate limited), 5xx server errors, and network/timeout errors.
* Not retried: 4xx client errors other than 429 (e.g. 400 bad request), and parse errors — these won't succeed on a retry, so they fail fast.
* Backoff: capped **exponential backoff with full jitter** (base 500ms, doubling per attempt, capped at 30s), up to 3 retries. A 429 honors the provider's `retry-after` hint when present.

This is distinct from [`llm/with-fallback`](#fallback-provider-chains) (which switches *providers* on failure) and the generic [`retry`](#generic-retry) (which wraps *any* thunk).
They compose: each provider in a fallback chain does its own transient-error retry before the chain moves on.

### Streaming and resilience

`llm/stream` applies these guarantees **at stream-open** — before the first token:

* **Fallback** — if a provider fails to *open* the stream, the chain fails over to the next, just like non-streaming. Once the first token has been delivered, a **mid-stream** failure is **not** failed over (switching providers mid-answer would re-emit the partial you already received); the error surfaces and the partial text is kept.
* **Rate-limiting** — `llm/with-rate-limit` gates the stream-open call the same as a non-streaming one.
* **Budget** — opt in with `llm/with-budget {... :on-stream :pre-gate}`: the stream is refused at open if the scope's spend is already at the cap. By default streams are **not** budget-gated (a stream's cost is unknown until it ends), though usage is still tracked afterward.

Two things still **don't** apply to streams: the **response cache** (a live stream isn't cached — for deterministic replay use [cassettes](/docs/llm/cassettes)) and **mid-stream retry** (a retry would duplicate already-emitted output — see above).

## Rate Limiting

### `llm/with-rate-limit`

Wraps a thunk with token-bucket rate limiting. Takes a rate (requests per second) and a thunk. Useful to avoid hitting API rate limits.

```sema
(llm/with-rate-limit 5
  (lambda () (llm/complete "Hello")))
```

## Generic Retry

### `retry`

Retries a thunk on failure with exponential backoff. Takes a thunk and an optional options map.

```sema
;; Default: 3 attempts, 100ms base delay, 2.0 backoff
(retry (lambda () (http/get "https://example.com")))

;; Custom options
(retry (lambda () (http/get "https://example.com"))
  {:max-attempts 5 :base-delay-ms 200 :backoff 1.5})
```

Options:

| Key              | Type    | Default | Description                        |
| ---------------- | ------- | ------- | ---------------------------------- |
| `:max-attempts`  | integer | 3       | Maximum number of attempts         |
| `:base-delay-ms` | integer | 100     | Initial delay between retries (ms) |
| `:backoff`       | float   | 2.0     | Backoff multiplier                 |

> **Note:** `retry` is in the stdlib (not LLM-specific) — it works with any function.

## LLM Convenience Functions

### `llm/summarize`

Summarize text using an LLM. Takes text and an optional options map.

```sema
(llm/summarize "Long article text here...")
(llm/summarize "Long text" {:model "claude-haiku-4-5-20251001" :max-tokens 200})
```

### `llm/compare`

Compare two texts using an LLM. Takes two strings and an optional options map.

```sema
(llm/compare "Text A" "Text B")
(llm/compare "Text A" "Text B" {:model "claude-haiku-4-5-20251001"})
```

---

---
url: 'https://sema-lang.com/docs/llm/embeddings.md'
---

# Embeddings & Similarity

Generate vector embeddings from text and compute similarity between them.

On startup `(llm/auto-configure)` picks an embedding provider by **precedence** — `JINA_API_KEY`, then `VOYAGE_API_KEY`, then `COHERE_API_KEY`; if none is set it falls back to `OPENAI_API_KEY` (`text-embedding-3-small`). The first key present wins.

## Configuration

### `llm/configure-embeddings`

Configure a dedicated embedding provider separately from the chat provider — so you can use one provider for chat and another for embeddings.
Pass `:default-model` to pick the model (otherwise each provider uses its default: `jina-embeddings-v3`, `voyage-3`, or `text-embedding-3-small`):

```sema
(llm/configure-embeddings :voyage
  {:api-key (env "VOYAGE_API_KEY") :default-model "voyage-3-large"})

;; OpenAI-compatible embedding provider, with a model and optional base URL
(llm/configure-embeddings :openai
  {:api-key (env "OPENAI_API_KEY") :default-model "text-embedding-3-large"})
```

## Generating Embeddings

### `llm/embed`

Generate an embedding for a string or a list of strings. Returns a **bytevector** containing densely-packed f64 values in little-endian format. This representation is 2× more memory efficient and 4× faster for similarity computations compared to a list of floats.

```sema
;; Single embedding (returns a bytevector)
(define v1 (llm/embed "hello world"))

;; Pick the model per call with an options map
(llm/embed "hello world" {:model "text-embedding-3-small"})

;; Batch embeddings
(llm/embed ["cat" "dog" "fish"])  ; => list of bytevectors
```

## Embedding Accessors

### `embedding/length`

Returns the number of dimensions (f64 elements) in an embedding bytevector.

```sema
(define v (llm/embed "hello"))
(embedding/length v)  ; => 1024 (depends on provider)
```

### `embedding/ref`

Access a specific dimension by index.

```sema
(define v (llm/embed "hello"))
(embedding/ref v 0)  ; => 0.0123 (first dimension)
```

### `embedding/->list`

Convert an embedding bytevector to a list of floats (useful for interop).

```sema
(define v (llm/embed "hello"))
(embedding/->list v)  ; => (0.0123 -0.0456 ...)
```

### `embedding/list->embedding`

Convert a list of numbers to an embedding bytevector.

```sema
(define v (embedding/list->embedding '(0.1 0.2 0.3)))
(embedding/length v)  ; => 3
```

## Computing Similarity

### `llm/similarity`

Compute cosine similarity between two embedding vectors. Returns a value between -1.0 and 1.0. Accepts both bytevectors (fast path) and lists of floats (backward compatible).

```sema
(define v1 (llm/embed "hello world"))
(define v2 (llm/embed "hi there"))
(llm/similarity v1 v2)  ; => 0.87 (cosine similarity)

;; Also works with plain lists
(llm/similarity '(0.1 0.2 0.3) '(0.4 0.5 0.6))
```

## Reranking

### `llm/rerank`

Reorder a list of candidate documents by their relevance to a query using a hosted **cross-encoder** reranker (Cohere, Jina, or Voyage — the same **API key** you already use for embeddings, e.g. `COHERE_API_KEY` / `JINA_API_KEY` / `VOYAGE_API_KEY`; see [Supported Embedding Providers](#supported-embedding-providers) below for setup).

Where `llm/similarity` / `vector-store/search` embed the query and documents *independently* (a bi-encoder), a reranker reads the query and each document *together*, so it's far more precise. The standard pattern is to retrieve a generous shortlist by vector search, then rerank it to the best few.

```sema
(llm/rerank "how do I read a file?"
            (list "vectors are cool" "use file/read to read a file" "unrelated trivia")
            {:top-k 2})
;; => ({:index 1 :score 0.91 :document "use file/read to read a file"} ...)
```

Returns `{:index :score :document}` maps, highest relevance first; `:index` points back into the input list. Options: `:top-k`, `:model`, and `:provider` (`:cohere` / `:jina` / `:voyage`). See the **[RAG guide](/docs/llm/rag)** for the full retrieve → rerank → answer pipeline.

## Token Counting

### `llm/token-count`

Estimate the number of tokens in a string or list of strings. Uses a heuristic (chars/4) — no tokenizer dependency required.

```sema
(llm/token-count "hello world")        ; => 3
(llm/token-count '("hello" "world"))   ; => sum of individual counts
```

### `llm/token-estimate`

Returns a detailed estimate map with the token count and the estimation method used.

```sema
(llm/token-estimate "hello world")
; => {:method "chars/4" :tokens 3}
```

## Supported Embedding Providers

| Provider | Env Variable     |
| -------- | ---------------- |
| Jina     | `JINA_API_KEY`   |
| Voyage   | `VOYAGE_API_KEY` |
| Cohere   | `COHERE_API_KEY` |
| OpenAI   | `OPENAI_API_KEY` |

See [Provider Management](./providers.md) for the full provider capability table.

---

---
url: 'https://sema-lang.com/docs/llm/vector-store.md'
---

# Vector Store & Math

## In-Memory Vector Store

Sema includes an in-memory vector store for semantic search over embeddings. Create named stores, add documents with embeddings and metadata, and search by cosine similarity. Stores can optionally be persisted to disk as JSON.

### `vector-store/create`

Create a named in-memory vector store. Returns the store name.

```sema
(vector-store/create "my-store")
```

### `vector-store/open`

Open a named store backed by a file. If the file exists, its contents are loaded; otherwise an empty store is created. The path is remembered for subsequent `vector-store/save` calls.

```sema
(vector-store/open "my-store" "embeddings.json")
```

### `vector-store/add`

Add a document with an ID, embedding (bytevector), and metadata map.

```sema
(vector-store/add "my-store" "doc-1" (llm/embed "Hello world") {:source "greeting.txt" :page 1})
```

If a document with the same ID exists, it is replaced.

### `vector-store/search`

Search by cosine similarity. Takes store name, query embedding, and k (number of results). Returns a list of maps with `:id`, `:score`, and `:metadata`.

```sema
(vector-store/search "my-store" (llm/embed "Hi there") 5)
;; => ({:id "doc-1" :score 0.92 :metadata {:source "greeting.txt" :page 1}} ...)
```

### `vector-store/delete`

Delete a document by ID. Returns `#t` if found, `#f` otherwise.

```sema
(vector-store/delete "my-store" "doc-1")  ; => #t
```

### `vector-store/count`

Return the number of documents in a store.

```sema
(vector-store/count "my-store")  ; => 42
```

### `vector-store/save`

Save a store to disk as JSON. If the store was opened with `vector-store/open`, the path is used automatically. Otherwise, pass a path explicitly.

```sema
;; Explicit path
(vector-store/save "my-store" "embeddings.json")

;; Implicit path (if opened with vector-store/open)
(vector-store/save "my-store")
```

The file format is a JSON document with base64-encoded embeddings and full metadata, portable across platforms.

## Vector Math

These functions operate on embedding bytevectors (packed f64 arrays in little-endian format, as returned by `llm/embed` or `embedding/list->embedding`).

### `vector/cosine-similarity`

Cosine similarity between two embedding vectors. Returns a float between -1.0 and 1.0.

```sema
(vector/cosine-similarity
  (embedding/list->embedding '(1.0 0.0))
  (embedding/list->embedding '(0.0 1.0)))
; => 0.0
```

### `vector/dot-product`

Dot product of two embedding vectors.

```sema
(vector/dot-product
  (embedding/list->embedding '(1.0 2.0 3.0))
  (embedding/list->embedding '(4.0 5.0 6.0)))
; => 32.0
```

### `vector/normalize`

Return a unit-length copy of the vector.

```sema
(vector/normalize (embedding/list->embedding '(3.0 4.0)))
;; => embedding with values (0.6 0.8)
```

### `vector/distance`

Euclidean distance between two embedding vectors.

```sema
(vector/distance
  (embedding/list->embedding '(0.0 0.0))
  (embedding/list->embedding '(3.0 4.0)))
; => 5.0
```

## Full Example

A RAG-style workflow: embed documents, store them, search semantically, and persist to disk.

```sema
;; Open a persistent store (creates file if it doesn't exist)
(vector-store/open "docs" "my-docs.json")

(define texts '("Rust is a systems language" "Python is great for ML" "Lisp is homoiconic"))
(for-each
  (lambda (text)
    (vector-store/add "docs" text (llm/embed text) {:text text}))
  texts)

;; Save to disk
(vector-store/save "docs")

;; Retrieve the most relevant chunks for a question...
(define question "Which language is homoiconic?") (define hits (vector-store/search "docs" (llm/embed question) 2)) ;; ...then generate an answer grounded in only that context (the "G" in RAG) (define context (string/join (map (lambda (h) (:text (:metadata h))) hits) "\n")) (llm/complete (prompt (system "Answer using only the provided context. Be concise.") (user (format "Context:\n~a\n\nQuestion: ~a" context question))) {:max-tokens 120}) ;; => "Lisp — it is homoiconic." ``` Next time you run, `(vector-store/open "docs" "my-docs.json")` will load the saved embeddings instantly — no re-embedding needed. ::: tip Sharpen results with a reranker Cosine `vector-store/search` has high recall but coarse ordering. For better precision, retrieve a larger shortlist and reorder it with the cross-encoder [`llm/rerank`](/docs/llm/embeddings#reranking) — the standard *retrieve-many → rerank-to-a-few* RAG move. See the **[RAG guide](/docs/llm/rag)** for the full pipeline. ::: ::: warning Use one embedding model per store Every document and the query must share the same embedding dimensions. Mixing embedding models (or providers) in one store raises a *dimension-mismatch* error at search time — so pick one embedding model per store. ::: --- --- url: 'https://sema-lang.com/docs/llm/cassettes.md' --- # Cassettes (Record & Replay) A **cassette** saves the answers from real LLM calls to a file the first time you run, then plays them back on every run after — no API key, no network, the same output every time. It's like recording a conversation once and replaying the tape. Two things this makes easy: * **Tests that don't need a key.** Record a run once, commit the file, and your `llm/complete` and `agent/run` tests run offline and give the same result forever — so they pass reliably in CI with no secrets and no flakiness. 
* **Demos and docs that always work.** A playground example or a notebook can ship its tape and render the exact same output every time, offline, with no model drift. Because the saved answer keeps its real token counts, cost and budget tracking keep working on replay too — so even cost tests become repeatable. ## Quick start ```sema ;; First run: calls the real model and saves the answer to the file. ;; Every run after: plays the saved answer back — offline, identical. (llm/with-cassette "tapes/greeting.jsonl" {:mode :auto} (fn () (llm/complete "Say hello in one word." {:model "gpt-5-mini"}))) ;; => "Hello" ``` Run it once with an API key set to capture the tape, commit `tapes/greeting.jsonl`, and from then on the call is offline and deterministic. That's the whole idea. ## The three modes `:mode` decides what happens on each call: | Mode | If the call is on the tape | If it's a new call | | --- | --- | --- | | `:auto` *(default)* | play it back | call the model and record it | | `:replay` | play it back | **error** — the call wasn't recorded | | `:record` | call the model and record it | call the model and record it | `:auto` is the friendly default for writing tapes: it records what's missing and replays what it already has. `:replay` is what you want in CI — it never touches the network, and a call that isn't on the tape is a **hard error** that names the request. That error is a feature: if you change a prompt, the matching recording disappears, and the failure tells you exactly which call drifted instead of silently hitting a live model. ## What you can record Cassettes cover the everyday LLM calls. Each is matched and replayed independently: | Call | Works? 
| Notes | | --- | --- | --- | | `llm/complete`, `llm/chat` | ✅ | the answer, model, tokens, and finish reason | | `llm/extract` and structured calls | ✅ | the structured result is rebuilt from the saved answer | | `agent/run` and tool loops | ✅ | **each turn is saved separately**, so a full multi-turn run (model → tool call → result → final answer) replays exactly — your tool handlers still run on replay | | `llm/stream` (streaming) | ✅ | the text chunks are saved and replayed in order — see [Streaming](#streaming-in-detail) | | `llm/embed` (embeddings) | ✅ | the vectors are saved and replayed byte-for-byte | A note on **agents**: because each model turn is recorded on its own, your tools execute normally during replay — the cassette only stands in for the *model's* responses, not for your tool code. That's usually what you want: deterministic model output, real tool logic. ## Using cassettes ### `llm/with-cassette` — record/replay for a block The usual way: wrap the calls you want recorded in a function. The tape is saved when the block finishes, and everything goes back to normal afterwards. ```sema (llm/with-cassette "tapes/weather-agent.jsonl" {:mode :auto} (fn () (define bot (agent {:model "gpt-5-mini" :tools [get-weather]})) (agent/run bot "What's the weather in Oslo?"))) ``` The options map is optional and currently takes `:mode` (`:auto`, `:record`, or `:replay`, default `:auto`). The file — and any missing folders — is created when the tape is written. ### Turning it on by hand If your setup and teardown aren't a single block — for example in a test harness or a notebook — use the imperative trio: ```sema (llm/cassette-load "tapes/suite.jsonl" {:mode :replay}) ; turn it on ;; ... run many calls ... (llm/cassette-save) ; write the tape to disk (returns #t if a cassette is active) (llm/cassette-eject) ; write the tape and turn it back off ``` `llm/cassette-load` installs the cassette for everything that follows, until you eject it. 
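To make the three modes concrete, the sketch below models a cassette as a small in-memory Python class. It is a toy illustration of the rules in the modes table, not Sema's implementation: `ToyCassette`, `CassetteMiss`, `fake_model`, and the SHA-256 fingerprint over a JSON dump are all assumptions made for the example.

```python
import hashlib
import json


class CassetteMiss(Exception):
    """Raised in replay mode when a call was never recorded."""


class ToyCassette:
    """Hypothetical in-memory model of the :auto / :replay / :record rules."""

    def __init__(self, mode="auto"):
        if mode not in ("auto", "replay", "record"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.entries = {}  # fingerprint -> recorded answer

    @staticmethod
    def fingerprint(request):
        # Assumption: hash the meaningful inputs in a stable key order.
        blob = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def call(self, request, live_model):
        key = self.fingerprint(request)
        if self.mode != "record" and key in self.entries:
            return self.entries[key]  # on the tape: play it back
        if self.mode == "replay":
            # Not on the tape: a hard error instead of a live call.
            raise CassetteMiss(f"cassette miss in replay mode: {request!r}")
        answer = live_model(request)  # auto (new call) or record (always)
        self.entries[key] = answer  # assumption: re-recording overwrites
        return answer


live_calls = []


def fake_model(request):
    """Stand-in for a real provider call; counts how often it is hit."""
    live_calls.append(request)
    return "Hello"


request = {"model": "gpt-5-mini", "messages": ["Say hello in one word."]}
tape = ToyCassette("auto")
tape.call(request, fake_model)  # first run: live call, recorded
tape.call(request, fake_model)  # second run: replayed from the tape
print(len(live_calls))          # 1
```

Switching the same object to `"replay"` and changing the request raises `CassetteMiss`, which mirrors the "fail loudly in CI" behavior the modes section describes.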
### Forcing replay across a whole run (CI) Two environment variables turn on a cassette for an entire process, so a whole suite — or a whole notebook — runs offline without changing any code: ```bash SEMA_LLM_CASSETTE=tapes/suite.jsonl \ SEMA_LLM_CASSETTE_MODE=replay \ sema test/agents.sema ``` `SEMA_LLM_CASSETTE_MODE` is `replay`, `record`, or `auto` (default `auto`). This is ignored under `--sandbox`, since it reads and writes a file. A common CI pattern: record tapes locally once with a key, commit them, and run the suite with `SEMA_LLM_CASSETTE_MODE=replay` so any un-recorded call fails loudly. ## Streaming in detail Streaming hands you the answer in pieces — *chunks* — as the model generates them, by calling a function you pass for each piece (a typing effect, a progress bar, live output). A cassette records **the exact sequence of chunks**, then on replay feeds those same chunks to your callback in the same order. So a streaming UI behaves identically offline: ```sema ;; Record once, then replay forever — the chunks arrive the same way both times. (llm/with-cassette "tapes/story.jsonl" {:mode :auto} (fn () (llm/stream "Tell me a two-line story." (fn (chunk) (display chunk)) ; called once per recorded chunk, in order {:model "gpt-5-mini"}))) ``` Things worth knowing about streamed replay: * **Boundaries are preserved.** If the recording arrived as `"Hel" "lo"`, replay calls your function with `"Hel"` then `"lo"` — not one combined `"Hello"`. Code that depends on chunking sees the same shape. * **Replay is instant.** The chunks are delivered as fast as your callback accepts them; the original network timing between chunks is *not* reproduced. Replay is for determinism, not for re-simulating latency. * **The full answer is saved too.** Alongside the chunks, the complete text, model, and token counts are recorded — so cost tracking and `llm/last-usage` work on a replayed stream just like a normal call. 
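The replay behavior listed above can be modeled in a few lines of Python. This is only a sketch of the idea, assuming a recorded entry with the `content` and `chunks` fields shown in the tape format; `replay_stream` is a hypothetical helper, not a Sema function.

```python
import json

# One recorded stream entry, trimmed to the fields this sketch uses.
RECORDED = json.loads(
    '{"v": 1, "kind": "stream", "content": "Hello", "chunks": ["Hel", "lo"]}'
)


def replay_stream(record, on_chunk):
    """Feed the recorded chunks to the callback in order; return the full text."""
    for chunk in record["chunks"]:
        on_chunk(chunk)  # same boundaries as the original recording, no delay
    return record["content"]


seen = []
full_text = replay_stream(RECORDED, seen.append)
print(seen)       # ['Hel', 'lo']
print(full_text)  # Hello
```

The callback sees `"Hel"` and then `"lo"`, never a merged `"Hello"`, while the complete text is still available for anything that needs the whole answer.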
If you only print the chunks (no callback), `llm/stream` writes to stdout; recording and replay work the same way. ## Embeddings in detail `llm/embed` returns vectors (as bytevectors). A cassette saves those vectors and returns them verbatim on replay — so similarity scores, vector-store contents, and any math built on them are exactly reproducible offline: ```sema (llm/with-cassette "tapes/embeddings.jsonl" {:mode :auto} (fn () (define v (llm/embed "semantic search query" {:model "text-embedding-3-small"})) (vector/cosine-similarity v (llm/embed "another phrase")))) ``` Both `llm/embed` calls are recorded (keyed by their text), so the similarity number is identical every run. Batch embeddings — passing a list of strings — are saved as a set of vectors and replayed in order. ## Using cassettes in notebooks Cassettes are a great fit for [notebooks](../notebook): record the LLM cells once with a key, commit the tape next to the `.sema-nb` file, and the notebook re-runs the same way forever — offline, for anyone, in CI. There are two clean patterns. ### A setup cell that turns it on Put one cell near the top of the notebook that loads a cassette; every LLM cell after it records or replays automatically (cells in a notebook share one environment): ```sema ;; Cell 1 — setup (llm/cassette-load "tapes/notebook.jsonl" {:mode :auto}) ``` ```sema ;; Cell 2 — a normal LLM cell; recorded on first run, replayed after (llm/complete "Summarize the Sema language in one sentence." {:model "gpt-5-mini"}) ``` ```sema ;; Last cell — flush the tape so the recording is written (llm/cassette-save) ``` Run the notebook once with a key to capture `tapes/notebook.jsonl`, commit it alongside the notebook, and every later run (including a headless `sema notebook run`) replays it. 
### Force replay for the whole notebook To guarantee a notebook never calls a model — say when you publish it or run it in CI — run it with the environment variable set, no edits required: ```bash SEMA_LLM_CASSETTE=tapes/notebook.jsonl \ SEMA_LLM_CASSETTE_MODE=replay \ sema notebook run my-notebook.sema-nb ``` Any cell that makes a call not on the tape fails with a clear "cassette miss", so a stale notebook can't quietly reach for a live model. > **Tip:** keep tapes next to what they belong to — `tapes/` beside a test, or beside the > `.sema-nb` — and commit them. They're plain text and diff cleanly, so a reviewer can see > exactly how the recorded model output changed when you re-record. ## How it works with the rest of Sema A cassette slots in just above the real model and below everything else, so it composes instead of conflicting: * **Cost & budgets.** A replayed answer keeps its real token counts, so `llm/last-usage`, `llm/session-usage`, and budget limits all behave as if the call really happened. This is different from a [cache](./caching) hit, which reports **zero** usage (no call was made); a replay stands in for a real call, so it reports the real spend. * **Tracing.** A replayed call still produces its [OpenTelemetry](./observability) trace, with the recorded model and token counts — so replayed runs show up in your traces just like live ones. * **The response cache.** `llm/with-cassette` turns the in-memory response [cache](./caching) off for its block, so the cache can't answer before the tape does. You generally want one or the other, not both. * **Retries & fallback.** While *recording*, the normal [retry and fallback](./resilience) logic wraps the real call, so the tape captures the final successful answer. On replay there's nothing to retry. ## What's in the file A tape is plain text — **NDJSON**, one JSON object per line — so it's diffable, appendable, and reviewable in a pull request. 
There's one line per saved call, and the `kind` field says what it is: ```jsonl {"v":1,"kind":"complete","key":"a1b2…","content":"Hello","model":"gpt-5-mini","prompt_tokens":12,"completion_tokens":1} {"v":1,"kind":"stream","key":"c3d4…","content":"Hi there","model":"gpt-5-mini","chunks":["Hi"," there"],"completion_tokens":2} {"v":1,"kind":"embed","key":"e5f6…","model":"text-embedding-3-small","embeddings":[[0.01,-0.02,0.03]]} ``` Only the **answer** is saved, looked up by a fingerprint (`key`) of the request. The prompt text, your API key, and any headers are **never written to the file** — they simply aren't part of what gets saved, so a tape is safe to commit. The `v` field is a format version, there so old tapes can be migrated if the shape ever changes. ### What counts as "the same call" Two calls match if their meaningful inputs are the same — the model, the system prompt, the temperature, and the messages. Change any of those and it's a different call: in `:replay` mode you get a clear "not recorded" error, which is exactly what flags a prompt or model change. Things that don't affect the answer — request IDs, timing, your API key — are not part of the fingerprint. ## Recipes * **Record once, replay in CI.** Run the suite locally with a key and `:mode :auto` (or `:record`) to capture tapes, commit them, then run CI with `SEMA_LLM_CASSETTE_MODE=replay`. New or changed calls fail loudly. * **Update a tape after a prompt change.** Delete the tape (or the affected line) and re-run in `:auto`, or run that block in `:record` once. Commit the new tape; the diff shows how the model's answer changed. * **A reproducible demo.** Wrap the demo's LLM calls in `llm/with-cassette … {:mode :replay}` and ship the tape, so it runs for anyone with no key. ## Good to know * **Re-record after changes.** Change a prompt, model, or temperature and the old tape no longer matches — re-record it (`:record`, or delete the file and run `:auto`). 
* **One answer per call.** The first recorded answer for a given call is the one replayed. * **Replay needs no provider.** In `:replay` mode nothing calls a model, so a cassette works with no API key configured at all. * **Cassette miss?** A "cassette miss in :replay mode" error means this exact call wasn't recorded. Either the request changed (re-record it) or you're replaying a call you never captured — switch that block to `:auto` to record it, then commit the updated tape. --- --- url: 'https://sema-lang.com/docs/llm/observability.md' --- # Tracing & Metrics Sema can record what happens inside every LLM and agent run — each model call, tool execution, retry, and notebook cell — as [OpenTelemetry](https://opentelemetry.io/) traces and metrics, and send them to a tool where you can browse them. You don't write any instrumentation: switch it on with one environment variable and `llm/complete`, `agent/run`, `llm/embed`, and the rest are recorded automatically. If OpenTelemetry is new to you, the terms used below: * **OpenTelemetry (OTel)** is an open, vendor-neutral standard for traces and metrics. * A **trace** is one run. It is made of **spans** — individual timed operations such as a single LLM call or a tool execution. Spans nest, so an agent run appears as a tree. * **OTLP** is the network protocol OTel uses. Sema speaks OTLP, so it works with any tool that accepts it — a free local viewer like [Jaeger](https://www.jaegertracing.io/), or a hosted service like [Langfuse](https://langfuse.com/), Grafana, or Datadog. * Sema follows the OTel [GenAI semantic conventions](https://github.com/open-telemetry/semantic-conventions-genai) — the agreed attribute names for LLM telemetry (token counts, model, cost, …) — so these tools understand the data with no per-tool glue. 
Grafana, Jaeger, SigNoz, OpenObserve, Datadog, Honeycomb, Logfire, MLflow, and others read it as-is; a few LLM-specific tools (Arize Phoenix, Langfuse, …) need one extra setting — see [Backend Compatibility](./otel-compat). Tracing is **off by default** — if you don't point Sema at a backend or a file, it records nothing. And once it's on, a slow or unreachable backend can never block, delay, or crash your script: telemetry is sent in the background, out of the way of your run. ## How to turn it on Sema reads its tracing settings from **environment variables** — values you set in your shell. You can set them inline for a single command, or `export` them for the whole session: ```bash # Inline — applies to this one run only: OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sema myscript.sema # Or exported — applies to every command in this shell session: export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 sema myscript.sema ``` Two variables decide *where* the data goes, and **setting either one turns tracing on**: * `OTEL_EXPORTER_OTLP_ENDPOINT` — send to a backend over the network (Jaeger, Langfuse, …). * `SEMA_OTEL_FILE` — write to a local file instead (handy with no backend at all). Set neither and tracing stays off. The full list of variables is in [Configuration reference](#configuration-reference) below. ## Quick start: see a trace in one minute [Jaeger](https://www.jaegertracing.io/) is a free trace viewer that runs in a single container — a good way to see your first trace. ```bash # 1. Start Jaeger. The UI is on port 16686; it accepts traces on 4318. docker run --rm -d --name jaeger -p 4318:4318 -p 16686:16686 \ -e COLLECTOR_OTLP_ENABLED=true jaegertracing/all-in-one # 2. Point Sema at it and run something. No model is pinned here, so this uses # your default provider and its default model — just make sure an API key is # set (ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY / …). 
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \ sema -e '(llm/complete "say hi" {:max-tokens 16})' ``` Open `http://localhost:16686` in your browser, pick the **sema** service, and you'll see one trace whose `chat` span carries the provider, model, input/output token counts, cost, and finish reason. > **Choosing a specific model.** The example uses whichever provider is active. To pick > one, select it first: `(llm/set-default :openai)` then `{:model "gpt-5-mini"}`, or > `(llm/set-default :anthropic)` then `{:model "claude-haiku-4-5-20251001"}`. A model id > only works with the provider that offers it — sending an OpenAI model id to Anthropic > returns a 404. ## Configuration reference Every setting is an environment variable (see [How to turn it on](#how-to-turn-it-on) for how to set them). The `OTEL_*` names come from OpenTelemetry itself; the `SEMA_OTEL_*` names are Sema conveniences. | Variable | What it does | | --- | --- | | `OTEL_EXPORTER_OTLP_ENDPOINT` | The address of your tracing backend, e.g. `http://localhost:4318`. **Setting this turns tracing on.** | | `OTEL_EXPORTER_OTLP_PROTOCOL` | How to talk to it: `http/protobuf` (default) · `http/json` · `grpc`. Keep the default unless your backend only accepts gRPC. | | `OTEL_EXPORTER_OTLP_HEADERS` | Extra HTTP headers, usually authentication — e.g. `Authorization=Bearer <token>`. Comma-separated `name=value` pairs; see [Authentication headers](#authentication-headers). | | `OTEL_EXPORTER_OTLP_TIMEOUT` | Per-export timeout in milliseconds. Keep it short (e.g. `3000`) so a dead backend never holds things up. | | `OTEL_SERVICE_NAME` | The name your runs appear under in the backend (default `sema`). | | `SEMA_OTEL_FILE` | Write traces to this file path, one JSON object per line, instead of sending them over the network. Also turns tracing on. | | `SEMA_OTEL_ENVIRONMENT` | A label such as `prod` or `staging` for filtering (recorded as `deployment.environment.name`). 
| | `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` | Set to `true` to also record the **prompt and response text** (off by default — see [Privacy](#privacy)). Sema also accepts the shorter alias `SEMA_OTEL_CAPTURE_CONTENT`. | | `OTEL_BSP_MAX_QUEUE_SIZE`, `OTEL_BSP_MAX_EXPORT_BATCH_SIZE`, `OTEL_BSP_SCHEDULE_DELAY` | Advanced: tune the background export batching. The defaults are fine for most uses. | Sema can send over HTTP or gRPC; choose with `OTEL_EXPORTER_OTLP_PROTOCOL`. HTTP (the default) is what most backends accept — only switch to gRPC if yours requires it. ### Writing to a file instead of a backend No backend running? Set `SEMA_OTEL_FILE` and Sema writes each finished span to a file as one JSON object per line: ```bash SEMA_OTEL_FILE=/tmp/sema-trace.jsonl \ sema -e '(llm/complete "ping" {:max-tokens 16})' cat /tmp/sema-trace.jsonl | jq . ``` The file is written synchronously, so even a one-line script captures its spans. ## Authentication headers Almost every **hosted** backend needs an API key, and you pass it as an HTTP header through `OTEL_EXPORTER_OTLP_HEADERS`. (This is separate from `SEMA_OTEL_COMPAT`, which only relabels attribute names — see [Backend Compatibility](./otel-compat).) The header **name** and the key are dictated by the backend, not by Sema; always check the tool's own OTLP page for the exact names. ### The format `OTEL_EXPORTER_OTLP_HEADERS` is a comma-separated list of `name=value` pairs — the [W3C Baggage](https://www.w3.org/TR/baggage/) format the OpenTelemetry spec mandates: ```bash # one header OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer sk-abc123" # two headers — separate with a comma OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer sk-abc123,x-project=my-app" ``` Rules worth knowing: * **Separate multiple headers with commas**, not semicolons — semicolons are not supported. * **The first `=` splits the name from the value**, so the value itself may contain `=`. 
Base64 strings with `=` padding (common in Basic auth) work fine. * **Avoid literal commas or spaces inside a value** — a comma starts a new header. If a token genuinely needs one, percent-encode it (`,` → `%2C`). Bearer tokens and base64 never contain commas, so this rarely comes up. * **Quote the whole value in your shell** so `$(...)` substitutions and special characters survive. ### Common patterns | Auth style | `OTEL_EXPORTER_OTLP_HEADERS` value | Example tools | | --- | --- | --- | | Bearer token | `Authorization=Bearer <token>` | Braintrust, Lunary, LangSmith | | Basic auth | `Authorization=Basic <base64(id:secret)>` | Langfuse, W\&B Weave | | Vendor key header | `x-portkey-api-key=<key>` · `dd-api-key=<key>` | Portkey, Datadog | ### Building a Basic-auth header Basic auth wants base64 of `id:secret`. Build it with `base64`, strip the line breaks that GNU `base64` inserts into long output (it wraps at 76 characters, which would split the header for a long key pair), and read the keys from environment variables rather than hard-coding them. For [Langfuse](https://langfuse.com/): ```bash export OTEL_EXPORTER_OTLP_ENDPOINT="https://cloud.langfuse.com/api/public/otel" export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" | base64 | tr -d '\n')" sema myagent.sema ``` ### Tools that need more than one header Some backends need a second (or third) header to route the trace to the right project or workspace — the auth key alone isn't enough. The exact names come from each tool's OTLP docs: | Tool | `OTEL_EXPORTER_OTLP_HEADERS` value | | --- | --- | | HoneyHive | `Authorization=Bearer <api-key>,x-honeyhive=project:<project>` | | W\&B Weave | `Authorization=Basic <base64>,project_id=<entity>/<project>` | | Maxim | `x-maxim-api-key=<api-key>,x-maxim-repo-id=<repo-id>` | | Opik | `Authorization=<api-key>,projectName=<project>,Comet-Workspace=<workspace>` | Several LLM-focused backends can also show **richer** detail with one extra compatibility setting on top of the auth header — see [Backend Compatibility](./otel-compat).
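The format rules from this section (comma-separated pairs, split on the first `=`, percent-encoded commas) can be sketched as a small Python parser. This is an illustration of the stated rules under the assumption that values are percent-decoded after splitting; it is not Sema's own parser, and `parse_otlp_headers` and `basic_auth_value` are hypothetical helper names.

```python
import base64
from urllib.parse import unquote


def parse_otlp_headers(raw):
    """Parse a comma-separated list of name=value pairs into a dict."""
    headers = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        # Split on the FIRST '=' only, so '=' padding in the value survives.
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in header pair: {pair!r}")
        # Assumption: percent-encoded characters (e.g. %2C) are decoded here.
        headers[name.strip()] = unquote(value.strip())
    return headers


def basic_auth_value(key_id, secret):
    """Build the value of an Authorization header for Basic auth."""
    token = base64.b64encode(f"{key_id}:{secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"  # b64encode never inserts line breaks


parsed = parse_otlp_headers("Authorization=Bearer sk-abc123,x-project=my-app")
print(parsed["x-project"])  # my-app
```

Because the split uses only the first `=`, a value such as the output of `basic_auth_value(...)` passes through intact even when it ends in `=` padding.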
## What gets traced | Span | Kind | Name | When | | --- | --- | --- | --- | | LLM call | `CLIENT` | `chat {model}` | every non-streaming completion (including cache hits) | | Embeddings | `CLIENT` | `embeddings {model}` | every `llm/embed` | | Tool call | `INTERNAL` | `execute_tool {name}` | every tool dispatch in an agent loop | | Agent run | `INTERNAL` | `invoke_agent {name}` | every `agent/run` / tools-enabled completion | | Notebook run | `INTERNAL` | `notebook.run_all` → `notebook.cell {id}` | a notebook "Run All" (one trace, one child span per cell) | | Retry | `INTERNAL` | `llm.retry_attempt` | each HTTP retry (429 / 5xx / network), nested under the LLM span | Each LLM span carries the standard GenAI attributes: `gen_ai.operation.name`, `gen_ai.provider.name`, `gen_ai.request.model` / `gen_ai.response.model`, `gen_ai.usage.input_tokens` / `output_tokens`, prompt-cache token counts, `gen_ai.response.finish_reasons`, and the computed cost (`gen_ai.usage.cost`, plus `gen_ai.usage.cost_usd`). Cache hits are flagged with `sema.gen_ai.cache.hit`. Tool spans carry `gen_ai.tool.name` / `gen_ai.tool.call.id` / `gen_ai.tool.type`. ### Sessions and users (grouping multi-turn runs) Every span carries a `gen_ai.conversation.id`, generated per run or supplied by you. For tools that group by session (such as Langfuse), Sema also emits `session.id` and `user.id`, so the turns of one conversation appear together: ```sema (agent/run bot "what is 2 + 3?" {:session-id "chat-42" :user-id "alice"}) (agent/run bot "now add 10" {:session-id "chat-42" :user-id "alice"}) ;; both runs appear under one session "chat-42", attributed to alice ``` `agent/run`, `llm/chat`, and `llm/complete` accept `:conversation-id`, `:session-id`, and `:user-id`. If you omit `:session-id` it defaults to the conversation id; a standalone completion gets a fresh conversation id automatically. 
### Metrics When you export over a network endpoint, Sema also records two standard GenAI metric histograms: * `gen_ai.client.token.usage` — token counts (dimension `gen_ai.token.type` = `input` or `output`). * `gen_ai.client.operation.duration` — call latency in seconds. > Cache hits report zero usage by design (no provider call was made), so token metrics > undercount real spend when caching is in play. ## Adding your own spans The `llm/*` and `agent/*` calls are traced for you. When you build your *own* abstraction — a custom RAG loop, a batch job, a provider Sema doesn't ship — these builtins let it emit first-class spans too. Every one is a **no-op when tracing is off**, so they are safe to leave in, and they never change your program's return value. ### Generic spans ```sema ;; with-span runs the body inside a named span carrying an attribute map, ends it on exit ;; (Error status if the body throws), and returns the body's value. Use {} for no attrs. (with-span "ingest-batch" {:batch.size 100} (otel/event "started" {}) (process-batch)) ``` The underlying builtin is `(otel/span name thunk attrs)`; `with-span` is the ergonomic macro over it. Any LLM/tool spans created inside nest beneath it. ### Annotate the current span ```sema (otel/set-attribute :http.status 200) ; one attribute on the innermost span (otel/set-attributes {:rows 42 :cache.hit true}) (otel/set-status :ok) ; or (otel/set-status :error "upstream timeout") (otel/event "cache-miss" {:key "user:42"}) ; a point-in-time event ``` Attribute values keep their type — integers, floats, and booleans render as numbers/bools in the backend, not strings. ### Typed spans (render like the built-ins) For work that *is* an LLM call, tool, or retrieval — but that you implement yourself — use the typed helpers. They set `gen_ai.operation.name` and, when `SEMA_OTEL_COMPAT` is set, the backend-native span-kind, so a custom pipeline classifies in Phoenix/Traceloop/Langfuse exactly like the built-in `llm/*` spans. 
```sema ;; A custom LLM/generation call (a provider Sema doesn't natively support): (otel/llm-span {:model "custom-model" :provider "myco" :operation "chat"} (lambda () (let ((resp (my-http-llm-call prompt))) ;; Account tokens + cost on the span — same gen_ai.usage.* keys as the built-ins. (otel/llm-usage {:input-tokens 120 :output-tokens 30 :cost-usd 0.001}) resp))) ;; A user-built retrieval step (first-class RETRIEVER span): (otel/retrieval-span "vector-search" (lambda () (search index query)) {:top-k 5}) ;; A user tool: (otel/tool-span "lookup-weather" (lambda () (weather city))) ``` ### Grouping into sessions `with-session` groups every span started in its body under a session id (and optional user), filling Langfuse **Sessions/Users** for non-agent code: ```sema (with-session "chat-42" {:user "alice"} (llm/complete "...") ; inherits session chat-42, user alice (my-custom-pipeline)) ``` | Form | What it does | | --- | --- | | `(with-span name attrs body…)` / `(otel/span name thunk attrs)` | Generic span around a block. | | `(otel/set-attribute key value)` / `(otel/set-attributes map)` | Set attribute(s) on the innermost active span. | | `(otel/set-status :ok)` / `(otel/set-status :error msg)` | Set the innermost span's status. | | `(otel/event name attrs-map)` | Point-in-time event on the current span. | | `(otel/llm-span config thunk)` + `(otel/llm-usage usage-map)` | Typed LLM/generation span + token/cost accounting. | | `(otel/tool-span name thunk [attrs])` | Typed TOOL span. | | `(otel/retrieval-span name thunk [attrs])` | Typed RETRIEVER span. | | `(with-session id config body…)` / `(otel/with-session id [config] thunk)` | Group spans into a session/user. | ## Privacy Prompt and response **text** is never recorded unless you explicitly set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true`. Token counts, model names, cost, and timing carry no message text and are always exported. 
When content capture is on, long messages are truncated to keep span sizes reasonable. ## Embedding Sema in a Rust application When Sema runs as a library inside your own program, it **never installs a global tracer provider on its own** — that is the host application's job. You choose how it connects to telemetry with `InterpreterBuilder::with_telemetry(mode)`: ```rust use sema::InterpreterBuilder; use sema_otel::TelemetryMode; // Emit against the provider your application already installed in `opentelemetry::global`. let interp = InterpreterBuilder::new() .with_telemetry(TelemetryMode::UseHostGlobal) .build(); ``` | `TelemetryMode` | Behavior | | --- | --- | | `Off` (default) | No telemetry; never touches any global state. | | `UseHostGlobal` | Emit against the global provider your app already installed (silent no-op if there is none). | | `OwnProvider(p)` | Emit against a provider you hand to Sema; installs **no** global provider. | | `FromEnv` | Self-install from the `OTEL_*` / `SEMA_OTEL_FILE` variables. The provider is owned by the built `Interpreter` and flushes when it is dropped. If your app already runs OpenTelemetry, prefer `UseHostGlobal` or `OwnProvider`. | Sema's spans automatically nest under whatever span is current (`opentelemetry::Context::current()`), so a host request span becomes the parent of Sema's `invoke_agent → chat / execute_tool` tree. `Interpreter::new()` and `build()` with the default `Off` never touch global OpenTelemetry state. --- --- url: 'https://sema-lang.com/docs/llm/otel-compat.md' --- # Backend Compatibility By default Sema labels its telemetry with the [OpenTelemetry GenAI semantic conventions](https://github.com/open-telemetry/semantic-conventions-genai) — the standard `gen_ai.*` attribute names. Tools that follow that standard understand Sema's traces with no extra configuration. 
A handful of popular LLM-observability tools don't read `gen_ai.*` — they look for their own attribute names instead, so a Sema span can show up in them as "unknown" or with blank fields. For those tools, set the `SEMA_OTEL_COMPAT` environment variable to a **compatibility mode** — a short name such as `openinference` or `langfuse` that tells Sema which extra attribute names to write *alongside* the standard ones. Nothing about your program changes — it's still the same automatic tracing, just labelled so more tools can read it. This is purely additive: the standard `gen_ai.*` attributes are always present; `SEMA_OTEL_COMPAT` only adds extra copies under other names. Read the [Tracing & Metrics](./observability) page first for how tracing works and how to point Sema at a backend — this page only covers the per-tool labelling. ## Which tools need a compatibility mode This section covers the tools that can **receive** OpenTelemetry traces over OTLP. Most of them read the standard `gen_ai.*` attributes and need **no** compatibility mode; only a few key off their own attribute names and need a `SEMA_OTEL_COMPAT` mode. (Tools that ingest only through their own SDK, sit in front of your calls as a proxy, or run offline evaluations can't receive an OTLP push at all — see [Tools you can't send traces to](#tools-you-can-t-send-traces-to).) ::: tip "No compatibility mode" is not the same as "no setup." Almost every **hosted** service still needs its own authentication header — an API key passed through `OTEL_EXPORTER_OTLP_HEADERS`, exactly as shown for Langfuse on the [Tracing & Metrics](./observability#sending-to-hosted-langfuse) page. That auth header is a property of the *backend*, not a Sema compatibility mode. The tables below say only whether a tool needs a `SEMA_OTEL_COMPAT` mode to *understand* Sema's attributes — the column for "does it need an API key" is "almost always, if it's hosted". 
::: ### Reads `gen_ai.*` — no compatibility mode needed **General trace viewers and APM platforms.** These store and display `gen_ai.*` as ordinary span attributes (the LLM-specific ones also build GenAI dashboards from them): | Tool | Self-hostable? | Notes | | --- | --- | --- | | Grafana / Tempo, [Jaeger](https://www.jaegertracing.io/) | yes | plain OpenTelemetry trace viewers | | [SigNoz](https://signoz.io/) | yes | OTLP on 4317 / 4318 | | [OpenObserve](https://openobserve.ai/) | yes | OTLP `/api/{org}/v1/traces` *(verified live)* | | Honeycomb, Elastic | partly | general OTel APM | | [Logfire](https://pydantic.dev/logfire) | no | Pydantic's OTel platform | | [Datadog](https://www.datadoghq.com/) LLM Observability | no | maps `gen_ai.*` (semconv 1.37+) natively; needs a Datadog API-key header | | [Dynatrace](https://www.dynatrace.com/) | no | maps `gen_ai.*` natively; needs a Grail (DPS) licence + an ingest token | | [Coralogix](https://coralogix.com/) AI Center | no | maps `gen_ai.*`, but needs account-side setup (S3-archive routing + the experimental-semconv opt-in) | | [New Relic](https://newrelic.com/) | no | accepts OTLP and stores `gen_ai.*` as raw attributes; native GenAI dashboards are not documented | **LLM-native platforms.** These parse `gen_ai.*` into structured LLM records on their own OTLP endpoint (all hosted ones need an API key/header): | Tool | Self-hostable? 
| OTLP endpoint / notes | | --- | --- | --- | | [OpenLIT](https://openlit.io/) | yes | OTel-native; `docker compose up -d`; OTLP on 4318, no auth by default | | [MLflow](https://mlflow.org/) | yes | tracking server exposes an OTLP `/v1/traces` endpoint | | [Braintrust](https://www.braintrust.dev/) | no | maps `gen_ai.*` to structured fields; API key required (see the optional `braintrust` mode below) | | [W\&B Weave](https://wandb.ai/) | no | `…/otel/v1/traces`; parses `gen_ai.*`; Basic-auth + `project_id` header *(verified in docs)* | | [Portkey](https://portkey.ai/) | no | `/v1/otel/v1/traces`; reads `gen_ai.*`; `x-portkey-api-key` header | | [HoneyHive](https://honeyhive.ai/) | no | `/v1/traces`; reads `gen_ai.*`; Bearer + `x-honeyhive` project header | | [Opik](https://www.comet.com/opik) (Comet) | yes | `/api/v1/private/otel` (HTTP only); API key + project/workspace headers | | [Lunary](https://lunary.ai/) | yes | `/v1/otel`; reads `gen_ai.*`; Bearer token | | [Maxim AI](https://www.getmaxim.ai/) | no | `/v1/otel`; reads `gen_ai.*` / `llm.*` / `ai.*`; `x-maxim-*` headers | | [PostHog](https://posthog.com/) | yes | `/i/v0/ai/otel`; maps `gen_ai.*` → `$ai_*` events; project token | | [FutureAGI](https://futureagi.com/) | no | native convention is `gen_ai.*` (+ `fi.span.kind`); OpenInference is only an optional output mode | | [Laminar](https://www.lmnr.ai/) | yes | parses `gen_ai.*` (+ its own `lmnr.*`); HTTP + gRPC; API key | | [Agenta](https://agenta.ai/) | yes | translates `gen_ai.*` into its own `ag.*`; HTTP/protobuf only; API key | | [Confident AI](https://www.confident-ai.com/) | no | Observatory endpoint reads `gen_ai.*` (+ `confident.*`); API key — this is DeepEval's backend | | [Patronus AI](https://docs.patronus.ai/) | no | OTLP gRPC; ingests standard OTel spans; `x-api-key` header | | [Promptfoo](https://www.promptfoo.dev/) | local | built-in OTLP receiver (port 4318) **while `promptfoo eval` runs**; no token | ### Needs a compatibility mode These 
ingest OTLP but key off their **own** attribute names, so without the matching `SEMA_OTEL_COMPAT` mode a Sema span shows up with blank or "unknown" fields: | Tool | `SEMA_OTEL_COMPAT` mode | What it adds | | --- | --- | --- | | [Arize Phoenix](https://phoenix.arize.com/), [Arize AX](https://arize.com/) | `openinference` | span types, model/provider, tokens, cost, message I/O, tool args + schemas | | [Langfuse](https://langfuse.com/) | `langfuse` | observation type/model, usage + cost detail, trace-level input/output, tags | | [Traceloop](https://www.traceloop.com/) / OpenLLMetry | `traceloop` | span types, entity input/output, indexed message keys, tool functions | | [LangSmith](https://www.langchain.com/langsmith) | `langsmith` | run types, session threading, tags/metadata | | [Braintrust](https://www.braintrust.dev/) | `braintrust` *(optional)* | adds the richer `braintrust.*` tags/metadata/scores (it already reads `gen_ai.*` without it) | > **Often grouped with OpenLLMetry, but actually `gen_ai.*`-native:** Laminar, LangWatch, > Agenta and FutureAGI are sometimes listed as "Traceloop-compatible". In practice they read > `gen_ai.*` directly (Agenta and FutureAGI translate it into their own namespace), so they > need **no** compatibility mode — they're in the table above. The OpenLLMetry SDK works with > them because *it too* emits `gen_ai.*`, not because they parse the `traceloop.*` namespace. > **Advertise OTel but unconfirmed:** Galileo, PromptLayer, Keywords AI, Arthur AI, and > [LangWatch](https://langwatch.ai/) accept OTLP or claim OpenTelemetry support, but their > docs don't pin down which attributes they surface from a raw push. They may well work — > send a trace with the standard setup and check whether your spans appear. ## Setting `SEMA_OTEL_COMPAT` It's an environment variable like the others (see [How to turn it on](./observability#how-to-turn-it-on)). 
Its value is a comma-separated list of compatibility modes — the lower-case names from the table above:

```bash
# Just Phoenix:
SEMA_OTEL_COMPAT=openinference sema myagent.sema

# Phoenix and Langfuse at once:
SEMA_OTEL_COMPAT=openinference,langfuse sema myagent.sema

# Every mode at once — useful if you're not sure which backend you'll use:
SEMA_OTEL_COMPAT=all sema myagent.sema
```

Accepted modes: `openinference` (also `phoenix`, `arize`), `traceloop` (also `openllmetry`), `langsmith`, `langfuse`, `braintrust`, and `all`. Names Sema doesn't recognise are ignored, so a typo won't break anything (the misspelled mode is simply not applied).

Some of the added detail — message text, tool arguments and results, and the trace-level input/output summary — is **content**, so it only appears when you also turn on content capture with `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` (see [Privacy](./observability#privacy)). Token counts, models, cost, and span types are always added.

When `SEMA_OTEL_COMPAT` is unset, no extra attributes are written — the traces are exactly what you get on the [Tracing & Metrics](./observability) page.

## Per-tool setup

### Arize Phoenix (OpenInference)

Phoenix is an open-source LLM trace viewer that runs in one container:

```bash
# Start Phoenix. UI on 6006; it accepts traces on 6006 (HTTP) and 4317 (gRPC).
docker run -d --name phoenix -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

SEMA_OTEL_COMPAT=openinference \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:6006 \
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true \
sema -e '(llm/complete "say hi" {:max-tokens 16})'
```

Open `http://localhost:6006`. Each Sema span is typed (`LLM` / `TOOL` / `AGENT` / `EMBEDDING`) and shows the model, provider, token counts, cost, the message I/O, and — for agent runs — tool arguments, results, and the tool schemas offered to the model.

### Langfuse

Langfuse already reads several of Sema's standard attributes (cost and message I/O).
The `langfuse` value fills in the rest — the observation type and model, the usage/cost detail objects, and the trace-level input/output summary: ```bash SEMA_OTEL_COMPAT=langfuse \ OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:3000/api/public/otel" \ OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic " \ OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true \ sema myagent.sema ``` (See the [Langfuse example](./observability#sending-to-hosted-langfuse) for how to build the auth header.) Multi-turn runs group into [Sessions](./observability#sessions-and-users-grouping-multi-turn-runs) via the `:session-id` and `:user-id` options. ### Traceloop (OpenLLMetry) Traceloop is mainly a hosted product, but it reads plain OTLP, so you can also view the output in any OTLP backend (such as SigNoz). `SEMA_OTEL_COMPAT=traceloop` adds the `traceloop.span.kind` and `traceloop.entity.*` attributes, the indexed per-message keys, and the advertised tool functions. (You only need this for Traceloop's own platform — the look-alikes Laminar, LangWatch and Agenta read `gen_ai.*` directly.) ### LangSmith Point Sema at LangSmith's OTLP endpoint with your API key and `SEMA_OTEL_COMPAT=langsmith`; this adds LangSmith's run types, session threading, and tags/metadata, which are needed for those features (`gen_ai.*` alone can't populate them). LangSmith is primarily hosted, but Enterprise self-hosted deployments expose their own OTLP endpoint too. ### Braintrust Braintrust reads the standard attributes, so it works with no value set. Add `braintrust` only if you want its native `braintrust.tags` and `braintrust.metadata` fields. 
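The mode names and aliases used throughout the per-tool sections above can be pictured as a small parser. The Rust sketch below is a hypothetical illustration, not Sema's actual implementation: `CompatMode` and `parse_compat` are invented names, and trimming whitespace and ignoring letter case are assumptions. It models only the documented behaviour: a comma-separated list, the listed aliases, `all`, and unknown names being skipped.

```rust
use std::collections::BTreeSet;

/// Hypothetical enum for illustration only; not a type from Sema's crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CompatMode {
    OpenInference,
    Traceloop,
    LangSmith,
    Langfuse,
    Braintrust,
}

const ALL_MODES: [CompatMode; 5] = [
    CompatMode::OpenInference,
    CompatMode::Traceloop,
    CompatMode::LangSmith,
    CompatMode::Langfuse,
    CompatMode::Braintrust,
];

/// Parse a `SEMA_OTEL_COMPAT`-style value into a set of modes.
pub fn parse_compat(raw: &str) -> BTreeSet<CompatMode> {
    let mut modes = BTreeSet::new();
    for name in raw.split(',') {
        // Assumption: surrounding whitespace and letter case are not significant.
        match name.trim().to_ascii_lowercase().as_str() {
            "openinference" | "phoenix" | "arize" => {
                modes.insert(CompatMode::OpenInference);
            }
            "traceloop" | "openllmetry" => {
                modes.insert(CompatMode::Traceloop);
            }
            "langsmith" => {
                modes.insert(CompatMode::LangSmith);
            }
            "langfuse" => {
                modes.insert(CompatMode::Langfuse);
            }
            "braintrust" => {
                modes.insert(CompatMode::Braintrust);
            }
            "all" => modes.extend(ALL_MODES),
            // Unknown names (including typos) are skipped rather than rejected.
            _ => {}
        }
    }
    modes
}
```

Using a set means that aliases such as `phoenix,arize` collapse into a single mode, and a misspelled name contributes nothing, which matches the "a typo won't break anything" behaviour described earlier.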
## Span-type mapping How each Sema span is labelled for each tool when its compat value is on: | Sema span | OpenInference | Traceloop | LangSmith | Langfuse | | --- | --- | --- | --- | --- | | `chat` | `LLM` | `task` | `llm` | `generation` | | `embeddings` | `EMBEDDING` | `task` | `embedding` | `generation` | | `execute_tool` | `TOOL` | `tool` | `tool` | `span` | | `invoke_agent` | `AGENT` | `agent` | `chain` | `span` | | `retrieve` (vector search) | `RETRIEVER` | `workflow` | `retriever` | `span` | | `rerank` | `RERANKER` | `workflow` | `chain` | `span` | | notebook cell / retry | `CHAIN` | `workflow` | `chain` | `span` | ## Tags, metadata & streaming TTFT With a compat mode on, three more things are filled in automatically — all behind the same `SEMA_OTEL_COMPAT` switch, so a plain OTel backend stays lean. **Auto-tags.** Every LLM span is tagged with its `operation:…`, `provider:…`, and `model:…`, plus `cache-hit` on a cache-served response. These land on `langfuse.trace.tags`, `braintrust.tags`, and `langsmith.span.tags`. **Your own tags & metadata.** Pass `:tags` (a list) and `:metadata` (a map) to `llm/complete`, `llm/chat`, `llm/stream`, or `agent/run`. Your tags are merged with the auto-tags; metadata fans out to each backend's native field (`langfuse.trace.metadata.*`, `langsmith.metadata.*`, `traceloop.association.properties.*`, `braintrust.metadata`). ```sema (llm/complete "Summarize this." {:max-tokens 100 :tags ["prod" "summarizer"] :metadata {:env "prod" :feature "digest"}}) ``` **Streaming time-to-first-token.** A streamed call records how long the first token took. It's always on the span as `sema.gen_ai.server.time_to_first_token` (seconds) + `sema.gen_ai.is_streaming`, and with compat on it also fills Langfuse's `completion_start_time` (which drives its TTFT column). Traceloop/OpenLLMetry gets the boolean `gen_ai.is_streaming` (OpenLLMetry has no per-span TTFT attribute — it tracks streaming latency as a histogram metric instead). 
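One way to picture the time-to-first-token measurement is a wrapper around the token stream that notes when the first item arrives. This is a minimal Rust sketch under stated assumptions, not Sema's instrumentation: `FirstTokenTimer` is a hypothetical name, the stream is modelled as a plain iterator, and the clock starts when the wrapper is created.

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper: records how long the first item of a stream took.
pub struct FirstTokenTimer<I> {
    inner: I,
    started: Instant,
    first_token_after: Option<Duration>,
}

impl<I: Iterator> FirstTokenTimer<I> {
    pub fn new(inner: I) -> Self {
        Self {
            inner,
            started: Instant::now(),
            first_token_after: None,
        }
    }

    /// Seconds until the first token, or `None` if no token has arrived yet.
    pub fn time_to_first_token_secs(&self) -> Option<f64> {
        self.first_token_after.map(|d| d.as_secs_f64())
    }
}

impl<I: Iterator> Iterator for FirstTokenTimer<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.inner.next();
        // Only the very first successful item sets the measurement.
        if item.is_some() && self.first_token_after.is_none() {
            self.first_token_after = Some(self.started.elapsed());
        }
        item
    }
}
```

The value is captured once and never overwritten, so later tokens do not change it; a stream that ends without producing anything leaves the measurement unset.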
Two more identity fields ride along when their backend is active: LangSmith's `langsmith.trace.session_id` (it ignores the standard `session.id`), and Langfuse's `langfuse.release` from `SEMA_OTEL_RELEASE`. **Per-direction cost split (OpenInference).** Alongside the combined `llm.cost.total`, chat spans also carry `llm.cost.prompt` and `llm.cost.completion`, so Phoenix/Arize show the prompt-vs-completion cost breakdown. Derived from Sema's in-SDK cost computation. **Embedding detail (OpenInference).** Embeddings spans carry `embedding.model_name`, and — when [content capture](#limitations) is enabled — the input texts at `embedding.embeddings.{i}.embedding.text` (capped per call). Raw vectors are never emitted. ## Tools you can't send traces to Some LLM tools collect data a different way — through their own client SDK, by sitting in front of your API calls as a proxy, or by running offline evaluations — rather than by receiving OpenTelemetry traces. Sema's OTLP export can't feed those; to use one, follow its own integration guide instead. The main categories: * **Proxies / gateways** — capture by routing your model calls through them, not by accepting traces: [Helicone](https://www.helicone.ai/), [LiteLLM](https://litellm.ai/), [Pezzo](https://pezzo.ai/). (Portkey is *not* here — its observability endpoint accepts OTLP and reads `gen_ai.*`; see the table above.) * **SDK-only platforms** — ingest only through their own Python/JS library, with no OTLP trace endpoint: [Vellum](https://www.vellum.ai/), [Athina AI](https://athina.ai/), [Parea AI](https://www.parea.ai/), [Nebuly](https://www.nebuly.com/). * **Evaluation-only** — offline scoring/testing, not a runtime trace receiver: [RAGAS](https://docs.ragas.io/), [UpTrain](https://uptrain.ai/), [Evidently AI](https://www.evidentlyai.com/), [Giskard](https://www.giskard.ai/), [TruLens](https://www.trulens.org/). 
* **Guardrails libraries** that *emit* telemetry rather than receive it: [NVIDIA NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails), [Guardrails AI](https://www.guardrailsai.com/).
* **Has an OTLP endpoint, but needs attributes Sema doesn't emit** — [Fiddler AI](https://www.fiddler.ai/) accepts OTLP/HTTP, but requires its own `fiddler.span.type` and `application.id` on every span; without them spans are dropped, and Sema has no Fiddler compatibility mode to add them.

> Several tools that *used* to be SDK-only or eval-only now run an OTLP endpoint — Opik,
> Lunary, PostHog, Maxim, Promptfoo, Patronus and Confident AI are all in the supported
> tables above. [Humanloop](https://humanloop.com/) has gone the other way: its team joined
> Anthropic and the platform was sunset in September 2025, so it's no longer an integration
> target. If a tool listed above later adds an OTLP endpoint that reads the GenAI conventions, Sema
> works with it the same as the others — no change needed on Sema's side.

## Limitations

* **Message content requires the opt-in flag.** The message I/O, tool arguments and results, and the trace-level input/output only appear when `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true`. Token counts, models, cost, and span types are always added.
* **OpenInference has no separate tool-result field** — the result appears in the tool span's `output.value` rather than a dedicated attribute.
* **A backend may re-derive cost** from the token counts on its side rather than reading Sema's `gen_ai.usage.cost`, so the figure it shows can differ from Sema's exact per-call cost (which accounts for cache pricing).
* **Proxies and gateways can't receive traces.** Helicone, LiteLLM and Pezzo capture data by routing your model calls through them, not by accepting an OTLP push — use their own integration instead.
* **Not yet implemented:** the per-message *indexed* attribute form some older Traceloop/LangSmith parsers expect (Sema emits the structured and entity forms today). * **More attributes per span.** Compat adds extra copies of each value. If you only use a plain OTel backend, leave `SEMA_OTEL_COMPAT` unset to keep spans lean. --- --- url: 'https://sema-lang.com/docs/llm/rag.md' description: >- Build a retrieval-augmented generation pipeline in Sema — embeddings, vector search, cross-encoder reranking, and a grounded answer. --- # RAG: retrieve, rerank, answer Retrieval-Augmented Generation (RAG) answers a question by first **finding the relevant documents** and then asking the model to answer **using only those documents**. It's how you get grounded, citable answers over a corpus the model was never trained on — your docs, your codebase, your knowledge base. Sema has the whole pipeline as first-class primitives: | Step | Primitive | What it does | | --- | --- | --- | | **Embed** | `llm/embed` | Turn text into vectors (a *bi-encoder* — query and document embedded independently) | | **Retrieve** | `vector-store/*` | Cosine nearest-neighbour search over those vectors | | **Rerank** | `llm/rerank` | A *cross-encoder* reorders the candidates by reading query + document together | | **Answer** | `llm/complete` | Generate an answer grounded in the top reranked documents | The recipe everyone converges on is **retrieve many, rerank to a few**. Vector search has high recall but coarse ordering — because the query and each document are embedded *separately*, the score can't model how they interact. A reranker reads them *together*, so it's far more precise. You retrieve a generous shortlist by cosine (say top 12), then let the reranker pick the best 4. This guide builds a working example that indexes Sema's **own builtin documentation** and answers "which function do I use?" questions. 
The full file is [`examples/llm/rag-docs-search.sema`](https://github.com/HelgeSverre/sema/blob/main/examples/llm/rag-docs-search.sema). ## Setup You need two kinds of key in your environment: * An **embedding + rerank** provider: `JINA_API_KEY`, `VOYAGE_API_KEY`, or `COHERE_API_KEY`. All three offer both embeddings and reranking from the same key, and Sema auto-configures them on startup. * A **chat** provider for the final answer: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, … ## 1. Index the corpus Read each document, embed it, and add the vector to a store. Two things make this fast and cheap: * **Batch the embeddings.** `llm/embed` takes a *list* and returns a list of vectors, so you embed many documents per network call. * **Cache the store to disk.** `vector-store/open` loads a saved store if the file exists (and starts empty otherwise), so you only pay to index once. ```sema (define (build-index!) (let* ((files (file/glob "crates/sema-docs/entries/stdlib/**/*.md")) (docs (map (lambda (p) {:name (path/stem p) :path p :text (string/take (file/read p) 900)}) ; bounds-safe truncate files)) ;; Embed in batches: list/chunk splits into 64s, flat-map runs one call per ;; batch and flattens the per-batch vector lists back into one. (vecs (flat-map (lambda (b) (llm/embed b)) (list/chunk 64 (map (lambda (d) (:text d)) docs))))) ;; map walks docs + vecs in lockstep — Scheme-style multi-list map. (map (lambda (doc vec) (vector-store/add "docs" (:name doc) vec doc)) docs vecs) (vector-store/save "docs"))) (vector-store/open "docs" "/tmp/sema-docs.vec") (when (= (vector-store/count "docs") 0) (build-index!)) ``` This leans entirely on stdlib: `string/take` truncates safely (no manual length check), `list/chunk` batches a list into fixed-size groups, `flat-map` maps-then-flattens, and `map` walks several lists in lockstep. We store the whole document map as the vector's metadata (the 4th argument to `vector-store/add`) — that's how we get the text back at query time. ## 2. 
Retrieve Embed the question and pull a generous shortlist by cosine similarity. The query embedding must come from the same model as the stored vectors — which it does, since we use the same configured provider. ```sema (define question "How do I read a file from disk and split it into lines?") (define query-vec (llm/embed question)) (define candidates (vector-store/search "docs" query-vec 12)) ``` Each candidate is a map `{:id :score :metadata}`, sorted by cosine score. ## 3. Rerank Pull the document text out of each candidate's metadata and let the cross-encoder reorder them to the best 4: ```sema (define candidate-texts (map (lambda (c) (:text (:metadata c))) candidates)) (define reranked (llm/rerank question candidate-texts {:top-k 4})) ``` `llm/rerank` returns `{:index :score :document}` maps, highest relevance first. `:index` points back into the list you passed in, so you can recover the original candidate (and its id/metadata): ```sema (for-each (lambda (r) (let ((name (:id (nth candidates (:index r)))) (score (math/round-to (:score r) 3))) (println f" ${score} ${name}"))) reranked) ``` ``` 0.467 file-read-lines 0.304 read-line 0.293 file-for-each-line 0.239 io-read-line ``` The reranker pushed `file-read-lines` to the top — exactly the function the question is about. Treat the scores as an *ordering*, not a calibrated probability: they're query-dependent, so 0.47 isn't "twice as relevant" as 0.24. ## 4. Answer Concatenate the top documents and instruct the model to answer using only them: ```sema (define context (string/join (map (lambda (r) (nth candidate-texts (:index r))) reranked) "\n\n---\n\n")) (define prompt f"Using ONLY the Sema documentation below, answer the question and name the exact functions to call.\n\nDOCS:\n${context}\n\nQUESTION: ${question}") (println (llm/complete prompt {:max-tokens 400})) ``` > **Reading a File as Lines** — Use `file/read-lines` to read a file from disk and get back a list of lines directly. 
For large files, use `file/for-each-line` to iterate without loading everything into memory. Grounded, correct, and citable — the model only saw the four documents the reranker chose. ## Choosing a reranker `llm/rerank` uses your configured rerank provider; override per call with `:provider` (a keyword like `:cohere` or the string `"cohere"`) and `:model`. | Provider | `:provider` | Default model | Billing | | --- | --- | --- | --- | | Cohere | `:cohere` | `rerank-v3.5` | per search (flat per call) | | Jina | `:jina` | `jina-reranker-v2-base-multilingual` | per token | | Voyage | `:voyage` | `rerank-2.5` | per token | ```sema (llm/rerank query docs {:top-k 5 :provider :voyage :model "rerank-2.5"}) ``` Override `:model` for a newer version, multilingual support, or to trade cost for quality — check the provider's docs for current model names. ## Scores, top-k, and cost **Scores rank; they don't threshold.** The `:score` is the provider's raw relevance score — query-dependent, *not* a calibrated probability, and on a different scale for each of Cohere, Jina, and Voyage. Use scores to order results *within a single rerank call*; don't compare them across providers or read a fixed cutoff as meaningful. If everything comes back with uniformly low scores, that's a signal the query and corpus don't match — not that the reranker failed. **Choosing top-k.** `:top-k` is how many of the best documents to keep — typically **3–10** for RAG, sized so the kept documents fit comfortably in the answer prompt's context window. Omit `:top-k` to rerank and return *all* documents in relevance order. **Cost and latency scale with the candidate set.** A reranker scores *every* candidate against the query, so cost and latency grow with the number of candidates and their length (Cohere bills per search, Jina/Voyage per token). That's why you **retrieve-then-rerank** — pull a shortlist with cheap vector search (say top-20), then rerank to the top-k — instead of reranking the whole corpus. 
Reranking is a *refinement* on a shortlist, never a standalone search over everything. ## Error handling ```sema (try (llm/rerank query candidates {:top-k 5}) (catch e (println "rerank failed:" (:message e)))) ``` * An **empty document list** returns `()` immediately, with no API call. * **API / network / rate-limit / invalid-model** failures raise a `SemaError` (catch with `try`). * An **unknown `:provider`** — or no rerank provider configured at all — raises a "rerank provider not found" error. Set `COHERE_API_KEY` / `JINA_API_KEY` / `VOYAGE_API_KEY`, or pass `:provider` explicitly. ## Observability With [OpenTelemetry](/docs/llm/observability) on and a compat backend selected, the retrieve and rerank steps emit OpenInference `RETRIEVER` and `RERANKER` spans — the reranker span carries the model name, `top-k`, and (with [content capture](/docs/llm/otel-compat) enabled) the reordered documents and their scores. A full RAG trace renders natively in Phoenix/Arize alongside the embedding and chat spans, which makes "why did this answer cite the wrong doc?" debuggable end to end. ## When do you actually need a reranker? Reranking is the highest-leverage, lowest-effort quality lever in RAG — but it isn't free, so it's worth knowing when it pays off. The honest test is to **A/B it**: measure answer quality (and added latency) with retrieve-only vs. retrieve-then-rerank on your own queries. **Reach for it when** cosine top-k returns *roughly* relevant results but the *ordering* is off, your documents are long or ambiguously worded (where a bi-encoder's single vector blurs detail a cross-encoder recovers), or you retrieve a large shortlist and need to trim it to what fits the prompt. **You can skip it when** the corpus is small, your embedding model already nails ordering for your queries, or context-window space isn't a constraint — `vector-store/search` alone is a complete retriever. 
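To make the retrieve half of retrieve-then-rerank concrete, here is a minimal Rust sketch of cosine top-k search over an in-memory list of vectors. It illustrates the general technique only; it is not how `vector-store/search` is implemented, and `cosine` and `top_k` are hypothetical names chosen for this example.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every document against the query and keep the `k` best, highest first.
pub fn top_k<'a>(
    query: &[f32],
    docs: &'a [(&'a str, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(id, vector)| (*id, cosine(query, vector)))
        .collect();
    // Descending by score; total_cmp gives a total order even if a NaN appears.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Each document is scored independently of the others and of the query text, which is why this step is cheap but coarse: the ordering it produces is the shortlist a reranker would then refine.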
--- --- url: 'https://sema-lang.com/docs/internals/architecture.md' --- # Architecture Overview Sema is a Lisp with first-class LLM primitives, implemented in Rust. All code runs on a single evaluator: a [bytecode VM](./bytecode-vm.md). The runtime is single-threaded (`Rc`, not `Arc`), with deterministic destruction via reference counting instead of a garbage collector. The entire implementation is ~116k lines of Rust across 15 crates, each with a clear responsibility and strict dependency ordering. ## Crate Map ``` ┌──────────────────────────────────────┐ │ sema │ │ (binary: CLI, REPL, embedding API) │ └──┬─────────────────┬─────────────────┘ │ │ ┌─────────────▼──┐ ┌────▼─────┐ │ sema-notebook │ │ sema-eval│ │ notebook UI + ├────────►│ macros + │ │ server │ │ modules │ └────────────────┘ └─┬───┬──┬─┘ │ │ │ ┌────────────────▼┐ │ ┌▼──────────────┐ │ sema-stdlib │ │ │ sema-llm │ │ native fns │ │ │ LLM providers │ └────────┬────────┘ │ │ + embeddings │ │ │ └───────┬───────┘ │ ┌────▼─────┐ │ │ │ sema-vm │ │ │ │ bytecode │ │ │ │ VM │ │ │ │(evaluator)│ │ │ └────┬─────┘ │ │ │ │ ┌────▼───────────▼──┐ │ │ sema-reader │ │ │ lexer/parser │ │ └────────┬──────────┘ │ │ │ ┌────────▼───────┐ │ │ sema-core │◄────────┘ │ Value, Env, │ │ SemaError │ └────────────────┘ ``` **Dependency flow:** `sema-core ← sema-reader ← sema-vm ← sema-eval ← sema` — with `sema-eval` also pulling in `sema-stdlib` (to register builtins) and `sema-llm`, both of which depend only on `sema-core` (plus `sema-reader` for stdlib). The critical constraint: **sema-stdlib and sema-llm depend on sema-core, not on sema-eval.** This avoids circular dependencies but creates a problem — both crates sometimes need to evaluate user code. 
They solve it via dependency inversion: * **sema-stdlib** invokes the real evaluator via callbacks (`call_callback`/`eval_callback`) registered by `sema-eval` at startup — stored on the `EvalContext` and a shared thread-local stdlib context * **sema-llm** mostly uses the same core callbacks, but still carries a redundant second eval callback of its own (tech debt — see [Solution 2](#solution-2-eval-callback-sema-llm-redundant-slated-for-removal)) This is discussed in detail in [The Circular Dependency Problem](#the-circular-dependency-problem). ### Crate Responsibilities | Crate | Role | Key types | | --------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | | **sema-core** | Shared types | `Value` (NaN-boxed 8-byte), `Env`, `SemaError`, string interner, `NativeFn`, `Lambda`, `Macro`, `Record`, LLM types | | **sema-reader** | Parsing | `Lexer` (24 token types) + recursive descent `Parser` → `Value` AST + `SpanMap` | | **sema-vm** | Bytecode VM (the evaluator) | `CoreExpr`, `ResolvedExpr`, `Op`, `Chunk`, `Emitter` — lowering, resolution, compilation, VM dispatch | | **sema-eval** | Macro expansion + module loading | Macro expander (VM-native), module system (`import`/`load`), prelude, eval/call callback wiring; drives the VM. 
*Not* a standalone evaluator — the VM is the sole evaluator | | **sema-stdlib** | Standard library | Native functions across a comprehensive standard library | | **sema-llm** | LLM integration | `LlmProvider` trait, native providers (Anthropic, OpenAI, Gemini, Ollama), OpenAI-compatible shim, embedding providers, cost tracking | | **sema-otel** | Observability | OpenTelemetry facade (spans/metrics for LLM, agent, tool, and notebook runs); depends only on `sema-core`; native-only (no-op on wasm32) | | **sema-lsp** | Language Server | LSP via tower-lsp: completions, hover, go-to-definition, references, rename, semantic tokens, diagnostics | | **sema-dap** | Debug Adapter | DAP server: breakpoints, stepping, stack traces, variable inspection via VM debug hooks | | **sema-fmt** | Formatter | Code formatter for `.sema` files (`sema fmt`) | | **sema-notebook** | Notebook interface | `.sema-nb` JSON format, evaluation engine, HTTP server with REST API, embedded browser UI, Markdown export | | **sema-wasm** | WASM bindings | Browser playground bindings, JS interop via `wasm-bindgen` | | **sema-mcp** | MCP server | Model Context Protocol server exposing Sema eval/build/notebook tools (`sema mcp`) | | **sema-docs** | Doc generation (internal) | Builtin-docs index generator (`make docs`); not shipped as a binary | | **sema** | Binary | clap CLI, reedline REPL (highlighter / hinter / inspector live in `crates/sema/src/repl/`), `InterpreterBuilder` embedding API | ## The Value Type All Sema data is represented by a single NaN-boxed `Value` — an 8-byte `struct Value(u64)` that encodes every type in IEEE 754 quiet NaN payload space: ```rust // crates/sema-core/src/value.rs #[repr(transparent)] pub struct Value(u64); // Encoding: floats stored as raw f64 bits. 
// All other types packed into quiet NaN payloads: // sign=1 | exponent=0x7FF | quiet=1 | TAG(6 bits) | PAYLOAD(45 bits) // // Immediate types (no heap allocation): // Nil, Bool, Char, Symbol(Spur), Keyword(Spur), IntSmall(±2^44) // // Heap types (Rc pointer in 45-bit payload): // IntBig, String, List, Vector, Map, HashMap, Lambda, Macro, NativeFn, // Prompt, Message, Conversation, ToolDef, Agent, Thunk, Record, Bytevector, // MultiMethod, Stream, F64Array, I64Array, AsyncPromise, Channel // // Pattern matching via val.view() → ValueView enum ``` ::: details The IBM 704 connection (1955) The idea of packing type information and data into a single machine word goes back to the [IBM 704](http://bitsavers.informatik.uni-stuttgart.de/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf) — the machine Lisp was born on. The 704's 36-bit word was divided into sub-fields: a 3-bit **prefix** (opcode), a 15-bit **decrement**, a 3-bit **tag** (register selector), and a 15-bit **address**. The same word could be an instruction, a fixed-point number, a floating-point number, or six BCD characters — depending entirely on context. Sema's NaN-boxing is the same fundamental idea scaled to 64 bits: 6 tag bits + 45 payload bits, where the tag determines how to interpret the payload. The 704 also pioneered the biased-exponent floating-point format (sign + 8-bit characteristic biased by +128 + 27-bit fraction) that would eventually become IEEE 754 thirty years later — the very standard whose NaN space we now exploit for type tagging. ::: Several design choices here are worth examining. ### Why `Rc`, Not `Arc` Sema is single-threaded. `Arc` adds an atomic increment/decrement on every clone/drop — unnecessary overhead when there's no cross-thread sharing. `Rc` uses ordinary (non-atomic) reference counting, which is cheaper and also means the compiler can catch accidental `Send`/`Sync` usage at compile time. 
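What plain `Rc` reference counting looks like in practice can be shown with the standard library alone. The following is a small illustrative sketch, not code from Sema: `Tracked` and `demo` are hypothetical names, and the `Drop` implementation exists only to record the moment the value is freed.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical value that flips a flag when it is destroyed.
pub struct Tracked<'a> {
    dropped: &'a Cell<bool>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns (count after clone, count after one drop,
///          freed before last drop?, freed after last drop?).
pub fn demo() -> (usize, usize, bool, bool) {
    let flag = Cell::new(false);

    let first = Rc::new(Tracked { dropped: &flag });
    let second = Rc::clone(&first); // bumps a plain (non-atomic) counter
    let after_clone = Rc::strong_count(&first);

    drop(second); // one reference left, value still alive
    let after_one_drop = Rc::strong_count(&first);
    let freed_early = flag.get();

    drop(first); // last reference gone: Drop runs right here
    (after_clone, after_one_drop, freed_early, flag.get())
}
```

Cloning copies the pointer and increments the count rather than copying the value, and the value is destroyed as soon as the count reaches zero.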
The trade-off versus a tracing garbage collector: reference counting gives deterministic destruction (values are freed the instant their last reference drops), but cannot collect cycles. In practice this is rarely a problem — Lisp closures tend to create tree-shaped reference graphs, not cycles. A lambda captures its enclosing environment, which may capture its own enclosing environment, forming a chain. Cycles are theoretically possible (e.g., named lambdas bind themselves in their own environment, and `Thunk` uses `RefCell` which could close over itself), but they don't arise in typical Sema programs. If they did, the leaked memory would be bounded by the closure's captured environment — not a growing leak.

Sema uses NaN-boxing — encoding values in the unused bits of IEEE 754 NaN representations to fit a tagged value in 8 bytes, the same technique used by Janet. This makes `Value` the same size as an `f64` or a pointer, meaning the value stack and constant pool have excellent cache locality. Heap types like `List`, `Map`, and `Lambda` add one level of `Rc` pointer indirection, with the pointer stored in the 45-bit payload field (using the 8-byte alignment guarantee to shift the pointer right by 3 bits). Small integers (up to ±2^44, roughly ±17.6 trillion), symbols, keywords, characters, booleans, and nil are all stored entirely within the 8-byte NaN-box with zero heap allocation.

### Why Vector-Backed Lists

`Value::List(Rc<Vec<Value>>)` stores list elements in a contiguous `Vec`, not a linked list of cons cells. This is a deliberate departure from traditional Lisp.

::: details Why `car` and `cdr` have those names
McCarthy's original Lisp (1958) ran on the [IBM 704](http://bitsavers.informatik.uni-stuttgart.de/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf), which packed cons cells into a single 36-bit machine word. The **address** field (bits 21-35) held a pointer to the first element; the **decrement** field (bits 3-17) held a pointer to the rest of the list.
The 704 had hardware instructions to extract these fields directly: `car` is short for "Contents of the Address part of Register" and `cdr` for "Contents of the Decrement part of Register." They were single machine instructions, not function calls. Sema keeps the names for Scheme compatibility but the implementation is completely different: `car` is a Vec index (`v[0]`) and `cdr` is a slice copy (`v[1..]`).
:::

The performance trade-offs:

| Operation                      | Vec-backed     | Cons cells      |
| ------------------------------ | -------------- | --------------- |
| Random access (`nth`)          | O(1)           | O(n)            |
| `length`                       | O(1)           | O(n)            |
| Cache locality                 | Contiguous     | Pointer-chasing |
| `cons` (prepend)               | O(n) copy      | O(1)            |
| `append`                       | O(n) copy      | O(n)            |
| Pattern matching (`car`/`cdr`) | Slice indexing | Natural         |

The performance win comes from cache locality: modern CPUs prefetch sequential memory, so iterating a `Vec` is dramatically faster than chasing pointers through a cons list. Random access and length are constant-time bonuses.

The cost is O(n) `cons` and `append`. Sema mitigates this with copy-on-write optimization (see [Performance Internals](./performance.md#_1-copy-on-write-map-mutation)): when the `Rc` refcount is 1, mutations happen in place instead of copying. In practice, most list construction uses `list`, `map`, `filter`, or `fold`, which build a new `Vec` directly, rather than repeated `cons`.

Clojure takes a third approach: persistent vectors backed by wide (32-way branching) array-mapped tries, giving effectively O(1) indexed access (O(log₃₂ n), which is ≤ 7 for any practical size) with structural sharing. Sema's approach is simpler and faster for small to medium lists, at the cost of no structural sharing.

### Why `BTreeMap` for Maps, `hashbrown` Opt-In

`Value::Map` uses `BTreeMap` (sorted, deterministic iteration order) rather than `HashMap`.
This matters for:

* **Deterministic equality:** Two maps with the same entries compare identically via `PartialEq`, and iteration order is independent of insertion order, which is important for consistent hashing and display
* **Printing:** `{:a 1 :b 2}` always prints in the same order, making test assertions reliable
* **Usable as keys:** Maps can be keys in other `BTreeMap`s because `Value` implements `Ord`. Since `Map` variants compare by sorted content, two maps with the same entries are always equal under `Ord`, regardless of construction order

For performance-critical code, `Value::HashMap` wraps `hashbrown::HashMap` (the SwissTable implementation used inside Rust's standard library). It's opt-in via `(hashmap/new)`; see the [Performance Internals](./performance.md#_5-hashbrown-hashmap) for benchmarks.

### Why `Spur` for Symbols and Keywords

`Symbol(Spur)` and `Keyword(Spur)` store interned `u32` handles rather than strings. A thread-local `lasso::Rodeo` interner maps strings to `Spur` values and back:

```rust
thread_local! {
    static INTERNER: RefCell<Rodeo> = RefCell::new(Rodeo::default());
}

pub fn intern(s: &str) -> Spur {
    INTERNER.with(|r| r.borrow_mut().get_or_intern(s))
}

pub fn with_resolved<F, R>(spur: Spur, f: F) -> R
where
    F: FnOnce(&str) -> R,
{
    INTERNER.with(|r| {
        let interner = r.borrow();
        f(interner.resolve(&spur))
    })
}
```

This makes symbol equality O(1) (integer comparison instead of string comparison) and environment lookup faster (integer keys in the env's hash map). It also means special form dispatch, the hottest path in the evaluator, compares `u32` values against pre-cached constants rather than resolving strings.

String interning is as old as Lisp itself. McCarthy's original LISP 1.5 (1962) interned atoms in the "object list" (oblist). The key difference: Sema uses a separate interner rather than pointer identity, so interning is explicit via `intern()` rather than implicit.
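The interning idea itself needs nothing beyond the standard library. The sketch below is a hand-rolled illustration, not the `lasso` API: `Sym` and `MiniInterner` are hypothetical names standing in for `Spur` and `Rodeo`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `Spur`: a small integer handle.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Sym(u32);

/// Minimal interner: string -> handle, and handle -> string.
#[derive(Default)]
struct MiniInterner {
    ids: HashMap<String, Sym>,
    names: Vec<String>,
}

impl MiniInterner {
    /// Return the existing handle for `s`, or allocate the next one.
    fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Sym(self.names.len() as u32);
        self.names.push(s.to_string());
        self.ids.insert(s.to_string(), sym);
        sym
    }

    /// Map a handle back to its text (only needed for printing and errors).
    fn resolve(&self, sym: Sym) -> &str {
        &self.names[sym.0 as usize]
    }
}
```

Interning `"define"` twice returns the same `Sym`, so comparing two symbols is a single `u32` comparison; the string is only touched again when something calls `resolve`.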
### LLM Types as First-Class Values `Prompt`, `Message`, `Conversation`, `ToolDef`, and `Agent` sit in the `Value` type at the same level as `List` and `Map`. They're not encoded as maps-with-conventions — they're distinct types with their own constructors, pattern matching, and display representations: ```sema ;; These are values, not strings or maps (define msg (message :user "Hello")) ; => (define p (prompt msg)) ; => (define conv (conversation p :model "claude-sonnet-4-6")) ; => ``` This means the type system catches errors like passing a string where a message is expected, and tools like `complete` can dispatch on the actual type rather than checking for the presence of magic keys in a map. ## Environment Model The environment is a linked list of scopes, each holding a `SpurMap` (a `hashbrown::HashMap`): ```rust pub struct Env { pub bindings: Rc>>, pub parent: Option>, pub version: Cell, } ``` The `version` counter is bumped on every mutation; the bytecode VM's per-instruction inline caches use it to detect stale global lookups. Variable lookup walks the parent chain until it finds a binding or reaches the root. This is the standard lexical scoping model — a closure captures a reference to its defining environment, and lookups resolve outward through enclosing scopes. 
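The parent-chain walk can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Sema's `Env`: the hypothetical `Scope` type uses `u32` keys in place of `Spur`, `i64` in place of `Value`, and omits the `version` counter.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical simplified scope: one map of bindings plus an optional parent.
struct Scope {
    bindings: RefCell<HashMap<u32, i64>>,
    parent: Option<Rc<Scope>>,
}

impl Scope {
    fn new(parent: Option<Rc<Scope>>) -> Rc<Scope> {
        Rc::new(Scope {
            bindings: RefCell::new(HashMap::new()),
            parent,
        })
    }

    /// Bind in the current scope only.
    fn set(&self, key: u32, val: i64) {
        self.bindings.borrow_mut().insert(key, val);
    }

    /// Walk outward through the parents and return the first match.
    fn get(&self, key: u32) -> Option<i64> {
        let mut scope = self;
        loop {
            if let Some(v) = scope.bindings.borrow().get(&key) {
                return Some(*v);
            }
            // No parent left means the name is unbound.
            scope = scope.parent.as_deref()?;
        }
    }
}
```

An inner scope sees its own bindings first and falls back to the enclosing ones, while a binding added to the inner scope shadows the outer value without modifying it.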
### Operations | Operation | Behavior | Used by | | ------------------------- | --------------------------------------------- | --------------------------- | | `get(spur)` | Walk parent chain, return first match | Variable lookup | | `set(spur, val)` | Insert in current scope | `define`, parameter binding | | `set_existing(spur, val)` | Walk chain, update where found | `set!` (mutation) | | `update(spur, val)` | Overwrite in current scope | Hot-path env reuse | | `take(spur)` | Remove from current scope, return value | COW optimization | | `take_anywhere(spur)` | Remove from any scope in chain | COW optimization | `take` and `take_anywhere` exist for the copy-on-write optimization: by *removing* a value from the environment before passing it to a function, the `Rc` refcount drops to 1, enabling in-place mutation. See [Performance Internals](./performance.md#_1-copy-on-write-map-mutation). `update` exists for the lambda environment reuse optimization: when reusing an environment across iterations of a hot loop, `update` overwrites an existing binding in place instead of going through the full insert path. See [Performance Internals](./performance.md#_2-lambda-environment-reuse). ## Error Handling `SemaError` is a `thiserror`-derived enum with 12 variants including `WithTrace` and `WithContext` wrappers: ```rust #[derive(Debug, Clone, thiserror::Error)] pub enum SemaError { Reader { message: String, span: Span }, Eval(String), Type { expected: String, got: String, got_value: Option }, Arity { name: String, expected: String, got: usize }, Unbound(String), Llm(String), Io(String), PermissionDenied { function: String, capability: String }, PathDenied { function: String, path: String }, UserException(Value), WithTrace { inner: Box, trace: StackTrace }, WithContext { inner: Box, ... 
}, } ``` ### Constructor Helpers Errors are created via constructor methods, never raw enum variants: ```rust SemaError::eval("division by zero") SemaError::type_error("int", val.type_name()) SemaError::arity("map", "2", args.len()) ``` This keeps error construction concise across all native functions and special forms. ### Lazy Stack Traces Stack traces are not captured at error creation time. Instead, the `WithTrace` wrapper is attached during error *propagation* — as an error unwinds out through a function call, it is wrapped with the current call stack: ```rust pub fn with_stack_trace(self, trace: StackTrace) -> Self { if trace.0.is_empty() { return self; } match self { SemaError::WithTrace { .. } => self, // already wrapped, don't double-wrap SemaError::WithContext { inner, hint, note } => SemaError::WithContext { inner: Box::new(inner.with_stack_trace(trace)), // wrap inside the context hint, note, }, other => SemaError::WithTrace { inner: Box::new(other), trace, }, } } ``` This avoids the cost of capturing a stack trace for errors that are caught by `try`/`catch` — only errors that propagate to the top level pay the trace cost. The idempotence check (`WithTrace { .. } => self`) prevents double-wrapping when an error passes through multiple call frames. ## Interpreter State Sema's evaluator state is held in an explicit `EvalContext` struct, defined in `sema-core/src/context.rs` and threaded through the evaluator as `ctx: &EvalContext`. Each `Interpreter` instance owns its own `EvalContext`, enabling multiple independent interpreters per thread with fully isolated state. 
### EvalContext Fields | Field | Type | Purpose | | ------------------- | ----------------------------------- | -------------------------------------------- | | `module_cache` | `RefCell>` | Loaded modules (path → exports) | | `current_file` | `RefCell>` | Stack of file paths being executed | | `module_exports` | `RefCell>>>` | Exports declared by currently-loading module | | `module_load_stack` | `RefCell>` | Cycle detection during module loading | | `call_stack` | `RefCell>` | Call frames for error traces | | `span_table` | `RefCell>` | Rc pointer address → source span | | `eval_depth` | `Cell` | Recursion depth counter | | `max_eval_depth` | `Cell` | High-water mark of eval depth | | `eval_step_limit` | `Cell` | Step limit for fuzz targets | | `eval_steps` | `Cell` | Current step counter | | `eval_deadline` | `Cell>` | Wall-clock budget (used by the notebook) | | `sandbox` | `Sandbox` | Capability sandbox | | `user_context` / `hidden_context` | `RefCell>>` | Dynamic context frames | | `context_stacks` | `RefCell>>` | Named context stacks | | `eval_fn` / `call_fn` | `Cell>` | Registered evaluator callbacks | | `interactive` | `Cell` | REPL/interactive mode flag | ### Remaining Thread-Locals Some state remains in thread-local storage — either because it's a pure performance cache or because it belongs to a subsystem that hasn't been refactored yet: | Location | Thread-local | Purpose | | ---------------------------- | ------------------- | --------------------------------------------- | | `sema-core/value.rs` | `INTERNER` | String interner (`lasso::Rodeo`) | | `sema-core/context.rs` | `STDLIB_CTX` | Shared `EvalContext` for stdlib callbacks | | `sema-eval/special_forms.rs` | `SF` | Cached `SpecialFormSpurs` (performance cache) | | `sema-llm/builtins.rs` | `PROVIDER_REGISTRY` | Registered LLM providers | | `sema-llm/builtins.rs` | `SESSION_USAGE` | Cumulative token usage | | `sema-llm/builtins.rs` | `LAST_USAGE` | Most recent completion's usage | | 
`sema-llm/builtins.rs` | `EVAL_FN` | Full evaluator callback | | `sema-llm/builtins.rs` | `SESSION_COST` | Cumulative dollar cost | | `sema-llm/builtins.rs` | `BUDGET_LIMIT` | Spending cap | | `sema-llm/builtins.rs` | `BUDGET_SPENT` | Spending against cap | | `sema-llm/pricing.rs` | `CUSTOM_PRICING` | User-defined model pricing | ### Implications for Embedding Multiple `Interpreter` instances can coexist on the same thread with fully isolated evaluator state — each has its own module cache, call stack, span table, and depth counters. The string interner (`INTERNER`) remains shared per-thread, which is correct since `Spur` handles must be consistent within a thread. LLM state (provider registry, usage tracking, budgets) is also per-thread, meaning all interpreters on the same thread share provider configuration and cost tracking. `Value` instances are not `Send` or `Sync` (they use `Rc`, not `Arc`), so interpreters cannot be moved across threads. ## WASM Support Sema compiles to WebAssembly with conditional compilation gates. The `#[cfg(not(target_arch = "wasm32"))]` attribute excludes modules that depend on OS-level capabilities: **From sema-stdlib:** * `io` — file system access (`file/read`, `file/write`, `file/fold-lines`, etc.) * `system` — process execution, environment variables, exit * `http` — HTTP client (`http/get`, `http/post`, etc.) * `terminal` — terminal control (colors, cursor, raw mode) * `kv`, `pdf`, `serial`, `server`, `sqlite` — other OS-dependent modules **From sema-eval:** * Module `import`/`load` (depends on file system) **sema-llm** is excluded entirely — LLM providers require network access. **sema-otel** compiles to no-op spans on wasm32 — the OTLP exporter and its async runtime are gated out, so tracing calls become zero-cost stubs in the browser. 
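As a rough illustration of the gating pattern, the sketch below compiles one of two function bodies depending on the target. The function names and the module list are hypothetical; only the `#[cfg(...)]` attribute mirrors what the text describes.

```rust
/// OS-dependent modules: this version exists only on non-wasm targets.
#[cfg(not(target_arch = "wasm32"))]
fn os_modules() -> &'static [&'static str] {
    &["io", "system", "http"]
}

/// On wasm32 the same name compiles to an empty list instead.
#[cfg(target_arch = "wasm32")]
fn os_modules() -> &'static [&'static str] {
    &[]
}

/// Hypothetical registration step: pure modules always, OS modules when present.
fn available_modules() -> Vec<&'static str> {
    let mut mods = vec!["math", "string", "json"];
    mods.extend(os_modules().iter().copied());
    mods
}
```

The choice is made entirely at compile time, so a wasm build carries no dead code for the excluded modules.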
The pure-computation core (arithmetic, strings, lists, maps, JSON, regex, crypto, datetime, CSV, bytevectors, predicates, math, comparison, bitwise, meta) remains available in WASM, making Sema usable as an embedded scripting language in browser-based applications. ## The LLM Subsystem ### Provider Trait All LLM providers implement a single trait: ```rust pub trait LlmProvider: Send + Sync { fn name(&self) -> &str; fn complete(&self, request: ChatRequest) -> Result; fn default_model(&self) -> &str; // Optional — defaults provided fn stream_complete(&self, request: ChatRequest, on_chunk: &mut dyn FnMut(&str) -> Result<(), LlmError>, ) -> Result { /* non-streaming fallback */ } fn batch_complete(&self, requests: Vec) -> Vec> { /* sequential fallback */ } fn embed(&self, request: EmbedRequest) -> Result { /* unsupported error */ } } ``` Note the `Send + Sync` bound — despite the single-threaded runtime, provider implementations use `tokio::runtime::Runtime::block_on` internally to run async HTTP clients. The trait itself is synchronous; async is hidden behind the provider boundary. ### Provider Registry The `ProviderRegistry` holds registered providers by name with a default provider slot and a separate embedding provider slot: ```rust pub struct ProviderRegistry { providers: HashMap>, default: Option, embedding_provider: Option, } ``` At startup, the binary crate detects available API keys and registers providers: * `ANTHROPIC_API_KEY` → Anthropic (Claude) * `OPENAI_API_KEY` → OpenAI (GPT) * `GOOGLE_API_KEY` → Gemini * `GROQ_API_KEY`, `XAI_API_KEY`, `MISTRAL_API_KEY`, `MOONSHOT_API_KEY` → OpenAI-compatible shim * Ollama (local) is always registered — no auth needed; `OLLAMA_HOST` overrides the default `http://localhost:11434` Embedding providers (Jina, Voyage, Cohere) are registered separately and selected via `(llm/set-embedding-provider)`. 
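The shape of the trait and registry can be sketched with strings in place of the real request and response types. Everything here is an assumption for illustration: `Provider`, `Echo`, and `Registry` are hypothetical names, and making the first registered provider the default is a choice of this sketch, not documented Sema behavior.

```rust
use std::collections::HashMap;

/// Hypothetical, string-only stand-in for the provider trait.
trait Provider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;

    /// Default: no native batching, so run the requests one by one.
    fn batch_complete(&self, prompts: &[&str]) -> Vec<Result<String, String>> {
        prompts.iter().map(|p| self.complete(p)).collect()
    }

    /// Default: embeddings are unsupported unless a provider overrides this.
    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        Err(format!("{} does not support embeddings", self.name()))
    }
}

/// Toy provider that only implements the required methods.
struct Echo;

impl Provider for Echo {
    fn name(&self) -> &str {
        "echo"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

#[derive(Default)]
struct Registry {
    providers: HashMap<String, Box<dyn Provider>>,
    default: Option<String>,
}

impl Registry {
    /// Assumption of this sketch: the first registered provider becomes the default.
    fn register(&mut self, p: Box<dyn Provider>) {
        let name = p.name().to_string();
        if self.default.is_none() {
            self.default = Some(name.clone());
        }
        self.providers.insert(name, p);
    }

    fn default_provider(&self) -> Option<&dyn Provider> {
        let name = self.default.as_ref()?;
        Some(self.providers.get(name)?.as_ref())
    }
}
```

A provider such as `Echo` implements only `name` and `complete`; batching and embedding fall back to the trait defaults, which is the same division between required and optional methods the trait above uses.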
### Cost Tracking Every completion records token usage in `SESSION_USAGE` and computes dollar cost via a built-in pricing table (`pricing.rs`). The `llm/with-budget` function sets a scoped spending cap: ```sema (llm/with-budget {:max-cost-usd 0.50 :max-tokens 10000} (lambda () (llm/complete "Summarize this document..."))) ;; Raises an error if cumulative cost exceeds $0.50 or 10000 tokens ``` ## Observability Tracing and metrics live in **sema-otel**, a thin facade over [OpenTelemetry](https://opentelemetry.io/). It sits *below* the subsystems it instruments — `sema-llm`, `sema-stdlib`, and `sema-notebook` all depend on it — but it itself depends only on `sema-core`, so the OpenTelemetry stack never leaks into the core types. On `wasm32` the whole crate compiles to no-op stubs (see [WASM Support](#wasm-support)). Instrumentation is automatic and follows the OpenTelemetry [GenAI semantic conventions](https://github.com/open-telemetry/semantic-conventions-genai) (`gen_ai.*` attributes): every `llm/complete` and `llm/embed` emits a `CLIENT` span; each agent run, tool dispatch, and notebook cell emits an `INTERNAL` span (`invoke_agent` → `chat` → `execute_tool`); HTTP retries nest beneath the LLM span. Token counts, model, cost, and finish reason ride along as attributes. Tracing is **off by default** and exports over OTLP (HTTP by default, gRPC optional) or to a JSONL file. When Sema is embedded as a library it **never installs a global tracer provider on its own** — that is the host's job. The host chooses the wiring through `InterpreterBuilder::with_telemetry(TelemetryMode::…)`: `Off`, `UseHostGlobal` (emit against the provider the app already installed), `OwnProvider(p)` (a provider handed to Sema), or `FromEnv` (self-install from the `OTEL_*` variables, owned by the built `Interpreter`). Sema's spans nest under whatever span is current, so a host request span becomes the parent of the `invoke_agent` tree. 
An optional `SEMA_OTEL_COMPAT` setting also writes vendor-specific attribute names for backends that don't read `gen_ai.*`. See [Tracing & Metrics](../llm/observability) and [Backend Compatibility](../llm/otel-compat) for the user-facing guide. ## The Circular Dependency Problem One layering constraint shapes how the library crates reach the evaluator. It's a textbook case of **dependency inversion**, called out here mainly because the same pattern recurs in `sema-stdlib` and `sema-llm`. ### The Problem Both `sema-stdlib` and `sema-llm` sometimes need to evaluate user code: * **sema-stdlib:** `file/fold-lines` invokes a user-provided lambda on each line. `map`, `filter`, `fold`, `for-each`, `sort` all take lambda arguments. * **sema-llm:** Tool handlers defined via `deftool` are Sema expressions that must be evaluated when an LLM invokes the tool. But the dependency already runs the *other* way: `sema-eval` depends on `sema-stdlib` and `sema-llm` so it can register their builtins at startup. If either of them depended on `sema-eval` to reach `eval_value()`, that would close a cycle — which Cargo forbids. ``` sema-eval ──depends-on──► sema-stdlib / sema-llm (to register their builtins) sema-stdlib ──CANNOT depend on──► sema-eval (would close the cycle) └── so it reaches the evaluator through a callback instead ``` The cycle is a hard Cargo rule, but its *existence* is a deliberate trade: keeping the standard library and LLM layers as separate crates (for wasm gating, compile times, and isolated testing) while letting `sema-eval` assemble a batteries-included interpreter. Merging them into one crate would remove the cycle and the callback — at the cost of that modularity. The inversion below is the cheaper trade. 
### Solution 1: Callback Architecture (sema-core + sema-stdlib)

`sema-core` defines callback storage in `context.rs` that bridges the dependency gap using dependency inversion: function-pointer slots on `EvalContext`, plus a shared thread-local context for stdlib functions that don't receive a `ctx` parameter:

```rust
pub type EvalCallbackFn = fn(&EvalContext, &Value, &Env) -> Result<Value, SemaError>;
pub type CallCallbackFn = fn(&EvalContext, &Value, &[Value]) -> Result<Value, SemaError>;

pub struct EvalContext {
    // ...
    pub eval_fn: Cell<Option<EvalCallbackFn>>,
    pub call_fn: Cell<Option<CallCallbackFn>>,
}

thread_local! {
    static STDLIB_CTX: EvalContext = EvalContext::new();
}
```

At startup, `sema-eval` registers the real evaluator and call dispatch functions (into both the interpreter's context and the shared `STDLIB_CTX`):

```rust
sema_core::set_eval_callback(&ctx, eval_value);
sema_core::set_call_callback(&ctx, call_value);
```

All stdlib higher-order functions (`map`, `filter`, `fold`, `sort-by`, `for-each`, `file/fold-lines`, etc.) invoke user-provided lambdas through `sema_core::call_callback`, which dispatches to the real evaluator:

```rust
// In sema-stdlib, e.g. map implementation
let result = sema_core::call_callback(ctx, &func, &[elem])?;
```

The `with_stdlib_ctx` function provides a shared `EvalContext` for stdlib callbacks, avoiding per-call allocation of a new context.

This is a clean dependency inversion: `sema-stdlib` depends only on the callback signature defined in `sema-core`, not on `sema-eval`. The runtime cost is one `Cell::get()` + function pointer dispatch per call, which is negligible. Unlike the previous mini-evaluator approach, this architecture uses the *same* evaluator everywhere, so all special forms, builtins, and features are available inside higher-order functions like `map` and `file/fold-lines`.
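The whole pattern fits in one file if the three crates are collapsed into three comment-separated layers. This is a hedged sketch with hypothetical names (`Ctx`, `CallFn`, `fold`, `real_call`) and `i64` in place of `Value`; only the `Cell<Option<fn>>` slot mirrors the mechanism described above.

```rust
use std::cell::Cell;

// --- "core" layer: knows only the callback signature ---
type CallFn = fn(&Ctx, i64, i64) -> Result<i64, String>;

struct Ctx {
    call_fn: Cell<Option<CallFn>>,
}

impl Ctx {
    fn new() -> Self {
        Ctx { call_fn: Cell::new(None) }
    }
}

fn set_call_callback(ctx: &Ctx, f: CallFn) {
    ctx.call_fn.set(Some(f));
}

/// One `Cell::get()` plus an indirect call; errors if nothing was registered.
fn call_callback(ctx: &Ctx, a: i64, b: i64) -> Result<i64, String> {
    match ctx.call_fn.get() {
        Some(f) => f(ctx, a, b),
        None => Err("no evaluator registered".to_string()),
    }
}

// --- "stdlib" layer: a fold that calls back into whatever was registered ---
fn fold(ctx: &Ctx, init: i64, items: &[i64]) -> Result<i64, String> {
    let mut acc = init;
    for &x in items {
        acc = call_callback(ctx, acc, x)?;
    }
    Ok(acc)
}

// --- "eval" layer: the real implementation, registered at startup ---
fn real_call(_ctx: &Ctx, a: i64, b: i64) -> Result<i64, String> {
    Ok(a + b)
}
```

Note that `fold` never names `real_call`: the "stdlib" layer compiles against the signature alone, and the "eval" layer supplies the function by calling `set_call_callback(&ctx, real_call)` before any higher-order function runs.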
### Solution 2: Eval Callback (sema-llm) — redundant, slated for removal

`sema-llm` predates Solution 1 and still carries its *own* parallel callback: a `Box<dyn Fn>` in a thread-local (`EVAL_FN`), plus a hand-rolled function-application routine (`call_value_fn`) and a degraded mini-evaluator fallback (`simple_eval`). It bridges the same gap, redundantly:

```rust
pub type EvalCallback = Box<dyn Fn(&EvalContext, &Value, &Env) -> Result<Value, SemaError>>;

thread_local! {
    static EVAL_FN: RefCell<Option<EvalCallback>> = RefCell::new(None);
}

pub fn set_eval_callback(
    f: impl Fn(&EvalContext, &Value, &Env) -> Result<Value, SemaError> + 'static,
) {
    EVAL_FN.with(|eval| {
        *eval.borrow_mut() = Some(Box::new(f));
    });
}
```

At startup, the binary crate registers the full evaluator:

```rust
sema_llm::builtins::set_eval_callback(sema_eval::eval_value);
```

When a tool handler needs to evaluate Sema code, it calls through this indirection:

```rust
fn full_eval(ctx: &EvalContext, expr: &Value, env: &Env) -> Result<Value, SemaError> {
    EVAL_FN.with(|eval_fn| {
        let eval_fn = eval_fn.borrow();
        match &*eval_fn {
            Some(f) => f(ctx, expr, env),
            None => simple_eval(expr, env), // fallback if no callback registered
        }
    })
}
```

This is the same dependency-inversion idea as Solution 1, but it should not be a *second* mechanism. `sema-llm` already uses the core `sema_core::call_callback` in a few places; the bespoke path duplicates it at ~15 call sites and, worse, `call_value_fn` re-implements function application by binding params into a plain `Env` and evaluating directly, bypassing the VM closure machinery (`run_nested_closure`) that the canonical `call_value` routes through. That means `set!`, captured upvalues, and async/yield *inside* a tool handler or streaming callback can behave differently than the same code inside a stdlib HOF. Consolidating `sema-llm` onto the core callback (and deleting `EVAL_FN`/`call_value_fn`/`simple_eval`) is tracked tech debt; see `docs/plans/2026-06-22-unify-sema-llm-eval-callback.md`.

### Why Not a Trait?
An alternative would be to define an `Evaluator` trait in `sema-core` and have `sema-eval` implement it. This would work but adds complexity for little benefit — the callback is simpler, there's only one implementation, and it avoids threading a trait object through every function that might need evaluation. The callback approach also makes it easy to test `sema-llm` in isolation (register a mock evaluator). ### Architectural Lesson The circular dependency constraint forced a callback architecture that turned out to be a better design than having direct access to the evaluator would have been. The dependency inversion through `sema-core` callbacks gives a single, canonical evaluator used everywhere — stdlib HOFs, LLM tool handlers, and the main interpreter all run the same code paths with full feature support. This also provides a clean seam for future work: when the bytecode VM became the default backend, only the callback registrations needed to change — all call sites in stdlib and llm remained untouched, validating this design. Sometimes constraints lead to better designs than unconstrained freedom would have. --- --- url: 'https://sema-lang.com/docs/internals/build-a-bytecode-vm.md' --- # Build a Bytecode VM (in Sema) Compilers have a reputation for being black magic — dragon-book mystique, register allocators, the works. Most of that reputation is about *optimizing* compilers for *hardware*. The core idea — turning a program into instructions a machine can run in a loop — is small enough to fit on one screen. So let's build one. A real compiler and a real virtual machine, in about 80 lines of Sema, that you can paste into the [playground](https://sema.run) and run right now. Then we'll show that Sema's own engine — the thing running your code — is the *same pipeline*, just larger and faster. By the end you'll be able to read [Bytecode VM](./bytecode-vm.md) and [Bytecode File Format](./bytecode-format.md) and recognize every piece. 
## The pipeline

Every language that "compiles to bytecode" (Python, Lua, the JVM, Sema) follows the same shape:

```
source text → [ front end ] → instructions → [ virtual machine ] → result
```

Sema's full version has a few more stages (we'll get to them):

```
source → Reader → Lower → Optimize → Resolve → Compile → bytecode → VM
```

We're going to **borrow Sema's reader** for the front end, because our toy language *is* s-expressions: `(+ 1 (* 2 3))` is already a tree once Sema reads it. That lets us skip straight to the two stages everyone thinks are magic and aren't: **the compiler** (tree → instructions) and **the VM** (instructions → result).

## A stack machine

Our VM is a *stack machine*. It has one stack and a handful of instructions that push and pop it. This is how most real bytecode VMs work (Sema, CPython, the JVM), and it's simpler than juggling registers.

Watch how `(+ 1 (* 2 3))` becomes a flat list of instructions, and how running them left-to-right computes the answer using nothing but a stack:

| Instruction | Stack after |
| ----------- | ----------- |
| `push 1`    | `(1)` |
| `push 2`    | `(2 1)` |
| `push 3`    | `(3 2 1)` |
| `mul`       | `(6 1)` ← popped 3 and 2, pushed 6 |
| `add`       | `(7)` ← popped 6 and 1, pushed 7 |

The answer, `7`, is the last thing on the stack. That's the whole trick: **operands go on the stack, operators consume them.** No magic, just a discipline for evaluating nested expressions without recursion at run time.

## The compiler

The compiler is one recursive walk over the tree. Each kind of node emits a little list of instructions:

* a **number** → `push` it
* a **variable** → `load` it by name
* a **binary form** `(op a b)` → compile `a`, compile `b`, then emit the operator (so the operands are already on the stack when it runs)

```sema
(define (second xs) (nth xs 1))
(define (third xs) (nth xs 2))

;; (op a b) -> [ ...code for a... ][ ...code for b... ][ op ]
(define (emit-binary opcode args)
  (append (compile-expr (first args))
          (compile-expr (second args))
          (list (list opcode))))

(define (compile-expr expr)
  (cond
    ((number? expr) (list (list :push expr)))
    ((symbol? expr) (list (list :load expr)))
    (else
      (let ((op (first expr))
            (args (rest expr)))
        (cond
          ((= op '+) (emit-binary :add args))
          ((= op '-) (emit-binary :sub args))
          ((= op '*) (emit-binary :mul args))
          ((= op '<) (emit-binary :lt args))
          ((= op 'if) (compile-if args))
          (else (error (str "unknown form: " op))))))))
```

That's the entire expression compiler. Notice it emits operands *before* the operator. That "postfix" ordering is exactly what makes the stack machine work.

## Control flow is just jumps

The one piece that feels like it should be hard, `if`, turns out to be two jumps. We compile the test, then a **jump-if-false** that skips over the "then" branch, then the "then" code, then an unconditional **jump** that skips the "else", then the "else" code:

```sema
(define (compile-if args)
  (let* ((test (compile-expr (first args)))
         (then (compile-expr (second args)))
         (alt (compile-expr (third args)))
         ;; jump-if-false skips THEN + the trailing jmp
         (jf (list (list :jfalse (+ (length then) 1))))
         ;; jmp at end of THEN skips ELSE
         (jp (list (list :jmp (length alt)))))
    (append test jf then jp alt)))
```

The jump *offsets* are computed from the lengths of the branches we just compiled, as relative "skip N instructions" jumps. Every loop, conditional, and `&&`/`||` you've ever written compiles down to exactly this: conditional and unconditional jumps over blocks of instructions. There is no `if` at the machine level, only "maybe jump."

## The virtual machine

The VM is a loop. It holds a **program counter** (`pc`, which instruction we're on) and a **stack**. Each turn of the loop reads one instruction and does the obvious thing:

```sema
;; pop two, apply f, push result. (a was pushed first, so it's deeper.)
(define (binop f stack)
  (let ((b (first stack))
        (a (second stack))
        (more (rest (rest stack))))
    (cons (f a b) more)))

(define (run code env)
  (let loop ((pc 0) (stack '()))
    (if (>= pc (length code))
      (first stack) ; result = top of stack
      (let* ((ins (nth code pc))
             (op (first ins)))
        (cond
          ((= op :push) (loop (+ pc 1) (cons (second ins) stack)))
          ((= op :load) (loop (+ pc 1) (cons (get env (second ins)) stack)))
          ((= op :add) (loop (+ pc 1) (binop + stack)))
          ((= op :sub) (loop (+ pc 1) (binop - stack)))
          ((= op :mul) (loop (+ pc 1) (binop * stack)))
          ((= op :lt) (loop (+ pc 1) (binop < stack)))
          ((= op :jmp) (loop (+ pc 1 (second ins)) stack))
          ((= op :jfalse)
            (let ((top (first stack))
                  (more (rest stack)))
              (if top
                (loop (+ pc 1) more)
                (loop (+ pc 1 (second ins)) more))))
          (else (error (str "bad op: " op))))))))
```

That `cond` is the **dispatch loop**, the literal heart of every bytecode interpreter. A jump is just "set `pc` to somewhere else instead of `pc + 1`." Sema's real dispatch loop is the same idea with 66 instructions instead of 8, written in Rust for speed.

## Run it

```sema
(define program '(+ 1 (* 2 3)))
(define bc (compile-expr program))
(println "source: " program)
(println "bytecode: " bc)
(println "result: " (run bc {}))

(define p2 '(if (< x 5) (* x 10) (- x 5)))
(println "")
(println "source: " p2)
(println "bytecode: " (compile-expr p2))
(println "x=3 -> " (run (compile-expr p2) {'x 3}))
(println "x=9 -> " (run (compile-expr p2) {'x 9}))
```

Output:

```
source: (+ 1 (* 2 3))
bytecode: ((:push 1) (:push 2) (:push 3) (:mul) (:add))
result: 7

source: (if (< x 5) (* x 10) (- x 5))
bytecode: ((:load x) (:push 5) (:lt) (:jfalse 4) (:load x) (:push 10) (:mul) (:jmp 3) (:load x) (:push 5) (:sub))
x=3 -> 30
x=9 -> 4
```

That's a working compiler and VM. You can see the bytecode it produces, watch the `if` become a `:jfalse`/`:jmp` pair, and run the result. Nothing was hidden.
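Since the real dispatch loop is written in Rust, it may help to see the toy VM in that language too. The sketch below is a hedged port, not Sema's `vm.rs`: `Val`, `Op`, `pop2`, `run`, and `if_program` are hypothetical names, only integers and booleans are modeled, and jumps use the same relative "skip N" offsets as the toy.

```rust
use std::collections::HashMap;

/// Runtime values for the sketch: integers plus the booleans produced by `Lt`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Val {
    Int(i64),
    Bool(bool),
}

/// The same eight instructions as the toy VM, as a Rust enum.
enum Op {
    Push(i64),
    Load(&'static str),
    Add,
    Sub,
    Mul,
    Lt,
    Jmp(usize),    // skip N instructions
    JFalse(usize), // pop; skip N instructions if the value was false
}

/// Pop two integers. `b` was pushed last, so it comes off first.
fn pop2(stack: &mut Vec<Val>) -> Result<(i64, i64), String> {
    match (stack.pop(), stack.pop()) {
        (Some(Val::Int(b)), Some(Val::Int(a))) => Ok((a, b)),
        other => Err(format!("expected two ints, got {other:?}")),
    }
}

/// The dispatch loop: one `match` per instruction, `pc` advanced explicitly.
fn run(code: &[Op], env: &HashMap<String, i64>) -> Result<Val, String> {
    let mut stack: Vec<Val> = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        let mut next = pc + 1; // default: fall through to the next instruction
        match &code[pc] {
            Op::Push(n) => stack.push(Val::Int(*n)),
            Op::Load(name) => {
                let v = env.get(*name).ok_or_else(|| format!("unbound: {name}"))?;
                stack.push(Val::Int(*v));
            }
            Op::Add => {
                let (a, b) = pop2(&mut stack)?;
                stack.push(Val::Int(a + b));
            }
            Op::Sub => {
                let (a, b) = pop2(&mut stack)?;
                stack.push(Val::Int(a - b));
            }
            Op::Mul => {
                let (a, b) = pop2(&mut stack)?;
                stack.push(Val::Int(a * b));
            }
            Op::Lt => {
                let (a, b) = pop2(&mut stack)?;
                stack.push(Val::Bool(a < b));
            }
            Op::Jmp(n) => next += *n,
            Op::JFalse(n) => match stack.pop() {
                Some(Val::Bool(false)) => next += *n,
                Some(_) => {}
                None => return Err("jfalse on empty stack".to_string()),
            },
        }
        pc = next;
    }
    stack.pop().ok_or_else(|| "empty stack".to_string())
}

/// The bytecode printed above for `(if (< x 5) (* x 10) (- x 5))`.
fn if_program() -> Vec<Op> {
    use Op::*;
    vec![
        Load("x"), Push(5), Lt, JFalse(4),
        Load("x"), Push(10), Mul, Jmp(3),
        Load("x"), Push(5), Sub,
    ]
}
```

Running `if_program()` with `x` bound to 3 gives `Val::Int(30)`, and with `x` bound to 9 gives `Val::Int(4)`, matching the Sema output. The structure is the same as the `cond` version; the `enum` just lets the compiler check that every instruction is handled.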
## How Sema does the same thing, at scale Sema's engine is this exact pipeline — every concept above has a real counterpart, just bigger and faster. Here's the map: | Toy version (this page) | Sema's engine | Where | | ---------------------------------- | -------------------------------------- | ----- | | Borrowed the reader | Real lexer + recursive-descent parser, with source spans | [Reader & Spans](./reader.md), `crates/sema-reader` | | `compile-expr` (one flat walk) | Four passes: **Lower → Optimize → Resolve → Compile** | [Bytecode VM](./bytecode-vm.md), `crates/sema-vm` | | 8 keyword "ops" | **66 real opcodes** (`Op` enum) | [Bytecode VM](./bytecode-vm.md#instruction-set) | | Cons-list as the stack | A contiguous value stack of NaN-boxed `Value`s | [Architecture](./architecture.md#the-value-type) | | Name-keyed `env` map for variables | Names **resolved to integer slots / upvalues at compile time** | [Bytecode VM](./bytecode-vm.md), `resolve.rs` | | `run` interpreting a list | A tight Rust dispatch loop with per-instruction inline caches | [Performance](./performance.md) | | (we never saved it) | Serialize to a `.semac` file | [Bytecode File Format](./bytecode-format.md) | A few of those are worth a sentence each, because they're exactly the corners we cut: ::: tip We hand-waved variable lookup; Sema doesn't. Our VM looks variables up by *name* in a map at run time (`(get env ...)`). That's the slow way. Sema's **Resolve** pass walks the program once at compile time and replaces every variable with a fixed integer **slot** (a stack offset) or an **upvalue** index for closures. At run time a variable read is an array index, not a hash lookup. That single idea is most of the gap between a teaching interpreter and a fast one — see `resolve.rs`. ::: ::: tip We had no optimizer; Sema folds constants. 
Sema's **Optimize** pass runs on the intermediate representation before compilation — constant folding, simple simplification — so `(+ 1 2)` becomes the constant `3` at compile time instead of two pushes and an add. ::: ::: info Desugaring: ~40 forms become ~35. Sema's **Lower** pass turns the ~40 special forms you actually write (`let`, `cond`, `when`, `and`, `or`, `case`, …) into a small core IR (`CoreExpr`) of ~35 node kinds — `cond` becomes nested `if`, `and`/`or` become `if`, and so on. The compiler only has to know about the small core, exactly like our `compile-if` only knew about `if`. Macros expand earlier, in `sema-eval`, and feed the same pipeline. ::: ## What we skipped (and where Sema handles it) To stay on one screen, our toy left out the genuinely harder parts of a real language runtime. None of them are magic either — they're just more code: * **Functions & closures.** Calling a function means pushing a *call frame* and jumping; a closure that captures a variable needs **upvalues**. Sema uses the Lua-style "open upvalue" model — see [Bytecode VM](./bytecode-vm.md) and `vm.rs`. * **Tail calls.** Sema reuses the current frame for a call in tail position, so deep recursion doesn't grow the native stack — see [Evaluator & TCO](./evaluator.md). * **Memory.** We leaned on Sema's own reference counting. Sema's `Value` is reference-counted (`Rc`) with deterministic destruction — no garbage collector — see [Architecture](./architecture.md#why-rc-not-arc). * **Macros.** Sema expands macros *before* compilation, producing more AST that goes through the same pipeline you just built. ## The punchline You just read a compiler and a virtual machine end to end. The "magic" turned out to be a recursive walk that emits operands before operators, a couple of jumps for control flow, and a loop with a `cond` in it. Sema's engine is the same shape. 
It's bigger because it has 66 opcodes, four well-separated passes, integer slot resolution, inline caches, closures, and a verifier for untrusted bytecode — but if you trace any Sema program from source to result, you'll pass through every stage you just built by hand. When you're ready for the real thing: * [Bytecode VM](./bytecode-vm.md) — the actual pipeline and instruction set * [Bytecode File Format](./bytecode-format.md) — how compiled bytecode is laid out on disk * [Architecture](./architecture.md) — the `Value` type, the environment model, the whole engine * The source: `crates/sema-vm/src/{lower,resolve,compiler,vm}.rs` --- --- url: 'https://sema-lang.com/docs/internals/bytecode-vm.md' --- # Bytecode VM ::: tip The Evaluator Sema compiles to bytecode and runs on the VM. Every entry point — the CLI, the REPL, the embedding API, `eval`, `import`/`load`, macros, and async/await — compiles to bytecode and runs on the VM. ::: ## Overview Sema's evaluator is a bytecode VM. The VM compiles Sema source code into stack-based bytecode for fast execution. On compute-heavy workloads it is fast — 500 iterations of the TAK benchmark `(tak 18 12 6)` run in roughly 1.9 s (≈3.7 ms/iteration) in a plain release build on a modern laptop. The PGO-optimized release binaries shipped via cargo-dist / Homebrew (v1.19.2+) run this benchmark ~30% faster again; see [Performance Internals](./performance.md). Macro expansion and dynamic `eval` are handled by `sema-eval`, which expands macros to a `Value` AST and then feeds them through the same compile-to-bytecode pipeline. 
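For readers unfamiliar with the benchmark cited in the overview, TAK is the Takeuchi function, a small triply recursive routine that stresses call and return overhead rather than arithmetic. The sketch below is a plain Rust rendering of the standard definition, not Sema's benchmark harness; the `i64` argument type is a choice made for this illustration.

```rust
/// Takeuchi function (standard definition), shown only to illustrate
/// the shape of the work behind a `(tak 18 12 6)` call.
/// Assumption: arguments fit in i64; this is not Sema's own harness.
fn tak(x: i64, y: i64, z: i64) -> i64 {
    if y < x {
        // Three nested recursive calls feed a fourth, outer call.
        tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
    } else {
        z
    }
}
```

Almost all of the time goes into function calls and returns, which is why the benchmark is a reasonable proxy for VM call overhead.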
## Compilation Pipeline ``` Source text → Reader (tokenize + parse → Value AST) → Macro expand (sema-eval expands macros) → Lower (Value AST → CoreExpr IR) → Optimize (constant folding + simplification on CoreExpr) → Resolve (CoreExpr → ResolvedExpr with slot/upvalue/global analysis) → Compile (ResolvedExpr → bytecode Chunks) → VM execution (dispatch loop) ``` ### Phase 1: Lowering (Value → CoreExpr) The lowering pass converts the `Value` AST into `CoreExpr`, a desugared intermediate representation. All ~40 special forms are lowered to ~35 CoreExpr variants. Several forms desugar into simpler ones: | Source Form | Lowers To | | ----------- | ----------------------------------------- | | `cond` | Nested `If` | | `when` | `If` with nil else | | `unless` | `If` with swapped branches | | `case` | `Let` + nested `If` with `Or` comparisons | | `defun` | `Define` + `Lambda` | | Named `let` | `Letrec` + `Lambda` | **Tail position analysis** happens during lowering. The `Call` node carries a `tail: bool` flag, set based on position: * **Tail**: last expression in `lambda` body, `begin`, `let`/`let*`/`letrec` body, `if` branches, `cond` clauses, `and`/`or` last operand * **Not tail**: `try` body (handler must be reachable), `do` loop body, non-last expressions ### Phase 2: Variable Resolution (CoreExpr → ResolvedExpr) The resolver walks the CoreExpr tree and classifies every variable reference as one of: | Resolution | Opcode | Description | | ------------------- | ------------------------------ | ---------------------------------------- | | `Local { slot }` | `LoadLocal` / `StoreLocal` | Variable in the current function's frame | | `Upvalue { index }` | `LoadUpvalue` / `StoreUpvalue` | Captured from an enclosing function | | `Global { spur }` | `LoadGlobal` / `StoreGlobal` | Module-level binding | This is a key optimization: instead of hash-based environment chain lookup (O(scope depth) per access), variables are accessed by direct slot index (O(1)). 
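The classification above can be sketched as a walk outward through the enclosing function scopes. This is a minimal illustration, not the code in `resolve.rs`: `Resolver`, `FnScope`, and `resolve_at` are hypothetical names, globals are keyed by `String` rather than an interned `Spur`, and block scoping inside a function is ignored.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Resolution {
    Local { slot: u16 },
    Upvalue { index: u16 },
    Global { name: String }, // assumption: a String stands in for the interned Spur
}

#[derive(Debug, Clone, PartialEq)]
enum UpvalueDesc {
    ParentLocal(u16),   // captured from the immediate parent's frame
    ParentUpvalue(u16), // captured through the parent's own upvalue
}

/// Per-function bookkeeping (hypothetical shape).
#[derive(Default)]
struct FnScope {
    locals: Vec<String>,
    upvalues: Vec<(String, UpvalueDesc)>,
}

struct Resolver {
    /// Outermost (module-level) scope first, innermost function last.
    /// Expects at least the module-level scope to be present.
    scopes: Vec<FnScope>,
}

impl Resolver {
    /// Resolve `name` as seen from the innermost function.
    fn resolve(&mut self, name: &str) -> Resolution {
        let depth = self.scopes.len() - 1;
        self.resolve_at(depth, name)
    }

    fn resolve_at(&mut self, depth: usize, name: &str) -> Resolution {
        // 1. A local in this function: direct slot access.
        if let Some(slot) = self.scopes[depth].locals.iter().rposition(|n| n == name) {
            return Resolution::Local { slot: slot as u16 };
        }
        // 2. Already captured by this function: reuse the upvalue index.
        if let Some(index) = self.scopes[depth].upvalues.iter().position(|(n, _)| n == name) {
            return Resolution::Upvalue { index: index as u16 };
        }
        // 3. Ran out of enclosing functions: treat as a global.
        if depth == 0 {
            return Resolution::Global { name: name.to_string() };
        }
        // 4. Ask the parent, then record how this function captures it.
        let desc = match self.resolve_at(depth - 1, name) {
            Resolution::Local { slot } => UpvalueDesc::ParentLocal(slot),
            Resolution::Upvalue { index } => UpvalueDesc::ParentUpvalue(index),
            global => return global,
        };
        let ups = &mut self.scopes[depth].upvalues;
        ups.push((name.to_string(), desc));
        Resolution::Upvalue { index: (ups.len() - 1) as u16 }
    }
}
```

After one such pass, every variable read carries a slot or index, so the run-time cost no longer depends on how deeply the reference is nested.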
#### Upvalue Capture Closures use the Lua/Steel upvalue model. When a lambda references a variable from an enclosing function: 1. The resolver marks the outer local as **captured** 2. An `UpvalueDesc` entry is added to the inner lambda: `ParentLocal(slot)` if capturing from the immediate parent, `ParentUpvalue(index)` if capturing through an intermediate function ```sema (lambda (x) ; x = Local slot 0 (lambda () ; captures x: UpvalueDesc::ParentLocal(0) (lambda () ; captures through chain: UpvalueDesc::ParentUpvalue(0) x))) ; resolves to Upvalue { index: 0 } ``` ### Phase 3: Bytecode Compilation (ResolvedExpr → Chunk) ::: details The instruction format echoes the IBM 704 (1955) The [704's](http://bitsavers.informatik.uni-stuttgart.de/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf) Type A instruction packed four fields into a single 36-bit word: **prefix** (opcode), **decrement** (constant parameter), **tag** (register selector), and **address** (operand location). Sema's bytecode uses a strikingly similar structure — each instruction is an opcode byte followed by inline operands (constant indices, slot numbers, jump offsets). The 704 also had a `CAS` (Compare Accumulator with Storage) instruction that performed a 3-way branch in a single operation: skip 0, 1, or 2 instructions depending on less-than, equal, or greater-than. This is pattern matching as a hardware primitive — the ancestor of the conditional jump patterns Sema's compiler generates for `cond` and `match`. ::: The compiler (`compiler.rs`) transforms `ResolvedExpr` into bytecode `Chunk`s. The `Compiler` struct wraps an `Emitter` (bytecode builder) and collects `Function` templates for inner lambdas. **Compilation strategies:** * **Constants**: `Nil`, `True`, `False` get dedicated opcodes. All other constants use `Const` + constant pool. * **Variables**: `LoadLocal`/`StoreLocal` for locals, `LoadUpvalue`/`StoreUpvalue` for captures, `LoadGlobal`/`StoreGlobal`/`DefineGlobal` for globals. 
* **Control flow**: `if` uses `JumpIfFalse` + `Jump` for short-circuit. `and`/`or` use `Dup` + conditional jumps to preserve the last truthy/falsy value. * **Lambdas**: compiled to separate `Function` templates, referenced by `MakeClosure` instruction with upvalue descriptors. * **`do` loops**: compile to backward `Jump` with `JumpIfTrue` for exit test. * **`try`/`catch`**: adds entries to the chunk's exception table, no inline opcodes. * **Named let**: desugared to `letrec` + `lambda` during lowering — the loop body becomes a closure compiled via `MakeClosure`. **Runtime-delegated forms** — forms that can't be compiled to pure bytecode are compiled as calls to `__vm-*` global functions registered by `sema-eval`: | Source Form | Delegate | | --------------------------------------------------------- | ------------------------------------------------------------ | | `eval` | `__vm-eval` | | `import` | `__vm-import` | | `load` | `__vm-load` | | `defmacro` | `__vm-defmacro-form` (passes entire form as quoted constant) | | `define-record-type` | `__vm-define-record-type` | | `delay` | `__vm-delay` (body wrapped in a zero-arg lambda thunk) | | `force` | `__vm-force` | | `prompt`, `message`, `deftool`, `defagent`, `macroexpand` | Corresponding `__vm-*` delegates | **Public API**: `compile(exprs, n_locals, known_natives)` returns `CompileResult { chunk, functions, native_table }`. When `known_natives` is provided, calls to those globals emit `CallNative` for direct dispatch. ### Compiler Optimizations * **Intrinsic recognition**: Known builtins are compiled to inline opcodes instead of function calls, eliminating global lookup, `Rc` downcast, argument `Vec` allocation, and function pointer dispatch. Arithmetic/comparison: `+`, `-`, `*`, `/`, `<`, `>`, `<=`, `>=`, `=`, `not`. List/predicates: `car`/`first`, `cdr`/`rest`, `cons`, `null?`, `pair?`, `list?`, `number?`, `string?`, `symbol?`, `length`. Collections: `append` (2-arg), `get`, `contains?`, `nth`, `mod`/`modulo`. 
* **Peephole: `(if (not X) ...)`**: The pattern `(if (not X) A B)` compiles to `JumpIfTrue` instead of `Not` + `JumpIfFalse`, eliminating one instruction.
* **Fused `CallGlobal`**: Non-tail calls to global functions use a fused `CallGlobal` instruction that combines `LoadGlobal` + `Call` into a single opcode with `(u32 spur, u16 argc, u16 cache_slot)` operands.
* **Specialized `LoadLocal`/`StoreLocal`**: Slots 0–3 have dedicated zero-operand opcodes (`LoadLocal0`..`LoadLocal3`, `StoreLocal0`..`StoreLocal3`), saving 2 bytes per instruction for the most frequently accessed locals.

### Phase 4: VM Execution

The VM (`vm.rs`) is a stack-based dispatch loop.

**Core structs:**

```rust
VM {
    stack: Vec<Value>,
    frames: Vec<CallFrame>,
    globals: Rc<Env>,
    functions: Rc<Vec<Rc<Function>>>,
    inline_cache: Vec<(u32, u64, Value)>,
    native_fns: Vec<Rc<NativeFn>>
}

CallFrame {
    closure: Rc<Closure>,
    pc: usize,
    base: usize,
    open_upvalues: Option<Vec<Option<Rc<UpvalueCell>>>>,
    cache_base: usize
}
```

**Key design points:**

* **Unsafe hot path**: The dispatch loop uses `unsafe` unchecked stack operations (`pop_unchecked`) and raw pointer bytecode reads via `read_u16!`/`read_i32!`/`read_u32!` macros for performance. Opcodes are dispatched by matching the raw byte against `u8` constants (the `op` module), avoiding decode overhead; `std::mem::transmute` is used only to reconstruct `Spur` handles from `u32` operands. Debug builds retain bounds checks via `debug_assert!`.
* **Closure interop**: VM closures are wrapped as `Value::NativeFn` values so code outside the VM can call them. Each NativeFn carries an `Rc<dyn Any>` payload containing `VmClosurePayload` (closure + function table), and the VM uses `raw_tag()` + `downcast_ref` to avoid `Rc` refcount bumps on the hot path. When called from outside the VM (e.g., stdlib higher-order-function callbacks), the NativeFn wrapper creates a fresh VM instance to execute the closure's bytecode; in-VM calls unwrap the payload and run in the same VM.
* **Upvalue cells**: Lua-style open upvalues.
`UpvalueCell` holds a `RefCell` — `Open { frame_base, slot }` points into the VM stack while the defining frame is alive; `Closed(Value)` owns the value after the frame exits. Locals are read and written directly on the stack (no cell indirection); cells are closed when a frame returns, tail-calls, unwinds — and before any non-VM call (see Current Limitations). * **Exception handling**: `Throw` opcode triggers handler search via the chunk's exception table. Stack is restored to saved depth, error value pushed, PC jumps to handler. **Entry points**: `VM::execute()` takes a closure and `EvalContext`. `compile_program()` is the pipeline for normal compilation: `Value AST → lower → optimize → resolve → compile → CompiledProgram`. `compile_program_with_spans()` adds span/source-file support for debug (DAP breakpoints). ### VM Optimizations * **Two-level dispatch loop**: An outer loop caches frame-local state (code pointer, constants pointer, base offset) into local variables. The inner loop dispatches opcodes without re-fetching frame data. Frame state is only reloaded when control flow changes frames (`Call`, `TailCall`, `Return`, exceptions). * **NaN-boxed int fast paths**: `AddInt`/`SubInt`/`MulInt`/`LtInt`/`EqInt` operate directly on raw NaN-boxed bits — sign-extending the payload, performing the arithmetic, and re-boxing, without ever constructing a `Value`. * **Per-instruction global inline cache**: Every `LoadGlobal`/`CallGlobal` instruction carries a `u16` cache-slot operand indexing into a per-VM `Vec<(spur, env_version, Value)>`. A hit (matching spur + env version) skips the `Env` lookup entirely; entries are invalidated by env version mismatch when a global is redefined. * **Raw pointer bytecode reads**: `read_u16!`, `read_i32!`, and `read_u32!` macros read operands via raw pointer arithmetic on the code buffer, avoiding bounds checks in release builds. 
* **Unsafe unchecked stack operations**: `pop_unchecked` skips length checks (the compiler guarantees stack correctness). `debug_assert!` guards catch violations in debug builds. * **Cold path factoring**: The `handle_err!` macro factors exception handling out of the hot instruction sequence, keeping the fast path compact for better instruction-cache behavior. ## Opcode Set The VM uses a stack-based instruction set with variable-length encoding. Each opcode is one byte, followed by operands (u16, u32, or i32). ### Constants & Stack | Opcode | Operands | Description | | ------- | --------- | ----------------------- | | `Const` | u16 index | Push `constants[index]` | | `Nil` | — | Push nil | | `True` | — | Push #t | | `False` | — | Push #f | | `Pop` | — | Discard top of stack | | `Dup` | — | Duplicate top of stack | ### Variable Access | Opcode | Operands | Description | | -------------- | --------- | ---------------------------- | | `LoadLocal` | u16 slot | Push `locals[slot]` | | `StoreLocal` | u16 slot | `locals[slot] = pop` | | `LoadUpvalue` | u16 index | Push `upvalues[index].get()` | | `StoreUpvalue` | u16 index | `upvalues[index].set(pop)` | | `LoadGlobal` | u32 spur, u16 cache\_slot | Push `globals[spur]` (inline-cached) | | `StoreGlobal` | u32 spur | `globals[spur] = pop` | | `DefineGlobal` | u32 spur | Define new global binding | | `LoadLocal0`..`LoadLocal3` | — | Push `locals[0..3]` (zero-operand fast path) | | `StoreLocal0`..`StoreLocal3` | — | `locals[0..3] = pop` (zero-operand fast path) | ### Control Flow | Opcode | Operands | Description | | ------------- | ---------- | --------------------------- | | `Jump` | i32 offset | Unconditional relative jump | | `JumpIfFalse` | i32 offset | Pop, jump if falsy | | `JumpIfTrue` | i32 offset | Pop, jump if truthy | ### Functions | Opcode | Operands | Description | | ------------- | -------------------------------- | ------------------------------------- | | `Call` | u16 argc | Call function with argc args | | 
`TailCall` | u16 argc | Tail call (reuse frame for TCO) | | `Return` | — | Return top of stack | | `MakeClosure` | u16 func\_id, u16 n\_upvalues, ... | Create closure from function template | | `CallNative` | u16 native\_id, u16 argc | Direct native function call (no env lookup) | | `CallGlobal` | u32 spur, u16 argc, u16 cache\_slot | Fused global lookup + call (inline-cached) | ### Data Constructors | Opcode | Operands | Description | | ------------- | ----------- | ------------------------------ | | `MakeList` | u16 n | Pop n values, push list | | `MakeVector` | u16 n | Pop n values, push vector | | `MakeMap` | u16 n\_pairs | Pop 2n values, push sorted map | | `MakeHashMap` | u16 n\_pairs | Pop 2n values, push hash map | ### Arithmetic & Comparison | Opcode | Description | | ---------------------------- | ------------------------------------- | | `Add`, `Sub`, `Mul`, `Div` | Generic arithmetic (int/float/string) | | `Negate`, `Not` | Unary operators | | `Eq`, `Lt`, `Gt`, `Le`, `Ge` | Generic comparison | | `AddInt`, `SubInt`, `MulInt` | Specialized int fast paths | | `LtInt`, `EqInt` | Specialized int comparison | ### Inline Intrinsics Zero-operand opcodes emitted by intrinsic recognition (bypass `CallGlobal` overhead): | Opcode | Description | | ------------------------------------------------------------ | ---------------------------------------- | | `Car`, `Cdr`, `Cons` | List operations | | `IsNull`, `IsPair`, `IsList`, `IsNumber`, `IsString`, `IsSymbol` | Type predicates | | `Length`, `Append`, `Get`, `ContainsQ`, `Nth` | Collection operations | | `Mod` | Integer modulo fast path | ### Exception Handling | Opcode | Description | | ------- | ----------------------------- | | `Throw` | Pop value, raise as exception | Exception handling uses an **exception table** on the Chunk rather than inline opcodes. Each entry specifies a PC range, handler address, and stack depth to restore. 
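The table-driven handler search can be sketched as follows. This is an illustration under stated assumptions rather than the logic in `vm.rs`: the PC range is treated as half-open, the first matching entry wins (so inner `try` blocks would have to be listed before outer ones), and the operand stack holds `i64` values in place of `Value`. The names `find_handler` and `throw` are hypothetical.

```rust
/// One row of a chunk's exception table (subset of fields, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ExceptionEntry {
    try_start: u32,   // first PC covered by the `try` body
    try_end: u32,     // assumption: exclusive end of the covered range
    handler_pc: u32,  // where execution resumes when an error is caught
    stack_depth: u16, // operand-stack depth to restore before the handler runs
}

/// Find the first entry whose PC range covers `pc`.
fn find_handler(table: &[ExceptionEntry], pc: u32) -> Option<&ExceptionEntry> {
    table.iter().find(|e| e.try_start <= pc && pc < e.try_end)
}

/// Simulate `Throw` at `pc`: restore the stack, push the error, and
/// return the handler PC. `None` means no handler covers `pc`, so the
/// error would propagate to the caller's frame.
fn throw(table: &[ExceptionEntry], pc: u32, stack: &mut Vec<i64>, error: i64) -> Option<u32> {
    let entry = find_handler(table, pc)?;
    stack.truncate(entry.stack_depth as usize);
    stack.push(error);
    Some(entry.handler_pc)
}
```

Because the table lives beside the bytecode, a `try` body that never throws executes no extra instructions.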
## Crate Structure

The bytecode VM lives in the `sema-vm` crate, which sits between `sema-reader` and `sema-eval` in the dependency graph:

```
sema-core ← sema-reader ← sema-vm ← sema-eval
```

`sema-vm` depends on `sema-core` (for `Value`, `Spur`, `SemaError`) and `sema-reader` (for parsing in test helpers). It does **not** depend on `sema-eval`: the evaluator depends on the VM, not the other way around.

### Source Files

| File           | Purpose                                                           |
| -------------- | ----------------------------------------------------------------- |
| `opcodes.rs`   | `Op` enum: 66 bytecode opcodes                                    |
| `chunk.rs`     | `Chunk` (bytecode + constants + spans), `Function`, `UpvalueDesc` |
| `emit.rs`      | `Emitter`: bytecode builder with jump backpatching                |
| `disasm.rs`    | Human-readable bytecode disassembler                              |
| `core_expr.rs` | `CoreExpr` and `ResolvedExpr` IR enums                            |
| `lower.rs`     | Value AST → CoreExpr lowering pass                                |
| `resolve.rs`   | Variable resolution (local/upvalue/global analysis)               |
| `compiler.rs`  | Bytecode compiler (ResolvedExpr → Chunk)                          |
| `vm.rs`        | VM dispatch loop, call frames, closures, exception handling       |
| `optimize.rs`  | Constant folding and simplification on CoreExpr IR                |
| `serialize.rs` | Bytecode serialization/deserialization for `.semac` file format   |
| `scheduler.rs` | Cooperative async task scheduler (VM-per-task, yield signals)     |
| `debug.rs`     | VM debug hooks for DAP (breakpoints, stepping, state queries)     |

## Async Execution (VM-Only)

Async/await and channels are implemented entirely in the VM. The model is **VM-per-task with cooperative scheduling**:

* Each `async/spawn` creates a **new VM instance** that shares the parent's global `Env` (`Rc<Env>`) and function table (`Rc<Vec<Rc<Function>>>`). Tasks are cheap: no threads, no work stealing; everything stays single-threaded.
* A cooperative scheduler in `scheduler.rs` runs tasks **round-robin**. A task runs until it yields (e.g. `await` on a pending promise, channel operations, `async/sleep`).
* Yielding is signaled via a thread-local **yield signal** (`sema-core/src/async_signal.rs`), not an error variant. The VM checks the signal after every native call (`CallNative`, `CallGlobal`). * On yield, the VM leaves a `nil` placeholder on the stack and advances the PC past the call. On resume, the scheduler swaps the placeholder for the wake value (`replace_stack_top`), so from bytecode's perspective the call simply returned. This replaced an earlier replay-based design that re-executed entire task bodies on resume (corrupting side effects). Promises support cancellation (`PromiseState::Cancelled`), and task wake-ups preserve FIFO order. Yield-aware native functions must work on both closure paths (in-VM and the fresh-VM fallback described under Current Limitations) — see `vm_async_test.rs` for the VM-only test suite. ## Current Limitations * The compiler emits inline opcodes for common builtins (`+`, `-`, `*`, `/`, `<`, `>`, `<=`, `>=`, `=`, `not`, `car`/`first`, `cdr`/`rest`, `cons`, `null?`, `pair?`, `list?`, `number?`, `string?`, `symbol?`, `length`, `append`, `get`, `contains?`, `nth`, `mod`/`modulo`) via intrinsic recognition. Redefining one of these names in the same program suppresses the intrinsic for that program, but a redefinition from a separate compilation unit (e.g., an earlier REPL entry) does not — the intrinsic still fires. * `CallNative` optimization requires passing `known_natives` at compile time (done automatically by `eval_str_compiled`); without it, all global calls use `CallGlobal` * `set!` to a captured local is silently lost when the closure is invoked through a stdlib higher-order function (`map`, `filter`, `for-each`, …) — upvalue cells are closed to snapshots before every non-VM call, so the callback mutates a detached copy. Globals and in-VM calls are unaffected. Use `foldl` with explicit accumulator threading as a workaround. ## CLI Usage The bytecode VM runs everything — no flag is required. 
```bash # Run a file sema examples/hello.sema # Start the REPL sema ``` ## Design Decisions ### Why Not Delete CoreExpr After Resolution? The pipeline uses two IR types: `CoreExpr` (variables as names) and `ResolvedExpr` (variables as slots). This provides type-level safety — the compiler can only receive resolved expressions, preventing accidental use of unresolved variable references. --- --- url: 'https://sema-lang.com/docs/internals/bytecode-format.md' --- # Bytecode File Format (`.semac`) ::: tip Versioned build artifact The `.semac` format is stable and used in production — `sema compile`, `sema disasm`, and `sema build` all rely on it, and a verifier guarantees untrusted files can be loaded safely. It is **versioned** (currently `4`): the header records the format version, and the loader requires an exact match, so a `.semac` is a build artifact tied to the Sema version that produced it rather than a long-term portable interchange format. When a format change bumps the version, recompile from source. See [Versioning Strategy](#versioning-strategy). ::: ## Overview Sema supports compiling source files to bytecode files (`.semac`) for faster loading and distribution without source. The compilation pipeline is: ``` Source (.sema) → Reader → Lower → Optimize → Resolve → Compile → Serialize → .semac file ``` Loading a `.semac` file skips parsing, lowering, resolution, and compilation — the VM directly deserializes and executes the pre-compiled bytecode. ### CLI Interface ```bash # Compile a source file to bytecode sema compile script.sema # → script.semac sema compile -o output.semac script.sema # explicit output path # Run a bytecode file (auto-detected via magic number) sema script.semac # Validate a bytecode file sema compile --check script.semac # Disassemble a bytecode file sema disasm script.semac sema disasm --json script.semac # structured JSON output ``` ### Design Goals 1. 
**Fast loading** — skip parsing and compilation; the primary benefit (like Lua's `luac`) 2. **Source protection** — distribute without revealing source code 3. **Debuggability** — optional debug sections for source maps, local names, breakpoints 4. **Forward compatibility** — version field allows graceful rejection of incompatible bytecode 5. **Simplicity** — flat section-based format, no complex container (no ELF, no zip) ### Non-Goals * **Portability** — bytecode files are tied to the Sema version that produced them (like Lua). Always keep source files. * **AOT native compilation** — Sema's dynamic nature (eval, macros, LLM primitives) makes this impractical * **Streaming** — the entire file is read into memory; no mmap or lazy loading ## File Layout A `.semac` file consists of a fixed **header**, followed by a sequence of **sections**. Each section has a type tag, length, and payload. ``` ┌──────────────────────────────────────┐ │ File Header (24 bytes) │ ├──────────────────────────────────────┤ │ Section: String Table (required) │ ├──────────────────────────────────────┤ │ Section: Function Table (required) │ ├──────────────────────────────────────┤ │ Section: Main Chunk (required) │ ├──────────────────────────────────────┤ │ Section: Source Map (optional) │ ├──────────────────────────────────────┤ │ Section: Debug Symbols (optional) │ ├──────────────────────────────────────┤ │ Section: Breakpoints (optional) │ ├──────────────────────────────────────┤ │ ... future sections ... │ └──────────────────────────────────────┘ ``` All multi-byte integers are **little-endian**. All strings are **UTF-8**. 
## File Header | Offset | Size | Field | Description | |--------|------|-------|-------------| | 0 | 4 | `magic` | `\x00SEM` (`0x00`, `0x53`, `0x45`, `0x4D`) | | 4 | 2 | `format_version` | Bytecode format version (currently `4`) | | 6 | 2 | `flags` | Bit flags (see below) | | 8 | 2 | `sema_major` | Sema version major that produced this file | | 10 | 2 | `sema_minor` | Sema version minor | | 12 | 2 | `sema_patch` | Sema version patch | | 14 | 2 | `n_sections` | Number of sections in the file | | 16 | 4 | `source_hash` | CRC-32 of the original source file (0 if unknown) | | 20 | 4 | `reserved` | Reserved for future use (must be 0) | **Total: 24 bytes** ### Magic Number The magic bytes `\x00SEM` serve two purposes: 1. **File type identification** — the CLI uses this to auto-detect bytecode vs source (source files never start with a null byte) 2. **Corruption detection** — if the magic doesn't match, reject the file immediately ### Flags (Bit Field) | Bit | Name | Description | |-----|------|-------------| | 0 | `HAS_DEBUG` | File contains debug sections (Source Map, Debug Symbols) | | 1 | `HAS_SOURCE_MAP` | File contains a Source Map section | | 2 | `HAS_BREAKPOINTS` | File contains a Breakpoints section | | 3–15 | — | Reserved (must be 0) | The current serializer always writes `flags = 0` — debug sections (and a `--strip` flag to omit them) are not yet implemented. ## Section Format Each section begins with a section header: | Offset | Size | Field | Description | |--------|------|-------|-------------| | 0 | 2 | `section_type` | Section type tag (see table) | | 2 | 4 | `section_length` | Byte length of section payload (excluding this header) | **Section header: 6 bytes**, followed by `section_length` bytes of payload. 
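The header and section-header layouts above translate fairly directly into a reader. The sketch below is a minimal illustration of those two tables, not the loader in `serialize.rs`: `FileHeader`, `parse_header`, and `read_sections` are hypothetical names, errors are plain strings rather than `SemaError`, and only the magic, version, and reserved-field checks are shown.

```rust
const MAGIC: [u8; 4] = [0x00, b'S', b'E', b'M'];
const FORMAT_VERSION: u16 = 4; // "currently 4" per the header table
const HEADER_LEN: usize = 24;
const SECTION_HEADER_LEN: usize = 6;

#[derive(Debug, PartialEq)]
struct FileHeader {
    format_version: u16,
    flags: u16,
    sema_version: (u16, u16, u16), // major, minor, patch
    n_sections: u16,
    source_hash: u32,
}

// All multi-byte integers are little-endian. Callers check lengths first.
fn u16_at(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Parse the fixed 24-byte file header.
fn parse_header(bytes: &[u8]) -> Result<FileHeader, String> {
    if bytes.len() < HEADER_LEN {
        return Err("truncated header".into());
    }
    if !bytes.starts_with(&MAGIC) {
        return Err("bad magic number".into());
    }
    let format_version = u16_at(bytes, 4);
    if format_version != FORMAT_VERSION {
        return Err(format!("unsupported format version {format_version}"));
    }
    if u32_at(bytes, 20) != 0 {
        return Err("reserved header field must be zero".into());
    }
    Ok(FileHeader {
        format_version,
        flags: u16_at(bytes, 6),
        sema_version: (u16_at(bytes, 8), u16_at(bytes, 10), u16_at(bytes, 12)),
        n_sections: u16_at(bytes, 14),
        source_hash: u32_at(bytes, 16),
    })
}

/// Split the bytes after the header into (section_type, payload) pairs.
fn read_sections(body: &[u8], n_sections: u16) -> Result<Vec<(u16, &[u8])>, String> {
    let mut out = Vec::new();
    let mut pos = 0usize;
    for _ in 0..n_sections {
        if body.len() - pos < SECTION_HEADER_LEN {
            return Err("truncated section header".into());
        }
        let section_type = u16_at(body, pos);
        let len = u32_at(body, pos + 2) as usize;
        let start = pos + SECTION_HEADER_LEN;
        let end = start
            .checked_add(len)
            .filter(|&e| e <= body.len())
            .ok_or("section payload overruns file")?;
        out.push((section_type, &body[start..end]));
        pos = end;
    }
    Ok(out)
}
```

A caller would typically pass `&file[24..]` and `header.n_sections` to `read_sections`, then dispatch on the section type and skip any tag it does not recognize.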
### Section Types

| Type ID | Name | Required | Description |
|---------|------|----------|-------------|
| `0x01` | String Table | ✅ | All interned strings (Spur remapping) |
| `0x02` | Function Table | ✅ | Compiled function templates |
| `0x03` | Main Chunk | ✅ | Top-level bytecode |
| `0x10` | Source Map | — | Source file name + PC-to-line mapping |
| `0x11` | Debug Symbols | — | Local variable names per function |
| `0x12` | Breakpoints | — | Reserved for breakpoint table |
| `0x13` | Debug Scopes | — | Reserved for lexical scope ranges |

The three required sections are always written, in the order above, so `n_sections` in the header is currently always `3`. The `0x10`–`0x13` debug sections are **reserved tags only**: the current serializer never emits them and defines no constants for them yet; they are documented here so a future writer and any third-party reader agree on the IDs. Unknown section types are **skipped** (forward compatibility), so a reader that ignores them stays compatible.

## String Table (Section `0x01`)

The string table contains all unique strings referenced by the bytecode, including:

* Symbol names (global identifiers, function names)
* Keyword names
* String constants in the constant pool
* Source file paths (in debug sections)

```
┌────────────────────────────┐
│ count: u32                 │  Number of strings
├────────────────────────────┤
│ String Entry 0             │
│   len: u32                 │  Byte length of UTF-8 data
│   data: [u8; len]          │  UTF-8 bytes (no null terminator)
├────────────────────────────┤
│ String Entry 1             │
│   ...                      │
└────────────────────────────┘
```

On load, each string is interned into the process-local `lasso::Rodeo` (a thread-local interner), producing a fresh `Spur`. The loader builds a **remap table** (`Vec<Spur>`) mapping file-local string indices to process-local Spurs.

String index `0` is reserved and must be the empty string `""`.

## Main Chunk (Section `0x03`)

The main chunk contains the top-level bytecode and its constant pool.
``` ┌────────────────────────────────┐ │ code_len: u32 │ │ code: [u8; code_len] │ Raw bytecode ├────────────────────────────────┤ │ n_consts: u16 │ │ constants: [SerializedValue] │ Constant pool entries ├────────────────────────────────┤ │ n_spans: u32 │ │ spans: [(u32 pc, u32 line, │ PC → source location │ u32 col, u32 │ │ end_line, u32 │ │ end_col)] │ ├────────────────────────────────┤ │ max_stack: u16 │ │ n_locals: u16 │ │ n_global_cache_slots: u16 │ Inline cache slots for global lookups ├────────────────────────────────┤ │ n_exceptions: u16 │ │ exceptions: [ExceptionEntry] │ Exception table └────────────────────────────────┘ ``` ### Exception Entry (16 bytes each) | Offset | Size | Field | |--------|------|-------| | 0 | 4 | `try_start` (PC) | | 4 | 4 | `try_end` (PC) | | 8 | 4 | `handler_pc` | | 12 | 2 | `stack_depth` | | 14 | 2 | `catch_slot` | ## Function Table (Section `0x02`) ``` ┌────────────────────────────────┐ │ count: u32 │ Number of functions ├────────────────────────────────┤ │ Function Entry 0 │ │ name: u32 │ String table index (0xFFFFFFFF = anonymous) │ arity: u16 │ │ has_rest: u8 │ 0 or 1 │ n_upvalue_descs: u16 │ │ upvalue_descs: [UpvalueDesc]│ │ n_upvalue_names: u16 │ │ upvalue_names: [u32 name] │ Lexical names aligned with upvalue_descs │ chunk: [Chunk data] │ Same format as Main Chunk │ n_local_names: u16 │ │ local_names: [(u16 slot, │ Local variable debug info │ u32 name)] │ (name = string table index) │ n_local_scopes: u16 │ │ local_scopes: [(u16 slot, │ Block-scope ranges (debug metadata) │ u32 start, │ half-open [start_pc, end_pc) per │ u32 end)] │ block-introduced local ├────────────────────────────────┤ │ Function Entry 1 │ │ ... │ └────────────────────────────────┘ ``` ### Local Scopes (10 bytes each) `local_scopes` records the half-open bytecode PC range `[start_pc, end_pc)` over which each block-introduced local (from `let` / `let*` / `letrec` / `do`) is live. 
The debugger uses these ranges to hide locals that are not yet bound or already out of scope at the current PC. This is debug-only metadata — it is never read during execution. Functions whose `local_scopes` is empty (e.g. those with only parameters, or older `.semac` files) cause the debugger to show all locals. | Offset | Size | Field | |--------|------|-------| | 0 | 2 | `slot` — local variable slot | | 2 | 4 | `start_pc` — PC where the binding comes into scope | | 6 | 4 | `end_pc` — PC where the binding goes out of scope (exclusive) | ### Upvalue Descriptor (3 bytes each) | Offset | Size | Field | |--------|------|-------| | 0 | 1 | `kind`: 0 = ParentLocal, 1 = ParentUpvalue | | 1 | 2 | `index`: slot/upvalue index in parent | ::: warning Bytecode inline encoding differs The upvalue descriptors in the **function table** (above) use a compact 3-byte encoding (`u8` kind + `u16` index). However, the `MakeClosure` opcode in the **bytecode stream** uses a 4-byte encoding per upvalue: `u16` is\_local + `u16` index. This wider encoding is used for alignment in the runtime bytecode. ::: ## Serialized Values (Constant Pool) Each constant is serialized as a **type tag** (1 byte) followed by type-specific payload. 
| Tag | Type | Payload | |-----|------|---------| | `0x00` | Nil | — (0 bytes) | | `0x01` | Bool | 1 byte: `0x00` = false, `0x01` = true | | `0x02` | Int | 8 bytes: i64 little-endian | | `0x03` | Float | 8 bytes: f64 little-endian (IEEE 754) | | `0x04` | String | 4 bytes: string table index (u32) | | `0x05` | Symbol | 4 bytes: string table index (u32) | | `0x06` | Keyword | 4 bytes: string table index (u32) | | `0x07` | Char | 4 bytes: Unicode code point (u32) | | `0x08` | List | 2 bytes: count (u16), then `count` recursive SerializedValues | | `0x09` | Vector | 2 bytes: count (u16), then `count` recursive SerializedValues | | `0x0A` | Map | 2 bytes: n\_pairs (u16), then `n_pairs × 2` recursive SerializedValues (key, value alternating) | | `0x0B` | HashMap | Same as Map (`0x0A`) — tag distinguishes sorted vs hash map | | `0x0C` | Bytevector | 4 bytes: length (u32), then `length` raw bytes | ### Values That Cannot Appear in Bytecode The following `ValueView` variants are **runtime-only** and must never appear in a `.semac` constant pool: * `Lambda` / `Macro` — closures are constructed at runtime via `MakeClosure` * `NativeFn` — registered by the runtime, not serializable * `Prompt` / `Message` / `Conversation` — constructed via `__vm-prompt` / `__vm-message` * `ToolDef` / `Agent` — constructed via `__vm-deftool` / `__vm-defagent` * `Thunk` — created by `delay` * `Record` — constructed by `define-record-type` * `AsyncPromise` (tag 28) — created by `async/spawn`, runtime-only * `Channel` (tag 29) — created by `channel/new`, runtime-only If the serializer encounters any of these in a constant pool, it should emit a compile error. ## Spur Remapping Sema uses `lasso::Spur` (process-local interned string handles) for symbols, keywords, and global variable names. These handles are **not stable** across processes. ### In the bytecode stream Global variable opcodes (`LoadGlobal`, `StoreGlobal`, `DefineGlobal`, `CallGlobal`) encode Spur values as `u32`. 
`LoadGlobal` additionally carries a `u16` inline-cache slot operand, and `CallGlobal` carries `u16 argc` + `u16` cache slot — these are copied through unchanged; only the `u32` Spur operand is remapped. On serialization: 1. The serializer collects all Spurs referenced in the bytecode (globals, function names, local names) 2. Each Spur's string is added to the string table, getting a file-local index 3. The bytecode is **rewritten**: Spur-encoded u32 operands are replaced with string table indices On deserialization: 1. The string table is loaded and each string is interned → new process-local Spurs 2. A remap table maps file-local indices to process-local Spurs 3. The bytecode is walked: `LoadGlobal`/`StoreGlobal`/`DefineGlobal`/`CallGlobal` operands are rewritten with the new Spur u32 values This is the same approach Lua uses for upvalue names, and Guile uses for its symbol table. ## Source Map (Section `0x10`) ::: info Future Feature This section is defined but not yet implemented. ::: The source map links bytecode PCs back to source file locations, enabling error messages with file/line info when running from `.semac` files. ``` ┌────────────────────────────────┐ │ source_file: u32 │ String table index of source file path │ source_hash: [u8; 32] │ SHA-256 of the original source ├────────────────────────────────┤ │ n_entries: u32 │ │ entries: [SourceMapEntry] │ Sorted by PC, delta-encoded └────────────────────────────────┘ ``` ### Source Map Entry (delta-encoded, variable-length) For compact representation, entries are delta-encoded from the previous entry: | Field | Encoding | Description | |-------|----------|-------------| | `delta_pc` | LEB128 u32 | PC offset from previous entry | | `delta_line` | LEB128 i32 | Line offset from previous entry | | `delta_col` | LEB128 i32 | Column offset from previous entry | The first entry uses absolute values (delta from 0). 
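Since this section is not implemented yet, the following is only a sketch of how the delta-encoded entries described above could be decoded. It assumes the `i32` fields use standard signed LEB128 (not zigzag) and that deltas accumulate from zero; `SourceMapEntry`, `read_uleb128`, `read_sleb128`, and `decode_entries` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct SourceMapEntry {
    pc: u32,
    line: i32,
    col: i32,
}

/// Read an unsigned LEB128 value, advancing `pos`. Returns `None` on
/// truncated input or an encoding longer than five bytes.
fn read_uleb128(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None;
        }
        result |= ((byte & 0x7f) as u32) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
}

/// Read a signed LEB128 value (assumed encoding for the i32 deltas).
fn read_sleb128(bytes: &[u8], pos: &mut usize) -> Option<i32> {
    let mut result: i32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None;
        }
        result |= ((byte & 0x7f) as i32) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Sign-extend when the final byte's sign bit (0x40) is set.
            if shift < 32 && (byte & 0x40) != 0 {
                result |= -1i32 << shift;
            }
            return Some(result);
        }
    }
}

/// Turn `n_entries` delta-encoded triples into absolute positions.
/// The first entry is a delta from (0, 0, 0), i.e. absolute.
fn decode_entries(bytes: &[u8], n_entries: u32) -> Option<Vec<SourceMapEntry>> {
    let mut pos = 0usize;
    let (mut pc, mut line, mut col) = (0u32, 0i32, 0i32);
    let mut out = Vec::new();
    for _ in 0..n_entries {
        pc = pc.checked_add(read_uleb128(bytes, &mut pos)?)?;
        line = line.checked_add(read_sleb128(bytes, &mut pos)?)?;
        col = col.checked_add(read_sleb128(bytes, &mut pos)?)?;
        out.push(SourceMapEntry { pc, line, col });
    }
    Some(out)
}
```

Delta encoding pays off because consecutive entries usually differ by a few bytes of PC and at most a line or two, so most fields fit in a single byte.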
## Debug Symbols (Section `0x11`) ::: info Future Feature This section is defined but not yet implemented. ::: Debug symbols provide local variable names and their scope ranges within each function, enabling meaningful debugger variable inspection. ``` ┌────────────────────────────────┐ │ n_functions: u32 │ Must match Function Table count ├────────────────────────────────┤ │ Function 0 debug info │ │ n_locals: u16 │ │ locals: [LocalDebugEntry] │ ├────────────────────────────────┤ │ Function 1 debug info │ │ ... │ └────────────────────────────────┘ ``` ### Local Debug Entry | Offset | Size | Field | |--------|------|-------| | 0 | 4 | `name` — string table index | | 4 | 2 | `slot` — local variable slot | | 6 | 4 | `scope_start` — PC where variable comes into scope | | 10 | 4 | `scope_end` — PC where variable goes out of scope | ## Breakpoints Section (Section `0x12`) ::: info Future Feature This section is reserved for debugger integration. Format TBD. ::: The breakpoints section will support: * **Persistent breakpoints** — set breakpoints by source location; they survive recompilation * **Conditional breakpoints** — attach Sema expressions as conditions * **Source-mapped breakpoints** — store breakpoints as `(file, line)` pairs, resolved to PCs on load Planned entry format: ``` ┌────────────────────────────────┐ │ n_breakpoints: u32 │ ├────────────────────────────────┤ │ Breakpoint Entry │ │ source_file: u32 │ String table index │ line: u32 │ │ col: u32 │ 0 = any column │ condition_len: u16 │ 0 = unconditional │ condition: [u8] │ Sema source expression (UTF-8) │ flags: u8 │ 0x01 = enabled, 0x02 = one-shot └────────────────────────────────┘ ``` ## Debug Scopes Section (Section `0x13`) ::: info Future Feature This section is reserved for lexical scope tracking. Format TBD. 
::: Debug scopes will map PC ranges to lexical scopes, enabling: * Accurate "step over" / "step into" behavior * Proper variable shadowing display in debuggers * Scope-aware watch expressions ## Validation When loading a `.semac` file, the loader performs these checks: 1. **Magic number** — must be `\x00SEM` 2. **Format version** — must exactly match the version this Sema build supports 3. **Reserved header field** — must be zero 4. **Section completeness** — all three required sections must be present (and string index 0 must be `""`) 5. **String table bounds** — all string table indices in the file must be in range 6. **Function table bounds** — all `func_id` references in `MakeClosure` must be valid 7. **Constant pool types** — no runtime-only value types in the constant pool 8. **Bytecode well-formedness** — opcodes must be valid, operand sizes must be correct, constant/local/upvalue/`CallNative` native indices must be in bounds, and jump targets must land on instruction boundaries (the native table is process-local and unserialized, so its loaded length is `0` — any `CallNative` in a `.semac` is rejected) 9. **Stack-depth balance** — an abstract-interpretation pass over every chunk (main chunk and each function) proves the operand stack never underflows and never exceeds the maximum depth If validation fails, the loader returns a `SemaError` with a descriptive message. ### Stack-Depth Verifier (ADR #56) The VM's hot dispatch loop uses an unchecked stack pop (`pop_unchecked`) for speed, which is sound only if the bytecode is stack-balanced. In-process bytecode is balanced by construction; deserialized `.semac` bytecode is proven balanced by a verifier that runs inside `validate_bytecode` before `deserialize_from_bytes` returns. The verifier abstract-interprets each chunk: * Each opcode has a static stack effect (`Op::stack_effect()` — the single source of truth shared with the VM dispatch arms). 
Variable-arity opcodes (`Call`, `TailCall`, `CallGlobal`, `CallNative`, `MakeList`, `MakeVector`, `MakeMap`, `MakeHashMap`) compute their effect from the decoded operand count. * A worklist tracks the operand-stack depth on entry to every reachable instruction, following fallthrough and jump edges. Exception handlers are seeded as additional roots at their known entry depth (`stack_depth - n_locals + 1`). * Join points must agree on depth exactly (strict-equality lattice, like the JVM/CLR verifiers). A disagreement, a reachable pop deeper than the current depth (underflow), a depth above the maximum (overflow), or control falling off the end of a chunk are all rejected with a descriptive `SemaError`. The verifier is **sound** — it never accepts an underflowing chunk. It is intentionally conservative: it may reject exotic-but-safe bytecode that a future optimizing compiler could emit, but accepts every program Sema's compiler produces. Once verification succeeds, `.semac` files from untrusted sources can be loaded without risking the unchecked-pop undefined behavior. The loader also enforces two hard limits while deserializing, both of which a re-implementation must respect to stay compatible: a chunk may declare a maximum stack depth of at most **65535** (`MAX_STACK_DEPTH`), and a constant-pool value may nest at most **128** levels deep (`MAX_VALUE_DEPTH`) — the latter bounds recursion in the value deserializer so a hostile file can't blow the native stack. Both are defined in `serialize.rs`. ## Opcodes The complete, numbered opcode set lives in `crates/sema-vm/src/opcodes.rs` (the `Op` enum and its `Op::from_u8` mapping are the single source of truth). Most opcodes are single-byte; a handful carry inline operands (`u16`/`u32`/`i32`) as noted in the encoding descriptions above. 
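
The worklist pass described under the Stack-Depth Verifier can be illustrated on a toy instruction set. This is a sketch under loud assumptions: the `Op` enum below is hypothetical and is not Sema's opcode set, jump targets are instruction indices rather than byte offsets, and exception-handler roots are left out. It only shows the shape of the analysis: one recorded depth per instruction, strict equality at join points, and rejection of underflow, overflow, and falling off the end.

```rust
/// Toy opcodes (hypothetical; not Sema's `Op` enum).
#[derive(Clone, Copy)]
enum Op {
    Push,               // pushes 1
    Pop,                // pops 1
    Add,                // pops 2, pushes 1
    Jump(usize),        // unconditional jump to an instruction index
    JumpIfFalse(usize), // pops 1, then jumps or falls through
    Return,             // pops 1 and ends the chunk
}

fn verify(code: &[Op], max_depth: usize) -> Result<(), String> {
    // depth_at[i] = operand-stack depth on entry to instruction i, once known.
    let mut depth_at: Vec<Option<usize>> = vec![None; code.len()];
    let mut worklist = vec![(0usize, 0usize)]; // (pc, depth on entry)

    while let Some((pc, depth)) = worklist.pop() {
        let Some(op) = code.get(pc).copied() else {
            return Err(format!("control reaches pc {pc}, past the end of the chunk"));
        };
        let seen = depth_at[pc];
        match seen {
            Some(d) if d == depth => continue, // already verified at this depth
            Some(d) => return Err(format!("depth mismatch at pc {pc}: {d} vs {depth}")),
            None => depth_at[pc] = Some(depth),
        }
        // Static stack effect of each opcode as (pops, pushes).
        let (pops, pushes): (usize, usize) = match op {
            Op::Push => (0, 1),
            Op::Pop | Op::JumpIfFalse(_) | Op::Return => (1, 0),
            Op::Add => (2, 1),
            Op::Jump(_) => (0, 0),
        };
        if depth < pops {
            return Err(format!("stack underflow at pc {pc}"));
        }
        let after = depth - pops + pushes;
        if after > max_depth {
            return Err(format!("stack overflow at pc {pc}"));
        }
        // Enqueue every successor with the depth it will observe on entry.
        match op {
            Op::Return => {}
            Op::Jump(target) => worklist.push((target, after)),
            Op::JumpIfFalse(target) => {
                worklist.push((target, after));
                worklist.push((pc + 1, after));
            }
            _ => worklist.push((pc + 1, after)),
        }
    }
    Ok(())
}
```

The strict-equality rule is what keeps the pass linear: each instruction is analyzed at most once, and a loop that pushes on every iteration is rejected as soon as its back edge arrives at a different depth.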
To keep the common path off the `CallGlobal` → hash-lookup → `NativeFn` route, a set of **inline stdlib intrinsics** are compiled directly to dedicated single-byte opcodes when the call site references the canonical global with the matching arity (and that global has not been redefined in the program). These include list/collection ops (`Car`, `Cdr`, `Cons`, `Append`, `Length`, `Get`, `Nth`, …), type predicates (`IsNull`, `IsString`, …), and **string ops**: | Opcode | Source name(s) | Arity | Stack effect | Behavior | |--------|----------------|-------|--------------|----------| | `StringLength` (0x42) | `string-length` | 1 | pop 1, push 1 | push char count (`chars().count()`) of the string; type error if not a string | | `StringRef` (0x43) | `string-ref` | 2 | pop 2, push 1 | push the char at the 0-based char index; errors on negative index, non-int index, non-string, or out-of-bounds index (matching the stdlib messages) | | `StringAppend` (0x44) | `string-append` | 2 | pop 2, push 1 | push the concatenation of two values (non-strings coerced via `Display`); the N-ary `string-append` stays on the generic path | String indexing is by **Unicode scalar (char)**, not byte, matching the stdlib semantics. These opcodes are additive within the existing encoding (single-byte, no new operand shapes), so they do not change the `format_version`. ## Example Given this source file: ```sema ;; hello.sema (define greeting "Hello, World!") (println greeting) ``` The compiled `.semac` would contain: **String Table**: `["", "greeting", "println", "Hello, World!"]` **Main Chunk bytecode** (conceptual): ``` 0000 CONST 0 ; "Hello, World!" 
(string constant) 0003 DEFINE_GLOBAL 1 ; greeting (string table index → Spur) 0008 LOAD_GLOBAL 2 ; println (+ u16 inline-cache slot) 0015 LOAD_GLOBAL 1 ; greeting (+ u16 inline-cache slot) 0022 CALL 1 0025 RETURN ``` **Function Table**: (empty — no inner functions) ## Reading a Real `.semac`, Byte by Byte The layout above is easier to trust when you can see every byte of an actual file. Here is the smallest interesting program, compiled and dumped in full — no diagrams, the real 85 bytes: ```bash $ echo '(+ 1 2)' > tiny.sema $ sema compile tiny.sema -o tiny.semac $ sema disasm tiny.semac ==
== 0000 CONST 0 ; 3 0003 RETURN $ xxd tiny.semac 00000000: 0053 454d 0400 0000 0100 1300 0100 0300 .SEM............ 00000010: d9b7 a83a 0000 0000 0100 0800 0000 0100 ...:............ 00000020: 0000 0000 0000 0200 0400 0000 0000 0000 ................ 00000030: 0300 1f00 0000 0400 0000 0000 0012 0100 ................ 00000040: 0203 0000 0000 0000 0000 0000 0000 0000 ................ 00000050: 0000 0000 00 ..... ``` Notice the compiler already **constant-folded** `(+ 1 2)` into the literal `3` — the [Optimize pass](./bytecode-vm.md) ran before serialization, so the only instruction is a `CONST` that pushes a pooled `3`, then `RETURN`. Now every byte: ``` offset bytes meaning ------ ----------------------- -------------------------------------------- HEADER (24 bytes) 0x00 00 53 45 4D magic "\x00SEM" 0x04 04 00 format_version = 4 0x06 00 00 flags = 0 0x08 01 00 sema major = 1 ┐ 0x0A 13 00 sema minor = 19 ├ compiled by Sema 1.19.1 0x0C 01 00 sema patch = 1 ┘ 0x0E 03 00 n_sections = 3 0x10 D9 B7 A8 3A source_hash = 0x3AA8B7D9 (CRC-32 of source) 0x14 00 00 00 00 reserved SECTION 1 — String Table (type 0x01) 0x18 01 00 section type = 0x01 0x1A 08 00 00 00 section length = 8 bytes 0x1E 01 00 00 00 string count = 1 0x22 00 00 00 00 string[0]: length 0 (the reserved empty string at index 0) SECTION 2 — Function Table (type 0x02) 0x26 02 00 section type = 0x02 0x28 04 00 00 00 section length = 4 bytes 0x2C 00 00 00 00 function count = 0 (no lambdas in this program) SECTION 3 — Main Chunk (type 0x03) 0x30 03 00 section type = 0x03 0x32 1F 00 00 00 section length = 31 bytes ── chunk body ── 0x36 04 00 00 00 code length = 4 bytes 0x3A 00 CONST (opcode 0) 0x3B 00 00 └ operand: constant index 0 0x3D 12 RETURN (opcode 18 = 0x12) 0x3E 01 00 constant count = 1 0x40 02 const[0] tag = VAL_INT (0x02) 0x41 03 00 00 00 00 00 00 00 └ i64 value = 3 0x49 00 00 00 00 span count = 0 0x4D 00 00 max_stack = 0 0x4F 00 00 n_locals = 0 0x51 00 00 n_global_cache_slots = 0 0x53 00 00 exception count = 
0 ``` That is the whole format with nothing hidden: a 24-byte header, three length-prefixed sections, and a chunk whose four instruction-bytes (`00 00 00 12`) are literally `CONST 0` / `RETURN`. A program that referenced a global would add `"println"` to the string table and `LOAD_GLOBAL`/`CALL` opcodes to the chunk (as in the conceptual example above); a program with a `lambda` would add an entry to the function table. Everything else is more of the same. ::: tip Want to build the instructions themselves first? [Build a Bytecode VM (in Sema)](./build-a-bytecode-vm.md) constructs a working compiler and stack machine from scratch in ~80 lines, so the `CONST`/`RETURN` stream above reads as the natural output of a process you've already seen end to end. ::: ## Versioning Strategy * `format_version` started at `1` and increments on any breaking change to the binary format. Version `2` added `n_global_cache_slots` and the inline-cache operands; version `3` added per-function upvalue names to the debug metadata; version `4` (current) added per-function `local_scopes` (block-scope PC ranges) to the debug metadata. * `sema_major`/`sema_minor`/`sema_patch` record the compiler version for diagnostics * The loader requires an exact `format_version` match and refuses anything else with a clear error: `"unsupported bytecode format version 1 (expected 4). 
Recompile from source."` * Within the same `format_version`, new section types can be added without breaking older loaders (unknown sections are skipped) ## Comparison with Other Languages | Feature | Sema (`.semac`) | Lua (`luac.out`) | Python (`.pyc`) | Erlang (`.beam`) | Guile (`.go`) | |---------|-----------------|------------------|-----------------|------------------|---------------| | Format | Flat sections | Flat binary | Header + marshal | IFF chunks | ELF container | | Portable | No (version-tied) | No (arch-tied) | No (version-tied) | Yes | Yes | | Debug info | Optional sections | Optional (`-s` strips) | Included | Included | Included | | Auto-detect | Magic `\x00SEM` | Magic `\033Lua` | Magic `\xNN\r\n` | Magic `FOR1` | ELF header | | Cache invalidation | CRC-32 source hash | N/A | Timestamp or hash | N/A | N/A | | Spur/symbol remap | String table + rewrite | Upvalue names | marshal interning | Atom table | Symbol table | --- --- url: 'https://sema-lang.com/docs/internals/executable-format.md' --- # Bundled Executable Format ## Overview `sema build` compiles a Sema program into a standalone executable by embedding a VFS (Virtual File System) archive into the Sema runtime binary. The resulting binary is self-contained and requires no Sema installation to run. ``` Entry file (.sema) → Compile to bytecode → Trace imports → Build VFS archive → Inject into runtime binary → Executable ``` Running a bundled executable skips CLI argument parsing, loads the embedded bytecode from the VFS archive, and executes it directly. 
### CLI Interface ```bash # Basic build sema build script.sema # → ./script sema build script.sema -o myapp # explicit output path # Bundle additional files sema build script.sema --include data.json # bundle a file sema build script.sema --include assets/ # bundle a directory (recursive) # Use a specific runtime binary sema build script.sema --runtime /path/to/sema # Cross-compile for another platform (downloads a cached runtime) sema build script.sema --target linux # x86_64-unknown-linux-gnu sema build script.sema --target all # every supported target sema build --list-targets # list targets and aliases # Run the resulting standalone executable ./myapp --name hello ``` ### Options | Option | Description | |--------|-------------| | `-o, --output ` | Output executable path (default: filename without extension) | | `--include ...` | Additional files or directories to bundle (repeatable) | | `--runtime ` | Sema binary to use as runtime base (default: current executable); conflicts with `--target` | | `--target ` | Target triple or alias (`linux`, `macos`, `windows`, …) for cross-compilation; `all` builds every supported target | | `--list-targets` | Show all supported target platforms and aliases | | `--no-cache` | Skip the cached runtime and re-download it (no effect for host-target builds, which never download) | ## Binary Layout The injection strategy varies by binary format — detected from the runtime binary's magic bytes, not the build host, so cross-compilation works from any platform — to preserve binary integrity and OS loader compatibility. ### Linux (ELF): Raw Append ``` ┌─────────────────────────────┐ │ Original Sema Binary (ELF) │ ├─────────────────────────────┤ │ VFS Archive │ ├─────────────────────────────┤ │ Trailer (16 bytes) │ │ archive_size: u64 LE │ │ magic: "SEMAEXEC" │ └─────────────────────────────┘ ``` ELF loaders ignore appended data, so the binary remains valid. 
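
As a rough illustration of the ELF layout in the diagram, the sketch below appends an archive plus the 16-byte trailer and then locates it again from the end of the file. The function names `append_archive` and `find_archive` are hypothetical, and the sketch works on in-memory byte slices; it makes no claim about how the real build command or loader is structured.

```rust
/// Magic bytes that close the 16-byte trailer.
const TRAILER_MAGIC: &[u8; 8] = b"SEMAEXEC";
const TRAILER_LEN: usize = 16;

/// Build `runtime ++ archive ++ archive_size (u64 LE) ++ magic`.
fn append_archive(runtime: &[u8], archive: &[u8]) -> Vec<u8> {
    let mut out = runtime.to_vec();
    out.extend_from_slice(archive);
    out.extend_from_slice(&(archive.len() as u64).to_le_bytes());
    out.extend_from_slice(TRAILER_MAGIC);
    out
}

/// Return the embedded archive bytes if `binary` ends with a valid trailer.
fn find_archive(binary: &[u8]) -> Option<&[u8]> {
    // The trailer occupies the last 16 bytes of the file.
    let trailer_start = binary.len().checked_sub(TRAILER_LEN)?;
    let trailer = &binary[trailer_start..];
    if &trailer[8..] != TRAILER_MAGIC.as_slice() {
        return None;
    }
    let size = u64::from_le_bytes(trailer[..8].try_into().ok()?);
    let size = usize::try_from(size).ok()?;
    // The archive sits immediately before the trailer.
    let archive_start = trailer_start.checked_sub(size)?;
    Some(&binary[archive_start..trailer_start])
}
```

Putting the size and magic at the very end means the loader never has to parse the ELF structure: it reads 16 bytes from a known position and walks backwards.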
### macOS (Mach-O): Section Injection ``` ┌─────────────────────────────┐ │ Modified Mach-O Binary │ │ ├── Mach-O Header │ │ ├── Load Commands │ │ ├── ...segments... │ │ └── "semaexec" section │ ← VFS archive injected here └─────────────────────────────┘ ``` Injected via `libsui`, which ad-hoc re-signs the binary for macOS ARM64 compatibility. ### Windows (PE): Resource Injection ``` ┌─────────────────────────────┐ │ Modified PE Binary │ │ ├── PE Header │ │ ├── .text, .data, ... │ │ └── .rsrc │ │ └── "semaexec" │ ← VFS archive injected here └─────────────────────────────┘ ``` Injected via `libsui`. Existing Authenticode signatures are stripped. ## Trailer Format **16 bytes, frozen — only used on Linux/ELF.** | Offset | Size | Type | Description | |--------|------|------|-------------| | 0 | 8 | u64 LE | Size of the VFS archive in bytes | | 8 | 8 | bytes | Magic: `SEMAEXEC` (`0x53 0x45 0x4D 0x41 0x45 0x58 0x45 0x43`) | The trailer format is permanent and will never change. Old loaders can always detect new binaries and reject them if the archive format version is unsupported. On macOS and Windows, the archive is stored in a named binary section — no trailer is used. ## VFS Archive Format The VFS archive is a flat binary format with a versioned header, metadata, table of contents, and file data. All multi-byte integers are **little-endian**. All strings are **UTF-8**. 
``` ┌─ Archive Header ──────────────────────┐ │ format_version: u16 │ Currently v1 │ flags: u16 │ Reserved bitfield (must be 0) │ archive_checksum: u32 │ CRC32-IEEE of all bytes after this field │ metadata_count: u32 │ │ ┌─ Metadata entries ───────────────┐ │ │ │ key_len(u16) + key(utf8) │ │ │ │ val_len(u32) + val(bytes) │ │ │ │ ...repeats metadata_count times │ │ │ └──────────────────────────────────┘ │ ├─ TOC (Table of Contents) ─────────────┤ │ entry_count: u32 │ │ ┌─ TOC entries ────────────────────┐ │ │ │ path_len(u32) + path(utf8) │ │ │ │ offset(u64) + size(u64) │ │ │ │ ...repeats entry_count times │ │ │ └──────────────────────────────────┘ │ ├─ File data ───────────────────────────┤ │ raw bytes for all bundled files │ │ (offsets relative to file data start)│ └───────────────────────────────────────┘ ``` ### Header | Offset | Size | Type | Description | |--------|------|------|-------------| | 0 | 2 | u16 LE | `format_version` — currently `1` | | 2 | 2 | u16 LE | `flags` — reserved for future use, must be `0` | | 4 | 4 | u32 LE | `archive_checksum` — CRC32-IEEE of all bytes from offset 8 to end of archive | | 8 | 4 | u32 LE | `metadata_count` — number of metadata key-value entries | **Total header: 12 bytes** ### Metadata Entry Repeated `metadata_count` times, immediately after the header. | Field | Size | Type | Description | |-------|------|------|-------------| | `key_len` | 2 | u16 LE | Length of key string in bytes | | `key` | `key_len` | UTF-8 | Metadata key | | `val_len` | 4 | u32 LE | Length of value in bytes | | `val` | `val_len` | bytes | Metadata value (opaque bytes, typically UTF-8) | Unknown metadata keys are ignored by the loader (forward compatibility). ### v1 Metadata Keys | Key | Value | Description | |-----|-------|-------------| | `sema-version` | e.g. 
`"1.10.0"` | Sema version that built the executable | | `build-timestamp` | Unix timestamp string | Seconds since epoch when the executable was built | | `entry-point` | `"__main__.semac"` | VFS path of the compiled entry bytecode | | `build-root` | absolute path string | Original project root directory | ### TOC (Table of Contents) Starts immediately after the last metadata entry. | Field | Size | Type | Description | |-------|------|------|-------------| | `entry_count` | 4 | u32 LE | Number of file entries | Each TOC entry: | Field | Size | Type | Description | |-------|------|------|-------------| | `path_len` | 4 | u32 LE | Length of VFS path in bytes | | `path` | `path_len` | UTF-8 | VFS path (relative, forward-slash separated) | | `offset` | 8 | u64 LE | Byte offset from start of file data section | | `size` | 8 | u64 LE | Size of file data in bytes | ### File Data Raw concatenated bytes for all files, in TOC order. Offsets in TOC entries are relative to the start of this section (byte 0 = first byte after the last TOC entry). ### VFS Path Conventions | VFS Path | Contents | |----------|----------| | `__main__.semac` | Compiled bytecode of the entry file (always present) | | `lib/utils.sema` | Auto-traced import (relative to project root) | | `github.com/user/repo` | Package entry (git-style, keyed by package name) | | `github.com/user/repo/helpers.sema` | Package internal file (relative to packages dir) | | `json-utils` | Package entry (registry short-name) | | `json-utils/src/core.sema` | Package internal file (registry package) | | `data.json` | Asset from `--include data.json` | | `prompts/system.txt` | Asset from `--include prompts/` | Every VFS path must be: * Relative (no leading `/` or `\`) * Forward-slash separated * Free of `..` segments * Free of NUL bytes * Free of Windows reserved device names (`CON`, `PRN`, `AUX`, `NUL`, `COM1`–`COM9`, `LPT1`–`LPT9`) Paths are validated at build time. Invalid paths cause a build error.
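
The path rules above can be sketched as a single validation function. This is an illustration, not the build command's actual code: `validate_vfs_path` and `is_reserved_device` are hypothetical names, and two details are assumptions of the sketch rather than statements from this page: device names are compared case-insensitively against the part of each segment before its first dot (so `con.txt` counts as `CON`), and the `COM`/`LPT` check covers digits 1 through 9, which is what Windows itself reserves.

```rust
/// True if `stem` names a Windows reserved device (case-insensitive).
fn is_reserved_device(stem: &str) -> bool {
    let upper = stem.to_ascii_uppercase();
    if matches!(upper.as_str(), "CON" | "PRN" | "AUX" | "NUL") {
        return true;
    }
    // COM1..COM9 and LPT1..LPT9 (assumed range, see the note above).
    for prefix in ["COM", "LPT"] {
        if let Some(rest) = upper.strip_prefix(prefix) {
            if rest.len() == 1 && matches!(rest.as_bytes()[0], b'1'..=b'9') {
                return true;
            }
        }
    }
    false
}

/// Check one VFS path against the rules listed above.
fn validate_vfs_path(path: &str) -> Result<(), String> {
    if path.starts_with('/') || path.starts_with('\\') {
        return Err(format!("{path:?}: path must be relative"));
    }
    if path.contains('\\') {
        return Err(format!("{path:?}: use forward slashes only"));
    }
    if path.contains('\0') {
        return Err(format!("{path:?}: NUL byte in path"));
    }
    for segment in path.split('/') {
        if segment == ".." {
            return Err(format!("{path:?}: `..` segments are not allowed"));
        }
        // Assumption: compare only the part before the first dot.
        let stem = segment.split('.').next().unwrap_or(segment);
        if is_reserved_device(stem) {
            return Err(format!("{path:?}: reserved device name {segment:?}"));
        }
    }
    Ok(())
}
```

Rejecting these paths at build time keeps the archive portable: a bundle built on Linux cannot contain an entry that would be unopenable, or would escape the bundle root, when the same paths are resolved on Windows.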
### Integrity The `archive_checksum` is a **CRC32-IEEE** checksum (polynomial `0xEDB88320`, same as gzip/zlib) computed over all archive bytes from offset 8 (after the checksum field) to the end of the archive. On load, the runtime recomputes the checksum and rejects the archive if it doesn't match. This detects accidental corruption but is not a cryptographic security feature. ## Runtime Startup When a Sema binary starts, **before** CLI argument parsing: 1. Try `libsui::find_section("semaexec")` for named section (macOS/Windows) 2. If not found: read last 16 bytes, check for `SEMAEXEC` magic (Linux/ELF) 3. If archive found: * Deserialize and validate CRC32 checksum * Populate thread-local VFS with all archive files * Read `entry-point` from metadata (default: `__main__.semac`) * Load and execute the bytecode * Exit with appropriate status code 4. If no archive found: proceed with normal CLI parsing (REPL/interpreter mode) ## VFS Interception When the VFS is active, the following functions check VFS first, then fall back to the real filesystem: | Function | Behavior | |----------|----------| | `(file/read path)` | Read UTF-8 text from VFS or filesystem | | `(file/read-bytes path)` | Read raw bytes from VFS or filesystem | | `(file/read-lines path)` | Read lines from VFS or filesystem | | `(file/exists? path)` | Check VFS first, then filesystem | | `(import "module")` | Resolve relative to VFS if active | | `(load "file.sema")` | Resolve relative to VFS if active | Write operations (`file/write`, `file/append`, `file/delete`, etc.) always target the real filesystem. ## Build Flow 1. **Compile** the entry file to bytecode (`.semac` format) 2. **Trace** all `(import ...)` and `(load ...)` dependencies recursively * Circular imports are detected and handled * Dynamic imports (non-literal paths) emit a warning 3. **Collect** `--include` assets (directories are expanded recursively) 4. **Build** VFS archive with metadata and CRC32 checksum 5. 
**Inject** archive into runtime binary (format-aware: ELF append, Mach-O/PE via libsui) 6. **Set** executable permissions on Unix ## Cross-Compilation `sema build --target ` produces executables for other platforms. Supported targets (matching the cargo-dist release matrix): | Triple | Aliases | |--------|---------| | `aarch64-apple-darwin` | `macos`, `darwin` | | `x86_64-apple-darwin` | `macos-intel`, `darwin-intel`, `macos-x86_64` | | `x86_64-unknown-linux-gnu` | `linux` | | `aarch64-unknown-linux-gnu` | `linux-arm`, `linux-aarch64` | | `x86_64-pc-windows-msvc` | `windows`, `win` | `--target all` builds for every supported target, producing one `-` executable each. Runtime binaries for non-host targets are downloaded from GitHub Releases (capped at 200 MB), verified against the published SHA256 checksum, and cached at `~/.sema/cache/runtimes/v{version}/{target}/sema[.exe]`. Cached runtimes are validated by magic bytes against the expected format for the target; `--no-cache` skips the cached copy and re-downloads. If the target matches the host, the local `sema` binary is used directly (no download, and `--no-cache` is a no-op). `SEMA_RUNTIME_BASE_URL` overrides the download location (for mirrors or air-gapped builds). Injection is format-aware rather than host-specific — `libsui` performs Mach-O ad-hoc signing in pure Rust, so e.g. macOS ARM64 binaries can be produced from Linux. 
## Platform Notes | Platform | Injection | Signing | Notes | |----------|-----------|---------|-------| | Linux (ELF) | Raw append + trailer | N/A | ELF loaders ignore appended data | | macOS (Mach-O) | `libsui` section injection | Ad-hoc re-signed | Re-sign with Developer ID for distribution | | Windows (PE) | `libsui` resource injection | Authenticode stripped | Re-sign with `signtool` if needed | ## Implementation | Component | File | |-----------|------| | Archive serialization | `crates/sema/src/archive.rs` | | Import tracer | `crates/sema/src/import_tracer.rs` | | Cross-compilation (runtime download/cache) | `crates/sema/src/cross_compile.rs` | | Build command | `crates/sema/src/main.rs` | | VFS core | `crates/sema-core/src/vfs.rs` | | VFS I/O interception | `crates/sema-stdlib/src/io.rs` | | Import/load VFS interception | `crates/sema-eval/src/special_forms.rs` | ## Future Work * **Compression** — optional zstd/deflate compression for VFS entries * **Build options in `sema.toml`** — declare includes, metadata, and build options in the project manifest (`sema.toml` exists today for dependencies and formatter config, but `sema build` does not read it) * **Slimmer runtime** — trim unused runtime components for smaller executables (requires architectural changes) * **Code signing** — proper Apple notarization / Authenticode signing integration --- --- url: 'https://sema-lang.com/docs/internals/evaluator.md' --- # Evaluator Internals Sema's evaluator is a bytecode VM. Every entry point — the CLI, the REPL, the embedding API, `eval`, `import`/`load`, macros, and async/await — compiles to bytecode and runs on the VM. For the architecture of the evaluator, see [Bytecode VM](./bytecode-vm.md). 
## The Evaluation Pipeline All Sema code follows one path from source text to a result: ``` Source text → Reader (tokenize + parse → Value AST) → Macro expand (expand macros) → Lower (Value AST → CoreExpr IR) → Optimize (constant folding + simplification on CoreExpr) → Resolve (CoreExpr → ResolvedExpr with slot/upvalue/global analysis) → Compile (ResolvedExpr → bytecode Chunks) → VM execution (stack-based dispatch loop) ``` Each phase is documented in [Bytecode VM](./bytecode-vm.md). Variables are resolved to direct slot/upvalue/global indices at compile time, closures use the Lua-style open-upvalue model, and tail calls reuse the current frame for tail-call optimization without growing the native Rust stack. ## Environment Model Sema uses a linked-list scope chain, where each scope is a `hashbrown::HashMap` keyed by `Spur`: ```rust // crates/sema-core/src/value.rs pub struct Env { pub bindings: Rc>>, pub parent: Option>, pub version: Cell, } ``` `Rc>` makes each scope mutable and reference-counted. `SpurMap` is an alias for `hashbrown::HashMap` — keys are interned `Spur` handles (`u32`), so hashing is cheap integer hashing rather than string hashing. The `version` counter is bumped on every mutation; the VM's per-instruction inline caches use it to invalidate stale global lookups. 
### Operations | Method | Behavior | | ------------------------- | ------------------------------------------------------------- | | `get(spur)` | Walk the parent chain, return first match | | `set(spur, val)` | Insert into the current (innermost) scope | | `set_existing(spur, val)` | Walk the chain, update where found (for `set!`) | | `take(spur)` | Remove from current scope only (for COW optimization) | | `take_anywhere(spur)` | Remove from any scope in the chain | | `update(spur, val)` | Overwrite an existing binding in the current scope (for hot loops) | The `take` method is critical for the copy-on-write map optimization described in the [Performance](./performance.md) page — by removing a value from the environment before passing it to a function, the `Rc` reference count drops to 1, enabling in-place mutation. **Literature:** This is the standard lexical environment model described in *Lisp in Small Pieces* (Queinnec, 1996, Chapter 6) — a chain of frames linked by static (lexical) pointers. The alternative for lexical scoping — flat closures that copy all free variables into each closure — is faster for lookup but uses more memory when closures share large environments. Sema uses the chained model because closures are pervasive and lookup cost is dominated by the `Spur` integer comparison, not chain traversal. 
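
To make the chained model concrete, here is a deliberately simplified sketch of a scope chain with four of the operations from the table. It is not the real `Env`: keys are plain `u32` values standing in for `Spur`, values are `i64` instead of `Value`, `std::collections::HashMap` replaces `hashbrown`, and the choice of which scope's `version` to bump is an assumption of the sketch.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

type Key = u32; // stands in for an interned `Spur`

struct Env {
    bindings: Rc<RefCell<HashMap<Key, i64>>>,
    parent: Option<Rc<Env>>,
    version: Cell<u64>,
}

impl Env {
    fn new(parent: Option<Rc<Env>>) -> Self {
        Env {
            bindings: Rc::new(RefCell::new(HashMap::new())),
            parent,
            version: Cell::new(0),
        }
    }

    fn bump(&self) {
        self.version.set(self.version.get() + 1);
    }

    /// Walk the parent chain and return the first match.
    fn get(&self, key: Key) -> Option<i64> {
        let local = self.bindings.borrow().get(&key).copied();
        local.or_else(|| self.parent.as_ref().and_then(|p| p.get(key)))
    }

    /// Insert into the current (innermost) scope.
    fn set(&self, key: Key, val: i64) {
        self.bindings.borrow_mut().insert(key, val);
        self.bump();
    }

    /// Walk the chain and update the binding where it is found (as `set!` needs).
    fn set_existing(&self, key: Key, val: i64) -> bool {
        if let Some(slot) = self.bindings.borrow_mut().get_mut(&key) {
            *slot = val;
            self.bump();
            return true;
        }
        self.parent
            .as_ref()
            .map_or(false, |p| p.set_existing(key, val))
    }

    /// Remove from the current scope only.
    fn take(&self, key: Key) -> Option<i64> {
        let removed = self.bindings.borrow_mut().remove(&key);
        if removed.is_some() {
            self.bump();
        }
        removed
    }
}
```

With `i64` values the copy-on-write benefit of `take` is invisible; in the real environment the value is reference-counted, so removing it from the scope is what lets the caller become its sole owner.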
## Further Reading * Christian Queinnec, [*Lisp in Small Pieces*](https://www.cambridge.org/core/books/lisp-in-small-pieces/66FD2BE3EDDDC68588A4605F14A4D2A4) (Cambridge, 1996) — the canonical deep-dive into Lisp interpreter and compiler implementation, covering environment models, continuations, and compilation strategies * Guy Lewis Steele Jr., ["Rabbit: A Compiler for Scheme"](https://dspace.mit.edu/handle/1721.1/6913) (MIT AI Lab Technical Report 474, 1978) — shows that tail calls can be implemented as jumps * Abelson & Sussman, [*Structure and Interpretation of Computer Programs*](https://mitpress.mit.edu/9780262510875/structure-and-interpretation-of-computer-programs/) (MIT Press, 1996) — Chapter 5 shows how to compile to a register machine * R. Kent Dybvig, ["Three Implementation Models for Scheme"](https://www.cs.indiana.edu/~dyb/pubs/3imp.pdf) (PhD thesis, 1987) — compares heap-based, stack-based, and string-based models; Sema uses heap-based (Rc+RefCell scopes) --- --- url: 'https://sema-lang.com/docs/internals/reader.md' --- # Reader Internals Sema's reader is a two-phase pipeline: a lexer tokenizes source text into `SpannedToken`s, then a recursive descent parser produces `Value` nodes directly — there is no intermediate AST. Source locations are tracked per-token and attached to compound values via an `Rc::as_ptr` trick that avoids growing the NaN-boxed `Value`. This page documents the lexer, parser, token types, quote desugaring, span tracking, and how the evaluator recovers source positions for error reporting. ## The Lexer The lexer in `crates/sema-reader/src/lexer.rs` is a single-pass tokenizer that walks a `Vec` with a manual index `i` and tracks `line`/`col` for span information. Character-level dispatch drives the lexer.
Each iteration inspects the current character and branches: * **Spaces/tabs** — skipped, advances `col` * **Newline** — emits `Token::Newline` (trivia: the parser skips it, but the formatter and LSP use it) * **`;`** — comment to end of line, emitted as `Token::Comment` (also trivia) * **`(`/`)`/`[`/`]`/`{`/`}`** — emit the corresponding bracket token * **`'`** — emit `Token::Quote` * **`` ` ``** — emit `Token::Quasiquote` * **`,`** — peek ahead: `,@` emits `Token::UnquoteSplice`, otherwise `Token::Unquote` * **`"`** — enter string mode, handle escape sequences * **`#`** — dispatch on next char: `#t`/`#f` for booleans, `#\` for character literals, `#u8(` for bytevector start, `#(` for short lambdas, `#"` for regex literals (raw strings, no escape processing), `#!` for a shebang line (line 1 only) * **`:`** — keyword (Clojure-style `:foo`) * **`f` followed by `"`** — f-string, accumulating literal parts and `${expr}` interpolations * **Digit or `-` followed by digit** — number (integer or float) * **Otherwise** — symbol character, accumulate until delimiter Every token is wrapped in a `SpannedToken` that records where it begins and ends — both as line/column positions and as byte offsets into the source string (the byte offsets enable exact source extraction for the formatter and LSP). This is the only place source positions enter the system — everything downstream inherits or discards them. ```rust // crates/sema-reader/src/lexer.rs pub struct SpannedToken { pub token: Token, pub span: Span, /// Byte offset of the start of this token in the source string. pub byte_start: usize, /// Byte offset past the end of this token in the source string. 
pub byte_end: usize, } // crates/sema-core/src/error.rs pub struct Span { pub line: usize, pub col: usize, pub end_line: usize, pub end_col: usize, } ``` ## Token Zoo The full `Token` enum: | Token | Syntax | Example | | ----------------------- | ---------------- | ----------------------- | | `LParen` / `RParen` | `(` `)` | `(+ 1 2)` | | `LBracket` / `RBracket` | `[` `]` | `[1 2 3]` | | `LBrace` / `RBrace` | `{` `}` | `{:a 1 :b 2}` | | `Quote` | `'` | `'foo` | | `Quasiquote` | `` ` `` | `` `(a ,b) `` | | `Unquote` | `,` | `,x` | | `UnquoteSplice` | `,@` | `,@xs` | | `Int(i64)` | digits | `42`, `-7` | | `Float(f64)` | digits with `.` | `3.14`, `-0.5` | | `String(String)` | `"..."` | `"hello"` | | `FString(Vec)` | `f"..."` | `f"hi ${name}"` | | `Regex(String)` | `#"..."` | `#"\d+"` | | `ShortLambdaStart` | `#(` | `#(+ % 1)` | | `Symbol(String)` | identifier | `define`, `string/trim` | | `Keyword(String)` | `:` + name | `:key`, `:name` | | `Bool(bool)` | `#t` / `#f` | `#t` | | `Char(char)` | `#\` + char/name | `#\a`, `#\space` | | `BytevectorStart` | `#u8(` | `#u8(1 2 3)` | | `Dot` | `.` | `(a . b)` | | `Comment(String)` | `;...` | `; note` (trivia) | | `Newline` | line break | (trivia) | Symbol characters include alphanumeric plus `+ - * / ! ? < > = _ & % ^ ~ .` — a superset of Scheme's identifier syntax that allows operators and predicates like `nil?` or `string/to-number` as plain symbols. After the first character, `#` is also accepted, which is what makes auto-gensym names like `x#` (used inside quasiquote templates) lex as plain symbols. Booleans accept both `#t`/`#f` (R7RS) and `true`/`false` (as symbol aliases resolved during tokenization). ## The Parser The parser in `crates/sema-reader/src/reader.rs` is a recursive descent parser that consumes the `Vec<SpannedToken>` produced by the lexer.
It's structured as a `Parser` struct with a position index, dispatching on the current token type: ``` parse_expr ├── LParen → parse_list → Value::List ├── LBracket → parse_vector → Value::Vector ├── LBrace → parse_map → Value::Map ├── Quote → desugar → Value::List [quote, x] ├── Quasiquote→ desugar → Value::List [quasiquote, x] ├── Unquote → desugar → Value::List [unquote, x] ├── UnquoteSplice → desugar → Value::List [unquote-splicing, x] ├── BytevectorStart → parse_bytevector → Value::Bytevector ├── ShortLambdaStart → parse_short_lambda → (lambda (%1 …) body) ├── Int → Value::Int ├── Float → Value::Float ├── String → Value::String ├── FString → desugar → Value::List [str, part, …] ├── Regex → Value::String (raw, no escape processing) ├── Symbol → Value::Symbol ├── Keyword → Value::Keyword ├── Bool → Value::Bool └── Char → Value::Char ``` Each compound form has its own parsing method: * **`parse_list`** — collects expressions until `)`, handling dotted pairs (see below) * **`parse_vector`** — collects expressions until `]`, wraps in a vector value via `Value::vector_from_rc` * **`parse_map`** — collects key-value pairs until `}`, wraps a `BTreeMap` via `Value::map`. Odd element count is a parse error * **`parse_bytevector`** — collects integers until `)`, validates each is 0–255, wraps in a bytevector via `Value::bytevector` * **`parse_short_lambda`** — collects the body until `)`, scans it for `%`/`%1`/`%2`… (rewriting bare `%` to `%1`), and produces `(lambda (%1 … %N) body)` F-strings and regex literals are desugared in `parse_atom`: an f-string becomes a `(str "literal" expr …)` call with each `${...}` interpolation parsed recursively, and a regex literal becomes a plain string value with its contents taken raw (no escape processing). The parser produces `Value` nodes directly. There is no separate AST type — the same `Value` type used at runtime is the representation of parsed code. This is the Lisp tradition: code is data, and the reader produces data. 
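The `parse_short_lambda` rewrite described above can be sketched over a toy s-expression type. This is a minimal illustration, not Sema's implementation: `Sexp`, `rewrite_placeholders`, and `desugar_short_lambda` are hypothetical names, and the real parser works on `Value` and may treat edge cases differently.

```rust
// Toy s-expression type standing in for Sema's `Value` (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Sexp {
    Int(i64),
    Sym(String),
    List(Vec<Sexp>),
}

/// Rewrites bare `%` to `%1` in place and returns the highest `%N` found.
fn rewrite_placeholders(e: &mut Sexp) -> usize {
    match e {
        Sexp::Sym(s) if s.as_str() == "%" => {
            *s = "%1".to_string();
            1
        }
        // `%2`, `%3`, ... report their index; any other symbol reports 0.
        Sexp::Sym(s) => s
            .strip_prefix('%')
            .and_then(|n| n.parse::<usize>().ok())
            .unwrap_or(0),
        Sexp::List(items) => items
            .iter_mut()
            .map(|item| rewrite_placeholders(item))
            .max()
            .unwrap_or(0),
        Sexp::Int(_) => 0,
    }
}

/// Wraps a short-lambda body as `(lambda (%1 ... %N) body)`.
fn desugar_short_lambda(mut body: Sexp) -> Sexp {
    let arity = rewrite_placeholders(&mut body);
    let params = (1..=arity).map(|i| Sexp::Sym(format!("%{i}"))).collect();
    Sexp::List(vec![Sexp::Sym("lambda".into()), Sexp::List(params), body])
}
```

Under these assumptions, the body of `#(+ % 1)` comes out as `(lambda (%1) (+ %1 1)`, closed by the outer list: the parameter list is sized by the highest placeholder index, so a body that mentions only `%2` still gets two parameters.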
> **Comparison:** Racket's reader is configurable with [readtables](https://docs.racket-lang.org/reference/readtables.html) — user code can define new reader syntax. Common Lisp goes further with [reader macros](https://www.lispworks.com/documentation/HyperSpec/Body/02_d.htm) that can override any character's parsing behavior. Sema has neither — quote sugar is hardcoded in the lexer, and there's no mechanism for user-defined reader extensions. This is a deliberate simplicity trade-off: the reader is predictable, the implementation is ~1,400 lines (lexer + parser, excluding tests), and all syntax is documented in one place. See Nystrom's [*Crafting Interpreters*](https://craftinginterpreters.com/parsing-expressions.html) for a thorough treatment of recursive descent parsing, or Aho et al., *Compilers: Principles, Techniques, and Tools* (the Dragon Book), §4.4 for the theory. ## Quote Desugaring The reader desugars quote syntax into real lists *before the evaluator ever sees them*. This is important: `'x` is not a special syntactic form that the evaluator handles — it's reader sugar that produces a `(quote x)` list. | Syntax | Desugars to | Reader token | | -------- | ---------------------- | ---------------------- | | `'x` | `(quote x)` | `Token::Quote` | | `` `x `` | `(quasiquote x)` | `Token::Quasiquote` | | `,x` | `(unquote x)` | `Token::Unquote` | | `,@x` | `(unquote-splicing x)` | `Token::UnquoteSplice` | When the parser encounters a `Quote` token, it: 1. Consumes the next expression (recursive `parse_expr` call) 2. Wraps it: `make_list_with_span(vec![Value::symbol("quote"), expr], span)` 3. Attaches the quote token's span to the resulting list The evaluator then sees `(quote x)` as a normal list whose `car` is the symbol `quote` — which it handles as a special form. The same applies to `quasiquote`, which the evaluator expands recursively (handling nested `unquote` and `unquote-splicing` within templates). 
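The wrapping step can be sketched in a few lines. This is an illustrative sketch only: `Datum` and `QuoteToken` are hypothetical stand-ins for Sema's `Value` and `Token`, and span attachment is left out.

```rust
// Hypothetical stand-ins for Sema's `Value` and the four quote-family tokens.
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Symbol(String),
    Int(i64),
    List(Vec<Datum>),
}

#[derive(Debug, Clone, Copy)]
enum QuoteToken {
    Quote,
    Quasiquote,
    Unquote,
    UnquoteSplice,
}

/// Turns reader sugar into an ordinary two-element list: `(head expr)`.
fn desugar(tok: QuoteToken, expr: Datum) -> Datum {
    let head = match tok {
        QuoteToken::Quote => "quote",
        QuoteToken::Quasiquote => "quasiquote",
        QuoteToken::Unquote => "unquote",
        QuoteToken::UnquoteSplice => "unquote-splicing",
    };
    Datum::List(vec![Datum::Symbol(head.to_string()), expr])
}
```

Because the result is plain list data, nesting composes with no extra machinery: desugaring the inner `,b` first and then the outer backquote yields `(quasiquote (a (unquote b)))`.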
The key distinction: the *syntax* (`` ` , ,@ ' ``) is reader-level, but the *semantics* (what `quasiquote` does with its template) is evaluator-level. The reader's job is just to produce the list structure. ## Dotted Pairs Sema supports dotted pair notation `(a . b)` for compatibility with Scheme's cons-cell tradition, but the representation is unconventional. Since `Value::List` wraps a `Vec` (not a linked list of cons cells), dotted pairs are represented using a marker symbol: ```sema (a . b) ;; parses as a list of three elements: [a, ".", b] (1 2 . 3) ;; parses as: [1, 2, ".", 3] ``` The parser's `parse_list` method detects `Token::Dot` and inserts `Value::symbol(".")` into the element list. The evaluator and printer check for this marker when they need to distinguish `(a b c)` from `(a b . c)`. This is a pragmatic compromise. Real Scheme implementations use linked cons cells where `(a . b)` is `cons(a, b)` — the dot is the *absence* of a list, not a marker within one. Sema's Vec-based representation can't express improper lists natively, so the dot marker serves as an escape hatch for the few places that need it (mostly association lists and Scheme compatibility). ## String Escapes The lexer handles common R7RS escape sequences plus Unicode extensions: | Escape | Character | Notes | | ------------ | --------------- | ------------------------------------------- | | `\n` | newline | | | `\t` | tab | | | `\r` | carriage return | | | `\\` | backslash | | | `\"` | double quote | | | `\0` | null | | | `\$` | dollar sign | suppresses `${...}` interpolation in f-strings | | `\x41;` | `A` (hex 0x41) | R7RS — note the trailing semicolon | | `\u0041` | `A` | 4-digit Unicode escape | | `\U00000041` | `A` | 8-digit Unicode escape (full Unicode range) | The R7RS hex escape `\x;` uses a semicolon terminator, which is unusual — most languages use a fixed digit count. This allows variable-length hex sequences: `\x41;` and `\x041;` are both valid and produce the same character. 
The semicolon disambiguates where the hex digits end.

The `\uNNNN` and `\UNNNNNNNN` forms follow the C/Java convention of fixed-width escapes. These are Sema extensions not found in R7RS.

Character literals follow a similar pattern:

| Literal     | Character         |
| ----------- | ----------------- |
| `#\a`       | the character `a` |
| `#\space`   | space             |
| `#\newline` | newline           |
| `#\tab`     | tab               |
| `#\return`  | carriage return   |
| `#\nul`     | null              |

## Span Tracking

This is the most architecturally interesting part of the reader. The problem: error messages need source locations ("line 12, column 5"), but `Value` is a NaN-boxed 8-byte handle (a single `u64`). There is no room for a `Span` inside it, and growing every value in the system to make room would defeat the point of NaN-boxing, including for runtime values that were never parsed from source.

**The solution:** spans are stored in a side table keyed by `Rc` pointer addresses.

```rust
// crates/sema-reader/src/reader.rs
fn make_list_with_span(&mut self, items: Vec<Value>, span: Span) -> Result<Value, SemaError> {
    let rc = Rc::new(items);
    let ptr = Rc::as_ptr(&rc) as usize;
    self.span_map.insert(ptr, span);
    Ok(Value::list_from_rc(rc))
}
```

The `SpanMap` is a `HashMap<usize, Span>`: it maps `Rc::as_ptr()` cast to `usize` to the source span. This works because:

1. **`Rc::as_ptr` is stable**: for a given `Rc`, the inner pointer doesn't change as long as the `Rc` (or any clone of it) is alive
2. **Clones share the pointer**: `Rc::clone()` increments the refcount but doesn't change the underlying pointer, so a cloned list still maps to the same span
3. **No cost to non-compound values**: atoms (integers, strings, symbols) don't get spans.

Both `Value::List` and `Value::Vector` participate in span tracking: the reader inserts their `Rc::as_ptr()` addresses into the span map.
However, the evaluator's `span_of_expr` currently only recovers spans from `Value::List`; vector spans are tracked by the reader but not used during error reporting.

**The trade-off:** when the `Rc` is deallocated, its pointer address could be reused by a new allocation, producing a stale span lookup. In practice this is a minor diagnostic risk rather than a correctness issue: a wrong span in an error message is better than no span. The span table accumulates entries across parsed inputs; in long-running processes (REPL, embedding), stale entries could theoretically produce misleading source locations, though this has not been observed in practice.

Also, atoms never carry spans, and only list spans are consulted at error time. An error in evaluating an atom like `undefined-var` has no direct span, so the evaluator must fall back to the span of the enclosing list expression.

### Span Recovery in the Evaluator

The span table is a field in `EvalContext`, populated when source is parsed via `ctx.merge_span_table(spans)`:

```rust
// crates/sema-eval/src/eval.rs
fn span_of_expr(ctx: &EvalContext, expr: &Value) -> Option<Span> {
    if let Some(items) = expr.as_list_rc() {
        let ptr = Rc::as_ptr(&items) as usize;
        ctx.lookup_span(ptr)
    } else {
        None
    }
}
```

The evaluator calls `span_of_expr` when constructing error messages, attaching the source position of the failing expression to the `SemaError`. This flows through the call stack and ultimately appears in error output like:

```
Error at line 12, col 5: undefined variable 'foo'
```

## Error Reporting

Errors flow through two mechanisms:

1. **`SemaError::Reader`**: carries a `Span` directly for parse-time errors (unmatched brackets, invalid escape sequences, unexpected tokens). These are produced by the lexer and parser before evaluation begins.
2. **`CallFrame` + span table**: for runtime errors, the evaluator maintains a call stack of `CallFrame`s. When an error occurs, it walks the stack, using `span_of_expr` to find source positions for each frame.
This produces stack traces with source locations even though `Value` itself carries no span.

The combination means parse errors report exact positions (the lexer knows where every token starts), while runtime errors report the position of the enclosing list expression (the best available approximation from the span table).

## Public API

The reader exposes five entry points:

```rust
// crates/sema-reader/src/reader.rs

/// Parse a single expression from input
pub fn read(input: &str) -> Result<Value, SemaError>

/// Parse all expressions from input
pub fn read_many(input: &str) -> Result<Vec<Value>, SemaError>

/// Parse all expressions and return the span map for error reporting
pub fn read_many_with_spans(input: &str) -> Result<(Vec<Value>, SpanMap), SemaError>

/// Parse all expressions, also returning per-symbol spans
/// (enables precise go-to-definition in the LSP)
pub fn read_many_with_symbol_spans(input: &str) -> Result<(Vec<Value>, SpanMap, Vec<(String, Span)>), SemaError>

/// Parse with error recovery: on a parse error, skip to the next
/// top-level form and continue, collecting all errors
pub fn read_many_with_spans_recover(input: &str) -> (Vec<Value>, SpanMap, Vec<(String, Span)>, Vec<SemaError>)
```

`read_many_with_spans` is what the evaluator uses, because it needs the span map to populate the `EvalContext`'s span table. `read_many_with_spans_recover` is what the LSP uses: it never bails on the first error, so diagnostics, completions, and navigation keep working while the file is mid-edit. The simpler `read` and `read_many` are convenience wrappers for contexts where error positions aren't needed (tests, REPL one-liners).
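To make the role of the returned span map concrete, the side-table idea from the Span Tracking section can be sketched with only the standard library. This is a simplified illustration: `list_with_span` and `span_of` are hypothetical helpers, lists hold plain `i64`s instead of `Value`s, and the `Span` here has only two fields.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Reduced span type for the sketch; the real `Span` also records end positions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    line: usize,
    col: usize,
}

/// Side table: allocation address of the list -> where it was parsed.
type SpanMap = HashMap<usize, Span>;

/// Allocates a list and records its span under the `Rc`'s pointer address.
fn list_with_span(items: Vec<i64>, span: Span, spans: &mut SpanMap) -> Rc<Vec<i64>> {
    let rc = Rc::new(items);
    spans.insert(Rc::as_ptr(&rc) as usize, span);
    rc
}

/// Looks a list's span up by address; lists never seen by the reader miss.
fn span_of(list: &Rc<Vec<i64>>, spans: &SpanMap) -> Option<Span> {
    spans.get(&(Rc::as_ptr(list) as usize)).copied()
}
```

A clone made with `Rc::clone` resolves to the same span because it shares the allocation, while a structurally equal list built at runtime has a different address and resolves to `None`. That second case is why the lookup returns an `Option`.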
## Pipeline Summary ``` Source text │ ▼ tokenize() crates/sema-reader/src/lexer.rs │ "single-pass" │ "produces Vec" ▼ Parser::parse() crates/sema-reader/src/reader.rs │ "recursive descent" │ "produces Vec + SpanMap" ▼ ctx.merge_span_table() crates/sema-eval/src/eval.rs │ "populates EvalContext span table" ▼ eval() crates/sema-eval/src/eval.rs │ "trampoline-based TCO evaluator" │ "recovers spans via Rc::as_ptr lookup" ▼ Value result ``` --- --- url: 'https://sema-lang.com/docs/internals/fuzzing.md' --- # Fuzzing the VM Sema is fuzzed two ways: byte-level fuzzers that hammer the parser frontend, and a **grammar-based fuzzer written in Sema itself** that generates *valid* programs and checks them against correctness oracles. The second is the interesting one — it has already found **two real, shipped bugs**: a VM crash and a case of silent integer corruption. ## The hard part of fuzzing: the oracle Generating random input is easy. The hard part is the **oracle** — the judge that decides whether a given input revealed a bug. A crash-only fuzzer has a trivial oracle ("did it panic?") but is blind to the far more common failure mode: code that runs fine and silently returns the *wrong answer*. An oracle is what catches those. Sema's grammar fuzzer (`fuzz/grammar-fuzz.sema`) leans on **homoiconicity** — a generated program is just an ordinary Sema value — to get two sharp oracles almost for free, plus crash detection: ### 1. Round-trip oracle (printer ⇄ reader) ```scheme (= form (read (str form))) ``` Generate arbitrary valid s-expression *data* (atoms of every kind, nested lists, vectors, maps), print it, read it back, and assert structural equality. Any asymmetry between the printer and the reader falls straight out. ### 2. 
Differential value oracle (compiler/VM) For a generated *program*, compute its **expected** value bottom-up *while generating it* — applying the real primitive ops to the already-known sub-values — then `eval` the whole nested form through the full `macro-expand → lower → optimize → compile → bytecode-VM` pipeline and compare: ```scheme (= expected (eval form)) ``` The expected value is the oracle. Because it's computed by straight-line, bottom-up evaluation while the form is run through the optimizing compiler and VM, a mismatch means the **compiler/optimizer/VM disagrees with the obvious answer** — constant folding, `if`/`let` lowering, closure capture, TCO, short-circuit logic, stack management, and so on. ### 3. Metamorphic / differential laws The value oracle has one blind spot, and it's a sharp one: it computes `expected` by *calling the very operation under test*. If a native op like `reverse` is broken, both the oracle's `expected` and the form's `actual` route through the same broken `reverse` — they agree on the wrong answer, and the bug is invisible. (Verified the hard way: a deliberately no-op `reverse` slipped past 100,000 iterations of the value oracle.) Arithmetic escapes this only because the oracle computes it via the *native* builtin while the compiled form hits a different *inline opcode* — two implementations, so a divergence shows. To cover the rest, the fuzzer also generates **metamorphic laws** — theorems whose expected value is the literal `#t`, cross-checking an op against an *independent* computation: ```scheme (= (reverse L) (foldl (fn (a x) (cons x a)) (list) L)) ; reverse vs fold-cons (= (append (take n L) (drop n L)) L) ; take/drop partition (= (length L) (+ (length (filter even? L)) ; filter partition (length (filter odd? 
L)))) (= (* a (+ b c)) (+ (* a b) (* a c))) ; distributivity ``` Because the expected is `#t` by construction (not computed by running the op), a broken op makes the two sides disagree → the law evaluates to `#f ≠ #t` → caught. This is the oracle that found the integer-corruption bug below. ### 4. Crash detection Release builds are `panic=abort`, so a VM panic kills the process. The driver (`scripts/grammar-fuzz.sh`) writes the in-flight seed to a breadcrumb file before each iteration, so even a hard abort is reproducible from a single integer. ## How the generator stays sound Every generated program is **well-typed and closed** — it references only variables it has bound, and each sub-expression has a known type and value. The generator threads an environment of `(symbol value type)` triples so every variable reference is in scope and type-correct, and it computes each form's expected value as it builds it. Types covered: `int`, `bool`, `float`, `string`, `list`, `vector`, `map`. Everything is driven by a small, self-contained, seedable PRNG, so **every finding reproduces from one integer**. Iteration `i` uses seed `base + i`, re-seeding each time: ``` SEMA_FUZZ_SEED= SEMA_FUZZ_COUNT=1 # reproduce a single finding ``` ### What it covers Arithmetic (`+ - *` incl. 
variadic, `min`/`max`, `mod`, `abs`, unary `-`, `expt`), bitwise ops (`bit/and|or|xor`, shifts, `bit/not`), all comparisons, numeric and type predicates (`even?`/`zero?`/…, `string?`/`list?`/`map?`/`vector?`/`bool?`/`nil?`), `and`/`or`/`not`, `if`, `cond`, `case`, `match` (including binding clauses), multi-binding mixed-type `let`, multi-arg and curried lambdas, `try`/`throw`/`catch`, `apply`, named-let TCO recursion at large N, and a broad set of list/vector/map/string operations (`map`/`filter`/`foldl`/`reverse`/`append`/`cons`/`range`/`take`/`drop`/`sort`/`nth`/`length`/`last`, `assoc`/`dissoc`/`get`/`count`/`contains?`/`merge`/`keys`/`vals`, `string-append`/`substring`/`upcase`/`downcase`/`string/repeat`/`number->string`/…). **Concurrency** is fuzzed too, exploiting the fact that Sema's scheduler is cooperative and FIFO — i.e. deterministic. Only patterns whose result is computable regardless of interleaving are generated: `(async/all (list (async T) …))` preserves spawn order (so the result is exactly predictable), and channel fan-in is reduced order-independently (`sum`). The task bodies `T` are ordinary generated programs, and `async` captures enclosing locals, so this also exercises closures crossing task boundaries and the in-VM higher-order-callback path. **Excluded by design:** anything the value oracle can't model soundly. That means non-determinism — LLM calls, time, randomness, `uuid`, file/network I/O, and the timing-dependent async primitives (`async/sleep`, `async/timeout`, `async/race`, cancellation) — *and* `set!`: the value oracle assumes every sub-expression is referentially transparent (same value however many times it's evaluated), and `set!` is the lone impurity, so combined with async or law-style duplication its evaluation count diverges from the bottom-up model and produces false positives. (`set!` correctness is covered by the eval test suite instead.) 
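The self-masking blind spot of the value oracle, and how a metamorphic law gets around it, can be reproduced in miniature. The sketch below is written in Rust rather than Sema purely for illustration; `broken_reverse`, `value_oracle_passes`, and `reverse_law_holds` are hypothetical names, and the "bug" is the same deliberate no-op `reverse` mentioned earlier.

```rust
/// Deliberately broken operation under test: a `reverse` that does nothing.
fn broken_reverse(xs: &[i64]) -> Vec<i64> {
    xs.to_vec()
}

/// Value oracle: `expected` is computed by calling the very op under test,
/// so both sides share the bug and always agree.
fn value_oracle_passes(xs: &[i64]) -> bool {
    let expected = broken_reverse(xs);
    let actual = broken_reverse(xs);
    expected == actual
}

/// Metamorphic law: compare against an independent fold that prepends each
/// element, mirroring the `foldl`/`cons` law in the text.
fn reverse_law_holds(xs: &[i64]) -> bool {
    let independent = xs.iter().fold(Vec::new(), |mut acc, &x| {
        acc.insert(0, x);
        acc
    });
    broken_reverse(xs) == independent
}
```

On `[1, 2, 3]` the value oracle reports success while the law fails, which is the signal the fuzzer needs. Palindromic inputs such as `[4, 4]` still satisfy the law, so catching the bug depends on the generator producing varied lists.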
## Running it ```bash make fuzz-grammar # default sweep (random seed) make fuzz-grammar SEED=123 N=20000 DEPTH=6 # pinned, larger, deeper make fuzz-grammar-emit # print sample generated programs ``` Exit status: `0` all clear, `1` a deterministic value/round-trip mismatch (the program prints the offending form, expected, actual, and the reproducing seed), `2` a hard crash (the driver prints the reproducing seed). ## Case studies: two real bugs it found **1 — A crash (`try` in a `let` binding).** Expanding the grammar to cover `try`/`catch` immediately produced a crash. Minimized: ```scheme (let ((a 1) (b (try (throw 1) (catch e 2)))) b) ; aborted instead of returning 2 ``` A throwing `try`/`catch` used as a **non-first binding in a parallel `let`** corrupted the operand stack. The compiler pushed all binding inits onto the operand stack before storing them but didn't track the stack height for those pushes, so the exception handler restored the stack *below* the earlier already-pushed bindings; the subsequent stores and local-slot reads then went out of bounds. (`let*`, `letrec`, and function calls tracked the height correctly and were unaffected.) The fix was a few lines in `compile_let`; afterward, **715,000 generated programs up to depth 9 ran with zero crashes and zero value-oracle mismatches.** **2 — Silent integer corruption (caught by a metamorphic law).** The distributivity law `(= (* a (+ b c)) (+ (* a b) (* a c)))` failed for some large operands. Minimized: ```scheme (let ((a 9000000000000)) (+ a a)) ; => -17184372088832, should be 18000000000000 ``` The branchless inline `+`/`-` opcodes did raw-bit arithmetic on the NaN-box payload and masked to 45 bits with **no overflow check**, so any runtime add/subtract whose result crossed the small-int boundary (~±17.5 trillion / 2⁴⁴) was silently truncated. `*` was already correct (it builds via `Value::int`, which promotes to a boxed integer on overflow); literal operands were masked by constant folding. 
This is exactly the kind of bug the plain value oracle is blind to (it computes `expected` via the native builtin, which was fine — only the inline opcode was wrong), and it took the *metamorphic* law, which forces large intermediate products into a 2-arg add, to expose it. The fix made `+`/`-` mirror `*`. Both bugs were shipped, both gave wrong answers on perfectly valid code, and both were invisible to the entire test suite until the fuzzer's grammar (and oracle) reached the relevant corner. ## Extending the grammar To teach the fuzzer a new construct, add a production to the generator for the result type (`gen-int`, `gen-bool`, `gen-flt`, `gen-str`, `gen-ilist`, `gen-vec`, `gen-map-v`), building the form and its expected value together. Two rules: * **Determinism.** If a construct's result can't be predicted while generating it, it has no oracle and doesn't belong here. * **Mind the self-masking trap.** If you compute `expected` by calling the same operation the form uses (true for any native builtin with a single implementation), a bug in that op hides itself. For those, add a **metamorphic law** in `gen-law` instead — a theorem cross-checking the op against an *independent* computation — so the oracle is genuinely independent of the implementation. And keep impure constructs (anything with a side effect, like `set!`) out: the value oracle's bottom-up model only holds for referentially-transparent expressions. --- --- url: 'https://sema-lang.com/docs/internals/performance.md' --- # Performance Internals Sema's evaluator is a bytecode VM (see [Bytecode VM](./bytecode-vm.md)). It reached that design through a long optimization journey. Early optimizations brought the [1 Billion Row Challenge](https://github.com/gunnarmorling/1brc) benchmark from **~25s to ~9.6s** on 10M rows using a "mini-eval" — a minimal evaluator inlined in the stdlib that bypassed the full trampoline. 
The mini-eval was later **removed** for architectural reasons (semantic drift from the real evaluator, and blocking the path to a bytecode VM). Fast-path optimizations in the (then) tree-walking evaluator partially recovered performance, bringing it to **~2,700ms on 1M rows** (vs ~960ms with the mini-eval). The bytecode VM now achieves **~1,150ms on 1M rows** and **~12,600ms on 10M rows**, recovering most of the mini-eval's performance (within ~1.2× on 1M rows) through compilation rather than inlining. (The tree-walking interpreter has since been retired entirely.)

This page documents each optimization, its history, and measured impact. All benchmarks were run on Apple Silicon (M-series), processing the 1BRC dataset (semicolon-delimited weather station readings, one per line).

## Benchmark Summary

| Stage              | 1M rows       | 10M rows       | Technique                          | Status                   |
| ------------------ | ------------- | -------------- | ---------------------------------- | ------------------------ |
| Baseline           | 2,501 ms      | ~25,000 ms     | Naive implementation               | —                        |
| + COW assoc        | 1,800 ms      | ~18,000 ms     | In-place map mutation              | ✅ Active                |
| + Env reuse        | 1,626 ms      | 16,059 ms      | Lambda env recycling (mini-eval)   | ❌ Removed               |
| + Mini-eval        | ~960 ms       | ~9,600 ms      | Inlined builtins, custom parser    | ❌ Removed               |
| + String interning | —             | —              | Spur-based dispatch                | ✅ Active                |
| + hashbrown        | —             | —              | Amortized O(1) accumulator         | ✅ Active                |
| **Post-removal**   | **~2,700 ms** | **~29,700 ms** | Callback architecture + fast paths | ⏸ Tree-walker (retired)  |
| **Bytecode VM**    | **~1,150 ms** | **~12,600 ms** | Bytecode VM (sole evaluator)       | ✅ Current               |

> **Note:** The mini-eval and its associated optimizations (env reuse, inlined builtins, custom number parser, SIMD split fast path) were removed to unblock the bytecode VM, which has since become Sema's sole evaluator.
> The bytecode VM provides a ~2.4× speedup over the (now-retired) tree-walker (~1,150ms vs ~2,700ms on 1M rows), recovering most of the mini-eval's performance (~960ms) through compilation. Fast-path optimizations (self-evaluating short-circuit, inline NativeFn dispatch, thread-local EvalContext, deferred cloning) partially recovered the tree-walker's performance before it was retired.

> **VM compute benchmarks** (Feb 2026, post-stdlib intrinsics): TAK 1,248ms, upvalue-counter 450ms, deriv 887ms. The deriv benchmark, dominated by `car`/`cdr`/`cons`/`pair?`, improved 22% from stdlib intrinsic opcodes. The 1BRC numbers above are I/O-bound and less affected by VM compute optimizations.

## Per-Instruction Inline Cache (Mar 2026)

The VM's global variable lookup was originally served by a 256-slot direct-mapped cache. Each `LoadGlobal`/`CallGlobal` hashed the variable name to a slot, leading to collisions on hot paths where multiple globals mapped to the same slot.

The per-instruction inline cache assigns a **dedicated cache slot to each `LoadGlobal`/`CallGlobal` instruction** at compile time. Cache entries are `(spur_bits, env_version, value)` tuples: the `spur_bits` guard provides cross-VM closure safety, and the env version counter invalidates stale entries on any global mutation.

**Impact** (Apple Silicon, release build, `hyperfine --warmup 2 --runs 5`):

| Benchmark          | Before (direct-mapped) | After (per-instruction) | Speedup    |
| ------------------ | ---------------------: | ----------------------: | ---------- |
| higher-order-fold  |               6,116 ms |                2,617 ms | **2.34×**  |
| deriv              |               2,356 ms |                1,449 ms | **1.63×**  |
| closure-storm      |               1,302 ms |                1,145 ms | **1.14×**  |
| tak                |               1,728 ms |                1,749 ms | ~1.0×      |
| mandelbrot         |                 311 ms |                  313 ms | ~1.0×      |
| upvalue-counter    |                 574 ms |                  575 ms | ~1.0×      |

The biggest wins are on **global-call-heavy** workloads: `higher-order-fold` calls stdlib HOFs (`map`, `filter`, `foldl`) in a tight loop, and each call requires a global lookup.
`deriv` similarly uses many global functions for symbolic differentiation. Benchmarks dominated by local computation (tak, mandelbrot, upvalue-counter) show no change, as expected. ## Micro-Benchmark Suite (Feb 2026) All benchmarks run on Apple Silicon (M-series), 10 runs + 3 warmup, via `scripts/bench.sh`. | Benchmark | Tree-walker | Bytecode VM | VM speedup | | ------------------ | -------------- | -------------- | ---------- | | tak | 21,222 ms | 1,248 ms | 17.0× | | nqueens | 20,735 ms | 2,028 ms ¹ | 10.2× | | deriv | 3,473 ms | 887 ms | 3.9× | | upvalue-counter | 5,762 ms | 450 ms | 12.8× | | closure-storm | 2,373 ms | 1,041 ms | 2.3× | | higher-order-fold | 2,292 ms | 1,081 ms | 2.1× | | hashmap-bench | 8,612 ms | 3,645 ms | 2.4× | | bench-features | 12,427 ms | 1,144 ms | 10.9× | | string-pipeline | 1,551 ms | 613 ms | 2.5× | | mandelbrot | 2,223 ms | 212 ms | 10.5× | | throw-catch | 2,195 ms | 197 ms | 11.2× | ¹ nqueens was previously broken on the VM due to a forward-reference bug in inner defines (fixed Mar 2026). The VM result above now reflects correct execution. The VM achieves **2–17× speedups** across the board, with the largest gains on recursion-heavy benchmarks (tak, nqueens, bench-features, upvalue-counter) where call overhead dominates. Closure-heavy and string benchmarks show more modest ~2–3× gains. ## 1. Copy-on-Write Map Mutation **Problem:** Every `(assoc map key val)` call cloned the entire `BTreeMap`, even when no other reference existed. For the 1BRC accumulator (~400 weather stations), this was O(400) per row × millions of rows. **Solution:** Use `Rc::try_unwrap` to check if the reference count is 1. If so, take ownership and mutate in place. Otherwise, clone. 
```rust // crates/sema-stdlib/src/map.rs match Rc::try_unwrap(m) { Ok(map) => map, // refcount == 1: we own it, mutate in place Err(m) => m.as_ref().clone(), // shared: must clone } ``` The key insight is pairing this with `Env::take()` — by *removing* the accumulator from the environment before passing it to `assoc`, the refcount drops to 1, enabling the in-place path. User code looks like: ```sema (file/fold-lines "data.csv" (lambda (acc line) (let ((parts (string/split line ";"))) (assoc acc (first parts) (second parts)))) {}) ``` The `fold-lines` implementation moves (not clones) `acc` into the lambda env on each iteration, keeping the refcount at 1. **Impact:** ~30% of the total speedup. Eliminated the O(n) full-map clone, leaving only the O(log n) BTreeMap insert per row. **Literature:** * This is the same copy-on-write strategy used by Swift's value types. (Clojure's persistent data structures solve a related problem — avoiding full copies — but via structural sharing rather than refcount-based COW.) * Phil Bagwell, ["Ideal Hash Trees"](https://lampwww.epfl.ch/papers/idealhashtrees.pdf) (2001) — the paper behind Clojure/Scala persistent collections * Rust's `Rc::make_mut` provides the same semantics with less ceremony ## 2. Lambda Environment Reuse *(removed)* > **Status:** This optimization was part of the mini-eval's hot path in `io.rs`. It was removed when the mini-eval was deleted. The current `file/fold-lines` uses `sema_core::call_callback`, which routes through the real evaluator — each call creates a fresh `Env` via the standard `apply_lambda` path. **What it was:** For simple lambdas (known arity, no rest params), the mini-eval created the lambda environment *once* and reused it across all iterations, overwriting bindings in place. Combined with a reusable `line_buf`, this eliminated per-iteration allocations for `Env`, string interning, and line buffers. 
**Why it was removed:** The env reuse logic was tightly coupled to the mini-eval's direct lambda dispatch. The callback architecture routes through the real evaluator's `apply_lambda`, which always creates a fresh child `Env` — this is correct and avoids subtle bugs from env mutation leaking across calls. **Impact when active:** ~15% speedup (2,501ms → 1,626ms combined with COW assoc). **What remains:** The reusable `line_buf` (`String::with_capacity(64)` cleared each iteration) is still present in `file/fold-lines` — only the env reuse was lost. ## 3. Evaluator Callback Architecture *(replacing Mini-Eval)* > **Status:** The mini-eval was deleted and replaced with a callback architecture. Stdlib now calls the real evaluator via `sema_core::call_callback`. **What the mini-eval was:** `sema-stdlib` previously contained its own minimal evaluator (`sema_eval_value`) that handled common forms via direct recursive calls, inlining builtins like `+`, `=`, `assoc`, `string/split`, and `string/to-number` to skip `Env` lookup and `NativeFn` dispatch entirely. **Why it was removed:** 1. **Semantic drift:** The mini-eval diverged from the real evaluator — new special forms, error handling, and features had to be duplicated or were silently missing. 2. **Blocking bytecode VM:** A bytecode compiler can't target two evaluators. Removing the mini-eval ensures a single evaluation path that the VM can replace. **The callback architecture:** `sema-stdlib` cannot depend on `sema-eval` (circular dependency). Instead, `sema-eval` registers a thread-local callback (`set_call_callback`) at startup, and stdlib functions call `sema_core::call_callback` to invoke the real evaluator. A thread-local `EvalContext` (`with_stdlib_ctx`) is shared across calls to avoid per-call context allocation. 
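The registration pattern that breaks the circular dependency can be sketched with a thread-local slot. This is a reduced illustration, not the real `sema_core` API: the function names are borrowed from the text, but the signatures are assumptions (arguments are plain `i64`s, and the real `call_callback` also takes a context and a function value).

```rust
use std::cell::Cell;

/// Simplified callback type; the real one operates on `Value`s (assumption).
type Callback = fn(&[i64]) -> i64;

thread_local! {
    // Lives in the lower-level crate, so that crate never names the evaluator.
    static CALL_CALLBACK: Cell<Option<Callback>> = Cell::new(None);
}

/// Called once at startup by the higher-level (evaluator) crate.
pub fn set_call_callback(cb: Callback) {
    CALL_CALLBACK.with(|slot| slot.set(Some(cb)));
}

/// Called by library code to reach the evaluator through the registered hook.
pub fn call_callback(args: &[i64]) -> Result<i64, String> {
    match CALL_CALLBACK.with(|slot| slot.get()) {
        Some(f) => Ok(f(args)),
        None => Err("no evaluator registered on this thread".to_string()),
    }
}
```

The dependency arrow only points one way: the evaluator crate depends on the crate holding the slot, and the library crate reaches the evaluator purely through the stored function pointer. The slot is per thread, so each thread must register its own callback.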
```rust // crates/sema-stdlib/src/io.rs — file/fold-lines via callback sema_core::with_stdlib_ctx(|ctx| { let mut line_buf = String::with_capacity(64); loop { line_buf.clear(); let n = reader.read_line(&mut line_buf)?; if n == 0 { break; } // Calls the real evaluator (eval_value) via thread-local callback acc = sema_core::call_callback(ctx, &func, &[acc, Value::string(&line_buf)])?; } Ok(acc) }) ``` **Performance trade-off:** ~960ms → ~2,900ms on 1M rows (~3× regression). The overhead comes from the full trampoline evaluator: call stack management, span tracking, and `Trampoline` dispatch on every sub-expression. **Fast-path optimizations that partially recovered performance:** 1. **Self-evaluating fast path:** `eval_value` short-circuits for integers, floats, strings, keywords, and symbols — skipping depth tracking and step limits for the most common forms. 2. **Inline NativeFn dispatch:** When the evaluator sees a `Value::NativeFn` in call position, it calls the function pointer directly without going through `call_callback` indirection. 3. **Thread-local shared EvalContext:** `with_stdlib_ctx` reuses a single `EvalContext` across all stdlib → evaluator callbacks, avoiding per-call allocation of `RefCell`/`Cell` fields. 4. **Deferred cloning:** `eval_value_inner` avoids cloning the expression and environment on the first trampoline iteration, only cloning if a tail call (`Trampoline::Eval`) is returned. **Remaining gap:** The ~3× regression could not be fully closed within the tree-walking architecture. The bytecode VM — the reason the mini-eval was removed, and now Sema's sole evaluator — gets to ~1.2× of the mini-eval on 1M (~1,150ms vs ~960ms) and is ~2.4× faster than the tree-walker was on the same workload. 
**Literature:**

* Inline caching, pioneered by Smalltalk-80 and refined in V8's hidden classes, solves the same dispatch overhead problem but at a different architectural level
* Most production Lisps (SBCL, Chez Scheme) compile to native code, making dispatch overhead negligible — Sema's callback overhead is inherent to tree-walking interpreters
* Lua 5.x's bytecode VM inlines common operations (`OP_ADD`, `OP_GETTABLE`) into the dispatch loop — this is the approach Sema's bytecode VM (`sema-vm`) takes

## 4. String Interning (lasso)

**Problem:** Symbol/keyword equality was O(n) string comparison. Environment lookups keyed by `String` required comparing the full string on each `BTreeMap` node visit. Special form dispatch compared against 30+ string literals on every list evaluation.

**Solution:** Replace `Rc` in `Value::Symbol` and `Value::Keyword` with `Spur` — a `u32` handle from the [lasso](https://crates.io/crates/lasso) string interner. Environment bindings keyed by `Spur` for direct integer lookup.

```rust
// Before: O(n) string comparison
Value::Symbol(Rc)
env: BTreeMap<String, Value>

// After: O(1) integer comparison
Value::Symbol(Spur) // u32
env: BTreeMap<Spur, Value>
```

(`Env` bindings have since moved from `BTreeMap` to `hashbrown::HashMap`, still keyed by `Spur`.)

Special form dispatch uses pre-cached `Spur` constants:

```rust
// crates/sema-eval/src/special_forms.rs
struct SpecialFormSpurs {
    quote: Spur,
    if_: Spur,
    define: Spur,
    // ... 30 more
}

// Dispatch: integer comparison, no string resolution
if head_spur == sf.if_ {
    return Some(eval_if(args, env));
}
```

**Caveat:** The initial implementation was actually *slower* (2,518ms vs 1,580ms baseline) because `resolve()` was allocating a new `String` on every symbol lookup. Fixed by adding `with_resolved(spur, |s| ...)` which provides a borrowed `&str` without allocation, and switching `Env` to use `Spur` keys directly.

**Impact:** 1,580ms → 1,400ms (11% faster) after fixing the allocation issue.
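The difference between an allocating `resolve()` and a borrowing `with_resolved` is easier to see in a small model. The sketch below is a std-only toy interner, not lasso and not Sema's implementation: the `Sym` and `Interner` types are hypothetical, and only the names `resolve` and `with_resolved` are taken from the text above.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical handle type standing in for lasso's `Spur`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Sym(u32);

#[derive(Default)]
struct Interner {
    ids: HashMap<String, Sym>, // string -> handle
    names: Vec<String>,        // handle -> string
}

thread_local! {
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

// Returns the existing handle for `s`, or assigns the next free one.
fn intern(s: &str) -> Sym {
    INTERNER.with(|cell| {
        let mut i = cell.borrow_mut();
        if let Some(&sym) = i.ids.get(s) {
            return sym;
        }
        let sym = Sym(i.names.len() as u32);
        i.names.push(s.to_string());
        i.ids.insert(s.to_string(), sym);
        sym
    })
}

// Allocating variant: clones the string on every call.
fn resolve(sym: Sym) -> String {
    INTERNER.with(|cell| cell.borrow().names[sym.0 as usize].clone())
}

// Borrowing variant: lends a `&str` to the closure, no allocation.
// The closure must not call `intern`, or the RefCell borrow will panic.
fn with_resolved<R>(sym: Sym, f: impl FnOnce(&str) -> R) -> R {
    INTERNER.with(|cell| f(cell.borrow().names[sym.0 as usize].as_str()))
}
```

Comparing two `Sym` values is a single integer comparison, while anything that needs the text (printing, error messages) can go through `with_resolved` and avoid a per-lookup `String`.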
**Literature:** * String interning is as old as Lisp itself — McCarthy's original LISP 1.5 (1962) interned atoms in the "object list" (oblist) * Java interns all string literals and provides `String.intern()`. The JVM's `invokedynamic` uses interned method names for O(1) dispatch * The [string-interner](https://crates.io/crates/string-interner) and [lasso](https://crates.io/crates/lasso) crates are the two main Rust options; lasso was chosen for its `Rodeo` thread-local interner which fits Sema's single-threaded architecture ## 5. hashbrown HashMap **Problem:** The 1BRC accumulator uses a map keyed by weather station name (~400 entries). `BTreeMap` provides O(log n) lookup, but the accumulator is accessed on every row. With 10M rows, the log₂(400) ≈ 9 comparisons per lookup adds up. **Solution:** Added a `Value::HashMap` variant backed by [hashbrown](https://crates.io/crates/hashbrown) (the same hash map used inside Rust's `std::collections::HashMap`, but exposed directly for `no_std` compatibility and raw API access). ```sema ;; User code: opt into HashMap for the accumulator (file/fold-lines "data.csv" (lambda (acc line) ...) (hashmap/new)) ; amortized O(1) vs O(log n) ;; Convert back to sorted BTreeMap for output (hashmap/to-map acc) ``` `BTreeMap` remains the default for `{}` map literals because deterministic ordering matters for equality, printing, and test assertions. `hashbrown` is opt-in for performance-critical paths. **Impact:** 1,400ms → 1,340ms (4% faster). Modest because BTreeMap with 400 entries and short string keys is already fast. **Literature:** * hashbrown uses SwissTable, designed by Google for their C++ `absl::flat_hash_map`. See [CppCon 2017: Matt Kulukundis "Designing a Fast, Efficient, Cache-friendly Hash Table"](https://www.youtube.com/watch?v=ncHmEUmJZf4) * Clojure's `{:key val}` maps use HAMTs (hash array mapped tries) which provide O(~1) lookup with structural sharing. 
Sema's approach is simpler: full COW on the `Rc` rather than structural sharing, which is viable because the refcount-1 fast path almost always hits ## 6. SIMD Byte Search (memchr) *(removed)* > **Status:** The memchr-based two-part split fast path was part of the mini-eval's inlined `string/split` and was removed with it. The current `string/split` in `sema-stdlib/src/string.rs` uses Rust's standard `str::split()` followed by `map` and `collect`. The `memchr` crate remains a dependency of `sema-stdlib` but is no longer used in the split hot path. **What it was:** A SIMD-accelerated (SSE2/AVX2/NEON) byte search via the [memchr](https://crates.io/crates/memchr) crate, combined with a two-part split fast path that avoided `Vec` allocation when splitting on a single-byte separator with exactly one occurrence (the common case in 1BRC: `"Berlin;12.3"` → `["Berlin", "12.3"]`). **Impact when active:** Negligible for SIMD specifically (1BRC strings are 10–30 bytes), but the two-part fast path avoided iterator/Vec overhead. **Literature:** * memchr is maintained by Andrew Gallant (BurntSushi), author of ripgrep. It uses a [generic SIMD](http://0x80.pl/articles/simd-strfind.html) framework to dispatch to the best available instruction set at runtime ## 7. Custom Number Parser *(removed)* > **Status:** This was part of the mini-eval's inlined `string/to-number` and was removed with it. The current `string/to-number` in `sema-stdlib/src/string.rs` uses Rust's standard `str::parse::()` with fallback to `str::parse::()`. **What it was:** A hand-rolled decimal parser that handled only `[-]digits[.digits]`, using a precomputed powers-of-10 lookup table for 1–4 fractional digits. It returned `None` for complex cases (scientific notation, infinity, NaN), falling back to the standard parser. **Impact when active:** Part of the combined mini-eval speedup. 
Difficult to isolate, but avoided the overhead of Rust's [dec2flt](https://doc.rust-lang.org/stable/src/core/num/dec2flt/mod.rs.html) algorithm. **Literature:** * Rust's float parser is based on the [Eisel-Lemire algorithm](https://nigeltao.github.io/blog/2020/eisel-lemire.html) (2020), which is fast for a general-purpose parser but still does more work than necessary for simple decimals * Daniel Lemire's [fast\_float](https://github.com/fastfloat/fast_float) C++ library (and its Rust port) takes a similar "fast path for common cases" approach ## 8. Enlarged I/O Buffer **Problem:** `BufReader`'s default 8KB buffer means frequent syscalls for large files. **Solution:** 256KB buffer for `file/fold-lines`. ```rust let mut reader = std::io::BufReader::with_capacity(256 * 1024, file); ``` **Impact:** Minor. CPU was the bottleneck, not I/O. But it's a free win — larger buffers amortize syscall overhead and improve sequential read throughput on modern SSDs. ## 9. Bytecode VM Optimizations The bytecode VM applies several optimizations beyond basic bytecode compilation. These are documented in detail in [Bytecode VM](./bytecode-vm.md); highlights below. 
### Intrinsic Recognition The compiler recognizes calls to known builtins and emits inline opcodes instead of function calls: **Arithmetic & comparison** (phase 1): | Source | Compiled to | What it replaces | |--------|------------|-----------------| | `(+ a b)` | `AddInt` | `CallGlobal("+", 2)` → hash lookup → NativeFn downcast → args Vec → function call | | `(- a b)` | `SubInt` | Same overhead | | `(* a b)` | `MulInt` | Same overhead | | `(< a b)` | `LtInt` | Same overhead | | `(> a b)` | `Gt` | Same overhead | | `(not x)` | `Not` | Same overhead | **Stdlib: list operations & type predicates** (phase 2, Feb 2026): | Source | Compiled to | What it replaces | |--------|------------|-----------------| | `(car x)` / `(first x)` | `Car` | Same overhead — pop list, push first element | | `(cdr x)` / `(rest x)` | `Cdr` | Same — pop list, push tail | | `(cons h t)` | `Cons` | Same — pop head+tail, push new list | | `(null? x)` | `IsNull` | Same — push `#t` if nil or empty list | | `(pair? x)` | `IsPair` | Same — push `#t` if non-empty list | | `(list? x)` | `IsList` | Same — push `#t` if list | | `(number? x)` | `IsNumber` | Same — push `#t` if int or float | | `(string? x)` | `IsString` | Same — push `#t` if string | | `(symbol? x)` | `IsSymbol` | Same — push `#t` if symbol | | `(length x)` | `Length` | Same — push collection length as int | | `(append a b)` | `Append` | Same — concatenate two lists (2-arg only) | | `(get m k)` | `Get` | Same — map lookup, nil default (2-arg only) | | `(contains? m k)` | `ContainsQ` | Same — push `#t` if key exists in map | This eliminates global hash lookup, `Rc` downcast, argument `Vec` allocation, and function pointer dispatch — the entire call overhead — for the most common operations. The `*Int` opcodes include NaN-boxed small-int fast paths that operate directly on raw `u64` bits, avoiding `Clone`/`Drop` overhead entirely. All standard arithmetic and comparison operators are inlined. 
The `*Int` variants include NaN-boxed fast paths; the generic opcodes (`Div`, `Gt`, `Le`, `Ge`) handle int/float coercion correctly. **Impact:** Phase 1: TAK 4,352ms → 1,250ms (-71%), upvalue-counter 1,232ms → 450ms (-63%). Phase 2: deriv 1,123ms → 879ms (-22%), closure-storm 1,135ms → 1,029ms (-9%). The deriv benchmark is dominated by `car`/`cdr`/`cons`/`pair?` — exactly the functions that became intrinsics. ### Constant Folding An optimization pass (`optimize.rs`) runs on the CoreExpr IR between lowering and variable resolution. It folds compile-time-evaluable expressions: * **Arithmetic:** `(+ 1 2)` → `3`, `(* 3 4)` → `12` * **Comparisons:** `(< 1 2)` → `#t`, `(= 3 3)` → `#t` * **Boolean:** `(not #t)` → `#f` * **Control flow:** `(if #t a b)` → `a`, `(and #f x)` → `#f`, `(or #t x)` → `#t` * **Dead code:** `(begin 42 x)` → `(begin x)` (pure constants before the last expression are eliminated) **Impact:** Eliminates unnecessary instructions at compile time. Runtime impact on benchmarks is negligible (hot loops operate on variables), but reduces code size and improves startup for programs with constant subexpressions. ### Peephole: `(if (not X) ...)` → JumpIfTrue The compiler pattern-matches `(if (not expr) then else)` and emits the condition with an inverted jump, eliminating both the `not` call and one opcode dispatch: ``` ;; Before: CallGlobal("not") + JumpIfFalse ;; After: JumpIfTrue (condition compiled directly) ``` ### Fused CallGlobal Non-tail calls to global functions use a single `CallGlobal` instruction that combines `LoadGlobal + Call`, using `call_vm_closure_direct` to set up the call frame without needing the function value on the stack. ### Per-Instruction Inline Cache Each `LoadGlobal`/`CallGlobal` instruction gets a dedicated cache slot at compile time, eliminating hash collisions. See the [inline cache section](#per-instruction-inline-cache-mar-2026) above for benchmark results. 
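As a rough illustration of what a per-instruction cache slot buys, here is a minimal sketch. It is not taken from `sema-vm`: the `Globals`, `CacheSlot`, and `Vm` types are hypothetical, `i64` stands in for a full value type, and invalidating by a global version counter is an assumption chosen for simplicity rather than a description of Sema's scheme.

```rust
use std::collections::HashMap;

// Hypothetical global table; `i64` stands in for the VM's value type.
struct Globals {
    table: HashMap<u32, i64>, // symbol id -> value
    version: u64,             // bumped on every definition
    lookups: u64,             // slow-path hash lookups, counted for the demo
}

impl Globals {
    fn new() -> Self {
        Globals { table: HashMap::new(), version: 0, lookups: 0 }
    }

    fn define(&mut self, sym: u32, value: i64) {
        self.table.insert(sym, value);
        self.version += 1; // invalidates every cached entry
    }
}

#[derive(Clone, Copy)]
struct CacheSlot {
    version: u64,
    value: i64,
}

struct Vm {
    globals: Globals,
    // One slot per LoadGlobal/CallGlobal site, index assigned at compile time.
    cache: Vec<Option<CacheSlot>>,
}

impl Vm {
    fn new(n_slots: usize) -> Self {
        Vm { globals: Globals::new(), cache: vec![None; n_slots] }
    }

    // `slot` is the instruction's dedicated cache index.
    fn load_global(&mut self, sym: u32, slot: usize) -> Option<i64> {
        if let Some(entry) = self.cache[slot] {
            if entry.version == self.globals.version {
                return Some(entry.value); // fast path: no hashing
            }
        }
        self.globals.lookups += 1;
        let value = *self.globals.table.get(&sym)?;
        self.cache[slot] = Some(CacheSlot { version: self.globals.version, value });
        Some(value)
    }
}
```

Because each call site owns its slot, two different globals can never evict each other, which is the collision problem a shared hashed cache would have.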
### Specialized Local Access Slots 0–3 have dedicated zero-operand opcodes (`LoadLocal0`..`LoadLocal3`, `StoreLocal0`..`StoreLocal3`), saving 2 bytes per access to the most common local variable slots. ## Build Tuning: Fat LTO + PGO (v1.19.2) Beyond the VM itself, the **distributed binaries** are optimized at build time: * **Fat LTO** (`lto = "fat"` on the `release`/`dist` profiles): lets LLVM inline across crate boundaries — the dispatch loop in `sema-vm` calls `sema-core` value accessors (`view`, `as_int`, `type_name`, …) millions of times per benchmark, and thin LTO can't always inline those. Measured 3–9% across the suite, at the cost of ~2× longer release builds. * **Profile-Guided Optimization (PGO):** the cargo-dist GitHub-release binaries and Homebrew bottle are built with PGO. The build instruments the binary, trains it on the full benchmark suite + a 1BRC sample, merges the profile with `llvm-profdata`, then rebuilds — letting LLVM lay out the `match op` dispatch hot blocks by *measured* opcode frequency. It runs on native release targets via cargo-dist's `github-build-setup`; cross-compiled and Windows targets fall back to fat LTO, and the step is fail-safe (a PGO failure ships LTO, never breaks the release). Run it locally with `make build-pgo`. (`cargo install` builds get fat LTO but not PGO — PGO needs the training step.) 
**Measured impact** (v1.19.2 PGO build vs the pre-optimization build, Apple Silicon, best-of-N): | Benchmark | Before | v1.19.2 PGO | Δ | | ----------------- | ------ | ----------- | ------ | | 1BRC (10M rows) | 11.18s | 8.23s | −26% | | higher-order-fold | 552ms | 334ms | −39% | | tak | 1793ms | 1209ms | −33% | | mandelbrot | 246ms | 177ms | −28% | | deriv | 767ms | 570ms | −26% | | hashmap-bench | 3976ms | 2967ms | −25% | | closure-storm | 1040ms | 836ms | −20% | | bench-features | 1373ms | 1098ms | −20% | | string-pipeline | 633ms | 537ms | −15% | | nqueens | 2060ms | 1790ms | −13% | The win is dominated by PGO; fat LTO contributes ~3–9% of it. (`cargo install` builds get the LTO portion but not PGO.) ## Rejected Optimizations Not everything we tried worked: | Approach | Result | Why | | ----------------------------------------- | ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **HashMap for Env** | Slower (later adopted) | At the time, hashing overhead exceeded BTreeMap's few integer comparisons on the very small maps (1–3 entries) typical of `let` scopes. The verdict was later reversed: `Env` bindings now use `hashbrown::HashMap`. | | **im-rc / rpds (persistent collections)** | Slower | Structural sharing fights the COW optimization — the whole point is to *avoid* sharing and mutate in place when refcount is 1. | | **bumpalo / typed-arena** | Incompatible | Values need to escape the arena (returned from functions, stored in environments). Arena allocation only works for temporaries. | | **compact\_str / smol\_str** | Redundant | Once symbols/keywords are interned as `Spur`, small-string optimization for them is pointless. String *values* are still `Rc` but they're not in the hot path for dispatch. 
| | **`target-cpu=native`** | No-op (this workload) | Tested Jun 2026: the VM dispatch loop is branch-bound, not SIMD-bound, and the generic `aarch64` target already uses NEON on Apple Silicon. Zero measurable gain — and it breaks portable/distributable binaries, so it is not used. | > **Note:** "Full evaluator callback" was previously listed here as rejected (4x slower than mini-eval). It became the **tree-walker's architecture** — the ~2.7× overhead vs the mini-eval was accepted as the cost of architectural correctness. The bytecode VM, now Sema's sole evaluator, bypasses this overhead by compiling directly to bytecode. ## Architecture Diagram The hot path for `file/fold-lines` under the callback architecture, as it ran on the (now-retired) tree-walking evaluator: ``` file/fold-lines ├── BufReader (256KB buffer, reused line_buf) └── Per-line loop: ├── read_line → reused buffer (no alloc) ├── call_callback → real evaluator (eval_value) │ ├── self-evaluating fast path (ints, floats, strings skip depth tracking) │ ├── NativeFn inline dispatch (direct call, no callback indirection) │ ├── apply_lambda → fresh Env per call (no env reuse) │ ├── string/split → std str::split (no SIMD fast path) │ ├── string/to-number → std parse:: / parse:: │ └── assoc → COW in-place mutation (Rc refcount == 1) ├── thread-local EvalContext (shared, not per-call) └── acc moved, not cloned → preserves refcount == 1 ``` The bytecode VM bypasses that callback path entirely. Instead of `call_callback → eval_value → trampoline`, the VM compiles the lambda body to bytecode once and executes it in a tight instruction dispatch loop, eliminating trampoline overhead, per-call span tracking, and repeated AST traversal. --- --- url: 'https://sema-lang.com/docs/internals/lisp-comparison.md' --- # Lisp Dialect Benchmark How does Sema compare to other Lisp dialects on a real-world I/O-heavy workload? 
This page benchmarks fifteen Lisp dialects on the [1 Billion Row Challenge](https://github.com/gunnarmorling/1brc) — read weather-station measurements and compute min/mean/max per station. It is not a synthetic micro-benchmark; it exercises I/O, string parsing, hash-table accumulation, and numeric aggregation in a tight loop. ::: warning A benchmark ranks implementations, not just runtimes Each dialect's **optimized** entry uses a comparable best effort — a hand-rolled integer×10 temperature parser and, where it helps the runtime, block/byte I/O. Even so, results partly reflect *how each program is written*, not pure runtime throughput. The [dialect notes](#dialect-notes) say where each number comes from; the [simple table](#simple-idiomatic) shows the same workload written the obvious way. ::: ## Benchmark One same-machine run: **macOS 15.6, Apple M2 Max, native Homebrew runtimes, 10,000,000 rows (~124 MiB), best of 3, single-threaded.** Sema is the **v1.19.2 PGO build**. All fifteen implementations produce byte-identical output. PicoLisp is omitted — no native Homebrew formula. ### Optimized — best effort per dialect Each implementation tuned to a comparable level (hand-rolled int×10 parser; block/byte I/O where the runtime benefits). Relative to the fastest (Fennel). 
| Dialect | Time (ms) | Relative | Runtime | | ----------------- | --------- | -------- | -------------------- | | **Fennel/LuaJIT** | 532 | 1.0x | JIT compiler | | **SBCL** | 899 | 1.7x | Native compiler | | **Racket** | 1,434 | 2.7x | JIT (Chez backend) | | **Chez Scheme** | 1,515 | 2.8x | Native compiler | | **Gambit** | 2,298 | 4.3x | Native compiler (C) | | **Clojure** | 2,805 | 5.3x | JVM (JIT) | | **Guile** | 4,355 | 8.2x | Bytecode VM + JIT | | **Janet** | 5,028 | 9.5x | Bytecode VM | | **Chicken** | 5,772 | 10.8x | Native compiler (C) | | **Gauche** | 7,153 | 13.4x | Bytecode VM | | **Sema** | 8,096 | 15.2x | Bytecode VM | | **Emacs Lisp** | 8,167 | 15.4x | Bytecode VM | | **ECL** | 8,933 | 16.8x | Native compiler (C) | | **newLISP** | 9,019 | 17.0x | Interpreter | | **Kawa** | 18,395 | 34.6x | JVM (JIT) | ### Simple / idiomatic The same workload written the obvious way in each dialect — built-in number parser, per-line I/O, standard data structures. No hand-rolled parsers, no block reads. Closer to "raw runtime on naive code." Relative to the fastest (Gambit). | Dialect | Time (ms) | Relative | | ----------------- | --------- | -------- | | **Gambit** | 2,351 | 1.0x | | **Chez Scheme** | 2,481 | 1.1x | | **Fennel/LuaJIT** | 2,679 | 1.1x | | **Clojure** | 2,902 | 1.2x | | **SBCL** | 2,997 | 1.3x | | **Guile** | 5,186 | 2.2x | | **newLISP** | 8,206 | 3.5x | | **Chicken** | 9,094 | 3.9x | | **Janet** | 9,950 | 4.2x | | **Sema** | 10,026 | 4.3x | | **ECL** | 13,599 | 5.8x | | **Emacs Lisp** | 16,219 | 6.9x | | **Gauche** | 16,476 | 7.0x | | **Kawa** | 17,793 | 7.6x | The gap between the two tables is itself the story. Where optimized ≪ simple (Fennel, Racket, Janet, Gauche), most of the win came from a hand-rolled parser or block I/O. Where they're close (Clojure, Sema, newLISP), the runtime was already doing the work and there was little left to hand-tune. 
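For readers unfamiliar with the "int×10" trick mentioned for the optimized entries, the idea is to parse a reading such as `-12.3` directly into the integer `-123` (tenths of a degree) and defer any floating-point work to the final output. The sketch below is an illustrative Rust version written for this page, not one of the benchmarked implementations; the function names are hypothetical and it assumes the 1BRC format of exactly one fractional digit.

```rust
/// Parses a reading such as "-12.3" into tenths of a degree (-123).
/// Returns None for anything other than `[-]digits.digit`.
fn parse_tenths(bytes: &[u8]) -> Option<i32> {
    // Strip an optional leading minus sign.
    let (negative, rest) = match bytes.split_first() {
        Some((&b'-', tail)) => (true, tail),
        _ => (false, bytes),
    };
    let mut value: i32 = 0;
    let mut seen_dot = false;
    let mut int_digits = 0;
    let mut frac_digits = 0;
    for &b in rest {
        match b {
            b'0'..=b'9' => {
                // Accumulate digits as if the decimal point were absent.
                value = value.checked_mul(10)?.checked_add((b - b'0') as i32)?;
                if seen_dot { frac_digits += 1 } else { int_digits += 1 }
            }
            b'.' if !seen_dot => seen_dot = true,
            _ => return None,
        }
    }
    if int_digits == 0 || frac_digits != 1 {
        return None;
    }
    Some(if negative { -value } else { value })
}

/// Splits "Station;12.3" into the station name and the reading in tenths.
fn parse_line(line: &str) -> Option<(&str, i32)> {
    let (station, temp) = line.split_once(';')?;
    Some((station, parse_tenths(temp.as_bytes())?))
}
```

With this representation, min and max are plain integer comparisons and the mean is `sum / (10 * count)`, computed once per station at the end.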
## Dialect notes

What's behind each number — and which results are runtime ceilings versus implementation choices.

### Fennel / LuaJIT — the JIT runs away with it

Fennel compiled to LuaJIT is **the fastest entry, ahead of SBCL** (532 ms). LuaJIT's tracing JIT compiles the hot byte-scan loop to native code; with a `string.byte` integer parser and 1 MiB block reads it chews through ~250 MB/s. It's the clearest "runtime does the heavy lifting" result — but note its *simple* version is 2.7 s (5× slower), so the win is the optimized byte loop being unusually JIT-friendly, not a free lunch.

### SBCL — native code + `(safety 0)`

SBCL compiles to native machine code; in a type-specialized hot path there is no interpreter loop. With `(declare (optimize (speed 3) (safety 0)))`, block `read-sequence` I/O, an integer×10 parser, and in-place `setf` struct mutation, the inner loop runs near C speed. 25+ years of compiler work (descended from CMUCL). Its roughly 3.3× optimization gain (simple 3.0 s → optimized 0.9 s) is the second largest in the suite, after Fennel's 5×.

### Racket — byte I/O over the Chez backend

Racket reads 1 MiB byte blocks, scans for `;`/newline with O(1) `subbytes` slicing, and parses int×10. Its CS backend (Chez under the hood) plus byte strings put it third overall, just ahead of Chez itself — a notable result for a runtime usually thought of as "the teaching language."

### Chez Scheme — the other native compiler

Chez compiles to native code via a [nanopass framework](https://nanopass.org/). With a custom char-by-char parser and `make-hashtable`/`string-hash` it lands just behind Racket. The remaining gap to SBCL is mostly per-line string allocation versus SBCL's block parser.

### Gambit — compiled Scheme via C

`gsc` compiles Scheme to C to a native binary. It got the same int×10 parser as the other Schemes, but the win was negligible here — `read-line` + `substring` + string hashing dominate the loop, so I/O, not number parsing, is the bottleneck.
### Clojure — JVM tax + warmup Clojure's time includes JVM startup and JIT warmup, real costs for a single-shot script. `line-seq` + a transient map is idiomatic but not zero-cost, and `Double/parseDouble` handles the full IEEE-754 spec. Steady-state throughput is better than the wall-clock suggests; it trades raw speed for compactness. ### Guile — Scheme bytecode VM + JIT Guile 3 has a bytecode VM with a native JIT on supported platforms. With a hand-rolled int×10 parser it's the fastest of the "VM" tier here, ahead of Janet and Chicken. ### Janet — the closest architectural peer Janet is the most architecturally comparable to Sema: an embeddable scripting language, bytecode VM, GC-based, no native compiler. Head to head, **Janet (5.0 s) lands ~1.6× ahead of Sema (8.1 s)**. Two things help Janet — its strings *are* byte strings (O(1) slicing, no UTF-8 navigation), and it has a tracing GC instead of `Rc` reference counting, so a `slurp` + byte-scan + int parser goes a long way. This is the comparison to watch as Sema's runtime evolves. ### Chicken — compiled Scheme, I/O bound Chicken compiles Scheme to C via `csc -O3` with an int×10 parser. The remaining gap is per-line I/O allocation and Chicken's continuation-passing-style C ("Cheney on the MTA"), whose trampolining the C compiler can't fully optimize away. ### Gauche — byte scanning over UTF-8 strings Gauche stores strings as **UTF-8 indexed by character**, so a `substring`/`string-index` implementation pays O(k) navigation per slice to convert character positions to byte offsets — a trap that can make a mature, well-engineered runtime look slow. The implementation here sidesteps it: read the whole file into a `u8vector`, scan **bytes** directly, parse int×10. That lands Gauche mid-pack at 7.2 s, ahead of Sema — and is a good reminder that on a char-indexed runtime, byte-oriented I/O is the difference between near-last and respectable. 
### Sema — the interpreter floor Sema (8.1 s) sits behind Gauche, Janet, Chicken, and Guile. It's a bytecode interpreter with NaN-boxed immutable values and `Rc` reference counting — no JIT, no native codegen — and its implementation is at the **interpreter floor**: the tricks that help the compiled dialects don't transfer here. Sema's native `string/split` and `string->float` already beat an interpreted hand-parser, and vectors are immutable with no mutable cell, so every row rebuilds the stat vector. The next gains are *runtime*, not implementation: a **tracing GC** (to kill the per-row `Rc` churn) and **mutable vectors / byte-buffer + string-slice APIs** (so a genuinely byte-oriented implementation, like the fast dialects use, becomes possible). See [Performance Internals](./performance.md). ### Emacs Lisp — buffer-based I/O Emacs loads the whole file into a buffer with `insert-file-contents-literally` and parses int×10 directly from buffer characters with no substring extraction — strong for a venerable bytecode VM, essentially tied with Sema. ### ECL — Common Lisp via C ECL compiles Common Lisp through C with `compile-file`, with an int×10 parser. The gap to SBCL is ECL's less aggressive native code generation. ### newLISP — a small, simple interpreter newLISP's accumulation uses a hash, but on this 40-station dataset the data structure hardly matters — with so few stations even a linear scan is cheap, and per-row interpreter overhead dominates either way. A faithful picture of a deliberately minimal interpreter. ### Kawa — JVM Scheme, slower than expected Kawa compiles Scheme to JVM bytecode. Even with Java interop (`BufferedReader`, `java.util.HashMap`), Scheme-on-JVM data representation, startup, and JIT warmup leave it last. ## What this benchmark doesn't show This is one workload. 
Different benchmarks would reorder things: * **CPU-bound computation** (fibonacci, sorting): the native compilers and JITs would pull further ahead; the I/O here amortizes some of the interpreter gap. * **Startup time:** included in wall-clock but not isolated — it hits the JVM dialects (Clojure, Kawa) hardest. * **Memory usage:** not measured; JVM runtimes carry a higher baseline than small standalone ones like Janet or Sema. * **Multi-threaded:** Clojure, SBCL, Janet, and Guile can parallelize; Sema is single-threaded (its async/channels are cooperative, not parallel). * **Developer experience:** Clojure's REPL, Racket's DrRacket, and SBCL's SLIME are far more mature than Sema's. ## Methodology * **Dataset:** 10,000,000 rows (~124 MiB), 40 weather stations, from the [1BRC spec](https://github.com/gunnarmorling/1brc). * **Environment:** macOS 15.6 / Apple M2 Max, native Homebrew runtimes (June 2026). Sema 1.19.2 (PGO). Gauche 0.9.15. Others are the current Homebrew formulae / downloaded binaries. * **Measurement:** wall-clock, best of 3 consecutive runs per dialect, via `benchmarks/1brc/run-native-benchmarks.py` (all dialects measured together in one session). Sema is timed as the prebuilt PGO binary (`make build-pgo`, run with `SEMA_SKIP_BUILD=1`). * **Verification:** all fifteen implementations produce byte-identical normalized output (sorted stations, 1-decimal rounding) — checked every run. * **Implementations:** each *optimized* entry uses a comparable best effort (hand-rolled int×10 parser; block/byte I/O where the runtime benefits); the *simple* table uses each dialect's naive idiom. PicoLisp is omitted (no native Homebrew formula). 
### Reproducing ```bash # Generate test data (or use benchmarks/data/bench-10m.txt) python3 benchmarks/1brc/generate-test-data.py 10000000 benchmarks/data/bench-10m.txt # Build the PGO Sema binary, then run the native matrix against it make build-pgo SEMA_SKIP_BUILD=1 ./benchmarks/1brc/run-native-benchmarks.py benchmarks/data/bench-10m.txt ``` Implementation source: [`benchmarks/1brc/`](https://github.com/HelgeSverre/sema/tree/main/benchmarks/1brc) (optimized) and [`benchmarks/1brc/simple/`](https://github.com/HelgeSverre/sema/tree/main/benchmarks/1brc/simple) (simple/idiomatic). --- --- url: 'https://sema-lang.com/docs/internals/feature-comparison.md' --- # Feature Comparison How does Sema stack up against other Lisps and Lisp-adjacent languages as a practical tool? This isn't about benchmarks (see [Lisp Dialect Benchmark](./lisp-comparison.md) for that) — it's about what you can actually *do* out of the box. ## Languages Compared | Language | Implementation | Primary Use Case | | --- | --- | --- | | **Sema** | Rust (bytecode VM) | LLM-native scripting, AI tooling | | **Janet** | C (bytecode VM) | Embeddable scripting, system tools | | **Racket** | Chez Scheme backend | Teaching, DSLs, research | | **Clojure** | JVM | Production backend systems | | **Fennel** | Lua transpiler | Game dev, Lua ecosystem | | **Guile** | C (bytecode VM) | GNU extension language | | **Common Lisp (SBCL)** | Native compiler | Production systems, HPC | ## Platform & Distribution | Feature | Sema | Janet | Racket | Clojure | Fennel | Guile | SBCL | | --- | --- | --- | --- | --- | --- | --- | --- | | Standalone executables | ✅ `sema build` | ✅ `jpm` | ✅ `raco exe` | ⚠️ GraalVM only | ⚠️ `--compile-binary` | ❌ | ✅ `save-lisp-and-die` | | Bytecode compilation | ✅ `.semac` | ✅ images | ✅ `.zo` | ✅ `.class` | ❌ | ✅ `.go` | ✅ FASL | | WASM / browser | ✅ [sema.run](https://sema.run) | ⚠️ community | ⚠️ WebRacket (subset) | ✅ ClojureScript | ⚠️ via Fengari | ⚠️ Hoot (R7RS subset) | ⚠️ 
ECL/Emscripten |
| Web playground | ✅ 20+ examples | ⚠️ community | ⚠️ Try Racket | ⚠️ community | ✅ on fennel-lang.org | ❌ | ❌ |
| Shebang scripts | ✅ | ✅ | ✅ | ⚠️ `clojure` CLI | ✅ | ✅ | ✅ `--script` |
| Homebrew install | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Windows support | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ | ⚠️ |
| Install script (curl) | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |

## Embedding

| Feature | Sema | Janet | Racket | Clojure | Fennel | Guile | SBCL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Embed in Rust | ✅ crate API | ⚠️ via FFI | ❌ | ❌ | ❌ | ⚠️ via FFI | ❌ |
| Embed in C/C++ | ⚠️ via FFI | ✅ single `.c`+`.h` | ✅ | ❌ | ✅ single file | ✅ `libguile` | ❌ |
| Runs in JS/browser | ✅ WASM module | ⚠️ community WASM | ❌ | ⚠️ via ClojureScript | ⚠️ via Fengari | ❌ | ❌ |
| Sandbox mode | ✅ `--sandbox` | ✅ `sandbox` | ✅ | ❌ | ❌ | ✅ `ice-9 sandbox` | ❌ |

## Built-in Standard Library

| Feature | Sema | Janet | Racket | Clojure | Fennel | Guile | SBCL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Stdlib functions | 700+ | 600+ | 1000+ | 700+ | ~50 (+ Lua) | 500+ | 900+ |
| HTTP client | ✅ built-in | ⚠️ via library | ✅ built-in | ⚠️ via library | ⚠️ via Lua | ✅ `(web client)` | ⚠️ via library |
| JSON | ✅ built-in | ⚠️ via spork | ✅ built-in | ⚠️ via library | ❌ | ⚠️ via library | ⚠️ via library |
| Regex | ✅ built-in | ✅ PEGs | ✅ built-in | ✅ built-in | ✅ Lua patterns | ✅ built-in | ⚠️ via library |
| CSV | ✅ built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Crypto (SHA, HMAC) | ✅ built-in | ⚠️ via library | ⚠️ SHA-1/MD5 only | ⚠️ via library | ❌ | ⚠️ via library | ⚠️ via library |
| PDF extraction | ✅ built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| File I/O | ✅ built-in | ✅ built-in | ✅ built-in | ✅ via Java | ✅ via Lua | ✅ built-in | ✅ built-in |
| Date/time | ✅ built-in | ✅ built-in | ✅ built-in | ✅ via Java | ✅ via Lua | ✅ built-in | ⚠️ via library |
| Shell execution | ✅ built-in | ✅ built-in | ✅ built-in | ✅ built-in | ✅ via Lua | ✅ built-in | ✅ built-in |
| KV store | ✅ built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SQLite | ✅ built-in | ⚠️ via library | ✅ `db` collection | ⚠️ via JDBC | ⚠️ via Lua | ⚠️ via library | ⚠️ via library |
| TOML | ✅ built-in | ⚠️ via library | ❌ | ⚠️ via library | ❌ | ❌ | ⚠️ via library |
| Web server | ✅ built-in (axum) | ⚠️ via library | ✅ built-in | ⚠️ Ring/Jetty | ⚠️ via Lua | ✅ `(web server)` | ⚠️ via library |
| Terminal styling | ✅ built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

## LLM & AI

This is Sema's primary differentiator. No other Lisp has LLM primitives as first-class language features.

| Feature | Sema | Janet | Racket | Clojure | Fennel | Guile | SBCL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LLM chat/completion | ✅ built-in | ❌ | ❌ | ⚠️ via library | ❌ | ❌ | ❌ |
| Multi-provider (8+) | ✅ | — | — | — | — | — | — |
| Streaming | ✅ built-in | — | — | — | — | — | — |
| Tool use / agents | ✅ `deftool` `defagent` | — | — | — | — | — | — |
| Structured extraction | ✅ `llm/extract` | — | — | — | — | — | — |
| Vision / images | ✅ built-in | — | — | — | — | — | — |
| Embeddings | ✅ 3 providers | — | — | — | — | — | — |
| Vector store (RAG) | ✅ built-in | — | — | — | — | — | — |
| Cost tracking | ✅ `llm/budget` | — | — | — | — | — | — |
| Response caching | ✅ `llm/with-cache` | — | — | — | — | — | — |
| Conversations | ✅ immutable data | — | — | — | — | — | — |
| Provider fallback | ✅ `llm/with-fallback` | — | — | — | — | — | — |
| Prompt templates | ✅ built-in | — | — | — | — | — | — |

## Language Features

| Feature | Sema | Janet | Racket | Clojure | Fennel | Guile | SBCL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Tail-call optimization | ✅ | ✅ | ✅ | ⚠️ `recur` only | ✅ via Lua | ✅ | ⚠️ not guaranteed |
| Macros | ✅ `defmacro` | ✅ | ✅ hygienic | ✅ | ✅ | ✅ both | ✅ |
| Pattern matching | ✅ `match` | ✅ | ✅ | ⚠️ via core.match | ✅ | ✅ | ⚠️ via library |
| Modules | ✅ | ✅ | ✅ | ✅ namespaces | ✅ via Lua `require` | ✅ | ✅ packages |
| Continuations | ❌ | ⚠️ fibers | ✅ `call/cc` | ❌ | ❌ | ✅ `call/cc` | ❌ |
| Async/Channels | ✅ cooperative | ❌ | ❌ | ✅ core.async | ❌ | ❌ | ⚠️ via library |
| Multithreading | ❌ | ✅ | ✅ | ✅ | ✅ via Lua | ✅ | ✅ |
| Persistent data structures | ⚠️ COW maps | ❌ | ❌ | ✅ core design | ❌ | ❌ | ❌ |
| Keywords | ✅ `:foo` | ✅ `:foo` | ✅ `#:foo` | ✅ `:foo` | ✅ `:foo` | ✅ `#:foo` | ✅ `:foo` |
| Map literals | ✅ `{:a 1}` | ✅ `{:a 1}` | ✅ `#hash(...)` | ✅ `{:a 1}` | ✅ `{:a 1}` | ❌ | ❌ |
| Vector literals | ✅ `[1 2]` | ✅ `[1 2]` | ✅ `#(1 2)` | ✅ `[1 2]` | ✅ `[1 2]` | ✅ `#(1 2)` | ✅ `#(1 2)` |
| F-strings | ✅ `f"${x}"` | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Short lambdas | ✅ `#(+ % 1)` | ✅ `\|(+ $ 1)` | ❌ | ✅ `#(+ % 1)` | ✅ `#(+ $1 1)` | ❌ | ❌ |
| Threading macros | ✅ `->` `->>` | ✅ `->` `->>` | ⚠️ via library | ✅ `->` `->>` | ✅ `->` `->>` | ❌ | ⚠️ via library |

## Developer Experience

| Feature | Sema | Janet | Racket | Clojure | Fennel | Guile | SBCL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| REPL | ✅ | ✅ | ✅ DrRacket | ✅ nREPL | ✅ | ✅ | ✅ SLIME/Sly |
| Tab completion | ✅ | ✅ | ✅ | ✅ | ✅ | ⚠️ | ✅ |
| Editor support | VS Code, IntelliJ, Zed, Vim, Emacs, Helix | VS Code, Vim, Emacs | DrRacket, Emacs, VS Code | Emacs, VS Code, IntelliJ | Emacs, Vim, VS Code | Emacs (Geiser) | Emacs (SLIME/Sly) |
| Package manager | ⚠️ git-based | ✅ `jpm` | ✅ `raco` | ✅ deps.edn/Lein | ❌ (uses Lua) | ⚠️ Guix | ✅ Quicklisp |
| Code formatter | ✅ `sema fmt` | ❌ | ✅ `raco fmt` | ✅ cljfmt | ❌ | ❌ | ❌ |
| Debugger | ✅ `sema dap` (DAP) | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| LSP server | ✅ `sema lsp` | ⚠️ community | ✅ racket-langserver | ✅ clojure-lsp | ⚠️ fennel-ls | ⚠️ community | ⚠️ community |
| Notebook | ✅ `sema notebook` | ❌ | ⚠️ Jupyter kernel | ✅ Clerk | ❌ | ⚠️ Jupyter kernel | ⚠️ Jupyter kernel |
| Documentation site | ✅ sema-lang.com | ✅ janet-lang.org | ✅ docs.racket-lang.org | ✅ clojure.org | ✅ fennel-lang.org | ✅ gnu.org/guile | ✅ cliki.net |
| Startup time | ~5ms | ~5ms | ~200ms | ~1–2s | ~5ms | ~50ms | ~50ms |

## Summary

Sema is not trying to be the fastest Lisp or the most
theoretically pure. Its niche is **practical scripting with LLM primitives built into the language** — no other Lisp has `deftool`, `defagent`, `llm/extract`, or multi-provider conversations as first-class constructs.

If you need the **fastest execution**, use SBCL or Chez Scheme. If you need the **JVM ecosystem**, use Clojure. If you need **academic rigor and DSL tooling**, use Racket. If you need a **tiny embeddable C scripting engine**, use Janet.

If you want to **build AI agents, extract structured data from LLMs, or prototype LLM-powered tools** in a language that treats prompts as data — Sema is the only Lisp built for that.

---

---
url: 'https://sema-lang.com/docs/internals/glossary.md'
---

# Glossary

This page defines the technical vocabulary used across Sema's documentation — from Lisp fundamentals and VM internals to the LLM, observability, and tooling layers. Many words are overloaded (the same word means different things in different subsystems); those entries enumerate each meaning explicitly so you can tell them apart.

## Lisp & Language Core

**Arity** — the number of arguments a function expects. Calling with the wrong count raises an arity error (error `:type` `:arity`), e.g. `f expects 1 args, got 3`.

**Association list (alist)** — a list of pairs used as a simple key-value mapping, queried with `assoc` (uses `equal?`), `assq` (uses `eq?`, pointer/symbol equality), or `assv` (uses `eqv?`, numeric value). Each lookup returns the matching pair or `#f`. Distinct from the `map`/`hashmap` data types, and the alist `assoc` is a different function from the map `assoc` that adds a key.

**Atom** — a single, non-list Sema value such as a number, string, symbol, or keyword, as opposed to a list of expressions. Note: in Clojure-family Lisps "atom" also means a mutable reference cell — Sema has no such type; use `define` + `set!` for mutable state (see *Mutable state* under Concurrency).

**Begin / progn** — a sequencing form that evaluates its expressions in order and returns the last result. `progn` is an accepted Common Lisp alias. **Binding** — (1) *value binding*: an association of a name to a value (`define` for globals; `let`/`let*`/`letrec` for locals; `set!` mutates an existing one; modules expose *exported bindings*). (2) *binding form*: a syntactic construct that introduces bindings (`let`, `when-let`, `if-let`). **Car / cdr** — classic Lisp accessors: `car` (alias `first`) returns the first element of a list, `cdr` (alias `rest`) returns the remainder. Names derive from IBM 704 hardware registers (Contents of the Address/Decrement Register). Compositions like `cadr`, `caddr` chain them. **Closure** — a function paired with the variables it captures from its enclosing lexical scope, retaining access to them even after the defining function returns. See also the VM-level implementation under *Closure* in Reader, Compiler & VM Internals. **Cons pair** — the two-field cell ("cons") from which lists are built: `car` holds the head, `cdr` holds the tail. `cons` prepends an element to a list. See also *Dotted pair*. **Delay / force** — `delay` creates a promise that defers evaluation of its expression; `force` evaluates it and memoizes the result (non-promise values pass through unchanged). Classic Scheme lazy evaluation; `promise-forced?` tests whether it has been forced. See *Promise*. **Destructuring** — binding-position patterns that pull apart a value into named variables: positional list/vector patterns (`[a b c]`, with `&` for rest), nested patterns, and map patterns (`{:keys [name age]}`). Works in `let`, `let*`, `define`, lambda params, and `match`. `_` is a wildcard. **Do loop** — a Scheme `do` iteration form with variable bindings, per-iteration step expressions, a termination test, and an optional body, e.g. `(do ((i 0 (+ i 1))) ((= i 10) result) body)`. Relies on tail-call optimization. 
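
The memoizing behaviour described under *Delay / force* can be modelled outside Sema. The sketch below is Python, not Sema's implementation: the `Promise` class, the `force` helper, and the `forced` attribute are hypothetical names chosen to mirror `delay`, `force`, and `promise-forced?`.

```python
class Promise:
    """Deferred computation that runs at most once (models `delay`)."""

    def __init__(self, thunk):
        self._thunk = thunk      # zero-argument callable, not yet run
        self._value = None
        self.forced = False      # models `promise-forced?`

    def force(self):
        if not self.forced:
            self._value = self._thunk()
            self.forced = True
            self._thunk = None   # drop the closure once the value is cached
        return self._value


def force(x):
    """Force a promise; pass any non-promise value through unchanged."""
    return x.force() if isinstance(x, Promise) else x


calls = []                                        # records each real evaluation
p = Promise(lambda: calls.append("ran") or 42)    # nothing runs yet
```

Calling `force(p)` twice returns the same value both times while the wrapped expression runs only once, and `force(7)` simply returns `7`.
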
**Dotted pair** — Scheme cons-cell notation `(a . b)`. Because Sema lists are `Vec`-backed (not linked cons cells), the parser represents the dot as a marker symbol `"."` inserted into the element list (`(a . b)` parses as `[a, ".", b]`) — a pragmatic escape hatch for improper lists and Scheme compatibility. **Equality** — Sema's equality family: `=` is numeric equality (`(= 1 1.0)` is true); `eq?`/`equal?` test structural equality and are aliases. Alist lookups use `assq` (`eq?`, pointer/symbol) and `assv` (`eqv?`, numeric value), so the `eq?`/`eqv?` distinction is real there. Records compare by type plus pairwise `equal?`. **Form** — (1) *expression*: a single Sema expression considered as a unit of code — a function call, special form, or literal (`io/read-many` parses a string of multiple forms). (2) *special form*: see *Special form*. (3) *formatter category*: the formatter's notion of "body forms", "binding forms", "clause forms" — syntactic categories that drive indentation. **Gensym** — a function generating a guaranteed-unique symbol (e.g. `tmp__42`), used to avoid variable capture in macros. Auto-gensym (`foo#`) is the ergonomic form, with its uniqueness magic active only inside a quasiquote template. **Guard** — an extra boolean condition attached to a `match` clause via `when`, so the clause fires only when both the pattern matches and the guard is truthy, e.g. `(x when (> x 100) "big")`. **Homoiconicity** — the property that code is represented in the language's own data structures (a Sema program is just an ordinary `Value`). Underpins the reader producing `Value` directly (no separate AST), macros operating on `Value` AST, and the grammar fuzzer's near-free round-trip/value oracles. Also called "code is data". **Hygiene / variable capture** — variable capture is the bug where a binding a macro introduces accidentally shadows a user's same-named variable. Auto-gensym (`foo#`) and `gensym` produce unique symbols to prevent it, giving hygienic macros. 
**IEEE 754 float policy** — Sema's numeric rule split by type: integer division/modulo by zero raises an error, while floating point follows IEEE 754 (overflow and undefined results yield `inf`, `-inf`, or `NaN`). Integer overflow wraps two's-complement (no bignums). `math/nan?`/`math/infinite?` test the special floats; JSON/TOML cannot encode `NaN`/`Infinity`. **Keyword** — a colon-prefixed, self-evaluating identifier (e.g. `:name`, `:ok`), commonly a map key. Keywords double as accessor functions: `(:name m)` is equivalent to `(get m :name)`. Clojure-style; interned as a `Spur`. Also used as error `:type` tags and as the result of `(type x)`. **Keyword-as-function** — the convenience where a keyword in head position acts as a getter: `(:name person)` works like `(get person :name)`. Handled in the evaluator when a `Value::Keyword` appears as the call head. **Lambda** — a special form creating an anonymous function, e.g. `(lambda (x y) (+ x y))`. `fn` is an alias; variadic params use dot notation (`x . rest`). `defun`/`defn` are sugar over `lambda`. See also *Short lambda*. **Let / let\* / letrec** — local-binding special forms: `let` binds in parallel (all inits evaluated before any binding), `let*` binds sequentially (each visible to later inits), `letrec` makes all bindings visible to all inits (for mutual recursion). See also *Named let*. **Lexical scope** — Sema's scoping rule: a function accesses variables from the textually enclosing scopes where it was defined, not from where it is called. The basis of closures. **List** — the fundamental Sema data structure: a parenthesized, ordered sequence with `car`/`first` as head and `cdr`/`rest` as tail, created via `list` or quoting. Conceptually a cons list (queried as alists with `assoc`/`assq`/`assv`), but represented internally as a `Vec` — see *Vector-backed list* for the performance trade-offs. Contrast with `vector` (bracketed, O(1) indexed). 
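
The split described under *IEEE 754 float policy* can be reproduced with ordinary 64-bit arithmetic. This is a Python sketch under two assumptions that the entry does not state: that Sema integers are 64 bits wide, and that `wrap_i64` and `encode_strict` (both hypothetical helpers) are a fair model of the behaviour. Python raises on float division by zero, so the float cases below use overflow and `inf - inf` instead.

```python
import json
import math

I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def wrap_i64(n):
    """Reduce a Python int to the signed 64-bit two's-complement range."""
    return (n - I64_MIN) % 2**64 + I64_MIN


wrapped = wrap_i64(I64_MAX + 1)    # one past the maximum wraps to the minimum

overflow = 1e308 * 10              # float overflow yields infinity, no error
undefined = math.inf - math.inf    # an undefined result yields NaN


def encode_strict(x):
    """JSON-encode x, rejecting NaN and Infinity, which JSON cannot express."""
    return json.dumps(x, allow_nan=False)
```

`encode_strict(math.nan)` raises `ValueError`, which is the Python counterpart of the entry's note that JSON cannot encode `NaN` or `Infinity`.
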
**Macro** — a `defmacro`-defined transformer that rewrites code at expansion time (before evaluation), typically built with quasiquote/unquote. Some (threading, `when-let`) are auto-loaded built-ins. Contrast with a special form (built into the evaluator, not user-definable). See also *Macro expansion*. **Match** — a pattern-matching special form testing a value against patterns (literals, binding symbols, vector/map structures) with optional `when` guards. `match` raises an error if no clause matches; `match*` returns `nil` instead. Add a catch-all `(_ ...)` for exhaustiveness. **Module system** — Sema's `import`/`load` mechanism (in `sema-eval`): modules are identified by canonical file path, cached in `EvalContext.module_cache`, and a module's env is a child of the root env (it gets builtins, not caller bindings). Meanings of "module": (1) a source module via `(module ...)` with selective `export`; (2) preloaded *virtual modules* injected into the cache by host code (`preload_module`); (3) packages, whose entrypoint file is loaded on import. Architecture docs also use "module" loosely for stdlib sub-modules (io, http) and Rust modules. Contrast with `load`, which does not use the module system. **Multimethod** — Clojure-style polymorphic dispatch: `defmulti` declares a method with a dispatch function applied to the arguments; `defmethod` registers an implementation for a specific dispatch value (`:default` for fallback). **Named let** — a `let` with a loop name that creates a local recursive function used as a tail-call-optimized loop, e.g. `(let loop ((i 0)) ...)`. Standard Scheme idiom. **Nil** — the empty/null value, returned by `when`/`while`/`unless` on a failed condition, by `some->` on a nil step, by `match*` on no match, and by `channel/recv` on a closed empty channel. `null?`/`nil?` test for it; distinct from `#f` though both are non-truthy. **Pair** — `(pair? 
x)` is `#t` for a non-empty list (a Scheme-compatibility predicate); the underlying cons-cell pair holds a head (`car`) and tail (`cdr`). See *Cons pair*. **Predicate** — a function (conventionally `?`-suffixed) returning a boolean, e.g. `null?`, `list?`, `even?`, `agent?`. The docs separate overlapping ones precisely: `null?` (empty list OR nil), `nil?` (only nil), `empty?` (any empty collection/string/nil). **Prefix notation** — the convention where the operator or function comes first in a list, followed by its arguments, e.g. `(+ 1 2)` instead of `1 + 2`. Also called Polish notation. **Quasiquote** — a templating form (backtick `` ` ``) returning a structure mostly unevaluated but allowing selective evaluation via unquote and unquote-splicing. Essential for macros; auto-gensym (`foo#`) only has its uniqueness magic inside a quasiquote template. **Quote** — a special form returning its argument unevaluated, turning code into data; reader shorthand `'x` desugars to `(quote x)`. `'(+ 1 2)` yields the list, `'foo` the symbol. **Recursion** — a function calling itself (or mutually) for repetitive work — the standard looping mechanism. Tail recursion enables TCO (see *Tail-call optimization*); infinite recursion triggers a max-eval-depth error. **S-expression** — the uniform parenthesized syntax for both code and data: an expression is either a single value (atom) or a parenthesized list of expressions. Foundational Lisp concept; Sema's pitch is that even LLM prompts are ordinary s-expression data. Also "sexp", "symbolic expression". **Special form** — a construct built into the evaluator that controls evaluation order and cannot be redefined (`define`, `if`, `quote`, `lambda`, `let`, `cond`, `try`, `import`, `async`, …). Unlike functions, special forms may evaluate their arguments selectively or not at all. Sema has ~40 surface special forms; dispatch compares pre-cached `Spur` constants. 
Some that can't compile to pure bytecode are delegated to `__vm-*` globals (see *Runtime-delegated form*). **Symbol** — a bare identifier used as a variable name and as quoted data, e.g. `foo`, `my-var`, `+`. Symbols evaluate to their bound value unless quoted; `(type 'foo)` is `:symbol`. Interned to a `Spur`; `gensym`/auto-gensym produce fresh ones. **Thunk** — a zero-argument function used to defer execution. It is the unit of scoped behavior for `with-*` combinators (`llm/with-cache`, `llm/with-budget`, `llm/with-fallback`, `retry`, `context/with`), the body of an async task (`async/spawn`), and the wrapped body of a lazy `delay`. In notebooks, thunks are opaque values that cannot be round-tripped and must be re-evaluated on reload. **Threading macro** — pipeline macros that thread a value through a sequence of forms: `->` (thread-first, inserts as first arg), `->>` (thread-last), `as->` (bind to a name for arbitrary placement), `some->` (nil-safe thread-first that short-circuits on nil). Auto-loaded; the formatter indents each step. **Truthiness** — the rule determining which values count as true in conditionals. `and` returns the last truthy value or `#f`; `or` returns the first truthy value; `while`/`when` loop/run on truthy conditions. Only `#f` (and `nil`) are non-truthy. **Try / catch / throw** — error-handling forms: `try` evaluates a body, `catch` binds any raised error (a structured map with `:type`, `:message`, `:stack-trace`) for handling, and `throw` raises any value. `catch` catches ALL error types (including internal `:unbound`, `:arity`, `:permission-denied`), so re-throw what you don't handle; `throw`-ed values appear under `:user`. **Unquote / unquote-splicing** — inside a quasiquote, unquote (`,expr`) evaluates `expr` and inserts its value; unquote-splicing (`,@expr`) evaluates a list and splices each element into the template. E.g. `` `(a ,@(list 1 2 3) b) `` yields `(a 1 2 3 b)`. 
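
The difference between unquote and unquote-splicing is easiest to see as a template walk. The following is a Python model rather than Sema's reader or evaluator: nested Python lists stand in for Sema lists, and the `Unquote` and `Splice` marker classes and the `quasiquote` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Unquote:
    """Models `,expr`: evaluate and insert one value."""
    thunk: Callable[[], Any]


@dataclass
class Splice:
    """Models `,@expr`: evaluate to a list and splice its elements in."""
    thunk: Callable[[], list]


def quasiquote(template):
    """Expand a template made of nested lists and the markers above."""
    if isinstance(template, Unquote):
        return template.thunk()
    if not isinstance(template, list):
        return template                  # atoms stay unevaluated
    out = []
    for item in template:
        if isinstance(item, Splice):
            out.extend(item.thunk())     # each element lands in place
        else:
            out.append(quasiquote(item))
    return out


# Python spelling of the entry's example: `(a ,@(list 1 2 3) b)
expanded = quasiquote(["a", Splice(lambda: [1, 2, 3]), "b"])
```

Here `expanded` is `["a", 1, 2, 3, "b"]`, matching the `(a 1 2 3 b)` result quoted in the entry; replacing `Splice` with `Unquote` would instead insert the whole list as a single nested element.
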
**Variadic / rest parameters** — functions accepting a variable number of arguments, captured via dot notation (`x . rest`) in lambda params or `&` in destructuring patterns.

## Reader, Compiler & VM Internals

**.semac** — Sema's compiled bytecode file format: a 24-byte header (magic `\x00SEM` + format version + flags + Sema version) followed by length-prefixed sections (string table, function table, main chunk, optional debug sections). Versioned (currently 4); the loader requires an exact version match. Produced by `sema compile`, consumed by run/disasm/build. A build artifact tied to the producing Sema version, not a portable interchange format. Auto-detected via the null-byte magic.

**AST** — abstract syntax tree. Sema has no dedicated AST type: the parser produces ordinary `Value` nodes, so the same `Value` type that exists at runtime represents parsed code (the "code is data" tradition). `sema ast` prints it; macros expand `Value` AST into more `Value` AST, which lowering converts into `CoreExpr`.

**Bytecode VM** — Sema's stack-based bytecode virtual machine, the sole evaluator and default backend (since v1.13). Source compiles to bytecode through four passes (Lower → Optimize → Resolve → Compile) and runs on the VM; `.semac` files store the compiled bytecode.

**Call frame** — (1) *VM CallFrame*: the per-call VM record holding the active closure, program counter, stack-base offset, open-upvalue cells, and cache base; pushed on call, reused on tail call, popped on return. (2) *EvalContext CallFrame*: a separate record used to build error stack traces. The DAP debugger renders VM frames with names, line numbers, and source paths.

**Callback architecture** — Sema's dependency-inversion design where `sema-stdlib`/`sema-llm` (which depend on `sema-core`, not `sema-eval`) invoke the real evaluator through function-pointer callbacks (`call_callback`/`eval_callback`) registered by `sema-eval` at startup.
Solves the circular-dependency problem so higher-order functions and LLM tool handlers run the single canonical evaluator. Replaced the removed *mini-eval*. **Chunk** — the unit of compiled bytecode: raw code bytes plus its constant pool, source spans, max stack depth, local count, inline-cache slot count, and exception table. The main program and each function each compile to a `Chunk`. Not to be confused with an LLM streaming chunk (see *Chunk* under LLM & GenAI) or `list/chunk`/`text/chunk`. **Closure** — at the VM level, a function value paired with the captured upvalues it references, created by `MakeClosure` from a compiled `Function` template plus upvalue descriptors. VM closures are wrapped as `Value::NativeFn` (carrying a `VmClosurePayload`) so non-VM code can call them: in-VM calls run in the same VM, calls from outside spin up a fresh VM (the "fallback path"). See also *Closure* under Lisp & Language Core. **Constant pool** — the per-chunk table of literal values that `Const` opcodes index into. In `.semac` each entry is a serialized type-tag-plus-payload `SerializedValue`; runtime-only types (Lambda, NativeFn, Prompt, Channel, Agent, ToolDef, Thunk, Record) must never appear here. Nesting depth is capped at 128 (`MAX_VALUE_DEPTH`). **Copy-on-write (COW)** — an optimization where a shared `Rc`-wrapped collection is mutated in place when its refcount is 1 (via `Rc::try_unwrap`/`Rc::make_mut`) and cloned only when actually shared, so callers never observe an aliased mutation. Used by `bytevector/set!`, typed-array `set!`, and BTreeMap updates; `Env::take` exists to drop refcounts to 1 first. ~30% of the 1BRC speedup. Sema chose COW over persistent collections. **CoreExpr** — the desugared intermediate representation produced by lowering, with variables still represented as names. The Optimize pass runs on it before resolution; paired with `ResolvedExpr` so the compiler can only receive resolved expressions. 
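
The *Copy-on-write (COW)* rule (mutate in place when the refcount is 1, clone only when shared) can be sketched with an explicit counter. This is a Python model, not Sema's Rust code: `Shared` stands in for an `Rc`-wrapped collection, `Handle` for a value holding one reference, and both names are hypothetical.

```python
class Shared:
    """A buffer plus an explicit reference count (stands in for Rc)."""

    def __init__(self, items):
        self.items = list(items)
        self.refcount = 1


class Handle:
    """One reference to a Shared buffer; cloning bumps the count."""

    def __init__(self, shared):
        self.shared = shared

    def clone(self):
        self.shared.refcount += 1
        return Handle(self.shared)

    def set(self, index, value):
        """Write with copy-on-write; returns True if a copy was made."""
        copied = False
        if self.shared.refcount > 1:
            # Shared: detach onto a private copy so other handles see no change.
            self.shared.refcount -= 1
            self.shared = Shared(self.shared.items)
            copied = True
        # At this point the buffer is uniquely owned, so mutate it in place.
        self.shared.items[index] = value
        return copied


a = Handle(Shared([1, 2, 3]))
in_place = a.set(0, 10)    # sole owner: no copy
b = a.clone()              # two handles now share one buffer
copied = b.set(1, 20)      # b detaches first, so a is untouched
```

After these calls `a` still sees `[10, 2, 3]` while `b` sees `[10, 20, 3]`, which is the "callers never observe an aliased mutation" guarantee from the entry.
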
**Cross-compilation** — `sema build --target ` producing executables for other platforms by downloading and caching a runtime binary for the target (verified against a published SHA256), then doing magic-byte-detected, format-aware injection. `libsui` does Mach-O ad-hoc signing in pure Rust, so macOS ARM64 binaries can be built from Linux. `SEMA_RUNTIME_BASE_URL` overrides the download location. **Debug hook** — VM instrumentation points (`debug.rs`, `execute_debug`) the DAP server uses: on every instruction step the hook checks for a hit breakpoint, a completed step, or a requested pause, updating `DebugState` and notifying the frontend. Source line numbers map to bytecode instructions for breakpoint verification. **Disassembly** — human-readable rendering of a chunk's bytecode (`disasm.rs` / `sema disasm`), showing each instruction's offset, opcode mnemonic, and operands (e.g. `0000 CONST 0 ; 3`). Exposed via the CLI (optional `--json`) and the MCP `disasm` tool. **Dispatch loop** — the VM's central loop that reads one opcode at a time and executes it — "the literal heart of every bytecode interpreter." Sema's is a two-level loop: an outer loop caches frame-local state (code/constants/base pointers) and an inner loop dispatches without re-fetching frame data, reloading only when control flow changes frames. PGO lays out the `match op` hot blocks by measured opcode frequency. **Emitter** — the bytecode builder (`emit.rs`) wrapped by the Compiler; it writes opcodes/operands and handles jump backpatching (filling in jump offsets once branch lengths are known). **Env** — Sema's runtime environment: a chain of scopes, each an `Rc>>` plus an optional parent and a `version` counter, with lookup walking the parent chain (lexical scoping). In the real VM most variable access is resolved to integer slots/upvalues at compile time and the Env is consulted mainly for globals; the `version` counter drives inline-cache invalidation. 
WASM `eval` uses a non-persistent child env vs `evalGlobal`'s persistent global env; a notebook's "shared cell environment" is one persistent Env across cells. **EvalContext** — an explicit struct (`sema-core/context.rs`) holding all per-interpreter evaluator state — module cache, call stack, span table, depth counters, sandbox, eval/call callbacks, eval deadline — threaded through evaluation as `ctx: &EvalContext`. Each `Interpreter` owns one, so multiple isolated interpreters can run on one thread. A shared thread-local `STDLIB_CTX` serves stdlib callbacks that don't receive a ctx parameter. **Exception table** — a per-chunk table of entries (`try_start`, `try_end`, `handler_pc`, `stack_depth`, `catch_slot`) implementing `try`/`catch`. The `Throw` opcode searches it for a matching handler, restores the stack to the saved depth, pushes the error value, and jumps to the handler — no inline branching opcodes. **F-string** — an interpolating string literal `f"...${expr}..."` that the reader desugars into a `(str "literal" expr …)` call, parsing each `${...}` interpolation recursively. `\$` suppresses interpolation. Distinct from `prompt/template`'s Mustache-style `{{key}}` slots. **Format version** — the `.semac` binary-format version field (currently 4) in the 24-byte header; the loader requires an exact match and otherwise rejects ("Recompile from source"). Distinct from the recorded compiler version. v2 added inline-cache operands, v3 added upvalue names, v4 added `local_scopes`. **Function table** — the required `.semac` section (0x02) of compiled function templates (name, arity, `has_rest`, upvalue descriptors and names, the function's chunk, debug metadata). `MakeClosure` references entries by `func_id`. Empty for programs with no inner lambdas; distinct from the runtime native-function table. 
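
The unwinding sequence in the *Exception table* entry (search, restore depth, push error, jump) can be sketched as a table lookup. This Python model makes assumptions the entry does not state: ranges are treated as half-open, inner handlers are assumed to be listed before outer ones, and `catch_slot` is carried but not interpreted. `ExceptionEntry`, `find_handler`, and `throw` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExceptionEntry:
    try_start: int     # first pc covered by the try body
    try_end: int       # assumed exclusive end of the covered range
    handler_pc: int    # where execution resumes
    stack_depth: int   # operand-stack height to restore
    catch_slot: int    # carried for completeness; not modelled here


def find_handler(table, pc) -> Optional[ExceptionEntry]:
    """Return the first entry whose range covers pc (assumes inner-first order)."""
    for entry in table:
        if entry.try_start <= pc < entry.try_end:
            return entry
    return None


def throw(table, pc, stack, error):
    """Unwind for a throw at pc; returns the new pc, or None if uncaught."""
    entry = find_handler(table, pc)
    if entry is None:
        return None
    del stack[entry.stack_depth:]    # discard temporaries pushed inside the try
    stack.append(error)              # the handler finds the error on top
    return entry.handler_pc


table = [ExceptionEntry(4, 9, 20, 1, 0), ExceptionEntry(0, 12, 30, 0, 1)]
stack = ["outer", "tmp1", "tmp2"]
next_pc = throw(table, 6, stack, "boom")
```

A throw at pc 6 falls inside the inner range, so the stack is cut back to depth 1, the error is pushed, and control moves to pc 20; a throw at pc 10 would reach only the outer entry.
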
**Function template** — a compiled `Function` (`chunk.rs`) describing a lambda — its chunk, arity, rest flag, and upvalue descriptors — collected by the Compiler and stored in the `.semac` function table. `MakeClosure` instantiates a `Closure` from a template plus captured upvalues; one template can produce many closures. **Fused CallGlobal** — an opcode combining `LoadGlobal` + `Call` into one instruction for non-tail calls to global functions, carrying `(u32 spur, u16 argc, u16 cache_slot)` operands and inline-cached via `cache_slot`; sets up the frame without the function value on the stack. **Inline cache** — a per-instruction cache for global-variable lookups: each `LoadGlobal`/`CallGlobal` carries a `u16` cache-slot operand indexing a per-VM `Vec` of `(spur, env_version, value)` tuples; a matching spur and env version skips the `Env` lookup entirely. Biggest wins on global-call-heavy workloads (higher-order-fold 2.34x); entries invalidate on `env_version` mismatch when a global is redefined. **Intrinsic** — a common builtin the compiler recognizes at a call site and compiles to a dedicated inline opcode (e.g. `+` → `AddInt`, `car` → `Car`, `length` → `Length`) instead of a global lookup plus call, eliminating the call overhead. Fires only when the call references the canonical global with matching arity and that global hasn't been redefined in the compilation unit. **Lexer** — the single-pass tokenizer (`lexer.rs`) that walks a `Vec` and emits `SpannedToken`s of 24 token types (brackets, quote forms, numbers, strings, f-strings, regex, keywords, symbols, etc.). The only place source positions enter the system; emits trivia tokens (Newline, Comment) the parser skips but the formatter and LSP use. **Lowering** — the first compiler pass (`lower.rs`): converts the `Value` AST into `CoreExpr`, a desugared IR. The ~40 surface special forms collapse to ~35 `CoreExpr` node kinds (e.g. `cond` → nested `If`, `case` → `Let` + `If`). Tail-position analysis happens here. 
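
The hit/miss logic in the *Inline cache* entry fits in a few lines. The sketch below is Python rather than the VM's Rust, with hypothetical `Env` and `InlineCache` classes; the integer `7` stands in for an interned symbol handle, and bumping the version on every definition is an assumption about when invalidation happens.

```python
class Env:
    """Global environment with a version counter bumped on every definition."""

    def __init__(self):
        self.bindings = {}
        self.version = 0
        self.lookups = 0           # counts slow-path lookups, for the demo

    def define(self, spur, value):
        self.bindings[spur] = value
        self.version += 1          # invalidates every cached entry

    def lookup(self, spur):
        self.lookups += 1
        return self.bindings[spur]


class InlineCache:
    def __init__(self, slots):
        self.entries = [None] * slots    # each entry: (spur, env_version, value)

    def load_global(self, env, spur, slot):
        entry = self.entries[slot]
        if entry is not None and entry[0] == spur and entry[1] == env.version:
            return entry[2]              # hit: no Env lookup at all
        value = env.lookup(spur)         # miss: take the slow path and refill
        self.entries[slot] = (spur, env.version, value)
        return value


env = Env()
env.define(7, "v1")
cache = InlineCache(slots=1)
first = cache.load_global(env, 7, 0)     # miss
second = cache.load_global(env, 7, 0)    # hit
env.define(7, "v2")                      # redefinition bumps the version
third = cache.load_global(env, 7, 0)     # miss again, sees the new value
```

Three loads cost only two environment lookups, and the redefinition is picked up because the stored version no longer matches.
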
**Macro expansion** — the step (in `sema-eval`) where `defmacro`-defined macros are expanded to more `Value` AST before compilation; expansion is performed VM-natively, and the result feeds the same Lower→Optimize→Resolve→Compile pipeline. Auto-gensym names like `x#` lex as plain symbols. **Magic number** — identifying leading bytes of a binary format. `.semac` files start with `\x00SEM` (used to auto-detect bytecode vs source, since source never starts with a null byte); bundled executables use the `SEMAEXEC` archive/trailer magic. Two distinct formats sharing the concept; also used for corruption detection. **Mini-eval** — a removed minimal evaluator once inlined in the stdlib to bypass the full trampoline (inlining `+`, `=`, `assoc`, etc.). Deleted because it caused semantic drift from the real evaluator and blocked the bytecode VM. Replaced by the callback architecture; its removal regressed the tree-walker ~3x, which the VM more than recovered. **NaN-boxing** — a technique that packs every Sema value into a single 8-byte `u64` by encoding non-float types in the unused payload bits of an IEEE 754 quiet NaN. Floats are stored as raw `f64` bits; all other types use a tag plus payload. Immediate types (nil, bool, char, small int, symbol, keyword) need no heap allocation; heap types store an `Rc` pointer in the payload. It is why typed arrays (raw `f64`/`i64`) are faster than NaN-boxing every list element. Same technique used by Janet. **NaN-boxed int fast path** — specialized opcodes (`AddInt`/`SubInt`/`MulInt`/`LtInt`/`EqInt`) that operate directly on raw NaN-boxed `u64` bits — sign-extending the 45-bit payload, doing the arithmetic, re-boxing — without ever constructing a `Value`, avoiding Clone/Drop overhead. An unchecked-overflow bug here once silently truncated large adds/subs crossing the small-int boundary (caught by the metamorphic fuzzer); the fix made `+`/`-` promote on overflow like `*`. **NativeFn** — a Rust-implemented builtin function value. 
Signature `(&EvalContext, &[Value]) -> Result`; `NativeFn::simple()` for context-free fns, `NativeFn::with_ctx()` for those needing the context. VM closures are also wrapped as `Value::NativeFn` so external code can call them. `CallNative` dispatches by index when `known_natives` is supplied at compile time. Also exposed to embedders via `register_fn` (Rust) / `registerFunction` (JS); a "yielding native" can suspend an async task. **Opcode** — a single-byte VM instruction code (the `Op` enum, 69 opcodes in `sema-vm`). Most are one byte; some carry inline operands (`u16`/`u32`/`i32`). Categories: constants/stack, variable access, control flow, functions, data constructors, arithmetic/comparison, inline intrinsics, exceptions. `opcodes.rs` (`Op` + `Op::from_u8`) is the single source of truth. **Optimize pass** — the compiler pass (`optimize.rs`) running on `CoreExpr` between lowering and resolution: constant folding (`(+ 1 2)` → `3`), comparison/boolean folding, control-flow simplification (`(if #t a b)` → `a`), and dead-code elimination. Why `sema compile` of `(+ 1 2)` yields a single `CONST 3`. **Parser** — the recursive-descent parser (`reader.rs`) that consumes `SpannedToken`s and produces `Value` nodes plus a `SpanMap`, dispatching on token type (LParen→list, LBracket→vector, LBrace→map). Produces `Value` directly (no intermediate AST); handles dotted pairs via a `"."` marker symbol. **Peephole optimization** — a local instruction-pattern rewrite by the compiler — notably `(if (not X) A B)` compiled to `JumpIfTrue` instead of `Not` + `JumpIfFalse`, eliminating one instruction and the `not` call. **PGO** — Profile-Guided Optimization: the distributed binaries are instrumented, trained on the benchmark suite plus a 1BRC sample, the profile merged with `llvm-profdata`, and the binary rebuilt so LLVM lays out the dispatch loop's hot blocks by measured opcode frequency. 
Applied to cargo-dist releases and the Homebrew bottle (v1.19.2+), ~26–39% wins; a PGO failure ships fat-LTO instead. `cargo install` gets LTO but not PGO. **Pop\_unchecked** — the VM's unsafe unchecked stack-pop used on the hot dispatch path for speed. Sound only because in-process bytecode is balanced by construction and deserialized bytecode is proven balanced by the verifier; debug builds retain bounds checks via `debug_assert!`. Part of the VM's unsafe optimizations alongside raw-pointer bytecode reads. **Prelude** — Sema source bundled in `sema-eval` (`prelude.rs`) and evaluated at interpreter startup to define library macros and functions (threading macros, `when-let`, and friends), expanded VM-natively through the same bytecode pipeline as user code. Distinct from a user file preloaded on the command line with `-l`/`--load`. **Program counter (pc)** — the index of the current instruction in a chunk's bytecode. A jump simply sets `pc` to a different value. Used throughout: jump offsets are relative `pc` deltas, source maps map `pc`→line, exception tables specify `pc` ranges, breakpoints resolve source lines to `pc`s. **Quote desugaring** — the reader's rewriting of quote syntax into real lists before evaluation: `'x` → `(quote x)`, `` `x `` → `(quasiquote x)`, `,x` → `(unquote x)`, `,@x` → `(unquote-splicing x)`. The syntax is reader-level; the semantics are evaluator-level. Sema has no user-extensible reader macros/readtables. **Rc reference counting** — Sema's single-threaded memory model: every `Value` is `Rc` (non-atomic reference counting), giving deterministic destruction with no garbage collector. `Rc` (not `Arc`) avoids atomic increments and makes `Value`s non-`Send`/`Sync`. Cannot collect cycles, but Lisp closures tend to be tree-shaped; a future tracing GC is the named next runtime step. 
**Reader** — Sema's front end (`sema-reader`): a two-phase pipeline where a lexer tokenizes source into `SpannedToken`s and a recursive-descent parser produces `Value` nodes directly — no separate AST type, since code is data. Quote sugar, f-strings, regex literals, short lambdas, and dotted pairs are desugared here. Reader errors are syntax/parse errors (`:reader`). Also `(read "...")` parses a string into a `Value`. **Regex literal** — a raw-string literal `#"..."` whose contents are taken verbatim with no escape processing (only `\"` is special), letting you avoid double-escaping. The reader desugars it to a plain string `Value`; backed by the Rust `regex` engine (linear-time, no lookaround/backreferences). **Resolution** — the compiler pass (`resolve.rs`) that walks `CoreExpr` and classifies every variable reference as `Local{slot}`, `Upvalue{index}`, or `Global{spur}`, producing `ResolvedExpr` — replacing runtime hash-based env lookup with direct slot indexing. Also marks captured locals and emits `UpvalueDesc`s. Described as "most of the gap between a teaching interpreter and a fast one." **Runtime-delegated form** — a special form the compiler cannot lower to pure bytecode (`eval`, `import`, `load`, `defmacro`, `define-record-type`, `delay`/`force`, `prompt`/`message`/`deftool`/`defagent`, `macroexpand`), so it is compiled as a call to a corresponding `__vm-*` global function registered by `sema-eval`. **SemaError** — Sema's `thiserror`-derived error enum (12 variants incl. Reader, Eval, Type, Arity, Unbound, Llm, UserException, plus `WithTrace`/`WithContext` wrappers), constructed via helper methods (`eval`, `type_error`, `arity`), never raw variants. Stack traces are attached lazily during propagation (`WithTrace`), so caught errors don't pay the trace cost. Surfaced to Sema code as a structured error map with `:type`, `:message`, `:stack-trace` (see *Try / catch / throw*). 
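
The classification performed by the *Resolution* pass can be illustrated with a chain of compile-time scopes. This is a simplified Python sketch, not `resolve.rs`: `FunctionScope` is a hypothetical class, it tracks one flat set of locals per function, and globals are returned by name where a real compiler would store an interned handle.

```python
class FunctionScope:
    """Compile-time record of one function's locals and captured variables."""

    def __init__(self, parent=None):
        self.parent = parent
        self.locals = {}       # name -> slot index
        self.upvalues = []     # names captured from enclosing functions

    def declare(self, name):
        self.locals[name] = len(self.locals)
        return self.locals[name]

    def resolve(self, name):
        if name in self.locals:
            return ("local", self.locals[name])
        if name in self.upvalues:
            return ("upvalue", self.upvalues.index(name))
        if self.parent is not None and self.parent.resolve(name)[0] != "global":
            # Found in an enclosing function: record a capture in this one.
            self.upvalues.append(name)
            return ("upvalue", len(self.upvalues) - 1)
        return ("global", name)


outer = FunctionScope()
outer.declare("x")
inner = FunctionScope(parent=outer)
inner.declare("y")
```

With this setup `inner.resolve("y")` is a local at slot 0, `inner.resolve("x")` becomes upvalue 0, and an undeclared name such as `"print"` falls through to a global reference.
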
**Short lambda** — a terse anonymous-function literal `#(...)` whose body is scanned for positional placeholders `%`, `%1`, `%2`…; bare `%` rewrites to `%1`, producing `(lambda (%1 … %N) body)`. Clojure-style; read/desugared by the reader. E.g. `#(* % %)` squares its argument. **Slot** — a fixed integer index into the current function's stack frame where a local variable lives. The Resolve pass replaces variable names with slots so a runtime read is an array index, not a hash lookup. Slots 0–3 have dedicated zero-operand opcodes (`LoadLocal0..3`). Contrast with upvalue indices and global Spurs. **Source map** — an as-yet-unimplemented `.semac` debug section (0x10) linking bytecode PCs back to source file/line/column via delta-encoded LEB128 entries, to enable file/line error messages when running compiled bytecode. At runtime, in-process source positions come from the `EvalContext` span table instead. **Span (source)** — a source-location range (line, col, end\_line, end\_col) recorded per token by the lexer and attached to compound values for error reporting. Stored in a side table (`SpanMap`) keyed by `Rc`-pointer address rather than inside the NaN-boxed `Value`, so `Value` stays 8 bytes; only list/vector values get spans (atoms don't). Distinct from a tracing/telemetry span — see *Span* under Observability. **Spur** — a `u32` interned-string handle from the `lasso::Rodeo` interner. Symbols, keywords, and global variable names are stored as Spurs so equality and env lookups are O(1) integer comparisons. Process-local (per-thread) and not stable across processes, which is why `.semac` files remap them via a string table. `intern(s)` interns; `resolve(spur)` maps back. **Stack-depth verifier** — an abstract-interpretation pass (ADR #56) inside `validate_bytecode` that proves a deserialized chunk's operand stack never underflows or exceeds its declared maximum, making the VM's unchecked `pop_unchecked` sound for untrusted `.semac` files. 
Uses a worklist over reachable instructions with a strict-equality lattice at join points; `Op::stack_effect()` is the shared source of truth. Sound but conservative. **Stack machine** — a VM design with a single operand stack: operands are pushed and operators pop them, evaluating nested expressions without runtime recursion. Sema's value stack is a contiguous `Vec` of NaN-boxed `Value`s (good cache locality); the compiler emits operands before operators. Sema, CPython, and the JVM all use this model. **String interning** — replacing repeated strings (symbol/keyword names) with shared integer handles (Spurs) in a global table, so identity checks become integer comparisons. In Sema done via an explicit `intern()` into a thread-local `lasso` Rodeo; goes back to McCarthy's LISP 1.5 "object list" (oblist). **String table** — the required `.semac` section (0x01) holding every unique string the bytecode references (symbol/keyword names, string constants, paths). On load each is interned to a fresh Spur and a remap table maps file-local indices to process-local Spurs. String index 0 is reserved and must be the empty string. This is how Sema makes process-local Spurs portable into a file. **Tail-call optimization (TCO)** — reusing the current call frame for a call in tail position (the function's last action) so deep/recursive calls don't grow the native stack. The compiler tags tail-position calls during the Lower pass (the `Call` node carries `tail: bool`) and emits `TailCall`, which reuses the frame. Tail positions include the last body expression, if-branches, cond clauses, and the last `and`/`or` operand. Named `let`, `do`, and tail-recursive functions all rely on it. **Trampoline** — an evaluation technique where a step returns either a final value or an instruction to continue evaluating another expression, looped without growing the native stack — used to implement TCO in the now-retired tree-walker. 
Distinct from CPS-style "Cheney on the MTA" trampolining cited (in the Lisp comparison) for Chicken Scheme; the VM does TCO via `TailCall` instead.

**Tree-walker** — the original recursive AST-interpreting evaluator (now retired). It evaluated `Value` AST directly via the trampoline; the bytecode VM replaced it as the sole evaluator, yielding 2–17x speedups. Docs keep its benchmark numbers for comparison.

**Two-level dispatch loop** — see *Dispatch loop*.

**Upvalue** — a variable captured by a closure from an enclosing function. Sema uses the Lua/Steel "open upvalue" model: an `UpvalueCell` is `Open { frame_base, slot }` (pointing into the live VM stack) while the defining frame is alive, then `Closed(Value)` once it exits. Resolved at compile time: `UpvalueDesc::ParentLocal(slot)` captures from the immediate parent, `ParentUpvalue(index)` through an intermediate. Known limitation: `set!` to a captured local is lost when the closure runs through a stdlib HOF.

**Value** — Sema's single universal data type: a `#[repr(transparent)] struct Value(u64)` that NaN-boxes every Sema datum (numbers, lists, maps, lambdas, LLM types, …), pattern-matched via `val.view()` returning a `ValueView` enum. It is both the runtime value and the parsed-code representation (code is data). Defined in `sema-core`; not `Send`/`Sync` (uses `Rc`).

**Vector-backed list** — Sema's representation of `Value::List` as `Rc<Vec<Value>>` (contiguous array) rather than linked cons cells, giving O(1) `nth`/`length` and cache-friendly iteration at the cost of O(n) cons/append (`car` is `v[0]`, `cdr` a slice copy). A deliberate departure from traditional Lisp; contrast with Clojure's persistent vectors.

**VFS (Virtual File System)** — an in-memory file archive.
Meanings: (1) a thread-local archive in `sema-core` of compiled bytecode plus bundled assets embedded into a standalone executable by `sema build` — file/import ops check it first, then fall back to the real filesystem; (2) the WASM/browser in-memory filesystem replacing real disk (quotas: 1 MB/file, 16 MB total, 256 files; pluggable persistence backends); (3) the notebook server's sandboxed file API over HTTP, scoped to the notebook's directory. Writes always target the real filesystem in case (1).

## LLM & GenAI

**Agent** — a bundle of a system prompt, tools, model, and turn limit (`defagent`) that runs a multi-turn loop, automatically handling the back-and-forth of tool calls until a final answer or `:max-turns`; run with `agent/run`. A first-class `Value` type (predicate `agent?`). Meanings to disambiguate: (1) the Sema LLM agent (this entry); (2) in telemetry, every `agent/run` emits an `invoke_agent` span (typed `AGENT`/`agent`/`chain` in compat tools).

**Auto-configuration** — Sema's startup behavior of detecting available providers from environment variables (API keys) and configuring them with no manual setup; triggerable with `llm/auto-configure`, skippable with `--no-llm`. Embedding providers are auto-configured separately from chat providers.

**Automatic retry** — built-in, config-free retrying of transient LLM failures (HTTP 429, 5xx, network/timeout) with capped exponential backoff and full jitter (base 500 ms, cap 30 s, up to 3 retries), honoring a 429 `retry-after` hint. 4xx-non-429 and parse errors fail fast. Distinct from `llm/with-fallback` (switches providers) and the stdlib generic `retry`; each retry emits an `llm.retry_attempt` span.

**Batch** — `llm/batch` sends multiple prompts concurrently and collects all results; `llm/pmap` maps a function over items and sends the resulting prompts in parallel. Distinct from the OTel batch span processor (telemetry export).
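
The delay schedule named under *Automatic retry* (capped exponential backoff with full jitter) can be sketched as follows. The base, cap, and retry count come from that entry; the doubling factor, the choice to use a `retry-after` hint verbatim as the delay, and all function names are assumptions of this Python sketch, which is not Sema's implementation. Network and timeout failures are also retried in Sema but are not modeled here.

```python
import random

BASE_MS = 500      # base delay, from the Automatic retry entry
CAP_MS = 30_000    # maximum delay, from the Automatic retry entry
MAX_RETRIES = 3    # retries after the initial attempt


def is_retryable(status):
    """HTTP 429 and 5xx are transient; other 4xx responses fail fast."""
    return status == 429 or 500 <= status <= 599


def backoff_ceiling_ms(attempt):
    """Upper bound for retry `attempt` (0-based): base * 2**attempt, capped."""
    return min(CAP_MS, BASE_MS * 2 ** attempt)


def retry_delay_ms(attempt, retry_after_ms=None, rng=random):
    """Delay to sleep before retry `attempt`.

    Full jitter draws uniformly from [0, ceiling], which spreads out
    clients that failed at the same moment.
    """
    if retry_after_ms is not None:
        # Assumption: a server-provided hint replaces the computed delay.
        return retry_after_ms
    return rng.uniform(0, backoff_ceiling_ms(attempt))
```

With these constants the ceilings for the three retries are 500 ms, 1000 ms, and 2000 ms, so the 30 s cap only matters if the retry count is raised.
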
**Budget** — a spending limit enforced on LLM calls: a cost cap in dollars (`llm/set-budget`, `:max-cost-usd`) and/or a token cap (`:max-tokens`); calls that would exceed it fail. Scoped form `llm/with-budget`. Best-effort (warn-once) when model pricing is unknown; state is thread-local. Disambiguate: this spend budget vs. the Anthropic *thinking budget* (`budget_tokens`, see *Reasoning effort*) vs. the `EvalContext` `eval_deadline` (a wall-clock time budget). **Cache hit** — when an LLM call is served from Sema's response cache instead of the provider. A cache hit makes no provider call, so it reports zero usage (must not recharge cost or burn budget) and is flagged in telemetry with `sema.gen_ai.cache.hit`. Consequently token metrics undercount real spend when caching is in play. **Cache key** — the SHA-256 hash identifying a cached response (`llm/cache-key`), computed from a prompt and options; the response cache is keyed on prompt + model + temperature. Distinct from provider prompt-cache keys. **Chat** — sending a list of messages (system/user/assistant) to an LLM and getting a reply, via `llm/chat`. Disambiguate: (1) the Sema operation `llm/chat`; (2) a provider-capability column in the support table; (3) the auto-generated OTel span `chat {model}` emitted for every non-streaming completion (typed `LLM`/`generation`/`task`/`llm` in compat tools). **Chunk** — (1) *streaming chunk*: an incremental piece of a streamed LLM response, passed to the stream callback as it arrives (for Lisp-defined providers without streaming, the whole response is sent as a single chunk); (2) *text chunk*: a slice of text from `text/chunk` recursive splitting (`:size`/`:overlap`) for LLM/RAG pipelines. Unrelated: `list/chunk` (list partitioning) and the VM bytecode *Chunk*. **Classification** — assigning text to one of a fixed set of keyword labels via `llm/classify`, which returns the best-matching keyword. A constrained form of extraction. 
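
The *Cache key* entry says the response cache is keyed on a SHA-256 hash of prompt + model + temperature. A minimal Python sketch of such a key is shown below; the canonical JSON encoding and the function name `cache_key` are assumptions made for illustration, so the digests will not match what `llm/cache-key` returns.

```python
import hashlib
import json


def cache_key(prompt, model, temperature):
    """Hex SHA-256 over a canonical encoding of the three key fields."""
    payload = json.dumps(
        {"prompt": prompt, "model": model, "temperature": temperature},
        sort_keys=True,           # stable field order
        separators=(",", ":"),    # no incidental whitespace
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to one of the three fields yields a different key, which is why the same prompt sent at a different temperature is a cache miss.
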
**Completion** — a single model response generated from a prompt. `llm/complete` sends one prompt and returns the generated text; the term also refers to the chat-completions API shape. In usage/cost reporting, *completion tokens* (`:completion-tokens`) are the output tokens, vs. prompt/input tokens. **Conversation** — an immutable data structure holding chat history (and an optional model); every operation returns a new conversation value. Created with `conversation/new`, advanced with `conversation/say` (which makes an LLM round-trip); `conversation/last-reply` returns the latest reply. A first-class `Value` type (predicate `conversation?`). Immutability enables `conversation/fork`. Distinct from the telemetry `gen_ai.conversation.id` (see *Conversation id*). **Cost** — the computed dollar cost of an LLM call/session, derived from token usage and pricing, reported as `:cost-usd` in usage maps and `gen_ai.usage.cost` / `gen_ai.usage.cost_usd` in telemetry. Cached reads are reported but not yet discounted. Some backends (e.g. LangSmith) recompute cost from token counts, so their number may differ. **Default model** — the model id a provider uses when no `:model` is pinned and no `:default-model` was configured (e.g. `:anthropic` → `claude-sonnet-4-6`); also what `llm/with-fallback` substitutes per provider when the body leaves the model unpinned. Set per provider, globally via `SEMA_CHAT_MODEL`, or per call. Model ids are provider-specific (a Claude id sent to OpenAI returns a 404). **Deftool** — a Sema special form defining an LLM-callable tool: a name, description, parameter schema, and handler lambda evaluated when the LLM invokes it. `ToolDef` is a first-class `Value` type. The MCP server auto-exposes `deftool` tools in filepath mode (underscore-prefixed or `:mcp/expose #f` are private). See *Tool*. **Defagent** — see *Agent*. 
The Sema construct (and first-class `Agent` `Value` type) for defining an LLM agent, alongside `deftool`, `llm/extract`, and conversations. **Finish reason** — the reason the model stopped generating (e.g. `end_turn`, `length`), reported as `:stop-reason` from providers (default `"end_turn"` for Lisp-defined providers) and traced as `gen_ai.response.finish_reasons` on the `chat` span. **Fork** — `conversation/fork` creates an independent copy of a conversation so you can explore divergent directions from the same point; because conversations are immutable, the original and each fork stay independent. Used to run parallel "what about X?" branches. **Fallback chain** — an ordered list of providers passed to `llm/with-fallback`; if a call fails on one provider it automatically retries on the next. Entries can be bare provider keywords or `[provider model]` / `{:provider :model}` pairs for per-provider model overrides. Each provider does its own transient-error retry first. Streaming bypasses the fallback chain. **First-class LLM types** — Prompt, Message, Conversation, ToolDef, and Agent are distinct NaN-boxed `Value` types (not maps-with-conventions), with their own constructors, pattern matching, and display forms. Constructed at runtime via `__vm-prompt`/`__vm-message`/etc., so they are runtime-only and can't be serialized into a `.semac` constant pool. Sema's primary differentiator vs other Lisps. **LlmProvider** — the single Rust trait all LLM backends implement (`name`, `complete`, `default_model`, plus optional `stream_complete`/`batch_complete`/`embed`), registered in a `ProviderRegistry`. The trait is `Send`+`Sync` (providers use tokio `block_on` internally) even though the runtime is single-threaded. Concrete providers: Anthropic, OpenAI, Gemini, Ollama. See *Provider*. **Lisp-defined provider** — a provider implemented entirely in Sema via `llm/define-provider`, whose `:complete` function receives a request map and returns a string or response map. 
Enables echo/mock/proxy/routing providers and deterministic testing. Streaming falls back to a single chunk. Integrates with `llm/set-default`, `llm/list-providers`, etc. like any other provider. **Max-tokens** — disambiguate two meanings: (1) the per-response generation cap (`:max-tokens`) limiting how many tokens the model may generate; (2) the budget `:max-tokens`, a session/scoped *spend* cap counting input+output. Anthropic extended thinking raises the generation cap above the thinking budget. **Max-turns** — the upper bound on how many back-and-forth iterations an agent's tool loop may run, set in `defagent` and read with `agent/max-turns`. A turn-limit safety bound distinct from token/cost budgets; the loop also aborts after 5 consecutive tool errors. **MCP** — Model Context Protocol — Sema's `sema mcp` server (`sema-mcp`) lets LLM clients (Claude Desktop, Cursor, Claude Code) inspect/compile/format/eval/build Sema code and call user-defined `deftool` tools, over stdio JSON-RPC 2.0. Default tools: `run_file`, `compile`, `eval`, `docs`, `fmt`, `disasm`, `build`, `info`, plus stateful notebook tools. Bundled executables can embed an MCP server via `--mcp`. **Message** — a role-content pair where the role is a keyword (`:system`, `:user`, `:assistant`) and the content is text (and optionally an image), the atomic unit prompts and conversations are made of, created with `(message :role content)`. A first-class `Value` type (predicate `message?`); accessed via `message/role`/`message/content`; `message/with-image` attaches a bytevector image. In Lisp-defined providers the request `:messages` is a list of `{:role :content}` maps. **Model** — the specific LLM variant a call targets, named by a provider-specific id string (e.g. `claude-haiku-4-5-20251001`, `gpt-5.4-mini`). Selected via `:model`, a provider default, or `SEMA_CHAT_MODEL`. Ids are not portable across providers (a mismatched id returns 404). 
Recorded in telemetry as `gen_ai.request.model`/`gen_ai.response.model`. "Reasoning/thinking models" are a subclass supporting `:reasoning-effort`. **Multi-modal** — LLM input combining text with images, created via `message/with-image` (image as a bytevector) and consumed by vision-capable models. Media type (PNG/JPEG/GIF/WebP/PDF) is auto-detected from magic bytes. Vision support is provider/model-dependent. **OpenAI-compatible provider** — any service implementing the OpenAI chat-completions API, registered with `llm/configure` by passing `:api-key` and `:base-url` with any provider name — no custom code. Covers Together, Fireworks, Perplexity, Azure OpenAI, Groq, vLLM, LiteLLM, etc. Contrasted with native providers (bespoke serializers) and Lisp-defined providers. **On-tool-call callback** — an `agent/run` option (`:on-tool-call (fn (event) ...)`) that observes each tool call as `:start` and `:end` events during the agent loop. A runtime observability hook distinct from OTel spans/events. **Parameter schema** — the map describing a tool's expected arguments (field name → `{:type ... :description ...}`), shown to the LLM so it knows how to call the tool; retrieved with `tool/parameters`. Calling with mismatched arguments doesn't abort an agent run — the mismatch is fed back as the tool result. Distinct from the extraction schema (see *Schema*). **Pricing** — per-million-token input/output rates used to compute cost, resolved in order: custom (`llm/set-pricing`) > a bundled models.dev snapshot (2,400+ models, offline) > unknown (cost returns nil). Checked via `llm/pricing-status`. When unknown, budget enforcement degrades to best-effort. **Prompt** — a first-class, composable, immutable data structure (not a string template) built from message expressions, which can be inspected, transformed, filled with template slots, and sent to an LLM. 
Built with the `prompt` macro using `(system ...)`, `(user ...)`, `(assistant ...)` shorthands; introspected with `prompt?`, `prompt/messages`, `prompt/slots`. Distinct from a *system prompt* (the instruction message) and a *prompt cache* (provider-side input caching). **Prompt cache** — a provider-side cache of input tokens for a stable prompt prefix, yielding large savings when a prefix repeats; surfaced as `:cache-read-tokens` and `:cache-creation-tokens` in usage. OpenAI and Gemini 2.5+ cache implicitly; Anthropic caching is opt-in via `cache_control`. Distinct from Sema's own in-memory *response cache*. Cached reads are reported but not yet discounted in `:cost-usd`. **Prompt slot** — a `{{key}}` placeholder in a prompt's message contents, filled from a map by `prompt/fill`; `prompt/slots` returns the still-unfilled slot names as keywords. Partial fills leave unfilled slots intact. **Prompt template** — a string with `{{key}}` Mustache-style placeholders created by `prompt/template` and filled by `prompt/render` from a map; missing keys are left as-is and non-string values are stringified. Distinct from reader-level f-strings (`${...}`). **Provider** — a backend LLM service (Anthropic, OpenAI, Gemini, Ollama, Groq, xAI, Mistral, Moonshot, etc.) that Sema auto-configures from environment variables and dispatches calls to; can be native, OpenAI-compatible, embedding-only, or Lisp-defined. `--chat-provider`/`SEMA_CHAT_PROVIDER` select one; embeddings have separate providers (Jina, Voyage, Cohere). Disambiguate from: a *fallback provider chain*; the OTel *tracer provider* (the telemetry SDK object — see Observability). See also *LlmProvider*. **Proxy / gateway** — an LLM-observability tool that captures data by routing your model calls through it (sitting in front of the API) rather than receiving OTLP traces — e.g. Helicone, LiteLLM, Portkey, Pezzo. Sema's OTLP export cannot feed these; use the tool's own gateway integration. 
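
The `{{key}}` filling rules given under *Prompt slot* and *Prompt template* (missing keys are left as-is, non-string values are stringified) can be sketched in a few lines. This is Python for illustration only; the set of characters allowed in a key and the function names `render` and `slots` are assumptions, not the behavior of `prompt/render` or `prompt/slots`.

```python
import re

# Assumption: keys are made of letters, digits, underscores, and hyphens.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z0-9_-]+)\}\}")


def render(template, values):
    """Fill {{key}} placeholders from `values`; unknown keys stay untouched."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)   # missing key: keep the placeholder
        return str(values[key])     # non-string values are stringified
    return PLACEHOLDER.sub(substitute, template)


def slots(template):
    """Names of the placeholders still present, in order of appearance."""
    return PLACEHOLDER.findall(template)
```

Because unfilled placeholders survive a render, a template can be filled in stages: a first pass supplies the stable values and a later pass supplies the per-request ones.
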
**RAG** — Retrieval-Augmented Generation: a workflow that embeds documents, stores them, retrieves semantically relevant ones, and uses them to ground LLM responses. The canonical example for embeddings + vector store. See *Embedding*, *Vector store*, *Semantic search* under Data Structures & Standard Library. **Re-ask** — on extraction validation failure, feeding the validation errors back to the LLM on the next retry so it can correct its response (`:reask?`, default true). A field validator's `:message` is surfaced in the re-ask prompt; bounded by `:retries` (default 2). **Reasoning effort** — a single portable option (`:reasoning-effort`, taking `:minimal`/`:low`/`:medium`/`:high`/`:none`/`:xhigh`) controlling how much a reasoning/thinking model deliberates before answering. Sema maps it to each provider's native control: OpenAI `reasoning_effort`, Anthropic extended thinking `budget_tokens` (the "thinking budget"), Gemini `thinkingConfig.thinkingBudget`. No-op where unsupported. Accepted by `llm/complete`, `llm/chat`, and per-run on `agent/run`. **Response cache** — Sema's in-memory, per-session cache of LLM responses keyed on prompt + model + temperature, enabled for a thunk with `llm/with-cache` (optional `:ttl`). A cache hit makes no provider call and reports zero usage by design. Inspected via `llm/cache-key`, `llm/cache-stats`, `llm/cache-clear`. Distinct from the provider-side *prompt cache*. **Role** — the speaker designation of a message, expressed as a keyword: `:system` (instructions), `:user` (human input), or `:assistant` (model reply). Used in `message`, `conversation/add-message`, and filtered via `message/role`. **Schema** — a map declaring expected fields and their types/constraints. Disambiguate: (1) *extraction schema* — fields → `{:type ...}` with `:optional`/`:validate`/`:message`, used by `llm/extract`; (2) *tool parameter schema* — a tool's callable-function shape passed to the LLM (see *Parameter schema*). 
Both are validated against but serve different roles. **Semantic conventions (GenAI)** — the OpenTelemetry-agreed standard attribute names for LLM telemetry (e.g. `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.response.finish_reasons`) that Sema emits so backends understand the data without per-tool glue. Tools that use their own names need `SEMA_OTEL_COMPAT`; Sema-specific extras use the `sema.gen_ai.*` prefix. See also Observability. **Stop sequence** — a string (or list of strings, `:stop-sequences`) at which the model halts generation, passed through to providers in the request map. **Streaming** — receiving an LLM response incrementally as chunks rather than one final string, via `llm/stream` (optionally with a per-chunk callback). Streaming calls bypass automatic retry, the response cache, budget enforcement, and the fallback chain — they hit the provider directly. For Lisp-defined providers streaming falls back to a single chunk. See also *Stream* under Data Structures & Standard Library. **Structured extraction** — LLM-powered extraction of typed, schema-conforming data from unstructured text (`llm/extract`) or images (`llm/extract-from-image`), with optional validation and retry. The schema maps field names to type descriptors (`:string`/`:number`/`:boolean`/`:list`). Distinct from a tool's parameter schema. **System prompt** — the instruction/persona message (role `:system`) conditioning an LLM's behavior. Passed via the `:system` option to `llm/complete`, set with `conversation/set-system`, or built as a `(system ...)` message inside a prompt. `conversation/say-as` overrides it for a single turn; an agent carries one in its definition (`agent/system`). **Temperature** — the sampling option (0.0–1.0) controlling randomness/determinism of output. Part of the response-cache key (prompt + model + temperature). Forced to default while Anthropic extended thinking is active. 
OpenAI may reject it on certain models, and Sema learns to drop it (`DROP_TEMPERATURE`). **Token** — disambiguate four meanings: (1) *LLM token* — the unit LLMs measure input/output in (roughly word-pieces); Sema tracks `:prompt-tokens`/`:completion-tokens`/`:total-tokens` and estimates counts via a chars/4 heuristic (`llm/token-count`, see *Token count*); (2) a *token-bucket* rate-limiter unit in `llm/with-rate-limit`; (3) an *auth bearer token* in OTLP headers; (4) a *lexer token* in the reader (unrelated). The chars/4 LLM estimate is heuristic, not a true tokenizer count. **Token-bucket rate limiting** — a rate-limiting algorithm where requests consume tokens replenished at a fixed rate; `llm/with-rate-limit` caps LLM calls to N requests per second. The "token" here is a rate-limiter unit, not an LLM token. Wraps a thunk. **Token count (heuristic)** — Sema's tokenizer-free estimate of token usage using a chars/4 heuristic, exposed by `llm/token-count`, `llm/token-estimate`, and `conversation/token-count` (reports `:method "chars/4"`). An estimate, not a true tokenizer count; distinct from provider-reported usage tokens in `llm/last-usage`. **Tool** — disambiguate: (1) *LLM tool* — a function the LLM can invoke during a conversation, defined with `deftool` (name, description, parameter schema, handler) and passed via `:tools`; a first-class `Value` type (predicate `tool?`). (2) *observability/telemetry tool* — a backend application (Jaeger, Langfuse, Phoenix). (3) *developer tool* — an MCP/CLI tool (`sema fmt`, the MCP defaults). The OTel `execute_tool` span and `gen_ai.tool.*` attributes refer to meaning (1). See *Deftool*. **Tool call** — an instance of the LLM deciding to invoke a defined tool with arguments, which the runtime dispatches to the handler. Each produces an `execute_tool` span and a correlated tool result fed back to the model. 
In Lisp-defined providers represented as `:tool-calls` maps with `:id`/`:name`/`:arguments`; observed via `agent/run`'s `:on-tool-call` callback.

**Tool loop** — the agent's automatic multi-turn cycle of sending messages, receiving tool calls, executing them, and feeding results back, repeated until a final answer or the turn limit; bounded by `:max-turns` and aborted after 5 consecutive tool errors. Errors (throwing tool, unknown tool, schema mismatch) are recovered in-loop by feeding the error back, not by aborting. Can be seeded with prior history via `:messages`.

**Tool result** — the output of executing a tool, correlated back to its tool call and fed to the model so it can continue. On error, the error text is fed back as the result rather than aborting the run. Tool-result correlation is mandatory for OpenAI-family providers; in OpenInference compat the result lands in the tool span's `output.value`.

**TTL** — time-to-live in seconds for cached responses (default 3600), passed as `{:ttl ...}` to `llm/with-cache`.

**Usage** — the token-accounting record for LLM calls — prompt/completion/total tokens plus cache tokens, model, and cost — returned by `llm/last-usage` (most recent) and `llm/session-usage` (cumulative, `SESSION_USAGE`). A cache hit reports zero usage. Exported as the `gen_ai.client.token.usage` metric histogram. State is thread-local.

**Validation** — checking an extracted result against its schema (required keys present, types match, optional per-field predicates via `:validate`) before accepting it; on failure a re-ask retry is triggered. `:validate` may be a boolean (on/off) or a per-field predicate `#(...)`.

## Observability (OpenTelemetry)

**Batch span processor** — the background mechanism that queues finished spans and exports them in batches so telemetry never blocks the program; tuned by `OTEL_BSP_MAX_QUEUE_SIZE`, `OTEL_BSP_MAX_EXPORT_BATCH_SIZE`, `OTEL_BSP_SCHEDULE_DELAY` ("BSP"). Network export batches; file export is synchronous.
**Code lens** — an LSP feature: every top-level expression shows a ▶ Run lens that, when clicked, evaluates all forms up to and including it in a sandboxed `sema eval` subprocess and reports value/stdout/stderr/timing via the custom `sema/evalResult` notification. Subprocess execution keeps the LSP backend thread free. **Compatibility mode** — a `SEMA_OTEL_COMPAT` setting (`openinference`, `langfuse`, `traceloop`, `langsmith`, `braintrust`, or `all`) that makes Sema write extra, tool-specific attribute names alongside the standard `gen_ai.*` ones so backends keying off their own names read the data. Purely additive and self-healing (unknown modes ignored). Distinct from the OpenAI request-compat "mode" (drop-temperature etc.). **Content capture** — opt-in recording of actual prompt/response text (and tool args/results) into spans, enabled by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true` (alias `SEMA_OTEL_CAPTURE_CONTENT`); off by default for privacy. When off, only token counts, model names, cost, timing, and span types are recorded; when on, long messages are truncated. Required before compat "content" attributes appear. **Conversation id** — a telemetry identifier (`gen_ai.conversation.id`) on every span, generated per run or supplied via `:conversation-id`, tying together the spans of one logical conversation. Distinct from the Sema `conversation` value type; session id defaults to it. **DAP** — Debug Adapter Protocol — Sema's `sema dap` server (`sema-dap`) enables step debugging over stdio JSON-RPC: breakpoints (line/conditional/exception), stepping, call-stack and variable/scope inspection, and evaluate-while-paused. Operates on the bytecode VM via debug hooks; an async Tokio frontend bridges to a backend OS thread running VM bytecode. 
**Event** — disambiguate: (1) *OTel span event* — a point-in-time annotation attached to the current span, added with `(otel/event name attrs-map)`; (2) the agent `:on-tool-call` event (`:start`/`:end`), a runtime callback, not an OTel span event. **Exporter** — the OTel component that ships spans/metrics to a destination — either over the network to an OTLP endpoint or to a local JSONL file (`SEMA_OTEL_FILE`). Configured via `OTEL_EXPORTER_OTLP_*`. The file exporter writes one JSON object per line synchronously; the network exporter batches in the background. **LSP** — Language Server Protocol — Sema's `sema lsp` server (`sema-lsp`, on tower-lsp) provides IDE features (diagnostics, completion, hover, go-to-definition, references, rename, semantic tokens, signature help, code lenses, formatting) over stdio JSON-RPC. A single-threaded backend thread owns all `Rc`-based state behind tokio mpsc/oneshot channels. Runs parse diagnostics (errors) and compile diagnostics (warnings via the bytecode pipeline). **Metric histogram** — a distribution-tracking metric instrument; Sema records two standard GenAI histograms over a network endpoint: `gen_ai.client.token.usage` (token counts, with a `gen_ai.token.type` dimension of `input`/`output`) and `gen_ai.client.operation.duration` (call latency in seconds). Cache hits report zero usage, so the token histogram undercounts under caching. **OpenInference** — the OpenTelemetry attribute convention used by Arize Phoenix/AX and FutureAGI; Sema's `openinference` compat mode adds OpenInference span types and attributes (model/provider, tokens, cost, message I/O, tool args + schemas). Aliases `phoenix`, `arize`. Has no separate tool-result field — the result lands in the tool span's `output.value`. **OpenTelemetry (OTel)** — an open, vendor-neutral standard for traces and metrics that Sema implements to record LLM/agent activity automatically, toggled by environment variables and exportable to any compatible backend. 
Off by default and zero-cost when off; telemetry is sent in the background so a slow backend never blocks the script. Sema never installs a global tracer provider when embedded as a library unless told to. **OTLP** — the OpenTelemetry network protocol for shipping traces/metrics; Sema speaks OTLP so it works with any backend that accepts it (over HTTP or gRPC). Configured via `OTEL_EXPORTER_OTLP_ENDPOINT`/`_PROTOCOL`/`_HEADERS`/`_TIMEOUT`; default protocol `http/protobuf`. Tools that only ingest via their own SDK or proxy can't receive an OTLP push. **Semantic tokens** — an LSP feature providing token-level classification for richer editor syntax highlighting, among Sema's LSP backend capabilities (completions, hover, folding ranges, inlay hints, etc.). For the GenAI attribute conventions, see *Semantic conventions (GenAI)* under LLM & GenAI. **Session** — a grouping of multi-turn runs sharing a `session.id` (emitted alongside `gen_ai.conversation.id`) so the turns of one conversation appear together in tools that group by session (e.g. Langfuse), supplied via `:session-id` (defaults to the conversation id). Disambiguate from Sema's runtime "session" (the process lifetime for `session-usage` and the per-session response cache). **Span** — disambiguate: (1) *trace span* (this section) — an individual timed operation within a trace (a single LLM call or tool execution); spans nest to form the trace tree, each carrying a kind (CLIENT/INTERNAL), a name, and attributes. Sema names: `chat`, `embeddings`, `execute_tool`, `invoke_agent`, `notebook.cell`, `llm.retry_attempt`; users add spans with `otel/span`. (2) *source span* — a byte/line-column range in source the reader records (see *Span (source)* under Reader, Compiler & VM Internals). **Span kind** — the OTel category of a span: `CLIENT` (an outbound call like `chat`/`embeddings`) or `INTERNAL` (in-process work like `execute_tool`/`invoke_agent`/retries). Separate from the compat *span type*. 
**Span type** — a tool-specific label for a span added by a compatibility mode — e.g. Sema's `chat` span is typed `LLM` (OpenInference), `generation` (Langfuse), `task` (Traceloop), or `llm` (LangSmith). Distinct from OTel span *kind*; only written when a `SEMA_OTEL_COMPAT` mode is set. **Trace** — one complete run, made of nested spans; an agent run appears as a tree (`invoke_agent → chat → execute_tool`). Grouped by `gen_ai.conversation.id`; multi-turn runs can be threaded into sessions. **Tracer provider** — the OpenTelemetry SDK object that owns and emits spans. When embedded as a Rust library, Sema never installs a global tracer provider on its own; the host chooses behavior via `InterpreterBuilder::with_telemetry(TelemetryMode::...)` (`Off`, `UseHostGlobal`, `OwnProvider(p)`, `FromEnv`). This is the OTel meaning of "provider", not an LLM provider. ## Data Structures & Standard Library **ANSI escape sequence** — control codes (e.g. `ESC[1;31m`) terminals interpret to style text (color, bold). `term/*` functions wrap strings in them and reset afterward; `term/strip` removes them; `term/rgb` uses 24-bit true color. In WASM all `term/*` return unstyled text. **Baud rate** — the serial-port signaling speed (bits per second, e.g. 115200, 9600) passed to `serial/open` along with the device path and an optional read timeout. The serial module wraps the cross-platform `serialport` crate; unavailable in WASM, gated by the `serial` capability. **Byte-buffer** — an in-memory read/write stream (`stream/byte-buffer`) where writes append and reads consume from the current position; contents extracted with `stream/to-bytes` or `stream/to-string`. For building strings/byte sequences incrementally without touching disk. **Bytevector** — a packed array of unsigned 8-bit integers (0–255) for binary data and string encoding, with literal syntax `#u8(...)`. Supports indexed `ref`/`set!` (copy-on-write), `copy`, `append`, and UTF-8 conversion (`utf8/to-string`, `string/to-utf8`). 
Used for binary file I/O, base64, stream reads/writes, SQLite BLOBs, embeddings (little-endian `f64`), and multi-modal image inputs. A serializable constant-pool type (tag 0x0C). **Capture group** — a parenthesized regex sub-pattern whose matched text is captured; `regex/match` returns them in `:groups`, and `$1`/`$2` (or `$name` for `(?P...)`) reference them in replacements. `regex/match` returns a map with `:match`, `:groups`, `:start`, `:end` (byte offsets); non-capturing groups use `(?:...)`. **Codepoint** — a single Unicode scalar value (an integer). `string/codepoints` returns the list of codepoints in a string and `string/from-codepoints` rebuilds one, revealing that one displayed glyph (e.g. an emoji family) can be several codepoints joined by a Zero Width Joiner (U+200D). Contrast `string/length` (characters) with `string/byte-length` (UTF-8 bytes). **Context** — disambiguate: (1) *ambient context* (`context/*`) — a thread-flowing key-value store for tracing/metadata that auto-appends to log output, with scoped overrides (`context/with` pushes a temporary frame), ordered stacks, and hidden values (invisible to `get`/`all`/logs, for secrets); inspired by Laravel's Context. (2) the VM-internal *EvalContext* (see Reader, Compiler & VM Internals), not user-facing. **Cosine similarity** — a similarity measure between two vectors based on the cosine of the angle between them, returning a value in \[-1.0, 1.0]; used to compare embeddings via `llm/similarity`, `vector/cosine-similarity`, and `f64-array/dot` (dot product over magnitudes), and to rank vector-store results. Accepts both bytevectors (fast path) and lists of floats. **Document** — a structured value (`document/create`) pairing `:text` with a `:metadata` map, designed for chunking and vector stores; `document/chunk` splits it while preserving and extending metadata (`:chunk-index`/`:total-chunks`). An LLM/RAG building block; distinct from PDF files processed by `pdf/*`. 
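As a rough illustration of the *Cosine similarity* entry above, the sketch below compares small hand-written vectors passed as lists of floats. It assumes `vector/cosine-similarity` takes the two vectors as its two arguments. The notes in the comments follow from the definition of cosine similarity, not from a captured run.

```sema
;; Assumption: (vector/cosine-similarity a b) takes two equal-length vectors.
(define a '(1.0 0.0))
(define b '(0.0 1.0))
(define c '(2.0 0.0))

;; Orthogonal vectors: the angle is 90 degrees, so the cosine is 0.
(vector/cosine-similarity a b)

;; Same direction, different length: the angle is 0, so the cosine is 1.
;; Magnitude does not matter, only direction.
(vector/cosine-similarity a c)
```

Real embeddings from `llm/embed` are bytevectors rather than lists, which per the entry above take the fast path.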
**Embedding** — disambiguate: (1) *GenAI embedding* — a dense numeric vector representation of text (`llm/embed`), stored as a bytevector of little-endian `f64` values (or an `f64-array`) for memory efficiency and fast similarity math; accessed with `embedding/length`, `embedding/ref`, `embedding/->list`, `embedding/list->embedding`; auto-configured from `JINA_API_KEY`/`VOYAGE_API_KEY`/`COHERE_API_KEY`/OpenAI; traced as the `embeddings` span. (2) *hosting embedding* — using Sema as a scripting engine inside a Rust or JavaScript host (the `embedding.md`/`embedding-js.md` pages). Same English word, two domains. **EOF (end of file)** — the end-of-input condition. Stdin reads (`io/read-line` etc.) return nil at EOF, and `io/eof?` reports it; stream reads return fewer bytes or nil. Distinguishes an empty line (`""`) from exhausted input (nil). (1.14.0 changed `io/read-line` to return nil, not `""`, on EOF.) **Euclidean distance** — the straight-line distance between two vectors, computed by `vector/distance` on embedding bytevectors. Contrasted with cosine similarity (angle-based). **Glob pattern** — a shell-style wildcard pattern (e.g. `src/**/*.rs`, `*.txt`) passed to `file/glob` to find matching paths; `**` matches across directories, `*` within a segment. Returns a list of matching paths. Distinct from the web-server route `*` wildcard and from regex. **Handle** — a logical name or integer token referencing an opened resource in later calls — e.g. a KV store name, a SQLite database name (`db/open`), an integer serial-port handle (`serial/open`), or a spinner ID. Forms vary across modules (strings for KV/SQLite, integers for serial/spinner, opaque values for streams). Closing frees the handle; reusing a closed one errors. **Hashmap** — an unordered, hash-backed map type (hashbrown/SwissTable) for O(1) performance-critical lookups, created with `hashmap/new`. 
Generic operations (`get`, `assoc`, `merge`, `count`) work on it and preserve the type; `hashmap/to-map`/`map/sort-keys` convert to a sorted map. Contrast with the default sorted `Map` (BTreeMap). **KV store** — a persistent, JSON-backed key-value store (`kv/*`) for structured data across sessions, opened by a logical store name plus a file path; every `kv/set`/`kv/delete` immediately rewrites the whole backing JSON file. The file isn't created until the first write. Distinct from the in-memory ambient context store and from SQLite. **Map** — disambiguate: (1) *map data type* — a curly-braced key-value collection `{:k v}` with deterministic sorted ordering, backed by a `BTreeMap` (the default `{}` literal); chosen as default because deterministic ordering matters for equality, printing, and tests, and maps can even be keys in other maps; `map/*` functions operate on it. (2) *the `map` function* — applies a function across each element of one or more lists, returning a list. (3) the `hashmap` sibling (see *Hashmap*). **Middleware** — in the Sema web server, plain function composition: a function that takes a handler and returns a new handler, used to wrap cross-cutting behavior (logging, CORS, auth). No framework; composed by nesting or with `->`. Outermost middleware runs first. **Parameterized query** — a SQL statement with `?` placeholders whose values are bound separately (`db/exec`, `db/query`, `db/query-one`), preventing SQL injection. `db/exec-batch` runs static SQL verbatim with no binding (injection-prone for user input). Result column names become keyword keys. **Record** — a user-defined, named product type created with `define-record-type`, generating a positional constructor, a type predicate, and one accessor per field. Records are immutable, closed (fixed schema), have a distinct type tag, and are `equal?` only to same-type records with pairwise-equal fields. Not JSON/TOML-encodable (convert to a map first); no generic `get`/keyword access. 
`(type rec)` returns the type name as a keyword. A runtime-only `Value` type. Docs guideline: "maps at the boundary, records internally." Note: "record" is also used loosely in some examples for a data row/map. **Request map** — the map a web-server handler receives, with `:method` (keyword), `:path`, `:headers` (string keys), `:query` and `:params` (keyword keys), `:body` (raw string), and `:json` (parsed body when Content-Type is application/json). `:params` holds route path parameters; `:json` is auto-populated only for JSON content type. Counterpart to the *response map*. **Response map** — the `{:status :headers :body}` map returned by HTTP client calls and produced/consumed by the web server. `:status` is an int code, `:headers` a keyword-keyed map, `:body` a raw string. The same shape appears on both the client and server sides. **Router / route** — `http/router` builds a handler from route definitions, each a vector `[method pattern handler]`. Methods include `:get`/`:post`/…/`:any` plus special `:ws` (WebSocket upgrade) and `:static` (static directory). Routes match top-to-bottom, first match wins; `:param` captures path segments into `:params`, `*` is a wildcard catch-all; `:static` falls through on a missing file (SPA index.html catch-alls). **SSE (Server-Sent Events)** — a one-way streaming protocol the web server exposes via `http/stream`, which gives the handler a `send` callback; each `send` emits one SSE `data:` event and the stream stays open until the handler returns. Used for token-by-token streaming of LLM completions to the browser. Contrast with WebSocket (bidirectional). **Standard streams (`*stdin*` / `*stdout*` / `*stderr*`)** — three global stream values for console I/O: `*stdin*` (readable), `*stdout*` (writable), `*stderr*` (writable). Earmuffed names following Lisp convention; used with `stream/write-string`, `stream/flush`. Spinners render to `*stderr*` to avoid corrupting `*stdout*`. 
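To make the *Request map* and *Response map* shapes concrete, here is a minimal sketch of a handler written as a plain function and exercised with a hand-built request map, so no server is needed. The handler name, the route path, and the request values are hypothetical, and the sketch assumes map literals evaluate their value expressions.

```sema
;; A handler is a plain function from a request map to a response map.
(define (hello req)
  (let ((name (:name (:params req))))   ; :params holds route path parameters
    {:status 200
     :headers {}
     :body f"Hello, ${name}!"}))

;; Hypothetical request map, shaped like the one a route such as
;; [:get "/hello/:name" hello] would pass to the handler.
(define fake-request
  {:method :get
   :path "/hello/Ada"
   :headers {}
   :query {}
   :params {:name "Ada"}
   :body ""})

(:body (hello fake-request))
```

Because handlers are ordinary functions over maps, they can be tested this way before being wired into `http/router`.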
**Stream** — disambiguate: (1) *byte I/O stream* — a first-class, byte-oriented I/O handle providing a unified `stream/read`/`stream/write` interface across files, in-memory buffers, strings, and standard I/O (`stream/open-input`/`open-output`/`byte-buffer`/`from-string`). (2) *SSE/LLM stream* — Server-Sent Events from the web server (`http/stream`) or an LLM streaming callback delivering tokens. Streams are opaque values that can't be round-tripped (require re-eval on notebook reload). See also *Streaming* under LLM & GenAI. **Strftime directive** — a `%`-prefixed token (e.g. `%Y`, `%m`, `%d`, `%H`, `%F`, `%T`) used by `time/format` and `time/parse` to format/parse timestamps, following chrono's strftime syntax. All `time/` functions operate in UTC with no timezone conversion. **Typed array** — contiguous, unboxed numeric storage for performance-critical work: `f64-array` (64-bit float) and `i64-array` (64-bit signed int), literals `#f64(...)`/`#i64(...)`. Stores raw values in a flat `Vec` instead of NaN-boxing each element, giving cache locality and no per-element boxing; mutation is copy-on-write via `Rc::make_mut`. Provides `sum`/`dot`/`map`/`fold` in tight Rust loops; `f64-array/dot` powers embedding cosine similarity. **Unicode normalization form** — a canonical/compatibility form (`:nfc`, `:nfd`, `:nfkc`, `:nfkd`) that `string/normalize` converts a string into, controlling composed vs decomposed characters and compatibility ligatures. Related: `string/foldcase` (case folding) and the Zero Width Joiner used to compose emoji. **Unix timestamp** — Sema's representation of time: a UTC count of seconds since 1970-01-01 00:00:00 UTC, as a float with millisecond fractional precision. `time/now` returns this; `time-ms` returns integer milliseconds. Negative values are pre-1970. Distinguish seconds-based `time/*` from `sleep` (milliseconds). 
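The seconds-versus-milliseconds distinction in the *Unix timestamp* entry is easy to trip over, so here is a small timing sketch. It assumes `time/now` and `time/now-ms` are called with no arguments; the elapsed value depends on the machine and is only approximately a quarter of a second.

```sema
;; Assumption: time/now and time/now-ms take no arguments.
(define start (time/now))        ; float seconds since 1970-01-01 UTC
(sleep 250)                      ; sleep takes milliseconds, not seconds
(define elapsed (- (time/now) start))

(println elapsed)                ; elapsed seconds, roughly 0.25
(println (time/now-ms))          ; the same clock as integer milliseconds
```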
**Vector** — disambiguate: (1) *vector data type* — an indexed, immutable collection with square-bracket literal syntax `[1 2 3]` backed by contiguous storage, giving O(1) `nth`/`first`/`length`; distinct from a list (cons-based). Many sequence functions accept a vector but return a list; also used as destructuring/match patterns. (2) *embedding/mathematical vector* — typically an `f64-array` used for dot-product/cosine-similarity work (see *Embedding*, *Cosine similarity*), NOT the `[...]` collection type. When the docs say "vector store" they mean embedding vectors. See also *Bytevector*. **Vector store** — an in-memory (optionally disk-persisted) named store of documents with embeddings and metadata, supporting semantic search by cosine similarity. Managed via `vector-store/create`/`open`/`add`/`search`/`delete`/`count`/`save`. The backbone of RAG-style workflows; persisted as JSON with base64-encoded embeddings; search returns maps with `:id`, `:score`, `:metadata`. See *Semantic search*, *RAG*. **Semantic search** — finding documents by meaning rather than keywords: embedding a query and ranking stored embeddings by cosine similarity (top-k) in the vector store (`vector-store/search` takes a query embedding and `k`). Core of the RAG workflow. **WAL mode** — SQLite's Write-Ahead Logging journal mode, enabled by default when Sema opens a database (`db/open`), along with foreign-key enforcement. Improves concurrency of reads with writes. Backed by the `rusqlite` crate. **WebSocket** — a bidirectional connection handled via `http/websocket` / the `:ws` route; the handler receives a connection map with `:send`, `:recv` (blocks, nil on close), and `:close` functions. Used for chat/broadcast patterns. Contrast with SSE (one-way). ## Concurrency **Async / await** — `async` is a special form that spawns its body as a concurrent task on the VM scheduler, returning an async promise; `await` waits for that promise to resolve (or raises if rejected). 
Inside a task, `await` yields to the scheduler; at top level it runs the scheduler until resolution. Async features are VM-only (default backend since v1.13). **Async task** — a unit of cooperative concurrency: a zero-argument thunk spawned with `async/spawn` (or the `async` form) that runs on the VM's cooperative scheduler and yields at yield points, returning a promise that resolves on completion. Cooperation, not parallelism — a CPU-bound task without yield points runs to completion before others; spawn order is preserved, channel wake order is FIFO. Also called a green thread/fiber/coroutine. **Async/await implementation (VM)** — Sema's concurrency model implemented entirely in the VM: each `async`/`spawn` creates a new VM instance sharing the parent's global `Env` and function table, and a cooperative round-robin scheduler (`scheduler.rs`) runs them single-threaded until they yield. Deterministic (FIFO), so the grammar fuzzer can model order-independent async patterns. Not parallel/multithreaded. **Mutable state** — Sema has **no** Clojure-style `atom` (and no `swap!`/`reset!`). Hold mutable state in a `define`d binding and update it with `set!`. Because the runtime is single-threaded (`Rc`, not `Arc`), no atomics or locks are needed for it. **Channel** — a bounded FIFO buffer for communication and synchronization between async tasks, created with `(channel/new capacity)` (default 1, minimum 1). `channel/send` blocks (yields) when full, `channel/recv` blocks when empty and returns nil when the channel is closed and empty, `channel/close` closes it; `channel/try-recv` is non-blocking. Blocking only works inside an async task; from top level send/recv raise instead of waiting. (The web server's "channels" bridging HTTP I/O to the evaluator are a related concept at the Rust/Tokio boundary, not the Sema `channel/` API.) 
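The *Channel* and *Async / await* entries above can be sketched together as a one-value handoff between two tasks. The argument order `(channel/send ch value)` and the use of a single body expression in `async` are assumptions, not confirmed signatures.

```sema
;; Assumptions: (channel/send ch value), (channel/recv ch),
;; and async with a single body expression.
(define ch (channel/new 1))                      ; bounded buffer, capacity 1

;; Blocking channel operations are only valid inside async tasks.
(define producer (async (channel/send ch 42)))   ; fits in the buffer, so it need not wait
(define consumer (async (channel/recv ch)))      ; yields until a value is available

(await producer)
(await consumer)   ; resolves to the value the consumer received
```

At top level, `await` drives the scheduler until the promise resolves, which is why the send and receive are wrapped in tasks instead of being called directly.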
**Cooperative scheduler** — the VM scheduler that interleaves async tasks at yield points (channel ops, `await`, `async/sleep`) rather than preempting them, preserving spawn order among ready tasks and waking channel receivers FIFO. Single-threaded — no true parallelism. Uses a virtual clock for deterministic sleep ordering. **Promise** — disambiguate: (1) *async promise* — the result of a concurrent task (`async`/`async/spawn`), with states pending/resolved/rejected (and cancelled), operated on by `await`/`all`/`race`/`timeout`/`cancel`; (2) *lazy promise* — created by `delay` and evaluated by `force` (R7RS-style), tested by `promise?`/`promise-forced?`. The data-types table lists both as separate types; watch the `promise-forced?`/`async/forced?` overlap. **Virtual clock** — the scheduler's logical time source used by `async/sleep`: it only advances when every task is blocked, jumping to the nearest deadline, so shorter sleeps deterministically wake before longer ones. On native it real-sleeps via `thread::sleep`; in the browser it blocks a Web Worker on `Atomics.wait`, falling back to instant advancement without cross-origin isolation. Durations capped at 86,400,000 ms. **Yield point** — a place where an async task voluntarily suspends so the scheduler can run others — channel send/recv, `await`, and `async/sleep` (cancellation also takes effect here). A "yielding native" (e.g. `channel/recv`) passed directly to a higher-order function can't suspend cleanly — wrap it in a lambda; lambdas that yield resume correctly inside HOF callbacks. **Yield signal** — a thread-local flag (`sema-core/src/async_signal.rs`) the VM sets to suspend a task at an await/channel/sleep point. On yield the VM leaves a nil placeholder on the stack and advances the PC; on resume the scheduler swaps in the wake value so the call appears to have simply returned. Replaced an earlier replay-based design that corrupted side effects. 
Yield-aware native fns must work on both the in-VM and fresh-VM closure paths. ## Tooling & Protocols **ANSI / terminal control** — see *ANSI escape sequence* under Data Structures & Standard Library. **Bundled executable** — a self-contained binary produced by `sema build` that injects a VFS archive into the Sema runtime binary (ELF raw append, Mach-O/PE section injection via `libsui`), embedding compiled bytecode, all transitive imports, and bundled assets; it runs with no Sema install required. Injection strategy is detected from the runtime binary's magic bytes (not the build host), so cross-compilation works from any platform. The 16-byte Linux trailer (`SEMAEXEC` magic + archive size) is frozen. Contrasts with `sema compile`, whose `.semac` resolves imports from disk at runtime. **Entrypoint** — the file loaded when a package is imported — `package.sema` by default, or a custom file named in `sema.toml`'s `entrypoint` field. Resolution order: direct sub-module file → custom entrypoint → default `package.sema`. The package's short name becomes the namespace prefix. **Fat LTO** — Fat Link-Time Optimization (`lto = "fat"`): lets LLVM inline across crate boundaries so the `sema-vm` dispatch loop can inline `sema-core` value accessors it calls millions of times. ~3–9% gain at ~2x build time. Used with PGO; targets that can't PGO fall back to fat LTO. Contrasted with thin LTO. **Grammar-based fuzzer** — a fuzzer written in Sema itself (`fuzz/grammar-fuzz.sema`) that generates well-typed, closed, valid Sema programs and checks them against correctness oracles, plus crash detection. Exploits homoiconicity; every finding reproduces from one integer seed. Found two shipped bugs (a try-in-let VM crash and silent integer overflow). Distinct from the byte-level cargo-fuzz fuzzers that hammer the parser. 
**Interpreter** — the top-level embedding object that holds the global environment and evaluates code, built via `Interpreter::builder()` (Rust) or `new SemaInterpreter()` (JS); each instance has fully isolated state. Builder options include `with_stdlib`, `with_llm`, `with_sandbox`, `with_allowed_paths`, `with_telemetry`. Multiple interpreters can coexist on one thread without sharing module cache/call stack. **JSON envelope** — the structured JSON result emitted by `sema eval --json` (and notebook/WASM eval results): fields `ok`, `value`, `stdout`, `stderr`, `error` (message/hint/line/col), and `elapsedMs`. Designed for machine/editor/LSP consumption; the WASM `EvalResult` is a related `{value, output, error}` shape. **Metamorphic law** — a fuzzer-generated theorem whose expected value is the literal `#t`, cross-checking an operation against an independent computation (e.g. `(= (reverse L) (foldl cons-flip L))` or distributivity). Because `#t` is true by construction, a broken op makes the two sides disagree. Caught the silent integer-corruption bug by forcing large intermediate products through a 2-arg add. Sidesteps the value oracle's self-masking blind spot. **Oracle** — the judge in a fuzzer that decides whether an input revealed a bug. Sema's grammar fuzzer uses three: a printer⇄reader round-trip oracle, a differential value oracle (expected value computed bottom-up vs eval result), and metamorphic laws. The value oracle's blind spot ("self-masking") is that it computes the expected value with the very op under test; metamorphic laws avoid this. **Raw mode / cooked mode** — terminal input modes: cooked mode (default) buffers a whole line until Enter; raw mode (`io/tty-raw!`) delivers each keystroke immediately, including Ctrl-C and arrows. `io/tty-restore!` returns to cooked mode using a restore-token. Unix-only; used to build TUIs with `io/read-key`, `sys/term-size`, and signal handlers. `io/tty-raw!` returns nil if stdin isn't a TTY. 
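As a small illustration of the *Metamorphic law* entry above, the sketch below writes distributivity as a predicate whose expected value is always `#t`. The function name `distributes?` is hypothetical, and this is a hand-written instance of the idea, not output from the grammar fuzzer.

```sema
;; Distributivity: a * (b + c) must equal (a * b) + (a * c) for integers.
;; The two sides are computed by different operation sequences, so a broken
;; multiply or add makes them disagree.
(define (distributes? a b c)
  (= (* a (+ b c))
     (+ (* a b) (* a c))))

(distributes? 3 4 5)           ; both sides are 27
(distributes? 1000 2000 3000)  ; both sides are 5000000
```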
**Registry** — a package registry server (default `pkg.sema-lang.com`, self-hostable) serving published Sema packages over a REST API/web UI; the alternative source is direct git repos. Registry commands (search/info/publish/yank/login) need a running instance; git packages work without one. `--registry`/`SEMA_REGISTRY_URL` override the default. **REPL** — the interactive Read-Eval-Print Loop started by running `sema`; reads an expression, evaluates it, prints the result, and loops. Supports history, tab completion, multiline input, and comma-commands (`,quit`, `,doc`, `,type`, `,time`, `,env`, `,builtins`). History saved to `~/.sema/history.txt`; warns on redefining builtins. **Sandbox / capability** — a capability sandbox stored on the `EvalContext` that restricts what a program can do by named permission gates (shell, fs-read, fs-write, network, env-read/write, process, llm, serial), surfaced via the `--sandbox` flag (modes: strict, all, or comma-separated capabilities). Permission failures produce `SemaError::PermissionDenied`/`PathDenied` (a denied call stays callable but returns the error). `--allowed-paths` confines file ops to directories. The WASM playground is inherently sandboxed. **Sandbox / SSRF guard** — under `--sandbox`, Sema rejects provider `:base-url`/`:host` values pointing at loopback or private addresses (localhost, 127.0.0.1, 10.x, 169.254.169.254) to prevent Server-Side Request Forgery when running untrusted code. Local endpoints work normally unsandboxed (REPL/CLI/notebook). See also *Sandbox / capability*. **Semver / lock file** — packages use semantic versioning (semver) for published versions; `sema.lock` records exact resolved versions (registry version + SHA256, or git ref + commit SHA) for reproducible builds, and `--locked` enforces it in CI. `sema.toml` is the manifest (`[package]`, `[deps]`); `sema.lock` is auto-generated and committed. 
**Shebang** — a `#!/usr/bin/env sema` line on the first line of a `.sema` file that makes it directly executable; Sema treats the shebang line as a comment. Only allowed on the first line. **Signal handler** — a callback registered for a Unix signal (`:winch`/SIGWINCH, `:int`/SIGINT, `:term`/SIGTERM) via `sys/on-signal`. Handlers are async-signal-safe: the OS handler only flips an atomic flag and the Sema callback runs later when `sys/check-signals` is called. Deferred dispatch keeps the single-threaded `Rc` runtime intact. No-ops on Windows. **Spinner** — an animated terminal progress indicator (`term/spinner-start`/`-update`/`-stop`) using braille frames at 80 ms intervals, rendered to stderr and identified by an integer spinner ID so several can run concurrently. `spinner-stop` can show a final `{:symbol :text}` status line. **VFS backend** — a pluggable persistence layer for the WASM in-memory VFS implementing the `VFSBackend` interface (`init`/`hydrate`/`flush`/`reset`). Built-ins: `MemoryBackend`, `LocalStorageBackend`, `SessionStorageBackend`, `IndexedDBBackend`. `hydrate()` loads files into the VFS on startup; `flush()` persists them out; `namespace` isolates storage between interpreters. See *VFS* under Reader, Compiler & VM Internals. --- --- url: 'https://sema-lang.com/docs.md' --- # Getting Started Sema is a Scheme-like Lisp where prompts are s-expressions, conversations are persistent data structures, and LLM calls are just another form of evaluation. It combines a Scheme core with Clojure-style keywords (`:foo`), map literals (`{:key val}`), and vector literals (`[1 2 3]`). ## Why Sema? 
* **LLMs as language primitives** — prompts, messages, conversations, tools, and agents are first-class data types, not string templates bolted on
* **Multi-provider** — Anthropic, OpenAI, Gemini, Groq, xAI, Mistral, Ollama, and more, all auto-configured from environment variables
* **Practical Lisp** — closures, tail-call optimization, macros, modules, error handling, HTTP, file I/O, regex, JSON — everything you need to build real programs
* **Embeddable** — clean Rust crate structure, builder API, sync interface ([learn more](./embedding.md))

## Installation

Install pre-built binaries (no Rust required):

```bash
# macOS / Linux
curl -fsSL https://sema-lang.com/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://sema-lang.com/install.ps1 | iex"

# Homebrew (macOS / Linux)
brew install helgesverre/tap/sema-lang
```

Or install from [crates.io](https://crates.io/crates/sema-lang):

```bash
cargo install sema-lang
```

Or build from source:

```bash
git clone https://github.com/HelgeSverre/sema
cd sema
cargo build --release
# Binary is at target/release/sema
```

## Quick Start

```bash
sema                            # Start the REPL
sema script.sema                # Run a file
sema -e '(+ 1 2)'               # Evaluate an expression
sema -p '(map sqr (range 5))'   # Evaluate and always print
```

```sema
;; In the REPL:
sema> (define (greet name) f"Hello, ${name}!")
sema> (greet "world")
"Hello, world!"
sema> (map #(* % %) (range 1 6))
(1 4 9 16 25)
sema> (define person {:name "Ada" :age 36})
sema> (:name person)
"Ada"
```

## Examples

### Working with Data

```sema
;; Keywords as accessor functions, short lambdas with #(...)
(define people [{:name "Ada" :age 36}
                {:name "Bob" :age 28}
                {:name "Cat" :age 42}])

(map #(:name %) people) ; => ("Ada" "Bob" "Cat")

(->> people
     (filter #(> (:age %) 30))
     (map #(:name %))) ; => ("Ada" "Cat")

;; Destructuring and f-strings
(let (({:keys [name age]} (first people)))
  (println f"${name} is ${age} years old"))

;; Pattern matching
(define (describe person)
  (match person
    ({:keys [name age]} when (> age 40) f"${name} is experienced")
    ({:keys [name]} f"${name} is on the team")))
```

### LLM Completion

```sema
;; Simple completion (requires an API key env var)
(llm/complete "Explain recursion in one sentence" {:max-tokens 50})

;; Structured chat with message history
(llm/chat
  (list (message :system "You are a helpful assistant.")
        (message :user "What is Lisp? One sentence."))
  {:max-tokens 100})
```

### Persistent Conversations

```sema
;; Each conversation/say makes a real LLM call, threading prior turns as history.
(define conv (conversation/new {:model "claude-haiku-4-5-20251001"}))
(define conv (conversation/say conv "Remember: the secret number is 7"))
(define conv (conversation/say conv "What is the secret number?"))
(conversation/last-reply conv)
; => the model's reply, e.g. "The secret number is 7." — it recalls the earlier turn
```

## What's Next?
* [CLI Reference](./cli.md) — all flags, subcommands, and environment variables
* [Shell Completions](./shell-completions.md) — tab completions for bash, zsh, fish, and more
* [Editor Support](./editors.md) — plugins for VS Code, Vim/Neovim, Emacs, Helix, and Zed
* [Embedding in Rust](./embedding.md) — use Sema as a scripting engine in your app
* [Data Types](./language/data-types.md) — all built-in types
* [Special Forms](./language/special-forms.md) — control flow, bindings, and iteration
* [Macros & Modules](./language/macros-modules.md) — metaprogramming and code organization
* [LLM Primitives](./llm/) — completions, chat, tools, agents, embeddings, and more

---

---
url: 'https://sema-lang.com/docs/quickstart.md'
---

# Quickstart

Welcome to Sema! This guide will get you up and running with the Sema CLI and REPL in under 5 minutes.

## Prerequisites

Make sure you have installed the `sema` binary. If you haven't yet, follow the [Installation Instructions](./#installation) on the homepage.

To verify your installation, check the version:

```bash
sema --version
```

***

## 1. The Interactive REPL

The easiest way to explore Sema is using the interactive Read-Eval-Print Loop (REPL). Start it by running:

```bash
sema
```

You should see a prompt like `sema>`. Type an expression and press `Enter` to evaluate it:

```sema
sema> (+ 1 2)
3
sema> (define x 10)
x
sema> (* x 5)
50
sema> (exit)
```

> \[!TIP]
> Use `Ctrl+D` or type `(exit)` to quit the REPL. The REPL also supports command history and tab completion.

***

## 2. Running a Script

To write and execute scripts, save your Sema code to a file with the `.sema` extension.

Create a file named `hello.sema` with the following content:

```sema
(define name "World")
(println f"Hello, ${name}!")
```

Run the file using the `sema` command:

```bash
sema hello.sema
```

This will output:

```text
Hello, World!
```

***

## 3. Inline Evaluation

You can evaluate expressions directly from your shell without opening the REPL or creating a file.

### Evaluate an expression

Use the `-e` flag to evaluate an expression silently (it will only print if your code calls printing functions like `println`):

```bash
sema -e '(println "Sema is running!")'
```

### Evaluate and print the result

Use the `-p` flag to evaluate an expression and automatically print the result:

```bash
sema -p '(map #(* % %) (range 1 6))'
```

Output:

```text
(1 4 9 16 25)
```

***

## 4. Quick LLM Call

Sema is designed around LLM integration. To try it, make sure you have an API key set in your environment (e.g., `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`).

Start the REPL and call the completion primitive:

```sema
sema> (llm/complete "Translate 'Lisp is beautiful' to French.")
"« Lisp est magnifique »"
```

## Next Steps

Now that you know how to run Sema, let's learn how to write it:

* [Basic Syntax](./tutorial/basics.md) — S-expressions, variables, and collections
* [Functions & Scope](./tutorial/functions.md) — How to define functions and run loops/recursion
* [Concurrency & Async](./tutorial/concurrency.md) — Cooperative async tasks and channels in the VM

---

---
url: 'https://sema-lang.com/docs/tutorial/basics.md'
---

# Basic Syntax

Sema is a Lisp, meaning it has a very simple and uniform syntax based on **S-expressions (symbolic expressions)**. If you have used Scheme or Clojure, you will feel right at home. If you are new to Lisp, this guide will introduce you to the core rules of the syntax.

***

## 1. S-Expressions and Prefix Notation

All code and data in Sema are represented as S-expressions. An expression is either a single value (like a number or string) or a list of expressions enclosed in parentheses.

In Sema, **operators and functions always come first**. This is called **prefix notation**:

```sema
(+ 1 2)          ; => 3
(* 10 (+ 2 3))   ; => 50 (infix: 10 * (2 + 3))
```

Here is how to read `(+ 1 2)`:

1. The opening parenthesis `(` starts a list.
2. The first element `+` is the function or operator to call.
3. The remaining elements `1` and `2` are the arguments passed to it.
4. The closing parenthesis `)` ends the call.

***

## 2. Comments

Comments in Sema start with a semicolon `;` and run to the end of the line:

```sema
; This is a single-line comment
(+ 1 2) ; This is an inline comment
```

By convention:

* A single semicolon `;` is used for inline comments.
* A double semicolon `;;` is used for comments on their own line.

***

## 3. Basic Types

Sema supports standard scalar types:

* **Numbers**: Integers (`42`) and floats (`3.14`).
* **Strings**: Double-quoted text (`"hello world"`).
* **F-Strings**: Interpolated strings prefixed with `f` (`f"Hello ${name}"`).
* **Booleans**: `#t` (true) and `#f` (false).
* **Nil**: The null/empty value (`nil`).
* **Keywords**: Colon-prefixed identifiers (`:name`, `:status`), commonly used as map keys.

***

## 4. Variable Bindings

There are two primary ways to declare variables: globally and locally.

### Global Bindings (`define`)

Use `define` to bind a name to a value globally:

```sema
(define pi 3.14159)
(define radius 10)
(* pi (* radius radius)) ; => 314.159
```

### Local Bindings (`let`)

Use `let` to bind variables within a specific scope. The syntax uses a list of `(variable value)` pairs:

```sema
(let ((width 10)
      (height 5))
  (* width height)) ; => 50
;; 'width' and 'height' are not visible here
```

***

## 5. Core Collections

Sema supports three main collection types: Lists, Vectors, and Maps.

### Lists

Lists are ordered, linked collections. Since parentheses denote code execution, you create a literal list using the `list` function or by quoting it with `'`:

```sema
(list 1 2 3) ; => (1 2 3)
'(1 2 3)     ; => (1 2 3)
```

### Vectors

Vectors are array-like collections with fast index-based access.
They are defined using square brackets `[]`: ```sema (define my-vector [10 20 30]) (nth my-vector 1) ; => 20 (0-indexed) ``` ### Maps Maps are key-value structures. They are defined using curly braces `{}` and typically use keywords as keys: ```sema (define user {:name "Ada" :age 36}) ``` #### Keyword Accessors In Sema, keywords act as accessor functions. You can look up a value in a map by calling the keyword with the map as an argument: ```sema (:name user) ; => "Ada" (:age user) ; => 36 ``` *** ## Next Steps Now that you know how to represent data and variables, let's learn how to organize code into functions: * [Functions & Scope](./functions.md) * [Concurrency & Async](./concurrency.md) --- --- url: 'https://sema-lang.com/docs/tutorial/functions.md' --- # Functions and Scope Functions are the primary building blocks of Sema programs. Functions in Sema are first-class, meaning they can be bound to variables, passed as arguments to other functions, and returned from functions. *** ## 1. Defining Named Functions There are two equivalent ways to define a named function: ### The `defn` Macro By convention, the `defn` macro is the most common way to declare a function: ```sema (defn square (x) (* x x)) (square 5) ; => 25 ``` ### The `define` Shorthand You can also define functions using a shorthand syntax with `define`: ```sema (define (square x) (* x x)) ``` *** ## 2. Anonymous Functions (Lambdas) Anonymous functions are functions without a name, commonly used when passing functions to higher-order helpers like `map` or `filter`. ### Using `fn` You can define an anonymous function using the `fn` form: ```sema (map (fn (x) (* x x)) '(1 2 3)) ; => (1 4 9) ``` ### Shorthand Lambdas `#(...)` Sema provides a compact Clojure-style shorthand syntax for short anonymous functions. * `#(...)` creates an anonymous function. * `%` represents the first argument (or you can use `%1`, `%2`, etc., for multiple arguments). 
```sema ;; Square a number (map #(* % %) '(1 2 3)) ; => (1 4 9) ;; Add two numbers (define add #(+ %1 %2)) (add 10 20) ; => 30 ``` *** ## 3. Scope and Closures Sema functions are **lexically scoped**. This means they can access variables declared in their outer parent scopes. When a function references variables from its enclosing scope, it creates a **closure**: ```sema (defn make-adder (x) (fn (y) (+ x y))) (define add-five (make-adder 5)) (add-five 10) ; => 15 ``` In the example above, `add-five` remembers the value of `x` (which is `5`) even after `make-adder` has finished execution. *** ## 4. Recursion and Tail-Call Optimization (TCO) While Sema supports iterative forms, the standard way to perform repetitive tasks is via **recursion**. Sema implements **Tail-Call Optimization (TCO)**. When a function calls itself (or another function) in the **tail position** (meaning the call is the absolute last action in the function), the runtime reuses the current stack frame instead of allocating a new one. This prevents stack overflow errors, regardless of how deep the recursion is. ### Example: Tail-Recursive Factorial A function is tail-recursive if the recursive call's value is directly returned without further computation: ```sema (defn factorial (n accumulator) (if (<= n 1) accumulator (factorial (- n 1) (* n accumulator)))) ; Tail position (factorial 5 1) ; => 120 ``` Contrast this with a non-tail-recursive version where the recursive call is not the last operation: ```sema (defn factorial-bad (n) (if (<= n 1) 1 (* n (factorial-bad (- n 1))))) ; NOT in tail position (* must run after) ``` *** ## Next Steps Now that you know how to write functions, let's look at how Sema handles concurrency and asynchronous programming: * [Concurrency & Async](./concurrency.md) --- --- url: 'https://sema-lang.com/docs/tutorial/concurrency.md' --- # Concurrency & Async Sema features a **cooperative asynchronous concurrency model** using Promises and Channels. 
Tasks run on the bytecode VM's built-in scheduler and interleave execution at specific **yield points** (such as waiting for a channel, sleeping, or awaiting another task). > \[!IMPORTANT] > Async features are **VM-only** and require the bytecode VM backend (which is the default since v1.13). *** ## 1. Promises and Tasks A Promise represents the result of a computation that runs asynchronously. ### Spawning Tasks (`async`) You can spawn a computation as a background task using the `async` special form, which returns an async promise: ```sema (define p (async (+ 10 20))) ``` ### Awaiting Results (`await`) To wait for a task to complete and get its return value, use the `await` function: ```sema (await p) ; => 30 ``` ### Concurrent Execution If you spawn multiple tasks, they run concurrently. You can kick off several jobs and wait for all of them: ```sema (define task1 (async (do-slow-work-1))) (define task2 (async (do-slow-work-2))) ;; Both tasks are running. Now we wait for their results: (define result1 (await task1)) (define result2 (await task2)) ``` *** ## 2. Sleeping and Yielding Within an async task, you can pause execution to let other tasks run, or delay execution for a specific duration. ### Sleeping (`async/sleep`) Use `async/sleep` to yield control to the scheduler for at least a certain number of milliseconds: ```sema (async (println "Starting...") (async/sleep 1000) ; pause for 1 second (println "Done!")) ``` ::: tip Deterministic — and real wall-clock everywhere The scheduler uses a **virtual clock**, so sleeps order tasks deterministically — a shorter sleep always wakes before a longer one, the same on every run. The clock advances in real time: on native (a 1-second sleep really waits) and in the **browser playground**, where eval runs on a Web Worker that blocks on `Atomics.wait` so the sleep really pauses while the page stays responsive. Browsers without cross-origin isolation fall back to advancing instantly (ordering preserved). 
Sleep durations are capped at 1 day. ::: *** ## 3. Channels Channels are bounded FIFO (First-In, First-Out) buffers used to communicate and synchronize data between concurrent tasks. ### Creating a Channel (`channel/new`) Create a channel with a specific buffer capacity. The default capacity is 1: ```sema (define ch (channel/new 3)) ; holds up to 3 values ``` ### Sending and Receiving (`send` / `recv`) * **`channel/send`** sends a value to the channel. If the channel is full, the sending task yields until space becomes available. * **`channel/recv`** receives a value from the channel. If the channel is empty, the receiving task yields until a value is sent. ```sema (define ch (channel/new 1)) ;; A worker task sends a message: (async (channel/send ch "message from worker")) ;; `channel/recv` only blocks (yields) inside an async task, so receive ;; from within one and await the result: (await (async (let ((msg (channel/recv ch))) (println msg) ; => "message from worker" msg))) ``` > \[!NOTE] > Channel operations only block by yielding to the scheduler, which runs > async tasks. Calling `channel/recv` on an empty channel (or `channel/send` > on a full one) from the **top level** — outside any `async` task — raises an > error instead of waiting, because there is no task to suspend. ### Closing Channels (`channel/close`) When you are finished sending data, close the channel. Any subsequent sends will raise an error. Receivers waiting on a closed, empty channel will receive `nil`: ```sema (channel/close ch) ``` *** ## 4. Producer / Consumer Example Here is a complete example of a producer task sending a series of numbers to a consumer task via a channel: ```sema (let ((ch (channel/new 1))) (let ((producer (async (channel/send ch 10) (channel/send ch 20) (channel/send ch 30) (channel/close ch))) (consumer (async (let loop ((sum 0)) (let ((val (channel/recv ch))) (if (nil? val) sum (loop (+ sum val)))))))) (await consumer))) ; => 60 ``` *** ## 5. 
Async inside Higher-Order Functions Sema's standard library functions like `map`, `filter`, and `for-each` support async callbacks. However, if you pass a *yielding* native (like `channel/recv`) directly and it actually needs to suspend, the runtime cannot yield through it. Wrap it in a lambda so the yield can suspend cleanly: ```sema ;; ❌ Inside an async task, if `channel/recv` must wait for a value it raises: ;; "yielding native passed directly to a higher-order function" (async (map channel/recv (list ch1 ch2))) ;; ✓ Wrap the yielding call in a lambda: (async (map (fn (c) (channel/recv c)) (list ch1 ch2))) ``` --- --- url: 'https://sema-lang.com/docs/language/data-types.md' --- # Data Types Sema has a rich set of built-in data types covering numbers, text, collections, and LLM primitives. ## Type Table | Type | Syntax | Examples | | ------------ | -------------------- | ------------------------------------------------------------------ | | Integer | digits | `42`, `-7`, `0` | | Float | digits with `.` | `3.14`, `-0.5`, `0.001` | | String | double-quoted | `"hello"`, `"line\nbreak"`, `"\x1B;"` | | F-String | `f"...${expr}..."` | `f"Hello ${name}"`, `f"${(+ 1 2)}"` | | Boolean | `#t` / `#f` | `#t`, `#f` | | Nil | `nil` | `nil` | | Symbol | bare identifier | `foo`, `my-var`, `+` | | Keyword | colon-prefixed | `:name`, `:type`, `:ok` | | Character | `#\` prefix | `#\a`, `#\space`, `#\newline` | | List | parenthesized | `(1 2 3)`, `(+ a b)` | | Vector | bracketed | `[1 2 3]`, `["a" "b"]` | | Map | curly-braced | `{:name "Ada" :age 36}` | | HashMap | `(hashmap/new ...)` | `(hashmap/new :a 1 :b 2)` | | Prompt | `(prompt ...)` | LLM prompt (see [Prompts](../llm/prompts.md)) | | Message | `(message ...)` | LLM message (see [Prompts](../llm/prompts.md)) | | Conversation | `(conversation/new)` | LLM conversation (see [Conversations](../llm/conversations.md)) | | Tool | `(deftool ...)` | LLM tool definition (see [Tools & Agents](../llm/tools-agents.md)) | | Agent | 
`(defagent ...)` | LLM agent (see [Tools & Agents](../llm/tools-agents.md)) | | Promise | `(delay expr)` | Lazy evaluation | | Record | `define-record-type` | `(define-record-type point ...)` | | Bytevector | `#u8(...)` literal | `#u8(1 2 3)`, `#u8()` | | Async Promise | `(async expr)` or `(async/resolved val)` | An async task result (pending, resolved, or rejected) | | Channel | `(channel/new)` or `(channel/new capacity)` | Bounded FIFO channel for inter-task communication | ## Scalars ### Integer Whole numbers. Standard arithmetic applies. ```sema 42 -7 0 ``` ### Float Floating-point numbers with a decimal point. ```sema 3.14 -0.5 0.001 ``` ### String Double-quoted text with escape sequences. ```sema "hello" "line\nbreak" "\x1B;" ``` ### F-String (Interpolated String) String interpolation with embedded expressions. `f"..."` reads as a `(str ...)` call (i.e. `f"Hello ${name}"` is the same as `(str "Hello " name)`). ```sema (define name "Alice") f"Hello ${name}" ; => "Hello Alice" f"2 + 2 = ${(+ 2 2)}" ; => "2 + 2 = 4" f"${(:name user)} is ${(:age user)} years old" ``` Use `\$` to include a literal dollar sign: `f"costs \$5"`. ### Boolean `#t` for true, `#f` for false. ```sema #t #f ``` ### Nil The empty/null value. ```sema nil ``` ### Symbol Bare identifiers used as variable names and in quoted data. ```sema foo my-var + ``` ### Keyword Colon-prefixed identifiers. Keywords are self-evaluating and can be used as accessor functions on maps. ```sema :name :type :ok ;; Keywords as functions (:name {:name "Ada" :age 36}) ; => "Ada" ``` ### Character Character literals with `#\` prefix. Named characters are supported. ```sema #\a #\space #\newline #\tab ``` ## Collections ### List Parenthesized sequences. Lists are the fundamental data structure in Sema. Access the first element with `car` (or `first`) and the rest with `cdr` (or `rest`). ::: details Why `car`/`cdr`? 
These names come from the [IBM 704](http://bitsavers.informatik.uni-stuttgart.de/pdf/ibm/704/24-6661-2_704_Manual_1955.pdf) (1955), the machine Lisp was born on. The 704 stored each cons cell in a single 36-bit word: `car` ("Contents of the Address Register") extracted one 15-bit pointer field, `cdr` ("Contents of the Decrement Register") extracted the other. They were single hardware instructions. Sema also provides `first`/`rest` as aliases. ::: ```sema (1 2 3) (+ a b) '(hello world) ``` ### Vector Bracketed sequences with O(1) indexed access. ```sema [1 2 3] ["a" "b"] ``` ### Map Curly-braced key-value pairs with deterministic (sorted) ordering. Maps support [destructuring](./special-forms.md#map-destructuring) in `let`, `define`, `lambda`, and [`match`](./special-forms.md#match) patterns. ```sema {:name "Ada" :age 36} {:a 1 :b 2 :c 3} ``` ### HashMap Hash-based maps for O(1) lookup performance with many keys. ```sema (hashmap/new :a 1 :b 2 :c 3) ``` ### Bytevector Byte arrays with `#u8(...)` literal syntax. ```sema #u8(1 2 3) #u8() (bytevector 1 2 3) (bytevector/new 4) ``` ## Special Types ### Promise Lazy evaluation via `delay`/`force`. The expression is not evaluated until forced, and the result is memoized. ```sema (define p (delay (+ 1 2))) (force p) ; => 3 (promise? p) ; => #t ``` ### Record User-defined record types with constructors, predicates, and field accessors. ```sema (define-record-type point (make-point x y) point? 
(x point-x) (y point-y)) (define p (make-point 3 4)) (point-x p) ; => 3 ``` ## String Escape Sequences | Escape | Description | Example | | ------------ | ------------------------------------ | --------------------- | | `\n` | Newline | `"line\nbreak"` | | `\t` | Tab | `"col1\tcol2"` | | `\r` | Carriage return | `"text\r"` | | `\\` | Backslash | `"path\\file"` | | `\"` | Double quote | `"say \"hi\""` | | `\0` | Null character | `"\0"` | | `\x;` | Unicode scalar (R7RS, 1+ hex digits) | `"\x1B;"`, `"\x3BB;"` | | `\uNNNN` | Unicode code point (4 hex digits) | `"\u03BB"` (λ) | | `\UNNNNNNNN` | Unicode code point (8 hex digits) | `"\U0001F600"` (😀) | | `\$` | Literal dollar sign (in f-strings) | `f"costs \$5"` | ## Type Predicates ```sema (null? '()) (nil? nil) (empty? "") (list? '(1)) (vector? [1]) (map? {:a 1}) (pair? '(1 2)) ; #t (non-empty list, Scheme compat) (number? 42) (integer? 42) (float? 3.14) (string? "hi") (symbol? 'x) (keyword? :k) (char? #\a) (record? r) (bytevector? #u8()) (promise? (delay 1)) (promise-forced? p) (bool? #t) (fn? car) (zero? 0) (even? 4) (odd? 3) (positive? 1) (negative? -1) (eq? 'a 'a) (= 1 1) ;; Scheme aliases: boolean? = bool?, procedure? = fn? ;; eq? and equal? are the same function in Sema — both do structural ;; equality without numeric coercion. Use = for numeric comparison ;; (e.g. (= 1 1.0) is #t, but (eq? 1 1.0) is #f). ;; LLM type predicates (prompt? p) (message? m) (conversation? c) (tool? t) (agent? 
a) ``` ## Type Conversions ```sema (str 42) ; => "42" (any value to string) (string/to-number "42") ; => 42 (number/to-string 42) ; => "42" (string/to-symbol "foo") ; => foo (symbol/to-string 'foo) ; => "foo" (string/to-keyword "name") ; => :name (keyword/to-string :name) ; => "name" (char/to-integer #\A) ; => 65 (integer/to-char 65) ; => #\A (char/to-string #\a) ; => "a" (string/to-char "a") ; => #\a (string/to-list "abc") ; => (#\a #\b #\c) (list->string '(#\h #\i)) ; => "hi" (vector->list [1 2 3]) ; => (1 2 3) (list->vector '(1 2 3)) ; => [1 2 3] (bytevector/to-list #u8(65)) ; => (65) (list/to-bytevector '(1 2 3)) ; => #u8(1 2 3) (utf8/to-string #u8(104 105)) ; => "hi" (string/to-utf8 "hi") ; => #u8(104 105) (type 42) ; => :int ``` --- --- url: 'https://sema-lang.com/docs/language/special-forms.md' --- # Special Forms Special forms are built into the evaluator — they control evaluation order and cannot be redefined. ## Definitions & Assignment ### `define` Bind a value or define a function. ```sema (define x 42) ; bind a value (define (square x) (* x x)) ; define a function (shorthand) ``` ### `set!` Mutate an existing binding. ```sema (set! x 99) ``` ## Quoting ### `quote` Return the argument without evaluating it. The reader shorthand `'x` desugars to `(quote x)`. ```sema (quote (+ 1 2)) ; => (+ 1 2) as a list '(+ 1 2) ; same thing 'foo ; => foo (the symbol, not its value) ``` ### `quasiquote` Template with selective evaluation. Use `` ` `` as shorthand. Inside a quasiquote, `,expr` (unquote) evaluates `expr` and inserts the result as a single element, while `,@expr` (unquote-splicing) evaluates `expr` and splices each element of the resulting list into the template. ```sema (define x 42) `(a b ,x) ; => (a b 42) `(a ,@(list 1 2 3) b) ; => (a 1 2 3 b) ``` Quasiquote is essential for writing macros — see [Macros](./macros-modules.md#macros). ## Functions ### `lambda` Create an anonymous function. ```sema (lambda (x y) (+ x y)) ``` ### `fn` Alias for `lambda`. ```sema (fn (x) (* x x)) (fn (x .
rest) rest) ; rest parameters with dot notation ``` ### `defun` Define a named function (equivalent to `(define (name params...) body...)`). ```sema (defun square (x) (* x x)) (defun greet (name) f"Hello, ${name}!") ``` ::: tip Clojure alias `defn` is accepted as an alias for `defun`. ::: ## Conditionals ### `if` Two-branch conditional. ```sema (if (> x 0) "positive" "non-positive") ``` ### `cond` Multi-branch conditional with `else` fallback. ```sema (cond ((< x 0) "negative") ((= x 0) "zero") (else "positive")) ``` ### `case` Match a value against literal alternatives. ```sema (case (:status response) ((:ok) "success") ((:error :timeout) "failure") (else "unknown")) ``` ### `when` Execute body only if condition is true. Returns `nil` otherwise. ```sema (when (> x 0) (println "positive")) ``` ### `unless` Execute body only if condition is false. ```sema (unless (> x 0) (println "not positive")) ``` ## Threading Macros Built-in macros for pipeline-style code. Available automatically — no import needed. ### `->` Thread-first: inserts the value as the first argument of each form. ```sema (-> 5 (+ 3) (* 2)) ; => 16 (-> response :body json/decode :data) ; nested access ``` ### `->>` Thread-last: inserts the value as the last argument of each form. ```sema (->> (range 1 100) (filter even?) (map #(* % %)) (take 5)) ; => (4 16 36 64 100) ``` ### `as->` Thread-as: bind the threaded value to a name for arbitrary placement. ```sema (as-> 5 x (+ x 3) (* x x) (- x 1)) ; => 63 ``` ### `some->` Nil-safe thread-first: stops and returns `nil` if any step produces `nil`. ```sema (some-> config :database :connection-string db/connect) ;; returns nil if any step is nil, instead of crashing ``` ## Conditional Binding ### `when-let` Bind a value and execute body only if non-nil. ```sema (when-let (user (db/find-user id)) (send-email user "Welcome back")) ``` ### `if-let` Bind a value and branch on nil/non-nil. 
```sema (if-let (cached (cache/get key)) cached (compute-fresh-value)) ``` ## Short Lambda ### `#(...)` Concise anonymous functions. `%` (or `%1`) is the first argument, `%2` the second, etc. ```sema (map #(+ % 1) '(1 2 3)) ; => (2 3 4) (map #(* % %) '(1 2 3 4)) ; => (1 4 9 16) (filter #(> % 3) '(1 2 3 4 5)) ; => (4 5) (#(+ %1 %2) 3 4) ; => 7 ``` ## Bindings ### `let` Parallel bindings — all init expressions are evaluated before any binding is created. ```sema (let ((x 10) (y 20)) (+ x y)) ``` ### `let*` Sequential bindings — each binding is visible to subsequent ones. ```sema (let* ((x 10) (y (* x 2))) (+ x y)) ``` ### `letrec` Recursive bindings — all bindings are visible to all init expressions. Useful for mutually recursive functions. ```sema (letrec ((even? (fn (n) (if (= n 0) #t (odd? (- n 1))))) (odd? (fn (n) (if (= n 0) #f (even? (- n 1)))))) (even? 10)) ``` ### Named `let` Loop construct with tail-call optimization. ```sema (let loop ((i 0) (sum 0)) (if (= i 100) sum (loop (+ i 1) (+ sum i)))) ``` ## Destructuring `let`, `let*`, `define`, and `lambda` all support destructuring patterns in binding positions. ### Vector Destructuring Extract elements from lists and vectors by position. ```sema (let (([a b c] '(1 2 3))) (+ a b c)) ; => 6 (let (([first & rest] '(1 2 3 4))) rest) ; => (2 3 4) (let (([_ second] '(1 2))) second) ; => 2 ``` ### Map Destructuring Extract values from maps using `{:keys [...]}`. ```sema (let (({:keys [name age]} {:name "Alice" :age 30})) (println name)) ; prints "Alice" ``` Explicit key-pattern pairs: ```sema (let (({:x val} {:x 42})) val) ; => 42 ``` ### Destructuring in `define` ```sema (define [a b c] '(1 2 3)) ; binds a=1, b=2, c=3 (define {:keys [host port]} config) ; binds host, port from map ``` ### Destructuring in Function Parameters ```sema (define (sum-pair [a b]) (+ a b)) (sum-pair '(3 4)) ; => 7 (define (greet {:keys [name title]}) (format "Hello ~a ~a" title name)) (greet {:name "Smith" :title "Dr."}) ; => "Hello Dr. 
Smith" ``` Nested patterns are supported: ```sema (let (([[a b] c] '((1 2) 3))) (+ a b c)) ; => 6 ``` ## Pattern Matching ### `match` Match a value against patterns with optional guards. ```sema (match value (pattern body ...) (pattern when guard body ...) ...) ``` If no clause matches, `match` **raises an error** (`match: no clause matched value: …`) — a non-exhaustive match is almost always a bug, so it fails loudly rather than returning `nil` silently. Add a catch-all `(_ ...)` clause to handle the rest: ```sema (match status (:ok "success") (_ "other")) ; catch-all; without it, an unmatched status raises ``` #### `match*` — lenient variant When "no match" is a normal outcome (e.g. a lookup), use `match*`, which returns `nil` instead of raising: ```sema (match* 42 (1 "one") (2 "two")) ; => nil (no clause matched) ``` #### Literal Matching ```sema (match status (:ok "success") (:error "failure") (_ "unknown")) ``` #### Binding Patterns Symbols bind the matched value. `_` is a wildcard. ```sema (match (+ 1 2) (x (format "got ~a" x))) ; => "got 3" ``` #### Vector Patterns ```sema (match '(1 2 3) ([a b c] (+ a b c))) ; => 6 (match args ([] (print-help)) ([cmd & rest] (dispatch cmd rest))) ``` #### Map Patterns Structural matching — keys must exist in the value: ```sema (match response ({:type :ok :data d} (process d)) ({:type :error :msg m} (log-error m)) (_ (println "unknown"))) ``` With `{:keys [...]}` shorthand: ```sema (match config ({:keys [host port]} (connect host port))) ``` #### Guards Add `when` after a pattern for conditional matching: ```sema (match n (x when (> x 100) "big") (x when (> x 0) "small") (_ "non-positive")) ``` #### Nested Patterns ```sema (match '(1 (2 3)) ([a [b c]] (+ a b c))) ; => 6 ``` ## Sequencing & Logic ### `begin` Evaluate expressions in order, return the last result. ```sema (begin expr1 expr2 ... exprN) ``` ::: tip Common Lisp alias `progn` is accepted as an alias for `begin`. ::: ### `and` Short-circuit logical AND. 
Returns the last truthy value or `#f`. ```sema (and a b c) ``` ### `or` Short-circuit logical OR. Returns the first truthy value or `#f`. ```sema (or a b c) ``` ## Iteration ### `while` Loop while a condition is truthy. Returns `nil`. Use `set!` to mutate loop state. ```sema (let ((n 0)) (while (< n 3) (println n) (set! n (+ n 1))) n) ;; prints 0, 1, 2 ;; => 3 ``` ### `do` Scheme `do` loop with variable bindings, step expressions, and a termination test. ```sema ;; (do ((var init step) ...) (test result ...) body ...) (do ((i 0 (+ i 1)) (sum 0 (+ sum i))) ((= i 10) sum)) ; => 45 ``` With a body for side effects: ```sema (do ((i 0 (+ i 1))) ((= i 5)) (println i)) ; prints 0..4 ``` ## Lazy Evaluation ### `delay` Create a promise — the expression is not evaluated until forced. ```sema (define p (delay (+ 1 2))) ``` ### `force` Evaluate a promise and memoize the result. Non-promise values pass through. ```sema (force p) ; => 3 (evaluate and memoize) (force p) ; => 3 (returns cached value) (force 42) ; => 42 (non-promise passes through) ``` ### `promise?` Check if a value is a promise. ```sema (promise? p) ; => #t ``` ### `promise-forced?` Check if a promise has already been forced. ```sema (promise-forced? p) ; => #t (after forcing) ``` ## Record Types ### `define-record-type` Define a record type with constructor, predicate, and field accessors. ```sema (define-record-type point (make-point x y) point? (x point-x) (y point-y)) (define p (make-point 3 4)) (point? p) ; => #t (point-x p) ; => 3 (point-y p) ; => 4 (record? p) ; => #t (type p) ; => :point (equal? (make-point 1 2) (make-point 1 2)) ; => #t ``` ## Multimethods Clojure-style polymorphic dispatch based on a user-defined dispatch function. ### `defmulti` Define a multimethod with a name and a dispatch function. The dispatch function is called with the arguments to determine which method to invoke. 
```sema (defmulti area (fn (shape) (get shape :type))) ``` ### `defmethod` Add a method implementation for a specific dispatch value. Use `:default` as the dispatch value for a fallback handler. ```sema (defmethod area :circle (fn (shape) (* 3.14159 (expt (get shape :radius) 2)))) (defmethod area :rect (fn (shape) (* (get shape :width) (get shape :height)))) (defmethod area :default (fn (shape) (throw "unknown shape"))) (area {:type :circle :radius 5}) ; => 78.53975 (area {:type :rect :width 3 :height 4}) ; => 12 ``` ## Loading Files ### `load` Load and execute a Sema source file in the current environment. Unlike `import`, `load` does not use the module system — all top-level definitions become available in the current scope. ```sema (load "helpers.sema") ; execute file, bindings available here ``` ### `eval` Evaluate a data structure as code. See [Metaprogramming](./macros-modules.md#eval). ```sema (eval '(+ 1 2)) ; => 3 (eval (read "(* 3 4)")) ; => 12 ``` ## Error Handling ### `try` / `catch` Catch errors with structured error maps. ```sema (try (/ 1 0) (catch e (println (format "Error: ~a" (:message e))) (:type e))) ; => :eval ``` ::: warning `try`/`catch` catches **all** error types — not just user exceptions thrown with `throw`. This includes internal errors like `:unbound` (typos in variable names), `:permission-denied`, and `:arity` (wrong number of arguments). Catching everything can silently mask bugs. **Re-throw errors you don't intend to handle.** ::: #### Error map fields Every caught error is a map with at least `:type`, `:message`, and `:stack-trace`. 
Some error types include additional fields: | `:type` | Description | Extra fields | |---|---|---| | `:reader` | Syntax / parse error | — | | `:eval` | General evaluation error | — | | `:type-error` | Wrong argument type | `:expected`, `:got` | | `:arity` | Wrong number of arguments | — | | `:unbound` | Undefined variable | `:name` | | `:llm` | LLM provider error | — | | `:io` | File / network I/O error | — | | `:permission-denied` | Sandboxed capability denied | `:function`, `:capability` | | `:user` | Thrown with `throw` | `:value` (the original thrown value) | #### Discriminating error types Use the `:type` field to handle specific errors and re-throw the rest: ```sema (try (some-operation) (catch e (cond ((= (get e :type) :permission-denied) (println "Access denied!")) ((= (get e :type) :user) (println (format "User error: ~a" (get e :message)))) (else (throw e))))) ;; re-throw unexpected errors ``` ### `throw` Throw any value as an error. ```sema (throw "something went wrong") (throw {:code 404 :reason "not found"}) ``` ## Async / Await ### `async` Create an async task that evaluates `body` concurrently and returns a promise. ``` (async body ...) ``` The task runs on the VM's cooperative scheduler. Multiple async tasks interleave at yield points (channel operations, await, sleep). ```sema (define p (async (+ 1 2))) (await p) ; => 3 ``` ### `await` Wait for an async promise to resolve and return its value. ``` (await promise) ``` If the promise was rejected, raises an error. Inside an async task, `await` yields to the scheduler allowing other tasks to run. At the top level, `await` runs the scheduler until the promise resolves. ```sema (let ((p1 (async (* 3 3))) (p2 (async (* 4 4)))) (+ (await p1) (await p2))) ; => 25 ``` --- --- url: 'https://sema-lang.com/docs/language/macros-modules.md' --- # Macros & Modules ## Macros Sema supports `defmacro`-style macros with quasiquoting, unquoting, and splicing. 
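As a minimal sketch of how quasiquote, unquote, and splicing fit together, the snippet below first builds a list as plain data and then uses the same template style inside a macro. The macro name `my-when` is hypothetical, chosen only to avoid clashing with the built-in `when`; everything else uses forms documented on these pages (`car`, `cdr`, `if`, `begin`, `defmacro`).

```sema
;; Quasiquote builds list data: `,` inserts one value,
;; `,@` inserts each element of a list.
(define args '(1 2 3))
`(+ ,(car args) ,@(cdr args))   ; => (+ 1 2 3)

;; The same template style inside a macro.
;; `my-when` is a hypothetical name used for illustration only.
(defmacro my-when (test . body)
  `(if ,test (begin ,@body) nil))

(my-when (> 3 2) :yes)   ; => :yes
(my-when (< 3 2) :yes)   ; => nil
```

The macro receives its body unevaluated through the dotted rest parameter, so the forms in `body` only run when `test` is truthy.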
### `defmacro` Define a macro that transforms code at expansion time. ```sema (defmacro unless2 (test . body) `(if ,test nil (begin ,@body))) (unless2 #f (println "runs!")) ``` ### `macroexpand` Inspect the expansion of a macro call without evaluating it. ```sema (macroexpand '(unless2 #f (println "x"))) ``` ### `gensym` Generate a unique symbol manually. For most macro use cases, prefer [auto-gensym (`foo#`)](#auto-gensym-foo) instead. ```sema (gensym "tmp") ; => tmp__42 (unique each call) ``` ### Auto-gensym (`foo#`) Inside a quasiquote template, any symbol ending with `#` is automatically replaced with a unique generated symbol. All occurrences of the same `foo#` within a single quasiquote resolve to the same gensym, ensuring consistency. This prevents **variable capture** — a common bug where macro-introduced bindings accidentally shadow user variables. ```sema ;; Without auto-gensym — BUG if user has a variable named "tmp" (defmacro bad-add (x y) `(let ((tmp ,x)) (+ tmp ,y))) (let ((tmp 100)) (bad-add 1 tmp)) ; => 2, not 101! the user's "tmp" is captured ;; With auto-gensym — always correct (defmacro good-add (x y) `(let ((tmp# ,x)) (+ tmp# ,y))) (let ((tmp 100)) (good-add 1 tmp)) ; => 101 ✓ ``` **Rules:** * Same `foo#` in one quasiquote → same generated symbol * Each quasiquote evaluation → fresh symbols (no cross-expansion collisions) * Outside quasiquote, `foo#` is a regular symbol (no magic) **Best practice:** Always use auto-gensym for bindings introduced by macros: ```sema (defmacro swap! (a b) `(let ((tmp# ,a)) (set! ,a ,b) (set! ,b tmp#))) ``` ### Built-in Macros Sema includes several macros that are auto-loaded at startup. These don't need to be defined or imported: * `->`, `->>`, `as->`, `some->` — [Threading macros](./special-forms.html#threading-macros) * `when-let`, `if-let` — [Conditional binding](./special-forms.html#when-let) See [Special Forms](./special-forms.html) for full documentation. ## Metaprogramming ### `eval` Evaluate data as code.
```sema (eval '(+ 1 2)) ; => 3 ``` ### `read` Parse a string into a Sema value. ```sema (read "(+ 1 2)") ; => (+ 1 2) as a list value ``` ### `io/read-many` Parse a string containing multiple forms. ```sema (io/read-many "(+ 1 2) (* 3 4)") ; => ((+ 1 2) (* 3 4)) ``` ### `type` Return the type of a value as a keyword. ```sema (type 42) ; => :int (type 3.14) ; => :float (type "hi") ; => :string (type :foo) ; => :keyword (type 'foo) ; => :symbol (type '(1 2 3)) ; => :list (type [1 2 3]) ; => :vector (type {:a 1}) ; => :map ``` For records, `type` returns the record type tag as a keyword (e.g. `:point`). ### Type Conversion Functions ```sema (string/to-symbol "foo") ; => foo (keyword/to-string :bar) ; => "bar" (string/to-keyword "name") ; => :name (symbol/to-string 'foo) ; => "foo" ``` ## Modules ### `module` Define a module with explicit exports. ```sema ;; math-utils.sema (module math-utils (export square cube) (define (square x) (* x x)) (define (cube x) (* x x x)) (define (internal-helper x) x)) ; not exported ``` ### `import` Import a module from a file. Only exported bindings become available. ```sema ;; main.sema (import "math-utils.sema") (square 5) ; => 25 (cube 3) ; => 27 ``` --- --- url: 'https://sema-lang.com/docs/cli.md' --- # CLI Reference ``` sema [OPTIONS] [FILE] [-- SCRIPT_ARGS...] 
``` ## Flags & Options | Flag | Description | | -------------------- | -------------------------------------------- | | `-e, --eval ` | Evaluate expression, print result if non-nil | | `-p, --print ` | Evaluate expression, always print result | | `-l, --load ` | Load file(s) before executing (repeatable) | | `-q, --quiet` | Suppress REPL banner | | `-i, --interactive` | Enter REPL after running file or eval | | `--no-init` | Skip LLM auto-configuration | | `--no-llm` | Disable LLM features (same as `--no-init`) | | `--chat-model ` | Set default chat model | | `--chat-provider ` | Set chat provider | | `--embedding-model ` | Set embedding model | | `--embedding-provider ` | Set embedding provider | | `--sandbox ` | Restrict dangerous operations (see below) | | `-V, --version` | Print version | | `-h, --help` | Print help | ## Subcommands ### `sema ast` Parse source into an AST tree. ``` sema ast [OPTIONS] [FILE] ``` | Flag | Description | | ------------------- | -------------------------------- | | `-e, --eval ` | Parse expression instead of file | | `--json` | Output AST as JSON | ### `sema eval` Evaluate Sema code and return results. Designed for machine consumption (editor/LSP integration). 
``` sema eval [OPTIONS] ``` | Flag | Description | | --------------------- | ---------------------------------------------------------------- | | `--stdin` | Read program from stdin | | `--expr ` | Evaluate a single expression | | `--json` | Emit JSON result envelope | | `--path ` | Set file context for imports and error spans | | `--sandbox ` | Sandbox mode (`strict`, `all`, or comma-separated capabilities) | | `--no-llm` | Disable LLM features | | `--timeout ` | Kill evaluation after N ms (default: 5000) | **Examples:** ```bash # Evaluate an expression sema eval --expr "(+ 1 2)" # Read from stdin (avoids shell quoting issues) echo '(* 6 7)' | sema eval --stdin # JSON output for programmatic use sema eval --expr "(+ 1 2)" --json # => {"ok":true,"value":"3","error":null,"elapsedMs":0} # Multi-form context: defines are available to later expressions echo '(define pi 3.14) (define (area r) (* pi r r)) (area 10)' | sema eval --stdin --json # => {"ok":true,"value":"314.0","error":null,"elapsedMs":0} # Sandboxed evaluation (used by LSP) sema eval --expr "(+ 1 2)" --json --sandbox strict --no-llm ``` **JSON envelope format:** ```json { "ok": true, "value": "42", "stdout": "", "stderr": "", "error": null, "elapsedMs": 12 } ``` Output from `print`/`println`/`display` is captured into `stdout`; `print-error`/`println-error` into `stderr`. These fields are always present (empty string when no output). On error: ```json { "ok": false, "value": null, "stdout": "", "stderr": "", "error": { "message": "Unbound variable: foo", "hint": "Did you mean 'for'?", "line": 3, "col": 5 }, "elapsedMs": 2 } ``` ### `sema compile` Compile a source file to a `.semac` bytecode file. The compiled file can be executed directly with `sema` (auto-detected via magic number). See [Bytecode File Format](./internals/bytecode-format.md) for details on the format. ::: info Imports resolve at runtime `sema compile` only compiles the specified file — it does not bundle dependencies. 
When you run the `.semac` file, `(import ...)` and `(load ...)` are resolved from the filesystem at runtime. All imported packages must be installed on the target machine. For a fully self-contained artifact, use [`sema build`](#sema-build) instead. ::: ``` sema compile [OPTIONS] ``` | Flag | Description | | --------------------- | ---------------------------------------------------- | | `-o, --output ` | Output file path (default: input with `.semac` extension) | | `--check` | Validate a `.semac` file without executing | ```bash # Compile to bytecode sema compile script.sema # → script.semac sema compile -o output.semac script.sema # explicit output path # Run the compiled bytecode (auto-detected) sema script.semac # Validate a bytecode file sema compile --check script.semac # ✓ script.semac: valid (format v1, sema 1.6.2, 3 functions, 847 bytes) ``` ### `sema build` Build a standalone executable from a Sema source file. The resulting binary embeds the compiled bytecode, all transitive imports, and any explicitly included assets into a self-contained executable. See [Executable Format](./internals/executable-format.md) for details on the binary format. ``` sema build [OPTIONS] [FILE] ``` | Flag | Description | | ------------------------ | --------------------------------------------------------- | | `-o, --output ` | Output executable path (default: filename without extension) | | `--include ...` | Additional files or directories to bundle (repeatable) | | `--runtime ` | Sema binary to use as runtime base (default: current exe) | | `--target ` | Target platform triple or alias (e.g. `linux`, `macos`, `windows`, or a full triple like `x86_64-unknown-linux-gnu`). Use `all` to build for all supported targets. 
|
| `--list-targets` | Show all supported target platforms and aliases |
| `--no-cache` | Force re-download of cached runtime binaries |

```bash
# Build a standalone executable
sema build script.sema                 # → ./script
sema build script.sema -o myapp        # explicit output path

# Bundle additional files
sema build script.sema --include data.json   # bundle a file
sema build script.sema --include assets/     # bundle a directory

# Cross-compile for other platforms
sema build script.sema --target linux              # build for Linux (x86_64)
sema build script.sema --target windows            # build for Windows
sema build script.sema --target all                # build for all supported targets
sema build script.sema --target linux --no-cache   # force re-download runtime

# Run the standalone executable
./myapp --arg1 --arg2
```

Cross-compilation downloads pre-built runtime binaries from GitHub Releases and caches them at `~/.sema/cache/runtimes/`. Use `--no-cache` to force a fresh download, or `--runtime` to provide your own binary.

#### Using a custom runtime source

If you maintain a fork of Sema or host runtime binaries on your own infrastructure, set `SEMA_RUNTIME_BASE_URL` to point to a directory containing release archives and SHA256 checksums:

```bash
export SEMA_RUNTIME_BASE_URL=https://github.com/yourname/sema/releases/download/v1.11.0
sema build app.sema --target linux
```

The expected file layout at that URL is:

```
sema-lang-<target>.tar.xz          # Linux/macOS archive containing the sema binary
sema-lang-<target>.tar.xz.sha256   # SHA256 checksum (hex hash, optionally followed by filename)
sema-lang-<target>.zip             # Windows archive containing sema.exe
sema-lang-<target>.zip.sha256      # SHA256 checksum
```

Where `<target>` is a full triple like `x86_64-unknown-linux-gnu` or `aarch64-apple-darwin`. This matches the asset naming used by [cargo-dist](https://opensource.axo.dev/cargo-dist/), so forks using cargo-dist will work out of the box.

Alternatively, use `--runtime /path/to/sema` to skip downloading entirely and inject a local binary directly.
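If you host the archives yourself, each one needs a matching `.sha256` file. The following is a minimal sketch of producing and checking one with GNU `sha256sum`, whose output (hex hash followed by the filename) fits the checksum format described above. The archive name is an assumption built from the layout and the example triple, and a stand-in file is created so the sketch runs without a real release. On macOS, `shasum -a 256` is the usual substitute.

```bash
#!/usr/bin/env bash
# Sketch: write and verify a .sha256 sidecar next to a runtime archive.
set -euo pipefail

# Assumed archive name (layout from the docs plus an example triple).
archive="sema-lang-x86_64-unknown-linux-gnu.tar.xz"

# Work in a throwaway directory with a stand-in file instead of a real archive.
workdir="$(mktemp -d)"
cd "$workdir"
printf 'placeholder archive contents\n' > "$archive"

# sha256sum prints "<hex hash>  <filename>", which is written to the sidecar.
sha256sum "$archive" > "$archive.sha256"

# Re-check the archive against its sidecar before uploading both files.
sha256sum -c "$archive.sha256"
```

Upload the archive and its `.sha256` file side by side under the URL that `SEMA_RUNTIME_BASE_URL` points to.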
### `sema disasm` Disassemble a compiled `.semac` bytecode file, printing a human-readable listing of the main chunk and all function templates. ``` sema disasm [OPTIONS] ``` | Flag | Description | | -------- | -------------- | | `--json` | Output as JSON | ```bash sema disasm script.semac # human-readable text sema disasm --json script.semac # structured JSON output ``` ### `sema pkg` Package manager for installing, publishing, and managing Sema packages. Git-based packages work out of the box. Registry commands (`search`, `info`, `publish`, `yank`, `login`) require a running registry instance — see [Self-Hosted Registry](./packages.md#self-hosted-registry). See the full [Packages](./packages.md) documentation for details. ``` sema pkg ``` | Subcommand | Description | | --------------------------- | --------------------------------------------------- | | `init` | Initialize a new `sema.toml` in the current directory | | `add [--registry]` | Add a package from the registry or git | | `install [--locked]` | Install all deps from `sema.toml` (`--locked` fails if `sema.lock` is missing or out of sync — for CI) | | `update [name]` | Update packages (all or specific) | | `remove ` | Remove an installed package | | `list` | List installed packages | | `publish [--registry]` | Publish current package to the registry | | `search [--registry]` | Search the registry for packages | | `info [--registry]` | Show package info from the registry | | `yank [--registry]` | Yank a published version | | `login [--token] [--registry]` | Authenticate with a registry | | `logout` | Remove stored registry credentials | | `config [key] [value]` | View or set package manager configuration | ```bash # Install a registry package sema pkg add http-helpers@1.0.0 # Install a git package sema pkg add github.com/user/repo@v2.0 # Publish to the registry sema pkg login --token sema_pat_... 
sema pkg publish # Search for packages sema pkg search json # Set default registry sema pkg config registry.url https://my-registry.com ``` ### `sema completions` Generate shell completion scripts. See [Shell Completions](./shell-completions.md) for installation instructions. ``` sema completions [OPTIONS] ``` | Flag | Description | | ----------- | ------------------------------------------------------------------ | | `--install` | Auto-detect the shell's completion directory and install the script | Supported shells: `bash`, `zsh`, `fish`, `elvish`, `powershell`. The `--install` flag is supported for Bash, Zsh, Fish, and Elvish. For PowerShell, use `sema completions powershell` and follow the manual installation steps in [Shell Completions](./shell-completions.md). ```bash # Print completion script to stdout sema completions zsh # Auto-install to the correct directory sema completions --install zsh ``` ### `sema fmt` Format Sema source files. See [Formatter](./formatter.md) for full documentation. ``` sema fmt [OPTIONS] [FILES...] ``` | Flag | Description | | --- | --- | | `--check` | Check formatting without writing (exit 1 if unformatted) | | `--diff` | Print diff of changes | | `--width ` | Max line width (default: `80`) | | `--indent ` | Indentation width (default: `2`) | | `--align` | Align consecutive similar forms | | `--json` | Output result as JSON (useful for editor integrations) | ```bash # Format all .sema files recursively sema fmt # Check in CI sema fmt --check # Preview changes sema fmt --diff ``` ### `sema notebook` Jupyter-inspired cell-based notebook interface with a browser UI. Notebooks are saved as `.sema-nb` JSON files. Cells share a persistent environment — definitions in earlier cells are visible in later ones. 
``` sema notebook ``` | Subcommand | Description | | --------------------------------------- | ------------------------------------------------- | | `serve [FILE]` | Start the notebook server with browser UI | | `run ` | Run all cells headlessly (for CI/testing) | | `export ` | Export notebook to Markdown | | `new ` | Create a new empty notebook | #### `sema notebook serve` ``` sema notebook serve [OPTIONS] [FILE] ``` | Flag | Description | | ------------------- | -------------------------------------------- | | `--host ` | Host address to bind to (default: `127.0.0.1`) | | `-p, --port ` | Port to listen on (default: `8888`) | Opens a browser-based notebook at `http://localhost:8888`. If `FILE` doesn't exist, a new notebook is created. The UI supports: * Code and markdown cells with Shift+Enter to evaluate * Stdout capture — `println` output appears in cell output * Collapsible output with execution timing * Single-cell undo with environment rollback * Between-cell insert buttons * Keyboard shortcuts (Shift+Enter, Cmd+Enter, Cmd+S, Tab, Escape) #### `sema notebook run` ``` sema notebook run [OPTIONS] ``` | Flag | Description | | ----------------- | ---------------------------------------------------- | | `--cells ` | Only run specific cells (1-based, comma-separated) | Evaluates all code cells in order without starting a browser. Useful for CI validation and batch execution. 
#### `sema notebook export`

```
sema notebook export [OPTIONS]
```

| Flag | Description |
| --------------------- | ------------------------------------- |
| `--format ` | Output format (default: `md`) |
| `-o, --output ` | Output file (default: stdout) |

#### `sema notebook new`

```
sema notebook new [OPTIONS]
```

| Flag | Description |
| ----------------- | ------------------------------------------------ |
| `-t, --title ` | Notebook title (default: filename stem) |

```bash
# Create and open a notebook
sema notebook new my-project.sema-nb --title "My Project"
sema notebook serve my-project.sema-nb

# Run cells headlessly (CI / smoke test)
sema notebook run my-project.sema-nb

# Export to Markdown
sema notebook export my-project.sema-nb -o output.md
```

See the full [Notebook documentation](/docs/notebook) for details on the file format, UI features, and keyboard shortcuts.

### `sema lsp`

Start the Language Server Protocol (LSP) server. Communicates over stdio using the standard LSP JSON-RPC protocol.

```
sema lsp
```

Provides diagnostics, completion, hover, go-to-definition, and code lenses. See the [LSP documentation](/docs/lsp) for full feature details and editor setup instructions.
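To make "standard LSP JSON-RPC over stdio" concrete, here is a small sketch of the framing that the LSP base protocol requires: a `Content-Length` header, a blank line, then the JSON body. The helper name `lsp_frame` is hypothetical, and the commented-out pipe into `sema lsp` assumes the server reads such frames on stdin, which is not verified here.

```bash
#!/usr/bin/env bash
# Sketch: frame one JSON-RPC message as the LSP base protocol expects.
set -euo pipefail

# lsp_frame BODY: print "Content-Length: <n>\r\n\r\n<BODY>".
# ${#1} counts characters, which equals the byte count only for ASCII bodies.
lsp_frame() {
  printf 'Content-Length: %d\r\n\r\n%s' "${#1}" "$1"
}

# Minimal initialize request (processId, rootUri, and capabilities are required by the spec).
init='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":null,"rootUri":null,"capabilities":{}}}'

# Print the frame with carriage returns stripped so it is readable in a terminal.
lsp_frame "$init" | tr -d '\r'
echo

# Assumption: a running server would take the frame on stdin, for example:
#   lsp_frame "$init" | sema lsp
```

Editors normally do this framing for you; the sketch is only useful for poking at the server by hand or in a smoke test.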
## Examples ```bash # Parse a file into an AST tree sema ast script.sema # Parse an expression into JSON AST sema ast -e '(+ 1 2)' --json # Load a prelude before starting the REPL sema -l prelude.sema # Load helpers, then run a script sema -l helpers.sema script.sema # Run a script and drop into REPL to inspect state sema -i script.sema # Quick one-liner for shell pipelines sema -p '(string/join (map str (range 10)) ",")' # Run without LLM features (faster startup) sema --no-llm script.sema # Compile to bytecode and run sema compile script.sema sema script.semac # Use a specific model sema --chat-model claude-haiku-4-5-20251001 -e '(llm/complete "Hello!")' # Run with shell commands disabled sema --sandbox=no-shell script.sema # Deny multiple capabilities sema --sandbox=no-shell,no-network,no-fs-write script.sema # Strict mode (no shell, fs-write, network, env-write, process, llm, serial) sema --sandbox=strict script.sema # Maximum restriction (deny all dangerous operations) sema --sandbox=all script.sema # Restrict file operations to specific directories sema --allowed-paths=./data,./output script.sema # Combine sandbox and path restrictions sema --sandbox=strict --allowed-paths=./data script.sema ``` ## Shebang Scripts Sema supports `#!` (shebang) lines, so you can write executable scripts: ```sema #!/usr/bin/env sema (println "Hello from a sema script!") ``` Make the file executable and run it directly: ```bash chmod +x script.sema ./script.sema ``` The shebang line is only allowed on the first line of a file and is treated as a comment. `#!/usr/bin/env sema` uses the standard `env` lookup, so it works regardless of how sema was installed (Homebrew, Cargo, manual, etc.). ## Sandbox The `--sandbox` flag restricts access to dangerous operations. Functions remain callable but return a `PermissionDenied` error when invoked. 
### Modes | Mode | Description | | --------------- | ---------------------------------------------------------------------- | | `strict` | Deny shell, fs-write, network, env-write, process, llm, serial (reads allowed) | | `all` | Deny all capabilities | | Comma-separated | e.g. `no-shell,no-network` — deny specific capabilities | ### Capabilities The table below lists every function gated by each capability. It mirrors the `register_fn_gated` / `register_fn_path_gated` call sites in the stdlib — if a function is not listed it is never sandboxed. | Capability | Functions affected | | ----------- | -------------------------------------------------------------------------- | | `shell` | `shell` (also requires `process`) | | `fs-read` | `file/read`, `file/read-bytes`, `file/read-lines`, `file/for-each-line`, `file/fold-lines`, `file/exists?`, `file/list`, `file/info`, `file/is-file?`, `file/is-directory?`, `file/is-symlink?`, `file/glob`, `path/absolute`, `load`, `pdf/extract-text`, `pdf/extract-text-pages`, `pdf/page-count`, `pdf/metadata`, `stream/open-input`, `http/file`, `db/query`, `db/query-one`, `db/last-insert-id`, `db/tables`, `db/open-memory` | | `fs-write` | `file/write`, `file/write-bytes`, `file/write-lines`, `file/append`, `file/delete`, `file/rename`, `file/mkdir`, `file/copy`, `stream/open-output`, `kv/open`, `kv/set`, `kv/delete`, `db/open`, `db/exec`, `db/exec-batch` | | `network` | `http/get`, `http/post`, `http/put`, `http/delete`, `http/request`, `http/serve` | | `env-read` | `env`, `sys/env-all`, `sys/cwd`, `sys/home-dir`, `sys/user`, `sys/temp-dir` | | `env-write` | `sys/set-env` | | `process` | `exit`, `sys/pid`, `sys/args`, `sys/which`, `shell` | | `llm` | `llm/complete`, `llm/chat`, `llm/send`, `llm/extract-from-image` | | `serial` | `serial/list`, `serial/open`, `serial/close`, `serial/write`, `serial/read-line`, `serial/send` | `shell` is the only function gated by two capabilities — it requires both `shell` (to launch a system shell) and 
`process` (because it spawns a child process). Denying either blocks it. Functions not listed (arithmetic, strings, lists, maps, `println`, `path/join`, `sys/platform`, `sys/arch`, `sys/os`, `sys/hostname`, `sys/sema-home`, `time/now-ms`, etc.) are never restricted. ### Path Restrictions The `--allowed-paths` flag restricts all file operations to specific directories. Paths are canonicalized, so traversal attacks like `../../etc/passwd` are blocked. ```bash # Only allow reading/writing within ./project and /tmp sema --allowed-paths=./project,/tmp script.sema ``` When `--allowed-paths` is set, any file operation (`file/read`, `file/write`, `file/list`, etc.) targeting a path outside the allowed directories returns a `PermissionDenied` error. This works independently of `--sandbox` — you can use both together: ```bash # Allow filesystem but only within ./data sema --sandbox=no-shell,no-network --allowed-paths=./data script.sema ``` ## Environment Variables | Variable | Description | | -------------------- | ----------------------------------------------------- | | `ANTHROPIC_API_KEY` | Anthropic API key (auto-detected) | | `OPENAI_API_KEY` | OpenAI API key (auto-detected) | | `GROQ_API_KEY` | Groq API key (auto-detected) | | `XAI_API_KEY` | xAI/Grok API key (auto-detected) | | `MISTRAL_API_KEY` | Mistral API key (auto-detected) | | `MOONSHOT_API_KEY` | Moonshot API key (auto-detected) | | `GOOGLE_API_KEY` | Google Gemini API key (auto-detected) | | `OLLAMA_HOST` | Ollama server URL (default: `http://localhost:11434`) | | `JINA_API_KEY` | Jina embeddings API key (auto-detected) | | `VOYAGE_API_KEY` | Voyage embeddings API key (auto-detected) | | `COHERE_API_KEY` | Cohere embeddings API key (auto-detected) | | `SEMA_HOME` | Override Sema home directory (default: `~/.sema`) | | `SEMA_CHAT_MODEL` | Default chat model name | | `SEMA_CHAT_PROVIDER` | Preferred chat provider | | `SEMA_EMBEDDING_MODEL` | Default embedding model name | | `SEMA_EMBEDDING_PROVIDER` | Preferred 
embedding provider |
| `SEMA_REGISTRY_URL` | Override default package registry URL |
| `SEMA_RUNTIME_BASE_URL` | Override base URL for cross-compilation runtime downloads |
| `NO_COLOR` | Disable colored output when set |

## REPL Commands

| Command | Description |
| -------------- | ------------------------------------ |
| `,quit` / `,q` | Exit the REPL |
| `,help` / `,h` | Show help |
| `,env` | Show user-defined bindings |
| `,builtins` | List all built-in functions |
| `,type EXPR` | Evaluate expression and show its type |
| `,time EXPR` | Evaluate expression and show elapsed time |
| `,doc NAME` | Show info about a binding or special form |

```
sema> ,type 42
:int

sema> ,type '(1 2 3)
:list

sema> ,doc map
map : native-fn

sema> ,doc if
if : special form

sema> ,doc factorial
factorial : lambda (n)

sema> ,time (foldl + 0 (range 100000))
4999950000
elapsed: 58.424ms
```

## REPL Features

### Tab Completion

The REPL supports tab completion for:

* All built-in function names (e.g., `string/tr` → `string/trim`)
* Special forms (`def` → `define`, `defun`, `defmacro`, ...)
* User-defined bindings
* REPL commands (`,` → `,quit`, `,help`, `,env`, `,builtins`, `,type`, `,time`, `,doc`)

### Multiline Input

The REPL automatically detects incomplete expressions (unbalanced parentheses) and continues on the next line:

```
sema> (define (factorial n)
...     (if (= n 0)
...       1
...       (* n (factorial (- n 1)))))
sema> (factorial 10)
3628800
```

### Shadowing Warnings

The REPL warns when you accidentally redefine a built-in function:

```
sema> (define map 42)
warning: redefining builtin 'map'
```

This is only a warning — the redefinition still works. It helps catch accidental name collisions.

### History

Command history is saved to `~/.sema/history.txt` and persists across sessions.

## Error Messages

Sema provides detailed, colorized error messages with source context and actionable hints.
### Source Context Errors show the offending source line with a caret pointing to the problem: ``` Error: Reader error at 1:16: unterminated string --> script.sema:1:16 | 1 | (define name "hello | ^ hint: add a closing `"` to end the string ``` ### Type Errors Type errors show the actual value that caused the problem: ``` Error: Type error: expected number, got string ("hello") --> :1:1 | 1 | (+ "hello" 42) | ^ at + (:1:1) ``` ### Arity Errors When you pass the wrong number of arguments, the error shows what you called: ``` Error: Arity error: f expects 1 args, got 3 --> :1:18 | 1 | (define (f x) x) (f 1 2 3) | ^ at f (:1:18) note: in: (f 1 2 3) ``` ### Mismatched Brackets Mixed bracket types are caught with specific guidance: ``` Error: Reader error at 1:7: mismatched bracket: expected `]` to close `[`, found `)` hint: this vector was opened with `[` — close it with `]` ``` ### "Did You Mean?" Typos in function or variable names trigger fuzzy suggestions: ``` Error: Unbound variable: pritnln hint: Did you mean 'println'? ``` ### Lisp Dialect Hints If you use names from other Lisp dialects (Common Lisp, Clojure, Scheme), Sema provides targeted guidance: ``` Error: Unbound variable: setq hint: Sema uses 'set!' for variable assignment Error: Unbound variable: funcall hint: In Sema, functions are called directly: (f arg ...) ``` ### Stack Overflow Infinite recursion gets a helpful hint: ``` Error: Eval error: maximum eval depth exceeded (1024) hint: this usually means infinite recursion; ensure recursive calls are in tail position for TCO, or use 'do' for iteration ``` ### NO\_COLOR Support Set `NO_COLOR=1` to disable colored output, or pipe stderr to a file — Sema auto-detects non-TTY output and strips colors. --- --- url: 'https://sema-lang.com/docs/formatter.md' --- # Formatter Sema includes a built-in code formatter that enforces consistent style across your codebase. It preserves all comments, handles shebang lines, and produces idempotent output. 
## Usage ``` sema fmt [OPTIONS] [FILES...] ``` With no arguments, `sema fmt` formats all `.sema` files in the current directory recursively. ### Options | Flag | Description | | --- | --- | | `--check` | Check formatting without writing changes (exit 1 if unformatted) | | `--diff` | Print diff of formatting changes | | `--width ` | Max line width (default: `80`) | | `--indent ` | Indentation width for body forms (default: `2`) | | `--align` | Align consecutive similar forms (defines, cond clauses, let bindings) | ### Examples ```bash # Format all .sema files in current directory sema fmt # Format specific files sema fmt src/main.sema lib/utils.sema # Format with glob patterns sema fmt "src/**/*.sema" # Check formatting in CI (exits 1 if changes needed) sema fmt --check # Preview changes without writing sema fmt --diff # Use wider lines and 4-space indent sema fmt --width 100 --indent 4 # Enable decorative alignment sema fmt --align ``` ## Project Configuration Create a `sema.toml` file in your project root to set persistent formatting options. The formatter walks up from the current directory to find the nearest `sema.toml`. ```toml [fmt] width = 80 indent = 2 align = false ``` ### Options | Key | Type | Default | Description | | --- | --- | --- | --- | | `width` | integer | `80` | Maximum line width | | `indent` | integer | `2` | Number of spaces for body indentation | | `align` | boolean | `false` | Enable decorative column alignment | ### Precedence Settings are merged in this order (later wins): 1. **Defaults** — `width=80`, `indent=2`, `align=false` 2. **`sema.toml`** — project-level configuration 3. **CLI flags** — `--width`, `--indent`, `--align` override everything ```bash # sema.toml sets width=100, but CLI overrides to 120 sema fmt --width 120 ``` ## Formatting Rules ### Line Breaking The formatter uses a "try flat, then multi-line" strategy. If a form fits within the line width, it stays on one line. 
Otherwise, it breaks across multiple lines with appropriate indentation.

```scheme
;; Fits on one line
(+ 1 2 3)

;; Too long — breaks with body indentation
(define (calculate-fibonacci-sequence n)
  (if (< n 2)
    n
    (+ (calculate-fibonacci-sequence (- n 1))
       (calculate-fibonacci-sequence (- n 2)))))
```

### Form-Aware Indentation

The formatter recognizes Sema's special forms and applies context-appropriate indentation:

**Body forms** (`define`, `defn`, `fn`, `lambda`, `do`, `when`, `unless`, etc.) place the head and key arguments on the first line, then indent the body:

```scheme
(defn factorial (n)
  (if (< n 2)
    1
    (* n (factorial (- n 1)))))
```

**Binding forms** (`let`, `let*`, `letrec`, `when-let`, `if-let`) keep bindings aligned:

```scheme
(let ((x 1)
      (y 2)
      (z 3))
  (+ x y z))
```

**Clause forms** (`cond`, `case`, `match`) indent each clause:

```scheme
(cond
  ((= x 1) "one")
  ((= x 2) "two")
  (else "other"))
```

**Threading macros** (`->`, `->>`, `as->`, `some->`) indent each step:

```scheme
(->> data
  (filter even?)
  (map square)
  (reduce +))
```

**Conditionals** (`if`) place condition, then-branch, and else-branch on separate lines when they don't fit:

```scheme
(if (> x 0)
  "positive"
  "non-positive")
```

### Comment Preservation

All comments are preserved in their original positions — inline, trailing, and standalone:

```scheme
;; Module header comment
(define x 42) ; inline comment

;; Between forms
(define y 10)
```

### Decorative Alignment

When `--align` is enabled (or `align = true` in `sema.toml`), the formatter column-aligns consecutive similar forms for visual clarity. This is opt-in because it can cause noisier git diffs.

**Aligned defines:**

```scheme
(define x        1)
(define longer-y 2)
(define z        3)
```

**Aligned cond clauses:**

```scheme
(cond
  ((= x 1)   "one")
  ((= x 100) "hundred")
  (else      "other"))
```

Alignment groups are broken by blank lines, so you can control which forms get aligned together.
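The `sema.toml` lookup described under Project Configuration (walk up from the current directory and use the nearest file) can be pictured with a short shell sketch. This is an illustration of the search order only, not the formatter's actual implementation. `find_sema_toml` is a hypothetical helper, and the demo runs in a throwaway directory tree.

```bash
#!/usr/bin/env bash
# Sketch: locate the nearest sema.toml at or above a starting directory.
set -euo pipefail

# find_sema_toml DIR: print the path of the nearest sema.toml, or return 1.
# (Hypothetical helper for illustration; not part of the sema CLI.)
find_sema_toml() {
  local dir
  dir="$(cd "$1" && pwd -P)"
  while :; do
    if [ -f "$dir/sema.toml" ]; then
      printf '%s\n' "$dir/sema.toml"
      return 0
    fi
    # Stop once the filesystem root has been checked.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}

# Demo: project/sema.toml exists, and the search starts two levels below it.
root="$(mktemp -d)"
mkdir -p "$root/project/src/lib"
printf '[fmt]\nwidth = 100\n' > "$root/project/sema.toml"
find_sema_toml "$root/project/src/lib"
```

Because the nearest file wins, a `sema.toml` in a subdirectory would shadow one at the repository root for files formatted from inside that subdirectory.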
---

---
url: 'https://sema-lang.com/docs/shell-completions.md'
---

# Shell Completions

Sema can generate tab-completion scripts for your shell, giving you completions for all CLI flags, options, and subcommands.

```
sema completions
```

Supported shells: `bash`, `zsh`, `fish`, `elvish`, `powershell`.

::: tip Quick Install
For Zsh, Bash, Fish, and Elvish, you can auto-install completions to the standard location:

```bash
sema completions zsh --install
```

:::

## Zsh

### macOS (with oh-my-zsh or custom fpath)

```bash
mkdir -p ~/.zsh/completions
sema completions zsh > ~/.zsh/completions/_sema
```

Make sure your `~/.zshrc` includes the directory in `fpath` **before** `compinit` is called:

```bash
fpath=(~/.zsh/completions $fpath)
autoload -Uz compinit && compinit
```

If you use oh-my-zsh, add the `fpath` line before `source $ZSH/oh-my-zsh.sh` (oh-my-zsh calls `compinit` for you).

### Linux

```bash
# User-local (no sudo required)
mkdir -p ~/.zsh/completions
sema completions zsh > ~/.zsh/completions/_sema
```

Add to `~/.zshrc` (before `compinit`):

```bash
fpath=(~/.zsh/completions $fpath)
autoload -Uz compinit && compinit
```

Or install system-wide:

```bash
# The redirect must run as root, so pipe into `sudo tee` instead of using `sudo ... >`
sema completions zsh | sudo tee /usr/local/share/zsh/site-functions/_sema > /dev/null
```

Then restart your shell or run `exec zsh`.

## Bash

### macOS

macOS ships with an older Bash. Install `bash-completion` via Homebrew:

```bash
brew install bash-completion@2
```

Then generate and install the completion script:

```bash
mkdir -p ~/.local/share/bash-completion/completions
sema completions bash > ~/.local/share/bash-completion/completions/sema
```

### Linux

```bash
# User-local
mkdir -p ~/.local/share/bash-completion/completions
sema completions bash > ~/.local/share/bash-completion/completions/sema

# Or system-wide (pipe into `sudo tee` so the write itself runs as root)
sema completions bash | sudo tee /etc/bash_completion.d/sema > /dev/null
```

Then restart your shell or run `source ~/.bashrc`.
## Fish

```bash
sema completions fish > ~/.config/fish/completions/sema.fish
```

This works on both macOS and Linux. Completions are picked up automatically on the next shell session.

## PowerShell

```powershell
# Create the completions directory if it doesn't exist
New-Item -ItemType Directory -Force -Path (Split-Path -Parent $PROFILE)

# Append the completion script to your profile
sema completions powershell >> $PROFILE
```

Restart PowerShell to activate.

## Elvish

```bash
sema completions elvish > ~/.config/elvish/lib/sema.elv
```

Then add `use sema` to `~/.config/elvish/rc.elv`.

## Verifying

After installing, restart your shell and type `sema ` then press Tab. You should see completions for flags and subcommands.

---

---
url: 'https://sema-lang.com/docs/editors.md'
---

# Editor Support

Sema has editor plugins for VS Code, IntelliJ IDEA, Vim/Neovim, Emacs, Helix, and Zed. All plugins provide syntax highlighting for the full standard library, special forms, keyword literals, character literals, strings, numbers, comments, and LLM primitives.

Sema also includes a built-in [Language Server (LSP)](/docs/lsp) that provides diagnostics, completion, hover, go-to-definition, and code lenses. See the [LSP documentation](/docs/lsp) for setup instructions and feature details.

Source code for all editor plugins is in the [`editors/`](https://github.com/HelgeSverre/sema/tree/main/editors) directory.

## VS Code

TextMate grammar-based extension with full syntax highlighting, bracket matching, auto-closing pairs, comment toggling, and indentation support.
### Install ```bash EXT_DIR=~/.vscode/extensions/helgesverre.sema-0.1.0 mkdir -p "$EXT_DIR/syntaxes" BASE=https://raw.githubusercontent.com/HelgeSverre/sema/main/editors/vscode/sema curl -fsSL "$BASE/package.json" -o "$EXT_DIR/package.json" curl -fsSL "$BASE/language-configuration.json" -o "$EXT_DIR/language-configuration.json" curl -fsSL "$BASE/syntaxes/sema.tmLanguage.json" -o "$EXT_DIR/syntaxes/sema.tmLanguage.json" curl -fsSL "$BASE/icon.png" -o "$EXT_DIR/icon.png" ``` Restart VS Code after installing. ### Features * Syntax highlighting (special forms, builtins, LLM primitives, keywords, strings, numbers, booleans, character literals, comments) * Bracket matching and auto-closing for `()`, `[]`, `{}`, `""` * Comment toggling (Ctrl+/ / Cmd+/) * Indentation rules for all bracket types * Arithmetic/comparison operator highlighting ## IntelliJ IDEA Full IDE support via the [LSP4IJ](https://plugins.jetbrains.com/plugin/23257-lsp4ij) plugin, connecting to the built-in Sema [Language Server](/docs/lsp) for completions, diagnostics, hover docs, go-to-definition, code lenses, and more. ### Requirements * IntelliJ IDEA 2024.1+ (Community or Ultimate) * [LSP4IJ](https://plugins.jetbrains.com/plugin/23257-lsp4ij) plugin (installed automatically as a dependency) * `sema` binary on PATH (or set the `SEMA_PATH` environment variable) ### Install Build and install from source: ```bash cd editors/intellij ./gradlew buildPlugin ``` Then install the generated ZIP: 1. Open **Settings → Plugins → ⚙️ → Install Plugin from Disk…** 2. Select `editors/intellij/build/distributions/sema-intellij-*.zip` 3. 
Restart the IDE ### Features * Syntax highlighting (special forms, builtins, keywords, strings, numbers, booleans, character literals, comments, regex literals) * Code completion — builtins, special forms, user-defined symbols, scope-aware local bindings * Hover documentation — builtin docs, function signatures, import info * Go to definition — user definitions, cross-module navigation, import path resolution * Find references — scope-aware, local and cross-file * Rename — scope-aware, blocks renaming builtins and special forms * Diagnostics — real-time parse errors and compile-time warnings * Code lenses — ▶ Run top-level forms with inline result display * Brace matching — auto-pair `()`, `[]`, `{}` * Commenting — line (`;`) and block (`#| |#`) comments * Run configurations — right-click `.sema` files to run, or create from the Run menu * File icons — `.sema` source and `.semac` bytecode * Color settings page — customizable syntax colors under **Settings → Editor → Color Scheme → Sema** ### Configuration Set the `SEMA_PATH` environment variable to the path of your `sema` binary if it's not on PATH: ```bash export SEMA_PATH=/path/to/sema ``` ## Vim / Neovim Pure Vimscript plugin with syntax highlighting, filetype detection, and Lisp-aware indentation. ### vim-plug ```vim Plug 'helgesverre/sema', { 'rtp': 'editors/vim' } ``` ### lazy.nvim ```lua { "helgesverre/sema", config = function(plugin) vim.opt.rtp:append(plugin.dir .. 
"/editors/vim") end, } ``` ### Manual (Vim) ```bash mkdir -p ~/.vim/syntax ~/.vim/ftdetect ~/.vim/ftplugin BASE=https://raw.githubusercontent.com/HelgeSverre/sema/main/editors/vim curl -fsSL "$BASE/syntax/sema.vim" -o ~/.vim/syntax/sema.vim curl -fsSL "$BASE/ftdetect/sema.vim" -o ~/.vim/ftdetect/sema.vim curl -fsSL "$BASE/ftplugin/sema.vim" -o ~/.vim/ftplugin/sema.vim ``` ### Manual (Neovim) ```bash mkdir -p ~/.config/nvim/syntax ~/.config/nvim/ftdetect ~/.config/nvim/ftplugin BASE=https://raw.githubusercontent.com/HelgeSverre/sema/main/editors/vim curl -fsSL "$BASE/syntax/sema.vim" -o ~/.config/nvim/syntax/sema.vim curl -fsSL "$BASE/ftdetect/sema.vim" -o ~/.config/nvim/ftdetect/sema.vim curl -fsSL "$BASE/ftplugin/sema.vim" -o ~/.config/nvim/ftplugin/sema.vim ``` ### Features * Full syntax highlighting (special forms, builtins, LLM primitives, keywords, character literals, comments) * Automatic filetype detection for `.sema` files * Lisp-aware indentation with correct `lispwords` for all Sema special forms * Comment string configured for `;` ## Emacs Major mode derived from `prog-mode` with Lisp-aware indentation, REPL integration, and imenu support. ### Manual ```bash mkdir -p ~/.emacs.d/site-lisp curl -fsSL https://raw.githubusercontent.com/HelgeSverre/sema/main/editors/emacs/sema-mode.el \ -o ~/.emacs.d/site-lisp/sema-mode.el ``` ```elisp (add-to-list 'load-path "~/.emacs.d/site-lisp") (require 'sema-mode) ``` ### use-package ```elisp (use-package sema-mode :load-path "~/.emacs.d/site-lisp" :mode "\\.sema\\'") ``` ### Doom Emacs In `packages.el`: ```elisp (package! sema-mode :recipe (:local-repo "~/.emacs.d/site-lisp")) ``` In `config.el`: ```elisp (use-package! 
sema-mode :mode "\\.sema\\'") ``` ### Features * Syntax highlighting (special forms, builtins, keyword literals, booleans, character literals, numbers, strings, comments) * Buffer-local Lisp indentation with Sema-specific form rules * REPL integration — start a Sema REPL and send code interactively * imenu support for navigating `defun`, `define`, `defmacro`, `defagent`, `deftool`, and `define-record-type` definitions * Electric pairs for `()`, `[]`, `{}`, `""` * Proper sexp navigation with quote (`'`), quasiquote (`` ` ``), and unquote (`,`) prefix handling ### Key Bindings | Key | Command | Description | | --------- | --------------------- | -------------------------------- | | `C-c C-z` | `sema-repl` | Start or switch to the Sema REPL | | `C-c C-e` | `sema-send-last-sexp` | Send sexp before point to REPL | | `C-c C-r` | `sema-send-region` | Send selected region to REPL | | `C-c C-b` | `sema-send-buffer` | Send entire buffer to REPL | | `C-c C-l` | `sema-run-file` | Run current file with `sema` | ### Configuration ```elisp ;; Path to the sema binary (default: "sema") (setq sema-program "/path/to/sema") ``` ## Helix Syntax highlighting using the dedicated [tree-sitter-sema](https://github.com/helgesverre/tree-sitter-sema) grammar, with Sema-specific highlight queries, text objects, and indentation. ### Install 1. Download and append the language config to your Helix configuration: ```bash BASE=https://raw.githubusercontent.com/HelgeSverre/sema/main/editors/helix curl -fsSL "$BASE/languages.toml" >> ~/.config/helix/languages.toml ``` > If you already have a `languages.toml`, manually merge the `[[language]]` and `[[grammar]]` sections. 2. Download the query files: ```bash mkdir -p ~/.config/helix/runtime/queries/sema for f in highlights indents textobjects injections; do curl -fsSL "$BASE/queries/sema/$f.scm" \ -o ~/.config/helix/runtime/queries/sema/$f.scm done ``` 3. Fetch and build the Sema grammar: ```bash hx --grammar fetch hx --grammar build ``` 4. 
Verify: ```bash hx --health sema ``` ### Features * Syntax highlighting via tree-sitter queries (special forms, builtins, LLM primitives, keywords, booleans, character literals, strings, comments) * Text objects — `maf`/`mif` for function definitions, `mac`/`mic` for agent/tool definitions * Smart auto-pairs for `()`, `[]`, `{}`, `""` * Indentation support * `;` line comments ### How It Works The `grammar = "sema"` setting tells Helix to parse `.sema` files using the [tree-sitter-sema](https://github.com/helgesverre/tree-sitter-sema) grammar, which provides native support for Sema-specific syntax like keyword literals (`:name`), hash maps, and vectors. Custom query files in `queries/sema/` provide Sema-specific captures for LLM primitives, slash-namespaced builtins (`string/trim`, `llm/chat`), and special forms like `defagent` and `deftool`. ## Zed Extension using the dedicated [tree-sitter-sema](https://github.com/helgesverre/tree-sitter-sema) grammar with full syntax highlighting, bracket matching, code outline, and auto-pairs. ### Install 1. Open Zed 2. Go to **Zed → Extensions** (or Cmd+Shift+X) 3. Click **Install Dev Extension** 4. Select the `editors/zed` directory from the Sema repository ### Features * Syntax highlighting (special forms, builtins, LLM primitives, keyword literals, booleans, `nil`, strings, comments) * Smart auto-pairs for `()`, `[]`, `{}`, `""` * Code outline for `define`, `defun`, `defmacro`, `defagent`, `deftool` * Bracket matching * `;` line comments * 2-space indentation --- --- url: 'https://sema-lang.com/docs/lsp.md' --- # Language Server (LSP) Sema includes a built-in [Language Server Protocol](https://microsoft.github.io/language-server-protocol/) server that provides IDE features for any editor with LSP support. The server communicates over stdio using the standard LSP JSON-RPC protocol. 
```bash sema lsp ``` ::: warning VS Code users The VS Code extension for Sema currently provides **syntax highlighting only** and does not yet wire up the language server. The features documented below are available today in any editor that speaks LSP directly (Neovim, Helix, Emacs `eglot`/`lsp-mode`, Zed, Sublime LSP). VS Code LSP integration is planned. ::: ## Features ### Diagnostics Real-time error reporting as you type. The LSP runs two analysis passes: * **Parse diagnostics** (errors) — syntax errors like unclosed parentheses, unterminated strings, and invalid tokens. Uses error recovery to report multiple errors at once. * **Compile diagnostics** (warnings) — deeper issues caught by the bytecode compilation pipeline, such as unbound variables, arity mismatches, and invalid special form usage. Only runs when parsing succeeds (no false positives from incomplete code). ### Completion Context-aware autocompletion triggered by `(` and space characters. Completes from three sources: * **Special forms** — `define`, `defun`, `lambda`, `if`, `cond`, `let`, `match`, `try`, `import`, etc. * **Standard library** — all built-in functions including namespaced ones like `string/trim`, `file/read`, `http/get` * **User definitions** — top-level `define`, `defun`, `defn`, `defmacro`, `defagent`, and `deftool` forms in the current document. Cached definitions survive syntax errors while typing. ### Hover Hover over any symbol to see documentation: * **Builtins & special forms** — shows documentation pulled from the stdlib reference docs, including descriptions and usage examples * **User-defined functions** — shows the function signature with parameter list * **Imported symbols** — shows the signature and which module it was imported from * **Other builtins** — shows the symbol name and whether it's a special form or built-in function Hover continues to work even when the file has syntax errors — the LSP uses error recovery to parse as much as possible. 
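For scripts or editor plugins that talk to `sema lsp` directly, a hover lookup is an ordinary LSP `textDocument/hover` request, framed with a `Content-Length` header as the LSP base protocol requires. The sketch below only builds the bytes; the document URI and cursor position are hypothetical, and the helper names are illustrative rather than part of Sema.

```python
import json


def frame(message: dict) -> bytes:
    """Encode one JSON-RPC message with LSP base-protocol framing."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def hover_request(request_id: int, uri: str, line: int, character: int) -> dict:
    """Build a textDocument/hover request (LSP positions are zero-based)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/hover",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


# Hypothetical file and position, for illustration only.
wire = frame(hover_request(1, "file:///tmp/example.sema", 0, 1))
```

In a real session the client would first complete the `initialize` handshake and send `textDocument/didOpen` for the file before the server can answer hover requests.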
### Go to Definition Jump to the definition of symbols with precise cursor targeting: * **User definitions** — `define`, `defun`, `defn`, `defmacro`, `defagent`, `deftool` — jumps directly to the **name** of the definition (e.g., `foo` in `(defun foo ...)`), not the entire form * **Cross-file definitions** — if a symbol is not defined locally, the LSP follows `import` and `load` paths to find the definition in other files. Imported file parse results are cached by modification time for performance. * **Imports** — go-to-definition on `(import "utils.sema")` or `(load "config.sema")` opens the referenced file. Supports relative paths, absolute paths, and package imports Go-to-definition works even when the file has syntax errors. ### Find All References Find every occurrence of a symbol across all open documents. Searches all files currently tracked by the LSP server. ### Document Symbols Outline view and breadcrumbs for the current file. Lists all top-level definitions (`define`, `defun`, `defn`, `defmacro`, `defagent`, `deftool`) with their kind, full form range, and precise name selection range. ### Workspace Symbols Fuzzy search for symbols across all open documents. Triggered via the command palette or keybinding (e.g., `Ctrl+T` in VS Code). Matches symbol names case-insensitively. ### Signature Help Parameter hints shown while typing inside function calls. 
Triggered by `(` and space characters: * **User-defined functions** — shows the function name and parameter list with active parameter highlighting * **Imported functions** — shows the signature from the imported module * **Built-in functions** — shows the function documentation from the stdlib reference ### Rename Rename a user-defined symbol across all open documents: * **Prepare rename** — verifies the cursor is on a renameable symbol (not a built-in or special form) and returns its range * **Rename** — finds all occurrences across all open documents and generates text edits ### Code Lenses Every top-level expression shows a **▶ Run** code lens above it. Clicking it evaluates all forms up to and including that expression in a sandboxed subprocess using `sema eval`, and reports the result (value, stdout, stderr, timing) back to the editor via a custom `sema/evalResult` notification. ### Formatting Whole-document formatting (`textDocument/formatting`) powered by the same engine as the `sema fmt` CLI. Reformat Code normalizes spacing, indentation, and line breaks. Returns no edits when the source has syntax errors, so an unparseable buffer is never disturbed. **Range formatting** (`textDocument/rangeFormatting`) formats a selection. Because formatting a partial s-expression is unsafe in a Lisp, the server expands the selection to the smallest set of *whole* top-level forms it overlaps, formats those, and returns edits scoped to that span. A selection that touches no complete form (e.g. blank space between forms) is a no-op, and an unparseable buffer is left untouched. ### Selection Range Structural selection (`textDocument/selectionRange`) for Extend/Shrink Selection. Expands the selection outward through enclosing s-expressions — from the symbol under the cursor to its containing list and on up to the top-level form. Also backs code-block navigation. ### Call Hierarchy `textDocument/prepareCallHierarchy` with incoming and outgoing calls. 
Incoming calls list every definition whose body invokes the target; outgoing calls list the known definitions invoked from the target's body. ### Go to Declaration `textDocument/declaration` resolves to the same target as Go to Definition (Sema has no separate forward declarations). ### Document Links `import` and `load` path strings are rendered as clickable links (`textDocument/documentLink`) that open the referenced file. ## Editor Setup ### Helix Add to your `~/.config/helix/languages.toml`: ```toml [[language]] name = "sema" language-servers = ["sema-lsp"] [language-server.sema-lsp] command = "sema" args = ["lsp"] ``` ### Neovim Using [nvim-lspconfig](https://github.com/neovim/nvim-lspconfig): ```lua local lspconfig = require('lspconfig') local configs = require('lspconfig.configs') if not configs.sema then configs.sema = { default_config = { cmd = { 'sema', 'lsp' }, filetypes = { 'sema' }, root_dir = lspconfig.util.root_pattern('sema.toml', '.git'), }, } end lspconfig.sema.setup({}) ``` ### Zed Zed can be configured to use the LSP by adding a language server entry. See the [Zed documentation](https://zed.dev/docs/languages) for configuring custom language servers. ### VS Code The VS Code extension does not yet include LSP integration. For now, the extension provides syntax highlighting only. LSP support is planned. ## Architecture The LSP server uses [tower-lsp](https://github.com/ebkalderon/tower-lsp) and a dedicated backend thread architecture: * **Async layer** — tower-lsp handles the JSON-RPC protocol over stdio. Async handlers forward requests to the backend thread via `tokio::sync::mpsc` channels and receive responses via `tokio::sync::oneshot` channels. * **Backend thread** — a single `std::thread` owns all `Rc`-based state (parsed ASTs, interpreter environment, document cache). This avoids `Send`/`Sync` constraints while keeping the server responsive. 
* **Import cache**: parsed results for imported files are cached by file path and modification time, avoiding redundant re-parsing on every request.
* **Subprocess execution**: code lens "Run" commands spawn a separate `sema eval` process in a sandboxed environment, keeping the backend thread free for diagnostics and completions.

### Custom Notifications

The server sends a custom `sema/evalResult` notification after executing a code lens. The payload includes:

| Field       | Type    | Description                    |
| ----------- | ------- | ------------------------------ |
| `uri`       | string  | Document URI                   |
| `range`     | Range   | Range of the evaluated form    |
| `kind`      | string  | Always `"run"`                 |
| `value`     | string? | Return value (if successful)   |
| `stdout`    | string  | Captured stdout                |
| `stderr`    | string  | Captured stderr                |
| `ok`        | boolean | Whether evaluation succeeded   |
| `error`     | string? | Error message (if failed)      |
| `elapsedMs` | number  | Execution time in milliseconds |

---

---
url: 'https://sema-lang.com/docs/dap.md'
---

# Debugger (DAP)

Sema includes a built-in [Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/) (DAP) server for step-by-step debugging of Sema programs. It runs over standard I/O (stdin/stdout) and speaks the standard DAP wire format (JSON messages framed by `Content-Length` headers), so it integrates directly with any editor that has a DAP client.

```bash
sema dap
```

::: info Runs on the bytecode VM
The debugger operates on the stack-based bytecode VM, which is Sema's sole evaluator. Programs debugged via DAP are compiled to bytecode automatically upon launch.
:::

## Features

The `sema dap` server implements the core features of the Debug Adapter Protocol:

### Launch Configuration

* **Program Target**: Specify the absolute path to the `.sema` file to debug.
* **Stop on Entry**: Set `stopOnEntry: true` to pause execution on the first bytecode instruction before any user forms run, allowing you to set up breakpoints or inspect the entry environment.
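The launch settings above travel to the adapter as the `arguments` of a DAP `launch` request. As a rough sketch of what an editor sends, the Python below frames such a request with a `Content-Length` header and decodes it again; the program path is hypothetical, and `encode`/`decode` are illustrative helpers, not part of Sema.

```python
import io
import json


def encode(message: dict) -> bytes:
    """Frame one DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def decode(stream) -> dict:
    """Read one framed DAP message from a binary stream."""
    length = None
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line ends the headers; b"" is EOF
            break
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(stream.read(length))


launch = {
    "seq": 2,
    "type": "request",
    "command": "launch",
    "arguments": {
        "program": "/home/me/project/main.sema",  # hypothetical absolute path
        "stopOnEntry": True,
    },
}

round_tripped = decode(io.BytesIO(encode(launch)))
```

A full session also involves an `initialize` request before `launch` and usually `setBreakpoints` and `configurationDone` afterwards; editors handle that sequence for you.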
### Breakpoints * **Dynamic Breakpoints**: Set, toggle, or clear line breakpoints before launching or in real-time while execution is paused. * **Breakpoint Verification**: The compiler maps source file line numbers to bytecode instructions and returns verified locations back to the editor. * **Conditional Breakpoints**: Attach a `condition` expression to a breakpoint; execution only stops there when the condition evaluates truthy in the paused frame (using the same evaluator as `evaluate`/hover). A condition that errors fails open — it stops — so the problem surfaces rather than being silently swallowed. * **Exception Breakpoints**: Enable the `uncaught` filter to stop on a runtime error that escapes to the top level (errors handled by `try`/`catch` do not trigger it). The `exceptionInfo` request reports the error message. Note: at an uncaught-exception stop the VM has already unwound its frames, so the stack/variables there are best-effort — the message is the load-bearing detail. ### Stepping Controls * **Step Over** (`next`): Execute the current form and pause at the next sibling expression in the same frame. * **Step Into** (`stepIn`): Follow execution inside user-defined functions or lambda expressions. * **Step Out** (`stepOut`): Execute all remaining instructions in the active function frame and pause immediately upon returning to the caller. * **Continue**: Resume program execution until the next breakpoint is hit or the program finishes. * **Pause**: Pause execution of a running VM thread. ### Call Stack Inspection * Renders the call hierarchy stack trace with frame IDs, function/closure names, active line numbers, columns, and absolute source file paths. ### Variable & Scope Inspection * **Locals, Closure & Globals**: Separates variables into local bindings (inside `let`, function parameters), captured **upvalues** (shown by name under a *Closure* scope), and global variables. 
Locals are scoped to the current instruction, so bindings that are not yet in scope or whose block has already exited are not shown.
* **Type Inspection**: Displays the value alongside its data type (e.g., `list`, `vector`, `map`, `string`, `number`, `boolean`, `closure`).
* **Complex Objects**: Nested objects (lists, vectors, maps, hash maps, byte vectors, and named **record fields**) expand lazily through hierarchical variable references.

### Evaluate & Set Values (while paused)

* **Evaluate Expressions**: Evaluate arbitrary Sema expressions in the context of the selected stack frame (editor watch expressions, hover, and the debug console REPL). A top-level `(set! …)` of an in-scope local, upvalue, or global is written back to the running program.
* **Set Variable**: Edit a local or upvalue value in place from the editor's Variables pane; the new value is written through to the live frame.

### Console Output Redirection

* Anything the program writes to `stdout` or `stderr` is intercepted and forwarded as standard DAP `output` events. This keeps program output from corrupting the protocol stream on the server's own stdout while still showing it in the editor's debug console.

***

## Editor Setup

### Helix

Helix has a built-in DAP client (marked experimental in the Helix documentation). Merge the following configuration into your `~/.config/helix/languages.toml`:

```toml
[[language]]
name = "sema"
language-servers = ["sema-lsp"]

# Debug adapter used for this language
[language.debugger]
name = "sema-dap"
transport = "stdio"
command = "sema"
args = ["dap"]

# Launch template: prompts for a file and passes it as `program`
[[language.debugger.templates]]
name = "launch"
request = "launch"
completion = [ { name = "program", completion = "filename" } ]
args = { program = "{0}" }
```

To debug in Helix:

1. Open a `.sema` file.
2. Press Space + G (capital G) to open the debug menu.
3. Select `launch` and press Enter to start.
### VS Code

Add a debug launch configuration to your project's `.vscode/launch.json`:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "sema",
      "request": "launch",
      "name": "Debug Sema Script",
      "program": "${file}",
      "stopOnEntry": true
    }
  ]
}
```

### Neovim

Using [nvim-dap](https://github.com/mfussenegger/nvim-dap), register the `sema` adapter and configuration:

```lua
local dap = require('dap')

dap.adapters.sema = {
  type = 'executable',
  command = 'sema',
  args = { 'dap' }
}

dap.configurations.sema = {
  {
    type = 'sema',
    request = 'launch',
    name = "Launch file",
    program = "${file}",
    stopOnEntry = true,
  },
}
```

### Emacs

Using [dap-mode](https://github.com/emacs-lsp/dap-mode), register a debug provider for the `sema` type and a launch template:

```elisp
(require 'dap-mode)

;; Tell dap-mode how to start the adapter for configurations of :type "sema"
(dap-register-debug-provider
 "sema"
 (lambda (conf)
   (plist-put conf :dap-server-path '("sema" "dap"))))

(dap-register-debug-template
 "Sema Launch"
 (list :type "sema"
       :request "launch"
       :name "Sema Debug"
       :program "${file}"
       :stopOnEntry t))
```

***

## Architecture

The DAP server bridges an async frontend and a synchronous backend, so that standard I/O stays responsive while the single-threaded VM executes:

* **Frontend Async Loop**: Runs a Tokio event loop handling stdin/stdout messages. Incoming requests are parsed into DAP protocol actions, and outgoing events (such as program output or stop events) are serialized to stdout.
* **Backend Executor Thread**: Runs VM bytecode execution on a dedicated OS thread.
* **VM Debug Hook**: Uses the `sema-vm::VM` debug hook (`execute_debug`). On every instruction, the hook checks whether a breakpoint was hit, a step has completed, or a pause was requested, then updates `DebugState` and notifies the async frontend.

---

---
url: 'https://sema-lang.com/docs/mcp.md'
---

# Model Context Protocol (MCP)

Sema includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server.
This allows LLM clients (such as Claude Desktop, Cursor, or Claude Code) to inspect, compile, format, evaluate, and build Sema code in your host environment, as well as execute user-defined Lisp tools. The server communicates over standard input/output (stdio) using JSON-RPC 2.0. ```bash sema mcp ``` *** ## Default MCP Tools When started, the MCP server exposes a set of core developer tools: | Tool Name | Description | Parameters | |---|---|---| | `run_file` | Run a `.sema` or `.semac` file and get output/return value | `file_path` (string), `arguments` (array of strings, optional) | | `compile` | Compile a `.sema` file to `.semac` bytecode | `source_path` (string), `output_path` (string, optional) | | `eval` | Evaluate a single Sema expression string and capture output | `code` (string) | | `docs` | Retrieve docstring and signature details for any symbol | `symbol` (string) | | `fmt` | Format a Sema file or code string | `file_path` (string, optional), `code` (string, optional) | | `disasm` | Disassemble a `.sema` or `.semac` file to VM instructions | `file_path` (string) | | `build` | Compile a `.sema` file into a standalone executable | `source_path` (string), `output_path` (string) | | `info` | Get environment and version information about the server | None | ### Path Resolution All file paths passed to these tools are resolved relative to the current working directory (CWD) of the MCP server process. Both absolute and relative paths are supported. *** ## Filepath Mode & Custom Tools You can expose custom tools defined in your Sema scripts to the LLM client by passing filepaths when starting the server: ```bash sema mcp tools/receipts.sema ``` When started in filepath mode, the server instantiates the interpreter context, evaluates the specified files, and automatically exposes any tools defined via the `deftool` special form. 
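Whether a tool is one of the defaults or comes from a loaded file, an MCP client invokes it with a JSON-RPC 2.0 `tools/call` request. Over the stdio transport each message is a single line of JSON terminated by a newline. The sketch below builds such lines for the `eval` and `docs` tools using the parameter names from the table above; `tools_call` is an illustrative helper, not part of Sema.

```python
import json


def tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message) + "\n"


eval_line = tools_call(1, "eval", {"code": "(+ 1 2)"})
docs_line = tools_call(2, "docs", {"symbol": "string/trim"})
```

A real client sends an `initialize` request and the `notifications/initialized` notification before calling tools, and can discover the available tool names with `tools/list`.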
### Defining a Custom Tool: PDF Receipt Extractor Here is a real-world example of an MCP tool that reads a receipt PDF and extracts structured data from it by combining Sema's [PDF processing](./stdlib/pdf) and [LLM Structured Extraction](./llm/extraction): ```sema ;; tools/receipts.sema (deftool extract-receipt "Extract structured transaction data (merchant, amount, currency, date, line items) from a PDF invoice/receipt." {:pdf-path {:type :string :description "Path to the invoice/receipt PDF file (e.g. invoice.pdf)"}} (lambda (pdf-path) (if (not (file/exists? pdf-path)) (error (string-append "Receipt file not found: " pdf-path)) (begin (llm/auto-configure) ;; Extract text and clean up whitespace for LLM processing (define text (text/clean-whitespace (pdf/extract-text pdf-path))) ;; Call structured LLM extraction (llm/extract {:vendor {:type :string :description "Name of the merchant"} :amount {:type :number :description "Total bill amount"} :currency {:type :string :description "3-letter currency code (e.g. USD, EUR)"} :date {:type :string :description "Date of transaction in YYYY-MM-DD format"} :line-items {:type :array :description "List of individual items purchased" :items {:type :object :properties {:description {:type :string} :price {:type :number}}}}} text))))) ``` *** ## Standalone Binary Mode Sema's `build` command compiles scripts into standalone native executables. Every compiled standalone binary has built-in MCP server capabilities out-of-the-box: ```bash # Compile the custom receipt tool sema build tools/receipts.sema -o receipt-extractor # Run as a normal CLI tool (from your shell/scripts) ./receipt-extractor --pdf-path invoice.pdf # Start the stdio MCP server exposing the embedded tools ./receipt-extractor --mcp ``` When started with `--mcp`, the executable evaluates its embedded bytecode (which registers the `extract-receipt` tool definition in the environment) and then transitions to starting the stdio MCP server loop. 
*** ## Tool Filtering & Visibility When loading files in filepath mode, you can control which tools are exposed to the LLM: ### 1. Private Prefix Any tool whose name begins with an underscore (e.g., `_secret-helper`) is treated as a private helper and excluded from discovery. ### 2. Declarative Metadata You can declare a tool as private by adding `:mcp/expose #f` or `:private #t` in its parameters metadata map: ```sema (deftool internal-helper "Not visible to MCP clients" {:mcp/expose #f} (lambda () (println "internal"))) ``` ### 3. Command Line Filters You can explicitly include or exclude tools using the `--include` and `--exclude` flags: ```bash # Only expose the receipt extractor tool sema mcp tools/receipts.sema --include extract-receipt # Expose all tools except order-pineapple-pizza (which we probably shouldn't be running automatically) sema mcp tools/receipts.sema --exclude order-pineapple-pizza ``` *** ## Stateful Notebook Tools The MCP server exposes a set of stateful notebook management and evaluation tools to allow LLMs to directly read, write, and execute cell-based `.sema-nb` files. 
| Tool Name | Description | Parameters | |---|---|---| | `notebook/new` | Create a new empty `.sema-nb` notebook | `path` (string), `title` (string, optional), `overwrite` (boolean, optional — defaults to `false`; creation fails if a file already exists at `path`) | | `notebook/read` | Read the structure, source, and outputs of a notebook | `path` (string) | | `notebook/add_cell` | Append or insert a new cell (code or markdown) | `path` (string), `type` (string: "code"/"markdown"), `source` (string), `after_id` (string, optional) | | `notebook/update_cell` | Update the source/type of an existing cell | `path` (string), `id` (string), `source` (string, optional), `type` (string, optional) | | `notebook/delete_cell` | Delete a cell from a notebook | `path` (string), `id` (string) | | `notebook/eval_cell` | Evaluate a single code cell | `path` (string), `id` (string) | | `notebook/eval_all` | Evaluate all code cells in order | `path` (string) | | `notebook/export` | Export a notebook to Markdown or a clean `.sema` script | `path` (string), `format` (string: "markdown"/"source"), `output_path` (string, optional) | ### In-Memory State Caching To support interactive cell execution (where Cell 2 relies on variables or functions defined in Cell 1), the MCP server maintains an in-memory cache of notebook evaluation engines mapped by their canonical file paths. When a cell is evaluated, the cached engine runs the code, updates the cell output, saves the updated JSON representation back to disk, and returns the result, ensuring state is preserved across consecutive tool calls. --- --- url: 'https://sema-lang.com/docs/notebook.md' --- # Notebook Sema includes a Jupyter-inspired notebook interface for interactive development. Write code in cells, evaluate them individually or all at once, and see results inline — all in the browser. 
## Quick Start ```bash # Create a new notebook sema notebook new my-notebook.sema-nb # Open in the browser sema notebook serve my-notebook.sema-nb ``` This starts a local server at `http://localhost:8888` with the notebook UI. ## Cell Types ### Code Cells Code cells contain Sema expressions. Evaluate with **Shift+Enter** (run and advance) or **Cmd/Ctrl+Enter** (run and stay). Cells share a persistent environment — definitions in earlier cells are visible in later ones: ```sema ;; Cell 1 (define greet (fn (name) (format "Hello, ~a!" name))) ;; Cell 2 — can use greet from Cell 1 (greet "Sema") ;=> "Hello, Sema!" ``` Output from `println`, `display`, and `print` is captured and shown in the cell output area. ### Markdown Cells Markdown cells render formatted text for documentation, section headers, and notes. Supports headings, bold, italic, inline code, code blocks, and lists. Click rendered markdown to edit. Press **Shift+Enter** to re-render. ## Keyboard Shortcuts | Shortcut | Action | | ---------------- | ----------------------------- | | `Shift+Enter` | Run cell and advance to next | | `Cmd/Ctrl+Enter` | Run cell and stay focused | | `Cmd/Ctrl+S` | Save notebook | | `Tab` | Insert 2 spaces | | `Escape` | Deselect cell | ## Toolbar | Button | Action | | --------------- | ------------------------------------------- | | **+ Code** | Add a new code cell at the end | | **+ Markdown** | Add a new markdown cell at the end | | **Run All** | Evaluate all code cells in order | | **Undo** | Undo the last cell evaluation (restores environment) | | **Save** | Save the notebook to disk | | **Reset** | Clear all outputs and reset the environment | You can also insert cells between existing ones by hovering between cells and clicking the **+** button that appears. 
## Undo After evaluating a cell, click **Undo** (or the inline "Undo cell" button on error outputs) to roll back: * The cell's outputs are restored to their previous state * The interpreter environment is rolled back to before the evaluation * Downstream stale markers are reverted This is useful when a cell modifies global state unexpectedly. ## File Format Notebooks are saved as `.sema-nb` files in JSON format: ```json { "version": 1, "metadata": { "title": "My Notebook", "created": "2026-01-01T00:00:00Z", "modified": "2026-01-01T12:00:00Z", "sema_version": "1.14.2" }, "cells": [ { "id": "c12345678", "type": "code", "source": "(+ 1 2)", "outputs": [] } ] } ``` ### Cell `id` format Cell IDs are short stable strings of the form `"c"` followed by 8 lowercase hexadecimal characters (e.g. `"c4a3f2b1"`). They are generated by hashing a fresh UUIDv4 to its first 8 hex digits and prefixing with `c`. Treat them as opaque tokens — they're stable for the lifetime of a cell and used by the REST API to address cells. ### `stale` flag Code cells carry an optional `stale: bool` field. When an upstream cell is re-evaluated, all downstream code cells with existing outputs are flipped to `stale: true`. The field is omitted from serialized JSON when `false`. A stale cell still displays its previous outputs, but the UI indicates that those outputs may not reflect the current state of the environment until the cell is re-evaluated. ### Fully-evaluated cell example When a code cell has been evaluated, its `outputs` array contains one or more `CellOutput` objects. Here is a cell with every field populated: ```json { "id": "c4a3f2b1", "type": "code", "source": "(+ 1 2)", "outputs": [ { "type": "value", "display": "3", "sema_value": "3", "timestamp": "2026-01-01T12:34:56.789Z", "cost_usd": 0.0, "requires_reeval": false, "duration_ms": 12 } ] } ``` Field reference (`CellOutput`): * `type` — output discriminator (see table below). * `display` — human-readable string shown in the UI. 
* `sema_value` — round-trippable S-expression form of the value (omitted when not applicable, e.g. for `error` / `stdout` outputs). * `timestamp` — RFC 3339 UTC timestamp of when the output was produced. * `cost_usd` — estimated LLM cost, for outputs from LLM calls; omitted otherwise. * `requires_reeval` — `true` for opaque values (lambdas, native fns, macros, streams, thunks) that cannot be round-tripped through `sema_value` and must be re-evaluated on notebook reload. * `duration_ms` — wall-clock evaluation time of the cell, in milliseconds. ### `OutputType` discriminator The `type` field on `CellOutput` is one of: | Value | Meaning | | --------- | -------------------------------------------------------------------------------- | | `value` | The cell returned a normal Sema value (the common case for successful evals). | | `error` | Evaluation produced a `SemaError`; `display` holds the formatted error message. | | `stdout` | Captured `println` / `display` / `print` output produced during cell evaluation. | A single evaluated code cell may have both a `stdout` output (first) and a `value` or `error` output (second), in that order. ### Version compatibility The current notebook format version is `1`. Tools reading `.sema-nb` files should treat `version` as a major version number and refuse to load files where `MAJOR != 1` rather than silently mis-parsing them. Forward-compatible field additions (e.g. new optional fields on `CellOutput`) do not bump `version`; breaking changes to existing fields will. ## Headless Execution Run all cells without starting the browser UI: ```bash sema notebook run my-notebook.sema-nb ``` This evaluates all code cells in order, printing stdout to the terminal. Useful for CI validation or batch processing. 
Run specific cells by index (1-based):

```bash
sema notebook run my-notebook.sema-nb --cells 1,3,5
```

## Export

Export a notebook to Markdown:

```bash
# To stdout
sema notebook export my-notebook.sema-nb

# To file
sema notebook export my-notebook.sema-nb -o output.md
```

The export includes code blocks with output, markdown sections, and error messages.

## REST API

The notebook server exposes a JSON HTTP API on the same port as the browser UI. Everything the UI does goes through these endpoints — they're stable enough to script against from external tools.

### Notebook & cells

| Method | Path                   | Description                                        |
| ------ | ---------------------- | -------------------------------------------------- |
| GET    | `/api/notebook`        | Return the full notebook (cells + metadata)        |
| POST   | `/api/cells`           | Create a new cell                                  |
| GET    | `/api/cells/{id}`      | Fetch a single rendered cell                       |
| POST   | `/api/cells/{id}`      | Update a cell's source or type                     |
| DELETE | `/api/cells/{id}`      | Delete a cell                                      |
| POST   | `/api/cells/{id}/eval` | Evaluate a single cell                             |
| POST   | `/api/cells/reorder`   | Reorder cells by id                                |
| POST   | `/api/eval-all`        | Evaluate all cells (optionally with edited source) |
| GET    | `/api/env`             | Inspect the current shared cell environment        |
| POST   | `/api/reset`           | Reset the evaluation environment                   |
| POST   | `/api/undo`            | Undo the last cell edit/delete                     |
| POST   | `/api/save`            | Save the notebook to disk                          |

Create cell request:

```json
{ "type": "code", "source": "(+ 1 2)", "after": "" }
```

Update cell request:

```json
{ "source": "(+ 1 2)", "type": "code" }
```

Reorder request:

```json
{ "cell_ids": ["id-1", "id-2", "id-3"] }
```

Eval-all request (optional — pass currently-edited sources without saving first):

```json
{ "sources": [["cell-id", "(println \"hi\")"]] }
```

### VFS

The notebook server exposes a small virtual filesystem so the browser UI can read and write files alongside the notebook.

| Method | Path         | Description                                |
| ------ | ------------ | ------------------------------------------ |
| GET    | `/vfs/read`  | Read a file: `?path=foo.txt` → text body   |
| POST   | `/vfs/write` | Write a file (JSON body, see below)        |
| GET    | `/vfs/list`  | List a directory: `?path=.` → JSON entries |

Write request:

```json
{ "path": "notes.txt", "content": "hello" }
```

List response (`FileEntry[]`):

```json
[{ "name": "notes.txt", "is_dir": false, "size": 5 }]
```

::: warning VFS scope
VFS endpoints are sandboxed to the **parent directory of the notebook file**. When `sema notebook serve` is started **without** a `--notebook` path, the VFS root falls back to the current working directory (`$PWD`). The server prints a warning at startup in that case — prefer passing a notebook path if you don't want the whole `$PWD` to be reachable.
:::

## Security

The notebook server is a **trusted-local** developer tool. Cells run arbitrary Sema code — including file and network access — with the full privileges of the user who started the server, and the server has **no authentication or authorization layer**.

For this reason the server binds to the loopback interface (`127.0.0.1`) by default, so it is only reachable from the local machine. You can override the bind address with `--host`, but binding to a non-loopback address such as `0.0.0.0` exposes an unauthenticated remote code-execution endpoint to the network. If you need remote access, putting it behind a firewall, VPN, or an authenticating reverse proxy is **the operator's responsibility**.

## CLI Reference

See [`sema notebook`](/docs/cli#sema-notebook) in the CLI reference for all flags and options.

---

---
url: 'https://sema-lang.com/docs/packages.md'
---

# Package Manager

::: warning Registry Status
The central package registry (`pkg.sema-lang.com`) is not yet live. **Git-based packages work today** — you can install any package directly from a git repository.
Registry commands (`search`, `info`, `publish`, `yank`, `login`) require a registry instance; see [Self-Hosted Registry](#self-hosted-registry) to run your own.
:::

Sema supports two package sources: a **package registry** (for published packages with semver versions) and **direct git repos** (for development branches, private code, or unregistered packages). Both can be mixed freely in the same project.

## Package Format

A package is a directory containing at minimum one of:

* **`package.sema`** — the default entrypoint (what gets loaded on import)
* **`sema.toml`** — optional package metadata, dependencies, and custom entrypoint

### `sema.toml`

```toml
[package]
name = "my-package"
version = "0.1.0"
description = "A useful Sema library"
entrypoint = "lib.sema"

[deps]
# Registry packages — short name = version
http-helpers = "1.0.0"
json-schema = "2.1.0"

# Git packages — quoted URL = git ref
"github.com/user/private-lib" = "main"
```

The `[package]` section defines metadata:

| Field         | Description                                     |
| ------------- | ----------------------------------------------- |
| `name`        | Package name                                    |
| `version`     | Semver version string (required for publishing) |
| `description` | Short description of the package                |
| `entrypoint`  | File loaded on import (default: `package.sema`) |

The `[deps]` section maps package identifiers to versions or git refs:

* **Keys without `/`** are registry packages (e.g., `http-helpers`)
* **Keys with `/`** are git packages (e.g., `"github.com/user/repo"`)

### Entrypoint Resolution

When you import a package, Sema resolves the entrypoint in this order:

1. **Direct file** — `~/.sema/packages/.sema` (for sub-module imports like `github.com/user/repo/utils`)
2. **Custom entrypoint** — if `sema.toml` exists and has an `entrypoint = "..."` field, that file is loaded
3. **Default entrypoint** — `package.sema` in the package directory

## CLI Commands

### `sema pkg init`

Initialize a new project in the current directory.
Creates a `sema.toml` with the directory name as the package name.

```bash
mkdir my-package && cd my-package
sema pkg init
```

This creates both `sema.toml` (with `entrypoint = "package.sema"`) and a starter `package.sema` file. If `package.sema` already exists, only the manifest is created.

### `sema pkg add`

Add a package from the registry or a git repository.

```bash
# Registry packages (short names)
sema pkg add http-helpers                 # latest version
sema pkg add http-helpers@1.0.0           # specific version

# Git packages (URL paths)
sema pkg add github.com/user/repo         # latest default branch (main)
sema pkg add github.com/user/repo@v1.2.0  # specific tag
sema pkg add github.com/user/repo@main    # specific branch
```

The source is auto-detected: if the first path segment contains a dot (looks like a hostname), it's treated as a git URL. Otherwise, it's looked up on the configured registry.

You can override the registry with `--registry`:

```bash
sema pkg add http-helpers --registry https://my-registry.com
```

If a `sema.toml` exists in the current directory, the package is automatically added to the `[deps]` section. If no `sema.toml` exists, one is created automatically with the package added to `[deps]`.

### `sema pkg install`

Fetch all dependencies listed in `sema.toml`.

```bash
sema pkg install
sema pkg install --locked   # fail if sema.lock is missing or out of sync (for CI)
```

Reads the `[deps]` section and fetches each dependency — routing to the registry or git based on the key format (see [sema.toml](#sema-toml) above). Requires a `sema.toml` in the current directory.

When a `sema.lock` file exists, locked entries are installed at their exact pinned versions with integrity verification (commit SHA for git, SHA256 checksum for registry). Dependencies not yet in the lock file are resolved fresh and appended. Orphaned lock entries (in lock but not in `sema.toml`) are pruned automatically.
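
The way a plain `sema pkg install` reconciles the manifest with an existing lock file can be sketched as a three-way split. This is an illustration under simplifying assumptions, not Sema's implementation: `planInstall` is a hypothetical name, both inputs are plain objects keyed by package identifier, and the case where a locked version differs from the manifest is not modelled.

```javascript
// Sketch only: `planInstall` and these data shapes are assumptions for illustration.
function planInstall(manifestDeps, lockedPackages) {
  const plan = { fromLock: [], resolveFresh: [], pruned: [] };

  for (const name of Object.keys(manifestDeps)) {
    if (Object.hasOwn(lockedPackages, name)) {
      plan.fromLock.push(name); // installed at the pinned version
    } else {
      plan.resolveFresh.push(name); // resolved now and appended to the lock
    }
  }

  for (const name of Object.keys(lockedPackages)) {
    if (!Object.hasOwn(manifestDeps, name)) {
      plan.pruned.push(name); // orphaned lock entry
    }
  }
  return plan;
}

const manifest = { "http-helpers": "1.0.0", "github.com/user/private-lib": "main" };
const lock = { "http-helpers": { version: "1.0.0" }, "old-dep": { version: "0.3.0" } };
console.log(planInstall(manifest, lock));
```

Here `http-helpers` would be installed from the lock, `github.com/user/private-lib` resolved fresh, and `old-dep` pruned.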
The `--locked` flag enforces strict reproducibility for CI:

* Fails if `sema.lock` is missing
* Fails if any dep in `sema.toml` is not in the lock (or vice versa)
* Fails if the version/ref in `sema.toml` doesn't match the lock entry
* Never resolves fresh — only installs from lock

### `sema pkg update`

Update installed packages to their latest versions.

```bash
sema pkg update                        # update all installed packages
sema pkg update http-helpers           # update a specific registry package
sema pkg update github.com/user/repo   # update a specific git package
sema pkg update repo                   # update by short name
```

* **Registry packages** check for a newer version and re-download if available
* **Git packages** fetch from origin and pull the latest changes

Both `sema.toml` and `sema.lock` are updated to reflect the new versions.

### `sema pkg remove`

Remove an installed package from the global cache, `sema.toml`, and `sema.lock`.

```bash
sema pkg remove http-helpers           # registry package
sema pkg remove github.com/user/repo   # git package by full path
sema pkg remove repo                   # by short name
```

### `sema pkg list`

List all installed packages with their version/ref and source.

```bash
sema pkg list
```

```
http-helpers (1.0.0) [https://pkg.sema-lang.com]
github.com/user/repo (v1.2.0) [git]
github.com/user/utils (main) [git]
```

### `sema pkg search`

Search the registry for packages.

```bash
sema pkg search http
sema pkg search json --registry https://my-registry.com
```

```
Found 3 packages:
http-helpers — HTTP client utilities for Sema
http-server — Simple HTTP server framework
http-mock — HTTP mocking for tests
```

### `sema pkg info`

Show detailed package information from the registry.

```bash
sema pkg info http-helpers
```

```
http-helpers
HTTP client utilities for Sema
repo: https://github.com/user/http-helpers
owners: alice, bob
Versions:
2.0.0 — 12480 bytes, 2026-02-20T10:30:00Z
1.1.0 — 11200 bytes, 2026-01-15T08:00:00Z
1.0.0 — 9800 bytes, 2025-12-01T12:00:00Z
```

### `sema pkg publish`

Publish the current package to the registry. Requires a `sema.toml` with `[package]` containing `name` and `version`, and an active login.

```bash
sema pkg publish
sema pkg publish --registry https://my-registry.com
```

```
Packaging... 24576 bytes compressed
✓ Published http-helpers@1.0.0 (24576 bytes, sha256:abc123...)
```

### `sema pkg yank`

Yank a published version to prevent new installs (existing installs are unaffected).

```bash
sema pkg yank http-helpers@1.0.0
```

### `sema pkg login`

Authenticate with a package registry by providing an API token.

```bash
sema pkg login --token sema_pat_...                         # default registry
sema pkg login --token sema_pat_... --registry https://...  # self-hosted
```

Tokens are stored in `~/.sema/credentials.toml` with `0600` file permissions. You can generate a token from your registry account page.

### `sema pkg logout`

Remove stored registry credentials.

```bash
sema pkg logout
```

### `sema pkg config`

View or set package manager configuration. Currently supports `registry.url` to change the default registry.

```bash
sema pkg config                                       # show all config
sema pkg config registry.url                          # show current registry URL
sema pkg config registry.url https://my-registry.com  # set default registry
```

```
registry.url = https://pkg.sema-lang.com
registry.token = (set)
Credentials file: /Users/you/.sema/credentials.toml
```

### Environment Variable Override

You can set `SEMA_REGISTRY_URL` to override the default registry without modifying the credentials file. This is useful for CI/CD pipelines or when temporarily working with a private registry.

```bash
SEMA_REGISTRY_URL=https://my-registry.com sema pkg search foo
```

The resolution order is: `--registry` CLI flag → `SEMA_REGISTRY_URL` env var → `credentials.toml` config → default (`https://pkg.sema-lang.com`).

## Lock File (`sema.lock`)

The `sema.lock` file records the exact resolved version of every dependency for reproducible builds. It is auto-generated and should be committed to version control.

### Format

```toml
# sema.lock — auto-generated, do not edit manually
lock_version = 1

[packages."github.com/user/repo"]
source = "git"
ref = "main"
commit = "a1b2c3d4e5f6789012345678901234567890abcd"

[packages."http-helpers"]
source = "registry"
version = "1.2.0"
registry = "https://pkg.sema-lang.com"
checksum = "abc123def456789..."
```

* **Git packages** record the `ref` (branch/tag) and exact `commit` SHA
* **Registry packages** record the `version`, `registry` URL, and SHA256 `checksum` of the downloaded tarball

### How It Works

| Command                     | Lock behavior                                                                                      |
| --------------------------- | -------------------------------------------------------------------------------------------------- |
| `sema pkg add`              | Installs and writes/updates lock entry                                                             |
| `sema pkg install`          | Installs from lock when available; resolves and appends for unlocked deps; prunes orphaned entries |
| `sema pkg install --locked` | Installs from lock only; fails on any mismatch (for CI)                                            |
| `sema pkg update`           | Re-resolves to latest and rewrites lock + manifest                                                 |
| `sema pkg remove`           | Removes package, manifest entry, and lock entry                                                    |

### Integrity Verification

When installing from a lock file:

* **Git packages** are checked out at the pinned commit using `git checkout --detach`. The resulting HEAD is verified against the lock.
* **Registry packages** are downloaded and their SHA256 checksum is compared against the lock. A mismatch produces a clear error.
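
The registry-side check amounts to hashing the downloaded bytes and comparing the digest with the lock entry. The snippet below is a rough sketch using Node's built-in `crypto` module. It assumes the `checksum` field holds a hex-encoded SHA-256 digest, which the lock file example suggests but does not state, and `verifyChecksum` is a hypothetical helper name.

```javascript
const { createHash } = require("node:crypto");

// Sketch only: `verifyChecksum` is hypothetical; hex encoding of `checksum` is an assumption.
function verifyChecksum(tarballBytes, lockEntry) {
  const actual = createHash("sha256").update(tarballBytes).digest("hex");
  if (actual !== lockEntry.checksum.toLowerCase()) {
    throw new Error(
      `lock integrity error: expected ${lockEntry.checksum}, got ${actual}`
    );
  }
  return actual;
}

// Usage with stand-in bytes; a real caller would pass the downloaded tarball.
const bytes = Buffer.from("hello");
const entry = { checksum: createHash("sha256").update(bytes).digest("hex") };
console.log(verifyChecksum(bytes, entry) === entry.checksum);
```

Failing loudly on a mismatch, instead of re-downloading silently, is what lets a changed tarball or re-published version surface as an explicit error.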
### CI Usage

Use `--locked` in CI pipelines to guarantee reproducible builds:

```bash
sema pkg install --locked
```

This will fail with an actionable error if:

* `sema.lock` doesn't exist
* A dependency was added to `sema.toml` but not locked
* A dependency version/ref changed in `sema.toml` without re-locking
* An orphaned entry exists in the lock

## Importing Packages

Import a package by its URL path (git packages) or short name (registry packages):

```sema
;; Git package
(import "github.com/user/string-utils")
(string-utils/slugify "Hello World") ; => "hello-world"

;; Registry package
(import "http-helpers")
(http-helpers/fetch "https://api.example.com")
```

The package name (last segment of the URL, or the short name) becomes the namespace prefix.

You can also use selective imports:

```sema
(import "github.com/user/string-utils" (slugify titlecase))
(slugify "Hello World") ; => "hello-world"
```

### Sub-module Imports

You can import sub-modules from a package by appending a path:

```sema
;; Resolves to ~/.sema/packages/github.com/user/repo/utils.sema
(import "github.com/user/repo/utils")
```

### How Sema Distinguishes Package vs File Imports

An import string is treated as a **package import** when it:

* Contains `/` (path separator)
* Does **not** start with `./` or `../` (relative path)
* Does **not** end with `.sema` (explicit file)
* Is **not** an absolute path

Otherwise, it's resolved as a relative file import from the current file's directory.
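
The four rules above translate into a small predicate. The sketch below only mirrors the list as written and is not taken from Sema's resolver: `isPackageImport` is a hypothetical name, "absolute path" is assumed to mean a leading `/` (POSIX-style), and any special handling of short registry names is outside what it models.

```javascript
// Sketch only: `isPackageImport` is a hypothetical helper, not Sema's resolver.
function isPackageImport(spec) {
  const hasSeparator = spec.includes("/");
  const isRelative = spec.startsWith("./") || spec.startsWith("../");
  const isExplicitFile = spec.endsWith(".sema");
  // Assumption: an absolute path is one with a leading "/" (POSIX-style only).
  const isAbsolute = spec.startsWith("/");
  return hasSeparator && !isRelative && !isExplicitFile && !isAbsolute;
}

const samples = [
  "github.com/user/repo",
  "github.com/user/repo/utils",
  "./helpers.sema",
  "../lib/utils.sema",
  "/opt/lib/utils",
];
for (const spec of samples) {
  console.log(spec, "->", isPackageImport(spec) ? "package" : "file");
}
```

Under these assumptions the first two samples classify as package imports and the remaining three as file imports.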
```sema
;; Package imports
(import "github.com/user/repo")        ; → ~/.sema/packages/github.com/user/repo/package.sema
(import "github.com/user/repo/utils")  ; → ~/.sema/packages/github.com/user/repo/utils.sema

;; File imports (relative to current file)
(import "./helpers.sema")              ; relative file
(import "../lib/utils.sema")           ; parent directory
```

## On-Disk Layout

Packages are cached globally at `~/.sema/packages/`, with different structures for registry and git packages:

```
~/.sema/
  credentials.toml        # registry token + URL
  history.txt             # REPL history
  packages/
    http-helpers/         # registry package (short name)
      .sema-pkg.json      # source metadata
      sema.toml
      package.sema
    github.com/           # git packages (URL structure)
      user/
        repo/
          .git/
          sema.toml
          package.sema
```

Registry packages include a `.sema-pkg.json` metadata file that tracks the source, version, registry URL, and checksum. This file is managed automatically by the package manager.

## Creating a Package

### 1. Initialize

```bash
mkdir sema-csv-utils && cd sema-csv-utils
sema pkg init
```

### 2. Write Your Code

Edit the generated `package.sema` to define your package's API:

```sema
;; package.sema — package entrypoint
(defun parse-row (line)
  (map string/trim (string/split line ",")))

(defun parse-csv (text)
  (map parse-row (string/split text "\n")))
```

### 3. Add Dependencies (Optional)

```bash
sema pkg add http-helpers@1.0.0
```

This fetches the package and adds it to your `sema.toml` automatically. Then use it in your code:

```sema
(import "http-helpers")

(defun fetch-csv (url)
  (parse-csv (:body (http-helpers/get url))))
```

### 4. Publish

#### To the Registry

Ensure your `sema.toml` has `name` and `version` in the `[package]` section, then:

```bash
sema pkg login --token sema_pat_...
sema pkg publish
```

Others can now install your package:

```bash
sema pkg add sema-csv-utils@0.1.0
```

#### Via Git (No Registry)

Push to a public git repository.
Tag releases with semver:

```bash
git tag v0.1.0
git push origin main --tags
```

Others can install directly from git:

```bash
sema pkg add github.com/yourname/sema-csv-utils@v0.1.0
```

## Example Workflow

```bash
# Start a new project
mkdir my-project && cd my-project
sema pkg init

# Add dependencies (mix of registry and git)
sema pkg add http-helpers@2.0.0
sema pkg add github.com/user/json-schema@v1.1.0

# Install everything (if cloning the project fresh)
sema pkg install   # generates/updates sema.lock

# In CI, use --locked for reproducibility
sema pkg install --locked

# List what's installed
sema pkg list

# Search for packages
sema pkg search csv

# Check package details
sema pkg info csv-parser
```

```sema
;; main.sema
(import "http-helpers")
(import "github.com/user/json-schema")

(def response (http-helpers/fetch "https://api.example.com/data"))
(def valid? (json-schema/validate schema (json/decode (:body response))))
(println (if valid? "Valid!" "Invalid."))
```

```bash
sema main.sema
```

## Self-Hosted Registry

Sema's package registry is designed to be self-hostable. The registry server ships in the [`pkg/`](https://github.com/helgesverre/sema/tree/main/pkg) directory of the Sema repository — it's a single Rust binary backed by SQLite that serves both a web UI and a REST API. See its [README](https://github.com/helgesverre/sema/tree/main/pkg#readme) for build and deployment instructions.

To point the CLI at your own registry instance:

```bash
# Set as default registry
sema pkg config registry.url https://registry.mycompany.com

# Or per-command
sema pkg add my-internal-lib --registry https://registry.mycompany.com
sema pkg publish --registry https://registry.mycompany.com
```

All `sema pkg` commands that interact with the registry accept a `--registry` flag to override the default.

## Troubleshooting

### "package not found"

```
Error: package not found: github.com/user/repo
Hint: Run: sema pkg add github.com/user/repo
```

The package hasn't been fetched yet.
Run the suggested command to install it.

### "invalid package spec: URL schemes not allowed"

```
Error: invalid package spec: URL schemes not allowed: https://github.com/user/repo
```

Use the bare host/path format without `https://`:

```bash
# ✗ Wrong
sema pkg add https://github.com/user/repo

# ✓ Correct
sema pkg add github.com/user/repo
```

### "invalid package spec: path traversal not allowed"

The package path contains `..`, `.`, or empty segments. Package paths must be clean, forward-slash-separated identifiers like `github.com/user/repo`.

### "No sema.toml found"

`sema pkg install` requires a `sema.toml` in the current directory. Run `sema pkg init` to create one, or `cd` to the project root.

### "Not logged in"

Publishing and yanking require authentication. Run `sema pkg login --token ` with a token from your registry account page.

### "sema.lock not found" (with `--locked`)

`sema pkg install --locked` requires a `sema.lock` file. Run `sema pkg install` (without `--locked`) first to generate it, then commit the lock file to version control.

### "version mismatch" (with `--locked`)

You changed a version/ref in `sema.toml` without re-locking. Run `sema pkg install` to update `sema.lock`, then commit both files.

### "Lock integrity error"

The downloaded package doesn't match the checksum or commit recorded in `sema.lock`. This can happen if a registry re-published a version with different contents or a git tag was force-pushed. Run `sema pkg update ` to re-resolve and update the lock.

### "git clone/fetch failed"

The package URL couldn't be reached. Check that:

* The repository exists and is public (or you have git credentials configured)
* The git ref (tag/branch) exists on the remote
* You have network access

---

---
url: 'https://sema-lang.com/docs/embedding.md'
---

# Embedding Sema

## Overview

Sema can be embedded as a Rust library, letting you use it as a scripting or configuration language inside your own applications.
The crate exposes a builder API for creating interpreters, registering native functions, and evaluating Sema code from Rust.

## Quick Start

Add Sema to your project:

```toml
[dependencies]
sema-lang = "1.11"
```

Or use the latest unreleased version from git:

```toml
[dependencies]
sema-lang = { git = "https://github.com/HelgeSverre/sema" }
```

Evaluate an expression in three lines:

```rust
use sema::{Interpreter, Value};

fn main() -> sema::Result<()> {
    let interp = Interpreter::new();
    let result = interp.eval_str("(+ 1 2 3)")?;
    println!("{result}"); // 6
    Ok(())
}
```

## The Builder

`Interpreter::builder()` returns an `InterpreterBuilder` with these options:

| Method                   | Default       | Description                              |
| ------------------------ | ------------- | ---------------------------------------- |
| `.with_stdlib(b)`        | `true`        | Register the full standard library       |
| `.with_llm(b)`           | `true`        | Enable LLM functions and auto-config     |
| `.without_stdlib()`      | —             | Shorthand for `.with_stdlib(false)`      |
| `.without_llm()`         | —             | Shorthand for `.with_llm(false)`         |
| `.with_sandbox(sb)`      | `allow_all()` | Set sandbox to restrict capabilities     |
| `.with_allowed_paths(p)` | unrestricted  | Restrict file ops to specific directories |

### Default Interpreter

`Interpreter::new()` gives you everything — stdlib and LLM builtins enabled:

```rust
let interp = Interpreter::new();
interp.eval_str("(+ 1 2)")?; // => 3
```

### Minimal Interpreter

No stdlib, no LLM — only special forms and core evaluation:

```rust
let interp = Interpreter::builder()
    .without_stdlib()
    .without_llm()
    .build();
```

### Stdlib Only (No LLM)

Disable LLM builtins for faster startup when you don't need them:

```rust
let interp = Interpreter::builder()
    .without_llm()
    .build();
```

### Sandboxed Interpreter

Restrict specific capabilities while keeping the full stdlib available:

```rust
use sema::{Interpreter, Sandbox, Caps};

// Allow computation but deny shell and network access
let interp = Interpreter::builder()
    .with_sandbox(Sandbox::deny(
        Caps::SHELL.union(Caps::NETWORK)
    ))
    .build();

interp.eval_str("(+ 1 2)")?;              // => 3 (always works)
interp.eval_str(r#"(shell "ls")"#)?;      // => PermissionDenied error
interp.eval_str(r#"(http/get "...")"#)?;  // => PermissionDenied error
```

### Path-Restricted Interpreter

Confine file operations to specific directories (e.g., for LLM agents):

```rust
use std::path::PathBuf;
use sema::Interpreter;

let interp = Interpreter::builder()
    .with_allowed_paths(vec![
        PathBuf::from("./workspace"),
        PathBuf::from("/tmp"),
    ])
    .build();

interp.eval_str(r#"(file/write "./workspace/out.txt" "ok")"#)?; // works
interp.eval_str(r#"(file/read "/etc/passwd")"#)?;               // => PermissionDenied
```

### Multiple Interpreters

Each `Interpreter` has its own `EvalContext` with fully isolated state — module cache, call stack, span table, and depth counters are not shared:

```rust
let interp_a = Interpreter::new();
let interp_b = Interpreter::new();

interp_a.eval_str("(define x 1)")?;
interp_b.eval_str("(define x 2)")?;

// Each interpreter has its own bindings
assert_eq!(interp_a.eval_str("x")?, Value::Int(1));
assert_eq!(interp_b.eval_str("x")?, Value::Int(2));
```

## Registering Native Functions

Use `register_fn` to expose Rust functions to Sema scripts. The closure receives `&[Value]` and returns `Result<Value, SemaError>`.

### Basic Example

```rust
interp.register_fn("add1", |args| {
    let n = args[0]
        .as_int()
        .ok_or_else(|| sema::SemaError::type_error("int", args[0].type_name()))?;
    Ok(Value::Int(n + 1))
});
```

```sema
(add1 41) ; => 42
```

### Capturing State

Use `Rc<RefCell<T>>` to share mutable state between Rust and Sema:

```rust
use std::rc::Rc;
use std::cell::RefCell;

let counter = Rc::new(RefCell::new(0_i64));
let c = counter.clone();

interp.register_fn("inc!", move |_| {
    *c.borrow_mut() += 1;
    Ok(Value::Int(*c.borrow()))
});
```

```sema
(inc!) ; => 1
(inc!) ; => 2
(inc!) ; => 3
```

## Real-World Example: Data Pipeline

A Rust CLI tool that uses Sema as a scripting language for user-defined data transformations.
The host app provides utility functions and loads a user-written `.sema` script that defines the transform logic.

### Rust Host

```rust
use sema::{Interpreter, Value, SemaError};
use std::rc::Rc;
use std::collections::BTreeMap;

fn main() -> sema::Result<()> {
    let interp = Interpreter::builder()
        .without_llm()
        .build();

    // Provide a logging function
    interp.register_fn("log", |args| {
        for a in args {
            eprintln!("[script] {a}");
        }
        Ok(Value::Nil)
    });

    // Load user transform script
    let script = std::fs::read_to_string("transform.sema")
        .map_err(|e| SemaError::eval(format!("failed to read script: {e}")))?;
    interp.eval_str(&script)?;

    // Process records through the user's transform function
    let records = vec![
        make_record("Alice", 34, "engineering"),
        make_record("Bob", 28, "marketing"),
        make_record("Carol", 45, "engineering"),
    ];

    for record in records {
        interp.env().set_str("__record", record);
        let result = interp.eval_str("(transform __record)")?;
        println!("{result}");
    }

    Ok(())
}

fn make_record(name: &str, age: i64, dept: &str) -> Value {
    let mut map = BTreeMap::new();
    map.insert(
        Value::Keyword(sema::intern("name")),
        Value::String(Rc::new(name.to_string())),
    );
    map.insert(
        Value::Keyword(sema::intern("age")),
        Value::Int(age),
    );
    map.insert(
        Value::Keyword(sema::intern("dept")),
        Value::String(Rc::new(dept.to_string())),
    );
    Value::Map(Rc::new(map))
}
```

### User Script (`transform.sema`)

```sema
(define (transform record)
  (log (format "Processing: ~a" (:name record)))
  (if (> (:age record) 30)
    (assoc record :senior #t)
    record))
```

### Output

```
[script] Processing: Alice
{:age 34 :dept "engineering" :name "Alice" :senior #t}
[script] Processing: Bob
{:age 28 :dept "marketing" :name "Bob"}
[script] Processing: Carol
{:age 45 :dept "engineering" :name "Carol" :senior #t}
```

## Threading Model

Sema is **single-threaded by design**. It uses `Rc` (not `Arc`) for reference counting and a thread-local string interner for keywords and symbols.

* Multiple `Interpreter` instances can coexist on the same thread with **fully isolated evaluator state** — each has its own module cache, call stack, span table, and depth counters.
* Do **not** send `Value` instances across thread boundaries — they are not `Send` or `Sync`.
* The string interner is per-thread, so interned keys from one thread are not valid in another.
* LLM state (provider registry, usage tracking, budgets) is per-thread and shared across all interpreters on the same thread.

## Security Considerations

By default, Sema scripts have full access to the filesystem, shell, network, and environment. For untrusted code, you have two options:

**Option 1: Sandbox (recommended)** — Keep the full stdlib but deny dangerous capabilities:

```rust
use sema::{Interpreter, Sandbox, Caps};

let interp = Interpreter::builder()
    .with_sandbox(Sandbox::deny(Caps::STRICT)) // deny shell, fs-write, network, env-write, process, llm
    .build();
```

Sandboxed functions remain callable (tab-completable, discoverable) but return a `PermissionDenied` error when invoked.

**Option 2: Minimal** — No stdlib at all, register only what you need:

```rust
let interp = Interpreter::builder()
    .without_stdlib()
    .without_llm()
    .build();
// Register only safe functions manually
```

See [CLI Sandbox docs](./cli.md#sandbox) for the full list of capabilities and affected functions.

## Loading Files and Preloading Modules

### Load a File

`load_file` reads and evaluates a `.sema` file. Definitions persist in the global environment:

```rust
let interp = Interpreter::new();
interp.load_file("prelude.sema")?;
interp.eval_str("(my-prelude-fn 42)")?;
```

You can also embed files at compile time:

```rust
interp.eval_str(include_str!("../scripts/prelude.sema"))?;
```

### Preload Virtual Modules

`preload_module` injects a module into the module cache so that `(import "name")` resolves without a file on disk.
This is useful for bundling standard libraries, providing host APIs as importable modules, or testing:

```rust
let interp = Interpreter::new();

// All top-level definitions are exported by default
interp.preload_module("utils", r#"
  (define (double x) (* x 2))
  (define pi 3.14159)
"#)?;

// Use `(module ...)` with `(export ...)` for selective exports
interp.preload_module("math", r#"
  (module math
    (export square cube)
    (define (square x) (* x x))
    (define (cube x) (* x x x))
    (define internal-helper 42))
"#)?;
```

Scripts can then import these modules as if they were files:

```sema
(import "utils")
(double pi) ; => 6.28318

(import "math" square)
(square 5) ; => 25
```

## API Reference

| Type / Method                         | Description                                                      |
| ------------------------------------- | ---------------------------------------------------------------- |
| `Interpreter`                         | Holds the global environment; evaluates code                     |
| `InterpreterBuilder`                  | Configures and builds an `Interpreter`                           |
| `Value`                               | Core value enum — Int, Float, String, List, Map, etc.            |
| `SemaError`                           | Error type with `eval()`, `type_error()`, `arity()` constructors |
| `Sandbox`                             | Configures which capabilities are denied                         |
| `Caps`                                | Capability bitflags (FS\_READ, SHELL, NETWORK, etc.)             |
| `Env`                                 | Environment (scope chain backed by `Rc`)                         |
| `intern(s)`                           | Intern a string, returning a `Spur` handle                       |
| `resolve(spur)`                       | Resolve a `Spur` back to a `&str`                                |
| `interp.eval_str(code)`               | Parse and evaluate a string of Sema code                         |
| `interp.load_file(path)`              | Read and evaluate a `.sema` file                                 |
| `interp.preload_module(name, source)` | Inject a virtual module into the import cache                    |
| `interp.register_fn(name, closure)`   | Register a native Rust function callable from Sema               |

---

---
url: 'https://sema-lang.com/docs/embedding-js.md'
---

# Embedding Sema (JavaScript)

## Overview

Sema can be embedded as a JavaScript scripting engine via WebAssembly. The WASM build runs entirely client-side — no server needed.
You get the full Sema standard library (minus shell access and LLM functions) in the browser, including HTTP via `fetch()`, an in-memory virtual filesystem, and persistent definitions across evaluations.

Two npm packages are available:

| Package | Description |
| --- | --- |
| [`@sema-lang/sema`](https://www.npmjs.com/package/@sema-lang/sema) | **Recommended.** High-level TypeScript wrapper with ergonomic API |
| [`@sema-lang/sema-wasm`](https://www.npmjs.com/package/@sema-lang/sema-wasm) | Low-level wasm-bindgen output — exports `SemaInterpreter` (used internally) |

This page documents the low-level `@sema-lang/sema-wasm` API. For the wrapper, see the [npm README](https://www.npmjs.com/package/@sema-lang/sema).

## Quick Start

### npm

Install the WASM package:

```sh
npm install @sema-lang/sema-wasm
```

Evaluate an expression in three lines:

```js
import init, { SemaInterpreter } from '@sema-lang/sema-wasm';

await init();
const interp = new SemaInterpreter();
const result = interp.evalGlobal('(+ 1 2 3)');
console.log(result.value); // "6"
```

### CDN

Use Sema directly in a `<script>` tag.

## Creating an Interpreter

`new SemaInterpreter()` creates a new interpreter with the full standard library, I/O overrides for browser output, and a 10M eval-step limit to prevent infinite loops from freezing the tab.

```js
import init, { SemaInterpreter } from '@sema-lang/sema-wasm';

await init();
const interp = new SemaInterpreter();
```

The `init()` call loads and compiles the WASM binary. It only needs to be called once — after that, you can create as many interpreters as you want.

When using CDN, pass the `.wasm` URL explicitly:

```js
await init('https://cdn.jsdelivr.net/npm/@sema-lang/sema-wasm/sema_wasm_bg.wasm');
```

## Evaluating Code

Sema's WASM API provides four evaluation methods.
All return a JS object with the same shape:

```ts
// JS object returned directly by eval methods
interface EvalResult {
  value: string | null;  // String representation of the result, or null on error
  output: string[];      // Lines printed by (print), (println), (display)
  error: string | null;  // Error message with stack trace, or null on success
}
```

### `evalGlobal(code)` — Synchronous, Persistent

Evaluates code in the global environment. Definitions persist across calls:

```js
const interp = new SemaInterpreter();

interp.evalGlobal('(define x 42)');
const result = interp.evalGlobal('(+ x 8)');
console.log(result.value); // "50"
```

### `eval(code)` — Synchronous, Isolated

Evaluates code in a child environment. Definitions do **not** persist:

```js
interp.eval('(define y 10)');
interp.eval('y'); // error: unbound variable y
```

### `evalAsync(code)` — Async, with HTTP Support

Use this when your Sema code makes HTTP requests. The async method bridges the synchronous Sema evaluator with the browser's asynchronous `fetch()` API using a replay-with-cache strategy:

```js
const result = await interp.evalAsync('(http/get "https://httpbin.org/get")');
console.log(result.value); // {:status 200 :headers {...} :body "..."}
```

### `evalVM(code)` / `evalVMAsync(code)` — Bytecode VM

Explicit aliases for evaluating via the bytecode VM. Sema runs all code on the VM, so these are equivalent to `eval`/`evalAsync`; they are kept as named entry points for clarity.
Same interface:

```js
const result = await interp.evalVMAsync('(http/get "https://httpbin.org/get")');
```

### Error Handling

Errors are returned in the `error` field — they never throw JavaScript exceptions:

```js
const result = interp.evalGlobal('(/ 1 0)');

if (result.error) {
  console.error('Sema error:', result.error);
  // Includes stack trace and hints when available
} else {
  console.log('Result:', result.value);
}

// Output lines are always captured, even on error
for (const line of result.output) {
  console.log('Output:', line);
}
```

### Capturing Output

`(print)`, `(println)`, and `(display)` write to a buffer that is returned in the `output` array:

```js
const result = interp.evalGlobal(`
  (println "hello")
  (println "world")
  (+ 1 2)
`);

console.log(result.output); // ["hello", "world"]
console.log(result.value);  // "3"
```

### Minimal Interpreter (No Stdlib)

Use `SemaInterpreter.createWithOptions()` to create a minimal interpreter with only special forms and core evaluation — no standard library functions:

```js
const minimal = SemaInterpreter.createWithOptions({ stdlib: false });

minimal.evalGlobal('(+ 1 2)').value; // "3" (special forms work)
minimal.evalGlobal('(map identity (list 1 2 3))').error; // "unbound variable: map"
```

### Sandboxed Interpreter

Deny specific capabilities while keeping the full stdlib:

```js
// Deny network access
const sema = SemaInterpreter.createWithOptions({ deny: ["network"] });
sema.evalGlobal("(+ 1 2)"); // works
sema.evalGlobal('(http/get "https://example.com")'); // => PermissionDenied error

// Deny both network and VFS writes
const strict = SemaInterpreter.createWithOptions({ deny: ["network", "fs-write"] });
```

Available capabilities to deny:

| Capability | Affected Functions |
| --- | --- |
| `"network"` | `http/get`, `http/post`, `http/put`, `http/delete`, `http/request` |
| `"fs-read"` | `file/read`, `file/exists?`, `file/list`, `file/is-directory?`, `file/is-file?` |
| `"fs-write"` | `file/write`, `file/delete`, `file/rename`, `file/mkdir`, `file/append` |

## Registering JavaScript Functions

Use `registerFunction` to expose JavaScript functions to Sema code. Arguments are passed as native JS values, and the return value is converted back to a Sema value:

```js
const interp = new SemaInterpreter();

// Simple function — args arrive as native JS values
interp.registerFunction('add1', (n) => n + 1);

interp.evalGlobal('(add1 41)').value; // "42"
```

### Multiple Arguments

Each Sema argument is passed as a separate native JS value:

```js
interp.registerFunction('greet', (greeting, name) => `${greeting}, ${name}!`);

interp.evalGlobal('(greet "Hello" "world")').value; // "Hello, world!"
```

### Returning Structured Data

Return a JSON string for objects/arrays — they'll be converted to Sema maps/lists:

```js
interp.registerFunction('get-user', (id) => {
  return JSON.stringify({ name: "Alice", age: 30 });
});

interp.evalGlobal('(:name (get-user 1))').value; // "Alice"
```

### Capturing State

Use closures to share mutable state between JavaScript and Sema:

```js
let counter = 0;
interp.registerFunction('inc!', () => ++counter);

interp.evalGlobal('(inc!)').value; // "1"
interp.evalGlobal('(inc!)').value; // "2"
```

::: info Value Conversion
Arguments are passed as native JS values (numbers, strings, booleans, arrays, objects). Return values are automatically converted: numbers, booleans, `null`/`undefined` → nil, strings, and JSON-stringified objects/arrays are all supported. Non-JSON-serializable values (functions, symbols, circular references) are not supported.
:::

## Preloading Modules

Use `preloadModule` to inject virtual modules that can be imported with `(import "name")` — no filesystem needed:

```js
const interp = new SemaInterpreter();

interp.preloadModule('utils', `
  (define (double x) (* x 2))
  (define pi 3.14159)
`);

interp.evalGlobal(`
  (import "utils")
  (double pi)
`).value; // "6.28318"
```

### Selective Exports

Use `(module ...)` with `(export ...)` to control which bindings are visible:

```js
interp.preloadModule('math', `
  (module math
    (export square cube)
    (define (square x) (* x x))
    (define (cube x) (* x x x))
    (define internal-helper 42))
`);

interp.evalGlobal(`
  (import "math" square)
  (square 5)
`).value; // "25"
```

## Persistent Definitions

Use `evalGlobal` to build up state across multiple calls — this is the key pattern for embedding:

```js
// Define functions
interp.evalGlobal(`
  (define (greet name)
    (string/append "Hello, " name "!"))
`);

// Define data
interp.evalGlobal('(define users (list "Alice" "Bob" "Carol"))');

// Use them together
const result = interp.evalGlobal('(map greet users)');
console.log(result.value); // ("Hello, Alice!" "Hello, Bob!" "Hello, Carol!")
```

## Virtual Filesystem

The WASM build includes an in-memory virtual filesystem. Files persist for the interpreter's lifetime but are lost on page reload:

```js
interp.evalGlobal('(file/write "/config.json" "{\\"key\\": \\"value\\"}")');
const result = interp.evalGlobal('(file/read "/config.json")');
console.log(result.value); // "{\"key\": \"value\"}"
```

Quotas apply: 1 MB per file, 16 MB total, 256 files max.
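
The quotas above can be checked on the JavaScript side before writing. The sketch below is a hypothetical helper, not part of the Sema API: `checkVfsWrite` and its `usage` argument are names invented here, and it assumes quotas are counted in UTF-8 bytes, which this page does not state.

```js
// Limits quoted above: 1 MB per file, 16 MB total, 256 files max.
const MAX_FILE_BYTES = 1024 * 1024;
const MAX_TOTAL_BYTES = 16 * 1024 * 1024;
const MAX_FILES = 256;

// Hypothetical helper (not a Sema API): returns null if a new file holding
// `content` would fit, or a short reason string if it would not.
// `usage` is { files, bytes } describing what is already stored.
// Assumption: sizes are measured in UTF-8 bytes.
function checkVfsWrite(usage, content) {
  const size = new TextEncoder().encode(content).length;
  if (size > MAX_FILE_BYTES) return 'file too large';
  if (usage.files + 1 > MAX_FILES) return 'too many files';
  if (usage.bytes + size > MAX_TOTAL_BYTES) return 'total quota exceeded';
  return null;
}
```

The helper treats every write as a new file, so overwriting an existing file is counted conservatively.
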
## Virtual Filesystem (from JavaScript)

The VFS can also be accessed directly from JavaScript, enabling file browser UIs, script editors, and pre-seeded environments:

### Seeding Files

```js
const sema = new SemaInterpreter();

// Write files from JS
sema.writeFile("/lib/math.sema", "(define (square x) (* x x))");
sema.writeFile("/main.sema", '(import "/lib/math") (square 7)');

// Run the script
sema.evalGlobal('(load "/main.sema")'); // => 49
```

### Building a File Browser

```js
sema.mkdir("/src");
sema.writeFile("/src/app.sema", "(println \"hello\")");
sema.writeFile("/src/utils.sema", "(define pi 3.14)");
sema.writeFile("/README.md", "# My Project");

sema.listFiles("/");              // ["README.md", "src"]
sema.listFiles("/src");           // ["app.sema", "utils.sema"]
sema.isDirectory("/src");         // true
sema.fileExists("/src/app.sema"); // true
```

### Reading Files Back

```js
const source = sema.readFile("/src/app.sema"); // "(println \"hello\")"
const missing = sema.readFile("/nope");        // null
```

### Quota Management

```js
const stats = sema.vfsStats();
// { files: 3, bytes: 62, maxFiles: 256, maxBytes: 16777216, maxFileBytes: 1048576 }

// Clear everything
sema.resetVFS();
sema.vfsStats(); // { files: 0, bytes: 0, ... }
```

Quotas: 1 MB per file, 16 MB total, 256 files max.

## VFS Persistence

By default, VFS files live only in WASM memory and are lost on page reload.
To persist files across sessions, pass a **VFS backend** when creating the interpreter:

```js
import { SemaInterpreter, IndexedDBBackend } from "@sema-lang/sema";

const sema = await SemaInterpreter.create({
  vfs: new IndexedDBBackend({ namespace: "my-project" }),
});

// Sema code can read/write files as usual
await sema.evalStrAsync('(file/write "/config.json" "{\\"theme\\": \\"dark\\"}")');

// Persist current VFS state to the backend
await sema.flushVFS();

// On next page load, files are automatically restored via hydrate()
```

### Built-in Backends

Four backends ship with the `@sema-lang/sema` package:

| Backend | Import | Persistence | Limit |
| --- | --- | --- | --- |
| `MemoryBackend` | `@sema-lang/sema` | None — lost on reload | WASM quota only |
| `LocalStorageBackend` | `@sema-lang/sema` | Across page loads, per origin | ~5–10 MB |
| `SessionStorageBackend` | `@sema-lang/sema` | Within the current tab | ~5–10 MB |
| `IndexedDBBackend` | `@sema-lang/sema` | Across page loads, per origin | Hundreds of MB |

::: tip Choosing a Backend
Use **`IndexedDBBackend`** for production apps — it handles large file sets and doesn't compete with other localStorage usage. Use **`LocalStorageBackend`** for quick prototypes. Use **`MemoryBackend`** (or no backend) when persistence isn't needed.
:::

### Backend Options

All backends accept a `namespace` option to isolate storage:

```js
// Two interpreters with separate persistent storage
const editor = await SemaInterpreter.create({
  vfs: new IndexedDBBackend({ namespace: "editor-files" }),
});

const preview = await SemaInterpreter.create({
  vfs: new IndexedDBBackend({ namespace: "preview-files" }),
});
```

### Flush and Reset

```js
// Persist VFS to the backend (call after eval)
await sema.flushVFS();

// Clear both VFS memory and persistent storage
await sema.resetVFSAndBackend();
```

### Custom Backends

Implement the `VFSBackend` interface to persist files anywhere — a remote API, a service worker cache, or a custom database:

```ts
import type { VFSBackend, VFSHost } from "@sema-lang/sema";

class CloudBackend implements VFSBackend {
  async init() {
    // Open connections, authenticate, etc.
  }

  async hydrate(host: VFSHost) {
    // Fetch files from your API and write them into the WASM VFS
    const files = await fetch("/api/files").then(r => r.json());
    for (const { path, content } of files) {
      host.writeFile(path, content);
    }
  }

  async flush(host: VFSHost) {
    // Read files from the WASM VFS and upload them
    const paths = host.listFiles("/");
    for (const name of paths) {
      const content = host.readFile("/" + name);
      if (content !== null) {
        await fetch("/api/files/" + name, {
          method: "PUT",
          body: content,
        });
      }
    }
  }

  async reset() {
    await fetch("/api/files", { method: "DELETE" });
  }
}

const sema = await SemaInterpreter.create({
  vfs: new CloudBackend(),
});
```

::: info VFSHost API
The `host` object passed to `hydrate()` and `flush()` provides these methods:

* `readFile(path)` → `string | null`
* `writeFile(path, content)` — write or overwrite a file
* `deleteFile(path)` → `boolean`
* `mkdir(path)` — create directories recursively
* `listFiles(dir)` → `string[]` — list entries in a directory
* `fileExists(path)` → `boolean`
* `isDirectory(path)` → `boolean`
* `resetVFS()` — clear all files and directories
:::

## Real-World Example: User-Scriptable Web App

A web application that lets users write Sema scripts to customize behavior. The host app evaluates user scripts and uses the results:

### HTML

```html

```

### JavaScript

```js
import init, { SemaInterpreter } from '@sema-lang/sema-wasm';

await init();
const interp = new SemaInterpreter();

// Preload sample data
interp.evalGlobal(`
  (define sample-data
    (list {:name "Alice" :score 85}
          {:name "Bob" :score 42}
          {:name "Carol" :score 91}
          {:name "Dave" :score 33}))
`);

document.getElementById('run').addEventListener('click', () => {
  const script = document.getElementById('script').value;

  // Load user's function definition
  const loadResult = interp.evalGlobal(script);
  if (loadResult.error) {
    document.getElementById('output').textContent = `Error: ${loadResult.error}`;
    return;
  }

  // Call the user's transform function with our data
  const result = interp.evalGlobal('(transform sample-data)');
  if (result.error) {
    document.getElementById('output').textContent = `Error: ${result.error}`;
  } else {
    document.getElementById('output').textContent = result.value;
    // Output: ({:label "Alice (85)" :name "Alice" :score 85}
    //          {:label "Carol (91)" :name "Carol" :score 91})
  }
});
```

## Multiple Interpreters

Each `SemaInterpreter` instance has fully isolated state — its own environment, virtual filesystem, module cache, and eval-step counter:

```js
const interpA = new SemaInterpreter();
const interpB = new SemaInterpreter();

interpA.evalGlobal('(define x 1)');
interpB.evalGlobal('(define x 2)');

interpA.evalGlobal('x').value; // "1"
interpB.evalGlobal('x').value; // "2"
```

## CDN Usage

### jsdelivr

```
https://cdn.jsdelivr.net/npm/@sema-lang/sema-wasm/sema_wasm.js
https://cdn.jsdelivr.net/npm/@sema-lang/sema-wasm/sema_wasm_bg.wasm
```

### unpkg

```
https://unpkg.com/@sema-lang/sema-wasm/sema_wasm.js
https://unpkg.com/@sema-lang/sema-wasm/sema_wasm_bg.wasm
```

### Complete HTML Page

```html
Sema Playground
```

## Limitations

Compared to the [Rust embedding API](./embedding), the WASM/JavaScript embedding has these differences:

| Feature | Rust | JavaScript (WASM) |
| --- | --- | --- |
| Filesystem | Real filesystem | In-memory VFS with pluggable persistence (1 MB/file, 16 MB total) |
| Shell access | `(shell ...)` works | Not available |
| Native functions | `register_fn` with native Rust closures | `registerFunction` with native JS value args |
| LLM functions | Full provider support | Not available in browser |
| HTTP | Synchronous (reqwest) | Async via `fetch()` (CORS restrictions apply) |
| Sandbox/Caps | Fine-grained capability control | Browser sandbox, plus `deny` options for `network`, `fs-read`, and `fs-write` |
| Threading | Single-threaded (`Rc`) | Single-threaded (WASM) |
| Eval step limit | Unlimited by default | 10M steps (prevents tab freezes) |
| `stdin` / `io/read-line` | Works | Not available |

### Workarounds

* **No LLM**: Use JavaScript to call LLM APIs, then pass results into Sema via `registerFunction` or `evalGlobal`.
* **Persistence**: Use a VFS backend (`IndexedDBBackend`, `LocalStorageBackend`) to persist files across page reloads. See [VFS Persistence](#vfs-persistence).
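
For the `evalGlobal` route in the first workaround, the JavaScript result has to be embedded in Sema source as a string literal. Here is a minimal sketch, assuming Sema string literals accept `\\` and `\"` escapes (the `\"` form appears in the Virtual Filesystem example; other escapes, such as newlines, are not handled). `semaStringLiteral` and `defineFromJs` are hypothetical helper names, not part of the package:

```js
// Hypothetical helper: wrap a JS string as a Sema string literal.
// Assumption: backslash and double quote are escaped with a backslash.
function semaStringLiteral(text) {
  return '"' + text.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Build a (define ...) form that binds `name` to the text.
// `name` is inserted as-is, so it must already be a valid Sema symbol.
function defineFromJs(name, text) {
  return `(define ${name} ${semaStringLiteral(text)})`;
}

// Example: text that came back from an LLM call made in JavaScript.
const source = defineFromJs('llm-answer', 'She said "hi"');
console.log(source); // (define llm-answer "She said \"hi\"")
```

The resulting string can then be passed to `interp.evalGlobal(source)`, after which Sema code can refer to `llm-answer`.
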

## API Reference

| Type / Method | Description |
| --- | --- |
| `init(wasmUrl?)` | Initialize the WASM module. Call once before creating interpreters. Pass URL when using CDN. |
| `SemaInterpreter` | Interpreter instance with isolated state |
| `new SemaInterpreter()` | Create an interpreter with full stdlib and 10M step limit |
| `SemaInterpreter.createWithOptions(opts)` | Create with options: `{ stdlib, deny }`. Use `deny` to restrict capabilities. |
| `interp.eval(code)` | Evaluate in a child env (definitions don't persist). Returns a JS object `{ value, output, error }`. |
| `interp.evalGlobal(code)` | Evaluate in global env (definitions persist). Returns a JS object `{ value, output, error }`. |
| `interp.evalAsync(code)` | Async eval with HTTP support. Returns a `Promise`. |
| `interp.evalVM(code)` | Evaluate via the bytecode VM (alias of `eval`). Returns a JS object `{ value, output, error }`. |
| `interp.evalVMAsync(code)` | Async VM eval with HTTP support (alias of `evalAsync`). Returns a `Promise`. |
| `interp.registerFunction(name, fn)` | Register a JS function callable from Sema. Args passed as native JS values. |
| `interp.preloadModule(name, source)` | Inject a virtual module for `(import "name")`. Returns `{ ok, error }`. |
| `interp.readFile(path)` | Read a VFS file. Returns string or null. |
| `interp.writeFile(path, content)` | Write a file to the VFS. |
| `interp.deleteFile(path)` | Delete a VFS file. Returns boolean. |
| `interp.listFiles(dir)` | List entries in a VFS directory. |
| `interp.fileExists(path)` | Check if path exists in VFS. |
| `interp.mkdir(path)` | Create a directory in the VFS. |
| `interp.isDirectory(path)` | Check if path is a directory. |
| `interp.vfsStats()` | Get VFS usage stats (files, bytes, quotas). |
| `interp.resetVFS()` | Clear all VFS state. |
| `interp.flushVFS()` | Persist VFS to the configured backend. Returns `Promise`. |
| `interp.resetVFSAndBackend()` | Clear VFS and persistent backend. Returns `Promise`. |
| `MemoryBackend` | Ephemeral VFS backend — no persistence |
| `LocalStorageBackend` | Persist VFS to `localStorage` (~5–10 MB limit) |
| `SessionStorageBackend` | Persist VFS to `sessionStorage` (per-tab) |
| `IndexedDBBackend` | Persist VFS to IndexedDB (recommended for production) |
| `VFSBackend` | Interface for custom backends: `{ init?, hydrate, flush, reset? }` |
| `VFSHost` | Host bridge passed to backends: `{ readFile, writeFile, deleteFile, mkdir, listFiles, fileExists, isDirectory, resetVFS }` |
| `interp.version()` | Returns the Sema version string |
| `EvalResult` | `{ value: string \| null, output: string[], error: string \| null }` |
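
Because errors arrive in the `error` field rather than as exceptions, host code that prefers `try`/`catch` can wrap the result. The sketch below is a hypothetical helper (`unwrapResult` and `SemaEvalError` are not part of the package) that works on any object with the `EvalResult` shape:

```js
// Hypothetical error type that carries the captured output lines.
class SemaEvalError extends Error {
  constructor(message, output) {
    super(message);
    this.name = 'SemaEvalError';
    this.output = output;
  }
}

// Return `value` on success, or throw if `error` is set.
function unwrapResult(result) {
  if (result.error !== null) {
    throw new SemaEvalError(result.error, result.output);
  }
  return result.value;
}

// Plain objects stand in for real eval results here.
console.log(unwrapResult({ value: '6', output: [], error: null })); // 6
```

With a real interpreter this would be called as `unwrapResult(interp.evalGlobal(code))`.
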

---

---
url: 'https://sema-lang.com/brand.md'
---


---

---
url: 'https://sema-lang.com/docs/for-agents.md'
description: 'Everything that differs from other Lisps, in one page — for AI coding agents.'
---

# Sema for LLM Agents

If you already know a Lisp, this page is **everything that's different about Sema** — read it
and you can write correct Sema without ingesting the full reference. It's deliberately
terse. When you need detail, the full per-page docs are indexed at
[`/llms.txt`](/llms.txt) — fetch only the specific `/docs/**/*.md` page you need on demand
(e.g. `/docs/llm/tools-agents.md`). Do **not** load `/llms-full.txt` (the whole-docs
concatenation, ~200k tokens) into context.

## Install & run

```bash
curl -fsSL https://sema-lang.com/install.sh | sh   # or: brew install helgesverre/tap/sema-lang
                                                   # or: cargo install sema-lang
sema script.sema          # run a file
sema -e '(println "hi")'  # eval an expression
sema                      # start the REPL
```

## What Sema is

A **Scheme core** with a **Clojure-flavored surface** and **first-class LLM/agent
primitives**, compiled to a NaN-boxed bytecode VM. **Single-threaded** (reference-counted,
no shared-memory threads). Implemented in Rust; embeddable as a crate; runs in the browser
via WASM.

## Syntax you may not expect

```sema
:keyword                  ; Clojure-style keyword (self-evaluating; also a getter)
{:a 1 :b 2}               ; map literal (sorted; iteration order is deterministic)
[1 2 3]                   ; vector literal (distinct from a list)
(:name person)            ; keywords are functions: same as (get person :name)
#(* % %)                  ; short lambda; %, %1, %2 … are positional args
f"hi ${name}, ${(+ 1 2)}" ; f-string interpolation
#"\d+"                    ; regex literal (raw; no escape doubling)
```

## Naming conventions (the #1 thing to get right)

* **New functions are slash-namespaced:** `file/read`, `path/join`, `string/split`,
  `regex/match?`, `http/get`, `json/encode`. Do **not** guess `read-file` or `split-string`.
* **Predicates end in `?`:** `null?`, `list?`, `empty?`, `file/exists?`.
* **Conversions use `->`:** `string->symbol`, `keyword->string`, `list->vector`.
* **Legacy Scheme names are kept** as aliases for a few string ops: `string-append`, `string-length`,
  `string-ref`, `substring`. The slash-namespaced forms (such as `string/append`) also work.

## Semantics that bite

* **Truthiness:** only `#f` and `nil` are falsy. `0`, `""`, and the empty list `()` are all
  **truthy**. (Unlike Common Lisp, where `()` is false.)
* **Lists are vector-backed**, not cons cells: a reference-counted (`Rc`) vector. `nth`/`length` are O(1);
  `cons`/`append` are O(n) copies. `car`/`first` and `cdr`/`rest` exist but it's an array
  underneath — prefer `map`/`filter`/`fold` and `vector` for hot paths.
* **Mutable state is `define` + `set!`** — there is **no** Clojure `atom`/`swap!`/`reset!`.
  ```sema
  (define counter 0)
  (set! counter (+ counter 1))
  ```
* **Two map types:** `{:k v}` literals are sorted `BTreeMap`s (deterministic, usable as keys);
  `(hashmap/new)` is a faster unordered hash map. Access with `(get m :k)` or `(:k m)`.
* **Errors** are raised with `throw` and caught with `try`/`catch`; a caught error is a
  structured map with `:type`, `:message`, and `:stack-trace`.
* **Equality:** `=` is numeric (`(= 1 1.0)` → `#t`); `eq?`/`equal?` are structural.
* **Definitions & functions:** `define` for bindings; `lambda` (alias `fn`) for anonymous
  functions; `defun`/`defn` are sugar for `(define name (lambda …))`. `let`/`let*`/`letrec`
  for locals.
* **Tail calls are optimized** — deep recursion in tail position won't overflow.

## LLM providers (configure one first)

LLM calls need a provider, and **Sema auto-configures every provider it finds an API key
for** in the environment on startup — so the only setup is exporting a key:

| Provider | Env var | Default model |
| --- | --- | --- |
| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-6` |
| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` |
| Google Gemini | `GOOGLE_API_KEY` | `gemini-3.5-flash` |
| Groq · xAI · Mistral · Moonshot | `GROQ_API_KEY` · `XAI_API_KEY` · `MISTRAL_API_KEY` · `MOONSHOT_API_KEY` | per provider |
| Ollama (local, no key) | `OLLAMA_HOST` (default `localhost:11434`) | `gemma4` |

The first configured provider becomes the default. Switch at runtime with
`(llm/set-default :openai)`, force one via `SEMA_CHAT_PROVIDER` / `SEMA_CHAT_MODEL`, or check
the active one with `(llm/current-provider)`. Embeddings use their own providers
(Jina / Voyage / Cohere — see `/docs/llm/embeddings.md`).

> **The #1 first stumble:** a pinned `:model` must belong to the **active** provider.
> `(llm/complete "hi" {:model "gpt-5.5"})` fails with a 404 if Anthropic is the default —
> switch first with `(llm/set-default :openai)`. The simplest call **omits `:model`** and
> uses the active provider's default model.

## What's unique to Sema (why it exists)

LLM/agent operations are language primitives, not a bolted-on SDK:

```sema
;; With an API key in the env this just works — no :model means "active provider's default":
(llm/complete "Summarize this in one sentence." {:max-tokens 100})

(deftool get-weather "Get weather" {:city {:type :string}}
  (lambda (city) (format "{\"temp\": 22}")))
(define bot (agent {:tools [get-weather]}))   ; omit :model to use the default
(agent/run bot "Weather in Oslo?")            ; multi-turn tool loop
```

* **Prompts/messages/conversations** are first-class immutable values (`prompt`, `message`,
  `conversation/*`), not string templates.
* **Structured output:** `llm/extract` (schema-validated) and `llm/classify`.
* **Embeddings + an in-memory vector store** for semantic search / RAG (`llm/embed`,
  `vector-store/*`).
* **Cassettes** record/replay LLM calls to a file for keyless, deterministic tests
  (`llm/with-cassette`).
* **Observability:** built-in OpenTelemetry tracing + metrics (GenAI conventions), off by
  default.
* **Cost & resilience:** budgets (`llm/with-budget`), response caching, fallback chains, and
  retry with backoff — all built in.
* **Concurrency** is a deterministic *cooperative* scheduler (single-threaded): `async`/`await`
  and channels, not OS threads. (Determinism is the same property cassettes give to LLM I/O.)

## Where to look next

* **Index of every page:** [`/llms.txt`](/llms.txt) — fetch a specific `/docs/**/*.md` when
  you need detail (e.g. `/docs/llm/tools-agents.md`, `/docs/stdlib/strings.md`).
* **Everything in one file:** [`/llms-full.txt`](/llms-full.txt) (large — not meant to be
  read whole).
* **Term definitions:** [Glossary](/docs/internals/glossary).