KaleidoMind — Sovereign AI for sovereign money

What it is

A wallet you can talk to

Ask it to "buy 100 USDT," "swap some BTC," "pay Alice 5,000 sats," or "find somewhere to spend Bitcoin nearby" — and it runs the whole multi-L2 workflow on-device: picking the right layer, pricing the trade, onboarding a channel when you need one, and reading the action back before it touches your money. An assistant first — voice-native, context-aware, and entirely yours.

Local-first

No hosted AI

Every LLM, embedding, speech-to-text and text-to-speech call runs through the QVAC SDK — on-device, or on an explicitly paired desktop you control.

Structural safety

Confirm before spend

Every fund-moving tool is marked requiresConfirmation in the contract. The engine pauses for the host's confirm sheet, so the model cannot bypass the gate.
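A minimal sketch of how such a gate can live in the engine rather than in the prompt. Only the requiresConfirmation flag comes from the text above; `ToolSpec`, `startCall`, `resume` and the tool names are illustrative assumptions, not the actual @kaleidorg/mind API.

```typescript
// Hypothetical shapes for illustration; not the real @kaleidorg/mind types.
type Args = Record<string, unknown>;

interface ToolSpec {
  name: string;
  requiresConfirmation: boolean; // the flag named in the contract
  run: (args: Args) => string; // stand-in for the real handler
}

type CallState =
  | { status: "done"; result: string }
  | { status: "awaiting_confirmation"; tool: ToolSpec; args: Args }
  | { status: "rejected"; reason: string };

// The engine, not the model, decides whether a call may run immediately.
function startCall(tool: ToolSpec, args: Args): CallState {
  if (tool.requiresConfirmation) {
    return { status: "awaiting_confirmation", tool, args };
  }
  return { status: "done", result: tool.run(args) };
}

// Only the host's confirm sheet can resume a paused call.
function resume(state: CallState, approved: boolean): CallState {
  if (state.status !== "awaiting_confirmation") return state;
  if (!approved) return { status: "rejected", reason: "user declined" };
  return { status: "done", result: state.tool.run(state.args) };
}

// Example tools (names and values are made up for this sketch).
const getBalance: ToolSpec = {
  name: "example_get_balance",
  requiresConfirmation: false,
  run: () => "21000 sats",
};
const sendPayment: ToolSpec = {
  name: "example_send_payment",
  requiresConfirmation: true,
  run: (args) => `sent ${String(args["amountSats"])} sats to ${String(args["to"])}`,
};

const read = startCall(getBalance, {}); // runs straight away
const paused = startCall(sendPayment, { to: "alice", amountSats: 5000 }); // pauses
const confirmed = resume(paused, true); // handler runs only now
const declined = resume(paused, false); // handler never runs
```

Because the paused state carries the resolved tool and arguments, the same object can drive the readback the user sees before approving.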

Portable

One tool contract

Identical tool names and schemas everywhere; only execution differs. Skills are portable across phone and laptop, and benchmarks stay honest.

How it stays reliable

Trustworthy on a phone-sized model

An assistant on the critical path of real money has to be reliable on a small on-device model. KaleidoMind earns that with a tiered design in which most requests never reach the model at all. Each tier spends only the inference it has to: instant reads, deterministic recipes for known workflows, and a skill-scoped agent loop for the genuinely open-ended.

T0

Fast path — 0 inferences

"balance" · "address" · "btc price" resolve instantly with zero model calls.

T2

Recipe — ~1 inference, deterministic, confirm-gated

"pay bob 3 EUR" · "buy 100 USDT" — a skill carries the plan; the model only fills the slots. Payments, swaps, atomic swaps and LSPS1 channel orders run as deterministic chains.

T1

Agentic loop — skill-scoped model

Everything novel. Hard chains can P2P-delegate to a paired desktop's larger model. Discovery flows (e.g. merchant search) intentionally lean on more model reasoning.
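The tiers above amount to a routing decision made before any model call. The sketch below shows one way to express it; the patterns, intent names and return shape are assumptions for illustration, and in the design described here the model would still fill a recipe's slots after routing.

```typescript
// Illustrative only: patterns, names and the Route shape are assumptions.
type Route =
  | { tier: "fast_path"; intent: string }
  | { tier: "recipe"; recipe: string }
  | { tier: "agent" };

// Fast path: exact, read-only intents answered with zero model calls.
const FAST_PATHS = new Map<string, string>([
  ["balance", "get_balance"],
  ["address", "get_address"],
  ["btc price", "get_btc_price"],
]);

// Recipes: known workflows recognised by their shape.
const RECIPES: Array<{ name: string; pattern: RegExp }> = [
  { name: "pay", pattern: /^pay\s+\S+\s+[\d.,]+\s+\w+$/ },
  { name: "buy", pattern: /^buy\s+[\d.,]+\s+\w+$/ },
];

function route(input: string): Route {
  const text = input.trim().toLowerCase();
  const intent = FAST_PATHS.get(text);
  if (intent !== undefined) return { tier: "fast_path", intent };
  const recipe = RECIPES.find((r) => r.pattern.test(text));
  if (recipe) return { tier: "recipe", recipe: recipe.name };
  return { tier: "agent" }; // everything novel falls through to the loop
}

const balanceRoute = route("balance"); // fast path
const payRoute = route("pay bob 3 EUR"); // recipe "pay"
const searchRoute = route("find somewhere to spend Bitcoin nearby"); // agent
```

Ordering the checks from cheapest to most expensive is what keeps most requests away from the model.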

One engine, three surfaces

The same agent, everywhere

@kaleidorg/mind is the shared core. Each surface differs only in how tools execute and how the user reaches them.

Desktop app

An RGB/Lightning trading wallet whose funds live on a local RGB Lightning Node (RLN) the app runs and unlocks. It hosts the engine as a namespaced MCP + CLI, drives the node via rln_* tools, manages the QVAC model lifecycle, and doubles as the paired inference peer a phone delegates to.

Mobile — Rate

A public React Native wallet on a physical iPhone. Spark, RGB/Lightning and Arkade tools run in-process via WDK adapters; QVAC drives local inference, Whisper STT and TTS — all behind a voice-first confirmation gate.

Autonomous agent

A risk-gated optimizer — task scheduling, run logs and optimizer skills — that drives the engine without a human in every loop. Spend tools still pause for confirmation.

Inside kaleido-mind

An engine, a CLI, and the hosts that ship it

The repository is a small monorepo. @kaleidorg/mind is the engine; everything else drives it or ships it to a device. Two sibling repos — Rate (mobile) and the desktop app — bind the exact same contract.

packages/core

The engine — @kaleidorg/mind

Wallet tool contract, recipe engine, tiered funnel, fast paths, skills, memory + RAG, on-device knowledge and pluggable QVAC providers. One source of truth, consumed identically on every device.

apps/cli · kaleido-mind

The CLI — develop on your Mac

An interactive terminal harness that runs the full agent — QVAC, skills, memory, RAG and live wallet tools — on a laptop, so the whole stack is testable before it ever reaches a phone. It installs on-device models, runs the product eval, and serves an eval dashboard.

apps/provider

Desktop sidecar (Tauri)

Hosts the engine as a namespaced MCP + CLI, and acts as the paired, user-controlled QVAC inference peer a phone delegates heavy reasoning to over P2P.

apps/playground

Playground

Exercise the engine against a real local model with no phone in the loop — the fastest way to watch the funnel and recipes behave.

kaleido-mind setup          # guided model install, sized to your hardware
kaleido-mind run --rag      # chat with the full local agent
kaleido-mind product-eval   # production-funnel scenarios, outcome-graded
kaleido-mind serve          # browse + trigger eval runs in a dashboard

How the agent calls tools

Function calling, skills, and one contract

Three pieces decide how a request becomes a wallet action. They work the same way on mobile and desktop — only the transport underneath changes.

Function calling

Native, structured, looped

QVAC emits real structured tool calls — not parsed text. The engine validates the arguments against the contract schema, runs the tool, feeds the result back, and loops until the model has an answer. Spend tools pause for confirmation mid-loop.
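The validate, run, feed back, repeat cycle can be sketched as follows. This is a simplified stand-in under stated assumptions: the model is replaced by a scripted stub, the schema is reduced to required argument types, the confirmation pause is omitted, and none of the names are the real engine API.

```typescript
// All names are hypothetical; the model is replaced by a scripted stub.
type Args = Record<string, unknown>;
type ModelTurn =
  | { kind: "tool_call"; name: string; args: Args }
  | { kind: "answer"; text: string };

interface Tool {
  required: Record<string, "string" | "number">; // tiny stand-in for a schema
  run: (args: Args) => string;
}

// Returns an error message, or null when the arguments satisfy the schema.
function validate(tool: Tool, args: Args): string | null {
  for (const [key, type] of Object.entries(tool.required)) {
    if (typeof args[key] !== type) return `argument "${key}" must be a ${type}`;
  }
  return null;
}

function runLoop(
  model: (history: string[]) => ModelTurn,
  tools: Map<string, Tool>,
  prompt: string,
  maxTurns = 4,
): string {
  const history = [`user: ${prompt}`];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = model(history);
    if (step.kind === "answer") return step.text;
    const tool = tools.get(step.name);
    let output: string;
    if (!tool) output = `unknown tool ${step.name}`;
    else output = validate(tool, step.args) ?? tool.run(step.args);
    // Errors are fed back too, so the model can correct itself next turn.
    history.push(`tool(${step.name}): ${output}`);
  }
  return "stopped: turn limit reached";
}

const tools = new Map<string, Tool>([
  ["example_get_price", { required: { pair: "string" }, run: (a) => `quote for ${String(a["pair"])}` }],
]);

// Scripted stub: one call with a badly typed argument, one corrected call, then an answer.
const script: ModelTurn[] = [
  { kind: "tool_call", name: "example_get_price", args: { pair: 42 } },
  { kind: "tool_call", name: "example_get_price", args: { pair: "BTC/USD" } },
];
let cursor = 0;
const model = (history: string[]): ModelTurn => {
  const next = script[cursor++];
  if (next) return next;
  return { kind: "answer", text: history[history.length - 1] ?? "" };
};

const answer = runLoop(model, tools, "btc price?");
```

The turn limit is a design choice of this sketch: a small model that keeps emitting invalid calls should stop rather than loop forever.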

Skills

Scope tools, carry recipes

A skill (an Agent-Skills-spec SKILL.md) is a playbook: it allowlists just the tools a job needs and carries the recipe. Entering a skill narrows what the model sees — fewer schemas in a small window, fewer ways to go wrong.
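As a rough illustration of the narrowing step, the sketch below filters the tool list by a skill's allowlist. The `Skill` shape and the tool names are hypothetical; only the spark_, rln_ and arkade_ prefixes echo the namespaces used elsewhere on this page.

```typescript
// Hypothetical types; a parsed SKILL.md might be reduced to something similar.
interface ToolSchema {
  name: string;
  description: string;
}
interface Skill {
  name: string;
  allowedTools: string[];
}

// Only allowlisted schemas are shown to the model while the skill is active.
function visibleTools(all: ToolSchema[], skill: Skill | null): ToolSchema[] {
  if (skill === null) return all;
  const allowed = new Set(skill.allowedTools);
  return all.filter((t) => allowed.has(t.name));
}

const ALL_TOOLS: ToolSchema[] = [
  { name: "spark_example_pay", description: "made-up payment tool" },
  { name: "rln_example_open_channel", description: "made-up channel tool" },
  { name: "arkade_example_balance", description: "made-up balance tool" },
];
const paySkill: Skill = { name: "example-pay", allowedTools: ["spark_example_pay"] };

const scoped = visibleTools(ALL_TOOLS, paySkill); // one schema instead of three
```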

One contract

Many transports

Tool names and schemas are defined once in core. The host binds them to a transport — only bindXxxTools(handlers) differs. The model sees identical tools everywhere, so skills are portable and the eval is honest.

	Mobile — Rate	Desktop app
Tool transport	in-process WDK adapters	namespaced MCP (`kaleido-mcp`) + CLI
Inference	on-device QVAC, or delegate to a paired desktop	on-device QVAC
Where the tool handler runs	on the phone — always	on the desktop, against its local RLN node
Function calling + skills	identical — same names, schemas and loop

The evaluation binds the same contract to deterministic stateful simulators — so a benchmark exercises the exact tools the apps run.
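A minimal sketch of the "define once, bind per host" idea, shown here with a stateful simulator binding. `CONTRACT`, `bindTools` and the tool names are assumptions standing in for the real bindXxxTools(handlers) functions, whose signatures are not shown on this page.

```typescript
// Sketch only: CONTRACT, bindTools and the tool names are assumptions.
type Args = Record<string, unknown>;
type Handler = (args: Args) => string;

interface ToolSchema {
  name: string;
  params: string[];
}
interface BoundTool extends ToolSchema {
  run: Handler;
}

// Defined once: names and schemas never change per host.
const CONTRACT: ToolSchema[] = [
  { name: "example_get_balance", params: [] },
  { name: "example_pay", params: ["to", "amountSats"] },
];

// Each host supplies handlers; binding fails loudly if one is missing.
function bindTools(handlers: Record<string, Handler>): BoundTool[] {
  return CONTRACT.map((schema) => {
    const run = handlers[schema.name];
    if (!run) throw new Error(`no handler bound for ${schema.name}`);
    return { ...schema, run };
  });
}

// Simulator binding: deterministic, stateful, no network.
let simBalance = 10000;
const simulated = bindTools({
  example_get_balance: () => `${simBalance} sats`,
  example_pay: (a) => {
    simBalance -= Number(a["amountSats"]);
    return "ok";
  },
});

// A phone or desktop host would pass different handlers but expose the same names.
const boundNames = simulated.map((t) => t.name);
```

Since the schemas come from one array, a benchmark run against the simulator sees the same tool list a real host would.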

Delegation can't drain your wallet. When a phone delegates inference, only the prompt and tool schemas go to the desktop, which decides which tool to call and with what arguments. The tool handler runs back on the phone, which validates, confirms, signs and broadcasts. Compromising the desktop never exposes a key.
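The split described above can be sketched as two functions on the phone side: one that builds what leaves the device, and one that treats the desktop's proposal as untrusted input. Field names and the tool name are assumptions, not the actual wire format, and the confirm, sign and broadcast steps are left out.

```typescript
// Illustrative shapes; field names are assumptions, not the wire format.
interface ToolSchema {
  name: string;
  params: string[];
}
interface LocalTool extends ToolSchema {
  run: (args: Record<string, unknown>) => string; // stays on the phone
}
interface DelegationRequest {
  prompt: string;
  tools: ToolSchema[];
}
interface ProposedCall {
  name: string;
  args: Record<string, unknown>;
}

// Copy schema fields explicitly so handlers and secrets cannot leak through.
function buildRequest(prompt: string, tools: LocalTool[]): DelegationRequest {
  return {
    prompt,
    tools: tools.map(({ name, params }) => ({ name, params: [...params] })),
  };
}

// The desktop's proposal is untrusted: the phone re-checks it before anything runs.
function acceptProposal(call: ProposedCall, tools: LocalTool[]): LocalTool | null {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) return null;
  const hasAllParams = tool.params.every((p) => p in call.args);
  return hasAllParams ? tool : null;
}

const phoneTools: LocalTool[] = [
  { name: "example_pay", params: ["to", "amountSats"], run: () => "signed and broadcast on the phone" },
];
const request = buildRequest("pay alice 5,000 sats", phoneTools);
const accepted = acceptProposal({ name: "example_pay", args: { to: "alice", amountSats: 5000 } }, phoneTools);
const refused = acceptProposal({ name: "example_drain", args: {} }, phoneTools);
```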

Mobile track — Rate

Built to run on a phone

On Rate the whole agent lives on the device — inference, voice and wallet execution. The hard part isn't features, it's making a tiny model reliable in a small window. Here's how.

On-device execution

WDK adapters, no round-trip

The model emits a canonical tool call — spark_*, rln_*, arkade_* — and the host runs it through an in-process WDK adapter against the right L2. Tool execution is local code; only the chain backends are network calls.

Voice mode

A real hands-free loop

QVAC's Whisper VAD transcribes raw mic frames, the engine reasons and picks tools, and on-device TTS speaks the reply — listening → thinking → speaking, mic-gated during playback. The confirm readback is spoken, so you hear a spend before it happens.
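The listening, thinking, speaking cycle is a small state machine. The sketch below is one possible encoding; the event names are assumptions, and the real loop would be driven by QVAC callbacks rather than string events.

```typescript
// State and event names are assumptions made for this sketch.
type VoiceState = "listening" | "thinking" | "speaking";
type VoiceEvent = "utterance_end" | "reply_ready" | "playback_end";

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  if (state === "listening" && event === "utterance_end") return "thinking";
  if (state === "thinking" && event === "reply_ready") return "speaking";
  if (state === "speaking" && event === "playback_end") return "listening";
  return state; // ignore events that do not apply to the current state
}

// Mic frames are only accepted while listening, so TTS playback is never transcribed.
function micOpen(state: VoiceState): boolean {
  return state === "listening";
}

const afterSpeech = nextState("listening", "utterance_end"); // "thinking"
const whileTalking = nextState(afterSpeech, "reply_ready"); // "speaking"
const backToStart = nextState(whileTalking, "playback_end"); // "listening"
```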

Brain modes

Local, delegated, or auto

Auto delegates heavy turns to a paired desktop when reachable; Always local never leaves the phone (privacy-max); Always desktop prefers the bigger model. A separate thinking-mode control trades latency for depth.
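The three modes reduce to a small decision function. This is a sketch under assumptions: the mode identifiers are made up, how a turn is judged "heavy" is left to the host, and falling back to local inference when the desktop is unreachable in "always desktop" mode is a guess based on the word "prefers".

```typescript
// Identifiers are illustrative; only the three modes come from the text.
type BrainMode = "auto" | "always_local" | "always_desktop";
type Brain = "local" | "desktop";

interface TurnContext {
  desktopReachable: boolean;
  heavy: boolean; // however the host classifies a heavy turn
}

function chooseBrain(mode: BrainMode, ctx: TurnContext): Brain {
  switch (mode) {
    case "always_local":
      return "local"; // never leaves the phone
    case "always_desktop":
      // Assumed fallback: use the phone when no paired desktop answers.
      return ctx.desktopReachable ? "desktop" : "local";
    case "auto":
      return ctx.heavy && ctx.desktopReachable ? "desktop" : "local";
  }
}

const heavyOnline = chooseBrain("auto", { desktopReachable: true, heavy: true }); // "desktop"
const heavyOffline = chooseBrain("auto", { desktopReachable: false, heavy: true }); // "local"
```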

Small-window optimization

Make every token count

A hardware-aware context budget orders and trims what enters the prompt, and tool-output compression dedupes and elides bulky tool results before they re-enter history — dependency-free, on-device, and never worse than the original.
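One way to picture tool-output compression is line-level dedupe followed by eliding the middle of long results. The sketch below is not the engine's algorithm; it also reads "never worse than the original" as "never longer than the input", which is an assumption.

```typescript
// A sketch under assumptions: line-level dedupe plus middle elision.
function compressToolOutput(output: string, maxLines = 6): string {
  const seen = new Set<string>();
  const unique = output.split("\n").filter((line) => {
    if (seen.has(line)) return false; // drop exact repeats
    seen.add(line);
    return true;
  });

  let kept = unique;
  if (unique.length > maxLines) {
    const head = unique.slice(0, Math.ceil(maxLines / 2));
    const tail = unique.slice(unique.length - Math.floor(maxLines / 2));
    const omitted = unique.length - head.length - tail.length;
    kept = [...head, `[... ${omitted} lines elided ...]`, ...tail];
  }

  const compressed = kept.join("\n");
  // Guard: fall back to the input if compression did not actually shrink it.
  return compressed.length < output.length ? compressed : output;
}

// Synthetic tool result: ten distinct lines, trimmed to four plus a marker.
const bulky = Array.from({ length: 10 }, (_, i) => `utxo-${i} value=1000 sats`).join("\n");
const trimmed = compressToolOutput(bulky, 4);
```

The final guard matters for short outputs, where the elision marker can be longer than the lines it replaces.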

On-device model catalog

One model loaded at a time, sized to the phone

On a phone the realistic on-device model today is Qwen3 1.7B — it runs comfortably on an iPhone 17, with smaller models for older hardware. Bigger models — Qwen3 4B / 8B and function-call-tuned options (xLAM-2-3B, Hermes-3-Llama-3.2-3B) — run on a paired desktop and reach the phone through delegation. RAM allows one local model at a time. Whisper handles speech-to-text. Next: cross-turn prompt-cache reuse, retrieval-gated tool exposure, and fine-tuned small models that need fewer tokens per call.

Capabilities

What's in the box

A multi-L2 Bitcoin wallet brain — Spark · RLN/RGB · Arkade — built around one hard constraint: don't ask a tiny model to do the slow, weak parts.

Multi-L2 tool contract

Per-layer namespaced tools plus a cross-cutting router — one source of truth in core.

Recipe engine

Deterministic payments, swaps, atomic swaps and channel orders that work on a 0.6B model.

Trading & onboarding

Live-maker quotes, RGB↔BTC atomic swaps, LSPS1 channel orders and Flashnet AMM swaps (Spark-native). "buy 100 USDT" onboards a channel-less user end to end.

Skills

Agent-Skills-spec playbooks that scope tools and carry recipes, bundled for React Native.

Memory + RAG

On-device recall and injected-embedding retrieval, with cheap dedup so memory doesn't bloat.

Hardware-aware

Picks the model and context budget for the device; P2P-delegates heavy work to a paired desktop.

Voice-first readback

The host's confirm sheet reads back the resolved call itself rather than anything the model generated, so unit and recipient mistakes surface at the point where the user can catch them.

Plural tool sources

In-process, MCP, CLI and L402 pay-per-call HTTP — all behind one registry.

Defensible by construction

Receipts, not remembered scores

The headline benchmark is Product Evaluation v3 — twelve realistic scenarios run through the same production Funnel, graded on route, typed arguments, confirmation behavior and observable side effects. No score is claimed without its raw run, exact commit and hardware metadata.
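Outcome grading along those four axes can be sketched as a pure comparison between an expected and an observed record. The record shapes, route labels and side-effect strings are hypothetical; the real harness may grade differently.

```typescript
// Hypothetical record shapes; the real harness may grade differently.
interface Observed {
  route: string;
  args: Record<string, unknown>;
  askedConfirmation: boolean;
  sideEffects: string[];
}
type Expected = Observed;

interface Grade {
  route: boolean;
  args: boolean;
  confirmation: boolean;
  sideEffects: boolean;
  pass: boolean;
}

// Typed comparison: the number 100 and the string "100" are deliberately not equal.
function sameArgs(a: Record<string, unknown>, b: Record<string, unknown>): boolean {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((k) => k in b && a[k] === b[k]);
}

function grade(expected: Expected, observed: Observed): Grade {
  const route = expected.route === observed.route;
  const args = sameArgs(expected.args, observed.args);
  const confirmation = expected.askedConfirmation === observed.askedConfirmation;
  const sideEffects =
    expected.sideEffects.length === observed.sideEffects.length &&
    expected.sideEffects.every((e, i) => e === observed.sideEffects[i]);
  return { route, args, confirmation, sideEffects, pass: route && args && confirmation && sideEffects };
}

const expectedBuy: Expected = {
  route: "recipe",
  args: { asset: "USDT", amount: 100 },
  askedConfirmation: true,
  sideEffects: ["swap_executed"],
};
const goodRun = grade(expectedBuy, { ...expectedBuy });
const wrongType = grade(expectedBuy, { ...expectedBuy, args: { asset: "USDT", amount: "100" } });
```

Keeping each axis as its own boolean makes a failure report say which part of the outcome went wrong, not just that the scenario failed.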

# clean clone → deterministic checks
git clone https://github.com/kaleidoswap/kaleido-mind
cd kaleido-mind
pnpm install --frozen-lockfile
pnpm build && pnpm typecheck && pnpm test

# real, timestamped QVAC evidence
pnpm submission:evidence -- \
  --models qwen3-0.6b,qwen3-1.7b,qwen3-4b

100%

local inference via QVAC SDK

0.6–4B

on-device model footprint

JSONL

per-call telemetry: TTFT, TPS, backend

Apache-2.0

open engine, public audit trail

Model (Q4_K_M)	Time to first token	Throughput	Tool-call accuracy
Qwen 3 · 0.6B	0.79 s	90 tok/s	63%
Qwen 3 · 1.7B	2.5 s	46 tok/s	63%
Qwen 3 · 4B	6.8 s	11 tok/s	71%

A snapshot from one timestamped evidence run on the reference MacBook Air (Apple M4, 24 GB), using the function-calling mechanism. The smallest model answers in under a second; accuracy is preliminary, which is exactly why the funnel keeps the model off the critical path. Raw runs, model hashes and the exact commit live in submission/evidence.
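For readers who want to recompute figures like these from per-call JSONL telemetry, a summary pass might look like the sketch below. The field names are assumptions about the log format, and the two sample records are synthetic values chosen for easy arithmetic, not real measurements.

```typescript
// Field names are assumed; check them against the actual evidence files.
interface CallRecord {
  model: string;
  backend: string;
  ttftMs: number; // time to first token
  tokens: number; // tokens generated in this call
  decodeMs: number; // time spent generating them
}

function parseJsonl(text: string): CallRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as CallRecord);
}

function summarise(records: CallRecord[]): { meanTtftMs: number; tokensPerSecond: number } {
  const n = records.length;
  if (n === 0) return { meanTtftMs: 0, tokensPerSecond: 0 };
  const meanTtftMs = records.reduce((sum, r) => sum + r.ttftMs, 0) / n;
  const tokens = records.reduce((sum, r) => sum + r.tokens, 0);
  const decodeMs = records.reduce((sum, r) => sum + r.decodeMs, 0);
  // Throughput is total tokens over total decode time, not a mean of per-call rates.
  const tokensPerSecond = decodeMs > 0 ? tokens / (decodeMs / 1000) : 0;
  return { meanTtftMs, tokensPerSecond };
}

// Synthetic sample, not taken from any real run.
const sampleLog = [
  '{"model":"example-model","backend":"example","ttftMs":100,"tokens":50,"decodeMs":1000}',
  '{"model":"example-model","backend":"example","ttftMs":300,"tokens":150,"decodeMs":3000}',
].join("\n");

const summary = summarise(parseJsonl(sampleLog));
```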

What's next

Roadmap

The hackathon build is a foundation. Two tracks follow directly from it.

01

A real local-first personal assistant

Deepen the agentic tier into a genuine assistant: long-horizon planning and memory, on-device RAG over wallet history, documents and contacts, and a stronger autonomous mode that proposes and executes multi-step workflows — all under the same host-enforced confirmation gates.

02

A measurement & fine-tuning loop for edge models

Turn the evidence harness into a flywheel: sanitized trace collection, synthetic wallet-task datasets, fine-tuned small models, and a benchmark matrix across function-calling vs. MCP vs. skills and reasoning modes (no-think / short CoT / extended thinking) — so the Funnel routes each request to the cheapest mode that still succeeds.

Honest boundary

What's hackathon work

Both apps existed before the hackathon — as wallets, with no AI, agent or voice features. The desktop app was already an RGB/Lightning trading wallet over a local RLN node; Rate was already a multi-L2 mobile wallet. The entire intelligence layer is the hackathon contribution: the KaleidoMind engine, QVAC inference and P2P delegation, the tiered funnel, recipes, skills, the on-device voice loop, safety gates, the evidence telemetry and eval harness — and the integration that gives both wallets their agentic and voice capabilities. Public repository history is the audit trail for that line.

Open engine. Public history. Your keys, your model.
