Chat, voice, vision, RAG, agents — running entirely on your NPU+GPU+CPU. 94 tok/s on an AMD Strix Halo laptop. No cloud. No subscriptions. No data leaves your machine.
Everything you expect from a modern AI assistant — all running locally. No API keys. No subscriptions. No data leaving your network.
Talk to JARVIS naturally. Whisper-v3 on NPU converts speech to text. Piper TTS reads responses aloud. Push-to-talk from the web UI.
Upload images, screenshots, or photos. JARVIS describes, analyzes, and answers questions about what it sees. Runs on NPU at 11 tok/s.
Upload PDFs, text files, or notes. JARVIS indexes them locally and answers questions from your knowledge base. All files are human-readable markdown.
Calculator, Python execution, file operations, system control. JARVIS decides when to use tools and explains what it found.
Full chat interface with streaming markdown, voice recording, image upload, and drag-drop file upload. Works in any browser. OpenAI-compatible API.
Native Flutter app with the same JARVIS theme. Connect via ngrok tunnel with QR code pairing. JARVIS in your pocket.
JARVIS dynamically dispatches work across NPU, GPU, and CPU — whichever is fastest for each operation.
The NPU AMD shipped disabled on consumer silicon. We drive it through FLM proxy at 94 tok/s. INT8 GEMM via XRT xclbin kernels. ~15W power envelope.
llama.cpp on Vulkan. IQ1_S and Q1_0 quantized models at 1.06-1.25 bpw. 381 tok/s on 0.5B, 312 tok/s on 0.8B, 122 tok/s on 4B. ~45W power envelope.
Every component runs locally. The fused engine dispatches per-operation to NPU, GPU, or CPU.
Every fact JARVIS learns is a human-readable .md file with YAML frontmatter. You can read, edit, add, or delete with any text editor. No proprietary databases. No vendor lock-in.
JARVIS processes speech, images, and documents — all on-device, all private.
Push-to-talk from web UI. Natural voice conversation. Auto-TTS on response.
Upload images, screenshots, or photos. JARVIS describes what it sees. Works in the chat.
Upload files, notes, or entire directories. JARVIS indexes and answers from your knowledge.
Prerequisites: AMD Strix Halo (Ryzen AI Max+ 395) with NPU drivers and FLM installed.
FLM (FastFlowLM) runs the model on the XDNA2 NPU. One command, zero config.
Python FastAPI server. Orchestrator, agent, voice I/O, RAG, knowledge base — all in one process.
Full chat UI with streaming, voice, vision, and file upload. Works in Chrome, Firefox, Safari.
Expose JARVIS via ngrok. Scan the QR code with JARVIS Mobile (iOS/Android) to chat from anywhere.
94 tok/s. Zero cloud. Zero subscriptions. Full voice, vision, and RAG. Open source.