MiniMe Technical Whitepaper
A privacy-first architecture for extracting semantic knowledge graphs from active window telemetry using on-device Large Language Models.
Abstract
Knowledge workers generate thousands of fragmented data points daily across dozens of applications. Existing time-tracking solutions rely on manual entry or invasive cloud-based surveillance. MiniMe proposes a novel architecture that captures telemetry locally, uses a quantized Llama-3 model running on-device for named-entity recognition (NER), and builds a semantic Personal Knowledge Graph (PKG). Because no raw telemetry leaves the device, this approach preserves full data sovereignty while providing enterprise-grade analytics.
1. System Architecture
The MiniMe system is composed of three primary decoupled layers:
- The Sensor Daemon: A lightweight Rust binary natively integrated with macOS (Accessibility API), Windows (User32), and Linux (X11/Wayland). It polls the active window state at 1 Hz and buffers changes in memory to avoid disk thrashing.
- The Local Backend: A Python/FastAPI service managing a local SQLite database. It serves as the primary router for incoming telemetry and outgoing UI queries.
- The AI Engine: An integration with Ollama running 8B-parameter models. The engine executes batch inference jobs during idle periods, enriching raw telemetry with semantic metadata.
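The daemon's poll-and-buffer pattern can be sketched in Python (the production daemon is Rust; the `ChangeBuffer` class, its flush size, and the in-memory flush target are illustrative assumptions — in practice the flush would POST to the local FastAPI backend):

```python
from dataclasses import dataclass


@dataclass
class WindowEvent:
    app: str
    title: str
    timestamp: float


class ChangeBuffer:
    """Buffers 1 Hz active-window samples, emitting an event only when
    the window changes and flushing in batches to avoid disk thrashing."""

    def __init__(self, flush_size: int = 32):
        self.flush_size = flush_size
        self._last = None      # (app, title) of the previous sample
        self._buffer = []
        self.flushed = []      # stands in for batched POSTs to the backend

    def observe(self, app: str, title: str, timestamp: float) -> None:
        key = (app, title)
        if key == self._last:
            return  # no change: drop the sample, nothing is written
        self._last = key
        self._buffer.append(WindowEvent(app, title, timestamp))
        if len(self._buffer) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.flushed.append(self._buffer)
            self._buffer = []
```

A driver loop would sample the OS-specific window API once per second and call `observe`; identical consecutive samples cost nothing beyond the poll itself.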
2. The Knowledge Graph (PKG)
Unlike relational time-logs, MiniMe structures data as an RDF-style graph. Nodes represent Software Applications, Projects, Skills, and People. Edges represent temporal interactions (e.g., [User] USED [VS Code] FOR [2 hours] ON [minime-backend]). This enables complex Cypher-like queries directly against the user's history.
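The triple structure above can be illustrated with a minimal in-memory sketch (the `PKG` class, its `match` pattern-matching API, and the property names are hypothetical; the actual store is SQLite-backed):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str     # e.g. "User"
    predicate: str   # e.g. "USED"
    obj: str         # e.g. "VS Code"
    props: tuple     # temporal/context metadata as (key, value) pairs


class PKG:
    """A toy RDF-style graph: nodes are implicit in edge endpoints."""

    def __init__(self):
        self.edges = []

    def add(self, subject: str, predicate: str, obj: str, **props) -> None:
        self.edges.append(Edge(subject, predicate, obj, tuple(props.items())))

    def match(self, subject=None, predicate=None, obj=None):
        """Cypher-like pattern match: None acts as a wildcard."""
        return [
            e for e in self.edges
            if (subject is None or e.subject == subject)
            and (predicate is None or e.predicate == predicate)
            and (obj is None or e.obj == obj)
        ]
```

The example edge from the text, `[User] USED [VS Code] FOR [2 hours] ON [minime-backend]`, becomes `pkg.add("User", "USED", "VS Code", hours=2, project="minime-backend")`, and `pkg.match(predicate="USED")` plays the role of a Cypher pattern.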
3. Privacy & Security Model
The foundational invariant of MiniMe is Zero Trust Cloud. The local node is the primary source of truth.
When the user opts into Cloud Sync (for cross-device experiences or Enterprise team aggregation), the local node encrypts the PKG with AES-256-GCM under a key derived from a user-held master password. The server stores only ciphertext blobs. The cloud backend cannot compute analytics over the data; it acts strictly as a dumb storage layer (E2EE).
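The encrypt-before-upload step can be sketched with the third-party `cryptography` package. The KDF choice (PBKDF2-HMAC-SHA256, 600,000 iterations) and the blob layout (salt ‖ nonce ‖ ciphertext) are assumptions for illustration; the whitepaper specifies only AES-256-GCM with a password-derived key:

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def derive_key(password: bytes, salt: bytes) -> bytes:
    """Derive a 256-bit AES key from the user-held master password."""
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000
    )
    return kdf.derive(password)


def encrypt_pkg(password: bytes, plaintext: bytes) -> bytes:
    """Return the opaque blob the sync server stores: salt|nonce|ciphertext."""
    salt = os.urandom(16)
    nonce = os.urandom(12)  # GCM nonce must be unique per (key, message)
    ciphertext = AESGCM(derive_key(password, salt)).encrypt(nonce, plaintext, None)
    return salt + nonce + ciphertext


def decrypt_pkg(password: bytes, blob: bytes) -> bytes:
    """Invert encrypt_pkg on another device holding the same password."""
    salt, nonce, ciphertext = blob[:16], blob[16:28], blob[28:]
    return AESGCM(derive_key(password, salt)).decrypt(nonce, ciphertext, None)
```

Because the salt and nonce travel inside the blob, the server sees only random-looking bytes, and GCM's authentication tag lets a second device detect any tampering on decrypt.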
4. Performance Benchmarks
In our tests on an Apple M3 Mac with 16GB RAM:
- Sensor Daemon CPU usage: `< 0.5%`
- SQLite write latency: 2 ms (WAL mode with synchronous=NORMAL)
- Local LLM Inference (Batch of 50 activities): 4.2 seconds
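The SQLite configuration behind the write-latency figure can be reproduced with the standard-library `sqlite3` module; the `activity` table schema below is a hypothetical stand-in for the actual telemetry schema:

```python
import sqlite3


def open_telemetry_db(path: str) -> sqlite3.Connection:
    """Open the local store with the settings used in the benchmarks.

    WAL journaling plus synchronous=NORMAL trades a small durability
    window on power loss for low write latency, which suits a 1 Hz
    telemetry stream that can tolerate losing the last few samples.
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS activity (
               ts    REAL NOT NULL,
               app   TEXT NOT NULL,
               title TEXT
           )"""
    )
    return conn
```

Note that WAL mode persists in the database file, so the pragma only needs to succeed once per database; `journal_mode` can be read back to confirm it took effect.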