Securing Browser-Based Local AI: Threat Models & Hardening

Threat model and hardening guide for mobile browsers running local AI (Puma-style): provenance, runtime isolation, SRE playbooks, and practical mitigations.

Hook — Your mobile browser now runs AI. Are you ready to defend it?

Development teams and SREs are swallowing a new complexity: browsers on iOS and Android (examples: Puma and other local-AI browsers) are running LLMs and inference engines locally. That reduces latency and privacy exposure — but it creates a fresh attack surface across models, on-device storage, runtime engines (WASM/Native), web APIs, and platform attestation. If you're building or operating browser-based local AI, you need a pragmatic threat model and a secure-by-design hardening plan that fits mobile constraints and incident-response realities.

Executive summary — 3 things to do first

Threat-model the entire stack: assets (user data, model weights, signing keys), actors (malicious web pages, compromised apps, supply-chain actors), and interfaces (WebGPU, WebAssembly, IndexedDB, service workers).
Enforce integrity and provenance: model signing, SBOMs for model artifacts, and attestation to ensure the model running locally is the one you shipped.
Design minimal privileges: restrict Web APIs, isolate the runtime, encrypt storage, and instrument for detection and incident response.

Why browser-based local AI matters now (2026 context)

By 2026 the hybrid local/cloud AI model has matured: phones have quantized LLM runtimes, mobile GPUs are accessible via WebGPU, and lightweight browsers such as Puma popularized on-device inference to reduce telemetry and latency. At the same time we saw big vendor moves — cross-company AI integrations and ongoing supply-chain incidents — that intensified scrutiny on provenance and privacy. That combination makes local-AI-in-browser both compelling for users and attractive to attackers.

Assets you must protect

User data: queries, context windows, personally identifiable information (PII) cached in IndexedDB, or stored in files.
Models and weights: the binary artifacts (quantized weights, tokenizers) and any proprietary configuration.
Signing keys and build artifacts: used to verify model updates and runtime binaries.
Runtime integrity: WebAssembly modules, native helpers, GPU kernels.
Device secrets: keystore items, OAuth refresh tokens, platform attestation keys.
Telemetry and logs: which if leaked, reveal user behavior or internal detection cues.

Adversaries & attack vectors

Map actors to motivations and capabilities before writing code. Common adversaries include:

Malicious web pages that try to coerce the in-browser model into leaking data (prompt-injection, jailbreaks, prompt poisoning).
Rogue extensions or apps that escalate privileges to access local model storage or network APIs.
Supply-chain attackers who tamper with model artifacts upstream or compromise update servers.
Local device compromise (rooted/jailbroken phones, malware) that reads files or hooks runtime APIs.
Side-channel attackers measuring GPU/CPU consumption, cache timing, or power usage to extract model details or secrets.

Common attack patterns explained

Prompt injection / jailbreaks: crafted inputs that coerce the model into revealing secrets or executing actions. Browser UIs, web-workers, or service-workers can be vectors.
Model extraction: repeated querying or side-channels to reconstruct model weights or proprietary behavior.
Poisoning: corrupting training or fine-tune artifacts during model updates or third-party plugin installs.
Runtime compromise: exploiting vulnerabilities in the WASM runtime, JS wrappers, or native drivers (GPU) to run arbitrary code.
Data exfiltration: abusing permitted network APIs or cross-origin interactions to leak PII to remote servers.

A concrete threat model for Puma-style local AI browsers

Use this model as a template for workshops. For each row, list likelihood and impact, then assign mitigations.

Threat: Malicious page tricks local model into leaking a user's recent chat history. Entry points: page content, third-party scripts, prompt UI. Mitigations: prompt templates with strict instruction filters, context window redaction, runtime input sanitizers, rate-limiting queries.
Threat: Compromised model update server. Entry points: HTTP endpoints for model downloads. Mitigations: model signing, reproducible builds, timestamped rollbacks, two-tiered update verification using device attestation.
Threat: Rogue extension reads IndexedDB and uploads PII. Entry points: extension APIs, service workers. Mitigations: storage encryption bound to secure enclave, strict extension permissions, limit service-worker access to model storage.
Threat: Side-channel leaks via GPU. Entry points: WebGPU combined with unguarded runtime. Mitigations: noise injection, batching, hardware-backed secure compute when available.

Hardening controls — secure-by-design checklist

Below are practical controls engineers can implement today. Treat them as guardrails and prioritize by threat impact.

Provenance & supply-chain

Digitally sign model artifacts and verify signatures before loading. Enforce strict signature verification in the browser runtime.
Ship SBOMs and hash manifests for each model/version. Automate integrity checks in CI and on-device update flows.
Use reproducible builds for quantization pipelines and record build metadata (commit IDs, toolchain versions).

Runtime integrity & isolation

Run models in isolated realms: use cross-origin isolation and separate service-worker scopes. Consider WebAssembly modules with strict import tables.
Sandbox GPU access: limit WebGPU shaders to vetted kernels; if possible, restrict access via feature policy or origin trials.
Enable platform attestation: where available (Secure Enclave, StrongBox), bind model keys to hardware and verify attestation before model execution.

Data protection

Encrypt on-disk storage for contexts and cached prompts with keys bound to the device keystore.
Minimize retention: drop or short-TTL context windows that contain PII. Offer privacy-preserving defaults in the UX.
Use differential privacy or DP-SGD techniques when collecting telemetry or aggregated usage metrics.

Network & permissions

Least-privilege network rules: when local inference requires cloud fallback, use ephemeral tokens and domain allowlists. Avoid global network access from the model runtime.
Permission UX: surface explicit permissions when a webpage requests access to local-AI features; log consent events for audits.

Policy & controls

Instruction filtering: implement server-side or local rule engines to classify and block potentially hazardous prompts (e.g., data-exfiltration patterns).
Throttle/Rate-limit queries to prevent model extraction attempts. Use anomaly detection on query patterns.

Runtime / Web API-specific hardening

Browsers expose powerful APIs that intersect with local AI. Below are targeted mitigations.

Service Workers: scope and restrict them; deny access to model storage unless explicitly required and audited.
IndexedDB: store only encrypted blobs; avoid storing raw user messages unless necessary and purge after use.
WebAssembly: enable CFI where feasible, keep WASM modules minimal, and sandbox host imports. Use static analysis and fuzzing during CI for the WASM glue code.
WebGPU: avoid exposing raw buffers to untrusted scripts; mediate compute shaders via a vetted kernel library.
Cross-Origin Policies (COOP/COEP): enable strict isolation to reduce side-channel surface and cross-site leaks.
Content-Security-Policy (CSP): block inline scripts, use strict script-src, and SRI for remote resources.

SRE & incident response for local AI

Traditional SRE playbooks assume centralized servers. Local AI requires adapting those playbooks because the “server” is thousands or millions of devices.

Detection

Instrument client-side telemetry for anomalous query patterns, signature verification failures, and unexpected model updates. Use privacy-preserving aggregation (e.g., RAPPOR, DP).
Monitor update-server logs, code-signing key usage, and SBOM mismatches.

Containment

Immediately revoke compromised model signatures and publish a revocation manifest.
Push an emergency update that disables network fallback or forces a safe mode allowing only verified minimal behavior.

Eradication & Recovery

Provide a clear client-side remediation flow: verify signature, replace model, purge cache. Automate rollback via a signed manifest and a short update TTL.
After recovery, rotate keys tied to attestation and update SBOM and CI processes to close the supply-chain vector.

Postmortem

Capture the minimal set of telemetry that can reconstruct attack timelines without violating user privacy. Publish an action plan to stakeholders and customers.

Privacy, compliance, and UX trade-offs

Local inference enables privacy gains, but poor design negates them. Key considerations:

Default to local-only for PII processing. If cloud fallback is necessary, default to user consent and opt-in telemetry.
Data minimization: redact PII from contexts and prompts before storage or telemetry collection.
Compliance: ensure models and telemetry flows meet GDPR/CCPA obligations. Maintain a clear data-processing record for on-device operations.

Developer workflows: test and iterate

Operationalize security in the dev lifecycle.

Threat-model during design: run STRIDE/PASTA sessions for new features that expose the model to web content.
Automated adversarial testing: include prompt-injection fuzzers in CI that emulate malicious web pages and extensions.
Red-team the update chain: simulate a compromised build artifact and verify detection and rollback works end-to-end.
Integrate SBOM verification and signature checks into mobile CI and release pipelines.

Real-world examples & case study

Teams already running local models in browsers report two consistent findings (late 2025 / early 2026): first, the majority of high-risk incidents trace to improper storage of context and lax update verification; second, when hardware attestation and model signing are used together, incident severity drops substantially. One mid-sized mobile browser vendor found that adding a signed manifest verification step prevented a supply-chain incident from propagating — they rolled back in under 30 minutes and communicated transparently with users, preserving trust.

"Model integrity and attestation are the most effective levers for preventing supply-chain impact on users." — Head of Security, mobile browser company (anonymized)

2026 predictions: what's next for browser-local AI security

Standardized model attestation: expect cross-vendor standards for signed model metadata and attestation tokens by late 2026.
Platform-level secure compute: more devices will expose private compute enclaves (Apple Private Compute, Android TEE enhancements) specifically designed for model inference.
Regulatory focus: privacy and AI-safety rules will push vendors to make provenance and audit logs mandatory for certain model classes.
Browser vendor features: CSP extensions and WebAPI flags to explicitly mediate AI runtimes will appear in major engines.

Actionable checklist — implement this in the next 90 days

Require model signing and integrate verification in the browser runtime.
Encrypt all on-device model context with a keystore-backed key and set strict TTLs.
Limit WebGPU and WASM access to vetted kernels; use sandboxed import tables.
Run prompt-injection fuzz tests in CI and add anomaly detection for query patterns.
Publish SBOMs for every model and automate manifest verification on-device.
Create an emergency revocation manifest and a recovery playbook for SREs.

Final thoughts — build trust by design

Local AI in mobile browsers offers strong UX and privacy benefits, but it shifts responsibility onto developers and security teams. The right combination of provenance, runtime isolation, minimal privileges, and SRE preparedness makes local inference resilient. Start with a focused threat-model workshop, prioritize model integrity, and bake in detection and revocation — those steps will reduce risk quickly and measurably.

Call to action

Ready to harden your browser-based local AI? Download our 90-day implementation checklist and incident-response playbook tailored for mobile browsers running local LLMs. Run a threat-model workshop with your team this week — and if you want a template SBOM and manifest verification script, reach out to our team at toolkit.top for community-tested artifacts.

Securing Browser-Based AI: Threat Models and Hardening for Mobile Browsers

Hook — Your mobile browser now runs AI. Are you ready to defend it?

Executive summary — 3 things to do first

Why browser-based local AI matters now (2026 context)

Assets you must protect