nazim/avault

Fork 1

Security Audit: avault.py — findings and recommendations #16

New issue

Open

opened 2026-03-06 21:22:34 +00:00 by doxios · 0 comments

doxios commented

2026-03-06 21:22:34 +00:00

Security Audit: avault.py

Auditor: Doxios (AI agent, Cobot maintainer)
Requested by: Ben (@webdiverblue)
Date: 2026-03-06
Scope: scripts/avault.py (1494 lines, commit on main)
Methodology: Static code review — no runtime testing performed

Severity Legend

🔴 CRITICAL — Exploitable, breaks security model
🟠 HIGH — Significant risk, should fix before production use
🟡 MEDIUM — Defense-in-depth gap, fix when practical
🟢 LOW — Hardening opportunity, nice-to-have
ℹ️ INFO — Design observation, not a vulnerability

🔴 CRITICAL Findings

C1: Silent nsec Fallback Undermines Entire Security Model

Location: get_nsec_string() (line ~174), cli_or_daemon() (line ~738)

Issue: When the daemon isn't running, every CLI command silently falls back to reading the nsec from ~/.profile or NOSTR_NSEC env var. This means the entire NIP-46 remote signing flow — the core value proposition — is bypassed whenever the daemon is down.

Impact: An attacker who can read ~/.profile gets full vault access. The daemon's RAM-only security guarantee is meaningless while the fallback exists.

Recommendation:

Add a --no-fallback flag (or make it the default) that refuses to operate without the daemon
At minimum, print a loud warning: ⚠️ INSECURE: Using plaintext nsec from ~/.profile
After #12 (remove nsec from profile), this becomes moot — but until then, it's a critical gap

C2: Secrets Passed in CLI Arguments (Visible in Process List)

Location: cmd_set(), argparse --value parameter (line ~942)

Issue: avault set myservice --key API_KEY --value sk-1234secret puts the secret value in the process argument list, visible to any user via ps aux or /proc/PID/cmdline.

Impact: Any local user can harvest secrets by watching process lists. On shared systems, this is trivially exploitable.

Recommendation:

Read --value from stdin when value is - or when --value is omitted: echo 'sk-1234' | avault set myservice --key API_KEY
Or prompt interactively: avault set myservice --key API_KEY → Value: (with terminal echo disabled via getpass)

🟠 HIGH Findings

H1: No Authentication on Unix Socket

Location: serve() (line ~628), handle_client() (line ~611)

Issue: The daemon socket is protected only by filesystem permissions (chmod 0o600). Any process running as the same user can connect and issue commands, including get (read any secret), set (modify secrets), and shutdown.

Impact: If any other process running as the same user is compromised (e.g., a Cobot plugin, a Node.js app, a cron script), it has full vault access. This is especially relevant because avault's purpose is to isolate secrets from the agent — but the agent process (same user) can just talk to the socket directly.

Recommendation:

Consider a simple shared-secret token (generated at daemon start, stored in a file readable only by authorized processes)
Or use SO_PEERCRED to verify the connecting PID against an allowlist
Long-term: run daemon as a different user than the agent, use group permissions

H2: No Memory Wiping on Shutdown

Location: serve() cleanup (line ~687)

self.vault = None
self.keys = None

Issue: Setting Python references to None does NOT erase the secret data from memory. The original bytes remain in the Python heap until garbage collected and the memory page is reused. A memory dump (core dump, /proc/PID/mem, cold boot attack) can recover secrets after "wiping".

Impact: The README promises "Kill the daemon or reboot → secrets gone." This is not strictly true. Secrets persist in memory until GC and page reuse.

Recommendation:

Use ctypes.memset() or mmap with explicit zeroing for sensitive buffers
Disable core dumps: resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
Acknowledge the Python limitation in docs — true secure memory wiping requires C/Rust
Consider mlock() to prevent secrets from being swapped to disk

H3: Vault File Written Without Atomic Rename

Location: save_vault() (line ~234)

VAULT_FILE.write_text(encrypted + "\n")

Issue: If the process crashes mid-write, secrets.vault is corrupted — partially written ciphertext that can't be decrypted. All secrets are lost.

Impact: Data loss on crash during save. The ACID guarantee mentioned in the PRD is not actually implemented.

Recommendation:

import tempfile
tmp = tempfile.NamedTemporaryFile(dir=AVAULT_DIR, delete=False, mode='w')
tmp.write(encrypted + '\n')
tmp.flush()
os.fsync(tmp.fileno())
tmp.close()
os.rename(tmp.name, str(VAULT_FILE))

H4: `cmd_init` Writes Generated nsec to `~/.profile` in Plaintext

Location: cmd_init() (line ~812)

with open(PROFILE_FILE, "a") as f:
    f.write(f'\nexport NOSTR_NSEC="{nsec_str}"\n')

Issue: When generating a new keypair, the nsec is appended to ~/.profile in plaintext. This directly contradicts the security model.

Impact: The newly generated nsec is immediately compromised to any process that can read ~/.profile.

Recommendation:

Only write nsec.enc (encrypted to owner). Never write plaintext nsec to disk.
If nsec is needed temporarily for the init flow, hold in RAM only and guide the user through the NIP-46 setup immediately.

H5: Background Daemon Doesn't Redirect File Descriptors Safely

Location: daemon_start() background fork (line ~724)

sys.stdin.close()
devnull = open(os.devnull, "w")
sys.stdout = devnull
sys.stderr = devnull

Issue: File descriptors 0/1/2 are not properly redirected. sys.stdin.close() closes the Python wrapper but not necessarily fd 0. A new file opened later could get fd 0, causing confusion. Also missing double-fork for proper daemonization.

Recommendation:

Use proper daemonization: double-fork, os.dup2(devnull_fd, 0/1/2), os.umask(0o077)
Or better: remove background mode entirely and rely on systemd/supervisord (which also handles restarts)

🟡 MEDIUM Findings

M1: 10MB Message Size Limit Is Excessive

Location: recv_msg() (line ~152)

if length > 10 * 1024 * 1024:  # 10MB sanity limit

Issue: A rogue client can force the daemon to allocate 10MB per request. With socket backlog of 5 and threaded handling, this allows ~50MB memory consumption from a malicious local process.

Recommendation: Reduce to 64KB or 256KB. No legitimate vault request should exceed a few KB.

M2: No Rate Limiting or Connection Limits on Socket

Location: serve() (line ~628)

Issue: Unlimited concurrent connections, no rate limiting. A local DoS can exhaust daemon threads/memory.

Recommendation: Limit concurrent connections (e.g., max 10 threads). Add simple rate limiting per second.

M3: `_auto_commit()` Silently Swallows All Errors

Location: _auto_commit() (line ~251)

except Exception:
    pass  # non-fatal

Issue: If git push fails, the operator has no idea their vault changes aren't backed up. A network outage could mean days of unsynced changes.

Recommendation: Log failures (at minimum to stderr). Consider a status flag or file that tracks last successful push.

M4: `git push` Runs as Detached `Popen` — No Error Handling

Location: _auto_commit() (line ~266)

subprocess.Popen(
    ["git", "push", "origin", "main"],
    ...
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)

Issue: Fire-and-forget push with discarded output. If push fails (auth expired, remote down, merge conflict), it's silently lost.

M5: No Integrity Verification on Vault Load

Location: load_vault() (line ~203)

Issue: NIP-44 provides authenticated encryption (Poly1305 MAC), which is good. But there's no additional integrity check — if the ciphertext file is truncated (partial write from H3), the decryption will fail with a cryptic error rather than a clear "vault corrupted" message.

Recommendation: Wrap load_vault() in a try/except that gives a clear error message and suggests recovery steps (fleet-recover, restore from git).

M6: `export --shell` Doesn't Sanitize Key Names

Location: cmd_export() (line ~984)

print(f"export {k}='{escaped}'")

Issue: Secret key names are not validated. A malicious key name like FOO$(rm -rf /) would be injected into the shell export. Values are escaped (single-quote wrapping), but keys are not.

Recommendation: Validate key names against ^[A-Z_][A-Z0-9_]*$ before outputting.

🟢 LOW Findings

L1: PID File Race Condition

Location: daemon_running() (line ~395) only checks socket existence, not PID liveness. daemon_start() checks socket but not PID file.

Recommendation: Verify PID is actually alive (os.kill(pid, 0)) before assuming daemon is running.

L2: Socket Timeout Allows Slow Client DoS

Location: handle_client() — conn.settimeout(10) means a slow client can hold a thread for 10 seconds.

L3: `parse_profile_exports()` Regex May Miss Edge Cases

Location: Line ~358. The regex '^export\s+([A-Z_][A-Z0-9_]*)="?([^"\n]*)"?$' doesn't handle single-quoted values, escaped quotes, or multi-line values.

L4: No File Permission Check on `secrets.vault`

Recommendation: Warn if secrets.vault is world-readable (should be 600).

ℹ️ Design Observations

D1: Self-Encryption Pattern

The vault is encrypted with nip44_encrypt(agent_sk, agent_pk, ...) — the agent encrypts to itself. This is correct for NIP-44 (conversation key derived from both keys), but worth documenting that this means anyone with the agent's nsec can decrypt. The nsec IS the vault key.

D2: Central Manifest Is a Smart Design

secrets.central — metadata encrypted to the owner, no values — is a well-thought-out pattern. The owner can audit what secrets exist across their fleet without seeing values. 👏

D3: NIP-46 Flow Is Well-Structured

The start_nip46() method handles both known-identity (already paired) and ephemeral-identity (fresh pairing) flows cleanly. The QR code for initial pairing is a nice UX touch.

D4: Fleet Commands Are Forward-Looking

fleet-audit and fleet-recover with owner nsec show good operational thinking. Recovery path exists if an agent is lost.

Summary

Severity	Count	Key Issues
🔴 CRITICAL	2	Silent nsec fallback (#C1), secrets in CLI args (#C2)
🟠 HIGH	5	No socket auth (#H1), no memory wipe (#H2), non-atomic writes (#H3), init writes plaintext nsec (#H4), unsafe daemonization (#H5)
🟡 MEDIUM	6	Excessive message size (#M1), no rate limiting (#M2), silent git errors (#M3-M4), no integrity check (#M5), shell injection (#M6)
🟢 LOW	4	PID race (#L1), slow client DoS (#L2), regex gaps (#L3), no permission check (#L4)

Overall Assessment

The cryptographic foundation is solid — NIP-44 (XChaCha20-Poly1305) is a strong choice, NIP-46 remote signing is well-implemented, and the architecture (daemon holds secrets in RAM, CLI talks over socket) is the right pattern.

The critical issues are operational, not cryptographic:

The silent fallback to plaintext nsec (#C1 + #H4) means the security model is currently unenforced
Secrets visible in process lists (#C2) is a classic oversight
The socket has no auth (#H1) — same-user processes bypass the vault entirely

Recommended fix order: C1 → C2 → H3 → H4 → H1 → H2 → everything else.

The code is well-structured, readable, and shows good security thinking in the crypto layer. The gaps are in the OS-level hardening around it. Fixable without architectural changes.

CC: @webdiverblue @k9ert
Filed by Doxios 🦊 on behalf of Ben

## Security Audit: avault.py **Auditor:** Doxios (AI agent, Cobot maintainer) **Requested by:** Ben (@webdiverblue) **Date:** 2026-03-06 **Scope:** `scripts/avault.py` (1494 lines, commit on main) **Methodology:** Static code review — no runtime testing performed --- ## Severity Legend - 🔴 **CRITICAL** — Exploitable, breaks security model - 🟠 **HIGH** — Significant risk, should fix before production use - 🟡 **MEDIUM** — Defense-in-depth gap, fix when practical - 🟢 **LOW** — Hardening opportunity, nice-to-have - ℹ️ **INFO** — Design observation, not a vulnerability --- ## 🔴 CRITICAL Findings ### C1: Silent nsec Fallback Undermines Entire Security Model **Location:** `get_nsec_string()` (line ~174), `cli_or_daemon()` (line ~738) **Issue:** When the daemon isn't running, every CLI command silently falls back to reading the nsec from `~/.profile` or `NOSTR_NSEC` env var. This means the entire NIP-46 remote signing flow — the core value proposition — is bypassed whenever the daemon is down. **Impact:** An attacker who can read `~/.profile` gets full vault access. The daemon's RAM-only security guarantee is meaningless while the fallback exists. **Recommendation:** - Add a `--no-fallback` flag (or make it the default) that refuses to operate without the daemon - At minimum, print a loud warning: `⚠️ INSECURE: Using plaintext nsec from ~/.profile` - After #12 (remove nsec from profile), this becomes moot — but until then, it's a critical gap ### C2: Secrets Passed in CLI Arguments (Visible in Process List) **Location:** `cmd_set()`, argparse `--value` parameter (line ~942) **Issue:** `avault set myservice --key API_KEY --value sk-1234secret` puts the secret value in the process argument list, visible to any user via `ps aux` or `/proc/PID/cmdline`. **Impact:** Any local user can harvest secrets by watching process lists. On shared systems, this is trivially exploitable. **Recommendation:** - Read `--value` from stdin when value is `-` or when `--value` is omitted: `echo 'sk-1234' | avault set myservice --key API_KEY` - Or prompt interactively: `avault set myservice --key API_KEY` → `Value: ` (with terminal echo disabled via `getpass`) --- ## 🟠 HIGH Findings ### H1: No Authentication on Unix Socket **Location:** `serve()` (line ~628), `handle_client()` (line ~611) **Issue:** The daemon socket is protected only by filesystem permissions (`chmod 0o600`). Any process running as the same user can connect and issue commands, including `get` (read any secret), `set` (modify secrets), and `shutdown`. **Impact:** If any other process running as the same user is compromised (e.g., a Cobot plugin, a Node.js app, a cron script), it has full vault access. This is especially relevant because avault's purpose is to *isolate* secrets from the agent — but the agent process (same user) can just talk to the socket directly. **Recommendation:** - Consider a simple shared-secret token (generated at daemon start, stored in a file readable only by authorized processes) - Or use `SO_PEERCRED` to verify the connecting PID against an allowlist - Long-term: run daemon as a different user than the agent, use group permissions ### H2: No Memory Wiping on Shutdown **Location:** `serve()` cleanup (line ~687) ```python self.vault = None self.keys = None ``` **Issue:** Setting Python references to `None` does NOT erase the secret data from memory. The original bytes remain in the Python heap until garbage collected and the memory page is reused. A memory dump (core dump, `/proc/PID/mem`, cold boot attack) can recover secrets after "wiping". **Impact:** The README promises "Kill the daemon or reboot → secrets gone." This is not strictly true. Secrets persist in memory until GC and page reuse. **Recommendation:** - Use `ctypes.memset()` or `mmap` with explicit zeroing for sensitive buffers - Disable core dumps: `resource.setrlimit(resource.RLIMIT_CORE, (0, 0))` - Acknowledge the Python limitation in docs — true secure memory wiping requires C/Rust - Consider `mlock()` to prevent secrets from being swapped to disk ### H3: Vault File Written Without Atomic Rename **Location:** `save_vault()` (line ~234) ```python VAULT_FILE.write_text(encrypted + "\n") ``` **Issue:** If the process crashes mid-write, `secrets.vault` is corrupted — partially written ciphertext that can't be decrypted. All secrets are lost. **Impact:** Data loss on crash during save. The ACID guarantee mentioned in the PRD is not actually implemented. **Recommendation:** ```python import tempfile tmp = tempfile.NamedTemporaryFile(dir=AVAULT_DIR, delete=False, mode='w') tmp.write(encrypted + '\n') tmp.flush() os.fsync(tmp.fileno()) tmp.close() os.rename(tmp.name, str(VAULT_FILE)) ``` ### H4: `cmd_init` Writes Generated nsec to `~/.profile` in Plaintext **Location:** `cmd_init()` (line ~812) ```python with open(PROFILE_FILE, "a") as f: f.write(f'\nexport NOSTR_NSEC="{nsec_str}"\n') ``` **Issue:** When generating a new keypair, the nsec is appended to `~/.profile` in plaintext. This directly contradicts the security model. **Impact:** The newly generated nsec is immediately compromised to any process that can read `~/.profile`. **Recommendation:** - Only write `nsec.enc` (encrypted to owner). Never write plaintext nsec to disk. - If nsec is needed temporarily for the init flow, hold in RAM only and guide the user through the NIP-46 setup immediately. ### H5: Background Daemon Doesn't Redirect File Descriptors Safely **Location:** `daemon_start()` background fork (line ~724) ```python sys.stdin.close() devnull = open(os.devnull, "w") sys.stdout = devnull sys.stderr = devnull ``` **Issue:** File descriptors 0/1/2 are not properly redirected. `sys.stdin.close()` closes the Python wrapper but not necessarily fd 0. A new file opened later could get fd 0, causing confusion. Also missing double-fork for proper daemonization. **Recommendation:** - Use proper daemonization: double-fork, `os.dup2(devnull_fd, 0/1/2)`, `os.umask(0o077)` - Or better: remove background mode entirely and rely on systemd/supervisord (which also handles restarts) --- ## 🟡 MEDIUM Findings ### M1: 10MB Message Size Limit Is Excessive **Location:** `recv_msg()` (line ~152) ```python if length > 10 * 1024 * 1024: # 10MB sanity limit ``` **Issue:** A rogue client can force the daemon to allocate 10MB per request. With socket backlog of 5 and threaded handling, this allows ~50MB memory consumption from a malicious local process. **Recommendation:** Reduce to 64KB or 256KB. No legitimate vault request should exceed a few KB. ### M2: No Rate Limiting or Connection Limits on Socket **Location:** `serve()` (line ~628) **Issue:** Unlimited concurrent connections, no rate limiting. A local DoS can exhaust daemon threads/memory. **Recommendation:** Limit concurrent connections (e.g., max 10 threads). Add simple rate limiting per second. ### M3: `_auto_commit()` Silently Swallows All Errors **Location:** `_auto_commit()` (line ~251) ```python except Exception: pass # non-fatal ``` **Issue:** If git push fails, the operator has no idea their vault changes aren't backed up. A network outage could mean days of unsynced changes. **Recommendation:** Log failures (at minimum to stderr). Consider a status flag or file that tracks last successful push. ### M4: `git push` Runs as Detached `Popen` — No Error Handling **Location:** `_auto_commit()` (line ~266) ```python subprocess.Popen( ["git", "push", "origin", "main"], ... stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, ) ``` **Issue:** Fire-and-forget push with discarded output. If push fails (auth expired, remote down, merge conflict), it's silently lost. ### M5: No Integrity Verification on Vault Load **Location:** `load_vault()` (line ~203) **Issue:** NIP-44 provides authenticated encryption (Poly1305 MAC), which is good. But there's no additional integrity check — if the ciphertext file is truncated (partial write from H3), the decryption will fail with a cryptic error rather than a clear "vault corrupted" message. **Recommendation:** Wrap `load_vault()` in a try/except that gives a clear error message and suggests recovery steps (fleet-recover, restore from git). ### M6: `export --shell` Doesn't Sanitize Key Names **Location:** `cmd_export()` (line ~984) ```python print(f"export {k}='{escaped}'") ``` **Issue:** Secret key names are not validated. A malicious key name like `FOO$(rm -rf /)` would be injected into the shell export. Values are escaped (single-quote wrapping), but keys are not. **Recommendation:** Validate key names against `^[A-Z_][A-Z0-9_]*$` before outputting. --- ## 🟢 LOW Findings ### L1: PID File Race Condition **Location:** `daemon_running()` (line ~395) only checks socket existence, not PID liveness. `daemon_start()` checks socket but not PID file. **Recommendation:** Verify PID is actually alive (`os.kill(pid, 0)`) before assuming daemon is running. ### L2: Socket Timeout Allows Slow Client DoS **Location:** `handle_client()` — `conn.settimeout(10)` means a slow client can hold a thread for 10 seconds. ### L3: `parse_profile_exports()` Regex May Miss Edge Cases **Location:** Line ~358. The regex `'^export\s+([A-Z_][A-Z0-9_]*)="?([^"\n]*)"?$'` doesn't handle single-quoted values, escaped quotes, or multi-line values. ### L4: No File Permission Check on `secrets.vault` **Recommendation:** Warn if `secrets.vault` is world-readable (should be 600). --- ## ℹ️ Design Observations ### D1: Self-Encryption Pattern The vault is encrypted with `nip44_encrypt(agent_sk, agent_pk, ...)` — the agent encrypts to itself. This is correct for NIP-44 (conversation key derived from both keys), but worth documenting that this means anyone with the agent's nsec can decrypt. The nsec IS the vault key. ### D2: Central Manifest Is a Smart Design `secrets.central` — metadata encrypted to the owner, no values — is a well-thought-out pattern. The owner can audit what secrets exist across their fleet without seeing values. 👏 ### D3: NIP-46 Flow Is Well-Structured The `start_nip46()` method handles both known-identity (already paired) and ephemeral-identity (fresh pairing) flows cleanly. The QR code for initial pairing is a nice UX touch. ### D4: Fleet Commands Are Forward-Looking `fleet-audit` and `fleet-recover` with owner nsec show good operational thinking. Recovery path exists if an agent is lost. --- ## Summary | Severity | Count | Key Issues | |----------|-------|------------| | 🔴 CRITICAL | 2 | Silent nsec fallback (#C1), secrets in CLI args (#C2) | | 🟠 HIGH | 5 | No socket auth (#H1), no memory wipe (#H2), non-atomic writes (#H3), init writes plaintext nsec (#H4), unsafe daemonization (#H5) | | 🟡 MEDIUM | 6 | Excessive message size (#M1), no rate limiting (#M2), silent git errors (#M3-M4), no integrity check (#M5), shell injection (#M6) | | 🟢 LOW | 4 | PID race (#L1), slow client DoS (#L2), regex gaps (#L3), no permission check (#L4) | ## Overall Assessment The **cryptographic foundation is solid** — NIP-44 (XChaCha20-Poly1305) is a strong choice, NIP-46 remote signing is well-implemented, and the architecture (daemon holds secrets in RAM, CLI talks over socket) is the right pattern. The **critical issues are operational, not cryptographic:** 1. The silent fallback to plaintext nsec (#C1 + #H4) means the security model is currently unenforced 2. Secrets visible in process lists (#C2) is a classic oversight 3. The socket has no auth (#H1) — same-user processes bypass the vault entirely **Recommended fix order:** C1 → C2 → H3 → H4 → H1 → H2 → everything else. The code is well-structured, readable, and shows good security thinking in the crypto layer. The gaps are in the OS-level hardening around it. Fixable without architectural changes. --- *CC: @webdiverblue @k9ert* *Filed by Doxios 🦊 on behalf of Ben*