feat: Alpha integration test suite via filedrop #143

Closed

opened 2026-02-27 11:44:09 +00:00 by Zeus · 0 comments

Zeus commented

2026-02-27 11:44:09 +00:00

Collaborator

Goal

A bash test script that exercises Alpha end-to-end via filedrop messaging. Lives in the repo, gets deployed to olymp.

Architecture

How It Works

1. Test script sends a filedrop message to Alpha with a specific instruction
2. Alpha picks it up via FileDropPlugin.receive() (polls every 30s)
3. Alpha processes it through its LLM and replies via exec("filedrop send ...")
4. Test script polls the TestRunner inbox for a response
5. Response is validated against expected criteria

Key Facts

Alpha uses Cobot with ollama (qwen2.5:3b) — responses will be LLM-generated, not deterministic
Filedrop is a CommunicationProvider, not a ToolProvider — Alpha receives messages via polling but currently sends replies via exec("filedrop send ...") (see #140)
Alpha polls every 30s (polling.interval_seconds: 30) — tests need generous timeouts
TestRunner identity already exists at /olymp/filedrop/TestRunner/ (inbox dir present)
Agent registry at /olymp/agents.json — TestRunner needs an entry if wake support is wanted
Filedrop CLI is at /usr/local/bin/filedrop — use FILEDROP_AGENT=TestRunner filedrop send Alpha ...

Message Format

Messages are JSON files in /olymp/filedrop/<Agent>/inbox/:

{
  "id": "1739977200_TestRunner_a1b2c3",
  "from": "TestRunner",
  "to": "Alpha",
  "subject": "test:ping",
  "content": "Reply with exactly: PONG",
  "timestamp": 1739977200,
  "sent_at": "2025-02-19T15:00:00Z"
}
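As a rough sketch of how the test script might read these fields, the snippet below extracts a single field from a message file with jq. The helper name `msg_field` is hypothetical, and the demo uses a trimmed copy of the example message written to a temporary file; only the field names come from the format above.

```bash
# Hypothetical helper: print one top-level field of a filedrop message file.
# Prints nothing if the field is absent.
msg_field() {
    local file="$1" field="$2"
    jq -r --arg f "$field" '.[$f] // empty' "$file"
}

# Demo (skipped when jq is not installed): a trimmed copy of the example message.
if command -v jq >/dev/null 2>&1; then
    msg_file=$(mktemp)
    cat > "$msg_file" <<'EOF'
{
  "id": "1739977200_TestRunner_a1b2c3",
  "from": "TestRunner",
  "to": "Alpha",
  "subject": "test:ping",
  "content": "Reply with exactly: PONG"
}
EOF
    msg_field "$msg_file" subject   # prints: test:ping
    msg_field "$msg_file" content   # prints: Reply with exactly: PONG
    rm -f "$msg_file"
fi
```

Passing the field name through `--arg` keeps it out of the jq program text, so a field name never has to be quoted into the filter.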

Implementation

File: `tests/integration/test_alpha.sh`

#!/bin/bash
set -euo pipefail

# Config
DEFAULT_TIMEOUT=120  # seconds (Alpha polls every 30s + LLM processing)
SENDER="TestRunner"
TARGET="Alpha"
INBOX="/olymp/filedrop/$SENDER/inbox"
PAUSE_BETWEEN=5  # seconds between tests

# Usage:
#   ./test_alpha.sh              # all tests
#   ./test_alpha.sh ping exec    # specific tests
#   ./test_alpha.sh --list       # list available tests
#   ./test_alpha.sh --timeout 90 # custom timeout
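The usage lines above imply a small argument parser. One possible shape is sketched here; the function name `parse_args` and the variables `TIMEOUT`, `REPORT_PATH`, `LIST_ONLY`, and `SELECTED_TESTS` are hypothetical, and only the flags themselves (`--list`, `--timeout`, `--report`) come from this issue.

```bash
# Defaults; DEFAULT_TIMEOUT mirrors the Config value above.
DEFAULT_TIMEOUT=120
TIMEOUT="$DEFAULT_TIMEOUT"
REPORT_PATH=""
LIST_ONLY=0
SELECTED_TESTS=()

# Hypothetical parser for the documented flags.
# Returns 2 on bad arguments, matching the "script error" exit code.
parse_args() {
    while (( $# > 0 )); do
        case "$1" in
            --list)
                LIST_ONLY=1
                ;;
            --timeout)
                if (( $# < 2 )) || ! [[ "$2" =~ ^[0-9]+$ ]]; then
                    echo "error: --timeout needs a number of seconds" >&2
                    return 2
                fi
                TIMEOUT="$2"
                shift
                ;;
            --report)
                if (( $# < 2 )); then
                    echo "error: --report needs a path" >&2
                    return 2
                fi
                REPORT_PATH="$2"
                shift
                ;;
            -*)
                echo "error: unknown option: $1" >&2
                return 2
                ;;
            *)
                # Anything else is treated as a test name.
                SELECTED_TESTS+=("$1")
                ;;
        esac
        shift
    done
}
```

In the script this would be called as `parse_args "$@" || exit 2`, with an empty `SELECTED_TESTS` meaning "run everything".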

Core Functions

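# Send a test message to Alpha and wait for the reply.
# Prints the reply content; returns 1 if no reply arrives within the timeout.
send_and_wait() {
    local test_name="$1"
    local subject="test:$test_name"
    local content="$2"
    local timeout="${3:-$DEFAULT_TIMEOUT}"
    local sent_at deadline file reply

    # Clear inbox of old responses (assumes messages are stored as *.json)
    rm -f "$INBOX"/*.json

    sent_at=$(date +%s)
    FILEDROP_AGENT="$SENDER" filedrop send "$TARGET" "$subject" "$content" --no-wake
    FILEDROP_AGENT="$SENDER" filedrop wake "$TARGET"

    # Poll for a message from Alpha with a timestamp at or after the send
    deadline=$((sent_at + timeout))
    while (( $(date +%s) < deadline )); do
        for file in "$INBOX"/*.json; do
            [[ -e "$file" ]] || continue
            reply=$(jq -r --arg from "$TARGET" --argjson since "$sent_at" \
                'select(.from == $from and .timestamp >= $since) | .content' "$file") || continue
            if [[ -n "$reply" ]]; then
                printf '%s\n' "$reply"
                return 0
            fi
        done
        sleep 2
    done
    return 1
}

# Validate response: return 0 if the grep pattern matches, 1 otherwise
check_response() {
    local response="$1"
    local pattern="$2"  # grep pattern
    grep -q -- "$pattern" <<<"$response"
}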
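To show how the two functions might fit together, here is a sketch of a per-test wrapper that classifies each run as pass, fail, or timeout. The name `run_test` is hypothetical, and so is the convention that `send_and_wait` returns non-zero on timeout. The two functions defined at the top are stand-ins so the sketch runs without Alpha or the filedrop CLI; they are not the real implementations.

```bash
# Stand-ins so this sketch runs on its own. The real script would use the
# send_and_wait and check_response functions described above.
send_and_wait() {
    # Pretend Alpha answered, unless the test is named "slow".
    [[ "$1" == "slow" ]] && return 1
    echo "Sure! PONG"
}
check_response() {
    grep -q -- "$2" <<<"$1"
}

# Hypothetical wrapper: run one test and print "pass", "fail", or "timeout".
run_test() {
    local name="$1" message="$2" pattern="$3" response
    if ! response=$(send_and_wait "$name" "$message"); then
        echo "timeout"
    elif check_response "$response" "$pattern"; then
        echo "pass"
    else
        echo "fail"
    fi
}

run_test ping "Reply with exactly: PONG" "PONG"                # prints: pass
run_test echo "Repeat this word back: BUTTERFLY" "BUTTERFLY"   # prints: fail (the stand-in always says PONG)
run_test slow "anything" "PONG"                                # prints: timeout
```

Keeping timeout separate from fail matches the report format, which counts the two in different columns.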

Test Cases

| # | Name | Message to Alpha | Validation | Notes |
|---|------|------------------|------------|-------|
| 1 | ping | "Reply with exactly: PONG" | Response contains "PONG" | Basic liveness |
| 2 | echo | "Repeat this word back: BUTTERFLY" | Response contains "BUTTERFLY" | Instruction following |
| 3 | exec | "Run `echo hello_from_alpha` and tell me the output" | Response contains "hello_from_alpha" | Exec tool works |
| 4 | file_read | "Read the file /tmp/test_alpha_read.txt and tell me its contents" | Response contains test string | Pre-create file with known content |
| 5 | file_write | "Write the text ALPHA_WAS_HERE to /tmp/test_alpha_write.txt" | Check file exists and contains text | Verify via filesystem |
| 6 | filedrop_reply | "Send a filedrop message to TestRunner with subject test:reply_ok" | Response arrives with correct subject | Tests filedrop send capability |
| 7 | identity | "What is your name?" | Response contains "Alpha" | Identity awareness |
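Tests 4 and 5 differ from the others: they need setup or verification on the filesystem, not only a pattern match on the reply. A minimal sketch of those helpers follows. The function names and the `READ_TOKEN` value are hypothetical; the paths and the ALPHA_WAS_HERE text come from the table.

```bash
READ_FILE=/tmp/test_alpha_read.txt
WRITE_FILE=/tmp/test_alpha_write.txt
READ_TOKEN="READ_CHECK_7f3a"   # hypothetical known content for file_read

# file_read: pre-create the file Alpha is asked to read.
setup_file_read() {
    printf '%s\n' "$READ_TOKEN" > "$READ_FILE"
}

# file_write: remove any stale file so an old run cannot produce a false pass.
setup_file_write() {
    rm -f "$WRITE_FILE"
}

# file_write: succeed only if the file exists and contains the expected text.
verify_file_write() {
    [[ -f "$WRITE_FILE" ]] && grep -q 'ALPHA_WAS_HERE' "$WRITE_FILE"
}
```

For file_read, the reply would then be checked against `$READ_TOKEN` with the same pattern match used by the other tests.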

Dropped from original spec:

~~memory~~ — Alpha uses qwen2.5:3b which may not reliably do multi-turn memory in filedrop context
~~knowledge~~ — knowledge plugin may not be configured
~~cron~~ — cron tools exist but testing scheduling via filedrop is flaky
~~error_handling~~ — hard to validate graceful error handling from outside

These can be added as follow-ups once the basic suite works.

Report Format

Output to stdout and optionally to a file (--report <path>):

# Alpha Integration Test Report

**Date:** 2026-02-27 10:00:00 UTC
**Commit:** abc1234
**Target:** Alpha (qwen2.5:3b)
**Timeout:** 120s

## Summary

| Total | Passed | Failed | Timeout |
|-------|--------|--------|----------|
| 7     | 6      | 0      | 1       |

## Results

### ✅ ping (2.3s)
Sent: Reply with exactly: PONG
Response: PONG

### ❌ file_write (timeout)
Sent: Write the text ALPHA_WAS_HERE to /tmp/test_alpha_write.txt
Response: (none within 120s)
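The header and summary table of such a report could be produced with plain printf. The sketch below assumes a hypothetical `render_summary` function that takes the pass, fail, and timeout counts plus the timeout in seconds. It leaves out the Commit and Target lines, since how the script learns those values is not specified here.

```bash
# Hypothetical renderer for the report header and summary table.
# Arguments: passed count, failed count, timed-out count, timeout in seconds.
render_summary() {
    local passed="$1" failed="$2" timed_out="$3" timeout="$4"
    local total=$((passed + failed + timed_out))
    printf '# Alpha Integration Test Report\n\n'
    printf '**Date:** %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
    printf '**Timeout:** %ss\n\n' "$timeout"
    printf '## Summary\n\n'
    printf '| Total | Passed | Failed | Timeout |\n'
    printf '|-------|--------|--------|---------|\n'
    # Left-align each count to the width of its column header.
    printf '| %-5s | %-6s | %-6s | %-7s |\n' "$total" "$passed" "$failed" "$timed_out"
}
```

With `--report <path>`, the same output could be duplicated to the file, for example `render_summary 6 0 1 120 | tee "$REPORT_PATH"`.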

Exit Codes

0 — all tests passed
1 — one or more tests failed
2 — script error (missing dependencies, bad args)
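One way to map the final counts onto these codes is sketched below. The function name `suite_status` is hypothetical. Treating a timeout as a failure is an assumption, based only on the sample report marking the timed-out test with ❌.

```bash
# Hypothetical helper: return 0 if every test passed, 1 otherwise.
# Assumption: a timed-out test counts as a failure for the exit code.
suite_status() {
    local failed="$1" timed_out="$2"
    if (( failed + timed_out > 0 )); then
        return 1
    fi
    return 0
}
```

At the end of the script this could be used as `suite_status "$failed" "$timed_out" || exit 1`, leaving code 2 for argument and dependency errors detected earlier.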

Deployment & Execution

In the repo

Script at tests/integration/test_alpha.sh (chmod +x)
README at tests/integration/README.md

On olymp (after deploy)

scripts/deploy-alpha.sh copies tests/integration/* → /olymp/shared/tests/ on every deploy.
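For illustration only, the copy step might look like the sketch below. The function name `deploy_tests` is hypothetical and the real deploy-alpha.sh may do this differently; the default paths are the ones named above.

```bash
# Hypothetical copy step. cp -p preserves file modes, so the executable
# bit set in the repo survives the deploy.
deploy_tests() {
    local src="${1:-tests/integration}" dest="${2:-/olymp/shared/tests}"
    mkdir -p "$dest"
    cp -p "$src"/* "$dest"/
}
```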

To run tests manually on olymp:

# Run all tests
/olymp/shared/tests/test_alpha.sh

# Run specific tests
/olymp/shared/tests/test_alpha.sh ping exec

# With report
/olymp/shared/tests/test_alpha.sh --report /tmp/alpha-report.md

# Custom timeout
/olymp/shared/tests/test_alpha.sh --timeout 180

Any agent on olymp can run the tests — just execute the script. The TestRunner filedrop identity is used automatically.

Automated (future)

Could be triggered post-deploy in deploy-alpha.sh or via a cron job.

Pre-requisites

filedrop CLI available in PATH ✅
TestRunner inbox exists at /olymp/filedrop/TestRunner/inbox/ ✅
Alpha is running and processing filedrop messages
jq installed for JSON parsing
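The first, second, and fourth items can be checked by the script itself before any test runs. A minimal sketch, assuming a hypothetical `check_prereqs` function that takes the inbox path followed by the required command names:

```bash
# Hypothetical preflight: report every missing command or directory, then
# return 2 (the "script error" exit code) if anything was missing.
check_prereqs() {
    local inbox="$1"
    shift
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "error: required command not found: $cmd" >&2
            missing=1
        fi
    done
    if [[ ! -d "$inbox" ]]; then
        echo "error: inbox directory not found: $inbox" >&2
        missing=1
    fi
    (( missing == 0 )) || return 2
}
```

A call such as `check_prereqs "$INBOX" filedrop jq || exit 2` would cover the static checks. Whether Alpha is running and processing messages cannot be verified this way; the ping test is the practical check for that.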

Implementation Notes for Sub-Agent

1. Work in a worktree: cd cobot.git && git worktree add ../cobot-impl-143 -b feat/integration-tests forgejo/main
2. Create tests/integration/ directory
3. Use FILEDROP_AGENT=TestRunner env var when calling filedrop CLI
4. Generous timeouts: Alpha polls every 30s, then needs LLM time. Default 120s minimum.
5. Response matching: poll the TestRunner inbox for messages from Alpha with timestamps after the send. Use jq to parse JSON.
6. Clean up: remove test files (/tmp/test_alpha_*) and processed inbox messages after each test
7. Don't test on olymp: the script should be syntactically valid but doesn't need to actually run during development. Verify with bash -n test_alpha.sh and shellcheck.
8. Run pre-push checks: ruff check cobot/ and ruff format --check cobot/ (even though this is bash, CI still runs on the whole repo)
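The clean-up note could be implemented with a function that is called after each test and also registered as an EXIT trap, so fixtures are removed even if the script aborts. This is a sketch under assumptions: the `cleanup` name is hypothetical, and the `*.json` suffix for inbox messages is a guess, since the issue only says messages are JSON files.

```bash
# INBOX mirrors the Config value; it can be overridden from the environment.
INBOX="${INBOX:-/olymp/filedrop/TestRunner/inbox}"

# Hypothetical cleanup: remove test fixtures and leftover inbox messages.
# Assumption: inbox messages are stored with a .json suffix.
cleanup() {
    rm -f /tmp/test_alpha_*
    if [[ -d "$INBOX" ]]; then
        find "$INBOX" -maxdepth 1 -type f -name '*.json' -delete
    fi
}

# Run cleanup when the script exits, whether normally or on error.
trap cleanup EXIT
```

Restricting the delete to regular `*.json` files directly inside the inbox avoids touching anything else that may live under the TestRunner directory.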

Related: #140 (filedrop_send tool — Alpha uses exec workaround for now)
