feat: Alpha integration test suite via filedrop #143

Closed
opened 2026-02-27 11:44:09 +00:00 by Zeus · 0 comments
Collaborator

Goal

A bash test script that exercises Alpha end-to-end via filedrop messaging. Lives in the repo, gets deployed to olymp.

Architecture

How It Works

  1. Test script sends a filedrop message to Alpha with a specific instruction
  2. Alpha picks it up via FileDropPlugin.receive() (polls every 30s)
  3. Alpha processes it through its LLM and replies via exec("filedrop send ...")
  4. Test script polls the TestRunner inbox for a response
  5. Response is validated against expected criteria
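Step 4 is a poll with a deadline. Below is a minimal sketch of a generic helper. `poll_until` is a hypothetical name and is not part of filedrop. It assumes that bash's built-in `SECONDS` counter, which has whole-second resolution, is precise enough for timeouts measured in tens of seconds.

```bash
# Hypothetical helper: run a command every $interval seconds until it succeeds
# or $timeout seconds have passed. Uses bash's SECONDS counter (whole seconds).
poll_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline )); do
        if "$@"; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

# Example (not executed here; reply_arrived is a hypothetical check function):
#   poll_until 120 5 reply_arrived
```

The command runs inside an `if`, so a failing check does not trip `set -e` while the loop is still waiting.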

Key Facts

  • Alpha uses Cobot with ollama (qwen2.5:3b) — responses will be LLM-generated, not deterministic
  • Filedrop is a CommunicationProvider, not a ToolProvider — Alpha receives messages via polling but currently sends replies via exec("filedrop send ...") (see #140)
  • Alpha polls every 30s (polling.interval_seconds: 30) — tests need generous timeouts
  • TestRunner identity already exists at /olymp/filedrop/TestRunner/ (inbox dir present)
  • Agent registry at /olymp/agents.json — TestRunner needs an entry if wake support is wanted
  • Filedrop CLI is at /usr/local/bin/filedrop — use FILEDROP_AGENT=TestRunner filedrop send Alpha ...
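The CLI picks its identity from `FILEDROP_AGENT`. A small wrapper keeps that variable scoped to each call rather than exporting it for the whole script. This is a sketch, and `tr_filedrop` is a hypothetical name:

```bash
SENDER="${SENDER:-TestRunner}"

# Run the filedrop CLI as $SENDER; FILEDROP_AGENT is set only for this one call.
tr_filedrop() {
    FILEDROP_AGENT="$SENDER" filedrop "$@"
}

# Examples (not executed here):
#   tr_filedrop send Alpha "test:ping" "Reply with exactly: PONG" --no-wake
#   tr_filedrop wake Alpha
```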

Message Format

Messages are JSON files in /olymp/filedrop/<Agent>/inbox/:

```json
{
  "id": "1739977200_TestRunner_a1b2c3",
  "from": "TestRunner",
  "to": "Alpha",
  "subject": "test:ping",
  "content": "Reply with exactly: PONG",
  "timestamp": 1739977200,
  "sent_at": "2026-02-19T15:00:00Z"
}
```
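Given this format, a reply can be recognised from its `from` and `timestamp` fields alone. The sketch below uses `jq`, which is already a prerequisite. The function names `is_reply` and `msg_field` are hypothetical, and the sketch assumes `timestamp` is a Unix-seconds number, as it is in the example above:

```bash
# Hypothetical matcher: does this inbox file hold a message from $from
# sent at or after $since (Unix seconds)? jq -e fails on false/null output.
is_reply() {
    local file="$1" from="$2" since="$3"
    jq -e --arg from "$from" --argjson since "$since" \
        '.from == $from and .timestamp >= $since' "$file" >/dev/null 2>&1
}

# Print one field of a message, e.g. msg_field "$file" content
msg_field() {
    jq -r --arg key "$2" '.[$key]' "$1"
}
```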

Implementation

File: tests/integration/test_alpha.sh

```bash
#!/bin/bash
set -euo pipefail

# Config
DEFAULT_TIMEOUT=120  # seconds (Alpha polls every 30s + LLM processing)
SENDER="TestRunner"
TARGET="Alpha"
INBOX="/olymp/filedrop/$SENDER/inbox"
PAUSE_BETWEEN=5  # seconds between tests

# Usage
#   ./test_alpha.sh              # all tests
#   ./test_alpha.sh ping exec    # specific tests
#   ./test_alpha.sh --list       # list available tests
#   ./test_alpha.sh --timeout 90 # custom timeout
```
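Here is one possible way to parse the usage lines above, including `--report` from the report section. Bad arguments map to exit code 2. This is a sketch only, and the global names `TIMEOUT`, `REPORT`, `LIST_ONLY` and `SELECTED` are hypothetical:

```bash
# Hypothetical parser for the usage lines above.
# Sets TIMEOUT, REPORT, LIST_ONLY and SELECTED; returns 2 on bad arguments.
parse_args() {
    TIMEOUT="${DEFAULT_TIMEOUT:-120}"
    REPORT=""
    LIST_ONLY=0
    SELECTED=()
    while (( $# > 0 )); do
        case "$1" in
            --list)
                LIST_ONLY=1 ;;
            --timeout)
                if [[ ! "${2:-}" =~ ^[1-9][0-9]*$ ]]; then
                    echo "--timeout needs a number of seconds" >&2
                    return 2
                fi
                TIMEOUT="$2"; shift ;;
            --report)
                if [[ -z "${2:-}" ]]; then
                    echo "--report needs a path" >&2
                    return 2
                fi
                REPORT="$2"; shift ;;
            -*)
                echo "unknown option: $1" >&2
                return 2 ;;
            *)
                SELECTED+=("$1") ;;
        esac
        shift
    done
}
```

Call it as `parse_args "$@" || exit 2`. An empty `SELECTED` means "run all tests".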

Core Functions

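```bash
# Send a test message and wait for Alpha's reply.
# Prints the reply content on stdout; returns 1 on timeout.
send_and_wait() {
    local test_name="$1"
    local subject="test:$test_name"
    local content="$2"
    local timeout="${3:-$DEFAULT_TIMEOUT}"
    local sent_at file

    # Clear inbox of old responses (regular files only)
    find "$INBOX" -maxdepth 1 -type f -delete
    sent_at=$(date +%s)

    # Send without waking; wake is optional (needs a TestRunner entry in /olymp/agents.json)
    FILEDROP_AGENT="$SENDER" filedrop send "$TARGET" "$subject" "$content" --no-wake
    FILEDROP_AGENT="$SENDER" filedrop wake "$TARGET" || true

    # Poll inbox for a message from Alpha timestamped at or after the send
    while (( $(date +%s) < sent_at + timeout )); do
        for file in "$INBOX"/*; do
            [[ -f "$file" ]] || continue
            if jq -e --arg from "$TARGET" --argjson ts "$sent_at" \
                '.from == $from and .timestamp >= $ts' "$file" >/dev/null 2>&1; then
                jq -r '.content' "$file"
                return 0
            fi
        done
        sleep 5
    done
    return 1
}

# Validate response: return 0 if the grep pattern matches, 1 otherwise
check_response() {
    local response="$1"
    local pattern="$2"  # grep pattern
    grep -q -- "$pattern" <<<"$response"
}
```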

Test Cases

| # | Name | Message to Alpha | Validation | Notes |
|---|------|------------------|------------|-------|
| 1 | ping | "Reply with exactly: PONG" | Response contains "PONG" | Basic liveness |
| 2 | echo | "Repeat this word back: BUTTERFLY" | Response contains "BUTTERFLY" | Instruction following |
| 3 | exec | "Run `echo hello_from_alpha` and tell me the output" | Response contains "hello_from_alpha" | Exec tool works |
| 4 | file_read | "Read the file /tmp/test_alpha_read.txt and tell me its contents" | Response contains the known file content | Pre-create file with known content |
| 5 | file_write | "Write the text ALPHA_WAS_HERE to /tmp/test_alpha_write.txt" | File exists and contains ALPHA_WAS_HERE | Verify via filesystem |
| 6 | filedrop_reply | "Send a filedrop message to TestRunner with subject test:reply_ok" | A message arrives with subject test:reply_ok | Tests filedrop send capability |
| 7 | identity | "What is your name?" | Response contains "Alpha" | Identity awareness |
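The table maps naturally onto a registry that drives `--list`, rejects unknown test names, and prepares the file_read fixture. The sketch below uses bash 4 associative arrays. Every name in it is hypothetical, including `READ_MARKER`, which stands in for the unspecified known content of the read file:

```bash
# Hypothetical test registry mirroring the table; names are illustrative only.
READ_MARKER="ALPHA_READ_MARKER_7f3a"  # hypothetical known content for file_read

TEST_ORDER=(ping echo exec file_read file_write filedrop_reply identity)

declare -A TEST_MSG=(
    [ping]="Reply with exactly: PONG"
    [echo]="Repeat this word back: BUTTERFLY"
    [exec]="Run echo hello_from_alpha and tell me the output"
    [file_read]="Read the file /tmp/test_alpha_read.txt and tell me its contents"
    [file_write]="Write the text ALPHA_WAS_HERE to /tmp/test_alpha_write.txt"
    [filedrop_reply]="Send a filedrop message to TestRunner with subject test:reply_ok"
    [identity]="What is your name?"
)

# grep pattern for the reply text; file_write and filedrop_reply are checked
# against the filesystem and the reply subject instead, so they have no entry.
declare -A TEST_PATTERN=(
    [ping]="PONG"
    [echo]="BUTTERFLY"
    [exec]="hello_from_alpha"
    [file_read]="$READ_MARKER"
    [identity]="Alpha"
)

# Print "name  message" lines for --list
list_tests() {
    local name
    for name in "${TEST_ORDER[@]}"; do
        printf '%-15s %s\n' "$name" "${TEST_MSG[$name]}"
    done
}

# Succeeds if $1 names a registered test
is_known_test() {
    [[ -n "${TEST_MSG[$1]+set}" ]]
}

# Fixture for file_read: write the known content before sending the message
setup_file_read() {
    printf '%s\n' "$READ_MARKER" > /tmp/test_alpha_read.txt
}
```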

Dropped from original spec:

  • memory — Alpha uses qwen2.5:3b, which may not reliably handle multi-turn memory in a filedrop context
  • knowledge — the knowledge plugin may not be configured
  • cron — cron tools exist, but testing scheduling via filedrop is flaky
  • error_handling — graceful error handling is hard to validate from outside

These can be added as follow-ups once the basic suite works.

Report Format

Output to stdout and optionally to a file (--report <path>):

```markdown
# Alpha Integration Test Report

**Date:** 2026-02-27 10:00:00 UTC
**Commit:** abc1234
**Target:** Alpha (qwen2.5:3b)
**Timeout:** 120s

## Summary

| Total | Passed | Failed | Timeout |
|-------|--------|--------|---------|
| 7     | 6      | 0      | 1       |

## Results

### ✅ ping (2.3s)
Sent: Reply with exactly: PONG
Response: PONG

### ❌ file_write (timeout)
Sent: Write the text ALPHA_WAS_HERE to /tmp/test_alpha_write.txt
Response: (none within 120s)
```
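The Summary table and the "stdout plus optional file" behaviour could be produced as shown below. Both function names are hypothetical, and the sketch assumes the report path is held in a `REPORT` variable:

```bash
# Hypothetical writer for the report's Summary section.
print_summary() {
    local total="$1" passed="$2" failed="$3" timed_out="$4"
    echo "## Summary"
    echo
    echo "| Total | Passed | Failed | Timeout |"
    echo "|-------|--------|--------|---------|"
    printf '| %-5s | %-6s | %-6s | %-7s |\n' "$total" "$passed" "$failed" "$timed_out"
}

# Copy stdin to stdout and, when REPORT is set, append it to that file too.
emit() {
    if [[ -n "${REPORT:-}" ]]; then
        tee -a "$REPORT"
    else
        cat
    fi
}
```

Usage: `print_summary 7 6 0 1 | emit`. Because `tee -a` appends, the script should truncate the report once at startup, for example with `: > "$REPORT"`.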

Exit Codes

  • 0 — all tests passed
  • 1 — one or more tests failed
  • 2 — script error (missing dependencies, bad args)
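With a plain `set -euo pipefail`, an unexpected failure exits with that command's own status, which is often 1. That status is indistinguishable from "tests failed". The sketch below shows one way to keep the three codes distinct: an ERR trap combined with errtrace. All names are hypothetical, and counting a timeout as a failure is an assumption.

```bash
# Hypothetical constants and helpers for the documented exit codes.
EXIT_OK=0
EXIT_TESTS_FAILED=1
EXIT_SCRIPT_ERROR=2

# Map any unexpected command failure to exit code 2. -E (errtrace) makes the
# ERR trap apply inside functions, command substitutions and subshells too.
install_error_trap() {
    set -Eeuo pipefail
    trap 'echo "script error near line $LINENO" >&2; exit "$EXIT_SCRIPT_ERROR"' ERR
}

# Print the final exit code; a timeout is counted as a failure (assumption).
final_status() {
    local failed="$1" timed_out="$2"
    if (( failed + timed_out > 0 )); then
        echo "$EXIT_TESTS_FAILED"
    else
        echo "$EXIT_OK"
    fi
}
```

End the script with `exit "$(final_status "$failed" "$timeouts")"`. Under the trap, test outcomes such as `check_response` must be evaluated in `if` or `||` contexts so that an expected "no match" does not count as a script error.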

Deployment & Execution

In the repo

  • Script at tests/integration/test_alpha.sh (chmod +x)
  • README at tests/integration/README.md

On olymp (after deploy)

scripts/deploy-alpha.sh copies tests/integration/* → /olymp/shared/tests/ on every deploy.
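The copy step could look roughly like the sketch below. `deploy_tests` is a hypothetical name, and the real deploy-alpha.sh may do this differently. The sketch re-applies +x after copying because `cp` without `-p` keeps an existing destination file's mode when it overwrites that file.

```bash
# Hypothetical sketch of the copy step in scripts/deploy-alpha.sh.
deploy_tests() {
    local src="${1:-tests/integration}"
    local dest="${2:-/olymp/shared/tests}"
    mkdir -p "$dest"
    # Copy the directory contents (src/. includes every entry) into dest
    cp -r "$src"/. "$dest"/
    # An overwritten file keeps its old mode, so re-assert +x on the scripts
    chmod +x "$dest"/*.sh
}
```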

To run tests manually on olymp:

```bash
# Run all tests
/olymp/shared/tests/test_alpha.sh

# Run specific tests
/olymp/shared/tests/test_alpha.sh ping exec

# With report
/olymp/shared/tests/test_alpha.sh --report /tmp/alpha-report.md

# Custom timeout
/olymp/shared/tests/test_alpha.sh --timeout 180
```

Any agent on olymp can run the tests — just execute the script. The TestRunner filedrop identity is used automatically.

Automated (future)

Could be triggered post-deploy in deploy-alpha.sh or via a cron job.

Pre-requisites

  • filedrop CLI available in PATH ✅
  • TestRunner inbox exists at /olymp/filedrop/TestRunner/inbox/ ✅
  • Alpha is running and processing filedrop messages
  • jq installed for JSON parsing
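The first, second and fourth prerequisites can be checked locally before any message is sent. A failed check maps to exit code 2. The third prerequisite, that Alpha is running, cannot be checked from here; the ping test covers it. This is a sketch, and both function names are hypothetical:

```bash
# Hypothetical preflight check for the prerequisites above.
# Prints each command from the argument list that is not on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Returns 2 (script error) if filedrop/jq are missing or the inbox does not exist.
preflight() {
    local inbox="$1" missing
    missing=$(missing_commands filedrop jq)
    if [[ -n "$missing" ]]; then
        echo "missing dependencies: ${missing//$'\n'/ }" >&2
        return 2
    fi
    if [[ ! -d "$inbox" ]]; then
        echo "inbox not found: $inbox" >&2
        return 2
    fi
}
```

Usage: `preflight "$INBOX" || exit 2`.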

Implementation Notes for Sub-Agent

  1. Work in a worktree: cd cobot.git && git worktree add ../cobot-impl-143 -b feat/integration-tests forgejo/main
  2. Create tests/integration/ directory
  3. Use FILEDROP_AGENT=TestRunner env var when calling filedrop CLI
  4. Generous timeouts — Alpha polls every 30s, then needs LLM time. Default 120s minimum.
  5. Response matching — poll the TestRunner inbox for messages from Alpha with timestamps after the send. Use jq to parse JSON.
  6. Clean up — remove test files (/tmp/test_alpha_*) and processed inbox messages after each test
  7. Don't test on olymp — the script should be syntactically valid but doesn't need to actually run during development. Verify with bash -n test_alpha.sh and shellcheck.
  8. Run pre-push checks: ruff check cobot/ and ruff format --check cobot/ (even though this is bash, CI still runs on the whole repo)
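Note 6 could be handled by a helper like the one sketched below. `cleanup_after_test` is a hypothetical name. The inbox path is passed as a parameter, which also lets the helper run against a scratch directory, for example during development while tests are not being run on olymp.

```bash
# Hypothetical per-test cleanup (note 6).
cleanup_after_test() {
    local inbox="$1"
    shift
    # Remove processed messages: every regular file directly in the inbox
    find "$inbox" -maxdepth 1 -type f -delete
    # Remove fixture files such as /tmp/test_alpha_*; -f ignores missing ones
    rm -f -- "$@"
}
```

Usage: `cleanup_after_test "$INBOX" /tmp/test_alpha_*`. If the glob matches nothing, bash passes the literal pattern through, and `rm -f` ignores it.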

Related: #140 (filedrop_send tool — Alpha uses exec workaround for now)

k9ert closed this issue 2026-02-27 15:32:30 +00:00
Reference
ultanio/cobot#143