Desktop Gym

A simulated macOS task arena for training and evaluation.

Structured task scenarios spanning browser, native apps, terminal, and cross-app workflows. Each scenario generates a trace artifact for replay, eval scoring, and fine-tuning data collection.
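As a rough illustration of what a per-scenario trace artifact could look like, the sketch below records steps as JSON lines so they can be replayed or scored later. The TraceStep and TraceRecorder names and all field names are hypothetical; the actual trace schema is not specified on this page.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TraceStep:
    # Hypothetical fields; the real trace schema is not documented here.
    step: int
    app: str
    action: str
    needs_approval: bool = False


@dataclass
class TraceRecorder:
    scenario: str
    steps: List[TraceStep] = field(default_factory=list)

    def record(self, app: str, action: str, needs_approval: bool = False) -> TraceStep:
        # Steps are numbered from 1 in the order they are recorded.
        entry = TraceStep(len(self.steps) + 1, app, action, needs_approval)
        self.steps.append(entry)
        return entry

    def to_jsonl(self) -> str:
        # One JSON object per line, so a trace can be read back step by step.
        return "\n".join(
            json.dumps({"scenario": self.scenario, **asdict(s)}) for s in self.steps
        )


recorder = TraceRecorder("Review and merge PR")
recorder.record("Safari", "fetch PR diff via API skill")
recorder.record("Safari", "merge PR", needs_approval=True)
trace_jsonl = recorder.to_jsonl()
```

A line-per-step layout is one plausible choice here because replay, eval scoring, and fine-tuning collection can all stream the same file without loading it whole.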

Current arena

Simulated desktop state

Safari · github.com/PsiClaw/psi-claw
Frontmost · PR #12 open · API skill active
API call ready · No DOM scrape needed · Low risk
VS Code · psi-claw
Background · clean working tree
No unsaved changes · TypeScript active · Editable repo
Terminal · psi-claw dev server
Background process healthy
pnpm dev running · No sudo · Port 3000
Training queue

Task scenarios

Review and merge PR
Ready

Use the GitHub API skill to fetch the PR diff, summarize the changes, run a code-quality eval, then request operator approval before merging.

Draft Thursday async update
Running

Read recent memory files and retrospection logs, populate the Ritual Foundation update form fields, and confirm with the operator before submission.

Desktop cleanup: Downloads folder
Queued

Observe Downloads folder state, categorize files, propose reversible moves to appropriate directories. No deletes without confirmation.
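The "propose reversible moves" step could be sketched roughly as below: a plan is a list of (source, destination) pairs computed without touching the filesystem, and undoing it is just the inverted plan. The CATEGORIES mapping, the function names, and the /Users/demo paths are assumptions for illustration, not part of the scenario definition.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical extension-to-directory mapping; adjust to the real desktop layout.
CATEGORIES: Dict[str, str] = {
    ".pdf": "Documents",
    ".png": "Pictures",
    ".jpg": "Pictures",
    ".zip": "Archives",
}

Move = Tuple[str, str]  # (source path, destination path)


def propose_moves(downloads: str, filenames: List[str]) -> List[Move]:
    """Propose moves without touching the filesystem; unknown types stay put."""
    home = PurePosixPath(downloads).parent
    moves: List[Move] = []
    for name in sorted(filenames):
        target_dir = CATEGORIES.get(PurePosixPath(name).suffix.lower())
        if target_dir is None:
            continue  # no category: leave the file where it is, never delete
        src = PurePosixPath(downloads) / name
        moves.append((str(src), str(home / target_dir / name)))
    return moves


def undo_plan(moves: List[Move]) -> List[Move]:
    """Reverse a plan: swap source and destination, in reverse order."""
    return [(dst, src) for src, dst in reversed(moves)]


plan = propose_moves("/Users/demo/Downloads", ["report.PDF", "shot.png", "notes.xyz"])
```

Keeping the plan as plain data means the operator can review it before anything moves, and the same list doubles as the undo record.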

Cross-app: Commit and notify team
Drafting

Stage changes in VS Code, compose a commit message, push via terminal, then compose a brief Slack thread update. Operator approval is required at every irreversible step.
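One way to picture approval gating for this workflow is the minimal sketch below. The Step type, the run_with_approval function, and the choice of which steps count as irreversible are assumptions made for illustration; the arena's real gating logic is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    irreversible: bool = False  # assumed flag; marks steps that need sign-off


def run_with_approval(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order; stop at the first irreversible step that is denied."""
    log: List[str] = []
    for step in steps:
        if step.irreversible and not approve(step):
            log.append(f"halted: {step.name}")
            break
        log.append(f"done: {step.name}")
    return log


workflow = [
    Step("stage changes in VS Code"),
    Step("compose commit message"),
    Step("push via terminal", irreversible=True),
    Step("post Slack thread update", irreversible=True),
]

# Example operator policy for illustration: approve the push, decline the Slack post.
log = run_with_approval(workflow, approve=lambda s: s.name.startswith("push"))
```

Reversible steps run without interruption, so the operator is only asked when a step cannot be undone.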