A simulated macOS task arena for training and evaluation.
Structured task scenarios span the browser, native apps, the terminal, and cross-app workflows. Each scenario generates a trace artifact for replay, eval scoring, and fine-tuning data collection.
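As a rough illustration of what a per-scenario trace artifact could look like, the sketch below records steps and serializes them one JSON object per line. The names Trace, TraceStep, record, and to_jsonl, the field layout, and the scenario label are all hypothetical; the arena's actual trace format is not described here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TraceStep:
    surface: str                  # hypothetical label, e.g. "vscode" or "terminal"
    action: str                   # free-text description of what was done
    needs_approval: bool = False  # True for steps gated on the operator


@dataclass
class Trace:
    scenario: str
    steps: list = field(default_factory=list)

    def record(self, surface, action, needs_approval=False):
        self.steps.append(TraceStep(surface, action, needs_approval))

    def to_jsonl(self):
        # One JSON object per step, so a consumer can replay or score line by line.
        return "\n".join(
            json.dumps({"scenario": self.scenario, "index": i, **asdict(step)})
            for i, step in enumerate(self.steps)
        )


# Hypothetical scenario label and steps, for illustration only.
trace = Trace("commit-and-notify")
trace.record("vscode", "stage changes")
trace.record("terminal", "git push", needs_approval=True)
lines = trace.to_jsonl().splitlines()
```

A line-oriented format is assumed here only because it suits all three stated uses: replay can stream steps in order, scoring can inspect individual steps, and data collection can append traces without rewriting a file.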
Simulated desktop state
Task scenarios
Use the GitHub API skill to fetch the PR diff, summarize the changes, run an eval on code quality, then request operator approval before merging.
Read recent memory files and retrospection logs, populate the Ritual Foundation update form fields, and confirm with the operator before submission.
Observe the state of the Downloads folder, categorize the files, and propose reversible moves to appropriate directories. No deletes without confirmation.
Stage changes in VS Code, compose a commit message, push via the terminal, then compose a brief Slack thread update, all with operator approval at irreversible steps.
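All four scenarios share one pattern: reversible steps run freely, while irreversible ones wait for the operator. A minimal sketch of such a gate follows, assuming a hypothetical Step record and run_with_approval helper; the approve callback stands in for whatever operator prompt the arena actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    run: Callable[[], None]
    irreversible: bool = False  # hypothetical flag marking steps that need approval


def run_with_approval(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order; halt at the first irreversible step the operator declines."""
    log = []
    for step in steps:
        if step.irreversible and not approve(step):
            log.append(f"halted: {step.description}")
            break
        step.run()
        log.append(f"done: {step.description}")
    return log


executed = []
steps = [
    Step("stage changes", lambda: executed.append("stage")),
    Step("compose commit message", lambda: executed.append("message")),
    Step("push via terminal", lambda: executed.append("push"), irreversible=True),
    Step("post Slack update", lambda: executed.append("slack"), irreversible=True),
]

# Stand-in operator that declines everything; a real run would ask a human.
log = run_with_approval(steps, approve=lambda step: False)
```

With the declining stand-in, only the two reversible steps run and the log ends at the push. Halting instead of skipping ahead is a design choice in this sketch: later steps, such as the Slack update, usually depend on the declined one having happened.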