Opening
OpenHive is an open-source control plane for creating, running, and governing agent systems: platform monitoring, project coordination, channel assistants, approvals, audits, isolated execution, and controlled evolution in one operating model.
Audience: technical teams, production teams, and background/platform operations teams.
Production Gap
When can an agent execute a tool? When must it stop for human approval?
Every decision, prompt, tool call, change, and rollback must be traceable.
Code, commands, vendor secrets, and business credentials cannot live in one uncontrolled process.
Self-improvement must pass through evaluation, approval, rollout, and rollback instead of unrestricted self-modification.
What OpenHive Is
It is not a single vertical app or a chat-assistant framework. It provides a unified runtime, trusted extensions, approval boundaries, audit replay, isolated execution, and multi-agent orchestration.
It controls who can run, what can run, which model is used, and which resources can be reached.
Evaluation, approval, audit, rollout, and rollback become first-class product surfaces.
Business behavior comes from templates, plugins, skill packs, and policies instead of being hard-coded into the core runtime.
Platform Shape
Agent Roles
Queen: platform monitor agent responsible for platform health, anomaly windows, silent Keeper detection, and platform-level audit signals.
Keeper: project manager agent that coordinates projects, analyzes signals, proposes changes, and manages Scouts and workflows.
Scout: group or channel assistant that responds to users, executes installed skills, and collects feedback and context.
Worker: work role for governed tasks or sandbox execution, used where stronger isolation or more specialized execution paths are needed.
Pipeline: background execution path for recurring analysis, classification, alerts, reports, and Prompt Shadow runs.
Platform Operator
Admins can inspect Queen heartbeat health, recent runs, consecutive healthy runs, failure windows, and the next scheduled run.
Queen watches platform anomalies, silent Keepers, missed runs, and states that need operational attention.
Queen events flow into platform audit; run details can link to remediation, diffs, and approval records.
For production teams, Queen is the platform on-call view. For technical teams, Queen is the observable entry point for platform health, scheduling, and runtime governance.
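A minimal sketch of how some heartbeat fields could be derived from run records: consecutive healthy runs, the next scheduled run, and silent Keepers. The `Run` record, the fixed scheduling interval, and the silence threshold are assumptions for illustration. They are not OpenHive's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    # Hypothetical run record; the deck does not describe the real schema.
    started_at: datetime
    ok: bool


def consecutive_healthy(runs: list) -> int:
    """Count healthy runs from the most recent backwards, stopping at the first failure."""
    count = 0
    for run in sorted(runs, key=lambda r: r.started_at, reverse=True):
        if not run.ok:
            break
        count += 1
    return count


def next_scheduled(last_run: datetime, interval: timedelta, now: datetime) -> datetime:
    """Next slot on a fixed interval strictly after `now`, skipping slots that were missed."""
    slots = max(1, (now - last_run) // interval + 1)
    return last_run + slots * interval


def silent_keepers(last_seen: dict, now: datetime, max_silence: timedelta) -> list:
    """Keepers whose most recent heartbeat is older than `max_silence`."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_silence)
```

A real scheduler might use cron expressions rather than a fixed interval. The same "count back until the first failure" idea still applies to the consecutive-healthy counter.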
Project Manager Agent
Create and manage Scouts, project workflows, and collaboration context.
Process feedback queues, run evaluations, and identify change candidates.
Create reviewable changes, skill evolution proposals, or work tasks instead of directly modifying production behavior.
Drive tools, skills, and sandbox work inside approval and capability boundaries.
Front-Line Collaboration Entry
Feishu / Lark is the current baseline integration, with a path to more messaging providers later.
Scout can only execute skills that are installed, assigned, and aligned with credential declarations.
Scout collects field signals, Keeper analyzes them, and production owners confirm through approval and audit.
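The three conditions on Scout skill execution can be read as a single gate. In this sketch, "aligned with credential declarations" is interpreted as: every credential the skill declares must be granted to this Scout. That interpretation is an assumption, as are the `Skill` and `ScoutContext` types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    declared_credentials: frozenset  # credentials the skill declares it needs


@dataclass
class ScoutContext:
    installed: set             # skills installed in the project
    assigned: set              # skills assigned to this Scout
    granted_credentials: set   # credentials this Scout may use


def can_execute(skill: Skill, ctx: ScoutContext) -> tuple:
    """Return (allowed, reason); all three conditions must hold."""
    if skill.name not in ctx.installed:
        return False, "not installed"
    if skill.name not in ctx.assigned:
        return False, "not assigned to this Scout"
    missing = skill.declared_credentials - ctx.granted_credentials
    if missing:
        return False, "missing credentials: " + ", ".join(sorted(missing))
    return True, "ok"
```

Returning a reason alongside the boolean lets a denial be written to audit and shown to the user, instead of failing silently.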
Single-Runtime Principle
Queen, Keeper, Scout, and Worker share the runtime loop, tool governance, context governance, and observability model.
The core runtime stays business-agnostic; business value is injected by plugins, skills, templates, and policies.
Security boundaries, approval resume, tool planning, and audit trails can be verified against one runtime model.
Docker / K8s / Local Isolation
Fast to start by default, but the in-process LocalAgentPool is not a hard process boundary.
Separate the agent runtime, scrub inherited environment variables, and access models through the gateway relay.
Put Agent / Sandbox / Pipeline tasks behind clearer container runtime boundaries.
Use Pods, NetworkPolicy, health checks, and smoke tests to prove the production topology incrementally.
Caveat: the current preview path is an entry point for product evaluation. Stronger secret residency and network boundaries require explicitly isolated deployment paths, plus proof that the deployment actually enforces them.
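The environment-scrubbing step can be sketched with Python's standard `subprocess` module. The agent process starts from an explicit allowlist instead of inheriting the parent environment, and it receives a relay endpoint rather than vendor keys. The allowlist contents and the `OPENHIVE_RELAY_URL` variable name are hypothetical. On its own this removes inherited secrets, but it is not a hard isolation boundary.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only these inherited variables reach the agent process.
ENV_ALLOWLIST = ("PATH", "LANG", "TZ")


def scrubbed_env(parent: dict, relay_url: str) -> dict:
    """Build the child environment from an allowlist instead of copying the parent."""
    env = {k: v for k, v in parent.items() if k in ENV_ALLOWLIST}
    # The agent reaches models through the Gateway relay, never with vendor keys.
    env["OPENHIVE_RELAY_URL"] = relay_url  # hypothetical variable name
    return env


def run_agent(argv: list, relay_url: str) -> subprocess.CompletedProcess:
    """Start the agent runtime as a separate OS process with a scrubbed environment."""
    return subprocess.run(argv, env=scrubbed_env(dict(os.environ), relay_url),
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    os.environ["VENDOR_API_KEY"] = "sk-demo"  # simulate a secret held by the parent
    probe = "import os; print('VENDOR_API_KEY' in os.environ, os.environ['OPENHIVE_RELAY_URL'])"
    print(run_agent([sys.executable, "-c", probe], "http://gateway:8000/relay").stdout.strip())
```

Running the file prints `False http://gateway:8000/relay`: the child never sees the parent's vendor key. An allowlist is preferred over a denylist because new secrets added to the parent environment stay excluded by default.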
Credential Proxy / Provider Management
Vendor keys and integration credentials should stay in the Gateway or another explicitly trusted secret-holder role whenever possible.
Provider secrets can be admin-managed, encrypted at rest, and returned only as masked status, never raw values.
Isolated runtimes use Gateway Relay for budgeted, model-allowlisted, scope-limited access.
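A sketch of two Gateway-side checks. Admin views receive only a masked status. Each relay call is checked against scope, model allowlist, and budget before the Gateway attaches the real vendor key on its own side. `masked_status`, `RelayGrant`, and token-based budgeting are illustrative assumptions, not OpenHive's API.

```python
from dataclasses import dataclass
from typing import Optional


def masked_status(secret: Optional[str]) -> dict:
    """What an admin API could return for a stored provider key: status only, never the value."""
    if not secret:
        return {"configured": False, "hint": None}
    # Show a short suffix only for long keys so short values are not revealed.
    hint = ("..." + secret[-4:]) if len(secret) > 8 else "****"
    return {"configured": True, "hint": hint}


class RelayDenied(Exception):
    pass


@dataclass
class RelayGrant:
    # Hypothetical per-runtime grant issued by the Gateway.
    scope: str                  # e.g. the project the isolated runtime belongs to
    allowed_models: frozenset
    budget_tokens: int
    used_tokens: int = 0


def authorize(grant: RelayGrant, scope: str, model: str, est_tokens: int) -> None:
    """Check a relay request before the Gateway attaches the real vendor key server-side."""
    if scope != grant.scope:
        raise RelayDenied("scope mismatch")
    if model not in grant.allowed_models:
        raise RelayDenied(f"model not allowlisted: {model}")
    if grant.used_tokens + est_tokens > grant.budget_tokens:
        raise RelayDenied("budget exceeded")
    grant.used_tokens += est_tokens  # reserve budget for this call
```

The isolated runtime only ever holds a grant reference. The vendor key never leaves the Gateway process.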
RunState / ToolExecutionPlan
The runtime receives tool name, arguments, and context.
Record capabilities, argument previews, idempotency keys, and policy decisions.
RunState enters awaiting_approval before side effects happen.
Approved, rejected, expired, or requires-more-context outcomes all enter audit.
Approved calls execute once; rejected calls return safe tool results to the model.
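The five steps above can be sketched as a small state machine. This minimal illustration assumes one pending call per run and a string policy of `allow` or `require_approval`. The names `ToolExecutionPlan`, `RunState`, and `awaiting_approval` come from the deck. Every field and method shown is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional


class RunState(str, Enum):
    RUNNING = "running"
    AWAITING_APPROVAL = "awaiting_approval"


class Decision(str, Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    NEEDS_CONTEXT = "requires_more_context"


@dataclass
class ToolExecutionPlan:
    tool: str
    args: dict
    capabilities: list
    policy: str  # hypothetical values: "allow" or "require_approval"
    idempotency_key: str
    args_preview: str


def make_plan(run_id: str, tool: str, args: dict, capabilities: list, policy: str) -> ToolExecutionPlan:
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    # Same run + same tool + same arguments -> same key, so a retried call is not re-executed.
    key = hashlib.sha256(f"{run_id}:{canonical}".encode()).hexdigest()[:16]
    return ToolExecutionPlan(tool, args, capabilities, policy, key, canonical[:80])


@dataclass
class Runtime:
    tools: dict
    audit: list = field(default_factory=list)
    results: dict = field(default_factory=dict)  # idempotency_key -> result
    state: RunState = RunState.RUNNING
    pending: Optional[ToolExecutionPlan] = None

    def submit(self, plan: ToolExecutionPlan) -> Any:
        self.audit.append(("planned", plan.tool, plan.idempotency_key, plan.policy))
        if plan.policy == "require_approval":
            # Pause before any side effect; nothing has executed yet.
            self.state, self.pending = RunState.AWAITING_APPROVAL, plan
            return None
        return self._execute(plan)

    def resolve(self, decision: Decision) -> Any:
        plan = self.pending
        if self.state is not RunState.AWAITING_APPROVAL or plan is None:
            raise RuntimeError("no call is awaiting approval")
        # Every outcome, including expiry and requests for more context, is audited.
        self.audit.append(("decision", plan.tool, plan.idempotency_key, decision.value))
        self.state, self.pending = RunState.RUNNING, None
        if decision is Decision.APPROVED:
            return self._execute(plan)
        # Non-approved outcomes become a safe tool result the model can read.
        return {"tool": plan.tool, "executed": False, "status": decision.value}

    def _execute(self, plan: ToolExecutionPlan) -> Any:
        if plan.idempotency_key in self.results:  # execute at most once per key
            return self.results[plan.idempotency_key]
        tool: Callable[..., Any] = self.tools[plan.tool]
        result = tool(**plan.args)
        self.results[plan.idempotency_key] = result
        self.audit.append(("executed", plan.tool, plan.idempotency_key))
        return result
```

Deriving the idempotency key from the run and the canonical arguments means an approval replayed after a crash or resume returns the stored result instead of repeating the side effect.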
Shadow Prompt Testing
Pipeline runs the current production prompt normally, and its result continues to power real notifications and workflows.
The same Pipeline step runs again with shadow_prompt to produce candidate output.
The system stores production/shadow differences, and promotion waits for PM or production-owner approval in the Dashboard.
Prompt Shadow turns prompt tuning from guesswork into a comparable, auditable, rollback-ready rollout flow.
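A minimal sketch of the shadow flow, assuming a model call can be represented as a plain `(prompt, input) -> output` function. Only the production output is returned to the workflow. The shadow output is stored with a diff, and promotion requires an approver from an owner set. `run_step_with_shadow` and `PromptRegistry` are hypothetical names; the diff uses the standard library's `difflib.unified_diff`.

```python
import difflib
from dataclasses import dataclass, field
from typing import Callable

Llm = Callable[[str, str], str]  # (prompt, input) -> output; stands in for a real model call


@dataclass
class ShadowRecord:
    step: str
    production_output: str
    shadow_output: str
    diff: list


def run_step_with_shadow(step: str, llm: Llm, prod_prompt: str, shadow_prompt: str,
                         payload: str, store: list) -> str:
    prod_out = llm(prod_prompt, payload)  # this result keeps powering real workflows
    try:
        shadow_out = llm(shadow_prompt, payload)  # candidate only, never delivered
    except Exception as exc:  # a failing shadow run must not affect production
        shadow_out = f"<shadow error: {exc}>"
    diff = list(difflib.unified_diff(prod_out.splitlines(), shadow_out.splitlines(),
                                     "production", "shadow", lineterm=""))
    store.append(ShadowRecord(step, prod_out, shadow_out, diff))
    return prod_out


@dataclass
class PromptRegistry:
    active: str
    history: list = field(default_factory=list)

    def promote(self, candidate: str, approver: str, owners: set) -> None:
        if approver not in owners:
            raise PermissionError(f"{approver} may not promote prompts for this step")
        self.history.append(self.active)  # keep the old prompt so rollback is one step
        self.active = candidate

    def rollback(self) -> None:
        self.active = self.history.pop()
```

Swallowing shadow errors into the record is deliberate: the candidate prompt is under evaluation, so its failures are data for the diff view, not incidents for the production path.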
Self-Evolution, But Governed
Feedback, failures, misclassifications, repeated requests, and post-run review candidates.
Keeper or an evolution plugin creates proposed skill, prompt, memory, or policy changes.
Run tests, Prompt Shadow, diffs, evidence checks, and security scans.
Human owners decide whether to promote to local installed copies or project configuration.
History and audit let production teams attribute and recover.
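The collect, propose, evaluate, approve, and audit loop can be sketched as a gated proposal. A change cannot reach a human decision until every required check has run, and it cannot be promoted if any check failed. Promotion records the previous value so it can be reverted. The check names, the `Proposal` type, and the dict-based configuration are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    PROPOSED = "proposed"
    EVALUATED = "evaluated"
    PROMOTED = "promoted"
    REJECTED = "rejected"


# Hypothetical gate: each evaluation named on the slide must run before a human decides.
REQUIRED_CHECKS = {"tests", "prompt_shadow", "diff", "evidence", "security_scan"}


@dataclass
class Proposal:
    kind: str        # "skill", "prompt", "memory", or "policy"
    target: str      # key in the project configuration being changed
    new_value: str
    stage: Stage = Stage.PROPOSED
    checks: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def record_check(self, name: str, passed: bool) -> None:
        self.checks[name] = passed
        self.log.append(("check", name, passed))
        if self.stage is Stage.PROPOSED and REQUIRED_CHECKS.issubset(self.checks):
            self.stage = Stage.EVALUATED

    def decide(self, approver: str, approve: bool, config: dict, history: list) -> None:
        if self.stage is not Stage.EVALUATED:
            raise RuntimeError("all required checks must run before a decision")
        if approve and not all(self.checks[c] for c in REQUIRED_CHECKS):
            raise RuntimeError("a proposal with failing checks cannot be promoted")
        if approve:
            # Record the previous value first so the change can be attributed and reverted.
            history.append((self.target, config.get(self.target)))
            config[self.target] = self.new_value
            self.stage = Stage.PROMOTED
        else:
            self.stage = Stage.REJECTED
        self.log.append(("decision", approver, self.stage.value))


def rollback(config: dict, history: list) -> None:
    target, previous = history.pop()
    if previous is None:
        config.pop(target, None)  # the change introduced a new key; remove it
    else:
        config[target] = previous
```

The agent that authors a proposal never writes to `config` directly. Only `decide`, called with a named human approver, does.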
Plugin / Skill / Marketplace
Stateless capability units that run as JSON stdin/stdout subprocesses and do not directly share core state.
Platform capabilities such as channels, policy, card actions, and evolution logic.
Compose templates, skill packs, policies, and blueprints into reusable workload starting points.
Install, upgrade, edit, promote, and execute flows all go through permissions, audit, and local-copy management.
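The skill contract above (stateless, JSON over stdin/stdout, no shared core state) can be sketched with `subprocess.run`. The `{"args": ...}` request and `{"ok": ..., "result": ...}` response envelope, the example skill, and `invoke_skill` are hypothetical. The deck does not specify the wire format.

```python
import json
import subprocess
import sys

# Hypothetical example skill: one JSON request on stdin, one JSON response on stdout.
WORD_COUNT_SKILL = """
import json, sys
request = json.load(sys.stdin)
text = request["args"]["text"]
json.dump({"ok": True, "result": {"words": len(text.split())}}, sys.stdout)
"""


def invoke_skill(argv: list, args: dict, timeout: float = 10.0) -> dict:
    """Run a skill as a separate process; it sees only the JSON it is given."""
    try:
        proc = subprocess.run(argv, input=json.dumps({"args": args}),
                              capture_output=True, text=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": "skill timed out"}
    if proc.returncode != 0:
        return {"ok": False, "error": proc.stderr.strip()[-500:]}
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {"ok": False, "error": "skill returned non-JSON output"}


if __name__ == "__main__":
    print(invoke_skill([sys.executable, "-c", WORD_COUNT_SKILL], {"text": "one two three"}))
```

Running the file prints `{'ok': True, 'result': {'words': 3}}`. Because every failure mode (crash, timeout, malformed output) is converted into a structured error, the host can audit it and return it to the model like any other tool result.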
Production Visibility
Project overview, agent status, runs, sessions, work tasks, configuration, and skill governance.
Cross-project usage, trends, cost signals, and scale-of-operation views.
Audit entry point for platform events, project changes, diffs, approvals, and Queen events.
Queen monitoring, user admission, provider management, platform AI governance, and runtime maintenance.
First Pilots
Coding support, technical research, documentation workflows, and internal operations assistants; a strong dogfooding path for technical teams.
Recurring analysis, ecosystem tracking, alerts, and report generation; a good path for production teams to validate continuous-operation value.
Support, product, region, or process-specific agents that scale under shared policy and rollout control.
From Preview To Production-Facing
A local Docker setup with PostgreSQL, FastAPI, and Next.js to validate the product path and team workflow.
Introduce separated processes, environment scrubbing, Relay model access, and secret-residency validation.
Put governed commands, workspace tasks, patch approvals, and log collection behind a container boundary.
Use Pods, NetworkPolicy, health checks, smoke tests, and release validation to support stronger deployment models.
Questions Before The Pilot
Recommendation
The best first step is not launching many agents. It is choosing one repetitive, measurable, clearly owned, rollback-ready production scenario.
One business/platform workflow, one owner, one metric.
Isolation mode, secret residency, tool approval, Prompt Shadow, and rollback rules.
Queen monitors the platform, Keeper manages the project, Scout enters channels, and Pipeline runs shadow evaluation.