The coordination layer for AI taskforces.
A coordination protocol and runtime that lets a higher-level AI supervisor reliably govern specialized workers, tools, and evaluations. Inspectable, safe, and robust.
Today, multi-step AI agents are powerful but messy. When workflows scale, they become hard to trust: inventing arguments, looping on failures, or quietly executing unauthorized actions.
Taskforce establishes formal execution contracts, handles the tedious repair loops automatically, and escalates to a human only for strategic, irreversible milestones.
Let's optimize a manual grunt process.
For an AI-native VC fund, sourcing early-stage startups and matching them to investors is a fragile house of cards when it runs on typical improvisational agent frameworks.
Scattered Sourcing
Associates scrape directories manually. The unconstrained LLM improvises founder records, invents profiles, and clogs the CRM with duplicate data.
Intake Friction
Parsing pitch deck PDFs is error-prone. Models frequently miscalculate runway or confuse Gross Merchandise Value with actual Annual Recurring Revenue.
Silent Failures
Cross-checking metric claims depends on external APIs. If a connection drops, scripts ignore the error and route unverified, hallucinated metrics directly to partners.
2 AM Slack Pings
The script breaks because a model payload returned markdown fences instead of pure JSON. Developers wake up in the middle of the night to fix it.
We encapsulated this sourcing handbook into an immutable playbook.
Taskforce locks the execution boundaries. The strategic L2 supervisor checks schemas, dispatches tasks to specialized L3 workers, repairs validation anomalies autonomously, and escalates to a human gate only when a decision is strategic.
Initialize Playbook
Taskforce loads the playbook schema and locks execution boundaries against the Hub registry. The agent is strictly barred from calling unapproved tools or creating unverified loops.
Specialized Dispatch
An L3 worker fetches unstructured files and maps them to the expected schemas. Under the task constraints, output structures are normalized and strictly validated.
Contract Check Failed
The contract judge scans output text and detects a factual grounding gap: the assertion "ABRT is deprecated" has no source evidence. Execution halts.
Autonomous Self-Repair
The Taskforce strategic supervisor intercepts the audit exception. Instead of crashing, it feeds the validation report back, dispatches a patch work order, and commands a corrective rewrite.
Re-Audit: Factual Grounding Verified
The corrected output is re-run through evaluation. Factual grounding reaches 100% compliance with strict corpus alignment. The node passes safely.
Strategic Human Gate
Low-level issues are managed. However, high-stakes external actions (dispatching the finalized deal sheets) require an explicit human signature.
Committed
Deal sheet successfully committed to production channels. Playbook logs are locked, and a transparent runtime receipt is generated.
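The contract check and re-audit steps above can be sketched roughly as follows. This is a minimal illustration, not Taskforce's actual judge: the `Claim` type, the `audit_grounding` function, the substring-matching rule, and the sample corpus are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str                # the assertion made in the worker output
    evidence: Optional[str]  # quoted source passage, or None if nothing was cited


def audit_grounding(claims: list[Claim], corpus: list[str]) -> list[Claim]:
    """Return the claims whose evidence is missing or not found in the corpus."""
    ungrounded = []
    for claim in claims:
        supported = claim.evidence is not None and any(
            claim.evidence in document for document in corpus
        )
        if not supported:
            ungrounded.append(claim)
    return ungrounded


# Hypothetical corpus and draft output, for illustration only.
corpus = ["Release notes: the legacy exporter is deprecated as of version 3."]
first_draft = [
    Claim("The legacy exporter is deprecated", "the legacy exporter is deprecated"),
    Claim("ABRT is deprecated", None),  # no source evidence, so the audit flags it
]

failed = audit_grounding(first_draft, corpus)

# Stand-in for the corrective rewrite: the unsupported claim is dropped.
rewritten = [claim for claim in first_draft if claim not in failed]
recheck = audit_grounding(rewritten, corpus)
```

In this sketch the first audit flags only the unsupported claim, and the re-audit of the rewritten output comes back empty, which is the condition under which the node would be allowed to pass.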
Autonomy without contracts is chaos.
Most developer frameworks optimize for maximum agent autonomy: they give models tools and let them improvise. But in enterprise systems, unconstrained autonomy creates silent failures and unpredictable operational overhead.
LLMs figuring out the pipeline on the fly.
When agents randomly choose tools, synthesize inputs, or attempt recursive repairs, they enter unstable execution paths.
- Models invent tool payloads or make up parameters, throwing runtime exceptions.
- Infinite recursion loops consume significant token budget on trivial syntax retries.
- Silent degraded paths generate fake or mock data to bypass failed API steps.
- Humans are dragged in for low-level mechanical fixes (e.g. "repair this JSON field format").
Execution constrained by immutable Playbooks.
Taskforce constrains LLMs inside strict, inspectable boundaries. The system operates inside predefined capabilities approved in the Hub.
- All capabilities, tools, and schemas are pre-approved and locked in the Taskforce Hub.
- Autonomous repair protocols automatically resolve low-level schema issues without human noise.
- Fail-path honesty: if a required resource or credential is missing, the run fails clearly and halts.
- Humans only answer the **"WHAT"** (approving high-stakes, irreversible, or strategic outputs).
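A bounded repair loop for the fenced-JSON failure described earlier might look like the sketch below. Everything here is hypothetical: `parse_with_repair`, `strip_markdown_fence`, `EscalationRequired`, and the sample payload are illustrative names and data, not part of Taskforce. The point is the shape: one mechanical fix, a hard cap on attempts, and an explicit escalation instead of an endless retry.

```python
import json

FENCE = "`" * 3  # the markdown code-fence marker


class EscalationRequired(Exception):
    """Raised when mechanical repair cannot produce a valid payload."""


def strip_markdown_fence(payload: str) -> str:
    """Mechanical repair: drop a leading fence line and a trailing fence line."""
    lines = payload.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def parse_with_repair(payload: str, required_keys: set[str], max_repairs: int = 1) -> dict:
    """Parse a worker payload, applying at most `max_repairs` mechanical fixes."""
    candidate = payload
    repairs_used = 0
    while True:
        try:
            record = json.loads(candidate)
            break
        except json.JSONDecodeError as error:
            if repairs_used >= max_repairs:
                raise EscalationRequired(f"payload is not valid JSON: {error}") from error
            candidate = strip_markdown_fence(candidate)
            repairs_used += 1
    if not isinstance(record, dict):
        raise EscalationRequired("payload must be a JSON object")
    missing = required_keys - record.keys()
    if missing:
        # A missing field is not a formatting problem, so it is never auto-filled.
        raise EscalationRequired(f"missing required fields: {sorted(missing)}")
    return record


# Hypothetical worker output wrapped in a markdown fence.
fenced = FENCE + "json\n" + '{"company": "Acme", "arr_usd": 1200000}' + "\n" + FENCE
record = parse_with_repair(fenced, {"company", "arr_usd"})
```

Note the design choice in the sketch: formatting noise is repaired automatically, but missing data is never invented, so a payload without a required field still fails loudly.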
Honesty is a runtime property, not just a team value.
No Silent Degradations
If an API call fails or resources are offline, the system throws an explicit error immediately. We do not mask failures with synthetic placeholder responses.
Predefined Capabilities
Supervisors look up approved workers and playbooks from the central Hub. A model cannot invent new tools or improvise actions beyond its registry contract.
People will not build serious, long-term companies on AI infrastructure that quietly substitutes fake behavior or skips the hard parts.
Taskforce works under a zero-pretence rule. If a required credential, eval score, or human signature is missing, the workflow fails clearly. This uncompromising predictability is the bedrock of enterprise trust.
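As a rough sketch of what that rule could mean in code, the commit step below refuses to run unless every required input is actually present. The names `RunState`, `commit`, and `PreconditionFailed`, and the default score threshold, are assumptions made for illustration; they are not Taskforce's API.

```python
from dataclasses import dataclass
from typing import Optional


class PreconditionFailed(Exception):
    """Raised instead of continuing with placeholder values."""


@dataclass
class RunState:
    credential: Optional[str] = None
    eval_score: Optional[float] = None
    human_signature: Optional[str] = None


def commit(state: RunState, min_eval_score: float = 1.0) -> str:
    """Commit only when credential, eval score, and signature are all present."""
    if not state.credential:
        raise PreconditionFailed("missing credential")
    if state.eval_score is None:
        raise PreconditionFailed("missing eval score")
    if state.eval_score < min_eval_score:
        raise PreconditionFailed(
            f"eval score {state.eval_score} is below {min_eval_score}"
        )
    if not state.human_signature:
        raise PreconditionFailed("missing human signature")
    return f"committed (signed by {state.human_signature})"


# Hypothetical run that passed evaluation but was never signed off.
unsigned = RunState(credential="token-placeholder", eval_score=1.0)
try:
    outcome = commit(unsigned)
except PreconditionFailed as error:
    outcome = f"halted: {error}"
```

There is deliberately no default signature and no fallback score in this sketch: an absent value stays absent and stops the run.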
"We make agentic work inspectable, repeatable, and completely safe by replacing endless LLM trial-and-error with formal software boundaries."
Inspired by how real organizations operate.
We separate strategic coordination from low-level narrow execution. This L2/L3 division ensures complete accountability and granular inspectability.
The Supervisor
Acts like an architect or mission commander. It is given a playbook, maps the execution nodes, monitors tasks, handles repair loops, and escalates when necessary.
Taskforce Hub.
The Hub is an internal, strict capability registry. It is the source of truth for playbooks, tools, schemas, and evaluators. Taskforce references this directory to check permissions, enforce contracts, and route tasks.
| Kind | Identifier | Status |
|---|---|---|
| playbook | market-intel.v1 | approved |
| worker | worker.collect@1.2 | approved |
| worker | worker.score@2.1 | approved |
| worker | worker.rewrite@1.0 | approved |
| eval | eval.claims@0.9 | approved |
| tool | tool.arxiv.query | approved |
| tool | tool.github.search | approved |
| worker | worker.voice@0.4 | review |
| pattern | fail.grounding-mismatch | indexed |
| pattern | fail.api-quota-limit | indexed |
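To make the registry's role concrete, here is a small sketch of a capability check against a snapshot of the table above. The dictionary layout, `lock_capabilities`, and `CapabilityNotApproved` are hypothetical; the source does not describe the Hub's real storage format or interface.

```python
# Snapshot of the registry table above: identifier -> (kind, status).
HUB = {
    "market-intel.v1": ("playbook", "approved"),
    "worker.collect@1.2": ("worker", "approved"),
    "worker.score@2.1": ("worker", "approved"),
    "worker.rewrite@1.0": ("worker", "approved"),
    "eval.claims@0.9": ("eval", "approved"),
    "tool.arxiv.query": ("tool", "approved"),
    "tool.github.search": ("tool", "approved"),
    "worker.voice@0.4": ("worker", "review"),
}


class CapabilityNotApproved(Exception):
    """Raised when a playbook references anything the Hub has not approved."""


def lock_capabilities(requested: list[str]) -> frozenset[str]:
    """Return an immutable allow-list, or fail on the first unapproved entry."""
    for identifier in requested:
        entry = HUB.get(identifier)
        if entry is None:
            raise CapabilityNotApproved(f"{identifier} is not registered in the Hub")
        kind, status = entry
        if status != "approved":
            raise CapabilityNotApproved(f"{identifier} ({kind}) has status '{status}'")
    return frozenset(requested)


allowed = lock_capabilities(
    ["worker.collect@1.2", "eval.claims@0.9", "tool.arxiv.query"]
)
```

Under these assumptions, requesting `worker.voice@0.4` would be rejected because it is still in review, and an identifier that is absent from the table would be rejected as unregistered.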
Separating Design Mode from Execution Mode.
Execution Mode: the factory supervisor.
L2 works directly under the pre-approved Playbook contract: dispatching work packages, verifying schemas, running self-repair loops, and stopping execution at strategic human boundaries.
Design Mode: the system architect.
L2 acts as a system designer. It analyzes the mission objective, maps missing worker requirements, structures schemas, tests the evaluation gates, and prompts the human to sign off before deployment.
Build serious AI operations on a runtime that refuses to pretend.
We are deploying private sandbox environments for developers, AI-native founders, and investors. Enter your corporate credentials to deploy a sample Taskforce playbook.