The Tacit Intelligence Company

RL Environments for Domains Where Human Judgment Is Required

Custom training environments with verifiable rewards—for healthcare, insurance, coaching, and case management.

Tacit Knowledge

Expertise That Can't Be Written Down

Tacit knowledge is what experts know but can't articulate. A senior underwriter spots fraud in 30 seconds. A veteran case manager knows which clients need a call versus an email. Ask them how they know, and they shrug: "Experience."

This knowledge takes years to develop and walks out the door when experts leave. It isn't captured in documents, training manuals, or conversation logs—because the decision trace alone doesn't reveal the reasoning.

The Core Problem

A novice and an expert can reach the same conclusion for completely different reasons. The expert noticed three red flags and ruled them out. The novice got lucky.

You can't reverse-engineer reasoning from outcomes.

Our environments solve this by measuring the reasoning process itself—not just the conclusion. We design scenarios where expertise becomes visible through the questions asked, the information sought, and the factors weighed.

Measure Humans

Calibrate against your best practitioners. Understand what separates expert reasoning from novice pattern-matching. Build the ground truth that defines "good" for your domain.

Measure AI

Generate verifiable rewards for training. Dense signal per turn, not sparse end-of-conversation feedback. The same environments that measure humans produce training signal for models.
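For illustration only (the turn count and scores below are invented, not a real scoring run), the difference between dense per-turn credit and a single end-of-conversation score looks like this:

    # Hypothetical 5-turn conversation, scored two ways.
    # Dense: each turn is checked against an expert-derived criterion (1 = met, 0 = missed).
    dense_rewards = [1, 0, 1, 1, 1]

    # Sparse: only the final conclusion is graded, so earlier turns carry no signal.
    sparse_rewards = [0, 0, 0, 0, 1]

    # Both are valid returns, but the dense version tells the learner which turns helped.
    print(sum(dense_rewards), sum(sparse_rewards))  # 4 1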

The Problem

You Can't RLVR What You Can't Verify

Reinforcement Learning with Verifiable Rewards (RLVR) works for math and code because you can check the answer. But most high-stakes domains, including healthcare, insurance, advisory, and case management, don't have ground-truth answers you can verify programmatically.
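Checks of that kind are easy to write for math and code. The two verifiers below are a generic sketch written for illustration, not part of any specific benchmark or pipeline:

    # A math answer can be verified by comparing it to a known result.
    def verify_math(model_answer: str, ground_truth: str) -> float:
        return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

    # A code answer can be verified by running it against test cases.
    def verify_code(source: str, test_cases: list[tuple[int, int]]) -> float:
        namespace: dict = {}
        exec(source, namespace)  # toy example; only run trusted code this way
        solution = namespace["solution"]
        passed = sum(solution(x) == expected for x, expected in test_cases)
        return passed / len(test_cases)

    print(verify_math("42", "42"))  # 1.0
    print(verify_code("def solution(x):\n    return x * 2", [(2, 4), (3, 6)]))  # 1.0

There is no equivalent one-line check for "was this the right call on a complex claim?"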

We found a way to verify what was previously unverifiable.

Early Results (Preliminary)

Small Models, Big Gains

Preliminary results show that smaller models trained with our verifiable expert signal dramatically outperform larger base models on domain reasoning tasks.

Results from Insurance Case Management training.

  • 4B trained vs 8B base: +155%. Our fine-tuned 4B model more than doubles the performance of the larger 8B base model.
  • 4B trained vs 32B base: +53%. The trained 4B outperforms a model 8Ă— its size on domain reasoning tasks.
  • 4B trained vs 235B baseline: +28%. A 4B model outperforms a 235B model on domain reasoning.

A 4B model fine-tuned on 106 expert scenarios outperforms a model nearly 60Ă— its size on domain-specific reasoning.

Same architecture. Better training signal. These are early results—we're scaling scenario count now.

What We Build

Custom RL Environments with Verifiable Rewards

Expert-Calibrated Ground Truth

Our scenarios contain information AI must uncover through skilled questioning. Did it ask what an expert would ask? Did it discover the factors that change the decision? Binary, verifiable, no learned reward model required.
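A minimal sketch of how such a check could work, using criteria and keyword lists invented for illustration (real matching is richer than substring lookup, but the binary contract is the point):

    from dataclasses import dataclass

    @dataclass
    class Criterion:
        """One expert-derived check: did the agent surface this factor?"""
        name: str
        keywords: tuple[str, ...]  # phrases indicating the factor was probed

        def satisfied_by(self, agent_questions: list[str]) -> bool:
            text = " ".join(agent_questions).lower()
            return any(k in text for k in self.keywords)

    # Hypothetical criteria for an injury-claim scenario.
    criteria = [
        Criterion("asked_about_prior_injury", ("previous injury", "prior claim")),
        Criterion("asked_about_work_duties", ("job duties", "physical demands")),
        Criterion("checked_return_to_work_plan", ("return to work", "graded return")),
    ]

    agent_questions = [
        "Can you describe the physical demands of your current role?",
        "Have you lodged a prior claim for this shoulder?",
    ]

    # Each criterion resolves to a plain yes/no: a binary, verifiable reward component.
    rewards = {c.name: float(c.satisfied_by(agent_questions)) for c in criteria}
    print(rewards)  # the first two criteria score 1.0, the third scores 0.0

Because every criterion resolves to yes or no, the reward can be checked directly, with no separate judge model to train or trust.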

Simulated Environments

We construct the game for your AI to play. Scenarios form a continuous space over your domain—a gym where your model works out against simulated clients and real-world constraints.
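In code terms, the gym metaphor maps onto the familiar reset/step loop. The toy environment below is an assumption made for illustration, not our published interface:

    import random

    class CaseScenarioEnv:
        """Toy multi-turn environment: a simulated client responds to questions,
        and each agent turn is scored against an expert checklist."""

        def __init__(self, scenarios: list[dict]):
            self.scenarios = scenarios
            self.state = None

        def reset(self) -> str:
            self.state = random.choice(self.scenarios)
            self.remaining = set(self.state["expert_checks"])
            return self.state["intake_summary"]  # initial observation

        def step(self, agent_utterance: str):
            hits = {c for c in self.remaining if c in agent_utterance.lower()}
            self.remaining -= hits
            reward = float(len(hits) > 0)              # dense, per-turn, verifiable
            done = not self.remaining
            observation = self.state["client_reply"]   # simulated client response
            return observation, reward, done

    env = CaseScenarioEnv([{
        "intake_summary": "Shoulder injury, six weeks off work, delivery driver.",
        "client_reply": "It still aches when I lift above shoulder height.",
        "expert_checks": {"prior claim", "physical demands", "return to work"},
    }])

    obs = env.reset()
    obs, reward, done = env.step("What are the physical demands of your delivery route?")
    print(reward, done)  # 1.0 False

The point of the gym framing is that the agent can be run through these episodes at scale, with every step producing a checkable score.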

Organization-Specific Training

Every organization has unique constraints, history, and affordances. We build environments calibrated to your specific context—not generic domain models that miss what makes your organization different.

How It Works

From Expertise Capture to Training Signal

01

Domain Mapping

Understand Expertise

We work with practitioners in your target domain to understand what expertise looks like: the questions experts ask, the factors they weigh, the reasoning they can't articulate.

02

Reward Design

Define Ground Truth

We define expert-calibrated ground truth: grounded in expertise research and calibrated against human practitioners, it produces verifiable signal in judgment domains for the first time.

03

Scenario Architecture

Build Environments

We build interactive scenarios with counterfactuals and branching paths. An agent exploring the scenario reveals its reasoning approach through the information it seeks.
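As a rough sketch, a branching scenario can be represented as data: hidden facts that only surface when probed, and counterfactual branches whose correct action depends on what was uncovered. The field names and cases below are hypothetical:

    # Each hidden fact is revealed only by specific lines of questioning, and some
    # facts are counterfactuals: uncovering them changes the correct decision.
    scenario = {
        "presenting_case": "Claimant reports persistent back pain, eight weeks post-incident.",
        "hidden_facts": {
            "prior_degenerative_condition": {
                "revealed_by": ["medical history", "previous back issues"],
                "changes_decision": True,
            },
            "employer_offers_light_duties": {
                "revealed_by": ["modified duties", "light duties"],
                "changes_decision": True,
            },
        },
        "branches": {
            # The correct next step depends on which facts the agent uncovered.
            frozenset(): "escalate_to_specialist_review",
            frozenset({"employer_offers_light_duties"}): "graded_return_to_work_plan",
            frozenset({"prior_degenerative_condition", "employer_offers_light_duties"}):
                "coordinated_rehab_with_gp",
        },
    }

    uncovered = frozenset({"employer_offers_light_duties"})
    print(scenario["branches"][uncovered])  # graded_return_to_work_plan

An agent that never asks about modified duties never sees the light-duties branch, so the path it takes exposes its reasoning approach.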

04

Training Infrastructure

Generate Trajectories

Trajectory generation at scale. Dense rewards per turn. Compatible with standard RL pipelines—GRPO, PPO, DPO, TRL.
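One plausible on-disk shape for these trajectories (a schema chosen purely for illustration, not a fixed format) is a JSON-lines record with a reward attached to every turn, which GRPO- or PPO-style pipelines can then map into their own input formats:

    import json

    # One multi-turn trajectory; each turn carries its own verifiable reward.
    trajectory = {
        "scenario_id": "insurance_case_017",  # hypothetical identifier
        "turns": [
            {"role": "agent", "text": "What are the physical demands of your role?", "reward": 1.0},
            {"role": "client", "text": "Mostly lifting boxes onto pallets.", "reward": 0.0},
            {"role": "agent", "text": "Has a graded return to work been discussed?", "reward": 1.0},
        ],
    }
    trajectory["episode_return"] = sum(t["reward"] for t in trajectory["turns"])

    # JSON-lines keeps large trajectory sets streamable during generation.
    with open("trajectories.jsonl", "a") as f:
        f.write(json.dumps(trajectory) + "\n")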

Environments

In the Pipeline

Actively Training: 4B +28% vs 235B

Insurance Case Management

Return to work reasoning, social rehabilitation planning, and case progression for injury and health insurance workflows.

In Development

Clinical Triage

Patient prioritization, symptom assessment, and escalation decisions in healthcare intake settings.

In Development

Strength & Conditioning Education

Exercise prescription, periodization, and client assessment reasoning for higher education programs training the next generation of practitioners.

In Development

Nutrition Coaching

Dietary assessment, behavior change strategy, and personalized guidance across diverse client populations.

In Development

Experimental Design

Scientific research methodology—hypothesis formation, variable control, statistical power, and study design reasoning.

Work With Us

How We Work With Labs

Training Signal License

Access to trajectory data from our environments for your training pipelines.

  • Per-domain or comprehensive access
  • Continuous generation at scale
  • Compatible with GRPO, PPO, DPO, TRL, OpenPipe

Custom Environment Build

We design and build RL environments for your specific target domains.

  • Turnkey—we build, maintain, and run the training
  • Expert network access for calibration
  • Ongoing scenario development

The Team

Built By Infrastructure Veterans

We've spent decades understanding how experts actually think—not from documents, but from building systems for the world's most elite organizations.

Smartabase / Teamworks

Built the world's leading human performance operating system, used by the European Space Agency, US SOCOM, and police, fire, and government agencies across 15+ countries.

AI Infrastructure

Our Head of AI Engineering was previously VP of Engineering at Avos (Steve Chen's post-YouTube company), leading the team through its Series A with NEA and Google Ventures.

Get Access

RL environments for domains where human judgment is required.