Pillar / Automate · AI

AI shipped to production — not demoed.

LLM features, evals, RAG, agents, cost routing — built by engineers who've shipped them before. We hold the eval suite, the cost dashboard, and the guardrails kit. You hold the model decisions.

Start a project AI readiness review · 3 weeks fixed

Eval-firstNo eval = no ship
Cost-routedPer-route model selection
GuardrailedSafety + privacy enforced

01 · Use cases

Four shapes — pick by what you're trying to ship.

Each shape has a different eval surface and a different cost-failure mode. We pick on the merits.

01
LLM features
Inline AI in your existing product — summarization, drafting, classification. Eval coverage > 80% before ship.
02
RAG
Retrieval-augmented generation over your docs/wiki/tickets. Hybrid retrieval, citation-aware, dataset-aware ranking.
03
Agents
Tool-using LLMs — booking flows, ops automation, code review. Cost ceilings + step budgets enforced; humans in the loop where it matters.
04
Cost routing
Model selection per request — small model for easy queries, big model for hard ones. Saves 60–80% on token spend without quality loss.

02 · What you get

Three artifacts every AI engagement leaves behind.

Versioned, owned by you, replayable.

01
Eval suite
Golden-dataset + ground-truth labels + automated scoring. Runs on every PR; failures block the deploy.
02
Cost-routed prompt pipeline
Per-route model selection, token budgets, fallback chains. Cost dashboards your CFO can read alone.
03
Guardrails kit
PII redaction, jailbreak detection, output moderation, audit log. The compliance surface, not just a llm.completions wrapper.

03 · How we deliver

Eval → cost-route → guardrail → ship.

AI engineering inverts the usual order — we evaluate before we build. The eval suite IS the spec.

Eval
1–2 weeks
Golden dataset, ground-truth labeling, scoring rubric. The acceptance criteria for everything that follows.
Cost-route
1–2 weeks
Per-route model selection, token budgets, batch vs streaming. Cost dashboards wired to your warehouse.
Guardrail
1–2 weeks
PII, jailbreak, moderation, audit. Plumbed into your existing security/compliance pipeline.
Ship
4–12 weeks
Biweekly release with eval scores trending. Regression beyond threshold blocks the deploy.

04 · How to engage

Four shapes — start with a readiness review.

Same pod, four contracting shapes. AI buyers who skip the readiness review tend to over-pay for capability they don't have an eval for.

3 weeks · fixed
Architecture review
A written, fixed-scope assessment with priced recommendations. Zero pressure to engage further — most buyers start here.
Fits when
You need a second pair of eyes before a migration, refactor, or vendor swap.
8 – 24 weeks
Project-based
A defined deliverable, a fixed timeline, a quoted price. We ship, hand off, and stay on call through stabilization.
Fits when
The scope is clear and the date matters more than ongoing capacity.
Annual retainer
Managed operations
A named pod, a unified SLA, and a monthly executive review. The team you'd hire if you weren't trying to stay lean.
Fits when
Multiple workstreams, a roadmap longer than a year, no time to coordinate vendors.
Long-term placement
Embedded engineer
One senior engineer (SRE, platform, ML, security) placed inside your team for 6+ months, accountable to your manager — backed by ours.
Fits when
You have leadership and tooling, but a specific seat is empty and contracting cycles are too slow.

Indicative ranges in the calculator

Open the calculator

05 · FAQ

Four questions AI buyers ask first.

Which model do you use?

Whichever the eval picks. We default to a small model with a big-model fallback for hard queries — saves 60–80% on tokens. Vendor choice is documented in SOW.

Self-hosted or API-based?

Either. Self-hosted via vLLM or Ollama on your infra; API-based via OpenAI/Anthropic/Together. Compliance constraints decide.

What about prompt injection and jailbreaks?

The guardrails kit (deliverable above) handles input sanitization, output moderation, and an audit log. Not perfect, but it's the actual surface — not a vendor's marketing slide.

Smallest engagement?

A 3-week AI readiness review. Use-case audit, eval feasibility read, cost projection, guardrails gap analysis.

06 · Pairs with

AI ships best when grounded in real product surfaces.

07 · Engage

Bring us a use case, an eval question, or an OpenAI invoice that's getting weird.

One delivery lead replies within a business day. Readiness reviews are fixed-scope.

Start a project Submit an RFP

Instant SEO scorecard.

Automate one workflow — 30-min call.

AI shipped to production — not demoed.

Four shapes — pick by what you're trying to ship.

LLM features

RAG

Agents

Cost routing

Three artifacts every AI engagement leaves behind.

Eval suite

Cost-routed prompt pipeline

Guardrails kit

Eval → cost-route → guardrail → ship.

Eval

Cost-route

Guardrail

Ship

Four shapes — start with a readiness review.

Architecture review

Project-based

Managed operations

Embedded engineer

Four questions AI buyers ask first.

AI ships best when grounded in real product surfaces.

Bring us a use case, an eval question, or an OpenAI invoice that's getting weird.