Pillar / Automate · AI
LLM features, evals, RAG, agents, cost routing — built by engineers who've shipped them before. We hold the eval suite, the cost dashboard, and the guardrails kit. You hold the model decisions.
01 · Use cases
Each shape has a different eval surface and a different cost-failure mode. We pick on the merits.
Inline AI in your existing product: summarization, drafting, classification. Eval coverage above 80% before we ship.
Retrieval-augmented generation over your docs/wiki/tickets. Hybrid retrieval, citation-aware, dataset-aware ranking.
Tool-using LLMs — booking flows, ops automation, code review. Cost ceilings + step budgets enforced; humans in the loop where it matters.
Model selection per request — small model for easy queries, big model for hard ones. Saves 60–80% on token spend without quality loss.
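As a rough illustration of per-request model selection, here is a minimal sketch. The difficulty heuristic, the 0.5 threshold, and the model labels `small-model` and `big-model` are all hypothetical placeholders; in a real engagement the routing signal and threshold would come out of the eval suite, not a keyword list.

```python
from dataclasses import dataclass


@dataclass
class Route:
    model: str   # hypothetical label, not a real vendor model name
    reason: str  # human-readable explanation, useful for cost dashboards


def estimate_difficulty(query: str) -> float:
    """Toy heuristic in [0, 1]: longer queries and reasoning keywords score higher."""
    keywords = ("why", "compare", "explain", "prove", "trade-off")
    length_score = min(len(query.split()) / 50, 1.0)
    keyword_score = 0.5 if any(k in query.lower() for k in keywords) else 0.0
    return min(length_score + keyword_score, 1.0)


def pick_route(query: str, threshold: float = 0.5) -> Route:
    """Send easy queries to the small model and hard ones to the big model."""
    score = estimate_difficulty(query)
    if score >= threshold:
        return Route("big-model", f"difficulty {score:.2f} >= {threshold}")
    return Route("small-model", f"difficulty {score:.2f} < {threshold}")
```

With this sketch, `pick_route("What time is it?")` lands on the small model, while a query containing "explain" or "why" crosses the threshold and goes to the big one. Any savings depend entirely on how the traffic splits and on whether the eval confirms the small model holds quality on the easy share.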
02 · What you get
Versioned, owned by you, replayable.
Golden dataset, ground-truth labels, and automated scoring. Runs on every PR; failures block the deploy.
Per-route model selection, token budgets, fallback chains. Cost dashboards your CFO can read alone.
PII redaction, jailbreak detection, output moderation, audit log. The compliance surface, not just an llm.completions wrapper.
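To make the guardrails idea concrete, here is a minimal sketch of two of the pieces: PII redaction on the way in and an audit record on the way out. The regexes, the `[EMAIL]`/`[PHONE]` placeholders, and the record fields are illustrative assumptions, not the contents of the actual kit; production redaction normally needs far broader coverage than two patterns.

```python
import hashlib
import json
import re
import time

# Deliberately simple patterns, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> tuple[str, int]:
    """Replace emails and phone numbers with placeholders; return text and hit count."""
    text, n_email = EMAIL.subn("[EMAIL]", text)
    text, n_phone = PHONE.subn("[PHONE]", text)
    return text, n_email + n_phone


def audit_record(raw_prompt: str, redacted_prompt: str, redactions: int) -> str:
    """Build one JSON audit line. The raw prompt is stored only as a hash."""
    return json.dumps({
        "ts": time.time(),
        "prompt_sha256": hashlib.sha256(raw_prompt.encode("utf-8")).hexdigest(),
        "redacted_prompt": redacted_prompt,
        "redactions": redactions,
    })
```

Hashing the raw prompt lets an auditor confirm which input produced a response without the log itself becoming a second copy of the PII.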
03 · How we deliver
AI engineering inverts the usual order — we evaluate before we build. The eval suite IS the spec.
Golden dataset, ground-truth labeling, scoring rubric. The acceptance criteria for everything that follows.
Per-route model selection, token budgets, batch vs streaming. Cost dashboards wired to your warehouse.
PII, jailbreak, moderation, audit. Plumbed into your existing security/compliance pipeline.
Biweekly releases with eval-score trends reported. Regression beyond threshold blocks the deploy.
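The eval-as-gate idea can be sketched in a few lines. This is a simplified illustration, assuming exact-match scoring against a golden dataset and a hypothetical 0.02 regression tolerance; real rubrics are usually richer (graded scores, per-slice thresholds), and the function names here are placeholders.

```python
def score_exact_match(predictions: list[str], golden: list[str]) -> float:
    """Fraction of predictions matching the golden label, ignoring case and edge whitespace."""
    if not golden or len(predictions) != len(golden):
        raise ValueError("predictions and golden must be non-empty and the same length")
    hits = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, golden)
    )
    return hits / len(golden)


def gate(current: float, baseline: float, max_regression: float = 0.02) -> bool:
    """Return True if the deploy may proceed: the score has not dropped past the tolerance."""
    return current >= baseline - max_regression
```

In a CI job, the pipeline would compute the current score, compare it to the last released baseline, and exit non-zero when `gate` returns False so the deploy is blocked.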
04 · How to engage
Same pod, four contracting shapes. AI buyers who skip the readiness review tend to overpay for capability they don't have an eval for.
A written, fixed-scope assessment with priced recommendations. Zero pressure to engage further — most buyers start here.
You need a second pair of eyes before a migration, refactor, or vendor swap.
A defined deliverable, a fixed timeline, a quoted price. We ship, hand off, and stay on call through stabilization.
The scope is clear and the date matters more than ongoing capacity.
A named pod, a unified SLA, and a monthly executive review. The team you'd hire if you weren't trying to stay lean.
Multiple workstreams, a roadmap longer than a year, no time to coordinate vendors.
One senior engineer (SRE, platform, ML, security) placed inside your team for 6+ months, accountable to your manager — backed by ours.
You have leadership and tooling, but a specific seat is empty and contracting cycles are too slow.
Indicative ranges in the calculator
05 · FAQ
Whichever the eval picks. We default to a small model with a big-model fallback for hard queries, which saves 60–80% on tokens. Vendor choice is documented in the SOW.
Either. Self-hosted via vLLM or Ollama on your infra; API-based via OpenAI/Anthropic/Together. Compliance constraints decide.
The guardrails kit (deliverable above) handles input sanitization, output moderation, and an audit log. Not perfect, but it's the actual surface — not a vendor's marketing slide.
A 3-week AI readiness review. Use-case audit, eval feasibility read, cost projection, guardrails gap analysis.
06 · Pairs with
07 · Engage
One delivery lead replies within a business day. Readiness reviews are fixed-scope.