Skip to content
helptype.
Book a Call

Pillar / Automate · AI

AI shipped to production — not demoed.

LLM features, evals, RAG, agents, cost routing — built by engineers who've shipped them before. We hold the eval suite, the cost dashboard, and the guardrails kit. You hold the model decisions.

  • Eval-firstNo eval = no ship
  • Cost-routedPer-route model selection
  • GuardrailedSafety + privacy enforced

01 · Use cases

Four shapes — pick by what you're trying to ship.

Each shape has a different eval surface and a different cost-failure mode. We pick on the merits.

  1. 01

    LLM features

    Inline AI in your existing product — summarization, drafting, classification. Eval coverage > 80% before ship.

  2. 02

    RAG

    Retrieval-augmented generation over your docs/wiki/tickets. Hybrid retrieval, citation-aware, dataset-aware ranking.

  3. 03

    Agents

    Tool-using LLMs — booking flows, ops automation, code review. Cost ceilings + step budgets enforced; humans in the loop where it matters.

  4. 04

    Cost routing

    Model selection per request — small model for easy queries, big model for hard ones. Saves 60–80% on token spend without quality loss.

02 · What you get

Three artifacts every AI engagement leaves behind.

Versioned, owned by you, replayable.

  1. 01

    Eval suite

    Golden-dataset + ground-truth labels + automated scoring. Runs on every PR; failures block the deploy.

  2. 02

    Cost-routed prompt pipeline

    Per-route model selection, token budgets, fallback chains. Cost dashboards your CFO can read alone.

  3. 03

    Guardrails kit

    PII redaction, jailbreak detection, output moderation, audit log. The compliance surface, not just a llm.completions wrapper.

03 · How we deliver

Eval → cost-route → guardrail → ship.

AI engineering inverts the usual order — we evaluate before we build. The eval suite IS the spec.

  1. Eval

    1–2 weeks

    Golden dataset, ground-truth labeling, scoring rubric. The acceptance criteria for everything that follows.

  2. Cost-route

    1–2 weeks

    Per-route model selection, token budgets, batch vs streaming. Cost dashboards wired to your warehouse.

  3. Guardrail

    1–2 weeks

    PII, jailbreak, moderation, audit. Plumbed into your existing security/compliance pipeline.

  4. Ship

    4–12 weeks

    Biweekly release with eval scores trending. Regression beyond threshold blocks the deploy.

04 · How to engage

Four shapes — start with a readiness review.

Same pod, four contracting shapes. AI buyers who skip the readiness review tend to over-pay for capability they don't have an eval for.

  1. 3 weeks · fixed

    Architecture review

    A written, fixed-scope assessment with priced recommendations. Zero pressure to engage further — most buyers start here.

    Fits when

    You need a second pair of eyes before a migration, refactor, or vendor swap.

  2. 8 – 24 weeks

    Project-based

    A defined deliverable, a fixed timeline, a quoted price. We ship, hand off, and stay on call through stabilization.

    Fits when

    The scope is clear and the date matters more than ongoing capacity.

  3. Annual retainer

    Managed operations

    A named pod, a unified SLA, and a monthly executive review. The team you'd hire if you weren't trying to stay lean.

    Fits when

    Multiple workstreams, a roadmap longer than a year, no time to coordinate vendors.

  4. Long-term placement

    Embedded engineer

    One senior engineer (SRE, platform, ML, security) placed inside your team for 6+ months, accountable to your manager — backed by ours.

    Fits when

    You have leadership and tooling, but a specific seat is empty and contracting cycles are too slow.

Indicative ranges in the calculator

Open the calculator

05 · FAQ

Four questions AI buyers ask first.

Which model do you use?

Whichever the eval picks. We default to a small model with a big-model fallback for hard queries — saves 60–80% on tokens. Vendor choice is documented in SOW.

Self-hosted or API-based?

Either. Self-hosted via vLLM or Ollama on your infra; API-based via OpenAI/Anthropic/Together. Compliance constraints decide.

What about prompt injection and jailbreaks?

The guardrails kit (deliverable above) handles input sanitization, output moderation, and an audit log. Not perfect, but it's the actual surface — not a vendor's marketing slide.

Smallest engagement?

A 3-week AI readiness review. Use-case audit, eval feasibility read, cost projection, guardrails gap analysis.

06 · Pairs with

AI ships best when grounded in real product surfaces.

07 · Engage

Bring us a use case, an eval question, or an OpenAI invoice that's getting weird.

One delivery lead replies within a business day. Readiness reviews are fixed-scope.