Behavioral Scenario #21
The AI Tooling Standard
“Half your team is using GitHub Copilot. The other half thinks it's dangerous. Leadership wants a policy by Friday.”
The Situation
You're a Staff Android engineer at a 200-person company. Copilot was approved for use three months ago, but with no formal guidelines. You've noticed output quality varies wildly — one senior engineer ships 3x faster with it, while a mid-level engineer shipped two subtle race conditions last sprint, both traced to accepted Copilot suggestions. Your manager asks you to draft the team's AI tooling standard before a company-wide policy is set.
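The scenario doesn't show the actual defects, but a minimal Kotlin sketch of the class of bug in question (plausible-looking concurrency code that compiles and works in a happy-path test) might look like this; the class, names, and details are hypothetical:

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Hypothetical sketch, not code from the scenario: the shape of a "subtle"
// race condition an accepted suggestion can introduce. It reads plausibly
// and passes a single-threaded test, but the check-then-act around a
// suspension point is a race, and the plain HashMap is not thread-safe.
class AvatarCache(private val fetch: suspend (String) -> ByteArray) {
    private val cache = mutableMapOf<String, ByteArray>()  // unsynchronized shared state

    suspend fun prefetch(userIds: List<String>) = coroutineScope {
        userIds.forEach { id ->
            launch {
                // Two coroutines can both see the key as missing and fetch twice;
                // on a multi-threaded dispatcher the concurrent map writes are
                // also a data race that can corrupt the map's internal state.
                if (id !in cache) {
                    cache[id] = fetch(id)
                }
            }
        }
    }
}
```

The specific bug matters less than its category: it's the concurrency nature of the change, not who or what wrote it, that should drive review intensity.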
Context
- Copilot has been in use for 3 months with no formal guidelines — rapid adoption without guardrails
- Two race conditions last sprint were traced directly to accepted AI suggestions
- Productivity gains are real but inconsistent across engineers and task types
- Leadership wants a policy finalized by Friday to inform a company-wide rollout
- Team skeptics feel their concerns about code quality and IP haven't been heard
- No precedent exists in the org — this is the first tooling standard of its kind
The Question
“How do you approach establishing standards for a new technology your team is already using unevenly?”
Response Options
One of these is the strongest response. The others reflect common approaches with real trade-offs.
**A.** I recommended banning AI tools across the team until a formal security and quality vetting process was completed by the platform team.
**B.** I built a trust matrix documenting which task types are safe to delegate to AI with minimal review (boilerplate, test scaffolding, documentation) versus which require mandatory secondary review (payment logic, concurrency, security-sensitive code). I attached a review checklist for AI-generated code in high-risk categories and proposed a 4-week measurement pilot before the policy was finalized.
**C.** I left it to individual engineers' discretion — they know their own context and risk tolerance better than any top-down policy could.
**D.** I mandated Copilot for all engineers to normalize usage and eliminate the friction caused by uneven adoption.
The Debrief
Why the Best Response Works
Answer B works because a trust matrix matches scrutiny to risk — which is exactly what the race condition data demands. High-risk categories (concurrency, payments) need heightened review. Low-risk categories (boilerplate, scaffolding) don't. Blanket bans and blanket adoption both fail to make this distinction. The 4-week pilot prevents premature policy — the right standard follows data, not deadlines.
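To make the idea concrete, here is a minimal sketch of how a trust matrix could be encoded as data; the categories, examples, and review levels are illustrative assumptions, not the policy from the scenario:

```kotlin
// Illustrative only: a trust matrix as data, mapping task categories to the
// review intensity they require. Categories and examples are assumptions.
enum class ReviewLevel { STANDARD, MANDATORY_SECONDARY }

data class TrustRule(
    val category: String,
    val examples: List<String>,
    val review: ReviewLevel,
)

val trustMatrix = listOf(
    TrustRule("Boilerplate / scaffolding", listOf("test fixtures", "DTOs", "docs"), ReviewLevel.STANDARD),
    TrustRule("Concurrency", listOf("coroutines", "shared mutable state"), ReviewLevel.MANDATORY_SECONDARY),
    TrustRule("Payments / security", listOf("billing flows", "auth", "key handling"), ReviewLevel.MANDATORY_SECONDARY),
)

// A change's review level is driven by the riskiest category it touches.
fun requiredReview(touchedCategories: Set<String>): ReviewLevel =
    if (trustMatrix.any { it.category in touchedCategories && it.review == ReviewLevel.MANDATORY_SECONDARY })
        ReviewLevel.MANDATORY_SECONDARY
    else
        ReviewLevel.STANDARD
```

Expressing the matrix as data rather than prose is one way to keep the measurement pilot cheap: the same category names can tag pull requests and incidents during the 4-week window.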
What to Avoid
Banning AI tools signals fear rather than judgment — and ignores real productivity gains. Mandating them ignores real quality risks. Both bypass the core question: which uses of AI are safe for this team in this codebase? The trust matrix is the answer to that question.
What the Interviewer Is Probing
The interviewer is evaluating whether you can set standards that hold without you in the room — and whether you treat policy as a risk management tool rather than a political statement. The trust matrix and measurement pilot are the signals that distinguish Staff-level policy thinking from reactive rule-making.
SOAR Structure
**Situation:** 200-person company, 3 months of unguided Copilot use, two race conditions in a single sprint traced to AI-generated code, company-wide policy due Friday.
**Obstacle:** No precedent in the org, team split between adopters and skeptics, real productivity gains and real quality risks both present.
**Action:** Built a trust matrix mapping task types to review intensity, created a review checklist for high-risk AI-generated code, proposed a 4-week measurement pilot before finalization.
**Result:** Trust matrix adopted as the company-wide AI tooling standard; race condition rate dropped to zero over the next 6 sprints; skeptics' concerns formally addressed through the checklist.
The Learning Arc
"I realized the goal wasn't to control the tool — it was to make the risk visible so engineers could make informed decisions. Once I framed it as a risk matrix instead of a ban or a mandate, the whole team could see what we were actually trying to solve."
IC Level Calibration
**Senior:** Identify the quality risk from unguided AI usage, propose a review checklist for AI-generated code, and flag the race condition pattern to the team.
**Staff:** Write the trust matrix that matches review intensity to task risk category, establish measurement criteria before mandating anything, and create adoption norms that hold without ongoing enforcement — so the standard survives after you stop watching (one mechanical form of this is sketched below).
**Principal:** Connect the AI tooling policy to the org's broader engineering principles and make it self-updating: define the criteria by which the trust matrix itself should be revised as the technology and the team's experience mature.
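As one possible illustration of "norms that hold without ongoing enforcement," a small CI check could flag changes in high-risk paths so the secondary-review requirement doesn't depend on anyone remembering it. The path prefixes, input format, and flag convention here are assumptions for the sketch, not part of the scenario:

```kotlin
import java.io.File
import kotlin.system.exitProcess

// Hypothetical CI gate: fail the build when a change touches high-risk paths
// and the mandatory secondary review has not been recorded yet.
val highRiskPrefixes = listOf("payments/", "billing/", "auth/", "core/concurrency/")

fun main(args: Array<String>) {
    val changedFiles = File(args[0]).readLines()          // args[0]: one changed path per line
    val secondaryReviewDone = args.getOrNull(1) == "--secondary-review-approved"

    val risky = changedFiles.filter { path -> highRiskPrefixes.any(path::startsWith) }
    if (risky.isNotEmpty() && !secondaryReviewDone) {
        println("High-risk files changed; mandatory secondary review for AI-assisted code:")
        risky.forEach { println("  $it") }
        exitProcess(1)  // non-zero exit keeps the requirement visible in CI
    }
}
```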
Company Calibration
- Code review culture: trust is proportionate to review rigor
- **Stripe.** RFC comment period: decisions with broad impact need structured input
- **Amazon.** LP: Insist on the Highest Standards — operational readiness before scaling
- **Meta.** Move Fast with Stable Infra: velocity requires guardrails, not restrictions
Want to pick your response and see the full analysis?
Practice This Scenario Interactively