Behavioral Scenario #25
The AI Portfolio Candidate
“The candidate built an impressive Android app entirely with AI assistance. You're not sure they understand any of it.”
The Situation
You're interviewing a senior Android engineer candidate. Their portfolio app is technically impressive — clean architecture, Compose UI, full offline sync, well-tested. In the screening call, they disclosed that they had used Claude and Copilot extensively throughout. In the technical interview, they struggle to explain key decisions in their own codebase: why they chose a particular caching strategy, how the sync conflict resolution works, and what the StateFlow vs SharedFlow tradeoff was. The hiring committee is split.
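For readers who want the distinction the interviewers were probing, here is a minimal, self-contained Kotlin sketch. It is illustrative only, not taken from the candidate's codebase: a StateFlow always holds a current value and replays the latest one to each new collector (a natural fit for screen state), while a SharedFlow with replay = 0 delivers emissions only to collectors active at that moment (a natural fit for one-shot events such as snackbars or navigation).

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

fun main() = runBlocking {
    // StateFlow: always has a current value, replays the latest value to
    // every new collector, and conflates rapid intermediate updates.
    val uiState = MutableStateFlow("loading")

    // SharedFlow with replay = 0 (the default): no initial value; collectors
    // that attach after an emission never see it.
    val events = MutableSharedFlow<String>()

    val stateJob = launch { uiState.collect { println("state: $it") } }
    val eventJob = launch { events.collect { println("event: $it") } }
    yield() // let both collectors start; prints "state: loading"

    uiState.value = "loaded"      // prints "state: loaded"
    events.emit("show_snackbar")  // prints "event: show_snackbar"
    delay(50)

    // A late collector sees the latest state, but the past event is gone.
    val lateJob = launch { uiState.collect { println("late: $it") } } // prints "late: loaded"
    delay(50)
    stateJob.cancel(); eventJob.cancel(); lateJob.cancel()
}
```

Reproducing this distinction on a whiteboard is table stakes; the interview gap was that the candidate couldn't say why their app picked one over the other.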
Context
- The portfolio output is genuinely impressive — the architecture is sound and the tests are meaningful
- The candidate disclosed AI use proactively and transparently in the screening call
- In the technical interview, they struggled to explain their own key architectural decisions
- The hiring committee is split 50/50 on whether to advance
- The role requires debugging production issues under pressure — judgment under uncertainty is essential
- This is a senior-level hire — the expectation is that they own and explain their architectural decisions
The Question
“Tell me about a time you had to make a nuanced hiring or evaluation decision.”
Response Options
One of these is the strongest response. The others reflect common approaches with real trade-offs.
A. I recommended rejection — the portfolio output is impressive, but if they can't explain their own architectural decisions, they didn't really build it. The portfolio represents AI output, not their judgment.
B. I proposed a focused follow-up session: give the candidate a novel architectural problem they've never seen — similar complexity to the offline sync in their portfolio but in a different domain — and evaluate their live reasoning process. I'm not testing whether they remember their past decisions; I'm testing whether they have the judgment to make new ones. The follow-up would answer the committee's question definitively.
C. I recommended hiring — the portfolio proves they can deliver high-quality output with modern tools, and the ability to use AI effectively is an increasingly important engineering skill.
D. I asked the candidate to rebuild a portion of the app's sync logic in the interview without AI assistance, to see whether they could reproduce the quality of the AI-generated output (a sketch of what such an exercise might target follows these options).
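For concreteness, a rebuild exercise like the one in option D might target the app's conflict resolver. The sketch below is a hypothetical last-write-wins resolver in Kotlin; the record type, field names, and tie-breaking rule are all assumptions for illustration, since the scenario never describes the portfolio's actual strategy.

```kotlin
// Hypothetical synced record; every field here is illustrative, not from the app.
data class Note(
    val id: String,
    val body: String,
    val version: Long,     // logical clock, incremented on each local edit
    val updatedAtMs: Long  // wall-clock fallback for tie-breaking
)

// Last-write-wins resolution: prefer the higher logical version and fall back
// to wall-clock time when versions collide. Real sync engines often reach for
// vector clocks or field-level merges instead; defending a choice like this
// is exactly the judgment the original interview failed to surface.
fun resolve(local: Note, remote: Note): Note = when {
    local.version > remote.version -> local
    local.version < remote.version -> remote
    else -> if (local.updatedAtMs >= remote.updatedAtMs) local else remote
}

fun main() {
    val local = Note("n1", "draft edited offline", version = 3, updatedAtMs = 1_700_000_100_000)
    val remote = Note("n1", "edit from another device", version = 3, updatedAtMs = 1_700_000_200_000)
    println(resolve(local, remote).body) // prints "edit from another device"
}
```

Writing these few lines is easy; articulating why last-write-wins is or isn't acceptable for the data in question is the judgment a follow-up session is designed to surface.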
The Debrief
Why the Best Response Works
Answer B works because it separates the right question from the wrong format. The committee is asking “do they have judgment?” — but the technical interview asked “do you remember why you made past decisions?” These are different questions. A live architectural problem tests the right thing: can they reason through a novel problem they've never seen? That's what production debugging requires.
What to Avoid
Rejecting without deeper probing and hiring on output alone are both premature. The portfolio can't answer the judgment question — it only answers the output question. The follow-up session answers the judgment question, which is the one that matters for a senior-level role where the engineer is expected to own and explain their architectural decisions in production and in code review.
What the Interviewer Is Probing
The interviewer is evaluating whether you know what you're evaluating. Hiring managers who probe for judgment rather than output quality demonstrate Staff-level hiring rigor — they understand that impressive portfolios built with AI assistance are a new class of evaluation artifact that requires a new class of evaluation method.
SOAR Structure
**Situation:** Senior Android candidate with an impressive AI-assisted portfolio; struggled to explain key architectural decisions in the technical interview; hiring committee split 50/50.
**Obstacle:** The committee was debating on incomplete evidence — portfolio quality vs. explanation failure — and both premature rejection and premature hire carried significant downside risk.
**Action:** Proposed a focused follow-up session built around a novel architectural problem — not portfolio recall, but live judgment in a new domain.
**Result:** The candidate performed strongly in the follow-up, demonstrating clear reasoning about tradeoffs in a domain they hadn't prepared for. Hired; they became one of the strongest architectural voices on the team.
The Learning Arc
"This taught me that the question isn't whether someone used AI — it's whether they have judgment. The portfolio told me they could use AI to produce good output. The follow-up told me they could think. That's the thing you can't outsource."
IC Level Calibration
Senior: Evaluate the portfolio output quality and flag the judgment gap to the committee — raise the concern about unexplained decisions as a risk signal and ask whether the team has a way to probe deeper.
Staff: Evaluate the judgment behind the output, not the output itself. Propose the novel-architectural-problem follow-up as the mechanism to answer the committee's question definitively — separate the evaluation question from the interview-format limitation.
Principal: Evaluate whether this person raises or lowers the team's average judgment. Ask: if they join, will junior engineers learn from their reasoning or just from their output? A senior engineer who can't explain their architectural decisions can't teach them either — and that's the compounding cost.
Company Calibration
Amazon
Bar Raiser: evaluate for the bar you need, not the output you see
Structured interviews: what you're testing matters more than what you observe
Stripe
Strong opinions loosely held: probe reasoning, not conclusions
Netflix
Keeper test: would you fight to keep this person on your team?