
Samsung Internal Tools

The Proprietary Code Leak

Engineers used an external AI to fix bugs. The fix worked. The source code was now sitting on someone else's servers as potential training data.

The Incident

In early 2023, Samsung semiconductor engineers were given approval to use ChatGPT to assist with their work. Within weeks, three separate incidents occurred: one engineer pasted proprietary source code into ChatGPT to ask for optimization suggestions; a second submitted internal meeting notes for summarization; a third uploaded test sequences for a new chip. All of this data was transmitted to OpenAI's servers, where it could be retained and potentially incorporated into future model training.

Samsung had not conducted a data governance review before enabling the tool. The company banned ChatGPT internally shortly afterward and began building a private internal LLM, at significant cost and delay.

Evidence from the Scene

  • Engineers were given AI tool access with no data classification guidelines
  • Three separate data incidents occurred within weeks of access being granted
  • The source code pasted was for an unreleased semiconductor product
  • Samsung had no technical controls preventing IP from being submitted to external AI
  • The company discovered the incidents through internal monitoring, not external disclosure
  • The cost of building the internal LLM replacement exceeded the productivity gains originally projected from ChatGPT access

The Suspects

Two of these are the real root causes. The others are plausible-sounding distractors.

No data governance policy before enabling external AI tools

No technical controls (DLP) preventing IP submission to external services

Individual engineers were negligent in their data handling

ChatGPT had a security vulnerability that exposed the data

The Verdict

Real Root Causes

  • No data governance policy before enabling external AI tools

    AI tools were enabled without a data classification framework. Engineers weren't told what could and couldn't be submitted to external AI services. A simple policy — 'do not submit code, internal documents, or unreleased product data to external AI' — would likely have prevented all three incidents.

  • No technical controls (DLP) preventing IP submission to external services

    Data Loss Prevention (DLP) tools can detect and block the transmission of code patterns, internal identifiers, or classified document formats to external services. Samsung had no such controls in place for AI tool usage; a minimal sketch of such an egress check follows this list.
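
Below is a minimal sketch, in Python, of what such an egress check could look like. The blocked patterns, host list, and function name are illustrative assumptions, not any real product's API; production DLP sits at a network proxy or endpoint agent and combines far richer signals (document fingerprints, classification labels, ML classifiers) than a few regexes.

    import re

    # Hypothetical markers of proprietary content; a real deployment would
    # derive these from the organization's own classification scheme.
    BLOCKED_PATTERNS = [
        re.compile(r"(?i)\bconfidential\b"),                      # classification labels
        re.compile(r"PROJ-[A-Z]{2,}-\d+"),                        # invented internal ticket-ID format
        re.compile(r"(?m)^\s*(def |class |#include\b|public )"),  # code-like lines
    ]

    # Example destinations this sketch treats as external AI services.
    EXTERNAL_AI_HOSTS = {"api.openai.com", "chat.openai.com"}

    def egress_allowed(destination_host: str, body: str) -> bool:
        """Return False when an outbound payload bound for an external AI
        service matches any proprietary-content pattern."""
        if destination_host not in EXTERNAL_AI_HOSTS:
            return True  # this sketch only polices external AI endpoints
        return not any(p.search(body) for p in BLOCKED_PATTERNS)

    # A pasted code snippet is blocked; a generic question passes.
    assert not egress_allowed("api.openai.com", "def optimize_yield(wafer):\n    pass")
    assert egress_allowed("api.openai.com", "What does a mutex do?")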

Plausible But Wrong

  • Individual engineers were negligent in their data handling

    Engineers used the tool in the most natural way: they submitted the code they were working on. Without a policy or training, this was predictable behavior. Attributing it to individual negligence ignores the systemic failure.

  • ChatGPT had a security vulnerability that exposed the data

    There was no security breach of OpenAI's systems. The data was submitted voluntarily through the normal product interface. The risk comes from the standard terms of service of external AI providers, which may allow submitted data to be used for training, not from any vulnerability.

Summary

Samsung enabled external AI tools without data governance, without data classification training, and without technical controls. Engineers did exactly what engineers do — used the tool to solve the problem in front of them. The failure was organizational, not individual.

The Real Decision That Caused This

The real decision failure was treating AI tool adoption as a productivity decision rather than a data governance decision. Before enabling any external AI tool, organizations need three things: a data classification policy specifying what can be submitted, technical controls enforcing that policy, and engineer training on the specific risks of AI data handling. A sketch of what a machine-readable version of such a policy could look like appears below.
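
As an illustration, such a classification policy can be made machine-readable so that tooling, rather than memory, enforces it. The sketch below is a hedged Python example; the classification levels and destination names are assumptions for illustration, not Samsung's actual scheme.

    from enum import IntEnum

    # Illustrative classification levels; an organization defines its own.
    class Classification(IntEnum):
        PUBLIC = 1
        INTERNAL = 2
        CONFIDENTIAL = 3
        RESTRICTED = 4  # e.g., source code for unreleased products

    # Hypothetical ceilings: the highest level each destination may receive.
    MAX_ALLOWED = {
        "external_ai": Classification.PUBLIC,
        "approved_internal_llm": Classification.CONFIDENTIAL,
    }

    def may_submit(data_class: Classification, destination: str) -> bool:
        """True only when the destination is approved for data at this level."""
        ceiling = MAX_ALLOWED.get(destination)
        return ceiling is not None and data_class <= ceiling

    # Unreleased-chip source code is RESTRICTED, so it may not go to external AI.
    assert not may_submit(Classification.RESTRICTED, "external_ai")
    assert may_submit(Classification.PUBLIC, "external_ai")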

Lesson Hint

Chapter 8 (Security & Privacy) covers data classification and the Android Keystore. Chapter 13 (Engineering with AI Agents) covers the organizational governance required before enabling AI tools.
