AI Data Exposure

Introduction

When employees began experimenting with ChatGPT and similar tools in 2023–2024, the excitement was palpable. Productivity skyrocketed, content creation accelerated, and previously complex workflows became a matter of simple prompts. But behind the efficiency lay a critical blind spot — every prompt was a potential data leak.

By mid‑2025, more than a dozen documented incidents worldwide had proven the danger: confidential reports, personal information, and even internal source code were being exposed through conversational AI tools. The most recent, and most widely publicised, example? The NSW Reconstruction Authority (NSWRA) incident, in which sensitive internal content reportedly found its way into an AI training dataset.

This isn’t a failure of the model — it’s a failure of human caution. Let’s unpack what happened, why it’s more common than you think, and how both individuals and organisations can stop feeding the machine with what was never meant to leave their walls.


1. The NSW Reconstruction Authority Case — A Lesson in Oversharing

In October 2025, reports surfaced that the NSW Reconstruction Authority had inadvertently exposed internal communications and documentation through interactions with a generative AI platform. While details are still emerging, the core issue was clear: employees entered proprietary or sensitive data into public AI interfaces without understanding how that information could be retained, processed, or reused.

This wasn’t malicious. It was convenience.

Like many public sector bodies worldwide, NSWRA was experimenting with AI for drafting reports, summarising long documents, and creating communication templates. But in doing so, the boundary between internal and external blurred. Once information is entered into a third-party AI model — unless explicitly isolated — it effectively leaves your network perimeter.

The Core Mistake

  • Assuming private prompts stay private.
  • Using public AI tools for internal work.
  • No clear data‑handling policy or guardrails.

It’s the modern equivalent of accidentally posting a confidential report to a public mailing list — only this time, the system remembers.


2. Why AI Leaks Keep Happening

You don’t need an insider threat or hacker to cause a data breach anymore. A well‑meaning employee and a helpful chatbot are enough. Across industries, the same behavioural patterns repeat:

a. Overtrust in “Private” AI

People assume that because they’re logged in, or on a paid plan, their prompts are protected. In reality, many generative AI systems retain query data for model improvement (sometimes in anonymised form) unless retention is explicitly disabled.

b. Shadow AI Adoption

Employees use unapproved AI tools outside official governance channels. It’s the new “shadow IT”, but harder to detect: no software installs, just browser access.

c. Lack of Clarity on Data Residency

Most staff have no idea where their AI inputs are stored. Are they in Australia? The US? A European data centre governed by GDPR? The answer often changes between vendors and service tiers.

d. Absence of Prompt Hygiene

Even trained professionals underestimate what counts as sensitive. Case notes, legal summaries, customer names, or code snippets — all of it can be considered confidential when processed externally.

e. Imitation of Industry Peers

When others start using ChatGPT or Gemini for business operations, there’s social pressure to follow. Unfortunately, normalisation doesn’t equal safety.


3. Real‑World AI Data Exposure Incidents

AI data leakage isn’t hypothetical — it’s been headline news multiple times since 2023.

Samsung (2023)

Three separate employees at Samsung’s semiconductor division uploaded source code and internal meeting notes to ChatGPT for debugging help. Samsung subsequently banned public generative AI use on corporate devices, citing irretrievable data exposure.

British Cyber Consultancy (2024)

An analyst accidentally pasted a client threat report draft — including IP addresses and red‑team findings — into a ChatGPT prompt. The model used was integrated with third‑party plugins, meaning the data potentially traversed multiple external APIs.

Global Law Firm (2024)

Paralegals began using AI to summarise discovery documents. A plugin integration inadvertently exposed privileged case data to an online dataset. Regulatory inquiries followed under GDPR.

Financial Institution, Singapore (2025)

Staff uploaded internal risk models for AI‑assisted report generation. The vendor later confirmed that prompts had been used for “model improvement.” The regulator classified it as an unauthorised disclosure under data protection law.

Each case follows the same pattern: no technical hack, just human haste.


4. The Human Psychology Behind AI Oversharing

Why are skilled professionals — engineers, lawyers, even cybersecurity teams — still making these mistakes? Because AI feels safe. It’s not an email to a stranger; it’s a chat window. It mimics human rapport, lowering psychological barriers.

Illusion of Anonymity

People believe, “It’s just text,” ignoring that metadata, writing style, and embedded details can identify individuals or organisations.

The Efficiency Trap

Deadlines, burnout, and cost‑cutting pressures make AI the perfect shortcut. When productivity is rewarded more than caution, risk becomes invisible.

Anthropomorphism of AI

We subconsciously treat chatbots as assistants, not systems. That false sense of intimacy leads to over‑disclosure — the “talking to a colleague” effect.

The Knowledge Gap

Most staff don’t fully grasp how AI models work — how prompts can be stored, tokenised, and used to retrain future models. Without that understanding, it’s impossible to apply caution effectively.


5. How AI Systems Actually Handle Your Data

To understand the risk, you need to know what happens after you hit “Enter.”

When you type a prompt into a generative AI interface:

  1. It’s transmitted securely (usually via HTTPS) to the vendor’s servers.
  2. The model processes your prompt and generates a response.
  3. Depending on the vendor, the prompt and output may be logged for system performance, debugging, or training.
  4. If the model supports plug‑ins or external APIs, your input may traverse additional services beyond the main provider.
  5. Some vendors offer “no‑retention” modes, but these must be manually enabled or part of enterprise agreements.

In short: unless you’re using an isolated, enterprise‑hosted LLM with contractual data protection clauses, you’re not in control of what happens next.
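
To make that flow concrete, here is a minimal sketch of a prompt submission as an API call. The endpoint, authentication scheme, and the opt_out_of_training field are hypothetical placeholders rather than any particular vendor’s API; real retention controls vary by provider and usually live in account settings, API parameters, or enterprise agreements.

  import requests

  # Hypothetical sketch: a prompt leaving the corporate network over HTTPS.
  # The endpoint, auth header, and "opt_out_of_training" flag are placeholders;
  # check your vendor's actual API and data-handling terms before relying on them.
  response = requests.post(
      "https://api.example-ai-vendor.com/v1/chat",        # hypothetical endpoint
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      json={
          "prompt": "Summarise the attached incident report",
          "opt_out_of_training": True,                     # assumed parameter, not a real one
      },
      timeout=30,
  )
  print(response.json())

Everything inside that JSON body is now subject to the vendor’s logging and retention practices, which is exactly what steps 3 to 5 above describe.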


6. The Regulatory Landscape

Australia’s privacy framework — particularly the Privacy Act 1988 and the ongoing reforms to it — places clear obligations on organisations handling personal information. Feeding that data into a public AI tool without consent or safeguards could constitute a breach.

Internationally, the EU AI Act, NIST AI Risk Management Framework, and UK ICO guidance all stress transparency, accountability, and human oversight in AI usage. But regulation lags behind adoption. Many entities using GPT‑based tools today have no AI‑specific data governance in place.

For public sector agencies like NSWRA, this risk is amplified: data isn’t just private — it’s public trust material. A single AI mishandling incident can undermine years of confidence in digital transformation efforts.


7. The Corporate AI Supply Chain — Hidden Dependencies

Even if an organisation uses an “AI‑powered productivity platform” rather than ChatGPT directly, it may still be relying on the same backend models via API. Vendors often white‑label or integrate OpenAI, Anthropic, or Google models into their products. That means your data’s journey is longer than you think.

A common chain looks like this:

Employee → SaaS platform → AI API (OpenAI, Anthropic, Google) → Logging/Monitoring → Storage/Analytics

Every link adds a potential exposure point.


8. Securing AI Usage — Practical Defences

Generative AI can be used safely — but only with deliberate architecture, governance, and user training. Here’s how organisations can build those guardrails.

1. Implement an AI Use Policy

Define what types of data can and cannot be entered into AI systems. Include examples of sensitive inputs and approved tools.

2. Deploy Enterprise AI Solutions

Use self‑hosted or enterprise‑tier models (such as Azure OpenAI Service) that offer contractual data isolation and no‑retention commitments.

3. Enforce Data Classification Awareness

Label internal data as “Confidential,” “Internal Use,” or “Public.” Integrate this awareness into employee training and automated DLP tools.

4. Monitor for Shadow AI

Use network monitoring and cloud access security brokers (CASBs) to detect unapproved AI tool usage.
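
As a rough illustration of what that detection can look like, the sketch below scans an egress proxy log for connections to well-known public AI domains. The CSV column names (user, destination_host) and the log file name are assumptions; adapt them to whatever your proxy or CASB actually exports.

  import csv
  from collections import Counter

  # Well-known public AI tool domains to flag (extend as needed).
  AI_DOMAINS = {"chat.openai.com", "chatgpt.com", "gemini.google.com",
                "claude.ai", "copilot.microsoft.com"}

  def find_shadow_ai(log_path: str) -> Counter:
      """Count per-user requests to public AI services in a CSV proxy log."""
      hits = Counter()
      with open(log_path, newline="") as f:
          for row in csv.DictReader(f):
              host = row["destination_host"].lower()
              if any(host == d or host.endswith("." + d) for d in AI_DOMAINS):
                  hits[(row["user"], host)] += 1
      return hits

  for (user, host), count in find_shadow_ai("proxy_egress.csv").most_common():
      print(f"{user} -> {host}: {count} requests")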

5. Audit and Log AI Interactions

For regulated industries, maintain audit trails of prompts and outputs for accountability and investigation.
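
One lightweight way to start, sketched below, is an append-only record of each interaction. Hashing the prompt and response keeps sensitive content out of the audit trail itself while still letting investigators match records later; the field names and JSON-lines storage are illustrative choices, not a standard.

  import hashlib
  import json
  import time

  def log_ai_interaction(user: str, prompt: str, response: str,
                         path: str = "ai_audit.jsonl") -> None:
      """Append an audit record of one AI interaction (illustrative schema)."""
      record = {
          "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
          "user": user,
          "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
          "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
          "prompt_chars": len(prompt),
      }
      with open(path, "a") as f:
          f.write(json.dumps(record) + "\n")

  log_ai_interaction("j.citizen", "Summarise this board paper", "Here is a summary...")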

6. Embed “Prompt Hygiene” Training

Teach employees how to phrase prompts safely — replacing real identifiers with placeholders. Example:

Instead of: “Summarise this client complaint from John Doe at XYZ Ltd.”
Use: “Summarise this customer complaint from [Client] regarding [Product].”
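
Placeholder substitution can also be partly automated. The sketch below strips a few obvious identifier types with regular expressions before text is pasted into an external tool; the patterns (including the Australian mobile format) are illustrative and far from exhaustive, so treat it as a starting point rather than a guarantee.

  import re

  # Illustrative patterns only; real deployments need broader coverage
  # (names, addresses, account numbers) plus human review.
  PATTERNS = {
      r"[\w.+-]+@[\w-]+\.[\w.-]+": "[EMAIL]",
      r"\b(?:\+?61|0)4\d{2}[ -]?\d{3}[ -]?\d{3}\b": "[MOBILE]",   # assumed AU mobile format
      r"\b\d{1,3}(?:\.\d{1,3}){3}\b": "[IP_ADDRESS]",
  }

  def scrub(text: str) -> str:
      """Replace obvious identifiers with placeholders before external use."""
      for pattern, placeholder in PATTERNS.items():
          text = re.sub(pattern, placeholder, text)
      return text

  print(scrub("Summarise this complaint from jane.doe@xyz.com.au about host 10.0.0.12"))
  # -> Summarise this complaint from [EMAIL] about host [IP_ADDRESS]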

7. Introduce AI Gateways

Deploy internal proxy layers that filter prompts before they reach the public model, stripping PII or sensitive data automatically.
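
Here is a minimal sketch of that idea as an internal HTTP gateway, assuming FastAPI and httpx are available; the upstream endpoint is hypothetical. It applies the same kind of scrubbing shown in the prompt hygiene sketch before anything leaves the network, and a production version would add authentication, audit logging, and far richer filtering.

  import re

  import httpx
  from fastapi import FastAPI
  from pydantic import BaseModel

  UPSTREAM_URL = "https://api.example-ai-vendor.com/v1/chat"   # hypothetical upstream
  EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

  app = FastAPI()

  class ChatRequest(BaseModel):
      prompt: str

  @app.post("/chat")
  async def chat(req: ChatRequest):
      # Strip obvious PII before the prompt leaves the internal network.
      clean_prompt = EMAIL.sub("[EMAIL]", req.prompt)
      async with httpx.AsyncClient(timeout=30) as client:
          upstream = await client.post(UPSTREAM_URL, json={"prompt": clean_prompt})
      return upstream.json()

  # Run with e.g.: uvicorn gateway:app --port 8080  (assuming this file is gateway.py)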

8. Review Vendor Contracts

Ensure every AI service has data‑processing agreements (DPAs) and clear retention limits.


9. Building a Culture of Responsible AI Use

Technology controls are only as strong as the culture enforcing them. To make responsible AI use sustainable, organisations should:

  • Lead from the top. Executives must model good behaviour and transparency around AI use.
  • Reward security mindfulness. Recognise employees who report misuse or suggest safer practices.
  • Integrate AI ethics into onboarding. Make it part of every employee’s first‑day briefing.
  • Communicate transparently. When incidents happen, disclose them promptly and explain the lessons learned.

Trust in AI begins with trust in leadership.


10. The Future of AI Data Security

As AI becomes embedded in every workflow — from email drafting to strategic decision‑making — the line between local and cloud processing will blur further. But we can already see the next wave of risks emerging:

  • Prompt Injection Attacks: Malicious content embedded in text or websites that hijacks AI behaviour.
  • Model Inversion: Extracting original training data by querying the model.
  • Synthetic Insider Threats: Attackers impersonating employees using AI‑generated communications.
  • Regulatory Backlash: Expect stricter AI compliance audits and disclosure requirements within the next two years.

In short, AI security will soon be as critical as network security — and far more human‑centric.


Key Takeaways

  • Every prompt is a potential data disclosure.
  • Public AI tools are not designed for confidential use.
  • Enterprise isolation, DLP, and strong policies are essential.
  • “Shadow AI” is the new shadow IT — visibility and culture are your first defences.
  • The goal isn’t to block AI — it’s to use it responsibly.

AI doesn’t leak secrets. People do.
The question is whether your systems — and your culture — are built to stop them.