AI Security and Privacy: How to Use AI Without Leaking Data
Every prompt you type into an AI chatbot is data you are sending to someone else’s server. Every document you upload, every code snippet you paste, every customer name you mention — all of it travels over the internet to a data center where it is processed and, depending on the service and your plan, potentially stored and used to train future models.
In 2026, with 88% of enterprises using AI in at least one business function, the question is not whether your organization will use AI. It is whether you will use it safely. This guide covers what actually happens to your data when you use AI services, how to protect sensitive information, and what policies and tools minimize risk without blocking productivity.
Our AI security and privacy guidance draws on published vendor documentation, regulatory frameworks, and enterprise security best practices. Specific policies change frequently — always verify current terms directly with providers.
Table of Contents
- Key Takeaways
- What Happens to Your Data When You Use AI
- Data Policies by Provider
- Consumer vs Enterprise Data Handling
- How to Opt Out of Training Data Collection
- Zero Trust Architecture for AI
- Practical Security Measures
- AI and Regulatory Compliance
- Local AI: The Maximum Privacy Option
- Building an AI Acceptable Use Policy
- What Changed in 2026
- Common Mistakes in AI Security
- FAQ
- Sources
- Related Articles
Key Takeaways
- Free tiers of most AI services use your data for model training by default. Paid plans typically offer opt-out options or no-training guarantees.
- Enterprise plans are fundamentally different from consumer plans. They include contractual Data Processing Addendums (DPAs) that legally prohibit training on your data.
- Local AI models (Llama, Mistral) send zero data to external servers but require technical setup and capable hardware.
- A clear AI acceptable use policy is essential. Without one, employees will use AI tools anyway — just without guardrails.
- The regulatory landscape is tightening. Multiple US states and the EU AI Act now require disclosures about AI data handling.
What Happens to Your Data When You Use AI
When you send a prompt to ChatGPT, Claude, Gemini, or any cloud AI service, the following typically occurs:
The Data Journey
1. Transmission. Your input travels encrypted (TLS) from your device to the provider’s servers. All major providers use encryption in transit.
2. Processing. The AI model processes your input to generate a response. This happens on the provider’s servers (or their cloud infrastructure partner).
3. Temporary storage. Your input and the model’s output are stored temporarily, at minimum for the duration of your conversation session.
4. Logging. Most providers log interactions for abuse detection, safety monitoring, and service improvement. Retention periods vary from 30 days to indefinitely.
5. Training (the critical variable). This is where providers differ significantly. Some use your interactions to train future model versions. Others do not. The distinction depends on your plan type and opt-out settings.
What Data Is at Risk
The most common types of sensitive data users inadvertently share with AI services:
- Source code — Pasting proprietary code for debugging or review
- Customer data — Names, emails, account details included in prompts
- Financial information — Revenue numbers, pricing strategies, financial projections
- Legal documents — Contracts, agreements, compliance reports
- Internal communications — Strategy memos, performance reviews, HR discussions
- Credentials — API keys, passwords, access tokens pasted into prompts
- Intellectual property — Product plans, research findings, trade secrets
Data Policies by Provider
OpenAI (ChatGPT)
Consumer plans (Free, Plus, Pro):
- By default, OpenAI uses your inputs and outputs to train and improve models
- You can opt out via Settings > Data Controls > “Improve the model for everyone” toggle
- Opting out disables conversation history (your chats will not be saved)
- Even with opt-out, data may be retained for up to 30 days for abuse monitoring
Team and Enterprise plans:
- Data is not used for model training by default
- Enterprise accounts operate under contractual DPAs
- Data is encrypted at rest (AES-256) in addition to encryption in transit
- SOC 2 Type II compliance
- Custom data retention policies available
Anthropic (Claude)
Consumer plans (Free, Pro):
- Anthropic’s usage policy states that free-tier conversations may be used for safety research and model improvement
- Pro plan users can opt out of training data contribution
- Anthropic has historically been more conservative with data usage than competitors
Team and Enterprise plans:
- Contractual guarantee that data is not used for training
- Data Processing Addendums available
- SOC 2 Type II compliance
- Custom data retention
Google (Gemini)
Consumer plans:
- Google AI Pro conversations may be reviewed by human raters for quality improvement
- Users can manage data through Google’s My Activity controls
- Workspace data (when using Gemini in Docs, Gmail, etc.) is subject to the Google Workspace DPA for paid accounts
Enterprise (Google Cloud / Vertex AI):
- Customer data is not used for model training
- Covered by Google Cloud’s enterprise DPA
- Data residency options available (specific regions)
- FedRAMP and other compliance certifications
GitHub Copilot
Individual plans:
- On March 25, 2026, GitHub announced that, starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train AI models unless those users explicitly opt out
- Opt out via Settings > Copilot > Features > “Allow GitHub to use my data for AI model training”
Business and Enterprise plans:
- Code snippets are not retained after generating suggestions
- No training on your code by default
- Telemetry data is minimized
- Enterprise adds SAML SSO and additional security controls
Consumer vs Enterprise Data Handling
The difference between consumer and enterprise AI plans is not just a matter of features or usage limits. It is a legal and architectural difference.
Consumer Plans
| Aspect | Typical Consumer Plan |
|---|---|
| Data used for training | Yes, by default (opt-out available) |
| Contractual protections | Terms of Service only |
| Data retention | Varies, often indefinite |
| Encryption at rest | Varies by provider |
| Compliance certifications | Limited |
| Admin controls | Individual settings only |
| Data residency | No geographic guarantees |
Enterprise Plans
| Aspect | Typical Enterprise Plan |
|---|---|
| Data used for training | No (contractually prohibited) |
| Contractual protections | Custom DPA with legal liability |
| Data retention | Configurable, with deletion guarantees |
| Encryption at rest | AES-256 standard |
| Compliance certifications | SOC 2, ISO 27001, HIPAA (varies) |
| Admin controls | Organization-wide policies |
| Data residency | Region-specific options |
The core distinction: consumer plans make promises in their terms of service. Enterprise plans make legally binding commitments in negotiated contracts with financial penalties for violations.
How to Opt Out of Training Data Collection
ChatGPT (OpenAI)
- Open ChatGPT Settings
- Navigate to Data Controls
- Disable “Improve the model for everyone”
- Note: This also disables conversation history
Alternatively, submit an opt-out request via OpenAI’s privacy portal or use the API (API usage is not used for training by default).
Claude (Anthropic)
- Review Anthropic’s current data usage policy at https://www.anthropic.com/privacy
- Pro subscribers can manage data preferences through account settings
- API usage follows separate terms — data sent via the API is not used for training
Gemini (Google)
- Visit myactivity.google.com
- Navigate to Gemini Apps activity
- Turn off “Gemini Apps Activity” to prevent future conversations from being saved and reviewed
- Delete existing stored activity if desired
GitHub Copilot
- Go to github.com/settings/copilot/features
- Under the Privacy heading, disable “Allow GitHub to use my data for AI model training”
- This must be done before April 24, 2026, when the new default takes effect
General Best Practices
- Review privacy settings immediately after creating an account on any AI service
- Re-check settings after major service updates (providers sometimes reset or add new defaults)
- Use API access when available — most providers exclude API usage from training data by default
- Document your opt-out choices for compliance purposes
Zero Trust Architecture for AI
Microsoft announced its Zero Trust for AI reference architecture in March 2026, and the principles apply broadly to any organization using AI services.
Zero Trust Principles Applied to AI
Never trust, always verify. Do not assume that an AI service is safe because it is from a major provider. Verify data handling policies, confirm encryption standards, and audit actual behavior.
Least privilege access. Not every employee needs access to every AI tool. A marketing team member does not need a coding AI agent with repository access. Match AI tool access to job function.
Assume breach. Design your AI usage policies as if a provider’s data will eventually be compromised. Never share credentials, API keys, or other authentication tokens with AI services.
Practical Zero Trust for AI Usage
1. Network segmentation. Route AI API traffic through monitored network paths. Use API gateways to log all requests and responses.
2. Data classification. Label data by sensitivity level. Define which sensitivity levels can be shared with which AI services. Block the highest sensitivity tiers from all external AI tools.
3. Identity and access management. Use SSO and role-based access controls for AI platform accounts. Audit who is using which tools and how frequently.
4. Continuous monitoring. Log AI interactions for security review. Automated tools can scan prompts for sensitive data patterns (credit card numbers, social security numbers, API keys) before they are sent to external services.
5. Incident response planning. Have a plan for what happens if sensitive data is inadvertently shared with an AI service. Know the provider’s data deletion procedures and response timelines.
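The continuous-monitoring step above can be sketched as a simple pre-send filter. This is an illustrative sketch only — the regex patterns and the `scan_prompt` helper are assumptions for demonstration, not any vendor’s DLP API, and real tools use far more robust detection:

```python
import re

# Illustrative patterns only -- production DLP tools use much stronger detection.
SENSITIVE_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk|ghp|AKIA)[A-Za-z0-9_-]{16,}\b"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of sensitive-data patterns found in a prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]

findings = scan_prompt("Debug this: key=sk_live_abcdef1234567890XYZ, SSN 123-45-6789")
# A gateway would block or redact the prompt before it leaves the network.
```

A filter like this sits naturally in an API gateway, where every outbound prompt passes through it before transmission.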
Practical Security Measures
For Individual Users
- Never paste credentials. API keys, passwords, tokens, and SSH keys should never appear in AI prompts. Use environment variable names or placeholder values instead.
- Anonymize before sharing. Replace real customer names, email addresses, and account numbers with generic placeholders before pasting into AI tools.
- Use paid plans. The $20/month investment for Claude Pro or ChatGPT Plus is worth it for the improved data handling and opt-out options alone.
- Check before uploading. Before uploading a document to any AI service, confirm it does not contain sensitive data you would not want stored on external servers.
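The anonymization step can be as simple as a substitution pass before the text reaches an AI tool. A minimal sketch, assuming regex-based redaction (real pipelines often use NER-based tools instead); the `anonymize` helper and its patterns are illustrative:

```python
import re

def anonymize(text: str) -> str:
    """Replace emails and obvious account identifiers with generic placeholders."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b[A-Z]{2}\d{8,}\b", "[ACCOUNT_ID]", text)
    return text

safe = anonymize("Customer jane.doe@example.com, account ID GB12345678, reported a bug.")
# → "Customer [EMAIL], account ID [ACCOUNT_ID], reported a bug."
```

The AI can still summarize or analyze the redacted text; the placeholders preserve structure without exposing the underlying identities.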
For Teams and Organizations
- Deploy a DLP (Data Loss Prevention) gateway. Tools like Nightfall AI, Microsoft Purview, and custom solutions can scan outbound AI prompts for sensitive data and block or redact before transmission.
- Create approved tool lists. Designate which AI tools are approved for work use and which are prohibited. Update the list as privacy policies change.
- Provide sanctioned alternatives. If you block employees from using consumer ChatGPT, give them an approved alternative (ChatGPT Enterprise, Claude Team, or a self-hosted solution). Banning AI without providing alternatives drives usage underground.
- Implement prompt templates. Pre-built templates with placeholders for sensitive fields help employees use AI effectively without inadvertently sharing confidential data.
- Regular training. Quarterly reminders about AI data hygiene. Include examples of what not to share and why.
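The prompt-template idea above can be sketched with Python’s standard `string.Template`. The template text and field names here are hypothetical examples, not a recommended wording:

```python
from string import Template

# Illustrative template -- the fields are placeholders, never raw customer data.
SUPPORT_TEMPLATE = Template(
    "Summarize this support ticket for triage.\n"
    "Customer: $customer_ref\n"
    "Product: $product\n"
    "Issue: $issue"
)

prompt = SUPPORT_TEMPLATE.substitute(
    customer_ref="CUST-0042",   # internal reference, not the customer's name
    product="Billing portal",
    issue="Invoice totals do not match the exported CSV.",
)
```

Because employees fill in labeled fields rather than free-writing prompts, the template itself documents what kind of data belongs in each slot.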
For Developers
- Use API keys with minimum scope. Do not share admin-level API keys with AI coding assistants. Create read-only or limited-scope keys where possible.
- Review AI-generated code for secrets. AI coding tools sometimes hallucinate plausible-looking API keys or credentials. Scan generated code before committing.
- Exclude sensitive files from AI context. Use .gitignore-style exclusion rules in AI coding tools to prevent .env files, credential stores, and private keys from being sent to AI services.
- Consider local models for sensitive codebases. Llama 3.3 running locally via Ollama handles many coding tasks without sending any code to external servers.
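The exclusion-rule idea can be approximated with a small path filter. A hedged sketch — the deny-list patterns and the `is_safe_to_send` helper are illustrative assumptions, not the configuration syntax of any particular AI coding tool:

```python
from fnmatch import fnmatch

# Hypothetical deny-list of files that should never reach an AI service.
EXCLUDED_PATTERNS = [".env", "*.pem", "*.key", "secrets/*", "*credentials*"]

def is_safe_to_send(path: str) -> bool:
    """Return False for any path matching a .gitignore-style exclusion rule."""
    return not any(fnmatch(path, pattern) for pattern in EXCLUDED_PATTERNS)

files = ["src/app.py", ".env", "secrets/db.yaml", "deploy.key", "README.md"]
context = [f for f in files if is_safe_to_send(f)]
# → ["src/app.py", "README.md"]
```

Most AI coding tools expose a native mechanism for this (check your tool’s documentation); the point is that the filter runs before any file content is added to the model’s context.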
AI and Regulatory Compliance
United States
In 2026, AI regulation in the US is state-driven:
- California: Requirements for AI transparency, including disclosure of AI-generated content and training data sources.
- Texas: AI governance statutes effective in early 2026 requiring algorithmic logic disclosures.
- Illinois: AI use in employment decisions requires disclosure and certain consent mechanisms.
- Colorado: AI transparency requirements for high-risk AI systems, including insurance and hiring applications.
There is no comprehensive federal AI privacy law as of March 2026, though multiple bills are in various stages of legislative progress.
European Union
The EU AI Act, in force since mid-2025, creates a risk-based framework:
- Unacceptable risk — Banned applications (social scoring, real-time biometric surveillance in public spaces with limited exceptions)
- High risk — Strict requirements for AI in healthcare, education, employment, law enforcement (conformity assessments, documentation, human oversight)
- Limited risk — Transparency obligations (chatbots must disclose they are AI)
- Minimal risk — No specific requirements
Organizations serving EU customers must comply regardless of where they are headquartered.
Industry-Specific Requirements
- Healthcare (HIPAA): AI tools processing protected health information (PHI) must be covered by Business Associate Agreements (BAAs). Standard consumer AI plans do not satisfy this requirement.
- Finance (SOX, PCI-DSS): Financial data processed by AI must meet existing audit and security standards. Custom deployments with logging and access controls are typically required.
- Education (FERPA): Student data shared with AI services must comply with FERPA requirements. Enterprise plans with appropriate DPAs are necessary.
Local AI: The Maximum Privacy Option
For organizations where no data can leave the network, local AI deployment offers complete control.
What Local AI Means
Running an AI model on your own hardware — a laptop, a workstation, or an on-premises server. The model operates entirely on your infrastructure. No data is transmitted to any external service.
Current Local Options
Llama 3.3 (Meta) — The most capable open-weight model family. The 70B parameter version runs on consumer hardware with quantization (a process that reduces the model’s precision to fit in less memory). Requires a GPU with 32GB+ VRAM for full performance.
Mistral Large — Competitive European alternative. Available for local deployment through various frameworks.
Ollama — A tool that simplifies running models locally. Install, pull a model, and start chatting in minutes. Supports Llama, Mistral, and dozens of other models.
LM Studio — Desktop application for running local models with a graphical interface. No command-line knowledge needed.
Trade-Offs
| Factor | Cloud AI (ChatGPT, Claude) | Local AI (Llama, Mistral) |
|---|---|---|
| Output quality | Frontier-level | Good but not frontier |
| Privacy | Depends on plan and settings | Complete — no data leaves device |
| Setup effort | Zero | Moderate (install tools, download models) |
| Cost | $20+/month | Free (software), hardware cost |
| Speed | Fast (server GPUs) | Depends on your hardware |
| Context window | Up to 1M tokens | Typically 8K–128K tokens |
| Multimodal | Full (text, image, code, voice) | Limited (primarily text and code) |
When to Use Local AI
- Processing data that cannot leave your network under any circumstances (classified, HIPAA, trade secrets)
- Development and testing where you need AI assistance but cannot risk code exposure
- Organizations in regulated industries where cloud AI compliance is too complex or expensive
- Privacy-conscious individuals who want zero data sharing on principle
Building an AI Acceptable Use Policy
Every organization using AI tools needs a written policy. Without one, employees will make their own decisions — and those decisions will not always align with your security requirements.
Core Policy Elements
1. Approved tools. List specific AI tools that are approved for work use, along with the plan type (e.g., “ChatGPT Enterprise — approved; ChatGPT Free — prohibited for work data”).
2. Data classification rules. Define what types of data can and cannot be shared with AI services:
- Public data: May be shared with any approved AI tool
- Internal data: May be shared with approved AI tools on enterprise plans only
- Confidential data: May only be processed by locally deployed AI or approved enterprise tools with DPAs
- Restricted data: Must not be shared with any AI service (credentials, PHI without BAA, classified information)
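The classification rules above lend themselves to machine enforcement. A minimal sketch, assuming a hypothetical `POLICY` mapping from the four data classes to permitted tool tiers (a real DLP gateway would encode the same table):

```python
# Hypothetical mapping from the policy's data classes to permitted tool tiers.
POLICY = {
    "public":       {"consumer", "enterprise", "local"},
    "internal":     {"enterprise", "local"},
    "confidential": {"enterprise", "local"},  # enterprise only with a DPA in place
    "restricted":   set(),                    # no AI service, period
}

def can_share(data_class: str, tool_tier: str) -> bool:
    """Check whether a data classification may be sent to a given tool tier."""
    return tool_tier in POLICY.get(data_class, set())

allowed = can_share("internal", "enterprise")  # permitted under this policy
blocked = can_share("restricted", "local")     # never permitted
```

Unknown classifications default to “not shareable,” which is the safe failure mode for a policy check.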
3. Required safeguards. Mandate specific behaviors:
- Anonymize customer data before sharing
- Never paste credentials or API keys
- Use enterprise plan settings, not personal accounts
- Review AI outputs before external use
4. Reporting procedures. Define what to do if sensitive data is accidentally shared with an AI service:
- Who to notify (security team, compliance officer)
- What information to document (what was shared, which service, when)
- Provider data deletion request procedures
5. Review cadence. AI tools and policies change rapidly. Commit to reviewing the policy quarterly and updating when providers change their terms.
What Changed in 2026
Opt-out became opt-in for more services. GitHub’s March 2026 announcement that Copilot will train on user data by default (unless opted out) reflects a broader industry trend. As AI companies need more training data, default settings increasingly favor data collection. User vigilance is more important than ever.
Zero Trust for AI became a formal framework. Microsoft’s March 2026 release of its Zero Trust for AI reference architecture provides enterprises with a structured approach to securing AI usage. A Zero Trust Assessment for AI pillar is expected in summer 2026.
State-level regulation accelerated. With no federal AI privacy law, US states are acting independently. Texas, California, Illinois, and Colorado all have AI-specific statutes taking effect in the first half of 2026. Compliance requires awareness of which states’ laws apply to your operations.
Enterprise AI governance matured. 68% of privacy professionals have now acquired AI governance responsibilities. AI is no longer just an IT concern — it is a compliance, legal, and risk management issue integrated across business functions.
Context windows created new risk surface. With Claude and Gemini supporting 1M-token context windows, users can now upload entire codebases, complete legal document sets, or years of email history in a single session. This dramatically increases both the utility and the data exposure risk of AI interactions.
Common Mistakes in AI Security
Assuming “don’t use AI” is a workable policy. It is not. Employees will use AI tools regardless of prohibitions. The result is unmonitored, ungoverned AI usage with zero security controls. A better approach is providing approved tools with clear guidelines.
Using personal accounts for work. An employee using their personal ChatGPT Free account for work tasks means your company’s data is being processed under consumer terms with potential training data usage. Provide and mandate team or enterprise accounts.
Trusting opt-out settings without verifying. Opt-out toggles can reset after updates, and the specific data covered by opt-out varies. Periodically verify that your settings are still active and understand exactly what they cover.
Ignoring API key security. Developers regularly paste API keys into AI prompts for debugging help. A single leaked production API key can compromise entire systems. Train developers to use placeholder values and never share real credentials.
Overlooking file uploads. Uploading a document to an AI service shares its entire contents. A PDF that contains a single paragraph of proprietary data alongside public information still exposes that proprietary data. Review documents before uploading.
Failing to update policies when providers change terms. AI service terms change frequently. GitHub’s March 2026 policy change is one example. Assign someone to monitor provider policy updates and adjust your internal policies accordingly.
FAQ
Does ChatGPT use my data for training?
On consumer plans (Free, Plus, Pro), yes, by default. You can opt out via Settings > Data Controls > “Improve the model for everyone,” but this also disables conversation history. On Team and Enterprise plans, your data is not used for training, backed by contractual DPAs. API usage is not used for training by default, regardless of plan.
Is Claude more private than ChatGPT?
Anthropic has historically been more conservative with data usage. Claude’s policies are generally more restrictive about training data collection. However, the safest approach with any cloud AI service is to use enterprise-tier plans with contractual protections rather than relying on published privacy policies alone, as policies can change.
Can AI tools leak my source code?
There is no confirmed case of a major provider directly exposing one user’s code to another user. However, if your code is used as training data, patterns and structures from it could influence future model outputs. The practical risk is low but not zero. Use enterprise plans or local models for highly sensitive code.
What should our company’s AI policy include?
At minimum: a list of approved AI tools and plan types, data classification rules defining what can be shared with each tool tier, mandatory anonymization requirements for customer data, a prohibition on sharing credentials, a reporting procedure for accidental data exposure, and a quarterly review schedule.
Is running AI locally really more secure?
For data privacy, yes — unequivocally. Data processed by a local model never leaves your machine. However, local models introduce their own security considerations: you need to keep models and tools updated, secure the hardware itself, and ensure the model weights were downloaded from legitimate sources.
How do I know if my AI usage complies with GDPR?
If you process EU personal data with AI, you need: a lawful basis for processing (typically legitimate interest or consent), a DPA with your AI provider, a record of processing activities that includes AI usage, the ability to respond to data subject access requests (including AI interactions), and compliance with the EU AI Act’s transparency requirements. Consult legal counsel for your specific situation.
What is the biggest AI security risk for businesses in 2026?
Unmonitored employee use of consumer AI tools with sensitive company data. This is not a hypothetical — surveys indicate the majority of knowledge workers use AI tools that their IT departments have not approved. Providing sanctioned alternatives with clear policies is far more effective than attempting to block AI usage entirely.
Sources
- Microsoft Security Blog — Zero Trust for AI (March 2026): https://www.microsoft.com/en-us/security/blog/2026/03/19/new-tools-and-guidance-announcing-zero-trust-for-ai/
- Anthropic Privacy Policy: https://www.anthropic.com/privacy
- OpenAI Enterprise Privacy: https://openai.com/enterprise-privacy
- Google Cloud DPA: https://cloud.google.com/terms/data-processing-addendum
- EU AI Act: https://artificialintelligenceact.eu
- Hyperproof Data Protection Strategies 2026: https://hyperproof.io/resource/data-protection-strategies-for-2026/
- GitHub Copilot Privacy Policy Update (March 2026): https://github.blog
Related Articles
- AI Tools Privacy Security Guide
- AI Safety Debate
- Open Source vs Closed AI
- Best Local AI Models
- Best AI for Local LLM On Device 2026
- Run Llama Locally
- Best AI for Cybersecurity
- Best AI for Compliance
- Best AI for Fraud Detection
- Best AI for Threat Detection
- Best AI for Penetration Testing
- Best AI for Contract Review
- Best AI for Legal Research
- AI Costs Explained
- AI API Pricing Comparison
- AI for Business
- AI Tools for Small Business
- Best AI for Document Management
- Best AI for Video Surveillance
- Best AI for Background Checks
- Llama vs Mistral
- Complete Guide to AI Models