AI code assistants are now standard development tools. GitHub Copilot has over 1.3 million paid users. Cursor, Codeium, and Tabnine each have substantial adoption in engineering teams. Most engineering teams deployed these tools before any governance framework existed for them.
Three risks get underestimated in that adoption pattern: what happens to your source code when it's sent to the AI vendor, whether AI-generated code creates IP or licensing exposure, and how code assistants affect compliance for regulated systems. This guide covers each.
## Risk 1: Source Code Exposure to AI Vendors
AI code assistants work by sending your code as context to an AI model. The model uses that context — your code, your variable names, your architecture patterns, your comments — to generate suggestions. What happens to that code after inference depends on the vendor's data retention and training policies.
- **GitHub Copilot (Individual plan):** Code snippets may be used to improve GitHub's products by default. Telemetry settings can reduce this but not eliminate retention entirely.
- **GitHub Copilot (Business and Enterprise plans):** GitHub commits to not training models on your code. This is in the subscription agreement. Copilot Business also offers a Data Excluded feature for organizations that want to exclude specific repositories from Copilot entirely.
- **Cursor:** Code context is sent to AI providers (OpenAI/Anthropic) for inference. Cursor has a Privacy Mode setting that prevents code from being used to train models. Enterprise plans offer stronger isolation.
- **Codeium:** Offers an enterprise self-hosted option that keeps code entirely on your infrastructure — no data sent to Codeium servers.
- **Tabnine:** Offers both SaaS and self-hosted/on-premise options. Enterprise plans use models that run locally or in your cloud.
**Governance implication:** For any proprietary codebase, personal/free tier plans of AI code assistants are insufficient. The data handling commitments in free tiers are not enterprise-grade. Require Business or Enterprise tier — and document that requirement in your AI acceptable use policy.
## Risk 2: IP and Licensing in AI-Generated Code
AI code assistants are trained on public code repositories. Some of that code is under licenses (GPL, LGPL, MPL) that impose conditions on derivative works. If Copilot reproduces GPL-licensed code verbatim in a suggestion, and a developer accepts that suggestion into a commercial codebase, the codebase may be exposed to GPL obligations.
This is a real — not theoretical — risk. GitHub's own research found that Copilot reproduced identifiable strings from public repositories in a measurable percentage of suggestions.
What the risk looks like in practice:
A developer is implementing a specific algorithm. Copilot suggests an implementation. The implementation happens to be identical to a GPL-licensed implementation on GitHub. The developer accepts the suggestion without recognizing the source. The code ships in a commercial product. A license audit surfaces the GPL code.
Controls that reduce this risk:
| Control | How to implement | Risk reduction |
|---|---|---|
| Enable public code duplication detection | GitHub Copilot org settings → "Suggestions matching public code: Block" | Blocks verbatim matches and near-matches |
| Developer awareness training | Train engineers to recognize when Copilot suggestions look "too complete" for novel code | Reduces acceptance of wholesale function copies |
| Code review checklist item | Add "AI-generated code reviewed for licensing risk" to PR template for commercially sensitive components | Creates audit trail |
| Legal counsel review for high-risk areas | For algorithms with competitive value, get legal review of any AI-assisted implementation | Mitigates for highest-value code |
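The code review checklist control above can be implemented as a pull request template fragment. The section name and wording below are illustrative, not a standard:

```markdown
<!-- .github/pull_request_template.md (excerpt) — illustrative wording -->
## AI-assisted code
- [ ] This PR contains AI-generated or AI-assisted code
- [ ] AI-generated code reviewed for licensing risk (verbatim public-code matches)
- [ ] Legal review requested (commercially sensitive algorithm or data structure)
```

Checked boxes in the merged PR give auditors a per-change record that the review happened.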
The legal landscape around AI-generated code copyright is still developing. GitHub's Copilot indemnification (available on Enterprise plans) offers some protection against IP claims related to Copilot suggestions. Review the current scope of that indemnification with your counsel.
## Risk 3: Regulated System Compliance
When AI code assistants are used in codebases that handle regulated data or systems, compliance obligations apply.
SOC 2: If the codebase touches systems in SOC 2 scope, the AI code assistant is a vendor with access to information about your architecture, data handling, and internal systems. It belongs in your vendor register. The data retention and training policies of the AI vendor affect your supply chain risk assessment.
HIPAA: If any developer is using an AI code assistant while working in a healthcare codebase, and that codebase contains PHI schemas, test data, or connection strings to systems that process PHI, those artifacts may be sent to the AI vendor. Unless that vendor has signed a HIPAA Business Associate Agreement (BAA) covering the code assistant service, this is a compliance gap. GitHub offers a BAA for Copilot Enterprise. Most AI code assistant vendors do not.
PCI DSS: Similar concern. Cardholder data environment (CDE) code should not be sent to AI vendor inference endpoints unless the vendor is in scope for your PCI assessment.
SOC 2 + AI: See the guide on AI tools in SOC 2 programs for the full evidence map.
## Settings to Configure Before Deploying at Scale
### GitHub Copilot (Business/Enterprise)
Navigate to your GitHub organization settings → Copilot:
| Setting | Recommended configuration |
|---|---|
| Suggestions matching public code | Block (not Allow) |
| Allow GitHub to use my code for training | Disabled |
| Copilot in the CLI | Limit to approved users if CLI access to production is controlled |
| Copilot Data Excluded (Enterprise only) | Add repositories containing secrets, regulated data, or highly proprietary code |
| GitHub Copilot Chat in IDE | Enable only for approved IDEs on managed devices |
### Cursor
| Setting | Recommended configuration |
|---|---|
| Privacy Mode | Enable |
| .cursorignore file | Add files containing credentials, secrets, regulated data patterns |
| Model selection | Use models with clearer data handling commitments for sensitive repos |
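The `.cursorignore` control above uses gitignore-style patterns. A minimal sketch — the specific paths are assumptions about a typical repository layout, not a canonical list:

```
# .cursorignore — keep credential and regulated-data files out of AI context.
# Paths below are illustrative; adjust to your repository layout.
.env
.env.*
*.pem
*.key
secrets/
config/credentials*
# Regulated-data fixtures (e.g. PHI-shaped test data)
test/fixtures/phi/
```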
### General (all tools)
- Block code assistants from IDE access to `.env`, credential files, and secrets management directories via `.gitignore` or `.cursorignore` patterns
- Use managed device policies to prevent personal free-tier plan usage on work devices
- Require Business/Enterprise tier accounts provisioned through the organization
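Settings drift after initial rollout, so it helps to verify them periodically. The sketch below polls GitHub's documented `GET /orgs/{org}/copilot/billing` endpoint, which reports the public-code-suggestions setting; the field names should be checked against the current GitHub REST API docs, and the findings logic is an assumption about what your policy requires:

```python
# Sketch: audit GitHub Copilot org-level settings against the
# recommended configuration above. Field names assume the documented
# GET /orgs/{org}/copilot/billing response; verify against current docs.
import json
import urllib.request

def audit_copilot_settings(billing: dict) -> list[str]:
    """Return findings for settings that diverge from the recommended state."""
    findings = []
    # "Suggestions matching public code" should be Block, not Allow.
    if billing.get("public_code_suggestions") != "block":
        findings.append(
            f"public_code_suggestions is "
            f"{billing.get('public_code_suggestions')!r}; expected 'block'"
        )
    # Seats should be provisioned through the organization.
    if billing.get("seat_management_setting") == "disabled":
        findings.append("Copilot seat management is disabled for this org")
    return findings

def fetch_billing(org: str, token: str) -> dict:
    """Fetch the Copilot billing/settings object for an organization."""
    req = urllib.request.Request(
        f"https://api.github.com/orgs/{org}/copilot/billing",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Canned response in place of a live API call, for illustration:
    sample = {"public_code_suggestions": "allow",
              "seat_management_setting": "assign_selected"}
    for finding in audit_copilot_settings(sample):
        print(finding)
```

Run on a schedule (e.g. a weekly CI job), the findings list doubles as compliance evidence that the settings were monitored.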
## Acceptable Use Policy: AI Code Assistants
An engineering-specific AI acceptable use policy should address:
### Approved tools and tiers
- List approved tools and required subscription tier (personal free plans are not approved for work use)
- Require use of organization-managed accounts, not personal accounts
### Prohibited inputs
- Credentials, API keys, secrets — never include in AI context
- Plaintext PII, PHI, cardholder data — prohibited as context
- Proprietary algorithms under active patent review — restrict AI assistance
- Code from third-party systems under NDA — don't paste into AI context
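The prohibited-inputs rules can be partially enforced in tooling rather than left to policy alone. Below is a minimal sketch of a pre-send secret check; the patterns and the `scan_for_secrets` helper are illustrative assumptions, and production use should rely on a dedicated secrets scanner (e.g. a secrets-scanning pre-commit hook):

```python
# Sketch: flag common credential shapes before text reaches an AI
# assistant's context window. Patterns are illustrative; real scanners
# ship hundreds of rules.
import re

SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    "Private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "Generic assignment": re.compile(
        r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}

def scan_for_secrets(text: str) -> list[str]:
    """Return the names of any secret patterns found in the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

snippet = 'db_password = "hunter2hunter2"'
hits = scan_for_secrets(snippet)
if hits:
    print(f"Blocked: matched {', '.join(hits)}")  # → Blocked: matched Generic assignment
```

A check like this cannot catch everything (it will miss novel formats and flag some false positives), which is why the policy prohibition and incident reporting path below it still matter.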
### Code review requirements
- All AI-generated code undergoes standard code review (same as human-authored)
- PR template includes AI-assisted code disclosure (optional, but increasingly common for regulated industries)
- For commercially sensitive components: legal review flag when AI-generated code covers a novel algorithm or data structure
### Incident reporting
- If a developer suspects they may have exposed a secret or regulated data to an AI code assistant: treat as a potential data incident, report to security lead immediately
## Governance Checklist: AI Code Assistants
- Approved code assistant tools and tiers listed in the AI acceptable use policy
- Personal free-tier plans prohibited or limited to non-work repositories
- Organization-managed accounts provisioned for all approved code assistants
- Public code duplication detection enabled in GitHub Copilot org settings
- `.cursorignore` or equivalent patterns configured for credential and sensitive data files
- AI code assistants added to the vendor register
- DPA or data handling review completed for each approved tool
- BAA obtained if any developer uses code assistants in healthcare/PHI codebases (Copilot Enterprise only currently)
- Developer awareness training covers AI code assistant IP and data risks
- PR template updated to include AI-generated code disclosure if required by compliance framework
Tracking all your AI tools in one place? The AI Tool Register Template includes a row for each code assistant with columns for data classification permitted, training opt-out status, and DPA status. For a full vendor security review, the AI Vendor Due Diligence Checklist walks through the 30 questions to ask any AI vendor — including the ones specific to code assistants.
