Stanford CS146S Deep Dive (4): Secure Vibe Coding — AI Code Security Guide

Deep dive into Stanford CS146S Weeks 6-7: real-world Prompt Injection to RCE exploit, OWASP Top 10 in the Agent era, AI code review methodology, and how to build secure Vibe Coding practices.

Bruce

AI Security, Vibe Coding, Stanford CS146S, Prompt Injection, Code Review

AI Guides

2418 Words

2026-02-24


This is Part 4 of the “Stanford Vibe Coding Course Deep Dive” series. See the series navigation at the end of this article.

Weeks 6 and 7 of CS146S are the most spine-chilling weeks of the entire course.

Week 6 covers security: when AI writes your code, who ensures it’s not vulnerable to attacks? Even scarier — what happens when the AI itself becomes the attack surface?

Week 7 covers review: how much can we actually trust AI-generated code?

Many AI coding courses only teach you how to code faster. CS146S raises the bar for what’s shippable: testable, auditable, and defensible. These two weeks are the essential path from “Vibe Coder” to “Professional Vibe Coder.”

Real-World Case: Prompt Injection Leading to Remote Code Execution

Let’s start with a real security vulnerability.

In 2025, security researchers discovered a critical vulnerability in GitHub Copilot (CVE-2025-53773): an attacker could use Prompt Injection to make Copilot execute arbitrary commands on your machine.

The Attack Chain

  1. Planting malicious instructions: Attackers embed hidden instructions in source code files, web pages, or GitHub Issues. These instructions are invisible or inconspicuous to humans, but the AI reads and executes them.

  2. Manipulating configuration files: After reading this content, Copilot Agent Mode is manipulated into modifying VS Code’s configuration file .vscode/settings.json. Specifically, it adds:

    {
      "chat.tools.autoApprove": true
    }
    
  3. Activating YOLO mode: This configuration disables all user confirmation prompts. From this point on, Copilot can perform any action — including running terminal commands — without requiring your approval.

  4. Executing arbitrary commands: Subsequent attacker instructions are delivered through the same Prompt Injection vector, and Copilot executes terminal commands unsupervised — downloading malware, stealing credentials, or enrolling in botnets.

The most unsettling detail: the configuration file changes are written to disk immediately, not presented as a diff for your review. By the time you notice the change, it’s already too late.
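One practical takeaway is that the dangerous state change lands in an ordinary JSON file you can inspect. As a minimal sketch (the detector itself is my own, illustrative code; only the `chat.tools.autoApprove` key comes from the vulnerability writeup above), a repository or endpoint check could flag a workspace whose settings silently enable auto-approval:

```python
import json
from pathlib import Path

# Illustrative heuristic, not official tooling: flag workspace settings
# that silently enable auto-approval of agent tool calls.
DANGEROUS_KEYS = {"chat.tools.autoApprove"}

def find_risky_settings(workspace: str) -> list[str]:
    """Return any dangerous keys set to true in .vscode/settings.json."""
    settings_path = Path(workspace) / ".vscode" / "settings.json"
    if not settings_path.exists():
        return []
    try:
        settings = json.loads(settings_path.read_text())
    except json.JSONDecodeError:
        return []  # malformed settings are a separate problem
    return [k for k in DANGEROUS_KEYS if settings.get(k) is True]
```

Running a check like this in CI or a pre-commit hook at least turns the silent write into a visible alarm.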

Impact Scope

  • Cross-platform: Windows, macOS, and Linux all affected
  • Potential consequences: ransomware, information theft, botnet recruitment
  • Multiple attack vectors: can be planted via code repositories, web content, Issue comments, and more

Microsoft patched this vulnerability in August 2025, but it revealed a fundamental problem: AI coding tools are inherently an attack surface. They read external input (code, documentation, web pages) and have the ability to modify files and execute commands. This combination of capabilities creates the perfect conditions for Prompt Injection attacks.

A Warning for All AI Coding Tools

This isn’t just a Copilot problem. Any AI coding tool with the following characteristics faces similar risks:

  • Can read external content (codebases, web pages, documentation)
  • Can modify files
  • Can execute shell commands
  • Has an “auto-approve” mode

Claude Code mitigates this risk through its permission model — high-risk operations require explicit authorization, and operations are displayed for user confirmation. But this also means: as a user, you can’t blindly approve every AI operation. Before each confirmation, you need to understand what it’s about to do.

OWASP Top 10: New Threats in the Agent Era

The OWASP Top 10 is the classic framework for web application security. But in the AI Agent era, these traditional threats have taken on new forms.

The Evolution of Injection Attacks

Traditional injection attacks like SQL injection and XSS still exist, but now there’s a new member: Prompt Injection.

| Injection Type | Traditional Form | Agent Era Form |
| --- | --- | --- |
| SQL Injection | User input directly concatenated into SQL | AI-generated SQL may contain injection vulnerabilities |
| XSS | User input rendered to page without escaping | AI-generated frontend code may miss escaping |
| Command Injection | User input passed to shell commands | AI Agent manipulated to execute malicious commands |
| Prompt Injection | N/A | External content manipulates AI behavior |

Prompt Injection is particularly dangerous because it doesn’t attack your application — it attacks the AI that builds your application. Once the AI is compromised, all the code it generates could contain backdoors.
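A cheap first filter is to screen untrusted external content for injection-like phrasing before it ever reaches an agent. The sketch below is purely heuristic (the patterns and function name are my own, and pattern matching alone cannot defeat prompt injection), but it illustrates the shape of a pre-ingestion check:

```python
import re

# Hypothetical phrase list; real attacks vary these endlessly, so treat a
# match as a signal for quarantine/human review, never as a guarantee.
INJECTION_PATTERNS = [
    r"ignore (all |any |previous )+instructions",
    r"you are now",
    r"do not (show|tell|reveal) (this|the user)",
    r"autoApprove",
]

def flag_suspicious(text: str) -> list[str]:
    """Return the injection-like patterns found in untrusted text."""
    return [p for p in INJECTION_PATTERNS
            if re.search(p, text, re.IGNORECASE)]
```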

Systemic Security Blind Spots in AI-Generated Code

Research from Palo Alto Networks Unit42 reveals several systemic weaknesses of AI Agents regarding security:

  1. Identity spoofing and impersonation: Attackers can masquerade as legitimate services and interact with your AI Agent through protocols like MCP.
  2. Over-trusting external data: AI Agents tend to trust all content they read, including potentially tampered documentation and configurations.
  3. Blurred permission boundaries: When an AI Agent is connected to multiple MCP Servers, a compromised Server could affect the entire system.

AI’s Ability and Limitations in Finding Security Vulnerabilities

Let’s flip the perspective — AI can not only introduce security vulnerabilities but also discover them.

In the Week 6 reading materials, Semgrep’s research provides the most systematic evaluation to date.

Experiment Design

  • Subjects: 11 large, actively maintained open-source Python projects (spanning the Django, Flask, and FastAPI ecosystems)
  • Code volume: 8+ million lines of code total
  • Tools: Claude Code and OpenAI Codex
  • Target vulnerabilities: authentication bypass, IDOR, path traversal, SQL injection, SSRF, XSS

Key Findings

Claude Code reported 329 findings, of which 46 were real vulnerabilities — 14% true positive rate, 86% false positive rate.

OpenAI Codex reported 116 findings, of which 21 were real vulnerabilities — 18% true positive rate, 82% false positive rate.

Combined, they discovered approximately 20 high-severity vulnerabilities.

More granular data reveals each tool’s strengths and weaknesses:

| Vulnerability Type | Claude Code True Positive Rate | Codex True Positive Rate |
| --- | --- | --- |
| IDOR | 22% | 0% |
| Path Traversal | 10% | 47% |
| Auth Bypass | 14% | 18% |
| SQL Injection | 5% | N/A |
| XSS | 16% | 0% |

The Scariest Finding: Non-Determinism

Same code, same AI, same prompt — three runs produced 3, 6, and 11 completely different findings.

This stems from AI’s “context decay” — during the analysis of large codebases, the AI gradually loses earlier context details. A vulnerability noticed during the first run might be overlooked in the second run due to context compression.

The practical implication is huge: don’t run a one-time AI security scan and consider yourself safe. The only reliable strategy is multiple runs, cross-validation between tools, and pairing AI scans with traditional static analysis.
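The multiple-runs idea can be sketched as a small harness. Here `scan` is a hypothetical callable standing in for one AI scan invocation; findings that recur across runs are more trustworthy than one-offs:

```python
from collections import Counter
from typing import Callable, Iterable

def cross_validate(scan: Callable[[], Iterable[str]], runs: int = 3,
                   min_votes: int = 2) -> dict:
    """Run a non-deterministic scanner several times and split findings
    into those that recur (stable) and those seen only once (one-off)."""
    votes: Counter = Counter()
    for _ in range(runs):
        votes.update(set(scan()))  # dedupe within a single run
    return {
        "stable": sorted(f for f, v in votes.items() if v >= min_votes),
        "one_off": sorted(f for f, v in votes.items() if v < min_votes),
    }
```

One-off findings aren't necessarily false positives, but they are the ones most worth re-checking with a traditional static analyzer.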

Conclusion

The current state of AI security scanning is best described as a junior security researcher with good intuition but questionable reliability: it can catch issues humans might overlook, but the false positive rate is high and results are unstable. The correct approach is to use it as one link in the security toolchain, not the only one (for Claude Code’s built-in security scanning capabilities, see Claude Code Security Deep Dive).

Context Rot and Its Hidden Connection to Security

Week 6 also references Chroma team’s research on Context Rot. This research may seem unrelated to security, but it reveals a critical security implication.

Context Rot refers to the continuous degradation of model performance as input length increases. In the security domain, this means:

  1. Security rules get “forgotten” in long conversations: You emphasize “don’t use eval()” at the start of a conversation, but after 50 turns, the AI might use eval() somewhere — because the early security constraints were deprioritized during context compression.

  2. Vulnerabilities in complex codebases are harder to detect: When the AI needs to analyze large amounts of code, its “attention” to each file gets diluted. Security issues hidden in edge cases are more likely to be overlooked.

  3. AI’s security awareness is not constant: The same model might correctly refuse an unsafe operation in a short context, but allow it in a long context due to “attention decay.”

Countermeasure: Security-related constraints should be placed in the most prominent position in context (such as the beginning of CLAUDE.md), and periodically restated in new sessions. Don’t expect the AI to remember the security requirements from turn 1 after 100 turns of conversation.

AI Code Review: Where Is the Trust Boundary?

Week 7 shifts to another critical question: how should we review AI-generated code?

Traditional Code Review has mature methodologies, but AI code has its own unique “smell” that requires different review strategies.

Characteristics of AI-Generated Code

Drawing on Graphite’s presentation, which distills experience from millions of AI Code Reviews, several typical characteristics of AI-generated code stand out:

  1. Superficially correct, deeply flawed: AI excels at generating syntactically correct, seemingly reasonable code, but may have hidden issues in boundary conditions, error handling, and concurrency safety.

  2. Over-engineered: AI tends to add unnecessary abstraction layers, redundant error handling, and excessive type annotations. The code looks “professional” but actually increases complexity.

  3. Traces of pattern copying: AI “memorizes” certain patterns from training data, even when those patterns don’t apply to the current scenario. For example, using enterprise-grade architecture patterns in a simple utility script.

  4. Inconsistent security handling: AI might implement perfect input validation in some places but completely ignore it in other similar places. This inconsistency is more dangerous than no validation at all — it creates a false sense of security.

  5. Hallucinated APIs and libraries: AI might call non-existent functions or use deprecated APIs. These might be caught at compile time, but in dynamic languages they may not surface until runtime.

Seven-Step AI Code Review Method

Based on CS146S materials and GitHub engineers’ Code Review philosophy, here’s a review method tailored for AI-generated code:

1. Intent Verification

Ask: Is this code doing what I actually wanted?

AI might perfectly implement a feature you didn’t ask for, or apply an unexpected “creative interpretation” of your requirements. Confirm the direction is right before examining details.

2. Security Scan

Check: Are there common security issues?

Focus on:

  • Is user input validated and escaped?
  • Do SQL queries use parameterized statements?
  • Do file operations have path traversal risks?
  • Are HTTP requests vulnerable to SSRF?
  • Is authentication and authorization logic complete?
  • Is sensitive information leaking into logs or responses?
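Two of these checklist items made concrete in Python (the helper names are illustrative, not from the course materials): a parameterized query that neutralizes SQL injection, and a path check that blocks directory traversal.

```python
import sqlite3
from pathlib import Path

def get_user(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver escapes `username`, so input like
    # "x' OR '1'='1" cannot alter the query structure.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchone()

def safe_read(base_dir: str, relative: str) -> str:
    # Resolve the final path and verify it stays inside base_dir,
    # blocking traversal like "../../etc/passwd".
    base = Path(base_dir).resolve()
    target = (base / relative).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {relative}")
    return target.read_text()
```

These are exactly the spots where AI-generated code is inconsistent: it may parameterize one query and concatenate the next, which is why each occurrence needs checking.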

3. Boundary Conditions

Test: What happens in extreme cases?

AI tends to handle the happy path, often falling short on:

  • Empty values, null, undefined
  • Extremely large or small inputs
  • Concurrent access
  • Network timeouts, service unavailability
  • Full disk, out of memory
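To make the checklist concrete, here is a hypothetical `parse_limit` pagination helper (my own invented example, not from the course) written to survive the boundary cases above rather than just the happy path:

```python
def parse_limit(raw, default=20, maximum=100):
    """Parse a user-supplied page-size value defensively:
    empty/None/garbage falls back to the default, and the result
    is clamped into [1, maximum] to bound extreme inputs."""
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(1, min(value, maximum))
```

An AI-generated version of such a helper often handles only `int(raw)`; the review step is asking "what about `None`, `''`, `'abc'`, `-5`, and `10**9`?" and checking each.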

4. Dependency Audit

Verify: Are the introduced dependencies reliable?

AI might recommend unreliable third-party packages — few stars, long unmaintained, with known vulnerabilities, or even non-existent (hallucinated package names). Every new dependency should be manually verified.
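A minimal first-pass check for hallucinated names is to verify each dependency actually resolves to an installed distribution. This sketch (function name mine) only catches names that don't exist in the current environment; a real audit would also query the package registry and a vulnerability database, e.g. with pip-audit:

```python
from importlib import metadata

def unknown_packages(names: list[str]) -> list[str]:
    """Return the requirement names that don't resolve to any
    installed distribution (candidates for hallucinated packages)."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```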

5. Performance Assessment

Evaluate: How does it perform under real-world load?

AI might write code that works fine on small datasets but crashes at production scale. Pay special attention to:

  • N+1 queries
  • Potential infinite loops
  • Memory leaks (especially in long-running services)
  • Unreasonable full table scans
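The N+1 pattern in miniature: the `fetch_*` functions below are stand-ins for ORM or database calls (the data and names are invented for illustration), with a query log making the cost difference visible.

```python
QUERY_LOG: list[str] = []

AUTHORS = {1: "alice", 2: "bob"}
POSTS = [{"id": i, "author_id": 1 + i % 2} for i in range(6)]

def fetch_author(author_id):
    QUERY_LOG.append(f"SELECT ... WHERE id={author_id}")  # one query each
    return AUTHORS[author_id]

def fetch_authors(author_ids):
    QUERY_LOG.append(f"SELECT ... WHERE id IN {sorted(author_ids)}")
    return {i: AUTHORS[i] for i in author_ids}

def names_n_plus_one():
    # One query per post: fine with 6 posts, disastrous with 6 million.
    return [fetch_author(p["author_id"]) for p in POSTS]

def names_batched():
    # One batched query, then in-memory lookup.
    authors = fetch_authors({p["author_id"] for p in POSTS})
    return [authors[p["author_id"]] for p in POSTS]
```

Both functions return identical results on a small dataset, which is exactly why the N+1 version sails through review unless someone counts the queries.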

6. Consistency Check

Compare: Is it consistent with existing code style?

AI-generated code might be inconsistent with the project’s established patterns in naming, error handling, log formatting, etc. Mixing multiple styles in one project significantly reduces maintainability.

7. Maintainability Assessment

Think: Will you understand this in three months?

AI tends to generate “disposable” code — it works but doesn’t consider future maintenance. Check:

  • Do critical logic sections have comments?
  • Are function responsibilities single-purpose?
  • Is the code structure easy to modify?
  • Do tests cover core logic?

Automated Assistance

Pure manual review is too inefficient. The following tools can help:

| Tool Type | Representative Products | Purpose |
| --- | --- | --- |
| Static Analysis | Semgrep, ESLint, Pylint | Auto-detect code standards and security issues |
| Type Checking | TypeScript, mypy | Catch type errors at compile time |
| Security Scanning | Snyk, Dependabot | Dependency vulnerability detection |
| AI Review | Graphite, CodeRabbit | Use AI to review AI code (fighting fire with fire) |
| Test Coverage | Coverage.py, Istanbul | Ensure critical paths have tests |

Graphite’s CPO Tomas Reimers shared insights from millions of AI Code Reviews in his Week 7 talk — AI Review tools don’t replace human review but serve as a first filter, allowing human reviewers to focus their energy on higher-level concerns.

Building a Secure Vibe Coding Workflow

Synthesizing the content from Weeks 6-7, a secure Vibe Coding workflow should include the following defense layers:

First Line of Defense: Secure Context

Define security rules explicitly in CLAUDE.md or project configuration:

    ## Security Standards
    - All user input must be validated and escaped
    - SQL queries must use parameterized queries, no string concatenation
    - File operations must validate paths to prevent directory traversal
    - API responses must not contain sensitive information (passwords, tokens, keys)
    - New dependencies must be checked for security and maintenance status
    - Do not use eval(), exec(), or similar dynamic code execution

Writing security rules into context makes the AI automatically comply when generating code. This doesn’t guarantee 100% security, but it eliminates most low-level security mistakes.

Second Line of Defense: Automated Checks

Integrate automated security checks into the CI/CD pipeline:

  • Pre-commit hooks: Run linting and basic security checks
  • CI Pipeline: Run full test suite + static analysis + dependency scanning
  • PR Review: Automatically run AI Code Review tools
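As a sketch of what a custom pre-commit gate can look like (a real setup would lean on Semgrep or Bandit; the patterns and function here are my own), a small script can enforce the CLAUDE.md rules mechanically:

```python
import re
from pathlib import Path

# Patterns mirroring the context rules above: no dynamic code execution,
# no string-built SQL. Crude regexes, meant only as a cheap first gate.
FORBIDDEN = [
    (re.compile(r"\beval\s*\("), "dynamic eval()"),
    (re.compile(r"\bexec\s*\("), "dynamic exec()"),
    (re.compile(r"execute\(\s*f?[\"'].*%s.*[\"']\s*%"), "string-built SQL"),
]

def check_file(path: Path) -> list[str]:
    """Return 'path:line: label' entries for every forbidden pattern hit."""
    problems = []
    for lineno, line in enumerate(path.read_text().splitlines(), 1):
        for pattern, label in FORBIDDEN:
            if pattern.search(line):
                problems.append(f"{path}:{lineno}: {label}")
    return problems
```

Wired into a pre-commit hook over staged `.py` files, a non-empty result blocks the commit and points the author at the exact line.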

Third Line of Defense: Human Review

For code changes involving security, human review is non-negotiable. Focus on:

  • Changes to authentication/authorization logic
  • Database schema changes
  • New external service integrations
  • Configuration file changes (especially permission-related)

Fourth Line of Defense: Least Privilege

  • Don’t give AI Agents “root access.” Restrict the files and directories they can access.
  • Don’t use “auto-approve all operations” mode.
  • Regularly audit AI Agent operation logs.
  • Don’t store sensitive credentials in code repositories or locations accessible to AI.

Fifth Line of Defense: Defense in Depth

Assume all the above defenses can be breached. Deploy runtime security measures:

  • WAF (Web Application Firewall)
  • RASP (Runtime Application Self-Protection)
  • Anomaly behavior detection
  • Regular security audits and penetration testing

Balancing Security and Speed

Some might say: don’t all these security checks slow down Vibe Coding?

CS146S’s answer is: insecure speed is false speed.

The damage from a single security vulnerability — data breaches, legal liability, reputational harm — far exceeds the time you saved with Vibe Coding. Moreover, most security measures (context configuration, CI automated checks, AI Review) are zero marginal cost once established — they run automatically without manual intervention each time.

True Professional Vibe Coding isn’t about choosing between speed and security, but building one-time security infrastructure and then moving fast within a secure framework.

This is the most valuable lesson from these two weeks of CS146S: rapid prototyping is just the starting point — testable, auditable, and defensible is the finish line.

Series Navigation

This is Part 4 of the “Stanford Vibe Coding Course Deep Dive” series:

  1. Stanford CS146S Deep Dive (1): How Vibe Coding Became an Academic Discipline
  2. Stanford CS146S Deep Dive (2): Context Engineering (Week 3)
  3. Stanford CS146S Deep Dive (3): Agent Manager (Week 4)
  4. This article: Stanford CS146S Deep Dive (4): Secure Vibe Coding (Weeks 6-7)
  5. Stanford CS146S Deep Dive (5): From Prototype to Production (Weeks 8-9)
