Stanford CS146S Deep Dive (4): Secure Vibe Coding — AI Code Security Guide

Deep dive into Stanford CS146S Weeks 6-7: real-world Prompt Injection to RCE exploit, OWASP Top 10 in the Agent era, AI code review methodology, and how to build secure Vibe Coding practices.

Bruce

AI Security, Vibe Coding, Stanford CS146S, Prompt Injection, Code Review

AI Guides

2418 Words

2026-02-24


This is Part 4 of the “Stanford Vibe Coding Course Deep Dive” series. See the series navigation at the end of this article.

Weeks 6 and 7 of CS146S are the most spine-chilling weeks of the entire course.

Week 6 covers security: when AI writes your code, who ensures it’s not vulnerable to attacks? Even scarier — what happens when the AI itself becomes the attack surface?

Week 7 covers review: how much can we actually trust AI-generated code?

Many AI coding courses only teach you how to code faster. CS146S raises the bar for what’s shippable: testable, auditable, and defensible. These two weeks are the essential path from “Vibe Coder” to “Professional Vibe Coder.”

Real-World Case: Prompt Injection Leading to Remote Code Execution

Let’s start with a real security vulnerability.

In 2025, security researchers discovered a critical vulnerability in GitHub Copilot (CVE-2025-53773): an attacker could use Prompt Injection to make Copilot execute arbitrary commands on your machine.

The Attack Chain

  1. Planting malicious instructions: Attackers embed hidden instructions in source code files, web pages, or GitHub Issues. These instructions are invisible or inconspicuous to humans, but the AI reads and executes them.

  2. Manipulating configuration files: After reading this content, Copilot Agent Mode is manipulated into modifying VS Code’s configuration file .vscode/settings.json. Specifically, it adds:

    {
      "chat.tools.autoApprove": true
    }
    
  3. Activating YOLO mode: This configuration disables all user confirmation prompts. From this point on, Copilot can perform any action — including running terminal commands — without requiring your approval.

  4. Executing arbitrary commands: Subsequent attacker instructions are delivered through the same Prompt Injection vector, and Copilot executes terminal commands unsupervised — downloading malware, stealing credentials, or enrolling in botnets.

The most unsettling detail: the configuration file changes are written to disk immediately, not presented as a diff for your review. By the time you notice the change, it’s already too late.
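One practical takeaway is that the dangerous state change lands in an ordinary JSON file you can inspect. As a minimal sketch (the detector itself is my own, illustrative code; only the `chat.tools.autoApprove` key comes from the vulnerability writeup above), a repository or endpoint check could flag a workspace whose settings silently enable auto-approval:

```python
import json
from pathlib import Path

# Illustrative heuristic, not official tooling: flag workspace settings
# that silently enable auto-approval of agent tool calls.
DANGEROUS_KEYS = {"chat.tools.autoApprove"}

def find_risky_settings(workspace: str) -> list[str]:
    """Return any dangerous keys set to true in .vscode/settings.json."""
    settings_path = Path(workspace) / ".vscode" / "settings.json"
    if not settings_path.exists():
        return []
    try:
        settings = json.loads(settings_path.read_text())
    except json.JSONDecodeError:
        return []  # malformed settings are a separate problem
    return [k for k in DANGEROUS_KEYS if settings.get(k) is True]
```

Running a check like this in CI or a pre-commit hook at least turns the silent write into a visible alarm.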

Impact Scope

  • Cross-platform: Windows, macOS, and Linux all affected
  • Potential consequences: ransomware, information theft, botnet recruitment
  • Multiple attack vectors: can be planted via code repositories, web content, Issue comments, and more

Microsoft patched this vulnerability in August 2025, but it revealed a fundamental problem: AI coding tools are inherently an attack surface. They read external input (code, documentation, web pages) and have the ability to modify files and execute commands. This combination of capabilities creates the perfect conditions for Prompt Injection attacks.

A Warning for All AI Coding Tools

This isn’t just a Copilot problem. Any AI coding tool with the following characteristics faces similar risks:

  • Can read external content (codebases, web pages, documentation)
  • Can modify files
  • Can execute shell commands
  • Has an “auto-approve” mode

Claude Code mitigates this risk through its permission model — high-risk operations require explicit authorization, and operations are displayed for user confirmation. But this also means: as a user, you can’t blindly approve every AI operation. Before each confirmation, you need to understand what it’s about to do.

OWASP Top 10: New Threats in the Agent Era

The OWASP Top 10 is the classic framework for web application security. But in the AI Agent era, these traditional threats have taken on new forms.

The Evolution of Injection Attacks

Traditional injection attacks like SQL injection and XSS still exist, but now there’s a new member: Prompt Injection.

| Injection Type | Traditional Form | Agent Era Form |
| --- | --- | --- |
| SQL Injection | User input directly concatenated into SQL | AI-generated SQL may contain injection vulnerabilities |
| XSS | User input rendered to page without escaping | AI-generated frontend code may miss escaping |
| Command Injection | User input passed to shell commands | AI Agent manipulated to execute malicious commands |
| Prompt Injection | N/A | External content manipulates AI behavior |

Prompt Injection is particularly dangerous because it doesn’t attack your application — it attacks the AI that builds your application. Once the AI is compromised, all the code it generates could contain backdoors.
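A cheap first filter is to screen untrusted external content for injection-like phrasing before it ever reaches an agent. The sketch below is purely heuristic (the patterns and function name are my own, and pattern matching alone cannot defeat prompt injection), but it illustrates the shape of a pre-ingestion check:

```python
import re

# Hypothetical phrase list; real attacks vary these endlessly, so treat a
# match as a signal for quarantine/human review, never as a guarantee.
INJECTION_PATTERNS = [
    r"ignore (all |any |previous )+instructions",
    r"you are now",
    r"do not (show|tell|reveal) (this|the user)",
    r"autoApprove",
]

def flag_suspicious(text: str) -> list[str]:
    """Return the injection-like patterns found in untrusted text."""
    return [p for p in INJECTION_PATTERNS
            if re.search(p, text, re.IGNORECASE)]
```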

Systemic Security Blind Spots in AI-Generated Code

Research from Palo Alto Networks Unit42 reveals several systemic weaknesses of AI Agents regarding security:

  1. Identity spoofing and impersonation: Attackers can masquerade as legitimate services and interact with your AI Agent through protocols like MCP.
  2. Over-trusting external data: AI Agents tend to trust all content they read, including potentially tampered documentation and configurations.
  3. Blurred permission boundaries: When an AI Agent is connected to multiple MCP Servers, a compromised Server could affect the entire system.

AI’s Ability and Limitations in Finding Security Vulnerabilities

Let’s flip the perspective — AI can not only introduce security vulnerabilities but also discover them.

In the Week 6 reading materials, Semgrep’s research provides the most systematic evaluation to date.

Experiment Design

  • Subjects: 11 large, actively maintained open-source Python projects (spanning the Django, Flask, and FastAPI ecosystems)
  • Code volume: 8+ million lines of code total
  • Tools: Claude Code and OpenAI Codex
  • Target vulnerabilities: authentication bypass, IDOR, path traversal, SQL injection, SSRF, XSS

Key Findings

Claude Code reported 329 findings, of which 46 were real vulnerabilities — 14% true positive rate, 86% false positive rate.

OpenAI Codex reported 116 findings, of which 21 were real vulnerabilities — 18% true positive rate, 82% false positive rate.

Combined, they discovered approximately 20 high-severity vulnerabilities.

More granular data reveals each tool’s strengths and weaknesses:

| Vulnerability Type | Claude Code True Positive Rate | Codex True Positive Rate |
| --- | --- | --- |
| IDOR | 22% | 0% |
| Path Traversal | 10% | 47% |
| Auth Bypass | 14% | 18% |
| SQL Injection | 5% | N/A |
| XSS | 16% | 0% |

The Scariest Finding: Non-Determinism

Same code, same AI, same prompt — three runs produced 3, 6, and 11 completely different findings.

This stems from AI’s “context decay” — during the analysis of large codebases, the AI gradually loses earlier context details. A vulnerability noticed during the first run might be overlooked in the second run due to context compression.

The practical implication is huge: don’t run a one-time AI security scan and consider yourself safe. The only reliable strategy is multiple runs, cross-validation between tools, and pairing AI scans with traditional static analysis.
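The multiple-runs idea can be sketched as a small harness. Here `scan` is a hypothetical callable standing in for one AI scan invocation; findings that recur across runs are more trustworthy than one-offs:

```python
from collections import Counter
from typing import Callable, Iterable

def cross_validate(scan: Callable[[], Iterable[str]], runs: int = 3,
                   min_votes: int = 2) -> dict:
    """Run a non-deterministic scanner several times and split findings
    into those that recur (stable) and those seen only once (one-off)."""
    votes: Counter = Counter()
    for _ in range(runs):
        votes.update(set(scan()))  # dedupe within a single run
    return {
        "stable": sorted(f for f, v in votes.items() if v >= min_votes),
        "one_off": sorted(f for f, v in votes.items() if v < min_votes),
    }
```

One-off findings aren't necessarily false positives, but they are the ones most worth re-checking with a traditional static analyzer.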

Conclusion

The current state of AI security scanning is best described as a junior security researcher with good intuition but questionable reliability: it can catch issues humans might overlook, but the false positive rate is high and results are unstable. The correct approach is to use it as one link in the security toolchain, not the only one (for Claude Code’s built-in security scanning capabilities, see Claude Code Security Deep Dive).

Context Rot and Its Hidden Connection to Security

Week 6 also references Chroma team’s research on Context Rot. This research may seem unrelated to security, but it reveals a critical security implication.

Context Rot refers to the continuous degradation of model performance as input length increases. In the security domain, this means:

  1. Security rules get “forgotten” in long conversations: You emphasize “don’t use eval()” at the start of a conversation, but after 50 turns, the AI might use eval() somewhere — because the early security constraints were deprioritized during context compression.

  2. Vulnerabilities in complex codebases are harder to detect: When the AI needs to analyze large amounts of code, its “attention” to each file gets diluted. Security issues hidden in edge cases are more likely to be overlooked.

  3. AI’s security awareness is not constant: The same model might correctly refuse an unsafe operation in a short context, but allow it in a long context due to “attention decay.”

Countermeasure: Security-related constraints should be placed in the most prominent position in context (such as the beginning of CLAUDE.md), and periodically restated in new sessions. Don’t expect the AI to remember the security requirements from turn 1 after 100 turns of conversation.

AI Code Review: Where Is the Trust Boundary?

Week 7 shifts to another critical question: how should we review AI-generated code?

Traditional Code Review has mature methodologies, but AI code has its own unique “smell” that requires different review strategies.

Characteristics of AI-Generated Code

Drawing on Graphite’s presentation, which distills experience from millions of AI Code Reviews, several typical characteristics of AI-generated code stand out:

  1. Superficially correct, deeply flawed: AI excels at generating syntactically correct, seemingly reasonable code, but may have hidden issues in boundary conditions, error handling, and concurrency safety.

  2. Over-engineered: AI tends to add unnecessary abstraction layers, redundant error handling, and excessive type annotations. The code looks “professional” but actually increases complexity.

  3. Traces of pattern copying: AI “memorizes” certain patterns from training data, even when those patterns don’t apply to the current scenario. For example, using enterprise-grade architecture patterns in a simple utility script.

  4. Inconsistent security handling: AI might implement perfect input validation in some places but completely ignore it in other similar places. This inconsistency is more dangerous than no validation at all — it creates a false sense of security.

  5. Hallucinated APIs and libraries: AI might call non-existent functions or use deprecated APIs. These might be caught at compile time, but in dynamic languages they may not surface until runtime.

Seven-Step AI Code Review Method

Based on CS146S materials and GitHub engineers’ Code Review philosophy, here’s a review method tailored for AI-generated code:

1. Intent Verification

Ask: Is this code doing what I actually wanted?

AI might perfectly implement a feature you didn’t ask for, or apply an unexpected “creative interpretation” of your requirements. Confirm the direction is right before examining details.

2. Security Scan

Check: Are there common security issues?

Focus on:

  • Is user input validated and escaped?
  • Do SQL queries use parameterized statements?
  • Do file operations have path traversal risks?
  • Are HTTP requests vulnerable to SSRF?
  • Is authentication and authorization logic complete?
  • Is sensitive information leaking into logs or responses?
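Two of these checklist items made concrete in Python (the helper names are illustrative, not from the course materials): a parameterized query that neutralizes SQL injection, and a path check that blocks directory traversal.

```python
import sqlite3
from pathlib import Path

def get_user(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver escapes `username`, so input like
    # "x' OR '1'='1" cannot alter the query structure.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchone()

def safe_read(base_dir: str, relative: str) -> str:
    # Resolve the final path and verify it stays inside base_dir,
    # blocking traversal like "../../etc/passwd".
    base = Path(base_dir).resolve()
    target = (base / relative).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {relative}")
    return target.read_text()
```

These are exactly the spots where AI-generated code is inconsistent: it may parameterize one query and concatenate the next, which is why each occurrence needs checking.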

3. Boundary Conditions

Test: What happens in extreme cases?

AI tends to handle the happy path, often falling short on:

  • Empty values, null, undefined
  • Extremely large or small inputs
  • Concurrent access
  • Network timeouts, service unavailability
  • Full disk, out of memory
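To make the checklist concrete, here is a hypothetical `parse_limit` pagination helper (my own invented example, not from the course) written to survive the boundary cases above rather than just the happy path:

```python
def parse_limit(raw, default=20, maximum=100):
    """Parse a user-supplied page-size value defensively:
    empty/None/garbage falls back to the default, and the result
    is clamped into [1, maximum] to bound extreme inputs."""
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(1, min(value, maximum))
```

An AI-generated version of such a helper often handles only `int(raw)`; the review step is asking "what about `None`, `''`, `'abc'`, `-5`, and `10**9`?" and checking each.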

4. Dependency Audit

Verify: Are the introduced dependencies reliable?

AI might recommend unreliable third-party packages — few stars, long unmaintained, with known vulnerabilities, or even non-existent (hallucinated package names). Every new dependency should be manually verified.
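A minimal first-pass check for hallucinated names is to verify each dependency actually resolves to an installed distribution. This sketch (function name mine) only catches names that don't exist in the current environment; a real audit would also query the package registry and a vulnerability database, e.g. with pip-audit:

```python
from importlib import metadata

def unknown_packages(names: list[str]) -> list[str]:
    """Return the requirement names that don't resolve to any
    installed distribution (candidates for hallucinated packages)."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```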

5. Performance Assessment

Evaluate: How does it perform under real-world load?

AI might write code that works fine on small datasets but crashes at production scale. Pay special attention to:

  • N+1 queries
  • Potential infinite loops
  • Memory leaks (especially in long-running services)
  • Unreasonable full table scans
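The N+1 pattern in miniature: the `fetch_*` functions below are stand-ins for ORM or database calls (the data and names are invented for illustration), with a query log making the cost difference visible.

```python
QUERY_LOG: list[str] = []

AUTHORS = {1: "alice", 2: "bob"}
POSTS = [{"id": i, "author_id": 1 + i % 2} for i in range(6)]

def fetch_author(author_id):
    QUERY_LOG.append(f"SELECT ... WHERE id={author_id}")  # one query each
    return AUTHORS[author_id]

def fetch_authors(author_ids):
    QUERY_LOG.append(f"SELECT ... WHERE id IN {sorted(author_ids)}")
    return {i: AUTHORS[i] for i in author_ids}

def names_n_plus_one():
    # One query per post: fine with 6 posts, disastrous with 6 million.
    return [fetch_author(p["author_id"]) for p in POSTS]

def names_batched():
    # One batched query, then in-memory lookup.
    authors = fetch_authors({p["author_id"] for p in POSTS})
    return [authors[p["author_id"]] for p in POSTS]
```

Both functions return identical results on a small dataset, which is exactly why the N+1 version sails through review unless someone counts the queries.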

6. Consistency Check

Compare: Is it consistent with existing code style?

AI-generated code might be inconsistent with the project’s established patterns in naming, error handling, log formatting, etc. Mixing multiple styles in one project significantly reduces maintainability.

7. Maintainability Assessment

Think: Will you understand this in three months?

AI tends to generate “disposable” code — it works but doesn’t consider future maintenance. Check:

  • Do critical logic sections have comments?
  • Are function responsibilities single-purpose?
  • Is the code structure easy to modify?
  • Do tests cover core logic?

Automated Assistance

Pure manual review is too inefficient. The following tools can help:

| Tool Type | Representative Products | Purpose |
| --- | --- | --- |
| Static Analysis | Semgrep, ESLint, Pylint | Auto-detect code standards and security issues |
| Type Checking | TypeScript, mypy | Catch type errors at compile time |
| Security Scanning | Snyk, Dependabot | Dependency vulnerability detection |
| AI Review | Graphite, CodeRabbit | Use AI to review AI code (fighting fire with fire) |
| Test Coverage | Coverage.py, Istanbul | Ensure critical paths have tests |

Graphite’s CPO Tomas Reimers shared insights from millions of AI Code Reviews in his Week 7 talk — AI Review tools don’t replace human review but serve as a first filter, allowing human reviewers to focus their energy on higher-level concerns.

Building a Secure Vibe Coding Workflow

Synthesizing the content from Weeks 6-7, a secure Vibe Coding workflow should include the following defense layers:

First Line of Defense: Secure Context

Define security rules explicitly in CLAUDE.md or project configuration:

    ## Security Standards
    - All user input must be validated and escaped
    - SQL queries must use parameterized queries, no string concatenation
    - File operations must validate paths to prevent directory traversal
    - API responses must not contain sensitive information (passwords, tokens, keys)
    - New dependencies must be checked for security and maintenance status
    - Do not use eval(), exec(), or similar dynamic code execution

Writing security rules into context makes the AI automatically comply when generating code. This doesn’t guarantee 100% security, but it eliminates most low-level security mistakes.

Second Line of Defense: Automated Checks

Integrate automated security checks into the CI/CD pipeline:

  • Pre-commit hooks: Run linting and basic security checks
  • CI Pipeline: Run full test suite + static analysis + dependency scanning
  • PR Review: Automatically run AI Code Review tools
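As a sketch of what a custom pre-commit gate can look like (a real setup would lean on Semgrep or Bandit; the patterns and function here are my own), a small script can enforce the CLAUDE.md rules mechanically:

```python
import re
from pathlib import Path

# Patterns mirroring the context rules above: no dynamic code execution,
# no string-built SQL. Crude regexes, meant only as a cheap first gate.
FORBIDDEN = [
    (re.compile(r"\beval\s*\("), "dynamic eval()"),
    (re.compile(r"\bexec\s*\("), "dynamic exec()"),
    (re.compile(r"execute\(\s*f?[\"'].*%s.*[\"']\s*%"), "string-built SQL"),
]

def check_file(path: Path) -> list[str]:
    """Return 'path:line: label' entries for every forbidden pattern hit."""
    problems = []
    for lineno, line in enumerate(path.read_text().splitlines(), 1):
        for pattern, label in FORBIDDEN:
            if pattern.search(line):
                problems.append(f"{path}:{lineno}: {label}")
    return problems
```

Wired into a pre-commit hook over staged `.py` files, a non-empty result blocks the commit and points the author at the exact line.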

Third Line of Defense: Human Review

For code changes involving security, human review is non-negotiable. Focus on:

  • Changes to authentication/authorization logic
  • Database schema changes
  • New external service integrations
  • Configuration file changes (especially permission-related)

Fourth Line of Defense: Least Privilege

  • Don’t give AI Agents “root access.” Restrict the files and directories they can access.
  • Don’t use “auto-approve all operations” mode.
  • Regularly audit AI Agent operation logs.
  • Don’t store sensitive credentials in code repositories or locations accessible to AI.

Fifth Line of Defense: Defense in Depth

Assume all the above defenses can be breached. Deploy runtime security measures:

  • WAF (Web Application Firewall)
  • RASP (Runtime Application Self-Protection)
  • Anomaly behavior detection
  • Regular security audits and penetration testing

Balancing Security and Speed

Some might say: don’t all these security checks slow down Vibe Coding?

CS146S’s answer is: insecure speed is false speed.

The damage from a single security vulnerability — data breaches, legal liability, reputational harm — far exceeds the time you saved with Vibe Coding. Moreover, most security measures (context configuration, CI automated checks, AI Review) are zero marginal cost once established — they run automatically without manual intervention each time.

True Professional Vibe Coding isn’t about choosing between speed and security, but building one-time security infrastructure and then moving fast within a secure framework.

This is the most valuable lesson from these two weeks of CS146S: rapid prototyping is just the starting point — testable, auditable, and defensible is the finish line.

Series Navigation

This is Part 4 of the “Stanford Vibe Coding Course Deep Dive” series:

  1. Stanford CS146S Deep Dive (1): How Vibe Coding Became an Academic Discipline
  2. Stanford CS146S Deep Dive (2): Context Engineering (Week 3)
  3. Stanford CS146S Deep Dive (3): Agent Manager (Week 4)
  4. This article: Stanford CS146S Deep Dive (4): Secure Vibe Coding (Weeks 6-7)
  5. Stanford CS146S Deep Dive (5): From Prototype to Production (Weeks 8-9)
