AI Security Code Review (Stage 4)

Why Findings-Driven?

Sending the entire codebase to AI causes high token costs and excessive noise. It is more efficient for Stage 3 tools (Semgrep and grype) to narrow candidates first, and AI to focus only on those results.

[Stage 3] Semgrep · grype → findings.json
                                ↓
[Stage 4] AI: code context + findings → validation, deep interpretation, and related finding discovery
                                ↓
                       PR comment (does not block build)

Tool	Detection Method	Strengths	Limitations
Gitleaks	Regex pattern matching	Hardcoded secrets	Cannot detect secrets hidden in variables
grype	CVE DB matching	Known vulnerabilities	Cannot detect 0-day or logic bugs
Semgrep	Code pattern rules	Common vulnerability patterns	Ignores business logic context
AI (Stage 4)	Natural language reasoning	FP classification, context understanding, related discovery	High FP rate, API costs

Operate as reporting, not build blocking

AI review has a high FP (false positive) rate. Use it only for PR comments or Security tab reporting, and avoid using it to force build failures.

GitHub Actions Configuration Example

This workflow collects findings from Stage 3 tools, then has AI analyze them with code context.

YAML
# .github/workflows/ai-review.yml
name: AI Security Review (Findings-Driven)

on:
  pull_request:
    branches: [main]

permissions:
  pull-requests: write

jobs:
  ai-review:
    runs-on: ubuntu-latest
    # The secrets context is not available in a job-level if; move it to env and check per step.
    env:
      HAS_ANTHROPIC_KEY: ${{ secrets.ANTHROPIC_API_KEY != '' }}
    steps:
      - uses: actions/checkout@v7
        with:
          fetch-depth: 0

      # Collect Stage 3 tool results (light rerun) — skipped entirely when the key is missing
      - name: Run Semgrep (SARIF)
        if: env.HAS_ANTHROPIC_KEY == 'true'
        run: |
          pip install semgrep -q
          semgrep --config=auto --sarif-output=semgrep.sarif \
            --include='*.py' --include='*.js' --include='*.ts' \
            --include='*.go' --include='*.java' || true

      - name: Run grype (JSON)
        if: env.HAS_ANTHROPIC_KEY == 'true'
        run: |
          curl -sSfL https://get.anchore.io/grype \
            | sh -s -- -b /usr/local/bin
          grype dir:. -o json > grype.json || true

      # AI: findings + code context → validation and interpretation
      - name: AI Findings Analysis
        if: env.HAS_ANTHROPIC_KEY == 'true'
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pip install anthropic -q
          python3 << 'PYEOF'
          import json, pathlib, anthropic, sys

          # Semgrep findings parse
          semgrep_issues = []
          try:
              sarif = json.loads(pathlib.Path("semgrep.sarif").read_text())
              for run in sarif.get("runs", []):
                  for result in run.get("results", []):
                      loc = result.get("locations", [{}])[0]
                      region = loc.get("physicalLocation", {}).get("region", {})
                      uri = loc.get("physicalLocation", {}).get("artifactLocation", {}).get("uri", "")
                      line = region.get("startLine", 0)
                      rule_id = result.get("ruleId", "")
                      msg = result.get("message", {}).get("text", "")
                      # extract line context (±5 lines)
                      ctx = ""
                      try:
                          lines = pathlib.Path(uri).read_text().splitlines()
                          start = max(0, line - 6)
                          end = min(len(lines), line + 5)
                          ctx = "\n".join(f"{i+1}: {l}" for i, l in enumerate(lines[start:end], start=start))
                      except Exception:
                          pass
                      semgrep_issues.append({"rule": rule_id, "file": uri, "line": line, "msg": msg, "ctx": ctx})
          except Exception:
              pass

          # grype CVE findings parse (High/Critical only)
          grype_issues = []
          try:
              grype = json.loads(pathlib.Path("grype.json").read_text())
              for match in grype.get("matches", []):
                  sev = match.get("vulnerability", {}).get("severity", "")
                  if sev in ("High", "Critical"):
                      grype_issues.append({
                          "cve": match["vulnerability"]["id"],
                          "pkg": match["artifact"]["name"],
                          "ver": match["artifact"]["version"],
                          "severity": sev,
                          "fixed": match["vulnerability"].get("fix", {}).get("versions", []),
                      })
          except Exception:
              pass

          if not semgrep_issues and not grype_issues:
              pathlib.Path("review_result.txt").write_text("PASS")
              print("No detected findings — skip AI analysis")
              sys.exit(0)

          # assemble prompt (limited to top 13 — 8 Semgrep + 5 grype)
          semgrep_block = "\n".join(
              f"[Semgrep #{i+1}] {x['rule']} @ {x['file']}:{x['line']}\nMessage: {x['msg']}\nCode:\n{x['ctx']}"
              for i, x in enumerate(semgrep_issues[:8])
          )
          grype_block = "\n".join(
              f"[grype] {x['cve']} — {x['pkg']}@{x['ver']} ({x['severity']}) → Fixed version: {x['fixed']}"
              for x in grype_issues[:5]
          )

          prompt = f"""Below are detected results from static analysis tools (Semgrep) and SCA tools (grype).
Assess each item using the format below.

Assessment format:
- **[Item number]** Real vulnerability (TP) or false positive (FP) | Risk: High/Medium/Low | 1-2 sentence rationale
- If TP: add a one-line real exploit scenario
- For grype CVEs, determine whether the package is used in actual runtime paths

---
{semgrep_block}

{grype_block}
---

If there are no detected items, output PASS."""

          client = anthropic.Anthropic()
          response = client.messages.create(
              model="claude-opus-4-7",
              max_tokens=1500,
              messages=[{"role": "user", "content": prompt}]
          )
          result = response.content[0].text
          pathlib.Path("review_result.txt").write_text(result)
          print(result)
          PYEOF

      - name: Post PR comment
        if: env.HAS_ANTHROPIC_KEY == 'true'
        uses: actions/github-script@v9
        with:
          script: |
            const fs = require('fs');
            let result;
            try { result = fs.readFileSync('review_result.txt', 'utf8'); }
            catch { result = 'PASS'; }
            if (result.trim() === 'PASS') return;
            const total = (result.match(/\[Semgrep|grype/g) || []).length;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: [
                '## 🔍 AI Security Review (Findings-Driven)',
                '',
                '> AI validates and interprets the detection results from Stage 3 tools (Semgrep·grype).',
                '> False positives are possible; evaluate with context. This is not a build-blocking criterion.',
                '',
                result
              ].join('\n')
            });

Workflow Execution Flow

PR opened
  │
  ├─ [Stage 3] Semgrep → semgrep.sarif  ─┐
  └─ [Stage 3] grype   → grype.json     ─┤
                                        ↓
                            findings parse + code context extract
                                        ↓
                            Claude API (top 13 findings only)
                                        ↓
                            PR comment: TP/FP assessment + risk level

Token-saving points:

Send only the top 8 Semgrep findings with ±5 lines of context each
Include only grype Critical/High findings (exclude Medium/Low)
Skip API calls entirely when there are no findings

How to Enable

Add ANTHROPIC_API_KEY to GitHub Secrets
The workflow moves the key check to env and gates each step, so registering the key enables it automatically (steps are skipped when the key is missing)

Notes

External transfer of sensitive code

Code snippets flagged by Semgrep are sent to Anthropic servers. If internal security policy restricts external API transfer, policy review is required before adoption. Replacing with an on-prem LLM (such as Ollama) can also be considered.

FP rate and cost

LLM-based judgments frequently produce false positives. Control cost by limiting findings ([:8], [:5]) and estimate monthly API usage in advance based on team size and PR frequency.

Learn More

5-Stage Strategy — Full stage structure and AI defense layer positioning
DevSecOps — SAST — Rule-based static analysis (Semgrep · CodeQL)
DevSecOps — Organization-wide Pipeline Design