AI Impact Report

Last updated: March 4, 2026

What Does This Report Show and Why Is It Important?

The AI Impact Report (also called the Impact Scorecard) helps you understand how AI coding assistants are impacting your engineering organization's productivity and velocity. Powered by Span's proprietary span-detect-1 AI detection engine, this report answers critical questions like:

  • How much of our code is AI-generated? Track AI adoption across teams, individuals, repositories, job levels, locations, and more

  • Is AI making us more productive? Compare velocity and efficiency between different AI-assisted workflows

  • Are review cycles changing? Understand how AI affects code review dynamics

Why It Matters: As organizations invest in AI coding tools (GitHub Copilot, Cursor, etc.), leaders need data-driven insights to measure ROI, optimize adoption strategies, and identify where AI is—or isn't—delivering value.

Supported Languages

This report is built on language-specific models. Today it supports:

  • Python

  • TypeScript

  • JavaScript

  • Ruby

  • Java

  • C#

  • Go

  • Kotlin

  • Swift


How to Set Up & Access the Report

Prerequisites

  1. VCS Integration Required: Active GitHub or GitLab connection

  2. Data Availability: Your organization must have merged PRs with code in any of the supported languages

Navigation

Access the report at:

Insights > AI Transformation > AI Impact

Initial Setup

  • No manual configuration needed - The report automatically analyzes all merged PRs from connected repositories

  • Default filters apply (entire organization, all repositories, current date)

  • Your filter selections and preferences persist automatically in local storage

Qualified PRs Explained

Not all PRs are included in calculations. There are two qualification levels:

Level 1: AI Code Ratio Qualified

  • PR must have at least one code chunk in a supported language. Each PR's changed lines are grouped into contiguous chunks; only added and modified lines of code are analyzed, and only chunks with at least 700 characters.

Level 2: AI Dosage Analysis Qualified

  • At least one chunk in a supported language AND

  • At least 30% of non-ignored lines in supported languages

Why? PRs with too little supported code produce unreliable AI detection results. The 30% threshold ensures data quality.
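To make the two qualification levels concrete, here is a minimal sketch of the checks described above. The function and parameter names (`qualifies`, `chunks`, `supported_lines`) are hypothetical, not part of Span's API; the 700-character chunk minimum and 30% threshold come from the criteria above.

```python
SUPPORTED_LANGUAGES = {"python", "typescript", "javascript", "ruby",
                       "java", "csharp", "go", "kotlin", "swift"}
MIN_CHUNK_CHARS = 700        # minimum chunk size to be analyzed
MIN_SUPPORTED_RATIO = 0.30   # Level 2 threshold on non-ignored lines

def qualifies(chunks, total_lines, supported_lines):
    """chunks: list of (language, char_count) for the PR's added/modified chunks.
    Returns (level1_qualified, level2_qualified)."""
    # Level 1: at least one sufficiently large chunk in a supported language
    level1 = any(
        lang in SUPPORTED_LANGUAGES and chars >= MIN_CHUNK_CHARS
        for lang, chars in chunks
    )
    # Level 2: Level 1 plus at least 30% of non-ignored lines in supported languages
    level2 = (level1 and total_lines > 0
              and supported_lines / total_lines >= MIN_SUPPORTED_RATIO)
    return level1, level2
```

A PR with one 900-character Python chunk but only 10% supported lines would qualify for AI Code Ratio (Level 1) but not for AI Dosage Analysis (Level 2).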

Tip: Turn on "Show Unknown PRs" to see what's being filtered out.

Learn more about qualification criteria here.


Where Does the Data Come From?

Data Sources

  1. Pull Request Data - Retrieved from your VCS integration (GitHub/GitLab)

  2. AI Detection - Processed by span-detect-1, Span's proprietary ML model that detects AI-generated code patterns

  3. Metrics Aggregation - Calculated at organization, team, person, and repository levels

  4. Benchmarking Data - Anonymized industry data from similar organizations

How AI Detection Works

  • Tool-Agnostic: Detects AI code regardless of source (GitHub Copilot, Cursor, ChatGPT, Claude, etc.)

  • Pattern-Based: Analyzes code characteristics, not tool metadata

  • Confidence Intervals: Provides lower and upper bounds to reflect detection uncertainty

  • Automatic: Runs when PRs are merged or reverted—no manual action required

Data Freshness

  • Analysis runs automatically on merged PRs every few hours

  • AI analysis runs on newly merged, not-yet-analyzed PRs as part of our asset processing, which happens after each data sync.

Configuration Options

1. Filters

Customize your analysis scope:

  • Date Range - Select the time period to analyze. The data availability banner shows the full range of analyzed PRs you can select

  • Repository - Filter to specific repos

  • Team/People - Drill down by organizational structure

  • Pull Request Has Tests - Include/exclude PRs with tests

  • AI Work Classification Trait - Filter by work type (New Features, Maintenance, Productivity)

2. Baseline vs. Comparison Selector

Define your own "AI dosage" groups to compare:

  • Baseline Options: No AI (0-5%), Low AI (5-25%), Medium AI (25-50%), High AI (50-100%), Unknown, or combinations

  • Comparison Options: Same buckets

  • Default: Compares No AI (0-5%) vs. High AI (50-100%)

Example: Compare teams using No AI vs. teams using any AI (High, Medium, Low) to see if AI-assisted development correlates with better outcomes.
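The dosage buckets above can be sketched as a simple mapping from a PR's AI code ratio to its group. This is illustrative only; the function name is hypothetical, and how Span assigns PRs that land exactly on a boundary (e.g. exactly 5%) is an assumption here.

```python
def ai_dosage_bucket(ai_code_ratio):
    """Map a PR's AI code ratio (0.0-1.0, or None if undetermined)
    to one of the dosage groups used by the baseline/comparison selector."""
    if ai_code_ratio is None:
        return "Unknown"
    if ai_code_ratio < 0.05:
        return "No AI (0-5%)"
    if ai_code_ratio < 0.25:
        return "Low AI (5-25%)"
    if ai_code_ratio < 0.50:
        return "Medium AI (25-50%)"
    return "High AI (50-100%)"
```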

3. View Controls

  • Metric Selection - Switch between AI Code Ratio, PR throughput, PR review cycles, and Estimated detected defects (Early Access)

  • Breakdown Dimensions - View data by Group (teams), Person (individuals), Repository, Job title, Tenure, Location

  • Show Unknown PRs Toggle - Display/hide PRs where AI usage couldn't be determined (default: OFF)

4. Benchmarking Filters

Compare your data against:

  • Organization Benchmarks - Your org's baseline performance

  • Industry Benchmarks - Industry percentiles (P50 shown by default)

  • Segment by: IC Level (Junior/Mid/Senior), Job Title, Manager status, Team/Group


Key Metrics & How They're Calculated

1. AI Code Ratio (Primary Metric)

What it measures: Percentage of new/modified lines detected as AI-generated in supported programming languages

Formula:

AI Code Ratio = (Lower Bound + Upper Bound) / (2 × Total Supported Lines)

Components:

  • Lower Bound: Conservative, high-confidence AI detections

  • Upper Bound: Optimistic estimate including likely AI

  • Total Supported Lines: Lines in languages Span can analyze (Python, JavaScript, TypeScript, Ruby, etc.)

Important: Ignored file patterns are excluded from analyzed lines. This includes common generated-file patterns (for example, *.generated.js files for JavaScript); lines of code in those files are not counted in ratio calculations. You can change your file ignore patterns in Settings > Metrics.
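The formula and the ignore-pattern behavior can be sketched together. The helper names and the example patterns are hypothetical; the ratio itself is the midpoint of the detection bounds over total supported lines, exactly as in the formula above.

```python
import fnmatch

# Example ignore patterns; the real list is configurable in Settings > Metrics
IGNORE_PATTERNS = ["*.generated.js", "*.min.js"]

def is_ignored(path):
    """Return True if a file path matches any configured ignore pattern."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)

def ai_code_ratio(lower_bound, upper_bound, total_supported_lines):
    """Midpoint of the AI-detection bounds over analyzed supported lines:
    (lower + upper) / (2 * total)."""
    if total_supported_lines == 0:
        return None  # no supported lines to analyze
    return (lower_bound + upper_bound) / (2 * total_supported_lines)
```

For a PR with a lower bound of 30 AI lines, an upper bound of 50, and 100 total supported lines, the ratio is (30 + 50) / 200 = 0.40, i.e. 40%.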

2. Velocity represented by PR throughput

What it measures: Merged PR weight per developer-week. Each developer-week is a 7-day period identified by its start date and a code contributor ID.

Comparison: Shows difference between baseline and comparison groups

  • Example: Developers using High AI (50-100%) complete 15% more "work-weight" per week than No AI developers

A note about weighted PRs:

  • Distance → a PR’s weight, between 0.1–2.0, normalized by global PR complexity and line count.

  • Time → duration from first commit to merge.

  • Interpretation: A higher value means teams are merging more “work-weight” per day.

Example: A 10% increase in weighted PRs per cycle-time day suggests faster throughput for AI-assisted code.
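The throughput aggregation described above can be sketched as follows. This is an illustrative reading, not Span's implementation: the function names are hypothetical, and anchoring developer-weeks to Mondays is an assumption (the report only says each week is identified by a start date and contributor ID).

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(d):
    """Start date of the 7-day developer-week containing d.
    Monday-anchored here; the actual anchor is an assumption."""
    return d - timedelta(days=d.weekday())

def weighted_throughput(prs):
    """prs: iterable of (author_id, merged_date, weight),
    where weight is the PR's normalized distance in [0.1, 2.0].
    Returns average merged PR weight per developer-week."""
    totals = defaultdict(float)
    for author, merged, weight in prs:
        totals[(author, week_start(merged))] += weight
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)
```

Two PRs by the same author in one week fall into a single developer-week bucket, so their weights add before averaging across buckets.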

3. Quality represented by PR review cycles

What it measures: The back-and-forth between author and reviewer:

  • 1 Cycle = open → approve (no revisions)

  • Multiple Cycles = open → comment → commit → approve (each full loop adds 1)

Insight: Helps identify if AI code requires more or fewer review iterations. If High-AI PRs have higher review cycles, it may indicate increased reviewer pushback or uncertainty about AI-generated changes.

A corresponding rise in the rework stage of the lifecycle often supports this interpretation.
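The cycle-counting rule above can be sketched from a PR's event timeline. This is a simplified reading of the definition, with a hypothetical function name: an approval with no revisions counts as one cycle, and each comment-then-commit loop before approval adds one.

```python
def count_review_cycles(events):
    """events: chronological list of event names after the PR is opened,
    each one of "comment", "commit", or "approve"."""
    cycles = 1              # open -> approve with no revisions = 1 cycle
    awaiting_commit = False # a reviewer comment is pending an author response
    for event in events:
        if event == "comment":
            awaiting_commit = True
        elif event == "commit" and awaiting_commit:
            cycles += 1     # a full comment -> commit loop completed
            awaiting_commit = False
        elif event == "approve":
            break
    return cycles
```

So `["approve"]` counts as 1 cycle, and `["comment", "commit", "approve"]` counts as 2.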


Understanding Report Features

Hotspots Section

  • Bright Spots: Segments (teams/people/repos) showing the most positive change with AI usage

    • Example: "Alice Johnson (+15%)" - Alice's velocity improved 15% with AI

  • Pressure Points: Segments showing negative changes

    • Example: "Backend Team (-8%)" - Backend team's review cycles increased 8%

Impact Metrics Display

  • Shows delta (change %) between baseline and comparison groups

  • Visual indicators for positive/negative trends

  • Dynamically updates based on your baseline/comparison selection

Data Tables

  • Sortable, filterable breakdowns by Group, Person, or Repo

  • Shows absolute values and comparison deltas

  • Export-friendly for further analysis


FAQ

Q: Why don't some PRs show AI usage?

A: PRs must meet qualification criteria—at least 30% of lines in supported languages and code blocks long enough to classify. Turn on "Show Unknown PRs" to see filtered PRs.

Q: What languages are supported?

A: Python, JavaScript, TypeScript, Ruby, Java, C#, Go, Kotlin, and Swift. Don't see one of your languages supported? Reach out to your Span rep to see if it's on the roadmap.

Q: How reliable is AI detection?

A: Very reliable, at roughly 95% accuracy. The report's lower and upper bounds reflect the remaining detection uncertainty.

Q: Does this work with all AI tools?

A: Yes! Detection is tool-agnostic—it identifies AI-generated code patterns regardless of which tool (Copilot, Cursor, ChatGPT, etc.) was used.

Q: Can I compare multiple teams?

A: Yes. Use the breakdown dimensions to drill down by team, and apply filters to compare specific groups.

Q: How do benchmarks work?

A: Span calculates industry percentiles (P50, P75, P90) from anonymized data across similar organizations. Your metrics are automatically compared to these benchmarks.

Q: Are contractors included?

A: Only qualifying developers are included in analysis.

Q: Can I export this data?

A: Data tables support standard interactions. Contact your Span CSM for custom export options.

Q: What if we don't use AI tools yet?

A: The report establishes a baseline. You can track adoption trends as your org starts using AI coding assistants.

Q: Does this track local/uncommitted AI usage?

A: No. Analysis only covers merged PRs—it doesn't track AI usage in development branches or local work not yet committed.


Limitations & Important Notes

  • Beta Status: Report is currently in Beta—features may evolve

  • Language Support: Only works with supported programming languages

  • Minimum Chunk Size: Code blocks must meet minimum length requirements

  • PR-Centric: Only analyzes merged PRs, not in-progress work

  • Qualification Filtering: 30% supported code threshold ensures data quality


Related Features

  • AI Tools Report - Track adoption of specific coding assistant tools

  • AI Work Classification - Categorizes work as Features, Maintenance, or Productivity

  • Standard Velocity Reports - Traditional engineering productivity metrics