AI Impact Report

Last updated: March 4, 2026

What Does This Report Show and Why Is It Important?

The AI Impact Report (also called the Impact Scorecard) helps you understand how AI coding assistants are impacting your engineering organization's productivity and velocity. Powered by Span's proprietary span-detect-1 AI detection engine, this report answers critical questions like:

  • How much of our code is AI-generated? Track AI adoption across teams, individuals, repositories, job levels, locations, and more

  • Is AI making us more productive? Compare velocity and efficiency between different AI-assisted workflows

  • Are review cycles changing? Understand how AI affects code review dynamics

Why It Matters: As organizations invest in AI coding tools (GitHub Copilot, Cursor, etc.), leaders need data-driven insights to measure ROI, optimize adoption strategies, and identify where AI is—or isn't—delivering value.

Supported Languages

This report is built on language-specific models. Today it supports:

  • Python

  • TypeScript

  • JavaScript

  • Ruby

  • Java

  • C#

  • Go

  • Kotlin

  • Swift


How to Set Up & Access the Report

Prerequisites

  1. VCS Integration Required: Active GitHub or GitLab connection

  2. Data Availability: Your organization must have merged PRs with code in any of the supported languages

Navigation

Access the report at:

Insights > AI Transformation > AI Impact

Initial Setup

  • No manual configuration needed - The report automatically analyzes all merged PRs from connected repositories

  • Default filters apply (entire organization, all repositories, current date)

  • Your filter selections and preferences persist automatically in local storage

Qualified PRs Explained

Not all PRs are included in calculations. There are two qualification levels:

Level 1: AI Code Ratio Qualified

  • PR must have at least one code chunk in a supported language. Each PR's changed lines are grouped into contiguous chunks; only added and modified lines of code are analyzed, and only chunks with at least 700 characters.

Level 2: AI Dosage Analysis Qualified

  • At least one chunk in a supported language AND

  • At least 30% of non-ignored lines in supported languages

Why? PRs with too little supported code produce unreliable AI detection results. The 30% threshold ensures data quality.
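To make the two qualification levels concrete, here is a minimal sketch of the checks described above. The function and parameter names (`qualifies`, `chunks`, `supported_lines`) are hypothetical, not part of Span's API; the 700-character chunk minimum and 30% threshold come from the criteria above.

```python
SUPPORTED_LANGUAGES = {"python", "typescript", "javascript", "ruby",
                       "java", "csharp", "go", "kotlin", "swift"}
MIN_CHUNK_CHARS = 700        # minimum chunk size to be analyzed
MIN_SUPPORTED_RATIO = 0.30   # Level 2 threshold on non-ignored lines

def qualifies(chunks, total_lines, supported_lines):
    """chunks: list of (language, char_count) for the PR's added/modified chunks.
    Returns (level1_qualified, level2_qualified)."""
    # Level 1: at least one sufficiently large chunk in a supported language
    level1 = any(
        lang in SUPPORTED_LANGUAGES and chars >= MIN_CHUNK_CHARS
        for lang, chars in chunks
    )
    # Level 2: Level 1 plus at least 30% of non-ignored lines in supported languages
    level2 = (level1 and total_lines > 0
              and supported_lines / total_lines >= MIN_SUPPORTED_RATIO)
    return level1, level2
```

A PR with one 900-character Python chunk but only 10% supported lines would qualify for AI Code Ratio (Level 1) but not for AI Dosage Analysis (Level 2).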

Tip: Turn on "Show Unknown PRs" to see what's being filtered out.

Learn more about qualification criteria here.


Where Does the Data Come From?

Data Sources

  1. Pull Request Data - Retrieved from your VCS integration (GitHub/GitLab)

  2. AI Detection - Processed by span-detect-1, Span's proprietary ML model that detects AI-generated code patterns

  3. Metrics Aggregation - Calculated at organization, team, person, and repository levels

  4. Benchmarking Data - Anonymized industry data from similar organizations

How AI Detection Works

  • Tool-Agnostic: Detects AI code regardless of source (GitHub Copilot, Cursor, ChatGPT, Claude, etc.)

  • Pattern-Based: Analyzes code characteristics, not tool metadata

  • Confidence Intervals: Provides lower and upper bounds to reflect detection uncertainty

  • Automatic: Runs when PRs are merged or reverted—no manual action required

Data Freshness

  • Analysis runs automatically on merged PRs every few hours

  • AI analysis runs on newly merged, not-yet-analyzed PRs as part of our asset processing, which happens after each data sync.

Configuration Options

1. Filters

Customize your analysis scope:

  • Date Range - Select the time period to analyze. The data availability banner shows the full range of analyzed PRs you can select

  • Repository - Filter to specific repos

  • Team/People - Drill down by organizational structure

  • Pull Request Has Tests - Include/exclude PRs with tests

  • AI Work Classification Trait - Filter by work type (New Features, Maintenance, Productivity)

2. Baseline vs. Comparison Selector

Define your own "AI dosage" groups to compare:

  • Baseline Options: No AI (0-5%), Low AI (5-25%), Medium AI (25-50%), High AI (50-100%), Unknown, or combinations

  • Comparison Options: Same buckets

  • Default: Compares No AI (0-5%) vs. High AI (50-100%)

Example: Compare teams using No AI vs. teams using any AI (High, Medium, Low) to see if AI-assisted development correlates with better outcomes.
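The dosage buckets above can be sketched as a simple mapping from a PR's AI code ratio to its group. This is illustrative only; the function name is hypothetical, and how Span assigns PRs that land exactly on a boundary (e.g. exactly 5%) is an assumption here.

```python
def ai_dosage_bucket(ai_code_ratio):
    """Map a PR's AI code ratio (0.0-1.0, or None if undetermined)
    to one of the dosage groups used by the baseline/comparison selector."""
    if ai_code_ratio is None:
        return "Unknown"
    if ai_code_ratio < 0.05:
        return "No AI (0-5%)"
    if ai_code_ratio < 0.25:
        return "Low AI (5-25%)"
    if ai_code_ratio < 0.50:
        return "Medium AI (25-50%)"
    return "High AI (50-100%)"
```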

3. View Controls

  • Metric Selection - Switch between AI Code Ratio, PR throughput, PR review cycles, and Estimated detected defects (Early Access)

  • Breakdown Dimensions - View data by Group (teams), Person (individuals), Repository, Job title, Tenure, Location

  • Show Unknown PRs Toggle - Display/hide PRs where AI usage couldn't be determined (default: OFF)

4. Benchmarking Filters

Compare your data against:

  • Organization Benchmarks - Your org's baseline performance

  • Industry Benchmarks - Industry percentiles (P50 shown by default)

  • Segment by: IC Level (Junior/Mid/Senior), Job Title, Manager status, Team/Group


Key Metrics & How They're Calculated

1. AI Code Ratio (Primary Metric)

What it measures: Percentage of new/modified lines detected as AI-generated in supported programming languages

Formula:

AI Code Ratio = (Lower Bound + Upper Bound) / (2 × Total Supported Lines)

Components:

  • Lower Bound: Conservative, high-confidence AI detections

  • Upper Bound: Optimistic estimate including likely AI

  • Total Supported Lines: Lines in languages Span can analyze (Python, JavaScript, TypeScript, Ruby, etc.)

Important: Ignored file patterns are excluded from analyzed lines. This includes common generated-file patterns (for example, *.generated.js files for JavaScript); lines of code in those files are not counted in ratio calculations. You can change your file ignore patterns in Settings > Metrics.
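The formula and the ignore-pattern behavior can be sketched together. The helper names and the example patterns are hypothetical; the ratio itself is the midpoint of the detection bounds over total supported lines, exactly as in the formula above.

```python
import fnmatch

# Example ignore patterns; the real list is configurable in Settings > Metrics
IGNORE_PATTERNS = ["*.generated.js", "*.min.js"]

def is_ignored(path):
    """Return True if a file path matches any configured ignore pattern."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)

def ai_code_ratio(lower_bound, upper_bound, total_supported_lines):
    """Midpoint of the AI-detection bounds over analyzed supported lines:
    (lower + upper) / (2 * total)."""
    if total_supported_lines == 0:
        return None  # no supported lines to analyze
    return (lower_bound + upper_bound) / (2 * total_supported_lines)
```

For a PR with a lower bound of 30 AI lines, an upper bound of 50, and 100 total supported lines, the ratio is (30 + 50) / 200 = 0.40, i.e. 40%.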

2. Velocity represented by PR throughput

What it measures: Merged PR weight per developer-week. Each developer-week is a 7-day period identified by its start date and a code contributor ID.

Comparison: Shows difference between baseline and comparison groups

  • Example: Developers using High AI (50-100%) complete 15% more "work-weight" per week than No AI developers

A note about weighted PRs:

  • Distance → a PR’s weight, between 0.1–2.0, normalized by global PR complexity and line count.

  • Time → duration from first commit to merge.

  • Interpretation: A higher value means teams are merging more “work-weight” per day.

Example: A 10% increase in weighted PRs per cycle-time day suggests faster throughput for AI-assisted code.
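The throughput aggregation described above can be sketched as follows. This is an illustrative reading, not Span's implementation: the function names are hypothetical, and anchoring developer-weeks to Mondays is an assumption (the report only says each week is identified by a start date and contributor ID).

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(d):
    """Start date of the 7-day developer-week containing d.
    Monday-anchored here; the actual anchor is an assumption."""
    return d - timedelta(days=d.weekday())

def weighted_throughput(prs):
    """prs: iterable of (author_id, merged_date, weight),
    where weight is the PR's normalized distance in [0.1, 2.0].
    Returns average merged PR weight per developer-week."""
    totals = defaultdict(float)
    for author, merged, weight in prs:
        totals[(author, week_start(merged))] += weight
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)
```

Two PRs by the same author in one week fall into a single developer-week bucket, so their weights add before averaging across buckets.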

3. Quality represented by PR review cycles

What it measures: The back-and-forth between author and reviewer:

  • 1 Cycle = open → approve (no revisions)

  • Multiple Cycles = open → comment → commit → approve (each full loop adds 1)

Insight: Helps identify if AI code requires more or fewer review iterations. If High-AI PRs have higher review cycles, it may indicate increased reviewer pushback or uncertainty about AI-generated changes.

A corresponding rise in the rework stage of the lifecycle often supports this interpretation.
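The cycle-counting rule above can be sketched from a PR's event timeline. This is a simplified reading of the definition, with a hypothetical function name: an approval with no revisions counts as one cycle, and each comment-then-commit loop before approval adds one.

```python
def count_review_cycles(events):
    """events: chronological list of event names after the PR is opened,
    each one of "comment", "commit", or "approve"."""
    cycles = 1              # open -> approve with no revisions = 1 cycle
    awaiting_commit = False # a reviewer comment is pending an author response
    for event in events:
        if event == "comment":
            awaiting_commit = True
        elif event == "commit" and awaiting_commit:
            cycles += 1     # a full comment -> commit loop completed
            awaiting_commit = False
        elif event == "approve":
            break
    return cycles
```

So `["approve"]` counts as 1 cycle, and `["comment", "commit", "approve"]` counts as 2.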


Understanding Report Features

Hotspots Section

  • Bright Spots: Segments (teams/people/repos) showing the most positive change with AI usage

    • Example: "Alice Johnson (+15%)" - Alice's velocity improved 15% with AI

  • Pressure Points: Segments showing negative changes

    • Example: "Backend Team (-8%)" - Backend team's review cycles increased 8%

Impact Metrics Display

  • Shows delta (change %) between baseline and comparison groups

  • Visual indicators for positive/negative trends

  • Dynamically updates based on your baseline/comparison selection

Data Tables

  • Sortable, filterable breakdowns by Group, Person, or Repo

  • Shows absolute values and comparison deltas

  • Export-friendly for further analysis


FAQ

Q: Why don't some PRs show AI usage?

A: PRs must meet qualification criteria—at least 30% of lines in supported languages and code blocks long enough to classify. Turn on "Show Unknown PRs" to see filtered PRs.

Q: What languages are supported?

A: Python, JavaScript, TypeScript, Ruby, Java, C#, Go, Kotlin, and Swift. Don't see one of your languages supported? Reach out to your Span rep to see if it's on the roadmap.

Q: How reliable is AI detection?

A: Very reliable, at roughly 95% accuracy. The report's lower and upper bounds reflect the remaining detection uncertainty.

Q: Does this work with all AI tools?

A: Yes! Detection is tool-agnostic—it identifies AI-generated code patterns regardless of which tool (Copilot, Cursor, ChatGPT, etc.) was used.

Q: Can I compare multiple teams?

A: Yes. Use the breakdown dimensions to drill down by team, and apply filters to compare specific groups.

Q: How do benchmarks work?

A: Span calculates industry percentiles (P50, P75, P90) from anonymized data across similar organizations. Your metrics are automatically compared to these benchmarks.

Q: Are contractors included?

A: Only qualifying developers are included in analysis.

Q: Can I export this data?

A: Data tables support standard interactions. Contact your Span CSM for custom export options.

Q: What if we don't use AI tools yet?

A: The report establishes a baseline. You can track adoption trends as your org starts using AI coding assistants.

Q: Does this track local/uncommitted AI usage?

A: No. Analysis only covers merged PRs—it doesn't track AI usage in development branches or local work not yet committed.


Limitations & Important Notes

  • Beta Status: Report is currently in Beta—features may evolve

  • Language Support: Only works with supported programming languages

  • Minimum Chunk Size: Code blocks must meet minimum length requirements

  • PR-Centric: Only analyzes merged PRs, not in-progress work

  • Qualification Filtering: 30% supported code threshold ensures data quality


Related Features

  • AI Tools Report - Track adoption of specific coding assistant tools

  • AI Work Classification - Categorizes work as Features, Maintenance, or Productivity

  • Standard Velocity Reports - Traditional engineering productivity metrics