AI Impact Report
Last updated: March 4, 2026
What Does This Report Show and Why Is It Important?
The AI Impact Report (also called the Impact Scorecard) helps you understand how AI coding assistants are impacting your engineering organization's productivity and velocity. Powered by Span's proprietary span-detect-1 AI detection engine, this report answers critical questions like:
How much of our code is AI-generated? Track AI adoption across teams, individuals, repositories, job levels, locations, and more
Is AI making us more productive? Compare velocity and efficiency between different AI-assisted workflows
Are review cycles changing? Understand how AI affects code review dynamics
Why It Matters: As organizations invest in AI coding tools (GitHub Copilot, Cursor, etc.), leaders need data-driven insights to measure ROI, optimize adoption strategies, and identify where AI is—or isn't—delivering value.
Supported Languages
This report is built on language-specific detection models. Today it supports:
Python
TypeScript
JavaScript
Ruby
Java
C#
Go
Kotlin
Swift
How to Set Up & Access the Report
Prerequisites
VCS Integration Required: Active GitHub or GitLab connection
Data Availability: Your organization must have merged PRs with code in any of the supported languages
Navigation
Access the report at:
Insights > AI Transformation > AI Impact
Initial Setup
No manual configuration needed - The report automatically analyzes all merged PRs from connected repositories
Default filters apply (entire organization, all repositories, current date)
Your filter selections and preferences persist automatically in local storage
Qualified PRs Explained
Not all PRs are included in calculations. There are two qualification levels:
Level 1: AI Code Ratio Qualified
The PR must have at least one code chunk in a supported language. Each PR's added and modified lines are grouped into contiguous chunks, and only chunks of at least 700 characters are analyzed (see the sketch below).
Level 2: AI Dosage Analysis Qualified
At least one chunk in a supported language AND
At least 30% of non-ignored lines in supported languages
Why? PRs with too little supported code produce unreliable AI detection results. The 30% threshold ensures data quality.
Tip: Turn on "Show Unknown PRs" to see what's being filtered out.
Learn more about qualification criteria here.
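To make the two levels concrete, here is a minimal Python sketch of the qualification logic. The chunking and line accounting are simplified, and names like Chunk, level1_qualified, and level2_qualified are hypothetical illustrations, not Span's actual API.

```python
# A minimal sketch of the two qualification levels described above.
# The Chunk type and function names are hypothetical, not Span's API.
from dataclasses import dataclass

SUPPORTED = {"python", "typescript", "javascript", "ruby",
             "java", "csharp", "go", "kotlin", "swift"}
MIN_CHUNK_CHARS = 700       # minimum contiguous-chunk size
MIN_SUPPORTED_SHARE = 0.30  # minimum share of non-ignored lines

@dataclass
class Chunk:
    language: str  # language of the file the chunk came from
    text: str      # concatenated added/modified lines

def level1_qualified(chunks: list[Chunk]) -> bool:
    """AI Code Ratio qualified: at least one supported chunk of 700+ chars."""
    return any(c.language in SUPPORTED and len(c.text) >= MIN_CHUNK_CHARS
               for c in chunks)

def level2_qualified(chunks: list[Chunk],
                     supported_lines: int, non_ignored_lines: int) -> bool:
    """AI Dosage qualified: Level 1, plus at least 30% of non-ignored
    lines in supported languages."""
    if non_ignored_lines == 0:
        return False
    share = supported_lines / non_ignored_lines
    return level1_qualified(chunks) and share >= MIN_SUPPORTED_SHARE
```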
Where Does the Data Come From?
Data Sources
Pull Request Data - Retrieved from your VCS integration (GitHub/GitLab)
AI Detection - Processed by span-detect-1, Span's proprietary ML model that detects AI-generated code patterns
Metrics Aggregation - Calculated at organization, team, person, and repository levels
Benchmarking Data - Anonymized industry data from similar organizations
How AI Detection Works
Tool-Agnostic: Detects AI code regardless of source (GitHub Copilot, Cursor, ChatGPT, Claude, etc.)
Pattern-Based: Analyzes code characteristics, not tool metadata
Confidence Intervals: Provides lower and upper bounds to reflect detection uncertainty
Automatic: Runs when PRs are merged or reverted—no manual action required
Data Freshness
Analysis runs automatically on merged PRs every few hours
AI analysis runs on newly merged, not-yet-analyzed PRs as part of our asset processing, which happens after each data sync.
Configuration Options
1. Filters
Customize your analysis scope:
Date Range - Select the time period to analyze. The data-availability banner shows the full range of analyzed PRs you can select from
Repository - Filter to specific repos
Team/People - Drill down by organizational structure
Pull Request Has Tests - Include/exclude PRs with tests
AI Work Classification Trait - Filter by work type (New Features, Maintenance, Productivity)
2. Baseline vs. Comparison Selector
Define your own "AI dosage" groups to compare:
Baseline Options: No AI (0-5%), Low AI (5-25%), Medium AI (25-50%), High AI (50-100%), Unknown, or combinations
Comparison Options: Same buckets
Default: Compares No AI (0-5%) vs. High AI (50-100%)
Example: Compare teams using No AI vs. teams using any AI (High, Medium, Low) to see if AI-assisted development correlates with better outcomes.
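As an illustration, here is a minimal Python sketch of how an AI Code Ratio value could map onto these dosage buckets. The boundary handling is an assumption; Span's exact bucketing rules may differ.

```python
# A sketch of mapping an AI Code Ratio to the dosage buckets used by
# the baseline/comparison selector. Boundary handling is an assumption.
def dosage_bucket(ai_code_ratio: float | None) -> str:
    if ai_code_ratio is None:
        return "Unknown"  # PR failed AI Dosage qualification
    if ai_code_ratio < 0.05:
        return "No AI (0-5%)"
    if ai_code_ratio < 0.25:
        return "Low AI (5-25%)"
    if ai_code_ratio < 0.50:
        return "Medium AI (25-50%)"
    return "High AI (50-100%)"

print(dosage_bucket(0.62))  # -> High AI (50-100%)
```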
3. View Controls
Metric Selection - Switch between AI Code Ratio, PR throughput, PR review cycles and Estimated detected defects (Early Access)
Breakdown Dimensions - View data by Group (teams), Person (individuals), Repository, Job title, Tenure, Location
Show Unknown PRs Toggle - Display/hide PRs where AI usage couldn't be determined (default: OFF)
4. Benchmarking Filters
Compare your data against:
Organization Benchmarks - Your org's baseline performance
Industry Benchmarks - Industry percentiles (P50 shown by default)
Segment by: IC Level (Junior/Mid/Senior), Job Title, Manager status, Team/Group
Key Metrics & How They're Calculated
1. AI Code Ratio (Primary Metric)
What it measures: Percentage of new/modified lines detected as AI-generated in supported programming languages
Formula:
AI Code Ratio = (Lower Bound + Upper Bound) / (2 × Total Supported Lines)
Components:
Lower Bound: Conservative, high-confidence AI detections
Upper Bound: Optimistic estimate including likely AI
Total Supported Lines: Lines in languages Span can analyze (Python, JavaScript, TypeScript, Ruby, etc.)
Important: File ignore patterns apply to analyzed lines, including common generated-file patterns (for example, *.generated.js files for JavaScript); lines of code in ignored files are not included in ratio calculations. You can change your file ignore patterns in Settings -> Metrics.
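A quick worked example of the formula above, with made-up numbers. This only restates the documented formula, not Span's actual pipeline:

```python
# Worked example of the AI Code Ratio formula with hypothetical values.
lower_bound = 120            # high-confidence AI lines
upper_bound = 180            # optimistic estimate including likely-AI lines
total_supported_lines = 500  # analyzed lines in supported languages

ai_code_ratio = (lower_bound + upper_bound) / (2 * total_supported_lines)
print(f"AI Code Ratio: {ai_code_ratio:.0%}")  # -> AI Code Ratio: 30%
```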
2. Velocity represented by PR throughput
What it measures: Merged PR weight per developer-week. Each developer-week is a 7-day period identified by its start date and a code contributor ID.
Comparison: Shows difference between baseline and comparison groups
Example: Developers using High AI (50-100%) complete 15% more "work-weight" per week than No AI developers
A note about weighted PRs:
Distance → a PR’s weight, between 0.1 and 2.0, normalized by global PR complexity and line count.
Time → duration from first commit to merge.
Interpretation: A higher value means teams are merging more “work-weight” per day.
Example: A 10% increase in weighted PRs per cycle-time day suggests faster throughput for AI-assisted code.
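For intuition, here is a minimal Python sketch of "merged PR weight per developer-week". The PR fields, the pre-computed weights, and the Monday-based week boundary are all illustrative assumptions, not Span's data model:

```python
# A sketch of summing merged PR weight per (contributor, week) pair.
# Field names and the Monday-based week boundary are assumptions.
from collections import defaultdict
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Identify a developer-week by the Monday that starts it."""
    return d - timedelta(days=d.weekday())

def throughput_per_dev_week(prs: list[dict]) -> dict[tuple[str, date], float]:
    """Sum merged PR weight per (contributor, week-start) pair."""
    totals: dict[tuple[str, date], float] = defaultdict(float)
    for pr in prs:
        key = (pr["author_id"], week_start(pr["merged_at"]))
        totals[key] += pr["weight"]
    return dict(totals)

prs = [
    {"author_id": "alice", "merged_at": date(2026, 3, 2), "weight": 1.4},
    {"author_id": "alice", "merged_at": date(2026, 3, 4), "weight": 0.6},
    {"author_id": "bob",   "merged_at": date(2026, 3, 3), "weight": 2.0},
]
# alice and bob each merged 2.0 units of work-weight in the week of 2026-03-02
print(throughput_per_dev_week(prs))
```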
3. Quality represented by PR review cycles
What it measures: The number of back-and-forth loops between author and reviewer:
1 Cycle = open → approve (no revisions)
Multiple Cycles = open → comment → commit → approve (each full loop adds 1)
Insight: Helps identify if AI code requires more or fewer review iterations. If High-AI PRs have higher review cycles, it may indicate increased reviewer pushback or uncertainty about AI-generated changes.
A corresponding rise in the rework stage of the lifecycle often supports this interpretation.
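Here is a minimal Python sketch of how review cycles could be counted from an ordered PR event timeline, matching the open → comment → commit → approve loop described above. The event model is hypothetical:

```python
# A sketch of counting review cycles from a chronological event list.
# Each completed comment -> commit loop adds one cycle, per the
# definition above. The event representation is hypothetical.
def review_cycles(events: list[str]) -> int:
    cycles = 1               # opening the PR starts the first cycle
    awaiting_rework = False
    for event in events:     # events: "comment", "commit", "approve"
        if event == "comment":
            awaiting_rework = True
        elif event == "commit" and awaiting_rework:
            cycles += 1      # a full comment -> commit loop completed
            awaiting_rework = False
    return cycles

print(review_cycles(["approve"]))                       # -> 1 cycle
print(review_cycles(["comment", "commit", "approve"]))  # -> 2 cycles
```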
Understanding Report Features
Hotspots Section
Bright Spots: Segments (teams/people/repos) showing the most positive change with AI usage
Example: "Alice Johnson (+15%)" - Alice's velocity improved 15% with AI
Pressure Points: Segments showing negative changes
Example: "Backend Team (-8%)" - Backend team's review cycles increased 8%
Impact Metrics Display
Shows delta (change %) between baseline and comparison groups
Visual indicators for positive/negative trends
Dynamically updates based on your baseline/comparison selection
Data Tables
Sortable, filterable breakdowns by Group, Person, or Repo
Shows absolute values and comparison deltas
Export-friendly for further analysis
FAQ
Q: Why don't some PRs show AI usage?
A: PRs must meet qualification criteria—at least 30% of non-ignored lines in supported languages and code chunks long enough to classify. Turn on "Show Unknown PRs" to see filtered PRs.
Q: What languages are supported?
A: Python, TypeScript, JavaScript, Ruby, Java, C#, Go, Kotlin, and Swift. Don't see your language listed? Reach out to your Span rep to see if it's on the roadmap.
Q: How reliable is AI detection?
A: Very reliable at ~95% accuracy.
Q: Does this work with all AI tools?
A: Yes! Detection is tool-agnostic—it identifies AI-generated code patterns regardless of which tool (Copilot, Cursor, ChatGPT, etc.) was used.
Q: Can I compare multiple teams?
A: Yes. Use the breakdown dimensions to drill down by team, and apply filters to compare specific groups.
Q: How do benchmarks work?
A: Span calculates industry percentiles (P50, P75, P90) from anonymized data across similar organizations. Your metrics are automatically compared to these benchmarks.
Q: Are contractors included?
A: Only qualifying developers are included in analysis.
Q: Can I export this data?
A: Data tables support standard interactions. Contact your Span CSM for custom export options.
Q: What if we don't use AI tools yet?
A: The report establishes a baseline. You can track adoption trends as your org starts using AI coding assistants.
Q: Does this track local/uncommitted AI usage?
A: No. Analysis only covers merged PRs—it doesn't track AI usage in development branches or local work not yet committed.
Limitations & Important Notes
Beta Status: Report is currently in Beta—features may evolve
Language Support: Only works with supported programming languages
Minimum Chunk Size: Code blocks must meet minimum length requirements
PR-Centric: Only analyzes merged PRs, not in-progress work
Qualification Filtering: 30% supported code threshold ensures data quality
Related Features
AI Tools Report - Track adoption of specific coding assistant tools
AI Work Classification - Categorizes work as Features, Maintenance, or Productivity
Standard Velocity Reports - Traditional engineering productivity metrics