Weighted PRs/Week
Last updated: March 16, 2026
What It Measures
Weighted PRs / week measures the number of pull requests scaled by size and complexity per week per active contributor. Unlike simple PR counting, this metric weights each pull request to account for the fact that different PRs represent different amounts of work.
The metric helps you answer: "How much code delivery work is our team completing, scaled by effort and complexity?"
How It's Calculated
The Formula
The metric uses this calculation:
Total Weighted PRs ÷ Active Coding Days × 7 = Weighted PRs per Week

This normalizes work output to a weekly rate, accounting for developers with different schedules.
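As a minimal sketch, the same calculation in code (the function name and inputs below are illustrative, not Span's actual API):

```python
def weighted_prs_per_week(total_weighted_prs: float, active_coding_days: int) -> float:
    """Normalize total weighted PR output to a weekly rate."""
    if active_coding_days <= 0:
        return 0.0  # no active days in the period, so there is no rate to report
    return (total_weighted_prs / active_coding_days) * 7

# e.g. weighted_prs_per_week(6.4, 5) -> ~9.0
```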
PR Weighting
Each merged PR receives a weight calculated by combining complexity and size scores:
Complexity Score (0–1): An AI-derived measure of the logical complexity of code changes
Size Score (0–1): A measure of code volume, calculated logarithmically from lines of code changed, capped at 1.0
These are summed together to produce a PR weight (typically 0 to 2, where 1 represents an average PR). The size scoring uses logarithmic scaling, meaning that a PR with approximately 32 lines of code receives a score of 0.5 (representing the geometric mean of PR sizes), while PRs with 1,000+ lines reach the maximum size score of 1.0.
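A rough sketch of how such a weight could be assembled from the two scores. The log base and cap are assumptions chosen to match the stated anchors (~32 changed lines → 0.5, ~1,000+ lines → 1.0); Span's exact scoring may differ:

```python
import math

def size_score(lines_changed: int) -> float:
    """Logarithmic size score in [0, 1]; ~32 lines -> 0.5, ~1,000+ lines -> ~1.0 (assumed log2/10 scaling)."""
    if lines_changed <= 1:
        return 0.0
    return min(1.0, math.log2(lines_changed) / 10)

def pr_weight(complexity_score: float, lines_changed: int) -> float:
    """Combine the AI-derived complexity score (0-1) with the size score (0-1)."""
    return complexity_score + size_score(lines_changed)

# Example: a PR with complexity 0.5 touching ~120 lines gets size_score ~0.69,
# so its weight lands near 1.2 -- slightly above an "average" PR weight of 1.
```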
Why this approach? Raw PR counts incentivize quantity over quality. Weighting by complexity and size reflects actual work effort, so completing one complex feature counts as more productive than merging many trivial fixes.
Active Contributor Filtering
The metric only counts PRs from developers meeting specific criteria:
Currently employed in your organization
Have had recent code contribution activity
Not marked as out of office on the day the PR was merged
Not excluded from your organization's contributor lists
This filtering ensures the metric reflects actual development capacity without distortion from leave or inactive periods.
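As an illustration only, this filtering can be thought of as a simple predicate applied per PR; the field names below are placeholders, not Span's data model:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Contributor:
    # Illustrative fields only; not Span's actual schema.
    is_employed: bool
    has_recent_activity: bool
    excluded: bool
    out_of_office_days: set[date]

def counts_toward_metric(contributor: Contributor, merged_on: date) -> bool:
    """Return True if a PR merged on `merged_on` should count for this contributor."""
    return (
        contributor.is_employed
        and contributor.has_recent_activity
        and not contributor.excluded
        and merged_on not in contributor.out_of_office_days
    )
```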
Weekly Normalization
Total weighted PRs is divided by the count of active coding days, then multiplied by 7. This ensures fair comparison between developers with different schedules. For example, a developer with only 2 active working days who completes 3 weighted PRs would show as (3 ÷ 2) × 7 = 10.5 weighted PRs/week, reflecting their extrapolated full-week rate rather than penalizing the days they weren't working.
What This Metric Means
Interpreting Your Results
| Weighted PRs/Week | Interpretation |
| --- | --- |
| 0–2 | Lower throughput; may reflect focus on complex work, onboarding, or process constraints |
| 2–5 | Healthy throughput; typical for balanced workloads with mixed work types |
| 5–8 | Higher throughput; strong delivery momentum and capacity |
| 8+ | Very high throughput; exceptional sustained velocity |
Context Matters
Different project phases and work types naturally produce different throughput:
New feature development: Moderate throughput expected while building systems
Maintenance phase: Lower throughput acceptable with emphasis on stability
Bug fixes and optimization: Higher throughput common with smaller, focused scope
Architecture or refactoring: Temporarily lower throughput due to complexity of changes
What It Doesn't Directly Measure
Code quality: High throughput doesn't guarantee low defect rates
Review speed: Doesn't measure code review turnaround time
Testing coverage: No visibility into test rigor or coverage levels
Team collaboration: Doesn't account for collaboration patterns or dependencies
Business value: More PRs doesn't automatically mean more customer value
Why Weight PRs?
Problem with Raw PR Counts
Counting unweighted PRs creates misaligned incentives:
20 one-line fixes appear more productive than 1 complex 2,000-line refactor
Teams optimize for PR volume rather than meaningful progress
Different work types appear artificially equivalent
Solution: Weighting by Complexity and Size
By scaling PRs based on both complexity and size:
Reflects actual effort: Substantial work is recognized as such
Accounts for difficulty: Complex changes receive appropriate weight
Reduces perverse incentives: Encourages meaningful work over volume
Enables fair comparison: Different work types measured on a common scale
Best Practices
✅ Do
Track trends over time: Use this metric to identify acceleration, plateaus, or declines in delivery velocity
Combine with other metrics: Pair with cycle time and quality metrics for complete insights
Account for context: Recognize that different projects naturally produce different throughput
Use rolling averages: Smooth out week-to-week variation with multi-week moving averages (see the sketch after this list)
Segment meaningfully: Analyze by project or team rather than organization-wide totals alone
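For instance, a simple trailing 4-week moving average can smooth noisy weekly values (the window length here is an arbitrary choice for illustration):

```python
def rolling_average(weekly_values: list[float], window: int = 4) -> list[float]:
    """Trailing moving average over the last `window` weeks of weighted PRs/week."""
    smoothed = []
    for i in range(len(weekly_values)):
        chunk = weekly_values[max(0, i - window + 1) : i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# e.g. rolling_average([3.1, 5.4, 2.2, 4.8, 6.0]) dampens week-to-week spikes
```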
❌ Don't
Use for performance reviews: This is one signal among many; never evaluate solely on this metric
Set hard velocity targets: Metrics provide insight, not mandates; targets can create unintended consequences
Ignore context: Investigate metric changes before drawing conclusions
Compare across different teams: Team composition and project types differ significantly
Assume causation: Correlation with tool or process changes requires investigation
Real-World Examples
Example 1: Strong Consistent Delivery
Developer: Alex (5 active working days)
PRs merged: 8 total
Total weighted work: 6.4 units
Result: (6.4 ÷ 5) × 7 ≈ 9.0 weighted PRs/week
Interpretation: Strong, consistent delivery of mixed-complexity work.
Example 2: High-Complexity Focus
Developer: Jordan (4 active working days)
PRs merged: 2 total
Total weighted work: 3.8 units
Result: (3.8 ÷ 4) × 7 ≈ 6.7 weighted PRs/week
Interpretation: Fewer PRs but each represented substantial, complex work.
Example 3: Schedule Normalization
Developer: Casey (2 active working days; 3 days out of office)
PRs merged: 5 total
Total weighted work: 4.2 units
Result: (4.2 ÷ 2) × 7 ≈ 14.7 weighted PRs/week
Interpretation: The metric normalizes for available time, showing extrapolated weekly capacity.
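All three results can be reproduced with the normalization formula shown earlier (again a sketch, not Span's implementation):

```python
def weighted_prs_per_week(total_weighted: float, active_days: int) -> float:
    return (total_weighted / active_days) * 7

print(weighted_prs_per_week(6.4, 5))  # Alex:   ~9.0
print(weighted_prs_per_week(3.8, 4))  # Jordan: ~6.7
print(weighted_prs_per_week(4.2, 2))  # Casey:  ~14.7
```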
Troubleshooting & Investigation
Low Weighted PRs/Week
Possible contributing factors:
Recently joined organization
Focused on complex, multi-step work
Scheduled leave or out-of-office time
Code review bottlenecks or dependencies
Working on non-merged branches
How to investigate:
Compare against raw PR count (should correlate)
Review average PR scope in the period
Check active working days vs. calendar days
Examine specific PRs for patterns in scope or complexity
Unexpected Spikes
Possible causes:
Increase in smaller-scope PRs or bug fixes
Return from leave (catch-up period)
Sprint or release boundary timing
Major feature completion
How to interpret:
Verify against project timelines and planned work
Consider whether this represents a new normal or a peak
Watch quality metrics if spikes are sustained
Missing Data
Common reasons:
VCS integration not connected
No PRs merged in the time period
Contributor not classified as developer type
Time period falls during leave
Remediation:
Verify VCS integration is connected
Check contributor classification
Adjust time range to include merged work
Confirm leave is correctly recorded
Related Metrics
Weighted PRs / week is most valuable alongside:
PR Cycle Time: Speed from creation to merge
Code Review Response Time: Review process health
Defect Ratio: Quality indicator
Total Merged PRs: Raw PR count for comparison
Active Coding Days: Baseline activity level
Together, these provide a complete picture of delivery velocity, process efficiency, and code quality.
Comparison Groups: Three Benchmarking Modes
Span supports three distinct comparison groups for percentile benchmarking:
1. Organization Benchmarks (Default)
Your metrics are compared against all contributors in your entire organization. This is the default mode: unless you change it, you're benchmarked against the whole company.
2. Industry Benchmarks
Your metrics are compared against aggregate data from Span's entire customer base. This feature must be explicitly enabled by administrators. When you select industry benchmarks, the system automatically clears any person or team filters to ensure you're comparing against the full industry average.
3. Dimension-Based Benchmarks
You can select a specific dimension (such as a team, role, or organizational unit) as your comparison group. When you select dimension-based benchmarks, you're comparing only against members of that specific dimension, not the whole company.
Important: Percentiles are NOT automatically segmented by team or role. You explicitly choose which comparison group to use—whole company, industry, or a specific dimension.
Interpreting Your Percentile Rank
Since weighted PRs/week is designated as a positive metric (higher is better):
| Percentile Rank | Interpretation |
| --- | --- |
| 0–25th | Below average; lower output than most peers |
| 25th–50th | Somewhat below average |
| 50th (median) | Average; at the exact center of your comparison group |
| 50th–75th | Somewhat above average |
| 75th–90th | Well above average; top performer |
| 90th+ | Among the highest performers in your comparison group |
Example: If you're at the 90th percentile in organization mode, you produce more weighted PRs than approximately 90% of everyone in your company. If you switch to a team-based benchmark, you'd be comparing only against that team's members.
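As a rough illustration of how a percentile rank can be derived from a chosen comparison group (a generic calculation, not necessarily Span's exact method):

```python
def percentile_rank(your_value: float, comparison_group: list[float]) -> float:
    """Percent of the comparison group with a lower weighted PRs/week than yours."""
    if not comparison_group:
        return 0.0
    below = sum(1 for v in comparison_group if v < your_value)
    return 100 * below / len(comparison_group)

# The same value can yield different ranks depending on the group you choose, e.g.
# percentile_rank(6.5, org_values) vs. percentile_rank(6.5, team_values) --
# the comparison group changes, not your output.
```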
Key Points to Remember
Default comparison = whole company: Unless you explicitly select a dimension or industry benchmark, you're being compared against everyone in your organization
Mutually exclusive modes: Selecting one benchmarking mode automatically clears filters from the others to prevent confusion
Only developers included: The metric only counts merged PRs from active developers (excluding out-of-office days)
Percentiles shift over time: As your organization changes, percentile thresholds recalculate—you can't directly compare percentile rankings across different time periods
In short: by default the comparison is against the whole company, but you can switch to a specific team, role, or other dimension when you want a more targeted comparison.