Weighted PRs/Week

Last updated: March 16, 2026

What It Measures

Weighted PRs / week measures the number of pull requests merged per week per active contributor, with each pull request scaled by its size and complexity. Unlike simple PR counting, this metric weights each pull request to account for the fact that different PRs represent different amounts of work.

The metric helps you answer: "How much code delivery work is our team completing, scaled by effort and complexity?"

How It's Calculated

The Formula

The metric uses this calculation: 

Weighted PRs per Week = (Total Weighted PRs ÷ Active Coding Days) × 7

This normalizes work output to a weekly rate, accounting for developers with different schedules.
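As a minimal sketch of this normalization (the function below is purely illustrative, not part of any product API), assuming the weighted total and active-day count are already available:

    def weighted_prs_per_week(total_weighted_prs: float, active_coding_days: int) -> float:
        """Normalize weighted PR output to a weekly rate."""
        if active_coding_days == 0:
            return 0.0  # assumption: no active days means no measurable rate
        # Divide by active coding days to get a daily rate, then scale to a 7-day week
        return (total_weighted_prs / active_coding_days) * 7

    # Example: 3 weighted PRs over 2 active coding days -> 10.5 per week
    print(weighted_prs_per_week(3.0, 2))  # 10.5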

PR Weighting

Each merged PR receives a weight calculated by combining complexity and size scores: 

  • Complexity Score (0–1): An AI-derived measure of the logical complexity of code changes

  • Size Score (0–1): A measure of code volume, calculated logarithmically from lines of code changed, capped at 1.0

These are summed together to produce a PR weight (typically 0 to 2, where 1 represents an average PR). The size scoring uses logarithmic scaling, meaning that a PR with approximately 32 lines of code receives a score of 0.5 (representing the geometric mean of PR sizes), while PRs with 1,000+ lines reach the maximum size score of 1.0. 
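For intuition, here is a rough sketch of such a weight calculation. The complexity model is internal to the product, and the logarithmic constants below are assumptions chosen only to match the anchors mentioned above (roughly 0.5 at 32 lines, 1.0 at 1,000+ lines):

    import math

    def size_score(lines_changed: int) -> float:
        """Logarithmic size score in [0, 1]; assumed scale is log10(lines) / log10(1000)."""
        if lines_changed <= 1:
            return 0.0
        return min(1.0, math.log10(lines_changed) / math.log10(1000))

    def pr_weight(complexity_score: float, lines_changed: int) -> float:
        """Sum the AI-derived complexity score (0-1) and the size score (0-1)."""
        return complexity_score + size_score(lines_changed)

    print(round(size_score(32), 2))       # ~0.5
    print(round(pr_weight(0.6, 200), 2))  # ~1.37 for a moderately complex 200-line PR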

Why this approach? Raw PR counts incentivize quantity over quality. Weighting by complexity and size reflects actual work effort, so completing one complex feature counts as more productive than merging many trivial fixes.

Active Contributor Filtering

The metric only counts PRs from developers meeting specific criteria: 

  • Currently employed in your organization

  • Have had recent code contribution activity

  • Not marked as out of office on the day the PR was merged

  • Not excluded from your organization's contributor lists

This filtering ensures the metric reflects actual development capacity without distortion from leave or inactive periods.
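As a loose illustration of that filtering (the field names below are hypothetical, not the product's actual data model):

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class Contributor:
        employed: bool
        recently_active: bool
        excluded: bool
        out_of_office: set[date] = field(default_factory=set)

    def counts_toward_metric(author: Contributor, merged_on: date) -> bool:
        """Apply the active-contributor criteria to a single merged PR."""
        return (
            author.employed
            and author.recently_active
            and not author.excluded
            and merged_on not in author.out_of_office
        )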

Weekly Normalization

Total weighted PRs is divided by the count of active coding days, then multiplied by 7. This ensures fair comparison between developers with different schedules. For example, a developer with only 2 active working days who completes 3 weighted PRs would show as 10.5 weighted PRs/week, reflecting their output extrapolated to a full week.

What This Metric Means

Interpreting Your Results

Typical ranges (weighted PRs/week) and their interpretation:

  • 0–2: Lower throughput; may reflect focus on complex work, onboarding, or process constraints

  • 2–5: Healthy throughput; typical for balanced workloads with mixed work types

  • 5–8: Higher throughput; strong delivery momentum and capacity

  • 8+: Very high throughput; exceptional sustained velocity

Context Matters

Different project phases and work types naturally produce different throughput:

  • New feature development: Moderate throughput expected while building systems

  • Maintenance phase: Lower throughput acceptable with emphasis on stability

  • Bug fixes and optimization: Higher throughput common with smaller, focused scope

  • Architecture or refactoring: Temporarily lower throughput due to complexity of changes

What It Doesn't Directly Measure

  • Code quality: High throughput doesn't guarantee low defect rates

  • Review speed: Doesn't measure code review turnaround time

  • Testing coverage: No visibility into test rigor or coverage levels

  • Team collaboration: Doesn't account for collaboration patterns or dependencies

  • Business value: A higher PR count doesn't automatically mean more customer value

Why Weight PRs?

Problem with Raw PR Counts

Counting unweighted PRs creates misaligned incentives:

  • 20 one-line fixes appear more productive than 1 complex 2,000-line refactor

  • Teams optimize for PR volume rather than meaningful progress

  • Different work types appear artificially equivalent

Solution: Weighting by Complexity and Size

By scaling PRs based on both complexity and size:

  • Reflects actual effort: Substantial work is recognized as such

  • Accounts for difficulty: Complex changes receive appropriate weight

  • Reduces perverse incentives: Encourages meaningful work over volume

  • Enables fair comparison: Different work types measured on a common scale

Best Practices

Do

  • Track trends over time: Use this metric to identify acceleration, plateaus, or declines in delivery velocity

  • Combine with other metrics: Pair with cycle time and quality metrics for complete insights

  • Account for context: Recognize that different projects naturally produce different throughput

  • Use rolling averages: Smooth out week-to-week variation with multi-week moving averages (see the sketch after this list)

  • Segment meaningfully: Analyze by project or team rather than organization-wide totals alone
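For instance, a simple trailing moving average over weekly values (a generic sketch, not tied to any particular export from the tool):

    def rolling_average(weekly_values: list[float], window: int = 4) -> list[float]:
        """Trailing moving average that smooths week-to-week variation."""
        smoothed = []
        for i in range(len(weekly_values)):
            start = max(0, i - window + 1)
            chunk = weekly_values[start : i + 1]
            smoothed.append(sum(chunk) / len(chunk))
        return smoothed

    # Noisy weekly values flatten into a steadier trend line
    print(rolling_average([3.0, 7.5, 2.0, 6.5, 4.0]))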

Don't

  • Use for performance reviews: This is one signal among many; never evaluate solely on this metric

  • Set hard velocity targets: Metrics provide insight, not mandates; targets can create unintended consequences

  • Ignore context: Investigate metric changes before drawing conclusions

  • Compare across different teams: Team composition and project types differ significantly

  • Assume causation: Correlation with tool or process changes requires investigation

Real-World Examples

Example 1: Strong Consistent Delivery

Developer: Alex (5 active working days)
PRs merged: 8 total
Total weighted work: 6.4 units

Result: (6.4 ÷ 5) × 7 ≈ 9.0 weighted PRs/week

Interpretation: Strong, consistent delivery of mixed-complexity work.

Example 2: High-Complexity Focus

Developer: Jordan (4 active working days)
PRs merged: 2 total  
Total weighted work: 3.8 units

Result: (3.8 ÷ 4) × 7 ≈ 6.7 weighted PRs/week

Interpretation: Fewer PRs but each represented substantial, complex work.

Example 3: Schedule Normalization

Developer: Casey (2 active working days; 3 days out of office)
PRs merged: 5 total
Total weighted work: 4.2 units

Result: (4.2 ÷ 2) × 7 ≈ 14.7 weighted PRs/week

Interpretation: The metric normalizes for available time, showing extrapolated weekly capacity.
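The three results above can be reproduced with the same normalization sketched earlier (a quick arithmetic check, not product code):

    examples = {"Alex": (6.4, 5), "Jordan": (3.8, 4), "Casey": (4.2, 2)}

    for name, (weighted_total, active_days) in examples.items():
        weekly_rate = (weighted_total / active_days) * 7
        print(f"{name}: {weekly_rate:.2f} weighted PRs/week")
    # Alex: 8.96, Jordan: 6.65, Casey: 14.70 -- matching the rounded values above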

Troubleshooting & Investigation

Low Weighted PRs/Week

Possible contributing factors:

  • Recently joined organization

  • Focused on complex, multi-step work

  • Scheduled leave or out-of-office time

  • Code review bottlenecks or dependencies

  • Working on non-merged branches

How to investigate:

  1. Compare against raw PR count (should correlate)

  2. Review average PR scope in the period

  3. Check active working days vs. calendar days

  4. Examine specific PRs for patterns in scope or complexity

Unexpected Spikes

Possible causes:

  • Increase in smaller-scope PRs or bug fixes

  • Return from leave (catch-up period)

  • Sprint or release boundary timing

  • Major feature completion

How to interpret:

  • Verify against project timelines and planned work

  • Consider whether this represents a new normal or a peak

  • Watch quality metrics if spikes are sustained

Missing Data

Common reasons:

  • VCS integration not connected

  • No PRs merged in the time period

  • Contributor not classified as developer type

  • Time period falls during leave

Remediation:

  • Verify VCS integration is connected

  • Check contributor classification

  • Adjust time range to include merged work

  • Confirm leave is correctly recorded

Related Metrics

Weighted PRs / week is most valuable alongside:

  • PR Cycle Time: Speed from creation to merge

  • Code Review Response Time: Review process health

  • Defect Ratio: Quality indicator

  • Total Merged PRs: Raw PR count for comparison

  • Active Coding Days: Baseline activity level

Together, these provide a complete picture of delivery velocity, process efficiency, and code quality.

Comparison Groups: Three Benchmarking Modes

Span supports three distinct comparison groups for percentile benchmarking:

1. Organization Benchmarks (Default)

Your metrics are compared against all contributors in your entire organization. This is the default mode: unless you change it, the comparison group is the whole company.

2. Industry Benchmarks

Your metrics are compared against aggregate data from Span's entire customer base. This feature must be explicitly enabled by administrators. When you select industry benchmarks, the system automatically clears any person or team filters to ensure you're comparing against the full industry average.

3. Dimension-Based Benchmarks

You can select a specific dimension (such as a team, role, or organizational unit) as your comparison group. When you select dimension-based benchmarks, you're comparing only against members of that specific dimension, not the whole company.

Important: Percentiles are NOT automatically segmented by team or role. You explicitly choose which comparison group to use—whole company, industry, or a specific dimension.

Interpreting Your Percentile Rank

Since weighted PRs/week is designated as a positive metric (higher is better): 

  • 0–25th: Below average; lower output than most peers

  • 25th–50th: Somewhat below average

  • 50th (median): Average; at the exact center of your comparison group

  • 50th–75th: Somewhat above average

  • 75th–90th: Well above average; top performer

  • 90th+: Among the highest performers in your comparison group

Example: If you're at the 90th percentile in organization mode, you produce more weighted PRs than approximately 90% of everyone in your company. If you switch to a team-based benchmark, you'd be comparing only against that team's members.
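For intuition, a percentile rank can be approximated as the share of the comparison group with a lower value (a simplified sketch; the product may use a different interpolation method):

    def percentile_rank(your_value: float, comparison_group: list[float]) -> float:
        """Percent of the comparison group whose weighted PRs/week falls below yours."""
        if not comparison_group:
            return 0.0
        below = sum(1 for v in comparison_group if v < your_value)
        return 100 * below / len(comparison_group)

    # Against a 10-person organization, 6.0 lands at roughly the 80th percentile
    org = [1.2, 2.0, 2.5, 3.1, 3.8, 4.4, 5.0, 5.6, 6.3, 7.1]
    print(percentile_rank(6.0, org))  # 80.0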

Key Points to Remember

  • Default comparison = whole company: Unless you explicitly select a dimension or industry benchmark, you're being compared against everyone in your organization

  • Mutually exclusive modes: Selecting one benchmarking mode automatically clears filters from the others to prevent confusion

  • Only developers included: The metric only counts merged PRs from active developers (excluding out-of-office days)

  • Percentiles shift over time: As your organization changes, percentile thresholds recalculate—you can't directly compare percentile rankings across different time periods

In short: by default the metric compares you against the whole company, but you have the flexibility to change that to a specific team, role, or other dimension if you want more targeted comparisons.