Weighted PRs/Week

Last updated: March 16, 2026

What It Measures

Weighted PRs / week measures the number of pull requests merged per week per active contributor, with each pull request scaled by its size and complexity. Unlike simple PR counting, this metric weights each pull request to account for the fact that different PRs represent different amounts of work.

The metric helps you answer: "How much code delivery work is our team completing, scaled by effort and complexity?"

How It's Calculated

The Formula

The metric uses this calculation: 

Weighted PRs per Week = (Total Weighted PRs ÷ Active Coding Days) × 7

This normalizes work output to a weekly rate, accounting for developers with different schedules.
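As a minimal sketch of this normalization (the function below is purely illustrative, not part of any product API), assuming the weighted total and active-day count are already available:

    def weighted_prs_per_week(total_weighted_prs: float, active_coding_days: int) -> float:
        """Normalize weighted PR output to a weekly rate."""
        if active_coding_days == 0:
            return 0.0  # assumption: no active days means no measurable rate
        # Divide by active coding days to get a daily rate, then scale to a 7-day week
        return (total_weighted_prs / active_coding_days) * 7

    # Example: 3 weighted PRs over 2 active coding days -> 10.5 per week
    print(weighted_prs_per_week(3.0, 2))  # 10.5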

PR Weighting

Each merged PR receives a weight calculated by combining complexity and size scores: 

  • Complexity Score (0–1): An AI-derived measure of the logical complexity of code changes

  • Size Score (0–1): A measure of code volume, calculated logarithmically from lines of code changed, capped at 1.0

These are summed together to produce a PR weight (typically 0 to 2, where 1 represents an average PR). The size scoring uses logarithmic scaling, meaning that a PR with approximately 32 lines of code receives a score of 0.5 (representing the geometric mean of PR sizes), while PRs with 1,000+ lines reach the maximum size score of 1.0. 
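For intuition, here is a rough sketch of such a weight calculation. The complexity model is internal to the product, and the logarithmic constants below are assumptions chosen only to match the anchors mentioned above (roughly 0.5 at 32 lines, 1.0 at 1,000+ lines):

    import math

    def size_score(lines_changed: int) -> float:
        """Logarithmic size score in [0, 1]; assumed scale is log10(lines) / log10(1000)."""
        if lines_changed <= 1:
            return 0.0
        return min(1.0, math.log10(lines_changed) / math.log10(1000))

    def pr_weight(complexity_score: float, lines_changed: int) -> float:
        """Sum the AI-derived complexity score (0-1) and the size score (0-1)."""
        return complexity_score + size_score(lines_changed)

    print(round(size_score(32), 2))       # ~0.5
    print(round(pr_weight(0.6, 200), 2))  # ~1.37 for a moderately complex 200-line PR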

Why this approach? Raw PR counts incentivize quantity over quality. Weighting by complexity and size reflects actual work effort, so completing one complex feature counts as more productive than merging many trivial fixes.

Active Contributor Filtering

The metric only counts PRs from developers meeting specific criteria: 

  • Currently employed in your organization

  • Have had recent code contribution activity

  • Not marked as out of office on the day the PR was merged

  • Not excluded from your organization's contributor lists

This filtering ensures the metric reflects actual development capacity without distortion from leave or inactive periods.
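As a loose illustration of that filtering (the field names below are hypothetical, not the product's actual data model):

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class Contributor:
        employed: bool
        recently_active: bool
        excluded: bool
        out_of_office: set[date] = field(default_factory=set)

    def counts_toward_metric(author: Contributor, merged_on: date) -> bool:
        """Apply the active-contributor criteria to a single merged PR."""
        return (
            author.employed
            and author.recently_active
            and not author.excluded
            and merged_on not in author.out_of_office
        )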

Weekly Normalization

Total weighted PRs is divided by the count of active coding days, then multiplied by 7. This ensures fair comparison between developers with different schedules. For example, a developer with only 2 active working days who completes 3 weighted PRs would show as 10.5 weighted PRs/week, reflecting their output extrapolated to a full week.

What This Metric Means

Interpreting Your Results

Typical ranges (weighted PRs/week) and their interpretation:

  • 0–2: Lower throughput; may reflect focus on complex work, onboarding, or process constraints

  • 2–5: Healthy throughput; typical for balanced workloads with mixed work types

  • 5–8: Higher throughput; strong delivery momentum and capacity

  • 8+: Very high throughput; exceptional sustained velocity

Context Matters

Different project phases and work types naturally produce different throughput:

  • New feature development: Moderate throughput expected while building systems

  • Maintenance phase: Lower throughput acceptable with emphasis on stability

  • Bug fixes and optimization: Higher throughput common with smaller, focused scope

  • Architecture or refactoring: Temporarily lower throughput due to complexity of changes

What It Doesn't Directly Measure

  • Code quality: High throughput doesn't guarantee low defect rates

  • Review speed: Doesn't measure code review turnaround time

  • Testing coverage: No visibility into test rigor or coverage levels

  • Team collaboration: Doesn't account for collaboration patterns or dependencies

  • Business value: A higher PR count doesn't automatically mean more customer value

Why Weight PRs?

Problem with Raw PR Counts

Counting unweighted PRs creates misaligned incentives:

  • 20 one-line fixes appear more productive than 1 complex 2,000-line refactor

  • Teams optimize for PR volume rather than meaningful progress

  • Different work types appear artificially equivalent

Solution: Weighting by Complexity and Size

By scaling PRs based on both complexity and size:

  • Reflects actual effort: Substantial work is recognized as such

  • Accounts for difficulty: Complex changes receive appropriate weight

  • Reduces perverse incentives: Encourages meaningful work over volume

  • Enables fair comparison: Different work types measured on a common scale

Best Practices

Do

  • Track trends over time: Use this metric to identify acceleration, plateaus, or declines in delivery velocity

  • Combine with other metrics: Pair with cycle time and quality metrics for complete insights

  • Account for context: Recognize that different projects naturally produce different throughput

  • Use rolling averages: Smooth out week-to-week variation with multi-week moving averages (see the sketch after this list)

  • Segment meaningfully: Analyze by project or team rather than organization-wide totals alone
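For instance, a simple trailing moving average over weekly values (a generic sketch, not tied to any particular export from the tool):

    def rolling_average(weekly_values: list[float], window: int = 4) -> list[float]:
        """Trailing moving average that smooths week-to-week variation."""
        smoothed = []
        for i in range(len(weekly_values)):
            start = max(0, i - window + 1)
            chunk = weekly_values[start : i + 1]
            smoothed.append(sum(chunk) / len(chunk))
        return smoothed

    # Noisy weekly values flatten into a steadier trend line
    print(rolling_average([3.0, 7.5, 2.0, 6.5, 4.0]))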

Don't

  • Use for performance reviews: This is one signal among many; never evaluate solely on this metric

  • Set hard velocity targets: Metrics provide insight, not mandates; targets can create unintended consequences

  • Ignore context: Investigate metric changes before drawing conclusions

  • Compare across different teams: Team composition and project types differ significantly

  • Assume causation: Correlation with tool or process changes requires investigation

Real-World Examples

Example 1: Strong Consistent Delivery

Developer: Alex (5 active working days)
PRs merged: 8 total
Total weighted work: 6.4 units

Result: (6.4 ÷ 5) × 7 ≈ 9.0 weighted PRs/week

Interpretation: Strong, consistent delivery of mixed-complexity work.

Example 2: High-Complexity Focus

Developer: Jordan (4 active working days)
PRs merged: 2 total  
Total weighted work: 3.8 units

Result: (3.8 ÷ 4) × 7 ≈ 6.7 weighted PRs/week

Interpretation: Fewer PRs but each represented substantial, complex work.

Example 3: Schedule Normalization

Developer: Casey (2 active working days; 3 days out of office)
PRs merged: 5 total
Total weighted work: 4.2 units

Result: (4.2 ÷ 2) × 7 ≈ 14.7 weighted PRs/week

Interpretation: The metric normalizes for available time, showing extrapolated weekly capacity.
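The three results above can be reproduced with the same normalization sketched earlier (a quick arithmetic check, not product code):

    examples = {"Alex": (6.4, 5), "Jordan": (3.8, 4), "Casey": (4.2, 2)}

    for name, (weighted_total, active_days) in examples.items():
        weekly_rate = (weighted_total / active_days) * 7
        print(f"{name}: {weekly_rate:.2f} weighted PRs/week")
    # Alex: 8.96, Jordan: 6.65, Casey: 14.70 -- matching the rounded values above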

Troubleshooting & Investigation

Low Weighted PRs/Week

Possible contributing factors:

  • Recently joined organization

  • Focused on complex, multi-step work

  • Scheduled leave or out-of-office time

  • Code review bottlenecks or dependencies

  • Working on non-merged branches

How to investigate:

  1. Compare against raw PR count (should correlate)

  2. Review average PR scope in the period

  3. Check active working days vs. calendar days

  4. Examine specific PRs for patterns in scope or complexity

Unexpected Spikes

Possible causes:

  • Increase in smaller-scope PRs or bug fixes

  • Return from leave (catch-up period)

  • Sprint or release boundary timing

  • Major feature completion

How to interpret:

  • Verify against project timelines and planned work

  • Consider whether this represents a new normal or a peak

  • Watch quality metrics if spikes are sustained

Missing Data

Common reasons:

  • VCS integration not connected

  • No PRs merged in the time period

  • Contributor not classified as developer type

  • Time period falls during leave

Remediation:

  • Verify VCS integration is connected

  • Check contributor classification

  • Adjust time range to include merged work

  • Confirm leave is correctly recorded

Related Metrics

Weighted PRs / week is most valuable alongside:

  • PR Cycle Time: Speed from creation to merge

  • Code Review Response Time: Review process health

  • Defect Ratio: Quality indicator

  • Total Merged PRs: Raw PR count for comparison

  • Active Coding Days: Baseline activity level

Together, these provide a complete picture of delivery velocity, process efficiency, and code quality.

Comparison Groups: Three Benchmarking Modes

Span supports three distinct comparison groups for percentile benchmarking:

1. Organization Benchmarks (Default)

Your metrics are compared against all contributors in your entire organization. This is the default mode: unless you change it, the comparison group is the whole company.

2. Industry Benchmarks

Your metrics are compared against aggregate data from Span's entire customer base. This feature must be explicitly enabled by administrators. When you select industry benchmarks, the system automatically clears any person or team filters to ensure you're comparing against the full industry average.

3. Dimension-Based Benchmarks

You can select a specific dimension (such as a team, role, or organizational unit) as your comparison group. When you select dimension-based benchmarks, you're comparing only against members of that specific dimension, not the whole company.

Important: Percentiles are NOT automatically segmented by team or role. You explicitly choose which comparison group to use—whole company, industry, or a specific dimension.

Interpreting Your Percentile Rank

Since weighted PRs/week is designated as a positive metric (higher is better): 

  • 0–25th: Below average; lower output than most peers

  • 25th–50th: Somewhat below average

  • 50th (median): Average; at the exact center of your comparison group

  • 50th–75th: Somewhat above average

  • 75th–90th: Well above average; top performer

  • 90th+: Among the highest performers in your comparison group

Example: If you're at the 90th percentile in organization mode, you produce more weighted PRs than approximately 90% of everyone in your company. If you switch to a team-based benchmark, you'd be comparing only against that team's members.
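For intuition, a percentile rank can be approximated as the share of the comparison group with a lower value (a simplified sketch; the product may use a different interpolation method):

    def percentile_rank(your_value: float, comparison_group: list[float]) -> float:
        """Percent of the comparison group whose weighted PRs/week falls below yours."""
        if not comparison_group:
            return 0.0
        below = sum(1 for v in comparison_group if v < your_value)
        return 100 * below / len(comparison_group)

    # Against a 10-person organization, 6.0 lands at roughly the 80th percentile
    org = [1.2, 2.0, 2.5, 3.1, 3.8, 4.4, 5.0, 5.6, 6.3, 7.1]
    print(percentile_rank(6.0, org))  # 80.0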

Key Points to Remember

  • Default comparison = whole company: Unless you explicitly select a dimension or industry benchmark, you're being compared against everyone in your organization

  • Mutually exclusive modes: Selecting one benchmarking mode automatically clears filters from the others to prevent confusion

  • Only developers included: The metric only counts merged PRs from active developers (excluding out-of-office days)

  • Percentiles shift over time: As your organization changes, percentile thresholds recalculate—you can't directly compare percentile rankings across different time periods

In short: by default the metric compares you against the whole company, but you have the flexibility to change that to a specific team, role, or other dimension if you want more targeted comparisons.