Adding win percentage to a DataFrame

Adding win percentage to a pandas DataFrame is a common task in sports analytics, gaming leaderboards, A/B testing reports, ML model evaluation, and business KPIs. It turns raw wins and games-played columns into a comparable percentage for each row (team, player, model, campaign, etc.). The calculation itself is simple, wins / games_played, but real-world use involves handling division by zero, formatting for readability, rounding, sorting, and staying efficient on large DataFrames. Vectorized column operations keep this fast even on millions of rows without any Python loops, and Polars is worth considering when datasets get very large.

Here’s a practical guide to adding win percentage to a DataFrame: basic calculation, safe handling of edge cases, formatting, sorting and visualization, real-world patterns, and best practices, including a Polars equivalent for scale.

The core step is vectorized division — pandas handles entire columns at once, producing a new Series of percentages.


import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'Team': ['A', 'B', 'C', 'D'],
    'Wins': [20, 15, 10, 0],
    'Games': [30, 25, 20, 10]
})

# Basic win percentage (as decimal)
df['Win Percentage'] = df['Wins'] / df['Games']

print(df)
#   Team  Wins  Games  Win Percentage
# 0    A    20     30        0.666667
# 1    B    15     25        0.600000
# 2    C    10     20        0.500000
# 3    D     0     10        0.000000
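The sample data above has no team with zero games, so the plain division never hits the edge case. The sketch below adds two hypothetical teams, E (no games played) and F (wins recorded against zero games, i.e. bad data), to show what pandas actually returns: for Series, division by zero does not raise ZeroDivisionError but yields NaN for 0/0 and inf for a positive number divided by 0.

```python
import pandas as pd

# Hypothetical data: E has played no games; F has wins but 0 games (a data error)
df_zero = pd.DataFrame({
    'Team': ['A', 'E', 'F'],
    'Wins': [20, 0, 3],
    'Games': [30, 0, 0],
})

# No exception is raised: 0/0 becomes NaN and 3/0 becomes inf
df_zero['Win Percentage'] = df_zero['Wins'] / df_zero['Games']
print(df_zero)
```

An inf in particular would sort to the top of a descending leaderboard, which is why zero denominators need explicit handling.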

Handle division by zero safely — replace with 0, NaN, or a custom value — and format as percentage strings for display.


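import numpy as np

# Safe calculation: pandas does not raise ZeroDivisionError on Series division;
# it returns inf (x/0) or NaN (0/0), so mask zero denominators as NaN first
df['Win Percentage'] = df['Wins'] / df['Games'].replace(0, np.nan)
df['Win Percentage'] = df['Win Percentage'].fillna(0)  # or keep NaN to flag teams with no games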

# Format as percentage (2 decimal places)
df['Win %'] = df['Win Percentage'].map('{:.2%}'.format)

print(df)
#   Team  Wins  Games  Win Percentage   Win %
# 0    A    20     30        0.666667  66.67%
# 1    B    15     25        0.600000  60.00%
# 2    C    10     20        0.500000  50.00%
# 3    D     0     10        0.000000   0.00%
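Keeping the float column next to the formatted one matters because the display strings sort as text, character by character. A minimal sketch with a hypothetical three-team board where one team is below 10%:

```python
import pandas as pd

# Hypothetical leaderboard with one win percentage below 10%
board = pd.DataFrame({
    'Team': ['X', 'Y', 'Z'],
    'Win Percentage': [0.09, 0.45, 0.10],
})
board['Win %'] = board['Win Percentage'].map('{:.2%}'.format)  # '9.00%', '45.00%', '10.00%'

# Sorting the strings compares characters, so '9.00%' outranks '45.00%'
by_string = board.sort_values('Win %', ascending=False)['Team'].tolist()

# Sorting the float column gives the intended ranking
by_number = board.sort_values('Win Percentage', ascending=False)['Team'].tolist()

print(by_string)  # ['X', 'Y', 'Z']
print(by_number)  # ['Y', 'Z', 'X']
```

Sort, filter, and compare on the numeric column; use the string column only when presenting results.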

Real-world pattern: sports team stats or model performance — add win %, sort by it, and handle ties/no games gracefully.


# Full example with ties and sorting
df = pd.DataFrame({
    'Team': ['A', 'B', 'C', 'D'],
    'Wins': [20, 15, 10, 0],
    'Losses': [8, 9, 9, 10],
    'Ties': [2, 1, 1, 0]
})

df['Games'] = df['Wins'] + df['Losses'] + df['Ties']

# Win % with ties as 0.5 wins (common in some leagues)
df['Win Percentage'] = (df['Wins'] + 0.5 * df['Ties']) / df['Games']
df['Win %'] = df['Win Percentage'].map('{:.2%}'.format)

# Sort by win percentage descending
df = df.sort_values('Win Percentage', ascending=False).reset_index(drop=True)

print(df)
#   Team  Wins  Losses  Ties  Games  Win Percentage   Win %
# 0    A    20       8     2     30           0.700  70.00%
# 1    B    15       9     1     25           0.620  62.00%
# 2    C    10       9     1     20           0.525  52.50%
# 3    D     0      10     0     10           0.000   0.00%
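The steps above can be wrapped in a reusable function. The sketch below uses a hypothetical helper, add_win_percentage (not a pandas API). It assumes Wins, Losses and Ties columns as in the example, counts ties with a configurable tie_weight (0.5 by default, matching the convention above), returns NaN for teams with no games, and breaks equal win percentages by total wins. That tiebreaker is an assumption for illustration; leagues use different rules.

```python
import numpy as np
import pandas as pd


def add_win_percentage(df: pd.DataFrame, tie_weight: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: add Games, Win Percentage and Win % columns, then rank."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out['Games'] = out['Wins'] + out['Losses'] + out['Ties']
    points = out['Wins'] + tie_weight * out['Ties']
    # Replace 0 games with NaN so no-game teams get NaN instead of inf
    out['Win Percentage'] = points / out['Games'].replace(0, np.nan)
    # na_action='ignore' keeps NaN rather than producing the string 'nan%'
    out['Win %'] = out['Win Percentage'].map('{:.2%}'.format, na_action='ignore')
    # Rank by win percentage, break ties on total wins, put no-game teams last
    out = out.sort_values(['Win Percentage', 'Wins'], ascending=[False, False],
                          na_position='last')
    return out.reset_index(drop=True)


teams = pd.DataFrame({
    'Team': ['P', 'Q', 'R', 'S'],
    'Wins': [6, 12, 9, 0],
    'Losses': [4, 8, 3, 0],
    'Ties': [0, 0, 3, 0],
})
result = add_win_percentage(teams)
print(result[['Team', 'Games', 'Win %']])
```

Here R ((9 + 0.5 × 3) / 15 = 0.7) ranks first, Q and P tie at 0.6 and are ordered by wins, and S, with no games, goes last.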

Best practices make win percentage columns robust, readable, and scalable. Always protect against zero games, using .replace(0, np.nan) followed by .fillna(0) or explicit conditional logic. Store the value as a float for calculations and format it as a string ('{:.2%}'.format) only for display; formatted strings are rounded and sort as text rather than numbers. Use vectorized operations rather than iterrows(): df['Wins'] / df['Games'] processes the whole column at once.

For very large DataFrames, consider Polars. The equivalent expression is df.with_columns((pl.col("Wins") / pl.col("Games")).alias("Win Percentage")), and Polars is often substantially faster than pandas on large data, although the actual speedup depends on the data and the workload.

Add type hints to helper functions, for example def add_win_pct(df: pd.DataFrame) -> pd.DataFrame, to aid static analysis; keep in mind that a plain pd.DataFrame annotation does not check column names or dtypes. In production, log zero-game cases, since they often indicate data issues. Use np.clip, as in df['Win Percentage'] = np.clip(df['Win Percentage'], 0, 1), to keep values within [0, 1]. Round consistently, for example with .round(4), before comparing values. To visualize a leaderboard, plot the numeric column rather than the formatted string, which pandas cannot plot: df.sort_values('Win Percentage').plot.barh(x='Team', y='Win Percentage').
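Several of these practices can be combined in one function. The sketch below uses a hypothetical helper, win_percentage (not a pandas API), and assumes Wins and Games columns. It is type-annotated, logs zero-game rows through the standard logging module, clips to [0, 1], treats zero games as 0 (the fillna(0) convention used earlier), rounds to a fixed number of decimals, and returns floats so formatting stays in the display layer. Whether zero games should become 0 or stay NaN is a judgment call for your data.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def win_percentage(df: pd.DataFrame, wins: str = 'Wins', games: str = 'Games',
                   decimals: int = 4) -> pd.Series:
    """Hypothetical helper: vectorized win percentage as a float Series."""
    zero_games = df[games] == 0
    if zero_games.any():
        # Zero-game rows often signal data issues, so record their index labels
        logger.warning('%d row(s) with zero %s: %s', int(zero_games.sum()), games,
                       df.index[zero_games.to_numpy()].tolist())
    pct = df[wins] / df[games].replace(0, np.nan)
    # clip keeps out-of-range values (e.g. wins > games) inside [0, 1];
    # fillna(0) then turns the zero-game NaNs into 0
    return pct.clip(lower=0, upper=1).fillna(0).round(decimals)


stats = pd.DataFrame({
    'Team': ['A', 'G', 'E'],
    'Wins': [20, 32, 0],   # hypothetical: G has more wins than games (bad data)
    'Games': [30, 30, 0],  # hypothetical: E has played no games
})
stats['Win Percentage'] = win_percentage(stats)  # 0.6667, 1.0, 0.0
print(stats)
```

Clipping hides rows like G rather than fixing them, so in production you may prefer to log or reject such rows as well.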

Adding win percentage to a DataFrame turns raw wins and games into a comparable metric that is vectorized, safe against zero games, and formatted for readers. Protect against zero games, keep the numeric column for calculations and format only for display, consider Polars at scale, and add type hints to helper functions. With this pattern you can compare success rates across teams, models, campaigns, and more, accurately and efficiently.

Next time you have wins and games columns — calculate win percentage vectorized and safe. It’s pandas’ cleanest way to turn counts into a meaningful success rate.
