Basic Ingredients of a Good Function in Python – Data Science Perspective 2026
Writing good functions is a fundamental skill for any data scientist. A well-crafted function is clear, reusable, maintainable, and easy to test. In 2026, seven essential "ingredients" separate average functions from excellent ones.
TL;DR — The 7 Essential Ingredients of a Good Function
- Clear and descriptive name
- Single Responsibility (does one thing well)
- Type hints for parameters and return value
- Comprehensive docstring
- Sensible default values
- Proper error handling
- Clean and readable implementation
1. Complete Example of a Good Data Science Function
from typing import Any

import pandas as pd


def summarize_customer_behavior(
    transactions: list[dict[str, Any]],
    min_transactions: int = 3,
    include_inactive: bool = False,
) -> dict[str, Any]:
    """
    Summarize customer behavior from transaction data.

    This function calculates key customer metrics including total spend,
    average order value, and activity status.

    Args:
        transactions: List of transaction dictionaries containing
            'customer_id', 'amount', 'date', etc.
        min_transactions: Minimum number of transactions to consider
            a customer active (default: 3).
        include_inactive: Whether to include customers below the
            min_transactions threshold.

    Returns:
        Dictionary containing:
            - total_customers
            - active_customers
            - total_revenue
            - avg_order_value
            - customer_metrics (DataFrame)

    Raises:
        ValueError: If a required column is missing (this includes
            empty input).

    Example:
        >>> result = summarize_customer_behavior(transactions)
        >>> print(result['avg_order_value'])
    """
    df = pd.DataFrame(transactions)

    # Fail early with a clear message instead of a KeyError further down
    missing = {"customer_id", "amount", "date"} - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")

    # Basic cleaning: drop rows without a customer or an amount
    df = df.dropna(subset=["customer_id", "amount"])

    # Calculate customer-level metrics using named aggregation
    customer_metrics = (
        df.groupby("customer_id")
        .agg(
            total_spend=("amount", "sum"),
            avg_order_value=("amount", "mean"),
            transaction_count=("amount", "count"),
            first_purchase=("date", "min"),
            last_purchase=("date", "max"),
        )
        .reset_index()
    )

    # Count customers before filtering so the two totals can differ
    is_active = customer_metrics["transaction_count"] >= min_transactions
    total_customers = len(customer_metrics)
    active_customers = int(is_active.sum())

    # Keep only active customers unless inactive ones were requested
    if not include_inactive:
        customer_metrics = customer_metrics[is_active]

    # Revenue and average order value cover the customers kept above;
    # avg_order_value is the mean of the per-customer averages
    return {
        "total_customers": total_customers,
        "active_customers": active_customers,
        "total_revenue": customer_metrics["total_spend"].sum(),
        "avg_order_value": customer_metrics["avg_order_value"].mean(),
        "customer_metrics": customer_metrics,
    }
2. Best Practices for Function Ingredients in 2026
- Name: Use clear, verb-based names (e.g., `calculate_revenue`, `clean_customer_data`)
- Single Responsibility: Each function should do one logical task
- Type Hints: Always include type hints for parameters and return values
- Docstring: Write clear Google-style docstrings with Args, Returns, and Example sections
- Defaults: Provide sensible default values for optional parameters
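- Error Handling: Validate inputs early and raise specific exceptions (e.g., `ValueError` for missing columns) instead of letting the function fail deep inside its body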
- Readability: Keep functions short and focused (ideally under 30–40 lines)
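The checklist above can be sketched on a much smaller function. This is a minimal illustration rather than a definitive pattern: it borrows the `calculate_revenue` name from the list, while the `amount` and `currency` keys, the `"USD"` default, and the sample data are assumptions made up for the example.

```python
from typing import Any


def calculate_revenue(
    transactions: list[dict[str, Any]],
    currency: str = "USD",
) -> float:
    """Return the total revenue for one currency.

    Args:
        transactions: Transaction dictionaries with 'amount' and
            'currency' keys (hypothetical schema for this sketch).
        currency: Currency code to sum (default: "USD").

    Returns:
        Sum of 'amount' over matching transactions, 0.0 if none match.

    Raises:
        ValueError: If a transaction lacks a required key.
    """
    total = 0.0
    for index, transaction in enumerate(transactions):
        # Validate each record before using it
        missing = {"amount", "currency"} - set(transaction)
        if missing:
            raise ValueError(
                f"Transaction {index} is missing keys: {sorted(missing)}"
            )
        if transaction["currency"] == currency:
            total += float(transaction["amount"])
    return total


# Hypothetical sample data
sample = [
    {"amount": 20.0, "currency": "USD"},
    {"amount": 5.5, "currency": "USD"},
    {"amount": 9.0, "currency": "EUR"},
]
print(calculate_revenue(sample))                  # 25.5
print(calculate_revenue(sample, currency="EUR"))  # 9.0
```

Because the function takes plain data in and returns a plain value, it can be checked with a handful of assertions and needs no setup.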
Conclusion
A good function is more than working code: it is clear, well documented, typed, and follows the single responsibility principle. In 2026, data scientists who consistently apply these ingredients write code that is easier to maintain, test, and collaborate on. Strong function design is often one of the visible differences between junior and senior data scientists.
Next steps:
- Review your current data science functions and evaluate them against these seven ingredients
- Refactor at least one function to improve its name, docstring, type hints, and structure
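As a starting point for the refactoring step, here is a small before-and-after sketch. Both functions, the `amount` key, and the sample orders are hypothetical and only show the kind of change meant here. The behavior stays the same while the name, signature, and documentation improve.

```python
# Before: vague name, no type hints, no docstring, hard-coded threshold
def proc(d):
    r = []
    for x in d:
        if x["amount"] > 100:
            r.append(x)
    return r


# After: descriptive name, type hints, docstring, named default, validation
def filter_large_orders(
    orders: list[dict[str, float]],
    min_amount: float = 100.0,
) -> list[dict[str, float]]:
    """Return the orders whose 'amount' exceeds min_amount.

    Args:
        orders: Order dictionaries with an 'amount' key
            (hypothetical schema for this sketch).
        min_amount: Exclusive lower bound on 'amount' (default: 100.0).

    Returns:
        A new list with the matching orders, in their original order.

    Raises:
        ValueError: If min_amount is negative.
    """
    if min_amount < 0:
        raise ValueError("min_amount must be non-negative")
    return [order for order in orders if order["amount"] > min_amount]


# Hypothetical sample data
orders = [{"amount": 50.0}, {"amount": 150.0}, {"amount": 300.0}]

# The refactor changes the interface, not the result
assert proc(orders) == filter_large_orders(orders)
```

Keeping the old function around until an assertion like the last line passes is one low-risk way to confirm that a refactor did not change behavior.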