Lists in Python for Data Science – Complete Guide 2026
Lists are one of the most fundamental and frequently used data structures in Python data science. They are flexible, dynamic, and perfect for storing sequences of data such as feature names, model predictions, row records, or intermediate results.
TL;DR — Key Features of Lists
- Ordered, mutable, and allow duplicates
- Can hold mixed data types
- Dynamic size (can grow and shrink)
- Fast appends at the end of the list (amortized O(1))
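The properties listed above can be checked directly in a few lines. The following is a minimal sketch; the variable names and values are illustrative and not taken from a real dataset.

```python
# Illustrative values only (hypothetical, not from a real dataset).
record = ["north", 120.5, 3, "north"]  # mixed types; the duplicate "north" is allowed

# Ordered: items keep their insertion order and are addressable by index.
first_item = record[0]

# Mutable: an element can be replaced in place.
record[2] = 4

# Dynamic size: the list grows and shrinks as items are added or removed.
record.append(True)
size_after_append = len(record)
record.pop()
size_after_pop = len(record)

print(record)                  # ['north', 120.5, 4, 'north']
print(record.count("north"))   # 2
```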
1. Creating and Basic Operations
# Creating lists
features = ["amount", "quantity", "profit", "region", "category"]
scores = [85, 92, 78, 95, 88]
# Common operations
features.append("log_amount") # Add item
features.insert(0, "customer_id") # Insert at position
features.remove("category") # Remove by value
popped = features.pop() # Remove and return last item
print(features) # ['customer_id', 'amount', 'quantity', 'profit', 'region']
print(len(features)) # Length: 5
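Two of the calls above raise exceptions when their precondition fails: remove() raises ValueError if the value is not in the list, and pop() raises IndexError on an empty list. A minimal sketch of guarding both cases follows; the helper name drop_if_present is hypothetical and introduced only for illustration.

```python
def drop_if_present(items, value):
    """Remove the first occurrence of value and report whether it was found.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if value in items:
        items.remove(value)
        return True
    return False


features = ["amount", "quantity", "profit"]

# "category" is absent, so nothing is removed and no ValueError is raised.
removed = drop_if_present(features, "category")
print(removed)    # False
print(features)   # ['amount', 'quantity', 'profit']

# pop() on an empty list raises IndexError, so check for emptiness first.
empty = []
last = empty.pop() if empty else None
print(last)       # None
```

Whether to skip a missing value silently or let the exception surface depends on the pipeline; in data cleaning code, a loud failure is often the safer default.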
2. Real-World Data Science Usage
import pandas as pd
df = pd.read_csv("sales_data.csv")
# Example 1: Dynamic list of column names
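# is_numeric_dtype also matches int32, float32, nullable and bool columns, which a fixed list of dtype names would miss
numeric_cols = [col for col in df.columns if pd.api.types.is_numeric_dtype(df[col])]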
# Example 2: Building a list of processed records
processed_records = []
for row in df.itertuples():
    record = {
        "customer_id": row.customer_id,
        "amount": row.amount,
        "profit": round(row.amount * 0.25, 2)
    }
    processed_records.append(record)
result_df = pd.DataFrame(processed_records)
# Example 3: List of feature names for model training
feature_list = ["amount", "quantity", "profit", "region", "log_amount", "is_weekend"]
3. List Comprehensions vs Traditional Loops
# List comprehension (preferred for simple cases)
squared = [x ** 2 for x in scores if x > 80]
# Traditional loop (better for complex logic)
high_value = []
for x in scores:
    if x > 80:
        high_value.append(x * 1.1)
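For a filter-and-transform step this small, the loop can also be written as a comprehension. The sketch below reuses the scores list from section 1 and checks that both forms produce the same result; the names high_value_loop and high_value_comp are introduced here only to keep the two versions apart.

```python
scores = [85, 92, 78, 95, 88]

# Traditional loop: filter, then transform.
high_value_loop = []
for x in scores:
    if x > 80:
        high_value_loop.append(x * 1.1)

# Equivalent list comprehension: same filter and transform in one expression.
high_value_comp = [x * 1.1 for x in scores if x > 80]

print(high_value_loop == high_value_comp)  # True
print(len(high_value_comp))                # 4
```

The loop form becomes the better choice once the body needs several statements, error handling, or logging, since packing those into a comprehension hurts readability.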
4. Best Practices for Lists in Data Science 2026
- Use list comprehensions for simple transformations and filtering
- Use generator expressions when working with large data (memory efficient)
- Avoid using lists for very large numeric data — prefer NumPy arrays instead
- Use .append() in loops when building lists dynamically
- Consider collections.deque for frequent appends/pops from both ends
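Two of the practices above, generator expressions and collections.deque, can be illustrated with the standard library alone. This is a minimal sketch: the size n and the sample values are arbitrary, and sys.getsizeof reports only the size of the container object itself, not of the elements it refers to.

```python
import sys
from collections import deque

n = 100_000  # illustrative size, chosen arbitrarily

# A list comprehension materializes every element up front.
squares_list = [i * i for i in range(n)]
# A generator expression produces one element at a time on demand.
squares_gen = (i * i for i in range(n))

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)  # consumes the generator; it cannot be reused
print(total_from_list == total_from_gen)  # True

# deque supports O(1) appends and pops at both ends,
# whereas list.insert(0, x) and list.pop(0) are O(n).
window = deque(maxlen=3)  # keeps only the 3 most recent items
for value in [10, 20, 30, 40, 50]:
    window.append(value)
print(list(window))  # [30, 40, 50]

queue = deque(["a", "b", "c"])
queue.appendleft("start")
first = queue.popleft()
print(first, list(queue))  # start ['a', 'b', 'c']
```

A generator fits when the values are consumed once, for example by sum() or a single loop; if the same values are needed repeatedly or by index, a list is still the right container.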
Conclusion
Lists remain one of the most versatile data structures in Python for data science. In 2026, use them confidently for storing feature names, building result sets, creating dynamic column lists, and intermediate processing steps. Combine lists with list comprehensions and generators to write clean, efficient, and Pythonic data science code.
Next steps:
- Review how you currently use lists in your projects and optimize them with comprehensions or generators where appropriate