Eager vs Lazy Execution¶
Understanding when to use eager vs lazy evaluation is key to using functional_list effectively. This guide explains the differences, trade-offs, and when to use each mode.
🔥 Eager Mode (ListMapper)¶
What is Eager Execution?¶
In eager mode, each operation executes immediately and creates a new materialized ListMapper containing all the results in memory.
from functional_list import ListMapper
# Each operation executes immediately
numbers = ListMapper[int](1, 2, 3, 4) # Creates list [1, 2, 3, 4]
squared = numbers.map(lambda x: x * x) # Immediately creates [1, 4, 9, 16]
filtered = squared.filter(lambda x: x > 5) # Immediately creates [9, 16]
print(filtered) # List[9, 16] - already computed
Characteristics of Eager Mode¶
✅ Pros: - Simple and predictable: What you see is what you get - Easy debugging: Inspect intermediate results at any step - Multiple iterations: Can iterate over results multiple times - Random access: Access any element by index instantly - Great for small/medium datasets: Fast and straightforward
❌ Cons: - Memory usage: Creates intermediate lists for each operation - Potentially slower: Computes everything even if you only need part of it - Not optimal for large data: Can be inefficient with millions of items
When to Use Eager Mode¶
Use eager mode when: - Working with small to medium-sized datasets (< 100K items) - You need to inspect intermediate results while debugging - You'll access results multiple times - You need random access to elements - Simplicity and readability are priorities
Example: Eager Mode¶
from functional_list import ListMapper
# Log processing with eager evaluation
logs = ListMapper.from_text("app.log")
# Each step creates a new list
errors = logs.filter(lambda line: "ERROR" in line) # Creates list of errors
print(f"Found {len(errors)} errors") # Can check length
parsed = errors.map(parse_log_line) # Creates list of parsed logs
print(f"First error: {parsed[0]}") # Can access by index
by_type = parsed.group_by(lambda e: e["type"]) # Creates grouped list
print(f"Error types: {len(by_type)}") # Can inspect groups
💤 Lazy Mode (LazyListMapper)¶
What is Lazy Execution?¶
In lazy mode, operations are recorded but not executed. Computation only happens when you explicitly materialize the results (e.g., collect(), to_list(), or iterate).
from functional_list import ListMapper
# Build a pipeline (nothing executes yet)
lazy = (
ListMapper[int](1, 2, 3, 4)
.lazy() # Switch to lazy mode
.map(lambda x: x * x) # Record: "square each element"
.filter(lambda x: x > 5) # Record: "keep > 5"
)
# Still nothing executed!
print(type(lazy)) # <class 'LazyListMapper'>
# Now trigger execution
result = lazy.to_list() # Executes the entire pipeline
print(result) # [9, 16]
Characteristics of Lazy Mode¶
✅ Pros:
- Memory efficient: No intermediate lists created
- Optimizable: Can potentially optimize the entire pipeline
- Early termination: Only computes what's needed (e.g., with take(n))
- Great for large datasets: Stream through data without loading everything
- Composable: Easy to build and share pipeline definitions
❌ Cons: - Single iteration: Generator-based sources can only be consumed once - No random access: Can't use indexing - Some operations force materialization: Sorting, counting, etc. - Debugging is harder: Can't inspect intermediate steps easily
When to Use Lazy Mode¶
Use lazy mode when:
- Working with large datasets (> 100K items)
- You only need a subset of results (e.g., take(10))
- Memory efficiency is critical
- Processing streaming data
- Building reusable pipeline templates
- You want to minimize intermediate allocations
Example: Lazy Mode¶
from functional_list import ListMapper
# Process a large dataset lazily
lazy_pipeline = (
ListMapper[int](*range(1_000_000)) # 1 million numbers
.lazy() # Switch to lazy mode
.map(lambda x: x * x) # Not executed yet
.filter(lambda x: x > 100_000) # Not executed yet
.map(lambda x: x // 100) # Not executed yet
)
# Only compute first 10 results (efficient!)
top_10 = lazy_pipeline.take(10)
print(top_10) # Only processes what's needed
# Or materialize all results when ready
all_results = lazy_pipeline.collect() # Returns a ListMapper
🔄 Switching Between Modes¶
Eager → Lazy¶
Convert an eager ListMapper to lazy with .lazy():
from functional_list import ListMapper
eager = ListMapper[int](1, 2, 3, 4)
lazy = eager.lazy() # Now it's lazy
Lazy → Eager¶
Materialize a lazy pipeline with .collect() or .to_list():
lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)
# Collect as ListMapper (eager)
eager = lazy.collect()
print(type(eager)) # <class 'ListMapper'>
# Or convert to regular Python list
python_list = lazy.to_list()
print(type(python_list)) # <class 'list'>
⚖️ Comparison Table¶
| Feature | Eager (ListMapper) |
Lazy (LazyListMapper) |
|---|---|---|
| Execution | Immediate | Deferred until materialization |
| Intermediate results | Stored in memory | Not created |
| Memory usage | Higher (all intermediates) | Lower (streaming) |
| Random access | ✅ Yes (lst[5]) |
❌ No |
| Multiple iterations | ✅ Yes | ⚠️ Limited (generators consumed once) |
| Debugging | ✅ Easy (inspect steps) | ❌ Harder (no intermediates) |
| Early termination | ❌ Computes everything | ✅ Yes (e.g., take(10)) |
| Best for | Small/medium data | Large datasets, streaming |
🎯 Practical Examples¶
Example 1: When Eager Wins¶
Scenario: Analyzing a small log file, need to inspect at each step
from functional_list import ListMapper
# 1000 log lines - small enough for eager
logs = ListMapper.from_text("today.log")
# Eager: easy to debug and inspect
errors = logs.filter(lambda x: "ERROR" in x)
print(f"Total errors: {len(errors)}") # ✅ Easy
parsed = errors.map(parse_log)
print(f"First error: {parsed[0]}") # ✅ Random access
by_hour = parsed.group_by(lambda e: e["hour"])
for hour, items in by_hour: # ✅ Multiple iterations
print(f"Hour {hour}: {len(items)} errors")
Example 2: When Lazy Wins¶
Scenario: Processing millions of records, only need top 10
from functional_list import ListMapper
# 10 million records - too large for eager intermediates
lazy = (
ListMapper.from_parquet("big_data.parquet")
.lazy()
.filter(lambda r: r["status"] == "active") # Don't materialize
.map(lambda r: calculate_score(r)) # Don't materialize
.filter(lambda score: score > 0.8) # Don't materialize
.sort(reverse=True) # Forces materialization
)
# Only get top 10 - lazy saved us from creating huge intermediate lists
top_10 = lazy.take(10)
Example 3: Hybrid Approach¶
Scenario: Large initial data, but small result - use lazy then eager
from functional_list import ListMapper
# Start lazy for large filtering
filtered = (
ListMapper[int](*range(10_000_000))
.lazy()
.filter(lambda x: x % 1000 == 0) # Reduces to ~10K items
.collect() # Materialize to eager (result is small now)
)
# Now use eager for easy manipulation
result = (
filtered
.map(lambda x: x // 1000)
.distinct()
.sort()
)
print(f"First 10: {result.take(10)}")
print(f"Last 10: {result[-10:]}") # ✅ Eager allows slicing
🔍 Operations That Force Materialization¶
Some operations require materializing the data, even in lazy mode:
| Operation | Why Materialization Needed |
|---|---|
sort() |
Must see all elements to sort |
distinct() |
Must track seen elements |
count() |
Must count all elements |
reduce() |
Must process all elements |
group_by_key() |
Must collect all groups |
Indexing [i] |
Requires random access |
len() |
Needs total count |
lazy = ListMapper[int](3, 1, 2).lazy()
# These force materialization:
sorted_lm = lazy.sort() # Returns ListMapper (eager)
count = lazy.count() # Processes all
unique = lazy.distinct() # Returns ListMapper (eager)
💡 Best Practices¶
Start Lazy for Large Data
Begin with lazy mode when dealing with large datasets, then materialize when the data is reduced:
Use Eager for Debugging
When debugging, convert to eager to inspect:
Watch Out for Multiple Iterations
Lazy generators can only be consumed once:
🎓 Summary¶
- Use Eager for small/medium data, debugging, and when you need multiple iterations or random access
- Use Lazy for large datasets, streaming, and when memory efficiency matters
- You can switch modes easily with
.lazy()and.collect() - Some operations force materialization regardless of mode
- Hybrid approaches work great: filter lazily, then switch to eager
Choose the mode that best fits your data size and access patterns!