Skip to content

Eager vs Lazy Execution

Understanding when to use eager vs lazy evaluation is key to using functional_list effectively. This guide explains the differences, trade-offs, and when to use each mode.

🔥 Eager Mode (ListMapper)

What is Eager Execution?

In eager mode, each operation executes immediately and creates a new materialized ListMapper containing all the results in memory.

from functional_list import ListMapper

# Each operation executes immediately
numbers = ListMapper[int](1, 2, 3, 4)      # Creates list [1, 2, 3, 4]
squared = numbers.map(lambda x: x * x)      # Immediately creates [1, 4, 9, 16]
filtered = squared.filter(lambda x: x > 5)  # Immediately creates [9, 16]

print(filtered)  # List[9, 16] - already computed

Characteristics of Eager Mode

✅ Pros: - Simple and predictable: What you see is what you get - Easy debugging: Inspect intermediate results at any step - Multiple iterations: Can iterate over results multiple times - Random access: Access any element by index instantly - Great for small/medium datasets: Fast and straightforward

❌ Cons: - Memory usage: Creates intermediate lists for each operation - Potentially slower: Computes everything even if you only need part of it - Not optimal for large data: Can be inefficient with millions of items

When to Use Eager Mode

Use eager mode when: - Working with small to medium-sized datasets (< 100K items) - You need to inspect intermediate results while debugging - You'll access results multiple times - You need random access to elements - Simplicity and readability are priorities

Example: Eager Mode

from functional_list import ListMapper

# Log processing with eager evaluation
logs = ListMapper.from_text("app.log")

# Each step creates a new list
errors = logs.filter(lambda line: "ERROR" in line)        # Creates list of errors
print(f"Found {len(errors)} errors")  # Can check length

parsed = errors.map(parse_log_line)                       # Creates list of parsed logs
print(f"First error: {parsed[0]}")     # Can access by index

by_type = parsed.group_by(lambda e: e["type"])            # Creates grouped list
print(f"Error types: {len(by_type)}")  # Can inspect groups

💤 Lazy Mode (LazyListMapper)

What is Lazy Execution?

In lazy mode, operations are recorded but not executed. Computation only happens when you explicitly materialize the results (e.g., collect(), to_list(), or iterate).

from functional_list import ListMapper

# Build a pipeline (nothing executes yet)
lazy = (
    ListMapper[int](1, 2, 3, 4)
    .lazy()                              # Switch to lazy mode
    .map(lambda x: x * x)                # Record: "square each element"
    .filter(lambda x: x > 5)             # Record: "keep > 5"
)

# Still nothing executed!
print(type(lazy))  # <class 'LazyListMapper'>

# Now trigger execution
result = lazy.to_list()  # Executes the entire pipeline
print(result)  # [9, 16]

Characteristics of Lazy Mode

✅ Pros: - Memory efficient: No intermediate lists created - Optimizable: Can potentially optimize the entire pipeline - Early termination: Only computes what's needed (e.g., with take(n)) - Great for large datasets: Stream through data without loading everything - Composable: Easy to build and share pipeline definitions

❌ Cons: - Single iteration: Generator-based sources can only be consumed once - No random access: Can't use indexing - Some operations force materialization: Sorting, counting, etc. - Debugging is harder: Can't inspect intermediate steps easily

When to Use Lazy Mode

Use lazy mode when: - Working with large datasets (> 100K items) - You only need a subset of results (e.g., take(10)) - Memory efficiency is critical - Processing streaming data - Building reusable pipeline templates - You want to minimize intermediate allocations

Example: Lazy Mode

from functional_list import ListMapper

# Process a large dataset lazily
lazy_pipeline = (
    ListMapper[int](*range(1_000_000))   # 1 million numbers
    .lazy()                               # Switch to lazy mode
    .map(lambda x: x * x)                 # Not executed yet
    .filter(lambda x: x > 100_000)        # Not executed yet
    .map(lambda x: x // 100)              # Not executed yet
)

# Only compute first 10 results (efficient!)
top_10 = lazy_pipeline.take(10)
print(top_10)  # Only processes what's needed

# Or materialize all results when ready
all_results = lazy_pipeline.collect()  # Returns a ListMapper

🔄 Switching Between Modes

Eager → Lazy

Convert an eager ListMapper to lazy with .lazy():

from functional_list import ListMapper

eager = ListMapper[int](1, 2, 3, 4)
lazy = eager.lazy()  # Now it's lazy

Lazy → Eager

Materialize a lazy pipeline with .collect() or .to_list():

lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)

# Collect as ListMapper (eager)
eager = lazy.collect()
print(type(eager))  # <class 'ListMapper'>

# Or convert to regular Python list
python_list = lazy.to_list()
print(type(python_list))  # <class 'list'>

⚖️ Comparison Table

Feature Eager (ListMapper) Lazy (LazyListMapper)
Execution Immediate Deferred until materialization
Intermediate results Stored in memory Not created
Memory usage Higher (all intermediates) Lower (streaming)
Random access ✅ Yes (lst[5]) ❌ No
Multiple iterations ✅ Yes ⚠️ Limited (generators consumed once)
Debugging ✅ Easy (inspect steps) ❌ Harder (no intermediates)
Early termination ❌ Computes everything ✅ Yes (e.g., take(10))
Best for Small/medium data Large datasets, streaming

🎯 Practical Examples

Example 1: When Eager Wins

Scenario: Analyzing a small log file, need to inspect at each step

from functional_list import ListMapper

# 1000 log lines - small enough for eager
logs = ListMapper.from_text("today.log")

# Eager: easy to debug and inspect
errors = logs.filter(lambda x: "ERROR" in x)
print(f"Total errors: {len(errors)}")  # ✅ Easy

parsed = errors.map(parse_log)
print(f"First error: {parsed[0]}")     # ✅ Random access

by_hour = parsed.group_by(lambda e: e["hour"])
for hour, items in by_hour:            # ✅ Multiple iterations
    print(f"Hour {hour}: {len(items)} errors")

Example 2: When Lazy Wins

Scenario: Processing millions of records, only need top 10

from functional_list import ListMapper

# 10 million records - too large for eager intermediates
lazy = (
    ListMapper.from_parquet("big_data.parquet")
    .lazy()
    .filter(lambda r: r["status"] == "active")     # Don't materialize
    .map(lambda r: calculate_score(r))             # Don't materialize
    .filter(lambda score: score > 0.8)             # Don't materialize
    .sort(reverse=True)                            # Forces materialization
)

# Only get top 10 - lazy saved us from creating huge intermediate lists
top_10 = lazy.take(10)

Example 3: Hybrid Approach

Scenario: Large initial data, but small result - use lazy then eager

from functional_list import ListMapper

# Start lazy for large filtering
filtered = (
    ListMapper[int](*range(10_000_000))
    .lazy()
    .filter(lambda x: x % 1000 == 0)  # Reduces to ~10K items
    .collect()  # Materialize to eager (result is small now)
)

# Now use eager for easy manipulation
result = (
    filtered
    .map(lambda x: x // 1000)
    .distinct()
    .sort()
)

print(f"First 10: {result.take(10)}")
print(f"Last 10: {result[-10:]}")  # ✅ Eager allows slicing

🔍 Operations That Force Materialization

Some operations require materializing the data, even in lazy mode:

Operation Why Materialization Needed
sort() Must see all elements to sort
distinct() Must track seen elements
count() Must count all elements
reduce() Must process all elements
group_by_key() Must collect all groups
Indexing [i] Requires random access
len() Needs total count
lazy = ListMapper[int](3, 1, 2).lazy()

# These force materialization:
sorted_lm = lazy.sort()        # Returns ListMapper (eager)
count = lazy.count()           # Processes all
unique = lazy.distinct()       # Returns ListMapper (eager)

💡 Best Practices

Start Lazy for Large Data

Begin with lazy mode when dealing with large datasets, then materialize when the data is reduced:

result = (
    large_dataset
    .lazy()
    .filter(...)      # Reduces data
    .filter(...)      # Reduces more
    .collect()        # Now small enough for eager
    .sort()           # Use eager operations
)

Use Eager for Debugging

When debugging, convert to eager to inspect:

# During development
intermediate = pipeline.lazy().filter(...).map(...).collect()
print(intermediate[:10])  # Inspect results

Watch Out for Multiple Iterations

Lazy generators can only be consumed once:

lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)
list1 = lazy.to_list()  # [2, 4, 6]
list2 = lazy.to_list()  # [] - generator exhausted!

# Solution: materialize first
eager = lazy.collect()
list1 = eager.to_list()  # [2, 4, 6]
list2 = eager.to_list()  # [2, 4, 6] - works!

🎓 Summary

  • Use Eager for small/medium data, debugging, and when you need multiple iterations or random access
  • Use Lazy for large datasets, streaming, and when memory efficiency matters
  • You can switch modes easily with .lazy() and .collect()
  • Some operations force materialization regardless of mode
  • Hybrid approaches work great: filter lazily, then switch to eager

Choose the mode that best fits your data size and access patterns!