Eager vs Lazy Execution

Understanding when to use eager vs lazy evaluation is key to using functional_list effectively. This guide explains the differences, trade-offs, and when to use each mode.

🔥 Eager Mode (`ListMapper`)

What is Eager Execution?

In eager mode, each operation executes immediately and creates a new materialized ListMapper containing all the results in memory.

from functional_list import ListMapper

# Each operation executes immediately
numbers = ListMapper[int](1, 2, 3, 4)      # Creates list [1, 2, 3, 4]
squared = numbers.map(lambda x: x * x)      # Immediately creates [1, 4, 9, 16]
filtered = squared.filter(lambda x: x > 5)  # Immediately creates [9, 16]

print(filtered)  # List[9, 16] - already computed
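
For comparison, eager evaluation behaves much like chained list comprehensions in plain Python. The sketch below deliberately uses only built-in lists rather than functional_list, and `square` and the `calls` log are illustrative names, so it shows the idea rather than the library's internals.

```python
calls = []  # records every input that square() has processed


def square(x):
    calls.append(x)
    return x * x


numbers = [1, 2, 3, 4]
squared = [square(x) for x in numbers]    # runs right away and builds a full list
calls_after_map = list(calls)             # all four inputs are already processed

filtered = [x for x in squared if x > 5]  # a second full list

print(calls_after_map)  # [1, 2, 3, 4]
print(filtered)         # [9, 16]
```

Both `squared` and `filtered` exist in memory at the same time, which is the intermediate-list cost that eager mode trades for simplicity.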

Characteristics of Eager Mode

✅ Pros:
- Simple and predictable: What you see is what you get
- Easy debugging: Inspect intermediate results at any step
- Multiple iterations: Can iterate over results multiple times
- Random access: Access any element by index instantly
- Great for small/medium datasets: Fast and straightforward

❌ Cons:
- Memory usage: Creates intermediate lists for each operation
- Potentially slower: Computes everything even if you only need part of it
- Not optimal for large data: Can be inefficient with millions of items

When to Use Eager Mode

Use eager mode when:
- Working with small to medium-sized datasets (< 100K items)
- You need to inspect intermediate results while debugging
- You'll access results multiple times
- You need random access to elements
- Simplicity and readability are priorities

Example: Eager Mode

from functional_list import ListMapper

# Log processing with eager evaluation.
# parse_log_line is assumed to be defined elsewhere and to return a dict with a "type" key.
logs = ListMapper.from_text("app.log")

# Each step creates a new list
errors = logs.filter(lambda line: "ERROR" in line)        # Creates list of errors
print(f"Found {len(errors)} errors")  # Can check length

parsed = errors.map(parse_log_line)                       # Creates list of parsed logs
print(f"First error: {parsed[0]}")     # Can access by index

by_type = parsed.group_by(lambda e: e["type"])            # Creates grouped list
print(f"Error types: {len(by_type)}")  # Can inspect groups

💤 Lazy Mode (`LazyListMapper`)

What is Lazy Execution?

In lazy mode, operations are recorded but not executed. Computation happens only when you materialize the results, for example by calling collect() or to_list(), or by iterating over the pipeline.

from functional_list import ListMapper

# Build a pipeline (nothing executes yet)
lazy = (
    ListMapper[int](1, 2, 3, 4)
    .lazy()                              # Switch to lazy mode
    .map(lambda x: x * x)                # Record: "square each element"
    .filter(lambda x: x > 5)             # Record: "keep > 5"
)

# Still nothing executed!
print(type(lazy))  # <class 'LazyListMapper'>

# Now trigger execution
result = lazy.to_list()  # Executes the entire pipeline
print(result)  # [9, 16]
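
The same deferral can be observed with plain Python generator expressions. This sketch does not use functional_list; `square` and the `calls` log are illustrative names used only to show when work actually happens.

```python
calls = []  # records every input that square() has processed


def square(x):
    calls.append(x)
    return x * x


squared = (square(x) for x in [1, 2, 3, 4])  # nothing runs yet
filtered = (x for x in squared if x > 5)     # still nothing runs
calls_before = list(calls)                   # snapshot before consuming

result = list(filtered)                      # consuming the generator runs every step
calls_after = list(calls)

print(calls_before)  # []
print(result)        # [9, 16]
print(calls_after)   # [1, 2, 3, 4]
```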

Characteristics of Lazy Mode

✅ Pros:
- Memory efficient: No intermediate lists created
- Optimizable: Can potentially optimize the entire pipeline
- Early termination: Only computes what's needed (e.g., with take(n))
- Great for large datasets: Stream through data without loading everything
- Composable: Easy to build and share pipeline definitions

❌ Cons:
- Single iteration: Generator-based sources can only be consumed once
- No random access: Can't use indexing
- Some operations force materialization: Sorting, counting, etc.
- Debugging is harder: Can't inspect intermediate steps easily

When to Use Lazy Mode

Use lazy mode when:
- Working with large datasets (> 100K items)
- You only need a subset of results (e.g., take(10))
- Memory efficiency is critical
- Processing streaming data
- Building reusable pipeline templates
- You want to minimize intermediate allocations

Example: Lazy Mode

from functional_list import ListMapper

# Process a large dataset lazily
lazy_pipeline = (
    ListMapper[int](*range(1_000_000))   # 1 million numbers
    .lazy()                               # Switch to lazy mode
    .map(lambda x: x * x)                 # Not executed yet
    .filter(lambda x: x > 100_000)        # Not executed yet
    .map(lambda x: x // 100)              # Not executed yet
)

# Only compute the first 10 results (efficient!)
first_10 = lazy_pipeline.take(10)
print(first_10)  # Only processes what's needed

# Or materialize all results when ready
all_results = lazy_pipeline.collect()  # Returns a ListMapper
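
To make the early-termination claim concrete, here is a plain-Python sketch of the same three steps using generators and `itertools.islice` in place of `take(10)`. It does not use functional_list; `counted` and `stats` are hypothetical helpers that only count how many inputs are pulled from the source.

```python
from itertools import islice

stats = {"processed": 0}  # how many source items have been pulled so far


def counted(iterable):
    """Yield items unchanged while counting how many were requested."""
    for item in iterable:
        stats["processed"] += 1
        yield item


squares = (x * x for x in counted(range(1_000_000)))
large = (x for x in squares if x > 100_000)
scaled = (x // 100 for x in large)

first_10 = list(islice(scaled, 10))  # stop after ten results

print(first_10[0])         # 1004
print(stats["processed"])  # 327
```

Only 327 of the one million inputs are touched: the first 317 are squared and rejected by the filter, and the next ten produce the requested results.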

🔄 Switching Between Modes

Eager → Lazy

Convert an eager ListMapper to lazy with .lazy():

from functional_list import ListMapper

eager = ListMapper[int](1, 2, 3, 4)
lazy = eager.lazy()  # Now it's lazy

Lazy → Eager

Materialize a lazy pipeline with .collect() or .to_list():

lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)

# Collect as ListMapper (eager)
eager = lazy.collect()
print(type(eager))  # <class 'ListMapper'>

# Or convert to regular Python list
python_list = lazy.to_list()
print(type(python_list))  # <class 'list'>
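
One way to picture how the two modes relate is a pair of toy classes. The sketch below is purely hypothetical: `ToyEager` and `ToyLazy` are invented for illustration and are not functional_list's implementation. The lazy class only records steps, and the eager class builds a new list on every call.

```python
class ToyLazy:
    """Hypothetical lazy pipeline: stores steps and runs them only on iteration."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        return ToyLazy(self._source, self._steps + ((map, fn),))

    def filter(self, pred):
        return ToyLazy(self._source, self._steps + ((filter, pred),))

    def __iter__(self):
        stream = iter(self._source)
        for wrap, fn in self._steps:
            stream = wrap(fn, stream)  # built-in map/filter are themselves lazy
        return stream

    def to_list(self):
        return list(self)

    def collect(self):
        return ToyEager(self)


class ToyEager:
    """Hypothetical eager container: every operation builds a new list."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        return ToyEager(fn(x) for x in self.items)

    def filter(self, pred):
        return ToyEager(x for x in self.items if pred(x))

    def lazy(self):
        return ToyLazy(self.items)


pipeline = ToyEager([1, 2, 3, 4]).lazy().map(lambda x: x * x).filter(lambda x: x > 5)
print(pipeline.to_list())        # [9, 16]
print(pipeline.collect().items)  # [9, 16]
```

This toy version can be run more than once because its source is a list and the steps are replayed on each iteration. The real library may behave differently, so treat single-use behavior as the safe assumption.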

⚖️ Comparison Table

Feature	Eager (`ListMapper`)	Lazy (`LazyListMapper`)
Execution	Immediate	Deferred until materialization
Intermediate results	Stored in memory	Not created
Memory usage	Higher (all intermediates)	Lower (streaming)
Random access	✅ Yes (`lst[5]`)	❌ No
Multiple iterations	✅ Yes	⚠️ Limited (generators consumed once)
Debugging	✅ Easy (inspect steps)	❌ Harder (no intermediates)
Early termination	❌ Computes everything	✅ Yes (e.g., `take(10)`)
Best for	Small/medium data	Large datasets, streaming
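
The memory row can be illustrated with the standard library alone. This sketch compares a materialized list with a generator over the same values; it is not a measurement of functional_list itself, and `sys.getsizeof` reports only the container object, not the integers it refers to.

```python
import sys

n = 100_000
eager_squares = [x * x for x in range(n)]  # all results are stored
lazy_squares = (x * x for x in range(n))   # only the iteration state is stored

list_bytes = sys.getsizeof(eager_squares)
gen_bytes = sys.getsizeof(lazy_squares)

print(list_bytes > gen_bytes)  # True
```

The generator's size stays constant regardless of `n`, while the list grows with the number of elements.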

🎯 Practical Examples

Example 1: When Eager Wins

Scenario: Analyzing a small log file where you need to inspect the results at each step

from functional_list import ListMapper

# 1000 log lines - small enough for eager.
# parse_log is assumed to be defined elsewhere and to return a dict with an "hour" key.
logs = ListMapper.from_text("today.log")

# Eager: easy to debug and inspect
errors = logs.filter(lambda x: "ERROR" in x)
print(f"Total errors: {len(errors)}")  # ✅ Easy

parsed = errors.map(parse_log)
print(f"First error: {parsed[0]}")     # ✅ Random access

by_hour = parsed.group_by(lambda e: e["hour"])
for hour, items in by_hour:            # ✅ Multiple iterations
    print(f"Hour {hour}: {len(items)} errors")

Example 2: When Lazy Wins

Scenario: Processing millions of records when you only need the top 10

from functional_list import ListMapper

# 10 million records - too large for eager intermediates.
# calculate_score is assumed to be defined elsewhere and to return a number.
ranked = (
    ListMapper.from_parquet("big_data.parquet")
    .lazy()
    .filter(lambda r: r["status"] == "active")     # Don't materialize
    .map(lambda r: calculate_score(r))             # Don't materialize
    .filter(lambda score: score > 0.8)             # Don't materialize
    .sort(reverse=True)                            # Forces materialization of the surviving scores
)

# Only get top 10 - the lazy steps avoided creating huge intermediate lists before the sort
top_10 = ranked.take(10)
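
Because a sort has to hold every surviving score before the first ten can be taken, a bounded heap is one alternative when only the top N matter. The sketch below uses Python's standard `heapq.nlargest` on a plain generator; it is not a functional_list method, and the random scores are stand-in data rather than real records.

```python
import heapq
import random


def top_n_streaming(values, n):
    """Keep only the n largest values seen so far while streaming."""
    return heapq.nlargest(n, values)


def top_n_sorted(values, n):
    """Reference version: sort everything, then slice."""
    return sorted(values, reverse=True)[:n]


random.seed(0)
scores = [random.random() for _ in range(100_000)]  # stand-in for computed scores

high_scores = (s for s in scores if s > 0.8)         # lazy filter
top_10 = top_n_streaming(high_scores, 10)

print(top_10 == top_n_sorted([s for s in scores if s > 0.8], 10))  # True
```

Both versions return the same ten values in descending order, but the streaming version never holds more than ten candidates at once.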

Example 3: Hybrid Approach

Scenario: Large initial data but a small result, so filter lazily and then switch to eager

from functional_list import ListMapper

# Start lazy for large filtering
filtered = (
    ListMapper[int](*range(10_000_000))
    .lazy()
    .filter(lambda x: x % 1000 == 0)  # Reduces to ~10K items
    .collect()  # Materialize to eager (result is small now)
)

# Now use eager for easy manipulation
result = (
    filtered
    .map(lambda x: x // 1000)
    .distinct()
    .sort()
)

print(f"First 10: {result.take(10)}")
print(f"Last 10: {result[-10:]}")  # ✅ Eager allows slicing

🔍 Operations That Force Materialization

Some operations require materializing or fully consuming the data, even in lazy mode:

Operation	Why Materialization Is Needed
`sort()`	Must see all elements to sort
`distinct()`	Must track seen elements
`count()`	Must count all elements
`reduce()`	Must process all elements
`group_by_key()`	Must collect all groups
Indexing `[i]`	Requires random access
`len()`	Needs total count

lazy = ListMapper[int](3, 1, 2).lazy()

# These force materialization:
sorted_lm = lazy.sort()        # Returns ListMapper (eager)
count = lazy.count()           # Processes all
unique = lazy.distinct()       # Returns ListMapper (eager)
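
These operations differ in how much they need to hold while they run. The plain-Python sketch below (standard library only, not functional_list) contrasts operations that need just a running value, one that needs a growing set of seen items, and one that needs every element at once. `make_stream` and `distinct` are illustrative helpers.

```python
from functools import reduce


def make_stream():
    """Return a fresh single-use generator over the sample data."""
    return (x for x in [3, 1, 2, 3, 1])


# Counting and reducing read every element but keep only a running value.
count = sum(1 for _ in make_stream())
total = reduce(lambda acc, x: acc + x, make_stream(), 0)


def distinct(iterable):
    """Yield each item the first time it appears, remembering what was seen."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


unique = list(distinct(make_stream()))

# Sorting cannot emit its first result until it has seen the last input.
ordered = sorted(make_stream())

print(count, total)  # 5 10
print(unique)        # [3, 1, 2]
print(ordered)       # [1, 1, 2, 3, 3]
```

How functional_list implements each of these is not covered here; the sketch only shows why they cannot finish without reading the whole input.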

💡 Best Practices¶

Start Lazy for Large Data

Begin with lazy mode when dealing with large datasets, then materialize when the data is reduced:

result = (
    large_dataset
    .lazy()
    .filter(...)      # Reduces data
    .filter(...)      # Reduces more
    .collect()        # Now small enough for eager
    .sort()           # Use eager operations
)

Use Eager for Debugging

When debugging, convert to eager so you can inspect the results:

# During development
intermediate = pipeline.lazy().filter(...).map(...).collect()
print(intermediate[:10])  # Inspect results
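
When materializing is not practical, another general technique is to wrap a lazy stage in a pass-through generator that records what flows through it. The `tap` helper below is a hypothetical plain-Python sketch, not part of functional_list.

```python
def tap(iterable, label, log):
    """Yield items unchanged while recording each one under the given label."""
    for item in iterable:
        log.append((label, item))
        yield item


log = []
numbers = tap(range(1, 5), "input", log)
squared = tap((x * x for x in numbers), "squared", log)
result = [x for x in squared if x > 5]

print(result)   # [9, 16]
print(log[:2])  # [('input', 1), ('squared', 1)]
```

The log interleaves the stages because each element travels through the whole pipeline before the next one is read.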

Watch Out for Multiple Iterations

Lazy generators can only be consumed once:

lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)
list1 = lazy.to_list()  # [2, 4, 6]
list2 = lazy.to_list()  # [] - generator exhausted!

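# Solution: build the pipeline again and materialize it once, before reading from it
eager = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2).collect()
list1 = eager.to_list()  # [2, 4, 6]
list2 = eager.to_list()  # [2, 4, 6] - works!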
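
The same pitfall exists in plain Python, which makes it easy to reproduce without the library. This sketch uses only built-ins; `make_doubled` is an illustrative helper showing a second remedy, rebuilding the generator for each pass.

```python
doubled = (x * 2 for x in [1, 2, 3])
first_pass = list(doubled)    # consumes the generator
second_pass = list(doubled)   # nothing left to read

# Remedy 1: materialize once and reuse the list.
materialized = [x * 2 for x in [1, 2, 3]]


# Remedy 2: rebuild the generator each time it is needed.
def make_doubled():
    return (x * 2 for x in [1, 2, 3])


print(first_pass)            # [2, 4, 6]
print(second_pass)           # []
print(list(make_doubled()))  # [2, 4, 6]
print(list(make_doubled()))  # [2, 4, 6]
```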

🎓 Summary

- Use Eager for small/medium data, debugging, and when you need multiple iterations or random access
- Use Lazy for large datasets, streaming, and when memory efficiency matters
- You can switch modes easily with .lazy() and .collect()
- Some operations force materialization regardless of mode
- Hybrid approaches work great: filter lazily, then switch to eager

Choose the mode that best fits your data size and access patterns!