functional_list Documentation

Welcome to the documentation for functional_list, a Python library that brings functional programming idioms and Apache Spark RDD-style transformations to Python lists.

🎯 What is functional_list?

functional_list provides both eager (ListMapper) and lazy (LazyListMapper) execution modes for writing elegant, chainable data transformations in Python. Think of it as bringing the best parts of Spark RDD operations to your local Python environment.

Key Features at a Glance

🔗 Rich functional API: map, filter, reduce, flat_map, reduce_by_key, group_by, and more
⚡ Multiple execution backends: Serial, Local (threads/processes), Async, Ray, Dask
💤 Lazy evaluation: Build transformation pipelines that run only when you call collect()
📁 Built-in I/O: CSV, JSON, JSONL, Parquet, and text file support
🚀 Performance: Optional Cython-accelerated operations
🐍 Typed: Full type hints for IDE support
📦 Modular: Zero required dependencies; install only what you need

🚀 Quick Start

Installation

# Basic installation
pip install functional-list

# With all optional features (Ray, Dask, Parquet support)
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "functional-list[all]"

# Or use uv
uv add functional-list

Your First Pipeline

from functional_list import ListMapper

# Create a ListMapper
numbers = ListMapper[int](1, 2, 3, 4, 5)

# Chain transformations
result = (
    numbers
    .map(lambda x: x * x)           # Square each number
    .filter(lambda x: x % 2 == 0)   # Keep only even results
    .reduce(lambda x, y: x + y)     # Sum them up
)

print(result)  # Output: 20
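
For readers new to the chaining style, the pipeline above computes the same value as the standard-library code below. This is only a plain-Python sketch of what the chain means, not a description of how functional_list is implemented.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Each step mirrors one call in the chain above.
squares = map(lambda x: x * x, numbers)                # 1, 4, 9, 16, 25
even_squares = filter(lambda x: x % 2 == 0, squares)   # 4, 16
result = reduce(lambda x, y: x + y, even_squares)      # 4 + 16

print(result)  # 20
```

The chained form reads top to bottom in execution order, whereas nested built-in calls would have to be read inside out.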

Classic Word Count Example

The famous MapReduce word count, made simple:

from functional_list import ListMapper

# Input documents
lines = ListMapper[str](
    "python is good",
    "python is better than x",
    "python is the best",
)

# Word count pipeline
word_counts = (
    lines
    .flat_map(lambda s: s.split())      # Split into words
    .map(lambda w: (w, 1))              # Create (word, 1) pairs
    .reduce_by_key(lambda x, y: x + y)  # Sum counts by key
)

print(word_counts)
# Output (pair order may differ):
#   [('than', 1), ('the', 1), ('best', 1), ('better', 1),
#    ('good', 1), ('is', 3), ('python', 3), ('x', 1)]
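
To make the reduce_by_key step concrete, here is a minimal standard-library sketch of the same word count. The helper name `reduce_by_key_sketch` is hypothetical and exists only for this illustration; it shows the general idea of merging values that share a key, and is not the library's implementation.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key_sketch(
    pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]
) -> List[Tuple[K, V]]:
    """Combine the values of all pairs that share a key using func."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        # First occurrence stores the value; later ones are folded in with func.
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


lines = [
    "python is good",
    "python is better than x",
    "python is the best",
]

words = [word for line in lines for word in line.split()]  # flat_map step
pairs = [(word, 1) for word in words]                      # map step
word_counts = reduce_by_key_sketch(pairs, lambda x, y: x + y)

print(sorted(word_counts))
# [('best', 1), ('better', 1), ('good', 1), ('is', 3),
#  ('python', 3), ('than', 1), ('the', 1), ('x', 1)]
```

Sorting the pairs before printing gives a stable order for comparison, which is useful when the grouping step does not guarantee one.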

📚 Core Concepts

Eager vs Lazy Execution

Eager mode (ListMapper) materializes results immediately:

result = ListMapper[int](1, 2, 3).map(lambda x: x * 2)
print(result)  # List[2, 4, 6] - already computed

Lazy mode (LazyListMapper) defers computation:

lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)
# Nothing computed yet!

result = lazy.collect()  # Now it executes
print(result)  # List[2, 4, 6]
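
The difference between the two modes can be observed with plain Python: a list comprehension is eager, while a generator expression is lazy. The sketch below uses only built-ins and records each call in a `calls` list (a name introduced here for illustration) to show when the work actually happens. It is an analogy for the two modes, not the library's internals.

```python
calls = []


def double(x):
    calls.append(x)  # record each invocation so we can see when work happens
    return x * 2


# Eager: the list comprehension runs double() immediately.
eager = [double(x) for x in [1, 2, 3]]
calls_after_eager = list(calls)      # [1, 2, 3]

calls.clear()

# Lazy: creating the generator does not call double() at all.
lazy = (double(x) for x in [1, 2, 3])
calls_before_collect = list(calls)   # []

# Consuming the generator is the analogue of collect().
collected = list(lazy)
calls_after_collect = list(calls)    # [1, 2, 3]

print(eager, collected)  # [2, 4, 6] [2, 4, 6]
```

Both modes produce the same values; they differ only in when the function is invoked.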

Execution Backends

Choose the right backend for your workload:

Backend	Best For	Example Use Case
Serial	Small data, debugging	Quick scripts, development
Local (threads)	I/O-bound tasks	HTTP requests, file operations
Local (processes)	CPU-bound tasks	Heavy computation, data processing
Async	Async I/O	Concurrent API calls
Ray	Distributed computing	Large-scale data processing
Dask	Distributed computing	Parallel workflows, big data

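from urllib.request import urlopen
from functional_list import ListMapper, LocalBackend

def fetch_url(url):  # illustrative helper, not part of functional_list
    with urlopen(url, timeout=10) as response:
        return response.read()

data = ListMapper[str]("https://example.com", "https://example.org")

# Use threading for I/O-bound work
result = data.map(
    fetch_url,
    backend=LocalBackend(mode="threads", workers=10)
)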
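
To see why threads suit I/O-bound work, the following standard-library sketch compares a serial loop with a thread pool. `slow_io` is a hypothetical stand-in that sleeps instead of making a network call, and `concurrent.futures` is used here purely for illustration; this page does not say how the library's backends are implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_io(item: int) -> int:
    """Stand-in for an I/O-bound call such as an HTTP request."""
    time.sleep(0.05)  # the thread waits here without using the CPU
    return item * 10


items = list(range(8))

# Serial: each call waits for the previous one to finish.
start = time.perf_counter()
serial_result = [slow_io(i) for i in items]
serial_seconds = time.perf_counter() - start

# Threaded: the waits overlap, so total time is close to a single call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded_result = list(pool.map(slow_io, items))  # pool.map keeps input order
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f}s")
print(f"threaded: {threaded_seconds:.2f}s")
```

For CPU-bound functions, threads in CPython generally do not give this speed-up because of the global interpreter lock, which is why the table recommends processes for heavy computation.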

File I/O

Read data directly from various formats:

from functional_list import ListMapper

# CSV files
users = ListMapper.from_csv("users.csv")

# JSON files
data = ListMapper.from_json("data.json")

# Parquet files (requires pyarrow)
records = ListMapper.from_parquet("data.parquet", columns=["id", "name"])

# Process the data
adults = users.filter(lambda u: u["age"] >= 18)
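
The filter above compares u["age"] with an integer, which works only if the loader yields numeric values. Whether from_csv converts column types is not stated on this page. As a point of comparison, the standard library's csv.DictReader returns every field as a string, so an explicit cast is needed. A minimal sketch with hypothetical sample data:

```python
import csv
import io

# Hypothetical in-memory CSV standing in for users.csv.
csv_text = "name,age\nAda,36\nTim,12\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Every value is a str here, e.g. {"name": "Ada", "age": "36"}.

# Cast before comparing; "36" >= 18 would raise TypeError.
adults = [row for row in rows if int(row["age"]) >= 18]

print(adults)  # [{'name': 'Ada', 'age': '36'}]
```

If your rows arrive as strings, casting inside the predicate (or in a preceding map step) keeps the comparison valid.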

This documentation is organized to help you learn progressively:

Quickstart - Get up and running in 5 minutes
Concepts - Understand the core ideas
    Backends - Execution backend concepts
    Eager vs Lazy - Choosing the right mode
Guides - Detailed how-to guides
    Backends Guide - Using different backends
    Lazy Pipelines - Working with lazy evaluation
    I/O Operations - Reading and writing files
API Reference - Complete API documentation
    ListMapper - Eager mode API
    LazyListMapper - Lazy mode API
    Backends - Backend API reference
Benchmarks - Performance comparisons

💡 Common Use Cases

Data Processing Pipeline

from functional_list import ListMapper

# ETL pipeline
# `transform` is a placeholder for your own row-to-row function
processed = (
    ListMapper.from_csv("raw_data.csv")
    .filter(lambda row: row["status"] == "active")
    .map(transform)
    .distinct()
)
processed.to_json("processed.json")
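
The same filter, transform, and de-duplicate steps can be sketched with the standard library. The sample rows, the `transform` body, and the `distinct_dicts` helper are all hypothetical and exist only for this illustration. One design point worth noting: dictionaries are not hashable, so a plain-Python de-duplication has to derive a hashable key from each row; how the library's distinct() treats dictionary rows is not covered on this page.

```python
import json

# Hypothetical rows standing in for the contents of raw_data.csv.
raw_rows = [
    {"id": "1", "status": "active", "name": " Ada "},
    {"id": "2", "status": "inactive", "name": "Bob"},
    {"id": "1", "status": "active", "name": " Ada "},  # exact duplicate
]


def transform(row):
    """Hypothetical cleanup: cast the id and trim whitespace from the name."""
    return {"id": int(row["id"]), "name": row["name"].strip()}


def distinct_dicts(rows):
    """Drop duplicate dicts, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable stand-in for the dict
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


active = [row for row in raw_rows if row["status"] == "active"]
processed = distinct_dicts([transform(row) for row in active])

print(json.dumps(processed))  # [{"id": 1, "name": "Ada"}]
```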

Parallel Web Scraping

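from functional_list import ListMapper, LocalBackend

# `url_list` (a list of URL strings) and `fetch_page` (a function that
# downloads one URL) are placeholders you supply yourself.
urls = ListMapper[str](*url_list)
pages = urls.map(
    fetch_page,
    backend=LocalBackend(mode="threads", workers=20)
)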

Log Analysis

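from functional_list import ListMapper

# `parse_log_line` is a placeholder for your own parser; it should
# return a dict that includes an "error_type" key.
errors = (
    ListMapper.from_text("app.log")
    .filter(lambda line: "ERROR" in line)
    .map(parse_log_line)
    .group_by(lambda e: e["error_type"])
)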
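
The pipeline above leaves parse_log_line undefined. The sketch below shows one possible parser and the grouping it feeds, using only the standard library. The log format ("LEVEL ErrorType: message"), the regular expression, and the sample lines are assumptions made for this example; adapt them to your own log layout.

```python
import re
from collections import defaultdict

# Assumed line format: "<LEVEL> <ErrorType>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<level>[A-Z]+) (?P<error_type>\w+): (?P<message>.*)$"
)


def parse_log_line(line):
    """Parse one log line into a dict with level, error_type, and message."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return match.groupdict()


# Hypothetical sample lines standing in for app.log.
log_lines = [
    "INFO Startup: service ready",
    "ERROR Timeout: upstream did not respond",
    "ERROR ValueError: bad payload",
    "ERROR Timeout: retry budget exhausted",
]

# Filter, parse, and group by error type.
errors = defaultdict(list)
for line in log_lines:
    if "ERROR" in line:
        entry = parse_log_line(line)
        errors[entry["error_type"]].append(entry)

counts = {error_type: len(entries) for error_type, entries in errors.items()}
print(counts)  # {'Timeout': 2, 'ValueError': 1}
```

Note that the substring test "ERROR" in line also matches lines that merely mention the word in their message; matching on the parsed level field is stricter if that matters for your logs.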

🆘 Getting Help

Found a bug? Report it on GitLab
Have a question? Check the guides or API reference
Want to contribute? See the GitLab repository

🔗 Quick Links

Ready to get started? Head over to the Quickstart guide!