functional_list Documentation
Welcome to the documentation for functional_list, a Python library that brings functional programming paradigms and Apache Spark RDD-style transformations to Python lists.
🎯 What is functional_list?
functional_list provides both eager (ListMapper) and lazy (LazyListMapper) execution modes for writing elegant, chainable data transformations in Python. Think of it as bringing the best parts of Spark RDD operations to your local Python environment.
Key Features at a Glance
- 🔗 Rich functional API: map, filter, reduce, flat_map, reduce_by_key, group_by, and more
- ⚡ Multiple execution backends: Serial, Local (threads/processes), Async, Ray, Dask
- 💤 Lazy evaluation: Build efficient transformation pipelines
- 📁 Built-in I/O: CSV, JSON, JSONL, Parquet, and text file support
- 🚀 Performance: Optional Cython-accelerated operations
- 🐍 Type-safe: Full type hints for IDE support
- 📦 Modular: Zero required dependencies, install only what you need
🚀 Quick Start
Installation
# Basic installation
pip install functional-list
# With all optional features (Ray, Dask, Parquet support)
pip install "functional-list[all]"  # quoted so shells such as zsh do not expand the brackets
# Or use uv
uv add functional-list
Your First Pipeline
from functional_list import ListMapper
# Create a ListMapper
numbers = ListMapper[int](1, 2, 3, 4, 5)
# Chain transformations
result = (
    numbers
    .map(lambda x: x * x)          # Square each number
    .filter(lambda x: x % 2 == 0)  # Keep only even results
    .reduce(lambda x, y: x + y)    # Sum them up
)
print(result) # Output: 20
Classic Word Count Example
The famous MapReduce word count, made simple:
from functional_list import ListMapper
# Input documents
lines = ListMapper[str](
    "python is good",
    "python is better than x",
    "python is the best",
)
# Word count pipeline
word_counts = (
    lines
    .flat_map(lambda s: s.split())      # Split into words
    .map(lambda w: (w, 1))              # Create (word, 1) pairs
    .reduce_by_key(lambda x, y: x + y)  # Sum counts by key
)
print(word_counts)
# Output: [('than', 1), ('the', 1), ('best', 1), ('better', 1),
# ('good', 1), ('is', 3), ('python', 3), ('x', 1)]
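For readers new to Spark-style key reduction, the same word count can be approximated with the standard library alone. This is a minimal sketch of the semantics, not functional_list's implementation; the helper name `reduce_by_key_sketch` is hypothetical and exists only for this illustration.

```python
from itertools import chain
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key_sketch(
    pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]
) -> List[Tuple[K, V]]:
    """Hypothetical helper: fold values that share a key with `func`."""
    acc: Dict[K, V] = {}
    for key, value in pairs:
        # The first occurrence seeds the accumulator; later ones are folded in.
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())


lines = ["python is good", "python is better than x", "python is the best"]
words = chain.from_iterable(line.split() for line in lines)  # flat_map step
pairs = ((word, 1) for word in words)                        # map step
word_counts = reduce_by_key_sketch(pairs, lambda x, y: x + y)

print(sorted(word_counts))
# [('best', 1), ('better', 1), ('good', 1), ('is', 3),
#  ('python', 3), ('than', 1), ('the', 1), ('x', 1)]
```

The result is sorted before printing because the order of keys in a reduce-by-key result is best treated as unspecified.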
📚 Core Concepts
Eager vs Lazy Execution
Eager mode (ListMapper) materializes results immediately:
result = ListMapper[int](1, 2, 3).map(lambda x: x * 2)
print(result) # List[2, 4, 6] - already computed
Lazy mode (LazyListMapper) defers computation:
lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)
# Nothing computed yet!
result = lazy.collect() # Now it executes
print(result) # List[2, 4, 6]
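The deferral described above can be illustrated with an ordinary Python generator. This is only an analogy under the assumption that lazy mode behaves like a generator pipeline; how LazyListMapper is built internally is not covered here, and `lazy_map` and `double` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List

calls: List[int] = []  # records each element the mapping function touches


def double(x: int) -> int:
    calls.append(x)
    return x * 2


def lazy_map(func: Callable[[int], int], items: Iterable[int]) -> Iterator[int]:
    """Generator-based map: runs `func` only as results are consumed."""
    for item in items:
        yield func(item)


pipeline = lazy_map(double, [1, 2, 3])  # builds the pipeline only
calls_before_collect = list(calls)      # still empty: nothing has run

result = list(pipeline)                 # plays the role of collect()
print(calls_before_collect, result)     # [] [2, 4, 6]
```

Nothing calls `double` until the pipeline is consumed, which is why building long chains in lazy mode is cheap.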
Execution Backends
Choose the right backend for your workload:
| Backend | Best For | Example Use Case |
|---|---|---|
| Serial | Small data, debugging | Quick scripts, development |
| Local (threads) | I/O-bound tasks | HTTP requests, file operations |
| Local (processes) | CPU-bound tasks | Heavy computation, data processing |
| Async | Async I/O | Concurrent API calls |
| Ray | Distributed computing | Large-scale data processing |
| Dask | Distributed computing | Parallel workflows, big data |
from functional_list import ListMapper, LocalBackend
# `data` and `fetch_url` are placeholders: an existing ListMapper and your own function
# Use threading for I/O-bound work
result = data.map(
    fetch_url,
    backend=LocalBackend(mode="threads", workers=10),
)
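To see why threads suit I/O-bound work, here is a standard-library sketch using `concurrent.futures`. It does not use functional_list, and whether LocalBackend relies on a thread pool internally is an assumption; `fake_fetch` is a hypothetical stand-in that sleeps instead of making a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request: sleeps instead of touching the network."""
    time.sleep(0.05)
    return f"body of {url}"


urls: List[str] = [f"https://example.com/page/{i}" for i in range(10)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    # Executor.map returns results in input order, like a list map would.
    pages = list(pool.map(fake_fetch, urls))
elapsed = time.perf_counter() - start

print(len(pages), pages[0])
print(f"threaded: {elapsed:.2f}s (a serial loop would need about 0.50s)")
```

While one thread waits on I/O, the others proceed, so the waits overlap. CPU-bound functions do not benefit in the same way under CPython's global interpreter lock, which is the usual reason to prefer processes for them.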
File I/O
Read data directly from various formats:
from functional_list import ListMapper
# CSV files
users = ListMapper.from_csv("users.csv")
# JSON files
data = ListMapper.from_json("data.json")
# Parquet files (requires pyarrow)
records = ListMapper.from_parquet("data.parquet", columns=["id", "name"])
# Process the data
adults = users.filter(lambda u: u["age"] >= 18)
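The filter above compares `u["age"]` with an integer, which works only if the loaded rows hold numeric values. Whether `from_csv` converts column types is not stated on this page. If it behaves like the standard library's `csv.DictReader`, every field arrives as a string and needs an explicit conversion. A stdlib sketch of that case, with made-up inline data in place of users.csv:

```python
import csv
import io

# Inline stand-in for users.csv so the example is self-contained.
raw = io.StringIO("name,age\nAda,36\nTim,12\n")

rows = list(csv.DictReader(raw))
# csv.DictReader yields every field as a string: {'name': 'Ada', 'age': '36'}
adults = [row for row in rows if int(row["age"]) >= 18]

print([row["name"] for row in adults])  # ['Ada']
```

If the library already returns integers for numeric columns, the `int(...)` call is harmless but unnecessary.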
📖 Navigation Guide
This documentation is organized to help you learn progressively:
- Quickstart - Get up and running in 5 minutes
- Concepts - Understand the core ideas
  - Backends - Execution backend concepts
  - Eager vs Lazy - Choosing the right mode
- Guides - Detailed how-to guides
  - Backends Guide - Using different backends
  - Lazy Pipelines - Working with lazy evaluation
  - I/O Operations - Reading and writing files
- API Reference - Complete API documentation
  - ListMapper - Eager mode API
  - LazyListMapper - Lazy mode API
  - Backends - Backend API reference
- Benchmarks - Performance comparisons
💡 Common Use Cases
Data Processing Pipeline
from functional_list import ListMapper
# ETL pipeline (`transform` stands for your own row-level function)
processed = (
    ListMapper.from_csv("raw_data.csv")
    .filter(lambda row: row["status"] == "active")
    .map(transform)
    .distinct()
)
processed.to_json("processed.json")
Parallel Web Scraping
from functional_list import ListMapper, LocalBackend
# `url_list` (a list of URL strings) and `fetch_page` are defined by you
urls = ListMapper[str](*url_list)
pages = urls.map(
    fetch_page,
    backend=LocalBackend(mode="threads", workers=20),
)
Log Analysis
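from functional_list import ListMapper
# `parse_log_line` is your own parser; it must return a dict with an "error_type" key
errors = (
    ListMapper.from_text("app.log")
    .filter(lambda line: "ERROR" in line)
    .map(parse_log_line)
    .group_by(lambda e: e["error_type"])
)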
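The filter, parse, and group steps above can be traced with the standard library. This sketch assumes a made-up log layout of `<date> <time> <level> <error_type> <message>`; the sample lines and the `parse_log_line` body are hypothetical, and the exact shape of what functional_list's `group_by` returns is not shown here.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up sample lines standing in for app.log.
log_lines = [
    "2024-01-01 12:00:00 ERROR TimeoutError upstream timed out",
    "2024-01-01 12:00:01 INFO request served",
    "2024-01-01 12:00:02 ERROR ValueError bad payload",
    "2024-01-01 12:00:03 ERROR TimeoutError upstream timed out",
]


def parse_log_line(line: str) -> Dict[str, str]:
    """Hypothetical parser for '<date> <time> <level> <error_type> <message>'."""
    date, clock, level, error_type, message = line.split(" ", 4)
    return {
        "timestamp": f"{date} {clock}",
        "level": level,
        "error_type": error_type,
        "message": message,
    }


grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
for line in log_lines:
    if "ERROR" in line:                             # filter step
        entry = parse_log_line(line)                # map step
        grouped[entry["error_type"]].append(entry)  # group_by step

print({key: len(entries) for key, entries in grouped.items()})
# {'TimeoutError': 2, 'ValueError': 1}
```

Filtering before parsing keeps the parser from having to handle lines that do not follow the error layout.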
🆘 Getting Help
- Found a bug? Report it on GitLab
- Have a question? Check the guides or API reference
- Want to contribute? See the GitLab repository
🔗 Quick Links
Ready to get started? Head over to the Quickstart guide!