functional_list Documentation

Welcome to the documentation for functional_list, a Python library that brings functional programming patterns and Apache Spark RDD-style transformations to Python lists.

🎯 What is functional_list?

functional_list provides both eager (ListMapper) and lazy (LazyListMapper) execution modes for writing elegant, chainable data transformations in Python. Think of it as bringing the best parts of Spark RDD operations to your local Python environment.

Key Features at a Glance

  • 🔗 Rich functional API: map, filter, reduce, flat_map, reduce_by_key, group_by, and more
  • ⚡ Multiple execution backends: Serial, Local (threads/processes), Async, Ray, Dask
  • 💤 Lazy evaluation: Build transformation pipelines that run only when you ask for the result
  • 📁 Built-in I/O: CSV, JSON, JSONL, Parquet, and text file support
  • 🚀 Performance: Optional Cython-accelerated operations
  • 🐍 Type-safe: Full type hints for IDE support
  • 📦 Modular: Zero required dependencies, install only what you need

🚀 Quick Start

Installation

# Basic installation
pip install functional-list

# With all optional features (Ray, Dask, Parquet support)
# Quote the extra so shells such as zsh do not expand the brackets
pip install "functional-list[all]"

# Or use uv
uv add functional-list

Your First Pipeline

from functional_list import ListMapper

# Create a ListMapper
numbers = ListMapper[int](1, 2, 3, 4, 5)

# Chain transformations
result = (
    numbers
    .map(lambda x: x * x)           # Square each number
    .filter(lambda x: x % 2 == 0)   # Keep only even results
    .reduce(lambda x, y: x + y)     # Sum them up
)

print(result)  # Output: 20
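
For comparison, the same computation can be written with only the standard library. This is a plain-Python sketch of what each step of the chain computes; it is not a description of how functional_list implements these methods.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: square each number
squares = [x * x for x in numbers]

# filter: keep only the even squares
even_squares = [x for x in squares if x % 2 == 0]

# reduce: fold the remaining values into a single sum
total = reduce(lambda x, y: x + y, even_squares)

print(even_squares)  # [4, 16]
print(total)         # 20
```

The chained form expresses the same three steps without the intermediate variable names.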

Classic Word Count Example

The famous MapReduce word count, made simple:

from functional_list import ListMapper

# Input documents
lines = ListMapper[str](
    "python is good",
    "python is better than x",
    "python is the best",
)

# Word count pipeline
word_counts = (
    lines
    .flat_map(lambda s: s.split())      # Split into words
    .map(lambda w: (w, 1))              # Create (word, 1) pairs
    .reduce_by_key(lambda x, y: x + y)  # Sum counts by key
)

print(word_counts)
# Output: [('than', 1), ('the', 1), ('best', 1), ('better', 1),
#          ('good', 1), ('is', 3), ('python', 3), ('x', 1)]
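
To make the last step concrete, here is a minimal standard-library sketch of reduce-by-key semantics: values that share a key are merged pairwise with the supplied function. The helper name `reduce_by_key_sketch` is hypothetical and only illustrates the idea; the library's own implementation and the order of its output may differ.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key_sketch(
    pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]
) -> List[Tuple[K, V]]:
    """Merge the values of equal keys pairwise with `func` (illustrative only)."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        if key in merged:
            merged[key] = func(merged[key], value)  # combine with the running value
        else:
            merged[key] = value  # first time this key is seen
    return list(merged.items())


lines = ["python is good", "python is better than x", "python is the best"]

# flat_map + map: one (word, 1) pair per word across all lines
pairs = [(word, 1) for line in lines for word in line.split()]

counts = dict(reduce_by_key_sketch(pairs, lambda x, y: x + y))
print(counts["python"], counts["is"], counts["x"])  # 3 3 1
```

If you need a stable order for display or tests, sort the pairs explicitly rather than relying on the order in which keys come back.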

📚 Core Concepts

Eager vs Lazy Execution

Eager mode (ListMapper) materializes results immediately:

result = ListMapper[int](1, 2, 3).map(lambda x: x * 2)
print(result)  # List[2, 4, 6] - already computed

Lazy mode (LazyListMapper) defers computation:

lazy = ListMapper[int](1, 2, 3).lazy().map(lambda x: x * 2)
# Nothing computed yet!

result = lazy.collect()  # Now it executes
print(result)  # List[2, 4, 6]
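
The difference is easy to observe with plain Python, where a list comprehension is eager and a generator expression is lazy. The sketch below uses only the standard library and records each call to show when work actually happens; it illustrates the concept and makes no claim about how LazyListMapper is built internally.

```python
from typing import List

calls: List[int] = []


def double(x: int) -> int:
    calls.append(x)  # record that the function actually ran
    return x * 2


# Eager: the list comprehension calls double() right away
eager = [double(x) for x in [1, 2, 3]]
calls_after_eager = list(calls)

calls.clear()

# Lazy: the generator expression only describes the work
lazy = (double(x) for x in [1, 2, 3])
calls_before_collect = list(calls)

# Consuming the generator is what triggers the calls
result = list(lazy)

print(calls_after_eager)     # [1, 2, 3]
print(calls_before_collect)  # []
print(result)                # [2, 4, 6]
```

Deferring work this way pays off mainly when a pipeline has several steps or when only part of the result is ever consumed.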

Execution Backends

Choose the right backend for your workload:

Backend            Best For               Example Use Case
-----------------  ---------------------  ----------------------------------
Serial             Small data, debugging  Quick scripts, development
Local (threads)    I/O-bound tasks        HTTP requests, file operations
Local (processes)  CPU-bound tasks        Heavy computation, data processing
Async              Async I/O              Concurrent API calls
Ray                Distributed computing  Large-scale data processing
Dask               Distributed computing  Parallel workflows, big data

from functional_list import ListMapper, LocalBackend

# Use threading for I/O-bound work
# (`data` is an existing ListMapper; `fetch_url` is your own function)
result = data.map(
    fetch_url,
    backend=LocalBackend(mode="threads", workers=10)
)
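
To see why threads suit I/O-bound work, the following standard-library sketch maps a function that mostly waits over a list using a thread pool. It is an analogy only: whether LocalBackend is built on `concurrent.futures` is an assumption, and `slow_io` is a hypothetical stand-in that simulates a network call with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def slow_io(item: int) -> int:
    """Hypothetical I/O-bound task: the sleep stands in for network latency."""
    time.sleep(0.05)
    return item * 10


items = list(range(8))

# While one thread waits on "I/O", the others make progress,
# so the waits overlap instead of adding up.
with ThreadPoolExecutor(max_workers=8) as pool:
    results: List[int] = list(pool.map(slow_io, items))  # input order is preserved

print(results)  # [0, 10, 20, 30, 40, 50, 60, 70]
```

For CPU-bound functions, threads in CPython generally do not speed things up because of the global interpreter lock, which is why the table recommends processes for heavy computation.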

File I/O

Read data directly from various formats:

from functional_list import ListMapper

# CSV files
users = ListMapper.from_csv("users.csv")

# JSON files
data = ListMapper.from_json("data.json")

# Parquet files (requires pyarrow)
records = ListMapper.from_parquet("data.parquet", columns=["id", "name"])

# Process the data
adults = users.filter(lambda u: u["age"] >= 18)
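
The filter above compares `u["age"]` with an integer, which only works if the loader yields numeric ages. Whether `from_csv` infers column types is not stated here, so it is worth checking what your rows contain. As a point of reference, the standard library's `csv.DictReader` returns every value as a string, as this small sketch with made-up sample data shows:

```python
import csv
import io

# Made-up sample data standing in for users.csv
csv_text = "name,age\nAda,36\nTim,12\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(type(rows[0]["age"]).__name__)  # str

# Convert explicitly before comparing; "36" >= 18 would raise TypeError
adults = [row for row in rows if int(row["age"]) >= 18]
print([row["name"] for row in adults])  # ['Ada']
```

If your rows do arrive as strings, the same explicit conversion can go inside the lambda passed to `filter`.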

📖 Documentation Structure

This documentation is organized to help you learn progressively:

  1. Quickstart - Get up and running in 5 minutes
  2. Concepts - Understand the core ideas
       • Backends - Execution backend concepts
       • Eager vs Lazy - Choosing the right mode
  3. Guides - Detailed how-to guides
       • Backends Guide - Using different backends
       • Lazy Pipelines - Working with lazy evaluation
       • I/O Operations - Reading and writing files
  4. API Reference - Complete API documentation
       • ListMapper - Eager mode API
       • LazyListMapper - Lazy mode API
       • Backends - Backend API reference
  5. Benchmarks - Performance comparisons

💡 Common Use Cases

Data Processing Pipeline

from functional_list import ListMapper

# ETL pipeline
processed = (
    ListMapper.from_csv("raw_data.csv")
    .filter(lambda row: row["status"] == "active")
    .map(transform)  # transform: your own row-to-row function
    .distinct()
)
processed.to_json("processed.json")

Parallel Web Scraping

from functional_list import ListMapper, LocalBackend

urls = ListMapper[str](*url_list)
pages = urls.map(
    fetch_page,
    backend=LocalBackend(mode="threads", workers=20)
)
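
The snippet above leaves `fetch_page` undefined. One possible shape for it, using only the standard library, is sketched below. The function body, the `timeout` parameter, and the choice to return `None` on failure are all assumptions for illustration; the point is that a single bad URL should not abort the whole parallel map.

```python
from typing import Optional
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page body as text, or None if the request fails (illustrative)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and socket timeouts;
        # ValueError covers malformed URLs.
        return None


print(fetch_page("not-a-url"))  # None
```

Downstream steps can then drop the failures with a filter such as `lambda page: page is not None`.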

Log Analysis

errors = (
    ListMapper.from_text("app.log")
    .filter(lambda line: "ERROR" in line)
    .map(parse_log_line)
    .group_by(lambda e: e["error_type"])
)
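
`parse_log_line` is not defined above, so here is one way it might look, together with a plain-Python version of the grouping step. The log format, the regular expression, and the returned field names other than `error_type` are hypothetical; adapt them to your own logs.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical format: "<date> <time> ERROR <ErrorType>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) ERROR (?P<error_type>\w+): (?P<message>.*)$"
)


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one ERROR line into fields; unmatched lines are tagged 'Unparsed'."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return {"timestamp": "", "error_type": "Unparsed", "message": line.strip()}
    return match.groupdict()


log_lines = [
    "2024-01-01 10:00:00 INFO started",
    "2024-01-01 10:00:01 ERROR TimeoutError: upstream did not respond",
    "2024-01-01 10:00:02 ERROR ValueError: bad payload",
    "2024-01-01 10:00:03 ERROR TimeoutError: upstream did not respond",
]

# filter + map + group_by, written out by hand
grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
for line in log_lines:
    if "ERROR" in line:
        entry = parse_log_line(line)
        grouped[entry["error_type"]].append(entry)

print({error_type: len(entries) for error_type, entries in grouped.items()})
# {'TimeoutError': 2, 'ValueError': 1}
```

Returning a tagged fallback instead of raising keeps one malformed line from stopping the analysis of the rest of the file.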

🆘 Getting Help


Ready to get started? Head over to the Quickstart guide!