Processing Mode Configuration
Overview
Control whether your pipeline uses Pandas or Dask for data processing at the job level.
Configuration Options
1. Auto Mode (Default) - Smart Switching
processing_mode = 'auto'
Behavior:
- Small datasets (< 10,000 rows): Uses Pandas
- Large datasets (≥ 10,000 rows): Uses Dask
- Recommended for most use cases
2. Force Dask Mode
processing_mode = 'dask'
Behavior:
- ALL operations use Dask, regardless of data size
- Even small datasets (100 rows) will use Dask
- Enables distributed processing for everything
When to use:
- Testing distributed processing
- Ensuring consistent behavior across all data sizes
3. Force Pandas Mode
processing_mode = 'pandas'
Behavior:
- ALL operations use Pandas, regardless of data size
- No distributed processing
When to use:
- Debugging issues
- When you need deterministic single-threaded execution
How to Set Processing Mode
Method 1: Pipeline Configuration (JSON)
{
  "processing_mode": "dask",
  "nodes": [...],
  "edges": [...]
}
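A minimal sketch of how a pipeline file carrying this key might be read and checked before a run. The helper name `load_processing_mode` and the fallback to 'auto' when the key is missing are assumptions for illustration, not part of the documented API.

```python
import json

VALID_MODES = {"auto", "dask", "pandas"}  # the three modes described above


def load_processing_mode(raw_json: str) -> str:
    """Return the processing mode from a pipeline JSON string (hypothetical helper)."""
    config = json.loads(raw_json)
    # Assumption: a missing key falls back to the documented default, 'auto'.
    mode = config.get("processing_mode", "auto")
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown processing_mode: {mode!r}")
    return mode


example = '{"processing_mode": "dask", "nodes": [], "edges": []}'
print(load_processing_mode(example))  # dask
```

Rejecting unknown values early keeps a typo such as "dsak" from silently falling through to another mode.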
Method 2: Python API
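import asyncio
from pipeline_runner import PipelineRunner
async def main():
    runner = PipelineRunner(
        config=pipeline_config,  # your pipeline definition, e.g. the JSON above
        processing_mode='dask',
    )
    await runner.run_async()
asyncio.run(main())  # await is only valid inside a coroutine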
Method 3: Programmatic Control
from dataframe_adapter import DataFrameAdapter
DataFrameAdapter.set_processing_mode('dask')
try:
    # Your pipeline code here...
    pass
finally:
    # Always undo the override, even if the pipeline code raises
    DataFrameAdapter.reset_processing_mode()
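The set/reset pair can be wrapped in a context manager so the reset is never forgotten. The sketch below is one possible shape, not something the library is known to provide: `forced_mode` is a hypothetical helper, and `StubAdapter` is a stand-in for `DataFrameAdapter` so the example runs on its own. It also assumes that resetting returns the mode to 'auto'.

```python
from contextlib import contextmanager


class StubAdapter:
    """Stand-in for DataFrameAdapter so this sketch runs on its own."""

    _mode = "auto"

    @classmethod
    def set_processing_mode(cls, mode):
        cls._mode = mode

    @classmethod
    def reset_processing_mode(cls):
        cls._mode = "auto"  # assumption: reset returns to the default mode


@contextmanager
def forced_mode(adapter, mode):
    """Force a processing mode for the duration of a with-block (hypothetical helper)."""
    adapter.set_processing_mode(mode)
    try:
        yield
    finally:
        # Runs even if the pipeline code raises, so the override never leaks.
        adapter.reset_processing_mode()


with forced_mode(StubAdapter, "dask"):
    print(StubAdapter._mode)  # dask
print(StubAdapter._mode)  # auto
```

In real code the same helper would be called with `DataFrameAdapter` in place of the stub.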
Behavior Matrix
| Data Size | Mode: auto | Mode: dask | Mode: pandas |
|---|---|---|---|
| 100 rows | Pandas | Dask | Pandas |
| 5K rows | Pandas | Dask | Pandas |
| 10K rows | Dask | Dask | Pandas |
| 50K rows | Dask | Dask | Pandas |
| 1M rows | Dask | Dask | Pandas |
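The matrix above can be expressed as a small selection function. This is an illustrative sketch only: `select_engine` and `DASK_THRESHOLD` are hypothetical names, and the actual switching logic inside the pipeline may differ.

```python
DASK_THRESHOLD = 10_000  # rows; auto mode switches to Dask at or above this


def select_engine(row_count: int, mode: str = "auto") -> str:
    """Return 'pandas' or 'dask' for a given row count and mode (illustrative only)."""
    if mode in ("dask", "pandas"):
        return mode  # forced modes ignore data size
    if mode != "auto":
        raise ValueError(f"Unknown processing_mode: {mode!r}")
    return "dask" if row_count >= DASK_THRESHOLD else "pandas"


# Print one row per data size, in the same column order as the table.
for rows in (100, 5_000, 10_000, 50_000, 1_000_000):
    print(rows, [select_engine(rows, m) for m in ("auto", "dask", "pandas")])
```

Note the boundary: exactly 10,000 rows selects Dask in auto mode, matching the "≥ 10,000 rows" rule.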
Configuration Priority
The system checks for processing mode in this order:
- Pipeline config (config['processing_mode'])
- Pipeline data (config['data']['processing_mode'])
- PipelineRunner parameter (processing_mode='dask')
- Default ('auto')
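The lookup order can be sketched as a short resolver that returns the first value it finds. `resolve_processing_mode` is a hypothetical name, and treating None or an empty string as "not set" is an assumption; the real implementation may handle missing values differently.

```python
from typing import Optional


def resolve_processing_mode(config: dict, runner_mode: Optional[str] = None) -> str:
    """Return the first mode found, checking sources in the documented order."""
    candidates = (
        config.get("processing_mode"),                       # pipeline config
        (config.get("data") or {}).get("processing_mode"),   # pipeline data
        runner_mode,                                         # PipelineRunner parameter
    )
    for mode in candidates:
        if mode:  # assumption: None or an empty string means "not set"
            return mode
    return "auto"  # default


print(resolve_processing_mode({"data": {"processing_mode": "pandas"}}, runner_mode="dask"))
# pandas
```

Under this reading, a value in the pipeline data takes precedence over the PipelineRunner parameter, so the example resolves to 'pandas' even though 'dask' was passed to the runner.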
Best Practices
- Default to Auto: Use auto mode unless you have a specific reason
- Test with Dask: Use dask mode to test distributed processing
- Debug with Pandas: Use pandas mode for easier debugging
- Document Choice: If using non-auto mode, document why in comments