# Distributed Testing

## Overview

Tests for Dask distributed processing in Sensyze Dataflow.
## Test Pipelines

### Quick Test Pipeline (30 seconds)

File: `sample_pipelines/distributed_quick_test.json`

- Purpose: Fast verification
- Configuration: 5 parallel tasks × 30 seconds
- Expected speedup: 3-4x

### Full Production Test (2 minutes)

File: `sample_pipelines/distributed_test_pipeline.json`

- Purpose: Production-like testing
- Configuration: 5 parallel tasks × 120 seconds
- Expected speedup: 3-5x
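The schema of these pipeline files is not reproduced in this document. As a purely hypothetical sketch of what a quick-test config *might* contain (every field name below is an assumption, not the actual Dataflow format), the shape could resemble:

```json
{
  "name": "distributed_quick_test",
  "parallel": true,
  "tasks": [
    {"id": "task_1", "duration_seconds": 30},
    {"id": "task_2", "duration_seconds": 30}
  ]
}
```

Consult the actual files under `sample_pipelines/` for the real structure.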
## Running Tests

### Using Makefile

```shell
make test-dask-quick   # Quick test (~10 seconds)
make test-dask         # All Dask tests (~45 seconds)
make test-dask-all     # All tests with output
```

### Using pytest

```shell
cd dataflow-server
pytest tests/test_distributed_dask.py -v
```

### Running a Specific Test

```shell
pytest tests/test_distributed_dask.py::test_parallel_execution_speedup -v -s
```
## Test Cases
### 1. Dask Client Initialization

```python
def test_dask_client_initialization():
    # Verifies Dask client setup
    # Checks dashboard availability
    ...
```
### 2. Parallel Execution Speedup

```python
def test_parallel_execution_speedup():
    # 5 tasks × 5 seconds each
    # Validates speedup > 2x
    # Expected: ~4x speedup
    ...
```
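The measurement behind this test is independent of Dask itself: run N tasks concurrently, compare wall time against the theoretical sequential time. A minimal sketch using only the standard library (task durations scaled down to 0.2 s so it runs quickly; the real test uses 5 s tasks):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_task(i: int, duration: float = 0.2) -> int:
    # time.sleep stands in for a real pipeline task
    time.sleep(duration)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(simulated_task, range(5)))
wall_time = time.perf_counter() - start

theoretical_sequential = 5 * 0.2  # 5 tasks, run back to back
speedup = theoretical_sequential / wall_time
print(f"Speedup: {speedup:.2f}x")
assert speedup > 2, "parallel run should clear the test's 2x threshold"
```

With five workers and five sleeping tasks, wall time is roughly one task duration, so the speedup lands near 4-5x, mirroring the expected ~4x above.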
### 3. Task Timing Overlap

```python
def test_task_timing_overlap():
    # Verifies true parallel execution
    # Detects overlapping task windows
    ...
```
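The overlap check can be illustrated the same way: record a `(start, end)` window per task, then assert that at least one pair of windows intersects. Sequential execution would produce disjoint windows. This sketch uses `concurrent.futures` rather than Dask:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_task(i: int) -> tuple:
    start = time.perf_counter()
    time.sleep(0.2)  # simulated work
    return (start, time.perf_counter())

with ThreadPoolExecutor(max_workers=5) as pool:
    windows = list(pool.map(timed_task, range(5)))

def overlaps(a: tuple, b: tuple) -> bool:
    # Two intervals overlap if each starts before the other ends
    return a[0] < b[1] and b[0] < a[1]

overlap_found = any(
    overlaps(windows[i], windows[j])
    for i in range(len(windows))
    for j in range(i + 1, len(windows))
)
assert overlap_found, "disjoint windows would indicate sequential execution"
```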
### 4. Result Correctness

```python
def test_result_correctness():
    # Validates all results complete
    # Checks data accuracy
    ...
```
### 5. Large Dataset Uses Dask

```python
def test_large_dataset_uses_dask():
    # 15,000 rows → Dask
    ...
```
### 6. Small Dataset Uses Pandas

```python
def test_small_dataset_uses_pandas():
    # 100 rows → Pandas
    ...
```
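Taken together, tests 5 and 6 imply a row-count threshold somewhere between 100 and 15,000 that routes datasets to one engine or the other. A minimal sketch of such dispatch logic (the 10,000-row cutoff is an assumption for illustration, not Dataflow's actual value):

```python
DASK_ROW_THRESHOLD = 10_000  # assumed cutoff, inferred from the two tests above

def select_engine(n_rows: int) -> str:
    # Route large datasets to Dask, small ones to plain pandas
    return "dask" if n_rows >= DASK_ROW_THRESHOLD else "pandas"

assert select_engine(15_000) == "dask"   # large-dataset case
assert select_engine(100) == "pandas"    # small-dataset case
```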
## Expected Results

```text
📊 Results:
Tasks completed: 5
Wall time: 6.23s
Theoretical sequential time: 25.00s
Total task time: 25.15s
Speedup factor: 4.01x
Efficiency: 80.2%
✓ PASS: Achieved 4.01x speedup with parallel execution
```
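The two derived metrics follow directly from the measurements above: speedup is theoretical sequential time divided by wall time, and efficiency is speedup divided by worker count (5 workers is an assumption, matching one worker per task):

```python
theoretical_sequential = 25.00  # 5 tasks × 5 s
wall_time = 6.23
n_workers = 5  # assumed: one worker per task

speedup = theoretical_sequential / wall_time  # ≈ 4.01x
efficiency = speedup / n_workers              # ≈ 80%, matching the report

assert 4.0 < speedup < 4.1
assert 0.79 < efficiency < 0.81
```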
## Performance Benchmarks
| Test Type | Tasks | Duration/Task | Sequential | Parallel | Speedup |
|---|---|---|---|---|---|
| Pytest | 5 | 5s | 25s | ~6s | 4.0x |
| Quick | 5 | 30s | 150s | ~35s | 4.3x |
| Full | 5 | 120s | 600s | ~140s | 4.3x |
| Large | 60 | 2s | 120s | ~15s | 8.0x |
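A rough model for the Parallel column: tasks run in waves of `n_workers`, so the ideal wall time is `ceil(tasks / workers) × duration`, and the measured numbers add scheduling overhead on top. The worker counts below are assumptions chosen to fit the table, not configuration read from the project:

```python
import math

def ideal_parallel_time(n_tasks: int, task_s: float, n_workers: int) -> float:
    # Tasks execute in waves of n_workers; each wave lasts one task duration
    return math.ceil(n_tasks / n_workers) * task_s

# Pytest row: 5 tasks × 5 s on 5 workers → 5 s ideal (measured ~6 s with overhead)
assert ideal_parallel_time(5, 5, 5) == 5
# Large row: 60 tasks × 2 s on an assumed 10 workers → 12 s ideal (measured ~15 s)
assert ideal_parallel_time(60, 2, 10) == 12
```

This also suggests why the Large row reaches 8x while the 5-task rows top out near 4.3x: with many short tasks per worker, fixed startup overhead is amortized across more waves.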