
Distributed Testing

Overview

Test Dask distributed processing in Sensyze Dataflow.

Test Pipelines

Quick Test Pipeline (30 seconds)

File: sample_pipelines/distributed_quick_test.json

  • Purpose: Fast verification
  • Configuration: 5 parallel tasks × 30 seconds
  • Expected Speedup: 3-4x

Full Production Test (2 minutes)

File: sample_pipelines/distributed_test_pipeline.json

  • Purpose: Production-like testing
  • Configuration: 5 parallel tasks × 120 seconds
  • Expected Speedup: 3-5x

Running Tests

Using Makefile

make test-dask-quick   # Quick test (~10 seconds)
make test-dask         # All Dask tests (~45 seconds)
make test-dask-all     # All Dask tests with output

Using pytest

cd dataflow-server
pytest tests/test_distributed_dask.py -v

Run Specific Test

pytest tests/test_distributed_dask.py::test_parallel_execution_speedup -v -s

Test Cases

1. Dask Client Initialization

def test_dask_client_initialization():
    # Verifies Dask client setup
    # Checks dashboard availability
    ...
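A minimal sketch of what this kind of smoke test might do, assuming `dask[distributed]` is installed; the function name and cluster parameters here are illustrative, not the project's actual fixture:

```python
# Sketch of a Dask client smoke test (assumes dask[distributed] is
# installed; the project's real setup/teardown may differ).
from dask.distributed import Client

def check_client_starts():
    # Start an in-process local cluster; processes=False avoids forking
    # worker processes, keeping the check fast and portable.
    with Client(processes=False, n_workers=2, threads_per_worker=1) as client:
        # A trivial round-trip proves the scheduler is reachable.
        assert client.submit(sum, [1, 2, 3]).result() == 6
        return client.scheduler_info()["workers"]

workers = check_client_starts()
print(f"workers: {len(workers)}")
```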

2. Parallel Execution Speedup

def test_parallel_execution_speedup():
    # 5 tasks × 5 seconds each
    # Validates speedup > 2x
    # Expected: ~4x speedup
    ...
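The measurement logic behind this test can be illustrated with a scaled-down stdlib stand-in (the real test submits tasks to Dask, but the speedup calculation is the same):

```python
# Stdlib illustration of the speedup check: run N sleep tasks in
# parallel and compare wall time against the theoretical sequential time.
import time
from concurrent.futures import ThreadPoolExecutor

TASKS, DURATION = 5, 0.2  # scaled down from 5 tasks x 5 s

def work(_):
    time.sleep(DURATION)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=TASKS) as pool:
    list(pool.map(work, range(TASKS)))
wall = time.perf_counter() - start

sequential = TASKS * DURATION        # theoretical sequential time
speedup = sequential / wall
print(f"speedup: {speedup:.2f}x")
assert speedup > 2, "tasks did not run in parallel"
```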

3. Task Timing Overlap

def test_task_timing_overlap():
    # Verifies true parallel execution
    # Detects overlapping task windows
    ...
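Overlap detection works by recording each task's (start, end) window and checking that at least two windows intersect; a scaled-down stdlib sketch of the idea:

```python
# Record per-task execution windows and verify that some pair overlaps,
# which can only happen if tasks truly ran concurrently.
import time
from concurrent.futures import ThreadPoolExecutor

def timed_task(_):
    start = time.perf_counter()
    time.sleep(0.1)
    return start, time.perf_counter()

with ThreadPoolExecutor(max_workers=5) as pool:
    windows = list(pool.map(timed_task, range(5)))

def overlaps(a, b):
    # Two intervals overlap when each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]

overlapping = any(
    overlaps(windows[i], windows[j])
    for i in range(len(windows))
    for j in range(i + 1, len(windows))
)
print("overlap detected:", overlapping)
```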

4. Result Correctness

def test_result_correctness():
    # Validates all results complete
    # Checks data accuracy
    ...
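The correctness check amounts to comparing the parallel results against a sequential reference run; a stdlib stand-in for the pattern:

```python
# Every submitted task must complete, and the gathered results must match
# a sequential reference run in both order and value.
from concurrent.futures import ThreadPoolExecutor

inputs = list(range(10))
expected = [x * x for x in inputs]  # sequential reference

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map preserves input order in its results.
    results = list(pool.map(lambda x: x * x, inputs))

assert len(results) == len(inputs)  # all tasks completed
assert results == expected          # values and order preserved
```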

5. Large Dataset Uses Dask

def test_large_dataset_uses_dask():
    # 15,000 rows → Dask
    ...

6. Small Dataset Uses Pandas

def test_small_dataset_uses_pandas():
    # 100 rows → Pandas
    ...
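These two tests imply a size-based dispatch rule. A plausible sketch of that rule is below; the 10,000-row threshold and the function name are assumptions consistent with the two cases (15,000 rows → Dask, 100 rows → pandas), not the project's actual API:

```python
# Hypothetical backend-selection rule: datasets at or above the threshold
# are processed with Dask, smaller ones with pandas. Threshold and name
# are assumptions, not confirmed project internals.
DASK_ROW_THRESHOLD = 10_000

def choose_backend(n_rows: int) -> str:
    return "dask" if n_rows >= DASK_ROW_THRESHOLD else "pandas"

print(choose_backend(15_000))  # large dataset
print(choose_backend(100))     # small dataset
```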

Expected Results

📊 Results:
Tasks completed: 5
Wall time: 6.23s
Theoretical sequential time: 25.00s
Total task time: 25.15s
Speedup factor: 4.01x
Efficiency: 80.2%

✓ PASS: Achieved 4.01x speedup with parallel execution
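The reported metrics relate as follows: speedup is the theoretical sequential time divided by wall time, and efficiency is speedup divided by the worker count (5 here). Reproducing the calculation from the sample output, allowing for rounding:

```python
# Derive the reported metrics from the raw timings in the sample output.
sequential_time = 25.00  # theoretical sequential time (s)
wall_time = 6.23         # measured wall time (s)
workers = 5

speedup = sequential_time / wall_time   # ~4.01x
efficiency = speedup / workers          # ~0.80
print(f"Speedup: {speedup:.2f}x, Efficiency: {efficiency:.1%}")
```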

Performance Benchmarks

| Test Type | Tasks | Duration/Task | Sequential | Parallel | Speedup |
|-----------|-------|---------------|------------|----------|---------|
| Pytest    | 5     | 5s            | 25s        | ~6s      | 4.0x    |
| Quick     | 5     | 30s           | 150s       | ~35s     | 4.3x    |
| Full      | 5     | 120s          | 600s       | ~140s    | 4.3x    |
| Large     | 60    | 2s            | 120s       | ~15s     | 8.0x    |