Introduction
Sensyze Dataflow is a visual ETL/ELT platform for building, running, and observing data pipelines. It combines a React Flow-based mapper with a FastAPI backend, Temporal workflows, and a hybrid Pandas/Dask execution engine. Supabase provides auth, storage, and the primary database.
What It Does
- Visual pipeline builder for sources, transforms, and destinations
- Temporal-orchestrated execution with parallel layers
- Hybrid data processing via DataFrameAdapter (Pandas for small data, Dask for large)
- Observability with per-node logs, samples, and metrics
- Usage and billing controls (bi-weekly minutes, AI ops limit, Stripe credits)
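The "parallel layers" idea can be illustrated with a minimal sketch. It assumes a pipeline is a DAG of node ids and (upstream, downstream) edges, and groups nodes so that everything in one layer depends only on earlier layers. The function name `build_layers` and the node ids are hypothetical; this is not taken from the Sensyze codebase.

```python
from collections import defaultdict


def build_layers(nodes, edges):
    """Group DAG nodes into layers (Kahn's algorithm, one layer per pass).

    Nodes in the same layer do not depend on each other, so they could
    be executed in parallel. Raises ValueError if the graph has a cycle.
    """
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    layers = []
    placed = 0
    # Start with nodes that have no upstream dependencies (typically sources).
    ready = sorted(node for node, degree in indegree.items() if degree == 0)
    while ready:
        layers.append(ready)
        placed += len(ready)
        next_ready = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if placed != len(indegree):
        raise ValueError("pipeline graph contains a cycle")
    return layers
```

For two sources feeding a join that feeds a destination, this yields three layers: the two sources together, then the join, then the destination.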
Architecture
graph TD
User((User)) --> FE[Next.js Frontend]
User --> MKT[Marketing Site]
FE <--> API[FastAPI API]
API <--> Supabase["Supabase Postgres + Auth"]
API <--> Storage["Supabase Storage / Local Storage"]
API <--> Temporal[Temporal Server]
Temporal <--> Worker[Temporal Worker]
Worker <--> Runner[PipelineRunner]
Worker <--> Redis[Redis Cache]
Runner --> Adapter[DataFrameAdapter]
Adapter --> Pandas[Pandas]
Adapter --> Dask[Dask Cluster]
Runner --> DuckDB["DuckDB (staging)"]
Worker --> Obs[Observability Logger]
Obs --> SQLite["Observability SQLite"]
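As a rough illustration of the Observability Logger to SQLite path in the diagram, here is a minimal sketch of per-node logging using only the standard library. The table name `node_logs`, its columns, and the function names are assumptions made for the example; the project's actual schema may differ.

```python
import json
import sqlite3
import time


def open_log(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical node_logs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS node_logs (
               run_id TEXT NOT NULL,
               node_id TEXT NOT NULL,
               logged_at REAL NOT NULL,
               level TEXT NOT NULL,
               message TEXT NOT NULL,
               metrics_json TEXT NOT NULL
           )"""
    )
    return conn


def log_node(conn, run_id, node_id, message, level="INFO", metrics=None):
    """Append one log entry for a pipeline node, with optional metrics."""
    conn.execute(
        "INSERT INTO node_logs VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, node_id, time.time(), level, message, json.dumps(metrics or {})),
    )
    conn.commit()


def node_history(conn, run_id, node_id):
    """Return (level, message, metrics) tuples for one node, in insertion order."""
    rows = conn.execute(
        "SELECT level, message, metrics_json FROM node_logs "
        "WHERE run_id = ? AND node_id = ? ORDER BY rowid",
        (run_id, node_id),
    ).fetchall()
    return [(level, message, json.loads(metrics)) for level, message, metrics in rows]
```

Keying every row by run and node is what makes per-node views cheap to query; metrics are stored as JSON here only to keep the sketch short.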
Performance Strategy
- < 10,000 rows: Pandas for low overhead
- >= 10,000 rows: Dask for parallel execution
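The threshold rule can be sketched as a small selection function: fewer than 10,000 rows go to Pandas, and 10,000 rows or more go to Dask. The names `PANDAS_MAX_ROWS` and `choose_engine` are hypothetical; how DataFrameAdapter actually makes this decision is not shown in this README.

```python
PANDAS_MAX_ROWS = 10_000  # row threshold stated in this README


def choose_engine(row_count: int) -> str:
    """Return the engine name for a dataset of the given size.

    Below the threshold, Pandas avoids scheduler overhead; at or above
    it, Dask can spread the work across partitions.
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return "pandas" if row_count < PANDAS_MAX_ROWS else "dask"
```

A caller would typically estimate the row count once at the source node and pass the result downstream, so that a single pipeline run does not switch engines mid-flight; that policy is an assumption of this sketch.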
Quick Start
Prerequisites
- Docker and Docker Compose
- Node.js 18+
- Python 3.12+
Configure Environment
cp .env.frontend.example .env.frontend
cp .env.backend.example .env.backend
Fill in Supabase, Stripe, and other service keys as needed.
Run Locally (Recommended)
make dev
Common commands:
make dev
make dev-fresh
make dev-clean
make status
Local URLs
- App: http://localhost:3000
- API Docs: http://localhost:8000/docs
- Temporal UI: http://localhost:8080
- Dask Dashboard: http://localhost:8787
Repository Layout
dataflow-server/ # FastAPI, Temporal worker, pipeline engine
frontend/ # Next.js app (mapper, jobs, accounts)
marketing/ # Static marketing site
supabase/ # Supabase config
supabase_migrations/ # Database migrations
diagrams/ # Mermaid sequence diagrams
docs/ # Architecture, ops, and product docs
Tech Stack
Frontend (User Interface)
- Framework: Next.js 14 (App Router)
- Visual Engine: React Flow (DAG visualization and manipulation)
- Styling: Tailwind CSS (Premium Dark Mode aesthetics)
- State Management: Zustand (Persisted drafts and UI state)
Backend (Data Flow Server)
- Framework: FastAPI (Python 3.12)
- Execution Engine: Custom PipelineRunner supporting topological execution and async I/O.
- Compute Strategy:
- Single Node: Pandas for smaller datasets (< 10k rows).
- Distributed: Dask for large-scale processing (>= 10k rows).
- Embedded Database: DuckDB (Staging intermediate results and SQL transformations).
- Scheduling: Temporal + Redis (Cache).
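To make "topological execution and async I/O" concrete, here is a minimal asyncio sketch. It assumes the pipeline has already been split into ordered layers of node ids and that each node is an async callable. `run_layers` and `run_node` are hypothetical names, not PipelineRunner's real interface.

```python
import asyncio


async def run_layers(layers, run_node):
    """Execute layers in order; nodes within a layer run concurrently.

    run_node(node, results) is awaited for each node and receives the
    outputs of all previously completed layers. A layer starts only
    after every node in the previous layer has finished.
    """
    results = {}
    for layer in layers:
        outputs = await asyncio.gather(*(run_node(node, results) for node in layer))
        # asyncio.gather preserves argument order, so zip pairs each
        # node with its own output.
        results.update(zip(layer, outputs))
    return results
```

With an I/O-bound `run_node` (for example, one that awaits an HTTP or database call), the nodes of a layer overlap their waiting time, while the layer boundary guarantees that downstream nodes always see complete upstream results.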
Integrations
- Storage/DB: Supabase (PostgreSQL for metadata, Auth for users).
- Connectors: REST, SQL, CSV/JSON, Databases (PostgreSQL, MySQL, MongoDB, Snowflake, BigQuery), SaaS (Salesforce, Stripe, HubSpot).