Incremental Loading

Overview

Incremental loading processes only the data that is new or has changed since the last run, instead of reprocessing the full dataset. This reduces execution time and resource usage.

Use Cases

Daily ETL pipelines processing only today's data
CDC-style updates
Time-series data ingestion

Configuration

Enable Incremental Mode

{
  "id": "source-1",
  "type": "database",
  "data": {
    "connection_id": "uuid",
    "query": "SELECT * FROM orders WHERE created_at > {{state.last_run}}",
    "incremental": {
      "enabled": true,
      "column": "created_at",
      "state_key": "orders_last_run"
    }
  }
}
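To illustrate what the query above does at run time, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not the system's implementation: the in-memory table, the sample rows, and the `fetch_new_rows` helper are assumptions made for the example, and the stored watermark is passed as a bound parameter rather than through the `{{state.last_run}}` template.

```python
import sqlite3


def fetch_new_rows(conn, last_run):
    """Return orders created strictly after the stored watermark."""
    cur = conn.execute(
        "SELECT id, created_at FROM orders "
        "WHERE created_at > ? ORDER BY created_at",
        (last_run,),
    )
    return cur.fetchall()


# Hypothetical sample data; timestamps are stored as ISO 8601 UTC strings,
# which sort correctly as text when they all share the same format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-14T09:00:00Z"),
        (2, "2024-01-15T08:30:00Z"),
        (3, "2024-01-16T12:00:00Z"),
    ],
)

# Only rows after the watermark are returned; order 1 is skipped.
new_rows = fetch_new_rows(conn, "2024-01-15T00:00:00Z")
```

Because the comparison is strict (`>`), a row whose `created_at` equals the watermark exactly is not picked up again on the next run.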

State Management

The system automatically tracks state for each configured state_key. A stored state record looks like this:

{
  "state_key": "orders_last_run",
  "value": "2024-01-15T00:00:00Z",
  "updated_at": "2024-01-16T00:00:00Z"
}
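The system handles this bookkeeping itself, but the idea can be sketched in a few lines of Python. The `advance_state` function and the dictionary-based store below are hypothetical and only mirror the record shape shown above. The sketch assumes the watermark is the largest value of the incremental column seen in a successfully processed batch.

```python
from datetime import datetime, timezone


def advance_state(state, state_key, rows, column, now=None):
    """Return a new state dict with the watermark moved forward.

    Assumes every value in `column` is an ISO 8601 UTC string in the same
    format, so plain string comparison matches chronological order.
    """
    if not rows:
        return state  # nothing was processed; keep the old watermark
    now = now or datetime.now(timezone.utc)
    newest = max(row[column] for row in rows)
    previous = state.get(state_key, {}).get("value")
    if previous is not None and newest <= previous:
        return state  # never move the watermark backwards
    updated = dict(state)
    updated[state_key] = {
        "state_key": state_key,
        "value": newest,
        "updated_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return updated


state = {
    "orders_last_run": {
        "state_key": "orders_last_run",
        "value": "2024-01-14T00:00:00Z",
        "updated_at": "2024-01-15T00:00:00Z",
    }
}
rows = [
    {"created_at": "2024-01-14T10:00:00Z"},
    {"created_at": "2024-01-15T00:00:00Z"},
]
new_state = advance_state(
    state,
    "orders_last_run",
    rows,
    "created_at",
    now=datetime(2024, 1, 16, tzinfo=timezone.utc),
)
```

Returning a new dictionary instead of mutating the old one makes it easy to persist the state only after the load has succeeded, so a failed run is retried from the previous watermark.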

Incremental Strategies

By Timestamp

{
  "incremental": {
    "type": "timestamp",
    "column": "updated_at",
    "format": "iso8601"
  }
}
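One reason to declare the timestamp format is that ISO 8601 values with different UTC offsets do not compare correctly as raw strings. The following Python sketch shows the difference. The helper names are hypothetical, and treating timestamps without an offset as UTC is an assumption of the example, not documented system behavior.

```python
from datetime import datetime, timezone


def parse_iso8601(value):
    """Parse an ISO 8601 timestamp into a timezone-aware UTC datetime."""
    # Replace a trailing "Z" because older Python versions reject it
    # in datetime.fromisoformat.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption for this sketch: values without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_newer(row_value, watermark):
    """True if the row's timestamp is strictly after the watermark."""
    return parse_iso8601(row_value) > parse_iso8601(watermark)


watermark = "2024-01-15T00:00:00Z"
same_instant = "2024-01-15T02:00:00+02:00"  # equals the watermark in UTC

string_says_newer = same_instant > watermark            # misleading result
parsed_says_newer = is_newer(same_instant, watermark)   # correct result
```

Here the string comparison reports the row as newer, while the parsed comparison recognizes that both values refer to the same instant.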

By ID

{
  "incremental": {
    "type": "id",
    "column": "id",
    "last_value": 1000
  }
}
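As a rough illustration of the ID strategy, the Python sketch below keeps only rows whose ID is greater than `last_value` and computes the value to store for the next run. The `select_by_id` function is hypothetical, and the sketch assumes IDs only ever increase, which is what makes this strategy safe.

```python
def select_by_id(rows, column, last_value):
    """Return rows with an ID above last_value and the next watermark."""
    new_rows = [row for row in rows if row[column] > last_value]
    # If nothing new arrived, the watermark stays where it was.
    next_value = max((row[column] for row in new_rows), default=last_value)
    return new_rows, next_value


rows = [{"id": 999}, {"id": 1000}, {"id": 1001}, {"id": 1005}]
new_rows, next_value = select_by_id(rows, "id", 1000)
```

Note that this strategy only detects inserted rows. An update to an existing row keeps its ID and is therefore not picked up.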

By Partition

{
  "incremental": {
    "type": "partition",
    "column": "date",
    "partition_format": "yyyy-MM-dd"
  }
}
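For date partitions, a run needs to know which partitions it has not processed yet. The Python sketch below lists them. It assumes daily partitions and that the `yyyy-MM-dd` pattern corresponds to Python's `%Y-%m-%d`. The `pending_partitions` function is hypothetical.

```python
from datetime import date, timedelta


def pending_partitions(last_processed, today):
    """List daily partition values after last_processed, up to today."""
    days = (today - last_processed).days
    # range() is empty when days <= 0, so an up-to-date state yields [].
    return [
        (last_processed + timedelta(days=offset)).strftime("%Y-%m-%d")
        for offset in range(1, days + 1)
    ]


partitions = pending_partitions(date(2024, 1, 13), date(2024, 1, 15))
```

Each returned value can then be used to select one whole partition, which avoids scanning data from partitions that were already loaded.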

Backfill

To process historical data, enable backfill and specify the date range to load:

{
  "backfill": {
    "enabled": true,
    "start_date": "2024-01-01",
    "end_date": "2024-01-15",
    "batch_size": 7
  }
}
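The configuration above does not state the unit of `batch_size`. The Python sketch below assumes it is a number of days and that `end_date` is inclusive. Under those assumptions it shows how the range could be split into consecutive windows. The `backfill_windows` function is hypothetical.

```python
from datetime import date, timedelta


def backfill_windows(start_date, end_date, batch_size):
    """Split an inclusive date range into windows of batch_size days."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    windows = []
    cursor = start_date
    while cursor <= end_date:
        # The last window is shortened so it never runs past end_date.
        window_end = min(cursor + timedelta(days=batch_size - 1), end_date)
        windows.append((cursor.isoformat(), window_end.isoformat()))
        cursor = window_end + timedelta(days=1)
    return windows


windows = backfill_windows(date(2024, 1, 1), date(2024, 1, 15), 7)
```

With the values from the configuration, this produces two full seven-day windows followed by a one-day window for 2024-01-15.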

Best Practices

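Always index the incremental column so that each run's filter does not scan the full table
Use an appropriate data type for the incremental column (timestamp preferred)
Handle deletions with soft deletes or CDC, because a query for new or changed rows does not return deleted rows
Monitor the stored state to ensure it stays consistent with the data that has been loaded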
