Flink
A distributed stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides exactly-once state consistency, event-time processing, and low-latency processing at large scale.
Core Architecture
JobManager, TaskManagers, task slots, execution graphs, and deployment modes — how Flink distributes and executes streaming jobs.
Time & Watermarks
Event time vs processing time, watermark generation, late data handling, and per-partition watermarks.
Windows
Tumbling, sliding, session, and global windows — bounding infinite streams for aggregation.
State Management
Keyed state, operator state, broadcast state, state backends, TTL, and schema evolution.
Fault Tolerance & Checkpointing
Checkpoints, savepoints, checkpoint barriers (a variant of the Chandy-Lamport algorithm), exactly-once semantics, and two-phase commit.
Programming Model
DataStream API, ProcessFunction, Table API, timers, and the unified batch/streaming model.
Connectors
Kafka, filesystem, JDBC, Elasticsearch connectors, async I/O, and custom source/sink development.
Common Patterns
Real-time aggregation, streaming ETL, fraud detection, CDC processing, and complex event processing.
Backpressure
Credit-based flow control, detecting and resolving backpressure, and its relationship to Kafka lag.
Flink SQL
Streaming SQL, dynamic tables, changelog streams, temporal joins, and the SQL Gateway.
Performance Tuning
Parallelism, RocksDB tuning, network buffers, checkpoint optimization, and serialization.
Monitoring & Operations
Metrics, Prometheus + Grafana, alerting, restart strategies, and common failure scenarios.
Why Flink?
When you need to process millions of events per second with strong consistency guarantees, event-time semantics, and fault tolerance, Flink is one of the most widely adopted choices. It powers real-time analytics, fraud detection, and ETL pipelines at companies like Alibaba, Uber, and Netflix.
- ✓ Stateful stream processing: maintain and query large state (up to terabytes) with exactly-once state consistency across failures.
- ✓ Exactly-once semantics: end-to-end guarantees via checkpointing and two-phase commit, given replayable sources and transactional sinks.
- ✓ Event-time processing: handle out-of-order and late data correctly using watermarks, not wall-clock time.
- ✓ Unified batch and stream: the same API and runtime for both bounded (batch) and unbounded (streaming) data.
- ✓ Fault tolerance at scale: lightweight checkpointing with incremental snapshots and recovery from the latest checkpoint without losing state.
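
To make the event-time point in the list above concrete, here is a minimal sketch in plain Python of how a watermark decides when a tumbling window fires and which events count as late. It is not Flink code and uses no Flink API: the function `tumbling_counts` and its parameters are hypothetical names chosen for illustration, and the watermark rule (largest timestamp seen minus a fixed out-of-orderness bound) is a simplified model of a bounded-out-of-orderness strategy.

```python
from collections import defaultdict


def tumbling_counts(events, window_ms, max_out_of_orderness_ms):
    """Count events per tumbling event-time window (illustrative only).

    events: event timestamps in milliseconds, in arrival order.
    Returns (fired, late): `fired` is a list of (window_start, count) in the
    order windows fire; `late` holds timestamps that arrived after the
    watermark had already passed the end of their window.
    """
    open_windows = defaultdict(int)  # window_start -> count so far
    fired, late = [], []
    max_ts = None
    watermark = float("-inf")

    for ts in events:
        start = ts - (ts % window_ms)
        if start + window_ms <= watermark:
            # The window for this event has already fired: treat it as late.
            late.append(ts)
        else:
            open_windows[start] += 1

        # Assumed watermark rule: trail the largest timestamp seen so far
        # by a fixed out-of-orderness bound.
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness_ms

        # Fire every open window whose end is at or before the watermark.
        ready = sorted(s for s in open_windows if s + window_ms <= watermark)
        for s in ready:
            fired.append((s, open_windows.pop(s)))

    # Bounded input: flush whatever is still open at end of stream.
    for s in sorted(open_windows):
        fired.append((s, open_windows.pop(s)))
    return fired, late


# Arrival order differs from event-time order: 1800 and 900 arrive "behind".
fired, late = tumbling_counts(
    [1000, 2500, 1800, 4200, 900, 7100],
    window_ms=2000,
    max_out_of_orderness_ms=1000,
)
print(fired)  # [(0, 2), (2000, 1), (4000, 1), (6000, 1)]
print(late)   # [900]
```

In this run the event at 1800 arrives out of order but is still counted in window `[0, 2000)`, because the watermark has not yet passed 2000. The event at 900 arrives after that window has fired and is reported as late. A larger out-of-orderness bound would tolerate more disorder at the cost of delaying results.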