4 changes: 2 additions & 2 deletions website/docs/apis/client-support-matrix.md
@@ -30,7 +30,7 @@ These data operations are available under TableAppend, TableScan, TableUpsert an
| Primary Key | Batch Scan (Snapshot) | ✔️ | | | |

:::tip
For more details, see [Table Overview](/table-design/overview.md).
For more details, see [Table Overview](/table-design/overview.mdx).
:::

## Data Types
@@ -110,5 +110,5 @@ Admin operations are available under FlussAdmin interface.
| Paimon | ✔️ | | | |

:::tip
For more details, see [Streaming Lakehouse](/streaming-lakehouse/overview.md).
For more details, see [Streaming Lakehouse](/streaming-lakehouse/overview.mdx).
:::
Binary file added website/docs/assets/architecture_light.png
Binary file added website/docs/assets/data_organization_light.png
Binary file added website/docs/assets/delta_join_light.png
Binary file added website/docs/assets/deployment_overview_light.png
Binary file added website/docs/assets/streamhouse_light.png
Binary file added website/docs/assets/tiered-storage-light.png
@@ -2,11 +2,16 @@
title: "Architecture"
sidebar_position: 1
---
import ThemedImage from '@site/src/components/ThemedImage';

# Architecture
A Fluss cluster consists of two main processes: the **CoordinatorServer** and the **TabletServer**.

![Fluss Architecture](../assets/architecture.png)
<ThemedImage
alt="Architecture"
light="architecture_light.png"
dark="architecture.png"
/>

## CoordinatorServer
The **CoordinatorServer** serves as the central control and management component of the cluster. It is responsible for maintaining metadata, managing tablet allocation, listing nodes, and handling permissions.
@@ -3,6 +3,7 @@ sidebar_label: Delta Joins
title: Flink Delta Joins
sidebar_position: 7
---
import ThemedImage from '@site/src/components/ThemedImage';

# Flink Delta Join
Beginning with **Apache Flink 2.1**, a new operator called [Delta Join](https://cwiki.apache.org/confluence/display/FLINK/FLIP-486%3A+Introduce+A+New+DeltaJoin) was introduced.
@@ -19,7 +20,11 @@ Starting with **Apache Fluss 0.8**, streaming join jobs running on **Flink 2.1 o

Traditional streaming joins in Flink must keep both input sides entirely in state to match records across streams. Delta join, by contrast, uses an **index-key lookup mechanism**: instead of querying data from the state, it queries the Fluss source table, which avoids storing the same data redundantly in both the Fluss source table and the state. This drastically reduces state size and improves performance for many streaming analytics and enrichment workloads.

![](../assets/delta_join.png)
<ThemedImage
alt="Delta Join"
light="delta_join_light.png"
dark="delta_join.png"
/>

## Example: Delta Join in Flink 2.1

2 changes: 1 addition & 1 deletion website/docs/engine-flink/getting-started.md
@@ -124,7 +124,7 @@ CREATE CATALOG fluss_catalog WITH (

:::note
1. The `bootstrap.servers` option specifies the Fluss server address. Before you configure `bootstrap.servers`,
you should start the Fluss server first. See [Deploying Fluss](install-deploy/overview.md#how-to-deploy-fluss)
you should start the Fluss server first. See [Deploying Fluss](install-deploy/overview.mdx#how-to-deploy-fluss)
for how to build a Fluss cluster.
Here, it is assumed that there is a Fluss cluster running on your local machine and the CoordinatorServer port is 9123.
2. The `bootstrap.servers` configuration is used to discover all nodes within the Fluss cluster. It can be set with one or more (up to three) Fluss server addresses (either CoordinatorServer or TabletServer) separated by commas.
4 changes: 2 additions & 2 deletions website/docs/engine-spark/getting-started.md
@@ -118,7 +118,7 @@ val spark = SparkSession.builder()

:::note
1. The `spark.sql.catalog.fluss_catalog.bootstrap.servers` option specifies the Fluss server address. Before you configure `bootstrap.servers`,
you should start the Fluss server first. See [Deploying Fluss](install-deploy/overview.md#how-to-deploy-fluss)
you should start the Fluss server first. See [Deploying Fluss](install-deploy/overview.mdx#how-to-deploy-fluss)
for how to build a Fluss cluster.
Here, it is assumed that there is a Fluss cluster running on your local machine and the CoordinatorServer port is 9123.
2. The `spark.sql.catalog.fluss_catalog.bootstrap.servers` configuration is used to discover all nodes within the Fluss cluster. It can be set with one or more (up to three) Fluss server addresses (either CoordinatorServer or TabletServer) separated by commas.
@@ -226,4 +226,4 @@ The `MAP` type is currently supported for table creation and schema mapping, but
| BinaryType | BYTES |
| ArrayType | ARRAY |
| MapType | MAP (read/write not yet supported) |
| StructType | ROW |
| StructType | ROW |
@@ -3,6 +3,7 @@ sidebar_label: "Overview"
title: Installation & Deployment
sidebar_position: 1
---
import ThemedImage from '@site/src/components/ThemedImage';

# Overview

@@ -12,9 +13,11 @@ Below, we provide an overview of the key components of a Fluss cluster, detailin

The figure below shows the building blocks of Fluss clusters:

<img width="1200px" src={require('../assets/deployment_overview.png').default} />


<ThemedImage
alt="Deployment Overview"
light="deployment_overview_light.png"
dark="deployment_overview.png"
/>

When deploying Fluss, there are often multiple options available for each building block.
We have listed them in the table below the figure.
6 changes: 3 additions & 3 deletions website/docs/intro.mdx
@@ -61,7 +61,7 @@ Stateful streaming ETL pipelines (joins, rolling aggregations, deduplication, re
## Where to go next?

- [QuickStart](quickstart/flink.md): Get started with Fluss in minutes.
- [Architecture](concepts/architecture.md): Learn about Fluss's architecture.
- [Table Design](table-design/overview.md): Explore Fluss's table types, partitions and buckets.
- [Lakehouse](streaming-lakehouse/overview.md): Integrate Fluss with your Lakehouse to bring low-latency data to your Lakehouse analytics.
- [Architecture](concepts/architecture.mdx): Learn about Fluss's architecture.
- [Table Design](table-design/overview.mdx): Explore Fluss's table types, partitions and buckets.
- [Lakehouse](streaming-lakehouse/overview.mdx): Integrate Fluss with your Lakehouse to bring low-latency data to your Lakehouse analytics.
- [Development](/community/dev/ide-setup): Set up your development environment and contribute to the community.
@@ -3,6 +3,7 @@ sidebar_label: Overview
title: Tiered Storage
sidebar_position: 1
---
import ThemedImage from '@site/src/components/ThemedImage';

# Overview

@@ -18,4 +19,8 @@ in the well-known open data lake format for better analytics performance. Curren

The overall tiered storage architecture is shown in the following diagram:

<img width="900px" src={require('../../assets/tiered-storage.png').default} />
<ThemedImage
alt="Tiered Storage"
light="tiered-storage-light.png"
dark="tiered-storage.png"
/>
@@ -2,6 +2,7 @@
title: "Lakehouse Overview"
sidebar_position: 1
---
import ThemedImage from '@site/src/components/ThemedImage';

# Lakehouse Overview

@@ -32,7 +33,11 @@ To build a Streaming Lakehouse, Fluss maintains a tiering service that compacts
The data in the Fluss cluster, stored in streaming Arrow format, is optimized for low-latency read and write operations, making it ideal for short-term data storage. In contrast, the compacted data in the Lakehouse, stored in Parquet format with higher compression, is optimized for efficient analytics and long-term storage.
The data in the Fluss cluster serves as a real-time data layer, retaining days of data with sub-second-level freshness. In contrast, the data in the Lakehouse serves as a historical data layer, retaining months of data with minute-level freshness.

![streamhouse](../assets/streamhouse.png)
<ThemedImage
alt="Streamhouse"
light="streamhouse_light.png"
dark="streamhouse.png"
/>

The core idea of Streaming Lakehouse is shared data and shared metadata between stream and Lakehouse, avoiding data duplication and metadata inconsistency.
Some powerful features it provides are:
@@ -43,4 +48,4 @@ Some powerful features it provides are:
- **Analytical Streams**: Union reads give data streams powerful analytics capabilities. This reduces complexity when developing streaming applications, simplifies debugging, and allows for immediate access to live data insights.
- **Connect to Lakehouse Ecosystem**: Fluss keeps the table metadata in sync with data lake catalogs while compacting data into Lakehouse. As a result, external engines like Spark, StarRocks, Flink, and Trino can read the data directly. They simply connect to the data lake catalog.

Currently, Fluss supports [Paimon](streaming-lakehouse/integrate-data-lakes/formats/paimon.md), [Iceberg](streaming-lakehouse/integrate-data-lakes/formats/iceberg.md), and [Lance](streaming-lakehouse/integrate-data-lakes/formats/lance.md) as Lakehouse Storage; more data lake formats are on the roadmap.
Currently, Fluss supports [Paimon](streaming-lakehouse/integrate-data-lakes/formats/paimon.md), [Iceberg](streaming-lakehouse/integrate-data-lakes/formats/iceberg.md), and [Lance](streaming-lakehouse/integrate-data-lakes/formats/lance.md) as Lakehouse Storage; more data lake formats are on the roadmap.
@@ -3,6 +3,7 @@ sidebar_label: Overview
title: Table Overview
sidebar_position: 1
---
import ThemedImage from '@site/src/components/ThemedImage';

# Table Overview

@@ -28,8 +29,11 @@ This design ensures efficient data organization, flexibility in handling differe

## Table Data Organization

![Table Data Organization](../assets/data_organization.png)

<ThemedImage
alt="Table Data Organization"
light="data_organization_light.png"
dark="data_organization.png"
/>

### Partition
A **partition** is a logical division of a table's data into smaller, more manageable subsets based on the values of one or more specified columns, known as partition columns.
@@ -50,4 +54,4 @@ as the log data for the primary table data.
- **.log:** Compact arrangement of log data.

### KvTablet
Each bucket of a Primary Key Table generates a KvTablet. Under the hood, each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM (log-structured merge) engine, which helps KvTablet support high-performance updates and lookup queries.
Each bucket of a Primary Key Table generates a KvTablet. Under the hood, each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM (log-structured merge) engine, which helps KvTablet support high-performance updates and lookup queries.
26 changes: 26 additions & 0 deletions website/src/components/ThemedImage.js
@@ -0,0 +1,26 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import React from 'react';
import ThemedImage from '@theme/ThemedImage';

export default ({light, dark, alt}) => {
  return <ThemedImage alt={alt} sources={{
    light: require(`../../docs/assets/${light}`).default,
    dark: require(`../../docs/assets/${dark}`).default
  }}/>;
};
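
The wrapper above takes two file names and turns them into the `sources` object that Docusaurus' `@theme/ThemedImage` expects. A minimal sketch of that mapping, runnable in plain Node without React or webpack, might look like the following. `buildSources` and `resolveAsset` are hypothetical names that do not appear in this PR, and the `/assets/` prefix is an assumed stand-in for the URL that `require(...).default` would produce at build time.

```javascript
// Sketch only: `buildSources` and `resolveAsset` are hypothetical helpers,
// not part of the PR. They mirror the prop-to-`sources` mapping of the wrapper.

// Stand-in for require(`../../docs/assets/${name}`).default, which in the
// real component yields the bundled URL of the image. The "/assets/" prefix
// is an assumption made for illustration.
function resolveAsset(name) {
  return `/assets/${name}`;
}

// Map the wrapper's `light` and `dark` props to a { light, dark } object,
// the shape passed to ThemedImage's `sources` prop.
function buildSources({ light, dark }, resolve = resolveAsset) {
  return { light: resolve(light), dark: resolve(dark) };
}

// Usage with the file names from the Architecture page in this PR.
const sources = buildSources({
  light: 'architecture_light.png',
  dark: 'architecture.png',
});
console.log(sources);
```

Keeping the resolver as a parameter is only a convenience for this sketch: it lets the mapping be checked without a bundler, whereas the actual component relies on webpack resolving the template-literal `require` against `docs/assets`.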
4 changes: 4 additions & 0 deletions website/src/css/custom.css
@@ -3306,3 +3306,7 @@ button.clean-btn[class*='toggle'] svg {
[data-theme='dark'] table[class*='compareTable'] tbody tr:hover td[class*='colHighlight'] {
  background-color: rgba(122, 175, 203, 0.2);
}

html[data-theme='light'] .navbar__search {
  opacity: 1;
}