branch-4.0: [pick](pr) #58396 #60540 #60684 #62320

Open
zhangstar333 wants to merge 5 commits into apache:branch-4.0 from zhangstar333:branch-4.0-0410

@zhangstar333
Contributor

What problem does this PR solve?

cherry-pick from master #58396 (#60540) #60684

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

zhangstar333 and others added 3 commits April 10, 2026 13:55
Problem Summary:
Previously (see apache#24788), CLUSTER BY was used to define sort columns,
but its syntax was limited (ASC only, with no control over sort order).
This PR replaces it with ORDER BY, which is more intuitive and flexible.
Users can now explicitly specify the sort direction and null ordering for
each column.
The default column order remains ASC with NULLS FIRST.

This PR also adds support for the ORDER BY clause on Iceberg tables.

```
  CREATE TABLE `test_table2` (
    `id` int NULL,
    `name` text NULL,
    `score` double NULL,
    `create_time` datetimev2(6) NULL
  ) ENGINE=ICEBERG_EXTERNAL_TABLE
  ORDER BY  (`id` ASC NULLS FIRST, `score` DESC NULLS LAST)
  LOCATION 's3a://warehouse/wh/test_with_sr/test_table2'
  PROPERTIES (
    "write-format" = "ORC",
    "doris.version" = "doris-0.0.0-2fa88d38b0",
    "write.parquet.compression-codec" = "zstd"
  );
```
…les (apache#58396)

### What problem does this PR solve?

### Proposed changes

This PR implements static partition overwrite functionality for Iceberg
external tables, allowing users to precisely overwrite specific
partitions using the `INSERT OVERWRITE ... PARTITION (col='value', ...)`
syntax.

### Background

Before this PR, Doris supported:
- ✅ `INSERT INTO` with dynamic partition for Iceberg tables
- ✅ `INSERT OVERWRITE` for full table replacement
- ❌ `INSERT OVERWRITE ... PARTITION (...)` for static partition overwrite

### New Features

1. **Full Static Partition Mode**: Overwrite a specific partition when
all partition columns are specified
   ```sql
   INSERT OVERWRITE TABLE iceberg_db.tbl PARTITION (dt='2025-01-25', region='bj')
   SELECT id, name FROM source_table;
   ```

2. **Hybrid Partition Mode**: Partial static + partial dynamic partition
   ```sql
   -- dt is static, region comes from SELECT dynamically
   INSERT OVERWRITE TABLE iceberg_db.tbl PARTITION (dt='2025-01-25')
   SELECT id, name, region FROM source_table;
   ```

### Implementation Details

#### FE Changes
- **Parser** (`DorisParser.g4`, `LogicalPlanBuilder.java`): Extended
partition spec parsing to support `PARTITION (col='value', ...)` syntax
- **InsertPartitionSpec**: New unified data structure to represent
partition modes (auto-detect, dynamic, static)
- **UnboundIcebergTableSink**: Added `staticPartitionKeyValues` field to
carry static partition info
- **BindSink**: Added validation for static partition columns and
generation of constant expressions for static partition values
- **IcebergTransaction**: Implemented `commitStaticPartitionOverwrite()`
using Iceberg's `OverwriteFiles.overwriteByRowFilter()` API
- **IcebergUtils**: Added `parsePartitionValueFromString()` utility for
partition value type conversion
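
The FE steps above (detecting the partition mode, converting the string literals, and deriving an overwrite filter) can be sketched roughly as follows. This is an illustrative Python model, not the Doris code: `PartitionMode`, `classify_partition_mode`, `parse_partition_value`, and `build_row_filter` are hypothetical names, the three supported types are picked arbitrarily, and the assumption that the row filter is an AND of one equality predicate per static column is not confirmed by the PR text.

```python
from datetime import date
from enum import Enum


class PartitionMode(Enum):
    DYNAMIC = "dynamic"          # no static values given
    HYBRID = "hybrid"            # only some partition columns are static
    FULL_STATIC = "full_static"  # every partition column is static


def classify_partition_mode(partition_columns, static_values):
    """Validate the static spec and decide which mode applies (hypothetical)."""
    unknown = set(static_values) - set(partition_columns)
    if unknown:
        raise ValueError(f"not partition columns: {sorted(unknown)}")
    if not static_values:
        return PartitionMode.DYNAMIC
    if len(static_values) == len(partition_columns):
        return PartitionMode.FULL_STATIC
    return PartitionMode.HYBRID


def parse_partition_value(raw, column_type):
    """Convert the literal from PARTITION (col='value') into a typed value."""
    if column_type == "int":
        return int(raw)
    if column_type == "date":
        return date.fromisoformat(raw)
    if column_type == "string":
        return raw
    raise ValueError(f"unsupported partition column type: {column_type}")


def build_row_filter(static_values, column_types):
    """Assumed filter shape: one (column, '=', value) predicate per static column,
    to be combined with AND by the caller."""
    return [(col, "=", parse_partition_value(raw, column_types[col]))
            for col, raw in static_values.items()]
```

For the two statements shown earlier, `PARTITION (dt='2025-01-25', region='bj')` would classify as full static and `PARTITION (dt='2025-01-25')` as hybrid when the table is partitioned by `dt` and `region`.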

#### BE Changes
- **VIcebergTableWriter**:
  - Support full static partition mode (all data goes to single partition)
  - Support hybrid partition mode (static columns from config, dynamic
    columns from data)
  - Added `_is_full_static_partition` and
    `_dynamic_partition_column_indices` for mode detection
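
As a rough model of the writer-side behavior described above, the sketch below resolves a row's partition by taking static columns from the sink configuration and dynamic columns from the row itself. It is written in Python for brevity and is only an assumption about the logic; the function names and the row layout are hypothetical and do not mirror the C++ `VIcebergTableWriter`.

```python
def dynamic_column_indices(partition_columns, static_values, output_columns):
    """Map each non-static partition column to its position in an output row."""
    return {col: output_columns.index(col)
            for col in partition_columns if col not in static_values}


def partition_key(row, partition_columns, static_values, dynamic_indices):
    """Static columns come from the config, dynamic columns from the row data."""
    return tuple(
        static_values[col] if col in static_values else row[dynamic_indices[col]]
        for col in partition_columns
    )


# Hybrid mode: dt is static, region is read from each row.
columns = ["dt", "region"]
static = {"dt": "2025-01-25"}
indices = dynamic_column_indices(columns, static, ["id", "name", "region"])
key = partition_key((1, "alice", "bj"), columns, static, indices)

# Full static mode: no dynamic indices remain, so every row maps to one partition.
is_full_static = not dynamic_column_indices(
    columns, {"dt": "2025-01-25", "region": "bj"}, ["id", "name"])
```

In this model, an empty index map plays the role of a full-static flag: when it is empty, the per-row lookup can be skipped and all data written to a single partition.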

#### Thrift Changes
- Added `static_partition_values` field to `TIcebergTableSink` for
passing static partition info from FE to BE
…pache#60540)

Problem Summary:

Support writing to Iceberg tables that define a sort order. The written
data is sorted locally, and lower/upper_bounds metadata is added, so
Iceberg scan planning can use it to prune data files.
**Note**: this is only a local sort, not a global sort. If the Iceberg
writer runs with higher parallelism, you may see overlapping
lower/upper_bounds between files.
If you need a global sort, adding an ORDER BY clause to the INSERT SQL
may help.
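
To illustrate why a local sort can leave overlapping bounds, here is a small Python simulation. It is a toy model under stated assumptions, not Doris behavior: `write_files` and `files_to_scan` are hypothetical helpers, rows are assumed to reach writers round-robin in the local case, and the global case assumes sorted rows are split into contiguous ranges.

```python
def write_files(rows, num_writers, global_sort=False):
    """Return the (lower, upper) bound of the sort column for each output file."""
    if global_sort:
        # Assumed: rows are sorted first, then split into contiguous ranges.
        ordered = sorted(rows)
        size = max(1, -(-len(ordered) // num_writers))  # ceiling division
        chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    else:
        # Assumed: rows reach the writers round-robin, with no ordering.
        chunks = [rows[i::num_writers] for i in range(num_writers)]
    # Each writer sorts only its own chunk (the local sort).
    files = [sorted(chunk) for chunk in chunks if chunk]
    return [(f[0], f[-1]) for f in files]


def files_to_scan(bounds, value):
    """Count files whose [lower, upper] range could contain the value."""
    return sum(1 for lower, upper in bounds if lower <= value <= upper)


ids = list(range(1, 13))
local_bounds = write_files(ids, num_writers=3)
global_bounds = write_files(ids, num_writers=3, global_sort=True)
```

With these inputs, the locally sorted files have bounds (1, 10), (2, 11), and (3, 12), so a lookup for `id = 5` cannot skip any file, whereas the globally sorted files have disjoint bounds and only one needs to be read.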

You can create the table, and then alter it, e.g.:
```
CREATE TABLE test_table2 (
    id INT,
    name STRING,
    score DOUBLE,
    create_time datetime
)
ORDER BY (
    id ASC NULLS FIRST,
    score DESC NULLS LAST)
PROPERTIES (
  'write-format'='ORC'
);
```
@zhangstar333 zhangstar333 changed the title from "branch-4.0: [pick](pr)" to "branch-4.0: [pick](pr) #58396 #60540 #60684" Apr 10, 2026
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhangstar333
Contributor Author

run buildall

@apache apache deleted a comment from hello-stephen Apr 10, 2026
@zhangstar333
Contributor Author

run buildall

@zhangstar333
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.35% (1795/2262)
Line Coverage 64.65% (32190/49794)
Region Coverage 65.47% (16099/24590)
Branch Coverage 56.05% (8583/15312)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 19.26% (78/405) 🎉
Increment coverage report
Complete coverage report

3 participants