Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
```rust
let agg = &options.aggregate_fn;

// Try encoding-specific fast path first.
if let Some(states) = list.elements().aggregate_list(&list, agg)? {
```
It's also the wrong type.
```rust
fn accumulate_list(&mut self, list: &ListViewArray) -> VortexResult<()> {
    for i in 0..list.len() {
        self.accumulate(&list.list_elements_at(i)?)?;
        self.flush()?;
    }
    Ok(())
}
```
I think we might want to use an array + offset + len approach to avoid list construction at each step?
What do you mean by "each step"?
I was thinking that as you do pushdown or reduce, you will need to unwrap the elements, unwrap the encodings, and wrap that up with offset + len.
Isn't that == canonicalize to ListView?
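To make the offset + len idea above concrete, here is a minimal sketch (with invented names, not the Vortex API) of aggregating each list by slicing a shared elements buffer with offsets, rather than materializing a per-element list array on every iteration:

```rust
// Hypothetical flattened list-of-i64: `elements` is shared storage and
// `offsets` has len + 1 entries delimiting each list's slice.
struct FlatList {
    elements: Vec<i64>,
    offsets: Vec<usize>,
}

impl FlatList {
    /// Sum each list by viewing (offset, len) into the shared elements
    /// buffer, never constructing a per-list array.
    fn sum_per_list(&self) -> Vec<i64> {
        self.offsets
            .windows(2)
            .map(|w| self.elements[w[0]..w[1]].iter().sum::<i64>())
            .collect()
    }
}

fn main() {
    let list = FlatList {
        elements: vec![1, 2, 3, 4, 5, 6],
        offsets: vec![0, 2, 2, 6], // lists: [1, 2], [], [3, 4, 5, 6]
    };
    assert_eq!(list.sum_per_list(), vec![3, 0, 18]);
}
```

Canonicalizing to a ListView-style layout gives you exactly this shape once, after which each group is just an (offset, len) pair over the same elements array.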
```rust
/// Merge a partial state scalar into the current group state.
fn merge(
    &self, options: &Self::Options, state: &mut Self::GroupState, partial: &Scalar,
) -> VortexResult<()>;
```
Why do you define merge in this way? It could be (GroupState, GroupState) -> GroupState
Because then you need an extra function for Scalar -> GroupState, and also merging on multiple groups takes an ArrayRef, not a Vec.
Can you expand on this, or did you define this elsewhere?
Can you explain what the scalar partial is here, and whether it is similar to a GroupState?
It's the "vortex" version of GroupState. We could just use a Scalar to model GroupState if we wanted. Maybe it's nicer to have a native type for performance. Or maybe it's ok to just use and merge scalars.
No, we definitely want a native type for GroupState, e.g. string_concat can hold a mutable string buffer and accumulate data into it. Then we only convert to scalar when we flush.
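A minimal sketch of the point above (names invented, not the Vortex trait): a native GroupState for string_concat holds a mutable buffer, accumulates batches and merges partial scalars into it, and only converts to a scalar value at flush:

```rust
// Hypothetical native GroupState for string_concat. The "scalar" form is
// modeled as a plain String here.
#[derive(Default)]
struct StringConcatState {
    buf: String,
}

impl StringConcatState {
    /// Accumulate a batch of values into the mutable buffer.
    fn accumulate(&mut self, batch: &[&str], sep: &str) {
        for v in batch {
            if !self.buf.is_empty() {
                self.buf.push_str(sep);
            }
            self.buf.push_str(v);
        }
    }

    /// Merge a partial result (the scalar form of another state).
    fn merge(&mut self, partial: &str, sep: &str) {
        if !partial.is_empty() {
            self.accumulate(&[partial], sep);
        }
    }

    /// Only at flush is the native state converted to a scalar.
    fn flush(self) -> String {
        self.buf
    }
}

fn main() {
    let mut state = StringConcatState::default();
    state.accumulate(&["a", "b"], ",");
    state.merge("c,d", ",");
    assert_eq!(state.flush(), "a,b,c,d");
}
```

This shows why merge takes a Scalar rather than another GroupState: the partial arriving from elsewhere is already in scalar form, while the local state stays native until flush.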
```rust
/// Accumulate a canonical batch into the current group state.
fn accumulate(
    &self, options: &Self::Options, state: &mut Self::GroupState, batch: &Canonical,
```
This is the fallback, and we have encoding-specific kernels?
```rust
-> VortexResult<Self::GroupState>;
```

```rust
/// Accumulate a canonical batch into the current group state.
fn accumulate(
```
Trying to pull out of stats happens here?
First PR implementing the Aggregate Functions proposal in vortex-data/rfcs#21
Apologies for the delay. I think one thing that would be helpful to add to this RFC is a small section on what kinds of grouping are supported. It seems like it's mostly pre-defined groups (i.e. list offsets and sorted groups?). I mentioned this in our sync, but I think supporting partial aggregations on unordered groups might be interesting too. We have a query of the type
I'm not sure I understand how the scan should group by an unordered column? I would want to avoid anything in the Vortex scan having to hold state indefinitely (i.e. accumulating state until the end of the scan).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The idea is that it would be a partial aggregation within batches (i.e. emit on every batch). I don't have concrete proof of an improvement, but something we do is group_by(x), first_value(y), where y is a complex, expensive column to read/process and x is unordered but can have very few distinct values within most batches. I think even just pre-aggregating within each batch can produce savings (if not in segment reads after segment slicing, at least in downstream execution engine processing). The meat of the aggregation is still handed over to the execution engine. I think this is not fundamental nor a blocker, but could be interesting to keep in mind.
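To sketch the per-batch idea (names illustrative, not Vortex APIs): within a single batch, group_by(x) + first_value(y) keeps only the first y per distinct x, emitting and dropping the state at the batch boundary so the scan never holds state indefinitely:

```rust
use std::collections::HashSet;

/// Pre-aggregate one batch: keep the first y seen for each distinct x.
/// Returns (x, first_y) pairs in order of first appearance. All state is
/// local to the batch and discarded when the function returns.
fn first_value_per_batch(xs: &[i64], ys: &[i64]) -> Vec<(i64, i64)> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for (&x, &y) in xs.iter().zip(ys) {
        // HashSet::insert returns true only on first occurrence of x.
        if seen.insert(x) {
            out.push((x, y));
        }
    }
    out
}

fn main() {
    // A batch with few distinct x values shrinks from 6 rows to 2, so the
    // execution engine finishes the aggregation over much less data.
    let xs = [1, 1, 2, 1, 2, 2];
    let ys = [10, 11, 20, 12, 21, 22];
    assert_eq!(first_value_per_batch(&xs, &ys), vec![(1, 10), (2, 20)]);
}
```

The downstream engine still merges the per-batch partials, so correctness does not depend on x being ordered across batches.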