Skip to content

fix(spark): correct handling of ‘remap’ property#818

Open
andrew-coleman wants to merge 3 commits intosubstrait-io:mainfrom
andrew-coleman:json
Open

fix(spark): correct handling of ‘remap’ property#818
andrew-coleman wants to merge 3 commits intosubstrait-io:mainfrom
andrew-coleman:json

Conversation

@andrew-coleman
Copy link
Copy Markdown
Member

When running the substrait to spark converter on plans that were generated by another system, it was found that the emit/outputMappings (remap) property was not being processed correctly. This commit fixes that.
Extra tests were added, containing substrait plans in JSON proto format that were generated by a third party system.

When running the substrait to spark converter on plans that
were generated by another system, it was found that the
emit/outputMappings (remap) property was not being
processed correctly.  This commit fixes that.
Extra tests were added, containing substrait plans in JSON proto
format that were generated by a third party system.

Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
}
if (createProject) {
Project(projectList, child)
val ps = output.map(_.toAttribute) ++ projectList
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My imagination falls a bit short in coming up with a meaning for ps. Is there are a more expressive variable name we can use here?

@@ -0,0 +1,123 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a new file and Substrait not being part of the Apache Software Foundation do we want to include a license header like this one?

SparkSession.builder().config("spark.master", "local").enableHiveSupport().getOrCreate()

test("Aggregate") {
val substraitPlanPath = Path.of("../src/test/resources/substrait_plan_with_aggregate_op.json")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit undecided on the testing approach for this specific feature.

I feel like there would be a simpler testing approach where one could use SubstraitBuilder to construct Substrait plans containing remap and roundtripping them Substrait -> Spark -> Substrait without the need for going through JSON / add the test data as CSVs.

On the other hand @benbellick also added JSON plans as testcases to other modules in this repo and I have proposed adding actual execution of plans against Postgres in #700 also including test data.

Would be curious to know what others think.

Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants