fix(spark): correct handling of ‘remap’ property#818
fix(spark): correct handling of ‘remap’ property#818andrew-coleman wants to merge 3 commits intosubstrait-io:mainfrom
Conversation
When running the substrait to spark converter on plans that were generated by another system, it was found that the emit/outputMappings (remap) property was not being processed correctly. This commit fixes that. Extra tests were added, containing substrait plans in JSON proto format that were generated by a third party system. Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
| } | ||
| if (createProject) { | ||
| Project(projectList, child) | ||
| val ps = output.map(_.toAttribute) ++ projectList |
There was a problem hiding this comment.
My imagination falls a bit short in coming up with a meaning for ps. Is there are a more expressive variable name we can use here?
| @@ -0,0 +1,123 @@ | |||
| /* | |||
| * Licensed to the Apache Software Foundation (ASF) under one or more | |||
There was a problem hiding this comment.
since this is a new file and Substrait not being part of the Apache Software Foundation do we want to include a license header like this one?
| SparkSession.builder().config("spark.master", "local").enableHiveSupport().getOrCreate() | ||
|
|
||
| test("Aggregate") { | ||
| val substraitPlanPath = Path.of("../src/test/resources/substrait_plan_with_aggregate_op.json") |
There was a problem hiding this comment.
I'm a bit undecided on the testing approach for this specific feature.
I feel like there would be a simpler testing approach where one could use SubstraitBuilder to construct Substrait plans containing remap and roundtripping them Substrait -> Spark -> Substrait without the need for going through JSON / add the test data as CSVs.
On the other hand @benbellick also added JSON plans as testcases to other modules in this repo and I have proposed adding actual execution of plans against Postgres in #700 also including test data.
Would be curious to know what others think.
Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
When running the substrait to spark converter on plans that were generated by another system, it was found that the emit/outputMappings (remap) property was not being processed correctly. This commit fixes that.
Extra tests were added, containing substrait plans in JSON proto format that were generated by a third party system.