There is already a decent synergy between pandas and scikit-learn and most other popular machine learning libraries, as in a pandas DataFrame is almost always accepted as an input data structure.
However, the output of the scikit-learn transformers is a pure numpy array, and thus one loses the column name information of the input data. Preserving the column names through the ML pipeline would be extremely useful to data scientists to optimize/understand/debug data science pipelines.
There is already a decent synergy between
pandasandscikit-learnand most other popular machine learning libraries, as in apandasDataFrame is almost always accepted as an input data structure.However, the output of the
scikit-learntransformers is a purenumpyarray, and thus one loses the column name information of the input data. Preserving the column names through the ML pipeline would be extremely useful to data scientists to optimize/understand/debug data science pipelines.