>>> import numpy as np
>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.preprocessing import Normalizer
>>> ct = ColumnTransformer(
...     [("norm1", Normalizer(norm='l1'), [0, 1]),
...      ("norm2", Normalizer(norm='l1'), slice(2, 4))])
>>> X = np.array([[0., 1., 2., 2.],
...               [1., 1., 0., 1.]])
>>> # Normalizer scales each row of X to unit norm. A separate scaling
>>> # is applied to the first two and the last two elements of each
>>> # row independently.
>>> ct.fit_transform(X)
array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])
ColumnTransformer can be configured with a transformer that requires a 1d array by setting the column to a string:
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from sklearn.preprocessing import MinMaxScaler
>>> import pandas as pd
>>> X = pd.DataFrame({
...     "documents": ["First item", "second one here", "Is this the last?"],
...     "width": [3, 4, 5],
... })
>>> # "documents" is a string which configures ColumnTransformer to
>>> # pass the documents column as a 1d array to the CountVectorizer
>>> ct = ColumnTransformer(
...     [("text_preprocess", CountVectorizer(), "documents"),
...      ("num_preprocess", MinMaxScaler(), ["width"])])
>>> X_trans = ct.fit_transform(X)
For a more detailed example of usage, see
Column Transformer with Mixed Types.
fit(X, y=None, **params)
Fit all transformers using X.
Parameters:
X : {array-like, dataframe} of shape (n_samples, n_features)
    Input data, of which specified subsets are used to fit the transformers.
y : array-like of shape (n_samples,), default=None
    Targets for supervised learning.
**params : dict, default=None
    Parameters to be passed to the underlying transformers' fit and transform methods.
    You can only pass this if metadata routing is enabled, which you can enable using
    sklearn.set_config(enable_metadata_routing=True).
    Added in version 1.4.
fit_transform(X, y=None, **params)
Fit all transformers, transform the data and concatenate results.
Parameters:
X : {array-like, dataframe} of shape (n_samples, n_features)
    Input data, of which specified subsets are used to fit the transformers.
y : array-like of shape (n_samples,), default=None
    Targets for supervised learning.
**params : dict, default=None
    Parameters to be passed to the underlying transformers' fit and transform methods.
    You can only pass this if metadata routing is enabled, which you can enable using
    sklearn.set_config(enable_metadata_routing=True).
    Added in version 1.4.
Returns:
X_t : {array-like, sparse matrix} of shape (n_samples, sum_n_components)
    Horizontally stacked results of transformers. sum_n_components is the
    sum of n_components (output dimension) over transformers. If any
    result is a sparse matrix, everything will be converted to sparse
    matrices.
get_feature_names_out(input_features=None)
Get output feature names for transformation.
Parameters:
input_features : array-like of str or None, default=None
    Input features.
    If input_features is None, then feature_names_in_ is used as feature
    names in. If feature_names_in_ is not defined, then the following
    input feature names are generated:
    ["x0", "x1", ..., "x(n_features_in_ - 1)"].
    If input_features is an array-like, then input_features must match
    feature_names_in_ if feature_names_in_ is defined.
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Added in version 1.4.
Returns:
routing : MetadataRouter
    A MetadataRouter encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
Returns the parameters given in the constructor as well as the estimators
contained within the transformers of the ColumnTransformer.
Parameters:
deep : bool, default=True
    If True, will return the parameters for this estimator and contained
    subobjects that are estimators.
Returns:
params : dict
    Parameter names mapped to their values.
property named_transformers_
Access the fitted transformer by name.
Read-only attribute to access any transformer by given name.
Keys are transformer names and values are the fitted transformer
objects.
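A brief sketch, reusing the Normalizer example from the top of this page:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

ct = ColumnTransformer(
    [("norm1", Normalizer(norm='l1'), [0, 1]),
     ("norm2", Normalizer(norm='l1'), slice(2, 4))])
X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
ct.fit(X)

# After fitting, each transformer can be retrieved by the name it was
# registered under; the returned object is the fitted instance.
fitted_norm1 = ct.named_transformers_["norm1"]
print(type(fitted_norm1).__name__)  # Normalizer
```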
set_output(*, transform=None)
Set the output container when "transform" and "fit_transform" are called.
Calling set_output will set the output of all estimators in transformers
and transformers_.
Parameters:
transform : {"default", "pandas", "polars"}, default=None
    Configure output of transform and fit_transform.
    "default" : Default output format of a transformer
    "pandas" : DataFrame output
    "polars" : Polars output
    None : Transform configuration is unchanged
    Added in version 1.4: "polars" option was added.
set_params(**kwargs)
Set the parameters of this estimator.
Valid parameter keys can be listed with get_params(). Note that you can
directly set the parameters of the estimators contained in transformers
of ColumnTransformer.
Parameters:
**kwargs : dict
    Estimator parameters.
Returns:
self : ColumnTransformer
    This estimator.
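Nested parameters use the transformer_name__parameter convention; a sketch with the Normalizer example from above:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

ct = ColumnTransformer(
    [("norm1", Normalizer(norm='l1'), [0, 1]),
     ("norm2", Normalizer(norm='l1'), slice(2, 4))])

# Reach into a contained transformer with the <name>__<param> syntax
ct.set_params(norm1__norm="l2")
print(ct.get_params()["norm1__norm"])  # l2
```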
transform(X, **params)
Transform X separately by each transformer, concatenate results.
Parameters:
X : {array-like, dataframe} of shape (n_samples, n_features)
    The data to be transformed by subset.
**params : dict, default=None
    Parameters to be passed to the underlying transformers' transform method.
    You can only pass this if metadata routing is enabled, which you can
    enable using sklearn.set_config(enable_metadata_routing=True).
    Added in version 1.4.
Returns:
X_t : {array-like, sparse matrix} of shape (n_samples, sum_n_components)
    Horizontally stacked results of transformers. sum_n_components is the
    sum of n_components (output dimension) over transformers. If any
    result is a sparse matrix, everything will be converted to sparse
    matrices.
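After fitting, the same column-wise transformations are applied to new data; a sketch continuing the Normalizer example:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

ct = ColumnTransformer(
    [("norm1", Normalizer(norm='l1'), [0, 1]),
     ("norm2", Normalizer(norm='l1'), slice(2, 4))])
X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
ct.fit(X)

# Each half of the new row is normalized to unit l1 norm independently:
# [2, 2] -> [0.5, 0.5] and [1, 3] -> [0.25, 0.75]
X_new = np.array([[2., 2., 1., 3.]])
print(ct.transform(X_new))  # [[0.5  0.5  0.25 0.75]]
```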
Tweedie regression on insurance claims
Partial Dependence and Individual Conditional Expectation Plots
Permutation Importance vs Random Forest Feature Importance (MDI)
Displaying Pipelines
Evaluation of outlier detection estimators
Introducing the set_output API
Column Transformer with Heterogeneous Data Sources
Column Transformer with Mixed Types
Comparing Target Encoder with Other Encoders