I have time series data and tabular data. I have developed a hybrid model that feeds the time series data to a CNN and the tabular data to a TF-DF random forest model; the two outputs are concatenated and passed to a fully connected (FC) layer for prediction. I would like to know the feature importance of the tabular data. When I try to get it with the following line, I get an error:
inspector = rf_model_layer.make_inspector()
TypeError: the object of type ‘NoneType’ has no len()
I have attached the model summary:
Model: "model_1"
Layer (type)                               Output Shape           Param #  Connected to
input_3 (InputLayer)                       [(None, 12, 3000, 1)]  0        []
conv2d_2 (Conv2D)                          (None, 1, 2876, 16)    24016    ['input_3[0][0]']
conv2d_3 (Conv2D)                          (None, 1, 2837, 32)    20512    ['conv2d_2[0][0]']
input_4 (InputLayer)                       [(None, 60)]           0        []
flatten_1 (Flatten)                        (None, 90784)          0        ['conv2d_3[0][0]']
random_forest_model_1 (RandomForestModel)  (None, 1)              1        ['input_4[0][0]']
concatenate_1 (Concatenate)                (None, 90785)          0        ['flatten_1[0][0]', 'random_forest_model_1[0][0]']
dense_2 (Dense)                            (None, 32)             2905152  ['concatenate_1[0][0]']
dense_3 (Dense)                            (None, 1)              33       ['dense_2[0][0]']
==================================================================================================
Total params: 2949714 (11.25 MB)
Trainable params: 2949713 (11.25 MB)
Non-trainable params: 1 (1.00 Byte)
Any suggestions would be appreciated.
Thank you
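The shapes and parameter counts in the summary above can be reproduced by hand, which is a quick way to confirm that the layer list is what you expect. The sketch below assumes 'valid' padding, stride 1, and the kernel sizes (12, 125) and (1, 40) from the code posted later in this thread; the helper names are made up for illustration.

```python
def conv2d_valid(in_shape, filters, kernel):
    """Output shape and parameter count of a stride-1, 'valid' Conv2D layer.

    in_shape is (height, width, channels); kernel is (kernel_h, kernel_w).
    """
    h, w, c = in_shape
    kh, kw = kernel
    out_shape = (h - kh + 1, w - kw + 1, filters)
    params = kh * kw * c * filters + filters  # weights + one bias per filter
    return out_shape, params


def dense(in_units, units):
    """Parameter count of a Dense layer: weight matrix plus biases."""
    return in_units * units + units


shape1, p1 = conv2d_valid((12, 3000, 1), filters=16, kernel=(12, 125))
shape2, p2 = conv2d_valid(shape1, filters=32, kernel=(1, 40))
flat = shape2[0] * shape2[1] * shape2[2]
concat = flat + 1   # the random forest layer contributes one output column
p3 = dense(concat, 32)
p4 = dense(32, 1)
rf_params = 1       # the single non-trainable parameter reported for the RF layer
total = p1 + p2 + p3 + p4 + rf_params

print(shape1, p1)
print(shape2, p2)
print(flat, concat, p3, p4, total)
```

The totals agree with the summary. The only non-trainable parameter belongs to the random forest layer, which is consistent with that layer not being updated by gradient descent.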
When you construct the model, are you specifying that it should compute out-of-bag (OOB) variable importances?
model = tfdf.keras.RandomForestModel(compute_oob_variable_importances=True)
More here.
Thank you very much. I did not set compute_oob_variable_importances to True; let me try that. I also have a question: does the combined model below train both the CNN and the RF model when I run the tuner's search on it? Here is the sample model.
import tensorflow as tf
import tensorflow_decision_forests as tfdf
from keras_tuner import HyperModel
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# fixed_hyperparameters (dict) and mae_error (metric function) are defined elsewhere.

class CombinedModel(HyperModel):
    def __init__(self, cnn_input_shape, rf_input_shape):
        self.cnn_input_shape = cnn_input_shape
        self.rf_input_shape = rf_input_shape

    def build(self, hp):
        # CNN part
        cnn_input = Input(shape=self.cnn_input_shape)
        # Here the parameters are fixed to test the model.
        cnn_output = tf.keras.layers.Conv2D(filters=16, kernel_size=(12, 125), activation='relu')(cnn_input)
        cnn_output = tf.keras.layers.Conv2D(filters=32, kernel_size=(1, 40), activation='relu')(cnn_output)
        cnn_output = tf.keras.layers.Flatten()(cnn_output)
        # RF part (rf_input was missing in the original snippet)
        rf_input = Input(shape=self.rf_input_shape)
        rf_output = tfdf.keras.RandomForestModel(
            num_trees=fixed_hyperparameters['rf_num_trees'],
            max_depth=fixed_hyperparameters['rf_max_depth'],
            min_examples=fixed_hyperparameters['min_examples']
        )(rf_input)
        # Combine CNN and RF outputs
        combined_layer = concatenate([cnn_output, rf_output])
        # Fully connected layer
        fc_activation = hp.Choice('fc_activation', values=['relu', 'sigmoid'])
        fc_layer = Dense(32, activation=fc_activation)(combined_layer)
        # Output layer
        output_layer = Dense(1, activation='relu')(fc_layer)
        model = Model(inputs=[cnn_input, rf_input], outputs=output_layer)
        # The optimizer was undefined in the original snippet; this matches the code posted later.
        optimizer = tf.keras.optimizers.Adam(learning_rate=fixed_hyperparameters['learning_rate'])
        model.compile(optimizer=optimizer, loss='mse', metrics=[tf.keras.metrics.RootMeanSquaredError(), mae_error])
        return model
if __name__ == '__main__':
    cnn_input_shape = (12, 301, 1)
    rf_input_shape = (60,)
    combined_model = CombinedModel(cnn_input_shape, rf_input_shape)
    # loading the data for CNN and RF
    # loading the tuner
    tuner_bo = RandomSearch(
        combined_model,
        objective=keras_tuner.Objective("val_loss", direction="min"),
        max_trials=50,
        seed=16,
        executions_per_trial=1,
        overwrite=False,
        project_name="Hybrid_model")
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=20)
    mc = ModelCheckpoint(saveFN, monitor='val_loss', mode='min', verbose=1, save_best_only=True)
    tuner_bo.search([dataTrain, dataTrainRF], labelsTrain,
                    validation_data=([dataVal, dataValRF], labelsVal))
In the above code, the RF model is one of the layers in the combined model. Does it get trained when I call tuner_bo.search? Or do I have to extract the best hyperparameters and call best_model.fit() to make sure both the CNN and the RF layer in the combined model are trained, as shown below?
# Get the best hyperparameters (from the tuner instance, not the keras_tuner module)
best_hyperparameters = tuner_bo.get_best_hyperparameters()[0]
# Build the model with the best hyperparameters
best_model = combined_model.build(best_hyperparameters)
# Train the model
best_model.fit(train_ds, epochs=num_epochs, validation_data=valid_ds)
# Evaluate the model on the test dataset (loss plus the two compiled metrics)
test_loss, test_rmse, test_mae = best_model.evaluate(test_ds)
I also get the following warning when I use best_model.fit():
WARNING:absl:The model was called directly (i.e. using model(data) instead of using model.predict(data)) before being trained. The model will only return zeros until trained. The output shape might change after training Tensor("inputs:0", shape=(None, 60), dtype=float32).
Any suggestions would be appreciated.
thank you very much.
Yes, when you invoke search on the tuner, it should cause the combined model (actually a bunch of candidate models) to be trained in multiple trials as it determines the best combination of hyperparameters. Since the RF model is part of the combined model, it should be trained alongside the CNN model.
You should be able to get the best model (which was already trained in the tuning process):
best_model = tuner.get_best_models(num_models=1)[0]
You may use the best model as is, but, as you have done, you can also retrain it with the best hyperparameters (examples suggest using the entire dataset, i.e. training and validation data combined).
The tutorial gives as an example:
hypermodel = MyHyperModel()
best_hp = tuner.get_best_hyperparameters()[0]
model = hypermodel.build(best_hp)
hypermodel.fit(best_hp, model, x_all, y_all, epochs=1)
So you might modify your code to match that example and see if it makes a difference.
Thank you @rcauvin. Let me try the approach you have suggested and update here if it works. Regarding my first question, I still get TypeError: object of type 'NoneType' has no len() even after I specify compute_oob_variable_importances=True. Here is how I access the random forest model layer from the combined model:
rf_model_layer = model.layers[5] # Assuming the random forest model is the 5th layer of the combined_model
inspector = rf_model_layer.make_inspector()
It works fine when I call the TFDF RF model alone (not including the CNN model) on tabular data, I get the feature importance.
Since the tfdf random forest model is one of the layers of the Keras model, can I not use make_inspector() on the model directly to get the feature importance? Or the RandomForest model is not trained so I couldn’t access the attribute make_inspector()
I apologize for any inconvenience.
Thank you very much
rf_model_layer = model.layers[5] # Assuming the random forest model is the 5th layer of the combined_model
How are you getting model?
cnn_input = tf.keras.Input(shape=(12, 301, 1))
rf_input = tf.keras.Input(shape=(60,))
cnn_output = tf.keras.layers.Conv2D(filters=16, kernel_size=(12, 125), activation='relu')(cnn_input)
cnn_output = tf.keras.layers.Conv2D(filters=32, kernel_size=(1, 40), activation='relu')(cnn_output)
cnn_output = tf.keras.layers.Flatten()(cnn_output)
rf_output = tfdf.keras.RandomForestModel(
num_trees=fixed_hyperparameters['rf_num_trees'],
max_depth=fixed_hyperparameters['rf_max_depth'],
min_examples=fixed_hyperparameters['min_examples'],
compute_oob_variable_importances=True
)(rf_input)
combined_output = tf.keras.layers.concatenate([cnn_output, rf_output])
fc_output = tf.keras.layers.Dense(32, activation='relu')(combined_output)
output = tf.keras.layers.Dense(1, activation='relu')(fc_output)
model = tf.keras.Model(inputs=[cnn_input, rf_input], outputs=output)
optimizer = tf.keras.optimizers.Adam(learning_rate=fixed_hyperparameters['learning_rate'])
model.compile(optimizer=optimizer, loss='mse')
trained_model=model.fit([dataTrain, dataTrainRF], labelsTrain, validation_data=([dataVal, dataValRF], labelsVal), epochs=5)
rf_model_layer = model.layers[5] # Assuming the random forest model is the 5th layer
inspector = rf_model_layer.make_inspector()
This is the complete error :
Traceback (most recent call last):
Cell In[184], line 2
inspector = rf_model_layer.make_inspector()
File ~/anaconda3/lib/python3.11/site-packages/tensorflow_decision_forests/keras/core_inference.py:411 in make_inspector
path = self.yggdrasil_model_path_tensor().numpy().decode("utf-8")
File ~/anaconda3/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py:153 in error_handler
raise e.with_traceback(filtered_tb) from None
File /tmp/autograph_generated_file0diouhww.py:38 in tf__yggdrasil_model_path_tensor
ag__.if_stmt(ag__.ld(multitask_model_index) >= ag__.converted_call(ag__.ld(len), (ag__.ld(self)._models,), None, fscope), if_body, else_body, get_state, set_state, (), 0)
TypeError: in user code:
File "/home/hybrid/anaconda3/lib/python3.11/site-packages/tensorflow_decision_forests/keras/core_inference.py", line 436, in yggdrasil_model_path_tensor *
if multitask_model_index >= len(self._models):
TypeError: object of type 'NoneType' has no len()
Here are the layers present in the model.
layers = model.layers
print(layers)
Output
[<keras.src.engine.input_layer.InputLayer object at 0x7f00587cf390>, <keras.src.layers.convolutional.conv2d.Conv2D object at 0x7f005840fed0>, <keras.src.layers.convolutional.conv2d.Conv2D object at 0x7f0058763b50>, <keras.src.engine.input_layer.InputLayer object at 0x7f00587095d0>, <keras.src.layers.reshaping.flatten.Flatten object at 0x7f00586242d0>, <tensorflow_decision_forests.keras.RandomForestModel object at 0x7f0058463710>, <keras.src.layers.merging.concatenate.Concatenate object at 0x7f00584604d0>, <keras.src.layers.core.dense.Dense object at 0x7f005845fe10>, <keras.src.layers.core.dense.Dense object at 0x7f00587b8910>]
Thank you
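The layer list above shows the random forest at index 5, but a hard-coded index breaks as soon as the layer order changes. A minimal sketch that looks the layer up by class name instead; it only relies on the model exposing a .layers list, and the helper name find_layers_by_class_name is hypothetical. Finding the layer does not make it trained, so make_inspector() can still fail as shown above.

```python
def find_layers_by_class_name(model, class_name):
    """Return every layer of `model` whose class is named `class_name`."""
    return [layer for layer in model.layers
            if type(layer).__name__ == class_name]


# Stand-in classes (not Keras) so the helper can be exercised without TensorFlow.
class InputLayer: ...
class Conv2D: ...
class RandomForestModel: ...
class Dense: ...


class FakeModel:
    def __init__(self, layers):
        self.layers = layers


fake = FakeModel([InputLayer(), Conv2D(), Conv2D(), InputLayer(),
                  RandomForestModel(), Dense()])
rf_layers = find_layers_by_class_name(fake, "RandomForestModel")
print(len(rf_layers), type(rf_layers[0]).__name__)
```

With the real combined model, the call would presumably be find_layers_by_class_name(model, "RandomForestModel")[0].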
I get this error when I try to print the trained_model.layers:
print(trained_model.layers)
Traceback (most recent call last):
Cell In[214], line 1
print(trained_model.layers)
AttributeError: 'History' object has no attribute 'layers'
Thank you very much for your prompt response
Thank you very much for the reference link.
What is happening in the following lines of code from the tutorial linked above? The combined model (NN and DF, ensemble_nn_and_df) is never trained itself; only the two DF models are trained separately. How is that training reflected in ensemble_nn_and_df?
Let's train the two Decision Forest components (one after another).
%%time
train_dataset_with_preprocessing = train_dataset.map(lambda x,y: (preprocessor(x), y))
test_dataset_with_preprocessing = test_dataset.map(lambda x,y: (preprocessor(x), y))
model_3.fit(train_dataset_with_preprocessing)
model_4.fit(train_dataset_with_preprocessing)
mean_nn_and_df = tf.reduce_mean(
tf.stack([m1_pred, m2_pred, m3_pred, m4_pred], axis=0), axis=0)
ensemble_nn_and_df = tf_keras.models.Model(raw_features, mean_nn_and_df)
ensemble_nn_and_df.compile(
loss=tf_keras.losses.BinaryCrossentropy(), metrics=["accuracy"])
evaluation_nn_and_df = ensemble_nn_and_df.evaluate(
test_dataset, return_dict=True)
I have a NumPy array for the CNN model and tabular data (with categorical variables) for the RF model. I use tfdf.keras.pd_dataframe_to_tf_dataset for the tabular data (because it contains categorical variables) and tf.data.Dataset.from_tensor_slices for the CNN input, and the two datasets are not compatible. Do you have any suggestions on how to make the two inputs compatible so they can be fed to the two models? Or, if we train the models separately as in the tutorial, would the compatibility issue not arise?
Any help is appreciated.
Thank you very much.
It looks like the tutorial creates and stitches the models together before compiling or training them. Then it compiles and trains the neural network and the decision forests separately, after which ensemble_nn_and_df is compiled and evaluated.
I think you have a few options for dealing with the input dataset and using it to train the different models:
Train the models separately as in the tutorial (but with preprocessed input for the CNN model and tabular data input for the decision forest model).
Add some feature preprocessing layers so that the CNN model receives preprocessed input while the decision forest model receives the tabular data.
Use preprocessed input for both models.
Hi Geerthy,
As you noted, the random forest model in your example is not trained when calling "fit" on the combined model. Because random forests don't train with back-propagation, TF-DF models can only be trained by calling "fit" directly on them, and calling "fit" on the combined model does not call "fit" on the individual models.
In the tutorial that @rcauvin linked (thanks), you can see that the random forests are trained individually (model_4.fit(...) and model_3.fit(...)).
Most of the errors you see are due to the RF model not being trained. This kind of error stems from a limitation of TF-DF imposed by the Keras API. We've written a guide to help users figure out the differences.
An alternative and better solution for you is to replace your TF-DF code with YDF code. YDF is the successor of TF-DF (see the TF-DF homepage announcement). TF-DF and YDF use the same learning algorithm implementations, but YDF's API is improved, has more features for model understanding, and is less error-prone (see details here). Also, YDF is compatible with both Keras 2 and Keras 3, while TF-DF is currently only compatible with Keras 2.
This tutorial shows different ways to compose neural networks with decision forest models using YDF.
I hope this helps.
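The point made above, that the forest has to be fitted on its own before the rest of the network is trained, can be sketched as a two-stage routine. This is a stacking-style variant written against duck-typed objects (anything with fit and predict), not the tutorial's code; the function name train_in_two_stages and the stand-in classes are assumptions for illustration, and real TF-DF or Keras models may need their inputs wrapped differently.

```python
def train_in_two_stages(forest, network, tabular_x, series_x, y):
    """Fit the forest first, then fit the network on [series, forest predictions]."""
    forest.fit(tabular_x, y)                  # stage 1: the forest is trained directly
    forest_pred = forest.predict(tabular_x)   # its output becomes an input feature
    network.fit([series_x, forest_pred], y)   # stage 2: the gradient-trained part
    return forest_pred


# Tiny stand-ins so the control flow can be run without TensorFlow installed.
class MeanForest:
    """Pretend forest: predicts the mean training label for every row."""

    def __init__(self):
        self.trained = False

    def fit(self, x, y):
        self.mean = sum(y) / len(y)
        self.trained = True

    def predict(self, x):
        return [self.mean for _ in x]


class RecordingNetwork:
    """Pretend network: only records what it was fitted on."""

    def fit(self, inputs, y):
        self.inputs = inputs


forest, network = MeanForest(), RecordingNetwork()
pred = train_in_two_stages(forest, network,
                           tabular_x=[[0.1], [0.2], [0.3]],
                           series_x=[[1, 2], [3, 4], [5, 6]],
                           y=[1.0, 2.0, 3.0])
print(forest.trained, pred)
```

One caveat with this variant: forest predictions on its own training rows are optimistic, so out-of-fold predictions are the usual remedy when the second stage is trained on them.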
Thanks for the suggestion to move to YDF. Unfortunately, I'm having trouble just importing the ydf package.
!pip install -U ydf
import ydf
I’m getting an error on the import in my SageMaker Jupyter notebook:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[178], line 1
----> 1 import ydf
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/__init__.py:23
20 from ydf.version import version as __version__
22 # Dataset
---> 23 from ydf.dataset.dataset import create_vertical_dataset
24 from ydf.dataset.dataspec import Column
25 from ydf.dataset.dataspec import Semantic
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/dataset/dataset.py:24
22 from yggdrasil_decision_forests.dataset import data_spec_pb2
23 from ydf.cc import ydf
---> 24 from ydf.dataset import dataspec
25 from ydf.dataset.io import dataset_io
26 from ydf.dataset.io import dataset_io_types
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/dataset/dataspec.py:41
37 YDF_OOD = "<OOD>"
39 # Mapping between Numpy dtypes and YDF dtypes.
40 _NP_DTYPE_TO_YDF_DTYPE = {
---> 41 np.int8: ds_pb.DType.DTYPE_INT8,
42 np.int16: ds_pb.DType.DTYPE_INT16,
43 np.int32: ds_pb.DType.DTYPE_INT32,
44 np.int64: ds_pb.DType.DTYPE_INT64,
45 np.uint8: ds_pb.DType.DTYPE_UINT8,
46 np.uint16: ds_pb.DType.DTYPE_UINT16,
47 np.uint32: ds_pb.DType.DTYPE_UINT32,
48 np.uint64: ds_pb.DType.DTYPE_UINT64,
49 np.float16: ds_pb.DType.DTYPE_FLOAT16,
50 np.float32: ds_pb.DType.DTYPE_FLOAT32,
51 np.float64: ds_pb.DType.DTYPE_FLOAT64,
52 np.bool_: ds_pb.DType.DTYPE_BOOL,
53 np.string_: ds_pb.DType.DTYPE_BYTES,
54 np.str_: ds_pb.DType.DTYPE_BYTES,
55 np.bytes_: ds_pb.DType.DTYPE_BYTES,
56 np.object_: ds_pb.DType.DTYPE_BYTES,
59 NP_SUPPORTED_INT_DTYPE = [
60 np.int8,
61 np.int16,
(...)
67 np.uint64,
70 NP_SUPPORTED_FLOAT_DTYPE = [
71 np.float16,
72 np.float32,
73 np.float64,
AttributeError: module 'yggdrasil_decision_forests.dataset.data_spec_pb2' has no attribute 'DType'
However, when I execute it in a Google Colab notebook, it didn’t result in an error.
Hi Roger,
Thanks for the alert. There is a conflict between the dependency of TF-DF and YDF.
Installing YDF first and TF-DF second will result in an error, while installing TF-DF first will work.
I’ll release a fix.
In the meantime, you can solve the problem by uninstalling both, and then installing TF-DF first, and YDF after.
pip uninstall tensorflow_decision_forests
pip uninstall ydf
pip install tensorflow_decision_forests
pip install ydf
Or by forcing the reinstallation of YDF:
pip install ydf --force-reinstall
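Before and after reinstalling, it can help to confirm which versions are actually present in the environment. A small sketch using the standard library's importlib.metadata; the distribution names are the ones used in the pip commands above, and the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["tensorflow_decision_forests", "ydf"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in the same kernel as the notebook (rather than in a separate shell) checks the environment that the failing import actually uses.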
There may still be an issue with conflicts between TF-DF and YDF. When I execute:
df_model.to_tensorflow_saved_model(path="multi/ranking/1/", mode="tf")
in ydf 0.4.1, it outputs this error:
TypeError Traceback (most recent call last)
Cell In[55], line 4
1 # Export the model to the TensorFlow SavedModel format.
2 # The model can be executed with Servomatic, TensorFlow Serving and
3 # Vertex AI.
----> 4 df_model.to_tensorflow_saved_model(path="multi/ranking/1/", mode="tf")
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/generic_model.py:729, in GenericModel.to_tensorflow_saved_model(self, path, input_model_signature_fn, mode, feature_dtypes, servo_api, feed_example_proto, pre_processing, post_processing, temp_dir)
602 def to_tensorflow_saved_model( # pylint: disable=dangerous-default-value
603 self,
604 path: str,
(...)
613 temp_dir: Optional[str] = None,
614 ) -> None:
615 """Exports the model as a TensorFlow Saved model.
617 This function requires TensorFlow and TensorFlow Decision Forests to be
(...)
726 (default), uses `tempfile.mkdtemp` default temporary directory.
727 """
--> 729 export_tf.ydf_model_to_tensorflow_saved_model(
730 ydf_model=self,
731 path=path,
732 input_model_signature_fn=input_model_signature_fn,
733 mode=mode,
734 feature_dtypes=feature_dtypes,
735 servo_api=servo_api,
736 feed_example_proto=feed_example_proto,
737 pre_processing=pre_processing,
738 post_processing=post_processing,
739 temp_dir=temp_dir,
740 )
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/export_tf.py:141, in ydf_model_to_tensorflow_saved_model(ydf_model, path, input_model_signature_fn, mode, feature_dtypes, servo_api, feed_example_proto, pre_processing, post_processing, temp_dir)
137 if input_model_signature_fn is not None:
138 raise ValueError(
139 "input_model_signature_fn is not supported for `tf` mode."
140 )
--> 141 ydf_model_to_tensorflow_saved_model_tf_mode(
142 ydf_model=ydf_model,
143 path=path,
144 feature_dtypes=feature_dtypes,
145 servo_api=servo_api,
146 feed_example_proto=feed_example_proto,
147 pre_processing=pre_processing,
148 post_processing=post_processing,
149 temp_dir=temp_dir,
150 )
151 else:
152 raise ValueError(f"Invalid mode: {mode}")
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/export_tf.py:194, in ydf_model_to_tensorflow_saved_model_tf_mode(ydf_model, path, feature_dtypes, servo_api, feed_example_proto, pre_processing, post_processing, temp_dir)
190 # The temporary files should remain available until the call to
191 # "tf.saved_model.save"
192 with tempfile.TemporaryDirectory(dir=temp_dir) as effective_temp_dir:
--> 194 tf_module = ydf_model.to_tensorflow_function(
195 temp_dir=effective_temp_dir,
196 squeeze_binary_classification=not servo_api,
197 )
199 # Store pre / post processing operations
200 # Note: Storing the raw variable allows for pre/post-processing to be
201 # TensorFlow modules with resources.
202 tf_module.raw_pre_processing = pre_processing
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/generic_model.py:808, in GenericModel.to_tensorflow_function(self, temp_dir, can_be_saved, squeeze_binary_classification)
742 def to_tensorflow_function( # pytype: disable=name-error
743 self,
744 temp_dir: Optional[str] = None,
745 can_be_saved: bool = True,
746 squeeze_binary_classification: bool = True,
747 ) -> "tensorflow.Module":
748 """Converts the YDF model into a @tf.function callable TensorFlow Module.
750 The output module can be composed with other TensorFlow operations,
(...)
805 A TensorFlow @tf.function.
806 """
--> 808 return export_tf.ydf_model_to_tf_function(
809 ydf_model=self,
810 temp_dir=temp_dir,
811 can_be_saved=can_be_saved,
812 squeeze_binary_classification=squeeze_binary_classification,
813 )
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/export_tf.py:310, in ydf_model_to_tf_function(ydf_model, temp_dir, can_be_saved, squeeze_binary_classification)
304 """Converts a YDF model to a TensorFlow function.
306 See GenericModel.to_tensorflow_function for the documentation.
307 """
309 tf = import_tensorflow()
--> 310 tfdf = import_tensorflow_decision_forests()
311 tf_op = tfdf.keras.core.tf_op
313 # Using prefixes ensure multiple models can be combined in a single
314 # SavedModel.
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/model/export_tf.py:381, in import_tensorflow_decision_forests()
379 """Imports the tensorflow decision forests module."""
380 try:
--> 381 import tensorflow_decision_forests as tfdf # pylint: disable=g-import-not-at-top,import-outside-toplevel # pytype: disable=import-error
383 return tfdf
384 except ImportError as exc:
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/__init__.py:64
60 from tensorflow_decision_forests.tensorflow import check_version
62 check_version.check_version(__version__, compatible_tf_versions)
---> 64 from tensorflow_decision_forests import keras
65 from tensorflow_decision_forests.component import py_tree
66 from tensorflow_decision_forests.component.builder import builder
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/keras/__init__.py:53
15 """Decision Forest in a Keras Model.
17 Usage example:
(...)
48 ```
49 """
51 from typing import Callable, List
---> 53 from tensorflow_decision_forests.keras import core
54 from tensorflow_decision_forests.keras import wrappers
56 # Utility classes
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/keras/core.py:62
60 from tensorflow.python.data.ops import dataset_ops
61 from tensorflow.python.data.ops import load_op
---> 62 from tensorflow_decision_forests.component.inspector import inspector as inspector_lib
63 from tensorflow_decision_forests.component.tuner import tuner as tuner_lib
64 from tensorflow_decision_forests.keras import core_inference
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/component/inspector/inspector.py:64
61 import six
62 import tensorflow as tf
---> 64 from tensorflow_decision_forests.component import py_tree
65 from tensorflow_decision_forests.component.inspector import blob_sequence
66 from yggdrasil_decision_forests.dataset import data_spec_pb2
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/component/py_tree/__init__.py:20
3 # Licensed under the Apache License, Version 2.0 (the "License");
(...)
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
15 """Decision trees stored as python objects.
17 To be used with the model inspector and model builder.
18 """
---> 20 from tensorflow_decision_forests.component.py_tree import condition
21 from tensorflow_decision_forests.component.py_tree import dataspec
22 from tensorflow_decision_forests.component.py_tree import node
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/component/py_tree/condition.py:26
22 from typing import List, Union, Optional
24 import six
---> 26 from tensorflow_decision_forests.component.py_tree import dataspec as dataspec_lib
27 from yggdrasil_decision_forests.dataset import data_spec_pb2
28 from yggdrasil_decision_forests.model.decision_tree import decision_tree_pb2
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/tensorflow_decision_forests/component/py_tree/dataspec.py:24
21 import math
22 from typing import NamedTuple, Union, Optional, List
---> 24 from yggdrasil_decision_forests.dataset import data_spec_pb2
26 ColumnType = data_spec_pb2.ColumnType
28 # Special value to out of vocabulary items.
File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/yggdrasil_decision_forests/dataset/data_spec_pb2.py:16
9 # @@protoc_insertion_point(imports)
11 _sym_db = _symbol_database.Default()
---> 16 DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n2yggdrasil_decision_forests/dataset/data_spec.proto\x12(yggdrasil_decision_forests.dataset.proto\"\xb9\x01\n\x11\x44\x61taSpecification\x12\x41\n\x07\x63olumns\x18\x01 \x03(\x0b\x32\x30.yggdrasil_decision_forests.dataset.proto.Column\x12\x18\n\x10\x63reated_num_rows\x18\x02 \x01(\x03\x12G\n\nunstackeds\x18\x03 \x03(\x0b\x32\x33.yggdrasil_decision_forests.dataset.proto.Unstacked\"\x95\x05\n\x06\x43olumn\x12K\n\x04type\x18\x01 \x01(\x0e\x32\x34.yggdrasil_decision_forests.dataset.proto.ColumnType:\x07UNKNOWN\x12\x0c\n\x04name\x18\x02 \x01(\t\x12\x1d\n\x0eis_manual_type\x18\x03 \x01(\x08:\x05\x66\x61lse\x12\x46\n\ttokenizer\x18\x04 \x01(\x0b\x32\x33.yggdrasil_decision_forests.dataset.proto.Tokenizer\x12J\n\tnumerical\x18\x05 \x01(\x0b\x32\x37.yggdrasil_decision_forests.dataset.proto.NumericalSpec\x12N\n\x0b\x63\x61tegorical\x18\x06 \x01(\x0b\x32\x39.yggdrasil_decision_forests.dataset.proto.CategoricalSpec\x12\x14\n\tcount_nas\x18\x07 \x01(\x03:\x01\x30\x12\x61\n\x15\x64iscretized_numerical\x18\x08 \x01(\x0b\x32\x42.yggdrasil_decision_forests.dataset.proto.DiscretizedNumericalSpec\x12\x46\n\x07\x62oolean\x18\t \x01(\x0b\x32\x35.yggdrasil_decision_forests.dataset.proto.BooleanSpec\x12O\n\x0cmulti_values\x18\n \x01(\x0b\x32\x39.yggdrasil_decision_forests.dataset.proto.MultiValuesSpec\x12\x1b\n\x0cis_unstacked\x18\x0b \x01(\x08:\x05\x66\x61lse\"\xdf\x03\n\x0f\x43\x61tegoricalSpec\x12\x1b\n\x13most_frequent_value\x18\x01 \x01(\x03\x12\x1f\n\x17number_of_unique_values\x18\x02 \x01(\x03\x12\x1a\n\x0fmin_value_count\x18\x03 \x01(\x05:\x01\x35\x12)\n\x1bmax_number_of_unique_values\x18\x04 \x01(\x05:\x04\x32\x30\x30\x30\x12\x1e\n\x16is_already_integerized\x18\x05 \x01(\x08\x12S\n\x05items\x18\x07 \x03(\x0b\x32\x44.yggdrasil_decision_forests.dataset.proto.CategoricalSpec.ItemsEntry\x12\x32\n#offset_value_by_one_during_training\x18\x08 \x01(\x08:\x05\x66\x61lse\x1ar\n\nItemsEntry\x12\x0b\n\x03key\x18\x01 
\x01(\t\x12S\n\x05value\x18\x02 \x01(\x0b\x32\x44.yggdrasil_decision_forests.dataset.proto.CategoricalSpec.VocabValue:\x02\x38\x01\x1a*\n\nVocabValue\x12\r\n\x05index\x18\x01 \x01(\x03\x12\r\n\x05\x63ount\x18\x02 \x01(\x03\"b\n\rNumericalSpec\x12\x0f\n\x04mean\x18\x01 \x01(\x01:\x01\x30\x12\x11\n\tmin_value\x18\x02 \x01(\x02\x12\x11\n\tmax_value\x18\x03 \x01(\x02\x12\x1a\n\x12standard_deviation\x18\x04 \x01(\x01\"G\n\x0fMultiValuesSpec\x12\x19\n\x11max_observed_size\x18\x01 \x01(\x05\x12\x19\n\x11min_observed_size\x18\x02 \x01(\x05\"6\n\x0b\x42ooleanSpec\x12\x12\n\ncount_true\x18\x01 \x01(\x03\x12\x13\n\x0b\x63ount_false\x18\x02 \x01(\x03\"\x91\x01\n\x18\x44iscretizedNumericalSpec\x12\x16\n\nboundaries\x18\x01 \x03(\x02\x42\x02\x10\x01\x12\"\n\x1aoriginal_num_unique_values\x18\x02 \x01(\x03\x12\x1d\n\x10maximum_num_bins\x18\x03 \x01(\x03:\x03\x32\x35\x35\x12\x1a\n\x0fmin_obs_in_bins\x18\x04 \x01(\x05:\x01\x33\"\xb2\x03\n\tTokenizer\x12Y\n\x08splitter\x18\x01 \x01(\x0e\x32<.yggdrasil_decision_forests.dataset.proto.Tokenizer.Splitter:\tSEPARATOR\x12\x16\n\tseparator\x18\x02 \x01(\t:\x03 ;,\x12\x16\n\x05regex\x18\x03 \x01(\t:\x07([\\S]+)\x12\x1b\n\rto_lower_case\x18\x04 \x01(\x08:\x04true\x12N\n\x08grouping\x18\x05 \x01(\x0b\x32<.yggdrasil_decision_forests.dataset.proto.Tokenizer.Grouping\x1aS\n\x08Grouping\x12\x16\n\x08unigrams\x18\x01 \x01(\x08:\x04true\x12\x16\n\x07\x62igrams\x18\x02 \x01(\x08:\x05\x66\x61lse\x12\x17\n\x08trigrams\x18\x03 \x01(\x08:\x05\x66\x61lse\"X\n\x08Splitter\x12\x0b\n\x07INVALID\x10\x00\x12\r\n\tSEPARATOR\x10\x01\x12\x0f\n\x0bREGEX_MATCH\x10\x02\x12\r\n\tCHARACTER\x10\x03\x12\x10\n\x0cNO_SPLITTING\x10\x04\"\x97\x01\n\tUnstacked\x12\x15\n\roriginal_name\x18\x01 \x01(\t\x12\x18\n\x10\x62\x65gin_column_idx\x18\x02 \x01(\x05\x12\x0c\n\x04size\x18\x03 \x01(\x05\x12K\n\x04type\x18\x04 \x01(\x0e\x32\x34.yggdrasil_decision_forests.dataset.proto.ColumnType:\x07UNKNOWN\"\xde\x04\n\x16\x44\x61taSpecificationGuide\x12L\n\rcolumn_guides\x18\x01 
\x03(\x0b\x32\x35.yggdrasil_decision_forests.dataset.proto.ColumnGuide\x12S\n\x14\x64\x65\x66\x61ult_column_guide\x18\x02 \x01(\x0b\x32\x35.yggdrasil_decision_forests.dataset.proto.ColumnGuide\x12,\n\x1dignore_columns_without_guides\x18\x03 \x01(\x08:\x05\x66\x61lse\x12\x30\n\"max_num_scanned_rows_to_guess_type\x18\x04 \x01(\x03:\x04\x31\x30\x30\x30\x12*\n\x1b\x64\x65tect_boolean_as_numerical\x18\x05 \x01(\x08:\x05\x66\x61lse\x12\x38\n)detect_numerical_as_discretized_numerical\x18\x06 \x01(\x08:\x05\x66\x61lse\x12\x39\n-max_num_scanned_rows_to_accumulate_statistics\x18\x07 \x01(\x03:\x02-1\x12\x31\n#unstack_numerical_set_as_numericals\x18\x08 \x01(\x08:\x04true\x12*\n\x1bignore_unknown_type_columns\x18\t \x01(\x08:\x05\x66\x61lse\x12\x41\n3allow_tokenization_for_inference_as_categorical_set\x18\n \x01(\x08:\x04true\"\xfc\x03\n\x0b\x43olumnGuide\x12\x1b\n\x13\x63olumn_name_pattern\x18\x01 \x01(\t\x12\x42\n\x04type\x18\x02 \x01(\x0e\x32\x34.yggdrasil_decision_forests.dataset.proto.ColumnType\x12N\n\ncategorial\x18\x03 \x01(\x0b\x32:.yggdrasil_decision_forests.dataset.proto.CategoricalGuide\x12K\n\tnumerical\x18\x04 \x01(\x0b\x32\x38.yggdrasil_decision_forests.dataset.proto.NumericalGuide\x12K\n\ttokenizer\x18\x05 \x01(\x0b\x32\x38.yggdrasil_decision_forests.dataset.proto.TokenizerGuide\x12 \n\x11\x61llow_multi_match\x18\x06 \x01(\x08:\x05\x66\x61lse\x12\x62\n\x15\x64iscretized_numerical\x18\x07 \x01(\x0b\x32\x43.yggdrasil_decision_forests.dataset.proto.DiscretizedNumericalGuide\x12\x1c\n\rignore_column\x18\x08 \x01(\x08:\x05\x66\x61lse\"\xc8\x02\n\x10\x43\x61tegoricalGuide\x12\x1e\n\x13min_vocab_frequency\x18\x01 \x01(\x05:\x01\x35\x12\x1d\n\x0fmax_vocab_count\x18\x02 \x01(\x05:\x04\x32\x30\x30\x30\x12\x1e\n\x16is_already_integerized\x18\x03 \x01(\x08\x12,\n$number_of_already_integerized_values\x18\x04 \x01(\x03\x12x\n\x1boverride_most_frequent_item\x18\x05 
\x01(\x0b\x32S.yggdrasil_decision_forests.dataset.proto.CategoricalGuide.OverrideMostFrequentItem\x1a-\n\x18OverrideMostFrequentItem\x12\x11\n\tstr_value\x18\x05 \x01(\t\"\x10\n\x0eNumericalGuide\"X\n\x0eTokenizerGuide\x12\x46\n\ttokenizer\x18\x01 \x01(\x0b\x32\x33.yggdrasil_decision_forests.dataset.proto.Tokenizer\"V\n\x19\x44iscretizedNumericalGuide\x12\x1d\n\x10maximum_num_bins\x18\x01 \x01(\x03:\x03\x32\x35\x35\x12\x1a\n\x0fmin_obs_in_bins\x18\x02 \x01(\x05:\x01\x33\"\xe1\x03\n\x1c\x44\x61taSpecificationAccumulator\x12^\n\x07\x63olumns\x18\x01 \x03(\x0b\x32M.yggdrasil_decision_forests.dataset.proto.DataSpecificationAccumulator.Column\x1a\xe0\x02\n\x06\x43olumn\x12\x11\n\tkahan_sum\x18\x01 \x01(\x01\x12\x17\n\x0fkahan_sum_error\x18\x02 \x01(\x01\x12\x11\n\tmin_value\x18\x03 \x01(\x01\x12\x11\n\tmax_value\x18\x04 \x01(\x01\x12\x1b\n\x13kahan_sum_of_square\x18\x06 \x01(\x01\x12!\n\x19kahan_sum_of_square_error\x18\x07 \x01(\x01\x12\x86\x01\n\x15\x64iscretized_numerical\x18\x05 \x03(\x0b\x32g.yggdrasil_decision_forests.dataset.proto.DataSpecificationAccumulator.Column.DiscretizedNumericalEntry\x1a;\n\x19\x44iscretizedNumericalEntry\x12\x0b\n\x03key\x18\x01 \x01(\x07\x12\r\n\x05value\x18\x02 \x01(\x05:\x02\x38\x01*\xc9\x01\n\nColumnType\x12\x0b\n\x07UNKNOWN\x10\x00\x12\r\n\tNUMERICAL\x10\x01\x12\x11\n\rNUMERICAL_SET\x10\x02\x12\x12\n\x0eNUMERICAL_LIST\x10\x03\x12\x0f\n\x0b\x43\x41TEGORICAL\x10\x04\x12\x13\n\x0f\x43\x41TEGORICAL_SET\x10\x05\x12\x14\n\x10\x43\x41TEGORICAL_LIST\x10\x06\x12\x0b\n\x07\x42OOLEAN\x10\x07\x12\n\n\x06STRING\x10\x08\x12\x19\n\x15\x44ISCRETIZED_NUMERICAL\x10\t\x12\x08\n\x04HASH\x10\n')
18 _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
19 _builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'yggdrasil_decision_forests.dataset.data_spec_pb2', globals())
TypeError: Couldn't build proto file into descriptor pool: duplicate file name yggdrasil_decision_forests/dataset/data_spec.proto