Epoch 1/2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-15a576831687> in <module>()
8 model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd")
----> 9 model.fit(X_train, y_train, epochs=2)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1127 except Exception as e: # pylint:disable=broad-except
1128 if hasattr(e, "ag_error_metadata"):
-> 1129 raise e.ag_error_metadata.to_exception(e)
1130 else:
1131 raise
TypeError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 878, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 867, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 860, in run_step **
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 808, in train_step
y_pred = self(x, training=True)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
TypeError: Exception encountered when calling layer "batch_normalization" (type BatchNormalization).
Input 'y' of 'AddV2' Op has type float32 that does not match type uint8 of argument 'x'.
Call arguments received:
• inputs=tf.Tensor(shape=(32, 784), dtype=uint8)
• training=True
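For reference, a minimal snippet along these lines reproduces it (the data and model here are a hypothetical reconstruction based on the truncated traceback):

```python
import numpy as np
from tensorflow import keras

# Stand-in for the uint8 image data (e.g. raw 28x28 digits):
X_train = np.random.randint(0, 256, size=(64, 28, 28), dtype=np.uint8)
y_train = np.random.randint(0, 10, size=(64,))

model = keras.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),  # Flatten passes uint8 through
    keras.layers.BatchNormalization(),           # ...which BatchNormalization rejects
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd")
model.fit(X_train, y_train, epochs=2)  # raises the TypeError above
```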
Initial notes from triage:
We don't think we typically cast automatically for users, but perhaps this is a mistake.
@fchollet any thoughts on this?
Hi @LukeWood,
Thanks for your feedback. I ran some tests by feeding uint8 inputs to almost every type of layer, and here are the results:
The following layers do not complain about the uint8 inputs, and they return float32 outputs:
Average
Dense
Embedding
LeakyReLU
Normalization
RandomHeight
RandomWidth
Rescaling
Resizing
This one returns int64 outputs:
Hashing
These layers return uint8 outputs:
Activation
AlphaDropout
CenterCrop
Concatenate
Cropping1D
Cropping2D
Cropping3D
Dropout
Flatten
GaussianDropout
GaussianNoise
GlobalAveragePooling1D
GlobalAveragePooling2D
GlobalAveragePooling3D
GlobalMaxPooling1D
GlobalMaxPooling2D
GlobalMaxPooling3D
Lambda
Maximum
MaxPooling1D
MaxPooling2D
Minimum
Multiply
Permute
RandomContrast
RandomCrop
RandomFlip
RandomRotation
RandomTranslation
RandomZoom
RepeatVector
Reshape
SpatialDropout1D
SpatialDropout2D
SpatialDropout3D
Subtract
ThresholdedReLU
TimeDistributed
UpSampling1D
UpSampling2D
UpSampling3D
ZeroPadding1D
ZeroPadding2D
ZeroPadding3D
These layers reject the uint8 input and raise an exception:
AdditiveAttention
Attention
AveragePooling1D
AveragePooling2D
AveragePooling3D
BatchNormalization
Conv1D
Conv1DTranspose
Conv2D
Conv2DTranspose
Conv3D
Conv3DTranspose
ConvLSTM1D
ConvLSTM2D
ConvLSTM3D
DepthwiseConv1D
DepthwiseConv2D
Discretization
LayerNormalization
LocallyConnected1D
LocallyConnected2D
LSTMCell
MaxPooling3D
PReLU
SeparableConv1D
SeparableConv2D
SimpleRNNCell
Softmax
I did not test the following layers:
ActivityRegularization
CategoryEncoding
DenseFeatures
GRUCell
IntegerLookup
Masking
MultiHeadAttention
StackedRNNCells
StringLookup
TextVectorization
So it looks like not casting is indeed the default, with some important exceptions, including Average, Dense, LeakyReLU, Normalization, RandomHeight, RandomWidth, Rescaling, and Resizing.
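A probe along these lines (an abbreviated sketch, not my exact script) shows the three behaviors:

```python
import tensorflow as tf
from tensorflow import keras

x = tf.zeros((4, 8), dtype=tf.uint8)  # dummy uint8 batch
for layer in [keras.layers.Dense(3),               # casts: returns float32
              keras.layers.Flatten(),              # passes through: returns uint8
              keras.layers.BatchNormalization()]:  # rejects: raises TypeError
    try:
        print(type(layer).__name__, "->", layer(x).dtype.name)
    except TypeError:
        print(type(layer).__name__, "-> raises TypeError")
```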
So perhaps the decision should be made on a case-by-case basis. Regarding BatchNormalization, it would be nice to be able to use it as the first layer, with uint8 images as input.
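Something like this is what I have in mind (a sketch; the workaround relies on the test results above):

```python
from tensorflow import keras

# What I'd like to write (fails today, per the traceback above):
#   model = keras.Sequential([
#       keras.layers.BatchNormalization(input_shape=[28, 28]),
#       ...
#   ])

# Workaround for now: Rescaling accepts uint8 and returns float32
# (see the test results above), so it can sit in front.
model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 255, input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
```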
Wdyt?
Thanks for the analysis. I think casting the int input to floats makes sense for the BN layer.
For those layers that raise an error when the input is int, I think the majority of them should just cast the input to backend.floatx() (unless we have a good reason to raise an error about the invalid input type).
Feel free to send a PR for this issue if you would like to contribute, and we can apply it to all the applicable layers.
The BatchNormalization layer should automatically cast integer inputs to floats.
The way casting currently works in Keras layers is that each layer has a "dtype policy", which contains a "variable dtype" and a "compute dtype". By default both are equal to float32, but they can have different values (e.g. in mixed precision you'd use a policy with a float32 variable dtype and a float16 compute dtype).
All layers will cast their inputs to their compute dtype. BUT this only happens for floating-point inputs (e.g. casting float64 to float32). In your case, no casting happens because the input is an integer type.
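To illustrate (a quick sketch using standard TF 2.x APIs; the printed values are what I'd expect):

```python
import tensorflow as tf
from tensorflow import keras

layer = keras.layers.Dense(4)
print(layer.variable_dtype, layer.compute_dtype)  # float32 float32

# Under mixed precision the two dtypes differ:
keras.mixed_precision.set_global_policy("mixed_float16")
mp_layer = keras.layers.Dense(4)
print(mp_layer.variable_dtype, mp_layer.compute_dtype)  # float32 float16

# Floating-point inputs are autocast to the compute dtype...
print(mp_layer(tf.zeros((2, 4), dtype=tf.float64)).dtype)  # float16

# ...but integer inputs are passed through uncast, which is why the
# uint8 tensor reached BatchNormalization unchanged.
keras.mixed_precision.set_global_policy("float32")  # restore the default
```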
We probably have two options here:
1. Extend the rule above to cast all numerical dtypes to the compute dtype. This may well be OK, I think?
2. Do the casting by hand on a case-by-case basis.
I'm trying to think if there are cases where 1) would be obviously incorrect. Maybe image preprocessing layers? But even then, the rule "cast to compute dtype" is simple and consistent. @mattdangerw I remember we looked at this in the context of KPL; do you remember what our conclusion was?
I'd favor doing 1) (at the level of the base layer) unless we find a significant reason why this would be incorrect. However, backwards compatibility constraints might prevent us from doing so for layers where uint8 is currently accepted and returns uint8 outputs.
Casting all numerical types to compute dtype would be incorrect for layers that need to handle categorical/discrete integer inputs. Layers that would fall into that camp are Embedding, IntegerLookup, Hashing, CategoryEncoding, among others.
We do have an option to disable the casting entirely (kwargs['autocast'] = False). Embedding is a good example: it turns off the autocast option and casts the outputs, rather than the inputs, to the global compute dtype. That's the general pattern I have been trying to follow in preprocessing layers: cast to the compute dtype as soon as possible, which is often after some initial categorical computations. For image preprocessing, I actually think casting early is generally OK.
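As a sketch of that pattern (a hypothetical layer, not the real Embedding implementation):

```python
import tensorflow as tf
from tensorflow import keras

class MyLookup(keras.layers.Layer):
    """Hypothetical layer following the pattern above."""

    def __init__(self, vocab_size, output_dim, **kwargs):
        kwargs["autocast"] = False  # leave my integer inputs alone
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.output_dim = output_dim

    def build(self, input_shape):
        self.table = self.add_weight(
            name="table", shape=(self.vocab_size, self.output_dim))

    def call(self, inputs):
        outputs = tf.gather(self.table, inputs)      # categorical work on ints
        return tf.cast(outputs, self.compute_dtype)  # cast outputs, not inputs
```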
The place I would most worry about 1) breaking is custom embedding layers. I think most of the model garden NLP models would fall into this camp. They use custom embedding layers that do not set the autocast option, but would fail on non-integer inputs.
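Concretely, the failure mode would look something like this (hypothetical layer):

```python
import tensorflow as tf
from tensorflow import keras

class NaiveEmbedding(keras.layers.Layer):
    """Hypothetical custom embedding that never sets autocast=False."""

    def build(self, input_shape):
        self.table = self.add_weight(name="table", shape=(100, 16))

    def call(self, inputs):
        return tf.gather(self.table, inputs)  # requires integer indices

ids = tf.constant([[3, 7, 42]])     # int32 token ids
print(NaiveEmbedding()(ids).shape)  # works today: ints pass through uncast
# Under 1), ids would be autocast to float32 before call(), and
# tf.gather would then raise on the float indices.
```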
Possibly 2) might be the more practical option. There are probably only so many of our layers that will actually be used at the top of a model. We could explicitly cast ints to compute dtype for the layers that only operate on floats.
If we go with 1), we need to make sure we leave an option to turn it off (autocast), and make sure that is well known and documented for users. I worry it could be frustrating for a developer who has a legitimate need for integer inputs (there are many) and who can't figure out why inputs are magically changing under the hood.
Thanks for the detailed explainer, Matt!
> Possibly 2) might be the more practical option. There are probably only so many of our layers that will actually be used at the top of a model. We could explicitly cast ints to compute dtype for the layers that only operate on floats.
Doing 2) is correct in any case and does not preclude doing 1) in the future, so I would recommend doing 2) right now (at least for BatchNorm and possibly a few more layers), and opening a ticket for future investigation of a more generalized behavior.
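Concretely, 2) for BatchNorm would amount to something like the sketch below (illustrative, not the actual patch):

```python
import tensorflow as tf
from tensorflow import keras

class CastingBatchNorm(keras.layers.BatchNormalization):
    """Sketch of option 2): cast non-float inputs to the compute dtype
    before running the usual BatchNormalization math."""

    def call(self, inputs, training=None):
        if not inputs.dtype.is_floating:
            inputs = tf.cast(inputs, self.compute_dtype)
        return super().call(inputs, training=training)

x = tf.zeros((32, 784), dtype=tf.uint8)            # same shape/dtype as the repro
print(CastingBatchNorm()(x, training=True).dtype)  # float32
```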