A data type describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.
Flink’s data types are similar to the SQL standard’s data type terminology but also contain information about the nullability of a value for efficient handling of scalar expressions.
A list of all pre-defined data types can be found below.
Data Types in the Table API
Users of the JVM-based API work with instances of org.apache.flink.table.types.DataType within the Table API or when defining connectors, catalogs, or user-defined functions.
A DataType instance has two responsibilities:
Declaration of a logical type which does not imply a concrete physical representation for transmission or storage but defines the boundaries between JVM-based/Python languages and the table ecosystem.
Optional: Giving hints about the physical representation of data to the planner which is useful at the edges to other APIs.
For JVM-based languages, all pre-defined data types are available in org.apache.flink.table.api.DataTypes.
Users of the Python API work with instances of pyflink.table.types.DataType within the Python Table API or when defining Python user-defined functions.
A DataType instance has one responsibility:
Declaration of a logical type which does not imply a concrete physical representation for transmission or storage but defines the boundaries between Python and the table ecosystem.
For Python, those types are available in pyflink.table.types.DataTypes.
It is recommended to import DataTypes into your table programs (as a star import in JVM languages) for a more fluent API:
from pyflink.table.types import DataTypes

t = DataTypes.INTERVAL(DataTypes.DAY(), DataTypes.SECOND(3))
Data Type and Python Type Mapping
For Python user-defined functions, the inputs will be converted to Python objects corresponding to the data type and the type of the user-defined function result must also match the defined data type.
For vectorized Python UDFs, the input types and output type are pandas.Series. The element type of the pandas.Series corresponds to the specified data type.
Data Type
Python Type
Pandas Type
Physical hints are required at the edges of the table ecosystem where the SQL-based type system ends and
programming-specific data types are required. Hints indicate the data format that an implementation
expects.
For example, a data source could express that it produces values for logical TIMESTAMPs using a java.sql.Timestamp class
instead of using java.time.LocalDateTime which would be the default. With this information, the runtime is able to convert
the produced class into its internal data format. In return, a data sink can declare the data format it consumes from the runtime.
Here are some examples of how to declare a bridging conversion class:
// tell the runtime to not produce or consume java.time.LocalDateTime instances
// but java.sql.Timestamp
DataType t = DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class);
// tell the runtime to not produce or consume boxed integer arrays
// but primitive int arrays
DataType t = DataTypes.ARRAY(DataTypes.INT().notNull()).bridgedTo(int[].class);

// tell the runtime to not produce or consume java.time.LocalDateTime instances
// but java.sql.Timestamp
val t: DataType = DataTypes.TIMESTAMP(3).bridgedTo(classOf[java.sql.Timestamp])
// tell the runtime to not produce or consume boxed integer arrays
// but primitive int arrays
val t: DataType = DataTypes.ARRAY(DataTypes.INT().notNull()).bridgedTo(classOf[Array[Int]])
Attention: Physical hints are usually only required if the API is extended. Users of predefined sources/sinks/functions do not need to define such hints. Hints within a table program (e.g. field.cast(TIMESTAMP(3).bridgedTo(Timestamp.class))) are ignored.
List of Data Types
This section lists all pre-defined data types.
For the JVM-based Table API those types are also available in org.apache.flink.table.api.DataTypes.
For the Python Table API, those types are available in pyflink.table.types.DataTypes.
The default planner supports the following set of SQL types:
Data Type
Remarks for Data Type
CHAR
The type can be declared using CHAR(n) where n is the number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.
VARCHAR / STRING
Data type of a variable-length character string.
Declaration
DataTypes.STRING()
Attention: The specified maximum number of code points n in DataTypes.VARCHAR(n) currently must be 2,147,483,647.
The type can be declared using VARCHAR(n) where n is the maximum number of code points. n must have a value
between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.
STRING is a synonym for VARCHAR(2147483647).
Binary Strings
BINARY
Data type of a fixed-length binary string (=a sequence of bytes).
Declaration
The type can be declared using BINARY(n) where n is the number of bytes. n must have a value
between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.
VARBINARY / BYTES
Data type of a variable-length binary string (=a sequence of bytes).
Declaration
DataTypes.BYTES()
Attention: The specified maximum number of bytes n in DataTypes.VARBINARY(n) currently must be 2,147,483,647.
The type can be declared using VARBINARY(n) where n is the maximum number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.
BYTES is a synonym for VARBINARY(2147483647).
Exact Numerics
DECIMAL
Data type of a decimal number with fixed precision and scale.
Declaration
DataTypes.DECIMAL(p, s)
Attention: The precision and scale specified in DataTypes.DECIMAL(p, s) currently must be 38 and 18, respectively.
The type can be declared using DECIMAL(p, s) where p is the number of digits in a number (precision) and s is the number of digits to the right of the decimal point in a number (scale). p must have a value between 1 and 38 (both inclusive). s must have a value between 0 and p (both inclusive). The default value for p is 10. The default value for s is 0.
Note: The definitions of precision and scale are inconsistent between the SQL standard and Java’s BigDecimal. For example, the exact value 0.011 is treated as DECIMAL(4, 3) in SQL, whereas its BigDecimal representation has a precision of 2 and a scale of 3.
NUMERIC(p, s) and DEC(p, s) are synonyms for this type.
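The difference described in the note can be checked with plain arithmetic. The sketch below uses Python's standard decimal module as a stand-in for java.math.BigDecimal, on the assumption that both count only the digits of the unscaled value. The helper names unscaled_view and sql_view are hypothetical and not part of any Flink API, and the SQL-side rule (counting the digit in front of the decimal point) is inferred from the 0.011 example only. Values with a negative scale are out of scope.

```python
from decimal import Decimal
from typing import Tuple


def unscaled_view(value: Decimal) -> Tuple[int, int]:
    """(precision, scale) counted on the unscaled value, BigDecimal-style."""
    _sign, digits, exponent = value.as_tuple()
    # 0.011 is stored as the unscaled value 11 with exponent -3.
    return len(digits), -exponent


def sql_view(value: Decimal) -> Tuple[int, int]:
    """(precision, scale) when the digit before the decimal point is counted too."""
    _precision, scale = unscaled_view(value)
    integer_digits = len(str(abs(int(value))))  # int(0.011) == 0, which is one digit
    return integer_digits + scale, scale


value = Decimal("0.011")
print(unscaled_view(value))  # (2, 3)
print(sql_view(value))       # (4, 3)
```

The two views agree on the scale and differ only in whether the leading zero contributes to the precision.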
TINYINT
Data type of a 1-byte signed integer with values from -128 to 127.
BIGINT
Data type of an 8-byte signed integer with values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
FLOAT
Data type of a 4-byte single precision floating point number.
Compared to the SQL standard, the type does not take parameters.
DATE
Data type of a date consisting of year-month-day with values ranging from 0000-01-01 to 9999-12-31.
Compared to the SQL standard, the range starts at year 0000.
TIME
Data type of a time without time zone consisting of hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 00:00:00.000000000 to 23:59:59.999999999.
Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.LocalTime. A time with time zone is not provided.
Declaration
Describes the number of milliseconds of the day.
Output only if type is not nullable.
java.lang.Long
Describes the number of nanoseconds of the day.
Describes the number of nanoseconds of the day.
Output only if type is not nullable.
DataTypes.TIME(p)
Attention: The precision specified in DataTypes.TIME(p) currently must be 0.
The type can be declared using TIME(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 0.
TIMESTAMP
Data type of a timestamp without time zone consisting of year-month-day hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999.
Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.LocalDateTime.
A conversion from and to BIGINT (a JVM long type) is not supported as this would imply a time zone. However, this type is time zone free. For more java.time.Instant-like semantics use TIMESTAMP_LTZ.
Declaration
DataTypes.TIMESTAMP(p)
Attention: The precision specified in DataTypes.TIMESTAMP(p) currently must be 3.
The type can be declared using TIMESTAMP(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.
TIMESTAMP(p) WITHOUT TIME ZONE is a synonym for this type.
TIMESTAMP WITH TIME ZONE
Data type of a timestamp with time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.
Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.OffsetDateTime.
Compared to TIMESTAMP_LTZ, the time zone offset information is physically stored in every datum. It is used individually for every computation, visualization, or communication to external systems.
Declaration
The type can be declared using TIMESTAMP(p) WITH TIME ZONE where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.
TIMESTAMP_LTZ
Data type of a timestamp with local time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.
Leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.OffsetDateTime.
Compared to TIMESTAMP WITH TIME ZONE, the time zone offset information is not stored physically in every datum. Instead, the type assumes java.time.Instant semantics in UTC time zone at the edges of the table ecosystem. Every datum is interpreted in the local time zone configured in the current session for computation and visualization.
This type fills the gap between time zone free and time zone mandatory timestamp types by allowing the interpretation of UTC timestamps according to the configured session time zone.
Java Type
Input
Output
Remarks
Describes the number of milliseconds since epoch.
Output only if type is not nullable.
java.sql.Timestamp
Describes the number of milliseconds since epoch.
org.apache.flink.table.data.TimestampData
Internal data structure.
Attention: The precision specified in DataTypes.TIMESTAMP_LTZ(p) currently must be 3.
The type can be declared using TIMESTAMP_LTZ(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.
TIMESTAMP(p) WITH LOCAL TIME ZONE is a synonym for this type.
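To make the session time zone semantics of TIMESTAMP_LTZ more concrete, the following sketch uses only Python's standard library rather than Flink. It treats a stored value as an instant on the UTC timeline and renders it under two different session time zones. The render helper and the fixed +08:00 offset are illustrative assumptions, not Flink API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render(epoch_millis: int, session_zone: timezone) -> str:
    """Interpret an instant (milliseconds since epoch, UTC) in a session time zone."""
    instant = EPOCH + timedelta(milliseconds=epoch_millis)
    local = instant.astimezone(session_zone)
    # %f prints microseconds; cut to millisecond precision.
    return local.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


utc = timezone.utc
plus_eight = timezone(timedelta(hours=8))

# The same stored instant yields different local representations.
print(render(0, utc))         # 1970-01-01 00:00:00.000
print(render(0, plus_eight))  # 1970-01-01 08:00:00.000
```

Only the rendering changes with the session time zone; the stored instant stays the same, which is the difference to TIMESTAMP WITH TIME ZONE described above.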
INTERVAL YEAR TO MONTH
Data type for a group of year-month interval types.
The type must be parameterized to one of the following resolutions:
interval of years,
interval of years to months,
or interval of months.
An interval of year-month consists of +years-months with values ranging from -9999-11 to +9999-11.
The value representation is the same for all types of resolutions. For example, an interval of months of 50 is always represented in an interval-of-years-to-months format (with default year precision): +04-02.
The type can be declared using the above combinations where p is the number of digits of years
(year precision). p must have a value between 1 and 4 (both inclusive). If no year precision
is specified, p is equal to 2.
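The +04-02 example can be reproduced with a few lines of arithmetic. This is a minimal sketch in plain Python, not Flink code; the function name format_year_month is hypothetical, and zero-padding the years to the year precision is an assumption inferred from that example.

```python
def format_year_month(total_months: int, year_precision: int = 2) -> str:
    """Render a month count in the +years-months representation."""
    sign = "-" if total_months < 0 else "+"
    years, months = divmod(abs(total_months), 12)
    # Years are zero-padded to the year precision, months to two digits.
    return f"{sign}{years:0{year_precision}d}-{months:02d}"


print(format_year_month(50))  # +04-02
```

With a year precision of 4, the largest value 9999 years and 11 months (119,999 months) renders as +9999-11.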
INTERVAL DAY TO SECOND
Data type for a group of day-time interval types.
The type must be parameterized to one of the following resolutions with up to nanosecond precision:
interval of days,
interval of days to hours,
interval of days to minutes,
interval of days to seconds,
interval of hours,
interval of hours to minutes,
interval of hours to seconds,
interval of minutes,
interval of minutes to seconds,
or interval of seconds.
An interval of day-time consists of +days hours:minutes:seconds.fractional with values ranging from
-999999 23:59:59.999999999 to +999999 23:59:59.999999999. The value representation is the same
for all types of resolutions. For example, an interval of seconds of 70 is always represented in
an interval-of-days-to-seconds format (with default precisions): +00 00:01:10.000000.
The type can be declared using the above combinations where p1 is the number of digits of days
(day precision) and p2 is the number of digits of fractional seconds (fractional precision).
p1 must have a value between 1 and 6 (both inclusive). p2 must have a value between 0
and 9 (both inclusive). If no p1 is specified, it is equal to 2 by default. If no p2 is
specified, it is equal to 6 by default.
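The day-time representation can be sketched in the same way. The code below is plain Python rather than Flink code; format_day_time is a hypothetical helper, it assumes a non-negative nanosecond part, and the padding rules are inferred from the +00 00:01:10.000000 example.

```python
def format_day_time(total_seconds: int, nanos: int = 0,
                    day_precision: int = 2, fractional_precision: int = 6) -> str:
    """Render seconds plus nanoseconds as +days hours:minutes:seconds.fractional."""
    sign = "-" if total_seconds < 0 else "+"
    minutes, seconds = divmod(abs(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    text = f"{sign}{days:0{day_precision}d} {hours:02d}:{minutes:02d}:{seconds:02d}"
    if fractional_precision > 0:
        # Keep only the leading digits of the nine-digit nanosecond field.
        text += "." + f"{nanos:09d}"[:fractional_precision]
    return text


print(format_day_time(70))  # +00 00:01:10.000000
```

An interval of 70 seconds therefore carries one minute and ten seconds in the day-to-second layout, regardless of the resolution it was declared with.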
Constructed Data Types
ARRAY
Data type of an array of elements with the same subtype.
Compared to the SQL standard, the maximum cardinality of an array cannot be specified but is
fixed at 2,147,483,647. Also, any valid type is supported as a subtype.
Declaration
The type can be declared using ARRAY<t> where t is the data type of the contained
elements.
t ARRAY is a synonym that is closer to the SQL standard. For example, INT ARRAY is equivalent to ARRAY<INT>.
MAP
Data type of an associative array that maps keys (including NULL) to values (including NULL). A map cannot contain duplicate keys; each key can map to at most one value.
There is no restriction of element types; it is the responsibility of the user to ensure uniqueness.
The map type is an extension to the SQL standard.
Declaration
The type can be declared using MAP<kt, vt> where kt is the data type of the key elements
and vt is the data type of the value elements.
MULTISET
Data type of a multiset (=bag). Unlike a set, it allows for multiple instances for each of its
elements with a common subtype. Each unique value (including NULL) is mapped to some multiplicity.
There is no restriction of element types; it is the responsibility of the user to ensure uniqueness.
Declaration
The type can be declared using MULTISET<t> where t is the data type
of the contained elements.
t MULTISET is a synonym that is closer to the SQL standard. For example, INT MULTISET is equivalent to MULTISET<INT>.
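The mapping from unique values to multiplicities can be pictured with Python's collections.Counter. This is only an analogy for the logical model, not how Flink represents a MULTISET internally; None stands in for NULL.

```python
from collections import Counter

# A multiset maps each distinct element (None stands in for NULL) to its multiplicity.
bag = Counter(["a", "b", "a", None, "a"])

print(bag["a"])           # 3
print(bag[None])          # 1
print(sum(bag.values()))  # 5
```

Unlike a set, the bag keeps track of how often each element occurs, so the total number of elements (5) is larger than the number of distinct values (3).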
ROW
Data type of a sequence of fields.
A field consists of a field name, field type, and an optional description. The most specific type
of a row of a table is a row type. In this case, each column of the row corresponds to the field
of the row type that has the same ordinal position as the column.
Compared to the SQL standard, an optional field description simplifies the handling of complex structures.
A row type is similar to the STRUCT type known from other non-standard-compliant frameworks.
The type can be declared using ROW<n0 t0 'd0', n1 t1 'd1', ...> where n is the unique name of
a field, t is the logical type of a field, d is the description of a field.
ROW(...) is a synonym that is closer to the SQL standard. For example, ROW(myField INT, myOtherField BOOLEAN) is equivalent to ROW<myField INT, myOtherField BOOLEAN>.
User-Defined Data Types
STRUCTURED
Data type for a user-defined object.
Compared to ROW, which may also be considered a “struct-like” type, structured types are distinguishable even if they contain the same set of fields. For example, Visit(amount DOUBLE) is distinct from Interaction(amount DOUBLE) due to its identifier.
Similar to classes in object-oriented programming languages, structured types are identified by a class name and contain
zero, one or more attributes. Each attribute has a name, a type, and an optional description. A type cannot be defined
in such a way that one of its attribute types (transitively) refers to itself.
Structured types are internally converted by the system into suitable data structures. Serialization and equality checks
are managed by the system based on the logical type.
The type can be declared using STRUCTURED<'c', n0 t0 'd0', n1 t1 'd1', ...> where c is the class name, n is the
unique name of a field, t is the logical type of a field, d is the optional description of a field.
Usually structured types are defined inline and can be reflectively extracted from a corresponding implementation class, for example, from the signature of an eval() method for functions. This is useful when programmatically defining a table program. They enable reusing existing JVM classes without manually defining the schema of a data type again.
If the class name matches a class in the classpath, the system will convert a structured object to a JVM object at the edges
of the table ecosystem (e.g. when bridging to a function or connector). The implementation class must provide either a
zero-argument constructor or a full constructor that assigns all attributes.
The class name does not need to be resolvable in the classpath, however; it may be used solely to distinguish between objects with identical attribute sets. In Table API and UDF calls, the system will still attempt to resolve the class name to an actual implementation class. If resolution fails, Row is used as a fallback.
Inline structured types can be created from regular POJOs (Plain Old Java Objects) if the implementation class meets the
following requirements:
The class must be globally accessible which means it must be declared public, static, and not abstract.
The class must offer a default constructor with zero arguments or a full constructor that assigns all
fields.
All fields of the class must be readable by either public declaration or a getter that follows common
coding style such as getField(), isField(), field().
All fields of the class must be writable by either public declaration, fully assigning constructor,
or a setter that follows common coding style such as setField(...), field(...).
All fields must be mapped to a data type either implicitly via reflective extraction or explicitly using the @DataTypeHint annotations.
Fields that are declared static or transient are ignored.
The reflective extraction supports arbitrary nesting of fields as long as a field type does not
(transitively) refer to itself.
The declared field class (e.g. public int age;) must be contained in the list of supported JVM
bridging classes defined for every data type in this document (e.g. java.lang.Integer or int for INT).
For some classes an annotation is required in order to map the class to a data type (e.g. @DataTypeHint("DECIMAL(10, 2)")
to assign a fixed precision and scale for java.math.BigDecimal).
Not supported.
Declaration
Structured types are usually declared via their implementation classes:
// A simple POJO that qualifies as a structured type.
// Note: Without a fully assigning constructor, the order of fields will be alphabetical.
// The final data type will be:
// STRUCTURED<'com.myorg.Customer', active BOOLEAN, id INT NOT NULL, name STRING, properties MAP<STRING, STRING>>
class Customer {
    public int id;
    public String name;
    public Map<String, String> properties;
    public boolean active;
}

// A POJO with a fully assigning constructor defining the field order.
// The final data type will be:
// STRUCTURED<'com.myorg.Customer', id INT NOT NULL, name STRING, properties MAP<STRING, STRING>, active BOOLEAN>
class Customer {
    public int id;
    public String name;
    public Map<String, String> properties;
    public boolean active;

    public Customer(int id, String name, Map<String, String> properties, boolean active) {
        this.id = id;
        this.name = name;
        this.properties = properties;
        this.active = active;
    }
}

// A POJO that uses the @DataTypeHint annotations for supporting the reflective extraction.
// The final data type will be:
// STRUCTURED<'com.myorg.Customer', age INT NOT NULL, modelClass RAW(...), name STRING, totalBalance DECIMAL(10, 2)>
class Customer {
    // extract fields automatically
    public int age;
    public String name;

    // enrich the extraction with precision information
    public @DataTypeHint("DECIMAL(10, 2)") BigDecimal totalBalance;

    // enrich the extraction with forcing using RAW types
    public @DataTypeHint("RAW") Class<?> modelClass;
}
Or via explicit declaration:
// Provide an implementation class
DataTypes.STRUCTURED(MyPojo.class, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...);

// Provide a class name only, the class is resolved only if available in the classpath
DataTypes.STRUCTURED("com.myorg.MyPojo", DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...);

// Full example
DataTypes.STRUCTURED(
    Customer.class,
    DataTypes.FIELD("age", DataTypes.INT().notNull()),
    DataTypes.FIELD("name", DataTypes.STRING()));
Or via explicit extraction:
DataTypes.of(Class);

// For example:
DataTypes.of(Customer.class);
Bridging to JVM Types
Java Type
Input
Output
Remarks
BOOLEAN
Data type of a boolean with a (possibly) three-valued logic of TRUE, FALSE, and UNKNOWN.
DESCRIPTOR
Data type for describing an arbitrary, unvalidated list of columns.
This type is the return type of calls to DESCRIPTOR(`c0`, `c1`). The type is intended to be used in arguments of process table functions (PTFs).
The runtime does not support this type. It is a pure helper type during translation and planning. Table columns cannot be declared with this type. Functions cannot declare return types of this type.
VARIANT
Data type of semi-structured data.
The type supports storing any semi-structured data, including ARRAY, MAP (with keys of type STRING), and scalar types. The data types of the fields are stored in the data structure, which is close to the semantics of JSON. Compared to the ROW and STRUCTURED types, the VARIANT type has the flexibility to support highly nested and evolving schemas.
VARIANT allows for deeply nested data structures, such as arrays within arrays, maps within maps, or combinations of both. This capability makes VARIANT ideal for scenarios where data complexity and nesting are significant.
VARIANT allows schema evolution, enabling the storage of data with changing or unknown schemas
without requiring upfront schema definition. For example, if a new field is added to the data, it
can be directly incorporated into the VARIANT data without modifying the table schema. This is
particularly useful in dynamic environments where schemas may evolve over time.
Declaration
VARIANT
Variant type is usually produced by the PARSE_JSON function. For example:
SELECT PARSE_JSON('{"a":1,"b":["a","b","c"]}') AS v
DataTypes.VARIANT()
Bridging to JVM Types
Java Type
Input
Output
Remarks
BITMAP
Data type of bitmap data that stores 32-bit integers in a compressed form using RoaringBitmap.
The bitmap type is useful for efficiently representing and querying large sets of integers. It supports a variety of built-in scalar functions and aggregate functions.
The bitmap type is an extension to the SQL standard.
Declaration
BITMAP
Bitmap type can be created from an ARRAY<INT> using the BITMAP_BUILD function. For example:
SELECT BITMAP_BUILD(ARRAY[1, 2, 3, 4, 5])
DataTypes.BITMAP()
Bridging to JVM Types
Java Type
Input
Output
Remarks
RAW
Data type of an arbitrary serialized type. This type is a black box within the table ecosystem and is only deserialized at the edges.
The raw type is an extension to the SQL standard.
Declaration
The type can be declared using RAW('class', 'snapshot') where class is the originating class and
snapshot is the serialized TypeSerializerSnapshot in Base64 encoding. Usually, the type string is not
declared directly but is generated while persisting the type.
In the API, the RAW type can be declared either by directly supplying a Class + TypeSerializer or
by passing Class and letting the framework extract Class + TypeSerializer from there.
NULL
Data type for representing untyped NULL values.
The null type is an extension to the SQL standard. A null type has no other value
except NULL, thus, it can be cast to any nullable type similar to JVM semantics.
This type helps in representing unknown types in API calls that use a NULL literal
as well as bridging to formats such as JSON or Avro that define such a type as well.
This type is not very useful in practice and is just mentioned here for completeness.
Casting
Flink Table API and SQL can perform casting between a defined input type and target type. While some
casting operations can always succeed regardless of the input value, others can fail at runtime
(i.e. where there is no way to create a value for the target type). For example, it is always
possible to convert INT to STRING, but you cannot always convert a STRING to INT.
During the planning stage, the query validator rejects queries for invalid type pairs with
a ValidationException, e.g. when trying to cast a TIMESTAMP to an INTERVAL.
Valid type pairs that can fail at runtime are accepted by the query validator,
but require the user to handle failures correctly.
In Flink Table API and SQL, casting can be performed by using one of the two following built-in functions:
CAST: The regular cast function defined by the SQL standard. It can fail the job if the cast operation is fallible and the provided input is not valid. The type inference will preserve the nullability of the input type.
TRY_CAST: An extension to the regular cast function which returns NULL in case the cast operation fails. Its return type is always nullable.
For example:
CAST('42' AS INT) --- returns 42 of type INT NOT NULL
CAST(NULL AS VARCHAR) --- returns NULL of type VARCHAR
CAST('non-number' AS INT) --- throws an exception and fails the job
TRY_CAST('42' AS INT) --- returns 42 of type INT
TRY_CAST(NULL AS VARCHAR) --- returns NULL of type VARCHAR
TRY_CAST('non-number' AS INT) --- returns NULL of type INT
COALESCE(TRY_CAST('non-number' AS INT), 0) --- returns 0 of type INT NOT NULL
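The difference between the two functions can also be modelled outside of SQL. The following Python sketch is a toy model of the semantics listed above, not Flink code: cast_to_int, try_cast_to_int, and coalesce are hypothetical helpers, None stands in for NULL, and Python's int() parsing is more lenient than a SQL cast in some edge cases (for example surrounding whitespace).

```python
from typing import Optional


def cast_to_int(value: Optional[str]) -> Optional[int]:
    """Model of CAST(value AS INT): NULL passes through, invalid input raises."""
    if value is None:
        return None
    return int(value)  # raises ValueError for input such as 'non-number'


def try_cast_to_int(value: Optional[str]) -> Optional[int]:
    """Model of TRY_CAST(value AS INT): invalid input yields NULL (None)."""
    try:
        return cast_to_int(value)
    except ValueError:
        return None


def coalesce(*values):
    """Model of COALESCE: the first non-NULL argument, or NULL if there is none."""
    return next((v for v in values if v is not None), None)


print(cast_to_int("42"))                          # 42
print(try_cast_to_int("non-number"))              # None
print(coalesce(try_cast_to_int("non-number"), 0)) # 0
```

Wrapping TRY_CAST in COALESCE is what turns a nullable result back into a value that is never NULL, which mirrors the INT NOT NULL result type in the last SQL example.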
The matrix below describes the supported cast pairs, where “Y” means supported, “!” means fallible, “N” means unsupported:
All casts to constant-length or variable-length types also trim and pad according to the type definition.
TO_TIMESTAMP and TO_TIMESTAMP_LTZ must be used instead of CAST/TRY_CAST.
Supported iff the children type pairs are supported. Fallible iff the children type pairs are fallible.
Supported iff the RAW class and serializer are equal.
Supported iff INTERVAL is a MONTH TO YEAR range.
Supported iff INTERVAL is a DAY TO TIME range.
Supported only for unbounded VARBINARY (BYTES), because trimming or padding would corrupt the serialized bitmap data.
Also note that a cast of a NULL value will always return NULL,
regardless of whether the function used is CAST or TRY_CAST.
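The trimming and padding mentioned in the notes above can be illustrated with a small model. This is a sketch in plain Python, not Flink code. It assumes that a cast to CHAR(n) pads with spaces up to the declared length and that a cast to VARCHAR(n) only trims; both helper names are hypothetical.

```python
def cast_to_char(value: str, n: int) -> str:
    """Model of CAST(value AS CHAR(n)): trim to n code points, then pad with spaces."""
    return value[:n].ljust(n)


def cast_to_varchar(value: str, n: int) -> str:
    """Model of CAST(value AS VARCHAR(n)): trim to n code points, never pad."""
    return value[:n]


print(repr(cast_to_char("abcdef", 3)))  # 'abc'
print(repr(cast_to_char("ab", 4)))      # 'ab  '
print(repr(cast_to_varchar("ab", 4)))   # 'ab'
```

Python strings are indexed by code point, which matches the way the length n is defined for CHAR and VARCHAR in this document.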
Legacy casting
Pre Flink 1.15 casting behaviour can be enabled by setting table.exec.legacy-cast-behaviour to enabled.
In Flink 1.15 this flag is disabled by default.
In particular, this will:
Disable trimming/padding for casting to CHAR/VARCHAR/BINARY/VARBINARY
CAST never fails but returns NULL, behaving as TRY_CAST but without inferring the correct type
Formatting of some casting to CHAR/VARCHAR/STRING produces slightly different results.
We discourage the use of this flag and strongly suggest that new projects keep it disabled and use the new casting behaviour.
This flag will be removed in a future Flink version.
Data Type Extraction
At many locations in the API, Flink tries to automatically extract a data type from class information using
reflection to avoid repetitive manual schema work. However, extracting a data type reflectively is not always
successful because logical information might be missing. Therefore, it might be necessary to add additional
information close to a class or field declaration to support the extraction logic.
The following table lists classes that can be implicitly mapped to a data type without requiring further information.
If you intend to implement classes in Scala, it is recommended to use boxed types (e.g. java.lang.Integer)
instead of Scala’s primitives. Scala’s primitives (e.g. Int or Double) are compiled to JVM primitives (e.g.
int/double) and result in NOT NULL semantics as shown in the table below. Furthermore, Scala primitives that
are used in generics (e.g. java.util.Map[Int, Double]) are erased during compilation and lead to class
information similar to java.util.Map[java.lang.Object, java.lang.Object].
Class
Data Type
Other JVM bridging classes mentioned in this document require a @DataTypeHint annotation.
Data type hints can parameterize or replace the default extraction logic of individual function parameters
and return types, structured classes, or fields of structured classes. An implementer can choose to what
extent the default extraction logic should be modified by declaring a @DataTypeHint annotation.
The @DataTypeHint annotation provides a set of optional hint parameters. Some of those parameters are shown in the
following example. More information can be found in the documentation of the annotation class.
import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.HintFlag;

class User {

    // defines an INT data type with a default conversion class `java.lang.Integer`
    public @DataTypeHint("INT") Object o;

    // defines a TIMESTAMP data type of millisecond precision with an explicit conversion class
    public @DataTypeHint(value = "TIMESTAMP(3)", bridgedTo = java.sql.Timestamp.class) Object o;

    // enrich the extraction with forcing using a RAW type
    public @DataTypeHint("RAW") Class<?> modelClass;

    // defines that all occurrences of java.math.BigDecimal (also in nested fields) will be
    // extracted as DECIMAL(12, 2)
    public @DataTypeHint(defaultDecimalPrecision = 12, defaultDecimalScale = 2) AccountStatement stmt;

    // defines that whenever a type cannot be mapped to a data type, instead of throwing
    // an exception, always treat it as a RAW type
    public @DataTypeHint(allowRawGlobally = HintFlag.TRUE) ComplexModel model;
}
import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.annotation.HintFlag

class User {

  // defines an INT data type with a default conversion class `java.lang.Integer`
  @DataTypeHint("INT")
  var o: AnyRef

  // defines a TIMESTAMP data type of millisecond precision with an explicit conversion class
  @DataTypeHint(value = "TIMESTAMP(3)", bridgedTo = classOf[java.sql.Timestamp])
  var o: AnyRef

  // enrich the extraction with forcing using a RAW type
  @DataTypeHint("RAW")
  var modelClass: Class[_]

  // defines that all occurrences of java.math.BigDecimal (also in nested fields) will be
  // extracted as DECIMAL(12, 2)
  @DataTypeHint(defaultDecimalPrecision = 12, defaultDecimalScale = 2)
  var stmt: AccountStatement

  // defines that whenever a type cannot be mapped to a data type, instead of throwing
  // an exception, always treat it as a RAW type
  @DataTypeHint(allowRawGlobally = HintFlag.TRUE)
  var model: ComplexModel
}