PutIceberg

Description:

This processor uses the Iceberg API to parse and load records into Iceberg tables. Incoming data is parsed with the configured Record Reader Controller Service and ingested into an Iceberg table using the configured catalog service and the provided table information. The target Iceberg table must already exist, and its schema must match the incoming records: the Record Reader schema must contain all of the Iceberg schema fields, and any additional field that is not present in the Iceberg schema is ignored. To avoid the 'small file problem', it is recommended to place a MergeRecord processor before this one.
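
The schema-matching rule can be illustrated with a minimal Python sketch. It is not the processor's implementation: the function name `align_record`, the `unmatched` argument, and the choice to fill missing columns with `None` are all assumptions made for illustration. The three modes loosely mirror the Unmatched Column Behavior property listed under Properties.

```python
import logging

logger = logging.getLogger("put_iceberg_sketch")


def align_record(record, table_fields, unmatched="fail"):
    """Project a record onto the table's fields.

    unmatched is "ignore", "warn" or "fail" (hypothetical names that
    loosely mirror the Unmatched Column Behavior options).
    """
    if unmatched not in ("ignore", "warn", "fail"):
        raise ValueError(f"unknown behavior: {unmatched}")
    # Table columns that have no corresponding field in the record.
    missing = [name for name in table_fields if name not in record]
    if missing:
        if unmatched == "fail":
            raise ValueError(f"record has no field for table columns: {missing}")
        if unmatched == "warn":
            logger.warning("no field for table columns: %s", missing)
    # Extra record fields are dropped; missing columns become None (assumption).
    return {name: record.get(name) for name in table_fields}


if __name__ == "__main__":
    row = {"id": 1, "name": "a", "extra": True}
    print(align_record(row, ["id", "name"]))
```

In this sketch the `extra` field is dropped because the table does not define it, while a record without `name` raises an error unless `unmatched` is set to "ignore" or "warn".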

Tags:

iceberg, put, table, store, record, parse, orc, parquet, avro

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. Each entry also indicates any default value and whether the property supports the NiFi Expression Language.

Display Name: Record Reader
API Name: record-reader
Controller Service API: RecordReaderFactory
Implementations: SyslogReader, ScriptedReader, Syslog5424Reader, CSVReader, JASN1Reader, YamlTreeReader, ParquetReader, IPFIXReader, GrokReader, JsonTreeReader, ProtobufReader, CEFReader, ExcelReader, XMLReader, AvroReader, CiscoEmblemSyslogMessageReader, ReaderLookup, JsonPathReader, EBCDICRecordReader, WindowsEventLogReader
Description: Specifies the Controller Service to use for parsing incoming data and determining the data's schema.

Display Name: Catalog Service
API Name: catalog-service
Controller Service API: IcebergCatalogService
Implementations: HadoopCatalogService, HiveCatalogService
Description: Specifies the Controller Service to use for handling references to the table's metadata files.

Display Name: Catalog Namespace
API Name: catalog-namespace
Description: The namespace of the catalog.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Display Name: Table Name
API Name: table-name
Description: The name of the Iceberg table to write to.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Display Name: Unmatched Column Behavior
API Name: unmatched-column-behavior
Default Value: Fail on Unmatched Columns
Allowable Values:
  • Ignore Unmatched Columns: Any column in the Iceberg table that has no corresponding field in the incoming record is assumed to be not required. No notification is logged.
  • Warn on Unmatched Columns: Any column in the Iceberg table that has no corresponding field in the incoming record is assumed to be not required. A warning is logged.
  • Fail on Unmatched Columns: The flow fails if any column in the Iceberg table has no corresponding field in the incoming record. An error is logged.
Description: If an incoming record does not have a field mapping for all of the table's columns, this property specifies how to handle the situation.

Display Name: File Format
API Name: file-format
Allowable Values:
  • AVRO
  • PARQUET
  • ORC
Description: File format to use when writing Iceberg data files. If not set, the 'write.format.default' table property is used; its default value is parquet.

Display Name: Maximum File Size
API Name: maximum-file-size
Description: The maximum size that a file can be. If the file size is exceeded, a new file is generated with the remaining data. If not set, the 'write.target-file-size-bytes' table property is used; its default value is 512 MB.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Display Name: Kerberos User Service
API Name: kerberos-user-service
Controller Service API: KerberosUserService
Implementations: KerberosTicketCacheUserService, KerberosPasswordUserService, KerberosKeytabUserService
Description: Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos.

Display Name: Number of Commit Retries
API Name: number-of-commit-retries
Default Value: 10
Description: Number of times to retry a commit before failing.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Display Name: Minimum Commit Wait Time
API Name: minimum-commit-wait-time
Default Value: 100 ms
Description: Minimum time to wait before retrying a commit.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Display Name: Maximum Commit Wait Time
API Name: maximum-commit-wait-time
Default Value: 2 sec
Description: Maximum time to wait before retrying a commit.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

Display Name: Maximum Commit Duration
API Name: maximum-commit-duration
Default Value: 30 sec
Description: Total retry timeout period for a commit.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
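
The four commit retry properties interact, and a small Python sketch may help show how. It assumes exponential backoff with a factor of 2, which the property descriptions do not state, and it ignores the time spent in the commit attempts themselves. The function name `backoff_schedule` and its parameters are hypothetical; the default arguments are the default property values listed above.

```python
def backoff_schedule(retries=10, min_wait_ms=100, max_wait_ms=2000,
                     max_duration_ms=30000, factor=2.0):
    """Return the wait (in ms) before each retry under the stated assumptions."""
    waits = []
    elapsed = 0
    for attempt in range(retries):
        # Grow the wait geometrically, but never beyond the maximum wait time.
        wait = min(int(min_wait_ms * factor ** attempt), max_wait_ms)
        if elapsed + wait > max_duration_ms:
            break  # the total retry budget would be exceeded
        waits.append(wait)
        elapsed += wait
    return waits


if __name__ == "__main__":
    # With the defaults: [100, 200, 400, 800, 1600, 2000, 2000, 2000, 2000, 2000]
    print(backoff_schedule())
```

Under these assumptions the default settings never reach the 30-second limit: the ten waits add up to about 13 seconds, so the retry count is the limit that applies first.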

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

Name: A custom key to add to the snapshot summary. The key must start with the 'snapshot-property.' prefix.
Value: A custom value to add to the snapshot summary.
Description: Adds an entry with the custom key and its corresponding value to the snapshot summary. The key format must be 'snapshot-property.custom-key'.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
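
As an illustration of the naming rule, the Python sketch below selects the dynamic properties that carry the 'snapshot-property.' prefix and derives the summary entries from them. Stripping the prefix from the key is an assumption based on the description above, and the function name `snapshot_summary_entries` and the sample property names are hypothetical.

```python
PREFIX = "snapshot-property."


def snapshot_summary_entries(dynamic_properties):
    """Map 'snapshot-property.<custom-key>' properties to summary entries."""
    entries = {}
    for name, value in dynamic_properties.items():
        # Skip properties without the prefix, and a bare prefix with no key.
        if name.startswith(PREFIX) and len(name) > len(PREFIX):
            entries[name[len(PREFIX):]] = value
    return entries


if __name__ == "__main__":
    props = {
        "snapshot-property.source-system": "orders-api",  # hypothetical key
        "unrelated-property": "ignored",
    }
    print(snapshot_summary_entries(props))
```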

Relationships:

Name Description
success A FlowFile is routed to this relationship after the data ingestion was successful.
failure A FlowFile is routed to this relationship if the operation failed and retrying it would also fail, for example because of invalid data or an invalid schema.

Reads Attributes:

None specified.

Writes Attributes:

Name Description
iceberg.record.count The number of records in the FlowFile.

State management:

This component does not store state.

Restricted:

This component is not restricted.

System Resource Considerations:

None specified.