Hudi

Hudi table data source

Properties

Properties supported in this source are shown below ( * indicates required fields )
Property
Description
Name *
Name of the data source
Description
Description of the data source
Processing Mode
Select for batch and un-select for streaming. If 'Batch' is selected the value of the switch is set to true. If 'Streaming' is selected the value of the switch is set to false.Default: true
Table *
Hudi table
Schema
Source schema to assist during the design of the pipeline
Normalize Column Names
Normalizes column names by replacing special characters ,;{}()&/\n\t= and space with the given stringExample: _
Select Fields / Columns
Comma separated list of fields / columns to select from inputs to the sinkExample: id, name, city, state, zipDefault: *
Filter Expression
SQL where clause for filtering recordsExample: date = '2022-01-01',year=22 and month = 6 and day = 2
Distinct Values
Select rows with distinct column valuesDefault: false
Cache
MEMORY_ONLY: Persist data in memory only in deserialized formatMEMORY_AND_DISK: Persist data in memory and if enough memory is not available evicted blocks will be stored on diskMEMORY_ONLY_SER: Same as MEMORY_ONLY but difference being it persists in serialized format. This is generally more space-efficient than deserialized format, but more CPU-intensive to read.MEMORY_AND_DISK_SER: Same as MEMORY_AND_DISK storage level difference being it persists in serialized formatDISK_ONLY: Persist the data partitions only on diskMEMORY_ONLY_2, MEMORY_AND_DISK_2: Same as the levels above, but replicate each partition on two cluster nodesOFF_HEAP: Similar to MEMORY_ONLY_SER, but store the data in off-heap memory. This requires off-heap memory to be enabledDefault: NONE