Parquet

Parquet file data sink

Properties

The properties supported by this sink are listed below (* indicates a required field).
Property
Description
Name *
Name of the data sink
Description
Description of the data sink
Processing Mode
Select for batch processing; clear for streaming. Selecting 'Batch' sets the switch to true; selecting 'Streaming' sets it to false. Default: true
Select Fields / Columns
Comma-separated list of fields / columns to select from the inputs to the sink. Example: year, month, day. Default: *
Path *
Path where the file is located. Examples: s3://[bucketpath], hdfs://[URL]
Output Mode
In batch mode, the value must be one of Append, Overwrite, ErrorIfExists, or Ignore. In streaming mode, the value must be one of append, complete, or update. Default: ErrorIfExists
Checkpoint Location
Path to the checkpoint file location. Examples: s3a://path/to/store, hdfs://path/to/store
SQL to Execute on Each Partition
SQL to execute on each partition when running in streaming mode.
Partition By
Comma-separated column names to partition by. Example: year, month, day
Part Files Per Partition
Number of part files to write per partition column. WARNING: Setting this value may degrade performance drastically and may also increase memory and CPU usage. Examples: 10, 20, 2000
Compression
Compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, or zstd). Default: snappy
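
The constraints in the table above (two required properties, an Output Mode that depends on the Processing Mode, and a fixed set of compression codecs) can be sketched as a small validator. This is only an illustration under stated assumptions: the function `validate_sink` and the dictionary keys `name`, `path`, `batch`, `output_mode`, and `compression` are hypothetical and are not the product's actual configuration schema. Comparing Output Mode values case-insensitively is also an assumption; the table only states case-insensitivity for compression codecs.

```python
# Hypothetical validator for the sink properties described in the table.
# All key names below are illustrative assumptions, not a real schema.

BATCH_MODES = {"append", "overwrite", "errorifexists", "ignore"}
STREAMING_MODES = {"append", "complete", "update"}
CODECS = {"none", "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd"}


def validate_sink(props):
    """Return a list of problems found in a sink property dict (empty if valid)."""
    errors = []

    # Name and Path are the only properties marked as required (*).
    for key in ("name", "path"):
        if not props.get(key):
            errors.append(f"missing required property: {key}")

    # Processing Mode defaults to true, which means batch.
    batch = props.get("batch", True)

    # Output Mode is only checked when it is set explicitly.
    # Assumption: values are compared case-insensitively.
    mode = props.get("output_mode")
    if mode is not None:
        allowed = BATCH_MODES if batch else STREAMING_MODES
        if str(mode).lower() not in allowed:
            kind = "batch" if batch else "streaming"
            errors.append(f"output mode {mode!r} is not valid in {kind} mode")

    # Compression codec names are case-insensitive; the default is snappy.
    codec = str(props.get("compression", "snappy")).lower()
    if codec not in CODECS:
        errors.append(f"unknown compression codec: {codec!r}")

    return errors
```

For example, `validate_sink({"name": "events", "path": "s3://[bucketpath]", "batch": False, "output_mode": "Overwrite"})` reports one problem, because Overwrite is listed only among the batch modes.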