Delta

Delta file data sink

Properties

Properties supported in this sink are listed below (* indicates required fields). Each entry gives the property name followed by its description.
Name *
Name of the data sink
Description
Description of the data sink
Processing Mode
Select for batch processing; deselect for streaming. If 'Batch' is selected, the switch value is set to true; if 'Streaming' is selected, it is set to false. Default: true
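As a rough sketch of what the two modes correspond to in Spark terms, assuming Delta Lake is available on the cluster (the paths and data below are placeholders, not values from this sink):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sink-sketch").getOrCreate()

# Batch (switch = true): a one-shot write of a static DataFrame.
batch_df = spark.createDataFrame([("NYC", "analyst")], ["city", "job"])
batch_df.write.format("delta").save("s3a://my-bucket/delta/employee")

# Streaming (switch = false): a continuous write of a streaming DataFrame.
stream_df = spark.readStream.format("rate").load()  # toy streaming source
query = (stream_df.writeStream
         .format("delta")
         .option("checkpointLocation", "s3a://my-bucket/checkpoints/employee")
         .start("s3a://my-bucket/delta/employee"))
```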
Select Fields / Columns
Comma-separated list of fields/columns to select from the inputs to the sink. Example: name, city, country. Default: *
Path *
Path where the file is located. Example: s3a://[bucketpath]/load.csv
Output Mode
In batch mode, the value should be one of Append, Overwrite, ErrorIfExists, or Ignore. In streaming mode, the value should be one of append, complete, or update. Default: ErrorIfExists
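A sketch of how these values map onto Spark's writer APIs, reusing spark, batch_df, and stream_df from the sketch above:

```python
# Batch: DataFrameWriter.mode() accepts append, overwrite,
# error/errorifexists, or ignore.
batch_df.write.format("delta").mode("overwrite").save("s3a://my-bucket/delta/employee")

# Streaming: DataStreamWriter.outputMode() accepts append, complete, or update.
(stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/employee")
    .start("s3a://my-bucket/delta/employee"))
```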
Checkpoint Location
Path to the checkpoint directory or file. Example: hdfs://hdfs_path/some_folder, s3a://output_bucket/some_folder. Default: (none)
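In Spark terms, the checkpoint location is passed as the checkpointLocation option on the streaming writer; the query uses it to record offsets and state so it can resume after a restart. Paths are placeholders:

```python
(stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "hdfs://hdfs_path/some_folder")  # progress + state
    .start("s3a://output_bucket/delta/employee"))
```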
SQL to execute on each partition
SQL to execute in Delta streaming mode. Example: select job from employee where city = 'NYC'. Default: (none)
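One generic Spark pattern that matches this behavior is foreachBatch, which runs arbitrary logic, including SQL, over each streaming micro-batch. Whether the sink uses exactly this mechanism internally is an assumption, and the view and path names below are illustrative:

```python
def run_partition_sql(batch_df, batch_id):
    # Expose the micro-batch as a temp view, run the configured SQL, write out.
    batch_df.createOrReplaceTempView("employee")
    result = spark.sql("select job from employee where city = 'NYC'")
    result.write.format("delta").mode("append").save("s3a://my-bucket/delta/jobs")

(stream_df.writeStream
    .foreachBatch(run_partition_sql)
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/jobs")
    .start())
```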
Replace Where SQL
SQL predicate for selectively replacing data when overwriting. Example: date >= '2017-01-01' AND date <= '2017-01-31'
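This corresponds to Delta Lake's replaceWhere write option, which overwrites only the rows matching the predicate rather than the whole table. A sketch, where jan_df is a placeholder DataFrame holding the replacement rows:

```python
# Replace only January 2017 data; rows outside the predicate are untouched.
(jan_df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
    .save("s3a://my-bucket/delta/employee"))
```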
Overwrite Schema
Select to overwrite the schema on write. Changing a column's type or name, or dropping a column, requires rewriting the table; to do this, use the Overwrite Schema option. Default: false
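In Delta Lake this maps to the overwriteSchema option on an overwrite write; a sketch, with new_df as a placeholder DataFrame carrying the changed schema:

```python
(new_df.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")  # rewrite the table with the new schema
    .save("s3a://my-bucket/delta/employee"))
```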
Merge Schema
Should the schema be merged into the existing one? Columns that are present in the DataFrame but missing from the table are automatically added as part of a write transaction. Note: spark.databricks.delta.schema.autoMerge.enabled must be set to true. Default: false
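This maps to Delta Lake's mergeSchema write option, or the session-wide auto-merge configuration noted above; extra_cols_df is a placeholder DataFrame containing additional columns:

```python
# Per-write schema merge: new columns are added to the table.
(extra_cols_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("s3a://my-bucket/delta/employee"))

# Or enable automatic schema merging for the whole session.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
```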
Partition By
Comma-separated column names to partition by. Example: year, month, day. Default: (none)
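In Spark terms this is partitionBy on the writer; a sketch assuming df has year, month, and day columns:

```python
(df.write.format("delta")
    .partitionBy("year", "month", "day")  # one directory per (year, month, day)
    .save("s3a://my-bucket/delta/employee"))
```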
Part Files Per Partition
Number of part files to write per partition column. WARNING: Setting this value may drastically degrade performance. It may also increase memory and CPU resource usage. Example: 10, 20, 2000. Default: (none)
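A generic Spark technique with the same effect is to repartition on the partition columns plus a random salt before writing, which yields roughly N part files per partition value; whether the sink implements it this way internally is an assumption:

```python
from pyspark.sql import functions as F

n = 10  # target part files per partition value
salted = df.withColumn("salt", (F.rand() * n).cast("int"))
(salted.repartition("year", "month", "day", "salt")
    .drop("salt")  # projection only; the shuffle layout is preserved
    .write.format("delta")
    .partitionBy("year", "month", "day")
    .save("s3a://my-bucket/delta/employee"))
```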