The properties supported by this sink are listed below (* indicates a required field).
Property
Description
Name *
Name of the data sink
Description
Description of the data sink
Processing Mode
Select for batch processing, unselect for streaming. When 'Batch' is selected the switch is set to true; when 'Streaming' is selected it is set to false. Default: true
Select Fields / Columns
Comma-separated list of fields / columns to select from the inputs to the sink. Example: first_name, dob, job. Default: *
Path *
The path where the file is located. Example: s3a://bucketpath/
Checkpoint Location
Path to the checkpoint file location. Example: s3a://path/to/store
SQL to execute on each partition
SQL to execute in streaming mode. Example: Alter customer set active = false where renewed = false
Output Mode
In batch mode the value should be one of Append, Overwrite, ErrorIfExists or Ignore. In streaming mode it should be append, complete or update. Default: ErrorIfExists
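The four batch modes differ only in how they treat an existing target. As a rough illustration of that behavior (a plain-Python sketch of the semantics, not the sink's actual implementation, which delegates the write to the underlying engine):

```python
import os

def save(path, rows, mode="ErrorIfExists"):
    """Illustrate the batch output modes: Append, Overwrite,
    ErrorIfExists (the default) and Ignore."""
    exists = os.path.exists(path)
    if mode == "ErrorIfExists" and exists:
        raise FileExistsError(path)
    if mode == "Ignore" and exists:
        return  # target exists: silently skip the write
    open_mode = "a" if (mode == "Append" and exists) else "w"
    with open(path, open_mode) as f:
        f.writelines(line + "\n" for line in rows)
```

Append adds to the existing output, Overwrite replaces it, Ignore leaves it untouched, and ErrorIfExists fails the job.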
Partition By
Comma-separated column names to partition by. Example: year, month, day
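Partitioned writes lay rows out in Hive-style directories, one per combination of partition-column values (e.g. year=2024/month=1). A small illustrative sketch of that layout (the function name and row shape are assumptions for the example):

```python
from collections import defaultdict

def partition_paths(rows, partition_cols):
    """Group rows by their partition-column values into Hive-style
    subdirectory names, the way a partitioned write lays out output."""
    buckets = defaultdict(list)
    for row in rows:
        subdir = "/".join(f"{col}={row[col]}" for col in partition_cols)
        buckets[subdir].append(row)
    return dict(buckets)

rows = [
    {"year": 2024, "month": 1, "amount": 10},
    {"year": 2024, "month": 2, "amount": 5},
]
layout = partition_paths(rows, ["year", "month"])
# keys look like "year=2024/month=1"
```

Choosing low-cardinality columns (year, month, day) keeps the number of directories, and therefore files, manageable.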
Part Files Per Partition
Number of part files to write per partition column. WARNING: Setting this value may degrade performance drastically; it may also increase memory and CPU usage. Example: 10, 20, 2000
Date Format
String that indicates the date format to use when reading dates or timestamps. Custom date formats follow the patterns of java.text.SimpleDateFormat. This applies to both DateType and TimestampType. When null, dates and timestamps are parsed with java.sql.Timestamp.valueOf() and java.sql.Date.valueOf(). Default: yyyy-MM-dd
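The pattern letters here are Java's SimpleDateFormat tokens (yyyy, MM, dd), not strftime codes. A small sketch mapping the most common tokens to their Python equivalents shows what the default pattern produces (only an illustrative subset of tokens is handled):

```python
from datetime import date

# Map a few common SimpleDateFormat tokens to strftime codes.
# The real parser supports many more tokens than these.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

def format_date(d, pattern="yyyy-MM-dd"):
    """Render a date using a SimpleDateFormat-style pattern."""
    for java_token, strftime_code in _TOKENS.items():
        pattern = pattern.replace(java_token, strftime_code)
    return d.strftime(pattern)
```

So the default yyyy-MM-dd renders 7 March 2024 as 2024-03-07.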
Timestamp Format
Sets the string that indicates the timestamp format. Custom formats follow the Datetime Patterns. This applies to TimestampType. Default: yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]
Encoding
Forcibly sets one of the standard basic or extended encodings for the JSON files, for example UTF-16BE or UTF-32LE.
Line Separator
Defines the line separator to use for writing. Default: \n
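In JSON-lines output each record is one serialized document, terminated by the configured separator. A minimal sketch of the effect (function and parameter names are assumptions for illustration):

```python
import json

def write_json_lines(rows, line_sep="\n"):
    """Serialize each record as one JSON document, each terminated
    by the configured line separator (default "\n")."""
    return line_sep.join(
        json.dumps(r, separators=(",", ":")) for r in rows
    ) + line_sep
```

A separator such as \r\n would be chosen when downstream consumers expect Windows-style line endings.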
Time Zone
Sets the string that indicates the time zone ID used to parse timestamps in the JSON/CSV data sources or partition values. The following formats of timeZone are supported: a region-based zone ID in the form 'area/city', such as 'America/Los_Angeles'; or a zone offset in the format '(+/-)HH:mm', for example '-08:00' or '+01:00'. 'UTC' and 'Z' are also supported as aliases of '+00:00'. Other short names such as 'CST' are not recommended because they can be ambiguous. If unset, the current value of the SQL config spark.sql.session.timeZone is used.
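The three accepted spellings (region ID, fixed offset, UTC/Z alias) can be illustrated with Python's standard time-zone types; this resolver is an assumption-named sketch of the accepted inputs, not the sink's parser:

```python
from datetime import timedelta, timezone
from zoneinfo import ZoneInfo

def resolve_zone(tz_id):
    """Resolve the supported timeZone spellings: 'UTC'/'Z' aliases,
    fixed offsets like '+01:00' or '-08:00', and region-based IDs
    like 'America/Los_Angeles'."""
    if tz_id in ("UTC", "Z"):
        return timezone.utc
    if tz_id[0] in "+-":
        sign = 1 if tz_id[0] == "+" else -1
        hours, minutes = tz_id[1:].split(":")
        return timezone(sign * timedelta(hours=int(hours),
                                         minutes=int(minutes)))
    return ZoneInfo(tz_id)  # region-based zone ID
```

Region-based IDs are preferable to fixed offsets for data spanning daylight-saving transitions, which is also why ambiguous short names like 'CST' are discouraged.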
Compression
Compression codec to use when saving to file. One of the known case-insensitive short names (none, bzip2, gzip, lz4, snappy, deflate). Default: none
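Compressed output is written through the chosen codec, and the file extension typically reflects it. A pure-Python sketch of the idea for the gzip case only (the other codecs listed above are omitted; names here are illustrative assumptions):

```python
import gzip
import json

def write_compressed(rows, path, codec="gzip"):
    """Write JSON-lines output, gzip-compressed when requested.
    Only 'none' and 'gzip' are sketched here."""
    payload = "".join(json.dumps(r) + "\n" for r in rows).encode("utf-8")
    if codec == "gzip":
        path += ".gz"
        with gzip.open(path, "wb") as f:
            f.write(payload)
    else:
        with open(path, "wb") as f:
            f.write(payload)
    return path
```

Splittable codecs (e.g. bzip2) allow downstream readers to parallelize over a single large file, whereas gzip files must each be read by one task.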
Ignore Null Fields
Whether to ignore null fields when generating JSON objects. Default: false
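The effect of this flag on a single record can be shown with plain Python JSON serialization (an illustrative sketch; the parameter name mirrors the property above):

```python
import json

def to_json(record, ignore_null_fields=False):
    """Render a record as a JSON object; when ignore_null_fields is
    true, keys whose value is null are dropped from the output."""
    if ignore_null_fields:
        record = {k: v for k, v in record.items() if v is not None}
    return json.dumps(record, separators=(",", ":"))
```

Dropping nulls makes output smaller, but consumers must then treat a missing key and an explicit null as equivalent.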