CSV

CSV file data sink

Properties

Properties supported in this sink are shown below (* indicates required fields).
Property
Description
Name *
Name of the data sink
Description
Description of the data sink
Processing Mode
Select for batch and deselect for streaming. If 'Batch' is selected, the value of the switch is set to true. If 'Streaming' is selected, the value of the switch is set to false. Default: true
Select Fields / Columns
Comma-separated list of fields / columns to select from inputs to the sink. Example: id, name, city, state, zip. Default: *
Path *
The path where the file is located. Example: hdfs://[URL], s3a://[bucketpath]
Checkpoint Location
Path to the checkpoint file location. Example: s3a://path/to/store, hdfs://path/to/store
SQL to execute on each partition
SQL to execute in streaming mode. Example: Alter customer set active = false where renewed = false
Output Mode
In batch mode, the value should be one of Append, Overwrite, ErrorIfExists, or Ignore. In streaming mode, the value should be one of append, complete, or update. Default: ErrorIfExists
Partition By
Comma-separated column names to partition by. Example: year, month, day
Part Files Per Partition
Number of part files to write per partition column. WARNING: Setting this value may degrade performance drastically. It may also increase memory and CPU resource usage. Example: 10, 20, 2000
Separator
Sets a separator for each field and value. This separator can be one or more characters. Default: ,
Encoding
Specifies the encoding of saved CSV files. Default: UTF-8
Quote
Sets a single character used for escaping quoted values where the separator can be part of the value. If an empty string is set, it uses \u0000 (null character). Default: "
Quote All
A flag indicating whether all values should always be enclosed in quotes. By default, only values containing a quote character are escaped. Default: false
Escape
Sets a single character used for escaping quotes inside an already quoted value. Default: |
Escape Quotes
A flag indicating whether values containing quotes should always be enclosed in quotes. Default: true
Header
Writes the names of columns as the first line. Note that if the given path is an RDD of Strings, this header option removes all lines that are identical to the header, if any exist. Default: false
Ignore Leading WhiteSpace
A flag indicating whether leading whitespace in values being written should be skipped. Default: true
Ignore Trailing WhiteSpace
A flag indicating whether trailing whitespace in values being written should be skipped. Default: true
Null Value
Sets the string representation of a null value
Date Format
Sets the string that indicates a date format. Custom date formats follow the formats described in the Datetime Patterns help section. This applies to the date type. Default: yyyy-MM-dd
Timestamp Format
Sets the string that indicates a timestamp format. Custom timestamp formats follow the formats described in the Datetime Patterns help section. This applies to the timestamp type. Default: yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]
Character To Escape Quote Escaping
Sets a single character used for escaping the escape for the quote character. The default value is the escape character when the escape and quote characters are different, and \0 otherwise.
Empty Value
Sets the string representation of an empty value
Compression
Compression codec to use when saving to file. This can be one of the known case-insensitive short names (none, bzip2, gzip, lz4, snappy, and deflate).
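
Example

The following is a minimal sketch of how the Separator, Quote, Quote All, Escape, Header, and Null Value properties shape the written text. It uses Python's standard csv module purely for illustration and is not the sink's own implementation. The function render_csv, its parameter names, and the sample rows are hypothetical. Parameter defaults are copied from the property defaults listed above, and the csv module accepts only a single-character separator, whereas the sink allows more than one.

```python
import csv
import io


def render_csv(rows, columns, separator=",", quote='"', escape="|",
               quote_all=False, header=False, null_value=""):
    """Render rows (a list of dicts) as CSV text using sink-like settings.

    Hypothetical helper for illustration only; it does not call the sink.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=separator,   # Separator (one character only in the csv module)
        quotechar=quote,       # Quote
        escapechar=escape,     # Escape
        doublequote=False,     # escape embedded quotes instead of doubling them
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,  # Quote All
        lineterminator="\n",
    )
    if header:                 # Header: column names as the first line
        writer.writerow(columns)
    for row in rows:
        # Null Value: substitute the configured string for missing values
        writer.writerow(
            [null_value if row.get(col) is None else row[col] for col in columns]
        )
    return buf.getvalue()


# Sample data (made up for this sketch)
rows = [
    {"id": 1, "name": "Ada", "city": "London, UK"},
    {"id": 2, "name": None, "city": "Paris"},
]
columns = ["id", "name", "city"]

# Header on, only values that contain the separator are quoted
minimal = render_csv(rows, columns, header=True, null_value="NULL")
# Header off, every value quoted
quoted = render_csv(rows, columns, quote_all=True, null_value="NULL")

print(minimal)
print(quoted)
```

In the first call only "London, UK" is enclosed in quotes, because it contains the separator. In the second call every value is quoted, including the null placeholder.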