Redshift

Redshift database data source

Properties

The properties supported by this source are listed below (* indicates a required field).
| Property | Description |
| --- | --- |
| Name * | Name of the data source. |
| Description | Description of the data source. |
| Connection * | Pre-defined Redshift connection. |
| Table or Query | Database table to read, or a query used to read data from the Redshift source. Example: dbtable |
| Schema | Source schema to assist during the design of the pipeline. |
| Select Fields / Columns | Comma-separated list of field / column names to select from the source. Default: * |
| Filter Expression | SQL WHERE clause for filtering records. Example: date = '2022-01-01'. Example: year = 22 and month = 6 and day = 2 |
| Distinct Values | Select rows with distinct column values. Default: false |
| Distribution Style | Distribution style to use when creating a table. When using KEY, you must also set a distribution key. |
| Distribution Key | Name of the column in the table to use as the distribution key when creating a table. |
| Sort Key Spec | Sort keys supported by Redshift. |
| Include Column List | Default: false |
| Preactions | Semicolon-separated list of SQL commands executed before data is transferred between Spark and Redshift. |
| Postactions | Semicolon-separated list of SQL commands executed after data is transferred between Spark and Redshift. |
| Extra Copy Options | List of extra options to append to the Redshift COPY command when loading data, e.g. TRUNCATECOLUMNS or MAXERROR (see the Redshift docs for other options). |
| Normalize Column Names | Normalizes column names by replacing the special characters ,;{}()&/\n\t= and space with the given string. Example: _ |
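
The effect of a few of these properties can be illustrated with a short Python sketch. This is not the product's implementation. The helper names `build_source_query` and `normalize_column_name` are hypothetical, and the sketch assumes that Select Fields / Columns, Filter Expression and Distinct Values combine into a single SELECT statement, and that `\n` and `\t` in the Normalize Column Names description mean the newline and tab characters. The actual source may apply these settings differently, for example as Spark DataFrame operations.

```python
import re
from typing import Optional

# Characters listed for Normalize Column Names: , ; { } ( ) & / = and space,
# plus newline and tab (assumed meaning of \n and \t in the description).
_SPECIAL_CHARS = re.compile(r"[,;{}()&/\n\t= ]")


def normalize_column_name(name: str, replacement: str = "_") -> str:
    """Replace every listed special character with the replacement string."""
    # A function is used as the replacement so the string is inserted
    # literally, without backslash or group-reference processing.
    return _SPECIAL_CHARS.sub(lambda _match: replacement, name)


def build_source_query(
    table: str,
    fields: str = "*",
    filter_expression: Optional[str] = None,
    distinct: bool = False,
) -> str:
    """Hypothetical: combine the read-side properties into one SELECT."""
    select = "SELECT DISTINCT" if distinct else "SELECT"
    query = f"{select} {fields} FROM {table}"
    if filter_expression:
        # Filter Expression is the body of the WHERE clause.
        query += f" WHERE {filter_expression}"
    return query


if __name__ == "__main__":
    print(normalize_column_name("order id (usd)"))
    print(build_source_query("sales", "id, amount", "date = '2022-01-01'", True))
```

With the defaults shown in the table (fields `*`, no filter, distinct `false`), `build_source_query("sales")` reduces to a plain `SELECT * FROM sales`.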