Snowflake

Snowflake database data source

Properties

The properties supported by this source are listed below (`*` indicates a required field). Usage sketches follow the table.

| Property | Description |
|----------|-------------|
| Name `*` | Name of the data source |
| Description | Description of the data source |
| Connection `*` | Pre-defined Snowflake connection |
| Database `*` | Database in the Snowflake account that you are attempting to access. Example: `EMPLOYEES` |
| Table `*` | Table to read from |
| Database Schema `*` | Schema within the database that you are attempting to access |
| Schema | Source schema to assist during the design of the pipeline |
| Warehouse | Default virtual warehouse to use for the session after connecting |
| Role | Default security role to use for the session after connecting. Example: `ACCOUNTADMIN` |
| Select Fields / Columns | Comma-separated list of fields/column names to select from the source. Default: `*` |
| Filter Expression | SQL `WHERE` clause for filtering records. Examples: `date = '2022-01-01'`; `year = 22 and month = 6 and day = 2` |
| Distinct Values | Select rows with distinct column values. Default: `false` |
| Timezone | Time zone to be used by Snowflake: `spark` (use the time zone from Spark), `snowflake` (use the current time zone for Snowflake), `sf_default` (use the default time zone for the Snowflake user who is connecting), or a specific time zone such as `America/New_York`. Default: `spark` |
| Compress | If set to `on`, the data passed between Snowflake and Spark is compressed. Default: `on` |
| S3 Max File Size | Size, in MB, of the files used when moving data from Snowflake to Spark. Example: `100`. Default: `10` |
| Pre Actions | Semicolon-separated list of SQL commands that are executed before data is transferred between Spark and Snowflake |
| Post Actions | Semicolon-separated list of SQL commands that are executed after data is transferred between Spark and Snowflake |
| Keep Column Case | When writing a table from Spark to Snowflake, the Spark connector defaults to shifting the letters in column names to uppercase, unless the column names are in double quotes. Default: `off` |
| Continue on Error | Controls whether the `COPY` command aborts if the user enters invalid data (for example, invalid JSON format for a variant data type column). Default: `off` |
| Use Staging Table | Controls whether data loading uses a staging table. Default: `on` |
| Auto Pushdown | Controls whether automatic query pushdown is enabled. Default: `off` |
| Parallelism | Size of the thread pool used for data uploads and downloads between Snowflake and Spark. Example: `8`. Default: `4` |
| Partition Size in MB | Recommended uncompressed size for each DataFrame partition, used when a very large query result set must be split into multiple partitions. Increase this size to reduce the number of partitions. Example: `10`. Default: `100` |
| Use Copy Unload | If `false`, Snowflake uses the Arrow data format when selecting data; if `true`, Snowflake reverts to the old behavior of using the `COPY UNLOAD` command to transmit selected data. Default: `false` |
| Normalize Column Names | Normalizes column names by replacing the special characters `,;{}()&/\n\t=` and space with the given string. Example: `_` |
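
For orientation, here is a minimal PySpark sketch of how the core properties above could map onto options of the Spark Snowflake connector (`net.snowflake.spark.snowflake`). The account URL, credentials, and table name are placeholders; in practice the pre-defined Connection supplies the connection details, and the exact option names emitted by a given pipeline may differ.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-source").getOrCreate()

# Placeholder connection details; a pre-defined Connection would supply these.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "PIPELINE_USER",
    "sfPassword": "********",
    "sfDatabase": "EMPLOYEES",    # Database
    "sfSchema": "PUBLIC",         # Database Schema
    "sfWarehouse": "COMPUTE_WH",  # Warehouse: session default after connecting
    "sfRole": "ACCOUNTADMIN",     # Role: session default after connecting
}

# Read the configured table through the connector.
df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EMPLOYEE_RECORDS")  # Table
    .load()
)
```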
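Select Fields / Columns, Filter Expression, and Distinct Values narrow what is read from the source. One plausible way to realize them, reusing `spark` and `sf_options` from the sketch above, is to assemble a single SELECT statement and pass it through the connector's `query` option instead of `dbtable`; the table name `EMPLOYEE_RECORDS` is again a placeholder.

```python
select_list = "first_name, last_name, salary"  # Select Fields / Columns (default: *)
filter_expr = "date = '2022-01-01'"            # Filter Expression
distinct = True                                # Distinct Values (default: false)

# Assemble the SELECT statement from the three properties.
distinct_kw = "DISTINCT " if distinct else ""
where_sql = f" WHERE {filter_expr}" if filter_expr else ""
query = f"SELECT {distinct_kw}{select_list} FROM EMPLOYEE_RECORDS{where_sql}"

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("query", query)  # 'query' replaces 'dbtable' for arbitrary SELECTs
    .load()
)
```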
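The remaining properties are transfer and session tuning knobs. The sketch below simply echoes the table's defaults using the Spark Snowflake connector's documented parameter names (`sfTimezone`, `sfCompress`, `preactions`, `postactions`, `autopushdown`, and so on); the pre- and post-action SQL statements are hypothetical examples. The write-side knobs (Keep Column Case, Continue on Error, Use Staging Table) would be set the same way as options on a `df.write` call.

```python
df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EMPLOYEE_RECORDS")
    .option("sfTimezone", "spark")               # Timezone
    .option("sfCompress", "on")                  # Compress
    .option("s3MaxFileSize", "10")               # S3 Max File Size (MB)
    .option("preactions", "USE SCHEMA PUBLIC;")  # Pre Actions (hypothetical)
    .option("postactions", "CALL LOG_READ();")   # Post Actions (hypothetical)
    .option("autopushdown", "off")               # Auto Pushdown
    .option("parallelism", "4")                  # Parallelism
    .option("partition_size_in_mb", "100")       # Partition Size in MB
    .option("use_copy_unload", "false")          # Use Copy Unload
    .load()
)
```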