Snowflake

Snowflake database data source

Properties

The properties supported by this source are listed below (`*` indicates a required field). Usage sketches follow the table.

| Property | Description |
|----------|-------------|
| Name `*` | Name of the data source |
| Description | Description of the data source |
| Connection `*` | Pre-defined Snowflake connection |
| Database `*` | Database in the Snowflake account that you are attempting to access. Example: `EMPLOYEES` |
| Table `*` | Table to read from |
| Database Schema `*` | Schema within the database that you are attempting to access |
| Schema | Source schema to assist during the design of the pipeline |
| Warehouse | Default virtual warehouse to use for the session after connecting |
| Role | Default security role to use for the session after connecting. Example: `ACCOUNTADMIN` |
| Select Fields / Columns | Comma-separated list of fields/column names to select from the source. Default: `*` |
| Filter Expression | SQL `WHERE` clause for filtering records. Examples: `date = '2022-01-01'`; `year = 22 and month = 6 and day = 2` |
| Distinct Values | Select rows with distinct column values. Default: `false` |
| Timezone | Time zone to be used by Snowflake: `spark` (use the time zone from Spark), `snowflake` (use the current time zone for Snowflake), `sf_default` (use the default time zone for the Snowflake user who is connecting), or a specific time zone such as `America/New_York`. Default: `spark` |
| Compress | If set to `on`, the data passed between Snowflake and Spark is compressed. Default: `on` |
| S3 Max File Size | Size, in MB, of the files used when moving data from Snowflake to Spark. Example: `100`. Default: `10` |
| Pre Actions | Semicolon-separated list of SQL commands that are executed before data is transferred between Spark and Snowflake |
| Post Actions | Semicolon-separated list of SQL commands that are executed after data is transferred between Spark and Snowflake |
| Keep Column Case | When writing a table from Spark to Snowflake, the Spark connector defaults to shifting the letters in column names to uppercase, unless the column names are in double quotes. Default: `off` |
| Continue on Error | Controls whether the `COPY` command aborts if the user enters invalid data (for example, invalid JSON format for a variant data type column). Default: `off` |
| Use Staging Table | Controls whether data loading uses a staging table. Default: `on` |
| Auto Pushdown | Controls whether automatic query pushdown is enabled. Default: `off` |
| Parallelism | Size of the thread pool used for data uploads and downloads between Snowflake and Spark. Example: `8`. Default: `4` |
| Partition Size in MB | Recommended uncompressed size for each DataFrame partition, used when a very large query result set must be split into multiple partitions. Increase this size to reduce the number of partitions. Example: `10`. Default: `100` |
| Use Copy Unload | If `false`, Snowflake uses the Arrow data format when selecting data; if `true`, Snowflake reverts to the old behavior of using the `COPY UNLOAD` command to transmit selected data. Default: `false` |
| Normalize Column Names | Normalizes column names by replacing the special characters `,;{}()&/\n\t=` and space with the given string. Example: `_` |
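
For orientation, here is a minimal PySpark sketch of how the core properties above could map onto options of the Spark Snowflake connector (`net.snowflake.spark.snowflake`). The account URL, credentials, and table name are placeholders; in practice the pre-defined Connection supplies the connection details, and the exact option names emitted by a given pipeline may differ.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-source").getOrCreate()

# Placeholder connection details; a pre-defined Connection would supply these.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "PIPELINE_USER",
    "sfPassword": "********",
    "sfDatabase": "EMPLOYEES",    # Database
    "sfSchema": "PUBLIC",         # Database Schema
    "sfWarehouse": "COMPUTE_WH",  # Warehouse: session default after connecting
    "sfRole": "ACCOUNTADMIN",     # Role: session default after connecting
}

# Read the configured table through the connector.
df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EMPLOYEE_RECORDS")  # Table
    .load()
)
```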
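Select Fields / Columns, Filter Expression, and Distinct Values narrow what is read from the source. One plausible way to realize them, reusing `spark` and `sf_options` from the sketch above, is to assemble a single SELECT statement and pass it through the connector's `query` option instead of `dbtable`; the table name `EMPLOYEE_RECORDS` is again a placeholder.

```python
select_list = "first_name, last_name, salary"  # Select Fields / Columns (default: *)
filter_expr = "date = '2022-01-01'"            # Filter Expression
distinct = True                                # Distinct Values (default: false)

# Assemble the SELECT statement from the three properties.
distinct_kw = "DISTINCT " if distinct else ""
where_sql = f" WHERE {filter_expr}" if filter_expr else ""
query = f"SELECT {distinct_kw}{select_list} FROM EMPLOYEE_RECORDS{where_sql}"

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("query", query)  # 'query' replaces 'dbtable' for arbitrary SELECTs
    .load()
)
```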
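The remaining properties are transfer and session tuning knobs. The sketch below simply echoes the table's defaults using the Spark Snowflake connector's documented parameter names (`sfTimezone`, `sfCompress`, `preactions`, `postactions`, `autopushdown`, and so on); the pre- and post-action SQL statements are hypothetical examples. The write-side knobs (Keep Column Case, Continue on Error, Use Staging Table) would be set the same way as options on a `df.write` call.

```python
df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EMPLOYEE_RECORDS")
    .option("sfTimezone", "spark")               # Timezone
    .option("sfCompress", "on")                  # Compress
    .option("s3MaxFileSize", "10")               # S3 Max File Size (MB)
    .option("preactions", "USE SCHEMA PUBLIC;")  # Pre Actions (hypothetical)
    .option("postactions", "CALL LOG_READ();")   # Post Actions (hypothetical)
    .option("autopushdown", "off")               # Auto Pushdown
    .option("parallelism", "4")                  # Parallelism
    .option("partition_size_in_mb", "100")       # Partition Size in MB
    .option("use_copy_unload", "false")          # Use Copy Unload
    .load()
)
```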