JDBC

JDBC data source
The properties supported by this source are listed below (* indicates a required field). Short PySpark sketches after the table illustrate how the main options map to Spark's built-in JDBC reader.
| Property | Description |
| --- | --- |
| Name * | Name of the data source. |
| Description | Description of the data source. |
| Connection * | Pre-defined JDBC connection to use. |
| Driver Class Name * | Class name of the JDBC driver used to connect to the database. Example: `oracle.jdbc.driver.OracleDriver` |
| Table or Query * | Database table to read, or a query used to read data from the JDBC source. A query cannot be specified together with the Partition Column option. |
| Schema | Source schema to assist during the design of the pipeline. |
| Select Fields / Columns | Comma-separated list of field/column names to select from the source. Default: `*` |
| Filter Expression | SQL `WHERE` clause for filtering records. Examples: `date = '2022-01-01'`, `year = 22 and month = 6 and day = 2` |
| Distinct Values | Select only rows with distinct column values. Default: `false` |
| Normalize Column Names | Normalizes column names by replacing the special characters `,;{}()&/\n\t=` and spaces with the given string. Example: `_` |
| Partition Column | Partition Column, Lower Bound, Upper Bound, and Number of Partitions must all be specified if any one of them is. Together they describe how to partition the table when reading in parallel from multiple workers; see the partitioned-read sketch after this table. Partition Column must be a numeric, date, or timestamp column of the table in question. Example: `city` |
| Lower Bound | Lower bound used, together with Upper Bound, to decide the partition stride. The bounds do not filter rows: all rows in the table are partitioned and returned. Example: `pincode` |
| Upper Bound | Upper bound used, together with Lower Bound, to decide the partition stride. The bounds do not filter rows: all rows in the table are partitioned and returned. Example: `country` |
| Number of Partitions | Maximum number of partitions that can be used for parallelism when reading the table. This also determines the maximum number of concurrent JDBC connections. Example: `6` |
| Query Timeout in Seconds | Number of seconds the driver waits for a Statement object to execute. Zero means there is no limit. On the write path this option depends on how the JDBC driver implements `setQueryTimeout`; the H2 driver, for example, checks the timeout of each query instead of the entire JDBC batch. Default: `0`. Example: `10` |
| Num Rows Fetched per Round Trip | The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers that default to a low fetch size (e.g., Oracle defaults to 10 rows). Example: `4` |
| Optional Session Open SQL | Custom SQL statement (or PL/SQL block) executed after each database session is opened to the remote database and before reading data begins. Use this to run session-initialization code. Example: `option("sessionInitStatement", """BEGIN execute immediate 'alter session set "_serial_direct_read"=true'; END;""")` |
| Custom Schema if Any | Custom schema to use when reading data from the JDBC connector. You can specify a subset of the fields; the remaining fields use the default type mapping. Column names must be identical to the corresponding column names of the database table; the Spark SQL data types you give override the defaults. Example: `id DECIMAL(38, 0), name STRING` |
| Pushdown Predicate | Enables or disables predicate push-down into the JDBC data source. If enabled, Spark pushes filters down to the JDBC data source as far as possible; otherwise no filter is pushed down and all filtering is handled by Spark. Predicate push-down is usually turned off when Spark performs the predicate filtering faster than the JDBC data source. Default: `true` |
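
For orientation, here is a minimal sketch of a plain JDBC read using Spark's built-in `jdbc` format, which exposes the same options this source configures. The URL, driver, table, and credentials are placeholders, not values from this page:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-source-sketch").getOrCreate()

# Plain JDBC read; url, driver, dbtable, user, and password are placeholders.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "public.orders")  # or .option("query", "SELECT ...") -- not with partitioning
    .option("user", "reader")
    .option("password", "secret")
    .load()
)

# Select Fields, Filter Expression, and Distinct Values behave like ordinary
# projections, WHERE clauses, and DISTINCT on the loaded data.
subset = df.select("id", "date", "amount").where("date = '2022-01-01'").distinct()
```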
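
The four partitioning properties correspond to Spark's `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options. A sketch of a parallel read, assuming a hypothetical numeric `id` column and placeholder connection details:

```python
# Parallel read; assumes the SparkSession `spark` from the first sketch.
# All four partitioning options must be set together. lowerBound and
# upperBound only decide the partition stride; every row is still read.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "reader")
    .option("password", "secret")
    .option("partitionColumn", "id")  # must be a numeric, date, or timestamp column
    .option("lowerBound", "1")
    .option("upperBound", "1000000")
    .option("numPartitions", "6")  # also caps concurrent JDBC connections
    .load()
)
```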
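
Query Timeout, Num Rows Fetched per Round Trip, and Optional Session Open SQL correspond to Spark's `queryTimeout`, `fetchsize`, and `sessionInitStatement` options. A sketch assuming an Oracle source (connection details are placeholders; the PL/SQL block is the example from the table above):

```python
# Tuning and session options; assumes the SparkSession `spark` from the
# first sketch. Connection details are placeholders for an Oracle source.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db.example.com:1521/ORCL")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .option("dbtable", "ORDERS")
    .option("queryTimeout", "10")  # seconds; 0 means no limit
    .option("fetchsize", "1000")  # rows per round trip; Oracle defaults to 10
    # Runs once per session, before reading starts.
    .option(
        "sessionInitStatement",
        """BEGIN execute immediate 'alter session set "_serial_direct_read"=true'; END;""",
    )
    .load()
)
```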
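
Custom Schema and Pushdown Predicate correspond to Spark's `customSchema` and `pushDownPredicate` options. A sketch, again with placeholder connection details:

```python
# Type-mapping override and push-down control; assumes the SparkSession
# `spark` from the first sketch.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "reader")
    .option("password", "secret")
    .option("customSchema", "id DECIMAL(38, 0), name STRING")  # partial; other columns keep default types
    .option("pushDownPredicate", "false")  # all filtering now happens in Spark
    .load()
)
```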