Cassandra

Cassandra database data source

Properties

Properties supported by this source are listed below (* indicates a required field).
| Property | Description |
|----------|-------------|
| Name * | Name of the data source. |
| Description | Description of the data source. |
| Connection * | Pre-defined Cassandra connection. |
| Table * | Cassandra table name to query from. Example: `table_test` |
| Schema | Source schema to assist during the design of the pipeline. |
| Keyspace * | Cassandra keyspace to read data from. Example: `pass` |
| Select Fields / Columns | Comma-separated list of fields/columns to select from the source. Example: `firstName, lastName, address1, address2, city, zipcode`. Default: `*` |
| Filter Expression | SQL `WHERE` clause for filtering records; this is also used to load partitions from the source. Example: `date=2022-01-01`, `year = 22 and month = 6 and day = 2` |
| Distinct Values | Select rows with distinct column values. Default: `false` |
| Connections Per Executor | Minimum number of remote connections per host set on each executor JVM. The default value is estimated automatically from the total number of executors in the cluster. |
| Read Consistency Level | Consistency level to use when reading from Cassandra. |
| Concurrent Reads | Sets read parallelism for joins with Cassandra tables. Example: `512`. Default: `512` |
| Input Fetch Size in Rows | Number of CQL rows fetched per driver request. Example: `1000`. Default: `1000` |
| Input Reads per Second | Sets the maximum number of requests or pages per core per second; unlimited by default. Example: `10000`. Default: None |
| Input Split Size | Approximate amount of data to be fetched into a Spark partition. The minimum number of resulting Spark partitions is `1 + 2 * SparkContext.defaultParallelism`. Example: `1024`. Default: `512` |
| Input Metrics | Sets whether to record connector-specific metrics on write. Default: `true` |
| Enable Pushdown | Enables pushing down predicates to Cassandra when applicable. Default: `true` |
| Normalize Column Names | Normalizes column names by replacing the special characters `,;{}()&/\n\t=` and space with the given string. Example: `_` |
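To illustrate the Normalize Column Names property described above, the sketch below shows how such a replacement could work in plain Python. The function name and exact edge-case behavior are assumptions for illustration, not the connector's actual implementation; it simply substitutes each listed special character with the configured replacement string (here defaulting to `_`, as in the example).

```python
import re

# Characters the documentation lists as "special":
# , ; { } ( ) & / newline tab = and space.
SPECIAL_CHARS = r"[,;{}()&/\n\t= ]"

def normalize_column_name(name: str, replacement: str = "_") -> str:
    """Replace each listed special character with the replacement string.

    Hypothetical helper illustrating the documented behavior; the real
    connector may collapse runs of characters or handle other edge
    cases differently.
    """
    return re.sub(SPECIAL_CHARS, replacement, name)

print(normalize_column_name("first name"))   # first_name
print(normalize_column_name("city;zip(code)"))
```

Applying this to a source column such as `first name` yields `first_name`, giving column names that are safe to reference in downstream SQL without quoting.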