Schema

Schema processor
Properties
Properties supported by this processor are shown below (* indicates required fields).
Property
Description
Name * 
Name of the processor
Description
Description of the processor
Processing Mode
Select for batch processing; deselect for streaming. Selecting 'Batch' sets the switch to true, and selecting 'Streaming' sets it to false. Default: true
Data Format * 
Format of the data: XML, Avro, JSON, ORC, Parquet, or Protobuf. Example: json, avro, xml, parquet. Default: xml
Compatibility
Select backward, forward, or full compatibility. Default: backward
Existing Schema
Select the read schema of an existing table to compare against. The schema format should be xsd, avsc, json, or DDL. Examples: s3://LocationOfReadSchema.json; hdfs://hdfsLocationOfReadSchema.avsc; COL1 STRING, COL2 INT
Existing Schema Base Directory
Select the base directory of an existing schema. The schema format should be xsd, avsc, json, or DDL. Examples: s3://baseSchema/; hdfs://baseSchema
Infer New Schema
Infers the schema automatically from the incoming data. Default: false
Schema Location
Path where the differences between the new and existing schemas are written
Cache
MEMORY_ONLY: Persist data in memory only, in deserialized format.
MEMORY_AND_DISK: Persist data in memory; if not enough memory is available, evicted blocks are stored on disk.
MEMORY_ONLY_SER: Same as MEMORY_ONLY, but persists data in serialized format. This is generally more space-efficient than the deserialized format, but more CPU-intensive to read.
MEMORY_AND_DISK_SER: Same as MEMORY_AND_DISK, but persists data in serialized format.
DISK_ONLY: Persist the data partitions only on disk.
MEMORY_ONLY_2, MEMORY_AND_DISK_2: Same as the levels above, but replicate each partition on two cluster nodes.
OFF_HEAP: Similar to MEMORY_ONLY_SER, but stores the data in off-heap memory. This requires off-heap memory to be enabled.
Default: NONE
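
The Compatibility and Schema Location properties both depend on comparing a new schema against an existing one. As a rough illustration only, the Python sketch below shows one simplified way such a comparison could work, using Avro-style resolution rules: a reader schema can decode data from a writer schema if every reader field either exists in the writer with the same type or has a default value. The names `Field`, `schema_diff`, `can_read`, and `is_compatible`, the dict-based schema representation, and the mode strings are assumptions made for this example; they are not the processor's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field description: a type name and whether a default exists."""
    dtype: str
    has_default: bool = False


def schema_diff(existing, new):
    """List field names that were added, removed, or changed type."""
    common = set(existing) & set(new)
    return {
        "added": sorted(set(new) - set(existing)),
        "removed": sorted(set(existing) - set(new)),
        "changed": sorted(n for n in common if existing[n].dtype != new[n].dtype),
    }


def can_read(reader, writer):
    """True if data written with `writer` can be decoded with `reader` (simplified rule)."""
    for name, field in reader.items():
        if name in writer:
            if writer[name].dtype != field.dtype:
                return False  # no type promotion in this sketch
        elif not field.has_default:
            return False  # reader expects a field the writer never produced
    return True


def is_compatible(existing, new, mode="backward"):
    """Check the new schema against the existing one under the given mode."""
    if mode == "backward":   # new schema reads data written with the existing schema
        return can_read(new, existing)
    if mode == "forward":    # existing schema reads data written with the new schema
        return can_read(existing, new)
    if mode == "full":       # both directions must hold
        return can_read(new, existing) and can_read(existing, new)
    raise ValueError(f"unknown compatibility mode: {mode}")


# Existing schema, equivalent to the DDL example "COL1 STRING, COL2 INT".
existing = {"COL1": Field("string"), "COL2": Field("int")}

# New schema adding COL3 with a default, and a variant adding it without one.
new_with_default = {**existing, "COL3": Field("string", has_default=True)}
new_without_default = {**existing, "COL3": Field("string")}

print(schema_diff(existing, new_with_default))
print(is_compatible(existing, new_with_default, "full"))
print(is_compatible(existing, new_without_default, "backward"))
```

In this sketch, adding a field with a default passes all three modes, while adding a field without a default fails the backward check but still passes the forward check, because the existing schema simply ignores the extra field.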