Pipelines


Data pipelines move data between systems: they read data from sources, transform it, and output it in formats that make analysis easy. Pipelines can also perform a variety of other important data-related tasks, such as data quality checks and schema evolution. Key features include:
  • Canvas with drag-and-drop interface for defining pipelines
  • Pre-built connectors to access data from various data sources and output to multiple destinations
  • Use SQL and other provided processors to transform data (see the sketch after this list)
  • Validate and test pipelines locally before deploying to a Spark runtime
  • Support for Databricks, AWS Glue, AWS EMR, Spark on Kubernetes, and other Spark runtimes
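
As an illustration, a SQL processor might filter out bad records and aggregate the remainder before the pipeline writes to a destination. This is a minimal sketch; the raw_orders table and its columns are hypothetical:

    -- Hypothetical transform: keep valid rows and compute a daily
    -- total per customer (table and column names are illustrative).
    SELECT
        customer_id,
        CAST(order_ts AS DATE) AS order_date,
        SUM(amount)            AS daily_total
    FROM raw_orders
    WHERE amount IS NOT NULL AND amount > 0  -- simple data quality check
    GROUP BY customer_id, CAST(order_ts AS DATE)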



Variables

Variables are placeholders with values that you define here and use within the pipeline. When the pipeline runs, each variable is replaced with its value.
When you create a job, you can override the variable values defined in the pipeline. For example, you can define a directory variable and set it to different values depending on which environment the job runs in. To use the directory variable within a pipeline, reference it as ${directory}; you can then supply different values for the development, staging, and production runtimes.
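
For example, a SQL processor could read its input from a path built from the variable, here using Spark SQL's direct file query syntax. The bucket layout below is hypothetical; at run time the variable's value (or the job's override) is substituted before the query executes:

    -- ${directory} is replaced before execution, e.g.:
    --   development: ${directory} -> s3://acme-data/dev
    --   production:  ${directory} -> s3://acme-data/prod
    SELECT *
    FROM parquet.`${directory}/events`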

Validate

Design Document

Versions