DataMorph™ is a visual data transformation and orchestration platform that streamlines and accelerates the development and deployment of modern data applications.
Built on Apache Spark, DuckDB, Kafka, Delta Lake, and Airflow, DataMorph™ can transform data at scale across multiple cloud infrastructures. Its simple, user-friendly interface for configuring parameters, combined with powerful APIs and dataflow automation capabilities, lets you streamline development and deployment in a low-code/no-code setting.
DataMorph™ is a visually driven data engineering platform that provides a graphical user interface for defining and managing data pipelines and workflows. It is particularly useful for users who are less familiar with coding or who need to build and modify data products quickly and easily.
DataMorph features include:
- Canvas with drag-and-drop interface for defining data pipelines and workflows
- Connectors for various data sources and destinations
- Pre-built data transformation and data quality processors
- Job scheduling and automation
- Monitoring, alerting, and error handling
- Collaboration and version control support
- Support for deployment to various computing runtimes in the cloud
- Role-based access control for pipelines, workflows, and projects
The following is an overview of the core concepts and terminology in DataMorph™:
- A Source or Destination is a location or connection from which data can be retrieved or to which it can be persisted. Some examples of data sources include CSV, Parquet, and XML files, and databases such as MySQL or Oracle.
- A Processor or Transformer is a component that modifies or processes data in some way before it is used for further analysis or processing.
- A Data Pipeline can be used to move data between systems and to read, transform, and output data into formats that make data analysis easy. Data pipelines can also perform a variety of other important data-related tasks, such as data quality checks and schema evolution (see the PySpark sketch after this list).
- A Spark application can be a PySpark program or a Java/Scala Spark program that is packaged into a JAR file.
- PySpark is the Python API for Apache Spark, an open-source, distributed computing framework and set of libraries for real-time, large-scale data processing.
- Orchestration is the process of managing and coordinating the execution of data pipelines. It involves scheduling and automating the various tasks and processes in a data pipeline, as well as error handling, alerting, and monitoring (see the workflow sketch after this list).
- A Task is a workflow component that executes a service such as a bash script, a program written in Java or Python, a DataMorph™ pipeline, an email notification, a Slack message, or conditional branching.
- A Job is a mechanism to execute a pipeline or a workflow on the runtime. You can define actual values for variables and parameters while creating a job.
- A Run is an instance of an execution of a job.
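
To make the pipeline, processor, and PySpark concepts above concrete, here is a minimal PySpark sketch of what a simple pipeline does: read a CSV source, apply a transformation and a basic data quality check, and persist the result as Parquet. The paths, column names, and application name are hypothetical; DataMorph™ pipelines are assembled visually on the canvas, so this code only approximates the work such a pipeline performs.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: paths and column names are illustrative only.
spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# Source: read raw CSV data.
orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders.csv")

# Processor: cast types, derive a column, and apply a simple data quality check.
cleaned = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
    .withColumn("order_date", F.to_date("order_ts"))
)

# Destination: persist the transformed data as Parquet for easy analysis.
cleaned.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")

spark.stop()
```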
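
Similarly, since DataMorph™ is built with Airflow, a workflow with scheduling, tasks, and notifications can be pictured as a small Airflow DAG. This is not the DataMorph™ API; it is only an illustrative sketch of the orchestration concepts, and the DAG id, schedule, task names, and callables are assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def notify():
    # Placeholder for an email or Slack notification step.
    print("workflow finished")


# Hypothetical workflow: dag_id, schedule, and task names are illustrative only.
with DAG(
    dag_id="example_workflow",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # job scheduling
    catchup=False,
) as dag:
    # Task: run a bash script.
    extract = BashOperator(task_id="extract", bash_command="echo extracting data")

    # Task: run a Python callable (this could equally submit a Spark pipeline).
    transform = PythonOperator(task_id="transform", python_callable=lambda: print("transforming"))

    # Task: send a notification once the upstream tasks succeed.
    alert = PythonOperator(task_id="notify", python_callable=notify)

    # Task dependencies define the order of execution within the workflow.
    extract >> transform >> alert
```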