DataMorph™ is a visual data transformation and orchestration platform that streamlines and accelerates the development and deployment of modern data applications.
Built on Apache Spark, DuckDB, Kafka, Delta Lake, and Airflow, DataMorph™ can transform data at scale across multiple cloud infrastructures. Its simple, user-friendly interface for configuring parameters, combined with powerful APIs and dataflow automation capabilities, lets you streamline development and deployment in a low-code/no-code setting.
DataMorph™ is a visually driven data engineering platform that provides a graphical user interface for defining and managing data pipelines and workflows. It is particularly useful for users who are less familiar with coding or who need to build and modify data products quickly and easily.
DataMorph features include:
- Canvas with drag-and-drop interface for defining data pipelines and workflows
- Connectors for various data sources and destinations
- Pre-built data transformation and data quality processors
- Job scheduling and automation
- Monitoring, alerting, and error handling
- Collaboration and version control support
- Support for deployment to various computing runtimes in the cloud
- Role-based access control for pipelines, workflows, and projects
The following is an overview of the core concepts and terminology in DataMorph™:
- A Source or Destination is a location or connection from which data can be retrieved or to which it can be persisted. Some examples of data sources include CSV, Parquet, and XML files, and databases such as MySQL or Oracle.
- A Processor or Transformer is a component that modifies or processes data in some way before it is used for further analysis or processing.
- A Data Pipeline can be used to move data between systems and to read, transform, and output data into formats that make data analysis easy. Data pipelines can also perform a variety of other important data-related tasks, such as data quality checks and schema evolution (see the PySpark sketch after this list).
- A Spark application can be a PySpark program or a Java/Scala Spark program that is packaged into a JAR file.
- PySpark is the Python API for Apache Spark, an open-source, distributed computing framework and set of libraries for real-time, large-scale data processing.
- Orchestration is the process of managing and coordinating the execution of data pipelines. It involves scheduling and automating the various tasks and processes in a data pipeline, as well as error handling, alerting, and monitoring (see the workflow sketch after this list).
- A Task is a workflow component that executes a service such as a bash script, a program written in Java or Python, a DataMorph™ pipeline, an email notification, a Slack message, or conditional branching.
- A Job is a mechanism to execute a pipeline or a workflow on the runtime. You can define actual values for variables and parameters while creating a job.
- A Run is an instance of an execution of a job.
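
To make the pipeline, processor, and PySpark concepts above concrete, here is a minimal PySpark sketch of what a simple pipeline does: read a CSV source, apply a transformation and a basic data quality check, and persist the result as Parquet. The paths, column names, and application name are hypothetical; DataMorph™ pipelines are assembled visually on the canvas, so this code only approximates the work such a pipeline performs.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: paths and column names are illustrative only.
spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# Source: read raw CSV data.
orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders.csv")

# Processor: cast types, derive a column, and apply a simple data quality check.
cleaned = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
    .withColumn("order_date", F.to_date("order_ts"))
)

# Destination: persist the transformed data as Parquet for easy analysis.
cleaned.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")

spark.stop()
```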
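
Similarly, since DataMorph™ is built with Airflow, a workflow with scheduling, tasks, and notifications can be pictured as a small Airflow DAG. This is not the DataMorph™ API; it is only an illustrative sketch of the orchestration concepts, and the DAG id, schedule, task names, and callables are assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def notify():
    # Placeholder for an email or Slack notification step.
    print("workflow finished")


# Hypothetical workflow: dag_id, schedule, and task names are illustrative only.
with DAG(
    dag_id="example_workflow",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # job scheduling
    catchup=False,
) as dag:
    # Task: run a bash script.
    extract = BashOperator(task_id="extract", bash_command="echo extracting data")

    # Task: run a Python callable (this could equally submit a Spark pipeline).
    transform = PythonOperator(task_id="transform", python_callable=lambda: print("transforming"))

    # Task: send a notification once the upstream tasks succeed.
    alert = PythonOperator(task_id="notify", python_callable=notify)

    # Task dependencies define the order of execution within the workflow.
    extract >> transform >> alert
```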