Data Integration pipelines, like all software systems, have a lifecycle that includes transitions between development, test and production environments. This article describes the lifecycle and related deployment model used to create and manage CloverDX Data Integration jobs.
CloverDX is designed to simplify managing data integration pipelines through all phases of their lifecycle. CloverDX offers:
The CloverDX platform includes design, runtime and automation tools that operate on or with jobs designed in its ecosystem.
CloverDX Designer – visual development environment (IDE) used to create, test and manage pipeline files.
CloverDX Server – production environment that automates and monitors the execution of data pipelines, creates and manages logs, and dispatches alerts. In short, Server makes it possible to build fully autonomous data pipelines.
CloverDX Runtime – low-level execution environment that runs the jobs you create in CloverDX Designer. It has no user interface and is controlled entirely by Designer or Server through its APIs. The same CloverDX Runtime is embedded in both CloverDX Designer and CloverDX Server, ensuring that jobs behave the same way in both environments.
CloverDX organizes all resources for related data pipelines into a CloverDX project. A CloverDX Project is a directory that follows pre-defined conventions for where different file types are located. It typically contains and organizes all pipelines related to a certain use case (or multiple use cases) and their dependencies (like data layouts, parameters and more).
Project files are typically XML documents or property files. These files are all created and edited in CloverDX Designer, managed in production by a CloverDX Server, and used during execution by the CloverDX Runtime.
File type | Extension | Format | Purpose |
---|---|---|---|
Graph | .grf | XML | Defines record level data processing. |
Subgraph | .sgrf | XML | Encapsulates a collection of graph components as a reusable unit. |
Jobflow | .jbf | XML | Defines pipeline orchestration (a workflow), typically executing graphs, subgraphs and other jobflows in a prescribed order while tracking dependencies, handling errors and more. Jobflows can only be executed on CloverDX Server. |
Data Service | .rjob | XML | Defines HTTP API endpoints published on CloverDX Server. These endpoints can use graphs, subgraphs or jobflows to implement their functionality. |
Metadata | .fmt | XML | Defines the layouts and additional properties of source, target and intermediate data structures, including field names, types, formats and more. |
Connection | .cfg | Text | Contains connection information for data sources and targets (e.g. username, password, host, port, database name). |
Parameter | .prm | XML | Contains user-manageable parameters that can be used to change behavior of any job or component. |
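Because every file type in the table is identified by its extension, a project directory can be inventoried with a few lines of code, for example to check what a project contains before deploying it. The following is a minimal sketch that encodes the table as a lookup; the function names `classify` and `inventory` are hypothetical, and any subdirectory names used with them (such as `graph/`) are illustrative only.

```python
from pathlib import Path

# Extension-to-file-type mapping copied from the project file table.
CLOVERDX_FILE_TYPES = {
    ".grf": "Graph",
    ".sgrf": "Subgraph",
    ".jbf": "Jobflow",
    ".rjob": "Data Service",
    ".fmt": "Metadata",
    ".cfg": "Connection",
    ".prm": "Parameter",
}


def classify(path):
    """Return the CloverDX file type for a path, or None if the extension is not in the table."""
    return CLOVERDX_FILE_TYPES.get(Path(path).suffix.lower())


def inventory(project_dir):
    """Group every recognized file under project_dir by its file type.

    Paths are returned relative to project_dir, using forward slashes.
    """
    groups = {}
    for p in sorted(Path(project_dir).rglob("*")):
        if p.is_file():
            kind = classify(p)
            if kind is not None:
                groups.setdefault(kind, []).append(p.relative_to(project_dir).as_posix())
    return groups
```

For example, `classify("graph/LoadCustomers.grf")` returns `"Graph"`, while files with extensions outside the table are ignored by `inventory`.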
CloverDX Designer includes a Project Explorer view that provides hierarchical access to all the files within a project.
A CloverDX project begins its life in CloverDX Designer. Designer is based on the Eclipse IDE platform and uses the concept of a workspace to contain one or more projects with common settings (e.g., tabs vs. spaces, interface color and font settings, general tool layout). For CloverDX, however, the workspace and its referenced projects are simply a hierarchy of directories on disk. As CloverDX jobs are designed, the related files are created, edited and stored in this directory structure.
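Since the workspace is just a set of directories, it can be inspected with ordinary file-system tools. The sketch below lists the projects in a workspace under two assumptions: each project is an immediate subdirectory of the workspace, and it contains the Eclipse `.project` descriptor file. Projects referenced from other locations on disk are not detected, and the function name `find_projects` is hypothetical.

```python
from pathlib import Path


def find_projects(workspace_dir):
    """List project folders located directly inside an Eclipse-style workspace.

    Assumption: a project is an immediate subdirectory containing a ".project"
    descriptor. Projects linked from elsewhere on disk are not found.
    """
    projects = []
    for child in sorted(Path(workspace_dir).iterdir()):
        # Skip hidden entries such as the workspace's own ".metadata" folder.
        if child.is_dir() and not child.name.startswith("."):
            if (child / ".project").is_file():
                projects.append(child.name)
    return projects
```

Directories without a `.project` file (scratch folders, for instance) are skipped, which keeps the result limited to folders Eclipse would recognize as projects.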
CloverDX projects can be developed “offline” in Designer alone, or Designer can connect to Server and manage the projects there directly.
Once a project has reached a certain level of maturity, it is typically deployed to a CloverDX Server, where it can run automatically and where orchestration and automation features can be used. There are several ways to deploy a project to CloverDX Server.
The simplest way to deploy a project to Server is directly from within CloverDX Designer. Designer contains controls that allow deployment of any project to any accessible Server. To do this, CloverDX Designer establishes an HTTP (or HTTPS) connection to CloverDX Server and copies the project files from the Designer workspace to a Server sandbox.
A sandbox is essentially CloverDX Server’s equivalent of a project – it is a directory on the Server that contains job files along with additional properties for UAC, logging, parallelization settings and more.
When deployment is complete, we say that the sandbox and project are connected, and a bidirectional synchronization is set up between the two. Changes made to project files in either Server or Designer are automatically reflected in the other environment. Designer typically initiates the synchronization process on:
Designer includes an import/export feature that can import or export individual files or entire projects, either as a zip archive or as a set of files on disk. Server offers the same import/export feature. Files can be exported explicitly from one environment and imported into the other using the Designer and Server GUIs.
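When a project has to travel as a single file, for example to be archived or handed to another team, it can also be packed outside the GUIs. The following is a minimal sketch using Python's standard `zipfile` module; it is not the Designer or Server export feature, and it does not claim to reproduce the exact archive layout those tools produce or expect on import. The function name `export_project` is hypothetical.

```python
import zipfile
from pathlib import Path


def export_project(project_dir, archive_path):
    """Pack every file under project_dir into a zip archive.

    Entries are stored relative to the project's parent directory, so the
    top-level entry in the archive is the project folder itself. Write the
    archive outside project_dir so it does not end up inside itself.
    """
    root = Path(project_dir).resolve()
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                zf.write(p, arcname=p.relative_to(root.parent).as_posix())
    return archive_path
```

Keeping the project folder as the top-level entry means the archive unpacks into a single directory rather than scattering files into the current location.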
CloverDX Server provides a public web API that allows the creation of sandboxes and the upload/download of files into those sandboxes. The documentation for this API is available on any installed instance of CloverDX Server at
http://[host]:[port]/clover/api/rest/[api-version]/docs.html
This API can also be called from arbitrary scripts or from continuous integration tools such as Jenkins to deploy projects to Server.
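A deployment script typically starts by locating the API documentation on the target Server and confirming that the Server responds. The sketch below builds the documentation URL from the pattern given above and performs a simple reachability check using only the Python standard library. The function names, the example host, port and the api-version value are assumptions; the actual endpoints for creating sandboxes and uploading files are described in the Server's own documentation page and are deliberately not reproduced here.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def api_docs_url(host, port, api_version, scheme="http"):
    """Build the URL http://[host]:[port]/clover/api/rest/[api-version]/docs.html."""
    path = f"/clover/api/rest/{api_version}/docs.html"
    return urlunsplit((scheme, f"{host}:{port}", path, "", ""))


def docs_reachable(url, timeout=5.0):
    """Return True if the page answers with HTTP 200, e.g. as a CI pre-deployment check."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        return False
```

For example, `api_docs_url("localhost", 8080, "v1")` yields `http://localhost:8080/clover/api/rest/v1/docs.html`, where `"v1"` stands in for whatever API version your Server actually exposes. Passing `scheme="https"` covers Servers that only accept HTTPS.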
CloverDX projects can be managed with any common version control system. Because all files are plain text (XML or property files), they are ideally suited to be versioned, compared, reverted and merged. CloverDX Designer ships with a built-in connector for Git. Plugins for other version control systems, such as SVN or Microsoft’s Team Foundation Version Control, are available in the Eclipse Marketplace. Developers accustomed to working with their version control system from the command line can simply open a shell, navigate to their Designer workspace on disk and issue commands from there.
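To illustrate why plain-text job files suit version control, the sketch below produces a line-based unified diff between two versions of a file using Python's standard `difflib` module, the same kind of output a version control system shows. The XML fragment is hypothetical and simplified; it is not the actual CloverDX graph schema, and the function name `text_diff` is likewise illustrative.

```python
import difflib


def text_diff(old_text, new_text, old_name="a", new_name="b"):
    """Return a unified diff between two versions of a text file as one string."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Hypothetical, simplified XML fragment; not the real CloverDX graph schema.
OLD = """<Graph name="LoadCustomers">
  <Node id="READER" fileURL="data-in/customers.csv"/>
</Graph>
"""
NEW = OLD.replace("customers.csv", "customers_2024.csv")

if __name__ == "__main__":
    print(text_diff(OLD, NEW, "LoadCustomers.grf@v1", "LoadCustomers.grf@v2"))
```

Only the changed `Node` line appears as a removal and an addition; the surrounding lines are shown as unchanged context. This line-level granularity is what makes reviewing, reverting and merging job files practical.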
Complex projects created by teams of developers can be managed and deployed effectively with version control and multiple sandboxes, using a process like this:
The flexibility of this deployment method allows each team to define its own processes with regard to branch creation and management, code reviews and more. Larger teams typically need more formalized processes, while small teams (or lone developers) can rely on a simple, quick deployment model that minimizes overhead.
The approach described above is all that is needed to promote jobs between environments, e.g., from Dev to QA and then to Prod.
The general approach would be:
With its text-based project artifacts, an Eclipse-based Designer with built-in version control support, and Server APIs, the CloverDX platform supports a project lifecycle that enables your teams to effectively develop, collaborate on, test and operate all of your data pipelines.