Metadata Propagation: It Makes Your Data Integration Jobs Much Easier
In CloverDX 4.0, along with subgraphs, we introduced another very interesting feature: metadata propagation, which makes your data integration jobs much easier to prepare. Previously, you needed to assign metadata manually to every edge. If you changed metadata in a graph, you then had to manually re-assign the new metadata to each edge.
Now, once you assign metadata to an edge, CloverDX tries to push it from left to right, and then from right to left, filling in each edge where it expects the metadata to fit. This significantly reduces the need for metadata management. And if you change metadata somewhere in the graph, the metadata everywhere else in the graph adjusts automatically to reflect the change.
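To make the two-pass idea concrete, here is a minimal toy sketch in Python. It is not CloverDX's actual algorithm: it assumes a simple linear chain of edges, represents metadata by a name only, and uses a hypothetical `passthrough` flag to say whether the component between two neighbouring edges keeps the record structure unchanged.

```python
from typing import List, Optional


def propagate(edges: List[Optional[str]], passthrough: List[bool]) -> List[Optional[str]]:
    """Fill empty edges in a linear chain of edges (toy model, not the real engine).

    edges[i]       -- metadata name on edge i, or None if nothing is assigned
    passthrough[i] -- True if the component between edge i and edge i + 1
                      leaves the record structure unchanged (an assumption
                      made for this sketch)
    """
    result = list(edges)

    # Pass 1: left to right, copy metadata forward onto empty edges.
    for i in range(1, len(result)):
        if result[i] is None and passthrough[i - 1]:
            result[i] = result[i - 1]

    # Pass 2: right to left, fill edges that are still empty.
    for i in range(len(result) - 2, -1, -1):
        if result[i] is None and passthrough[i]:
            result[i] = result[i + 1]

    return result


# Only the second edge has metadata; the last component changes the structure.
chain = [None, "Customer", None, None]
print(propagate(chain, [True, True, False]))
# ['Customer', 'Customer', 'Customer', None]
```

Because the sketch only ever fills empty slots, metadata that is already assigned is never overwritten, and an edge that cannot be reached through pass-through components stays empty.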
What does this mean for you?
A whole lot of time saved that you would otherwise spend placing metadata on edges: CloverDX 4.0 does this for you the majority of the time. This feature also reduces the room for error when assigning metadata in a graph and makes changes in complex graphs much easier, as you really only have to change one important edge and the rest is propagated automatically. And last but not least, it works as a fail-safe, making sure you get output with the expected metadata at the end of the graph.
Also in CloverDX 4.0, metadata can be embedded in components or in subgraphs, which means that the metadata will often come pre-made for you, either from a component or from a previously built subgraph. In these cases, CloverDX takes the metadata and propagates it through the rest of the graph.
When CloverDX propagates metadata, a yellow pop-up blinks on the screen to show you where the metadata will be assigned. Automatically assigned metadata is shown with a dashed grey line. Of course, there is always the option of assigning metadata to any particular edge yourself. Manually assigned metadata is represented by a solid line and always has priority over automatically propagated metadata.
Different types of metadata propagation: an edge with manually assigned metadata (solid line), an edge with no metadata (red line), and an edge with automatically assigned metadata (grey long-dashed line).
Metadata propagation types
There are four types of metadata propagation. The first three are automatic; they assign metadata based on the components used, the type of graph, and the type of data. The last one, explicit metadata propagation, is semi-automated: it is driven by the user, who specifies the metadata for an edge by selecting a reference edge. As a result, this edge will always have the same metadata as the referenced edge.
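The reference-edge idea can be pictured with a small Python sketch. This is an illustration only, not CloverDX code: the `Edge` class, its fields, and the edge names are hypothetical, and metadata is again represented by a name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    name: str
    metadata: Optional[str] = None      # metadata assigned directly to this edge
    reference: Optional["Edge"] = None  # edge whose metadata this edge reuses

    def effective_metadata(self) -> Optional[str]:
        """Follow the chain of references and return the metadata found at its end."""
        seen = set()
        edge = self
        while edge.reference is not None:
            if id(edge) in seen:
                raise ValueError("circular edge reference")
            seen.add(id(edge))
            edge = edge.reference
        return edge.metadata


source = Edge("reader_out", metadata="Customer")
follower = Edge("sorter_out", reference=source)
print(follower.effective_metadata())  # Customer

# Changing the referenced edge is reflected on the referencing edge.
source.metadata = "CustomerV2"
print(follower.effective_metadata())  # CustomerV2
```

The referencing edge stores no metadata of its own in this sketch, which is why it can never drift away from the edge it points to.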
Metadata propagation and subgraphs
Where metadata propagation really shines is with subgraphs. When you create a subgraph, you can choose the metadata that is required for its input or output (or even for both), so the subgraph already has metadata embedded inside it. This is especially handy when you are preparing customized connectors for CloverDX that tap into your data, because you can prescribe what kind of metadata the connector produces. When you connect your subgraph to other components, this metadata automatically propagates throughout your graph. And when another user works with that subgraph, they won't need to figure out and prepare the metadata for the components around it: the metadata will simply be there, automatically propagated throughout the graph.
Another very cool and clever way to use subgraphs is as a template for your metadata (i.e. as a data target or a data source). You can create a customized component that contains a metadata template and wrap it up for frequent reuse in any of your projects. After you add this component to a graph, its metadata is propagated throughout the graph.
So, for instance, let's say you have a specific Excel format that needs to be written at the end of many of your graphs, with specific settings encoded in it. A savvy solution would be to wrap up that SpreadsheetDataWriter as a subgraph, and then configure it so that it insists on a specific metadata format as its input. This SpreadsheetDataWriter subgraph would then create a unified end product across all projects, so that users and team members won't have to worry about using the wrong format.
As you can see, the subgraph is the source of metadata for the whole graph.
Using manually assigned metadata
Although automatically propagated metadata is a great time saver and preferable in the majority of graphs, there are times when we suggest assigning metadata manually. This is true for longer data transformations that require a strictly defined output (for instance, you agreed on a specific format of an Excel sheet as the end product of a data transformation). In these situations, it is advisable to manually set the metadata on the graph's last edge. Even though metadata propagation predicts and spreads your metadata throughout your transformation, you'll still want to be completely sure that the metadata has propagated exactly as needed (in case someone changed a subgraph in a way that alters the metadata without your knowledge).
This is only a basic introduction to metadata propagation. In the future, we will bring you more blog posts and videos that explain metadata propagation in greater depth, with examples and use cases.
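To see why pinning the last edge is useful, here is a small Python sketch that compares an agreed output format with whatever metadata arrived at the end of a graph. It is only an illustration of the idea: the `metadata_mismatches` helper, the field names, and the representation of metadata as (name, type) pairs are assumptions made for this example, not a CloverDX API.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, field type), illustrative only


def metadata_mismatches(expected: List[Field], actual: List[Field]) -> List[str]:
    """Return human-readable differences between two field lists."""
    problems = []
    if len(expected) != len(actual):
        problems.append(f"expected {len(expected)} fields, got {len(actual)}")
    # Compare fields position by position over the shared length.
    for position, (want, got) in enumerate(zip(expected, actual), start=1):
        if want != got:
            problems.append(
                f"field {position}: expected {want[0]}:{want[1]}, got {got[0]}:{got[1]}"
            )
    return problems


# The format agreed for the final Excel sheet.
agreed = [("id", "integer"), ("name", "string"), ("total", "decimal")]
# What propagation delivered after someone changed a subgraph upstream.
propagated = [("id", "integer"), ("name", "string"), ("total", "number")]

print(metadata_mismatches(agreed, propagated))
# ['field 3: expected total:decimal, got total:number']
```

Manually assigned metadata on the last edge plays the role of `agreed` here: an upstream change shows up as a visible mismatch instead of silently altering the output.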