DJ Concepts

https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/dj-concepts/
Recent content in DJ Concepts

Nodes
https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/0.1.0/dj-concepts/nodes/
In DJ, nodes play a central role. Understanding the relationships between nodes is key to understanding how DJ works. All node types are similar in many ways. Let’s start by covering their similarities.

Similarities Between Node Types
A summary of things that are true of all nodes:
- All nodes have a name and a description
- All nodes have a schema defined as named columns, each with a specific type
- All nodes have a system-defined state of either valid or invalid
- All nodes have a user-defined mode of either draft or published
- All nodes track the parent nodes they depend on

In addition to these universal statements about nodes, there are things that are common to a subset of node types.

Dimension Discovery
https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/0.1.0/dj-concepts/dimension-discovery/
In data warehousing, dimensions are parts of the data model that play a huge role in making data understandable and intuitively composable. If you want to learn more about your users, it’s convenient to have a users dimension table with all of the attributes that belong to each user.
If your company is expanding into global markets, maintaining a country dimension table that keeps track of business-relevant data for each individual country will come in handy.

Node Dependencies
https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/0.1.0/dj-concepts/node-dependencies/
Relationships between nodes are tracked by a DJ server. A node’s position in the DJ DAG is determined by the node’s definition, particularly its query. Node queries reference other DJ nodes, and these references define the upstream and downstream dependencies of any given node. In other words, given a node, the nodes it queries are its upstream dependencies, and the nodes that query it are its downstream dependencies.

Source Node Dependency Validation
Source nodes make up the foundational layer that other nodes are built upon.

Materialization
https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/0.1.0/dj-concepts/materialization/
Cube Nodes
When we attach a materialization config to a cube node (instructions here), we are asking DJ to prepare for the materialization of the cube’s underlying data into an OLAP database (such as Druid). This enables low-latency metric queries across all defined dimensions in the cube.
However, many such databases are only configured to work with simple aggregations, so DJ will break down each metric expression into its constituent simple aggregation measures prior to materialization.

Metric Decomposition
https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/0.1.0/dj-concepts/metric-decomposition/
Metric decomposition is the process by which DataJunction breaks down complex metric expressions into simpler, pre-aggregatable components. This enables efficient materialization to OLAP databases while preserving the mathematical correctness of metrics when queried at different dimension granularities.

Why Decomposition is Necessary
OLAP databases like Druid are optimized for rollup aggregations (e.g., SUM, COUNT, MIN, MAX) but cannot directly compute complex metrics like averages or rates from pre-aggregated data. For example:

Table Reflection
https://6a023ef24bddd400077b9264--thriving-cassata-78ae72.netlify.app/docs/0.1.0/dj-concepts/table-reflection/
Source nodes represent external tables that exist in a data warehouse or database. Of course, those tables are not under the management of the DJ server and are often the result of upstream data pipelines. This means changes to those tables can happen at any moment: columns may be dropped or renamed, types may be changed, and entire tables may be dropped, renamed, or moved. It’s important that DJ is aware of these changes so that it can understand their effects on downstream nodes.
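
To make the kinds of table changes described under Table Reflection concrete, here is a minimal Python sketch of comparing a source node's recorded columns with the columns currently found on the external table. It is not DJ's implementation: the names `SchemaDiff` and `diff_schema`, the column names, and the type strings are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDiff:
    """Hypothetical summary of how a reflected table differs from a recorded schema."""

    dropped: dict[str, str]              # column -> recorded type, no longer on the table
    added: dict[str, str]                # column -> type, new on the table
    retyped: dict[str, tuple[str, str]]  # column -> (recorded type, reflected type)

    @property
    def is_breaking(self) -> bool:
        # Assumption: dropped or retyped columns may affect downstream nodes,
        # while purely additive changes do not.
        return bool(self.dropped or self.retyped)


def diff_schema(recorded: dict[str, str], reflected: dict[str, str]) -> SchemaDiff:
    """Compare column-name -> type mappings; a rename appears as a drop plus an add."""
    dropped = {col: typ for col, typ in recorded.items() if col not in reflected}
    added = {col: typ for col, typ in reflected.items() if col not in recorded}
    retyped = {
        col: (typ, reflected[col])
        for col, typ in recorded.items()
        if col in reflected and reflected[col] != typ
    }
    return SchemaDiff(dropped=dropped, added=added, retyped=retyped)


# Illustrative data only: one column retyped and one renamed since the node was defined.
recorded = {"user_id": "int", "country": "string", "signup_ts": "timestamp"}
reflected = {"user_id": "bigint", "country_code": "string", "signup_ts": "timestamp"}

diff = diff_schema(recorded, reflected)
print(diff.dropped, diff.added, diff.retyped, diff.is_breaking)
```

A comparison by column name alone cannot tell a rename from an unrelated drop and add, which is why the sketch reports the renamed column in both `dropped` and `added`.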