Dependency analysis provides valuable information about your graphs and data, how they are created, and how they change. This chapter describes dependency analysis and how to facilitate it successfully in your graphs.
Why use dependency analysis?
Dependency analysis can save you time and resources in a number of ways:
Code analysis and data lineage — Dependency analysis produces an up-to-date view of the current project. You can see what data and graphs already exist and how they were created, which may help you avoid duplicating development work or data. You can see how a field gets created and how its value changes within a graph, across graphs, and across projects (lineage). You can also assess the impact of planned changes: for example, if you were to add a field to a particular dataset, which graphs would be affected?
Quality control — Dependency analysis helps you assess whether a graph matches its original specifications, and whether it meets its goals. It also provides information about how a graph will run outside the development environment. For example, graphs without analysis warnings are more likely to run smoothly when deployed in the production environment or migrated to a different repository. Dependency analysis even detects certain types of runtime errors, such as DML parsing problems.
Transparency and accountability — The results of dependency analysis are useful to people throughout an organization, beyond the development process. Anyone who uses the organization’s data may be interested in discovering, for example, how a particular data set is derived. Employees may need to know why their latest weekly sales report suddenly looks different, or which fields are used to calculate a particular metric. In addition, the results of dependency analysis may be used to support compliance with regulations, such as Sarbanes-Oxley or Basel II, or quality initiatives, such as ISO 9000.
Running dependency analysis
You can run dependency analysis in any of the following ways:
Automatically at check-in: Analysis is performed automatically on a new or changed file when you check it in. In the GDE, the Check-in Wizard reports any dependency analysis warnings before check-in. At the command line, the air project import command reports dependency analysis warnings. With the -dry-run argument, you can run dependency analysis without checking in the files.
Files are checked in only if they are new or changed. Unchanged files are not checked in and therefore are not analyzed. (Forcing a check-in is possible but is usually best avoided.) If you change a generic graph but not its associated psets, the psets are not analyzed at check-in, even if you check in the entire project. To update the lineage, you must run dependency analysis on the psets themselves.
As you build a graph: From the GDE menu bar, choose File > Dependencies > Analyze to analyze the current graph or pset, or click the Analyze button on the toolbar. Warnings are reported on the Dependency Analysis tab of the Application Output window.
Explicitly at the command line: Analyze an entire project (or any subset of a project) by running the air project analyze-dependencies command.
TIP: To analyze all the psets for a particular graph, run air project analyze-dependencies for that graph, using the -referencing-files option.
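The two command-line routes described above can be wrapped in small helper functions. The following is only a sketch under stated assumptions: the function names and the AIR variable are hypothetical, the position of each option after the other arguments is assumed, and the arguments that identify the project and files are passed through unchanged because their exact form is not given here. Check the command reference for your installation before relying on it. By default the sketch prints each command instead of running it.

```shell
#!/bin/sh
# AIR is a hypothetical variable. It defaults to "echo air" so that the
# sketch only prints the commands; set AIR=air to actually run them.
AIR="${AIR:-echo air}"

# Report dependency analysis warnings without checking in any files.
# "$@" stands for whatever project and file arguments your installation
# expects; placing -dry-run last is an assumption.
preview_checkin() {
    $AIR project import "$@" -dry-run
}

# Analyze a graph together with the psets that reference it.
# Placing -referencing-files last is also an assumption.
analyze_graph_and_psets() {
    $AIR project analyze-dependencies "$@" -referencing-files
}
```

With the default setting, calling preview_checkin with a made-up project path such as /Projects/demo prints "air project import /Projects/demo -dry-run", which lets you review the command before running it for real.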