Tracing Power BI Visuals Back to Source

In this post, I want to share how a question I keep hearing from colleagues, "What measure is behind this visual, and where does the data actually come from?", led me to build an open-source tool that traces Power BI project visuals all the way back to their source tables.

I didn't expect a routine conversation about a bar chart to become a story about data lineage at scale. The question was simple. The answer, when done manually, was not. That gap between a simple question and a painful answer is the reason PBIP Lineage Explorer exists.

1. The Question That Keeps Coming Back

A colleague pointed at a bar chart in a Power BI report and asked: "What measure is showing in this visual? How is the DAX written? What are the actual names of the columns and tables from the source? Did you rename anything on the Power BI side?" And then the real question: "How can data engineers trace this back to the source tables and columns that they built themselves?"

This wasn't the first time I heard these questions. Data engineers need to validate that the numbers in a Power BI report match what they expect from the source system they built. BI developers need to understand how measures are constructed before modifying them. Managers need to know that governance requirements are met and that lineage is documented.

Every role has a legitimate reason to ask. And every time, answering means manually digging through files.

2. The Manual Trace: From Visual JSON to Source Table

To show what this manual process looks like, I'll walk through a concrete example using a Contoso sample report saved in PBIP/PBIR format. The report has a bar chart showing Total Sales by Gender. The question is straightforward: what measure drives this visual, how is it calculated, and where does the data come from?

Step 1: Open the visual JSON

Every visual in a PBIR report is stored as a visual.json file inside a folder named with the visual's ID. Here is what the bar chart's visual JSON looks like (simplified for readability):

{
  "visual": {
    "visualType": "barChart",
    "query": {
      "queryState": {
        "Category": {
          "projections": [{
            "field": {
              "Column": {
                "Expression": { "SourceRef": { "Entity": "customer" } },
                "Property": "Gender"
              }
            },
            "queryRef": "customer.Gender"
          }]
        },
        "Y": {
          "projections": [{
            "field": {
              "Measure": {
                "Expression": { "SourceRef": { "Entity": "Internal Measures" } },
                "Property": "Total Sales"
              }
            },
            "queryRef": "Measure.Total Sales"
          }]
        }
      }
    }
  }
}

From this JSON, I can extract two things: the category axis uses customer[Gender], and the value axis uses a measure called Total Sales from a table called Internal Measures. But the JSON tells me nothing about how that measure is written or where the data comes from. I need to go deeper.
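To make that extraction concrete, here is a minimal Python sketch that pulls every field reference out of a visual.json shaped like the one above. It is a sketch under the simplified structure shown here; real PBIR files carry many more keys (formatting, filters, aggregation options) that a robust parser would also have to handle.

```python
import json

# The JSON below mirrors the simplified bar chart example from this post.
VISUAL_JSON = """
{
  "visual": {
    "visualType": "barChart",
    "query": {
      "queryState": {
        "Category": {
          "projections": [{
            "field": {
              "Column": {
                "Expression": {"SourceRef": {"Entity": "customer"}},
                "Property": "Gender"
              }
            },
            "queryRef": "customer.Gender"
          }]
        },
        "Y": {
          "projections": [{
            "field": {
              "Measure": {
                "Expression": {"SourceRef": {"Entity": "Internal Measures"}},
                "Property": "Total Sales"
              }
            },
            "queryRef": "Measure.Total Sales"
          }]
        }
      }
    }
  }
}
"""

def extract_fields(visual_json: str) -> list[tuple[str, str, str]]:
    """Return (role, entity, property) for every projection in the visual."""
    query_state = json.loads(visual_json)["visual"]["query"]["queryState"]
    fields = []
    for role, bucket in query_state.items():      # "Category", "Y", ...
        for proj in bucket.get("projections", []):
            for ref in proj["field"].values():    # "Column" or "Measure"
                entity = ref["Expression"]["SourceRef"]["Entity"]
                fields.append((role, entity, ref["Property"]))
    return fields

print(extract_fields(VISUAL_JSON))
# → [('Category', 'customer', 'Gender'), ('Y', 'Internal Measures', 'Total Sales')]
```

Running something like this over every visual.json in the report folder gives the first layer of the lineage graph: which entities and properties each visual references.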

Step 2: Find the measure definition in TMDL

The semantic model folder contains TMDL files that define every table, column, measure, and relationship. I open Internal Measures.tmdl and find:

measure 'Total Sales' = SUMX(sales, sales[Quantity] * sales[UnitPrice])

Now I know the DAX. The measure iterates over the sales table and multiplies Quantity by UnitPrice for each row. But where do sales[Quantity] and sales[UnitPrice] actually come from? I need to open another file.
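Finding which columns a measure touches can be approximated with a regular expression over the DAX text. This is a sketch, not a DAX parser: it matches table[Column] and 'Quoted Table'[Column] patterns, and it would be fooled by comments or string literals that happen to contain brackets.

```python
import re

# Matches sales[Quantity] as well as 'Internal Measures'[Total Sales].
COLUMN_REF = re.compile(r"(?:'([^']+)'|\b(\w+))\[([^\]]+)\]")

def column_refs(dax: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs referenced in a DAX expression."""
    return [(quoted or plain, col)
            for quoted, plain, col in COLUMN_REF.findall(dax)]

print(column_refs("SUMX(sales, sales[Quantity] * sales[UnitPrice])"))
# → [('sales', 'Quantity'), ('sales', 'UnitPrice')]
```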

Step 3: Trace columns to their source

I open sales.tmdl and find the column definitions:

column Quantity
    dataType: int64
    sourceColumn: Quantity

column UnitPrice
    dataType: double
    sourceColumn: UnitPrice

The sourceColumn property tells me the Power BI column name matches the source column name: no renaming happened here. But in many real-world projects, Power Query transformations rename columns, and the names in the semantic model no longer match the source. That's when traceability breaks.
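Recovering that mapping programmatically is straightforward for the simple layout shown above. Here is a minimal sketch; real TMDL allows quoted names, annotations, and many more properties that this does not handle.

```python
import re

# Mirrors the sales.tmdl column definitions from this post.
TMDL_TABLE = """\
column Quantity
    dataType: int64
    sourceColumn: Quantity

column UnitPrice
    dataType: double
    sourceColumn: UnitPrice
"""

def source_columns(tmdl: str) -> dict[str, str]:
    """Map each model column name to its sourceColumn property."""
    mapping, current = {}, None
    for line in tmdl.splitlines():
        if m := re.match(r"column\s+'?(.+?)'?\s*$", line):
            current = m.group(1)
        elif current and (m := re.match(r"\s+sourceColumn:\s*(.+)", line)):
            mapping[current] = m.group(1).strip()
    return mapping

print(source_columns(TMDL_TABLE))
# → {'Quantity': 'Quantity', 'UnitPrice': 'UnitPrice'}
```

Comparing the keys and values of that dict is exactly the "did you rename anything?" check.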

To confirm the actual data source, I look at the partition definition in the same file:

partition sales = m
    mode: import
    source =
        let
            Source = Sql.Database(
                "...fabric.microsoft.com",
                "Lakehouse_Contoso"
            ),
            dbo_sales = Source{[Schema="dbo",Item="sales"]}[Data]
        in
            dbo_sales

The data comes from dbo.sales in a Fabric Lakehouse called Lakehouse_Contoso. Now the data engineer has the full chain: the bar chart shows Total Sales, which multiplies sales[Quantity] * sales[UnitPrice], and both columns come from dbo.sales in the lakehouse.
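The connection details can be pulled out of the M expression with regexes as well. A sketch, using a placeholder server name since the real endpoint is elided in the snippet above:

```python
import re

# Placeholder M expression; "myserver.example.com" stands in for the
# real Fabric SQL endpoint, which is not shown in full in this post.
M_EXPR = '''
let
    Source = Sql.Database(
        "myserver.example.com",
        "Lakehouse_Contoso"
    ),
    dbo_sales = Source{[Schema="dbo",Item="sales"]}[Data]
in
    dbo_sales
'''

def parse_source(m_expr: str) -> dict[str, str]:
    """Extract server, database, and schema-qualified table from an M query."""
    conn = re.search(r'Sql\.Database\(\s*"([^"]+)"\s*,\s*"([^"]+)"', m_expr)
    item = re.search(r'Schema="([^"]+)"\s*,\s*Item="([^"]+)"', m_expr)
    return {
        "server": conn.group(1),
        "database": conn.group(2),
        "table": f"{item.group(1)}.{item.group(2)}",
    }

print(parse_source(M_EXPR))
# → {'server': 'myserver.example.com', 'database': 'Lakehouse_Contoso', 'table': 'dbo.sales'}
```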

That's three files opened, three mental hops, and we traced exactly one measure in one visual.

Step 4: Check relationships

The visual also uses customer[Gender] on the category axis. How does the customer table relate to the sales table? I open relationships.tmdl:

relationship customer_sales
    fromColumn: sales.CustomerKey
    toColumn: customer.CustomerKey

The relationship joins on CustomerKey. This confirms that customer[Gender] filters sales through this key. A data engineer validating this report would need to verify that the same join logic exists in the source system.
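The relationship file yields to the same treatment. A sketch assuming the minimal layout shown above:

```python
import re

# Mirrors the relationships.tmdl snippet from this post.
RELATIONSHIPS_TMDL = """\
relationship customer_sales
    fromColumn: sales.CustomerKey
    toColumn: customer.CustomerKey
"""

def relationships(tmdl: str) -> list[tuple[str, str]]:
    """Return (fromColumn, toColumn) pairs from relationships.tmdl."""
    return re.findall(r"fromColumn:\s*(\S+)\s*\n\s*toColumn:\s*(\S+)", tmdl)

print(relationships(RELATIONSHIPS_TMDL))
# → [('sales.CustomerKey', 'customer.CustomerKey')]
```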

3. Now Multiply That by Every Visual in the Report

That was one bar chart with one measure. The same report has a pivot table that references three measures across multiple tables:

  • Rows: store[CountryName], store[State]
  • Columns: product[Brand]
  • Values: Measure[Number of Orders], Measure[Number of Customers], Internal Measures[Total Sales]

Each of those measures has its own DAX definition:

measure 'Number of Orders' = DISTINCTCOUNT(sales[OrderKey])
measure 'Number of Customers' = DISTINCTCOUNT(sales[CustomerKey])
measure 'Total Sales' = SUMX(sales, sales[Quantity] * sales[UnitPrice])

And each column referenced (OrderKey, CustomerKey, Quantity, UnitPrice, CountryName, State, Brand) lives in a different TMDL file with its own source mapping and relationship chain.

To fully answer "what does this pivot table show and where does the data come from?", I would need to open at least six TMDL files: Measure.tmdl, Internal Measures.tmdl, sales.tmdl, store.tmdl, product.tmdl, and relationships.tmdl.
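Mechanically, once those per-file pieces are parsed, the trace is a graph expansion. The sketch below hard-codes the measure-to-column map from this example (illustrative data, not the tool's API) and expands the pivot table's field list into the underlying columns:

```python
# Measure definitions reduced to the columns they reference,
# taken from the DAX shown above.
MEASURES = {
    "Number of Orders": ["sales[OrderKey]"],
    "Number of Customers": ["sales[CustomerKey]"],
    "Total Sales": ["sales[Quantity]", "sales[UnitPrice]"],
}

# Everything the pivot table puts on rows, columns, and values.
PIVOT_FIELDS = ["store[CountryName]", "store[State]", "product[Brand]",
                "Number of Orders", "Number of Customers", "Total Sales"]

def trace(fields: list[str]) -> list[str]:
    """Expand measure references into the columns they read."""
    columns: set[str] = set()
    for field in fields:
        columns.update(MEASURES.get(field, [field]))  # plain columns pass through
    return sorted(columns)

print(trace(PIVOT_FIELDS))
```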

A real enterprise report doesn't have two pages with eleven visuals. It has dozens of pages with hundreds of visuals. Multiply that across an organization with hundreds of reports.

The manual approach doesn't scale.

4. This Happens Everywhere

This is not a hypothetical problem. Every organization with a mature Power BI deployment runs into this friction, and it manifests in several ways.

Data engineers can't verify data quality without knowing exactly which source columns feed which visuals. When a number looks wrong in a dashboard, the first question is always "what column is this actually reading from?" If the answer requires manually tracing through three layers of files, the verification doesn't happen, or it happens too slowly.

Column renames in Power Query break the traceability chain. A column called CustKey in the source might become CustomerKey in Power Query and Customer ID in the semantic model. Without a tool that tracks the rename chain, the connection between the visual and the source is invisible.

Impact analysis for source schema changes is nearly impossible manually. When a data engineer needs to rename a column in the source database, they need to know which Power BI reports will break. Answering that question manually means opening every report, tracing every visual, and checking every TMDL file. Most teams skip this step entirely and discover the breakage in production.

Governance and compliance require documented lineage. Auditors and data governance teams increasingly ask: "Can you show me the complete path from this dashboard number to its source?" If the answer is "I can, but it takes me an hour per visual," that's not a sustainable governance posture.

Here's the first aha moment that shaped this project: the information needed to trace lineage already exists in PBIP files; it's just scattered across dozens of JSON and TMDL files that no human can navigate efficiently at scale.

5. The Solution: PBIP Lineage Explorer

This gap between "the information exists" and "no one can find it fast enough" is why I built PBIP Lineage Explorer. It's a browser-based application that parses your entire PBIP project, both the report folder and the semantic model folder, and builds an interactive dependency graph. The same trace that took me three files and four mental hops in Section 2 happens in a single click.

Visual-to-source tracing

Click any visual in the tool, and it shows the complete reference chain: which measures the visual uses, how each measure is written in DAX, which columns those measures reference, which tables those columns belong to, and what the source connection looks like. The full path from visual to source, the same path I walked manually through visual.json, Internal Measures.tmdl, and sales.tmdl, is rendered as an interactive dependency tree.

DAX syntax highlighting

Every measure definition is displayed with syntax highlighting, and all referenced measures and columns are identified. When a measure like Total Sales references sales[Quantity] and sales[UnitPrice], the tool doesn't just show the DAX; it links each reference to the actual column definition in the semantic model.

Source column mapping with rename chain

This is where the tool addresses the column renaming problem. The source column mapping can be exported as a CSV file that shows the complete rename chain: from the original source column name through any Power Query transformations to the final semantic model column name. When a data engineer asks "did you rename my columns?", the CSV provides a definitive answer.
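As a sketch of what such an export might look like, here is a minimal CSV writer. The rows are illustrative, reusing the CustKey rename example from earlier; the tool's actual CSV columns may differ.

```python
import csv
import io

# Illustrative rename chain: source name -> Power Query name -> model name.
CHAIN = [
    ("CustKey", "CustomerKey", "Customer ID"),
    ("Quantity", "Quantity", "Quantity"),
]

def rename_chain_csv(rows: list[tuple[str, str, str]]) -> str:
    """Serialize the rename chain as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source_column", "power_query_name", "model_column"])
    writer.writerows(rows)
    return buf.getvalue()

print(rename_chain_csv(CHAIN))
```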

Impact analysis

The question "what breaks if I rename dbo.sales.Quantity?" is answered instantly. The tool shows every measure that references sales[Quantity], every visual that uses those measures, and every other column or relationship in the dependency chain. This is the what-if analysis that teams skip because doing it manually takes hours.
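Conceptually, impact analysis is the lineage graph read backwards. A toy sketch with illustrative data (these dicts stand in for the tool's parsed model; they are not its real structures):

```python
# Illustrative dependency data from earlier in the post.
MEASURES = {
    "Total Sales": ["sales[Quantity]", "sales[UnitPrice]"],
    "Number of Orders": ["sales[OrderKey]"],
}
VISUALS = {
    "bar chart": ["Total Sales"],
    "pivot table": ["Total Sales", "Number of Orders"],
}

def impacted(column: str) -> tuple[list[str], list[str]]:
    """Return the measures and visuals that depend on one column."""
    measures = [m for m, cols in MEASURES.items() if column in cols]
    visuals = [v for v, ms in VISUALS.items() if any(m in ms for m in measures)]
    return measures, visuals

print(impacted("sales[Quantity]"))
# → (['Total Sales'], ['bar chart', 'pivot table'])
```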

Page layout minimap

The tool renders all report visuals positioned as they appear in Power BI, giving a spatial overview of the report layout. Clicking any visual in the minimap jumps to its dependency tree. This makes it easy to answer "what does page 2 show and where does it all come from?" without opening Power BI Desktop.

Orphan detection

Over time, semantic models accumulate measures that are no longer used by any visual. The tool automatically identifies these orphaned measures. This is a cleanup opportunity: every unused measure is technical debt that slows model processing and confuses anyone trying to understand the model.
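At its core, orphan detection is set subtraction: measures defined in the model minus measures any visual references, directly or transitively through other measures (the transitive part is what makes it worth automating). A tiny sketch with made-up measure names:

```python
# Illustrative names; a real implementation must first compute the full
# set of measures reachable from visuals, including measure-to-measure
# references, before subtracting.
defined = {"Total Sales", "Number of Orders", "Old KPI (2021)"}
referenced_by_visuals = {"Total Sales", "Number of Orders"}

orphans = sorted(defined - referenced_by_visuals)
print(orphans)  # → ['Old KPI (2021)']
```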

Advanced pattern recognition

The tool detects field parameters and calculation groups, two features that are notoriously difficult to trace because, as I wrote in a previous post, the visual JSON only captures the last saved state, not the full configuration.

100% client-side processing

No data is uploaded to any server. The tool runs entirely in your browser using the File System Access API (Chrome 86+ or Edge 86+). You open your local PBIP folder, and all parsing happens on your machine. For enterprise teams working with sensitive data, this is non-negotiable.

This was the aha moment that validated the entire project: the tool doesn't invent lineage information; it reads what PBIP files already contain and presents it in a way that humans can actually navigate.

6. What This Means Beyond Individual Productivity

For managers and team leads, the implications go beyond saving time for individual developers.

Data governance and audit trails. When an auditor asks for lineage documentation, the tool's CSV export provides a complete, machine-readable mapping from visual to source. This turns a multi-day documentation exercise into a minutes-long export.

Faster onboarding. New team members joining a project can explore the dependency graph to understand how the report is wired together, without needing someone to walk them through it file by file. The visual minimap gives spatial context; the dependency trees give technical depth.

Reduced risk for source schema changes. Before renaming a column or restructuring a source table, the impact analysis shows exactly which reports and visuals will be affected. This transforms a risky, hope-for-the-best migration into a precise, evidence-based change management process.

Free and open source. PBIP Lineage Explorer is MIT licensed on GitHub. There is no premium tier, no paywall, no ads. For teams evaluating lineage solutions, this removes procurement friction entirely.

7. From a DevOps Standpoint: This Is Foundational

Version control and change visibility

Because the tool reads PBIP files directly, it integrates naturally into a Git-based workflow. When a developer creates a pull request that modifies a measure or adds a new visual, the lineage explorer can be used to verify the full impact of the change before merging. The dependency graph becomes part of the code review process.

Validation and governance

The source column mapping CSV can be committed to the repository alongside the PBIP files, creating a version-controlled lineage document that evolves with the project. Governance teams can diff these CSVs across commits to see how lineage changes over time.

Reflection

This experience left me with a new mental model: data lineage in Power BI is not a missing capability; it's a navigation problem. The PBIP format already contains everything needed to trace a visual back to its source. The challenge was never the information; it was the effort required to connect the dots across scattered files.

The journey from "what measure is behind this visual?" to building an open-source lineage explorer reinforced something I keep discovering: when you treat Power BI artifacts as code, problems that seem unsolvable with the UI become straightforward with the right tooling.

PBIP Lineage Explorer is free, open source, and available at github.com/JonathanJihwanKim/pbip-lineage-explorer.

I hope this helps you have fun exploring data lineage and embracing this new era of understanding Power BI reports as code, from the visual all the way back to the source!