Reduce Dataflow Validation Time in Microsoft Fabric

Light rainbow flowing down path at night. Unsplash. 2025.

Introduction

Power BI and Fabric Dataflows are powerful tools for data transformation and integration, but users often face slow validation (Gen1 Dataflows) or publishing delays (Gen2 Dataflows). One of the key bottlenecks is the 10-minute timeout issue, which can cause frustration when working with complex or large datasets. This blog post explores why this happens and how to speed up the process using an advanced M code solution.

Understanding the Issue

Why Does Power BI Take Time to Validate or Publish Dataflows?

When you validate or publish a dataflow, Power BI must determine the column structure (names, data types) of each query. To do this efficiently, it applies a "Top 0" filter, which retrieves metadata without actually pulling data. This process helps Power BI understand the expected structure of the dataset without incurring excessive data load.

The "Top 0" Filter and Its Impact

Power BI modifies the query execution by appending a limit of 0 rows (e.g., SELECT TOP 0 * FROM Table in SQL). This works efficiently when the data source supports query folding, as the filtering happens at the source. However, when query folding is not possible, Power BI might still attempt to execute the query, leading to long validation times or failures due to timeouts.

Common Solution: Limit Row Retrieval During Development

  • During development, apply a row limit (e.g., Table.FirstN(Source, 100)) to reduce data load assuming query folding is enabled. Or a filter to cut down on the number of rows.

  • Remove the row limit before publishing to avoid affecting the final dataset.

While these methods help, they might not be enough for highly complex queries, or queries that do not support query folding. That’s where an advanced M code solution comes in.

🪄Using Table.View to Speed Up Validation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
let
  Source = Function.InvokeAfter(
    () => 
    #table(
      type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
      {
        {1,"Hello",#date(2016,1,1)},
        {2,"World",#date(2017,12,12)}
      }
      )
    
    , 
    #duration(0, 0, 11, 1)
  ),
  TableTypeToReturn = 
    type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
  OverrideZeroRowFilter = Table.View(
    null, 
    [
    GetType = () => 
      TableTypeToReturn, 
    GetRows = () => 
      Source, 
    OnTake = (count as number) => 
      if count = 0 then 
      #table(
        TableTypeToReturn, 
        {}
      ) 
    else 
     Table.FirstN(Source, count)]
  )
in
  OverrideZeroRowFilter

The Table.View function (line 27) in Power Query allows you to intercept the "Top 0" request and return an empty table with only metadata, significantly speeding up validation. Line 18 is making a fake delay to simulate a long query.

🧐Understanding GetType, GetRows, and OnTake in Table.View

The Table.View function in Power Query M lets you define a custom virtual table with specific behaviors when Power BI (or another client) interacts with the data. It allows you to control how the table responds to different operations.

1. GetType – Defines the Table Schema

GetType tells Power BI what the table structure (schema) should be. It returns a table type (type table [...]), specifying column names and data types.

1
GetType = () => type table [ ID = number, Name = text, CreatedDate = date ]

Why it's useful:

  • When Power BI applies a Top 0 filter (to check the schema without fetching data), GetType ensures that Power BI understands the correct column names and types.

2. GetRows – Retrieves the Actual Data

GetRows is the function that returns the full dataset when Power BI queries the table without any filters (e.g., when loading all data).

1
2
3
4
5
6
7
8
GetRows = () => 
    #table(
        type table [ID = number, Name = text, CreatedDate = date],
        {
            {1, "Alice", #date(2023,1,1)},
            {2, "Bob", #date(2023,2,15)}
        }
    )

Why it's useful:

  • This is the default behavior if Power BI doesn’t request a limited number of rows (e.g., Table.FirstN).

  • You can use GetRows to control what data is returned, such as dynamically fetching from an API.

3. OnTake – Controls What Happens When Rows are Requested

OnTake(count) is triggered when Power BI requests only a limited number of rows (e.g., Table.FirstN(Source, 10)). It allows you to optimize performance by handling row requests efficiently.

1
2
3
4
5
OnTake = (count as number) =>
    if count = 0 then
        table(type table [ID = number, Name = text, CreatedDate = date], {}) // Empty table for schema detection
    else
        Table.FirstN(GetRows(), count) // Return actual data if rows are requested

What This Does:

  1. When Power BI validates the table (Top 0), it quickly returns an empty table with column headers instead of fetching real data. ✅

  2. If Power BI requests N rows (e.g., Table.FirstN(Source, 10)), only N rows are returned, optimizing performance. ✅

  3. When Power BI loads all data, GetRows returns the full dataset. ✅

References:

  1. Webb, C. (2025, March 23). Speed up Dataflow publishing validation times in Power BI and Fabric. Chris Webb's BI Blog. https://blog.crossjoin.co.uk/2025/03/23/speed-up-dataflow-publishing-validation-times-in-power-bi-and-fabric/

Next
Next

DeepSeek-R1 - a chain of thought LLM for the people