BigQuery - Create a table from results of a query that uses complex CTEs?

2 min read 06-10-2024
BigQuery - Create a table from results of a query that uses complex CTEs?


Creating Tables from Complex CTE Queries in BigQuery

Problem: You've crafted a complex query in BigQuery using Common Table Expressions (CTEs) to achieve a desired result set. Now, you want to store this data into a table for future analysis or use.

Rephrased: Imagine you've built a detailed report using multiple steps in BigQuery. How do you save this report as a table for easy access and future analysis?

Scenario:

Let's assume you have a table named "Sales" with columns for Order_Date, Product, and Quantity. You want to create a new table summarizing the total sales for each product on a monthly basis.

WITH MonthlySales AS (
  SELECT
    DATE_TRUNC(Order_Date, MONTH) AS SalesMonth,
    Product,
    SUM(Quantity) AS TotalQuantity
  FROM
    `your_project.your_dataset.Sales`
  GROUP BY 1, 2
)
SELECT
  SalesMonth,
  Product,
  TotalQuantity
FROM
  MonthlySales
ORDER BY
  SalesMonth, Product;

This query uses a CTE named "MonthlySales" to first group sales by month and product, then presents the result in a clean format. Now, how do you store this output into a new table?

Solution:

BigQuery offers a straightforward solution using the CREATE OR REPLACE TABLE statement. Here's how to do it:

CREATE OR REPLACE TABLE `your_project.your_dataset.MonthlySalesSummary` AS
WITH MonthlySales AS (
  SELECT
    DATE_TRUNC(Order_Date, MONTH) AS SalesMonth,
    Product,
    SUM(Quantity) AS TotalQuantity
  FROM
    `your_project.your_dataset.Sales`
  GROUP BY 1, 2
)
SELECT
  SalesMonth,
  Product,
  TotalQuantity
FROM
  MonthlySales
ORDER BY
  SalesMonth, Product;

Explanation:

  1. CREATE OR REPLACE TABLE: This statement creates a new table or replaces an existing table with the same name.
  2. Table Name: Replace your_project.your_dataset.MonthlySalesSummary with your desired project, dataset, and table name.
  3. CTE and SELECT: You keep the entire query structure, including the CTE and the final SELECT statement, as it is.

Key Points:

  • CREATE OR REPLACE: This ensures that if a table with the same name exists, it will be overwritten.
  • Data Freshness: The new table will be populated with the result of the query at the time of execution. It will not be automatically updated with changes to the source data.
  • Partitioning: You can leverage BigQuery partitioning features to optimize storage and query performance for your new table.

Example:

Imagine you want to create a table summarizing the average order value per month for each product category:

CREATE OR REPLACE TABLE `your_project.your_dataset.AvgOrderValue` AS
WITH MonthlyOrders AS (
    SELECT
        DATE_TRUNC(Order_Date, MONTH) AS SalesMonth,
        Product,
        SUM(Quantity * UnitPrice) AS TotalOrderValue
    FROM
        `your_project.your_dataset.Sales`
    GROUP BY 1, 2
),
AvgOrderValueByCategory AS (
  SELECT
    SalesMonth,
    Product,
    TotalOrderValue,
    AVG(TotalOrderValue) OVER (PARTITION BY SalesMonth ORDER BY Product) AS AvgOrderValue
  FROM
    MonthlyOrders
)
SELECT
  SalesMonth,
  Product,
  TotalOrderValue,
  AvgOrderValue
FROM
  AvgOrderValueByCategory
ORDER BY
  SalesMonth, Product;

Conclusion:

This approach simplifies creating tables from complex queries in BigQuery. By combining CTEs with CREATE OR REPLACE TABLE, you can quickly store your analytical results into tables for further analysis or use in other queries. Remember to consider partitioning and data freshness for optimal performance and data management.