Introduction
This project showcases my ability to create an end-to-end BI report using a full cloud architecture with Google Cloud Storage (GCS), Snowflake and Power-BI Service.
The data is originally sourced from Tableau public datasets under the Business category.
Business Requirement(s)
A hypothetical business, named Super Store Sales wants insight into the number of orders being done in their store.
They would like to monitor the number of orders placed by country and the orders trend per year.
Architecture
Diagram

Explanation
The data is located in a GCS bucket. It is then ingested into Snowflake via AirByte OSS/Cloud (I am using the GUI, but one can use pyairbyte in case they want a serverless implementation). Once landed in a Snowflake database, it is then transformed, using DBT, from raw data into staging then finally into a data product (data mart) following a star-schema. The tables are pulled into the Power BI service via the ODBC and the on-premises gateway. (However, you may use the power bi snowflake connector. I was having challenges with it at the time of writing.) Dagster then orchestrates the whole process from source to Power BI semantic model update.
Data Pipeline Lineage view from Dagster U.I

ELT
Ingestion – Extract, Load
AirByte UI

Transformation – DBT
- created views, via DBT to reference the tables in the raw schema.
- merging returned with orders to have one table instead of having to join the data via a many-to-many relationship
- created staging tables, marts, star schema facts and dimensions
Power Query
- Replaced nulls with the word “No” in IS_RETURNED column
- Renamed column COUNTRY_REGION to COUNTRY
Snippet of a staging table DBT SQL model
with source_data as ( select * from {{ref('base_people')}}),denormalised as ( SELECT --* CASE WHEN REGION = 'West' THEN 1 WHEN REGION = 'East' THEN 2 WHEN REGION = 'Central' THEN 3 WHEN REGION = 'South' THEN 4 END REGION_KEY, REGIONAL_MANAGER EMPLOYEE, CASE WHEN REGIONAL_MANAGER IS NOT NULL THEN TRUE ELSE FALSE END IS_MANAGERFROM source_data)select * from denormalised
Data Modelling
ERD Diagram

Power BI Data view

Semantic Model
Measures
No. of Orders:
No. of Orders = COUNT('FCT_ORDERS'[ORDER_ID_KEY])
Calculated Tables
//Date TableDates = CALENDARAUTO()//Measures TableBox = DATATABLE( "PH", INTEGER, { {1} })
Calculated Columns
Day = DAY(Dates[Date])Mon-Year = FORMAT(Dates[Date], "mmm-yy")Month = MONTH(Dates[Date])Month Name = FORMAT(Dates[Date], "mmm")Months = STARTOFMONTH(Dates[Date])Year = YEAR(Dates[Date])Year-Month = FORMAT(Dates[Date], "yyyymm")
Visuals
Page: Orders
Bar Chart
- Metric: No. of orders – X-axis
- Dimension: By Employee – Legend
- Dimension: By Country – Y-axis
- Dimension: By State/Province – Y-axis – Drill Down
- Dimension: By City – Y-axis – Drill Down
Donut Chart
- Metric: No. of orders – X-axis
- Dimension: By Returned Status – Donut
Tree map
- Metric: No. of orders – X-axis
- Dimension: By product category
- Dimension: By product subcategory
Line Chart
- Metric: No. of orders – Y-axis
- Dimension: By Year – X-axis
- Dimension: By Year – Month – X-axis – Drill Down
Filters
- Dimension: By Segment – Page Filters
- Dimension: By Ship Mode – Page Filters
Report Insights
- Orders have been increasing each year from the base year of 2023.
- The U.S have most of the orders for the whole period.
- The top 3 products orders are Binders, Furnishings and Papers.
Achievements
- Managed to orchestrate the Power BI Service in Dagster.
- Managed to deploy the Dagster project to Dagster plus.
Challenges
- No access to the OLTP ! Hard to maintain data integrity as the dataset seems to be partially already transformed.
- Dagster despite it’s powerful capabilities can be extremely hard to configure ! Recommended to use A.I to help with the hard parts.
Opportunities for improvement
The project can be improved by adding more pages to the report and analyse other metrics such as Profit, Discounts and Sales.
Power BI report

GitHub Link to Project
























