Why this trend is redefining modern BI
The Business Intelligence market is experiencing a fundamental shift. According to a recent Gartner report, 73% of organizations implementing FinOps practices in their data platforms report a 40-60% reduction in operational costs during the first year.
This shift is driven by three key factors:
- Exponential data growth: Companies process increasingly larger volumes of data, causing BigQuery costs to scale rapidly without proper governance.
- Need for semantic governance: Distributed data teams require consistent definitions of metrics and dimensions to avoid duplication and maintain trust in data.
- Democratization of BI with AI: The integration of AI agents to monitor performance and costs enables automatic optimizations that previously required dedicated teams.
Technology stack in real architectures
In recent projects, we have implemented architectures that integrate:
- Google Cloud Platform:
  - BigQuery (data warehouse)
  - Cloud Composer (Airflow orchestration)
  - Cloud Storage (data lake)
  - Cloud Functions (event-driven processing)
- Looker:
  - LookML for the semantic layer
  - PDTs (Persistent Derived Tables) for pre-aggregation
  - Datagroups for intelligent caching
  - Optimized explores with selective joins
- DBT Cloud: ELT transformations with automated testing
- AI Agent: cost monitoring and automatic alerts
Reference architecture
Data Sources
(APIs, Databases, Files, Streaming)
- REST APIs & Webhooks
- Cloud Databases
- File Storage Systems
- Real-time Streaming
↓
Ingestion Layer
- Cloud Functions (real-time events)
- Cloud Composer/Airflow (batch ETL; see the DAG sketch after this diagram)
- Fivetran/Airbyte (connectors)
↓
Raw Data Layer
- Cloud Storage (Data Lake)
- BigQuery Landing Zone (partitioned by ingestion_date)
↓
Transformation Layer (DBT)
- Staging models (data cleaning)
- Intermediate models (business logic)
- Mart models (analytics-ready)
- FinOps: Incremental models + partitioning
↓
Semantic Layer (Looker LookML)
- Views: unified metric definitions
- Explores: optimized joins
- PDTs: pre-aggregated tables
- Datagroups: intelligent caching (4h refresh)
- Access filters: row-level security
↓
Presentation Layer
- Looker Dashboards (exec + operational)
- Looker API (embedded analytics)
- Scheduled reports (email + Slack)
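To make the ingestion layer concrete, here is a minimal Cloud Composer (Airflow) DAG sketch for the batch path, loading a day's files from the data lake into the ingestion-date-partitioned landing zone. The bucket, dataset, and table names are hypothetical placeholders, not taken from a real project.
Python - Example batch ingestion DAG (illustrative sketch)
# Minimal Airflow DAG: load yesterday's files from Cloud Storage into
# a date partition of the BigQuery landing zone table.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="daily_events_ingestion",
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events_to_landing_zone",
        bucket="my-data-lake",                      # hypothetical bucket name
        source_objects=["events/{{ ds }}/*.json"],  # files for the run date
        # "$" partition decorator writes into that day's partition
        destination_project_dataset_table="analytics.events_raw${{ ds_nodash }}",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_TRUNCATE",
    )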
FinOps Implementation in BigQuery
1. Intelligent Partitioning and Clustering
SQL - Table optimization in BigQuery
-- Example: Partitioned events table
CREATE TABLE analytics.events_partitioned
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type
AS SELECT * FROM analytics.events_raw;
-- Optimized query (scans only 1 day)
SELECT
  event_type,
  COUNT(*) as total_events
FROM analytics.events_partitioned
WHERE DATE(event_timestamp) = CURRENT_DATE()
GROUP BY 1;
-- Savings: from ~$50 for a full-table scan to ~$0.02 for a single partition ✓
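As a guardrail, you can verify how much data a candidate query will scan before running it by using a BigQuery dry run. This is a minimal sketch with the official google-cloud-bigquery client; the $5/TB figure matches the on-demand rate assumed elsewhere in this article, so check current pricing for your region.
Python - Dry-run cost check (illustrative sketch)
# Estimate bytes scanned (and therefore cost) without executing the query.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT event_type, COUNT(*) AS total_events
    FROM analytics.events_partitioned
    WHERE DATE(event_timestamp) = CURRENT_DATE()
    GROUP BY 1
"""

# dry_run=True makes BigQuery plan the query and report bytes processed
# without running it or incurring cost.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(query, job_config=job_config)

# Assumes the $5/TB on-demand rate used throughout this article.
estimated_cost = job.total_bytes_processed / 1e12 * 5
print(f"Will scan {job.total_bytes_processed:,} bytes (~${estimated_cost:.4f})")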
2. Looker PDTs with datagroups
LookML - Optimized Persistent Derived Tables
# Define datagroup for intelligent refresh
datagroup: daily_revenue_datagroup {
  sql_trigger: SELECT MAX(order_date) FROM orders ;;
  max_cache_age: "4 hours"
}

# PDT for aggregated metrics
view: daily_revenue_summary {
  derived_table: {
    datagroup_trigger: daily_revenue_datagroup
    partition_keys: ["order_date"]
    cluster_keys: ["customer_segment"]
    sql:
      SELECT
        DATE(order_timestamp) as order_date,
        customer_segment,
        SUM(order_total) as total_revenue,
        COUNT(DISTINCT order_id) as order_count,
        COUNT(DISTINCT customer_id) as customer_count
      FROM orders
      WHERE DATE(order_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)
      GROUP BY 1, 2
    ;;
  }

  dimension: order_date {
    type: date
    sql: ${TABLE}.order_date ;;
  }

  measure: revenue {
    type: sum
    sql: ${TABLE}.total_revenue ;;
    value_format_name: usd
  }
}
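With this setup, Looker evaluates the sql_trigger on its check schedule and rebuilds the PDT only when MAX(order_date) changes, while max_cache_age caps how stale cached results may become between rebuilds. Because partition_keys and cluster_keys mirror the BigQuery optimizations from step 1, dashboard queries against the PDT prune partitions instead of scanning the full year of data.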
3. AI Agent for cost monitoring
Python - Cloud Function for automatic monitoring
# Cloud Function that monitors BigQuery costs
import functions_framework
from google.cloud import bigquery


@functions_framework.cloud_event
def monitor_bq_costs(cloud_event):
    """
    Monitors expensive BigQuery queries and sends alerts
    """
    client = bigquery.Client()

    # Query to identify expensive queries (> $10, assuming $5/TB on-demand pricing)
    query = """
        SELECT
          user_email,
          query,
          total_bytes_processed,
          total_bytes_billed,
          (total_bytes_billed / POW(10, 12)) * 5 as estimated_cost_usd,
          creation_time
        FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
        WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
          AND (total_bytes_billed / POW(10, 12)) * 5 > 10
        ORDER BY estimated_cost_usd DESC
        LIMIT 10
    """
    results = client.query(query).result()

    for row in results:
        # Send alert to Slack/Email (send_alert is sketched below)
        send_alert({
            'user': row.user_email,
            'cost': round(row.estimated_cost_usd, 2),
            'query_preview': row.query[:200],
            'recommendation': suggest_optimization(row.query)
        })


def suggest_optimization(query):
    """
    AI agent that suggests optimizations
    """
    if 'SELECT *' in query:
        return "⚠️ Avoid SELECT *. Specify only necessary columns."
    elif 'PARTITION' not in query and 'WHERE' in query:
        return "💡 Consider adding partition filter to reduce scanning."
    else:
        return "✓ Query appears optimized."