SGA Dental Partners · Growth Platform

Marketing Data Warehouse and Full Funnel Attribution Dashboard

Phase 1 project plan. Daily refreshed Google Ads and GoHighLevel data unified into Railway Postgres, surfaced as a dashboard inside the existing SGA3P content engine, with full funnel attribution as the Phase 2 target.

Date

2026-05-06

Prepared For

Dakota Milner, SGA Growth

Phase

1 (Build, 6 weeks)

Status

Approved, in build (2026-05-06)

01 · Executive Summary

What we are building, why, and the call we have to make first

SGA operates 260 practices. Roughly 50 to 60 actively run paid media. Today there is no automated way to see spend, leads, cost per lead, or ROAS by practice without logging into 50 different ad accounts and CRMs. This plan stands up a single warehouse, a daily refresh, and a dashboard inside SGA3P that the growth team can open on Monday morning and act from.

Phase 1 Scope

Google Ads + GHL

Campaign daily grain for Google Ads. Webhook plus reconciliation pull for GHL. Dashboard with 5 views inside app.sga3p.com.

Target Ship

6 weeks

Walking skeleton end to end in week 1. Per practice tiles, funnel, and Insights feed by week 6. Phase 1.5 follows immediately.

Decision Confirmed

Postgres on Railway

Stack divergence approved 2026-05-06. ClickHouse stays as future option if cheap once volume thresholds breach. See section 02.

The headline

A standalone Fastify worker pulls Google Ads nightly and listens to GHL webhooks in real time. Data lands in a new marketing schema on the existing Railway Postgres. A new /api/v1/marketing/* route group on asset-registry serves materialized views to a new dashboard route inside the React content engine. BullMQ replaces node-cron when volume justifies it. Temporal replaces BullMQ when AWS arrives.

02 · Stack Divergence

Where this plan deviates from the 2026-04-20 settled stack

The settled stack memo locked in ClickHouse for analytics, Power BI for dashboards, and AWS for hosting. This plan diverges on two of those choices, deliberately, because Dakota directed the change in the project-kickoff conversation. Calling it out so the deviation is conscious, not drift.

Concern	Settled Stack (2026-04-20)	This Plan	Rationale for the change
Analytics store	ClickHouse Cloud or self hosted on AWS	Postgres on Railway, single instance	ClickHouse hosting was blocked. Phase 1 volume (low millions of rows per year) fits Postgres comfortably with partitioning and materialized views. Documented thresholds for when to migrate.
Dashboard surface	Power BI semantic models	New section inside app.sga3p.com (React + Vite + Tailwind)	Power BI provisioning is not started. The content engine is already in production at sga3p.com with auth, layout, and design system. Adds zero new vendor.
Workflow orchestration	Temporal	node-cron in Phase 1, BullMQ in Phase 1.5, Temporal in Phase 2	Temporal is not provisioned. node-cron is sufficient for nightly batch. Job bodies wrap as plain async functions so Temporal swap is mechanical.
Cloud	AWS (RDS, MSK, ECS)	Railway through Phase 2	Railway is where SGA Content Engine, asset-registry, and Postgres already live. No reason to split infrastructure mid build.
Event bus	Kafka MSK	Postgres staging table acting as durable queue (Kafka stub)	Same pattern, simpler runtime. When Kafka MSK is real, the webhook handler drops a topic message instead of a row.

Approved 2026-05-06

Dakota confirmed the divergence: build on Railway Postgres now. ClickHouse remains an option for the future if cheap and once measured volume thresholds breach (single MV refresh > 5 min, dashboard p95 > 2s, total mart rows > 500M). The mart layer prefix convention keeps a future ClickHouse copy of mart_* mechanical.

03 · Goal and Scope

Phase boundaries and what ships when

Phase 1 (this plan, 6 weeks)

Google Ads campaign daily metrics. GHL contacts, opportunities, appointments via webhook plus hourly reconciliation. Five dashboard views: Portfolio Overview, Practice Detail, Channel View, Funnel View, Insights and Anomalies. Last touch and linear attribution. Anomaly rules. Stale data banners.

Phase 1.5 (immediate follow up, 3 weeks)

Google Ads keyword and ad group day metrics. Recommendation rule engine. First touch and time decay attribution. Pacing vs budget. Region rollups. BullMQ replaces node-cron once webhook reconciliation queue grows.

Phase 2 (Q3, scope TBD)

Meta Ads, GA4, Google Business Profile, call tracking. Multi touch attribution with revenue join. Practice manager login (RBAC). Migrate scheduler to Temporal. Evaluate ClickHouse against measured volume thresholds.

Phase B (separate ADR required)

PHI data: case started events from PMS, treatment level revenue, insurance claims. Triggers HIPAA review, BAAs, and a dedicated marketing_phi schema with stricter grants.

Out of scope for Phase 1

PHI or clinical data. No procedure detail, diagnosis, or chart data.
Write back to Google Ads. Read only. No pause, no budget change, no creative edit from inside the dashboard.
Meta Ads, TikTok, YouTube, Microsoft Ads. Smile Like Si Meta data ingests in Phase 2.
GA4 web analytics, Google Business Profile, review platforms.
Agency comparison scorecards. Politically sensitive, defer until data trust is established.
Predictive forecasting, LTV models, propensity scoring, budget optimization.
Practice manager login and per practice scoped RBAC.
Data driven attribution. Insufficient per practice volume. Revisit at portfolio level in Phase 2.
Mobile native app, scheduled email or SMS report exports.

04 · System Architecture

End to end pipeline, ingestion service shape, dashboard data layer

EXTERNAL SOURCES Google Ads API GoHighLevel API + Webhooks | | v v +-----------------------------------------------------+ | sga-marketing-ingest (new Fastify worker svc) | | . node-cron schedules (nightly Google Ads pulls) | | . GHL webhook receiver (HMAC verified) | | . OAuth refresh handling, rate limit guards | | . Idempotent writers, sync_runs telemetry | +-----------------------------------------------------+ | | v v +-----------------------------------------------------+ | Postgres (Railway, internal-only, shared) | | . marketing.raw_* (append-only, JSONB) | | . marketing.stg_* (typed, deduplicated) | | . marketing.mart_* (dim/fact, time partitioned) | | . marketing.mv_* (materialized views) | +-----------------------------------------------------+ | v +-----------------------------------------------------+ | asset-registry (existing Fastify API) | | . NEW route group: /api/v1/marketing/* | | . Drizzle reads against mart_* and mv_* | | . Zod response schemas in packages/shared-types | | . 5-minute response cache on summary endpoints | +-----------------------------------------------------+ | v +-----------------------------------------------------+ | app.sga3p.com (React + Vite + Tailwind) | | . /marketing route section (5 views) | | . React Query against /api/v1/marketing/* | | . Recharts for visualization | +-----------------------------------------------------+

Why a new dedicated worker, not endpoints on asset-registry

Asset registry is a low latency request response API powering the content engine UI. Ingestion is long running, bursty, OAuth bearing, and rate limited by upstream APIs. Mixing them violates single responsibility, lets a runaway ingest loop choke the API, and bleeds OAuth secrets into the wrong service. The duplication cost is small because both services share packages/shared-types for Zod schemas and Drizzle tables.

Option	Pros	Cons	Verdict
(a) Endpoints on asset-registry	Zero new infra, reuse Drizzle	Mixes concerns, OAuth scope bleed, scaling decisions coupled	Reject
(b) Worker added inside asset-registry repo	Code colocated	Same deploy unit, same memory pressure, restart of API kills mid ingest	Reject
(c) Standalone `sga-marketing-ingest` service	Independent deploy and scale, blast radius isolated, OAuth contained, clean Temporal seam	One more Railway service	Accept
(d) Railway native Cron Jobs	Zero infra	Ephemeral containers, weak observability, no in flight state	Reject

Dashboard data layer

Add a /api/v1/marketing/* route group to the existing asset-registry service. Same auth (Bearer demo password now, Cognito later), same Drizzle, separate route module. The React app does not hit Postgres directly. Phase 1 routes:

Method	Path	Reads From	Purpose
GET	/api/v1/marketing/portfolio/summary?range=30d	`mv_portfolio_kpi_daily`	Portfolio Overview top tiles
GET	/api/v1/marketing/practices/:id/summary?range=30d	`mv_practice_spend_30d`	Practice Detail tiles
GET	/api/v1/marketing/practices/:id/funnel?range=30d	`mv_practice_funnel_30d`	Funnel waterfall
GET	/api/v1/marketing/practices/:id/campaigns?range=30d	`mv_campaign_performance_30d`	Campaign table
GET	/api/v1/marketing/insights?severity=high&range=14d	`marketing.insights_log`	Insights and Anomalies feed
GET	/api/v1/marketing/etl/status	`marketing.sync_runs`	Stale data banner, last refreshed timestamps

All response shapes live in packages/shared-types/src/marketing.ts as Zod schemas. The React app imports the same types. Fastify response cache holds summary endpoints for 5 minutes since materialized views refresh on the order of hours.

05 · Database Schema

Postgres design: raw, staging, mart, materialized

All marketing objects live in a single marketing schema on the existing Railway Postgres, alongside shared (practices). The design separates four concerns: operational sync state, raw API ingest, conformed dimensions and facts, and materialized views for the dashboard.

Schema layout

Group	Prefix	Purpose	Retention
Operational	`google_ads_accounts`, `ghl_locations`, `sync_runs`, `insights_log`	Account links, sync state, error log, generated insights	Indefinite
Raw	`raw_google_ads_`, `raw_ghl_`	Untransformed JSON dumps (replay source of truth)	13 months hard cutoff
Dimensions	`dim_date`, `dim_channel`, `dim_campaign`, `dim_practice` (view)	Stable lookups for joins	Indefinite
Facts	`google_ads_campaign_daily_metrics`, `fact_ghl_event`, `fact_attribution_touchpoint`	Append mostly, time partitioned	Indefinite
Attribution	`attribution_touchpoints`, `attribution_conversions`, `attribution_assignments`	Many to many credit allocation across multiple models	Indefinite
Views	`mv_*`	Pre aggregated dashboard payloads	Refresh hourly or nightly

Google Ads tables (Phase 1)

Table	Grain	Phase	Notes
`google_ads_accounts`	customer_id	Exists	Practice to MCC mapping. Do not redesign.
`google_ads_campaigns`	campaign_id	1	SCD type 1 (overwrite on change), keep `updated_at`
`google_ads_ad_groups`	ad_group_id	1	SCD type 1
`google_ads_campaign_daily_metrics`	(date, customer_id, campaign_id)	1	Partitioned by month. Cost stored in micros to match API source.
`google_ads_ad_group_daily_metrics`	(date, ad_group_id)	1.5	Defer until per ad group ROI is requested
`google_ads_keywords`	criterion_id	1.5	10x row count, low Phase 1 dashboard value

DDL: `marketing.google_ads_campaign_daily_metrics` (partitioned)

CREATE TABLE marketing.google_ads_campaign_daily_metrics (
  metric_date           DATE   NOT NULL,
  customer_id           BIGINT NOT NULL,
  campaign_id           BIGINT NOT NULL,
  practice_id           UUID   NOT NULL,
  impressions           BIGINT NOT NULL DEFAULT 0,
  clicks                BIGINT NOT NULL DEFAULT 0,
  cost_micros           BIGINT NOT NULL DEFAULT 0,
  conversions           NUMERIC(12,4) NOT NULL DEFAULT 0,
  conversion_value      NUMERIC(14,4) NOT NULL DEFAULT 0,
  view_through_conv     NUMERIC(12,4) NOT NULL DEFAULT 0,
  search_impression_share NUMERIC(6,4),
  avg_cpc_micros        BIGINT,
  ctr                   NUMERIC(8,6),
  ingested_at           TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (metric_date, campaign_id),
  FOREIGN KEY (campaign_id) REFERENCES marketing.google_ads_campaigns(campaign_id),
  FOREIGN KEY (practice_id) REFERENCES shared.practices(id)
) PARTITION BY RANGE (metric_date);

CREATE INDEX idx_gads_metrics_practice_date
  ON marketing.google_ads_campaign_daily_metrics (practice_id, metric_date DESC);
CREATE INDEX idx_gads_metrics_campaign_date
  ON marketing.google_ads_campaign_daily_metrics (campaign_id, metric_date DESC);

GoHighLevel tables (Phase 1)

Table	Grain	Notes
`ghl_locations`	one per GHL sub-account	One sub-account per practice. Joined to `shared.practices`.
`ghl_pipelines`	pipeline_id	Per location, SGA standardizes names but allows variance.
`ghl_pipeline_stages`	stage_id	Ordered by `position`.
`ghl_contacts`	contact_id	Carries first touch attribution: gclid, fbclid, utm_*, source.
`ghl_opportunities`	opportunity_id	Linked to contact, pipeline, stage. Holds monetary_value.
`ghl_appointments`	appointment_id	Status: scheduled, confirmed, showed, no_show, cancelled, rescheduled.
`ghl_calls`	call_id	Twilio routed inbound calls if location is on call tracking.
`ghl_workflow_events`	event row	Entry, exit, goal-met events from automation workflows.
`ghl_messages`	message_id	Phase 2 SMS / email body. Defer for PHI review.
`ghl_webhook_events`	idempotency_key	Durable queue of received webhooks (Kafka stub).

DDL: `marketing.ghl_contacts` (attribution carrier)

CREATE TABLE marketing.ghl_contacts (
  contact_id          TEXT        NOT NULL,
  location_id         TEXT        NOT NULL,
  practice_id         UUID        NOT NULL,
  first_name          TEXT,
  last_name           TEXT,
  email_hash          BYTEA,                  -- sha256(lower(trim(email)))
  phone_hash          BYTEA,                  -- sha256(e164(phone))
  email_encrypted     BYTEA,                  -- pgcrypto, key in app layer
  phone_encrypted     BYTEA,
  source              TEXT,                   -- 'Google Ads', 'Facebook', 'Direct'
  utm_source          TEXT,
  utm_medium          TEXT,
  utm_campaign        TEXT,
  utm_content         TEXT,
  utm_term            TEXT,
  gclid               TEXT,
  fbclid              TEXT,
  wbraid              TEXT,
  gbraid              TEXT,
  tags                TEXT[]      NOT NULL DEFAULT '{}',
  date_added          TIMESTAMPTZ NOT NULL,
  first_seen_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_activity_at    TIMESTAMPTZ,
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
  raw_payload         JSONB,
  PRIMARY KEY (contact_id),
  FOREIGN KEY (location_id) REFERENCES marketing.ghl_locations(location_id),
  FOREIGN KEY (practice_id) REFERENCES shared.practices(id)
);

CREATE INDEX idx_ghl_contacts_practice_added ON marketing.ghl_contacts (practice_id, date_added DESC);
CREATE INDEX idx_ghl_contacts_gclid          ON marketing.ghl_contacts (gclid) WHERE gclid IS NOT NULL;
CREATE INDEX idx_ghl_contacts_email_hash     ON marketing.ghl_contacts (email_hash) WHERE email_hash IS NOT NULL;
CREATE INDEX idx_ghl_contacts_phone_hash     ON marketing.ghl_contacts (phone_hash) WHERE phone_hash IS NOT NULL;

Identity resolution: how the funnel chains together

The chain is Google Click (GCLID) to GHL Contact to Opportunity to Appointment to Started Treatment to Revenue.

Layer	Primary Key	Fallback	Reliability
Click to Contact	GCLID captured at form fill	Phone (E.164 normalized) for call tracked leads	High when landing pages capture GCLID, medium otherwise
Contact to Ad Spend	GCLID joined to campaign via Google Ads click view	UTM campaign text match	High
Cross system (Phase 2: Meta, GA4)	phone_hash, email_hash	First party cookie or device id	Medium

Phase 1 prerequisite

Every active Innovative Dental and Gen4 landing page form must capture gclid from URL params and forward to GHL as a contact custom field. Without this, full funnel attribution is structurally impossible regardless of warehouse design. Audit and patch in week 1, block Phase 2 attribution work until verified.

Attribution model (three table design)

Three tables let multiple attribution algorithms run over the same touchpoint history without rewriting facts.

attribution_touchpoints: every marketing touch (click, organic visit, referral) tied to a contact, with channel, source, medium, campaign, gclid, occurred_at, position.
attribution_conversions: events being attributed TO. Types: contact_created, opportunity_created, opportunity_won, appointment_booked, appointment_showed, case_started.
attribution_assignments: many to many between touchpoints and conversions, parameterized by model_name and credit_fraction (sums to 1.0 per conversion per model).

This shape supports first touch, last touch, linear, time decay, position based, and (Phase 2) data driven without reshaping data.

Materialized views (dashboard tier)

MV	Grain	Refresh	Powers
`mv_portfolio_kpi_daily`	(metric_date)	Hourly, CONCURRENTLY	Portfolio Overview tiles
`mv_practice_spend_30d`	(practice_id)	Hourly, CONCURRENTLY	Practice Detail spend tile, MoM delta
`mv_practice_funnel_30d`	(practice_id)	Hourly, CONCURRENTLY	Click to contact to appt to showed funnel
`mv_campaign_performance_30d`	(campaign_id)	Hourly, CONCURRENTLY	Campaign leaderboard
`mv_channel_roi_90d`	(practice_id, channel)	Nightly, CONCURRENTLY	Channel mix chart
`mv_attribution_assist_path`	(practice_id, model_name, path_signature)	Nightly	Multi touch path explorer (Phase 1.5)

Every MV requires a UNIQUE index on its grain key so REFRESH MATERIALIZED VIEW CONCURRENTLY works. Without this, refresh blocks dashboard reads.

Partitioning, retention, PII

Concern	Approach
Partitioning	Daily metrics and event facts partitioned by `RANGE` on date column, monthly cadence. Use `pg_partman` or a scheduled job to pre create next 3 months and detach beyond retention.
Retention (raw)	13 months hard cutoff via partition drop. Raw JSON may contain PII. Restrict `SELECT` to ETL role only.
Retention (mart, MV)	Indefinite. Practice level only, no raw PII.
PII handling	Email and phone stored as `BYTEA` sha256 hashes for joins, plus `pgcrypto` encrypted columns for display. Encryption key held in app layer, not Postgres.
PHI	Hard rule: no diagnoses, treatment plans, clinical notes, or insurance data in this schema. Phase B introduces `marketing_phi` with its own grants.

Migration plan (Drizzle, drop in)

Order	Migration	Contents
0NN	`marketing_schema_init`	`CREATE SCHEMA marketing`, grants, `dim_date`, `dim_channel`, `sync_runs`, `insights_log`
0NN+1	`google_ads_dimensions`	`google_ads_campaigns`, `google_ads_ad_groups`
0NN+2	`google_ads_facts`	Partitioned `google_ads_campaign_daily_metrics` plus first 14 monthly partitions
0NN+3	`ghl_locations_pipelines`	`ghl_locations`, `ghl_pipelines`, `ghl_pipeline_stages`
0NN+4	`ghl_contacts_opps`	`ghl_contacts`, `ghl_opportunities`
0NN+5	`ghl_appointments_calls`	`ghl_appointments`, `ghl_calls`, `ghl_workflow_events`, `ghl_webhook_events`
0NN+6	`attribution_core`	`attribution_touchpoints`, `attribution_conversions`, `attribution_assignments`, `contact_merges`
0NN+7	`attribution_functions`	`merge_contacts()`, `resolve_click_to_contact()`
0NN+8	`marketing_views`	`dim_practice` view, `dim_campaign` view
0NN+9	`marketing_mvs`	All 6 materialized views plus unique indexes
0NN+10	`raw_staging_tables`	Partitioned `raw_google_ads_`, `raw_ghl_`
0NN+11	`partition_maintenance`	pg_partman config or pl/pgsql monthly rotation job

06 · KPI Catalog

Every metric the dashboard must show, with formula and freshness

Stale threshold is the lag past which the dashboard surfaces a yellow data warning banner. If a source crosses its threshold, the page header shows the affected source with the last successful refresh timestamp.

Spend metrics

KPI	Formula	Source	Grain	Stale
Total Spend	Sum of cost across all in scope Google Ads accounts	Google Ads	Daily	36 hr
Spend by Channel	Cost grouped by channel (Search, PMax, Display)	Google Ads	Daily	36 hr
Spend by Practice	Cost grouped by customer_id, mapped to practice_id	Google Ads + practice map	Daily	36 hr
Daily Spend Trend	Spend per calendar day, rolling 90 days	Google Ads	Daily	36 hr
Pacing vs Budget	(MTD spend / days elapsed) x days in month, vs monthly budget	Google Ads + budget table	Daily	36 hr

Acquisition metrics

KPI	Formula	Source	Grain	Stale
Impressions	Count of ad impressions served	Google Ads	Daily	36 hr
Clicks	Count of ad clicks	Google Ads	Daily	36 hr
CTR	Clicks / Impressions	Derived	Daily	36 hr
CPC	Spend / Clicks	Derived	Daily	36 hr
Platform Conversions	Google Ads reported conversions (form fills, calls)	Google Ads	Daily	36 hr

Funnel metrics

KPI	Formula	Source	Grain	Stale
Leads	New contacts created in GHL with marketing source tag	GHL Webhook + REST	Hourly	2 hr
Contacts	Total contacts created (lead and non lead)	GHL	Hourly	2 hr
Opportunities	GHL opportunities at "New Patient" stage	GHL	Hourly	2 hr
Appointments Booked	GHL opportunity at "Booked" stage or appointment created	GHL	Hourly	4 hr
Appointments Showed	GHL appointment status = "Showed"	GHL	Hourly	4 hr
Started Treatments	GHL opportunity at "Treatment Started" stage	GHL custom field	Daily	24 hr
Revenue (Booked)	Sum of opportunity monetary value at "Treatment Started"	GHL custom field	Daily	24 hr

Cost efficiency metrics

KPI	Formula	Source	Grain
CPL	Spend / Leads	Derived	Daily
Cost per Booked Appt	Spend / Appointments Booked	Derived	Daily
Cost per Showed Appt	Spend / Appointments Showed	Derived	Daily
Cost per Started Case	Spend / Started Treatments	Derived	Daily
ROAS	Revenue (Booked) / Spend	Derived	Daily
Payback Period (days)	Median days from first ad click to first revenue event, per practice	Derived	Weekly

Attribution metrics

KPI	Formula	Source	Grain
First Touch Conversions	Conversions credited to first touch in 90 day window	Derived	Daily
Last Touch Conversions	Conversions credited to most recent touch before conversion	Derived	Daily
Linear Attribution Share	Equal credit split across all touches in path	Derived	Daily
Assisted Conversions	Count of conversions where channel appeared in path but not as last touch	Derived	Daily
Channel Overlap Rate	Percent of converters touched by 2 or more channels	Derived	Weekly

07 · Dashboard Information Architecture

Five views inside app.sga3p.com

View	Primary User	Primary Question	Key Tiles / Charts	Default Range
Portfolio Overview (default landing)	Dakota (growth lead)	Which practices need attention this week?	Total spend, total leads, blended CPL, total started cases, blended ROAS, all with delta vs prior period. 90 day spend and lead trend (dual axis), channel mix donut, top 5 and bottom 5 practices by CPL.	Last 30 days
Practice Detail	Dakota, regional ops	What is happening at this practice and where is the funnel breaking?	Practice tiles with portfolio median benchmarks. Funnel waterfall (Impression to Started Case). Channel breakdown table. Practice scoped anomaly feed.	Last 30 days
Channel View	Marketing strategist, agency QA	How is Google Ads performing and which campaigns drive results?	Channel spend, leads, CPL, ROAS. Sortable campaign table with drilldown to ad group. Cost vs conversion scatter (one dot per campaign).	Last 30 days
Funnel View	Marketing strategist, regional ops	Where in the patient journey are we losing the most value?	Click to lead %, lead to booked %, booked to showed %, showed to started %. Full funnel waterfall with stage drop off table by practice. Attribution model toggle.	Last 30 days
Insights and Anomalies	Dakota, regional ops	What changed and what should I do about it?	Reverse chronological feed of flagged events with practice, metric, severity, suggested action, and link to Practice Detail.	Last 14 days

Global filter bar (sticky, persists across views)

Filter	Options	Default
Time range	Today, Yesterday, Last 7 / 14 / 30 / 90 days, MTD, QTD, YTD, Custom	Last 30 days
Compare to	Previous period, Same period last year, None	Previous period
Practice	Single, multi, by region, by ownership tier (SGA / LookSee / Gen4), all active	All active
Channel	Google Search, Google PMax, Google Display, all	All
Attribution model	Last touch, Linear (Phase 1). First touch, Time decay, Position based (Phase 1.5). Data driven (Phase 2)	Last touch

08 · Insights and Anomaly Rules

What "insights across each marketing channel" actually means

An insight is a system generated, plain English statement about a measurable change or comparison that warrants action. Insights are not raw charts. They appear as cards in the Insights feed and as inline badges on Practice Detail tiles, with acknowledge and snooze (7 day) actions to keep the feed actionable.

Anomaly detection rules (Phase 1)

Trigger	Threshold	Severity	Suggested action
Daily spend spike	Day spend exceeds trailing 14 day mean by 50%+	High	Check campaign budget changes, possible runaway PMax
Daily spend drop	Day spend below trailing 14 day mean by 40%+	High	Check billing status, account suspension, accidental pause
CPL spike	7 day rolling CPL exceeds trailing 28 day rolling CPL by 60%+	Medium	Audit creative, geo targeting, landing page
Lead drop	7 day lead count below trailing 28 day median by 50%+	High	Front desk follow up issue or platform issue
Conversion rate drop	Click to lead % below practice 28 day baseline by 30%+	Medium	Landing page audit, form audit, page speed
Stale data	Source has not refreshed past stale threshold	High	Banner on every page, last refreshed timestamp shown

Comparative insight rules

Vs portfolio: "Practice X CPL of $84 is 2.4x portfolio median of $35 over the last 30 days. Investigate creative, geo targeting, or landing page."
Vs self: "Practice X CPL is up 38% week over week vs the prior 4 week average."
Vs region: "Practice X is the worst performer in its region (Valdosta cluster) on cost per started case."

Recommendation templates (Phase 1.5)

Trigger combination	Recommendation
High CPL + low CTR	"Creative refresh recommended, ad CTR is below 1.5%."
Healthy CTR + high CPL + low click to lead	"Landing page conversion issue. Audit form and page speed."
Strong leads + low booked %	"Front desk follow up issue, leads are not converting to appointments. Review GHL nurture sequence."
Healthy booked + low showed %	"Reminder cadence issue. Audit appointment confirmation SMS frequency and timing."

09 · Attribution Model Strategy

Which models, when, and why

Model	Phase	Rationale
Last touch	1 (default)	Matches what Google Ads reports natively. Lowest friction for cross checking against agency reports.
Linear	1	Equal credit across all touches in the path. Simple to explain, immediately reveals the multi touch reality of dental patient journeys.
First touch	1.5	Useful for top of funnel brand spend evaluation.
Time decay (7 day half life)	1.5	Best single number proxy for short sales cycles, which dental tends to be.
Position based (40, 20, 40)	1.5	Compromise between first and last touch.
Data driven	2 (portfolio level only)	Requires roughly 300 conversions and 3,000 ad interactions per 30 days per account. Most practices will not qualify individually. Run at portfolio level only.

The attribution model selector is a single dropdown in the global filter bar. Changing it recomputes attribution dependent KPIs (assisted conversions, channel share, ROAS by channel) in place. The selected model is shown as a persistent badge on every attribution dependent tile.

10 · Refresh Cadence

What "live update on a consistent schedule" means concretely

Source	Mechanism	Cadence	Rationale
Google Ads campaign + ad group day metrics	node-cron scheduled pull	Daily 04:00 ET (09:00 UTC), pulling T-2 through T-1	Google Ads stats finalize ~T+1, conversion adjustments arrive up to T+3, so re pull a 3 day window to absorb late conversions
Google Ads keyword and search term reports	node-cron pull	Daily 05:00 ET, T-1 only	Higher volume, less revision after T+1. Phase 1.5.
Google Ads account structure (campaigns, ad groups)	node-cron pull	Hourly	Cheap, surfaces new campaigns quickly in dashboard
GHL contacts, opportunities, appointments	Webhook (primary) + reconciliation pull (safety net)	Webhook real time, reconciliation every hour	Webhooks are eventually consistent and occasionally lost, reconciliation pull catches drift
GHL pipeline and stage definitions	node-cron pull	Daily 03:00 ET	Rarely changes
Materialized views (slow)	SQL job	Daily 04:30 ET, post Google Ads pull, CONCURRENTLY	Channel ROI 90d, attribution paths
Materialized views (fast)	SQL job	Hourly, CONCURRENTLY	Practice spend 30d, funnel 30d, campaign performance 30d

Principle: webhooks update the warehouse within seconds, scheduled pulls are the safety net. Both write to the same raw_ghl_* tables. The dedupe step in stg_* resolves conflicts using updated_at from the source.

Every dashboard page shows three timestamps in the footer: "Google Ads last refreshed," "GHL last refreshed," "Derived metrics last computed." If any source is past its stale threshold, a yellow banner appears at the top of the page with the affected source named.

11 · DevOps and Infrastructure

Secrets, scheduling, observability, backfill

Service topology on Railway

Deploy sga-marketing-ingest as a new Railway service in the same project as the existing Postgres service. Same private network (no egress cost on the Postgres connection), separate environment, isolated blast radius.

Scheduling: node-cron now, BullMQ when needed, Temporal eventually

// scheduler.ts -- the only file that changes when Temporal arrives
import cron from 'node-cron';
import { runGoogleAdsDailyPull } from './jobs/google-ads-daily';
import { runGhlReconciliation } from './jobs/ghl-reconciliation';
import { refreshMvSlow, refreshMvFast } from './jobs/mv-refresh';

export function startScheduler() {
  cron.schedule('0 9 * * *',  () => runGoogleAdsDailyPull({ window: 'T-2..T-1' }));
  cron.schedule('0 * * * *',  () => runGhlReconciliation());
  cron.schedule('30 9 * * *', () => refreshMvSlow());
  cron.schedule('0 * * * *',  () => refreshMvFast());
}

Every job body is a plain async function with no cron dependency. node-cron calls it today, BullMQ calls it tomorrow, Temporal Activity calls it the day after. The migration cost is one file.

Secrets: Railway env vars, layered with Doppler

# --- Google Ads ----------------------------------
GOOGLE_ADS_DEVELOPER_TOKEN=
GOOGLE_ADS_CLIENT_ID=
GOOGLE_ADS_CLIENT_SECRET=
GOOGLE_ADS_REFRESH_TOKEN_GEN4=
GOOGLE_ADS_REFRESH_TOKEN_SGA=          # add when SGA MCC exists
GOOGLE_ADS_MCC_CUSTOMER_ID_GEN4=
GOOGLE_ADS_MCC_CUSTOMER_ID_SGA=

# --- GoHighLevel ---------------------------------
GHL_AGENCY_API_TOKEN=                  # single agency token, not per location
GHL_WEBHOOK_SIGNING_SECRET=

# --- Database -----------------------------------
DATABASE_URL=                          # Railway internal

# --- Observability ------------------------------
SENTRY_DSN=
BETTERSTACK_SOURCE_TOKEN=

# --- Runtime ------------------------------------
NODE_ENV=production
TZ=UTC
LOG_LEVEL=info

Doppler is recommended on top of Railway env vars: single source of truth, audit log, per environment promotion. Do not introduce AWS Secrets Manager in Phase 1, that is a Phase 2 migration.

OAuth refresh and rate limits

Google Ads: store one refresh token per MCC (Gen4, future SGA). On invalid_grant, write status = 'token_error' to sync_runs, fire Sentry fatal alert, skip remaining practices for that MCC. Do not retry in same run.
Google Ads quotas: apply for Standard Access immediately (no daily op cap). Basic Access (15,000 ops/day) covers 60 practices comfortably as a fallback. Batch reports across the MCC hierarchy.
GHL: single Agency API token with locationId query param per request. Do not store 260 location keys. v2 rate limit is 100 req per 10 sec per location, plenty of headroom.

GHL webhook receiver

POST /webhooks/ghl on the worker, exposed via a Railway public URL (separate hostname from api.sga3p.com). HMAC verification using x-ghl-signature. Synchronous insert into marketing.ghl_webhook_events (the durable queue). Downstream processor reads from that table on the hourly reconciliation job and marks rows processed = true. This gives replay capability without any new infrastructure.

Observability stack (Phase 1, lightweight)

Structured JSON logs with pino, drained to Better Stack via Railway log drain
sync_runs table: row per job per practice per source per day with status, rows ingested, rows expected, duration, error message, api ops consumed
Sentry for error capture and alerting (free tier is enough)
Better Stack for uptime monitoring of /webhooks/ghl/health and /healthz
Phase 2: OpenTelemetry traces, Grafana via Railway addon

Three SLOs to commit to in Phase 1

SLO	Target
Dashboard p95 latency	< 1.5 seconds
Data freshness for Google Ads	< 26 hours from event to dashboard
Data freshness for GHL via webhook	< 5 minutes from event to dashboard

Backfill workflow

POST /internal/backfill on the worker (static internal API key, not exposed). Body: { practiceId, source, startDate, endDate }. Writes one row per date-source combination to backfill_queue. A backfill runner processes oldest first, respects rate limits, marks rows done with row count. Idempotent on (practice_id, source, data_date). 90 days x 60 practices x 200ms delay ≈ 18 minutes wall time.

Cost projection (Phase 1)

Item	Monthly
`sga-marketing-ingest` on Railway (512MB, Starter)	$5 to $10
Postgres storage growth (~500MB year 1)	$0 to $5
Doppler Team plan (free tier)	$0
Sentry Developer plan (free tier)	$0
Better Stack (free tier)	$0
Total Phase 1	$5 to $15

12 · Phase 1 Roadmap

6 weeks, week by week, with a walking skeleton in week 1

Walking skeleton first. Prove the full path end to end with one practice, one day, one number on screen, then replicate breadth and depth from there.

Week 1: Walking Skeleton

Wk 1

Provision sga-marketing-ingest Railway service. Wire DATABASE_URL to internal Postgres.
Apply migrations 0NN through 0NN+2 (marketing schema, Google Ads dimensions, partitioned daily metrics).
Pull one Google Ads account, one day, write to raw_googleads_campaign_daily, transform to marketing.google_ads_campaign_daily_metrics.
Add /api/v1/marketing/practices/:id/summary route on asset-registry.
Render one number ("Cost yesterday") on a new /marketing route in app.sga3p.com.
Audit: every active landing page captures gclid from URL params and forwards to GHL contact custom field. Block Phase 2 attribution work until verified.

Week 2: Google Ads Breadth

Wk 2

Apply remaining Google Ads migrations (campaigns, ad groups). Backfill 90 days x 60 practices.
Schedule nightly pull at 09:00 UTC. Implement OAuth refresh handling and token_error alerting.
Build mv_portfolio_kpi_daily and mv_practice_spend_30d. Wire to summary endpoints.
Ship Portfolio Overview view with spend, leads (placeholder), CPL (placeholder), ROAS (placeholder), 90 day trend chart.

Week 3: GHL Ingest

Wk 3

Apply GHL migrations (locations, pipelines, contacts, opportunities, appointments).
Stand up POST /webhooks/ghl with HMAC verification and durable queue table.
Build hourly reconciliation pull using GHL Agency API token.
Backfill 90 days of contacts, opportunities, appointments per practice.
Wire leads, opportunities, appointments tiles to real data. Add stale data banner.

Week 4: Funnel and Practice Detail

Wk 4

Build mv_practice_funnel_30d and mv_campaign_performance_30d.
Ship Practice Detail view (tiles with portfolio benchmarks, funnel waterfall, channel breakdown).
Ship Channel View with sortable campaign table and cost vs conversion scatter.
Ship Funnel View with full waterfall and stage drop off table.

Week 5: Attribution and Insights

Wk 5

Apply attribution migrations. Implement nightly transform building attribution_touchpoints from GHL contacts and Google Ads click view.
Implement last touch and linear attribution computations. Wire UI toggle to recompute attribution dependent tiles.
Implement anomaly rule engine (5 rules from section 8). Write to insights_log.
Ship Insights and Anomalies view with severity, suggested action, snooze.

Week 6: Hardening and Launch

Wk 6

Cross check dashboard CPL and spend against LookSee monthly report for 5 practices. Variance must be under 5%, ideally under 2%.
Performance pass: Fastify response cache on summary endpoints, dashboard p95 under 1.5s.
Observability shake out: trigger every Sentry alert path manually, confirm Better Stack uptime monitor.
Documentation: data dictionary, runbook, on call playbook.
Internal launch to growth team. Schedule Phase 1.5 kickoff.

13 · MoSCoW Prioritization

Phase 1 inclusion decisions

Item	Priority	Rationale
Portfolio Overview view	Must	Core reason the project exists.
Practice Detail view	Must	Required to act on portfolio level signals.
Channel View (Google Ads)	Must	Phase 1 channel scope, needed for QA.
Funnel View	Must	Full funnel attribution is the user's stated ultimate goal.
Google Ads ingest (campaign daily)	Must	Phase 1 source.
GHL ingest (webhook + REST reconciliation)	Must	Phase 1 source.
Spend, CPL, ROAS tiles	Must	Tier 1 metrics for every persona.
Last touch + Linear attribution	Must	Default model + multi touch reality.
Anomaly detection (5 rules)	Must	"Insights across each channel" requirement.
Filters: time, practice, channel	Must	Cannot use the dashboard without these.
Stale data banner + last refreshed timestamps	Must	Trust requirement.
Insights feed page	Should	High value, but tile badges cover urgent path.
Comparative insights vs portfolio median	Should	Strong differentiator but not blocking.
Pacing vs budget	Should	Currently tracked manually.
Recommendation rule engine	Should	Templates can ship in 1.1.
First touch + time decay attribution	Could	Defer to Phase 1.5.
Region rollups	Could	Useful but small audience initially.
Ownership tier filter (SGA / LookSee / Gen4)	Could	Practice multi select covers the use case in Phase 1.
Practice manager login + per practice RBAC	Won't	Phase 2.
Meta Ads, GA4, GBP, Yelp	Won't	Phase 2.
Predictive forecasting + budget optimization	Won't	Phase 3.
Write back to Google Ads	Won't	Read only, regulatory and permissions risk.
PHI / clinical data	Won't	Phase B with BAAs.
Data driven attribution	Won't	Insufficient per practice volume, defer to Phase 2 portfolio.
Agency comparison scorecards	Won't	Defer until data trust is established.

14 · Acceptance Criteria

Given / When / Then scenarios for the highest value flows

Feature: Portfolio Overview load performance and content
  Scenario: Growth lead opens Portfolio Overview on Monday morning
    Given I am authenticated as the growth lead
    And there are 50 or more practices with active spend in the last 30 days
    When I navigate to Portfolio Overview
    Then I see total spend, total leads, blended CPL, total started cases, and blended ROAS within 2 seconds
    And each tile shows delta versus the previous 30 day period

Feature: Funnel waterfall on Practice Detail
  Scenario: Diagnosing where a practice is losing patients
    Given I have selected a single practice on Practice Detail
    When the funnel waterfall renders
    Then I see counts and conversion percentages at each stage from Impression through Started Case
    And the largest drop off stage is visually highlighted

Feature: Attribution model toggle
  Scenario: Switching from last touch to linear attribution
    Given I am on Funnel View with attribution set to Last Touch
    When I change the attribution selector to Linear
    Then ROAS, channel share, and assisted conversion tiles recompute within 3 seconds
    And every attribution dependent tile shows a "Linear" badge

Feature: Anomaly detection surfaces a spend spike
  Scenario: A practice spends 60 percent above trailing mean
    Given Practice X has a trailing 14 day mean daily spend of $100
    And Practice X spent $180 yesterday
    When the nightly insights job runs at 05:00 ET
    Then a High severity insight card appears in the Insights feed
    And the card states the practice name, the metric, the percentage change, and a suggested action

Feature: GHL webhook freshness
  Scenario: A new lead is created in GHL
    Given a contact is created in GHL with a marketing source tag at 14:00 ET
    When I refresh Portfolio Overview at 14:05 ET
    Then the lead count tile reflects the new lead
    And the GHL last refreshed timestamp is within 5 minutes of now

Feature: Google Ads stale data warning
  Scenario: Google Ads ingest fails overnight
    Given the Google Ads nightly refresh failed at 04:00 ET
    And it is now 14:00 ET
    When I open any dashboard page
    Then a yellow banner appears at the top of the page
    And the banner states "Google Ads data is stale, last refreshed [timestamp]"

Feature: Independent verification of agency numbers
  Scenario: QA reviewer cross checks LookSee monthly report
    Given LookSee reported a CPL of $42 for Practice Y in April
    When I open Practice Detail for Practice Y with time range April 1 to April 30
    Then the dashboard CPL is displayed and is within 2 percent of the agency reported figure
    And if the variance exceeds 5 percent, the tile shows a variance flag

Feature: Comparative insight against portfolio median
  Scenario: A practice is materially above portfolio CPL
    Given Practice X has a 30 day CPL of $84
    And the portfolio median 30 day CPL is $35
    When I open Practice Detail for Practice X
    Then a comparative insight card states "CPL is 2.4x portfolio median"
    And a recommendation suggests creative or targeting investigation

15 · Architecture Decision Records

The five most consequential calls, captured

Use existing Postgres on Railway as the marketing warehouse

ADR-001

Status

Accepted (pending Dakota confirmation per section 02)

Decision

Use the existing Railway Postgres for raw, staging, mart, and materialized view layers. Defer ClickHouse to when measured thresholds are breached (single MV refresh > 5 min, dashboard p95 > 2s, total mart rows > 500M).

Alternatives

ClickHouse (premature for current volume). BigQuery, Snowflake (rejected by user direction).

Consequences

Zero new infra, single backup story, joins between marketing and content engine data trivial. Materialized views required for performance. If volume grows 100x faster than projected, migration cost is real but the layered prefix convention makes a future ClickHouse copy of mart_* straightforward.

Reversibility

Reversible. The mart layer is portable.

Standalone sga-marketing-ingest service rather than endpoints on asset-registry

ADR-002

Status

Accepted

Decision

New Fastify service, separate Railway deployment, shares packages/shared-types for Zod and Drizzle.

Alternatives

Endpoints on asset-registry (mixes concerns), workers in same repo (couples deploys), Railway native Cron Jobs (weak observability).

Consequences

Independent scaling, blast radius isolation, OAuth secrets contained. One more service to operate. Bootstrap code duplicated, mitigated by shared package.

Reversibility

Reversible but costly after data volume builds.

node-cron in Phase 1, BullMQ in Phase 1.5, Temporal in Phase 2

ADR-003

Status

Accepted

Decision

Phase 1 uses node-cron in process. Job bodies wrap as plain async functions to make the BullMQ and Temporal swaps mechanical.

Alternatives

Railway native cron (no retries, opaque), BullMQ on day 1 (premature complexity, requires Redis addon), Temporal now (provisioning is its own project).

Consequences

Ships fast. Not the eventual end state. Migration work in Phase 1.5 and Phase 2.

Reversibility

Reversible. Migration path documented.

Attribution lives in the warehouse, not at ingest

ADR-004

Status

Accepted

Decision

Ingest writes raw and staging. A nightly transform job builds attribution_touchpoints and assignments from completed staging data. Three table model supports multiple attribution algorithms over the same touchpoint history.

Alternatives

Match at ingest (partial data problem), match at query time (slow, repeated work, inconsistent results across queries).

Consequences

Deterministic, replayable, single place to evolve match logic. Attribution is up to 24 hours behind. Late arriving GCLIDs miss the nightly window, mitigated by re running attribution over a rolling 7 day window.

Reversibility

Reversible. Match logic is in one job.

Marketing dashboard reads through asset-registry, not direct Postgres

ADR-005

Status

Accepted

Decision

Add /api/v1/marketing/* route group to asset-registry. Same auth, same Drizzle, separate route module. Zod schemas in packages/shared-types.

Alternatives

Separate API service (premature), direct Postgres from browser (unsafe), GraphQL gateway (out of scope).

Consequences

Zero new deploy unit, reuse Bearer auth, easy Cognito migration later. Asset registry now has two domains, watch for drift. Revisit at Phase 2 whether to split.

Reversibility

Reversible. Route group can be lifted into its own service.

16 · Risks and Mitigations

Top 6 ranked, what breaks and what we do about it

Rank	Risk	Likelihood	Impact	Mitigation
1	GCLID not captured on landing pages, so attribution is structurally impossible	High	Critical	Audit and patch every active landing page in week 1. Block Phase 2 attribution work until verified. Innovative Dental and Gen4 pages are highest priority.
2	OAuth refresh token revokes silently, sync fails for days unnoticed	Medium	High	`token_error` status triggers Sentry fatal alert. Dashboard shows last successful sync date per practice. Healthcheck pings Ads API daily.
3	MCC consolidation incomplete, two separate credential sets to manage	High (current state)	Medium	Design for multiple MCCs from day one (`GOOGLE_ADS_REFRESH_TOKEN_GEN4` / `_SGA` pattern). Do not assume single MCC.
4	Postgres volume grows faster than projected, materialized view refresh becomes the bottleneck	Low	High	Threshold based monitoring. ClickHouse migration plan documented. `REFRESH MATERIALIZED VIEW CONCURRENTLY` required, unique index on every MV.
5	GHL webhooks unreliable (drops, missed events, signature drift)	Medium	Medium	Hourly reconciliation pull is the safety net. Drift alerts. Durable queue table preserves replay capability.
6	Ad account access revoked by agency mid sync	Low	Medium	Log `access_revoked` in `sync_runs`, not as hard failure. Sentry alert, do not halt rest of run. Dual link strategy reduces blast radius.

17 · Action Items

What to confirm and what to start, this week

#	Action	Owner	Effort	Blocks
1	Confirm or course correct stack divergence (section 02)	Dakota	S (15 min)	Everything
2	Apply for Google Ads Standard Access developer token	Dakota	S (form, 24 to 72 hr review)	Production sync
3	Audit landing page GCLID capture (Innovative Dental, Gen4)	Dakota or web team	M (1 to 3 days, per page)	Phase 2 attribution
4	Confirm GHL Agency API token availability and webhook signing secret	Dakota	S (1 day)	GHL ingest
5	Provision `sga-marketing-ingest` Railway service in Content Engine project	Implementation	S (1 hr)	Walking skeleton
6	Provision Doppler project, populate Phase 1 secrets	Implementation	S (2 hr)	OAuth flows
7	Identify the 50 to 60 paid media active practices for Phase 1 scope	Dakota	S (existing list refresh)	Backfill prioritization
8	Decide MCC consolidation strategy (Gen4 only Phase 1 vs SGA MCC by Week 4)	Dakota	S (1 day)	Token strategy
9	Kick off week 1 walking skeleton sprint	Implementation	L (5 days)	Subsequent weeks

Recommended next move

Confirm the stack divergence (item 1) and authorize the walking skeleton sprint (item 9). Items 2 through 8 run in parallel during week 1, none of them block the walking skeleton itself. Within 5 working days you can have one number from one practice on screen at app.sga3p.com/marketing, proving the full path end to end.

Marketing Data Warehouse and Full Funnel Attribution Dashboard

What we are building, why, and the call we have to make first

Where this plan deviates from the 2026-04-20 settled stack

Phase boundaries and what ships when

Phase 1 (this plan, 6 weeks)

Phase 1.5 (immediate follow up, 3 weeks)

Phase 2 (Q3, scope TBD)

Phase B (separate ADR required)

Out of scope for Phase 1

End to end pipeline, ingestion service shape, dashboard data layer

Why a new dedicated worker, not endpoints on asset-registry

Dashboard data layer

Postgres design: raw, staging, mart, materialized

Schema layout

Google Ads tables (Phase 1)

DDL: marketing.google_ads_campaign_daily_metrics (partitioned)

GoHighLevel tables (Phase 1)

DDL: marketing.ghl_contacts (attribution carrier)

Identity resolution: how the funnel chains together

Attribution model (three table design)

Materialized views (dashboard tier)

Partitioning, retention, PII

Migration plan (Drizzle, drop in)

Every metric the dashboard must show, with formula and freshness

Spend metrics

Acquisition metrics

Funnel metrics

Cost efficiency metrics

Attribution metrics

Five views inside app.sga3p.com

Global filter bar (sticky, persists across views)

What "insights across each marketing channel" actually means

Anomaly detection rules (Phase 1)

Comparative insight rules

Recommendation templates (Phase 1.5)

Which models, when, and why

What "live update on a consistent schedule" means concretely

Secrets, scheduling, observability, backfill

Service topology on Railway

Scheduling: node-cron now, BullMQ when needed, Temporal eventually

Secrets: Railway env vars, layered with Doppler

OAuth refresh and rate limits

GHL webhook receiver

Observability stack (Phase 1, lightweight)

Three SLOs to commit to in Phase 1

Backfill workflow

Cost projection (Phase 1)

6 weeks, week by week, with a walking skeleton in week 1

Phase 1 inclusion decisions

Given / When / Then scenarios for the highest value flows

The five most consequential calls, captured

Top 6 ranked, what breaks and what we do about it

What to confirm and what to start, this week

DDL: `marketing.google_ads_campaign_daily_metrics` (partitioned)

DDL: `marketing.ghl_contacts` (attribution carrier)