SGA Dental Partners · Growth Platform

Marketing Data Warehouse and Full Funnel Attribution Dashboard

Phase 1 project plan. Daily refreshed Google Ads and GoHighLevel data unified into Railway Postgres, surfaced as a dashboard inside the existing SGA3P content engine, with full funnel attribution as the Phase 2 target.

Date
2026-05-06
Prepared For
Dakota Milner, SGA Growth
Phase
1 (Build, 6 weeks)
Status
Approved, in build (2026-05-06)
01 · Executive Summary

What we are building, why, and the call we have to make first

SGA operates 260 practices. Roughly 50 to 60 actively run paid media. Today there is no automated way to see spend, leads, cost per lead, or ROAS by practice without logging into 50 different ad accounts and CRMs. This plan stands up a single warehouse, a daily refresh, and a dashboard inside SGA3P that the growth team can open on Monday morning and act from.

Phase 1 Scope
Google Ads + GHL
Campaign daily grain for Google Ads. Webhook plus reconciliation pull for GHL. Dashboard with 5 views inside app.sga3p.com.
Target Ship
6 weeks
Walking skeleton end to end in week 1. Per practice tiles, funnel, and Insights feed by week 6. Phase 1.5 follows immediately.
Decision Confirmed
Postgres on Railway
Stack divergence approved 2026-05-06. ClickHouse stays as future option if cheap once volume thresholds breach. See section 02.
The headline
A standalone Fastify worker pulls Google Ads nightly and listens to GHL webhooks in real time. Data lands in a new marketing schema on the existing Railway Postgres. A new /api/v1/marketing/* route group on asset-registry serves materialized views to a new dashboard route inside the React content engine. BullMQ replaces node-cron when volume justifies it. Temporal replaces BullMQ when AWS arrives.
02 · Stack Divergence

Where this plan deviates from the 2026-04-20 settled stack

The settled stack memo locked in ClickHouse for analytics, Power BI for dashboards, and AWS for hosting. This plan diverges on two of those choices, deliberately, because Dakota directed the change in the project-kickoff conversation. Calling it out so the deviation is conscious, not drift.

ConcernSettled Stack (2026-04-20)This PlanRationale for the change
Analytics storeClickHouse Cloud or self hosted on AWSPostgres on Railway, single instanceClickHouse hosting was blocked. Phase 1 volume (low millions of rows per year) fits Postgres comfortably with partitioning and materialized views. Documented thresholds for when to migrate.
Dashboard surfacePower BI semantic modelsNew section inside app.sga3p.com (React + Vite + Tailwind)Power BI provisioning is not started. The content engine is already in production at sga3p.com with auth, layout, and design system. Adds zero new vendor.
Workflow orchestrationTemporalnode-cron in Phase 1, BullMQ in Phase 1.5, Temporal in Phase 2Temporal is not provisioned. node-cron is sufficient for nightly batch. Job bodies wrap as plain async functions so Temporal swap is mechanical.
CloudAWS (RDS, MSK, ECS)Railway through Phase 2Railway is where SGA Content Engine, asset-registry, and Postgres already live. No reason to split infrastructure mid build.
Event busKafka MSKPostgres staging table acting as durable queue (Kafka stub)Same pattern, simpler runtime. When Kafka MSK is real, the webhook handler drops a topic message instead of a row.
Approved 2026-05-06
Dakota confirmed the divergence: build on Railway Postgres now. ClickHouse remains an option for the future if cheap and once measured volume thresholds breach (single MV refresh > 5 min, dashboard p95 > 2s, total mart rows > 500M). The mart layer prefix convention keeps a future ClickHouse copy of mart_* mechanical.
03 · Goal and Scope

Phase boundaries and what ships when

Phase 1 (this plan, 6 weeks)

Google Ads campaign daily metrics. GHL contacts, opportunities, appointments via webhook plus hourly reconciliation. Five dashboard views: Portfolio Overview, Practice Detail, Channel View, Funnel View, Insights and Anomalies. Last touch and linear attribution. Anomaly rules. Stale data banners.

Phase 1.5 (immediate follow up, 3 weeks)

Google Ads keyword and ad group day metrics. Recommendation rule engine. First touch and time decay attribution. Pacing vs budget. Region rollups. BullMQ replaces node-cron once webhook reconciliation queue grows.

Phase 2 (Q3, scope TBD)

Meta Ads, GA4, Google Business Profile, call tracking. Multi touch attribution with revenue join. Practice manager login (RBAC). Migrate scheduler to Temporal. Evaluate ClickHouse against measured volume thresholds.

Phase B (separate ADR required)

PHI data: case started events from PMS, treatment level revenue, insurance claims. Triggers HIPAA review, BAAs, and a dedicated marketing_phi schema with stricter grants.

Out of scope for Phase 1

04 · System Architecture

End to end pipeline, ingestion service shape, dashboard data layer

EXTERNAL SOURCES Google Ads API GoHighLevel API + Webhooks | | v v +-----------------------------------------------------+ | sga-marketing-ingest (new Fastify worker svc) | | . node-cron schedules (nightly Google Ads pulls) | | . GHL webhook receiver (HMAC verified) | | . OAuth refresh handling, rate limit guards | | . Idempotent writers, sync_runs telemetry | +-----------------------------------------------------+ | | v v +-----------------------------------------------------+ | Postgres (Railway, internal-only, shared) | | . marketing.raw_* (append-only, JSONB) | | . marketing.stg_* (typed, deduplicated) | | . marketing.mart_* (dim/fact, time partitioned) | | . marketing.mv_* (materialized views) | +-----------------------------------------------------+ | v +-----------------------------------------------------+ | asset-registry (existing Fastify API) | | . NEW route group: /api/v1/marketing/* | | . Drizzle reads against mart_* and mv_* | | . Zod response schemas in packages/shared-types | | . 5-minute response cache on summary endpoints | +-----------------------------------------------------+ | v +-----------------------------------------------------+ | app.sga3p.com (React + Vite + Tailwind) | | . /marketing route section (5 views) | | . React Query against /api/v1/marketing/* | | . Recharts for visualization | +-----------------------------------------------------+

Why a new dedicated worker, not endpoints on asset-registry

Asset registry is a low latency request response API powering the content engine UI. Ingestion is long running, bursty, OAuth bearing, and rate limited by upstream APIs. Mixing them violates single responsibility, lets a runaway ingest loop choke the API, and bleeds OAuth secrets into the wrong service. The duplication cost is small because both services share packages/shared-types for Zod schemas and Drizzle tables.

OptionProsConsVerdict
(a) Endpoints on asset-registryZero new infra, reuse DrizzleMixes concerns, OAuth scope bleed, scaling decisions coupledReject
(b) Worker added inside asset-registry repoCode colocatedSame deploy unit, same memory pressure, restart of API kills mid ingestReject
(c) Standalone sga-marketing-ingest serviceIndependent deploy and scale, blast radius isolated, OAuth contained, clean Temporal seamOne more Railway serviceAccept
(d) Railway native Cron JobsZero infraEphemeral containers, weak observability, no in flight stateReject

Dashboard data layer

Add a /api/v1/marketing/* route group to the existing asset-registry service. Same auth (Bearer demo password now, Cognito later), same Drizzle, separate route module. The React app does not hit Postgres directly. Phase 1 routes:

MethodPathReads FromPurpose
GET/api/v1/marketing/portfolio/summary?range=30dmv_portfolio_kpi_dailyPortfolio Overview top tiles
GET/api/v1/marketing/practices/:id/summary?range=30dmv_practice_spend_30dPractice Detail tiles
GET/api/v1/marketing/practices/:id/funnel?range=30dmv_practice_funnel_30dFunnel waterfall
GET/api/v1/marketing/practices/:id/campaigns?range=30dmv_campaign_performance_30dCampaign table
GET/api/v1/marketing/insights?severity=high&range=14dmarketing.insights_logInsights and Anomalies feed
GET/api/v1/marketing/etl/statusmarketing.sync_runsStale data banner, last refreshed timestamps

All response shapes live in packages/shared-types/src/marketing.ts as Zod schemas. The React app imports the same types. Fastify response cache holds summary endpoints for 5 minutes since materialized views refresh on the order of hours.

05 · Database Schema

Postgres design: raw, staging, mart, materialized

All marketing objects live in a single marketing schema on the existing Railway Postgres, alongside shared (practices). The design separates four concerns: operational sync state, raw API ingest, conformed dimensions and facts, and materialized views for the dashboard.

Schema layout

GroupPrefixPurposeRetention
Operationalgoogle_ads_accounts, ghl_locations, sync_runs, insights_logAccount links, sync state, error log, generated insightsIndefinite
Rawraw_google_ads_*, raw_ghl_*Untransformed JSON dumps (replay source of truth)13 months hard cutoff
Dimensionsdim_date, dim_channel, dim_campaign, dim_practice (view)Stable lookups for joinsIndefinite
Factsgoogle_ads_campaign_daily_metrics, fact_ghl_event, fact_attribution_touchpointAppend mostly, time partitionedIndefinite
Attributionattribution_touchpoints, attribution_conversions, attribution_assignmentsMany to many credit allocation across multiple modelsIndefinite
Viewsmv_*Pre aggregated dashboard payloadsRefresh hourly or nightly

Google Ads tables (Phase 1)

TableGrainPhaseNotes
google_ads_accountscustomer_idExistsPractice to MCC mapping. Do not redesign.
google_ads_campaignscampaign_id1SCD type 1 (overwrite on change), keep updated_at
google_ads_ad_groupsad_group_id1SCD type 1
google_ads_campaign_daily_metrics(date, customer_id, campaign_id)1Partitioned by month. Cost stored in micros to match API source.
google_ads_ad_group_daily_metrics(date, ad_group_id)1.5Defer until per ad group ROI is requested
google_ads_keywordscriterion_id1.510x row count, low Phase 1 dashboard value

DDL: marketing.google_ads_campaign_daily_metrics (partitioned)

CREATE TABLE marketing.google_ads_campaign_daily_metrics (
  metric_date           DATE   NOT NULL,
  customer_id           BIGINT NOT NULL,
  campaign_id           BIGINT NOT NULL,
  practice_id           UUID   NOT NULL,
  impressions           BIGINT NOT NULL DEFAULT 0,
  clicks                BIGINT NOT NULL DEFAULT 0,
  cost_micros           BIGINT NOT NULL DEFAULT 0,
  conversions           NUMERIC(12,4) NOT NULL DEFAULT 0,
  conversion_value      NUMERIC(14,4) NOT NULL DEFAULT 0,
  view_through_conv     NUMERIC(12,4) NOT NULL DEFAULT 0,
  search_impression_share NUMERIC(6,4),
  avg_cpc_micros        BIGINT,
  ctr                   NUMERIC(8,6),
  ingested_at           TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (metric_date, campaign_id),
  FOREIGN KEY (campaign_id) REFERENCES marketing.google_ads_campaigns(campaign_id),
  FOREIGN KEY (practice_id) REFERENCES shared.practices(id)
) PARTITION BY RANGE (metric_date);

CREATE INDEX idx_gads_metrics_practice_date
  ON marketing.google_ads_campaign_daily_metrics (practice_id, metric_date DESC);
CREATE INDEX idx_gads_metrics_campaign_date
  ON marketing.google_ads_campaign_daily_metrics (campaign_id, metric_date DESC);

GoHighLevel tables (Phase 1)

TableGrainNotes
ghl_locationsone per GHL sub-accountOne sub-account per practice. Joined to shared.practices.
ghl_pipelinespipeline_idPer location, SGA standardizes names but allows variance.
ghl_pipeline_stagesstage_idOrdered by position.
ghl_contactscontact_idCarries first touch attribution: gclid, fbclid, utm_*, source.
ghl_opportunitiesopportunity_idLinked to contact, pipeline, stage. Holds monetary_value.
ghl_appointmentsappointment_idStatus: scheduled, confirmed, showed, no_show, cancelled, rescheduled.
ghl_callscall_idTwilio routed inbound calls if location is on call tracking.
ghl_workflow_eventsevent rowEntry, exit, goal-met events from automation workflows.
ghl_messagesmessage_idPhase 2 SMS / email body. Defer for PHI review.
ghl_webhook_eventsidempotency_keyDurable queue of received webhooks (Kafka stub).

DDL: marketing.ghl_contacts (attribution carrier)

CREATE TABLE marketing.ghl_contacts (
  contact_id          TEXT        NOT NULL,
  location_id         TEXT        NOT NULL,
  practice_id         UUID        NOT NULL,
  first_name          TEXT,
  last_name           TEXT,
  email_hash          BYTEA,                  -- sha256(lower(trim(email)))
  phone_hash          BYTEA,                  -- sha256(e164(phone))
  email_encrypted     BYTEA,                  -- pgcrypto, key in app layer
  phone_encrypted     BYTEA,
  source              TEXT,                   -- 'Google Ads', 'Facebook', 'Direct'
  utm_source          TEXT,
  utm_medium          TEXT,
  utm_campaign        TEXT,
  utm_content         TEXT,
  utm_term            TEXT,
  gclid               TEXT,
  fbclid              TEXT,
  wbraid              TEXT,
  gbraid              TEXT,
  tags                TEXT[]      NOT NULL DEFAULT '{}',
  date_added          TIMESTAMPTZ NOT NULL,
  first_seen_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_activity_at    TIMESTAMPTZ,
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
  raw_payload         JSONB,
  PRIMARY KEY (contact_id),
  FOREIGN KEY (location_id) REFERENCES marketing.ghl_locations(location_id),
  FOREIGN KEY (practice_id) REFERENCES shared.practices(id)
);

CREATE INDEX idx_ghl_contacts_practice_added ON marketing.ghl_contacts (practice_id, date_added DESC);
CREATE INDEX idx_ghl_contacts_gclid          ON marketing.ghl_contacts (gclid) WHERE gclid IS NOT NULL;
CREATE INDEX idx_ghl_contacts_email_hash     ON marketing.ghl_contacts (email_hash) WHERE email_hash IS NOT NULL;
CREATE INDEX idx_ghl_contacts_phone_hash     ON marketing.ghl_contacts (phone_hash) WHERE phone_hash IS NOT NULL;

Identity resolution: how the funnel chains together

The chain is Google Click (GCLID) to GHL Contact to Opportunity to Appointment to Started Treatment to Revenue.

LayerPrimary KeyFallbackReliability
Click to ContactGCLID captured at form fillPhone (E.164 normalized) for call tracked leadsHigh when landing pages capture GCLID, medium otherwise
Contact to Ad SpendGCLID joined to campaign via Google Ads click viewUTM campaign text matchHigh
Cross system (Phase 2: Meta, GA4)phone_hash, email_hashFirst party cookie or device idMedium
Phase 1 prerequisite
Every active Innovative Dental and Gen4 landing page form must capture gclid from URL params and forward to GHL as a contact custom field. Without this, full funnel attribution is structurally impossible regardless of warehouse design. Audit and patch in week 1, block Phase 2 attribution work until verified.

Attribution model (three table design)

Three tables let multiple attribution algorithms run over the same touchpoint history without rewriting facts.

This shape supports first touch, last touch, linear, time decay, position based, and (Phase 2) data driven without reshaping data.

Materialized views (dashboard tier)

MVGrainRefreshPowers
mv_portfolio_kpi_daily(metric_date)Hourly, CONCURRENTLYPortfolio Overview tiles
mv_practice_spend_30d(practice_id)Hourly, CONCURRENTLYPractice Detail spend tile, MoM delta
mv_practice_funnel_30d(practice_id)Hourly, CONCURRENTLYClick to contact to appt to showed funnel
mv_campaign_performance_30d(campaign_id)Hourly, CONCURRENTLYCampaign leaderboard
mv_channel_roi_90d(practice_id, channel)Nightly, CONCURRENTLYChannel mix chart
mv_attribution_assist_path(practice_id, model_name, path_signature)NightlyMulti touch path explorer (Phase 1.5)

Every MV requires a UNIQUE index on its grain key so REFRESH MATERIALIZED VIEW CONCURRENTLY works. Without this, refresh blocks dashboard reads.

Partitioning, retention, PII

ConcernApproach
PartitioningDaily metrics and event facts partitioned by RANGE on date column, monthly cadence. Use pg_partman or a scheduled job to pre create next 3 months and detach beyond retention.
Retention (raw)13 months hard cutoff via partition drop. Raw JSON may contain PII. Restrict SELECT to ETL role only.
Retention (mart, MV)Indefinite. Practice level only, no raw PII.
PII handlingEmail and phone stored as BYTEA sha256 hashes for joins, plus pgcrypto encrypted columns for display. Encryption key held in app layer, not Postgres.
PHIHard rule: no diagnoses, treatment plans, clinical notes, or insurance data in this schema. Phase B introduces marketing_phi with its own grants.

Migration plan (Drizzle, drop in)

OrderMigrationContents
0NNmarketing_schema_initCREATE SCHEMA marketing, grants, dim_date, dim_channel, sync_runs, insights_log
0NN+1google_ads_dimensionsgoogle_ads_campaigns, google_ads_ad_groups
0NN+2google_ads_factsPartitioned google_ads_campaign_daily_metrics plus first 14 monthly partitions
0NN+3ghl_locations_pipelinesghl_locations, ghl_pipelines, ghl_pipeline_stages
0NN+4ghl_contacts_oppsghl_contacts, ghl_opportunities
0NN+5ghl_appointments_callsghl_appointments, ghl_calls, ghl_workflow_events, ghl_webhook_events
0NN+6attribution_coreattribution_touchpoints, attribution_conversions, attribution_assignments, contact_merges
0NN+7attribution_functionsmerge_contacts(), resolve_click_to_contact()
0NN+8marketing_viewsdim_practice view, dim_campaign view
0NN+9marketing_mvsAll 6 materialized views plus unique indexes
0NN+10raw_staging_tablesPartitioned raw_google_ads_*, raw_ghl_*
0NN+11partition_maintenancepg_partman config or pl/pgsql monthly rotation job
06 · KPI Catalog

Every metric the dashboard must show, with formula and freshness

Stale threshold is the lag past which the dashboard surfaces a yellow data warning banner. If a source crosses its threshold, the page header shows the affected source with the last successful refresh timestamp.

Spend metrics

KPIFormulaSourceGrainStale
Total SpendSum of cost across all in scope Google Ads accountsGoogle AdsDaily36 hr
Spend by ChannelCost grouped by channel (Search, PMax, Display)Google AdsDaily36 hr
Spend by PracticeCost grouped by customer_id, mapped to practice_idGoogle Ads + practice mapDaily36 hr
Daily Spend TrendSpend per calendar day, rolling 90 daysGoogle AdsDaily36 hr
Pacing vs Budget(MTD spend / days elapsed) x days in month, vs monthly budgetGoogle Ads + budget tableDaily36 hr

Acquisition metrics

KPIFormulaSourceGrainStale
ImpressionsCount of ad impressions servedGoogle AdsDaily36 hr
ClicksCount of ad clicksGoogle AdsDaily36 hr
CTRClicks / ImpressionsDerivedDaily36 hr
CPCSpend / ClicksDerivedDaily36 hr
Platform ConversionsGoogle Ads reported conversions (form fills, calls)Google AdsDaily36 hr

Funnel metrics

KPIFormulaSourceGrainStale
LeadsNew contacts created in GHL with marketing source tagGHL Webhook + RESTHourly2 hr
ContactsTotal contacts created (lead and non lead)GHLHourly2 hr
OpportunitiesGHL opportunities at "New Patient" stageGHLHourly2 hr
Appointments BookedGHL opportunity at "Booked" stage or appointment createdGHLHourly4 hr
Appointments ShowedGHL appointment status = "Showed"GHLHourly4 hr
Started TreatmentsGHL opportunity at "Treatment Started" stageGHL custom fieldDaily24 hr
Revenue (Booked)Sum of opportunity monetary value at "Treatment Started"GHL custom fieldDaily24 hr

Cost efficiency metrics

KPIFormulaSourceGrain
CPLSpend / LeadsDerivedDaily
Cost per Booked ApptSpend / Appointments BookedDerivedDaily
Cost per Showed ApptSpend / Appointments ShowedDerivedDaily
Cost per Started CaseSpend / Started TreatmentsDerivedDaily
ROASRevenue (Booked) / SpendDerivedDaily
Payback Period (days)Median days from first ad click to first revenue event, per practiceDerivedWeekly

Attribution metrics

KPIFormulaSourceGrain
First Touch ConversionsConversions credited to first touch in 90 day windowDerivedDaily
Last Touch ConversionsConversions credited to most recent touch before conversionDerivedDaily
Linear Attribution ShareEqual credit split across all touches in pathDerivedDaily
Assisted ConversionsCount of conversions where channel appeared in path but not as last touchDerivedDaily
Channel Overlap RatePercent of converters touched by 2 or more channelsDerivedWeekly
07 · Dashboard Information Architecture

Five views inside app.sga3p.com

ViewPrimary UserPrimary QuestionKey Tiles / ChartsDefault Range
Portfolio Overview (default landing) Dakota (growth lead) Which practices need attention this week? Total spend, total leads, blended CPL, total started cases, blended ROAS, all with delta vs prior period. 90 day spend and lead trend (dual axis), channel mix donut, top 5 and bottom 5 practices by CPL. Last 30 days
Practice Detail Dakota, regional ops What is happening at this practice and where is the funnel breaking? Practice tiles with portfolio median benchmarks. Funnel waterfall (Impression to Started Case). Channel breakdown table. Practice scoped anomaly feed. Last 30 days
Channel View Marketing strategist, agency QA How is Google Ads performing and which campaigns drive results? Channel spend, leads, CPL, ROAS. Sortable campaign table with drilldown to ad group. Cost vs conversion scatter (one dot per campaign). Last 30 days
Funnel View Marketing strategist, regional ops Where in the patient journey are we losing the most value? Click to lead %, lead to booked %, booked to showed %, showed to started %. Full funnel waterfall with stage drop off table by practice. Attribution model toggle. Last 30 days
Insights and Anomalies Dakota, regional ops What changed and what should I do about it? Reverse chronological feed of flagged events with practice, metric, severity, suggested action, and link to Practice Detail. Last 14 days

Global filter bar (sticky, persists across views)

FilterOptionsDefault
Time rangeToday, Yesterday, Last 7 / 14 / 30 / 90 days, MTD, QTD, YTD, CustomLast 30 days
Compare toPrevious period, Same period last year, NonePrevious period
PracticeSingle, multi, by region, by ownership tier (SGA / LookSee / Gen4), all activeAll active
ChannelGoogle Search, Google PMax, Google Display, allAll
Attribution modelLast touch, Linear (Phase 1). First touch, Time decay, Position based (Phase 1.5). Data driven (Phase 2)Last touch
08 · Insights and Anomaly Rules

What "insights across each marketing channel" actually means

An insight is a system generated, plain English statement about a measurable change or comparison that warrants action. Insights are not raw charts. They appear as cards in the Insights feed and as inline badges on Practice Detail tiles, with acknowledge and snooze (7 day) actions to keep the feed actionable.

Anomaly detection rules (Phase 1)

TriggerThresholdSeveritySuggested action
Daily spend spikeDay spend exceeds trailing 14 day mean by 50%+HighCheck campaign budget changes, possible runaway PMax
Daily spend dropDay spend below trailing 14 day mean by 40%+HighCheck billing status, account suspension, accidental pause
CPL spike7 day rolling CPL exceeds trailing 28 day rolling CPL by 60%+MediumAudit creative, geo targeting, landing page
Lead drop7 day lead count below trailing 28 day median by 50%+HighFront desk follow up issue or platform issue
Conversion rate dropClick to lead % below practice 28 day baseline by 30%+MediumLanding page audit, form audit, page speed
Stale dataSource has not refreshed past stale thresholdHighBanner on every page, last refreshed timestamp shown

Comparative insight rules

Recommendation templates (Phase 1.5)

Trigger combinationRecommendation
High CPL + low CTR"Creative refresh recommended, ad CTR is below 1.5%."
Healthy CTR + high CPL + low click to lead"Landing page conversion issue. Audit form and page speed."
Strong leads + low booked %"Front desk follow up issue, leads are not converting to appointments. Review GHL nurture sequence."
Healthy booked + low showed %"Reminder cadence issue. Audit appointment confirmation SMS frequency and timing."
09 · Attribution Model Strategy

Which models, when, and why

ModelPhaseRationale
Last touch1 (default)Matches what Google Ads reports natively. Lowest friction for cross checking against agency reports.
Linear1Equal credit across all touches in the path. Simple to explain, immediately reveals the multi touch reality of dental patient journeys.
First touch1.5Useful for top of funnel brand spend evaluation.
Time decay (7 day half life)1.5Best single number proxy for short sales cycles, which dental tends to be.
Position based (40, 20, 40)1.5Compromise between first and last touch.
Data driven2 (portfolio level only)Requires roughly 300 conversions and 3,000 ad interactions per 30 days per account. Most practices will not qualify individually. Run at portfolio level only.

The attribution model selector is a single dropdown in the global filter bar. Changing it recomputes attribution dependent KPIs (assisted conversions, channel share, ROAS by channel) in place. The selected model is shown as a persistent badge on every attribution dependent tile.

10 · Refresh Cadence

What "live update on a consistent schedule" means concretely

SourceMechanismCadenceRationale
Google Ads campaign + ad group day metricsnode-cron scheduled pullDaily 04:00 ET (09:00 UTC), pulling T-2 through T-1Google Ads stats finalize ~T+1, conversion adjustments arrive up to T+3, so re pull a 3 day window to absorb late conversions
Google Ads keyword and search term reportsnode-cron pullDaily 05:00 ET, T-1 onlyHigher volume, less revision after T+1. Phase 1.5.
Google Ads account structure (campaigns, ad groups)node-cron pullHourlyCheap, surfaces new campaigns quickly in dashboard
GHL contacts, opportunities, appointmentsWebhook (primary) + reconciliation pull (safety net)Webhook real time, reconciliation every hourWebhooks are eventually consistent and occasionally lost, reconciliation pull catches drift
GHL pipeline and stage definitionsnode-cron pullDaily 03:00 ETRarely changes
Materialized views (slow)SQL jobDaily 04:30 ET, post Google Ads pull, CONCURRENTLYChannel ROI 90d, attribution paths
Materialized views (fast)SQL jobHourly, CONCURRENTLYPractice spend 30d, funnel 30d, campaign performance 30d

Principle: webhooks update the warehouse within seconds, scheduled pulls are the safety net. Both write to the same raw_ghl_* tables. The dedupe step in stg_* resolves conflicts using updated_at from the source.

Every dashboard page shows three timestamps in the footer: "Google Ads last refreshed," "GHL last refreshed," "Derived metrics last computed." If any source is past its stale threshold, a yellow banner appears at the top of the page with the affected source named.

11 · DevOps and Infrastructure

Secrets, scheduling, observability, backfill

Service topology on Railway

Deploy sga-marketing-ingest as a new Railway service in the same project as the existing Postgres service. Same private network (no egress cost on the Postgres connection), separate environment, isolated blast radius.

Scheduling: node-cron now, BullMQ when needed, Temporal eventually

// scheduler.ts -- the only file that changes when Temporal arrives
import cron from 'node-cron';
import { runGoogleAdsDailyPull } from './jobs/google-ads-daily';
import { runGhlReconciliation } from './jobs/ghl-reconciliation';
import { refreshMvSlow, refreshMvFast } from './jobs/mv-refresh';

export function startScheduler() {
  cron.schedule('0 9 * * *',  () => runGoogleAdsDailyPull({ window: 'T-2..T-1' }));
  cron.schedule('0 * * * *',  () => runGhlReconciliation());
  cron.schedule('30 9 * * *', () => refreshMvSlow());
  cron.schedule('0 * * * *',  () => refreshMvFast());
}

Every job body is a plain async function with no cron dependency. node-cron calls it today, BullMQ calls it tomorrow, Temporal Activity calls it the day after. The migration cost is one file.

Secrets: Railway env vars, layered with Doppler

# --- Google Ads ----------------------------------
GOOGLE_ADS_DEVELOPER_TOKEN=
GOOGLE_ADS_CLIENT_ID=
GOOGLE_ADS_CLIENT_SECRET=
GOOGLE_ADS_REFRESH_TOKEN_GEN4=
GOOGLE_ADS_REFRESH_TOKEN_SGA=          # add when SGA MCC exists
GOOGLE_ADS_MCC_CUSTOMER_ID_GEN4=
GOOGLE_ADS_MCC_CUSTOMER_ID_SGA=

# --- GoHighLevel ---------------------------------
GHL_AGENCY_API_TOKEN=                  # single agency token, not per location
GHL_WEBHOOK_SIGNING_SECRET=

# --- Database -----------------------------------
DATABASE_URL=                          # Railway internal

# --- Observability ------------------------------
SENTRY_DSN=
BETTERSTACK_SOURCE_TOKEN=

# --- Runtime ------------------------------------
NODE_ENV=production
TZ=UTC
LOG_LEVEL=info

Doppler is recommended on top of Railway env vars: single source of truth, audit log, per environment promotion. Do not introduce AWS Secrets Manager in Phase 1, that is a Phase 2 migration.

OAuth refresh and rate limits

GHL webhook receiver

POST /webhooks/ghl on the worker, exposed via a Railway public URL (separate hostname from api.sga3p.com). HMAC verification using x-ghl-signature. Synchronous insert into marketing.ghl_webhook_events (the durable queue). Downstream processor reads from that table on the hourly reconciliation job and marks rows processed = true. This gives replay capability without any new infrastructure.

Observability stack (Phase 1, lightweight)

Three SLOs to commit to in Phase 1

SLOTarget
Dashboard p95 latency< 1.5 seconds
Data freshness for Google Ads< 26 hours from event to dashboard
Data freshness for GHL via webhook< 5 minutes from event to dashboard

Backfill workflow

POST /internal/backfill on the worker (static internal API key, not exposed). Body: { practiceId, source, startDate, endDate }. Writes one row per date-source combination to backfill_queue. A backfill runner processes oldest first, respects rate limits, marks rows done with row count. Idempotent on (practice_id, source, data_date). 90 days x 60 practices x 200ms delay ≈ 18 minutes wall time.

Cost projection (Phase 1)

ItemMonthly
sga-marketing-ingest on Railway (512MB, Starter)$5 to $10
Postgres storage growth (~500MB year 1)$0 to $5
Doppler Team plan (free tier)$0
Sentry Developer plan (free tier)$0
Better Stack (free tier)$0
Total Phase 1$5 to $15
12 · Phase 1 Roadmap

6 weeks, week by week, with a walking skeleton in week 1

Walking skeleton first. Prove the full path end to end with one practice, one day, one number on screen, then replicate breadth and depth from there.

Week 1: Walking Skeleton
Wk 1
  • Provision sga-marketing-ingest Railway service. Wire DATABASE_URL to internal Postgres.
  • Apply migrations 0NN through 0NN+2 (marketing schema, Google Ads dimensions, partitioned daily metrics).
  • Pull one Google Ads account, one day, write to raw_googleads_campaign_daily, transform to marketing.google_ads_campaign_daily_metrics.
  • Add /api/v1/marketing/practices/:id/summary route on asset-registry.
  • Render one number ("Cost yesterday") on a new /marketing route in app.sga3p.com.
  • Audit: every active landing page captures gclid from URL params and forwards to GHL contact custom field. Block Phase 2 attribution work until verified.
Week 2: Google Ads Breadth
Wk 2
  • Apply remaining Google Ads migrations (campaigns, ad groups). Backfill 90 days x 60 practices.
  • Schedule nightly pull at 09:00 UTC. Implement OAuth refresh handling and token_error alerting.
  • Build mv_portfolio_kpi_daily and mv_practice_spend_30d. Wire to summary endpoints.
  • Ship Portfolio Overview view with spend, leads (placeholder), CPL (placeholder), ROAS (placeholder), 90 day trend chart.
Week 3: GHL Ingest
Wk 3
  • Apply GHL migrations (locations, pipelines, contacts, opportunities, appointments).
  • Stand up POST /webhooks/ghl with HMAC verification and durable queue table.
  • Build hourly reconciliation pull using GHL Agency API token.
  • Backfill 90 days of contacts, opportunities, appointments per practice.
  • Wire leads, opportunities, appointments tiles to real data. Add stale data banner.
Week 4: Funnel and Practice Detail
Wk 4
  • Build mv_practice_funnel_30d and mv_campaign_performance_30d.
  • Ship Practice Detail view (tiles with portfolio benchmarks, funnel waterfall, channel breakdown).
  • Ship Channel View with sortable campaign table and cost vs conversion scatter.
  • Ship Funnel View with full waterfall and stage drop off table.
Week 5: Attribution and Insights
Wk 5
  • Apply attribution migrations. Implement nightly transform building attribution_touchpoints from GHL contacts and Google Ads click view.
  • Implement last touch and linear attribution computations. Wire UI toggle to recompute attribution dependent tiles.
  • Implement anomaly rule engine (5 rules from section 8). Write to insights_log.
  • Ship Insights and Anomalies view with severity, suggested action, snooze.
Week 6: Hardening and Launch
Wk 6
  • Cross check dashboard CPL and spend against LookSee monthly report for 5 practices. Variance must be under 5%, ideally under 2%.
  • Performance pass: Fastify response cache on summary endpoints, dashboard p95 under 1.5s.
  • Observability shake out: trigger every Sentry alert path manually, confirm Better Stack uptime monitor.
  • Documentation: data dictionary, runbook, on call playbook.
  • Internal launch to growth team. Schedule Phase 1.5 kickoff.
13 · MoSCoW Prioritization

Phase 1 inclusion decisions

ItemPriorityRationale
Portfolio Overview viewMustCore reason the project exists.
Practice Detail viewMustRequired to act on portfolio level signals.
Channel View (Google Ads)MustPhase 1 channel scope, needed for QA.
Funnel ViewMustFull funnel attribution is the user's stated ultimate goal.
Google Ads ingest (campaign daily)MustPhase 1 source.
GHL ingest (webhook + REST reconciliation)MustPhase 1 source.
Spend, CPL, ROAS tilesMustTier 1 metrics for every persona.
Last touch + Linear attributionMustDefault model + multi touch reality.
Anomaly detection (5 rules)Must"Insights across each channel" requirement.
Filters: time, practice, channelMustCannot use the dashboard without these.
Stale data banner + last refreshed timestampsMustTrust requirement.
Insights feed pageShouldHigh value, but tile badges cover urgent path.
Comparative insights vs portfolio medianShouldStrong differentiator but not blocking.
Pacing vs budgetShouldCurrently tracked manually.
Recommendation rule engineShouldTemplates can ship in 1.1.
First touch + time decay attributionCouldDefer to Phase 1.5.
Region rollupsCouldUseful but small audience initially.
Ownership tier filter (SGA / LookSee / Gen4)CouldPractice multi select covers the use case in Phase 1.
Practice manager login + per practice RBACWon'tPhase 2.
Meta Ads, GA4, GBP, YelpWon'tPhase 2.
Predictive forecasting + budget optimizationWon'tPhase 3.
Write back to Google AdsWon'tRead only, regulatory and permissions risk.
PHI / clinical dataWon'tPhase B with BAAs.
Data driven attributionWon'tInsufficient per practice volume, defer to Phase 2 portfolio.
Agency comparison scorecardsWon'tDefer until data trust is established.
14 · Acceptance Criteria

Given / When / Then scenarios for the highest value flows

Feature: Portfolio Overview load performance and content
  Scenario: Growth lead opens Portfolio Overview on Monday morning
    Given I am authenticated as the growth lead
    And there are 50 or more practices with active spend in the last 30 days
    When I navigate to Portfolio Overview
    Then I see total spend, total leads, blended CPL, total started cases, and blended ROAS within 2 seconds
    And each tile shows delta versus the previous 30 day period

Feature: Funnel waterfall on Practice Detail
  Scenario: Diagnosing where a practice is losing patients
    Given I have selected a single practice on Practice Detail
    When the funnel waterfall renders
    Then I see counts and conversion percentages at each stage from Impression through Started Case
    And the largest drop off stage is visually highlighted

Feature: Attribution model toggle
  Scenario: Switching from last touch to linear attribution
    Given I am on Funnel View with attribution set to Last Touch
    When I change the attribution selector to Linear
    Then ROAS, channel share, and assisted conversion tiles recompute within 3 seconds
    And every attribution dependent tile shows a "Linear" badge

Feature: Anomaly detection surfaces a spend spike
  Scenario: A practice spends 60 percent above trailing mean
    Given Practice X has a trailing 14 day mean daily spend of $100
    And Practice X spent $180 yesterday
    When the nightly insights job runs at 05:00 ET
    Then a High severity insight card appears in the Insights feed
    And the card states the practice name, the metric, the percentage change, and a suggested action

Feature: GHL webhook freshness
  Scenario: A new lead is created in GHL
    Given a contact is created in GHL with a marketing source tag at 14:00 ET
    When I refresh Portfolio Overview at 14:05 ET
    Then the lead count tile reflects the new lead
    And the GHL last refreshed timestamp is within 5 minutes of now

Feature: Google Ads stale data warning
  Scenario: Google Ads ingest fails overnight
    Given the Google Ads nightly refresh failed at 04:00 ET
    And it is now 14:00 ET
    When I open any dashboard page
    Then a yellow banner appears at the top of the page
    And the banner states "Google Ads data is stale, last refreshed [timestamp]"

Feature: Independent verification of agency numbers
  Scenario: QA reviewer cross checks LookSee monthly report
    Given LookSee reported a CPL of $42 for Practice Y in April
    When I open Practice Detail for Practice Y with time range April 1 to April 30
    Then the dashboard CPL is displayed and is within 2 percent of the agency reported figure
    And if the variance exceeds 5 percent, the tile shows a variance flag

Feature: Comparative insight against portfolio median
  Scenario: A practice is materially above portfolio CPL
    Given Practice X has a 30 day CPL of $84
    And the portfolio median 30 day CPL is $35
    When I open Practice Detail for Practice X
    Then a comparative insight card states "CPL is 2.4x portfolio median"
    And a recommendation suggests creative or targeting investigation
15 · Architecture Decision Records

The five most consequential calls, captured

Use existing Postgres on Railway as the marketing warehouse
ADR-001
Status
Accepted (pending Dakota confirmation per section 02)
Decision
Use the existing Railway Postgres for raw, staging, mart, and materialized view layers. Defer ClickHouse to when measured thresholds are breached (single MV refresh > 5 min, dashboard p95 > 2s, total mart rows > 500M).
Alternatives
ClickHouse (premature for current volume). BigQuery, Snowflake (rejected by user direction).
Consequences
Zero new infra, single backup story, joins between marketing and content engine data trivial. Materialized views required for performance. If volume grows 100x faster than projected, migration cost is real but the layered prefix convention makes a future ClickHouse copy of mart_* straightforward.
Reversibility
Reversible. The mart layer is portable.
Standalone sga-marketing-ingest service rather than endpoints on asset-registry
ADR-002
Status
Accepted
Decision
New Fastify service, separate Railway deployment, shares packages/shared-types for Zod and Drizzle.
Alternatives
Endpoints on asset-registry (mixes concerns), workers in same repo (couples deploys), Railway native Cron Jobs (weak observability).
Consequences
Independent scaling, blast radius isolation, OAuth secrets contained. One more service to operate. Bootstrap code duplicated, mitigated by shared package.
Reversibility
Reversible but costly after data volume builds.
node-cron in Phase 1, BullMQ in Phase 1.5, Temporal in Phase 2
ADR-003
Status
Accepted
Decision
Phase 1 uses node-cron in process. Job bodies wrap as plain async functions to make the BullMQ and Temporal swaps mechanical.
Alternatives
Railway native cron (no retries, opaque), BullMQ on day 1 (premature complexity, requires Redis addon), Temporal now (provisioning is its own project).
Consequences
Ships fast. Not the eventual end state. Migration work in Phase 1.5 and Phase 2.
Reversibility
Reversible. Migration path documented.
Attribution lives in the warehouse, not at ingest
ADR-004
Status
Accepted
Decision
Ingest writes raw and staging. A nightly transform job builds attribution_touchpoints and assignments from completed staging data. Three table model supports multiple attribution algorithms over the same touchpoint history.
Alternatives
Match at ingest (partial data problem), match at query time (slow, repeated work, inconsistent results across queries).
Consequences
Deterministic, replayable, single place to evolve match logic. Attribution is up to 24 hours behind. Late arriving GCLIDs miss the nightly window, mitigated by re running attribution over a rolling 7 day window.
Reversibility
Reversible. Match logic is in one job.
Marketing dashboard reads through asset-registry, not direct Postgres
ADR-005
Status
Accepted
Decision
Add /api/v1/marketing/* route group to asset-registry. Same auth, same Drizzle, separate route module. Zod schemas in packages/shared-types.
Alternatives
Separate API service (premature), direct Postgres from browser (unsafe), GraphQL gateway (out of scope).
Consequences
Zero new deploy unit, reuse Bearer auth, easy Cognito migration later. Asset registry now has two domains, watch for drift. Revisit at Phase 2 whether to split.
Reversibility
Reversible. Route group can be lifted into its own service.
16 · Risks and Mitigations

Top 6 ranked, what breaks and what we do about it

RankRiskLikelihoodImpactMitigation
1 GCLID not captured on landing pages, so attribution is structurally impossible High Critical Audit and patch every active landing page in week 1. Block Phase 2 attribution work until verified. Innovative Dental and Gen4 pages are highest priority.
2 OAuth refresh token revokes silently, sync fails for days unnoticed Medium High token_error status triggers Sentry fatal alert. Dashboard shows last successful sync date per practice. Healthcheck pings Ads API daily.
3 MCC consolidation incomplete, two separate credential sets to manage High (current state) Medium Design for multiple MCCs from day one (GOOGLE_ADS_REFRESH_TOKEN_GEN4 / _SGA pattern). Do not assume single MCC.
4 Postgres volume grows faster than projected, materialized view refresh becomes the bottleneck Low High Threshold based monitoring. ClickHouse migration plan documented. REFRESH MATERIALIZED VIEW CONCURRENTLY required, unique index on every MV.
5 GHL webhooks unreliable (drops, missed events, signature drift) Medium Medium Hourly reconciliation pull is the safety net. Drift alerts. Durable queue table preserves replay capability.
6 Ad account access revoked by agency mid sync Low Medium Log access_revoked in sync_runs, not as hard failure. Sentry alert, do not halt rest of run. Dual link strategy reduces blast radius.
17 · Action Items

What to confirm and what to start, this week

#ActionOwnerEffortBlocks
1Confirm or course correct stack divergence (section 02)DakotaS (15 min)Everything
2Apply for Google Ads Standard Access developer tokenDakotaS (form, 24 to 72 hr review)Production sync
3Audit landing page GCLID capture (Innovative Dental, Gen4)Dakota or web teamM (1 to 3 days, per page)Phase 2 attribution
4Confirm GHL Agency API token availability and webhook signing secretDakotaS (1 day)GHL ingest
5Provision sga-marketing-ingest Railway service in Content Engine projectImplementationS (1 hr)Walking skeleton
6Provision Doppler project, populate Phase 1 secretsImplementationS (2 hr)OAuth flows
7Identify the 50 to 60 paid media active practices for Phase 1 scopeDakotaS (existing list refresh)Backfill prioritization
8Decide MCC consolidation strategy (Gen4 only Phase 1 vs SGA MCC by Week 4)DakotaS (1 day)Token strategy
9Kick off week 1 walking skeleton sprintImplementationL (5 days)Subsequent weeks
Recommended next move
Confirm the stack divergence (item 1) and authorize the walking skeleton sprint (item 9). Items 2 through 8 run in parallel during week 1, none of them block the walking skeleton itself. Within 5 working days you can have one number from one practice on screen at app.sga3p.com/marketing, proving the full path end to end.