Phase 1 project plan. Daily refreshed Google Ads and GoHighLevel data unified into Railway Postgres, surfaced as a dashboard inside the existing SGA3P content engine, with full funnel attribution as the Phase 2 target.
SGA operates 260 practices. Roughly 50 to 60 actively run paid media. Today there is no automated way to see spend, leads, cost per lead, or ROAS by practice without logging into 50 different ad accounts and CRMs. This plan stands up a single warehouse, a daily refresh, and a dashboard inside SGA3P that the growth team can open on Monday morning and act from.
marketing schema on the existing Railway Postgres. A new /api/v1/marketing/* route group on asset-registry serves materialized views to a new dashboard route inside the React content engine. BullMQ replaces node-cron when volume justifies it. Temporal replaces BullMQ when AWS arrives.The settled stack memo locked in ClickHouse for analytics, Power BI for dashboards, and AWS for hosting. This plan diverges on two of those choices, deliberately, because Dakota directed the change in the project-kickoff conversation. Calling it out so the deviation is conscious, not drift.
| Concern | Settled Stack (2026-04-20) | This Plan | Rationale for the change |
|---|---|---|---|
| Analytics store | ClickHouse Cloud or self hosted on AWS | Postgres on Railway, single instance | ClickHouse hosting was blocked. Phase 1 volume (low millions of rows per year) fits Postgres comfortably with partitioning and materialized views. Documented thresholds for when to migrate. |
| Dashboard surface | Power BI semantic models | New section inside app.sga3p.com (React + Vite + Tailwind) | Power BI provisioning is not started. The content engine is already in production at sga3p.com with auth, layout, and design system. Adds zero new vendor. |
| Workflow orchestration | Temporal | node-cron in Phase 1, BullMQ in Phase 1.5, Temporal in Phase 2 | Temporal is not provisioned. node-cron is sufficient for nightly batch. Job bodies wrap as plain async functions so Temporal swap is mechanical. |
| Cloud | AWS (RDS, MSK, ECS) | Railway through Phase 2 | Railway is where SGA Content Engine, asset-registry, and Postgres already live. No reason to split infrastructure mid build. |
| Event bus | Kafka MSK | Postgres staging table acting as durable queue (Kafka stub) | Same pattern, simpler runtime. When Kafka MSK is real, the webhook handler drops a topic message instead of a row. |
mart_* mechanical.Google Ads campaign daily metrics. GHL contacts, opportunities, appointments via webhook plus hourly reconciliation. Five dashboard views: Portfolio Overview, Practice Detail, Channel View, Funnel View, Insights and Anomalies. Last touch and linear attribution. Anomaly rules. Stale data banners.
Google Ads keyword and ad group day metrics. Recommendation rule engine. First touch and time decay attribution. Pacing vs budget. Region rollups. BullMQ replaces node-cron once webhook reconciliation queue grows.
Meta Ads, GA4, Google Business Profile, call tracking. Multi touch attribution with revenue join. Practice manager login (RBAC). Migrate scheduler to Temporal. Evaluate ClickHouse against measured volume thresholds.
PHI data: case started events from PMS, treatment level revenue, insurance claims. Triggers HIPAA review, BAAs, and a dedicated marketing_phi schema with stricter grants.
Asset registry is a low latency request response API powering the content engine UI. Ingestion is long running, bursty, OAuth bearing, and rate limited by upstream APIs. Mixing them violates single responsibility, lets a runaway ingest loop choke the API, and bleeds OAuth secrets into the wrong service. The duplication cost is small because both services share packages/shared-types for Zod schemas and Drizzle tables.
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| (a) Endpoints on asset-registry | Zero new infra, reuse Drizzle | Mixes concerns, OAuth scope bleed, scaling decisions coupled | Reject |
| (b) Worker added inside asset-registry repo | Code colocated | Same deploy unit, same memory pressure, restart of API kills mid ingest | Reject |
(c) Standalone sga-marketing-ingest service | Independent deploy and scale, blast radius isolated, OAuth contained, clean Temporal seam | One more Railway service | Accept |
| (d) Railway native Cron Jobs | Zero infra | Ephemeral containers, weak observability, no in flight state | Reject |
Add a /api/v1/marketing/* route group to the existing asset-registry service. Same auth (Bearer demo password now, Cognito later), same Drizzle, separate route module. The React app does not hit Postgres directly. Phase 1 routes:
| Method | Path | Reads From | Purpose |
|---|---|---|---|
| GET | /api/v1/marketing/portfolio/summary?range=30d | mv_portfolio_kpi_daily | Portfolio Overview top tiles |
| GET | /api/v1/marketing/practices/:id/summary?range=30d | mv_practice_spend_30d | Practice Detail tiles |
| GET | /api/v1/marketing/practices/:id/funnel?range=30d | mv_practice_funnel_30d | Funnel waterfall |
| GET | /api/v1/marketing/practices/:id/campaigns?range=30d | mv_campaign_performance_30d | Campaign table |
| GET | /api/v1/marketing/insights?severity=high&range=14d | marketing.insights_log | Insights and Anomalies feed |
| GET | /api/v1/marketing/etl/status | marketing.sync_runs | Stale data banner, last refreshed timestamps |
All response shapes live in packages/shared-types/src/marketing.ts as Zod schemas. The React app imports the same types. Fastify response cache holds summary endpoints for 5 minutes since materialized views refresh on the order of hours.
All marketing objects live in a single marketing schema on the existing Railway Postgres, alongside shared (practices). The design separates four concerns: operational sync state, raw API ingest, conformed dimensions and facts, and materialized views for the dashboard.
| Group | Prefix | Purpose | Retention |
|---|---|---|---|
| Operational | google_ads_accounts, ghl_locations, sync_runs, insights_log | Account links, sync state, error log, generated insights | Indefinite |
| Raw | raw_google_ads_*, raw_ghl_* | Untransformed JSON dumps (replay source of truth) | 13 months hard cutoff |
| Dimensions | dim_date, dim_channel, dim_campaign, dim_practice (view) | Stable lookups for joins | Indefinite |
| Facts | google_ads_campaign_daily_metrics, fact_ghl_event, fact_attribution_touchpoint | Append mostly, time partitioned | Indefinite |
| Attribution | attribution_touchpoints, attribution_conversions, attribution_assignments | Many to many credit allocation across multiple models | Indefinite |
| Views | mv_* | Pre aggregated dashboard payloads | Refresh hourly or nightly |
| Table | Grain | Phase | Notes |
|---|---|---|---|
google_ads_accounts | customer_id | Exists | Practice to MCC mapping. Do not redesign. |
google_ads_campaigns | campaign_id | 1 | SCD type 1 (overwrite on change), keep updated_at |
google_ads_ad_groups | ad_group_id | 1 | SCD type 1 |
google_ads_campaign_daily_metrics | (date, customer_id, campaign_id) | 1 | Partitioned by month. Cost stored in micros to match API source. |
google_ads_ad_group_daily_metrics | (date, ad_group_id) | 1.5 | Defer until per ad group ROI is requested |
google_ads_keywords | criterion_id | 1.5 | 10x row count, low Phase 1 dashboard value |
marketing.google_ads_campaign_daily_metrics (partitioned)CREATE TABLE marketing.google_ads_campaign_daily_metrics (
metric_date DATE NOT NULL,
customer_id BIGINT NOT NULL,
campaign_id BIGINT NOT NULL,
practice_id UUID NOT NULL,
impressions BIGINT NOT NULL DEFAULT 0,
clicks BIGINT NOT NULL DEFAULT 0,
cost_micros BIGINT NOT NULL DEFAULT 0,
conversions NUMERIC(12,4) NOT NULL DEFAULT 0,
conversion_value NUMERIC(14,4) NOT NULL DEFAULT 0,
view_through_conv NUMERIC(12,4) NOT NULL DEFAULT 0,
search_impression_share NUMERIC(6,4),
avg_cpc_micros BIGINT,
ctr NUMERIC(8,6),
ingested_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (metric_date, campaign_id),
FOREIGN KEY (campaign_id) REFERENCES marketing.google_ads_campaigns(campaign_id),
FOREIGN KEY (practice_id) REFERENCES shared.practices(id)
) PARTITION BY RANGE (metric_date);
CREATE INDEX idx_gads_metrics_practice_date
ON marketing.google_ads_campaign_daily_metrics (practice_id, metric_date DESC);
CREATE INDEX idx_gads_metrics_campaign_date
ON marketing.google_ads_campaign_daily_metrics (campaign_id, metric_date DESC);
| Table | Grain | Notes |
|---|---|---|
ghl_locations | one per GHL sub-account | One sub-account per practice. Joined to shared.practices. |
ghl_pipelines | pipeline_id | Per location, SGA standardizes names but allows variance. |
ghl_pipeline_stages | stage_id | Ordered by position. |
ghl_contacts | contact_id | Carries first touch attribution: gclid, fbclid, utm_*, source. |
ghl_opportunities | opportunity_id | Linked to contact, pipeline, stage. Holds monetary_value. |
ghl_appointments | appointment_id | Status: scheduled, confirmed, showed, no_show, cancelled, rescheduled. |
ghl_calls | call_id | Twilio routed inbound calls if location is on call tracking. |
ghl_workflow_events | event row | Entry, exit, goal-met events from automation workflows. |
ghl_messages | message_id | Phase 2 SMS / email body. Defer for PHI review. |
ghl_webhook_events | idempotency_key | Durable queue of received webhooks (Kafka stub). |
marketing.ghl_contacts (attribution carrier)CREATE TABLE marketing.ghl_contacts (
contact_id TEXT NOT NULL,
location_id TEXT NOT NULL,
practice_id UUID NOT NULL,
first_name TEXT,
last_name TEXT,
email_hash BYTEA, -- sha256(lower(trim(email)))
phone_hash BYTEA, -- sha256(e164(phone))
email_encrypted BYTEA, -- pgcrypto, key in app layer
phone_encrypted BYTEA,
source TEXT, -- 'Google Ads', 'Facebook', 'Direct'
utm_source TEXT,
utm_medium TEXT,
utm_campaign TEXT,
utm_content TEXT,
utm_term TEXT,
gclid TEXT,
fbclid TEXT,
wbraid TEXT,
gbraid TEXT,
tags TEXT[] NOT NULL DEFAULT '{}',
date_added TIMESTAMPTZ NOT NULL,
first_seen_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_activity_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
raw_payload JSONB,
PRIMARY KEY (contact_id),
FOREIGN KEY (location_id) REFERENCES marketing.ghl_locations(location_id),
FOREIGN KEY (practice_id) REFERENCES shared.practices(id)
);
CREATE INDEX idx_ghl_contacts_practice_added ON marketing.ghl_contacts (practice_id, date_added DESC);
CREATE INDEX idx_ghl_contacts_gclid ON marketing.ghl_contacts (gclid) WHERE gclid IS NOT NULL;
CREATE INDEX idx_ghl_contacts_email_hash ON marketing.ghl_contacts (email_hash) WHERE email_hash IS NOT NULL;
CREATE INDEX idx_ghl_contacts_phone_hash ON marketing.ghl_contacts (phone_hash) WHERE phone_hash IS NOT NULL;
The chain is Google Click (GCLID) to GHL Contact to Opportunity to Appointment to Started Treatment to Revenue.
| Layer | Primary Key | Fallback | Reliability |
|---|---|---|---|
| Click to Contact | GCLID captured at form fill | Phone (E.164 normalized) for call tracked leads | High when landing pages capture GCLID, medium otherwise |
| Contact to Ad Spend | GCLID joined to campaign via Google Ads click view | UTM campaign text match | High |
| Cross system (Phase 2: Meta, GA4) | phone_hash, email_hash | First party cookie or device id | Medium |
gclid from URL params and forward to GHL as a contact custom field. Without this, full funnel attribution is structurally impossible regardless of warehouse design. Audit and patch in week 1, block Phase 2 attribution work until verified.Three tables let multiple attribution algorithms run over the same touchpoint history without rewriting facts.
contact_created, opportunity_created, opportunity_won, appointment_booked, appointment_showed, case_started.model_name and credit_fraction (sums to 1.0 per conversion per model).This shape supports first touch, last touch, linear, time decay, position based, and (Phase 2) data driven without reshaping data.
| MV | Grain | Refresh | Powers |
|---|---|---|---|
mv_portfolio_kpi_daily | (metric_date) | Hourly, CONCURRENTLY | Portfolio Overview tiles |
mv_practice_spend_30d | (practice_id) | Hourly, CONCURRENTLY | Practice Detail spend tile, MoM delta |
mv_practice_funnel_30d | (practice_id) | Hourly, CONCURRENTLY | Click to contact to appt to showed funnel |
mv_campaign_performance_30d | (campaign_id) | Hourly, CONCURRENTLY | Campaign leaderboard |
mv_channel_roi_90d | (practice_id, channel) | Nightly, CONCURRENTLY | Channel mix chart |
mv_attribution_assist_path | (practice_id, model_name, path_signature) | Nightly | Multi touch path explorer (Phase 1.5) |
Every MV requires a UNIQUE index on its grain key so REFRESH MATERIALIZED VIEW CONCURRENTLY works. Without this, refresh blocks dashboard reads.
| Concern | Approach |
|---|---|
| Partitioning | Daily metrics and event facts partitioned by RANGE on date column, monthly cadence. Use pg_partman or a scheduled job to pre create next 3 months and detach beyond retention. |
| Retention (raw) | 13 months hard cutoff via partition drop. Raw JSON may contain PII. Restrict SELECT to ETL role only. |
| Retention (mart, MV) | Indefinite. Practice level only, no raw PII. |
| PII handling | Email and phone stored as BYTEA sha256 hashes for joins, plus pgcrypto encrypted columns for display. Encryption key held in app layer, not Postgres. |
| PHI | Hard rule: no diagnoses, treatment plans, clinical notes, or insurance data in this schema. Phase B introduces marketing_phi with its own grants. |
| Order | Migration | Contents |
|---|---|---|
| 0NN | marketing_schema_init | CREATE SCHEMA marketing, grants, dim_date, dim_channel, sync_runs, insights_log |
| 0NN+1 | google_ads_dimensions | google_ads_campaigns, google_ads_ad_groups |
| 0NN+2 | google_ads_facts | Partitioned google_ads_campaign_daily_metrics plus first 14 monthly partitions |
| 0NN+3 | ghl_locations_pipelines | ghl_locations, ghl_pipelines, ghl_pipeline_stages |
| 0NN+4 | ghl_contacts_opps | ghl_contacts, ghl_opportunities |
| 0NN+5 | ghl_appointments_calls | ghl_appointments, ghl_calls, ghl_workflow_events, ghl_webhook_events |
| 0NN+6 | attribution_core | attribution_touchpoints, attribution_conversions, attribution_assignments, contact_merges |
| 0NN+7 | attribution_functions | merge_contacts(), resolve_click_to_contact() |
| 0NN+8 | marketing_views | dim_practice view, dim_campaign view |
| 0NN+9 | marketing_mvs | All 6 materialized views plus unique indexes |
| 0NN+10 | raw_staging_tables | Partitioned raw_google_ads_*, raw_ghl_* |
| 0NN+11 | partition_maintenance | pg_partman config or pl/pgsql monthly rotation job |
Stale threshold is the lag past which the dashboard surfaces a yellow data warning banner. If a source crosses its threshold, the page header shows the affected source with the last successful refresh timestamp.
| KPI | Formula | Source | Grain | Stale |
|---|---|---|---|---|
| Total Spend | Sum of cost across all in scope Google Ads accounts | Google Ads | Daily | 36 hr |
| Spend by Channel | Cost grouped by channel (Search, PMax, Display) | Google Ads | Daily | 36 hr |
| Spend by Practice | Cost grouped by customer_id, mapped to practice_id | Google Ads + practice map | Daily | 36 hr |
| Daily Spend Trend | Spend per calendar day, rolling 90 days | Google Ads | Daily | 36 hr |
| Pacing vs Budget | (MTD spend / days elapsed) x days in month, vs monthly budget | Google Ads + budget table | Daily | 36 hr |
| KPI | Formula | Source | Grain | Stale |
|---|---|---|---|---|
| Impressions | Count of ad impressions served | Google Ads | Daily | 36 hr |
| Clicks | Count of ad clicks | Google Ads | Daily | 36 hr |
| CTR | Clicks / Impressions | Derived | Daily | 36 hr |
| CPC | Spend / Clicks | Derived | Daily | 36 hr |
| Platform Conversions | Google Ads reported conversions (form fills, calls) | Google Ads | Daily | 36 hr |
| KPI | Formula | Source | Grain | Stale |
|---|---|---|---|---|
| Leads | New contacts created in GHL with marketing source tag | GHL Webhook + REST | Hourly | 2 hr |
| Contacts | Total contacts created (lead and non lead) | GHL | Hourly | 2 hr |
| Opportunities | GHL opportunities at "New Patient" stage | GHL | Hourly | 2 hr |
| Appointments Booked | GHL opportunity at "Booked" stage or appointment created | GHL | Hourly | 4 hr |
| Appointments Showed | GHL appointment status = "Showed" | GHL | Hourly | 4 hr |
| Started Treatments | GHL opportunity at "Treatment Started" stage | GHL custom field | Daily | 24 hr |
| Revenue (Booked) | Sum of opportunity monetary value at "Treatment Started" | GHL custom field | Daily | 24 hr |
| KPI | Formula | Source | Grain |
|---|---|---|---|
| CPL | Spend / Leads | Derived | Daily |
| Cost per Booked Appt | Spend / Appointments Booked | Derived | Daily |
| Cost per Showed Appt | Spend / Appointments Showed | Derived | Daily |
| Cost per Started Case | Spend / Started Treatments | Derived | Daily |
| ROAS | Revenue (Booked) / Spend | Derived | Daily |
| Payback Period (days) | Median days from first ad click to first revenue event, per practice | Derived | Weekly |
| KPI | Formula | Source | Grain |
|---|---|---|---|
| First Touch Conversions | Conversions credited to first touch in 90 day window | Derived | Daily |
| Last Touch Conversions | Conversions credited to most recent touch before conversion | Derived | Daily |
| Linear Attribution Share | Equal credit split across all touches in path | Derived | Daily |
| Assisted Conversions | Count of conversions where channel appeared in path but not as last touch | Derived | Daily |
| Channel Overlap Rate | Percent of converters touched by 2 or more channels | Derived | Weekly |
| View | Primary User | Primary Question | Key Tiles / Charts | Default Range |
|---|---|---|---|---|
| Portfolio Overview (default landing) | Dakota (growth lead) | Which practices need attention this week? | Total spend, total leads, blended CPL, total started cases, blended ROAS, all with delta vs prior period. 90 day spend and lead trend (dual axis), channel mix donut, top 5 and bottom 5 practices by CPL. | Last 30 days |
| Practice Detail | Dakota, regional ops | What is happening at this practice and where is the funnel breaking? | Practice tiles with portfolio median benchmarks. Funnel waterfall (Impression to Started Case). Channel breakdown table. Practice scoped anomaly feed. | Last 30 days |
| Channel View | Marketing strategist, agency QA | How is Google Ads performing and which campaigns drive results? | Channel spend, leads, CPL, ROAS. Sortable campaign table with drilldown to ad group. Cost vs conversion scatter (one dot per campaign). | Last 30 days |
| Funnel View | Marketing strategist, regional ops | Where in the patient journey are we losing the most value? | Click to lead %, lead to booked %, booked to showed %, showed to started %. Full funnel waterfall with stage drop off table by practice. Attribution model toggle. | Last 30 days |
| Insights and Anomalies | Dakota, regional ops | What changed and what should I do about it? | Reverse chronological feed of flagged events with practice, metric, severity, suggested action, and link to Practice Detail. | Last 14 days |
| Filter | Options | Default |
|---|---|---|
| Time range | Today, Yesterday, Last 7 / 14 / 30 / 90 days, MTD, QTD, YTD, Custom | Last 30 days |
| Compare to | Previous period, Same period last year, None | Previous period |
| Practice | Single, multi, by region, by ownership tier (SGA / LookSee / Gen4), all active | All active |
| Channel | Google Search, Google PMax, Google Display, all | All |
| Attribution model | Last touch, Linear (Phase 1). First touch, Time decay, Position based (Phase 1.5). Data driven (Phase 2) | Last touch |
An insight is a system generated, plain English statement about a measurable change or comparison that warrants action. Insights are not raw charts. They appear as cards in the Insights feed and as inline badges on Practice Detail tiles, with acknowledge and snooze (7 day) actions to keep the feed actionable.
| Trigger | Threshold | Severity | Suggested action |
|---|---|---|---|
| Daily spend spike | Day spend exceeds trailing 14 day mean by 50%+ | High | Check campaign budget changes, possible runaway PMax |
| Daily spend drop | Day spend below trailing 14 day mean by 40%+ | High | Check billing status, account suspension, accidental pause |
| CPL spike | 7 day rolling CPL exceeds trailing 28 day rolling CPL by 60%+ | Medium | Audit creative, geo targeting, landing page |
| Lead drop | 7 day lead count below trailing 28 day median by 50%+ | High | Front desk follow up issue or platform issue |
| Conversion rate drop | Click to lead % below practice 28 day baseline by 30%+ | Medium | Landing page audit, form audit, page speed |
| Stale data | Source has not refreshed past stale threshold | High | Banner on every page, last refreshed timestamp shown |
| Trigger combination | Recommendation |
|---|---|
| High CPL + low CTR | "Creative refresh recommended, ad CTR is below 1.5%." |
| Healthy CTR + high CPL + low click to lead | "Landing page conversion issue. Audit form and page speed." |
| Strong leads + low booked % | "Front desk follow up issue, leads are not converting to appointments. Review GHL nurture sequence." |
| Healthy booked + low showed % | "Reminder cadence issue. Audit appointment confirmation SMS frequency and timing." |
| Model | Phase | Rationale |
|---|---|---|
| Last touch | 1 (default) | Matches what Google Ads reports natively. Lowest friction for cross checking against agency reports. |
| Linear | 1 | Equal credit across all touches in the path. Simple to explain, immediately reveals the multi touch reality of dental patient journeys. |
| First touch | 1.5 | Useful for top of funnel brand spend evaluation. |
| Time decay (7 day half life) | 1.5 | Best single number proxy for short sales cycles, which dental tends to be. |
| Position based (40, 20, 40) | 1.5 | Compromise between first and last touch. |
| Data driven | 2 (portfolio level only) | Requires roughly 300 conversions and 3,000 ad interactions per 30 days per account. Most practices will not qualify individually. Run at portfolio level only. |
The attribution model selector is a single dropdown in the global filter bar. Changing it recomputes attribution dependent KPIs (assisted conversions, channel share, ROAS by channel) in place. The selected model is shown as a persistent badge on every attribution dependent tile.
| Source | Mechanism | Cadence | Rationale |
|---|---|---|---|
| Google Ads campaign + ad group day metrics | node-cron scheduled pull | Daily 04:00 ET (09:00 UTC), pulling T-2 through T-1 | Google Ads stats finalize ~T+1, conversion adjustments arrive up to T+3, so re pull a 3 day window to absorb late conversions |
| Google Ads keyword and search term reports | node-cron pull | Daily 05:00 ET, T-1 only | Higher volume, less revision after T+1. Phase 1.5. |
| Google Ads account structure (campaigns, ad groups) | node-cron pull | Hourly | Cheap, surfaces new campaigns quickly in dashboard |
| GHL contacts, opportunities, appointments | Webhook (primary) + reconciliation pull (safety net) | Webhook real time, reconciliation every hour | Webhooks are eventually consistent and occasionally lost, reconciliation pull catches drift |
| GHL pipeline and stage definitions | node-cron pull | Daily 03:00 ET | Rarely changes |
| Materialized views (slow) | SQL job | Daily 04:30 ET, post Google Ads pull, CONCURRENTLY | Channel ROI 90d, attribution paths |
| Materialized views (fast) | SQL job | Hourly, CONCURRENTLY | Practice spend 30d, funnel 30d, campaign performance 30d |
Principle: webhooks update the warehouse within seconds, scheduled pulls are the safety net. Both write to the same raw_ghl_* tables. The dedupe step in stg_* resolves conflicts using updated_at from the source.
Every dashboard page shows three timestamps in the footer: "Google Ads last refreshed," "GHL last refreshed," "Derived metrics last computed." If any source is past its stale threshold, a yellow banner appears at the top of the page with the affected source named.
Deploy sga-marketing-ingest as a new Railway service in the same project as the existing Postgres service. Same private network (no egress cost on the Postgres connection), separate environment, isolated blast radius.
// scheduler.ts -- the only file that changes when Temporal arrives
import cron from 'node-cron';
import { runGoogleAdsDailyPull } from './jobs/google-ads-daily';
import { runGhlReconciliation } from './jobs/ghl-reconciliation';
import { refreshMvSlow, refreshMvFast } from './jobs/mv-refresh';
export function startScheduler() {
cron.schedule('0 9 * * *', () => runGoogleAdsDailyPull({ window: 'T-2..T-1' }));
cron.schedule('0 * * * *', () => runGhlReconciliation());
cron.schedule('30 9 * * *', () => refreshMvSlow());
cron.schedule('0 * * * *', () => refreshMvFast());
}
Every job body is a plain async function with no cron dependency. node-cron calls it today, BullMQ calls it tomorrow, Temporal Activity calls it the day after. The migration cost is one file.
# --- Google Ads ----------------------------------
GOOGLE_ADS_DEVELOPER_TOKEN=
GOOGLE_ADS_CLIENT_ID=
GOOGLE_ADS_CLIENT_SECRET=
GOOGLE_ADS_REFRESH_TOKEN_GEN4=
GOOGLE_ADS_REFRESH_TOKEN_SGA= # add when SGA MCC exists
GOOGLE_ADS_MCC_CUSTOMER_ID_GEN4=
GOOGLE_ADS_MCC_CUSTOMER_ID_SGA=
# --- GoHighLevel ---------------------------------
GHL_AGENCY_API_TOKEN= # single agency token, not per location
GHL_WEBHOOK_SIGNING_SECRET=
# --- Database -----------------------------------
DATABASE_URL= # Railway internal
# --- Observability ------------------------------
SENTRY_DSN=
BETTERSTACK_SOURCE_TOKEN=
# --- Runtime ------------------------------------
NODE_ENV=production
TZ=UTC
LOG_LEVEL=info
Doppler is recommended on top of Railway env vars: single source of truth, audit log, per environment promotion. Do not introduce AWS Secrets Manager in Phase 1, that is a Phase 2 migration.
invalid_grant, write status = 'token_error' to sync_runs, fire Sentry fatal alert, skip remaining practices for that MCC. Do not retry in same run.locationId query param per request. Do not store 260 location keys. v2 rate limit is 100 req per 10 sec per location, plenty of headroom.POST /webhooks/ghl on the worker, exposed via a Railway public URL (separate hostname from api.sga3p.com). HMAC verification using x-ghl-signature. Synchronous insert into marketing.ghl_webhook_events (the durable queue). Downstream processor reads from that table on the hourly reconciliation job and marks rows processed = true. This gives replay capability without any new infrastructure.
/webhooks/ghl/health and /healthz| SLO | Target |
|---|---|
| Dashboard p95 latency | < 1.5 seconds |
| Data freshness for Google Ads | < 26 hours from event to dashboard |
| Data freshness for GHL via webhook | < 5 minutes from event to dashboard |
POST /internal/backfill on the worker (static internal API key, not exposed). Body: { practiceId, source, startDate, endDate }. Writes one row per date-source combination to backfill_queue. A backfill runner processes oldest first, respects rate limits, marks rows done with row count. Idempotent on (practice_id, source, data_date). 90 days x 60 practices x 200ms delay ≈ 18 minutes wall time.
| Item | Monthly |
|---|---|
sga-marketing-ingest on Railway (512MB, Starter) | $5 to $10 |
| Postgres storage growth (~500MB year 1) | $0 to $5 |
| Doppler Team plan (free tier) | $0 |
| Sentry Developer plan (free tier) | $0 |
| Better Stack (free tier) | $0 |
| Total Phase 1 | $5 to $15 |
Walking skeleton first. Prove the full path end to end with one practice, one day, one number on screen, then replicate breadth and depth from there.
sga-marketing-ingest Railway service. Wire DATABASE_URL to internal Postgres.raw_googleads_campaign_daily, transform to marketing.google_ads_campaign_daily_metrics./api/v1/marketing/practices/:id/summary route on asset-registry./marketing route in app.sga3p.com.gclid from URL params and forwards to GHL contact custom field. Block Phase 2 attribution work until verified.token_error alerting.mv_portfolio_kpi_daily and mv_practice_spend_30d. Wire to summary endpoints.POST /webhooks/ghl with HMAC verification and durable queue table.mv_practice_funnel_30d and mv_campaign_performance_30d.attribution_touchpoints from GHL contacts and Google Ads click view.insights_log.| Item | Priority | Rationale |
|---|---|---|
| Portfolio Overview view | Must | Core reason the project exists. |
| Practice Detail view | Must | Required to act on portfolio level signals. |
| Channel View (Google Ads) | Must | Phase 1 channel scope, needed for QA. |
| Funnel View | Must | Full funnel attribution is the user's stated ultimate goal. |
| Google Ads ingest (campaign daily) | Must | Phase 1 source. |
| GHL ingest (webhook + REST reconciliation) | Must | Phase 1 source. |
| Spend, CPL, ROAS tiles | Must | Tier 1 metrics for every persona. |
| Last touch + Linear attribution | Must | Default model + multi touch reality. |
| Anomaly detection (5 rules) | Must | "Insights across each channel" requirement. |
| Filters: time, practice, channel | Must | Cannot use the dashboard without these. |
| Stale data banner + last refreshed timestamps | Must | Trust requirement. |
| Insights feed page | Should | High value, but tile badges cover urgent path. |
| Comparative insights vs portfolio median | Should | Strong differentiator but not blocking. |
| Pacing vs budget | Should | Currently tracked manually. |
| Recommendation rule engine | Should | Templates can ship in 1.1. |
| First touch + time decay attribution | Could | Defer to Phase 1.5. |
| Region rollups | Could | Useful but small audience initially. |
| Ownership tier filter (SGA / LookSee / Gen4) | Could | Practice multi select covers the use case in Phase 1. |
| Practice manager login + per practice RBAC | Won't | Phase 2. |
| Meta Ads, GA4, GBP, Yelp | Won't | Phase 2. |
| Predictive forecasting + budget optimization | Won't | Phase 3. |
| Write back to Google Ads | Won't | Read only, regulatory and permissions risk. |
| PHI / clinical data | Won't | Phase B with BAAs. |
| Data driven attribution | Won't | Insufficient per practice volume, defer to Phase 2 portfolio. |
| Agency comparison scorecards | Won't | Defer until data trust is established. |
Feature: Portfolio Overview load performance and content
Scenario: Growth lead opens Portfolio Overview on Monday morning
Given I am authenticated as the growth lead
And there are 50 or more practices with active spend in the last 30 days
When I navigate to Portfolio Overview
Then I see total spend, total leads, blended CPL, total started cases, and blended ROAS within 2 seconds
And each tile shows delta versus the previous 30 day period
Feature: Funnel waterfall on Practice Detail
Scenario: Diagnosing where a practice is losing patients
Given I have selected a single practice on Practice Detail
When the funnel waterfall renders
Then I see counts and conversion percentages at each stage from Impression through Started Case
And the largest drop off stage is visually highlighted
Feature: Attribution model toggle
Scenario: Switching from last touch to linear attribution
Given I am on Funnel View with attribution set to Last Touch
When I change the attribution selector to Linear
Then ROAS, channel share, and assisted conversion tiles recompute within 3 seconds
And every attribution dependent tile shows a "Linear" badge
Feature: Anomaly detection surfaces a spend spike
Scenario: A practice spends 60 percent above trailing mean
Given Practice X has a trailing 14 day mean daily spend of $100
And Practice X spent $180 yesterday
When the nightly insights job runs at 05:00 ET
Then a High severity insight card appears in the Insights feed
And the card states the practice name, the metric, the percentage change, and a suggested action
Feature: GHL webhook freshness
Scenario: A new lead is created in GHL
Given a contact is created in GHL with a marketing source tag at 14:00 ET
When I refresh Portfolio Overview at 14:05 ET
Then the lead count tile reflects the new lead
And the GHL last refreshed timestamp is within 5 minutes of now
Feature: Google Ads stale data warning
Scenario: Google Ads ingest fails overnight
Given the Google Ads nightly refresh failed at 04:00 ET
And it is now 14:00 ET
When I open any dashboard page
Then a yellow banner appears at the top of the page
And the banner states "Google Ads data is stale, last refreshed [timestamp]"
Feature: Independent verification of agency numbers
Scenario: QA reviewer cross checks LookSee monthly report
Given LookSee reported a CPL of $42 for Practice Y in April
When I open Practice Detail for Practice Y with time range April 1 to April 30
Then the dashboard CPL is displayed and is within 2 percent of the agency reported figure
And if the variance exceeds 5 percent, the tile shows a variance flag
Feature: Comparative insight against portfolio median
Scenario: A practice is materially above portfolio CPL
Given Practice X has a 30 day CPL of $84
And the portfolio median 30 day CPL is $35
When I open Practice Detail for Practice X
Then a comparative insight card states "CPL is 2.4x portfolio median"
And a recommendation suggests creative or targeting investigation
mart_* straightforward.sga-marketing-ingest service rather than endpoints on asset-registrypackages/shared-types for Zod and Drizzle.attribution_touchpoints and assignments from completed staging data. Three table model supports multiple attribution algorithms over the same touchpoint history./api/v1/marketing/* route group to asset-registry. Same auth, same Drizzle, separate route module. Zod schemas in packages/shared-types.| Rank | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| 1 | GCLID not captured on landing pages, so attribution is structurally impossible | High | Critical | Audit and patch every active landing page in week 1. Block Phase 2 attribution work until verified. Innovative Dental and Gen4 pages are highest priority. |
| 2 | OAuth refresh token revokes silently, sync fails for days unnoticed | Medium | High | token_error status triggers Sentry fatal alert. Dashboard shows last successful sync date per practice. Healthcheck pings Ads API daily. |
| 3 | MCC consolidation incomplete, two separate credential sets to manage | High (current state) | Medium | Design for multiple MCCs from day one (GOOGLE_ADS_REFRESH_TOKEN_GEN4 / _SGA pattern). Do not assume single MCC. |
| 4 | Postgres volume grows faster than projected, materialized view refresh becomes the bottleneck | Low | High | Threshold based monitoring. ClickHouse migration plan documented. REFRESH MATERIALIZED VIEW CONCURRENTLY required, unique index on every MV. |
| 5 | GHL webhooks unreliable (drops, missed events, signature drift) | Medium | Medium | Hourly reconciliation pull is the safety net. Drift alerts. Durable queue table preserves replay capability. |
| 6 | Ad account access revoked by agency mid sync | Low | Medium | Log access_revoked in sync_runs, not as hard failure. Sentry alert, do not halt rest of run. Dual link strategy reduces blast radius. |
| # | Action | Owner | Effort | Blocks |
|---|---|---|---|---|
| 1 | Confirm or course correct stack divergence (section 02) | Dakota | S (15 min) | Everything |
| 2 | Apply for Google Ads Standard Access developer token | Dakota | S (form, 24 to 72 hr review) | Production sync |
| 3 | Audit landing page GCLID capture (Innovative Dental, Gen4) | Dakota or web team | M (1 to 3 days, per page) | Phase 2 attribution |
| 4 | Confirm GHL Agency API token availability and webhook signing secret | Dakota | S (1 day) | GHL ingest |
| 5 | Provision sga-marketing-ingest Railway service in Content Engine project | Implementation | S (1 hr) | Walking skeleton |
| 6 | Provision Doppler project, populate Phase 1 secrets | Implementation | S (2 hr) | OAuth flows |
| 7 | Identify the 50 to 60 paid media active practices for Phase 1 scope | Dakota | S (existing list refresh) | Backfill prioritization |
| 8 | Decide MCC consolidation strategy (Gen4 only Phase 1 vs SGA MCC by Week 4) | Dakota | S (1 day) | Token strategy |
| 9 | Kick off week 1 walking skeleton sprint | Implementation | L (5 days) | Subsequent weeks |