Skip to main content

/create-new-spine

Scaffold a new provider spine DAG under dags/data_science/spines/, following the labs DAG structure.

Usage

/create-new-spine <snake-case-name>

Provide the snake_case name for the new spine (e.g. infusion_centers). The skill will ask you to confirm the display name, provider type string, taxonomy filters, and national providers before generating any files.

What You Need to Provide

  1. Display name — title case used in docstrings (e.g. Infusion Centers)
  2. Provider type string — the provider_type value in the spine table (e.g. Infusion Center)
  3. Taxonomy filters — one or more rows of taxonomy_grouping, taxonomy_classification (optional), taxonomy_specialization (optional)
  4. National providers — known large national chains to canonicalize, as (NPPES name prefix, canonical display name) pairs. Empty dict if none.

What It Does

  1. Looks up taxonomy codes via Trino — queries the NUCC taxonomy reference table using your filters, shows the results, and asks you to confirm which codes to include.

  2. Creates the full DAG directory — generates all files under dags/data_science/spines/<name>/ by copying the labs DAG and substituting all labs-specific strings:

    • params.pyTAXONOMY_SPECS list with one entry per confirmed taxonomy code
    • national_providers.pyNATIONAL_PROVIDERS dict
    • config/table_config.yaml — shared reference table paths (identical across all spines)
    • config/__init__.py — table config loader
    • 6 SQL files under sql/ — all table name references updated
    • 3 task files under tasks/ — all import paths, function names, and table names updated
    • tasks/__init__.py — module imports
    • __init__.py — DAG definition with updated DAG ID, description, and task wiring

Files Created

dags/data_science/spines/<name>/
├── __init__.py
├── national_providers.py
├── params.py
├── config/
│ ├── __init__.py
│ └── table_config.yaml
├── tasks/
│ ├── __init__.py
│ ├── cleanup.py
│ ├── data_preparation.py
│ └── spines.py
└── sql/
├── cleanup.sql
├── nppes.sql
├── nppes_additional_npis.sql
├── tmp_nppes_base.sql
├── tmp_nppes_primary.sql
└── tmp_nppes_taxonomy.sql

Post-Scaffold Steps

  • Update dev_schema in __init__.py to your personal dev schema before running
  • After a successful dev run, set target_schema = prod to copy to tq_dev.spines
  • Production tables will be tq_dev.spines.spines_provider_<name> and tq_dev.spines.spines_provider_<name>_additional_npis

Reference

Create New Spine: Reference

Naming Conventions

Given <name> (snake_case, e.g. infusion_centers):

ElementPatternExample
DAG directorydags/data_science/spines/<name>/spines/infusion_centers/
DAG IDspines_<name>spines_infusion_centers
Python import pathdata_science.spines.<name>data_science.spines.infusion_centers
Tmp taxonomy tabletmp_nppes_taxonomy_<name>_<sub_version>tmp_nppes_taxonomy_infusion_centers_2025_08
Tmp base tabletmp_nppes_base_<name>_<sub_version>tmp_nppes_base_infusion_centers_2025_08
Tmp primary tabletmp_nppes_primary_<name>_<sub_version>tmp_nppes_primary_infusion_centers_2025_08
Primary spine table<name>_nppes_<sub_version>infusion_centers_nppes_2025_08
Additional NPIs table<name>_nppes_additional_npis_<sub_version>infusion_centers_nppes_additional_npis_2025_08
Prod spines tabletq_dev.spines.spines_provider_<name>tq_dev.spines.spines_provider_infusion_centers

SQL Substitutions

Summary of every labs-specific string that needs replacing in each SQL file:

SQL FileReplaceWith
tmp_nppes_taxonomy.sqltmp_nppes_taxonomy_labs_tmp_nppes_taxonomy_<name>_
tmp_nppes_base.sqltmp_nppes_taxonomy_labs_tmp_nppes_taxonomy_<name>_
tmp_nppes_base.sqltmp_nppes_base_labs_tmp_nppes_base_<name>_
tmp_nppes_base.sql'Laboratory' AS provider_type'<provider_type>' AS provider_type
tmp_nppes_primary.sqltmp_nppes_base_labs_tmp_nppes_base_<name>_
tmp_nppes_primary.sqltmp_nppes_primary_labs_tmp_nppes_primary_<name>_
nppes.sqllabs_nppes_<name>_nppes_
nppes.sqltmp_nppes_primary_labs_tmp_nppes_primary_<name>_
nppes_additional_npis.sqllabs_nppes_additional_npis_<name>_nppes_additional_npis_
nppes_additional_npis.sqltmp_nppes_primary_labs_tmp_nppes_primary_<name>_
cleanup.sql(no changes)

Pipeline Overview

start_dag
→ build_tmp_nppes_taxonomy # Select taxonomy codes from NPPES
→ build_tmp_nppes_base # Join to geo/taxonomy details, filter by taxonomy_specs + entity_type=2
→ build_tmp_nppes_primary # Rank NPIs, apply national provider canonicalization
→ [build_<name>_spine, # Primary NPIs (primary_rank = 1)
build_additional_npis] # Secondary NPIs (primary_rank != 1)
→ cleanup_intermediate_tables
→ [build_spines_provider_<name>, # (prod mode only)
build_spines_provider_<name>_additional_npis]
→ end_dag

Key Template Variables

VariableSourceUsed in
{{ schema }}dev_schema DAG paramAll files
{{ sub_version }}sub_version DAG paramAll files
{{ taxonomy_specs }}params.TAXONOMY_SPECStmp_nppes_base.sql
{{ national_providers }}national_providers.NATIONAL_PROVIDERStmp_nppes_primary.sql, nppes.sql, nppes_additional_npis.sql
{{ ref_nucc_taxonomy_code }}table_config.yamltmp_nppes_base.sql
{{ ref_cms_nppes_npi }}table_config.yamltmp_nppes_taxonomy.sql

params.py Format

TAXONOMY_SPECS = [
{'code': '291U00000X'},
{'code': '293D00000X'},
]

national_providers.py Format

NATIONAL_PROVIDERS = {
"NPPES PREFIX": "Canonical Name", # prefix matched with LIKE 'PREFIX%'
}

An empty dict is valid for provider types without known national chains.


Taxonomy Lookup Query

SELECT
taxonomy_code, taxonomy_grouping, taxonomy_classification, taxonomy_specialization
FROM tq_production.spines.spines_provider_nucc_taxonomy_codes
WHERE taxonomy_grouping IN ('...')
ORDER BY taxonomy_grouping, taxonomy_classification, taxonomy_specialization