/create-new-spine

Scaffold a new provider spine DAG under dags/data_science/spines/, following the labs DAG structure.

Usage

/create-new-spine <snake-case-name>

Provide the snake_case name for the new spine (e.g. infusion_centers). The skill will ask you to confirm the display name, provider type string, taxonomy filters, and national providers before generating any files.

What You Need to Provide

Display name — title case used in docstrings (e.g. Infusion Centers)
Provider type string — the provider_type value in the spine table (e.g. Infusion Center)
Taxonomy filters — one or more rows of taxonomy_grouping, taxonomy_classification (optional), taxonomy_specialization (optional)
National providers — known large national chains to canonicalize, as (NPPES name prefix, canonical display name) pairs. Empty dict if none.

What It Does

Looks up taxonomy codes via Trino — queries the NUCC taxonomy reference table using your filters, shows the results, and asks you to confirm which codes to include.
Creates the full DAG directory — generates all files under dags/data_science/spines/<name>/ by copying the labs DAG and substituting all labs-specific strings:
- params.py — TAXONOMY_SPECS list with one entry per confirmed taxonomy code
- national_providers.py — NATIONAL_PROVIDERS dict
- config/table_config.yaml — shared reference table paths (identical across all spines)
- config/__init__.py — table config loader
- 6 SQL files under sql/ — all table name references updated
- 3 task files under tasks/ — all import paths, function names, and table names updated
- tasks/__init__.py — module imports
- __init__.py — DAG definition with updated DAG ID, description, and task wiring

Files Created

dags/data_science/spines/<name>/
├── __init__.py
├── national_providers.py
├── params.py
├── config/
│   ├── __init__.py
│   └── table_config.yaml
├── tasks/
│   ├── __init__.py
│   ├── cleanup.py
│   ├── data_preparation.py
│   └── spines.py
└── sql/
    ├── cleanup.sql
    ├── nppes.sql
    ├── nppes_additional_npis.sql
    ├── tmp_nppes_base.sql
    ├── tmp_nppes_primary.sql
    └── tmp_nppes_taxonomy.sql

Post-Scaffold Steps

Update dev_schema in __init__.py to your personal dev schema before running
After a successful dev run, set target_schema = prod to copy to tq_dev.spines
Production tables will be tq_dev.spines.spines_provider_<name> and tq_dev.spines.spines_provider_<name>_additional_npis

Reference

Create New Spine: Reference

Naming Conventions

Given <name> (snake_case, e.g. infusion_centers):

Element	Pattern	Example
DAG directory	`dags/data_science/spines/<name>/`	`spines/infusion_centers/`
DAG ID	`spines_<name>`	`spines_infusion_centers`
Python import path	`data_science.spines.<name>`	`data_science.spines.infusion_centers`
Tmp taxonomy table	`tmp_nppes_taxonomy_<name>_<sub_version>`	`tmp_nppes_taxonomy_infusion_centers_2025_08`
Tmp base table	`tmp_nppes_base_<name>_<sub_version>`	`tmp_nppes_base_infusion_centers_2025_08`
Tmp primary table	`tmp_nppes_primary_<name>_<sub_version>`	`tmp_nppes_primary_infusion_centers_2025_08`
Primary spine table	`<name>_nppes_<sub_version>`	`infusion_centers_nppes_2025_08`
Additional NPIs table	`<name>_nppes_additional_npis_<sub_version>`	`infusion_centers_nppes_additional_npis_2025_08`
Prod spines table	`tq_dev.spines.spines_provider_<name>`	`tq_dev.spines.spines_provider_infusion_centers`

SQL Substitutions

Summary of every labs-specific string that needs replacing in each SQL file:

SQL File	Replace	With
`tmp_nppes_taxonomy.sql`	`tmp_nppes_taxonomy_labs_`	`tmp_nppes_taxonomy_<name>_`
`tmp_nppes_base.sql`	`tmp_nppes_taxonomy_labs_`	`tmp_nppes_taxonomy_<name>_`
`tmp_nppes_base.sql`	`tmp_nppes_base_labs_`	`tmp_nppes_base_<name>_`
`tmp_nppes_base.sql`	`'Laboratory' AS provider_type`	`'<provider_type>' AS provider_type`
`tmp_nppes_primary.sql`	`tmp_nppes_base_labs_`	`tmp_nppes_base_<name>_`
`tmp_nppes_primary.sql`	`tmp_nppes_primary_labs_`	`tmp_nppes_primary_<name>_`
`nppes.sql`	`labs_nppes_`	`<name>_nppes_`
`nppes.sql`	`tmp_nppes_primary_labs_`	`tmp_nppes_primary_<name>_`
`nppes_additional_npis.sql`	`labs_nppes_additional_npis_`	`<name>_nppes_additional_npis_`
`nppes_additional_npis.sql`	`tmp_nppes_primary_labs_`	`tmp_nppes_primary_<name>_`
`cleanup.sql`	(no changes)	—

Pipeline Overview

start_dag
    → build_tmp_nppes_taxonomy       # Select taxonomy codes from NPPES
    → build_tmp_nppes_base           # Join to geo/taxonomy details, filter by taxonomy_specs + entity_type=2
    → build_tmp_nppes_primary        # Rank NPIs, apply national provider canonicalization
    → [build_<name>_spine,           # Primary NPIs (primary_rank = 1)
       build_additional_npis]        # Secondary NPIs (primary_rank != 1)
    → cleanup_intermediate_tables
    → [build_spines_provider_<name>,              # (prod mode only)
       build_spines_provider_<name>_additional_npis]
    → end_dag

Key Template Variables

Variable	Source	Used in
`{{ schema }}`	`dev_schema` DAG param	All files
`{{ sub_version }}`	`sub_version` DAG param	All files
`{{ taxonomy_specs }}`	`params.TAXONOMY_SPECS`	`tmp_nppes_base.sql`
`{{ national_providers }}`	`national_providers.NATIONAL_PROVIDERS`	`tmp_nppes_primary.sql`, `nppes.sql`, `nppes_additional_npis.sql`
`{{ ref_nucc_taxonomy_code }}`	`table_config.yaml`	`tmp_nppes_base.sql`
`{{ ref_cms_nppes_npi }}`	`table_config.yaml`	`tmp_nppes_taxonomy.sql`

`params.py` Format

TAXONOMY_SPECS = [
    {'code': '291U00000X'},
    {'code': '293D00000X'},
]

`national_providers.py` Format

NATIONAL_PROVIDERS = {
    "NPPES PREFIX": "Canonical Name",  # prefix matched with LIKE 'PREFIX%'
}

An empty dict is valid for provider types without known national chains.

Taxonomy Lookup Query

SELECT
    taxonomy_code, taxonomy_grouping, taxonomy_classification, taxonomy_specialization
FROM tq_production.spines.spines_provider_nucc_taxonomy_codes
WHERE taxonomy_grouping IN ('...')
ORDER BY taxonomy_grouping, taxonomy_classification, taxonomy_specialization

Usage​

What You Need to Provide​

What It Does​

Files Created​

Post-Scaffold Steps​

Reference​

Create New Spine: Reference

Naming Conventions​

SQL Substitutions​

Pipeline Overview​

Key Template Variables​

params.py Format​

national_providers.py Format​

Taxonomy Lookup Query​

On this page: