/create-new-spine
Scaffold a new provider spine DAG under dags/data_science/spines/, following the labs DAG structure.
Usage
/create-new-spine <snake-case-name>
Provide the snake_case name for the new spine (e.g. infusion_centers). The skill will ask you to confirm the display name, provider type string, taxonomy filters, and national providers before generating any files.
What You Need to Provide
- Display name — title case used in docstrings (e.g.
Infusion Centers) - Provider type string — the
provider_typevalue in the spine table (e.g.Infusion Center) - Taxonomy filters — one or more rows of
taxonomy_grouping,taxonomy_classification(optional),taxonomy_specialization(optional) - National providers — known large national chains to canonicalize, as (NPPES name prefix, canonical display name) pairs. Empty dict if none.
What It Does
-
Looks up taxonomy codes via Trino — queries the NUCC taxonomy reference table using your filters, shows the results, and asks you to confirm which codes to include.
-
Creates the full DAG directory — generates all files under
dags/data_science/spines/<name>/by copying the labs DAG and substituting alllabs-specific strings:params.py—TAXONOMY_SPECSlist with one entry per confirmed taxonomy codenational_providers.py—NATIONAL_PROVIDERSdictconfig/table_config.yaml— shared reference table paths (identical across all spines)config/__init__.py— table config loader- 6 SQL files under
sql/— all table name references updated - 3 task files under
tasks/— all import paths, function names, and table names updated tasks/__init__.py— module imports__init__.py— DAG definition with updated DAG ID, description, and task wiring
Files Created
dags/data_science/spines/<name>/
├── __init__.py
├── national_providers.py
├── params.py
├── config/
│ ├── __init__.py
│ └── table_config.yaml
├── tasks/
│ ├── __init__.py
│ ├── cleanup.py
│ ├── data_preparation.py
│ └── spines.py
└── sql/
├── cleanup.sql
├── nppes.sql
├── nppes_additional_npis.sql
├── tmp_nppes_base.sql
├── tmp_nppes_primary.sql
└── tmp_nppes_taxonomy.sql
Post-Scaffold Steps
- Update
dev_schemain__init__.pyto your personal dev schema before running - After a successful dev run, set
target_schema = prodto copy totq_dev.spines - Production tables will be
tq_dev.spines.spines_provider_<name>andtq_dev.spines.spines_provider_<name>_additional_npis
Reference
Create New Spine: Reference
Naming Conventions
Given <name> (snake_case, e.g. infusion_centers):
| Element | Pattern | Example |
|---|---|---|
| DAG directory | dags/data_science/spines/<name>/ | spines/infusion_centers/ |
| DAG ID | spines_<name> | spines_infusion_centers |
| Python import path | data_science.spines.<name> | data_science.spines.infusion_centers |
| Tmp taxonomy table | tmp_nppes_taxonomy_<name>_<sub_version> | tmp_nppes_taxonomy_infusion_centers_2025_08 |
| Tmp base table | tmp_nppes_base_<name>_<sub_version> | tmp_nppes_base_infusion_centers_2025_08 |
| Tmp primary table | tmp_nppes_primary_<name>_<sub_version> | tmp_nppes_primary_infusion_centers_2025_08 |
| Primary spine table | <name>_nppes_<sub_version> | infusion_centers_nppes_2025_08 |
| Additional NPIs table | <name>_nppes_additional_npis_<sub_version> | infusion_centers_nppes_additional_npis_2025_08 |
| Prod spines table | tq_dev.spines.spines_provider_<name> | tq_dev.spines.spines_provider_infusion_centers |
SQL Substitutions
Summary of every labs-specific string that needs replacing in each SQL file:
| SQL File | Replace | With |
|---|---|---|
tmp_nppes_taxonomy.sql | tmp_nppes_taxonomy_labs_ | tmp_nppes_taxonomy_<name>_ |
tmp_nppes_base.sql | tmp_nppes_taxonomy_labs_ | tmp_nppes_taxonomy_<name>_ |
tmp_nppes_base.sql | tmp_nppes_base_labs_ | tmp_nppes_base_<name>_ |
tmp_nppes_base.sql | 'Laboratory' AS provider_type | '<provider_type>' AS provider_type |
tmp_nppes_primary.sql | tmp_nppes_base_labs_ | tmp_nppes_base_<name>_ |
tmp_nppes_primary.sql | tmp_nppes_primary_labs_ | tmp_nppes_primary_<name>_ |
nppes.sql | labs_nppes_ | <name>_nppes_ |
nppes.sql | tmp_nppes_primary_labs_ | tmp_nppes_primary_<name>_ |
nppes_additional_npis.sql | labs_nppes_additional_npis_ | <name>_nppes_additional_npis_ |
nppes_additional_npis.sql | tmp_nppes_primary_labs_ | tmp_nppes_primary_<name>_ |
cleanup.sql | (no changes) | — |
Pipeline Overview
start_dag
→ build_tmp_nppes_taxonomy # Select taxonomy codes from NPPES
→ build_tmp_nppes_base # Join to geo/taxonomy details, filter by taxonomy_specs + entity_type=2
→ build_tmp_nppes_primary # Rank NPIs, apply national provider canonicalization
→ [build_<name>_spine, # Primary NPIs (primary_rank = 1)
build_additional_npis] # Secondary NPIs (primary_rank != 1)
→ cleanup_intermediate_tables
→ [build_spines_provider_<name>, # (prod mode only)
build_spines_provider_<name>_additional_npis]
→ end_dag
Key Template Variables
| Variable | Source | Used in |
|---|---|---|
{{ schema }} | dev_schema DAG param | All files |
{{ sub_version }} | sub_version DAG param | All files |
{{ taxonomy_specs }} | params.TAXONOMY_SPECS | tmp_nppes_base.sql |
{{ national_providers }} | national_providers.NATIONAL_PROVIDERS | tmp_nppes_primary.sql, nppes.sql, nppes_additional_npis.sql |
{{ ref_nucc_taxonomy_code }} | table_config.yaml | tmp_nppes_base.sql |
{{ ref_cms_nppes_npi }} | table_config.yaml | tmp_nppes_taxonomy.sql |
params.py Format
TAXONOMY_SPECS = [
{'code': '291U00000X'},
{'code': '293D00000X'},
]
national_providers.py Format
NATIONAL_PROVIDERS = {
"NPPES PREFIX": "Canonical Name", # prefix matched with LIKE 'PREFIX%'
}
An empty dict is valid for provider types without known national chains.
Taxonomy Lookup Query
SELECT
taxonomy_code, taxonomy_grouping, taxonomy_classification, taxonomy_specialization
FROM tq_production.spines.spines_provider_nucc_taxonomy_codes
WHERE taxonomy_grouping IN ('...')
ORDER BY taxonomy_grouping, taxonomy_classification, taxonomy_specialization