Skip to main content

CSVSource

fhir4ds.sources.CSVSource — SourceAdapter for FHIR resources stored in CSV files, where the user defines a SQL projection mapping their column layout to the fhir4ds schema.

Class Signature

CSVSource(path: str, projection_sql: str)
ParameterTypeDescription
pathstrPath to the CSV file or glob pattern.
projection_sqlstrA SQL SELECT statement projecting CSV columns to the fhir4ds schema. Must use the {source} placeholder in the FROM clause.

Raises: SchemaValidationError — if the projection does not produce the required columns (id, resourceType, resource, patient_ref) with the required types.


The {source} Placeholder

The projection_sql must contain {source} in its FROM clause. At registration time, this placeholder is replaced with read_csv_auto('<path>'), which is DuckDB's CSV scanner.

-- Your projection_sql:
SELECT ... FROM {source}

-- Becomes:
SELECT ... FROM read_csv_auto('/data/patients.csv')

This lets you write portable projection SQL that doesn't depend on the file path.


Methods

register(con)

  1. Substitutes {source} with read_csv_auto('<path>').
  2. Creates CREATE OR REPLACE VIEW resources AS <projection>.
  3. Calls validate_schema().

Raises: SchemaValidationError if the projection doesn't expose the required columns, or if the view cannot be created (e.g., projection references non-existent CSV columns).

unregister(con)

Drops the resources view. Safe to call even if register() was never called.

supports_incremental()

Returns False. CSVSource does not support incremental delta tracking.


Example

import fhir4ds
from fhir4ds.sources import CSVSource

source = CSVSource(
path='/data/patients.csv',
projection_sql="""
SELECT
patient_id AS id,
'Patient' AS resourceType,
json_object(
'resourceType', 'Patient',
'id', patient_id,
'birthDate', birth_date,
'gender', gender
) AS resource,
patient_id AS patient_ref
FROM {source}
"""
)

con = fhir4ds.create_connection()
fhir4ds.attach(con, source)

Constructing FHIR JSON from CSV Columns

Use DuckDB's json_object() function to build the FHIR resource JSON from flat CSV columns:

json_object(
'resourceType', 'Patient',
'id', patient_id,
'birthDate', birth_date,
'gender', gender,
'name', json_array(
json_object('family', last_name, 'given', json_array(first_name))
)
) AS resource

The projection must produce a column named resource with a type castable to JSON.


Common Mistakes

MistakeSymptomFix
Forgetting {source} in FROM clauseSchemaValidationError or DuckDB parse errorUse FROM {source} in your projection SQL
Missing required columnsSchemaValidationError: required column 'patient_ref' is missingEnsure your SELECT produces all four columns: id, resourceType, resource, patient_ref
Wrong column typesSchemaValidationError: column 'resource' has type 'VARCHAR' but 'JSON' is requiredWrap your resource construction in json_object() or cast with ::JSON
Referencing non-existent CSV columnsSchemaValidationError: failed to create the 'resources' viewCheck your CSV headers match the column names in your projection