CSVSource

fhir4ds.sources.CSVSource is a SourceAdapter for FHIR resources stored in CSV files. You supply a SQL projection that maps the file's column layout to the fhir4ds schema.

Class Signature

CSVSource(path: str, projection_sql: str)

Parameter	Type	Description
`path`	`str`	Path to the CSV file or glob pattern.
`projection_sql`	`str`	A SQL `SELECT` statement projecting CSV columns to the fhir4ds schema. Must use the `{source}` placeholder in the `FROM` clause.

Raises: SchemaValidationError — if the projection does not produce the required columns (id, resourceType, resource, patient_ref) with the required types.

The `{source}` Placeholder

The projection_sql must contain {source} in its FROM clause. At registration time, this placeholder is replaced with read_csv_auto('<path>'), which is DuckDB's CSV scanner.

-- Your projection_sql:
SELECT ... FROM {source}

-- Becomes:
SELECT ... FROM read_csv_auto('/data/patients.csv')

This lets you write portable projection SQL that doesn't depend on the file path.
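The substitution can be pictured as a plain string replacement. The following is a minimal sketch, not the library's actual code: `expand_source` is a hypothetical helper name, and the quote-doubling step is an assumption about how a path containing a single quote would need to be handled.

```python
def expand_source(projection_sql: str, path: str) -> str:
    """Hypothetical helper: swap {source} for a read_csv_auto() call."""
    if "{source}" not in projection_sql:
        raise ValueError("projection_sql must contain the {source} placeholder")
    # Assumption: double single quotes so the path stays a valid SQL string literal.
    quoted_path = path.replace("'", "''")
    return projection_sql.replace("{source}", f"read_csv_auto('{quoted_path}')")


projection = "SELECT patient_id AS id FROM {source}"
print(expand_source(projection, "/data/patients.csv"))
# SELECT patient_id AS id FROM read_csv_auto('/data/patients.csv')
```

Because the path is only injected at this point, the same projection string can be reused against a different file or glob pattern.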

Methods

`register(con)`

Performs three steps, in order:

1. Substitutes {source} with read_csv_auto('<path>').
2. Executes CREATE OR REPLACE VIEW resources AS <projection>.
3. Calls validate_schema().

Raises: SchemaValidationError if the projection doesn't expose the required columns, or if the view cannot be created (e.g., projection references non-existent CSV columns).
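The sequence that register() performs can be approximated in a few lines. This is a rough sketch under stated assumptions, not the real implementation: `register_sketch` and `REQUIRED_COLUMNS` are hypothetical names, `con` is assumed to be a DuckDB connection (anything with `execute()` and `fetchall()`), `ValueError` stands in for `SchemaValidationError`, and only column names are checked, not their types.

```python
REQUIRED_COLUMNS = ("id", "resourceType", "resource", "patient_ref")


def register_sketch(con, path: str, projection_sql: str) -> None:
    """Hypothetical approximation of the three register() steps."""
    # Step 1: replace the placeholder with DuckDB's CSV scanner.
    select_sql = projection_sql.replace("{source}", f"read_csv_auto('{path}')")

    # Step 2: expose the projection as the `resources` view.
    con.execute(f"CREATE OR REPLACE VIEW resources AS {select_sql}")

    # Step 3: check that every required column is present in the view.
    # DESCRIBE returns one row per column, with the column name first.
    described = con.execute("DESCRIBE resources").fetchall()
    present = {row[0] for row in described}
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        # Stand-in for SchemaValidationError in this sketch.
        raise ValueError(f"required columns missing: {missing}")
```

Since the view is created with CREATE OR REPLACE, running the sequence again with a new projection overwrites the previous view rather than failing.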

`unregister(con)`

Drops the resources view. Safe to call even if register() was never called.

`supports_incremental()`

Returns False. CSVSource does not support incremental delta tracking.

Example

import fhir4ds
from fhir4ds.sources import CSVSource

source = CSVSource(
    path='/data/patients.csv',
    projection_sql="""
        SELECT
            patient_id AS id,
            'Patient'  AS resourceType,
            json_object(
                'resourceType', 'Patient',
                'id', patient_id,
                'birthDate', birth_date,
                'gender', gender
            ) AS resource,
            patient_id AS patient_ref
        FROM {source}
    """
)

con = fhir4ds.create_connection()
fhir4ds.attach(con, source)
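The projection above expects a CSV whose header contains patient_id, birth_date, and gender. To try the example without real data, a small file with that layout can be generated first. The rows below are made-up sample values, and the temporary directory is only an illustration; the resulting path would be passed as `path`.

```python
import csv
import tempfile
from pathlib import Path

# Made-up sample rows matching the columns the example projection references.
rows = [
    {"patient_id": "p1", "birth_date": "1980-04-12", "gender": "female"},
    {"patient_id": "p2", "birth_date": "1975-11-03", "gender": "male"},
]

csv_path = Path(tempfile.mkdtemp()) / "patients.csv"
with csv_path.open("w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["patient_id", "birth_date", "gender"])
    writer.writeheader()
    writer.writerows(rows)

print(csv_path)  # pass str(csv_path) as the `path` argument of CSVSource
```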

Constructing FHIR JSON from CSV Columns

Use DuckDB's json_object() function to build the FHIR resource JSON from flat CSV columns:

json_object(
    'resourceType', 'Patient',
    'id', patient_id,
    'birthDate', birth_date,
    'gender', gender,
    'name', json_array(
        json_object('family', last_name, 'given', json_array(first_name))
    )
) AS resource

The projection must produce a column named resource with a type castable to JSON.
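To see the shape of the JSON that the expression above aims for, the same structure can be built in Python for a single row. This is only an illustration of the target document: the row values are invented, and DuckDB's exact serialization (key order, whitespace) may differ from what `json.dumps` prints.

```python
import json

# Invented sample row using the CSV column names from the SQL above.
row = {
    "patient_id": "p1",
    "birth_date": "1980-04-12",
    "gender": "female",
    "last_name": "Lovelace",
    "first_name": "Ada",
}

# Mirrors the nested json_object()/json_array() calls one to one.
resource = {
    "resourceType": "Patient",
    "id": row["patient_id"],
    "birthDate": row["birth_date"],
    "gender": row["gender"],
    "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
}

print(json.dumps(resource, indent=2))
```

Note that `name` is an array of objects and `given` is an array of strings, which is why the SQL wraps both in json_array().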

Common Mistakes

Mistake	Symptom	Fix
Forgetting `{source}` in FROM clause	`SchemaValidationError` or DuckDB parse error	Use `FROM {source}` in your projection SQL
Missing required columns	`SchemaValidationError: required column 'patient_ref' is missing`	Ensure your SELECT produces all four columns: `id`, `resourceType`, `resource`, `patient_ref`
Wrong column types	`SchemaValidationError: column 'resource' has type 'VARCHAR' but 'JSON' is required`	Wrap your resource construction in `json_object()` or cast with `::JSON`
Referencing non-existent CSV columns	`SchemaValidationError: failed to create the 'resources' view`	Check your CSV headers match the column names in your projection
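Two of the mistakes above can be caught before calling register(). The helper below is a hypothetical pre-flight check, not part of fhir4ds: it assumes `path` points at a single file (not a glob), that the first line of the file is the header, and that you pass in the CSV column names your projection references.

```python
import csv


def preflight(path: str, projection_sql: str, expected_columns: list[str]) -> list[str]:
    """Hypothetical check: return a list of problems, empty if none were found."""
    problems = []

    # Mistake 1: the placeholder is missing from the projection.
    if "{source}" not in projection_sql:
        problems.append("projection_sql has no {source} placeholder")

    # Mistake 4: the projection references columns the CSV does not have.
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    for column in expected_columns:
        if column not in header:
            problems.append(f"CSV has no column named {column!r}")

    return problems
```

An empty list means neither of these two problems was detected; missing output columns and wrong column types still surface through SchemaValidationError at registration.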
