Contributing
Development Setup
Development is done on a dedicated DigitalOcean droplet. A local Docker setup on macOS is not currently viable because of known permission issues with the container entrypoint scripts.
Git Workflow
- Create a branch off main
- Make changes and test on the dev droplet
- Commit and push
- Open a pull request or merge directly to main when stable
Rebuilding After Code Changes
For nginx/docs changes only:
Adding a New Data Source
Each data source (Zenodo, GBIF, Dryad, etc.) has its own mapper package under src/ckanext-doi-import/ckanext/doi_import/mappers/. The mapper is responsible for fetching metadata from the source API and mapping it to the CKAN schema. It is plain Python and testable without running CKAN.
Mapper structure
Each source gets its own subdirectory:
mappers/
    base.py            ← DOI detection and shared utilities
    zenodo/
        __init__.py    ← fetch_metadata(), get_last_modified(), internal helpers
    gbif/              ← example future source
        __init__.py
Required interface
Every mapper package must implement two functions in its __init__.py:
fetch_metadata(doi) — fetches metadata from the source API and returns a dict matching the CKAN schema. This is called by both the web import form and the harvest CLI.
def fetch_metadata(doi):
    """Fetch and map metadata from the source for a given DOI.

    Args:
        doi: A DOI string (e.g. '10.XXXX/source.12345').

    Returns:
        A dict matching the CKAN schema, ready for package_create/update.

    Raises:
        toolkit.ValidationError on API or format errors.
    """
    ...
The returned dict should include at minimum: title, notes, source_url, canonical_id, license_id, product_type, authors, resources.
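A quick way to catch a missing field early is to check a mapper's result against the minimum key list above. The following is a minimal sketch; `REQUIRED_KEYS` and `missing_keys` are hypothetical names that do not exist in the extension, and the example values are placeholders whose shapes are assumptions.

```python
# Minimum keys listed in this guide. The constant and helper names are
# hypothetical; they are not part of ckanext-doi-import.
REQUIRED_KEYS = (
    "title", "notes", "source_url", "canonical_id",
    "license_id", "product_type", "authors", "resources",
)


def missing_keys(dataset):
    """Return the required keys absent from a mapper result, in list order."""
    return [key for key in REQUIRED_KEYS if key not in dataset]


# Placeholder result; the value shapes (for example, how authors and
# resources are structured) are assumptions made for illustration only.
example = {
    "title": "Sample dataset",
    "notes": "Short description.",
    "source_url": "https://example.org/record/1",
    "canonical_id": "10.XXXX/source.12345",
    "license_id": "cc-by-4.0",
    "product_type": "dataset",
    "authors": [],
    "resources": [],
}

print(missing_keys(example))             # no keys missing
print(missing_keys({"title": "Only"}))   # every key except title
```

A check like this can run in a plain Python session, since it does not touch CKAN.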
get_last_modified(doi) — returns the last modified timestamp from the source API as an ISO 8601 string, or None if unavailable. Used by the harvest CLI to skip records that haven't changed.
def get_last_modified(doi):
    """Get the last modified timestamp from the source for a given DOI.

    Returns:
        ISO 8601 timestamp string, or None if unavailable.
    """
    ...
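To illustrate how a harvester might use this timestamp to skip unchanged records, here is a minimal sketch. `parse_iso8601` and `should_reimport` are hypothetical helpers, not the harvest CLI's actual code. Re-importing when either timestamp is missing, and treating timestamps without an offset as UTC, are assumptions.

```python
from datetime import datetime, timezone


def parse_iso8601(value):
    """Parse an ISO 8601 string into a timezone-aware datetime."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def should_reimport(source_modified, stored_modified):
    """Return True when the source record looks newer than the stored copy.

    Assumption: if either timestamp is unavailable (None), re-import
    rather than risk skipping a changed record.
    """
    if source_modified is None or stored_modified is None:
        return True
    return parse_iso8601(source_modified) > parse_iso8601(stored_modified)


print(should_reimport("2024-05-02T10:00:00Z", "2024-05-01T10:00:00+00:00"))
print(should_reimport("2024-05-01T10:00:00Z", "2024-05-01T10:00:00+00:00"))
```

Comparing parsed datetimes rather than raw strings avoids false differences when the same instant is written with different offsets.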
Step-by-step
- Create the mapper directory and file:
- Implement fetch_metadata(doi) and get_last_modified(doi) in __init__.py. Use mappers/zenodo/__init__.py as a reference.
- Register the source in mappers/base.py by adding detection logic to detect_source():
- Wire the mapper into doi_import/plugin.py in the doi_fetch_metadata action function:
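The registration and wiring steps above can be sketched roughly as follows. This is an illustration only: the real signature and return value of detect_source() are not shown in this guide, so the pattern table, `normalize_doi`, and `dispatch` are hypothetical. The DOI prefixes are the ones Zenodo and GBIF commonly use. The sketch raises `ValueError` to stay self-contained, where the extension would raise toolkit.ValidationError.

```python
import re

# Hypothetical pattern table: one entry per supported source.
SOURCE_PATTERNS = {
    "zenodo": re.compile(r"^10\.5281/zenodo\.\d+$", re.IGNORECASE),
    "gbif": re.compile(r"^10\.15468/", re.IGNORECASE),
}


def normalize_doi(value):
    """Strip whitespace and a leading resolver URL or 'doi:' prefix."""
    value = value.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def detect_source(doi):
    """Return the source name for a DOI, or None if no pattern matches."""
    doi = normalize_doi(doi)
    for name, pattern in SOURCE_PATTERNS.items():
        if pattern.match(doi):
            return name
    return None


def dispatch(doi, mappers):
    """Call the fetch function registered for the DOI's source.

    `mappers` maps a source name to a callable such as that source's
    fetch_metadata. ValueError stands in for toolkit.ValidationError.
    """
    source = detect_source(doi)
    if source not in mappers:
        raise ValueError("Unsupported DOI: %s" % doi)
    return mappers[source](normalize_doi(doi))


print(detect_source("10.5281/zenodo.12345"))
print(detect_source("https://doi.org/10.15468/abc123"))
print(detect_source("10.9999/unknown"))
```

Keeping detection in a table means adding a source touches one entry plus the dispatch mapping, rather than a growing chain of conditionals.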
Testing the mapper
Because the mapper is plain Python, you can test it directly without running CKAN:
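As one possible approach, a mapper can be exercised offline by replacing the network call with a canned response. The sketch below uses a hypothetical stand-in for fetch_metadata; the placeholder URL, the response shape, and the use of urllib are all assumptions made so the example runs on its own. A real mapper would be imported from its package instead.

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_metadata(doi):
    """Hypothetical stand-in mapper: fetch JSON for a DOI and map two fields."""
    url = "https://example.org/api/records?doi=" + doi  # placeholder URL
    with urllib.request.urlopen(url) as response:
        raw = json.load(response)
    return {"title": raw["title"], "canonical_id": doi}


def run_offline_test():
    """Call the mapper with urlopen patched to return a canned payload."""
    payload = json.dumps({"title": "Sample dataset"}).encode("utf-8")
    # io.BytesIO works as a context manager and has read(), which is all
    # the stand-in mapper needs from the HTTP response object.
    with mock.patch("urllib.request.urlopen", return_value=io.BytesIO(payload)):
        return fetch_metadata("10.XXXX/source.12345")


result = run_offline_test()
print(result)
```

Because no CKAN import is involved, a test like this can run under plain Python or any test runner without a CKAN instance.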