Datasets

Note: This page will get a major update soon. Other datasets that are available but not yet documented here:

  • When SEC filings are detected: https://github.com/john-friedman/datamule-data/blob/master/data/datasets/detected_time_2025_12_03.csv.gz
  • Filings erroneously marked as XBRL by the SEC: https://github.com/john-friedman/datamule-data/blob/master/data/datasets/recorded_as_xbrl_but_no_xbrl.csv
  • Every 10-K MDA up to 12/21/2025: https://github.com/john-friedman/Every-10-K-MDA-01-01-1993-12-21-2025
  • SEC Filing Wordcounts: https://github.com/john-friedman/sec-filing-wordcounts-1993-2000
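
The two CSV links above point at GitHub's file viewer, not at the file itself. A minimal sketch for reading one of them directly with pandas, assuming the file is served from raw.githubusercontent.com (files kept in Git LFS may not be). The helpers blob_to_raw and load_listed_dataset are hypothetical names for illustration and are not part of datamule.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(url: str) -> str:
    """Convert a github.com '/blob/' file URL to its raw.githubusercontent.com form.

    Hypothetical helper, not part of datamule. Assumes the ref (branch or tag)
    is a single path segment, as in the 'master' links listed above.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


def load_listed_dataset(url: str):
    """Read one of the listed CSV files into a DataFrame (requires pandas)."""
    import pandas as pd  # third-party; imported here so blob_to_raw works without it

    # pandas infers gzip compression from the '.gz' suffix by default.
    return pd.read_csv(blob_to_raw(url))
```

For example, load_listed_dataset can be called with the recorded_as_xbrl_but_no_xbrl.csv link above. The column layout of these files is not documented on this page, so inspect the result with .head() first.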

Usage

from datamule.datasets import cik_cusip_crosswalk
import pandas as pd

print(pd.DataFrame(cik_cusip_crosswalk).head())

Cloud

Datasets are stored in this repository and are updated daily using GitHub Actions.

Local

Local copies of the datasets are stored in the user's home directory, e.g. on Windows: C:\Users\{username}\.datamule\datasets.
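
To check what is currently on disk, the directory can be resolved portably with pathlib. This is a sketch only: it assumes other operating systems use the same .datamule/datasets layout under the home directory, which this page confirms only for Windows. The function names are hypothetical and not part of datamule.

```python
from pathlib import Path


def local_dataset_dir(home=None) -> Path:
    """Return the assumed local dataset directory: <home>/.datamule/datasets.

    'home' can be overridden for testing; by default the current user's
    home directory is used.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".datamule" / "datasets"


def list_local_datasets(home=None) -> list:
    """List file names in the local dataset directory, or [] if it does not exist."""
    directory = local_dataset_dir(home)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


if __name__ == "__main__":
    print(local_dataset_dir())
    print(list_local_datasets())
```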

List of Datasets

  • cik_cusip_crosswalk
  • financial_security_identifiers_crosswalk
  • proposal_results

Updating Local Datasets

from datamule.datasets import update_dataset
update_dataset('cik_cusip_crosswalk')
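
To refresh every dataset in the list above in one pass, the call can be wrapped in a loop. This is a sketch under two assumptions: that update_dataset accepts each of the listed names (only cik_cusip_crosswalk is shown on this page), and that it raises an exception on failure. refresh_all is a hypothetical helper, not part of datamule; the update function is passed in so the loop can be tried without network access.

```python
# Names taken from the "List of Datasets" section of this page.
DATASET_NAMES = (
    "cik_cusip_crosswalk",
    "financial_security_identifiers_crosswalk",
    "proposal_results",
)


def refresh_all(update_fn, names=DATASET_NAMES) -> dict:
    """Call update_fn(name) for each dataset name and collect any failures.

    Returns a dict mapping each failed dataset name to the exception raised,
    so one failing dataset does not stop the remaining updates.
    """
    failures = {}
    for name in names:
        try:
            update_fn(name)
        except Exception as exc:  # keep going; report all failures at the end
            failures[name] = exc
    return failures
```

With datamule installed, this would be used as refresh_all(update_dataset) after the import shown above. An empty result means every update call returned without raising.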