Downloader & Premium Downloader
Downloader is free, but constrained by SEC rate limits. Premium Downloader uses the datamule api which is constrained by your internet speed and hardware.
Downloader
from datamule import Downloader
downloader = Downloader()
# Download SEC filings
download_submissions(
self,
output_dir='filings', # Where to save files
cik=None, # CIK number, list of CIKs, or None for all
ticker=None, # Stock symbol, list of symbols, or None
submission_type=None, # Submission types: '10-K', ['10-K', '10-Q', '8-K']
filing_date=None # Filing dates:
# * Single date: "2023-01-01"
# * Date range tuple: ("2023-01-01", "2023-12-31")
# * List of dates: ["2023-01-01", "2023-02-01"]
# * Defaults to all dates from 2001-01-01 to present
)
# Download XBRL company concepts (financial data)
download_company_concepts(
self,
output_dir='company_concepts', # Where to save XBRL JSON files
cik=None, # CIK number, list of CIKs, or None for all
ticker=None # Stock symbol, list of symbols, or None
)
Premium Downloader
from datamuler import PremiumDownloader as Downloader
downloader = Downloader(api_key='your-api-key') # will automatically use the environment variable DATAMULE_API_KEY
# Download SEC filings
download_submissions(
self,
output_dir='filings', # Where to save files
cik=None, # CIK number, list of CIKs, or None for all
ticker=None, # Stock symbol, list of symbols, or None
submission_type=None, # Submission types: '10-K', ['10-K', '10-Q', '8-K']
filing_date=None # Filing dates:
# * Single date: "2023-01-01"
# * Date range tuple: ("2023-01-01", "2023-12-31")
# * List of dates: ["2023-01-01", "2023-02-01"]
# * Defaults to all dates from 2001-01-01 to present
)
Benchmarks
File Size |
Examples |
Downloader |
Premium Downloader |
---|---|---|---|
Small Files |
3, 4, 5 |
5/s |
300/s |
Medium Files |
8-K |
5/s |
60/s |
Large Files |
10-K |
3/s |
5/s |
Note
Premium Downloader will be updated soon to be 10-100x faster.
Pricing
Free: Basic Downloader
Premium Tier: $1/100,000 filings + $1/billion rows read (the database is 16 million rows total, so max read cost is about 1.6 cents).
Query Optimization
The database uses a composite index (submission_type, filing_date, cik). For optimal performance and lower costs:
Use columns in left-to-right order: *
submission_type
*submission_type + filing_date
*submission_type + filing_date + cik
Note
Queries using only filing_date
or cik
will not leverage the index efficiently.
Note
In the future, I’ll add more indexes that will remove the need for this optimization. I would do it now, but it would cost me $50. Waiting for next month, when my write quota resets.
Note
API returns up to 25,000 results per page.
Performance Tuning for Premium Downloader
You can adjust these parameters to optimize for your hardware:
# Defaults
downloader.CHUNK_SIZE = 2 * 1024 * 1024 # 2MB chunks
downloader.MAX_CONCURRENT_DOWNLOADS = 100 # Parallel downloads
downloader.MAX_DECOMPRESSION_WORKERS = 16 # Decompression threads
downloader.MAX_PROCESSING_WORKERS = 16 # Processing threads
downloader.QUEUE_SIZE = 10 # Internal queue size
API key
PowerShell
[System.Environment]::SetEnvironmentVariable('DATAMULE_API_KEY', 'your-api-key', 'User')
Bash
echo 'export DATAMULE_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Zsh (macOS default)
echo 'export DATAMULE_API_KEY="your-api-key"' >> ~/.zshrc
source ~/.zshrc
Note
After setting the environment variable, you may need to restart your terminal/shell for the changes to take effect.
Note
Premium Downloader may be much faster depending on your laptop’s specs and internet connection.
Note
Premium Downloader is in beta and may have bugs. To check for errors go to output_dir/errors.json