Core Classes

Portfolio

Container for managing SEC submissions. Its constructor takes the output_dir that Downloader or PremiumDownloader wrote the submissions to.

Methods:

portfolio = Portfolio('output_dir')

# Iterate through submissions
for submission in portfolio:
    print(submission.metadata)

Submission

An SEC submission containing documents and metadata.

Methods:

# Keep only specific document types, delete others
submission.keep(['4', 'GRAPHIC'])

# Delete specific document types, keep others
submission.drop(['GRAPHIC'])

# Iterate through documents of specific type
for doc in submission.document_type('4'):
    data = doc.parse()
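
The difference between keep and drop can be sketched with plain Python data. This is only an illustration of the selection logic described above, not the library's implementation. The helper names keep_types and drop_types and the dict layout are hypothetical, and the real methods act on a submission's documents rather than on a list of dicts.

```python
def keep_types(documents, types):
    """Return only the documents whose type is listed in `types`."""
    wanted = set(types)
    return [doc for doc in documents if doc["type"] in wanted]


def drop_types(documents, types):
    """Return every document except those whose type is listed in `types`."""
    unwanted = set(types)
    return [doc for doc in documents if doc["type"] not in unwanted]


# Hypothetical stand-in for the documents of one submission.
documents = [
    {"type": "4", "name": "form4.xml"},
    {"type": "GRAPHIC", "name": "logo.jpg"},
    {"type": "EX-24", "name": "poa.txt"},
]

kept = keep_types(documents, ["4", "GRAPHIC"])   # the EX-24 document is filtered out
remaining = drop_types(documents, ["GRAPHIC"])   # only the GRAPHIC document is filtered out
```

keep is the stricter of the two: anything not named is removed, so it suits cases where only one or two document types matter.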

Document

A document within a submission, for example an INFORMATION TABLE inside a 13F-HR or a GRAPHIC inside a 10-K. If the parser supports the document type, you can also iterate over the document directly.

Methods:

# Parse document content
data = document.parse()

# Write to JSON
document.write_json('output.json')

# Write to CSV
document.write_csv('output.csv')

Usage Example

Example converting Form 4 filings to CSV:

from datamule import Portfolio, PremiumDownloader
import pandas as pd

# I set my api_key using the environment variable 'DATAMULE_API_KEY'
downloader = PremiumDownloader()

# Downloads for me in about 1 minute (300/sec)
downloader.download_submissions(
    filing_date=('2023-01-01', '2023-01-31'),
    submission_type='4',
    output_dir='jan_23'
)
portfolio = Portfolio('jan_23')

# This is not optimized for speed, but it's a good example of how to iterate over the portfolio
# Takes about 1 minute to run
df_list = []
for submission in portfolio:
    for form_4 in submission.document_type('4'):
        df_list.append(pd.DataFrame(form_4))

df = pd.concat(df_list)
df.to_csv('jan_23.csv', index=False)
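
Since the loop above is described as not optimized for speed, one possible alternative is to stream rows straight to a CSV file instead of building one DataFrame per document. The sketch below assumes that iterating a document yields flat dict rows, which this page does not confirm. FakeSubmission and stream_rows_to_csv are hypothetical names used only so the example runs without downloading anything.

```python
import csv
import io


def stream_rows_to_csv(submissions, doc_type, out_file):
    """Write each row of each matching document to out_file and return the row count."""
    writer = None
    count = 0
    for submission in submissions:
        for document in submission.document_type(doc_type):
            for row in document:  # assumption: each row is a flat dict
                if writer is None:
                    # The header is taken from the first row seen.
                    # DictWriter raises ValueError if a later row has extra keys.
                    writer = csv.DictWriter(out_file, fieldnames=list(row))
                    writer.writeheader()
                writer.writerow(row)
                count += 1
    return count


class FakeSubmission:
    """Hypothetical stand-in for a submission, used only to make the sketch runnable."""

    def __init__(self, docs):
        # Maps a document type to a list of documents; each document is a list of dict rows.
        self._docs = docs

    def document_type(self, doc_type):
        return iter(self._docs.get(doc_type, []))


fake_portfolio = [
    FakeSubmission({"4": [[{"issuer": "AAA", "shares": 100}]]}),
    FakeSubmission({"4": [[{"issuer": "BBB", "shares": 250}]], "GRAPHIC": [[]]}),
]

buffer = io.StringIO()
rows_written = stream_rows_to_csv(fake_portfolio, "4", buffer)
csv_text = buffer.getvalue()
```

With a real Portfolio, the same function could be given the portfolio and a file opened with open('jan_23.csv', 'w', newline=''), provided the row assumption holds for the documents involved. Memory use then stays flat regardless of how many submissions are processed.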