Parsing
=======

Currently parses documents with:

* .xml extension
* .txt extension if 10-K, 10-Q, 8-K, SC 13D, SC 13G
* .htm/.html extension if 10-K, 10-Q, 8-K, SC 13D, SC 13G

Note: The parser will soon be updated to parse almost every document type.

Future
------

* parses all .htm/.html files
* parses most .pdf files (some are image-based and cannot be parsed)

Standardization
---------------

Parsing utilizes ``doc2dict`` to convert documents to a dictionary format. Documents can be further standardized using the ``mapping_dicts``. Contributions to mapping dicts are highly appreciated!
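
The currently supported combinations of file extension and document type listed under Parsing can be summarized as a small check. This is only a sketch of those rules, not the library's actual implementation: the function name ``is_currently_parseable`` and the two constants are hypothetical and exist purely for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical names for illustration only; they are not part of the library's API.
# Document types for which .txt and .htm/.html files are currently parsed.
TYPE_RESTRICTED_DOCUMENT_TYPES = {"10-K", "10-Q", "8-K", "SC 13D", "SC 13G"}
TYPE_RESTRICTED_EXTENSIONS = {".txt", ".htm", ".html"}


def is_currently_parseable(filename: str, document_type: str) -> bool:
    """Return True if the file matches the currently supported rules."""
    extension = PurePosixPath(filename).suffix.lower()

    # .xml files are parsed regardless of document type.
    if extension == ".xml":
        return True

    # .txt and .htm/.html files are parsed only for the listed document types.
    if extension in TYPE_RESTRICTED_EXTENSIONS:
        return document_type.strip().upper() in TYPE_RESTRICTED_DOCUMENT_TYPES

    # Everything else (for example .pdf) is not parsed yet.
    return False
```

For example, under these assumptions ``is_currently_parseable("report.htm", "10-K")`` is true, while ``is_currently_parseable("scan.pdf", "10-K")`` is false because .pdf support is listed under Future.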