Parsers inform how content should be parsed into a valid hast tree. Parsers are assigned to their corresponding mime type (inferred from the filename). You can add a new parser, or override an existing parser, by assigning a custom parser to a mime type.
unified-doc infers the mime type from the filename to determine which parser should be used. The following document formats (and related mime types) are either supported now or planned for future support:
- text/html: HTML parser
- text/markdown: Markdown parser
- text/csv: CSV parser
- application/vnd.openxmlformats-officedocument.wordprocessing: DOCX parser
- application/pdf: PDF parser
- application/x-latex: LaTeX parser
- application/rtf: RTF parser

The filename should correspond naturally to the content being parsed. This is usually not an issue when both pieces of information are read from a common source, file, or document.
doc.md
some markdown content

doc.html
some HTML content

doc.csv

| Column 0 | Column 1 | Column 2 |
|---|---|---|
| row1col0 | row1col1 | row1col2 |
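
The filename-to-mime-type inference described above can be sketched as follows. This is only an illustration: `MIME_TYPES` and `inferMimeType` are hypothetical names, the table covers only a few of the formats listed, and unified-doc's actual lookup may work differently.

```javascript
// Hypothetical extension-to-mime-type table (not unified-doc's real lookup).
const MIME_TYPES = new Map([
  ['html', 'text/html'],
  ['md', 'text/markdown'],
  ['csv', 'text/csv'],
  ['pdf', 'application/pdf'],
  ['rtf', 'application/rtf'],
]);

// Infer a mime type from the file extension, or return null when it is unknown.
function inferMimeType(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot <= 0) return null; // no extension, or a dotfile such as ".gitignore"
  const extension = filename.slice(dot + 1).toLowerCase();
  return MIME_TYPES.has(extension) ? MIME_TYPES.get(extension) : null;
}
```

A `null` result here stands for "no mime type with a known parser", which is the case where a fallback would apply.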
If no parser is supported for the inferred mime type, a fallback parser that renders the content into a code block is applied. This is useful for syntax highlighters and is explored further in the Syntax highlighting section. Source code documents usually fall into this category.
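
As a rough picture of what "rendering content into a code block" could mean in hast terms, the sketch below wraps raw content in a `pre > code` element tree. The helper name `toCodeBlock` is hypothetical, and the exact tree that unified-doc's fallback parser produces is an assumption here.

```javascript
// Hypothetical helper: wrap raw file content in a hast `pre > code` tree.
function toCodeBlock(content) {
  return {
    type: 'root',
    children: [
      {
        type: 'element',
        tagName: 'pre',
        properties: {},
        children: [
          {
            type: 'element',
            tagName: 'code',
            properties: {},
            // The content is kept verbatim as a single text node.
            children: [{ type: 'text', value: content }],
          },
        ],
      },
    ],
  };
}

const tree = toCodeBlock('def greet():\n    return "hello world"');
```

Because the content is stored untouched in a text node, a syntax highlighter can later process the `code` element without needing to know how the file was parsed.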
doc.json
{
  "one": 2,
  "three": [
    true,
    false,
    null,
    "four",
    5
  ]
}

doc.js
function greet() {
  return "hello world";
}

doc.py
def greet():
    return "hello world"

Parsers are applied using the PluggableList interface and can include multiple steps, e.g. [parser1] or [parser2, parser3]. Custom parsers are specified through a mapping of mimeTypes to associated parsers.
You can override an existing parser by assigning a new parser to its supported mime type, or assign specific parsers to mime types that are otherwise unsupported.
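
The uppercased outputs shown in this section are consistent with a custom parser that uppercases its input. The sketch below illustrates that idea under stated assumptions: real entries in the mapping would be unified plugins, whereas here plain functions stand in for them, and `uppercaseParser` and `parseWith` are hypothetical names rather than part of unified-doc.

```javascript
// Hypothetical parser step: returns a hast root holding the uppercased content.
function uppercaseParser(content) {
  return {
    type: 'root',
    children: [{ type: 'text', value: content.toUpperCase() }],
  };
}

// Mapping of mime types to parser lists (assumed shape of the custom parsers mapping).
const parsers = {
  'text/markdown': [uppercaseParser], // override a supported mime type
  'application/json': [uppercaseParser], // assign a parser to an unsupported mime type
};

// Simplified stand-in for applying a parser list: the first step receives the
// content string, and each later step receives the previous step's result.
function parseWith(mimeType, content) {
  const steps = parsers[mimeType];
  if (!steps) return null; // no custom parser assigned to this mime type
  return steps.reduce((value, step) => step(value), content);
}

const markdownTree = parseWith('text/markdown', '> **some** markdown content');
```

In this sketch `markdownTree` holds a single text node with the uppercased markdown source, while a mime type without an entry yields `null`.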
> **SOME** MARKDOWN CONTENT
{ "ONE": 2, "THREE": [TRUE, FALSE, NULL, "FOUR", 5]}

If you would prefer to use the fallback code block parser to render the source code of the file, you can disable the parser by setting its value to null for the associated mime type in the parsers option. The following shows how the default HTML parser can be disabled to rely on the fallback code block parser to render the source code of the HTML content.
<blockquote><strong>some</strong> HTML content</blockquote>
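
One way to picture the `null` override is as a merge of user-supplied parsers over the defaults, where a missing or `null` entry falls through to the fallback. The following is a minimal sketch of that resolution logic only; `defaultParsers`, `resolveParser`, and the string placeholders are hypothetical and do not reflect unified-doc's internals.

```javascript
// Hypothetical defaults; the strings stand in for the real parser plugins.
const defaultParsers = {
  'text/html': ['htmlParser'],
  'text/markdown': ['markdownParser'],
};

// Merge user parsers over the defaults. A `null` (or missing) entry means
// no parser is available, so the code block fallback is selected instead.
function resolveParser(mimeType, userParsers = {}) {
  const merged = { ...defaultParsers, ...userParsers };
  const steps = merged[mimeType];
  return steps == null ? 'codeBlockFallback' : steps;
}

const htmlDefault = resolveParser('text/html');
const htmlDisabled = resolveParser('text/html', { 'text/html': null });
```

Here `htmlDefault` resolves to the default HTML parser list, while `htmlDisabled` resolves to the fallback, which is why the HTML source above is shown verbatim rather than rendered.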