Parsers determine how content is parsed into a valid hast tree. Each parser is assigned to a mime type, which is inferred from the filename. You can add a new parser, or override an existing one, by assigning a custom parser to a mime type.
unified-doc infers the mime type from the filename to determine which parser to use. It supports, or will eventually support, the following document formats (and their related mime types):
text/html: HTML parser
text/markdown: Markdown parser
text/csv: CSV parser
application/vnd.openxmlformats-officedocument.wordprocessing: DOCX parser
application/pdf: PDF parser
application/x-latex: LaTeX parser
application/rtf: RTF parser
The filename should correspond naturally to the content being parsed. This is usually not an issue if both pieces of information are read from a common source, file, or document.
doc.md
some markdown content
doc.html
some HTML content
doc.csv
Column 0 | Column 1 | Column 2 |
---|---|---|
row1col0 | row1col1 | row1col2 |
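The inference step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not unified-doc's actual implementation: the `MIME_TYPES` table and the `inferMimeType` helper are hypothetical names, and the table covers only a few of the formats listed earlier.

```javascript
// Hypothetical extension-to-mime-type table. A real implementation would
// likely rely on a complete mime database rather than this short list.
const MIME_TYPES = {
  html: "text/html",
  md: "text/markdown",
  csv: "text/csv",
  pdf: "application/pdf",
  rtf: "application/rtf",
};

// Infer the mime type from the filename extension (illustrative helper).
// Returns undefined when the extension is missing or unknown.
function inferMimeType(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot === -1) return undefined;
  const extension = filename.slice(dot + 1).toLowerCase();
  return MIME_TYPES[extension];
}
```

With this table, a filename such as `doc.md` resolves to `text/markdown`, while an extension that is not in the table yields `undefined`, which is the case the fallback parser handles.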
If no parser is supported for the inferred mime type, a fallback parser that renders the content into a code block is applied. This is useful for syntax highlighters and is explored further in the Syntax highlighting section. Source code documents usually fall into this category.
doc.json
{
  "one": 2,
  "three": [
    true,
    false,
    null,
    "four",
    5
  ]
}
doc.js
function greet() {
  return "hello world";
}
doc.py
def greet():
    return "hello world"
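To make the fallback concrete, the sketch below builds the kind of hast tree a code block fallback could produce for content like the source files above. `fallbackCodeBlock` is a hypothetical helper, and the exact tree that unified-doc emits may differ.

```javascript
// Wrap raw content in a minimal hast tree equivalent to
// <pre><code>...</code></pre>. Illustrative only.
function fallbackCodeBlock(content) {
  return {
    type: "root",
    children: [
      {
        type: "element",
        tagName: "pre",
        properties: {},
        children: [
          {
            type: "element",
            tagName: "code",
            properties: {},
            // The source text is kept verbatim as a single text node.
            children: [{ type: "text", value: content }],
          },
        ],
      },
    ],
  };
}

const tree = fallbackCodeBlock('def greet():\n    return "hello world"');
```

Because the content is stored as one text node, no parsing of the source language is needed, which is why this approach works for any file type.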
Parsers are applied using the PluggableList interface and can include multiple steps, e.g. [parser1] or [parser2, parser3]. Custom parsers are specified through a mapping of mimeTypes to associated parsers.
You can override an existing parser by assigning a new parser to its mime type, or add parsers for mime types that are not yet supported.
> **SOME** MARKDOWN CONTENT
{ "ONE": 2, "THREE": [TRUE, FALSE, NULL, "FOUR", 5]}
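The override behavior can be pictured as a merge of two mappings. The sketch below is an assumption about how such a merge might work, not unified-doc's internals: every plugin name is a hypothetical placeholder, and `mergeParsers` is an illustrative helper.

```javascript
// Placeholder plugins standing in for real unified parser plugins.
const defaultHtmlParser = () => {};
const defaultMarkdownParser = () => {};
const uppercaseMarkdownParser = () => {};
const jsonParser = () => {};

// Assumed defaults: each mime type maps to a list of parser steps.
const defaultParsers = {
  "text/html": [defaultHtmlParser],
  "text/markdown": [defaultMarkdownParser],
};

// Custom mapping: override markdown and add a parser for a mime type
// that has no default parser.
const customParsers = {
  "text/markdown": [uppercaseMarkdownParser],
  "application/json": [jsonParser],
};

// Custom entries take precedence over defaults for the same mime type.
function mergeParsers(defaults, custom) {
  return { ...defaults, ...custom };
}

const parsers = mergeParsers(defaultParsers, customParsers);
```

Here the markdown entry is replaced, the HTML entry is left untouched, and the JSON entry is added alongside the defaults.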
If you would prefer to use the fallback code block parser to render the source code of a file, you can disable the parser by setting its value to null for the associated mime type in the parsers option. The following shows the result of disabling the default HTML parser, so that the fallback code block parser renders the source code of the HTML content.
<blockquote><strong>some</strong> HTML content</blockquote>
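The selection logic implied by a null entry can be sketched as below. This is a hedged illustration rather than the library's code: `selectParser`, `fallbackParser`, and the string placeholders used in place of real plugins are all hypothetical.

```javascript
// Hypothetical fallback, standing in for the code block parser.
const fallbackParser = ["codeBlockParser"];

// Return the parser list for a mime type, or the fallback when the entry
// is missing or has been explicitly set to null.
function selectParser(parsers, mimeType) {
  const parser = parsers[mimeType];
  return parser == null ? fallbackParser : parser;
}

const parsersWithHtmlDisabled = {
  "text/html": null, // disabled: HTML source goes to the fallback
  "text/markdown": ["markdownParser"],
};
```

Treating null and a missing entry the same way keeps the rule simple: any mime type without a usable parser is rendered as source code.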