unified-doc

Overview

The textContent of a document is the concatenation of the values of all its text nodes. Many API methods rely on this data (e.g. doc.search(), doc.file('.txt'), marks). It is also useful outside of unified-doc for computing derived data that can be passed back to unified-doc as marks.
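To make the definition concrete, here is a minimal sketch of concatenating text node values over a syntax tree. It assumes a unist-style node shape (a `type`, an optional `value` on text nodes, and optional `children`); the `TreeNode` type and `collectText` helper are hypothetical names for illustration and are not part of unified-doc.

```typescript
// Assumed unist-style node: text nodes carry `value`, parents carry `children`.
interface TreeNode {
  type: string;
  value?: string;
  children?: TreeNode[];
}

// Walk the tree depth-first and join the values of all text nodes in order.
function collectText(node: TreeNode): string {
  if (node.type === 'text') {
    return node.value !== undefined ? node.value : '';
  }
  const children = node.children !== undefined ? node.children : [];
  return children.map(collectText).join('');
}

// A small hand-built tree standing in for a parsed document.
const sampleTree: TreeNode = {
  type: 'root',
  children: [
    {
      type: 'element',
      children: [
        { type: 'text', value: 'some ' },
        { type: 'element', children: [{ type: 'text', value: 'markdown' }] },
        { type: 'text', value: ' content' },
      ],
    },
  ],
};

const collectedText = collectText(sampleTree);
console.log(collectedText);
```

Because only text nodes contribute, markup such as emphasis or links does not appear in the result, which is what makes character offsets into this string stable across content types.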

Use outside unified-doc

doc.textContent returns the text content of a document, regardless of its content type. You can use this data for anything (e.g. as input to NLP pipelines).
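As a small illustration of using the text content outside unified-doc, the sketch below counts occurrences of a term in a plain string. The string is a stand-in for the value obtained from doc.textContent; `countOccurrences` is a hypothetical helper, not a unified-doc API.

```typescript
// Count non-overlapping occurrences of `term` in `textContent`.
function countOccurrences(textContent: string, term: string): number {
  if (term.length === 0) return 0; // avoid an infinite loop on empty terms
  let count = 0;
  let from = textContent.indexOf(term);
  while (from !== -1) {
    count += 1;
    from = textContent.indexOf(term, from + term.length);
  }
  return count;
}

// Stand-in for the string obtained from doc.textContent.
const sampleText = 'some markdown content, content, content';
const sampleTerm = 'content';
const termCount = countOccurrences(sampleText, sampleTerm);

console.log(`Text content: ${sampleText}`);
console.log(`Term: ${sampleTerm}`);
console.log(`Count: ${termCount}`);
```

The same string could just as well be handed to a tokenizer or any other text-processing step, since it is an ordinary string with no dependency on the document's original format.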

Preview
Text content: some markdown content, content, content
Term: content
Count: 3

Compute marks

You can use the textContent to compute marks: calculate the start and end offsets of matched terms, then pass that data back to unified-doc to visually mark the corresponding nodes. Note that the doc.search() method (explored further in the Search section) is the preferred way to do this, since search results share a compatible interface with marks.

This example demonstrates how mark data can be computed outside of a doc instance (e.g. by a server). The results are fully compatible with unified-doc because the offsets are based on the textContent of the doc.
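A minimal sketch of that offset computation follows. Only the `start` and `end` fields come from the description above; the `id` field, the exclusive `end` convention, and the `computeMarks` helper are assumptions made for illustration and should be checked against the actual mark interface.

```typescript
// Assumed mark shape: `start`/`end` are offsets into the textContent string.
// `id` and the exclusive `end` are assumptions, not confirmed by the docs above.
interface Mark {
  id: string;
  start: number;
  end: number;
}

// Find every non-overlapping match of `term` and record its offsets.
function computeMarks(textContent: string, term: string): Mark[] {
  const marks: Mark[] = [];
  if (term.length === 0) return marks;
  let start = textContent.indexOf(term);
  while (start !== -1) {
    const end = start + term.length;
    marks.push({ id: `${term}-${marks.length}`, start, end });
    start = textContent.indexOf(term, end);
  }
  return marks;
}

// Stand-in for the string obtained from doc.textContent, e.g. sent to a server.
const markSourceText = 'some markdown content, content, content';
const computedMarks = computeMarks(markSourceText, 'content');
console.log(computedMarks);
```

Since the function only needs the text content string, it can run anywhere, and the resulting array can then be passed back to unified-doc as marks.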

Preview

Compiled

some markdown content, content, content

© 2020 unified-doc