The textContent of a document is the concatenation of the values of all its text nodes. Many API methods use this data (e.g. doc.search(), doc.file('.txt'), marks). It can also be useful outside of unified-doc, where it can be used to compute derived data that is then passed back to unified-doc as marks.
unified-doc
doc.textContent returns the text content of a document, regardless of its content type. You can use this data for anything (e.g. in various NLP pipelines).
For example, given the markdown content "some markdown content, content, content", the term "content" occurs 3 times in the textContent.
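As a rough illustration of downstream processing, the sketch below builds a word-frequency table from a text content string. The helper wordFrequencies is hypothetical and not part of unified-doc, and the string literal stands in for the value you would read from doc.textContent.

```javascript
// Hypothetical helper (not part of unified-doc): counts how often each
// lowercase alphanumeric word appears in a text content string.
function wordFrequencies(textContent) {
  const words = textContent.toLowerCase().match(/[a-z0-9]+/g) || [];
  const frequencies = new Map();
  for (const word of words) {
    frequencies.set(word, (frequencies.get(word) || 0) + 1);
  }
  return frequencies;
}

// Stand-in for doc.textContent; in practice this would come from a doc instance.
const sampleText = 'some markdown content, content, content';
const frequencies = wordFrequencies(sampleText);

console.log(frequencies.get('content')); // 3
```

Because the function only needs a plain string, it can run anywhere the text content is available, including outside the process that created the doc.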
marks
You can use the textContent to compute marks by calculating the start and end offsets of matched terms and passing that data back to unified-doc to visually mark nodes. Note that the doc.search() method (explored further in the Search section) should be the preferred way to do this, since search results are compatible with the mark interface.
This example demonstrates how the marks data could be computed outside of a doc instance (e.g. by a server). The results are fully compatible with unified-doc because the offsets are based on the textContent of a doc.
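A minimal sketch of that offset computation follows. The helper computeMarks is hypothetical and not part of unified-doc. It assumes end is an exclusive offset and emits only the start and end fields mentioned above, so any other fields the mark interface requires would need to be added.

```javascript
// Hypothetical helper (not part of unified-doc): finds every non-overlapping
// occurrence of `term` and returns its offsets into the text content.
function computeMarks(textContent, term) {
  const marks = [];
  if (term.length === 0) return marks; // avoid an infinite loop on empty terms
  let from = 0;
  while (true) {
    const start = textContent.indexOf(term, from);
    if (start === -1) break;
    const end = start + term.length; // assumption: end offset is exclusive
    marks.push({ start, end });
    from = end; // continue searching after this match
  }
  return marks;
}

// Stand-in for doc.textContent; a server could receive this string and
// return the computed marks to the client.
const textContent = 'some markdown content, content, content';
const marks = computeMarks(textContent, 'content');

console.log(marks);
// [ { start: 14, end: 21 }, { start: 23, end: 30 }, { start: 32, end: 39 } ]
```

Since the offsets refer to positions in the text content rather than in the source markdown or HTML, the same result applies regardless of the content type of the document.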