    My JavaScript Reader reads, among other things, an HTML block, e.g. "<div><li><span>blah</span></div>".
    I then need to extract only the text content of this block, similar to document.getElementById("myelement").textContent.
    Any idea how I can achieve this in a source connector?

    I think you'd need to use an HTML parser. You can't use E4X, since the block isn't well-formed XHTML (the <li> tag is never closed).

    If you want to look into it, the Mirth Document Writer appears to use com.lowagie.text.html.HtmlParser from the iText library.

    This might be helpful, too.


      When converting HTML to text, do you want to preserve the newlines and spacing?

      I have used regular expressions for this before, but I found they can cause problems, specifically when the text itself contains < or > characters.
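      A minimal plain JavaScript illustration of that pitfall. It assumes the common tag-stripping pattern /<[^>]*>/g (my choice; the exact regex used above isn't given). stripTagsNaive is a hypothetical helper name, and it runs the same in Rhino or any other JavaScript engine.

```javascript
// Naive tag stripping: delete everything from a '<' up to the next '>'.
// Hypothetical helper, shown only to illustrate the failure mode.
function stripTagsNaive(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Works for the block from the question:
var simple = stripTagsNaive("<div><li><span>blah</span></div>");
// simple === "blah"

// Breaks when the text itself contains '<' and '>': the regex treats
// "< b and c >" as a tag and silently drops it.
var broken = stripTagsNaive("<p>a < b and c > d</p>");
// broken === "a  d"
```

      A real HTML parser follows the HTML tokenizer rules (a '<' followed by a space is text, not a tag), which is why it doesn't lose that content.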

      I have used the jsoup HTML parser successfully, and it has worked well for me.
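      A minimal sketch of calling jsoup from a Mirth JavaScript Reader (Rhino can call Java classes by their fully qualified names). It assumes the jsoup jar has been made available to Mirth's classpath (for example through the custom-lib folder or a library resource, depending on your Mirth version); htmlToText is a hypothetical helper name.

```javascript
// ASSUMPTION: the jsoup jar is on Mirth's classpath.
// htmlToText is a hypothetical helper, not part of Mirth or jsoup.
function htmlToText(html) {
  // Jsoup.parse is lenient: unclosed tags such as <li> are repaired
  // instead of causing a parse error (unlike E4X).
  var doc = org.jsoup.Jsoup.parse(html);
  // text() returns the combined text of all elements, with whitespace
  // normalized: runs collapsed to one space, ends trimmed.
  // String(...) turns the java.lang.String into a JavaScript string.
  return String(doc.text());
}
```

      Inside the reader script, htmlToText("<div><li><span>blah</span></div>") should return "blah". Because text() normalizes whitespace, it does not keep the original newlines and spacing; if you need those, plain text() is not the right call.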