convert html string to text

  • convert html string to text

    My JavaScript reader reads, among other things, an HTML block, e.g. "<div><li><span>blah</span></div>".
    I then need to extract only the text content from this HTML block, similar to document.getElementById("myelement").textContent.
    Any idea how I can achieve this in a source connector?

  • #2
    I think you'd need to use an HTML parser. You can't use E4X since the block isn't XHTML (the <li> tag is never closed).

    It looks like the Mirth Document Writer uses com.lowagie.text.html.HtmlParser from the iText library, if you want to look into that.

    This might be helpful, too.


    • #3
      When converting HTML to text, do you want to preserve newlines and spacing in the text?

      I have used regex for this before, but I found it can cause problems, specifically if there are literal < or > characters in your text.

      I have successfully used the jsoup HTML parser, which has worked well for me.
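
To illustrate the regex pitfall mentioned in #3, here is a minimal sketch in plain JavaScript. The function name stripTagsNaive and the second sample string are made up for this illustration and do not come from the thread. It handles the markup from the original question, but silently drops text when the content itself contains < and > characters.

```javascript
// Hypothetical helper (not from the thread): removes anything that looks like <...>.
function stripTagsNaive(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Works on the sample block from the original question.
var clean = stripTagsNaive("<div><li><span>blah</span></div>");
// clean === "blah"

// Breaks when the text contains literal < and >:
// everything from "< b" up to "c >" is treated as one tag and removed.
var mangled = stripTagsNaive("<p>use a < b and c > d</p>");
// mangled === "use a  d"
```

A real HTML parser avoids this class of problem. Assuming the jsoup jar is available to the channel (not confirmed in this thread), the equivalent call from Mirth JavaScript would be along the lines of org.jsoup.Jsoup.parse(html).text().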