Converting RTF to HTML/XML ?

  • Converting RTF to HTML/XML ?

    Hey all,

    I'm trying to find a good way of converting an RTF document (embedded in OBX-5 of an ORU message) into HTML or XML.

    I notice in the source code that there are libraries used from iText, which can handle XML, PDF, HTML and RTF... I just don't quite know how I would implement them. There's also the built-in Java RTF editor, but I don't think this supports tables.

    Anyone have any ideas?

  • #2
    I have actually had to use this before:

    It's not free, but there is a free evaluation lib.


    • #3
      The iText libs vary in their ability, and the RTF classes are somewhat old and unmaintained now (there is a SourceForge project for them, but no content there). They do not, for example, do a good job of preserving formatting in Cerner Millennium discharges. The fact that you're in London suggests to me that you may be using the same system.

      The product mentioned by bradd does a decent job, but personally I have a number of reservations about manipulating the contents of what is essentially a schema-less clinical document. Every tool I've used to try and get such data into a usable format has run into issues at one point or another, which is something of a deal breaker for me.

      After playing around with it for a number of weeks I settled on a straight conversion to PDF using jodconverter (which can be plugged into Mirth as a custom Java library) with the OpenOffice document converter running as a daemon. The PDFs can then be sent out via DTS (or whatever) to the GP. Works really well.
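
      For anyone wanting to try the same route, here is a rough sketch of what the call could look like from a Mirth JavaScript step. It assumes the JODConverter 2.x jar (the com.artofsolving.* classes) is on Mirth's custom library path and that OpenOffice is already listening on port 8100. The function name convertWithJodConverter and the paths are just placeholders of mine, so treat it as a starting point rather than exactly what I run.

      ```javascript
      // Sketch only: assumes JODConverter 2.x (com.artofsolving.*) is on the classpath
      // and an OpenOffice daemon is already accepting socket connections, e.g.
      //   soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
      function convertWithJodConverter(inputPath, outputPath, port) {
          var jod = Packages.com.artofsolving.jodconverter.openoffice;
          // Connect to the running OpenOffice instance (8100 is the assumed default port)
          var connection = new jod.connection.SocketOpenOfficeConnection(port || 8100);
          connection.connect();
          try {
              var converter = new jod.converter.OpenOfficeDocumentConverter(connection);
              // The target format is inferred from the output file extension (.pdf here)
              converter.convert(new java.io.File(inputPath), new java.io.File(outputPath));
          } finally {
              // Always release the connection, even if the conversion throws
              connection.disconnect();
          }
      }
      ```

      You would then call something like convertWithJodConverter('/tmp/input.rtf', '/output/report.pdf') after writing the RTF out of the OBX segment.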


      • #4
        Incidentally, if you want to play around with the iText stuff, here's some code I dug out from my testing which will take the ORU, parse the saved RTF (not very well!) and spit out a PDF; you could simply change the pdfwriter class to xml/html (com.lowagie.text.xml.XmlWriter and com.lowagie.text.html.HtmlWriter) as you wish and examine the results for yourself.

        // Pull the RTF from the OBX and unescape it...
        var contents = msg['OBX']['OBX.5']['OBX.5.5'].toString();
        contents = contents.replace(/\\E\\/g, "\\");
        contents = contents.replace(/\\\.br\\/g, "\r\n");
        FileUtil.write('/tmp/input.rtf', false, contents);
        // Generate a unix time stamp for use as the output filename (we'll use something a little more robust for prod, but this is useful for test)
        var foo = new Date();
        var unixtime_ms = foo.getTime();
        var unixtime = Math.floor(unixtime_ms / 1000);
        // Set the variables for the input file and output file
        var inputfile = "/tmp/input.rtf";
        var outputfile = "/output/"+unixtime+".pdf";
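        // Create the respective streams for the files
        var inputstream = new java.io.FileInputStream(inputfile);
        var outputstream = new java.io.FileOutputStream(outputfile);
        // Create an iText document (iText 2.x class names under com.lowagie.* are assumed throughout)
        var myDocument = new Packages.com.lowagie.text.Document();
        // Create a PDF writer object which we'll use to save the PDF in a moment
        var pdfwriter = Packages.com.lowagie.text.pdf.PdfWriter.getInstance(myDocument, outputstream);
        // Open the iText document we created a moment ago so we can modify it
        myDocument.open();
        // Create a parser which will load the RTF file in a moment
        var parser = new Packages.com.lowagie.text.rtf.parser.RtfParser(null);
        // Parse the RTF input and pass it to the PDF writer object
        parser.convertRtfDocument(inputstream, myDocument);
        // Close the document and hopefully it will contain what we want!
        myDocument.close();
        inputstream.close();
        // Remove the temporary RTF file ('delete' is a reserved word in JavaScript, hence the brackets)
        var tidyUp = new java.io.File(inputfile);
        tidyUp['delete']();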
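
        The two replace() calls at the top only deal with \E\ and \.br\. If your RTF also contains escaped delimiters, a single-pass unescape along these lines might be safer, since it never re-scans text it has already decoded. This is only a sketch: it assumes the default HL7 delimiters (| ^ & ~ \), and unescapeHl7 is a helper name I made up, not something Mirth provides.

        ```javascript
        // Decode the common HL7 v2 escape sequences in one pass.
        // Assumes the default encoding characters: | ^ & ~ and \ as the escape character.
        function unescapeHl7(text) {
            var map = {
                'F': '|',      // field separator
                'S': '^',      // component separator
                'T': '&',      // subcomponent separator
                'R': '~',      // repetition separator
                'E': '\\',     // escape character itself
                '.br': '\r\n'  // formatted-text line break
            };
            return String(text).replace(/\\(\.br|[FSTRE])\\/g, function (match, code) {
                return map[code];
            });
        }
        ```

        You would use it as var contents = unescapeHl7(msg['OBX']['OBX.5']['OBX.5.5'].toString()); in place of the two replace() lines.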
        Last edited by AlexToft; 06-04-2011, 12:10 AM.


        • #5
          Thanks Alex, you're right on the source of my RTFs. I had looked into the OpenOffice/jodconverter route but not to the point that I'd tried implementing it. As long as it handles the RTF tables I think I should be fine.

          I'm still a bit of a Java newbie so it could take me a while to figure it out.

          Cheers for the help. I do agree - only do the conversions if you are very, very sure that they are perfect.