how to handle invalid XML Characters


  • how to handle invalid XML Characters

    Hi,
    I am using Mirth 3.1.2.
    I am trying to extract a PDF document from an XML message and write it to a PDF file.
    I built the following channel:

    (1) Source connector types:
    Inbound XML
    Outbound Raw
    Channel reader
    I added a transformer step to the source connector to map the XML element to a variable as follows:
    var pdfdoc = msg['Body']['Part'][1]['Content'].toString();
    globalChannelMap.put('pdfdoc', pdfdoc);

    (2) Destination connector types:
    Inbound Raw
    Outbound Raw
    File writer
    File Type: Binary

    I used the above variable in the destination template as follows:
    ${pdfdoc}

    The problem:
    The PDF contains special characters, and I keep getting the following error at the source transformer step:
    Transformer error
    ERROR MESSAGE: Error evaluating transformer
    com.mirth.connect.server.MirthJavascriptTransformerException:
    CHANNEL: XDSb Retrieve Document Filter DOC
    CONNECTOR: sourceConnector
    SCRIPT SOURCE:
    SOURCE CODE:
    257: }
    258: if ('xml' === typeof msg && msg.hasSimpleContent()) { msg = msg.toXMLString(); }if ('xml' === typeof tmp && tmp.hasSimpleContent()) { tmp = tmp.toXMLString(); }
    259: }
    260: if (doFilter() == true) { doTransform(); return true; } else { return false; }
    261: }
    LINE NUMBER: 262
    DETAILS: TypeError: Character reference "" is an invalid XML character.
    at 264932c5-eb67-4a90-b549-071d8ac1ec55:241 (doScript)
    at 264932c5-eb67-4a90-b549-071d8ac1ec55:262
    at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.call(JavaScriptFilterTransformer.java:153)
    at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.call(JavaScriptFilterTransformer.java:118)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)



    My trials:
    (1) If I remove the special characters in the preprocessor script, I end up with a corrupted PDF.
    (2) If I change the file type in my destination to "Text", I get a PDF file that opens as blank.
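
For context on why trial (1) fails: a PDF is binary data, and many of its bytes correspond to characters that XML 1.0 does not allow (most control characters below 0x20). Deleting them changes the file contents. A minimal sketch in plain JavaScript, runnable in Node.js rather than inside Mirth; the function names and the sample bytes are made up for illustration:

```javascript
// XML 1.0 allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.
// This pattern matches any single character outside those ranges.
var INVALID_XML_CHAR = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/u;

// True if the text contains at least one character XML 1.0 forbids.
function hasInvalidXmlChars(text) {
  return INVALID_XML_CHAR.test(text);
}

// Removes every forbidden character (this is what a "clean-up" preprocessor does).
function stripInvalidXmlChars(text) {
  return text.replace(new RegExp(INVALID_XML_CHAR.source, "gu"), "");
}

// Hypothetical sample: "%PDF-" followed by a few binary bytes.
var original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x00, 0x01, 0x1f, 0x0a, 0x41]);

// latin1 maps each byte to exactly one character, so nothing is lost in this step.
var asText = original.toString("latin1");
var stripped = Buffer.from(stripInvalidXmlChars(asText), "latin1");

console.log(hasInvalidXmlChars(asText));       // true
console.log(original.length, stripped.length); // 10 7
console.log(original.equals(stripped));        // false: the "cleaned" data is no longer the same file
```

The three removed bytes (0x00, 0x01, 0x1f) are gone for good, which is consistent with the corrupted PDF described in trial (1).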

    P.S.: I attached samples of the XML message and the PDF document that should be extracted from it.
    Any help is much appreciated.

  • #2
    That "XML" file you attached isn't XML at all. It's an HTTP multipart payload. How are you getting that? Are you receiving it with a TCP Listener or something? Consider using an HTTP Listener instead, and enabling XML Body and Parse Multipart. Then the actual message received by the channel will be a well-formatted XML document containing each part, and any binary content (like the PDF) will be encoded in Base64.
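
To illustrate why Base64 matters here, the following is a minimal sketch in plain JavaScript (Node.js, not Mirth-specific; the sample bytes are made up). It shows that Base64 text contains only characters that are legal in XML and that decoding restores the exact original bytes:

```javascript
// Hypothetical binary payload standing in for the PDF bytes.
var pdfBytes = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x00, 0x01, 0x1f, 0xff]);

// Base64 output uses only A-Z, a-z, 0-9, "+", "/" and "=",
// so it can sit inside an XML element without escaping problems.
var encoded = pdfBytes.toString("base64");

// Decoding gives back the same bytes that went in.
var decoded = Buffer.from(encoded, "base64");

console.log(/^[A-Za-z0-9+\/=]+$/.test(encoded)); // true
console.log(decoded.equals(pdfBytes));           // true
```

As far as I know, a File Writer with File Type set to Binary expects Base64 content in its template and decodes it before writing, so the encoded value can be passed through unchanged.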
    Step 1: JAVA CACHE...DID YOU CLEAR ...wait, ding dong the witch is dead?

    Nicholas Rupley
    Work: 949-237-6069
    Always include what Mirth Connect version you're working with. Also include (if applicable) the code you're using and full stacktraces for errors (use CODE tags). Posting your entire channel is helpful as well; make sure to scrub any PHI/passwords first.


    - How do I foo?
    - You just bar.



    • #3
      Well, I tried to be brief, but here is what I actually did. I have a separate channel where I send an MTOM request and receive a response with an embedded PDF.
      I built the MTOM body in the template and used the following settings for my HTTP Sender:
      Multipart: No
      Response Content: XML Body
      Parse Multipart: Yes

      I got the PDF as an XML element inside the response, as in the following example:
      <HttpResponse>
      <Body boundary="MIMEBoundaryurn_uuid_EB763C07A95A62816A1426668158042" multipart="yes">
      <Part>
      <Headers>
      <Content-Type>application/xop+xml; charset=UTF-8; type="application/soap+xml"</Content-Type>
      <Content-Transfer-Encoding>binary</Content-Transfer-Encoding>
      <Content-ID>&lt;0.urn:uuid:[email protected] apache.org&gt;</Content-ID>
      </Headers>
      <Content multipart="no">&lt;?xml version='1.0' encoding='UTF-8'?&gt;&lt;soapenv:Envelope &gt;&lt;soapenv:Header&gt;&lt;wsa:Action&gt;urn :ih e:iti:2007:RetrieveDocumentSetResponse&lt;/wsa:Action&gt;&lt;wsa:RelatesTo&gt;bb1906f1-5fff-4ea5-979e-55c3a2f20489&lt;/wsa:RelatesTo&gt;&lt;/soapenv:Header&gt;&lt;soapenv:Body&gt;&lt;xdsb:Ret rieveDocumentSetResponse &gt;&lt;rs:RegistryResponse status="urnasis:names:tc:ebxml-regrep:ResponseStatusType:Success" /&gt;&lt;xdsbocumentResponse&gt;&lt;xdsb:Reposit o ryUniqueId&gt;1.19.6.24.109.42.1.5&lt;/xdsb:RepositoryUniqueId&gt;&lt;xdsbocumentUnique Id&gt;1.42.20141105202336.39&lt;/xdsbocumentUniqueId&gt;&lt;xdsb:mimeType&gt;text/plain&lt;/xdsb:mimeType&gt;&lt;xdsbocument&gt;&lt;xop:Incl ude href="cid:1.urn:uuid:EB763C07A95A62816A14266681580 [email protected]" /&gt;&lt;/xdsbocument&gt;&lt;/xdsbocumentResponse&gt;&lt;/xdsb:RetrieveDocumentSetResponse&gt;&lt;/soapenv:Body&gt;&lt;/soapenv:Envelope&gt;</Content>
      </Part>
      <Part>
      <Headers>
      <Content-Type>text/plain</Content-Type>
      <Content-Transfer-Encoding>binary</Content-Transfer-Encoding>
      <Content-ID>&lt;1.urn:uuid:[email protected] apache.org&gt;</Content-ID>
      </Headers>
      <Content multipart="no">........ PDF document is here ....

      </Content>
      </Part>
      </Body>
      </HttpResponse>

      Now in my response channel I added the following transformer step to extract the PDF into a string:
      $gc('doc', msg['Body']['Part'][1]['Content'].toString());

      I added a File Writer destination, set it to Binary, and put ${doc} in its template.

      I get the following error:
      ERROR MESSAGE: Error evaluating transformer
      com.mirth.connect.server.MirthJavascriptTransformerException:
      CHANNEL: XDSb Retrieve Document Filter
      CONNECTOR: To Response 2
      SCRIPT SOURCE:
      SOURCE CODE:
      LINE NUMBER: 262
      DETAILS: TypeError: Character reference "" is an invalid XML character.

      If I change the inbound/outbound data type of the connector to Raw, I get:
      246: $gc('doc', msg['Body']['Part'][1]['Content'].toString());
      247: if ('xml' === typeof msg && msg.hasSimpleContent()) { msg = msg.toXMLString(); }if ('xml' === typeof tmp && tmp.hasSimpleContent()) { tmp = tmp.toXMLString(); }
      248: }
      249: if (doFilter() == true) { doTransform(); return true; } else { return false; }
      250: }
      LINE NUMBER: 246
      DETAILS: TypeError: Cannot read property "Part" from undefined
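
This second error is consistent with the Raw data type handing the transformer the message as a plain string rather than a parsed XML object, so msg['Body'] evaluates to undefined. A plain JavaScript sketch of the same failure and a simple guard (Node.js; the sample string and the readParts helper are made up, and Node words the error message slightly differently from Mirth):

```javascript
// With a Raw data type the message is assumed to arrive as plain text.
var msg = "<HttpResponse><Body><Part/></Body></HttpResponse>";

console.log(typeof msg);  // "string"
console.log(msg["Body"]); // undefined: a string has no "Body" property

// Reading a property of undefined throws a TypeError.
var failed = false;
try {
  msg["Body"]["Part"];
} catch (e) {
  failed = e instanceof TypeError;
}
console.log(failed); // true

// Hypothetical guard: return null instead of throwing when the message is not parsed.
function readParts(message) {
  var body = message["Body"];
  if (body === undefined) {
    return null;
  }
  return body["Part"];
}

console.log(readParts(msg)); // null
```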

      If I change the connector type to Document Writer, I get:
      Document Writer error
      ERROR MESSAGE: Error writing document
      org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 2; The content of elements must consist of well-formed character data or markup.

      If I preprocess the message to remove control characters such as "", I end up with the "The content of elements must consist of well-formed character data or markup" error.



      • #4
        The problem is that you're receiving the PDF document with a content type of text/plain. That is incorrect. You need to go to whoever manages that server and get them to send correct responses with correct content types. Also, what is your Response Binary MIME Types set to on the HTTP Sender? By default it should be Base64 encoding anything with a content type that begins with "application/".


        • #5
          Thank you, Nick. The problem, as you indicated, was in the Response Binary MIME Types setting: it was blank.
          Now I am able to extract both PDFs and text documents.
          However, since the content type from the document source might be incorrect, I wonder what the best way is to detect the document type (PDF, CDA, TXT, etc.).



          • #6
            By default the Binary MIME Types field is set to "application/, image/, video/, audio/". That must have been changed at some point.

            Since the server is sending you the incorrect content type, yeah there's no good way to know what type of document it is. That's the entire point of the content type header, so whoever manages that server must have really messed up.
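
That said, if a best guess is acceptable, one rough heuristic is to look at the first bytes of the decoded content. This is not a substitute for a correct content type, and plain text in particular cannot be identified reliably. A minimal sketch in plain JavaScript (Node.js; guessDocumentType and toBase64 are hypothetical helpers, and the input is assumed to be Base64 encoded already):

```javascript
// Guess a document type from Base64-encoded content. Heuristic only.
function guessDocumentType(base64Content) {
  var bytes = Buffer.from(base64Content, "base64");

  // Inspect only the start of the document, one character per byte.
  var head = bytes.subarray(0, 1024).toString("latin1");

  // Drop a UTF-8 byte order mark if one is present.
  head = head.replace(/^\u00EF\u00BB\u00BF/, "");

  // PDF files start with the signature "%PDF-".
  if (head.indexOf("%PDF-") === 0) {
    return "pdf";
  }
  // A CDA document is XML whose root element is ClinicalDocument (optionally prefixed).
  if (/<(?:[A-Za-z_][\w.-]*:)?ClinicalDocument[\s>]/.test(head)) {
    return "cda";
  }
  // Any other content that starts with "<" is treated as generic XML.
  if (/^\s*</.test(head)) {
    return "xml";
  }
  return "unknown";
}

// Helper for the examples below.
function toBase64(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

console.log(guessDocumentType(toBase64("%PDF-1.4\n")));                                  // "pdf"
console.log(guessDocumentType(toBase64('<?xml version="1.0"?><ClinicalDocument xmlns="urn:hl7-org:v3">'))); // "cda"
console.log(guessDocumentType(toBase64("hello")));                                       // "unknown"
```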

            You could force all content types to be Base64 encoded, by checking Regular Expression and using ".*". That way at least you shouldn't get any of those "invalid XML character" issues.
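
To make the difference between the prefix list and the regular expression concrete, here is a small sketch in plain JavaScript (Node.js). It is not Mirth's actual implementation, only an illustration of the behavior described above; isBinaryMimeType and its parameters are made up:

```javascript
// Hypothetical helper: decide whether a part with the given content type
// would be treated as binary (and therefore Base64 encoded).
function isBinaryMimeType(contentType, pattern, isRegex) {
  var type = contentType.trim().toLowerCase();

  // Regular Expression mode: the whole pattern is tested against the type.
  if (isRegex) {
    return new RegExp(pattern).test(type);
  }

  // List mode: a comma-separated list of prefixes; blank entries are ignored.
  return pattern
    .split(",")
    .map(function (prefix) { return prefix.trim().toLowerCase(); })
    .filter(function (prefix) { return prefix.length > 0; })
    .some(function (prefix) { return type.indexOf(prefix) === 0; });
}

var DEFAULT_TYPES = "application/, image/, video/, audio/";

console.log(isBinaryMimeType("application/pdf", DEFAULT_TYPES, false)); // true
console.log(isBinaryMimeType("text/plain", DEFAULT_TYPES, false));      // false
console.log(isBinaryMimeType("application/pdf", "", false));            // false: a blank field matches nothing
console.log(isBinaryMimeType("text/plain", ".*", true));                // true: ".*" matches every type
```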