Convert PDF into TXT

  • Convert PDF into TXT

    Hello,

    After an intensive search of the different forums without finding anything, I'd like to ask for help converting PDF to TXT in Mirth.
    I read a PDF document in a channel and want to convert its contents to text in a variable called x.
    Using Mirth 2.2.1.5861, I tried:

    var x = FileUtil.readBytes("D:/Mirth-Test/Test.pdf");

    but x then contains the raw encoded bytes, not the PDF contents converted to text.

    Can anybody please help me convert the PDF contents to text in a variable?
    Which commands do I have to use?
    And how can I split the text later to insert it into different OBX segments?

    Thanks in advance
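
On the last question above (splitting the text across OBX segments), here is a minimal sketch in plain JavaScript of the kind that can run in a Mirth transformer. It assumes one OBX segment per non-empty line of text, value type TX, and the default HL7 v2 delimiters. The helper names `escapeHl7` and `textToObxSegments` are made up for this illustration and are not part of Mirth. OBX-3 is left empty here; a real message would need an observation identifier.

```javascript
// Hypothetical helper: escape HL7 v2 delimiter characters (default encoding
// characters |^~\& assumed) so the extracted text cannot break a segment.
function escapeHl7(text) {
    return text
        .replace(/\\/g, "\\E\\")   // escape character itself, must come first
        .replace(/\|/g, "\\F\\")   // field separator
        .replace(/\^/g, "\\S\\")   // component separator
        .replace(/&/g, "\\T\\")    // subcomponent separator
        .replace(/~/g, "\\R\\");   // repetition separator
}

// Hypothetical helper: build one OBX segment (value type TX) per non-empty
// line of the extracted text.
function textToObxSegments(text) {
    var lines = String(text).split(/\r\n|\r|\n/);
    var segments = [];
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i].replace(/\s+$/, "");   // drop trailing whitespace
        if (line.length === 0) {
            continue;                               // skip blank lines
        }
        var setId = segments.length + 1;            // OBX-1 counts emitted segments
        segments.push("OBX|" + setId + "|TX|||" + escapeHl7(line));
    }
    return segments;
}

var obx = textToObxSegments("Hemoglobin 13.5 g/dL\n\nNa | K normal");
// obx.join("\r") yields the segments separated by the HL7 segment terminator
```

Splitting per line is only one possible choice; splitting per page or per fixed-length chunk would follow the same pattern.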

  • #2
    We include iText already, so that may be your best bet, though you can also use any other library you want.
    Step 1: JAVA CACHE...DID YOU CLEAR ...wait, ding dong the witch is dead?

    Nicholas Rupley
    Work: 949-237-6069
    Always include what Mirth Connect version you're working with. Also include (if applicable) the code you're using and full stacktraces for errors (use CODE tags). Posting your entire channel is helpful as well; make sure to scrub any PHI/passwords first.


    - How do I foo?
    - You just bar.
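
For the iText suggestion above, a rough sketch of what text extraction could look like in a Mirth JavaScript step. This is an assumption-heavy illustration, not a confirmed recipe for Mirth 2.2.1: it assumes the iText 5 package layout (`com.itextpdf.text.pdf`), where `PdfTextExtractor.getTextFromPage(reader, pageNumber)` is a static method. Older iText 2.x JARs use the `com.lowagie` package prefix and a different extractor API, so check which JAR your installation actually ships. The function name `pdfToText` is hypothetical.

```javascript
// Sketch for a Mirth JavaScript step (Rhino). Assumes the iText 5 package
// layout; adjust the package prefix if your Mirth bundles another version.
function pdfToText(pdfBytes) {
    var pdf = Packages.com.itextpdf.text.pdf;
    var reader = new pdf.PdfReader(pdfBytes);   // accepts a Java byte[]
    var pages = [];
    try {
        // iText page numbers are 1-based
        for (var page = 1; page <= reader.getNumberOfPages(); page++) {
            // String(...) turns the returned java.lang.String into a JS string
            pages.push(String(pdf.parser.PdfTextExtractor.getTextFromPage(reader, page)));
        }
    } finally {
        reader.close();                          // always release the reader
    }
    return pages.join("\n");
}

// In a channel (not executed here), reusing the call from the first post:
// var x = pdfToText(FileUtil.readBytes("D:/Mirth-Test/Test.pdf"));
```

Text extraction only works for PDFs that contain real text; scanned pages that are stored as images would come back empty.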


    • #3
      Hello Narupley,

      Thanks for the fast answer. I'll check whether I can use iText to convert PDF into TXT.

      Best Regards


      • #4
        Hello Narupley,

        I checked iText, but the software is not able to convert PDF into TXT.
        Is there no command or function available to do this in Mirth itself?

        Best Regards


        • #5
          Do a Google search for pdf.js.


          • #6
            Thanks, but pdf.js seems to be only a PDF reader written in JavaScript, and also not able to convert PDF into TXT. Or am I wrong?

            Best Regards


            • #7
              I saw a couple of articles that dealt with converting PDF to text.


              • #8
                I looked for a solution and found Apache Tika (http://tika.apache.org/), which looks like a good tool, but I do not know how to invoke it from Mirth.
                Does anybody have an idea how to do this?


                • #9
                  Originally posted by Mirther172:
                  I looked for a solution and found Apache Tika (http://tika.apache.org/), which looks like a good tool, but I do not know how to invoke it from Mirth.
                  Does anybody have an idea how to do this?

                  This will probably help: http://www.mirthcorp.com/community/w...ode+from+Mirth
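
To make the Tika question concrete, here is a hedged sketch of calling a custom Java library from a Mirth JavaScript step. It assumes the Tika JAR has been placed wherever your Mirth version loads custom JARs from (see the linked wiki page) and that the server has been restarted. `Tika` and its `parseToString(File)` method belong to Tika's own facade class; the function name `pdfToTextWithTika` is invented for this example.

```javascript
// Sketch for a Mirth JavaScript step (Rhino). Assumes the Tika JAR is on
// Mirth's classpath; nothing here is built into Mirth itself.
function pdfToTextWithTika(path) {
    var tika = new Packages.org.apache.tika.Tika();
    var file = new Packages.java.io.File(path);
    // parseToString detects the document type and returns its text content;
    // String(...) turns the java.lang.String into a JS string
    return String(tika.parseToString(file));
}

// In a channel (not executed here), with the file path from the first post:
// var x = pdfToTextWithTika("D:/Mirth-Test/Test.pdf");
```

By default the Tika facade caps the length of the returned string, so very long documents may come back truncated unless that limit is raised with `setMaxStringLength`.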


                  • #10
                    When I need to convert PDF to text, I use A-PDF Text Extractor.
                    This software works well for us here. It may work for you as well.

                    Another option is A-PDF Form Data Extractor, which extracts the data filled into a PDF form into a .csv file.
                    A-PDF supports command-line calls.
                    I am new to Mirth and am exploring how I can preprocess the file by calling these tools.

                    JV


                    • #11
                      Hi jvohariwatt,

                      Good idea. PDF Form Data Extractor sounds good, although it is not free like, e.g., Apache Tika. The question is how to invoke a tool like this from a Mirth channel.
                      That's what I'm not sure about ...

                      Best Regards


                      • #12
                        I googled for a couple of converters and might give Mirth a go if you say it does the job...
                        So far I've only found http://pdftoword.pro/, and that doesn't convert images, which I guess is fair enough.
                        If anyone knows a free converter that includes image conversion, please PM me.


                        • #13
                          Originally posted by Mirther172:
                          Hello Narupley,

                          I checked iText, but the software is not able to convert PDF into TXT.
                          Is there no command or function available to do this in Mirth itself?

                          Best Regards

                          Yes, it is able to do that: http://support.itextpdf.com/node/27

                          The thing is, can we get an example of that?


                          • #14
                            Originally posted by Mirther172:
                            Thanks, but pdf.js seems to be only a PDF reader written in JavaScript, and also not able to convert PDF into TXT. Or am I wrong?

                            Best Regards

                            You're wrong: http://git.macropus.org/2011/11/pdftotext/example/

                            The thing is, can we get an example of this in Mirth, please?


                            • #15
                              Originally posted by narupley:
                              We include iText already, so that may be your best bet.

                              Can we get an example, please?
