Java issues in transformer trying to convert base64 pdf to text

  • Java issues in transformer trying to convert base64 pdf to text

    I'm having strange issues with code in my transformer that works perfectly fine in a Rhino shell. I'm using Apache PDFBox 2.0.4 (standalone version) to try to convert a PDF to text. Mirth version is 3.4.1.8057.

    This code works in a Rhino shell:
    Code:
    importPackage(com.mirth.connect.server.userutil);
    
    // in mirth I construct base64pdf by combining multiple OBX.5 segments. In Rhino I read a sample from a file.
    var base64pdf = "base64 encoded pdf";
    var bytearraypdf = FileUtil.decode(base64pdf);
    var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(bytearraypdf);
    var stripper = new org.apache.pdfbox.text.PDFTextStripper();
    var pdftext = stripper.getText(pdf);
    The first issue when I run this in a transformer is on the var pdf line:
    Code:
    Wrapped java.io.FileNotFoundException: [B@… (The system cannot find the file specified)
    For some reason it is not calling the correct PDDocument.load method that takes a byte[] and is instead trying to open bytearraypdf.toString() as a file.

    Not sure why I needed to do this, but I was able to work around the issue by changing the line to
    Code:
    var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(new java.io.ByteArrayInputStream(bytearraypdf));
    This still ran in the Rhino shell without any issues. When trying to run it in Mirth, I now get:
    Code:
    Transformer error
    ERROR MESSAGE: Error evaluating transformer
    java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.getPages()Lorg/apache/pdfbox/pdmodel/PDPageTree;
    	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
    	at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:497)
    	at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:126)
    	at org.mozilla.javascript.NativeJavaMethod.call(NativeJavaMethod.java:225)
    	at org.mozilla.javascript.Interpreter.interpretLoop(Interpreter.java:1479)
    	at org.mozilla.javascript.Interpreter.interpret(Interpreter.java:815)
    	at org.mozilla.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
    	at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:393)
    	at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3280)
    	at org.mozilla.javascript.InterpretedFunction.exec(InterpretedFunction.java:120)
    	at com.mirth.connect.server.util.javascript.JavaScriptTask.executeScript(JavaScriptTask.java:142)
    	at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.doCall(JavaScriptFilterTransformer.java:143)
    	at com.mirth.connect.server.transformers.JavaScriptFilterTransformer$FilterTransformerTask.doCall(JavaScriptFilterTransformer.java:119)
    	at com.mirth.connect.server.util.javascript.JavaScriptTask.call(JavaScriptTask.java:113)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    I put the PDFBox jar file in its own subdirectory (pdfbox-lib) in the Mirth Program Files folder (Windows 2008 R2). I'm including that directory in the classpath when I start the Rhino shell. In Mirth, I added the directory as a new Resource and assigned that Resource to the connector from which I'm trying to use it.

  • #2
    So after much digging, I discovered an older version of PDFBox sitting in C:\Program Files\Mirth Connect\extensions\doc\lib\pdfbox-1.8.4.jar. I was not including this in the classpath when running in Rhino.

    That would explain the issues finding the correct classes/methods as there were apparently significant changes between the 1.8.x and 2.0.x branches.
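For anyone hitting the same thing, one way to confirm which jar a class is actually coming from is to ask the class for its code source. This is only a rough sketch: `jarLocationOf` is a name made up here, and it assumes the code runs in a Mirth transformer where `logger` is available and PDFBox is somewhere on the classpath.

```javascript
// Hypothetical helper: reports where a Java object's class was loaded from
// (normally the URL of a jar file).
function jarLocationOf(javaObject) {
    var source = javaObject.getClass().getProtectionDomain().getCodeSource();
    // getCodeSource() can return null, e.g. for classes from the bootstrap loader.
    return source == null ? "(no code source)" : String(source.getLocation());
}

// Only runs inside Rhino/Mirth, where the global `java` object exists.
if (typeof java !== "undefined") {
    // PDDocument has a no-argument constructor in both 1.8.x and 2.0.x.
    var probe = new org.apache.pdfbox.pdmodel.PDDocument();
    try {
        logger.info("PDDocument loaded from: " + jarLocationOf(probe));
    } finally {
        probe.close();
    }
}
```

If the logged location points at extensions\doc\lib rather than the resource directory, the older jar is the one being used.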

    I removed the resource for the 2.0 branch that I had added. I updated my transformer code to be:
    Code:
    var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(new java.io.ByteArrayInputStream(FileUtil.decode(base64pdf)));
    var stripper = new org.apache.pdfbox.util.PDFTextStripper();
    var pdftext = stripper.getText(pdf);
    Then I got this error:
    Code:
    Transformer error
    ERROR MESSAGE: Error evaluating transformer
    java.lang.NoClassDefFoundError: org/apache/fontbox/afm/AFMParser
    	at org.apache.pdfbox.pdmodel.font.PDFont.addAdobeFontMetric(PDFont.java:144)
    	at ...
    So I downloaded fontbox-1.8.4.jar and stuck it in custom-lib. Then I got this error:

    Code:
    Transformer error
    ERROR MESSAGE: Error evaluating transformer
    java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.font.PDTrueTypeFont
    	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:100)
    	at ...
    I'm not sure what I'm missing at the moment, and this is all very frustrating considering I was able to easily make it work in Rhino. I'd prefer to use the newest release (2.0.4 at this time). Is there any way to do this without breaking Mirth?

    • #3
      Ok, I just found this thread. https://forums.mirthproject.io/node/12934

      I put pdfbox-app-1.8.13.jar in the extensions/doc/lib folder and edited extensions/doc/destination.xml to refer to it instead of pdfbox-1.8.4. I removed everything from custom-lib and the extra resources I had added.

      My sample code from my second post works now.
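For reference, here is roughly how that code could be wrapped so the document is always closed, which the snippets above never do. Treat it as a sketch against the 1.8.x API only: `extractPdfText` is a made-up name, and it assumes `FileUtil` from com.mirth.connect.server.userutil is in scope, as it is in the transformer code from the second post.

```javascript
// Hypothetical helper: decodes a base64-encoded PDF and returns its text
// using the PDFBox 1.8.x classes that ship with Mirth.
function extractPdfText(base64pdf) {
    var bytes = FileUtil.decode(base64pdf);
    // Wrap the bytes in a stream: PDFBox 1.8.x has no load(byte[]) overload.
    var pdf = org.apache.pdfbox.pdmodel.PDDocument.load(new java.io.ByteArrayInputStream(bytes));
    try {
        // In 1.8.x the stripper lives in org.apache.pdfbox.util (2.0.x moved it to .text).
        var stripper = new org.apache.pdfbox.util.PDFTextStripper();
        // Convert the Java string to a JavaScript string.
        return String(stripper.getText(pdf));
    } finally {
        // Always release the document, even if text extraction throws.
        pdf.close();
    }
}
```

In the transformer this would be called as `var pdftext = extractPdfText(base64pdf);`.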

      The thread I linked asked the same question about using the 2.0 branch with Mirth, and it was not answered there either. Is this possible?
