parse .gz

  • parse .gz

    I am trying to read a .gz file from an SFTP server and parse the data it contains.
    I tried code that Nick Rupley posted previously (see below), changing ZipInputStream to GZIPInputStream in the hope that it would help, but it is not able to read the .gz file.

    Any help is greatly appreciated.

    importPackage(java.io);
    importPackage(java.util.zip);

    var zis = new GZIPInputStream(new ByteArrayInputStream(org.apache.commons.codec.binary.Base64.decodeBase64(connectorMessage.getRawData().replaceAll('[^0-9a-zA-Z\\+\\/\\=]',''))));
    var bufSize = 1024;

    var file;
    while (file = zis.getNextEntry()) {
    var bos = new ByteArrayOutputStream();
    var data = getBlankByteArray(bufSize);
    var len = -1;
    while ((len = zis.read(data,0,bufSize)) != -1)
    bos.write(data,0,len);
    var fileNode = <file/>;
    fileNode.name = file.getName();
    fileNode.content = new java.lang.String(bos.toByteArray());
    tmp.replaceChild(fileNode);

    channelMap.put('bos', bos);
    channelMap.put('fileNode', fileNode);
    }
    zis.close();

    function getBlankByteArray(length) {
    var bos = new java.io.ByteArrayOutputStream();
    for (var i = 1; i <= length; i++)
    bos.write(0);
    return bos.toByteArray();
    }
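
One thing worth noting about the snippet above: getNextEntry() is a method of ZipInputStream, not of GZIPInputStream. A .gz stream carries a single compressed payload with no named entries, so after swapping the class there is nothing to iterate over; the stream is simply read until read() returns -1. Below is a minimal sketch of that, written in plain Java rather than Mirth JavaScript so it can be run on its own. The names GzipInMemory and gunzipBase64 are made up for illustration, and it assumes, as the snippet above does, that the raw message is the Base64-encoded .gz file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of Mirth or of the original post.
final class GzipInMemory {

    /** Decodes a Base64 string and inflates the gzip data it contains. */
    static byte[] gunzipBase64(String base64) throws IOException {
        // The MIME decoder ignores line breaks and other non-Base64 characters.
        byte[] compressed = Base64.getMimeDecoder().decode(base64);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int len;
            // No entries to walk through: read until end of stream.
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzip payload so the example runs without an input file.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            gz.write("hello from a .gz file".getBytes(StandardCharsets.UTF_8));
        }
        String base64 = Base64.getEncoder().encodeToString(raw.toByteArray());
        System.out.println(new String(gunzipBase64(base64), StandardCharsets.UTF_8));
    }
}
```

The same java.io and java.util.zip classes can be called from a Mirth JavaScript step, as the code in this thread already does. Everything ends up in one byte array, so this only suits files that fit comfortably in the heap.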

  • #2
    This should work, with minor adjustments, for extracting files from a .gz file.

    Code:
    importPackage(java.io);
    importPackage(java.util.zip);
    const BUFFSIZE = 65536;

    // Static utility classes, so no instance is needed
    var FileUtils = org.apache.commons.io.FileUtils;
    var FilenameUtils = org.apache.commons.io.FilenameUtils;

    //java.lang.Thread.sleep(30000); // optional delay
    var inputDirectory = $('readGZFromThisLocation');  // directory the .gz files are picked up from
    var outputDirectory = $('writeXMLToThisLocation'); // directory the extracted file is written to (.xml in my case)
    var moveToDirectory = $('moveGZToThisLocation');   // directory each .gz is moved to after it is read

    var sourceDir = new File(inputDirectory);
    var moveIt = new File(moveToDirectory);

    // List only *.gz files in sourceDir, without recursing into subdirectories
    var fileList = FileUtils.listFiles(sourceDir, ['gz'], false);
    for each (var file in fileList.toArray()) {
        var thisFileName = file.getName();
        logger.info(thisFileName);

        // foo.gz -> foo.xml
        var thatNewFileName = FilenameUtils.removeExtension(thisFileName) + '.xml';

        var sourceGZ = new File(sourceDir, thisFileName);
        var targetFile = new File(outputDirectory, thatNewFileName);

        var gzis = new GZIPInputStream(new FileInputStream(sourceGZ));
        var fout = FileUtils.openOutputStream(targetFile); // static method call, so no "new"
        try {
            var len;
            var buffer = java.lang.reflect.Array.newInstance(java.lang.Byte.TYPE, BUFFSIZE);
            // read() returns -1 at end of stream
            while ((len = gzis.read(buffer)) != -1) {
                fout.write(buffer, 0, len);
            }
        } finally {
            // Close both streams even if decompression fails part-way
            gzis.close();
            fout.close();
        }
        //logger.info("done");

        FileUtils.moveFileToDirectory(sourceGZ, moveIt, true);
    }
    Last edited by siddharth; 01-13-2017, 04:37 AM. Reason: comments
    HL7v2.7 Certified Control Specialist!



    • #3
      Thanks Siddharth.
      It looks like that code unpacks the .gz into a file. I am trying to unpack the .gz into Mirth's memory and parse the values without having to write a temp file.



      • #4
        Well, I don't think you would be able to do it that easily. But good luck with your quest!
        HL7v2.7 Certified Control Specialist!



        • #5
          Originally posted by gojoshi
          Thanks Siddharth.
          It looks like that code unpacks the .gz into a file. I am trying to unpack the .gz into Mirth's memory and parse the values without having to write a temp file.
          I don't think it's a great idea to unpack everything into Mirth's memory, as it might eat up the heap space. Just my 2 cents.
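
If heap usage is the worry, the decompressed data does not have to be held all at once. A rough sketch, again in plain Java: wrap the GZIPInputStream in a BufferedReader and handle one line at a time, so only the current line and the stream buffers stay in memory. GzipLineReader and forEachLine are hypothetical names, and UTF-8, line-oriented content is an assumption about the file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration only.
final class GzipLineReader {

    /**
     * Passes each decompressed line to the handler and returns the line count.
     * Assumes the payload is UTF-8 text; the input stream is closed on return.
     */
    static long forEachLine(InputStream gzipped, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Build a two-line gzip payload in memory so the example is self-contained.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            gz.write("first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        }
        long lines = forEachLine(new ByteArrayInputStream(raw.toByteArray()), System.out::println);
        System.out.println("lines read: " + lines);
    }
}
```

In a real channel the InputStream would come from the file or SFTP source instead of a byte array; how to obtain that stream inside Mirth is outside what this sketch covers.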



          • #6
            I'm trying that code to extract data from a .tar.gz file, but I'm getting this error:

            Wrapped java.util.zip.ZipException: invalid distance too far back


            Here's the code I'm using:

            importPackage(java.io);
            importPackage(java.util.zip);
            const BUFFSIZE = 65536;

            inputDirectory='/folders/inbox/TarTest'; //This is location from where you pickup the .gz file
            outputDirectory='/folders/inbox/TarExtract'; //location where you would write the extracted file.
            moveToDirectory='/folders/inbox/TarMove'; // moving location of .gz after it is read.

            var fileOps= org.apache.commons.io.FileUtils();
            var fileList=[];
            var sourceDir=new File(inputDirectory);
            var moveIt=new File(moveToDirectory);
            var fileName=sourceDir.getName();


            var fileList=fileOps.listFiles(sourceDir,['gz'],false)
            for each(file in fileList.toArray()) {
            logger.info(fileName);

            thisFileName=file.getName();

            thatFileName=org.apache.commons.io.FilenameUtils.removeExtension(thisFileName);
            thatNewFileName=thatFileName//+".xml";


            sourceGZ=new File(inputDirectory+'/'+thisFileName);
            targetFile=new File(outputDirectory+'/'+thatNewFileName);

            var gzis=new GZIPInputStream(new FileInputStream(sourceGZ));
            var fout=new fileOps.openOutputStream(targetFile);

            var len=0;
            var buffer = java.lang.reflect.Array.newInstance(java.lang.Byte.TYPE, BUFFSIZE);

            while ((len = gzis.read(buffer)) > 0) {
            fout.write(buffer, 0, len);
            }

            gzis.close();
            fout.close();
            // logger.info("done");

            fileOps.moveFileToDirectory(sourceGZ,moveIt,true);
            }



            • #7
              So the error message:
              Wrapped java.util.zip.ZipException: invalid distance too far back
              means that the file is corrupt.

              That happened with two files that are actually not corrupt.

              Also, it looks like it cannot read anything larger than 2.147 GB:

              Unable to read files greater than 2147483647 bytes

              So it seems that this won't work for me.
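
One way to double-check whether java.util.zip can actually read a given file, independently of the channel, is to stream through it and count the decompressed bytes. This is only a diagnostic sketch in plain Java (GzipCheck and countUncompressedBytes are made-up names); it does not explain the exception above. Because it reads through a fixed-size buffer and counts with a long, it never needs an array sized to the file, so the 2147483647 figure (Integer.MAX_VALUE, the upper bound on a Java array length) does not come into play here. For a .tar.gz, the decompressed bytes are the tar archive itself, not the files inside it.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical diagnostic helper, not taken from the thread.
final class GzipCheck {

    /**
     * Streams through a .gz file and returns the number of decompressed bytes.
     * A damaged or truncated file should surface as an IOException
     * (for example ZipException or EOFException).
     */
    static long countUncompressedBytes(Path gzFile) throws IOException {
        long total = 0;
        byte[] buffer = new byte[65536];
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(gzFile)))) {
            int len;
            while ((len = in.read(buffer)) != -1) {
                total += len; // long counter, so sizes above 2 GB are fine
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipCheck.java <file.gz>");
            return;
        }
        System.out.println(countUncompressedBytes(Paths.get(args[0])) + " bytes decompressed");
    }
}
```

If this runs to the end on a file that fails inside the channel, the file itself is probably readable and the problem lies elsewhere in how it is being picked up.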

