Announcement

Collapse
No announcement yet.

CSV parsing accelerated ?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • CSV parsing accelerated ?

    Hi, hope all ok in this nasty world .

    I try to work with a output generated by instruments. there are multiple table in the row data and i need to combine data to generate output from .
    In eg

    Table A is having Ct values in 3 color vs well position
    A1 green 26.5
    A1 yellow 15.2
    A1 red 20.4
    A2 green 12.4
    A2 yellow 0
    A2 red 14.5

    Table B is having Intensity data in several color vs well position it behave like multiple table
    green
    A1 0.00005
    A2 0.01232
    A3 0.1213213
    yellow
    A1 5230
    A2 -0.0.5
    A3 565466
    red
    .....

    the table are separated by title and new lines , the headers exist as well

    My reference is a 96 well plate meaning a table containing
    /

    c
    Code:
    onst _96well = ['A1','A2','A3','A4','A5','A6','A7','A8','A9','A10' ,'A11','A12','B1','B2','B3','B4','B5','B6','B7','B 8','B9','B10','B11','B12','C1','C2','C3','C4','C6' ,'C7','C8','C9','C10','C11','C12','D1','D2','D3',' D4','D5','D6','D7','D8','D9','D10','D11','D12','E1 ','E2','E3','E4','E5','E6','E7','E8','E9','E10','E 11','E12','F1','F2','F3','F5','F6','F7','F8','F10' ,'F11','F12','G1','G2','G3','G4','G5','G6','G7','G 8','G9','G10','G11','G12','H1','H2','H3','H4','H5' ,'H6','H7','H8','H9','H10','H11','H12']
    I try to build a code leading to for each position in my reference plate , extract the 3 CT values + intensisty files in 1 line

    i start with something like

    Code:
    for each ( pos in _96well) {
    get the CT values = is fast
    get the intensity value = is slow
    }

    The slowness is due to my intensity finder code that leads to parse multiple time the CSV until it find relevant starting row .

    Parse all file until finding "title"
    parse all the file from title position until finding color
    parse all the file from color position until finding the Well position
    get value from corresponding row/ column

    Code:
    function getFamDRN(pos){
    var value = 0;
    
    var drnstartline = finstartrow ('Table CT intensity',0)
    var famstartline = finstartrow (DYEG,drnstartline)
    var positionstartline = finstartrow (pos,famstartline)
    value = msg['row'][positionstartline]['column41'].toString()
    
    logger.info('value is = '+ pos +" " + value)
    
    return value
    
    function finstartrow (title,start){
    var sRow = 0;
    for (var i = start; i < getArrayOrXmlLength(msg['row']); i++) {
    
    if (msg['row'][i]['column1'].toString() == title) {
    sRow = i
    break;
    }
    }
    return parseInt(sRow, 10)//sRow
    }
    This works but take 2-3 min to run with 16% resources , this looks bad.

    is it a more efficient way to perform similar action ?
    load in variable ? using mirth tricks ?

    Thank youuuuuuu in advance

  • #2
    Sure.. if your values are static, load that CSV file into an in memory map probably $gc or $g depending on if that table you are searching is used in one or many channels. I am hopeful you are not consuming a CSV for every message to the channel, that would be a poor design. Alternatively put that look up stuff in a SQL table and write a function to find what you need.

    Honestly, I don't understand most of your post. Maybe post an export of your channel.

    Diridium Technologies, Inc.
    https://diridium.com

    Comment


    • #3
      Can you post a full sample message? It's difficult to tell what it really looks like from your description and code.

      Comment


      • #4
        Hi
        thanks for feedback here is the CSV (that is in fact the message coming out of instrument in CSV to be treated )
        1 part is header with generic data
        2nd part is of interest table contain Well
        3rd part is of interest "Table CT intensity" witch is composed in fact on 3 tables (FAM HEX CY5 and associated values)


        The goal is for each position in 1st table is to:
        1. regroup following info into a single line
        2. add the 40th value from 3rd table of FAM color
        ex for A1

        A1;;;Unknown;HEX_2;HS IC;No Ct;FAM;N1+N2;No Ct;Cy5;RNA IC;No Ct; 3,03619317440872E-6


        The step1 is pretty ok
        The step 2 involve multiple loop to parse the CSV to manage redundancies
        Find the row having Title "Table CT intensity " then 1st row after having "FAM" then 1st row after having Well ID "A1" This loop takes 3 minutes to run , I was asking if alternative option are possible

        I used this code to parse the famous value

        Code:
        function getFamDRN(pos){
        var value = 0;
        
        var drnstartline = finstartrow ('Table CT intensity',0)
        var famstartline = finstartrow (DYEG,drnstartline)
        var positionstartline = finstartrow (pos,famstartline)
        value = msg['row'][positionstartline]['column41'].toString()
        
        logger.info('value is = '+ pos +" " + value)
        
        return value
        }
        
        
        
        //find starting row , starting row is header of results tab start by title eg "Well" + a starting position
        function finstartrow (title,start){
        var sRow = 0;
        for (var i = start; i < getArrayOrXmlLength(msg['row']); i++) {
        
        if (msg['row'][i]['column1'].toString() == title) {
        sRow = i
        break;
        }
        }
        return parseInt(sRow, 10)//sRow
        }


        thanks

        Attached Files

        Comment


        • #5
          Any feedback? 😥

          Comment


          • #6
            So if I understand you correctly, you are parsing the whole file three times over to find your title and color and whatnot?
            I'd try reading the file into some combination of java hashmaps or js dicts. For example if you had set number of colors, maybe create a dictionary of dictionaries and use those to eventually find your needed values.
            Code:
            {
              "green": {
                "A1": "0.something",
                "C5": "1.whatever"
              },
              "yellow": {
                "A1": "0.something",
                "Q9": "1.whatever"
              }
            }
            EDIT: Didn't notice your attached .csv before. After seeing that, the structure would change to dict of dicts of arrays.
            Code:
            {
              "green": {
                "A1": [1, 0.1, 2],
                "C5": [1, 0.1, 2]
              },
              "yellow": {
                "A1": [1, 0.1, 2],
                "Q9": [1, 0.1, 2]
              }
            }
            Last edited by CreepySheep; 02-04-2021, 12:42 AM.

            Comment


            • #7
              FYI I guess I found the alternative
              Use preprocessor and load message in a variable to avoid Mirth XML based treatment
              Split CSV by row "\n"
              navigate into row by row
              split rows in ";" to get col[0] value and use my algorithm to do the same job.

              The minutes jobs changed to seconds . I can still see it but much faster . Probably not optimised but it may be enough

              Thanks for all .

              Comment


              • #8
                Originally posted by guiguidusud View Post
                Use preprocessor and load message in a variable to avoid Mirth XML based treatment
                No need for that. All you need to do is switch your inbound type from Delimited to Raw if you want msg to be the string content instead of having mirth try to parse it to xml.

                Comment

                Working...
                X