Mirth is unresponsive with CPU spikes at 90% or more every 7 days (Saturday)

  • Mirth is unresponsive with CPU spikes at 90% or more every 7 days (Saturday)

    For the last couple of months we have been seeing Mirth become unresponsive every third Saturday and stop processing messages. During this time the CPU usage stays well above 90%. Interestingly, the CPU spikes (>90%) happen every Saturday, but Mirth stops functioning only every third Saturday.

    Setup: Two Mirth Connect (v3.8.1) servers connecting to a single Azure Database for MySQL (v5.7) instance. Both Mirth servers run on separate VMs; both VMs are part of an Azure VMSS and are load balanced by an Azure Application Gateway.

    Configuration: Each Mirth instance consists of two channels. The HTTP Listener channel receives the message and passes it to the other channel. The Channel Reader channel reads the message and sends it on to ActiveMQ. Both channels operate with 20 processing threads. Message metadata pruning is set to 30 days, and messages are deleted when metadata is pruned. We have a daily input of around ~35k messages.

    Analysis so far: After a restart everything works fine. We also didn't find anything in the logs when the CPU spikes above 90%.

    Questions:
    - Any possible reasons for this behavior?
    - Any missing setting or configuration?
    - Can we change the logging level to capture these failures? (See the note after this list.)
    - How long does the Mirth server wait before timing out while connecting to the database or inside a transaction? Are there any default retries or timeouts?
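
    For reference on the logging question, this is the knob we believe controls it, assuming the stock conf/log4j.properties that ships with Mirth Connect 3.x (the logger name and level are our guess at what would surface these failures):

        # conf/log4j.properties in the Mirth Connect installation directory
        # Turn up logging for the Donkey data-access layer, where database errors surface
        log4j.logger.com.mirth.connect.donkey.server.data=DEBUG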

  • #2
    Anyone? Any ideas?



    • #3
      Can you be more specific about which process is causing the spike and on which server(s)?

      Is the Mirth process itself spiking the CPU? On both instances? Is the CPU spiking on the database server? What operating system are you running on these machines?



      • #4
        The CPU spikes on the Mirth VMs, and no other process is running on those VMs. They are Linux VMs.



        • #5
          I'm not sure what could be causing it unless your message load spikes on Saturdays or you have things scheduled that only run on Saturday.

          You probably need to identify which mirth threads have high cpu usage. I can't find any other forum threads on this topic, but this is how you do it: https://backstage.forgerock.com/know...icle/a39551500

          Mirth does a pretty good job naming its threads, so hopefully you'll be able to determine where the problem areas are.
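
          Roughly, that procedure boils down to something like this on Linux (a sketch; it assumes the JDK's jstack tool is on the PATH and that you substitute the real Mirth java PID):

            # 1. Watch per-thread CPU usage inside the Mirth java process
            top -H -p <mirth_pid>
            # 2. Convert the hottest thread's ID to hex
            printf '%x\n' <thread_id>
            # 3. Take a thread dump and locate that thread by its nid=0x... value
            jstack <mirth_pid> > /tmp/mirth-threads.txt
            grep -A 20 'nid=0x<hex_thread_id>' /tmp/mirth-threads.txt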



          • #6
            No, there is considerably less load on Saturdays compared to weekdays. Also, there is nothing scheduled or running during this time.

            With respect to which process is utilizing the CPU, I did notice via the top command that "java" is the one using most of the CPU during these spikes. But thanks for the link; it will be helpful for debugging further. Note: we have configured the Java memory to 4 GB.

            I also found the log below once during the spikes. Can you help with this? Meaning, when exactly does Mirth try to acquire this lock, and in which scenario could this happen?

            Caused by: com.mirth.connect.donkey.server.data.DonkeyDaoException: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
                at com.mirth.connect.donkey.server.data.jdbc.JdbcDao.getNextMessageId(JdbcDao.java:1447)
                at com.mirth.connect.donkey.server.data.buffered.BufferedDao.getNextMessageId(BufferedDao.java:414)
                at com.mirth.connect.donkey.server.channel.Channel.createAndStoreSourceMessage(Channel.java:1356)
                at com.mirth.connect.donkey.server.channel.Channel.dispatchRawMessage(Channel.java:1211)
                ... 33 more
            Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
                at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:123)
                at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
                at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
                at com.mysql.cj.jdbc.StatementImpl.executeQuery(StatementImpl.java:1218)
                at com.zaxxer.hikari.pool.ProxyStatement.executeQuery(ProxyStatement.java:111)
                at com.zaxxer.hikari.pool.HikariProxyStatement.executeQuery(HikariProxyStatement.java)
                at com.mirth.connect.donkey.server.data.jdbc.JdbcDao.getNextMessageId(JdbcDao.java:1436)
                ... 36 more
            Around the time of these log entries, I found that the MySQL instance appeared to be unavailable, but Microsoft confirmed that it was available...


            Any more thoughts? Ideas?



            • #7
              Lots of factors to consider I think.

              What are you doing outside of Mirth to determine current MySQL load? Are backups happening at that time? Is there some job outside of Mirth blocking Mirth? I'd be looking closely at MySQL at the moment the spikes start.
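
              If you can catch it live, something along these lines on the MySQL side shows what is running and who is blocking whom (a sketch for MySQL 5.7; a managed Azure MySQL instance may restrict some of these):

                -- What is MySQL busy with right now?
                SHOW FULL PROCESSLIST;
                -- Which transactions are waiting on locks, and who holds them (5.7 sys schema)
                SELECT * FROM sys.innodb_lock_waits\G
                -- Raw InnoDB status, including lock waits and long-running transactions
                SHOW ENGINE INNODB STATUS\G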
              Mirth 3.8.0 / PostgreSQL 11 / Ubuntu 18.04
              Diridium Technologies, Inc.
              https://diridium.com



              • #8
                Originally posted by darshan:
                Around the time of these log entries, I found that the MySQL instance appeared to be unavailable, but Microsoft confirmed that it was available...


                Any more thoughts? Ideas?
                A lock wait timeout doesn't mean the server was unavailable. It means that a lock couldn't be acquired (likely because the server was busy doing something else that was already holding the lock).

                I think a thread dump could be very beneficial in figuring out what mirth is actually doing at that time. Not sure what else to do until you have that information.
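
                For context, the error in that trace is governed by MySQL's innodb_lock_wait_timeout, which defaults to 50 seconds, so each blocked statement waits up to that long before failing. A quick sketch of checking and (for testing) changing it; whether raising it actually helps depends on what is holding the locks:

                  SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
                  -- session-level override for testing, value in seconds
                  SET SESSION innodb_lock_wait_timeout = 120;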



                • #9
                  1. I will try to get the thread dumps.

                  2. I checked, and per Microsoft, Azure MySQL full backups are taken every 7 days. So this might be the operation holding locks on data that Mirth is trying to acquire, and during that window the CPU spikes because threads are waiting on MySQL... If that were the whole story, though, Mirth should be unresponsive every week and not every third week.

                  So even if we assume the above scenario is true, we still have some unknowns, and I also believe Mirth should have handled it more gracefully.

                  Is there any configuration in Mirth to handle such scenarios? If not, I think reducing the number of parallel threads would at least reduce the total wait on MySQL...
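
                  For the record, the two knobs we are looking at are the per-channel Source Settings > Max Processing Threads (currently 20) and the server-wide database connection pool in conf/mirth.properties, which we assume looks roughly like this on 3.8 (unverified on our install):

                    # conf/mirth.properties
                    # size of the internal connection pool shared by all channels
                    database.max-connections = 20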



                  • #10
                    After a detailed investigation we found that it was Nexpose causing the issue that made the CPU spike. See details below.

                    Nexpose runs a security scan for us, and when the scan runs we saw around ~2,500 malicious login attempts against Mirth, due to which the CPU spikes.

                    Now the question I have is: why can't Mirth handle these security scans?

                    I believe this is caused either by the executable JAR of mirth-cli or by the client APIs. Is there any setting to disable the APIs or the CLI? Or a switch to disable the admin console?



                    • #11
                      I would say a combination of firewall and switching to a non-default port is probably your best bet.

                      Is your server's admin port publicly exposed or is the scan coming from an internal network?
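
                      The port lives in conf/mirth.properties; moving it off the default and only allowing known admin addresses through the host firewall is the usual combination. A sketch, using the stock 3.x defaults and a made-up management subnet:

                        # conf/mirth.properties -- change 8443 to something non-standard
                        http.port = 8080
                        https.port = 8443

                        # example ufw rule restricting the admin/API port to a management subnet
                        ufw allow from 10.0.0.0/24 to any port 8443 proto tcp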



                      • #12
                        The scan happens at the individual VM level, so a firewall won't work here.

                        Not sure what you mean by changing ports... Currently the ports are at their default values. Also, the admin console is not exposed publicly.

                        I am not sure, but my guess is that Nexpose is trying to log in using the executable JAR of mirth-cli.



                        • #13
                          Are you saying the scan/pentest software actually runs on the local VM? Or the scan is targeting the local VM?
                          Mirth 3.8.0 / PostgreSQL 11 / Ubuntu 18.04
                          Diridium Technologies, Inc.
                          https://diridium.com



                          • #14
                            Yes, it's running on the local VM.

                            The login events show the same IP as the one Mirth is installed on, so yes, the scans are running on the same VM.



                            • #15
                              That's a tough one. Every system has some threshold at which a denial-of-service attack will in fact stop a service from responding. Short of an IDS/IPS on the system that actively prevents that kind of compromise, and/or your company weighing the costs of preventing such an attack, I'd say tell them to stop hitting your system with this particular test.
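
                              If you want a stopgap short of a full IDS/IPS, connection rate limiting on the admin port is one lever. A rough iptables sketch; the port and thresholds are made-up examples, and it may not help when the scanner runs on the same host:

                                iptables -A INPUT -p tcp --dport 8443 -m state --state NEW -m recent --set --name MIRTHADM
                                iptables -A INPUT -p tcp --dport 8443 -m state --state NEW -m recent --update --seconds 60 --hitcount 30 --name MIRTHADM -j DROP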
                              Mirth 3.8.0 / PostgreSQL 11 / Ubuntu 18.04
                              Diridium Technologies, Inc.
                              https://diridium.com
