<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>How can you be sure that no context switch is happening if the
      kernel itself uses NEON instructions? By stopping the kernel?</p>
    <p>That is practically impossible. Check whether
      CONFIG_KERNEL_MODE_NEON is enabled, and disable it to make sure
      that the kernel does not use NEON instructions.</p>
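    <p>A minimal sketch of one way to check the option on the running
      system, assuming the kernel was built with CONFIG_IKCONFIG_PROC so
      that /proc/config.gz exists. The helper names option_state and
      read_running_config are made up for illustration:</p>
<pre>
```python
import gzip


def option_state(config_text, option):
    """Return the value of a kernel config option, or None if unset or absent."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def read_running_config(path="/proc/config.gz"):
    # Assumption: this file only exists with CONFIG_IKCONFIG_PROC enabled.
    with gzip.open(path, "rt") as f:
        return f.read()


if __name__ == "__main__":
    try:
        state = option_state(read_running_config(), "CONFIG_KERNEL_MODE_NEON")
    except OSError as err:
        print("could not read kernel config:", err)
    else:
        print("CONFIG_KERNEL_MODE_NEON =", state)
```
</pre>
    <p>An option that is switched off appears in the config only as a
      "# ... is not set" comment, so the sketch reports it as None.</p>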
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 25.03.2020 at 05:25, Horshack
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:BY5PR13MB333045683FD93C675DD50D5DA4CE0@BY5PR13MB3330.namprd13.prod.outlook.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;}</style>
      <div style="font-family: Calibri, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0);">
        I excluded context switches as a possible culprit by looping
        until a corruption happened for which no context switches
        occurred while the test was running. That is, at the start of
        the test I would save the number of involuntary/voluntary
        context switches from /proc/&lt;pid&gt;/status, then check those
        counts again after the failure. If they were different I
        restarted the test, and I kept looping until a failure happened
        in which the context switch counts were the same.<br>
      </div>
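      <div>A rough sketch of that loop, written in Python for brevity
        rather than taken from the actual test utility. The function
        names and the test callable are hypothetical; the two field
        names are the ones Linux reports in /proc/&lt;pid&gt;/status:</div>
<pre>
```python
def parse_ctx_switches(status_text):
    """Return (voluntary, nonvoluntary) counts from /proc/PID/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.endswith("_ctxt_switches"):
            counts[key] = int(value)
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]


def read_ctx_switches(pid="self"):
    with open(f"/proc/{pid}/status") as f:
        return parse_ctx_switches(f.read())


def first_clean_failure(test, read_counts=read_ctx_switches, max_attempts=1000):
    """Repeat test() until it fails with unchanged context-switch counts.

    test is a hypothetical callable returning True on success.
    Returns the attempt number of the first such failure, or None.
    """
    for attempt in range(1, max_attempts + 1):
        before = read_counts()
        ok = test()
        after = read_counts()
        if not ok and before == after:
            return attempt
    return None
```
</pre>
      <div>Failures where the counts changed are simply discarded, so
        only a failure with no recorded context switch ends the loop.</div>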
      <div>
        <div style="font-family:Calibri,Helvetica,sans-serif;
          font-size:12pt; color:rgb(0,0,0)">
          <br>
        </div>
        <hr tabindex="-1" style="display:inline-block; width:98%">
        <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
            face="Calibri, sans-serif" color="#000000"><b>From:</b>
            <a class="moz-txt-link-abbreviated" href="mailto:dropbear-bounces+horshack=live.com@ucc.asn.au">dropbear-bounces+horshack=live.com@ucc.asn.au</a>
            <a class="moz-txt-link-rfc2396E" href="mailto:dropbear-bounces+horshack=live.com@ucc.asn.au">&lt;dropbear-bounces+horshack=live.com@ucc.asn.au&gt;</a> on
            behalf of Sebastian Gottschall
            <a class="moz-txt-link-rfc2396E" href="mailto:s.gottschall@dd-wrt.com">&lt;s.gottschall@dd-wrt.com&gt;</a><br>
            <b>Sent:</b> Tuesday, March 24, 2020 9:13 PM<br>
            <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:dropbear@ucc.asn.au">dropbear@ucc.asn.au</a> <a class="moz-txt-link-rfc2396E" href="mailto:dropbear@ucc.asn.au">&lt;dropbear@ucc.asn.au&gt;</a><br>
            <b>Subject:</b> Re: SSH key exchange fails 30-70% of the
            time on Netgear X4S R7800</font>
          <div> </div>
        </div>
        <div>
          <div class="x_moz-text-html" lang="x-unicode">
            <p>If the corruption is caused by a context switch, the
              problem may be in the kernel.<br>
              Try disabling "CONFIG_KERNEL_MODE_NEON" in the kernel
              config. This will disable some of the kernel's crypto
              assembly code.<br>
            </p>
            <div class="x_moz-cite-prefix">On 24.03.2020 at 16:11,
              Matt Johnston wrote:<br>
            </div>
            <blockquote type="cite">
              <div class="">Good work narrowing down a test case there.</div>
              <div class="">That's an interesting finding - I guess it
                might be worth posting on the OpenWRT lists/forum to try
                to find other testers.</div>
              <div class="">Could it be power related, if the tight
                multiplication loop is stressing it somehow? It doesn't
                seem to be using Neon instructions for anything apart
                from loads/stores though - is there something that the
                compiler should be doing when mixing Neon and non-Neon
                operations?</div>
              <div class=""><br class="">
              </div>
              <div class="">
                <div>Cheers,</div>
                <div>Matt</div>
              </div>
              <div><br class="">
              </div>
              <div>(Your emails got held up being over 100kB, I've
                trimmed the reply below and let them through. Apologies
                to everyone for the stale old one that got let through
                with them just now, I wasn't looking closely)</div>
              <div><br class="">
                <blockquote type="cite" class="">
                  <div class="">On Tue 24/3/2020, at 11:23 am, Horshack
                    &lt;<a href="mailto:horshack@live.com" class=""
                      moz-do-not-send="true">horshack@live.com</a>&gt;
                    wrote:</div>
                  <br class="x_Apple-interchange-newline">
                  <div class="">
                    <div class="" style="font-style:normal;
                      font-variant-caps:normal; font-weight:normal;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
                      font-family:Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      I was able to isolate the issue to just a handful
                      of assembly instructions within fast_s_mp_sqr(),
                      related to the squaring loop. I broke that code
                      out into a separate utility that reproduces the
                      issue within a few seconds. The failure is
                      somewhat sensitive to the data pattern and very
                      sensitive to timing, indicating a likely
                      memory/data path issue within my particular
                      router. I'm guessing it's the IPQ8065 and not the
                      SDRAM because I can get it to fail with a tiny
                      data set that easily fits within the DCACHE. I can
                      alter the frequency of the failure with a single
                      ARM memory barrier instruction, which at first
                      implied a superscalar data ordering condition, but
                      the memory barrier also alters the timing through
                      the DCACHE, so that is likely the effect it's
                      having. I was able to exclude VFP/Neon register
                      corruption as the cause with some test code. I
                      also excluded any context-switch-specific issue by
                      measuring the number of context switches in
                      /proc/&lt;pid&gt;/status and catching a failure
                      where no switches had occurred. I also modified
                      the affinity so the utility runs on just one
                      processor to rule out a specific core having the
                      issue.<br class="">
                    </div>
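                    <div class="">For anyone wanting to repeat the
                      single-core check, a small sketch of pinning a
                      test to one CPU at a time. It is written in Python
                      for illustration and is not how the utility itself
                      does it; pin_to_cpu and run_on_each_cpu are
                      hypothetical helpers, and os.sched_setaffinity is
                      only available on Linux:</div>
<pre>
```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller to one CPU and return the resulting CPU set."""
    os.sched_setaffinity(0, {cpu})  # 0 refers to the caller
    return os.sched_getaffinity(0)


def run_on_each_cpu(test):
    """Run test() once per allowed CPU; return a dict mapping cpu to result."""
    original = os.sched_getaffinity(0)
    results = {}
    try:
        for cpu in sorted(original):
            pin_to_cpu(cpu)
            results[cpu] = test()
    finally:
        os.sched_setaffinity(0, original)  # restore the starting affinity
    return results
```
</pre>
                    <div class="">If the failure shows up on every core
                      when run this way, a single bad core is an
                      unlikely explanation.</div>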
                    <div class="" style="font-style:normal;
                      font-variant-caps:normal; font-weight:normal;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
                      font-family:Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br class="">
                    </div>
                    <div class="" style="font-style:normal;
                      font-variant-caps:normal; font-weight:normal;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
                      font-family:Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      I put the source and binary of my utility on
                      GitHub. If anyone on this mailing list has this
                      router model, could you give it a try? You only
                      need ipq8065-sqrbug (binary) and
                      run-ipq8065-sqrbug.sh (script). Here's the link to
                      the repository:<span
                        class="x_Apple-converted-space"> </span><a
                        href="https://github.com/horshack-dpreview/ipq8065-sqrbug"
                        class="" moz-do-not-send="true">https://github.com/horshack-dpreview/ipq8065-sqrbug</a><br
                        class="">
                    </div>
                    <div class="" style="font-style:normal;
                      font-variant-caps:normal; font-weight:normal;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
                      font-family:Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br class="">
                    </div>
                    <div class="" style="font-family:Helvetica;
                      font-size:13px; font-style:normal;
                      font-variant-caps:normal; font-weight:normal;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none">
                      <div class=""
                        style="font-family:Calibri,Helvetica,sans-serif;
                        font-size:12pt"><br class="">
                      </div>
                      <hr tabindex="-1" class=""
                        style="display:inline-block; width:620.328125px">
                      <div id="x_divRplyFwdMsg" dir="ltr" class=""><font
                          class="" style="font-size:11pt" face="Calibri,
                          sans-serif"><b class="">From:</b><span
                            class="x_Apple-converted-space"> </span>Horshack
                          ‪‬ &lt;<a href="mailto:horshack@live.com"
                            class="" moz-do-not-send="true">horshack@live.com</a>&gt;<br
                            class="">
                          <b class="">Sent:</b><span
                            class="x_Apple-converted-space"> </span>Saturday,
                          March 21, 2020 7:54 AM<br class="">
                          <b class="">To:</b><span
                            class="x_Apple-converted-space"> </span><a
                            href="mailto:dropbear@ucc.asn.au" class=""
                            moz-do-not-send="true">dropbear@ucc.asn.au</a><span
                            class="x_Apple-converted-space"> </span>&lt;<a
                            href="mailto:dropbear@ucc.asn.au" class=""
                            moz-do-not-send="true">dropbear@ucc.asn.au</a>&gt;<br
                            class="">
                          <b class="">Subject:</b><span
                            class="x_Apple-converted-space"> </span>SSH
                          key exchange fails 30-70% of the time on
                          Netgear X4S R7800</font>
                        <div class=""> </div>
                      </div>
                      <div dir="auto" class="">
                        <div dir="ltr" class="">Including mailing list
                          for my last two messages below...<br class="">
                          <div dir="ltr" class=""><br class="">
                            Begin forwarded message:<br class="">
                            <br class="">
                          </div>
                          <blockquote type="cite" class="">
                            <div dir="ltr" class=""><b class="">From:</b><span
                                class="x_Apple-converted-space"> </span>Horshack
                              ‪‬ &lt;<a href="mailto:horshack@live.com"
                                class="" moz-do-not-send="true">horshack@live.com</a>&gt;<br
                                class="">
                              <b class="">Date:</b><span
                                class="x_Apple-converted-space"> </span>March
                              21, 2020 at 7:35:18 AM PDT<br class="">
                              <b class="">To:</b><span
                                class="x_Apple-converted-space"> </span>Matt
                              Johnston &lt;<a
                                href="mailto:matt@ucc.asn.au" class=""
                                moz-do-not-send="true">matt@ucc.asn.au</a>&gt;<br
                                class="">
                              <b class="">Cc:</b><span
                                class="x_Apple-converted-space"> </span>"<a
                                href="mailto:dropbear@ucc.asn.au"
                                class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>"
                              &lt;<a href="mailto:dropbear@ucc.asn.au"
                                class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>&gt;<br
                                class="">
                              <b class="">Subject:</b><span
                                class="x_Apple-converted-space"> </span><b
                                class="">Re:  SSH key exchange fails
                                30-70% of the time on Netgear X4S R7800</b><br
                                class="">
                              <br class="">
                            </div>
                          </blockquote>
                          <blockquote type="cite" class="">
                            <div dir="ltr" class="">
                              <div class=""
                                style="font-family:Calibri,Helvetica,sans-serif;
                                font-size:12pt">Disassembly of
                                fast_s_mp_sqr() and other libtommath
                                functions reveals that gcc is using the
                                ARM NEON SIMD instructions and registers
                                for calculations involving libtommath's
                                mp_word scalar. Based on the 64-bit word
                                corruption I see, I'm guessing the SIMD
                                registers aren't being preserved/restored
                                properly somewhere, probably during a
                                context switch, specifically s16–s31
                                (d8–d15, q4–q7), which AAPCS says must
                                be preserved and which I see being used
                                in the disassembly of fast_s_mp_sqr().
                                I'll write some test code later today to
                                see if this is the case, and if so, try
                                to track down where and why the registers
                                aren't being preserved.<br class="">
                              </div>
                              <div class="">
                                <div class=""
                                  style="font-family:Calibri,Helvetica,sans-serif;
                                  font-size:12pt"><br class="">
                                </div>
                                <hr tabindex="-1" class=""
                                  style="display:inline-block;
                                  width:610.53125px">
                                <div id="x_x_divRplyFwdMsg" dir="ltr"
                                  class=""><font class=""
                                    style="font-size:11pt"
                                    face="Calibri, sans-serif"><b
                                      class="">From:</b><span
                                      class="x_Apple-converted-space"> </span>Horshack
                                    ‪‬ &lt;<a
                                      href="mailto:horshack@live.com"
                                      class="" moz-do-not-send="true">horshack@live.com</a>&gt;<br
                                      class="">
                                    <b class="">Sent:</b><span
                                      class="x_Apple-converted-space"> </span>Saturday,
                                    March 21, 2020 1:11 AM<br class="">
                                    <b class="">To:</b><span
                                      class="x_Apple-converted-space"> </span>Matt
                                    Johnston &lt;<a
                                      href="mailto:matt@ucc.asn.au"
                                      class="" moz-do-not-send="true">matt@ucc.asn.au</a>&gt;<br
                                      class="">
                                    <b class="">Cc:</b><span
                                      class="x_Apple-converted-space"> </span><a
                                      href="mailto:dropbear@ucc.asn.au"
                                      class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>
                                    &lt;<a
                                      href="mailto:dropbear@ucc.asn.au"
                                      class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>&gt;<br
                                      class="">
                                    <b class="">Subject:</b><span
                                      class="x_Apple-converted-space"> </span>Re:
                                    SSH key exchange fails 30-70% of the
                                    time on Netgear X4S R7800</font>
                                  <div class=""> </div>
                                </div>
                                <div dir="ltr" class="">
                                  <div class=""
                                    style="font-family:Calibri,Helvetica,sans-serif;
                                    font-size:12pt">
                                    <div class=""
                                      style="font-family:Calibri,Helvetica,sans-serif;
                                      font-size:12pt">I have one of the
                                      failure paths isolated down to a
                                      single corrupt 64-bit word in
                                      memory, which required a
                                      significant amount of code
                                      instrumentation to achieve. I
                                      implemented a code execution
                                      history buffer that gets filled at
                                      various checkpoints within
                                      s_mp_exptmod() and some of the
                                      modules called by it. To
                                      facilitate this history mechanism
                                      I packaged all of s_mp_exptmod()'s
                                      local variables inside a
                                      structure. Recording a checkpoint
                                      consists of saving the local
                                      scalar vars in addition to CRC32s
                                      of all the mp_int data structures,
                                      with a separate CRC32 of the
                                      mp_int.dp payload (data). When a
                                      failure occurs, i.e. one or more
                                      of the three back-to-back debug
                                      invocations of s_mp_exptmod()
                                      yields a mismatching signed key
                                      result, I dump out the history
                                      elements for each of the
                                      invocations to determine the first
                                      code checkpoint where the failing
                                      invocation departed from the known
                                      correct invocation.<br class="">
                                    </div>
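                                    <div class="">To illustrate the
                                      idea only (this is a Python sketch,
                                      not the instrumentation added to
                                      the libtommath code, and the names
                                      checkpoint and first_divergence
                                      are invented here): each
                                      checkpoint records the scalars
                                      plus a CRC32 per buffer, and two
                                      histories are then compared entry
                                      by entry.</div>
<pre>
```python
import zlib


def checkpoint(history, label, scalars, buffers):
    """Append one history entry: scalar values plus a CRC32 per named buffer."""
    crcs = {name: zlib.crc32(data) for name, data in buffers.items()}
    history.append((label, dict(scalars), crcs))


def first_divergence(good, bad):
    """Return the index of the first entry where two histories differ, else None."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    return None


if __name__ == "__main__":
    good, bad = [], []
    checkpoint(good, "after_sqr", {"ix": 3}, {"dp": b"\x01\x02\x03\x04"})
    checkpoint(bad, "after_sqr", {"ix": 3}, {"dp": b"\x01\x02\x03\x05"})
    print("first differing checkpoint:", first_divergence(good, bad))
```
</pre>
                                    <div class="">Storing checksums
                                      instead of full buffer copies
                                      keeps each entry small, at the
                                      cost of only knowing that a buffer
                                      changed, not which word in it.</div>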
                                  </div>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                    </div>
                  </div>
                </blockquote>
                <br class="">
              </div>
              <div>*snipped*</div>
              <div><br class="">
              </div>
              <br class="">
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>