<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-text-html" lang="x-unicode">
      <p>If the corruption is caused by a context switch, the kernel
        may be at fault.<br>
        Try disabling "CONFIG_KERNEL_MODE_NEON"<br>
        in the kernel config. This disables some of the kernel's crypto
        assembly code, which uses NEON in kernel mode.<br>
      </p>
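      <p>One way to check whether the running kernel has this option
        enabled is sketched below. It assumes the kernel was built with
        CONFIG_IKCONFIG_PROC so that /proc/config.gz exists; if it does
        not, look in the .config used for the build instead. The helper
        names are illustrative only.</p>
<pre>
```python
import gzip


def kernel_option_state(config_text, option):
    """Return 'y', 'm', another literal value, 'n', or None if not mentioned."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# " + option + " is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config(path="/proc/config.gz"):
    # Assumption: this file only exists with CONFIG_IKCONFIG_PROC enabled.
    with gzip.open(path, "rt") as f:
        return f.read()


if __name__ == "__main__":
    try:
        text = read_running_config()
    except OSError:
        print("no /proc/config.gz; check the build's .config instead")
    else:
        print(kernel_option_state(text, "CONFIG_KERNEL_MODE_NEON"))
```
</pre>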
      <div class="moz-cite-prefix">On 24.03.2020 at 16:11, Matt
        Johnston wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:1E0D44CA-5341-4C3A-924E-8C6BF850B64A@ucc.asn.au">
        <div class="">Good work narrowing down a test case there.</div>
        <div class="">That's an interesting finding - I guess it might
          be worth posting on the OpenWRT lists/forum to try to find
          other testers.</div>
        <div class="">Could it be power related if the tight
          multiplication loop is stressing it somehow? It doesn't seem
          to be using the Neon instruction for anything apart from
          loads/stores though - is there something that the compiler
          should be doing mixing Neon and non-Neon operations?</div>
        <div class=""><br class="">
        </div>
        <div class="">
          <div>Cheers,</div>
          <div>Matt</div>
        </div>
        <div><br class="">
        </div>
        <div>(Your emails got held up being over 100kB, I've trimmed the
          reply below and let them through. Apologies to everyone for
          the stale old one that got let through with them just now, I
          wasn't looking closely)</div>
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On Tue 24/3/2020, at 11:23 am, Horshack
              &lt;<a href="mailto:horshack@live.com" class="">horshack@live.com</a>&gt;
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div style="font-style: normal; font-variant-caps: normal;
                font-weight: normal; letter-spacing: normal; text-align:
                start; text-indent: 0px; text-transform: none;
                white-space: normal; word-spacing: 0px;
                -webkit-text-stroke-width: 0px; text-decoration: none;
                font-family: Calibri, Helvetica, sans-serif; font-size:
                12pt;" class="">I was able to isolate the issue to just
                a handful of assembly instructions within
                fast_s_mp_sqr(), related to the squaring loop. I broke
                that code out into a separate utility that reproduces
                the issue within a few seconds. The failure is somewhat
                sensitive to the data pattern and very sensitive to
                timing, indicating a likely memory/data path issue
                within my particular router. I'm guessing it's the
                IPQ8065 and not the SDRAM because I can get it to fail
                with a tiny data set that easily fits within the DCACHE.
                I can alter the frequency of the failure with a single
                ARM memory barrier instruction, which at first implied a
                superscalar data ordering condition, but the memory
                barrier also alters the timing through the DCACHE, so
                that is likely the effect it's having. I was able to
                exclude VFP/Neon register corruption as the cause
                with some test code. I also excluded any context
                switch-specific issue by measuring the number of context
                switches in /proc/&lt;pid&gt;/status and catching a
                failure where no switches had occurred. I also modified
                the affinity so the utility runs on just one processor
                to rule out a specific core having the issue.<br
                  class="">
              </div>
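              <div class="">A minimal sketch of the context-switch check
                described above, written in Python for brevity. It is
                not the code from the utility itself. It reads the
                voluntary_ctxt_switches and nonvoluntary_ctxt_switches
                fields from /proc/&lt;pid&gt;/status before and after a
                piece of work, with the process pinned to one CPU. The
                function names are hypothetical, and
                os.sched_setaffinity is Linux-only.</div>
<pre>
```python
import os


def parse_ctxt_switches(status_text):
    """Return (voluntary, nonvoluntary) counts from /proc/PID/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.endswith("ctxt_switches"):
            counts[key] = int(value)
    return (counts.get("voluntary_ctxt_switches", 0),
            counts.get("nonvoluntary_ctxt_switches", 0))


def read_ctxt_switches(pid="self"):
    with open("/proc/{}/status".format(pid)) as f:
        return parse_ctxt_switches(f.read())


def run_pinned(work, cpu=0):
    """Pin this process to one CPU, run work(), and return
    (result, number of context switches that happened during work)."""
    os.sched_setaffinity(0, {cpu})  # Linux-only; 0 means the calling process
    before = read_ctxt_switches()
    result = work()
    after = read_ctxt_switches()
    return result, sum(after) - sum(before)
```
</pre>
              <div class="">If a run produces a wrong result while the
                reported switch count is 0, a context switch cannot have
                caused that particular failure.</div>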
              <div style="font-style: normal; font-variant-caps: normal;
                font-weight: normal; letter-spacing: normal; text-align:
                start; text-indent: 0px; text-transform: none;
                white-space: normal; word-spacing: 0px;
                -webkit-text-stroke-width: 0px; text-decoration: none;
                font-family: Calibri, Helvetica, sans-serif; font-size:
                12pt;" class=""><br class="">
              </div>
              <div style="font-style: normal; font-variant-caps: normal;
                font-weight: normal; letter-spacing: normal; text-align:
                start; text-indent: 0px; text-transform: none;
                white-space: normal; word-spacing: 0px;
                -webkit-text-stroke-width: 0px; text-decoration: none;
                font-family: Calibri, Helvetica, sans-serif; font-size:
                12pt;" class="">I put the source and binary of my
                utility on github - if anyone on this mailing list has
                this model router can you give it a try if possible? You
                only need the ipq8065-sqrbug (binary) and
                run-ipq8065-sqrbug.sh (script). Here's the link to the
                repository:<span class="Apple-converted-space"> </span><a
href="https://github.com/horshack-dpreview/ipq8065-sqrbug" class="">https://github.com/horshack-dpreview/ipq8065-sqrbug</a><br
                  class="">
              </div>
              <div style="font-style: normal; font-variant-caps: normal;
                font-weight: normal; letter-spacing: normal; text-align:
                start; text-indent: 0px; text-transform: none;
                white-space: normal; word-spacing: 0px;
                -webkit-text-stroke-width: 0px; text-decoration: none;
                font-family: Calibri, Helvetica, sans-serif; font-size:
                12pt;" class=""><br class="">
              </div>
              <div style="caret-color: rgb(0, 0, 0); font-family:
                Helvetica; font-size: 13px; font-style: normal;
                font-variant-caps: normal; font-weight: normal;
                letter-spacing: normal; text-align: start; text-indent:
                0px; text-transform: none; white-space: normal;
                word-spacing: 0px; -webkit-text-stroke-width: 0px;
                text-decoration: none;" class="">
                <div style="font-family: Calibri, Helvetica, sans-serif;
                  font-size: 12pt;" class=""><br class="">
                </div>
                <hr tabindex="-1" style="display: inline-block; width:
                  620.328125px;" class="">
                <div id="divRplyFwdMsg" dir="ltr" class=""><font
                    style="font-size: 11pt;" class="" face="Calibri,
                    sans-serif"><b class="">From:</b><span
                      class="Apple-converted-space"> </span>Horshack
                    &lt;<a href="mailto:horshack@live.com" class="">horshack@live.com</a>&gt;<br
                      class="">
                    <b class="">Sent:</b><span
                      class="Apple-converted-space"> </span>Saturday,
                    March 21, 2020 7:54 AM<br class="">
                    <b class="">To:</b><span
                      class="Apple-converted-space"> </span><a
                      href="mailto:dropbear@ucc.asn.au" class="">dropbear@ucc.asn.au</a><span
                      class="Apple-converted-space"> </span>&lt;<a
                      href="mailto:dropbear@ucc.asn.au" class="">dropbear@ucc.asn.au</a>&gt;<br
                      class="">
                    <b class="">Subject:</b><span
                      class="Apple-converted-space"> </span>SSH key
                    exchange fails 30-70% of the time on Netgear X4S
                    R7800</font>
                  <div class=""> </div>
                </div>
                <div dir="auto" class="">
                  <div dir="ltr" class="">Including mailing list for my
                    last two messages below...<br class="">
                    <div dir="ltr" class=""><br class="">
                      Begin forwarded message:<br class="">
                      <br class="">
                    </div>
                    <blockquote type="cite" class="">
                      <div dir="ltr" class=""><b class="">From:</b><span
                          class="Apple-converted-space"> </span>Horshack
                        &lt;<a href="mailto:horshack@live.com"
                          class="">horshack@live.com</a>&gt;<br class="">
                        <b class="">Date:</b><span
                          class="Apple-converted-space"> </span>March
                        21, 2020 at 7:35:18 AM PDT<br class="">
                        <b class="">To:</b><span
                          class="Apple-converted-space"> </span>Matt
                        Johnston &lt;<a href="mailto:matt@ucc.asn.au"
                          class="">matt@ucc.asn.au</a>&gt;<br class="">
                        <b class="">Cc:</b><span
                          class="Apple-converted-space"> </span>"<a
                          href="mailto:dropbear@ucc.asn.au" class="">dropbear@ucc.asn.au</a>"
                        &lt;<a href="mailto:dropbear@ucc.asn.au"
                          class="">dropbear@ucc.asn.au</a>&gt;<br
                          class="">
                        <b class="">Subject:</b><span
                          class="Apple-converted-space"> </span><b
                          class="">Re:  SSH key exchange fails 30-70% of
                          the time on Netgear X4S R7800</b><br class="">
                        <br class="">
                      </div>
                    </blockquote>
                    <blockquote type="cite" class="">
                      <div dir="ltr" class="">
                        <div style="font-family: Calibri, Helvetica,
                          sans-serif; font-size: 12pt;" class="">Disassembly
                          of fast_s_mp_sqr() and other libtommath
                          functions reveals gcc is utilizing the ARM
                          NEON SIMD instructions and registers for
                          calculations involving libtommath's
                          mp_word scalar. Based on the 64-bit word
                          corruption I see, I'm guessing the SIMD
                          registers aren't being preserved/restored
                          properly somewhere, probably during a context
                          switch, specifically s16–s31 (d8–d15, q4–q7),
                          which AAPCS says must be preserved and which I
                          see being used in the disassembly of
                          fast_s_mp_sqr(). I'll write some test code
                          later today to see if this is the case, and if
                          so, try to track down where and why the
                          registers aren't being preserved.<br class="">
                        </div>
                        <div class="">
                          <div style="font-family: Calibri, Helvetica,
                            sans-serif; font-size: 12pt;" class=""><br
                              class="">
                          </div>
                          <hr tabindex="-1" style="display:
                            inline-block; width: 610.53125px;" class="">
                          <div id="x_divRplyFwdMsg" dir="ltr" class=""><font
                              style="font-size: 11pt;" class=""
                              face="Calibri, sans-serif"><b class="">From:</b><span
                                class="Apple-converted-space"> </span>Horshack
                              &lt;<a href="mailto:horshack@live.com"
                                class="">horshack@live.com</a>&gt;<br
                                class="">
                              <b class="">Sent:</b><span
                                class="Apple-converted-space"> </span>Saturday,
                              March 21, 2020 1:11 AM<br class="">
                              <b class="">To:</b><span
                                class="Apple-converted-space"> </span>Matt
                              Johnston &lt;<a
                                href="mailto:matt@ucc.asn.au" class="">matt@ucc.asn.au</a>&gt;<br
                                class="">
                              <b class="">Cc:</b><span
                                class="Apple-converted-space"> </span><a
                                href="mailto:dropbear@ucc.asn.au"
                                class="">dropbear@ucc.asn.au</a> &lt;<a
                                href="mailto:dropbear@ucc.asn.au"
                                class="">dropbear@ucc.asn.au</a>&gt;<br
                                class="">
                              <b class="">Subject:</b><span
                                class="Apple-converted-space"> </span>Re:
                              SSH key exchange fails 30-70% of the time
                              on Netgear X4S R7800</font>
                            <div class=""> </div>
                          </div>
                          <div dir="ltr" class="">
                            <div style="font-family: Calibri, Helvetica,
                              sans-serif; font-size: 12pt;" class="">
                              <div style="font-family: Calibri,
                                Helvetica, sans-serif; font-size: 12pt;"
                                class="">I have one of the failure paths
                                isolated down to a single corrupt 64-bit
                                word in memory, which required a
                                significant amount of code
                                instrumentation to achieve. I
                                implemented a code execution history
                                buffer that gets filled at various
                                checkpoints within s_mp_exptmod() and
                                some of the modules called by it. To
                                facilitate this history mechanism I
                                packaged all of s_mp_exptmod()'s local
                                variables inside a structure. Each
                                history entry saves the local scalar
                                vars plus CRC32s of all the mp_int
                                data structures, with a separate CRC32 of
                                the mp_int.dp payload (data). When a
                                failure occurs, i.e. one or more of the
                                three back-to-back debug invocations of
                                s_mp_exptmod() yields a mismatching
                                signed key result, I dump out the history
                                elements for each of the invocations to
                                determine the first code checkpoint
                                where the failing invocation departed
                                from the known correct invocation.<br
                                  class="">
                              </div>
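                              <div style="font-family: Calibri,
                                Helvetica, sans-serif; font-size: 12pt;"
                                class="">The checkpoint-history idea can
                                be sketched roughly as follows. This is
                                a toy illustration in Python, not the
                                instrumentation added to libtommath: a
                                small square-and-multiply loop stands in
                                for s_mp_exptmod(), zlib.crc32 stands in
                                for the CRC32 of each buffer, and the
                                flip_at parameter is a hypothetical way
                                to inject a single-bit error so that two
                                runs diverge.</div>
<pre>
```python
import zlib


def checkpoint(history, label, scalars, buffers):
    """Append one record: label, scalar locals, and a CRC32 per buffer."""
    crcs = {name: zlib.crc32(data) for name, data in buffers.items()}
    history.append((label, dict(scalars), crcs))


def first_divergence(good, bad):
    """Return the index of the first differing record, or None if identical."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None


def traced_modexp(base, exp, mod, flip_at=None):
    """Toy left-to-right square-and-multiply with a checkpoint per step.
    flip_at (hypothetical) flips the low bit of the result at that step."""
    history = []
    result = 1
    for step, bit in enumerate(bin(exp)[2:]):
        result = (result * result) % mod
        if bit == "1":
            result = (result * base) % mod
        if step == flip_at:
            result ^= 1  # simulated corruption
        checkpoint(history, "step%d" % step, {"bit": bit},
                   {"result": result.to_bytes(8, "little")})
    return result, history


if __name__ == "__main__":
    _, good = traced_modexp(5, 117, 1000003)
    _, bad = traced_modexp(5, 117, 1000003, flip_at=3)
    print("first differing checkpoint:", first_divergence(good, bad))
```
</pre>
                              <div style="font-family: Calibri,
                                Helvetica, sans-serif; font-size: 12pt;"
                                class="">Comparing the histories record
                                by record narrows the search to the code
                                between the last matching checkpoint and
                                the first differing one.</div>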
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
          <br class="">
        </div>
        <div>*snipped*</div>
        <div><br class="">
        </div>
        <br class="">
      </blockquote>
    </div>
  </body>
</html>