<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I can exclude the NEON code for DD-WRT in Dropbear if it helps, but
it would be better to nail down the problem. Otherwise other
programs would likely be affected too.<br>
</p>
<div class="moz-cite-prefix">On 28.03.2020 at 21:06, Horshack
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BY5PR13MB33304958232D0035516D7CDBA4CD0@BY5PR13MB3330.namprd13.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;}</style>
<div style="font-family: Calibri, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
As a postscript, I was able to refine the logic to produce the
corrupted result almost instantaneously. I'm also able to get it
to fail with an all-zero input dataset and a bitwise OR
operation instead of the original squaring multiplication
operations, which lets me see what the corrupted loads actually
contain. The result is very interesting - sometimes the corrupted
data is valid ARM instructions, other times valid kernel-space
addresses, so it seems clear this is an addressing problem. Also
interesting is that I'll see just one or a few corrupted words,
which implies the corruption is in the interface between DCACHE
and the processor rather than an errant fetch of a line into DCACHE
from memory (otherwise the entire DCACHE line would hold corrupt
data). You can see a sample of the failure output here: <a
href="https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt"
moz-do-not-send="true">
https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt</a><br>
</div>
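<div>
<p>The detection idea described above can be sketched as follows: with an
all-zero dataset and a bitwise OR accumulator, any nonzero word read back is
by construction a corrupted load, and its value can be inspected directly.
This is only a minimal Python illustration of the check. The function names
and the 32-bit little-endian word size are assumptions made for the example;
the actual test in the repository is native ARM code.</p>
<pre>
```python
WORD_SIZE = 4  # assumed 32-bit words, for illustration only


def or_accumulate(buf):
    """OR together every word in buf; stays 0 if every load was clean."""
    acc = 0
    for offset in range(0, len(buf), WORD_SIZE):
        acc |= int.from_bytes(buf[offset:offset + WORD_SIZE], "little")
    return acc


def find_corrupt_words(buf):
    """Return (word_index, value) for each nonzero word in a buffer
    that is expected to contain only zeros."""
    corrupt = []
    for offset in range(0, len(buf), WORD_SIZE):
        value = int.from_bytes(buf[offset:offset + WORD_SIZE], "little")
        if value != 0:
            corrupt.append((offset // WORD_SIZE, value))
    return corrupt


if __name__ == "__main__":
    damaged = bytearray(64)  # all-zero input dataset
    # Simulate one bad load; the value here is arbitrary.
    damaged[8:12] = (0xE59F0004).to_bytes(4, "little")
    for index, value in find_corrupt_words(damaged):
        print("word", index, "holds", hex(value))
```
</pre>
<p>Because the expected result is always zero, the corrupted values are
visible as-is instead of being hidden inside a multiplication result.</p>
</div>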
<div style="font-family: Calibri, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Finally, to exclude any possibility that the issue is caused by
kernel code running and corrupting register sets or memory
(such as an interrupt routine), I ported the test to a kernel
module and ran the logic within a local_irq_disable() block,
which disables both preemption and interrupts on the core. It still
fails. I created a separate repository for the kernel module
version here:
<a
href="https://github.com/horshack-dpreview/ipq8065-sqrbug-driver"
moz-do-not-send="true">https://github.com/horshack-dpreview/ipq8065-sqrbug-driver</a><br>
</div>
<div style="font-family: Calibri, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b>
Horshack <a class="moz-txt-link-rfc2396E" href="mailto:horshack@live.com">&lt;horshack@live.com&gt;</a><br>
<b>Sent:</b> Tuesday, March 24, 2020 9:25 PM<br>
<b>To:</b> Sebastian Gottschall
<a class="moz-txt-link-rfc2396E" href="mailto:s.gottschall@dd-wrt.com">&lt;s.gottschall@dd-wrt.com&gt;</a>; <a class="moz-txt-link-abbreviated" href="mailto:dropbear@ucc.asn.au">dropbear@ucc.asn.au</a>
<a class="moz-txt-link-rfc2396E" href="mailto:dropbear@ucc.asn.au">&lt;dropbear@ucc.asn.au&gt;</a><br>
<b>Subject:</b> Re: SSH key exchange fails 30-70% of the
time on Netgear X4S R7800</font>
<div> </div>
</div>
<div dir="ltr">
<div style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
I excluded context switches as a possible culprit by looping
until a corruption happened for which no context switches
occurred while the test was running (i.e., at the start of the
test I would save the number of involuntary/voluntary context
switches from /proc/&lt;pid&gt;/status, then check those
counts again after the failure; if they were different I
restarted the test and kept looping until a failure happened
in which the context-switch counts were the same).<br>
</div>
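<div>
<p>The context-switch check described above could look roughly like the
following Python sketch. It relies on the voluntary_ctxt_switches and
nonvoluntary_ctxt_switches fields that Linux reports in
/proc/&lt;pid&gt;/status. The function names, the run_test callable and the
max_attempts limit are hypothetical and only stand in for the real test
utility.</p>
<pre>
```python
CTXT_KEYS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_ctxt_switches(status_text):
    """Extract the two context-switch counters from the text of a
    /proc/PID/status file."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CTXT_KEYS:
            counts[key] = int(value)
    return counts


def read_ctxt_switches(pid="self"):
    """Read the counters for a process (default: the caller)."""
    with open("/proc/{}/status".format(pid)) as status_file:
        return parse_ctxt_switches(status_file.read())


def run_until_clean_failure(run_test, max_attempts=1000):
    """Repeat run_test() until it fails with unchanged counters.

    run_test is a hypothetical callable that returns True when it
    detected corruption. Returns the counters seen during the clean
    failure, or None if no such failure occurred.
    """
    for _ in range(max_attempts):
        before = read_ctxt_switches()
        failed = run_test()
        after = read_ctxt_switches()
        if failed and before == after:
            return after
    return None
```
</pre>
<p>A failure that is returned by this loop happened while neither counter
changed, so it cannot be blamed on register state lost across a context
switch.</p>
</div>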
<div>
<div style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font
style="font-size:11pt" face="Calibri, sans-serif"
color="#000000"><b>From:</b>
<a class="moz-txt-link-abbreviated" href="mailto:dropbear-bounces+horshack=live.com@ucc.asn.au">dropbear-bounces+horshack=live.com@ucc.asn.au</a>
<a class="moz-txt-link-rfc2396E" href="mailto:dropbear-bounces+horshack=live.com@ucc.asn.au">&lt;dropbear-bounces+horshack=live.com@ucc.asn.au&gt;</a> on
behalf of Sebastian Gottschall
<a class="moz-txt-link-rfc2396E" href="mailto:s.gottschall@dd-wrt.com">&lt;s.gottschall@dd-wrt.com&gt;</a><br>
<b>Sent:</b> Tuesday, March 24, 2020 9:13 PM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:dropbear@ucc.asn.au">dropbear@ucc.asn.au</a>
<a class="moz-txt-link-rfc2396E" href="mailto:dropbear@ucc.asn.au">&lt;dropbear@ucc.asn.au&gt;</a><br>
<b>Subject:</b> Re: SSH key exchange fails 30-70% of the
time on Netgear X4S R7800</font>
<div> </div>
</div>
<div>
<div class="x_x_moz-text-html" lang="x-unicode">
<p style="margin-top: 0px; margin-bottom: 0px;">If the
corruption is caused by a context switch, the problem
may be caused by the kernel.<br>
Try disabling
"CONFIG_KERNEL_MODE_NEON" <br>
in the kernel config. This will disable some kernel
crypto assembly code.<br>
</p>
<div class="x_x_moz-cite-prefix">On 24.03.2020 at 16:11,
Matt Johnston wrote:<br>
</div>
<blockquote type="cite">
<div class="">Good work narrowing down a test case
there.</div>
<div class="">That's an interesting finding - I guess
it might be worth posting on OpenWRT lists/forum to
try to find other testers.</div>
<div class="">Could it be power related if the tight
multiplication loop is stressing it somehow? It
doesn't seem to be using the Neon instructions for
anything apart from loads/stores though - is there
something that the compiler should be doing when mixing
Neon and non-Neon operations?</div>
<div class=""><br class="">
</div>
<div class="">
<div>Cheers,</div>
<div>Matt</div>
</div>
<div><br class="">
</div>
<div>(Your emails got held up being over 100kB, I've
trimmed the reply below and let them through.
Apologies to everyone for the stale old one that got
let through with them just now, I wasn't looking
closely)</div>
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Tue 24/3/2020, at 11:23 am,
Horshack &lt;<a
href="mailto:horshack@live.com" class=""
moz-do-not-send="true">horshack@live.com</a>&gt;
wrote:</div>
<br class="x_x_Apple-interchange-newline">
<div class="">
<div class="" style="font-style:normal;
font-variant-caps:normal; font-weight:normal;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">
I was able to isolate the issue to just a
handful of assembly instructions within
fast_s_mp_sqr(), related to the squaring loop.
I broke that code out into a separate utility
that reproduces the issue within a few
seconds. The failure is somewhat sensitive to
the data pattern and very sensitive to timing,
indicating a likely memory/data path issue
within my particular router. I'm guessing it's
the IPQ8065 and not the SDRAM because I can
get it to fail with a tiny data set that easily
fits within DCACHE. I can alter the frequency
of the failure with a single ARM memory
barrier instruction, which at first implied a
superscalar data ordering condition, but the
memory barrier also alters the timing through
the DCACHE, so that is likely the effect it's
having. I was able to exclude VFP/Neon
register corruption as the cause with some
test code. I also excluded any context
switch-specific issue by measuring the number of
context switches in /proc/&lt;pid&gt;/status
and catching a failure where no switches had
occurred. I also modified the affinity so the
utility runs on just one processor to rule out
a specific core having the issue.<br class="">
</div>
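<div class="">
<p>One way to check whether the failure depends on a particular core is to
pin the test to each core in turn. The Python sketch below does this with
os.sched_setaffinity(), which is available on Linux only. The run_test
callable is a hypothetical stand-in for the real utility, and this is not
necessarily how the affinity was set in the original test.</p>
<pre>
```python
import os


def run_on_each_cpu(run_test):
    """Run run_test() once per allowed CPU, pinned to that CPU.

    run_test is a hypothetical callable standing in for the real test.
    Returns a dict mapping CPU number to the value run_test returned.
    """
    original = os.sched_getaffinity(0)  # 0 means the calling process
    results = {}
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})
            results[cpu] = run_test()
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
    return results


if __name__ == "__main__":
    # Placeholder test that only reports which CPUs it may run on.
    print(run_on_each_cpu(lambda: sorted(os.sched_getaffinity(0))))
```
</pre>
<p>If the test fails on every core when pinned this way, a single defective
core is an unlikely explanation.</p>
</div>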
<div class="" style="font-style:normal;
font-variant-caps:normal; font-weight:normal;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">
<br class="">
</div>
<div class="" style="font-style:normal;
font-variant-caps:normal; font-weight:normal;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">
I put the source and binary of my utility on
github - if anyone on this mailing list has
this model router can you give it a try if
possible? You only need the ipq8065-sqrbug
(binary) and run-ipq8065-sqrbug.sh (script).
Here's the link to the repository:<span
class="x_x_Apple-converted-space"> </span><a
href="https://github.com/horshack-dpreview/ipq8065-sqrbug" class=""
moz-do-not-send="true">https://github.com/horshack-dpreview/ipq8065-sqrbug</a><br
class="">
</div>
<div class="" style="font-style:normal;
font-variant-caps:normal; font-weight:normal;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">
<br class="">
</div>
<div class="" style="font-family:Helvetica;
font-size:13px; font-style:normal;
font-variant-caps:normal; font-weight:normal;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none">
<div class=""
style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt"><br class="">
</div>
<hr tabindex="-1" class=""
style="display:inline-block;
width:620.328125px">
<div id="x_x_divRplyFwdMsg" dir="ltr" class=""><font
class="" style="font-size:11pt"
face="Calibri, sans-serif"><b class="">From:</b><span
class="x_x_Apple-converted-space"> </span>Horshack
&lt;<a href="mailto:horshack@live.com"
class="" moz-do-not-send="true">horshack@live.com</a>&gt;<br
class="">
<b class="">Sent:</b><span
class="x_x_Apple-converted-space"> </span>Saturday,
March 21, 2020 7:54 AM<br class="">
<b class="">To:</b><span
class="x_x_Apple-converted-space"> </span><a
href="mailto:dropbear@ucc.asn.au"
class="" moz-do-not-send="true">dropbear@ucc.asn.au</a><span
class="x_x_Apple-converted-space"> </span>&lt;<a
href="mailto:dropbear@ucc.asn.au"
class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>&gt;<br
class="">
<b class="">Subject:</b><span
class="x_x_Apple-converted-space"> </span>SSH
key exchange fails 30-70% of the time on
Netgear X4S R7800</font>
<div class=""> </div>
</div>
<div dir="auto" class="">
<div dir="ltr" class="">Including mailing
list for my last two messages below...<br
class="">
<div dir="ltr" class=""><br class="">
Begin forwarded message:<br class="">
<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class=""><b class="">From:</b><span
class="x_x_Apple-converted-space"> </span>Horshack
&lt;<a
href="mailto:horshack@live.com"
class="" moz-do-not-send="true">horshack@live.com</a>&gt;<br
class="">
<b class="">Date:</b><span
class="x_x_Apple-converted-space"> </span>March
21, 2020 at 7:35:18 AM PDT<br class="">
<b class="">To:</b><span
class="x_x_Apple-converted-space"> </span>Matt
Johnston &lt;<a
href="mailto:matt@ucc.asn.au"
class="" moz-do-not-send="true">matt@ucc.asn.au</a>&gt;<br
class="">
<b class="">Cc:</b><span
class="x_x_Apple-converted-space"> </span>"<a
href="mailto:dropbear@ucc.asn.au"
class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>"
&lt;<a
href="mailto:dropbear@ucc.asn.au"
class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>&gt;<br
class="">
<b class="">Subject:</b><span
class="x_x_Apple-converted-space"> </span><b
class="">Re: SSH key exchange fails
30-70% of the time on Netgear X4S
R7800</b><br class="">
<br class="">
</div>
</blockquote>
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class=""
style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">Disassembly of
fast_s_mp_sqr() and other libtommath
functions reveals gcc is utilizing
the ARM NEON SIMD instructions and
registers for calculations involved
with libtommath's mp_word scalar.
Based on the 64-bit word corruption
I see, I'm guessing the SIMD
registers aren't being
preserved/restored properly
somewhere, probably during a context
switch, specifically s16–s31
(d8–d15, q4–q7), which AAPCS says
must be preserved and which I see
being used in the disassembly of
fast_s_mp_sqr(). I'll write some
test code later today to see if this
is the case, and if so, try to track
down where and why the registers
aren't being preserved.<br class="">
</div>
<div class="">
<div class=""
style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt"><br class="">
</div>
<hr tabindex="-1" class=""
style="display:inline-block;
width:610.53125px">
<div id="x_x_x_divRplyFwdMsg"
dir="ltr" class=""><font class=""
style="font-size:11pt"
face="Calibri, sans-serif"><b
class="">From:</b><span
class="x_x_Apple-converted-space"> </span>Horshack
&lt;<a
href="mailto:horshack@live.com"
class=""
moz-do-not-send="true">horshack@live.com</a>&gt;<br
class="">
<b class="">Sent:</b><span
class="x_x_Apple-converted-space"> </span>Saturday,
March 21, 2020 1:11 AM<br
class="">
<b class="">To:</b><span
class="x_x_Apple-converted-space"> </span>Matt
Johnston &lt;<a
href="mailto:matt@ucc.asn.au"
class=""
moz-do-not-send="true">matt@ucc.asn.au</a>&gt;<br
class="">
<b class="">Cc:</b><span
class="x_x_Apple-converted-space"> </span><a
href="mailto:dropbear@ucc.asn.au" class="" moz-do-not-send="true">dropbear@ucc.asn.au</a>
&lt;<a
href="mailto:dropbear@ucc.asn.au"
class=""
moz-do-not-send="true">dropbear@ucc.asn.au</a>&gt;<br
class="">
<b class="">Subject:</b><span
class="x_x_Apple-converted-space"> </span>Re:
SSH key exchange fails 30-70% of
the time on Netgear X4S R7800</font>
<div class=""> </div>
</div>
<div dir="ltr" class="">
<div class=""
style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">
<div class=""
style="font-family:Calibri,Helvetica,sans-serif;
font-size:12pt">I have one of
the failure paths isolated
down to a single corrupt
64-bit word in memory, which
required a significant amount
of code instrumentation to
achieve. I implemented a code
execution history buffer that
gets filled at various
checkpoints within
s_mp_exptmod() and some of the
modules called by it. To
facilitate this history
mechanism I packaged all of
s_mp_exptmod()'s local
variables inside a structure; each
history entry saves the
local scalar vars in addition
to crc32's of all the mp_int
data structures with a
separate crc32 of the
mp_int.dp payload (data). When
a failure occurs, i.e. one or
more of the three back-to-back
debug invocations of
s_mp_exptmod() yields a
mismatching signed key result,
I dump out the history
elements for each of the
invocations to determine the
first code checkpoint where the
failing invocation departed
from the known correct
invocation.<br class="">
</div>
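<div class="">
<p>The history mechanism described above can be sketched in a few lines of
Python: each checkpoint records a label, the scalar locals and a CRC32 of
the digit payload, and two recorded histories are then compared entry by
entry to find the first point of departure. The entry layout, the labels
and the function names are assumptions made for this example; the original
instrumentation was written in C inside libtommath.</p>
<pre>
```python
import zlib


def checkpoint(history, label, scalars, payload):
    """Append one entry: a label, a copy of the scalar locals, and a
    CRC32 of the big-integer digit payload (bytes)."""
    history.append((label, dict(scalars), zlib.crc32(payload)))


def first_divergence(good, bad):
    """Return the index of the first checkpoint at which two histories
    differ, or None if they are identical."""
    for index, (good_entry, bad_entry) in enumerate(zip(good, bad)):
        if good_entry != bad_entry:
            return index
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None


if __name__ == "__main__":
    good, bad = [], []
    checkpoint(good, "setup", {"winsize": 5}, bytes(16))
    checkpoint(bad, "setup", {"winsize": 5}, bytes(16))
    checkpoint(good, "after_sqr", {"winsize": 5}, bytes(16))
    # Simulate a single corrupted byte in the failing run's payload.
    checkpoint(bad, "after_sqr", {"winsize": 5}, bytes(15) + b"\x01")
    index = first_divergence(good, bad)
    print("first departure at checkpoint", index, bad[index][0])
```
</pre>
<p>Storing checksums rather than full copies keeps each history entry small
while still revealing which checkpoint first saw different data.</p>
</div>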
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
<br class="">
</div>
<div>*snipped*</div>
<div><br class="">
</div>
<br class="">
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>