SSH key exchange fails 30-70% of the time on Netgear X4S R7800
Sebastian Gottschall
s.gottschall at dd-wrt.com
Sun Mar 29 05:32:23 AWST 2020
I can exclude the NEON code for DD-WRT in Dropbear if it helps, but it
would be better to nail down the problem; otherwise other programs would
likely be affected too.
On 28.03.2020 at 21:06, Horshack wrote:
> As a postscript, I was able to refine the logic to produce the
> corrupted result almost instantaneously. I'm also able to get it to
> fail with an all-zero input dataset and a bitwise OR operation instead
> of the original squaring multiplication operations, which allows me to
> see what the actual corrupted loads are. The result is very interesting:
> sometimes the corrupted data is valid ARM instructions, other times
> valid kernel-space addresses, so it seems clear this is an addressing
> problem. Also interesting is how I'll see just one or a few corrupted
> words, which implies the corruption is in the interface between DCACHE
> and the processor rather than errant fetch of a line into DCACHE from
> memory (otherwise the entire DCACHE line would hold corrupt data). You
> can see a sample of the failure output here:
> https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt
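A minimal user-space sketch of the all-zero/bitwise-OR idea, assuming a plain C loop rather than the actual code in the ipq8065-sqrbug repository (the function name `scan_zero_buffer`, the 64-word buffer size, and the pass count are hypothetical). Every word ORed into the accumulator should be zero, so any nonzero result is, by construction, data that was never in the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define WORDS 64 /* hypothetical size: 512 bytes, small enough to stay in L1 DCACHE */

/* Hypothetical check: OR every word of an all-zero buffer into an accumulator.
 * On correctly working hardware the result is always 0, so any nonzero value
 * is the bit pattern of one or more corrupted loads. Returns the number of
 * passes that produced a nonzero accumulator. */
static unsigned long scan_zero_buffer(const volatile uint64_t *buf, size_t n,
                                      unsigned long passes)
{
    assert(buf != NULL || n == 0);
    unsigned long failures = 0;
    for (unsigned long p = 0; p < passes; p++) {
        uint64_t acc = 0;
        for (size_t i = 0; i < n; i++)
            acc |= buf[i]; /* volatile forces a real load on every pass */
        if (acc != 0) {
            failures++;
            printf("pass %lu: corrupted load pattern 0x%016llx\n", p,
                   (unsigned long long)acc);
        }
    }
    return failures;
}
```

Because the accumulator starts at zero and OR can only set bits, the printed value is exactly the union of the bits returned by bad loads, which is what makes recognizable ARM instruction words or kernel addresses stand out. On healthy hardware, calling it on a zeroed `uint64_t` array of `WORDS` elements should report zero failing passes.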
>
> Finally, to exclude any possibility that the issue is caused by
> kernel code running and corrupting register sets/memory (such as an
> interrupt routine), I ported the test to a kernel module and ran the
> logic within a local_irq_disable() block, which disables both
> preemption and interrupts on the core. Still fails. I created a
> separate repository for the kernel module version here:
> https://github.com/horshack-dpreview/ipq8065-sqrbug-driver
>
> ------------------------------------------------------------------------
> *From:* Horshack <horshack at live.com>
> *Sent:* Tuesday, March 24, 2020 9:25 PM
> *To:* Sebastian Gottschall <s.gottschall at dd-wrt.com>;
> dropbear at ucc.asn.au <dropbear at ucc.asn.au>
> *Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear
> X4S R7800
> I excluded context switches as a possible culprit by looping until a
> corruption happened with no context switches occurring while the
> test was running. At the start of the test I would save the number of
> voluntary and involuntary context switches from /proc/<pid>/status, then
> check those counts again after the failure; if they were different, I
> restarted the test and kept looping until a failure happened in which
> the context-switch counts were the same.
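The context-switch filter described above could be sketched in C roughly as follows. `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` are the field names Linux uses in /proc/<pid>/status; the struct and helper names are hypothetical, and this is not the test code from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Context-switch counters as reported by the kernel in /proc/<pid>/status. */
struct ctx_counts {
    long voluntary;
    long nonvoluntary;
};

/* Parse voluntary_ctxt_switches and nonvoluntary_ctxt_switches from a status
 * file such as "/proc/self/status". Returns 0 on success, -1 if the file
 * cannot be opened or either field is missing. */
static int read_ctx_counts(const char *status_path, struct ctx_counts *out)
{
    assert(status_path != NULL && out != NULL);
    FILE *f = fopen(status_path, "r");
    if (!f)
        return -1;
    char line[256];
    int found = 0;
    out->voluntary = out->nonvoluntary = -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "voluntary_ctxt_switches: %ld", &out->voluntary) == 1)
            found |= 1;
        else if (sscanf(line, "nonvoluntary_ctxt_switches: %ld",
                        &out->nonvoluntary) == 1)
            found |= 2;
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}

/* A failure only counts as "switch-free" if neither counter moved. */
static int no_switches_between(const struct ctx_counts *before,
                               const struct ctx_counts *after)
{
    return before->voluntary == after->voluntary &&
           before->nonvoluntary == after->nonvoluntary;
}
```

Read the counts into one struct before the test and into another after a failure; keep the failure only if `no_switches_between()` returns true, otherwise restart the loop.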
>
> ------------------------------------------------------------------------
> *From:* dropbear-bounces+horshack=live.com at ucc.asn.au
> <dropbear-bounces+horshack=live.com at ucc.asn.au> on behalf of Sebastian
> Gottschall <s.gottschall at dd-wrt.com>
> *Sent:* Tuesday, March 24, 2020 9:13 PM
> *To:* dropbear at ucc.asn.au <dropbear at ucc.asn.au>
> *Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear
> X4S R7800
>
> If the corruption is caused by a context switch, the problem could be
> caused by the kernel.
> Try disabling "CONFIG_KERNEL_MODE_NEON"
> in the kernel config. This will disable some kernel crypto assembly code.
>
> On 24.03.2020 at 16:11, Matt Johnston wrote:
>> Good work narrowing down a test case there.
>> That's an interesting finding - I guess it might be worth posting on
>> OpenWrt lists/forum to try to find other testers.
>> Could it be power related if the tight multiplication loop is
>> stressing it somehow? It doesn't seem to be using Neon
>> instructions for anything apart from loads/stores, though. Is there
>> something the compiler should be doing when mixing Neon and non-Neon
>> operations?
>>
>> Cheers,
>> Matt
>>
>> (Your emails got held up for being over 100kB; I've trimmed the reply
>> below and let them through. Apologies to everyone for the stale old
>> one that got let through with them just now; I wasn't looking closely.)
>>
>>> On Tue 24/3/2020, at 11:23 am, Horshack <horshack at live.com> wrote:
>>>
>>> I was able to isolate the issue to just a handful of assembly
>>> instructions within fast_s_mp_sqr(), related to the squaring loop. I
>>> broke that code out into a separate utility that reproduces the
>>> issue within a few seconds. The failure is somewhat sensitive to the
>>> data pattern and very sensitive to timing, indicating a likely
>>> memory/data path issue within my particular router. I'm guessing
>>> it's the IPQ8065 and not the SDRAM because I can get it to fail with
>>> a tiny data set that easily fits within DCACHE. I can alter the
>>> frequency of the failure with a single ARM memory barrier instruction,
>>> which at first implied a superscalar data ordering condition, but the
>>> memory barrier also alters the timing through the DCACHE, so that is
>>> likely the effect it's having. I was able to exclude VFP/Neon
>>> register corruption as the cause with some test code. I also
>>> excluded any context-switch-specific issue by measuring the number of
>>> context switches in /proc/<pid>/status and catching a failure where
>>> no switches had occurred. I also modified the affinity so the
>>> utility runs on just one processor to rule out a specific core
>>> having the issue.
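Pinning to one core, as described above, can be done with the Linux `sched_setaffinity()` call (the `taskset` utility does the same from a shell). A minimal sketch, assuming glibc; the helper names `pin_to_cpu` and `first_allowed_cpu` are hypothetical and not from the ipq8065-sqrbug utility.

```c
#define _GNU_SOURCE /* must precede all includes for cpu_set_t / CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single CPU so that every iteration of the
 * test runs on the same core. Returns 0 on success, -1 on error. */
static int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set); /* pid 0 = calling thread */
}

/* Return the lowest-numbered CPU the calling thread may currently run on,
 * or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Running the reproduction pinned to each allowed CPU in turn, and seeing it fail on every one, is what argues against a single defective core.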
>>>
>>> I put the source and binary of my utility on github - if anyone on
>>> this mailing list has this model router can you give it a try if
>>> possible? You only need the ipq8065-sqrbug (binary) and
>>> run-ipq8065-sqrbug.sh (script). Here's the link to the
>>> repository: https://github.com/horshack-dpreview/ipq8065-sqrbug
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Horshack <horshack at live.com>
>>> *Sent:* Saturday, March 21, 2020 7:54 AM
>>> *To:* dropbear at ucc.asn.au
>>> *Subject:* SSH key exchange fails 30-70% of the time on Netgear X4S
>>> R7800
>>> Including the mailing list on my last two messages below...
>>>
>>> Begin forwarded message:
>>>
>>>> *From:* Horshack <horshack at live.com>
>>>> *Date:* March 21, 2020 at 7:35:18 AM PDT
>>>> *To:* Matt Johnston <matt at ucc.asn.au>
>>>> *Cc:* dropbear at ucc.asn.au
>>>> *Subject:* Re: SSH key exchange fails 30-70% of the time on
>>>> Netgear X4S R7800
>>>>
>>>>
>>>> Disassembly of fast_s_mp_sqr() and other libtommath functions
>>>> reveals gcc is utilizing the ARM NEON SIMD instructions and
>>>> registers for calculations involving libtommath's mp_word
>>>> scalar. Based on the 64-bit word corruption I see, I'm guessing the
>>>> SIMD registers aren't being preserved/restored properly somewhere,
>>>> probably during a context switch, specifically s16–s31 (d8–d15,
>>>> q4–q7), which AAPCS says must be preserved and which I see being
>>>> used in the disassembly of fast_s_mp_sqr(). I'll write some test
>>>> code later today to see if this is the case, and if so, try to
>>>> track down where and why the registers aren't being preserved.
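To make the mp_word arithmetic concrete, here is a simplified column-wise squaring loop loosely in the spirit of libtommath's default 32-bit configuration, assumed here to be 28-bit digits stored in a 32-bit mp_digit with a 64-bit mp_word accumulator. It is a hypothetical illustration, not fast_s_mp_sqr() itself; whether gcc actually keeps the 64-bit accumulator in NEON/VFP registers depends on compiler version and flags and can only be confirmed by disassembling the build (e.g. with `objdump -d`). The names `sqr_columns` and `sqr_is_stable` are hypothetical; the latter mirrors the idea of repeating the same computation and comparing results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGIT_BITS 28u /* assumption: 28-bit digits in a 32-bit word */
#define DIGIT_MASK ((UINT32_C(1) << DIGIT_BITS) - 1)
#define MAX_DIGITS 64  /* keeps each column sum well below 2^64 */

/* Column-wise squaring: r[0..2n-1] = a[0..n-1]^2 in base 2^28.
 * Column k sums a[i]*a[k-i] over all valid i into a 64-bit accumulator,
 * the kind of 64-bit scalar arithmetic a compiler may place in NEON/VFP
 * registers on 32-bit ARM. */
static void sqr_columns(const uint32_t *a, size_t n, uint32_t *r)
{
    assert(n > 0 && n <= MAX_DIGITS);
    uint64_t carry = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        uint64_t acc = carry;
        size_t lo = k < n ? 0 : k - n + 1; /* ensures k - i < n */
        for (size_t i = lo; i <= k && i < n; i++)
            acc += (uint64_t)a[i] * a[k - i];
        r[k] = (uint32_t)(acc & DIGIT_MASK);
        carry = acc >> DIGIT_BITS;
    }
}

/* Square the same input `runs` times and report whether every result
 * matched the first one. On healthy hardware this always returns 1. */
static int sqr_is_stable(const uint32_t *a, size_t n, int runs)
{
    uint32_t first[2 * MAX_DIGITS], again[2 * MAX_DIGITS];
    sqr_columns(a, n, first);
    for (int i = 1; i < runs; i++) {
        sqr_columns(a, n, again);
        if (memcmp(first, again, 2 * n * sizeof first[0]) != 0)
            return 0;
    }
    return 1;
}
```

Comparing repeated runs rather than checking against a precomputed answer keeps the harness independent of the arithmetic's correctness: any mismatch between two runs of identical input points at the platform, not the algorithm.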
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Horshack <horshack at live.com>
>>>> *Sent:* Saturday, March 21, 2020 1:11 AM
>>>> *To:* Matt Johnston <matt at ucc.asn.au>
>>>> *Cc:* dropbear at ucc.asn.au
>>>> *Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear
>>>> X4S R7800
>>>> I have one of the failure paths isolated down to a single corrupt
>>>> 64-bit word in memory, which required a significant amount of code
>>>> instrumentation to achieve. I implemented a code execution history
>>>> buffer that gets filled at various checkpoints within
>>>> s_mp_exptmod() and some of the modules called by it. To facilitate
>>>> this history mechanism, I packaged all of s_mp_exptmod()'s local
>>>> variables inside a structure; the history saves the local
>>>> scalar variables in addition to crc32s of all the mp_int data
>>>> structures, with a separate crc32 of the mp_int.dp payload (data).
>>>> When a failure occurs, i.e., one or more of the three back-to-back
>>>> debug invocations of s_mp_exptmod() yields a mismatching signed key
>>>> result, I dump out the history elements for each of the
>>>> invocations to determine the first code checkpoint where the failing
>>>> invocation departed from the known correct invocation.
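A rough sketch of the checkpoint-history technique described above, assuming a single CRC-32 per checkpoint rather than the full per-variable structure described in the message; `struct history`, `record()`, and `first_divergence()` are hypothetical names. The CRC is the standard reflected CRC-32 (polynomial 0xEDB88320, as used by zlib's `crc32()`), computed bitwise for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_buf(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#define MAX_CHECKPOINTS 256 /* hypothetical history depth */

/* One history entry: which checkpoint was reached and a CRC of the state
 * that mattered there (for example an mp_int's dp payload). */
struct checkpoint {
    int id;
    uint32_t crc;
};

struct history {
    size_t count;
    struct checkpoint entries[MAX_CHECKPOINTS];
};

/* Record a checkpoint; entries beyond MAX_CHECKPOINTS are dropped. */
static void record(struct history *h, int id, const void *state, size_t len)
{
    assert(h != NULL && (state != NULL || len == 0));
    if (h->count < MAX_CHECKPOINTS) {
        h->entries[h->count].id = id;
        h->entries[h->count].crc = crc32_buf(state, len);
        h->count++;
    }
}

/* Index of the first entry where two runs diverge, or -1 if identical. */
static long first_divergence(const struct history *a, const struct history *b)
{
    size_t n = a->count < b->count ? a->count : b->count;
    for (size_t i = 0; i < n; i++)
        if (a->entries[i].id != b->entries[i].id ||
            a->entries[i].crc != b->entries[i].crc)
            return (long)i;
    return a->count == b->count ? -1 : (long)n;
}
```

Recording into one `struct history` per invocation (three back-to-back runs in the setup above) and calling `first_divergence()` on a good run and a bad run gives the index of the first checkpoint worth inspecting.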
>>
>> *snipped*
>>
>>