SSH key exchange fails 30-70% of the time on Netgear X4S R7800

Sebastian Gottschall s.gottschall at dd-wrt.com
Sun Mar 29 05:32:23 AWST 2020


I can exclude the NEON code for DD-WRT in dropbear if it helps, but it 
would be better to nail down the problem. Otherwise other programs 
would likely be affected too.

On 28.03.2020 at 21:06, Horshack wrote:
> As a postscript, I was able to refine the logic to produce the 
> corrupted result almost instantaneously. I'm also able to get it to 
> fail with an all-zero input dataset and a bitwise OR operation instead 
> of the original squaring multiplication operations, which allows me to 
> see what the actual corrupted loads are. The result is very interesting - 
> sometimes the corrupted data is valid ARM instructions, other times 
> valid kernel-space addresses, so it seems clear this is an addressing 
> problem. Also interesting is that I'll see just one or a few corrupted 
> words, which implies the corruption is in the interface between DCACHE 
> and the processor rather than an errant fetch of a line into DCACHE from 
> memory (otherwise the entire DCACHE line would hold corrupt data). You 
> can see a sample of the failure output here: 
> https://github.com/horshack-dpreview/ipq8065-sqrbug/blob/master/SampleFailures.txt 
>
> Finally, to exclude any possibility the issue is related to possible 
> kernel code running and corrupting register sets/memory (such as an 
> interrupt routine), I ported the test to a kernel module and ran the 
> logic within a local_irq_disable() block, which disables both 
> preemption and interrupts on the core. Still fails. I created a 
> separate repository for the kernel module version here: 
> https://github.com/horshack-dpreview/ipq8065-sqrbug-driver 
>
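A minimal sketch of the detection logic behind the all-zero/OR variant described above, in Python purely for illustration. It cannot reproduce the hardware fault and is not the code from the linked repository. The function names, the simulated bad value, and the 0xC0000000 kernel-space boundary (which assumes a 3G/1G user/kernel split on 32-bit ARM Linux) are assumptions.

```python
# Assumption: 32-bit ARM Linux with a 3G/1G user/kernel split.
KERNEL_SPACE_START = 0xC0000000


def or_reduce(words):
    """OR all loaded words together; with all-zero input this must be 0."""
    acc = 0
    for w in words:
        acc |= w
    return acc


def find_corrupted_words(loaded_words):
    """Return (index, value) for every non-zero word.

    With an all-zero dataset and OR as the only operation, any non-zero
    word can only have come from a corrupted load, and its value is the
    corrupt data itself.
    """
    return [(i, w) for i, w in enumerate(loaded_words) if w != 0]


def looks_like_kernel_address(word):
    """Rough classification of a corrupt 32-bit word (see assumption above)."""
    return word >= KERNEL_SPACE_START


# Simulated data: one hypothetical corrupted word in an all-zero buffer.
clean = [0] * 16
corrupted = list(clean)
corrupted[5] = 0xC0A81234

for index, value in find_corrupted_words(corrupted):
    kind = "kernel-space address?" if looks_like_kernel_address(value) else "other"
    print(f"word {index}: 0x{value:08x} ({kind})")
```

Because only one or a few words come back non-zero, a per-word report like this is what makes it possible to tell isolated corrupted loads apart from a whole corrupt cache line.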
> ------------------------------------------------------------------------
> *From:* Horshack <horshack at live.com>
> *Sent:* Tuesday, March 24, 2020 9:25 PM
> *To:* Sebastian Gottschall <s.gottschall at dd-wrt.com>; 
> dropbear at ucc.asn.au <dropbear at ucc.asn.au>
> *Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
> X4S R7800
> I excluded context switches as a possible culprit by looping until a 
> corruption happened for which no context switches occurred while the 
> test was running (i.e., at the start of the test I would save the # of 
> involuntary/voluntary context switches from /proc/<pid>/status, then 
> check those counts again after the failure - if they were different I 
> restarted the test and kept looping until a failure happened in which 
> the ctx switch counts were the same).
>
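A minimal sketch of the context-switch check described above, in Python for illustration only. The field names voluntary_ctxt_switches and nonvoluntary_ctxt_switches are the ones Linux reports in /proc/<pid>/status. The function names and the run_test callback (assumed to return True when a corruption was seen) are hypothetical.

```python
def parse_ctxt_switches(status_text):
    """Extract (voluntary, nonvoluntary) counts from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value.strip())
    return (counts["voluntary_ctxt_switches"],
            counts["nonvoluntary_ctxt_switches"])


def read_ctxt_switches(pid="self"):
    """Read the current counts for a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())


def run_until_clean_failure(run_test, read_counts=read_ctxt_switches,
                            max_attempts=1000):
    """Repeat the test until it fails with no context switch in between.

    Returns the 1-based attempt number of the first failure during which
    both counters stayed unchanged, or None if that never happened.
    """
    for attempt in range(1, max_attempts + 1):
        before = read_counts()
        failed = run_test()
        after = read_counts()
        if failed and before == after:
            return attempt
    return None
```

A failure caught this way cannot be blamed on register state being saved or restored incorrectly across a context switch, since none took place while the test ran.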
> ------------------------------------------------------------------------
> *From:* dropbear-bounces+horshack=live.com at ucc.asn.au 
> <dropbear-bounces+horshack=live.com at ucc.asn.au> on behalf of Sebastian 
> Gottschall <s.gottschall at dd-wrt.com>
> *Sent:* Tuesday, March 24, 2020 9:13 PM
> *To:* dropbear at ucc.asn.au <dropbear at ucc.asn.au>
> *Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
> X4S R7800
>
> If the corruption is caused by a context switch, the problem may be 
> in the kernel.
> Try disabling "CONFIG_KERNEL_MODE_NEON" in the kernel config. 
> This will disable some kernel crypto assembly code.
>
> On 24.03.2020 at 16:11, Matt Johnston wrote:
>> Good work narrowing down a test case there.
>> That's an interesting finding - I guess it might be worth posting on 
>> OpenWRT lists/forum to try to find other testers.
>> Could it be power related if the tight multiplication loop is 
>> stressing it somehow? It doesn't seem to be using the Neon 
>> instruction for anything apart from loads/stores though - is there 
>> something that the compiler should be doing mixing Neon and non-Neon 
>> operations?
>>
>> Cheers,
>> Matt
>>
>> (Your emails got held up being over 100kB, I've trimmed the reply 
>> below and let them through. Apologies to everyone for the stale old 
>> one that got let through with them just now, I wasn't looking closely)
>>
>>> On Tue 24/3/2020, at 11:23 am, Horshack <horshack at live.com> wrote:
>>>
>>> I was able to isolate the issue to just a handful of assembly 
>>> instructions within fast_s_mp_sqr(), related to the squaring loop. I 
>>> broke that code out into a separate utility that reproduces the 
>>> issue within a few seconds. The failure is somewhat sensitive to the 
>>> data pattern and very sensitive to timing, indicating a likely 
>>> memory/data path issue within my particular router. I'm guessing 
>>> it's the IPQ8065 and not the SDRAM because I can get it to fail with 
>>> a tiny data set that easily fits within DCACHE. I can alter the frequency 
>>> of the failure with a single ARM memory barrier instruction, which 
>>> at first implied a superscalar data ordering condition but the 
>>> memory barrier also alters the timing through the DCACHE so that is 
>>> likely the effect it's having. I was able to exclude the VFP/Neon 
>>> register corruption as the cause with some test code. I also 
>>> excluded any context switch-specific issue by measuring the # of 
>>> context switches in /proc/<pid>/status and catching a failure where 
>>> no switches had occurred. I also modified the affinity so the 
>>> utility runs on just one processor to rule out a specific core 
>>> having the issue.
>>>
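A minimal sketch of pinning a test to one processor at a time, as in the affinity step described above. It is Python for illustration and Linux only. os.sched_setaffinity and os.sched_getaffinity are standard library calls; the helper names and the run_test callback are hypothetical.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU and return the new mask."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def run_on_each_cpu(run_test):
    """Run run_test() once per allowed CPU, pinned to that CPU.

    Returns {cpu: result}. The original affinity mask is restored
    afterwards, even if run_test() raises.
    """
    original = os.sched_getaffinity(0)
    results = {}
    try:
        for cpu in sorted(original):
            pin_to_cpu(cpu)
            results[cpu] = run_test()
    finally:
        os.sched_setaffinity(0, original)
    return results
```

If the failure shows up regardless of which CPU the test is pinned to, a single defective core is an unlikely explanation.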
>>> I put the source and binary of my utility on github - if anyone on 
>>> this mailing list has this model router can you give it a try if 
>>> possible? You only need the ipq8065-sqrbug (binary) and 
>>> run-ipq8065-sqrbug.sh (script). Here's the link to the 
>>> repository: https://github.com/horshack-dpreview/ipq8065-sqrbug
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Horshack <horshack at live.com>
>>> *Sent:* Saturday, March 21, 2020 7:54 AM
>>> *To:* dropbear at ucc.asn.au <dropbear at ucc.asn.au>
>>> *Subject:* SSH key exchange fails 30-70% of the time on Netgear X4S 
>>> R7800
>>> Including mailing list for my last two messages below...
>>>
>>> Begin forwarded message:
>>>
>>>> *From:* Horshack <horshack at live.com>
>>>> *Date:* March 21, 2020 at 7:35:18 AM PDT
>>>> *To:* Matt Johnston <matt at ucc.asn.au>
>>>> *Cc:* dropbear at ucc.asn.au <dropbear at ucc.asn.au>
>>>> *Subject:* Re: SSH key exchange fails 30-70% of the time on 
>>>> Netgear X4S R7800
>>>>
>>>> 
>>>> Disassembly of fast_s_mp_sqr() and other libtommath functions 
>>>> reveals gcc is utilizing the ARM NEON SIMD instructions and 
>>>> registers for calculations involved with libtommath's mp_word 
>>>> scalar. Based on the 64-bit word corruption I see, I'm guessing the 
>>>> SIMD registers aren't being preserved/restored properly somewhere, 
>>>> probably during a context switch, specifically s16–s31 (d8–d15, 
>>>> q4–q7), which AAPCS says must be preserved and which I see being 
>>>> used in the disassembly of fast_s_mp_sqr(). I'll write some test 
>>>> code later today to see if this is the case, and if so, try to 
>>>> track down where and why the registers aren't being preserved.
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Horshack <horshack at live.com>
>>>> *Sent:* Saturday, March 21, 2020 1:11 AM
>>>> *To:* Matt Johnston <matt at ucc.asn.au>
>>>> *Cc:* dropbear at ucc.asn.au <dropbear at ucc.asn.au>
>>>> *Subject:* Re: SSH key exchange fails 30-70% of the time on Netgear 
>>>> X4S R7800
>>>> I have one of the failure paths isolated down to a single corrupt 
>>>> 64-bit word in memory, which required a significant amount of code 
>>>> instrumentation to achieve. I implemented a code execution history 
>>>> buffer that gets filled at various checkpoints within 
>>>> s_mp_exptmod() and some of the modules called by it. To facilitate 
>>>> this history mechanism I packaged all of s_mp_exptmod()'s local 
>>>> variables inside a structure. Each history entry saves the local 
>>>> scalar vars in addition to crc32's of all the mp_int data 
>>>> structures, with a separate crc32 of the mp_int.dp payload (data). 
>>>> When a failure occurs, i.e. one or more of the three back-to-back 
>>>> debug invocations of s_mp_exptmod() yields a mismatching signed key 
>>>> result, I dump out the history elements for each of the 
>>>> invocations to determine the first code checkpoint where the failing 
>>>> invocation departed from the known correct invocation.
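A minimal sketch of the checkpoint-history idea described above, in Python for illustration; it is not the instrumentation that was added to libtommath. Each checkpoint records the scalar locals plus a CRC32 of the big-number payload, and two histories are compared to find the first record that differs. The function names and record layout are hypothetical, and the sketch keeps one payload CRC per record rather than separate CRCs for the mp_int structure and its dp data.

```python
import zlib


def checkpoint(history, name, scalars, payload):
    """Append one record: checkpoint name, scalar locals, payload CRC32.

    `payload` is the raw bytes of the big-number digits at this point.
    Storing a CRC instead of the data keeps the history buffer small.
    """
    history.append((name, dict(scalars), zlib.crc32(payload)))


def first_divergence(good, bad):
    """Return the index of the first differing record, or None if equal."""
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None


# Two simulated invocations that should be identical; the second one
# has a single corrupted byte in its payload at the "sqr" checkpoint.
good, bad = [], []
for hist, sqr_payload in ((good, b"\x01\x02\x03\x04"), (bad, b"\x01\x02\xff\x04")):
    checkpoint(hist, "setup", {"winsize": 4}, b"\x00\x00\x00\x00")
    checkpoint(hist, "sqr", {"winsize": 4}, sqr_payload)
    checkpoint(hist, "reduce", {"winsize": 4}, b"\x05\x06")

idx = first_divergence(good, bad)
print("first divergent checkpoint:", bad[idx][0])
```

The name of the first divergent checkpoint narrows the search to the code executed between it and the previous checkpoint.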
>>
>> *snipped*
>>
>>