45s login delay

Magnus Nilsson man at lundinova.se
Sat Mar 19 01:04:05 WST 2011


Hi all,

@ Matt: Your patch improved it by just a second or two on my rig, so 
that isn't the bottleneck in my case.

@ Peter: I have trouble compiling your code:
In file included from fp_mul_comba.c:365:
fp_mul_comba_small_set.i: In function `fp_mul_comba_small':
fp_mul_comba_small_set.i:1225: internal error--unrecognizable insn:
(insn 91872 37787 37788 (set (mem:SI (plus:SI (plus:SI (reg:SI 11 fp)
                     (const_int -4096 [0xfffff000]))
                 (const_int -740 [0xfffffd1c])) 0)
         (reg:SI 7 r7)) -1 (nil)
     (nil))
make[1]: *** [fp_mul_comba.o] Error 1
It's likely due to my configuration, but could also be a compatibility 
issue in your code. Just thought you might want to know.

Unfortunately I'm pressed for time and must go with openssl for the time 
being. It's a monster size-wise, but at 10s it's usable (just).
If space becomes an even more of a factor, be assured however that I 
will revisit dropbear.

Thank you all for your input and efforts to help.

Kind regards/Magnus

On 2011-03-17 16:23, Matt Johnston wrote:
> On Wed, Mar 16, 2011 at 07:16:34PM -0500, Rob Landley wrote:
>> On 03/16/2011 02:25 AM, Peter Turczak wrote:
>>> Hi Magnus, hi Rob,
>>>
>>> a while ago I made the same observations you did. On an m68k-nommu
>>> with 166 MHz the RSA exchange took quite forever. After some
>>> profiling I found out the comba multiply routine in libtommath was
>>> eating most of the time. It seems gcc produces quite inefficient code
>>> there. Libtommath resizes its large integers while calculating
>>> leading to more work for user memory management.
>> User mememory management?  It's got a malloc/free in an inner loop?  BARF!
>>
>> (Yeah, that'll blow your L1 cache wide open and slow stuff down by at
>> least an order of magnitude.  Allocation functions are some of the most
>> cache unfriendly things you can do, pretty much by definition.  Unused
>> memory is not cache hot, pretty much by definition.  That's sort of the
>> point.  Copying the data sucks too, but it's doing the copying on all
>> platforms I'd guess...)
> I guess it's possible. Logging in to a server with 1024 bit
> RSA and RSA authorized_key I get ~229 reallocs in mp_grow(),
> not a massive number if spread over 45 seconds. The patch
> below drops it to ~30 reallcs.
>
> Magnus: It might be worth seeing if it changes your
> timing. I haven't looked whether it increases memory usage.
>
> Matt
>
> --- libtommath/bn_mp_exptmod_fast.c	5a692f134deeab0992612206c16f8bf970b5088c
> +++ libtommath/bn_mp_exptmod_fast.c	5391873ccf8a11171774425c69f584195b4fdba4
> @@ -67,13 +67,13 @@ int mp_exptmod_fast (mp_int * G, mp_int
>
>     /* init M array */
>     /* init first cell */
> -  if ((err = mp_init(&M[1])) != MP_OKAY) {
> +  if ((err = mp_init_size(&M[1], P->used)) != MP_OKAY) {
>        return err;
>     }
>
>     /* now init the second half of the array */
>     for (x = 1<<(winsize-1); x<  (1<<  winsize); x++) {
> -    if ((err = mp_init(&M[x])) != MP_OKAY) {
> +    if ((err = mp_init_size(&M[x], P->alloc+1)) != MP_OKAY) {
>         for (y = 1<<(winsize-1); y<  x; y++) {
>           mp_clear (&M[y]);
>         }
> @@ -96,7 +96,7 @@ int mp_exptmod_fast (mp_int * G, mp_int
>
>        /* automatically pick the comba one if available (saves quite a few calls/ifs) */
>   #ifdef BN_FAST_MP_MONTGOMERY_REDUCE_C
> -     if (((P->used * 2 + 1)<  MP_WARRAY)&&
> +     if (((P->alloc * 2 + 1)<  MP_WARRAY)&&
>             P->used<  (1<<  ((CHAR_BIT * sizeof (mp_word)) - (2 * DIGIT_BIT)))) {
>           redux = fast_mp_montgomery_reduce;
>        } else
> @@ -133,7 +133,7 @@ int mp_exptmod_fast (mp_int * G, mp_int
>     }
>
>     /* setup result */
> -  if ((err = mp_init (&res)) != MP_OKAY) {
> +  if ((err = mp_init_size (&res, P->used)) != MP_OKAY) {
>       goto LBL_M;
>     }
>
> ============================================================
> --- libtommath/bn_mp_init_copy.c	fd7c20c0ee3473615de23c59074cf5c6757a20ca
> +++ libtommath/bn_mp_init_copy.c	841949a75e387e818f2f4d9adedff0ba9c9374c0
> @@ -20,7 +20,7 @@ int mp_init_copy (mp_int * a, mp_int * b
>   {
>     int     res;
>
> -  if ((res = mp_init (a)) != MP_OKAY) {
> +  if ((res = mp_init_size (a, b->used)) != MP_OKAY) {
>       return res;
>     }
>     return mp_copy (b, a);
> ============================================================
> --- libtommath/bn_mp_mod.c	3bed12926c4d019853f2b4dac814a7505580380e
> +++ libtommath/bn_mp_mod.c	9265cd0294d2c86f1c3c73eaa5bf19c30403e13b
> @@ -22,7 +22,7 @@ mp_mod (mp_int * a, mp_int * b, mp_int *
>     mp_int  t;
>     int     res;
>
> -  if ((res = mp_init (&t)) != MP_OKAY) {
> +  if ((res = mp_init_size (&t, b->used)) != MP_OKAY) {
>       return res;
>     }
>
> ============================================================
> --- libtommath/bn_mp_mulmod.c	935d0f5903589ddf62f42fc691cb2f83aa2832c4
> +++ libtommath/bn_mp_mulmod.c	ef9063432e3a0c62b7118dfc3d01d04cd4dc8bb9
> @@ -21,7 +21,7 @@ int mp_mulmod (mp_int * a, mp_int * b, m
>     int     res;
>     mp_int  t;
>
> -  if ((res = mp_init (&t)) != MP_OKAY) {
> +  if ((res = mp_init_size (&t, c->used)) != MP_OKAY) {
>       return res;
>     }
>
>


More information about the Dropbear mailing list