scp of sysfs file repeats data

Wed Jun 24 10:23:19 AWST 2015

On 24.06.2015, at 01:49, Craig McQueen <craig.mcqueen at innerrange.com> wrote:

>> From: Peter Korsgaard [mailto:jacmet at gmail.com] On Behalf Of Peter
>> 
>>>>>>> "Peter" == Peter Korsgaard <peter at korsgaard.com> writes:
>> 
>>>>>>> "Matt" == Matt Johnston <matt at ucc.asn.au> writes:
>>>> I see what you mean. I'll update scp to OpenSSH's latest -  >> there are a
>> few changes to be merged.
>> 
>>> FYI, openssh scp has a similar problem here:
>> 
>>> open("vendor", O_RDONLY|O_NONBLOCK)     = 3
>>> fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
>>> fcntl(3, F_GETFL)                       = 0x8800 (flags
>> O_RDONLY|O_NONBLOCK|O_LARGEFILE)
>>> fcntl(3, F_SETFL, O_RDONLY|O_LARGEFILE) = 0
>>> write(6, "C0444 4096 vendor\n", 18)     = 18
>>> read(3, "0x8086\n", 4096)               = 7
>>> read(3, "", 4089)                       = 0
>>> write(6,
>> "0x8086\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) =
>> 4096
>> 
>>> This is with openssh 6.7p1-6.
>> 
>> But thinking further about it, I guess that's just an artifact of the protocol and
>> there isn't any way around it.
> 
> I guess the question is, should the copied file's size be that reported in the directory listing, or what is actually successfully read from the file?
> 
> If I use 'cp' to copy a sysfs file, then the copy contains the bytes actually read. E.g.:
> 
> # ls -al /sys/class/net/eth0/address 
> -r--r--r--    1 root     root          4096 Jun 23 23:43 /sys/class/net/eth0/address
> # cp /sys/class/net/eth0/address .
> # ls -al
> drwxr-xr-x    1 root     root          4096 Jun 23 23:43 .
> drwxr-xr-x    1 root     root          4096 Jun 23 06:51 ..
> -r--r--r--    1 root     root            18 Jun 23 23:43 address
> 
> So with 'cp' I get a file length of 18 (the bytes successfully read), not 4096.
> 
> As a user, I expect 'scp' to do the same thing. Does the protocol make that difficult somehow?

Well, as I understand it the file size is sent first.
If you cannot trust stat, you'd have read the whole file first and even buffer it (in case it changes - though I guess that race condition also exists with current stat usage).
I see a few things that could (and probably should) be done:
1) if it works better, use lseek to end to get file size instead of stat
2) when read fails, zero the buffer (so at least there are only extra 0s, not 'random' data)
3) if the file size is less than some limit (64kB? - would be easier to make this the same as the normal buffer size, 4kB is a bit small anyway) do not rely on stat but just read the whole file to get the size without race conditions. Could be done by just always before anything else reading 64kB and if the read is short just skip anything else.
4) print a warning when read fails (if the protocol allows it)

Sorry for the long text, especially as I have no intention to actually work on it. Hope someone is motivated to fix it :)

Regards,
Reimar