[tech] robots.txt on secure.ucc

Matt Johnston matt at ucc.asn.au
Mon Jan 30 21:11:44 WST 2012


Huh, that's pretty strange Googlebot behaviour. The relevant log
lines are copied below.

Also, the hg server there is intended to be used by anyone,
not just for Dropbear. Send an email to wheel at ucc with the
path to an hg repo directory in your homedir and we'll add it.

Matt


66.249.67.105 66.249.67.105 secure.ucc.asn.au - - [29/Jan/2012:18:06:43 +0800] "GET /horde3/imp/redirect.php?Horde=o9jghg22ma665b8iqbdar5a0t4&imapuser=$(_imapuser)&pass=$(_pass)&server=$(_server)&new_lang=$(_new_lang)&url=/horde3/index.php& HTTP/1.1" 200 1442 "-" "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" "-" 1003 3429 "-" TLSv1 RC4-SHA
66.249.67.105 66.249.67.105 secure.ucc.asn.au - - [29/Jan/2012:18:09:25 +0800] "GET /horde3/imp/redirect.php?Horde=uahvjbhen3ghoe1aqoso5qo983&imapuser=$(_imapuser)&pass=$(_pass)&server=$(_server)&new_lang=$(_new_lang)&url=/horde3/index.php& HTTP/1.1" 200 1442 "-" "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" "-" 1003 3429 "-" TLSv1 RC4-SHA
66.249.67.105 66.249.67.105 secure.ucc.asn.au - - [29/Jan/2012:18:12:06 +0800] "GET /horde3/imp/redirect.php?Horde=jfd5emnovmsjs0u0ubl0d8jpu6&imapuser=$(_imapuser)&pass=$(_pass)&server=$(_server)&new_lang=$(_new_lang)&url=/horde3/index.php& HTTP/1.1" 200 1443 "-" "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" "-" 1003 3430 "-" TLSv1 RC4-SHA


On Mon, Jan 30, 2012 at 09:04:57PM +0800, Daniel Axtens wrote:
> Hi all,
> 
> I was watching Apache's error.log today while debugging a PHP script, and realised that Googlebot was attempting to crawl our secure services. Unsurprisingly, it wasn't getting very far, but it was making for messy logs and quite severe load (one Apache process was sitting at 100% CPU trying to handle hits on all our different complicated secure services).
> 
> Interestingly, there are several services on secure for which Google has indexed the front page: see http://www.google.com.au/search?q=site:secure.ucc.asn.au . As it doesn't help us - or anyone else on the internet - to have these googleable, I have blocked all the webmails, the OpenID server and some management-y stuff. Dropbear remains untouched.
> 
> The full file, accessible at https://secure.ucc.asn.au/robots.txt , is below.
> 
> All the best,
> [DJA]
> 
> == mussel:/var/www/robots.txt ==
> User-agent: *
> # Don't allow any of our webmails
> Disallow: /horde3
> Disallow: /rcube
> Disallow: /SOGo
> 
> # No point in indexing an OpenID server, either
> Disallow: /openid
> 
> # Or any of our internal services
> Disallow: /phppgadmin
> Disallow: /glpi
> Disallow: /ocsreports
> 
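For anyone wanting to sanity-check rules like the ones quoted above, here is a minimal sketch using Python's standard urllib.robotparser. It is only an approximation of how a crawler would read the file, not a statement about what Googlebot actually does. The helper names (build_parser, allowed) and the /not-listed/ path are made up for illustration. One assumption to flag loudly: as far as I understand it, Python's parser treats a blank line as the end of a record, so the sketch drops blank lines before parsing to keep all the Disallow rules in one group.

```python
from urllib.robotparser import RobotFileParser

# The rules from the quoted robots.txt, copied as-is.
ROBOTS_TXT = """\
User-agent: *
# Don't allow any of our webmails
Disallow: /horde3
Disallow: /rcube
Disallow: /SOGo

# No point in indexing an OpenID server, either
Disallow: /openid

# Or any of our internal services
Disallow: /phppgadmin
Disallow: /glpi
Disallow: /ocsreports
"""

BASE = "https://secure.ucc.asn.au"


def build_parser(text):
    """Hypothetical helper: parse robots.txt text into a RobotFileParser.

    Blank lines are dropped first (an assumption of this sketch) so that
    Python's parser keeps every Disallow line in the single '*' group.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def allowed(parser, path, agent="Googlebot-Mobile"):
    """Hypothetical helper: True if `agent` may fetch BASE + path."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in ("/horde3/imp/redirect.php?url=/horde3/index.php",
                 "/openid",
                 "/not-listed/"):
        print(path, "allowed" if allowed(parser, path) else "blocked")
```

Matching here is by case-sensitive path prefix, so /SOGo is covered but a request for /sogo would not be.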

