I would suggest keeping the robots.txt file just for search engines and allowing all directories to be scanned.
In any case, search engines are not going to scan directories that are password protected (.htaccess),
so if you list directory names under Disallow, it may open doors for hackers.
People always try to look behind closed doors,
so they might poke around in exactly the directories you don't want search engines to access. An open robots.txt can look like the sketch below.
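As a rough sketch, a fully open robots.txt that lets every crawler scan everything is just these two lines (an empty Disallow blocks nothing; this is standard robots.txt syntax, not anything specific to your site):

    User-agent: *
    Disallow: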
About your second question:
use sites like
www.whois.sc/(SiteName.com here)
whois.webhosting.info/(SiteName.com here)
and replace (SiteName.com here) with the site name you want to search for.
Those sites will show the number of sites hosted on the same IP, and even a list of those sites; see the quick example below.
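For example, to look up a site called example.com (just a placeholder name), you would visit www.whois.sc/example.com. And if you only want to see which IP a domain resolves to (the IP those tools group the hosted sites by), here is a minimal Python sketch, assuming only the standard library socket module and using example.com as a stand-in domain:

    import socket

    # Resolve the domain to its IP address; sites sharing this IP
    # are what the reverse-IP tools above will list.
    domain = "example.com"  # placeholder, put the site you want to check here
    ip = socket.gethostbyname(domain)
    print(domain, "is hosted on IP", ip)

The actual list of other sites on that IP still comes from the reverse-IP lookup tools above; the sketch only shows you the shared IP itself.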
cheers
Deep