
Information Gathering Using Google

Lih Wern Wong
School of Computer and Information Science, Edith Cowan University

[email protected]

Abstract

Google is a powerful search engine. By combining Google’s features with creativity in query construction, however, a user can make it return sensitive information that casual users would not normally find. An attacker could use Google to look for vulnerable targets and to passively gather information about those targets to assist further attacks. This paper discusses ways to exploit Google to obtain valuable information and how attackers can use that information to mount attacks. The ideas discussed are applicable to other search engines as well.

Keywords:

Google, Google hacking, information gathering, penetration testing

INTRODUCTION

Google is the most widely used and powerful search engine. Many users are unaware that they are exposing far more information on the Internet than they intended. Users who can construct an accurate query will be able to find the exact information they desire. Unfortunately, Google has been exploited by attackers for malicious purposes: to find vulnerable systems, passwords, other sensitive information and far more system information than they need to know. Google can be used as an information gathering tool to profile targets.

Though tools like Nessus and nmap are much more capable of scanning websites for vulnerabilities, their use can be detected, and they create a lot of “noise” which usually alerts the administrator (Mowse, 2003). By employing Google, an attacker can scan targets for some vulnerabilities much more quietly. Because Google constantly crawls the Internet and indexes websites in its enormous database, it also speeds up the attacker’s vulnerability scanning process. Though this paper uses Google exclusively, the ideas discussed are applicable to other search engines as well.

PROFILING A PERSON

This section focuses on how to gather information about a particular person for general reconnaissance, social engineering (e.g. deceiving or talking bank customers into revealing passwords) or other criminal acts.

Personal Webpage and Blog

To get a better understanding of a target, an attacker can combine the target’s name or email address with words like homepage, blog and family to find more information about the target. Driven by self-importance and vanity, many individuals set up their own personal webpage or blog (a web version of a personal journal). Blogging is gaining huge popularity as users share their daily routines, thoughts and opinions on various matters. Through such sites, people have unwittingly released information including personal opinions, interests, dislikes, job particulars and contact information (Granneman, 2004). If personal photos are posted on these sites, they allow the attacker to identify the actual victim or the victim’s friends and associates. With all this accurate information and matching photos at hand, the attacker could easily socially engineer the victim. The attacker could strike up a conversation using recent topics posted by the victim to gain initial trust and later persuade the victim into revealing the desired information. Personal webpages and blogs are rich and reliable sources for profiling a person.

Web-based Message Groups

People join Yahoo! or Google Groups that match their interests. Google Groups is a Usenet archive that lets users access Usenet posts dating back to 1995 (Google, 2003). By searching for an individual’s screen name and checking their profile, an attacker could figure out their interests or the kinds of groups they are most likely to join (Long, 2005, p. 141). The attacker could join a particular group and check its message archive for information about the target, which is useful for social engineering. Sometimes the group description alone is enough to determine the group’s context without actually joining it.

Computer-related groups, for example, could reveal details about the projects an organization is currently engaged in or the type of hardware or software solutions it uses. Employees of a software development organization may post questions about programming problems they face to software development groups. If an employee uses an organization email address to correspond, an attacker could get a rough idea of the ongoing projects in that organization. Even if only the employee’s actual name is used, the attacker could possibly find the affiliated organization through sites such as the employee’s blog. Furthermore, if a system administrator seeks help with network issues, the attacker would know which organization may have exploitable holes. The attacker could also actively engage in the group to “help” the victim with the problem, deceiving the victim into revealing more information.

Resume

Resumes or curricula vitae mostly contain accurate and current particulars of an individual and are usually displayed on personal websites. They are a very reliable and convenient source for an attacker profiling a person. The previous employment section gives the attacker another way to advance the social engineering process. The attacker could impersonate a prospective employer or head-hunting agent and call the victim to find out more background information about the “candidate”. In that situation, the victim will most probably give out accurate information to persuade the attacker to hire the victim. The query "phone * * *" "address *" "e-mail" intitle:"curriculum vitae" returns likely resumes (Davies, 2004).


Figure 1.1: Curriculum Vitae (Curriculum Vitae, n.d.)

PROFILING A TARGET ORGANIZATION

If an attacker has a fixed target, Google can assist in finding information that is useful for social engineering or physical breaches. Most of this information is publicly accessible for the convenience of employees, and some of it can be very informative.

Intranet, Human Resources and Help Desk

Many organizations have an intranet containing information that should be accessible only to employees. For the organization’s convenience, the intranet may contain human resources information (e.g. departmental contacts), policies and procedures, and help desk information. Though an intranet is supposedly private, such sites are sometimes still accessible to the public and can be found by searching intitle:intranet inurl:intranet "human resources". Substitute human resources with words like help.desk or IT department for additional information. This information, which includes the names of the individuals in charge, their positions and their contacts, is very helpful for social engineering, as shown in Figure 2.1, with helpful links to Contacts, Help Desk and Policies. By skimming through the policies, which usually include operating procedures, an attacker could get a rough idea of how the organization operates.


Figure 2.1: Intranet (CSD Contacts, 2005)

Self-help Guides

Some organizations provide guided help for troubleshooting or installation that can be too informative. An attacker could learn configuration details and the technology involved, which are useful in a later attack phase. A search for "how to" network setup dhcp server ("help desk" | helpdesk) returns a “how to” guide on network setup, as shown in Figure 2.2 (Long, 2005, p. 124).

Useful information in Figure 2.2 includes the proxy address and port number, the workgroup name (i.e. DIS-STUDENT), email information and configuration (i.e. web-based and MS Outlook support), and additional server names (i.e. dis.unimelb.edu.au, unimelb.edu.au). An attacker can use this information within the organization’s networks.


Figure 2.2: Informative Self-help Guide (DIS, n.d.)

Jobs Postings

The recruitment section of an organization’s website is easily disregarded as a source of information. However, it reveals information about the information technology in use and the corporate structure: operating systems, software, network type and server type, as well as the corporate departments and their vacant positions with job descriptions. An attacker could attempt a physical breach by impersonating a new employee taking up one of these positions, pretending it is his first day at work and asking for access. An attacker can find an organization’s job vacancies by combining the site operator with employment | job | recruitment.
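The combination of the site operator with alternative terms can be sketched as a small query builder, for example for an administrator who wants to review what a search engine exposes about their own domain. This is a minimal sketch under stated assumptions: the function name build_site_query and the domain example.com are hypothetical, and the code only composes the query string without contacting any search engine.

```python
def build_site_query(domain, alternatives):
    """Compose a Google query restricted to one domain.

    `domain` and `alternatives` are illustrative inputs (assumptions);
    the function only builds a string and never sends it anywhere.
    """
    if not domain or not alternatives:
        raise ValueError("domain and at least one search term are required")
    # Quote multi-word terms so that each is treated as a phrase.
    terms = [f'"{t}"' if " " in t else t for t in alternatives]
    # The pipe character separates alternative terms (logical OR).
    return f"site:{domain} " + " | ".join(terms)


if __name__ == "__main__":
    print(build_site_query("example.com", ["employment", "job", "recruitment"]))
```

Keeping the builder free of network access makes it easy to review the generated queries before anyone submits them by hand.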

Figure 2.3: Job Postings Reveal Information (Employment, 2005)


Figure 2.3 shows that the technology this organization uses most probably includes .NET Framework applications, an Oracle database, Veritas applications, various operating systems (e.g. Windows 2003, Linux) and an IBM AS/400 server. The attacker could then focus his attacks on weaknesses in these technologies.

Google Local

Part of the social engineering process includes eavesdropping on conversations, people watching and engaging target employees in friendly conversation. Google Local can be very helpful in locating employees’ favorite hangouts, such as coffee shops, restaurants, grocery stores and pubs, where an attacker can eavesdrop, chat with employees or catch up on corporate gossip. The attacker could pose as an interested job applicant and engage in a conversation with the target organization’s IT personnel, which could reveal information about the operating system, version, patch levels and applications in use (Cole, 2003). Google Local (http://local.google.com/), which currently works only in the US, Canada and the UK, allows an attacker to find any type of business in the target organization’s surroundings, with a detailed map to locate the place, as shown in Figure 2.4.

Figure 2.4: Google Local (Google Local, 2005)

Link Mapping

Links on an organization’s website can reveal non-obvious relationships between the linked organizations. An attacker could attack a poorly secured partner site and subvert the trust relationship between the two domains to compromise the much better secured target. BiLE from www.sensepost.com (BiLE, 2003) is an automated tool capable of revealing such hidden relationships based on complex calculations and predefined rules. For instance, a link from a target site weighs more than a link to a target site (SensePost, 2003, p. 9). Though Google’s link operator can only show sites that link to the URL given in the query, BiLE uses it intensively to assist this footprinting process. It is a subtle way to learn the possible relationships of an organization, over which the organization has no control.


PROFILING WEB SERVER AND WEB APPLICATION, AND LOCATING LOGIN PORTALS

Attackers can use Google to profile web servers and the web applications running on them before attacking potentially vulnerable machines that run a vulnerable application version. Login portals provide “front-door” access to the target, which is a helpful start for an attacker.

Server Versioning

The server tag at the bottom of a directory listing page provides useful information for determining the type and version of the web server application running on the website, as shown in Figure 3.1. An attacker who wants to exploit the vulnerabilities of, say, Apache 1.3.28 can search for "server at" "Apache/1.3.28" to locate potentially vulnerable machines. The query "Microsoft-IIS/6.0 server at" locates websites running Microsoft IIS 6.0. This is a fairly easy way to determine the server version. A vulnerable version does not guarantee an exploitable flaw, as the server may have been patched. However, if directory listing is allowed, it could suggest that the administrator is not concerned with server security, and there is a possibility that the server is not fully patched.
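To illustrate how much a single server tag gives away, the following minimal sketch parses such a tag into its parts. The sample string is an assumption modelled on the Apache directory listing footer, and the names SERVER_TAG and parse_server_tag are hypothetical; an administrator could use a check like this to confirm whether their own listings still disclose a version.

```python
import re

# Matches tags such as "Apache/1.3.28 Server at www.example.com Port 80".
# The exact tag layout is an assumption; real servers may add module names
# between the version and the words "Server at".
SERVER_TAG = re.compile(
    r"(?P<product>[A-Za-z][\w-]*)/(?P<version>\d+(?:\.\d+)*)\b"
    r".*?\bServer at (?P<host>\S+) Port (?P<port>\d+)",
    re.IGNORECASE,
)


def parse_server_tag(text):
    """Return (product, version, host, port) or None if no tag is present."""
    match = SERVER_TAG.search(text)
    if match is None:
        return None
    return (
        match.group("product"),
        match.group("version"),
        match.group("host"),
        int(match.group("port")),
    )


if __name__ == "__main__":
    print(parse_server_tag("Apache/1.3.28 Server at www.example.com Port 80"))
```

A None result for one's own directory listings suggests that the version string is no longer exposed through this particular channel.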

Figure 3.1: Server Tag Reveals Server Version (Kernelnewbies, n.d.)

Web Application Error Messages

Error messages generated by applications installed on a web server can reveal information about the server and the applications that reside on it. The query "ASP.NET_SessionId" "data source=" "Application key" reveals sites with an ASP.NET application state dump, which contains a great deal of information about the web application and the other applications on the server, such as the database connection string and the application path on the web server, as shown in Figure 3.2. The connection string itself provides valuable information, including the database type, the database name and the username and password used to connect to the database. With these, it would be easy for an attacker to connect to the database server and manipulate its contents. Moreover, if such dump files have been crawled by Google, the attacker can reasonably conclude that the web server is probably not secure.


Figure 3.2: ASP.NET Application State Dump (Broward County, 2005)

PHP error messages can be found using the query intext:"Warning: Failed opening" include_path, which helps an attacker characterize the web server, as shown in Figure 3.3. The error message also exposes the actual server path, the web path and related PHP filenames. An attacker could try to traverse these file paths to look for potentially valuable information. Poor programming practice and a lack of comprehensive testing allow such error messages to occur without being caught by existing error-checking mechanisms.

Figure 3.3: PHP Error Message (RightVision, n.d.)


Default Pages and Documentations

Most web applications and web servers have default or test pages that enable administrators to validate that the application is successfully installed. Poor configuration has left such pages to be crawled by Google. The mere existence of default pages, even in the Google cache, helps the attacker’s profiling process. Figure 3.4 shows an Apache installation found using the query intitle:Test.Page.for.Apache seeing.this.instead. In addition, different ranges of server versions have different default pages (Long, 2004). Figure 3.4 shows an Apache 1.3.11-1.3.31 installation, as opposed to the Apache 1.3.0-1.3.9 installation in Figure 3.5, found using the query intitle:Test.Page.for.Apache "It worked!" "this Web site!".

An attacker can also use the manuals or documentation that usually ship with web server applications to profile web servers, though this is not as accurate as using default pages. The query intitle:"Apache 1.3 documentation" or intitle:"Apache 2.0 documentation" finds the respective range of Apache servers (Racerx, 2005). Figure 3.6 shows IIS 5.1 release notes, found using the query inurl:iishelp core. The mere existence of default pages and documentation could signify a careless administrator, which means potentially vulnerable sites. These two techniques give an attacker another approach to identify web server versions.

Figure 3.4: Apache 1.3.11-1.3.31 Installation (Lanalana, n.d.)


Figure 3.5: Apache 1.3.0-1.3.9 Installation (Mvacs, n.d.)

Figure 3.6: IIS Default Documentation (IIS 5.1, 2001)

Locating Login Pages

Web application login pages, such as the one shown in Figure 3.7, found using the query allinurl:"exchange/logon.asp", allow an attacker to profile the applications that reside on the web server, and they act as a break-in channel. In this case, the site even specifies the way the username is constructed (i.e. Note: In most cases, your username…) and the software version and patch level (i.e. 5.5 SP4). An attacker could find exploits for the specific application version to compromise it. Moreover, because the login page is a default page, it could implicitly indicate that the website administrator is unskilled and that the security of the site is probably weak (Long, 2005, p. 251). Administrators should customize the login page so that it does not indicate the actual application in use. Ways of finding generic login pages include inurl:/admin/login.asp, "please log in" or inurl:login.php. An attacker could use the login pages to mount brute-force or dictionary attacks, trying a range of passwords against the respective usernames.


Figure 3.7: Microsoft Outlook Web Access Login Portal (Myers, 2005)

FINDING EXPLOITS AND VULNERABLE TARGETS USING VULNERABLE APPLICATIONS COMMON WORDS

If an attacker intends to attack any vulnerable target rather than a specific one, Google is highly effective in finding such targets. The attacker can first use Google to dig up exploit code written by hackers to facilitate exploitation of vulnerable targets. Subsequently, the attacker can search for vulnerable targets through the words commonly displayed by flawed applications.

An attacker can rely on Google to search for exploit code posted on public sites or on hacking community sites. To retrieve this large body of exploits, which are usually written in C, use the query filetype:c exploit. However, some exploits are published in other formats such as txt, html or php. To locate these effectively, an attacker searches for code strings common inside exploit code, such as main or #include <stdio.h>, which C programs commonly include to reference the standard input/output library (ComSec, 2003). Regardless of file extension, the query "#include <stdio.h>" main exploit returns sites with exploit code.

An attacker can use the source code of a vulnerable application to construct an effective base query to search for vulnerable targets. The attacker could visit security advisory websites to learn which third-party applications used in web applications have security vulnerabilities. Most sites that use such third-party components display the phrase “Powered by”, followed by the component name and version. For instance, the query "Powered by CubeCart 2.0.1" locates websites using CubeCart 2.0.1, which is vulnerable to SQL injection and cross-site scripting (Secunia, 2005). Another example query for vulnerable targets is allinurl:/CuteNews/show_archives.php, as the CuteNews show_archives parameter is susceptible to cross-site scripting (Mohanty, 2005).


To learn how to produce an accurate query for locating vulnerable websites, an attacker can study the words commonly displayed by sites that use a third-party web component by checking the component’s source code (Long, 2005, p. 185). If the source code is not available, the attacker could install the vulnerable component directly to learn its common signature. Large numbers of sites that use third-party applications or components leave these trails on their pages. There is a fair chance that an attacker could locate many sites using unpatched, vulnerable third-party applications.

FINDING USERNAME, PASSWORD, SENSITIVE INFORMATION

Usernames and passwords are used by most authentication mechanisms, so attackers are very keen to hunt for them. Google can also be used to unveil highly sensitive information such as credit card numbers.

Finding Username

Knowing a user’s username means an attacker has solved half of the puzzle of breaking in. The attacker could use the username to socially engineer the help desk into revealing the matching password. A generic query for finding usernames is inurl:admin inurl:userlist. Alternatively, try inurl:root.asp?acs=anon to locate the Microsoft Outlook Web Access address book (Chambet, 2004). It contains a publicly accessible address book with staff contacts, as shown in Figure 5.1. The “Alias” column is most likely the staff username used for the organization login. By submitting common starting letters of names, an attacker can harvest almost all the entries in the address book. Some sites even show how a username is usually created (e.g. append the first letter of your last name to your first name).

Figure 5.1: Outlook Public Address Book (SPC, n.d.)

Finding Password

Hosts with Microsoft FrontPage Extensions installed can be searched for usernames and passwords using "# -FrontPage-" inurl:service.pwd. Although the password is stored in DES-encrypted form, an attacker could run a tool like John the Ripper to crack it, as seen in Figure 5.2 (ComSec, 2003). In addition, MySQL database credentials could potentially be stored in connect.inc (Google Hacking, n.d.). Figure 5.3 shows the result of searching intitle:"index of" intext:connect.inc. However, searching for passwords in Google does not actually yield many positive results, and most passwords found are no longer valid. Those that are found are usually stored in configuration or log files in an unencrypted or weakly encrypted format.

Figure 5.2: FrontPage Extension Usernames and Passwords (Heyerlist, n.d.)

Figure 5.3: MySQL Database Credential (Central College, n.d.)

Other Valuable Information

There is much more valuable information that attackers can obtain through Google search, for instance credit card numbers. Most of these highly valuable numbers are released by attackers who deceive unwitting users into submitting personal information through phishing, rather than leaked from e-commerce sites (Leyden, 2005).

National identification numbers such as the Social Security Number (SSN) can also be located using Google. The fact that some educational institutions use the SSN for student identification threatens students’ privacy and exposes them to possible identity theft. The numbers are usually posted alongside the associated names and grades and exposed on public networks, as shown in Figure 5.4 (edited), found using the query SSN "772-55". In addition, some organizations announce competition winners’ names with their national identification numbers (“IC” in this example) on their websites, as shown in Figure 5.5 (edited), found using the query IC "820508-*-*". Once an attacker knows the format of such numbers, it is trivial to find them. Such numbers can be used to commit identity theft, for instance to apply for a credit card or driving licence.
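Because such numbers follow a fixed format, the same property can be used defensively to mask them before a list is published. The sketch below is illustrative only. The digit groupings (3-2-4 for an SSN and 6-2-4 for the “IC” style number) are assumptions, as are the names ID_PATTERNS and redact_ids.

```python
import re

# Assumed formats, for illustration only:
#   SSN-style number: 3-2-4 digits, e.g. 123-45-6789
#   "IC"-style number: 6-2-4 digits, e.g. 820508-01-1234
ID_PATTERNS = [
    re.compile(r"\b\d{6}-\d{2}-\d{4}\b"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
]


def redact_ids(text, mask="[REDACTED]"):
    """Replace every substring that matches an assumed ID format with a mask."""
    for pattern in ID_PATTERNS:
        text = pattern.sub(mask, text)
    return text


if __name__ == "__main__":
    print(redact_ids("Student 123-45-6789 scored 88"))
```

Pattern matching of this kind catches only numbers written in the expected layout, so it complements rather than replaces a policy of not using such numbers as public identifiers.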


Figure 5.4: Google Uncovers Social Security Number (Rutgers, 2005)

Figure 5.5: National Identification Number Exposed in Competition Winners List (Maybank, 2005)

FINDING FILES

Files and databases contain information that attackers can use to accomplish their objectives. Google can be used to locate files and the contents inside them. This section focuses on ways to locate configuration files, log files, office documents and databases, since they usually contain sensitive information.

Google Cache

The Google cache can be very helpful to ordinary users as well as attackers. Each time Google crawls a page, it stores a copy of the page as a cache on Google’s own servers. Thus, users can still access the document even after the live page has been removed. Unfortunately, an attacker can take advantage of this feature to grab sensitive information that has been removed from the hosting server. Additionally, an attacker could achieve anonymity by accessing the cached version of a page, as the data are retrieved from Google’s servers, which act like a proxy, and not from the actual server. However, this is only true if the stripped or text-only cache is retrieved (Greene, 2001). Non-text objects such as images in cached pages are still retrieved from the actual server.

Configuration Files

Configuration files hold program settings that describe how applications or networks are configured to operate, which is very helpful information for an attacker. Figure 6.1 shows a result found using the query filetype:ini inurl:ws_ftp.ini. It locates WS_FTP configuration files, which contain the FTP server’s username, password, directory and host name. The poorly encrypted password shown can easily be decrypted using free tools (Ipswitch, 1996).

Sometimes Google returns a vast number of results that require further refinement. Ways to filter the results include (Long, 2004):

• Create unique base words or phrases based on the actual file.

• Filter out words like test, samples, how-to and tutorial to exclude example files.

• Locate and filter out commonly changed values like “yourservername” and “yourpassword” in the sample configuration files.
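The refinement steps above can be sketched as a helper that appends exclusion terms to a base query. This is a minimal sketch, assuming the minus operator is used for exclusion; the function name refine_query is hypothetical, and the code only builds a string.

```python
def refine_query(base_query, exclude_words=(), exclude_phrases=()):
    """Append exclusions to a base query.

    exclude_words   -- single words that mark example files (e.g. "test")
    exclude_phrases -- placeholder values found in sample configuration files
    """
    parts = [base_query.strip()]
    # A leading minus sign excludes results containing the term.
    parts += [f"-{word}" for word in exclude_words]
    # Placeholder values are quoted so that they are matched exactly.
    parts += [f'-"{phrase}"' for phrase in exclude_phrases]
    return " ".join(parts)


if __name__ == "__main__":
    print(
        refine_query(
            "filetype:ini inurl:ws_ftp.ini",
            exclude_words=["test", "samples", "how-to", "tutorial"],
            exclude_phrases=["yourservername", "yourpassword"],
        )
    )
```

Separating the base query from the exclusions makes it easy to add or drop one filter at a time and observe how the result count changes.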

Figure 6.1: WS_FTP Configuration File (Ipswitch, n.d.)

Office Documents

Office documents include word processing documents, spreadsheets, and Microsoft PowerPoint, Microsoft Access and Adobe Acrobat files. The contents of some of these files can be crawled and rendered by Google as HTML documents, which enables an attacker to hunt for highly relevant documents through Google search. An attacker could, for instance, query filetype:xls username password email to locate Microsoft Excel files that potentially contain sensitive information, as shown in Figure 6.2 (edited) (Greene, 2000).


Figure 6.2: Microsoft Excel Reveals Username and Password (Digitalbrain, 2003)

Network Report

Nessus is a vulnerability scanner that produces an assessment report after scanning a network for vulnerabilities and misconfiguration. With that in mind, an attacker could query Google with "This file was generated by Nessus" to find such reports and locate vulnerabilities on potential targets that have yet to be fixed (Trivedi, 2005, p. 8). Figure 6.3 shows that such a report contains the assessed host’s IP address, open port numbers, detailed descriptions of potential vulnerabilities and countermeasures. There is a high possibility that the sites mentioned in the report could be exploited, as such reports may be uploaded by malicious users who perform vulnerability scanning on other machines. If an administrator were conscientious enough to perform a vulnerability scan, the assessment report would not have been left on the server.


Figure 6.3: Nessus Assessment Report (Nessus Scan Report, n.d.)

Database SQL scripts

Database dumps usually refer to SQL scripts that contain text-based information about the database and table structure, including table names, field names, field types and even actual records in the tables. Administrators use these files to reconstruct the database. Figure 6.4 shows a dump file containing table structures and actual records (i.e. username, password), found using the query filetype:sql "# Dumping data for table" (username|user|users|password) (Long, 2005, p. 309). The query consists of the generic dump file extension, a common header string and promising field names. Such SQL scripts are also very helpful if the site is vulnerable to SQL injection. Because the attacker knows the table structure, the attacker could manipulate the database through SQL injection. At worst, if the site’s login credentials are stored in that database, the attacker could insert a username and password into the database to access private areas of the site.


Figure 6.4: Database Dumps Reveal Helpful Information (MySQL Dump, n.d.)
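Both the Nessus report and the SQL dump above are found through a fixed phrase that the generating tool writes into the file. An administrator can look for the same phrases on the local file system before Google does. The following is a minimal sketch under stated assumptions: the function name, the 1 MB read limit and the idea of scanning a local web root are illustrative choices, not part of the original paper; only the two marker phrases are taken from the queries quoted above.

```python
from pathlib import Path

# Marker phrases quoted in this paper; extend the tuple for other
# report or dump formats (any additions are the reader's assumption).
MARKERS = (
    "This file was generated by Nessus",
    "# Dumping data for table",
)


def find_exposed_files(web_root, markers=MARKERS, max_bytes=1_000_000):
    """Return (path, marker) pairs for files under web_root that contain a marker.

    Only the first max_bytes of each file are inspected, so a marker that
    appears later in a very large file is missed.
    """
    hits = []
    for path in sorted(Path(web_root).rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as handle:
                text = handle.read(max_bytes).decode("utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the audit
        for marker in markers:
            if marker in text:
                hits.append((str(path), marker))
    return hits
```

Calling find_exposed_files on the directory a web server publishes returns the files that the corresponding Google queries could match once the site is crawled; such files should be moved out of the published tree.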

WEB-ENABLED NETWORK DEVICES

Lots of network devices such as routers, firewalls, printers and proxy servers have web interfaces that show the device status and allow administrators to remotely configure their settings. Misconfiguration has exposed these devices to Google. An attacker could subvert these devices to gain access to the trusted network they protect, or directly exploit vulnerabilities in the devices themselves.

For instance, the query intitle:“ADSL Configuration page” will find SolWise ADSL modems crawled by Google, as shown in Figure 7.1 (Chat11, 2004).


Figure 7.1: ADSL Modem Configuration Page (ADSL Configuration Page, n.d.)

Most network printers have a web-based interface that allows users to conveniently view the printer's status or modify its configuration from any web browser. Misconfiguration has exposed such printers on the Internet. Figure 7.2 shows a network printer found using the query “Phaser 6250” “Printer Neighborhood” (Chambet, 2004). The network printer provides a lot of detailed information about the surrounding network, including its IP address, the print job list, printed document filenames, and the names of computers issuing print jobs. An attacker can further compromise the printer through its administrative page. Some of these printers even allow an attacker to issue a test print page through the Internet. Network printers like the Phaser 740 have a vulnerability that allows an attacker to access a hidden file through the URL to modify the administrative password. For an attacker who merely aims to annoy users, Google is very effective in finding such network devices to exploit. Thus, administrators should never allow such devices to be exposed on the Internet.


Figure 7.2: Google Exposes Network Printer (Phaser 6250, 2003)

COUNTERMEASURES

There are a few simple practices users should follow to protect themselves from such seemingly innocuous attacks using Google. First, users need to know the two ways a page can be found and indexed by Google: the page can be linked from other sites that have been crawled by Google, or the page can be manually submitted to Google's database. To avoid personal profiling, users should avoid using their actual names when corresponding in web-based message groups or blogs. Users could also search for themselves (e.g. by actual name or username) so that they are aware of the information published about them on the Internet and can avoid having it used against them.

Administrators should search their own web servers using Google to expose potential threats. All of the above-mentioned techniques can be automated using tools like Gooscan, SiteDigger and Athena. Such tools use a signature database consisting of various Google queries to search for information leakage from sites. Administrators can search their own websites for exposure effectively and efficiently using such tools. However, automated tools like Gooscan and Athena that do not use the Google API violate Google's Terms and Conditions, and using them can result in a temporary service ban (Calashain, 2003, p. 137).
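The signature-database idea can be illustrated with a short sketch. It is not the implementation of Gooscan, SiteDigger or Athena: the signature list, function names and the example domain are hypothetical, and the signatures are simply the queries quoted in this paper. Each signature is restricted to the administrator's own domain with Google's site: operator. The sketch only builds query strings and search URLs for manual review in a browser; it deliberately sends nothing to Google, in keeping with the terms-of-service caveat above.

```python
from urllib.parse import urlencode

# Hypothetical signature list assembled from queries quoted in this paper.
SIGNATURES = [
    'filetype:xls username password email',
    '"This file was generated by Nessus"',
    'filetype:sql "# Dumping data for table" (username|user|users|password)',
    'intitle:"ADSL Configuration page"',
]


def build_audit_queries(domain, signatures=SIGNATURES):
    """Prefix each signature with site:<domain> so results cover only that site."""
    return [f"site:{domain} {signature}" for signature in signatures]


def search_url(query):
    """Return a Google search URL that an administrator can open manually."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder for the administrator's own domain.
    for query in build_audit_queries("example.com"):
        print(search_url(query))
```

Keeping the signatures in a plain list makes it easy to add new queries as they are published, which is the main maintenance task for a signature-based self-audit.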

Employees in an organization should be advised of what information is published for public access, to avoid social engineering that uses such information against them. Administrators should first make sure all web servers are installed with the latest patches. Since directory listing provides a road map to private files, administrators should disable directory listing unless users are allowed to browse files in an FTP-style manner. All default usernames, passwords, test pages and documentation should be removed. Administrators should ensure their web pages are fully tested for potential errors and that all errors are caught properly. Default pages should be customized to remove the common phrases that search queries rely on to find them.
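Whether a directory listing or a default test page is still being served can be checked from the page title, since these pages carry predictable titles. The sketch below is an illustration only: the class and function names are hypothetical, and the two phrases are taken from page titles cited in this paper's reference list ("Index of /documents/kdoc" and "Test Page for Apache Installation"); a real audit would need a longer list.

```python
from html.parser import HTMLParser

# Title phrases taken from pages cited in this paper's references;
# the list is illustrative, not exhaustive.
RISKY_TITLE_PHRASES = (
    "Index of /",
    "Test Page for Apache Installation",
)


class _TitleParser(HTMLParser):
    """Collect the text inside the <title> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def risky_title(html_text, phrases=RISKY_TITLE_PHRASES):
    """Return the first risky phrase found in the page title, or None."""
    parser = _TitleParser()
    parser.feed(html_text)
    title = parser.title.strip().lower()
    for phrase in phrases:
        if phrase.lower() in title:
            return phrase
    return None
```

An administrator could feed this function the HTML of pages fetched from their own server; any non-None result points at a page that should be disabled or customized.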

Robots.txt should be used to specify web directories that should not be indexed by Google. However, an attacker can still directly access the robots.txt of a targeted site to learn its directory structure. Use a password protection mechanism to protect private pages that are intended for specific users, since Google is unable to index password-protected pages. Put the META tag <META NAME="ROBOTS" CONTENT="NOARCHIVE"> in a page's HEAD section to prevent Google from caching the page. Lastly, if a page that is not intended for public viewing is found in Google, then after removing the page from the web servers, the administrator could resort to Google Remove URL and Google Groups Post (http://services.google.com/urlconsole/controller) to remove the identified URL and its cached page from Google's repository.
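The robots.txt and META tag measures can be verified before relying on them. The following sketch uses Python's standard urllib.robotparser and html.parser modules; the function names, the sample robots.txt content and the choice of "Googlebot" as the user agent string are assumptions made for illustration. It checks which paths a given robots.txt blocks for a crawler and whether a page carries the NOARCHIVE directive described above.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the subset of paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html_text):
    """Return True if the page asks search engines not to cache it."""
    parser = _RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    # Sample robots.txt content; the directory names are placeholders.
    sample = "User-agent: *\nDisallow: /private/\nDisallow: /backup/\n"
    print(blocked_paths(sample, ["/index.html", "/private/report.html"]))
    print(has_noarchive('<head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head>'))
```

Note that the Disallow lines in the sample are exactly what an attacker reads to learn the directory structure, which is why the paper pairs robots.txt with password protection rather than treating it as an access control.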

CONCLUSION

Google is able to produce some astonishing results, which depend very much on the precision of the constructed query. The possibilities for constructing potentially exploitable queries are boundless; the attacker's creativity in composing queries is the only limitation. Google is very effective in profiling an individual, as lots of users have unwittingly disclosed personal information. They are unaware that search engines like Google can collect and index all this information and serve it to anyone with the correct query. If an attacker has no specific target, Google is highly effective in finding vulnerable targets among the mass of indexed websites for random attacks. It is less effective for finding potential vulnerabilities on a specific target, where other penetration testing tools perform much better. The power of Google is a double-edged sword. Attackers have armed themselves with Google to gather pieces of seemingly innocuous organizational information that facilitate further compromise. Administrators should embrace Google as one of their penetration testing tools to protect their organizations from information leakage.

REFERENCES

ADSL Configuration Page (n.d.). Retrieved May 3, 2005, from http://router.breukink.co.uk/

BiLE (2003). Bi-directional Link Extraction. Retrieved April 30, 2005, from http://www.sensepost.com/restricted/BilePublic.tgz

Broward County (2005). OnCoRe Setting Options. Retrieved May 5, 2005, from http://205.166.161.12/OncoreV2/Settings.aspx

Calashain, T. (2003). Google Hacks: 100 Industrial-Strength Tips & Tools. California: O’Reilly.

Central College (n.d.). Retrieved May 5, 2005, from http://enrolme.centralcollege.ac.uk/enrolme/connect.inc

Chambet, P. (2004). Google Attacks. Retrieved May 3, 2005, from http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-chambet/bh-us-04-chambet-google-up.pdf

Chat11 (2004, July 5). Using Google to Find Passwords. Retrieved May 1, 2005, from http://www.chat11.com/How_To_Use_Google_To_Find_Passwords

Cole, E. (2003). Hacker Beware. Singapore: Prentice Hall.

ComSec (2003, May 25). Google A Dream Come True. Retrieved May 3, 2005, from http://www.governmentsecurity.org/comsec/googletut1.txt

CSD Contacts (n.d.). Retrieved May 5, 2005, from http://www.jmls.edu/intranet/csd/contacts.shtml

Curriculum Vitae (n.d.). John Terning – Curriculum Vitae. Retrieved May 3, 2005, from http://t8web.lanl.gov/people/terning/john/cv/cvmain.html

Davies, G. (2004). Advanced Information Gathering. Retrieved May 3, 2005, from http://packetstormsecurity.com/hitb04/hitb04-gareth-davies.pdf


Digitalbrain (2003). Retrieved May 3, 2005, from http://frome.digitalbrain.com/frome/ICT/Digitalbrain%20users/All%20DigitalBrain%20Users.xls

DIS (n.d.). DIS Student Plug-In Network Setup – How-To. Retrieved May 3, 2005, from http://www.dis.unimelb.edu.au/helpdesk/connect.pdf

Employment (2005, March 31). Public Mutual - Employment Opportunity. Retrieved May 3, 2005, from http://www.publicmutual.com.my/page.aspx?name=co-Employment

Google (2003). Google Acquires Deja's Usenet Archive. Retrieved April 28, 2005, from http://groups.google.com/googlegroups/deja_announcement.html

Google Local (2005). Retrieved May 5, 2005, from http://local.google.com/

Granneman, S. (2004, March 9). Googling Up Password. Retrieved May 2, 2005, from http://www.securityfocus.com/columnists/224

Greene, T.C. (2000, June 25). Crackers Use Search Engines to Exploit Weak Sites. Retrieved April 30, 2005, from http://www.theregister.co.uk/2000/06/25/crackers_use_search_engines/

Greene, T.C. (2001, November 28). The Google Attack Engine. Retrieved April 30, 2005, from http://www.theregister.co.uk/2001/11/28/the_google_attack_engine/

Heyerlist (n.d.). Retrieved May 7, 2005, from http://www.heyerlist.org/garderobe/_vti_pvt/service.pwd

IIS 5.1 (2001). Internet Information Services 5.1 Release Notes. Retrieved May 6, 2005, from http://www.aspit.net/iishelp/iis/misc/localhost/iishelp/iis/htm/core/readme.htm

Ipswitch (n.d.). Retrieved May 5, 2005, from http://www.ryerson.ca/~mblee/WS_FTP.ini

Ipswitch (1996). WS_FTP Professional – User Guide. Retrieved April 30, 2005, from http://www.oxinet.co.uk/ipswitch/ws_ftp.pdf

Kernelnewbies (n.d.). Index of /documents/kdoc. Retrieved May 5, 2005, from http://kernelnewbies.org/documents/kdoc/

Lanalana (n.d.). Test Page for Apache Installation. Retrieved May 7, 2005, from http://iolanipalace.org/

Leyden, J. (2005, April 4). Hacking Google for Fun and Profit. Retrieved May 1, 2005, from http://www.securityfocus.com/news/10816

Long, J. (2004, March 19). The Google Hacker’s Guide. Retrieved May 1, 2005, from http://johnny.ihackstuff.com/modules.php?op=modload&name=Downloads&file=index&req=getit&lid=34

Long, J. (2005). Google Hacking for Penetration Testers. United States of America: Syngress Publishing.

Maybank (2005). Maybank MaxiHome Year End Promotion Winners List. Retrieved May 5, 2005, from http://www.maybank2u.com.my/maybank_group/products_services/consumer_loan/maxihome_winners2.shtml

Mohanty, D. (2005, March 11). Demystifying Google Hacks. Retrieved May 2, 2005, from http://www.securitydocs.com/link.php?action=detail&id=3098&headerfooter=no

Mowse (2003, February 16). Google Knowledge: Exposing Sensitive Data with Google. Retrieved May 1, 2005, from http://www.digivill.net/~mowse/code/mowse-googleknowledge.pdf

Mvacs (n.d.). It Worked! The Apache Web Server is Installed on this Web Site!. Retrieved May 7, 2005, from http://mvacs.ess.ucla.edu/


Myers (2005). Microsoft Outlook Web Access – Logon. Retrieved May 8, 2005, from http://mail.dnmyers.edu/exchange/logon.asp

MySQL Dump (n.d.). MySQL Dump 8.22. Retrieved May 4, 2005, from http://www.ozeki.hu/attachments/34/etalon.sql

Nessus Scan Report (n.d.). Retrieved May 8, 2005, from http://www.geocities.com/mvea/debian30r2_install.htm

Phaser 6250 (2003). About Printer – Printer 6250. Retrieved May 2, 2005, from http://140.113.153.116/aboutprinter.html

PhaserLink (1999, November 16). Fwd: Printer Vulnerability: Tektronix PhaserLink Webserver gives Administrator Password. Retrieved May 1, 2005, from http://www.security-express.com/archives/bugtraq/1999-q4/0001.html

Racerx (2005, April). Google Hacking Techniques. Retrieved May 2, 2005, from http://www.exploitersteam.org/forumnews-id30.html

RightVision (n.d.). Serveur Appliance – Software – Right Vision. Retrieved May 10, 2005, from http://www.rightvision.com/lg-fr-rubrique-distributeurs.html

Rutgers (2005). Names. Retrieved May 10, 2005, from http://teachx.rutgers.edu/~mja/wakka/workfiles/int_excel/students.xls

Secunia (2005, March 3). CubeCart Cross-Site Scripting Vulnerabilities. Retrieved April 28, 2005, from http://secunia.com/advisories/14416/

SensePost (2003, February). The Role of Non-Obvious Relationships in the Foot Printing Process. Retrieved April 28, 2005, from http://www.sensepost.com/restricted/BH_footprint2002_paper.pdf

SPC (n.d.). Find Names. Retrieved May 8, 2005, from http://email.spc.edu/exchange/USA/finduser/root.asp?acs=anon

Trivedi, K. (2005, January). Foundstone SiteDigger 2.0 – Identifying Information Leakage Using Search Engines. Retrieved April 1, 2005, from http://www.foundstone.com/resources/whitepapers/wp_sitedigger.pdf

COPYRIGHT

Lih Wern Wong ©2005. The author/s assign the School of Computer and Information Science (SCIS) & Edith Cowan University a non-exclusive license to use this document for personal use provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to SCIS & ECU to publish this document in full in the Conference Proceedings. Such documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the express permission of the authors.