October 8, 2024 | 14 min read

Google Dorking in Cybersecurity


Every information security specialist eventually faces the need to uncover well-hidden resources. While one approach might be to manually explore the target company’s website in hopes of finding something useful, a more efficient method is to use Google Dorks. In this article, we’ll explore how Dorking can be leveraged in information security, provide a few examples, and introduce some alternative techniques.

Basic Concepts

Before diving into dorks directly, let’s cover some brief theoretical background. Experienced users may choose to skip this section, but for those who have only a vague understanding of what Google Dorking entails, we recommend giving it a read.

What is Google Dorking?

Google Dorking, also known as Google Hacking, is a technique for crafting search queries on Google using specific keywords (dorks) and logical operators to access particular sites, file types, or pages.

With Google Dorking, you can search target sites for files with specific extensions, pages with particular titles, or specific keywords within URLs. This technique is highly useful for OSINT investigations, reconnaissance during penetration tests, or even for satisfying personal curiosity.

Don’t try to access resources you don’t have rights to!
Although Google Dorking is a completely legal technique, it can lead to the discovery of confidential information or pages that are not intended for public access. We strongly advise you to stay within legal boundaries and refrain from investigating any resources to which you do not have authorized access.

Most Valuable Dorks

The list of dorks that can be used in Google search is quite extensive. However, from an information security perspective, not all of them are relevant—after all, checking the weather (using the dork “weather:CityName”) isn’t typically part of a security assessment. In this section, I’ll provide a list of search queries that are most valuable for cybersecurity specialists.

cache

enables you to access Google-cached versions of sites that may currently be unavailable.

It functions similarly to using the Wayback Machine but allows you to explore a wider range of websites. Usage example: "cache:netlas.io".

Using cache dork

filetype

allows you to define the file extension to search for.

Some of the possible extensions include PDF, PS, DWF, KML, KMZ, XLS, PPT, DOC, RTF, and SWF. Files not covered in this list can be located using combinations of other dorks, which will be discussed in the use cases section.

Using filetype dork

intext
pages whose body text contains the keyword.
allintext
pages whose body text contains all of the specified words.

These two dorks work much like a regular Google search, with one difference: intext/allintext return only pages where the keywords appear in the body, whereas a regular search (including a query in quotation marks) also returns pages where the keyword appears only in the title. To match an exact sequence of words, put the phrase in quotation marks.

Using intext dork

Using quotation marks

intitle
search for a specific word in the page title.
allintitle
search for a specific combination of words in the page title.

This is a highly useful and commonly used dork, enabling you to find web interfaces, articles, login pages, and more.

Using intitle dork

inurl
searches for a specific word in the page URL.
allinurl
searches for a combination of certain words in the page URL.

This dork is equally valuable, as it helps locate various technical pages (like login portals) and services within a specific domain.

Using inurl dork

site
limits search results to a specific website and its subdomains, allowing you to concentrate solely on the resources of the target company without being distracted by unrelated content.

Using site dork

Logical Operators

When working with dorks, you can combine multiple queries into more complex constructs by using logical operators. The AND operator combines two dorks, returning only results that meet both conditions; Google also treats terms separated by a space as an implicit AND. The OR operator returns results that satisfy at least one of the specified queries, which is particularly useful when you have multiple domains within your scope. You can also use the & and | symbols in place of AND and OR.

Using two dorks with AND

Using two dorks with OR

The - symbol acts as a NOT operator. By placing it before a query, you can exclude results that match that query from your search results. For example, you can use it to filter out specific domains or pages.

Using dorks with NOT
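To make the operator rules above concrete, here is a minimal sketch of a query builder. The helper names (dork, all_of, any_of, none_of) are my own and not part of any library, and the parentheses around OR groups are an assumption added for readability; how strictly Google honors them is not guaranteed.

```python
def quote(term: str) -> str:
    """Wrap a term in double quotes if it contains a space."""
    return f'"{term}"' if " " in term and not term.startswith('"') else term


def dork(operator: str, value: str) -> str:
    """Build a single operator:value dork, e.g. intitle:"Blue Iris Login"."""
    return f"{operator}:{quote(value)}"


def all_of(*parts: str) -> str:
    """AND: space-separated terms are treated as a conjunction."""
    return " ".join(parts)


def any_of(*parts: str) -> str:
    """OR: grouped in parentheses so it can be combined with other terms."""
    return "(" + " OR ".join(parts) + ")"


def none_of(*parts: str) -> str:
    """NOT: prefix every term with a minus sign."""
    return " ".join(f"-{part}" for part in parts)


# Two in-scope domains, login pages only, blog subdomain excluded.
query = all_of(
    any_of(dork("site", "example.com"), dork("site", "example.org")),
    dork("inurl", "login"),
    none_of(dork("site", "blog.example.com")),
)
print(query)
```

The resulting string can be pasted into the search bar as-is or passed to a script.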

Use Cases

So, let’s move on to use cases. In this section, we’ll explore the most common applications of Google Dorks in the context of information security, along with some practical examples.

OSINT

OSINT (Open Source Intelligence) involves gathering data about a company, individual, or other target by working with publicly accessible sources. The primary tool for an OSINT investigator is the Internet, including search engines. By leveraging Google Dorks, a researcher can customize search queries to efficiently locate the required information.

Mentions Search

The first and most straightforward application of Google Dorks in OSINT is searching for references to a person or company across the Internet. All you need is the name or nickname of your target. Here are some example dorks that can be utilized for this purpose:

  • "John Doe"
  • allintitle:"John Doe"
  • allintext:"John Doe"
  • inurl:"NickName"

However, the mentions might be too numerous or irrelevant. In such cases, it makes sense to narrow the scope of your search. For instance, if you want to find a person’s LinkedIn account, you can refine your query accordingly.

allintext:"John Doe" & site:linkedin.com

Similarly, you can use Google Dorks to find profiles on platforms like Twitter, Reddit, and others by refining your search terms.

By combining the target’s name with additional keywords, you can search for contact information, such as:

  • "John Doe" "contact information"
  • allintext:"John Doe" AND intext:"phone number"

By using the various dorks described in the first section, you can gather all relevant mentions of the company or person you’re investigating.
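If you run these searches often, the list of mention queries can be generated rather than typed. The following is only a sketch: mention_queries is a hypothetical helper, and the site and keyword lists are examples you would replace with your own.

```python
def mention_queries(name, sites=(), keywords=()):
    """Return the mention dorks from this section for one name or nickname."""
    quoted = f'"{name}"'
    queries = [quoted, f"allintitle:{quoted}", f"allintext:{quoted}"]
    # Narrow the search to specific platforms (LinkedIn, Reddit, ...).
    queries += [f"{quoted} site:{site}" for site in sites]
    # Combine the name with extra keywords such as contact details.
    queries += [f'{quoted} "{keyword}"' for keyword in keywords]
    return queries


for q in mention_queries("John Doe", sites=["linkedin.com"], keywords=["phone number"]):
    print(q)
```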

Related Documents Search

The next use case involves searching for documents related to your target. As you may have guessed, we’re going to focus on the filetype dork.

Let’s begin by searching for documents on a specific website. I’ll use arxiv.org as an example, as it often contains a large number of PDFs:

site:arxiv.org filetype:pdf

This returns the following results:

Searching for documents on specific site

Now, if you want to find documents that mention a specific person or organization, you can use the following dork:

"John Doe" filetype:docx

This way, Google will return all DOCX files that mention the long-suffering John.

Searching for documents with someone’s mention

Unfortunately, as previously mentioned, Google doesn’t support dorks for searching many file extensions. The method for handling this limitation will be described in the next use case.

Webcam Search

Often, during your research, you may want to explore a location without physically visiting it. For this purpose, you can utilize online maps (such as Google Maps) or access live cameras. Google Dorks are particularly effective for locating these cameras.

To find specific camera models or the software that runs them, you can use the following dorks:

  • Android IP Webcam - inurl:"videomgr.html"
  • AXIS - intext:"To use the Axis web application, enable JavaScript"
  • Blue Iris - allintitle:"Blue Iris Login"
  • etc.

You can find more dorks for locating cameras in the article dedicated to this topic.

Penetration Testing

The second use case we’ll cover in this article is penetration testing, specifically the reconnaissance phase before a pentest. With Google Dorks, you can easily search for potential entry points, such as login pages, services on a domain, or input forms. Let’s examine each of these use cases in more detail.

API Endpoints Search

When testing any web application, the API is often one of the most critical entry points. If developers haven’t sufficiently secured endpoints or have left restricted ones exposed, this can significantly compromise the application’s security. To find such endpoints, you can use one of the following dorks:

  • site:example.com inurl:api
  • site:example.com inurl:schema

Netlas endpoints search

Potential XSS

To execute an XSS injection, an attacker needs to find a vulnerable input form where code can be inserted. Such input is often reflected in the URL of the page, making it easier to detect with Google Dorks.

  • site:example.com inurl:q=
  • site:example.com inurl:s= | inurl:query=
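Once a dork returns a list of URLs, you still need to pick out those that actually carry a reflected parameter. Below is a small sketch using only the standard library; the set of parameter names mirrors the dorks above, and reflected_params is a hypothetical helper name. It only flags candidates for manual testing and does not detect XSS by itself.

```python
from urllib.parse import parse_qs, urlsplit

# Parameter names taken from the dorks above; extend as needed.
CANDIDATE_PARAMS = {"q", "s", "query"}


def reflected_params(url, candidates=CANDIDATE_PARAMS):
    """Return the sorted candidate parameter names present in the URL query."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps parameters such as "?s=" with no value.
    params = parse_qs(query, keep_blank_values=True)
    return sorted(set(params) & set(candidates))


for url in ["https://example.com/search?q=test&page=2", "https://example.com/about"]:
    print(url, reflected_params(url))
```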

Sensitive Files

The topic of file searching was previously covered in the OSINT section. However, as mentioned earlier, not all file extensions can be located using the filetype keyword. During a pentest, you may want to find “technical” files, such as logs, environment settings, and more. To discover these files, you can use a dork like the following:

  • intitle:"Index of" ".env" site:example.com
  • "parent directory" ".log" site:example.com

Env files search
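The same pattern extends to other "technical" extensions. Here is a rough sketch that generates the directory-listing dorks and then checks whether a returned URL really points at such a file. The suffix list and both function names are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Suffixes from the examples above; add .bak, .sql, etc. for your own scope.
SENSITIVE_SUFFIXES = (".env", ".log")


def sensitive_file_queries(domain, suffixes=SENSITIVE_SUFFIXES):
    """Build one "Index of" dork per suffix for the given domain."""
    return [f'intitle:"Index of" "{suffix}" site:{domain}' for suffix in suffixes]


def is_sensitive_url(url, suffixes=SENSITIVE_SUFFIXES):
    """True if the URL path (query string ignored) ends with a listed suffix."""
    path = urlsplit(url).path.lower()
    return path.endswith(tuple(suffixes))


print(sensitive_file_queries("example.com"))
print(is_sensitive_url("https://example.com/app/.env"))
```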

Login and Services Pages

Equally important is finding authorization pages and domain services with distinct URLs. These dorks are straightforward to use—you just need to know what elements to look for in the page URL.

  • inurl:login site:example.com
  • inurl:signin site:example.com

Login page search

Automation

Now that we’ve covered the most popular use cases, we can move on to the most exciting part of this article—automating work with Google Dorks. Manually entering these dorks for each target can be tedious, especially if you have a large scope. In the following section, I’ll create the skeleton of a script that you can easily modify and use in practical scenarios.

First Scripts

First, you need to choose a programming language and the necessary libraries. For this example, I used Python 3.11, and the main module will be googlesearch. To install it, enter the following command in your terminal:

python3 -m pip install googlesearch-python

Great! Now, you can import the module into your project and make your first Google search request directly from your code:

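from googlesearch import search

# advanced=True yields result objects (URL, title, description) instead of bare URLs
results = search("netlas", advanced=True)

for result in results:
    print(result)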

Here I searched for the word “netlas”. The advanced parameter is needed to get full details for each result: URL, title, and description. The result of running the script is shown in the following image.

First script results

A good start! Now that we can retrieve results from Google search without a browser, let’s enhance the script’s functionality. We need to add more complexity to the query by incorporating different patterns and create a simple input system to allow the user to specify a scope.

Complex Patterns

While writing this article, I came across a list of Google Dorks for bug bounty compiled by Mike Takashi. You can prepare your own patterns, but as an example I will take queries from this list, since they cover a large share of a pentester’s needs.

Input/Output

Now that we have figured out what to do and roughly how to do it, we need to decide what data the script will work with. In my version of the script (simplified and educational), the input consists of the following:

  • File with domain names for research.
  • Output format (YAML/JSON).
  • Number of results collected for one request.

The script will output the data to the console so that the user can redirect the output stream to any desired location.
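The three inputs listed above map naturally onto a command-line parser. What follows is a minimal sketch with argparse; the argument names (domains_file, --format, --count) and the rule of skipping blank and comment lines are my assumptions and may differ from the full script in the repository.

```python
import argparse


def build_parser():
    """Command-line interface for the three inputs described above."""
    parser = argparse.ArgumentParser(
        description="Run Google dorks against a list of domains."
    )
    parser.add_argument("domains_file", help="text file with one domain per line")
    parser.add_argument("-f", "--format", choices=["json", "yaml"], default="json",
                        help="output format")
    parser.add_argument("-n", "--count", type=int, default=10,
                        help="number of results collected per query")
    return parser


def read_domains(lines):
    """Strip whitespace and skip blank lines and '#' comments."""
    domains = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            domains.append(stripped)
    return domains


# Parsing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["domains.txt", "--format", "yaml", "-n", "5"])
print(args.domains_file, args.format, args.count)
```

Because results go to standard output, redirecting them to a file is left to the shell.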

What format will the data be output in? It will be a dictionary of dictionaries, looking something like this:

Domain_Name_1:       
  | Search_Pattern_1:        
      | URL_1: Description_1         
      | URL_2: Description_2        
      ...         
      | URL_k: Description_k         
  | Search_Pattern_2:        
      | ...           
Domain_Name_2:
  | ...

This format lets us structure the data clearly, and it can be easily serialized to JSON or YAML.
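As an illustration, the nested dictionary can be rendered in either format with a few lines. This is a sketch, not the script's actual output code: render is a hypothetical name, the sample data is invented, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json


def render(results, fmt="json"):
    """Serialize the domain -> pattern -> {url: description} dictionary."""
    if fmt == "json":
        return json.dumps(results, indent=2, ensure_ascii=False)
    if fmt == "yaml":
        import yaml  # third-party PyYAML, imported only when requested
        return yaml.safe_dump(results, allow_unicode=True, sort_keys=False)
    raise ValueError(f"unsupported format: {fmt}")


# Invented sample data in the structure described above.
sample = {
    "example.com": {
        "API Endpoints": {"https://example.com/api/v1": "API reference"},
        "Login Pages": {},
    }
}
print(render(sample, "json"))
```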

Main Request Function

Next, we will write the main function that will be responsible for requests. In the first version of the script, I made many such functions that were as similar as two peas in a pod. However, this is bad form, so there will be nothing like that here. We will limit ourselves to just this:

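def oneRequest(query, count):
    # Maps each result URL to its description for a single query
    responseDict = {}

    # num_results caps how many results are collected for this query
    j = search(query, advanced=True, num_results=count)

    for i in j:
        responseDict[i.url] = i.description

    return responseDict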

In this function, we create a dictionary of the lowest level, where we enter key:value pairs. The URLs of pages corresponding to the request will act as keys, and their descriptions will act as values. The maximum number of values, as already mentioned, is fed to the script as input.

Queries Hub

Next, we need to pass queries to the function from the previous section. I decided to keep it as simple as possible and pass them this way:

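import copy
import time

def functionHub(site, resultsCount):
    # Collects the results of every dork pattern for one domain
    domainDict = {}

    domainDict["API Endpoints"] = copy.deepcopy(oneRequest("site:" + site + " inurl:api | site:*/rest | site:*/v1 | site:*/v2 | site:*/v3", resultsCount))
    # Pause between queries to reduce the chance of being rate-limited
    time.sleep(10)
    #...

    return domainDict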

Here we create a dictionary that will contain all the information about the domain. The results of each query (URLs and their descriptions) are stored as values of this dictionary under the pattern's name. Between oneRequest() calls I set a 10-second pause to reduce the risk of Google blocking your IP address for sending dorks too frequently.
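As the number of patterns grows, one line per pattern becomes hard to maintain. One possible alternative, sketched here, keeps the patterns in a dictionary and loops over them. The pattern table is a shortened example rather than the full bug bounty list, and run_patterns with its request and sleep parameters is my own naming; the request function is passed in so the loop can be tried without touching Google.

```python
import time

# Shortened example table: pattern name -> query template.
PATTERNS = {
    "API Endpoints": "site:{site} inurl:api",
    "Login Pages": "site:{site} inurl:login",
}


def run_patterns(site, count, request, patterns=PATTERNS, delay=10, sleep=time.sleep):
    """Run every pattern for one domain, pausing between consecutive queries."""
    results = {}
    for index, (name, template) in enumerate(patterns.items()):
        if index:  # no pause before the first query
            sleep(delay)
        results[name] = request(template.format(site=site), count)
    return results


def fake_request(query, count):
    """Stand-in for oneRequest() so the sketch runs offline."""
    return {f"https://example.com/?demo={query}": f"top {count} results"}


print(run_patterns("example.com", 5, fake_request, delay=0))
```

In the real script you would pass oneRequest as the request argument and keep the default delay.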

Fighting Blockages

As I mentioned in the previous point, Google will block your IP if you try to make queries containing dorks too often. It will look something like this:

Error 429 in script

Unpleasant. And what’s even more unpleasant is that this problem is not solved by timeouts. More than 10 seconds passed between my requests, but this did not prevent Google from blocking me.

However, there is a solution. You can use a rotating proxy and change IP addresses before the search engine considers the script’s activity suspicious. I will not provide instructions for setting up this mechanism in this article; I will only mention that the library we use supports proxies. In code, it looks something like this:

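from googlesearch import search

# Placeholder proxy URL: substitute your provider's credentials and host
proxy = 'http://API:@proxy.host.com:8080/'

# ssl_verify=False disables TLS certificate verification for requests sent through the proxy
j = search("proxy test", num_results=100, lang="en", proxy=proxy, ssl_verify=False)
for i in j:
    print(i)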
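If you have several proxies rather than one rotating endpoint, a retry wrapper can switch to the next proxy and wait longer after each failure. This is only a sketch under stated assumptions: with_retries is a hypothetical helper, the proxy list must not be empty, the request function is expected to accept (query, count, proxy), and catching the broad Exception is a simplification, since the exact error raised on a 429 depends on the library version.

```python
import itertools
import time


def with_retries(request, query, count, proxies, attempts=3, base_delay=30, sleep=time.sleep):
    """Try a query through successive proxies with exponential backoff."""
    pool = itertools.cycle(proxies)  # assumes a non-empty proxy list
    last_error = None
    for attempt in range(attempts):
        proxy = next(pool)
        try:
            return request(query, count, proxy)
        except Exception as error:  # simplification: narrow this in real code
            last_error = error
            if attempt < attempts - 1:
                # Wait 30 s, 60 s, 120 s, ... before the next proxy.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

A request function for the library used here could be a thin wrapper that forwards the proxy argument to search().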

Final

I won’t describe the rest in detail, since only technical nuances remain. We need to set up an argument parser to run the script from the command line, and to write the functionality for processing the input file and outputting results. You can find the full script code in our GitHub repository.

Finally, let’s look at the results. As a test subject, I chose the site google-gruyere.appspot.com, which was intentionally made vulnerable to train information security specialists. The results of running the script are shown in the following image.

Script results

You may notice that the results are somewhat limited… and you would be correct. My own instance of Google Gruyere doesn’t last long, and testing the script on live, functioning sites can pose security risks and ethical concerns. As a result, the example above serves more as a demonstration of the code’s functionality than as an operational success.

If you decide to test this script in your work and share your feedback in the comments, I would be extremely grateful!

Additional Capabilities

This concludes the part of the article dedicated to Google Dorks. They are a very powerful tool that allows you to detect rather non-obvious objects using only a Google search. However, you can supplement your search results by using other search engines, for example, Netlas.io.

Even with a less extensive data set, Netlas offers powerful search capabilities. It lets you both replace some of the dorks with equivalent Netlas queries and create entirely new ones. For example, the replacements might look like this:

  • intitle -> http.title
  • intext -> http.body
  • etc.

Here is an example of similar requests in Netlas:

Search by title in Netlas
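The mapping above is mechanical enough to script. The sketch below covers only the two operators listed in this section and assumes that the Netlas query keeps the same field:value shape as the Google dork; to_netlas is a hypothetical helper, so check the output against the Netlas documentation before relying on it.

```python
# Only the two replacements named in this section.
GOOGLE_TO_NETLAS = {
    "intitle": "http.title",
    "intext": "http.body",
}


def to_netlas(dork):
    """Translate a single operator:value Google dork into a Netlas field query."""
    operator, separator, value = dork.partition(":")
    if not separator or operator not in GOOGLE_TO_NETLAS:
        raise ValueError(f"no Netlas equivalent known for: {dork}")
    return f"{GOOGLE_TO_NETLAS[operator]}:{value}"


print(to_netlas('intitle:"Blue Iris Login"'))
```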

However, this kind of search is not very interesting on its own. The additional capabilities of Netlas are more valuable. These include:

  • Search by favicon.
  • Search by software used on the server.
  • Search by protocol.
  • Search by SSL certificate fields.
  • Search by WHOIS data.
  • etc.

Search examples:

Search by favicon in Netlas

Search by software in Netlas

Search by SSL certificate fields in Netlas

Using these and thousands of other queries, you can greatly enrich the data obtained with Google Dorks, especially when it comes to pentesting or mapping an attack surface.

Another advantage of Netlas and similar search engines (for example, Shodan or Censys) is that they ignore the restrictions imposed on search robots. For instance, Google does not crawl resources that are disallowed in the robots.txt file. IoT search engines do not honor these rules, which is why they may store results that cannot be found through Google Search.
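To see which paths a rule-abiding crawler would skip, and therefore where an IoT search engine may know more than Google, you can evaluate a robots.txt file with Python's standard library. The following sketch parses rules from a list of lines so it runs offline; the robots.txt content and URLs are invented, and blocked_for is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def blocked_for(agent, robots_lines, urls):
    """Return the URLs that the given user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Invented robots.txt content for illustration.
robots = [
    "User-agent: *",
    "Disallow: /admin/",
]
urls = [
    "https://example.com/",
    "https://example.com/admin/login",
]
print(blocked_for("Googlebot", robots, urls))
```

Paths reported here are good candidates to look up in Netlas, Shodan, or Censys for targets within your scope.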

Conclusion

In this article, we looked at using Google Dorks both manually and automatically, and also touched on one possible alternative. To conclude, here is a brief summary of the main points:

  • Google Dorking is a powerful tool that allows you to accomplish many tasks using just a search bar.
  • Using dorks, you can carry out reconnaissance before pentests, as well as conduct OSINT investigations.
  • Working with dorks can be automated in the form of scripts.
  • Other Internet search engines, such as Netlas, Shodan, Censys, etc., can serve as an alternative or complement to Google Dorks.

Finally, I would like to remind you once again that while Google Dorks are a legal tool, they can easily allow you to access sensitive files or web pages that are not intended for public use. Be careful and remember to follow the laws.
