theHarvester: a Classic Open Source Intelligence Tool

June 25, 2025

9 min read

Learn practical OSINT techniques, installation tips, and real-world use cases for theHarvester to enhance your cybersecurity reconnaissance workflow.

Imagine having the ability to efficiently collect valuable intelligence from a wide range of platforms without the hassle of switching between tools—this is precisely what makes theHarvester stand out in the world of cyber reconnaissance.

While some premium tools offer similar features, few match the flexibility and openness of theHarvester. As an open-source solution, it is user-friendly and can extract data from numerous sources, both free and premium. One of its most notable features is integration with a large number of tools and APIs, which lets users gain deeper insights during security assessments.

Exploring theHarvester’s Reach and Its Value in Cybersecurity

theHarvester, a tool crafted by the Edge-Security team, is a key utility for open-source intelligence (OSINT) gathering, helping professionals identify and assess potential security risks. Built on Python and designed for command-line use, it assists in the initial phase of cyber investigations, offering insights into an organization’s exposure on the internet.

Originally intended for penetration tests and red team operations, theHarvester has evolved into a versatile tool used across different cybersecurity teams. Its passive data collection capabilities provide essential insights for both offensive and defensive security efforts. Its ease of integration with various data sources also helps streamline reconnaissance.

Here are the passive and active reconnaissance sources leveraged by theHarvester, with a short note on each:

Passive Reconnaissance Sources:

  • Baidu: Useful for gathering intelligence from Chinese-language websites indexed by China’s largest search engine.
  • Bing: Leverages Microsoft’s search index to discover publicly available domain information.
  • dnsdumpster: A popular DNS reconnaissance tool that provides data about subdomains and IP addresses.
  • DuckDuckGo: Offers privacy-focused search results that help track less obvious digital footprints.
  • Google: The most widely used search engine, giving access to a vast amount of indexed data and assets.
  • Hunter.io: A powerful tool for finding email addresses associated with specific domains.
  • Qwant: A European-based search engine that helps diversify the range of publicly available information.
  • Netlas: A search engine for internet-connected devices, similar to Shodan and Censys. You can read more about using theHarvester with the Netlas module in the corresponding article.
  • SecurityTrails: Provides comprehensive data about domains, IP addresses, and DNS records, making it an essential OSINT resource.
  • Shodan: A search engine for finding exposed devices connected to the internet, particularly useful for cybersecurity monitoring.
  • Trello: Offers insights into publicly accessible project management boards that could reveal sensitive information.
  • Twitter: A goldmine for discovering real-time information about companies, employees, and security events.

Active Reconnaissance Sources:

  • DNS Brute-force: theHarvester performs dictionary-based brute-force attempts to discover subdomains that might not be publicly indexed.
  • Screenshots: The tool can automatically capture images of discovered subdomains, providing a visual confirmation of any exposed resources.
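To make the brute-force idea concrete, here is a minimal sketch of dictionary-based subdomain discovery in Python. It is not theHarvester’s own implementation: the function names and the use of the standard library’s socket.gethostbyname as the resolver are our own assumptions for illustration.

```python
import socket
from typing import Callable, Iterable, Optional


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Return one IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):  # gaierror is an OSError; bad labels raise UnicodeError
        return None


def brute_force_subdomains(
    domain: str,
    wordlist: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve_ipv4,
) -> dict[str, str]:
    """Try word.domain for every word and keep the names that resolve."""
    found: dict[str, str] = {}
    for word in wordlist:
        word = word.strip().lower()
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments in the wordlist
        candidate = f"{word}.{domain}"
        address = resolver(candidate)
        if address is not None:
            found[candidate] = address
    return found
```

For example, brute_force_subdomains("example.com", ["www", "mail", "test"]) returns only the candidates that resolve. The resolver is a parameter, so you can test the loop with a stub function and send no DNS queries at all.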

These reconnaissance capabilities make theHarvester a valuable tool for understanding the scope of a company’s internet exposure. Security teams can use it to secure their own networks or to assess vulnerabilities.


Installing theHarvester

There are several ways to set up theHarvester, making it adaptable to various environments. The four primary methods of installation include:

  • Kali Linux: Pre-installed and ready to use
  • Docker: Quick and containerized deployment
  • Installing from Source (without Pipenv): A manual method for a more customized setup
  • Installing from Source (with Pipenv): Uses a Python package manager for a streamlined environment

For this tutorial, we will focus on the third option: setting up theHarvester from source without Pipenv. When installing new software, it’s recommended to use a secure environment such as a virtual machine (VM), a container, or an isolated test server. Here we used Ubuntu 20.04 and macOS Sequoia 15.5. The instructions should work on most Debian-based systems, and other systems need only minor adjustments.

Start by preparing the system and installing necessary dependencies:

sudo apt update
sudo apt upgrade
sudo apt install git python3-venv

Next, we create a Python virtual environment to isolate the required dependencies:

python3 -m venv harvest
cd harvest/
source bin/activate
git clone https://github.com/laramies/theHarvester
cd theHarvester/
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install .

You can also use Poetry to install the dependencies:

pip install poetry
poetry install

Note, however, that the pyproject.toml file does not specify a project version, which can cause errors when Poetry installs the dependencies. To avoid them, add the version to the file manually.
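Before running poetry install, you can check whether the file actually declares a version. The sketch below is our own helper and is not part of theHarvester or Poetry. The script name check_version.py is hypothetical. The helper inspects only the [project] and [tool.poetry] tables and needs Python 3.11+ for the standard-library tomllib module.

```python
import sys
from pathlib import Path


def version_status(pyproject: dict) -> str:
    """Classify how a parsed pyproject.toml declares its version.

    Returns "static" if [project] or [tool.poetry] sets a version,
    "dynamic" if [project].dynamic lists "version", otherwise "missing".
    """
    project = pyproject.get("project", {})
    poetry = pyproject.get("tool", {}).get("poetry", {})
    if project.get("version") or poetry.get("version"):
        return "static"
    if "version" in project.get("dynamic", []):
        return "dynamic"
    return "missing"


# Only runs when a path is passed, e.g. python check_version.py pyproject.toml
if __name__ == "__main__" and len(sys.argv) > 1:
    import tomllib  # standard library since Python 3.11

    with Path(sys.argv[1]).open("rb") as fh:  # tomllib needs a binary file
        status = version_status(tomllib.load(fh))
    print(f"version: {status}")
    if status != "static":
        print('add a version = "..." line under [project] '
              '(and drop "version" from dynamic, if present)')
```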

After completing the installation, you can verify it by running the following command:

python theHarvester.py -h

This will display the help menu and confirm that the tool is ready for use.

Lastly, don’t forget to configure the API keys that certain modules need. theHarvester stores them in a YAML file called api-keys.yaml. For testing, we’ll use free or freemium APIs. You can add their keys with a text editor such as nano or vim:

cd theHarvester/data
nano -w api-keys.yaml

This step ensures that theHarvester can access those APIs and perform OSINT tasks efficiently.
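With dozens of sources in the file, it is easy to lose track of which keys you have filled in. The sketch below lists configured and empty sources. Three things are assumptions. The file is expected to follow the layout apikeys → source name → fields such as key, with an empty field parsed as None. PyYAML is expected to be importable. The script name check_keys.py is hypothetical.

```python
import sys
from pathlib import Path


def configured_sources(config: dict) -> tuple[list[str], list[str]]:
    """Split sources into (configured, missing) lists, sorted by name.

    Assumes the layout apikeys -> source -> {field: value}; a source counts
    as configured only when every one of its fields has a non-empty value.
    """
    configured, missing = [], []
    for source, fields in sorted((config.get("apikeys") or {}).items()):
        values = list((fields or {}).values())
        if values and all(v not in (None, "") for v in values):
            configured.append(source)
        else:
            missing.append(source)
    return configured, missing


# Only runs when a path is passed, e.g. python check_keys.py ~/.theHarvester/api-keys.yaml
if __name__ == "__main__" and len(sys.argv) > 1:
    import yaml  # PyYAML; assumed to be present in theHarvester's environment

    with Path(sys.argv[1]).expanduser().open() as fh:
        ok, todo = configured_sources(yaml.safe_load(fh) or {})
    print("configured:", ", ".join(ok) or "none")
    print("no key yet:", ", ".join(todo) or "none")
```

Point it at whichever copy of the file your installation reads. In the output shown later in this article, that copy is ~/.theHarvester/api-keys.yaml.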

How to Install theHarvester on macOS

The method described above also works on macOS. The tool can also be installed much more easily with the Homebrew package manager:

brew install theharvester

In this case, the api-keys.yaml file is located in the /Users/user/.theHarvester directory (replace user with your username).

How to Use theHarvester as a Free OSINT Tool for Open Source Intel

In this demonstration, we’ll leverage theHarvester to dig deeper into publicly available data by performing a basic search on a domain using various data sources.

For these tests, we start with a simple query for the domain “target[.]com”. In every command, -d sets the target domain, -b selects the data source, and -l 50 limits the results to 50 entries.

Leveraging the SecurityTrails API with theHarvester

To enhance our reconnaissance, we can combine theHarvester with third-party services. Let’s run the following command to gather data about the domain “target[.]com” using the SecurityTrails API as the data source:

python theHarvester.py -d target.com -l 50 -b securityTrails

Here we hit a problem. theHarvester found one IP address associated with the target domain, but an error in processing the API response meant the address itself was never shown. The utility returned the following:

[*] Target: target.com 

Read api-keys.yaml from /Users/user/.theHarvester/api-keys.yaml
	Done Searching Results
[*] Searching SecurityTrails. 

[*] IPs found: 1
-------------------
An exception has occurred while adding: _rank to ip_list: failed to detect a valid IP address from '_rank'

[*] No emails found.

[*] No people found.

[*] No hosts found.

Not the result we hoped for, but other sources will provide data later. For now, keep in mind that in our test the SecurityTrails module had a small bug in how it handled the API response.
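The error message suggests that a field name from the API response (_rank) was placed in the IP list instead of an actual address. We have not traced theHarvester’s code, so treat this reading as a guess. The sketch below shows how a defensive filter built on Python’s standard ipaddress module would set such tokens aside instead of failing. The function name is our own.

```python
import ipaddress


def split_valid_ips(values: list[str]) -> tuple[list[str], list[str]]:
    """Return (valid, rejected): real IPv4/IPv6 addresses vs. anything else."""
    valid, rejected = [], []
    for value in values:
        try:
            # ip_address() raises ValueError for tokens such as "_rank".
            valid.append(str(ipaddress.ip_address(value.strip())))
        except ValueError:
            rejected.append(value)
    return valid, rejected
```

For example, split_valid_ips(["_rank", "54.230.228.20"]) returns (["54.230.228.20"], ["_rank"]). The address is kept and the stray field name is reported separately.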

How to Choose an OSINT Tool

The above result highlights an important aspect of OSINT tools: the availability of data can vary across different platforms. Sometimes, a particular source might not have information related to a specific domain, especially if the domain is not widely discussed or indexed on that platform. This is a natural limitation in the world of open-source intelligence, but it also underscores the flexibility of theHarvester, which allows users to continuously explore other data sources in pursuit of valuable information.

Here are a few points to consider when working with theHarvester and other OSINT tools:

  • Data Source Availability: Not all data sources will have information on every domain, and some platforms may have more extensive records than others.
  • Flexible Data Integration: theHarvester lets you experiment with a variety of data providers, increasing the chances of uncovering useful intelligence.
  • Open Source Contributions: As an open-source project on GitHub, theHarvester is constantly being improved, and additional data sources or bug fixes can be contributed by the community.

Even with some sources not yielding results, theHarvester’s open-source nature ensures that it remains a powerful and adaptable tool in any OSINT investigation.

Testing Other Data Sources with theHarvester

To start, we used theHarvester to perform an OSINT search for the domain “target[.]com” through the UrlScan API, limiting the results to 50 entries:

python theHarvester.py -d target.com -l 50 -b urlscan

This time the result was much more interesting. We found the following resources associated with the domain: 3 ASNs, 3 interesting URLs, and 24 IP addresses.

[*] Target: target.com 

[*] Searching Urlscan. 

[*] ASNS found: 3
--------------------
AS16509
AS27589
AS54113

[*] Interesting Urls found: 3
--------------------
https://678doo7bt5p9mx7n.u5hqmkfdt8rxif1w.www.fastly.gcp.ephemeral.target.com/
https://www.target.com/
https://www.target.com/bp/hyde%2Band%2Beek%21%2Bboutique%E2%84%A2

[*] IPs found: 24
-------------------
108.138.128.114
108.138.128.14
108.138.128.6
...
3.167.99.21
54.230.228.105
54.230.228.20
54.230.228.57

[*] No emails found.

[*] No people found.

[*] Hosts found: 0
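With two dozen addresses, it helps to see which ranges they fall into. The minimal sketch below assumes you have copied the IPv4 addresses from the output into a Python list. It groups them into /24 networks with the standard ipaddress module. The /24 boundary is an arbitrary choice for illustration and says nothing about who owns a range. The ASNs above are the better guide to ownership.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(ips: list[str], prefix: int = 24) -> dict[str, list[str]]:
    """Group IPv4 addresses by the network of the given prefix length."""
    groups: dict[ipaddress.IPv4Network, list[str]] = defaultdict(list)
    for ip in ips:
        # strict=False lets a host address stand in for its network.
        network = ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)
        groups[network].append(ip)
    # Sort numerically by network address rather than as strings.
    return {str(net): members for net, members in sorted(groups.items())}
```

Fed the seven addresses printed above, this returns three groups: 3.167.99.0/24, 54.230.228.0/24 and 108.138.128.0/24.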

Sources like Urlscan are good at identifying the IP addresses and network infrastructure (ASNs) linked to a domain, although this particular run returned no hosts. One of theHarvester’s major advantages is its access to a wide range of data sources, so we extended our tests to additional third-party services.

Next, we tried using ThreatMiner and RapidDNS:

python theHarvester.py -d target.com -l 50 -b threatminer
python theHarvester.py -d target.com -l 50 -b rapiddns
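If you test many sources one by one, a small wrapper saves typing. The sketch below runs the same command once per source and keeps going if a run fails. Some details are assumptions. The script is expected to run from the repository root inside the virtual environment. The script name run_sources.py is hypothetical. We have not checked whether theHarvester exits with a non-zero code when a single module fails, so the return-code check is best effort.

```python
import subprocess
import sys


def build_command(domain: str, source: str, limit: int = 50,
                  script: str = "theHarvester.py") -> list[str]:
    """Argument list for one run, mirroring the -d/-l/-b options used above."""
    return [sys.executable, script, "-d", domain, "-l", str(limit), "-b", source]


def run_sources(domain: str, sources: list[str], limit: int = 50) -> None:
    """Run theHarvester once per source, continuing even if one run fails."""
    for source in sources:
        cmd = build_command(domain, source, limit)
        print("[>]", " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            # Report the failure and move on to the next source.
            print(f"[!] {source} exited with code {result.returncode}:",
                  result.stderr, file=sys.stderr)


# e.g. python run_sources.py target.com threatminer rapiddns
if __name__ == "__main__" and len(sys.argv) > 2:
    run_sources(sys.argv[1], sys.argv[2:])
```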

ThreatMiner returned an error, but RapidDNS found over 2,500 hosts associated with the domain. Some of them are listed below:

[*] Target: target.com 

[*] Searching Rapiddns. 

[*] No IPs found.

[*] No emails found.

[*] No people found.

[*] Hosts found: 2532
---------------------
1view.target.com:161.225.130.61
2daymail.target.com:66.216.75.58
2daysweepstakes.target.com:rdr.target.com
2daysweepstakes.target.com:rdr.target.com.
2daysweepstakes.target.com:rdr3.ewips.target.com
Connectti.target.com:connectti.ewips.target.com
...
xyzshared.pf.target.com:xyzoffice365.pf.target.com
xyzshared.pf.target.com:xyzoffice365.pf.target.com.
xyzsso.pf.target.com:xyzoffice365.pf.target.com
xyzsso.pf.target.com:xyzoffice365.pf.target.com.
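The RapidDNS output mixes two kinds of records. Some hosts resolve straight to an IP address. Others point to another name, most likely a CNAME target, and the same target often appears twice, with and without a trailing dot. The minimal post-processing sketch below assumes you saved the host lines to a text file with one host:value pair per line. The function name and the choice to lowercase names are our own.

```python
import ipaddress
import sys
from collections import defaultdict
from typing import Iterable


def parse_host_records(lines: Iterable[str]) -> tuple[dict, dict]:
    """Split 'host:value' lines into (ips, aliases) dicts of host -> set.

    Names are lowercased and trailing dots removed, so 'rdr.target.com.'
    and 'rdr.target.com' collapse into one entry.
    """
    ips: dict[str, set[str]] = defaultdict(set)
    aliases: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blanks, separators and status lines such as "[*] Hosts found: 2532".
        if not line or line.startswith(("[", "-")) or ":" not in line:
            continue
        host, value = line.split(":", 1)  # IPv6 colons stay in value
        host = host.lower().rstrip(".")
        value = value.strip().lower().rstrip(".")
        try:
            ipaddress.ip_address(value)
            ips[host].add(value)
        except ValueError:
            aliases[host].add(value)  # a hostname, most likely a CNAME target
    return dict(ips), dict(aliases)


# e.g. python parse_hosts.py rapiddns.txt  (hypothetical file names)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        ips, aliases = parse_host_records(fh)
    print(f"{len(ips)} hosts resolve to IPs, {len(aliases)} point to other names")
```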

GitHub is another useful source of open data. You can search it as follows:

python theHarvester.py -d target.com -l 50 -b github-code

This search also returned several hosts and, for the first time, an email address:

[*] Target: target.com 

Read api-keys.yaml from /Users/user/.theHarvester/api-keys.yaml
	Searching 0 results.
	Searching 38 results.
[*] Searching Github-code. 

[*] No IPs found.

[*] Emails found: 1
----------------------
[email protected]

[*] No people found.

[*] Hosts found: 8
---------------------
.target.com
admin.target.com
gsp.target.com
intl.target.com
login.target.com
tech.target.com
test.target.com
weeklyad.target.com
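Code search casts a wide net, so the host list can contain fragments such as the .target.com entry above. The sketch below shows one possible cleanup step. It keeps only well-formed names that equal the target domain or sit below it. The function name and the filtering rules are our own choices, not theHarvester behavior.

```python
def clean_hosts(hosts: list[str], domain: str) -> list[str]:
    """Keep only well-formed names at or below `domain`, deduplicated and sorted."""
    domain = domain.lower().rstrip(".")
    kept: set[str] = set()
    for host in hosts:
        name = host.strip().lower().rstrip(".")
        if any(not label for label in name.split(".")):
            continue  # drops entries such as '.target.com' with an empty label
        if name == domain or name.endswith("." + domain):
            kept.add(name)
    return sorted(kept)
```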

To broaden our investigation, we ran a comprehensive scan of the “target[.]com” domain across all available sources:

python theHarvester.py -d target.com -l 50 -b all

The scan returned 1 email address, 3,210 hosts, 252 IP addresses, 3 ASNs, and 3 URLs. There could have been more data, but we intentionally did not enter API keys for the sources that require authorization.

It is also important to consider that some sources may block your IP address if you query them too often. To work around this, theHarvester can route its requests through proxies with the --proxies flag:

python theHarvester.py -d target.com -l 50 -b all --proxies

The proxies themselves are listed in the proxies.yaml file. Adding several proxy addresses there spreads the requests out and avoids some of the blocks caused by frequent queries.
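If you rotate proxies often, you can generate the file instead of editing it by hand. The sketch below assumes that proxies.yaml holds an http: key followed by a YAML list of ip:port entries. Check the template file in your installation before relying on that layout. The script name make_proxies.py is hypothetical. The sketch writes the file to the current directory, so you still need to copy it to wherever your installation keeps proxies.yaml.

```python
import sys
from pathlib import Path


def render_proxies_yaml(proxies: list[str]) -> str:
    """Render proxy entries as an 'http:' key followed by a YAML list.

    The layout is an assumption; compare it with the proxies.yaml template
    in your installation before using the output.
    """
    lines = ["http:"]
    for proxy in proxies:
        proxy = proxy.strip()
        host, sep, port = proxy.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {proxy!r}")
        lines.append(f"- {proxy}")
    return "\n".join(lines) + "\n"


# e.g. python make_proxies.py 203.0.113.10:8080 203.0.113.11:3128
if __name__ == "__main__" and len(sys.argv) > 1:
    Path("proxies.yaml").write_text(render_proxies_yaml(sys.argv[1:]), encoding="utf-8")
    print("wrote proxies.yaml")
```

The addresses in the usage comment come from the 203.0.113.0/24 documentation range. Replace them with your own proxies.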

One feature we found confusing at first was the --shodan option. It is not a data source passed to -b but a separate flag added to a regular domain search. After some experimentation, we found the correct usage:

python theHarvester.py -d target.com -l 50 -b urlscan --shodan

The Shodan lookup works on IP addresses, so another source has to discover them first. In our example, Urlscan found the IP addresses, and theHarvester then passed them to Shodan, which returned details about the hosts behind them.

The output contained both the list of addresses shown above and Shodan’s responses for some of them. It looks something like this:

[*] Target: target.com 

[*] Searching Urlscan. 

[*] ASNS found: 3
--------------------
AS16509
AS27589
AS54113

[*] Interesting Urls found: 3
--------------------
https://678doo7bt5p9mx7n.u5hqmkfdt8rxif1w.www.fastly.gcp.ephemeral.target.com/
https://www.target.com/
https://www.target.com/bp/hyde%2Band%2Beek%21%2Bboutique%E2%84%A2

[*] IPs found: 24
-------------------
108.138.128.114
...
54.230.228.57

[*] No emails found.

[*] No people found.

[*] Hosts found: 0
---------------------
[*] Searching Shodan. 
	Searching for 108.138.128.114
Read api-keys.yaml from /Users/user/.theHarvester/api-keys.yaml
{
    "asn": "AS16509",
    "domains": [
        "cloudfront.net"
    ],
    "hostnames": [
        "server-108-138-128-114.jfk50.r.cloudfront.net"
    ],
    "ip_str": "108.138.128.114",
    "isp": "Amazon.com, Inc.",
    "org": "Amazon.com, Inc.",
    "ports": [
        80,
        443
    ],
    "product": "",
    "server": "CloudFront",
    "technologies": [
        "Amazon Web Services",
        "Amazon CloudFront"
    ],
    "title": "ERROR: The request could not be satisfied"
}
...
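When Shodan returns many records, reading the raw JSON quickly gets tedious. The sketch below assumes you saved the console output to a text file, for example with shell redirection. It pulls every JSON object out of the mixed text with the index argument of the standard library’s json.JSONDecoder.raw_decode, then prints one line per host. The field names come from the sample record above. The function names and the summary format are our own.

```python
import json
import sys
from typing import Iterator


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield every JSON object embedded in mixed console output."""
    decoder = json.JSONDecoder()
    pos = 0
    while (start := text.find("{", pos)) != -1:
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # a stray brace, not JSON: keep scanning
            continue
        if isinstance(obj, dict):
            yield obj
        pos = end


def summarize(record: dict) -> str:
    """One tab-separated line per host, using fields from the sample record."""
    ports = ",".join(str(p) for p in sorted(record.get("ports", [])))
    hosts = ",".join(record.get("hostnames", [])) or "-"
    server = record.get("server") or record.get("product") or "unknown"
    return "\t".join([record.get("ip_str", "?"), record.get("org", "?"),
                      f"ports={ports}", f"server={server}", f"hosts={hosts}"])


# e.g. python shodan_summary.py harvester_output.txt  (hypothetical file names)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for record in iter_json_objects(fh.read()):
            print(summarize(record))
```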

Finally, we attempted to perform DNS brute forcing and take screenshots of the results with the following command:

python theHarvester.py -d target.com -l 50 -b securityTrails -c --screenshot /Users/user/Desktop/scs

The DNS brute-force approach wasn’t particularly useful, as we had already gathered this information from other sources. The screenshot feature, unfortunately, did not function as expected, and we encountered errors each time we tried.

This testing process highlights the versatility of theHarvester in collecting OSINT data across multiple platforms. While some sources may not yield results, the tool’s wide range of options ensures that users can gather valuable insights from the data available.

A Final Look at theHarvester and Its Impact in Cybersecurity Recon

In conclusion, theHarvester proves to be a highly valuable resource for performing basic OSINT investigations. Its ability to gather intelligence passively from a wide range of sources provides a solid foundation for any cybersecurity reconnaissance effort.

While the tool excels in functionality, its documentation could be improved. Many general guides overlook specific features or advanced configurations, so hands-on walkthroughs like this one help show what theHarvester can really do. As with most open-source software, we also ran into a couple of minor bugs during testing. Such issues are not uncommon and do not diminish the overall usefulness of the tool.

Here are a few key points about theHarvester’s impact on cybersecurity reconnaissance:

  • Comprehensive Data Sources: The tool allows users to pull data from multiple platforms, enabling thorough investigation and mapping of digital footprints.
  • Flexible Integration: By integrating third-party APIs, theHarvester provides access to both free and premium data sources, enhancing its effectiveness.
  • Open-Source Nature: Because it is open source, theHarvester invites the community to contribute, leading to continual improvements and new features.
  • Ease of Use: Despite its advanced capabilities, theHarvester remains user-friendly, making it accessible to both newcomers and seasoned cybersecurity professionals.
  • Minor Bugs and Limitations: As expected with open-source projects, occasional bugs and limitations may arise, but these are typically manageable and do not affect the tool’s overall value.

All things considered, theHarvester is a solid choice for security professionals who need to discover publicly accessible data during their investigations. Its broad functionality, though occasionally hindered by minor bugs, makes it an essential part of the OSINT toolkit.
