Imagine being able to collect valuable intelligence from a wide range of platforms without switching between tools. That is what makes theHarvester stand out in cyber reconnaissance.
While some premium tools offer similar features, few match the flexibility and openness of theHarvester. As an open-source tool, it is easy to use and can extract data from numerous sources, both free and premium. One of its most notable features is its integration with the SecurityTrails API, which allows for deeper insights during a security assessment.
Exploring theHarvester’s Reach and Its Value in Cyber Security
theHarvester, a tool crafted by the Edge-Security team, is a key utility for open-source intelligence (OSINT) gathering, helping professionals identify and assess potential security risks. Written in Python and designed for command-line use, it assists in the initial phase of cyber investigations, offering insights into an organization’s exposure on the internet.
Originally intended for penetration tests and red team operations, theHarvester has evolved into a versatile tool used across different cybersecurity teams. Its passive data collection capabilities support both offensive and defensive security work, and its integration with many data sources helps streamline reconnaissance.
theHarvester draws on the following passive and active reconnaissance sources:
Passive Reconnaissance Sources:
- Baidu: Useful for gathering intelligence from Chinese search engines and websites.
- Bing: Leverages Microsoft’s search index to discover publicly available domain information.
- dnsdumpster: A popular DNS reconnaissance tool that provides data about subdomains and IP addresses.
- DuckDuckGo: Offers privacy-focused search results that help track less obvious digital footprints.
- Google: The most widely used search engine, giving access to a vast amount of indexed data and assets.
- Hunter.io: A powerful tool for finding email addresses associated with specific domains.
- Qwant: A European-based search engine that helps diversify the range of publicly available information.
- Netlas: A search engine for internet-connected devices, similar to Shodan and Censys. You can read more about using theHarvester with the Netlas module in the corresponding article.
- SecurityTrails: Provides comprehensive data about domains, IP addresses, and DNS records, making it an essential OSINT resource.
- Shodan: A search engine for finding exposed devices connected to the internet, particularly useful for cybersecurity monitoring.
- Trello: Offers insights into publicly accessible project management boards that could reveal sensitive information.
- Twitter: A goldmine for discovering real-time information about companies, employees, and security events.
Active Reconnaissance Sources:
- DNS Brute-force: theHarvester performs dictionary-based brute-force attempts to discover subdomains that might not be publicly indexed.
- Screenshots: The tool can automatically capture images of discovered subdomains, providing a visual confirmation of any exposed resources.
These reconnaissance capabilities make theHarvester a valuable tool for understanding the scope of a company’s exposure online, whether the goal is to secure a network or to assess its vulnerabilities.
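To make the dictionary-based idea behind DNS brute-forcing concrete, here is a minimal Python sketch. It is not theHarvester's own implementation; the function names, the sample wordlist, and the use of the standard library resolver are assumptions made for illustration only.

```python
import socket


def candidate_hosts(domain, words):
    """Build candidate subdomain names from a wordlist (the "dictionary")."""
    seen = set()
    candidates = []
    for word in words:
        label = word.strip().lower()
        # Skip blank lines and comment lines that wordlists often contain.
        if not label or label.startswith("#"):
            continue
        host = f"{label}.{domain}"
        if host not in seen:
            seen.add(host)
            candidates.append(host)
    return candidates


def resolve_existing(hosts):
    """Return {host: ip} for every candidate that resolves in DNS."""
    found = {}
    for host in hosts:
        try:
            found[host] = socket.gethostbyname(host)
        except OSError:
            # The name did not resolve, so this candidate is dropped.
            continue
    return found
```

Calling `resolve_existing(candidate_hosts("example.com", ["www", "mail", "dev"]))` would return only the names that actually resolve, which is the essence of the technique: guess names from a list, keep the ones DNS confirms.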
Setting Up theHarvester for a Quick OSINT Test, on Kali Linux or From Source
There are several ways to set up theHarvester, making it adaptable to various environments. The four primary methods of installation include:
- Kali Linux: Pre-installed and ready to use
- Docker: Quick and containerized deployment
- Installing from Source (without Pipenv): A manual method for a more customized setup
- Installing from Source (with Pipenv): Uses a Python package manager for a streamlined environment
For this tutorial, we will focus on the third option, setting up theHarvester from source without Pipenv. It’s recommended to use an isolated environment, such as a virtual machine (VM), container, or test server, when installing new software. In this case, we are using Ubuntu 20.04 for the installation, though these instructions should work for most Debian-based systems, with minor adjustments needed for others.
Start by preparing the system and installing necessary dependencies:
sudo apt update
sudo apt upgrade
sudo apt install git python3-venv
Next, we create a Python virtual environment to isolate the required dependencies:
python3 -m venv harvest
cd harvest/
source bin/activate
git clone https://github.com/laramies/theHarvester
cd theHarvester/
pip install wheel
pip install -r requirements/base.txt
During the installation we hit a minor issue: the wheel package had to be installed before the other packages listed in base.txt, which is why it appears as a separate step above.
After completing the installation, you can verify it by running the following command:
python theHarvester.py -h
This will display the help menu and confirm that the tool is ready for use.
Lastly, don’t forget to configure the API keys needed for certain sources. theHarvester uses a YAML file called api-keys.yaml to store these keys. For testing, we’ll use free or freemium APIs, which can be added using a text editor like nano or vim:
nano -w api-keys.yaml
This step ensures theHarvester can access the APIs that require authentication.
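Before running searches, it can help to check which sources actually have a key filled in. The sketch below assumes the file has a top-level `apikeys` mapping in which each source holds a `key` field; that layout is an assumption about your copy of api-keys.yaml, so adjust the names if your file differs. The helper names are hypothetical, and reading the file relies on the third-party PyYAML package.

```python
def sources_with_keys(config):
    """Return the sorted names of sources that have a non-empty key.

    Assumes a layout like {"apikeys": {"hunter": {"key": "..."}}},
    which may differ between theHarvester versions.
    """
    entries = (config or {}).get("apikeys") or {}
    ready = []
    for name, fields in entries.items():
        if isinstance(fields, dict) and fields.get("key"):
            ready.append(name)
    return sorted(ready)


def load_ready_sources(path="api-keys.yaml"):
    """Read the YAML file and list the sources that are ready to use."""
    import yaml  # PyYAML, a third-party package

    with open(path, encoding="utf-8") as handle:
        return sources_with_keys(yaml.safe_load(handle))
```

Running `load_ready_sources()` from the theHarvester directory would list the sources with a key set, so an empty result from a keyless source is not mistaken for a lack of data.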
How to Use theHarvester as a Free OSINT Tool
In this demonstration, we’ll leverage theHarvester to dig deeper into publicly available data by performing a basic search on a domain using various data sources.
For this test, let’s start with a simple query for the domain “rpfront[.]com,” limiting our results to 50 entries and utilizing Google as our primary data source. The command to run is:
python theHarvester.py -d rpfront.com -l 50 -b google
Upon executing the search, the results didn’t provide much valuable information. This is not surprising, as Google generally doesn’t offer the level of detailed OSINT that specialized APIs provide. To get more meaningful insights, we’ll shift our focus to other APIs that might offer richer data sources. Let’s see what we can uncover by trying different options in theHarvester.
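Since the same command gets repeated with different sources, a small wrapper can save typing. This is a minimal sketch that only uses the `-d`, `-l`, and `-b` options shown above; the helper names and the default script path are assumptions, and it expects to be run from the theHarvester directory inside the activated virtual environment.

```python
import subprocess
import sys


def build_command(domain, source, limit=50, script="theHarvester.py"):
    """Assemble the argument list for one theHarvester search."""
    return [sys.executable, script, "-d", domain, "-l", str(limit), "-b", source]


def run_search(domain, source, limit=50):
    """Run one search and return whatever the tool printed to stdout."""
    completed = subprocess.run(
        build_command(domain, source, limit),
        capture_output=True,
        text=True,
        check=False,  # a failing source should not abort the whole session
    )
    return completed.stdout
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a domain or source name comes from user input.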
Leveraging theHarvester with the SecurityTrails API
To enhance our reconnaissance efforts, we can point theHarvester at a dedicated API instead of a general search engine. Let’s execute the following command to gather data about the domain “rpfront[.]com” using the SecurityTrails API as our data source:
python theHarvester.py -d rpfront.com -l 50 -b securityTrails
This time, the search yielded much more valuable information, providing us with 3 associated IP addresses and 2 related hosts. By integrating third-party APIs like SecurityTrails, theHarvester significantly expands the breadth of intelligence it can gather, helping to build a more complete picture of the target domain’s digital infrastructure. This data can be particularly useful for identifying potential vulnerabilities or mapping out a domain’s exposure in the wild.
Enhancing the Harvester Tool via the Harvester GitHub
After integrating theHarvester with various data sources, we decided to test ThreatCrowd as one of our information providers. Unfortunately, our search returned no relevant results.
This outcome highlights an important aspect of OSINT tools: the availability of data can vary across different platforms. Sometimes, a particular source might not have information related to a specific domain, especially if the domain is not widely discussed or indexed on that platform. This is a natural limitation in the world of open-source intelligence, but it also underscores the flexibility of theHarvester, which allows users to continuously explore other data sources in pursuit of valuable information.
Here are a few points to consider when working with theHarvester and other OSINT tools:
- Data Source Availability: Not all data sources will have information on every domain, and some platforms may have more extensive records than others.
- Flexible Data Integration: theHarvester lets you experiment with a variety of data providers, increasing the chances of uncovering useful intelligence.
- Open Source Contributions: As an open-source project on GitHub, theHarvester is constantly being improved, and additional data sources or bug fixes can be contributed by the community.
Even with some sources not yielding results, theHarvester’s open-source nature ensures that it remains a powerful and adaptable tool in any OSINT investigation.
Testing OSINT Virtual Tools Among the Best OSINT Tools
To start, we used theHarvester to perform an OSINT search for the domain “rpfront[.]com” through the UrlScan API, limiting the results to 50 entries:
python theHarvester.py -d rpfront.com -l 50 -b urlscan
This time, we successfully found 5 related IP addresses and 1 host, showcasing the tool’s strength in uncovering domain infrastructure.
Primary OSINT sources like UrlScan are excellent for identifying IPs and hosts linked to a domain. One of the major advantages of theHarvester is its access to a wide range of data sources, so we’ll now extend our tests by using additional third-party services.
For example, we tried to gather data from Twitter for the same domain:
python theHarvester.py -d rpfront.com -l 50 -b twitter
Surprisingly, no relevant information appeared for “rpfront” or “rpfront[.]com.” To check whether this was specific to one domain or one source, we ran a similar search for moslempress[.]com, this time with DuckDuckGo as the source and a limit of 100 results:
python theHarvester.py -d moslempress.com -l 100 -b duckduckgo
Once again, this search returned no useful data. We then moved on to test the Hunter API to look for associated email addresses:
python theHarvester.py -d moslempress.com -l 10 -b hunter
Unfortunately, we came up empty-handed once more. To verify the functionality of Hunter, we ran the same query for a more common domain, which yielded some email results. This discrepancy suggests that the lack of data for the “moslempress” domain may be due to its niche nature rather than a failure of the tool itself.
Next, we tried using ThreatMiner and RapidDNS:
python theHarvester.py -d moslempress.com -l 50 -b threatMiner
python theHarvester.py -d moslempress.com -l 50 -b rapidDNS
These sources provided crucial insights, confirming the current IP address of “moslempress[.]com.”
To broaden our investigation, we ran a comprehensive scan of the “moslempress[.]com” domain across all available sources and output the results to an HTML file:
python theHarvester.py -d moslempress.com -l 50 -b all -f moslempress.html
This scan proved to be the most informative, uncovering a wealth of data, including emails and other domain-related information. The HTML output provided detailed results, complete with filtering options for each data source.
However, when we tried to run the scan again, we faced challenges such as rate-limiting and blocking from some services, particularly LinkedIn. The number of hosts also dropped from 47 to 31, indicating some sources had flagged our IP due to repeated scans. To resolve this, theHarvester offers an option to use proxies:
theHarvester --proxies
By adding multiple proxy IPs to the proxies.yaml file, we can bypass some of the restrictions caused by frequent queries.
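When results shrink between runs, as with the drop from 47 to 31 hosts, it is useful to see exactly which hosts went missing rather than only comparing counts. The following is a minimal sketch that compares two host lists; the function name is hypothetical, and it assumes you have already copied the hosts from each run into plain Python lists.

```python
def compare_runs(first_hosts, second_hosts):
    """Compare the hosts found in two scans of the same domain."""
    # Normalise case and whitespace so the same host is not counted twice.
    first = {host.strip().lower() for host in first_hosts if host.strip()}
    second = {host.strip().lower() for host in second_hosts if host.strip()}
    return {
        "missing": sorted(first - second),  # seen before, absent now
        "new": sorted(second - first),      # absent before, seen now
        "stable": sorted(first & second),   # present in both runs
    }
```

If the missing hosts all come from one or two data sources, that points to rate-limiting or blocking by those services rather than a real change in the target's infrastructure.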
The HTML report also included helpful tables with filtering options for each data source, but the graphical plots (which track changes over time) were less useful in our case. However, these graphs can be beneficial for ongoing investigations, helping to highlight shifts in the data over time.
One feature we found a bit confusing was the --shodan option, which is not a data source but a flag added alongside a regular source in a domain search. After some experimentation, we found the correct usage:
python theHarvester.py -d moslempress.com -l 50 -b securityTrails --shodan
Shodan requires an associated IP address to work, and once it found one, it returned data on subdomains related to the domain.
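Because the Shodan lookup depends on an IP address, it can be worth confirming that a host resolves at all before adding the flag. This sketch uses only Python's standard library and is not how theHarvester performs the step internally; the function name is an assumption made for illustration.

```python
import socket


def resolve_ips(host):
    """Return the sorted, de-duplicated IP addresses a host resolves to."""
    try:
        infos = socket.getaddrinfo(host, None)
    except OSError:
        # No DNS answer means there is no IP to look up.
        return []
    # Each entry ends with a sockaddr tuple whose first item is the address.
    return sorted({info[4][0] for info in infos})
```

An empty list from `resolve_ips("moslempress.com")` would explain an empty Shodan section without needing to rerun the full search.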
Finally, we attempted to perform DNS brute forcing and take screenshots of the results with the following command:
python theHarvester.py -d moslempress.com -l 50 -b securityTrails -c --screenshot ~/harvest/theHarvester/sc/
The DNS brute-force approach wasn’t particularly useful, as we had already gathered this information from other sources. The screenshot feature, unfortunately, did not function as expected, and we encountered errors each time we tried.
This testing process highlights the versatility of theHarvester in collecting OSINT data across multiple platforms. While some sources may not yield results, the tool’s wide range of options ensures that users can gather valuable insights from the data available.
A Final Look at theHarvester GitHub and Its Impact in Cybersecurity Recon
In conclusion, theHarvester proves to be a highly valuable resource for performing basic OSINT investigations. Its ability to gather intelligence passively from a wide range of sources provides a solid foundation for any cybersecurity reconnaissance effort.
While the tool excels in functionality, one area that could be improved is its documentation. Many general guides overlook specific features or advanced configurations, but in-depth analyses like ours shed light on the full potential of theHarvester. Additionally, as with most open-source software, we encountered a couple of minor bugs during testing. These issues are not uncommon and do not diminish the overall usefulness of the tool.
Here are a few key points about theHarvester’s impact on cybersecurity reconnaissance:
- Comprehensive Data Sources: The tool allows users to pull data from multiple platforms, enabling thorough investigation and mapping of digital footprints.
- Flexible Integration: By incorporating third-party APIs, theHarvester provides access to both free and premium data points, enhancing its effectiveness.
- Open-Source Nature: Being open-source, theHarvester invites the community to contribute, leading to continual improvements and the addition of new features.
- Ease of Use: Despite its advanced capabilities, theHarvester remains user-friendly, making it accessible to both newcomers and seasoned cybersecurity professionals.
- Minor Bugs and Limitations: As expected with open-source projects, occasional bugs and limitations may arise, but these are typically manageable and do not affect the tool’s overall value.
All things considered, theHarvester is a solid asset for security professionals who need to discover openly accessible data during an investigation. Its functionality, though occasionally hindered by minor issues, earns it a place in the OSINT toolkit.