When using Netlas.io for an OSINT investigation of an organization, you can find a variety of data types that could be insightful for cybersecurity analysis, market research, and other investigative purposes.
Building a Company Profile
With Netlas, you can build a comprehensive technical overview of a company. Here are the types of data you might uncover:
- Network Infrastructure
Details on the network infrastructure, such as ASN (Autonomous System Number) information, which can help in understanding the organization’s internet service providers (ISPs) and network size.
- Geographical Distribution
Geolocation data of the IP addresses, showing the physical locations of servers and other devices, which can illustrate the geographical reach of the organization’s infrastructure.
- Device and Service Footprint
Information about the servers, including web servers, mail servers, and other application servers, detailing the technologies and software versions in use.
- Technological Profile
The technology stack used by the organization, including web server types, programming languages, and content management systems (CMS), which can inform about the organization’s technical capabilities and preferences.
- External Partnerships
Digital presence on SaaS platforms such as Slack, Atlassian, DocuSign, ServiceNow, etc. Content Delivery Networks (CDNs) or third-party services (like DDoS protection or external WAF) in use, which can be relevant for understanding dependencies and external partnerships.
- Security Posture
Exposed services and open ports that might be vulnerable or misconfigured, indicating potential security risks. Specific vulnerabilities associated with the devices and services identified, based on known vulnerabilities of the software versions detected.
- Web Content and Metadata
Snapshot or metadata of web content hosted by the organization, which can include titles, descriptions, and keywords. Searching resources published on subdomains can sometimes even lead to sensitive documents.
Let’s explore a few particular information gathering areas to grasp the depth of this subject.
Mapping Attack Surface
To collect most of the data types listed in the previous section, you need to discover the internet-connected parts of your target’s infrastructure. These internet-connected parts of the infrastructure are also known as the external attack surface. To find all the pieces of this external infrastructure, Netlas provides an Attack Surface Discovery Tool.
Here is an example of the external attack surface of Spotify, built with the Netlas Attack Surface Discovery Tool in less than 5 minutes.
The goal of this step is to obtain a list of domain names and IP addresses related to the target for further exploration. For more detailed methodologies and recommendations on this process, you can refer to our Complete Guide on Attack Surface Discovery or to its shortened version described in a use case on Attack Surface Discovery.
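Before moving on, it can help to normalize the list you obtained. The sketch below is only an illustration: it assumes the discovered targets are available as raw text lines (for example, copied from an export), and the function name split_targets is hypothetical. It separates IP addresses from domain names and removes duplicates using only the Python standard library.

```python
import ipaddress


def split_targets(raw_lines):
    """Split raw target strings into de-duplicated IP addresses and domain names."""
    ips, domains = set(), set()
    for raw in raw_lines:
        # Normalize case, surrounding whitespace, and a trailing root dot.
        item = raw.strip().lower().rstrip(".")
        if not item or item.startswith("#"):
            continue  # skip blank lines and comment lines
        try:
            ips.add(str(ipaddress.ip_address(item)))
        except ValueError:
            # Anything that is not a valid IP address is treated as a domain name.
            domains.add(item)
    return sorted(ips), sorted(domains)


def write_targets(path, ips, domains):
    """Write one target per line, IP addresses first."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join([*ips, *domains]) + "\n")
```

Calling write_targets("targets.txt", *split_targets(lines)) would produce a one-target-per-line file of the kind used in the next step.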
Fetching Internet Scan Data
When the attack surface is successfully mapped, most types of data for an OSINT profile can be fetched from the Netlas Responses data collection. You can use the Netlas CLI Tool to automate fetching responses for each target IP address and domain name.
Assuming that targets.txt contains a list of IP addresses and domains of interest, the following command will download internet scan results for this list:
while IFS= read -r line; do netlas download --all host:\"$line\"; done \
< targets.txt > responses.json
The command can take a while to complete, depending on the number of targets in the list. All responses related to the targets will be saved in responses.json.
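If you prefer to drive the same loop from Python, for example to record which targets failed, a minimal sketch could look like this. It only wraps the CLI command shown above with subprocess; the function names are hypothetical, and it assumes the netlas CLI is installed and configured with your API key.

```python
import subprocess


def build_query(target):
    """Return the same search query the shell loop passes to the CLI."""
    return f'host:"{target}"'


def read_targets(path):
    """Read one target per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def download_responses(targets, out_path, run=subprocess.run):
    """Run `netlas download --all <query>` for each target.

    The output of every command is written to out_path, one after another.
    Returns the list of targets whose command exited with a non-zero status.
    """
    failed = []
    with open(out_path, "wb") as out:
        for target in targets:
            cmd = ["netlas", "download", "--all", build_query(target)]
            result = run(cmd, stdout=out, check=False)
            if result.returncode != 0:
                failed.append(target)
    return failed
```

A typical call would be download_responses(read_targets("targets.txt"), "responses.json"). The run parameter exists only so the function can be exercised without the CLI installed.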
Filtering Internet Scan Data
Now that you have the data downloaded, you can filter it using the jq utility to extract the specific information you need. Here are a few examples.
Every response has a record about the internet service provider at the top level of the document. The following command extracts the ISPs, using sort -u to remove duplicate values:
cat responses.json | jq -r ".data.isp" | sort -u
The output should look something like:
Amazon CloudFront
Cloudflare
Fastly
Google Cloud
Google Servers
In the same way, Autonomous System data can be gathered:
cat responses.json | jq -r '.data.whois.asn | "\(.name) \(.number[0])"' | sort -u
AMAZON-02 16509
CLOUDFLARENET 13335
FASTLY 54113
GOOGLE 15169
GOOGLE-CLOUD-PLATFORM 396982
The following command lists the geographical locations of your targets:
cat responses.json | jq -r '.data.geo | "\(.country) \(.city)"' | sort -u
DE Frankfurt am Main
GB London
GB Poplar
US Jenkintown
US Kansas City
US Titusville
Using similar queries, you can get any data contained in the Netlas scan results. To understand the document structure, refer to the Responses data collection mapping in the Netlas application.
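The same filtering can be done in Python when you want to post-process the results further. The sketch below rests on one assumption: responses.json is a sequence of JSON documents written one after another, which is what the jq commands above imply. The helper names are hypothetical, and the field paths are the ones used in the jq examples.

```python
import json


def iter_documents(text):
    """Yield JSON documents from text holding one or more concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            yield from value  # tolerate a JSON array of documents as well
        else:
            yield value


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'data.geo.city'; numeric parts index lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


def unique_rows(docs, *paths):
    """Return the sorted unique combinations of the given fields (like sort -u)."""
    rows = {tuple(get_path(doc, path) for path in paths) for doc in docs}
    return sorted(rows, key=lambda row: tuple(str(value) for value in row))
```

For example, after reading the file into a string, unique_rows(docs, "data.geo.country", "data.geo.city") would mirror the geolocation query, and unique_rows(docs, "data.whois.asn.name", "data.whois.asn.number.0") the Autonomous System one. Missing fields come back as None rather than raising an error.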
Search for Internal Documents
Subdomains are often used to host various internal services, such as development environments, staging areas, and internal documentation. If this documentation is accidentally exposed to the public internet, it can become a treasure trove of sensitive information.
There are two strong reasons to use Netlas in addition to traditional search engines to find document links:
- While search engines such as Google may not scan subdomains that employ robots.txt or noindex tags, Netlas archives responses from all known subdomains.
- With Netlas, as with other search engines, you are working with previously gathered data rather than interacting directly with the company’s infrastructure. Consequently, your searches do not show up on the target’s side.
In the Netlas Scripts repository, you’ll find a handy script that iterates through internet scan data for a specific domain and all its subdomains, searching for links to files such as TXT, DOCX, PDF, and others. For example:
python3 netlas_docs_by_domain.py bankofamerica.com | grep employ
https://www.bankofamerica.com/content/documents/employees/retiree_newsletter.pdf
https://www.bankofamerica.com:443/content/documents/employees/2024_wellness_activities_faq.pdf
https://www.bankofamerica.com:443/content/documents/employees/401k-quick-tips.pdf
https://www.bankofamerica.com:443/content/documents/employees/entering-time-closures.pdf
https://www.bankofamerica.com:443/content/documents/employees/resources-guide-for-parents.pdf
https://www.bankofamerica.com:443/content/documents/employees/ERA_Document.pdf
https://www.bankofamerica.com:443/content/documents/employees/RehireEligibilityReviewForm.pdf
https://www.bankofamerica.com:443/content/documents/employees/learning-resources.pdf
https://www.bankofamerica.com:443/content/documents/employees/entering-time-closures.pdf
https://www.bankofamerica.com:443/content/documents/employees/resources-guide-for-parents.pdf
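To illustrate the idea behind this kind of search, here is a minimal sketch of the link-extraction step. It is not the code of the repository script. It assumes you already have the HTML of a page as a string (for instance, taken from an HTTP response in the downloaded scan data) together with the URL it was served from. The extension list and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Illustrative set of document extensions; extend it as needed.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt")


class _LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_document_links(html, base_url, extensions=DOC_EXTENSIONS):
    """Return absolute, de-duplicated links whose path ends with a document extension."""
    collector = _LinkCollector()
    collector.feed(html)
    seen, result = set(), []
    for link in collector.links:
        absolute = urljoin(base_url, link)  # resolve relative links
        path = urlsplit(absolute).path.lower()  # ignore query string and fragment
        if path.endswith(tuple(extensions)) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Checking only the URL path keeps links such as report.pdf?download=1, and the seen set removes exact duplicates while preserving the order in which links appear.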
Company Email Addresses
APIs designed to retrieve publicly available organization emails can significantly streamline OSINT and cybersecurity tasks. Netlas may not primarily focus on supplying contact information, yet it can serve as one of your data sources for this purpose.
Given the breadth of Netlas’s data collections, you’re apt to discover numerous relevant contacts for the businesses you’re targeting. You can retrieve data from:
- Internet scan data (not only http, but dozens of supported protocols);
- WHOIS data for domains and IP addresses;
- SSL certificates.
Through the API, accessing contacts from Netlas’s extensive data archives is straightforward. If you regularly encounter this task, you can download a Python script that searches all data collections for email addresses by a given domain name. This and many other handy scripts are published in one of the Netlas GitHub repositories.
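As a rough illustration of what such a search involves, the sketch below walks through already-downloaded JSON documents and collects email addresses that belong to a given domain or its subdomains. It is not the repository script: the function names are hypothetical, the regular expression is deliberately simple, and no particular field layout is assumed, since every string value in the document is scanned.

```python
import re

# Simple pattern: local part, "@", then a host with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def iter_strings(value):
    """Yield every string found anywhere inside nested dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def find_emails(docs, domain):
    """Return sorted, lowercased emails at `domain` or any of its subdomains."""
    domain = domain.lower().strip(".")
    found = set()
    for doc in docs:
        for text in iter_strings(doc):
            for match in EMAIL_RE.finditer(text):
                host = match.group(1).lower()
                if host == domain or host.endswith("." + domain):
                    found.add(match.group(0).lower())
    return sorted(found)
```

Because the function scans every string value, the same code can be pointed at internet scan responses, WHOIS records, or certificate data once they are loaded as Python dictionaries.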
We have covered only a few areas of information collection. Using the API, you can expand on these examples or build your own solution for OSINT investigations. Refer to the documentation and sample scripts to learn more about the Netlas API.