OSINT (Open Source Intelligence) is the umbrella term for the techniques and culture of using publicly available information alone to investigate people, organisations, infrastructure, or events. Its applications are wide: attacker reconnaissance, defensive threat intelligence, investigative journalism, digital forensics. Starting from the premise that "more than half the answer is already out in the open", the essence of OSINT is how fast, how accurately, and how ethically you can gather and re-assemble it.
A short history of OSINT #
The term OSINT dates back to World War II. In 1941 the United States established the Foreign Broadcast Monitoring Service (later FBIS) to translate and analyse enemy and neutral-country radio, newspapers, and publications. Throughout the Cold War, public-source information was said to account for 80% of intelligence — Soviet rail timetables, agricultural statistics, and local-paper obituaries were used to infer military movements.
Later, the use of search operators such as inurl:, filetype:, and intitle: to unearth sensitive material was codified, and the major social networks appeared (LinkedIn 2003, Facebook 2004, Twitter 2006).
The OSINT cycle #
The intelligence cycle long used by militaries and intelligence agencies (planning and direction, collection, processing, analysis and production, dissemination) applies to OSINT unchanged. The key point is that it loops: each round of findings produces new questions, which feed back into more collection.
The answers almost always raise new questions and you go back to Planning. In real engagements you loop several times per case.
Data source categories #
| Category | Examples | What it reveals |
|---|---|---|
| Web / blogs | Official sites, news, IR materials, job postings | Org info, contact details, tech stack |
| Social media | Twitter/X, Facebook, Instagram, LinkedIn, TikTok, Telegram | Connections, movement, preferences, timeline |
| Images / video | YouTube, TikTok, Flickr, satellite imagery (Maxar, Planet, Sentinel) | Location, time, identity |
| Maps / geography | Google Maps Street View, OpenStreetMap, Mapillary | Street features, building layout |
| Public records | Corporate registry, real-estate records, court records, FOIA disclosures | Officers, shareholders, disputes |
| DNS / IP / certificates | WHOIS, crt.sh, Shodan, Censys | Infrastructure layout, vulnerabilities |
| Code leaks | GitHub, GitLab, Pastebin, public S3 | Credentials, internal design |
| Breached data | Have I Been Pwned, Dehashed | Passwords, email addresses |
| Device fingerprints | EXIF, IPTC, ID3, FP | Camera, geotags |
In Japan, corporate registry records can be obtained by anyone who pays the fee through the Registry Information Service; in the US, real-estate records are fully public at the county level; in the EU, GDPR strictly limits secondary use of personally identifiable information. Verifying legality under your own jurisdiction is the first priority.
Google dorking — search operators #
Special operators in Google / Bing surface public files and configuration that ordinary search wouldn't find.
| Operator | Example | Purpose |
|---|---|---|
| site: | site:example.com | Restrict to a domain |
| inurl: | inurl:admin | String in URL |
| intitle: | intitle:"index of" | String in title |
| filetype: | filetype:pdf "internal" | Narrow by file extension |
| intext: | intext:"password" | String in body text |
| cache: | cache:example.com | Google's cached copy (Google has since retired its cache feature) |
| - | -marketing | Exclude a term |
| "" | "social security number" | Exact match |
```
# Open-directory leakage from an organisation
site:example.com intitle:"index of" -html
# Mistakenly-public .env files
filetype:env "DB_PASSWORD"
# Specific keywords on Pastebin
site:pastebin.com "internal-only" example.com
# Old, vulnerable phpMyAdmin
inurl:phpmyadmin/index.php intitle:"phpMyAdmin 2."
```
The Google Hacking Database (GHDB) (exploit-db.com/google-hacking-database) catalogues thousands of pre-built queries.
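Search operators are plain text, so you can assemble a dork and percent-encode it into a search URL for notes or reports. The following is a minimal sketch in bash. It assumes ASCII-only queries and Google's ordinary `https://www.google.com/search?q=` URL form. `urlencode` and `dork_url` are hypothetical helper names, not part of any tool mentioned here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: turn a dork into a shareable Google search URL.
# Assumes ASCII-only queries; multi-byte characters are not handled.

# urlencode STRING: percent-encode every character outside the RFC 3986
# unreserved set (A-Z a-z 0-9 - . _ ~).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      # A leading quote ("'x") makes printf emit the character code of x
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# dork_url FRAGMENT...: join the fragments with spaces and build the URL.
dork_url() {
  printf 'https://www.google.com/search?q=%s\n' "$(urlencode "$*")"
}
```

For example, `dork_url 'site:example.com' 'intitle:"index of"' '-html'` prints `https://www.google.com/search?q=site%3Aexample.com%20intitle%3A%22index%20of%22%20-html`. The helper encodes spaces as `%20` to stay generic, though Google also accepts `+` in the query string.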
Images and geolocation #
Reverse image search #
- Google Images — strong for everyday subjects, celebrities, and products
- TinEye — strong at finding original posting date and the sites where an image appeared
- Yandex Images — uncannily strong at face matching and location ID (in OSINT circles, a category of its own)
- Bing Visual Search — product recognition
- PimEyes — faces only; ethically debated
Geolocation #
Geolocation is the technique of working out where a photo or video was taken; it is the core of the GeoGuessr-style OSINT that Bellingcat popularised. Typical clues:
- Road signs and street markings — language and typography (font, colour, shape)
- Building architecture and roof colours (have national characteristics)
- Vegetation (palms vs conifers, seasonal state)
- Sun position and shadow length for inferring time and latitude (SunCalc.org)
- Vehicles and license plates
- Power poles, wiring, postboxes
- Mountains and coastlines in the background, cross-referenced with Google Earth
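The shadow clue reduces to one line of trigonometry. On level ground, an object of height h casting a shadow of length s puts the sun at an elevation of atan(h / s). You can then compare that angle, together with the shadow's direction, against SunCalc for candidate places and dates.

The following is a minimal sketch. It assumes flat ground and that both lengths are estimated in the same unit, for instance from a person or doorway of roughly known size in the frame. `sun_elevation` is a hypothetical name.

```shell
#!/usr/bin/env bash
# sun_elevation HEIGHT SHADOW_LENGTH
# Prints the solar elevation angle (degrees, one decimal) implied by an
# object's height and its shadow length. Same units, level ground assumed.
sun_elevation() {
  awk -v h="$1" -v s="$2" 'BEGIN {
    pi = atan2(0, -1)                  # awk has no built-in pi constant
    printf "%.1f\n", atan2(h, s) * 180 / pi
  }'
}
```

For example, `sun_elevation 1.8 3.1` (a 1.8 m person with a 3.1 m shadow) gives about 30.1 degrees. A single elevation matches many place/time combinations, so this narrows the answer rather than fixing it.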
Because Yandex, not Google, dominates search in Russia, its image index is generally believed to hold far more Russian street imagery and portraits. If you're looking for a face or location match, try Yandex first. Bellingcat leaned on it heavily for the MH17 and Syria chemical-weapons investigations.
Social media and person reconciliation #
SOCMINT means working through social media systematically; it is the primary source of information on both people and events.
Typical items to investigate:
- Account creation date and the first post
- The follow / follower network
- Histogram of posting times → likely time zone of residence
- Photo geotags (sometimes still in EXIF even when stripped from display)
- "Likes" / comment targets → close relationships
- Cross-matching usernames, phone numbers, and emails across networks for identity reconciliation
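You can turn the posting-time histogram into a rough time-zone guess by looking for the longest daily lull. The following is a minimal bash/awk sketch. It assumes timestamps are already normalised to UTC in ISO-8601 form, and that the quietest six-hour window corresponds to local 00:00 to 06:00. Scheduled posts, shift work, shared accounts, and daylight saving all break that assumption, so treat the result as one weak signal to triangulate. `guess_utc_offset` is a hypothetical name.

```shell
#!/usr/bin/env bash
# guess_utc_offset: read ISO-8601 UTC timestamps (one per line, e.g.
# 2024-03-01T14:22:05Z) on stdin and print a rough UTC-offset guess.
# Heuristic assumption: the 6-hour window with the fewest posts is the
# account holder's local 00:00-06:00 (i.e. when they sleep).
guess_utc_offset() {
  awk '
    # Hour-of-day histogram; the hour sits in columns 12-13 of the timestamp
    /T[0-9][0-9]:/ { count[substr($0, 12, 2) + 0]++ }
    END {
      best = -1
      # Slide a 6-hour window around the clock, wrapping past 23:00
      for (start = 0; start < 24; start++) {
        sum = 0
        for (k = 0; k < 6; k++) sum += count[(start + k) % 24]
        if (best < 0 || sum < best) { best = sum; quiet = start }
      }
      # Local midnight at UTC hour quiet: offset = -quiet (mod 24), in -11..+12
      off = (24 - quiet) % 24
      if (off > 12) off -= 24
      printf "quietest UTC window starts at %02d:00 -> estimated offset UTC%+d\n", quiet, off
    }'
}
```

Feeding it posts that never fall between 15:00 and 21:00 UTC yields `quietest UTC window starts at 15:00 -> estimated offset UTC+9`. That result is consistent with, for example, Japan or Korea.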
```
# Sherlock: search for a username across hundreds of sites at once
$ sherlock johndoe
[+] GitHub: https://github.com/johndoe
[+] Reddit: https://reddit.com/user/johndoe
[+] Instagram: https://instagram.com/johndoe
# Maigret: Sherlock fork, even more sites
$ maigret johndoe --top-sites 500
# holehe: infer services an email is registered to
$ holehe target@example.com
# GHunt: Google account public info from a Gmail address
$ ghunt email target@gmail.com
```
Domain, IP, and certificate OSINT #
Mapping an organisation's infrastructure from the outside is the starting point for attacker recon, penetration testing, and threat intelligence alike.
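Among the DNS records, the SPF TXT record gives a quick read on how easily a domain's mail can be spoofed. The qualifier on its final `all` mechanism is one of:

- `-all` (fail)
- `~all` (softfail)
- `?all` (neutral)
- `+all` or bare `all` (pass, i.e. no protection)

The following is a minimal sketch that assumes `dig +short TXT` output: one record per line, possibly split into several quoted strings. `spf_all_policy` is a hypothetical name. It does not follow `include:` or `redirect=` the way a full RFC 7208 evaluation would.

```shell
#!/usr/bin/env bash
# spf_all_policy: read TXT records on stdin (as printed by dig +short TXT)
# and print the qualifier of the SPF record's "all" mechanism:
#   -all / ~all / ?all / +all / all, "none" if the SPF record has no "all",
#   or "no-spf" if no record starts with v=spf1.
spf_all_policy() {
  # Drop the quotes dig puts around each string; split strings then
  # become ordinary whitespace-separated fields.
  tr -d '"' | awk '
    tolower($1) == "v=spf1" {
      found = 1; policy = "none"
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[-~?+]?all$/) policy = $i
    }
    END { print (found ? policy : "no-spf") }'
}
```

For example, `dig +short TXT example.com | spf_all_policy` prints the qualifier directly. A `~all` or missing policy is worth pairing with the domain's DMARC record before drawing conclusions.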
```
# WHOIS: domain registration info
$ whois example.com
# DNS: assorted records (many servers now refuse ANY queries, per RFC 8482)
$ dig example.com ANY +noall +answer
$ dig +short MX example.com
$ dig +short TXT example.com # SPF
$ dig +short TXT _dmarc.example.com # DMARC
# DKIM keys live at <selector>._domainkey.example.com; the selector has to be known or guessed
# Subdomain enumeration
$ subfinder -d example.com -all -silent
$ amass enum -d example.com
# crt.sh: every certificate logged to Certificate Transparency (hidden subdomains pop out)
$ curl -s "https://crt.sh/?q=%25.example.com&output=json" \
  | jq -r '.[].name_value' | sort -u
# Shodan: ports, banners, and vulnerabilities of publicly-exposed hosts (needs an API key: shodan init <key>)
$ shodan host 93.184.216.34
$ shodan search "Server: Apache" port:80 country:JP
# Censys: competitor with very strong TLS-certificate indexing
$ censys search 'services.tls.certificates.leaf_data.subject.common_name: "example.com"'
# Wayback Machine: deleted historical pages
$ curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&output=json"
```
Metadata and EXIF analysis #
Photos, PDFs, Office documents, and video all carry a great deal of metadata.
```
# Image EXIF: camera model, timestamp, GPS coordinates, serial number
$ exiftool photo.jpg
# PDF / Office: author, last editor, software, revision history
$ exiftool report.pdf
# Video
$ ffprobe -v error -show_format -show_streams video.mp4
# Bulk-collect metadata across a whole site
$ metagoofil -d example.com -t pdf,doc,xls -l 100 -o results
```
Metadata is very commonly forgotten. There are many press and investigative cases in which the author field of an internal PDF identified someone inside the organisation.
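EXIF stores GPS coordinates as degrees, minutes, and seconds, with the hemisphere in a separate reference tag (N/S or E/W). exiftool prints them in that form by default, for example `35 deg 39' 29.16" N`. Map tools want signed decimal degrees instead. exiftool can do the conversion itself: its `-n` option disables print conversion and `-c` sets the coordinate format. The arithmetic is still simple enough to sketch. `dms_to_decimal` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# dms_to_decimal DEG MIN SEC REF
# Converts an EXIF-style degrees/minutes/seconds coordinate plus its
# hemisphere reference (N/S/E/W) into signed decimal degrees.
dms_to_decimal() {
  awk -v d="$1" -v m="$2" -v s="$3" -v ref="$4" 'BEGIN {
    v = d + m / 60 + s / 3600
    r = toupper(ref)
    if (r == "S" || r == "W") v = -v   # south and west are negative
    printf "%.6f\n", v
  }'
}
```

For example, `dms_to_decimal 35 39 29.16 N` prints `35.658100`, and `dms_to_decimal 73 59 8.4 W` prints `-73.985667`. The resulting pair can be pasted into any map search box.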
Major tools and search engines #
All-in-one frameworks #
| Tool | Use case | Licence |
|---|---|---|
| Maltego | Graph-visualisation OSINT IDE; Transforms unify various sources as nodes | Commercial (Community edition available) |
| SpiderFoot | Automated OSINT collection across 200+ modules; HX edition is cloud-hosted | Open / commercial (HX) |
| Recon-ng | Metasploit-like interactive CLI, modular | Open |
Target-specific tools #
| Tool | Input | Output |
|---|---|---|
| theHarvester | Domain | Emails / subdomains / employee names |
| Sherlock / Maigret | Username | Presence check across hundreds of social networks |
| holehe | Email address | List of registered services |
| GHunt | Gmail address | Public Google-account information |
| OSINT Framework (osintframework.com) | — | Categorised link directory of tools by purpose |
Specialised search engines #
| Service | Searches |
|---|---|
| Shodan | Internet-exposed hosts / services / banners |
| Censys | TLS certificate index, hosts, subdomains |
| ZoomEye | Chinese-built equivalent of Shodan |
| Wayback Machine | Historical Web (1996–present) |
| GreyNoise | Classification of "Internet background noise" (scanner IPs) |
| Have I Been Pwned | Check whether an email / password appears in breaches |
Worked example workflows #
(1) From a domain to a full picture of an organisation's infrastructure #
Starting point: "We want to pentest Company X's site (with authorisation)."
1. `whois example.com` for contact info and registration date.
2. `subfinder` / `amass` together with crt.sh to find hidden assets such as vpn-staging.example.com.
3. Check `port:443/22/3389` against each IP to discover exposed RDP or old OpenSSH.
4. `theHarvester` to collect email addresses from LinkedIn / Google, building an employee roster.
5. Search `org:example-corp` for committed tokens, internal hostnames, and leaked API keys.

Aggregating each step into a Maltego or SpiderFoot graph gives you a single picture of the relationships between infrastructure and people, anchored at the domain.
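The subdomain step usually leaves several overlapping lists from subfinder, amass, and crt.sh. These contain wildcard entries, mixed case, trailing dots, and hosts outside the authorised scope.

The following is a minimal sketch for merging them, assuming one hostname per line in each input. `in_scope_hosts` is a hypothetical name. The explicit scope filter matters in authorised testing, because passive sources regularly return names that belong to someone else.

```shell
#!/usr/bin/env bash
# in_scope_hosts DOMAIN [FILE...]
# Merges hostname lists (one name per line; stdin if no files), normalises
# them, and keeps only DOMAIN itself and names underneath it.
in_scope_hosts() {
  local domain="$1"; shift
  cat -- "$@" |
    tr '[:upper:]' '[:lower:]' |
    # strip whitespace, crt.sh-style "*." wildcard prefixes, and trailing dots
    sed -e 's/[[:space:]]//g' -e 's/^\*\.//' -e 's/\.$//' |
    # exact match, or ends in ".DOMAIN" (so notexample.com is rejected)
    awk -v d="$domain" '$0 == d || substr($0, length($0) - length(d)) == ("." d)' |
    LC_ALL=C sort -u
}
```

For example, `in_scope_hosts example.com subfinder.txt amass.txt crtsh.txt > hosts.txt`, where the file names are illustrative. It produces one deduplicated list to feed into the next steps.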
(2) Pinpointing time and place from a single photograph #
Task: verify photos posted to social media as evidence of an incident or illegal act. Even without EXIF, you can narrow the location down by combining background text, building style, vegetation, power poles, and the sun's position. In the MH17 case, Bellingcat used exactly this technique to reconstruct the path of the Russian Buk system minute by minute.
Ethics, law, and Counter-OSINT #
OSINT may be "public information only", but the line between what you may and may not do is blurry, drawn by both national law and professional ethics.
- Unauthorised access — browsing public pages is fine, but guessing IDs / passwords to log in falls under the Unauthorised Computer Access Act (Japan) / CFAA (US)
- Secondary use or sale of personal information — GDPR / Japan's Amended PIPA / CCPA require a legitimate purpose
- Stalking and harassment — depending on how collected information is used, it can become a crime
- CSAM-related content — possession itself is criminal (no investigative-purpose exception)
- Redistributing leaked data — checking your own email is fine; redistributing someone else's leak data is not
- Face-recognition OSINT — Clearview AI has been ruled illegal in the EU; PimEyes is under continuing debate
Ethics code #
Guidelines broadly endorsed by investigative journalism and the security industry:
- Legitimate purpose — does it fall under public interest / contractual work / self-defence?
- Proportionality — is the depth of the investigation excessive relative to the goal?
- Minimisation — collect only what's necessary; discard unrelated third-party data
- Triangulation — never conclude from a single source; require at least three independent ones
- Eliminate false identification — stay constantly alert to the risk of misidentification (people with the same name, mistaken inferences)
Counter-OSINT — at the personal level #
If attackers can assemble a profile with OSINT, defenders should make sure they can't.
- Tighten social-media privacy settings to at least "friends only"; be cautious about real name, employer, school
- Strip EXIF before posting (Discord and Telegram do not auto-strip)
- Disable geotagging in OS settings
- Use different usernames per social network: using the same handle everywhere makes Sherlock-style reconciliation trivial
- Compartmentalise email and phone — primary / secondary / burner is a reasonable split
- Enable Have I Been Pwned notifications
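You can script a crude pre-upload check for leftover metadata. In a JPEG, EXIF lives in an APP1 segment whose payload starts with the identifier `Exif` followed by two zero bytes, normally near the start of the file.

The sketch below only looks for that identifier in the first 64 KiB. It is a heuristic: it ignores other formats and can be fooled either way. `has_exif` is a hypothetical name. For the actual stripping, use a real tool, for example `exiftool -all= -overwrite_original photo.jpg`.

```shell
#!/usr/bin/env bash
# has_exif FILE
# Rough check: succeeds if the first 64 KiB of FILE contain the "Exif"
# identifier that opens a JPEG APP1 metadata segment.
# Heuristic only: it can miss Exif placed later and can match by chance.
has_exif() {
  # -a makes grep treat the binary bytes as text; -q just sets the exit status
  head -c 65536 -- "$1" | LC_ALL=C grep -aq 'Exif'
}
```

A typical use is a loop such as `for f in *.jpg; do has_exif "$f" && echo "$f: metadata present"; done`, run before uploading a batch.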
Counter-OSINT — at the corporate level #
- WHOIS privacy to mask registration details
- Certificate-issuance policy: keep staging. and dev. hostnames out of crt.sh (use an internal CA)
- Always-on GitHub / GitLab secrets scanning
- Shodan self-monitoring — periodically scan your own IPs for unexpected services
- OSINT red-team exercises — OSINT your own organisation from the outside
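The self-monitoring item boils down to a set difference: what the Internet can see, minus what you meant to expose. The following is a minimal sketch. It assumes both lists have already been exported as one `ip:port` per line, and leaves open whether the observed list comes from your own scanner or a Shodan export. `unexpected_services` is a hypothetical name.

```shell
#!/usr/bin/env bash
# unexpected_services EXPECTED OBSERVED
# Both files hold one "ip:port" per line: EXPECTED is the approved
# inventory, OBSERVED is what an external scan reported.
# Prints entries that were observed but are not in the inventory.
unexpected_services() {
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C;
  # -13 drops lines only in EXPECTED and lines common to both.
  LC_ALL=C comm -13 <(LC_ALL=C sort -u -- "$1") <(LC_ALL=C sort -u -- "$2")
}
```

Anything it prints is either missing from the inventory or should not be exposed. Both cases deserve a ticket.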
Summary #
OSINT is not a "magic tool" — it is the combination of how you frame questions + how efficiently you collect public information + how rigorously you triangulate + how you make ethical calls. As Bellingcat and wartime OSINT have shown, even citizens can get close to the truth of historical events, but because it directly touches the privacy, safety, and reputation of the subject, you have to constantly ask "can I" vs "should I".
For security practitioners, OSINT is the phase attackers will run first — which is precisely why defenders are obligated to understand how their own organisation looks from the outside. Applying the techniques in this article to yourself is the best introduction to OSINT, and the best defence.