OSINT — Methods, Tools, and Real-World Examples of Open-Source Investigation

approx. 23 min read · LOG_DATE: 2026-05-09

OSINT (Open Source Intelligence) is the umbrella term for the techniques and culture of using publicly available information alone to investigate people, organisations, infrastructure, or events. Its applications are wide: attacker reconnaissance, defensive threat intelligence, investigative journalism, digital forensics. Starting from the premise that "more than half the answer is already out in the open", the essence of OSINT is how fast, how accurately, and how ethically you can gather and re-assemble it.

01

A short history of OSINT #

The practice of OSINT dates back to World War II, although the term itself came into use much later. In 1941 the United States established the Foreign Broadcast Monitoring Service (later FBIS) to translate and analyse enemy and neutral-country radio, newspapers, and publications. Throughout the Cold War, public sources were often said to account for around 80% of intelligence, a figure that is widely quoted but hard to verify. Soviet rail timetables, agricultural statistics, and local-paper obituaries were all used to infer military movements.

1941 — FBIS established
Systematic monitoring, translation, and analysis of foreign broadcasts. The institutional origin of OSINT.
1990s — The Web arrives
Public-source information from around the world becomes searchable from home. Methodology, though, is still at the level of "search on Google".
2002 — Google Hacking
Johnny Long starts cataloguing "Google dorks", the collection that grows into the Google Hacking Database. The discipline of combining inurl:, filetype:, and intitle: to unearth sensitive material is codified. The major social networks (LinkedIn 2003, Facebook 2004, Twitter 2006) appear over the following years.
2014 — Bellingcat founded
Founded by Eliot Higgins, Bellingcat reconstructs the downing of MH17, Syrian chemical-weapons attacks, and Russian assassination attempts by combining public videos, satellite imagery, social-media posts, and Flightradar flight-tracking data. The power of citizen OSINT is demonstrated to the world.
2022 onward — Wartime OSINT and AI
Since Russia's full-scale invasion of Ukraine, civilian posts on TikTok, Twitter, and Telegram have made the front line trackable almost in real time. Commercial satellite imagery from Maxar and Planet, AI-driven face matching, and AI geolocation have raised both the scale and the accuracy of this work.
02

The OSINT cycle #

The intelligence cycle long used by militaries and intel agencies applies to OSINT unchanged. The point is that it loops — each round of findings produces new questions, which feed back into more collection.

1. Planning
Put what you want to know into words and formulate hypotheses. If the scope is too wide, you'll burn unlimited time, so the iron rule is to narrow it down to 1–3 questions.
2. Collection
Query the Web, social media, DNS, satellite imagery, leak data, and government data sources in parallel.
3. Processing
Transcribe videos, OCR images, translate foreign-language text, align timestamps to UTC, extract metadata. Get everything into a machine-readable form.
4. Analysis
Connect fragments, detect contradictions and biases, and pull out evidence that supports or refutes the hypothesis. Triangulation — confirmation by three or more independent sources — is the rule.
5. Dissemination
Turn conclusions into reports, timelines, and visualisations. Keeping your inferences clearly separated from the sources you cite makes the work verifiable later and earns trust.

The answers almost always raise new questions and you go back to Planning. In real engagements you loop several times per case.
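The Processing and Analysis steps above can be sketched in a few lines. This is only an illustration under assumptions: the `Observation` record, its field names, and the source labels are hypothetical, and counting distinct labels is a stand-in for a human judgement about whether sources are truly independent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Observation:
    claim: str         # what the source asserts, in your own words
    source: str        # hypothetical label for the outlet or account
    seen_at: datetime  # timezone-aware timestamp of the post or capture


def to_utc(ts: datetime) -> datetime:
    """Processing: normalise a timezone-aware timestamp to UTC."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: record the original time zone first")
    return ts.astimezone(timezone.utc)


def is_triangulated(observations, claim, minimum=3):
    """Analysis: True if `claim` is backed by at least `minimum` distinct sources."""
    sources = {o.source for o in observations if o.claim == claim}
    return len(sources) >= minimum


# A post stamped 09:00 in UTC+9 is 00:00 UTC on the same day.
jst = timezone(timedelta(hours=9))
posted = to_utc(datetime(2024, 1, 1, 9, 0, tzinfo=jst))

observations = [
    Observation("convoy crossed the bridge", "account-a", posted),
    Observation("convoy crossed the bridge", "account-b", posted),
    Observation("convoy crossed the bridge", "account-b", posted),  # same source twice
]
print(is_triangulated(observations, "convoy crossed the bridge"))  # False: two sources
```

Two accounts reposting the same video would still count as one source in practice, so the `source` label is best assigned after tracing each item back to its origin.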

03

Data source categories #

| Category | Examples | What it reveals |
|---|---|---|
| Web / blogs | Official sites, news, IR materials, job postings | Org info, contact details, tech stack |
| Social media | Twitter/X, Facebook, Instagram, LinkedIn, TikTok, Telegram | Connections, movement, preferences, timeline |
| Images / video | YouTube, TikTok, Flickr, satellite imagery (Maxar, Planet, Sentinel) | Location, time, identity |
| Maps / geography | Google Maps Street View, OpenStreetMap, Mapillary | Street features, building layout |
| Public records | Corporate registry, real-estate records, court records, FOIA disclosures | Officers, shareholders, disputes |
| DNS / IP / certificates | WHOIS, crt.sh, Shodan, Censys | Infrastructure layout, vulnerabilities |
| Code leaks | GitHub, GitLab, Pastebin, public S3 | Credentials, internal design |
| Breached data | Have I Been Pwned, Dehashed | Passwords, email addresses |
| Device fingerprints | EXIF, IPTC, ID3, FP | Camera, geotags |
▸ "Public" is defined differently country by country

In Japan, corporate registry records can be obtained by anyone who pays the fee through the Registry Information Service. In the US, real-estate records are generally public at the county level. In the EU, GDPR strictly limits secondary use of personally identifiable information. Verifying legality under your own jurisdiction is the first priority.

04

Google dorking — search operators #

Special operators in Google / Bing surface public files and configuration that ordinary search wouldn't find.

| Operator | Example | Purpose |
|---|---|---|
| site: | site:example.com | Restrict to a domain |
| inurl: | inurl:admin | String in URL |
| intitle: | intitle:"index of" | String in title |
| filetype: | filetype:pdf "internal" | Narrow by file extension |
| intext: | intext:"password" | String in body text |
| cache: | cache:example.com | Google's cached copy (retired by Google in 2024; use the Wayback Machine for old copies) |
| - | -marketing | Exclude |
| "" | "social security number" | Exact match |
Practical dork examples
    # Open-directory leakage from an organisation
    site:example.com intitle:"index of" -html

    # Mistakenly-public .env files
    filetype:env "DB_PASSWORD"

    # Specific keywords on Pastebin
    site:pastebin.com "internal-only" example.com

    # Old, vulnerable phpMyAdmin
    inurl:phpmyadmin/index.php intitle:"phpMyAdmin 2."

The Google Hacking Database (GHDB) (exploit-db.com/google-hacking-database) catalogues thousands of pre-built queries.
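When dorks are generated for many domains, it can help to assemble them programmatically instead of typing them by hand. The helper below is a minimal sketch: `build_dork` and its parameters are hypothetical names, and it only covers the operators from the table above.

```python
def build_dork(*terms, site=None, intitle=None, inurl=None, filetype=None, exclude=()):
    """Assemble a search query string from common dork operators."""

    def quote(value):
        # Operators need quotes only when the value contains a space.
        return f'"{value}"' if " " in value else value

    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{quote(inurl)}")
    if intitle:
        parts.append(f"intitle:{quote(intitle)}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.extend(f'"{term}"' for term in terms)          # exact-match phrases
    parts.extend(f"-{quote(word)}" for word in exclude)  # excluded words
    return " ".join(parts)


print(build_dork(site="example.com", intitle="index of", exclude=["html"]))
print(build_dork("DB_PASSWORD", filetype="env"))
```

The two calls reproduce the first two practical dorks listed above; swapping `site=` in a loop gives one query per domain in scope.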

05

Images and geolocation #

Reverse image search #

  • Google Images — strong for everyday subjects, celebrities, and products
  • TinEye — strong at finding original posting date and the sites where an image appeared
  • Yandex Images — uncannily strong at face matching and location ID (in OSINT circles, a category of its own)
  • Bing Visual Search — product recognition
  • PimEyes — faces only; ethically debated

Geolocation #

Geolocation is the technique of working out where a photo or video was taken. It is the core of the GeoGuessr-style OSINT that Bellingcat popularised.

  • Road signs and street markings — language and typography (font, colour, shape)
  • Building architecture and roof colours (have national characteristics)
  • Vegetation (palms vs conifers, seasonal state)
  • Sun position and shadow length for inferring time and latitude (SunCalc.org)
  • Vehicles and license plates
  • Power poles, wiring, postboxes
  • Mountains and coastlines in the background, cross-referenced with Google Earth
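The shadow clue in the list above rests on simple trigonometry: on level ground, the sun's elevation angle is the arctangent of an object's height divided by the length of its shadow. The sketch below assumes both lengths are measured in the same unit (pixels work if the object and shadow are seen roughly side-on); `sun_elevation_deg` is a hypothetical helper name.

```python
import math


def sun_elevation_deg(object_height, shadow_length):
    """Estimate the sun's elevation angle in degrees from a shadow on level ground."""
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("height and shadow length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


# A 2 m pole casting a 2 m shadow puts the sun about 45 degrees above the horizon.
print(round(sun_elevation_deg(2.0, 2.0), 1))
# A longer shadow means a lower sun (early morning, late afternoon, or winter).
print(round(sun_elevation_deg(2.0, 5.0), 1))
```

The estimated elevation, together with the shadow's compass direction, can then be compared against a tool such as SunCalc.org to shortlist candidate dates, times, and latitudes.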
▸ Why Yandex is so strong on faces and places

Yandex is the leading search engine in Russia, so its index of Russian-language street images and portraits is very large. If you're looking for a face or location match, it is worth trying Yandex first. Bellingcat leaned on it heavily for the MH17 and Syria chemical-weapons investigations.

06

Social media and person reconciliation #

Social media intelligence (SOCMINT) means working through social media systematically. It is the primary source for both people and events.

Typical items to investigate:

  • Account creation date and the first post
  • The follow / follower network
  • Histogram of posting times → likely time zone of residence
  • Photo geotags (sometimes still in EXIF even when stripped from display)
  • "Likes" / comment targets → close relationships
  • Cross-matching usernames, phone numbers, and emails across networks for identity reconciliation
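The posting-time histogram from the list above can be turned into a rough time-zone guess. The sketch below rests on a strong assumption, labelled in the code: the quietest six-hour window is taken to be 01:00 to 07:00 local time. Shift workers, scheduled posts, and bots all break that assumption, so the result is a lead to verify, not a conclusion. Function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_histogram(timestamps):
    """Count posts per UTC hour (index 0..23) from timezone-aware datetimes."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def guess_utc_offset(timestamps, sleep_start_local=1, window=6):
    """Guess a UTC offset in hours, ASSUMING the quietest `window` hours
    start at `sleep_start_local` o'clock local time."""
    hist = hourly_histogram(timestamps)

    def window_total(start):
        return sum(hist[(start + i) % 24] for i in range(window))

    quiet_start_utc = min(range(24), key=window_total)
    offset = (sleep_start_local - quiet_start_utc) % 24
    return offset - 24 if offset > 12 else offset


# Synthetic account in UTC+9 that posts once in every local hour except 01:00-06:59.
local_hours = list(range(7, 24)) + [0]
posts = [datetime(2024, 1, 1, (h - 9) % 24, tzinfo=timezone.utc) for h in local_hours]
print(guess_utc_offset(posts))
```

With real data, a few weeks of posts are usually needed before the quiet window stands out from noise.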
Bulk-scan social networks by username or email
    # Sherlock: search for a username across hundreds of sites at once
    $ sherlock johndoe
    [+] GitHub: https://github.com/johndoe
    [+] Reddit: https://reddit.com/user/johndoe
    [+] Instagram: https://instagram.com/johndoe

    # Maigret: Sherlock fork, even more sites
    $ maigret johndoe --top-sites 500

    # holehe: infer services an email is registered to
    $ holehe target@example.com

    # GHunt: Google account public info from a Gmail address
    $ ghunt email target@gmail.com
07

Domain, IP, and certificate OSINT #

Mapping an organisation's infrastructure from the outside. The starting point for attacker recon, penetration testing, and threat intelligence alike.

WHOIS / DNS / subdomain enumeration
    # WHOIS: domain registration info
    $ whois example.com

    # DNS: query record types one by one (many servers refuse or minimise ANY, see RFC 8482)
    $ dig example.com A +noall +answer
    $ dig +short MX example.com
    $ dig +short TXT example.com          # SPF
    $ dig +short TXT _dmarc.example.com   # DMARC (DKIM lives at <selector>._domainkey)

    # Subdomain enumeration
    $ subfinder -d example.com -all -silent
    $ amass enum -d example.com
Certificate Transparency logs and Shodan
    # crt.sh: certificates recorded in Certificate Transparency logs (hidden subdomains pop out)
    $ curl -s "https://crt.sh/?q=%25.example.com&output=json" \
        | jq -r '.[].name_value' | sort -u

    # Shodan: ports, banners, and vulnerabilities of publicly-exposed hosts
    $ shodan host 93.184.216.34
    $ shodan search '"Server: Apache" port:80 country:JP'

    # Censys: competitor with very strong TLS-certificate indexing
    $ censys search 'services.tls.certificates.leaf_data.subject.common_name: "example.com"'

    # Wayback Machine: archived copies of pages, including deleted ones
    $ curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&output=json"
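The crt.sh output usually needs a little clean-up before it is useful: a single `name_value` field can hold several names separated by newlines, wildcard entries carry a `*.` prefix, and the same name appears on many certificates. The sketch below post-processes a saved JSON response offline. It assumes only the `name_value` field used in the jq command above; `subdomains_from_crtsh` is a hypothetical helper name.

```python
import json


def subdomains_from_crtsh(json_text, domain):
    """Return sorted unique hostnames under `domain` from a crt.sh JSON response."""
    names = set()
    for entry in json.loads(json_text):
        # One certificate can list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # keep the parent of a wildcard entry
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


sample = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "WWW.example.com"},
    {"name_value": "example.com.evil.test"},  # look-alike, not in scope
])
print(subdomains_from_crtsh(sample, "example.com"))
```

The suffix check is deliberately strict, so a look-alike such as `example.com.evil.test` does not leak into the scope list.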

Metadata and EXIF analysis #

Photos, PDFs, Office documents, and video all carry a great deal of metadata.

EXIF / Office metadata extraction
    # Image EXIF: camera model, timestamp, GPS coordinates, serial number
    $ exiftool photo.jpg

    # PDF / Office: author, last editor, software, revision history
    $ exiftool report.pdf

    # Video
    $ ffprobe -v error -show_format -show_streams video.mp4

    # Bulk-collect metadata across a whole site
    $ metagoofil -d example.com -t pdf,doc,xls -l 100 -o results

Metadata is very commonly forgotten. In many press and investigative cases, the author field of an internal PDF has identified the person inside the organisation who created it.
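EXIF stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), whereas mapping tools generally expect signed decimal degrees. The conversion is a one-liner. The helper name below is hypothetical, and the inputs are assumed to be already extracted from the file, for example from exiftool output.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in the signed-decimal convention.
    return -value if ref in ("S", "W") else value


latitude = dms_to_decimal(35, 40, 30.0, "N")
longitude = dms_to_decimal(139, 45, 0.0, "E")
print(f"{latitude:.4f}, {longitude:.4f}")
```

The resulting pair can be pasted straight into a map search box to check whether the claimed location matches the embedded one.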

08

Major tools and search engines #

All-in-one frameworks #

| Tool | Use case | Licence |
|---|---|---|
| Maltego | Graph-visualisation OSINT IDE; Transforms unify various sources as nodes | Commercial (Community edition available) |
| SpiderFoot | Automated OSINT collection across 200+ modules; HX edition is cloud-hosted | Open / commercial (HX) |
| Recon-ng | Metasploit-like interactive CLI, modular | Open |

Target-specific tools #

| Tool | Input | Output |
|---|---|---|
| theHarvester | Domain | Emails / subdomains / employee names |
| Sherlock / Maigret | Username | Presence check across hundreds of social networks |
| holehe | Email address | List of registered services |
| GHunt | Gmail address | Public Google-account information |
| OSINT Framework (osintframework.com) | n/a | Categorised link directory of tools by purpose |

Specialised search engines #

| Service | Searches |
|---|---|
| Shodan | Internet-exposed hosts / services / banners |
| Censys | TLS certificate index, hosts, subdomains |
| ZoomEye | Chinese-built equivalent of Shodan |
| Wayback Machine | Historical Web (1996–present) |
| GreyNoise | Classification of "Internet background noise" (scanner IPs) |
| Have I Been Pwned | Check whether an email / password appears in breaches |
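For the password side of Have I Been Pwned, the Pwned Passwords range lookup is built so that the password never leaves your machine: you send only the first five hex characters of its SHA-1 hash and compare the returned suffixes locally. The sketch below shows just that local half. It makes no network request, and `range_response` is assumed to be the plain-text body (lines of `SUFFIX:COUNT`) already fetched for the prefix. Function names are hypothetical.

```python
import hashlib


def sha1_prefix_suffix(password):
    """Split the uppercase SHA-1 hex digest into the 5-char prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password, range_response):
    """Return how often `password` appears, given the response for its prefix."""
    _, suffix = sha1_prefix_suffix(password)
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = sha1_prefix_suffix("password")
# Made-up response body for illustration; real counts come from the service.
fake_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n{suffix}:42\r\n"
print(prefix, breach_count("password", fake_body))
```

Only `prefix` would ever be sent over the network, which is what makes the check reasonable to run against passwords that are still in use.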
09

Worked example workflows #

(1) From a domain to a full picture of an organisation's infrastructure #

Starting point: "We want to pentest Company X's site (with authorisation)."

1. WHOIS for registration info
whois example.com for contact info and registration date.
2. Subdomain enumeration
Combine subfinder / amass with crt.sh to find hidden assets like vpn-staging.example.com.
3. Shodan for exposure
Check port:443/22/3389 against each IP — discover exposed RDP, old OpenSSH.
4. Build the employee list
Use theHarvester to collect email addresses from LinkedIn / Google → an employee roster.
5. Breach history
Submit employee emails to Have I Been Pwned to estimate the risk of password reuse.
6. GitHub secrets
Search org:example-corp for committed tokens, internal hostnames, and leaked API keys.

Aggregating each step into a Maltego or SpiderFoot graph gives you a single picture of "the relationships between infrastructure and people, anchored at the domain".
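Before reaching for Maltego or SpiderFoot, the same idea can be prototyped with a small in-memory graph. The sketch below is illustrative only: the class, the relation labels, and the sample hosts and addresses are all hypothetical, and the IP comes from a documentation-reserved range.

```python
from collections import defaultdict


class FindingsGraph:
    """Directed graph of findings: node -> {(relation, node), ...}."""

    def __init__(self):
        self._edges = defaultdict(set)

    def link(self, source, relation, target):
        self._edges[source].add((relation, target))

    def neighbours(self, node):
        return sorted(self._edges.get(node, ()))

    def reachable(self, start):
        """Every node that can be reached from `start`, including itself."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(target for _, target in self._edges.get(node, ()))
        return seen


graph = FindingsGraph()
graph.link("example.com", "subdomain", "vpn-staging.example.com")  # step 2
graph.link("vpn-staging.example.com", "resolves_to", "192.0.2.10")
graph.link("192.0.2.10", "exposes", "3389/tcp")                    # step 3
graph.link("example.com", "employee_email", "alice@example.com")   # step 4

print(graph.neighbours("example.com"))
print("3389/tcp" in graph.reachable("example.com"))
```

Keeping the relation label on each edge preserves how a finding was derived, which matters later when the report has to separate sources from inferences.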

(2) Pinpointing time and place from a single photograph #

This workflow verifies photos posted to social media as evidence of an incident or illegal act. Even without EXIF, you can narrow the location down by combining background text, building style, vegetation, power poles, and the sun's position. In the MH17 case, Bellingcat applied this kind of analysis to photos and videos to reconstruct the route of the Russian Buk system.

10

Ethics, law, and Counter-OSINT #

OSINT may be "public information only", but the line between what you may and may not do is blurry, drawn by both national law and professional ethics.

▸ Areas that are illegal or borderline in many countries
  • Unauthorised access — browsing public pages is fine, but guessing IDs / passwords to log in falls under the Unauthorised Computer Access Act (Japan) / CFAA (US)
  • Secondary use or sale of personal information — GDPR / Japan's amended APPI (Act on the Protection of Personal Information) / CCPA require a legitimate purpose
  • Stalking and harassment — depending on how collected information is used, it can become a crime
  • CSAM-related content — possession itself is criminal (no investigative-purpose exception)
  • Redistributing leaked data — checking your own email is fine; redistributing someone else's leak data is not
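  • Face-recognition OSINT — data-protection authorities in several EU countries (including France, Italy, and Greece) have found Clearview AI's processing unlawful under GDPR and fined it; PimEyes is under continuing debate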

Ethics code #

Guidelines broadly endorsed by investigative journalism and the security industry:

  • Legitimate purpose — does it fall under public interest / contractual work / self-defence?
  • Proportionality — is the depth of the investigation excessive relative to the goal?
  • Minimisation — collect only what's necessary; discard unrelated third-party data
  • Triangulation — never conclude from a single source; require at least three independent ones
  • Eliminate false identification — stay constantly alert to the risk of misidentification (people with the same name, mistaken inferences)

Counter-OSINT — at the personal level #

If attackers can assemble a profile with OSINT, defenders should make sure they can't.

  • Tighten social-media privacy settings to at least "friends only"; be cautious about real name, employer, school
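  • Strip EXIF before posting rather than relying on the platform: behaviour varies by service and upload mode, and files sent as uncompressed attachments (for example Telegram's "send as file") generally keep their metadata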
  • Disable geotagging in OS settings
  • Use different usernames per social network — using the same handle everywhere makes Sherlock-style reconciliation a one-shot
  • Compartmentalise email and phone — primary / secondary / burner is a reasonable split
  • Enable Have I Been Pwned notifications

Counter-OSINT — at the corporate level #

  • WHOIS privacy to mask registration details
  • Certificate-issuance policy — keep staging. and dev. out of crt.sh (use an internal CA)
  • Always-on GitHub / GitLab secrets scanning
  • Shodan self-monitoring — periodically scan your own IPs for unexpected services
  • OSINT red-team exercises — OSINT your own organisation from the outside
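As a rough idea of what secrets scanning does under the hood, the sketch below checks text line by line against two well-known patterns: the `AKIA` prefix of AWS access key IDs and PEM private-key headers. It is a toy illustration with hypothetical names. Dedicated scanners cover far more formats and also scan commit history, so this is not a substitute for them.

```python
import re

# Two illustrative patterns only; real scanners ship hundreds.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, pattern_label) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = (
    "region = us-east-1\n"
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "-----BEGIN RSA PRIVATE KEY-----\n"
)
print(scan_text(sample))
```

Running a check like this in a pre-commit hook catches the mistake before it reaches a public repository, where removing it from history is much harder.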
11

Summary #

OSINT is not a "magic tool" — it is the combination of how you frame questions + how efficiently you collect public information + how rigorously you triangulate + how you make ethical calls. As Bellingcat and wartime OSINT have shown, even citizens can get close to the truth of historical events, but because it directly touches the privacy, safety, and reputation of the subject, you have to constantly ask "can I" vs "should I".

For security practitioners, OSINT is the phase attackers will run first — which is precisely why defenders are obligated to understand how their own organisation looks from the outside. Applying the techniques in this article to yourself is the best introduction to OSINT, and the best defence.
