A (human) index that likes to code
Also drinks way too much coffee
Published Oct 08, 2023 21:51
Good morning!
Suppose a hypothetical situation where you gained access to a Python REPL on some server, somewhere. The REPL is artificially limited such that you have no access to any file or networking.
Given that you are in a REPL, you can theoretically write any program you want; however, you are lazy to write such a program, and instead wish to run an arbitrary executable. After running some REPL commands, like:
>>> import sys
>>> sys.version
'3.10.12 (main, Jun 11 2023, 00:00:00) [GCC 11.4.0]'
>>> sys.platform
'linux'
You realize the following:
As someone who knows how to code, surely you can whip up a script that would can execute any arbitrary binary file even under these conditions, right?
On Unix and Windows, Python supports an executable, as long as a file path is specified. This is typically done with the following recipe:
import os
os.execv("/bin/echo", ["-e", "hello world"])
The code above causes the /bin/echo
to replace the current process
immediately and prints “hello world”. After /bin/echo
quits, so does Python.
Great, problem solved, right? Unfortunately, the oddly specific constraints
stated above has explicitly denied access to files, which includes the
/bin/echo
executable.
Okay, so maybe we include the executable as part of the script instead. Since we know that the REPL runs on Linux, we spin up a Docker container, and begin experimenting.
First, we get the /bin/echo
program as bytes:
>>> data = open('/bin/echo', 'rb').read()
>>> data
b'\x7fELF\x02\x01\x01\x00...'
Backslashes looks really scary, so lets convert it to a Base64 encoded string:
>>> import base64
>>> data_str = base64.b64encode(data)
>>> data_str
b'f0VMRgIBAQAAAAAAAAAAAAMA...'
Great, let’s copy the whole string and keep it in our clipboard for now.
Next, we whip up a script, and write:
import os
import base64
bin_file = base64.b64decode(b'f0VMRgIBAQAAAAAAAAAAAAMA...')
os.execv(bin_file, ['-e', 'hello world'])
And the run the program, and oh…
Traceback (most recent call last):
File "your_file_here.py", line 1, in <module>
ValueError: execv: embedded null character in path
Turns out, even with all the bytes of the executable, you can’t just run it;
Python’s os.exec*
series of functions only support executables specified as
paths.
Well…
That statement is only half-true. As of Python version 3.3, the os.execve
official Python
documentation supports a
file descriptor.
According to this StackOverflow answer, a file descriptor is an entry created by the OS when a resource (e.g. file or sockets) is opened. This entry stores information about the resource, which includes how to access it. On Windows, this is also known as a handle.
The file descriptors on Unix can be found in /proc/<pid>/fd
, where <pid>
is
the process ID of the current process. Each file descriptor is represented by
an integer.
Okay, but why is this important? Because the standard streams, i.e. standard input, standard output and standard error all have their own file descriptors, which are 0, 1, and 2 respectively.
Notably, those standard streams definitely don’t occupy disk space; the file
descriptor to these standard streams simply represent the concept of those
streams (StackOverflow). Even though the
files /dev/stdout
, /dev/stdin
, and /dev/stderr
exist, they actually point
to /proc/self/fd/<0/1/2>
, which is basically /proc/<pid>/fd/<0/1/2>
, the
file descriptors in question.
In some sense, you can say that these streams exist in-memory (they’re technically buffered there, according to this Quora post).
Now, answer me this: what happens if I pass os.execve
a file descriptor
pointing to a resource that has executable content?
The theoretical answer: we can execute things.
Let’s run an experiment on a computer we have full access to.
We create two files; redirect.py
, which basically redirects the standard
input to standard output, and execute.py
, which spawns the redirect.py
subprocess, then attaches pipes to the standard output of redirect.py
.
execute.py
will write the Base64 string to redirect.py
, and redirect.py
will respond with the raw bytes.
We have to do it this way, because
sys.stdin.read()
reads strings instead of bytes, which causes issues when trying to pass an entire executable. Withsys.stdout.buffer.write()
, we can write raw bytes into the standard output. Since we hijackexecute.py
with pipes, we can also receive raw bytes fromredirect.py
.
redirect.py
:
import base64
import sys
r = base64.b64decode(sys.stdin.read())
sys.stdout.buffer.write(r)
sys.stdout.flush()
sys.stdout.close()
In execute.py
:
import os
import subprocess
bin_file = b'f0VMRgIBAQAAAAAAAAAAAAMA...'
process = subprocess.Popen(['python', 'redirect.py'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process.stdin.write(bin_file)
process.stdin.close()
os.execve(process.stdout.fileno(), ['-e', 'hello world'], {})
Giving a quick whirl, we see… oh…
Traceback (most recent call last):
File "execute.py", line 10, in <module>
os.execve(process.stdout.fileno(), ['-e', 'hello world'], {})
PermissionError: [Errno 13] Permission denied: 5
Looking at this AskPython article, it seems like this error happens when:
Given that we’re using one of the standard streams, surely the file descriptor points to something that actually exists; and given standard streams are exclusive to processes, we couldn’t have concurrent reads.
Hence, the only logical explanation stems from us receiving a permissions error. However, that conclusion is relatively ill-conceived - how do we assign permissions to a pipe?
After calling os.stat
on both the process.stdout.fileno()
file descriptor
and a normal executable file descriptor, we discover that there are indeed
indicators on the file mode that differentiates a stream to an actual file on
the system.
In fact, it is possible to use os.chmod
to change process.stdout.fileno()
’s
file mode, but that will still not yield a working result.
So, end of the road? Can’t be done? Not quite.
We have just established that we need files; the operating system has to understand that the file descriptor points to a resource that is meant to be a file.
This would mean that creating a temporary file would work; however, since we don’t have write access to the filesystem, as constrained by above, we can’t do that. Instead, we simply create a file in memory.
But how?
If we look carefully under the Linux kernel manual, under the sys/mman.h
header file, we see that there is an interesting function by the name of
memfd_create
. Here is a link to that
manpage. The
manpage describes that:
And wouldn’t you know it, Python’s os
module has a memfd_create
function!
Here’s the plan:
os.execve
and we’re off the races!Here is the final script:
# script.py
import base64
import os
import sys
bin_file = base64.b64decode(b'f0VMRgIBAQAAAAAAAAAAAAMA...')
in_mem_fd = os.memfd_create("bin_name", os.MFD_CLOEXEC)
os.write(in_mem_fd, bin_file)
os.lseek(in_mem_fd, 0, os.SEEK_SET)
os.execve(in_mem_fd, ['-e', 'hello world'], {})
Finally, running the script will net us the result we were expecting:
$ python3 script.py
hello world
What are the implications of this? For starters, you can embed any kind of executable into a Python script. In the case of a malware, the script can download any random executable from the internet, and run it without leaving a file trace on your computer.
With enough trickery, the script can also hijack standard input and standard output of the embedded executable, with the UI being indistinguishable from just running the executable directly.
On a lighter note, you can, in theory, package your entire suite of applications into a single Python script. It isn’t feasible in production, sure, but you can rest well knowing that it is indeed, possible.
Nevertheless, I hope this little fun adventure was entertaining to read. Until next time!
Happy Coding,
CodingIndex