Lately, like everyone else, I’ve been playing with LLMs. I’ve read several articles in which developers claim GitHub Copilot writes upwards of 80% of their code. I found this intriguing, so I signed up for the free trial. I tinkered with it while coding a new feature, in Swift, for ESEV. Honestly, I struggled to get it to write even 10% of my code.
I decided to try Copilot again, this time with a completely different project. Let’s see if it sticks this time.
I chose a simple Python script, for two primary reasons:
- GitHub Copilot recommends Python
- I prefer Python
The Python script explores macOS executable entitlements. It scans the macOS file system looking for executables, extracts the embedded entitlements, and displays them or stores them in a local database. For more details about the entitlements or the database, please read our post on entitlements (coming soon).
TL;DR
I found Copilot much more useful with Python than I did with Swift. I attribute this to three primary factors:
- My familiarity with Python vs. Swift
- Knowing how to ask for what I wanted in Python
- Copilot trains on a larger corpus of Python code vs Swift
Copilot does well at generating standard code, like sqlite3 and directory traversal. It struggles with niche modules like lief, where it invents attributes and functions that don’t exist in the library. I was disappointed that code suggestions omitted exception handling. To Copilot’s credit, it began suggesting exception handling after I manually added it to the script.
Experienced programmers can use Copilot to move quickly in well-supported languages.
Newer programmers may slow their learning by pair programming with Copilot. They may write more lines of code per hour but might lack a deep understanding of the codebase. This could increase the time spent debugging and troubleshooting, as well as introduce more bugs. Why do I think this? I believe that, without Copilot, I would’ve learned more about the lief library. When you skip reading the documentation, you reduce the likelihood that you’ll learn the edge cases, full capabilities and limitations of the library. This is no different from copying and pasting a snippet of code from Stack Overflow without understanding how it works and why.
I can imagine a not-too-distant future where Copilot highlights edge cases, inefficient code and alternative methods for the developer, making Copilot as much a teacher as a code snippet generator.
I found Copilot extremely approachable and easy to configure and use.
Using Copilot
It couldn’t be much easier. There are several supported IDEs; I chose Visual Studio Code (VSCode) and followed the Copilot Quickstart:
- Created a trial account
- Installed the plugin (authenticated)
- Enabled it for Python files
As you type code or comments, Copilot makes suggestions in grey text. Pressing the tab key inserts the suggested code. There are multiple keyboard shortcuts, which I’ll leave to you to explore. I invoked suggestions by writing a comment or function signature, at which point Copilot would automatically provide a suggestion in grey text. These suggestions often take a few seconds, especially when asking it to write an entire function.
# function traverses the file system looking for executable files
or
def find_executables(path):
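For reference, here is a minimal hand-written sketch of what a function behind that signature might look like. It is not the code Copilot suggested, nor the code in the final script. It assumes "executable" simply means a regular file with an execute permission bit set, which is a looser test than checking for a Mach-O binary.

```python
import os
import stat


def find_executables(path):
    """Yield paths of regular files under `path` that have any execute bit set."""
    # os.walk silently skips directories it cannot list (onerror defaults to None).
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are never reported.
                mode = os.lstat(full_path).st_mode
            except OSError:
                # Unreadable or vanished entries are skipped instead of aborting the scan.
                continue
            any_exec_bit = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            if stat.S_ISREG(mode) and mode & any_exec_bit:
                yield full_path
```

Writing it as a generator keeps memory use flat when scanning an entire file system; wrap the call in list() if you need all the paths at once.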
Copilot Hallucinations
Copilot commonly invented library attributes and functions that didn’t actually exist.
A hallucination occurred when I instructed Copilot to create a function using the comment # using the lief binary object determine if the binary is a dynamic library.
I didn’t save the entire suggestion, but it was reasonable and concise, containing a check for binary.is_dylib, supposedly a flag attribute of the lief.MachO.Binary class. Awesome, the object already has an attribute for my use case. Upon executing the script I received an AttributeError exception because is_dylib doesn’t exist in the lief.MachO.Binary class. Copilot simply made it up.
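One way to sidestep the question of which attributes lief exposes is to read the Mach-O header directly. The sketch below is my own illustration, not code from the script: it assumes a thin (non-universal) Mach-O and checks the filetype field, which sits at byte offset 12 of the header and equals MH_DYLIB (0x6) for dynamic libraries. The helper name is_dylib_header is hypothetical.

```python
import struct

MH_DYLIB = 0x6  # filetype value for a dynamic library in <mach-o/loader.h>

# First four bytes on disk mapped to the struct byte-order prefix for the rest of the header.
_MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce": ">",  # 32-bit, big-endian
    b"\xfe\xed\xfa\xcf": ">",  # 64-bit, big-endian
    b"\xce\xfa\xed\xfe": "<",  # 32-bit, little-endian
    b"\xcf\xfa\xed\xfe": "<",  # 64-bit, little-endian
}


def is_dylib_header(header: bytes) -> bool:
    """Return True if `header` starts a thin Mach-O whose filetype is MH_DYLIB."""
    if len(header) < 16:
        return False
    order = _MACHO_MAGICS.get(bytes(header[:4]))
    if order is None:
        # Not a thin Mach-O; fat/universal binaries are out of scope for this sketch.
        return False
    # Header layout: magic, cputype, cpusubtype, filetype (each 4 bytes).
    (filetype,) = struct.unpack_from(order + "I", header, 12)
    return filetype == MH_DYLIB
```

Passing the first 16 bytes of a file is enough, for example is_dylib_header(open(path, "rb").read(16)).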
A similar situation arose while trying to create a function that would determine if the Hardened Runtime flag was set in the CodeDirectory structure memoryview buffer.
I started at a high level with the binary object itself, just typing def is_hardened(binary):. Copilot did the same thing here with is_hardened, which is not an attribute of code_signature.
def is_hardened(binary):
    """ Returns True if binary is hardened """
    return binary.has_code_signature and binary.code_signature is not None and binary.code_signature.is_hardened
Next, I tried a more descriptive function name, code_directory_has_hardened_runtime. Copilot presented:
def code_directory_has_hardened_runtime(code_directory):
    return code_directory.has_flag(lief.MachO.CODE_SIGN_DRS)
The problem here is that code_directory has no method has_flag() and the lief.MachO class has no attribute CODE_SIGN_DRS. There is a MachO command named DYLIB_CODE_SIGN_DRS though.
I gave the method name another shot and changed it to code_directory_has_hardened_runtime_flag. Maybe the flag keyword would help nudge Copilot in the right direction.
def code_directory_has_hardened_runtime_flag(code_directory):
    return code_directory.flags & lief.MachO.CODE_SIGNATURE_FLAGS.HARD
Visually closer to what I’d expect, but again, CODE_SIGNATURE_FLAGS does not exist. That said, HARD could be a flag name, but would most likely represent the forceHard flag rather than the hardened runtime flag.
Even though the structure name is CS_CodeDirectory, Apple calls them SecCodeSignatureFlags in the documentation. So I tried naming the function code_signature_has_hardened_runtime_flag_set.
def code_signature_has_hardened_runtime_flag_set(code_signature):
    return any(code_directory_has_hardened_runtime_flag(cd) for cd in code_signature.code_directories)
With this suggestion, I was further away from a viable solution, so I stopped using only the function name to invoke code creation. NOTE: using the function name to invoke code creation worked better as the script grew and when the purpose of the function was simpler, as with save_bytes_to_file(filepath, filebytes).
Still stuck on trying to get Copilot to write the Hardened Runtime flag detection function, I tried a more descriptive comment. Apple’s source code shows us the structure of CS_CodeDirectory:
typedef struct __CodeDirectory {
    uint32_t magic;          /* magic number (CSMAGIC_CODEDIRECTORY) */
    uint32_t length;         /* total length of CodeDirectory blob */
    uint32_t version;        /* compatibility version */
    uint32_t flags;          /* setup and mode flags */
    uint32_t hashOffset;     /* offset of hash slot element at index zero */
    uint32_t identOffset;    /* offset of identifier string */
    uint32_t nSpecialSlots;  /* number of special hash slots */
    uint32_t nCodeSlots;     /* number of ordinary (code) hash slots */
    uint32_t codeLimit;      /* limit to main image signature range */
    uint8_t hashSize;        /* size of each hash in bytes */
    uint8_t hashType;        /* type of hash (cdHashType* constants) */
    uint8_t spare1;          /* unused (must be zero) */
    uint8_t pageSize;        /* log2(page size in bytes); 0 => infinite */
    uint32_t spare2;         /* unused (must be zero) */
    /* followed by dynamic content as located by offset fields above */
} CS_CodeDirectory;
What we’re interested in is the uint32_t flags value. I used the comment # parse the CS_CodeDirectory memoryview buffer and determine if the hardened runtime flag has been set and this produced:
def runtime_flag(memoryview):
    # parse the CodeDirectory header
    cdh = lief.MachO.CodeSignature.CodeDirectory.from_buffer(memoryview)
    # check if the hardened runtime flag is set
    if cdh.flags & lief.MachO.CodeSignature.DyldInfoFlags.HARDENED_RUNTIME:
        return True
    else:
        return False
Yet again, more hallucinations. CodeDirectory is not an attribute of CodeSignature, and neither DyldInfoFlags nor HARDENED_RUNTIME exists in the lief library, at least not that I could find.
To get closer to a viable answer, I created a variable kSecCodeSignatureRuntime = 0x10000. Using that and a more instructive comment I received:
kSecCodeSignatureRuntime = 0x10000
# code_directory is a memoryview object with a structure matching CodeDirectory.
# using the flags attribute of CodeDirecty return whether the runtime bit is set
def is_runtime(code_directory):
    return code_directory.flags & kSecCodeSignatureRuntime
Getting closer. Let me make the comment a bit more instructive on how I want it done:
# code_directory is a memoryview object with a structure matching CodeDirectory.
# first extract the flags integer from the code_directory object and return whether the runtime bit is set
def is_runtime(code_directory):
    flags = code_directory["flags"]
    return flags & kSecCodeSignatureRuntime
Better still, but code_directory is not a dictionary and I’d prefer the function returned a boolean instead of an integer. To encourage this I used type hints in the function signature.
# code_directory is a memoryview object with a structure matching CodeDirectory.
# first extract the flags integer from the code_directory object and return whether the runtime bit is set
def is_runtime(code_directory: memoryview) -> bool:
    flags = code_directory[0x14:0x18]
    return flags & kSecCodeSignatureRuntime == kSecCodeSignatureRuntime
Better! The offsets are incorrect and it didn’t convert the buffer to an int using int.from_bytes(), but it has the correct structure and returns a boolean. I’d prefer parens rather than relying on precedence, but this is an adequate ending to a long string of hallucinations.
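For comparison, here is a hand-corrected sketch of that last suggestion. It is my own cleanup, not Copilot output. Going by the CS_CodeDirectory struct above, flags follows three uint32_t fields, which puts it at bytes 0x0c through 0x0f. The sketch also assumes the buffer starts at the CodeDirectory magic and that the integers are stored big-endian, as code signing blobs are on disk.

```python
kSecCodeSignatureRuntime = 0x10000

# Offsets follow the CS_CodeDirectory layout:
# magic (0x00), length (0x04), version (0x08), flags (0x0c).
FLAGS_OFFSET = 0x0C
FLAGS_SIZE = 4


def is_runtime(code_directory: memoryview) -> bool:
    """Return True if the hardened runtime bit is set in the CodeDirectory flags."""
    raw = code_directory[FLAGS_OFFSET:FLAGS_OFFSET + FLAGS_SIZE]
    if len(raw) < FLAGS_SIZE:
        raise ValueError("buffer too short to contain CodeDirectory flags")
    # Convert the four raw bytes to an int before masking; big-endian is assumed here.
    flags = int.from_bytes(raw, byteorder="big")
    # Explicit parentheses instead of relying on operator precedence.
    return (flags & kSecCodeSignatureRuntime) != 0
```

The length check is there because slicing a short memoryview returns fewer bytes without complaint, which would otherwise produce a wrong answer instead of an error.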
Everything looks like a class
I tried a slightly different tack for determining Hardened Runtime than above. I created the comment # parse the CodeDirectory buffer. The result looks a little more like I’d expect, but again it assumes the buffer is a class, not a memoryview buffer. The caller of this function passes in a memoryview object.
# parse the CodeDirectory buffer
def parse_code_directory(code_directory):
    """ Parses code directory looking for hardened runtime """
    if code_directory is None:
        return None
    # print('Code directory: {}'.format(code_directory))
    # print('Code directory magic: {}'.format(hex(code_directory.magic)))
    # print('Code directory version: {}'.format(code_directory.version))
    # print('Code directory flags: {}'.format(hex(code_directory.flags)))
    # print('Code directory hash offset: {}'.format(code_directory.hash_offset))
    # print('Code directory hash size: {}'.format(code_directory.hash_size))
    # print('Code directory hash type: {}'.format(code_directory.hash_type))
It’s unfair for me to assume it knows “the buffer” is a memoryview object. I adjusted the comment to state the memoryview type explicitly, and it presented:
# parse the memoryview CodeDirectory buffer
def parse_code_directory(code_directory):
    # parse the CodeDirectory header
    cdh = lief.MachO.CodeSignature.CodeDirectory.from_buffer(code_directory)
    # parse the CodeDirectory hash slots
    hash_slots = []
    for i in range(cdh.n_special_slots):
        hash_slot = lief.MachO.CodeSignature.SuperBlob.HashSlot.from_buffer(code_directory[cdh.hash_offset + i * lief.MachO.CodeSignature.SuperBlob.HashSlot.sizeof()])
        hash_slots.append(hash_slot)
    # parse the CodeDirectory code slots
    code_slots = []
    for i in range(cdh.n_code_slots):
        code_slot = lief.MachO.CodeSignature.SuperBlob.CodeSlot.from_buffer(code_directory[cdh.code_limit + i * lief.MachO.CodeSignature.SuperBlob.CodeSlot.sizeof()])
        code_slots.append(code_slot)
    # parse the CodeDirectory requirements
    requirements = []
    for i in range(cdh.n_special_slots, cdh.n_special_slots + cdh.n_requirements):
        requirement = lief.MachO.CodeSignature.SuperBlob.Requirement.from_buffer(code_directory[cdh.requirements_offset + (i - cdh.n_special_slots) * lief.MachO.CodeSignature.SuperBlob.Requirement.sizeof()])
        requirements.append(requirement)
    # parse the CodeDirectory entitlements
    entitlements = lief.MachO.CodeSignature.SuperBlob.Entitlement.from_buffer(code_directory[cdh.entitlements_offset:])
    return cdh, hash_slots, code_slots, requirements, entitlements
Again, all kinds of class stuff that just doesn’t exist. It does use code_directory as a buffer input to the from_buffer function. In turn, this generates a fake class that it uses throughout the rest of the code. Some of the comments in the Copilot-generated code do make sense for parsing the CodeDirectory structure, for example getting the hash slots and code slots. Entitlements and requirements, however, are sibling blobs to CodeDirectory and therefore would not be part of parsing CodeDirectory itself.
Rather than trying to get it to parse the CodeDirectory, I decided to try something simpler. I asked Copilot to simply give me the flags field from the buffer as an integer.
# parse the memoryview code_directory buffer and extract flags as integer
def get_flags(code_directory):
    return int.from_bytes(code_directory[0x18:0x1c], byteorder='little')
Woot! The offsets are wrong and so is the byteorder parameter, but we’re roughly at the code we needed.
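If more than the flags field is needed, the fixed-size header can be unpacked in one go with the standard struct module instead of slicing by hand. The sketch below is an illustration I wrote afterwards, not code from the script. The field order and widths are copied from the CS_CodeDirectory struct shown earlier; the snake_case field names, the helper name, and the assumption that the buffer begins at the CodeDirectory magic are mine.

```python
import struct
from collections import namedtuple

# Nine uint32 fields, four uint8 fields, one uint32 field; ">" means big-endian, no padding.
CD_HEADER = struct.Struct(">9I4BI")

CSMAGIC_CODEDIRECTORY = 0xFADE0C02

CodeDirectoryHeader = namedtuple(
    "CodeDirectoryHeader",
    "magic length version flags hash_offset ident_offset "
    "n_special_slots n_code_slots code_limit "
    "hash_size hash_type spare1 page_size spare2",
)


def parse_code_directory_header(code_directory: memoryview) -> CodeDirectoryHeader:
    """Unpack the fixed CodeDirectory header from the start of the buffer."""
    if len(code_directory) < CD_HEADER.size:
        raise ValueError("buffer too short for a CodeDirectory header")
    header = CodeDirectoryHeader._make(CD_HEADER.unpack_from(code_directory, 0))
    if header.magic != CSMAGIC_CODEDIRECTORY:
        raise ValueError(f"unexpected magic {header.magic:#x}")
    return header
```

With the header parsed, get_flags reduces to parse_code_directory_header(buf).flags, and the magic check catches a buffer that starts at the wrong blob.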
Zero formats told
I have a few format strings that I always remember, like hex with padding and float values with precision. But I forget the exact syntax for the ones I use less frequently. What will Copilot do when instructed in English on how to format strings? The answer is nothing, as best I could tell, but I’m likely asking incorrectly.
Starting code was:
# parse the CodeDirectory buffer
def get_cd_flags(cd_buf):
    cd_flags = lief.MachO.CodeSignature.CDFlags()
    cd_flags.parse(cd_buf)
    return cd_flags
cd_flags = get_cd_flags(b"\x00\x00\x00\x00")
I then asked it to # print cd_flags as an 0x hex string. This gave me nothing. No suggestions. I then simplified it to # print cd_flags as a hex string and it returned print(cd_flags). I had hoped formatting strings might be a strength. Again, it might just be that I need to learn how to ask it correctly.
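For the record, these are the hex format strings I was fishing for. The sketch assumes cd_flags is a plain int, unlike the made-up CDFlags object in the starting code above, and the helper name format_flags is just for illustration.

```python
def format_flags(flags: int) -> str:
    """Format an int as a zero-padded, 0x-prefixed, 8-digit hex string."""
    # "#" adds the 0x prefix; the width of 10 counts the prefix plus 8 hex digits.
    return f"{flags:#010x}"


cd_flags = 0x10000

print(hex(cd_flags))              # 0x10000
print(f"{cd_flags:#x}")           # 0x10000
print(format_flags(cd_flags))     # 0x00010000
print(f"0x{cd_flags:08X}")        # 0x00010000 (uppercase digits, lowercase prefix)
```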
Same here same there
A global variable named save_code_signature exists in the script. The class lief.MachO.Binary has an attribute code_signature. The code completions from Copilot kept inserting binary.save_code_signature where it meant binary.code_signature. At first, I glossed over that subtle mistake and hit several runtime errors.
Exception, huh?
Copilot impressed me with its sqlite3 code suggestions, but none of them handled exceptions. Neither did any of the file creation code snippets.
That said, after I added simple exception handling into the script, Copilot then began producing code snippets with exception handling. Neat to see it learn from the script itself.
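To make "simple exception handling" concrete for the file creation case, here is a sketch of the kind of pattern I mean. The function name matches save_bytes_to_file(filepath, filebytes) mentioned earlier, but the body is an illustration and not the code from the script; returning a boolean and logging to stderr are my own choices.

```python
import sys


def save_bytes_to_file(filepath, filebytes):
    """Write filebytes to filepath; return True on success, False on an OS error."""
    try:
        with open(filepath, "wb") as f:
            f.write(filebytes)
    except OSError as e:
        # Covers missing directories, permission problems, full disks, and similar failures.
        print(f"could not write {filepath}: {e}", file=sys.stderr)
        return False
    return True
```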
I wanted to print out the sqlite3 stats after the script was nearing completion. Copilot created the function below including the exception handling.
def print_sqlite_stats():
    """ Prints stats from sqlite database """
    try:
        with sqlite3.connect('entitlements.sqlite3') as conn:
            c = conn.cursor()
            c.execute('SELECT count(*) FROM entitlements')
            print('Entitlements: {}'.format(c.fetchone()[0]))
            c.execute('SELECT count(*) FROM cd_flags')
            print('Code Directory Flags: {}'.format(c.fetchone()[0]))
            c.execute('SELECT count(*) FROM errors')
            print('Errors: {}'.format(c.fetchone()[0]))
            c.execute('SELECT count(*) FROM raw')
            print('Raw: {}'.format(c.fetchone()[0]))
    except sqlite3.Error as e:
        pass
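One thing worth tweaking in that suggestion is that the except block swallows the error silently, so a missing table simply prints nothing. The variation below is a sketch of how I might tighten it, not the code in the final script. It reports the error, loops over the table names taken from the function above, and closes the connection explicitly, since sqlite3’s own context manager only commits or rolls back. The get_sqlite_stats name and the returned dict are my additions.

```python
import sqlite3
import sys
from contextlib import closing

# Table names from print_sqlite_stats; a fixed tuple, never user input.
TABLES = ("entitlements", "cd_flags", "errors", "raw")


def get_sqlite_stats(db_path="entitlements.sqlite3"):
    """Return {table: row_count}; stops at the first sqlite3 error and reports it."""
    stats = {}
    try:
        # closing() guarantees conn.close() runs when the block exits.
        with closing(sqlite3.connect(db_path)) as conn:
            for table in TABLES:
                row = conn.execute(f"SELECT count(*) FROM {table}").fetchone()
                stats[table] = row[0]
    except sqlite3.Error as e:
        print(f"sqlite3 error reading {db_path}: {e}", file=sys.stderr)
    return stats
```

Returning the counts instead of printing them keeps the function easy to test; printing becomes a short loop over the returned dict.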
Copilot learns from you
As the code grew, Copilot learned. We saw a little of that above with the exception handling. When I created the second “extract” function, Copilot modeled its code suggestion after the first. I also noticed that Copilot adjusted its comments to mimic the style and language of those already in the script.
Conclusion
Overall, when I tested Copilot with Swift I was indifferent. But, after this experiment with Python, I find it pretty nifty. It’s especially adept at boilerplate code like:
- File manipulation
- SQL APIs
- Directory traversal
- Program argument parsing
These are tasks we’ve done a million times but still have to look up the API subtleties or syntax for. A perfect example was the print_sqlite_stats function. It had been some years since I used sqlite3, so the exact syntax wasn’t top of mind. Although the suggestion wasn’t exactly what I wanted, I simply tweaked a couple of things. The more I use Copilot, the better I think I’ll get at phrasing the comments and function names.
I think a programming n00b might struggle with Copilot. Spotting issues and telling real APIs from fabricated ones is harder without experience.
Ultimately, the code of the final script looks much different than I would have produced on my own. Part of that is me leaving Copilot’s inefficient code. Some of that is me not knowing how to ask Copilot for code I would have written. Overall though, it produced mostly working code which I left as is.
The code co-created in this project can be found on GitHub.
I highly recommend giving Copilot a spin.