nubco software

Like everyone, I’ve recently been playing with LLMs. I’ve read several articles in which developers claim GitHub Copilot writes upwards of 80% of their code. Intrigued, I signed up for the free trial and tinkered with it while coding a new feature, in Swift, for ESEV. Honestly, I struggled to get it to write even 10% of my code.

I decided to try Copilot again, this time with a completely different project. Let’s see if it sticks this time.

I chose a simple Python script for two primary reasons:

GitHub Copilot recommends Python
I prefer Python

The Python script explores macOS executable entitlements. It scans the macOS file system for executables, extracts their embedded entitlements, and either displays them or stores them in a local database. For more details about the entitlements or the database, please read our post on entitlements (coming soon).
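For context on what extracting the embedded entitlements involves: inside a code signature, the XML entitlements live in their own blob with an 8-byte big-endian header (magic 0xFADE7171, then the blob’s total length) followed by a property list. Below is a minimal sketch, assuming you already have that blob’s bytes in hand. The helper parse_entitlements_blob is hypothetical, not the script’s code, and it ignores the DER-encoded entitlements blob that newer binaries also carry.

```python
import plistlib
import struct

# Magic number of the embedded XML entitlements blob
# (CSMAGIC_EMBEDDED_ENTITLEMENTS in Apple's code-signing headers).
CSMAGIC_EMBEDDED_ENTITLEMENTS = 0xFADE7171


def parse_entitlements_blob(blob: bytes) -> dict:
    """Decode an entitlements blob: 8-byte big-endian header, then an XML plist."""
    if len(blob) < 8:
        raise ValueError("blob too short for a code-signing blob header")
    # Code-signing blobs are big-endian: magic (uint32), then total length (uint32).
    magic, length = struct.unpack(">II", blob[:8])
    if magic != CSMAGIC_EMBEDDED_ENTITLEMENTS:
        raise ValueError(f"unexpected magic {magic:#010x}")
    if length < 8 or length > len(blob):
        raise ValueError("blob length field is out of range")
    # Everything after the header, up to the declared length, is the plist.
    return plistlib.loads(blob[8:length])
```

A quick round trip: build a blob from `plistlib.dumps({"com.apple.security.app-sandbox": True})` with the header packed in front, and the function returns the original dictionary.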

TL;DR

I found Copilot much more useful with Python than I did with Swift. I attribute this to three primary factors:

My greater familiarity with Python than with Swift
Knowing how to ask for what I wanted in Python
Copilot trains on a larger corpus of Python code than Swift code

Copilot does well at generating standard code, such as sqlite3 queries and directory traversal. It struggles with niche modules like lief, where it invents attributes and functions that don’t exist in the library. I was disappointed that its suggestions omitted exception handling, although, to Copilot’s credit, it began suggesting exception handling after I manually added some to the script.

Experienced programmers can use Copilot to move quickly in well-supported languages.

Newer programmers may slow their learning by pair programming with Copilot. They may write more lines of code per hour but lack a deep understanding of the codebase, which could slow debugging and troubleshooting and introduce more bugs. Why do I think this? I believe that without Copilot I would’ve learned more about the lief library. When you skip reading the documentation, you reduce the likelihood that you’ll learn the library’s edge cases, full capabilities and limitations. This is no different than copying and pasting a snippet from Stack Overflow without understanding how and why it works.

I can imagine a not-too-distant future where Copilot highlights edge cases, inefficient code and alternative methods for the developer, making Copilot as much a teacher as a code snippet generator.

I found Copilot extremely approachable and easy to configure and use.

Using Copilot

It couldn’t be much easier. Copilot supports several IDEs; I chose Visual Studio Code (VS Code). The Copilot Quickstart covers setup. My steps were:

Created a trial account
Installed the plugin and authenticated
Enabled it for Python files

As you type code or comments, Copilot makes suggestions in grey text, and pressing the Tab key inserts the suggested code. There are multiple keyboard shortcuts I’ll leave to you to explore. I invoked suggestions by writing a comment or function signature, at which point Copilot automatically offered a suggestion in grey text. Suggestions often take a few seconds to appear, especially when asking it to write an entire function. For example:

# function traverses the file system looking for executable files

def find_executables(path):
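For illustration, here is one way a function with that signature might look. This is a minimal sketch, not the code Copilot produced or the script’s actual implementation. It assumes an “executable” is a regular file with at least one execute bit whose first four bytes match a Mach-O or universal (fat) magic number, and it quietly skips files it cannot read. MACHO_MAGICS is a hypothetical helper.

```python
import os
import stat

# First four bytes of a Mach-O file as they appear on disk.
MACHO_MAGICS = {
    b"\xcf\xfa\xed\xfe",  # MH_MAGIC_64, little-endian (x86_64, arm64)
    b"\xce\xfa\xed\xfe",  # MH_MAGIC, little-endian (32-bit)
    b"\xca\xfe\xba\xbe",  # FAT_MAGIC (universal binary; Java .class files share it)
}


def find_executables(path):
    """Yield paths under `path` that are executable regular files with a Mach-O magic."""
    # os.walk does not follow directory symlinks by default and ignores walk errors.
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
                # Skip symlinks, devices, etc., and files with no execute bit.
                if not stat.S_ISREG(st.st_mode) or not st.st_mode & 0o111:
                    continue
                with open(full, "rb") as f:
                    if f.read(4) in MACHO_MAGICS:
                        yield full
            except OSError:
                # Unreadable files (permissions, protected paths) are skipped.
                continue
```

Checking the magic bytes, rather than trusting the execute bit alone, keeps shell scripts and other executable text files out of the results.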

Copilot Hallucinations

Copilot commonly invented library attributes and functions that didn’t actually exist.

One hallucination occurred when I prompted Copilot to create a function with the comment # using the lief binary object determine if the binary is a dynamic library.

I didn’t save the entire suggestion, but it was reasonable and concise, containing a check of binary.is_dylib, supposedly an attribute flag in the lief.MachO.Binary class. Awesome, the object already has an attribute for my use case! Upon executing the script, I received an AttributeError exception because is_dylib doesn’t exist in the lief.MachO.Binary class. Copilot simply made it up.
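For comparison, here is a sketch that reads the Mach-O header directly instead of relying on any lief attribute. It assumes a thin (non-universal), little-endian Mach-O, which covers x86_64 and arm64 slices. The constants mirror MH_MAGIC, MH_MAGIC_64 and MH_DYLIB from Apple’s mach-o/loader.h; the helper name is_dylib_header is hypothetical.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O magic
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic
MH_DYLIB = 0x6            # mach_header.filetype value for a dynamic library


def is_dylib_header(header: bytes) -> bool:
    """Return True if a thin, little-endian Mach-O header has filetype MH_DYLIB."""
    if len(header) < 16:
        return False
    # mach_header begins with magic, cputype, cpusubtype, filetype (all 32-bit).
    magic, _cputype, _cpusubtype, filetype = struct.unpack("<IIII", header[:16])
    if magic not in (MH_MAGIC, MH_MAGIC_64):
        return False  # big-endian and fat/universal binaries are not handled here
    return filetype == MH_DYLIB
```

Reading the first 32 bytes of a file and passing them in is enough; a universal binary would first need its per-architecture slice located.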

A similar situation arose while I was trying to create a function to determine whether the Hardened Runtime flag was set in a memoryview buffer holding the CodeDirectory structure.

I started at a high level with the binary object itself, just typing def is_hardened(binary):. Copilot made the same mistake here: is_hardened is not an attribute of code_signature.

def is_hardened(binary):
    """ Returns True if binary is hardened """
    return binary.has_code_signature and binary.code_signature is not None and binary.code_signature.is_hardened

Next, I tried a more descriptive function name, code_directory_has_hardened_runtime. Copilot presented:

def code_directory_has_hardened_runtime(code_directory):
    return code_directory.has_flag(lief.MachO.CODE_SIGN_DRS)

The problem here is code_directory has no method has_flag() and the lief.MachO class has no attribute CODE_SIGN_DRS. There is a MachO command named DYLIB_CODE_SIGN_DRS though.

I gave the function name another shot and changed it to code_directory_has_hardened_runtime_flag. Maybe the word flag would help nudge Copilot in the right direction.

def code_directory_has_hardened_runtime_flag(code_directory):
    return code_directory.flags & lief.MachO.CODE_SIGNATURE_FLAGS.HARD

Visually closer to what I’d expect, but again, CODE_SIGNATURE_FLAGS does not exist. That said, HARD could be a flag name, but would most likely represent the forceHard flag rather than the hardened runtime flag.
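To make that distinction concrete, here is a sketch of the relevant signature flag bits as a Python IntFlag. The values follow Apple’s SecCodeSignatureFlags constants (kSecCodeSignatureForceHard = 0x0100, kSecCodeSignatureRuntime = 0x10000, and so on); double-check them against your SDK headers. The class name CodeSignatureFlags and the helper has_hardened_runtime are hypothetical and are not part of lief.

```python
from enum import IntFlag


class CodeSignatureFlags(IntFlag):
    """A subset of Apple's SecCodeSignatureFlags, with shortened Python names."""
    HOST = 0x0001
    ADHOC = 0x0002
    FORCE_HARD = 0x0100          # kSecCodeSignatureForceHard: not the hardened runtime
    FORCE_KILL = 0x0200
    FORCE_EXPIRATION = 0x0400
    RESTRICT = 0x0800
    ENFORCEMENT = 0x1000
    LIBRARY_VALIDATION = 0x2000
    RUNTIME = 0x10000            # kSecCodeSignatureRuntime: the Hardened Runtime bit


def has_hardened_runtime(flags: int) -> bool:
    """Return True if the Hardened Runtime bit is set in a raw flags integer."""
    return bool(flags & CodeSignatureFlags.RUNTIME)
```

A flag named HARD would most plausibly map to FORCE_HARD (0x0100), so testing it would answer a different question than whether the hardened runtime is enabled.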

Although the structure is named CS_CodeDirectory, Apple’s documentation calls its flags SecCodeSignatureFlags. So I tried naming the function code_signature_has_hardened_runtime_flag_set.

def code_signature_has_hardened_runtime_flag_set(code_signature):
    return any(code_directory_has_hardened_runtime_flag(cd) for cd in code_signature.code_directories)

This suggestion moved me further from a viable solution, so I stopped relying on the function name alone to prompt code generation. NOTE: prompting with just the function name worked better as the script grew and for functions with a simpler purpose, such as save_bytes_to_file(filepath, filebytes).

Still stuck trying to get Copilot to write the Hardened Runtime flag detection function, I tried a more descriptive comment. Apple’s source code shows us the structure of CS_CodeDirectory:

typedef struct __CodeDirectory {
	uint32_t magic;					/* magic number (CSMAGIC_CODEDIRECTORY) */
	uint32_t length;				/* total length of CodeDirectory blob */
	uint32_t version;				/* compatibility version */
	uint32_t flags;					/* setup and mode flags */
	uint32_t hashOffset;			/* offset of hash slot element at index zero */
	uint32_t identOffset;			/* offset of identifier string */
	uint32_t nSpecialSlots;			/* number of special hash slots */
	uint32_t nCodeSlots;			/* number of ordinary (code) hash slots */
	uint32_t codeLimit;				/* limit to main image signature range */
	uint8_t hashSize;				/* size of each hash in bytes */
	uint8_t hashType;				/* type of hash (cdHashType* constants) */
	uint8_t spare1;					/* unused (must be zero) */
	uint8_t	pageSize;				/* log2(page size in bytes); 0 => infinite */
	uint32_t spare2;				/* unused (must be zero) */
	/* followed by dynamic content as located by offset fields above */
} CS_CodeDirectory;

What we’re interested in is the uint32_t flags value. I used the comment # parse the CS_CodeDirectory memoryview buffer and determine if the hardened runtime flag has been set and this produced:

def runtime_flag(memoryview):
    # parse the CodeDirectory header
    cdh = lief.MachO.CodeSignature.CodeDirectory.from_buffer(memoryview)
    # check if the hardened runtime flag is set
    if cdh.flags & lief.MachO.CodeSignature.DyldInfoFlags.HARDENED_RUNTIME:
        return True
    else:
        return False

Yet again, more hallucinations. CodeDirectory is not an attribute of CodeSignature, and neither DyldInfoFlags nor HARDENED_RUNTIME exists in the lief library, at least not that I could find.

To get closer to a viable answer, I created a variable kSecCodeSignatureRuntime = 0x10000. Using that and a more instructive comment, I received:

kSecCodeSignatureRuntime = 0x10000

# code_directory is a memoryview object with a structure matching CodeDirectory.
# using the flags attribute of CodeDirecty return whether the runtime bit is set
def is_runtime(code_directory):
    return code_directory.flags & kSecCodeSignatureRuntime

Getting closer. Let me make the comment a bit more instructive on how I want it done:

# code_directory is a memoryview object with a structure matching CodeDirectory.
# first extract the flags integer from the code_directory object and return whether the runtime bit is set 
def is_runtime(code_directory):
    flags = code_directory["flags"]
    return flags & kSecCodeSignatureRuntime

Better still, but code_directory is not a dictionary, and I’d prefer the function return a boolean instead of an integer. To encourage this, I used type hints in the function signature.

# code_directory is a memoryview object with a structure matching CodeDirectory.
# first extract the flags integer from the code_directory object and return whether the runtime bit is set 
def is_runtime(code_directory: memoryview) -> bool:
    flags = code_directory[0x14:0x18]
    return flags & kSecCodeSignatureRuntime == kSecCodeSignatureRuntime

Better! The offsets are incorrect and it didn’t convert the buffer to an int using int.from_bytes(), but it has the correct structure and returns a boolean. I’d prefer parens rather than relying on precedence, but this is an adequate ending to a long string of hallucinations.
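For reference, here is a corrected sketch of that last suggestion. Two things break Copilot’s version: the slice 0x14:0x18 is identOffset rather than flags (in the C struct above, flags is the fourth uint32, at offset 0x0C), and applying & to a memoryview slice raises a TypeError. This sketch assumes the buffer starts at the CodeDirectory magic and that, like the rest of Apple’s code-signing blobs, its fields are big-endian. CD_FLAGS_OFFSET is a name introduced here.

```python
kSecCodeSignatureRuntime = 0x10000

# Offset of `flags` in CS_CodeDirectory: magic (0x00), length (0x04),
# version (0x08), flags (0x0C).
CD_FLAGS_OFFSET = 0x0C


def is_runtime(code_directory: memoryview) -> bool:
    """Return True if the Hardened Runtime bit is set in a CodeDirectory buffer."""
    # Code-signing structures are big-endian regardless of the CPU's byte order.
    raw = code_directory[CD_FLAGS_OFFSET:CD_FLAGS_OFFSET + 4]
    flags = int.from_bytes(raw, byteorder="big")
    # Explicit parentheses instead of relying on operator precedence.
    return (flags & kSecCodeSignatureRuntime) == kSecCodeSignatureRuntime
```

int.from_bytes accepts any bytes-like object, so the memoryview slice can be passed directly without copying it into a bytes object first.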

Everything looks like a class

I took a slightly different tack toward detecting the Hardened Runtime flag, using the comment # parse the CodeDirectory buffer. The result looks a little more like I’d expect, but again it assumes the buffer is an object with attributes rather than a memoryview buffer. The caller of this function passes in a memoryview object.

# parse the CodeDirectory buffer 
def parse_code_directory(code_directory):
    """ Parses code directory looking for hardened runtime """
    if code_directory is None:
        return None

    # print('Code directory: {}'.format(code_directory))
    # print('Code directory magic: {}'.format(hex(code_directory.magic)))
    # print('Code directory version: {}'.format(code_directory.version))
    # print('Code directory flags: {}'.format(hex(code_directory.flags)))
    # print('Code directory hash offset: {}'.format(code_directory.hash_offset))
    # print('Code directory hash size: {}'.format(code_directory.hash_size))
    # print('Code directory hash type: {}'.format(code_directory.hash_type))

It’s unfair of me to expect it to know that “the buffer” is a memoryview object, so I adjusted the comment to name the memoryview type explicitly, and it presented:

# parse the memoryview CodeDirectory buffer
def parse_code_directory(code_directory):
    # parse the CodeDirectory header
    cdh = lief.MachO.CodeSignature.CodeDirectory.from_buffer(code_directory)

    # parse the CodeDirectory hash slots
    hash_slots = []
    for i in range(cdh.n_special_slots):
        hash_slot = lief.MachO.CodeSignature.SuperBlob.HashSlot.from_buffer(code_directory[cdh.hash_offset + i * lief.MachO.CodeSignature.SuperBlob.HashSlot.sizeof()])
        hash_slots.append(hash_slot)

    # parse the CodeDirectory code slots
    code_slots = []
    for i in range(cdh.n_code_slots):
        code_slot = lief.MachO.CodeSignature.SuperBlob.CodeSlot.from_buffer(code_directory[cdh.code_limit + i * lief.MachO.CodeSignature.SuperBlob.CodeSlot.sizeof()])
        code_slots.append(code_slot)

    # parse the CodeDirectory requirements
    requirements = []
    for i in range(cdh.n_special_slots, cdh.n_special_slots + cdh.n_requirements):
        requirement = lief.MachO.CodeSignature.SuperBlob.Requirement.from_buffer(code_directory[cdh.requirements_offset + (i - cdh.n_special_slots) * lief.MachO.CodeSignature.SuperBlob.Requirement.sizeof()])
        requirements.append(requirement)

    # parse the CodeDirectory entitlements
    entitlements = lief.MachO.CodeSignature.SuperBlob.Entitlement.from_buffer(code_directory[cdh.entitlements_offset:])

    return cdh, hash_slots, code_slots, requirements, entitlements

Again, all kinds of classes and methods that just don’t exist. It does pass code_directory as the buffer input to a from_buffer function, but that call produces an invented class that the rest of the code depends on. The comments in the generated code do outline a sensible approach to parsing the CodeDirectory structure, for example getting the hash slots and code slots. Entitlements and requirements, however, are sibling blobs to the CodeDirectory and therefore would not be part of parsing the CodeDirectory itself.
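Since only the fixed header is needed to find the flags, a plain struct unpack is enough. Here is a minimal sketch, assuming the buffer begins at the CodeDirectory magic and covers the 44-byte header from the C definition above, stored big-endian. It ignores the extra fields newer CodeDirectory versions append after spare2 and does not validate the magic. CD_HEADER and CodeDirectoryHeader are hypothetical names.

```python
import struct
from typing import NamedTuple

# Big-endian layout of the fixed CS_CodeDirectory header:
# nine uint32 fields, four uint8 fields, then uint32 spare2 (44 bytes total).
CD_HEADER = struct.Struct(">9I4BI")


class CodeDirectoryHeader(NamedTuple):
    magic: int
    length: int
    version: int
    flags: int
    hash_offset: int
    ident_offset: int
    n_special_slots: int
    n_code_slots: int
    code_limit: int
    hash_size: int
    hash_type: int
    spare1: int
    page_size: int  # log2 of the page size in bytes; 0 means infinite
    spare2: int


def parse_code_directory(code_directory: memoryview) -> CodeDirectoryHeader:
    """Unpack the fixed-size CodeDirectory header from the start of the buffer."""
    if len(code_directory) < CD_HEADER.size:
        raise ValueError("buffer is smaller than a CodeDirectory header")
    return CodeDirectoryHeader(*CD_HEADER.unpack_from(code_directory, 0))
```

A caller could confirm header.magic == 0xFADE0C02 (CSMAGIC_CODEDIRECTORY) before trusting the other fields, then test header.flags for the Hardened Runtime bit.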

Rather than trying to get it to parse the whole CodeDirectory, I decided to try something simpler: I asked Copilot to just give me the flags field from the buffer as an integer.

# parse the memoryview code_directory buffer and extract flags as integer
def get_flags(code_directory):
    return int.from_bytes(code_directory[0x18:0x1c], byteorder='little')

Woot! The offsets are wrong and so is the byteorder parameter, but we’re roughly at the code we needed.
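The byte order matters more than it might seem. Code-signing structures are stored big-endian, so reading them little-endian on an Intel or Apple silicon Mac scrambles the value. With only the Hardened Runtime bit set, a little-endian read happens to produce 0x100, the forceHard value, as this small demonstration shows:

```python
# The four flags bytes of a CodeDirectory whose only flag is the Hardened
# Runtime bit, as stored on disk (big-endian).
raw_flags = bytes([0x00, 0x01, 0x00, 0x00])

as_big = int.from_bytes(raw_flags, byteorder="big")        # kSecCodeSignatureRuntime
as_little = int.from_bytes(raw_flags, byteorder="little")  # looks like forceHard instead

print(hex(as_big), hex(as_little))  # 0x10000 0x100
```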

Zero formats told

I have a few format strings that I always remember, like hex with padding and float values with precision, but I forget the exact syntax for the ones I use less frequently. What will Copilot do when instructed in plain English on how to format strings? As best I could tell, the answer is nothing, but I’m likely asking incorrectly.

Starting code was:

# parse the CodeDirectory buffer
def get_cd_flags(cd_buf):
    cd_flags = lief.MachO.CodeSignature.CDFlags()
    cd_flags.parse(cd_buf)
    return cd_flags

cd_flags = get_cd_flags(b"\x00\x00\x00\x00")

I then prompted it with # print cd_flags as an 0x hex string. This gave me nothing: no suggestions at all. I then simplified it to # print cd_flags as a hex string, and it returned print(cd_flags). I had hoped formatting strings might be a strength. Again, it might just be that I need to learn how to ask correctly.
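For the record, the format spec I was after does exist. Assuming cd_flags is a plain int (in the starting code above it is a made-up lief object), any of these print it as an 0x-prefixed hex string:

```python
cd_flags = 0x10000  # example value standing in for the script's cd_flags

print(hex(cd_flags))                # 0x10000
print(f"{cd_flags:#x}")             # 0x10000
print(f"{cd_flags:#010x}")          # 0x00010000 (width 10 includes the "0x")
print("{:#010x}".format(cd_flags))  # 0x00010000, using str.format
```

The # adds the 0x prefix, the leading 0 pads with zeros, and the width counts the prefix, so #010x yields eight hex digits.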

Same here same there

The script has a global variable named save_code_signature, and the class lief.MachO.Binary has an attribute named code_signature. Copilot’s code completions kept inserting binary.save_code_signature where it meant binary.code_signature. At first, I glossed over that subtle mistake and hit several runtime errors.

Exception, huh?

Copilot impressed me with its sqlite3 code suggestions, but none of them handled exceptions. Neither did any of the file creation code snippets.
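As an illustration of the kind of file-writing code that handles errors, here is a sketch of save_bytes_to_file(filepath, filebytes), the function signature mentioned earlier. The body is hypothetical, not the script’s code: it catches OSError, which covers missing directories, permission errors and full disks, and reports failure through its return value.

```python
import sys


def save_bytes_to_file(filepath, filebytes):
    """Write raw bytes to filepath; return True on success, False on failure."""
    try:
        with open(filepath, "wb") as f:
            f.write(filebytes)
        return True
    except OSError as e:
        # OSError covers FileNotFoundError, PermissionError, disk-full errors, etc.
        print(f"error writing {filepath}: {e}", file=sys.stderr)
        return False
```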

That said, after I added simple exception handling to the script, Copilot began producing code snippets with exception handling. Neat to see it learn from the script itself.

As the script neared completion, I wanted to print out some sqlite3 stats. Copilot created the function below, including the exception handling.

def print_sqlite_stats():
    """ Prints stats from sqlite database """
    try:
        with sqlite3.connect('entitlements.sqlite3') as conn:
            c = conn.cursor()
            c.execute('SELECT count(*) FROM entitlements')
            print('Entitlements: {}'.format(c.fetchone()[0]))
            c.execute('SELECT count(*) FROM cd_flags')
            print('Code Directory Flags: {}'.format(c.fetchone()[0]))
            c.execute('SELECT count(*) FROM errors')
            print('Errors: {}'.format(c.fetchone()[0]))
            c.execute('SELECT count(*) FROM raw')
            print('Raw: {}'.format(c.fetchone()[0]))
    except sqlite3.Error as e:
        pass
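The generated function works, but two details are worth knowing. Using a sqlite3 connection as a context manager commits or rolls back the transaction; it does not close the connection. And except sqlite3.Error as e: pass silently hides every failure. Below is a sketch of a variation, not the script’s code: contextlib.closing handles the close, errors go to stderr, and the table names come from the generated function. Splitting out a hypothetical table_counts helper keeps the counts easy to test.

```python
import sqlite3
import sys
from contextlib import closing

# Table names from the generated print_sqlite_stats. SQL identifiers can't be
# bound as parameters, so they come from this fixed tuple, never from user input.
TABLES = ("entitlements", "cd_flags", "errors", "raw")


def table_counts(db_path, tables=TABLES):
    """Return {table: row count}; sqlite3.Error propagates to the caller."""
    # closing() guarantees conn.close() even if a query fails.
    with closing(sqlite3.connect(db_path)) as conn:
        return {t: conn.execute(f'SELECT count(*) FROM "{t}"').fetchone()[0] for t in tables}


def print_sqlite_stats(db_path="entitlements.sqlite3"):
    """Print row counts per table, reporting (not swallowing) sqlite3 errors."""
    try:
        for table, count in table_counts(db_path).items():
            print(f"{table}: {count}")
    except sqlite3.Error as e:
        print(f"sqlite3 error: {e}", file=sys.stderr)
```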

Copilot learns from you

As the code grew, Copilot learned. We saw a little of that above with exception handling. When I created the second “extract” function, Copilot modeled its suggestion after the first. I also noticed that Copilot adjusted its comments to mimic the style and language of those already in the script.

Conclusion

When I tested Copilot with Swift, I was indifferent. After this experiment with Python, though, I find it pretty nifty. It’s especially adept at boilerplate code like:

File manipulation
SQL APIs
Directory traversal
Program argument parsing

These are tasks we’ve done a million times but still have to look up the API subtleties or syntax for. A perfect example was print_sqlite_stats: it had been some years since I used sqlite3, so the exact syntax wasn’t top of mind. Although the suggestion wasn’t exactly what I wanted, I simply tweaked a couple of things. The more I use Copilot, the better I think I’ll get at writing prompt comments and function names.

I think a programming n00b might struggle with Copilot. Spotting issues and recognizing fabricated code could be harder for a n00b.

Ultimately, the final script looks much different than what I would have produced on my own. Part of that is because I left in some of Copilot’s inefficient code; part is because I didn’t know how to ask Copilot for the code I would have written. Overall, though, it produced mostly working code, which I left as is.

The code co-created in this project can be found on GitHub.

Highly recommend giving Copilot a spin.

GitHub Copilot Reboot
