party tricks to show off your esoteric rust knowledge

motivation

My friend Sasha asked for some Rust party tricks, so here are a few that I remember off the top of my head.

totally-safe-transmute

This one was on Reddit a while ago. The trick is that, on Linux, you can open /proc/self/mem as a file and freely edit the process's own memory, all without writing unsafe.
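A minimal sketch of the idea, assuming a Linux system where the process may write to /proc/self/mem. The helper name `poke` is hypothetical and not taken from the Reddit post; it overwrites a u32 through the file API rather than through a pointer write:

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, SeekFrom, Write};

/// Hypothetical helper: overwrite `*target` by writing to /proc/self/mem.
/// Assumes Linux; no `unsafe` appears anywhere in this function.
fn poke(target: &mut u32, new_value: u32) -> io::Result<()> {
    // Casting a reference to a raw pointer and then to an integer is safe.
    let addr = target as *mut u32 as usize as u64;
    let mut mem = OpenOptions::new().write(true).open("/proc/self/mem")?;
    // Offsets in /proc/self/mem correspond to virtual addresses of the process.
    mem.seek(SeekFrom::Start(addr))?;
    mem.write_all(&new_value.to_ne_bytes())
}
```

The compiler is entitled to assume that nobody edits memory behind its back, so this stays a party trick rather than something to build on.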

undroppable types

I learned this from Lachlan. Putting a const-context panic inside a type's Drop implementation means that dropping a value of that type (e.g. with drop or by letting it fall out of scope) will emit a compile error. The only way to 'get rid' of such a value is with std::mem::forget.

This only works for generic types, since the const expression is evaluated per monomorphization.
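A minimal sketch of how this might look. The names `Undroppable`, `DROP_IS_FORBIDDEN`, and `forget_it` are hypothetical, and the type parameter exists only to make the type generic:

```rust
use std::marker::PhantomData;

/// Hypothetical type; `T` is only there so that the type is generic.
struct Undroppable<T>(PhantomData<T>);

impl<T> Undroppable<T> {
    // Evaluated only when a concrete `Undroppable<X>` actually uses it.
    const DROP_IS_FORBIDDEN: () = panic!("Undroppable values must be forgotten, not dropped");

    fn new() -> Self {
        Undroppable(PhantomData)
    }
}

impl<T> Drop for Undroppable<T> {
    fn drop(&mut self) {
        // Referencing the constant forces its evaluation for this `T`.
        let () = Self::DROP_IS_FORBIDDEN;
    }
}

fn forget_it() -> &'static str {
    let value = Undroppable::<u8>::new();
    // Without this line, `value` would fall out of scope and the build would fail.
    std::mem::forget(value);
    "forgotten"
}
```

Because the panic lives in an associated constant of a generic impl, nothing is evaluated until some monomorphized drop actually refers to it.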

bootleg proc macro

The typical tool of choice for generating nontrivial amounts of code at compile time is a proc macro. However, I did come across another method in the wild that works if you don't need to consume any tokens from the actual site of use: generating code at build time in the build.rs script, and using include! to pull the generated code verbatim into the crate source. This trick is used in the dxf crate to automatically generate struct definitions from XML files.

The crate generates code in build.rs, and then pulls it into the crate source with include!.

The Rust include! macro works similarly to #include in C.
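A rough sketch of the build-script half, not taken from the dxf crate. The function names, the file name generated.rs, and the hard-coded field list are all assumptions for illustration; a real generator would read its input from a spec file:

```rust
use std::{env, fs, io, path::PathBuf};

/// Hypothetical generator: renders a struct definition from (field, type) pairs.
fn generate_struct_source(name: &str, fields: &[(&str, &str)]) -> String {
    let mut src = format!("pub struct {name} {{\n");
    for (field, ty) in fields {
        src.push_str(&format!("    pub {field}: {ty},\n"));
    }
    src.push_str("}\n");
    src
}

/// Intended to be called from `fn main()` in build.rs.
fn write_generated() -> io::Result<()> {
    // Cargo sets OUT_DIR when it runs a build script.
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").expect("OUT_DIR not set"));
    let source = generate_struct_source("Point", &[("x", "f64"), ("y", "f64")]);
    fs::write(out_dir.join("generated.rs"), source)
}

// In the crate source (e.g. src/lib.rs), the generated file is then spliced in with:
// include!(concat!(env!("OUT_DIR"), "/generated.rs"));
```

Writing into OUT_DIR rather than the source tree keeps generated files out of version control.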

post-build scripts

On occasion, it is useful to run some arbitrary script after compilation has finished. For example, it may be useful to produce extra artifacts (such as a binary file to complement a generated ELF executable). This particular action is common for firmware targets (a rejected RFC mentions wanting to do this for the AVR platform, which is a family of microcontrollers). While there is no anointed way to do this in Cargo, we can make rustc invoke a different linker executable, like a bash script that wraps rust-lld.

The crux of the trick is that rustc will invoke the linker with arguments that look like this:

rust-lld \
    -m64 {{ codegen units from our crate }} \
    -Wl,--as-needed -Wl,-Bstatic {{ bits of the Rust standard library }} \
    -Wl,-Bdynamic -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc -Wl,--eh-frame-hdr -Wl,-z,noexecstack -L {{ more bits of std }} \
    -o /Users/ritikmishra/__workspace/test_linker_trick/target/x86_64-unknown-linux-gnu/release/deps/test_linker_trick-5c2307036ad5cb3c \
    -Wl,--gc-sections -pie -Wl,-z,relro,-z,now -Wl,-O1 -Wl,--strip-debug -nodefaultlibs

Most usefully, the path of the final executable is passed via the -o flag to the linker. As such, we can write the following linker_wrapper.sh Bash script in the crate's root folder (adjacent to Cargo.toml):

#!/bin/bash
set -euo pipefail

# Find the argument after '-o'
executable_path=""
args=("$@")
for ((i=0; i<${#args[@]}; i++)); do
    if [[ "${args[$i]}" == "-o" && $((i+1)) -lt ${#args[@]} ]]; then
        executable_path="${args[$((i+1))]}"
        break
    fi
done

# Execute linker normally
rust-lld "$@"

# Do whatever we want!
objcopy -O binary "$executable_path" binary.bin

This must be supplemented with configuration in .cargo/config.toml to use it:

[target.'cfg(all())']
linker = "./linker_wrapper.sh"

One of my favorite uses for this is stamping binaries with the current Git hash after compilation. I prefer this over other approaches (e.g. the built crate) that collect the Git hash before compilation starts and store it in a const, since with those, checking out a different commit always forces recompilation (as opposed to merely re-linking). In practice, it may not be appropriate to use the same wrapper script for all targets (e.g. since different utilities are required to work with Mach-O files on macOS and Portable Executable files on Windows).

&dyn/&mut dyn but for stack variables

From time to time, the temptation arises to use Box<dyn SomeTrait>. However, in many cases, it is possible to avoid the heap allocation by making a reference to a stack variable that is conditionally initialized like so:
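A minimal sketch of the pattern, using std::fmt::Display as a stand-in trait. The function `describe` and the concrete types and values are assumptions for illustration, and are not necessarily the code behind any stack measurements quoted in this section:

```rust
use std::fmt::Display;

/// Hypothetical example: pick one of two concrete types at runtime
/// and use it through `&dyn Display`, with no Box involved.
fn describe(use_integer: bool) -> String {
    // Declared but not yet initialized; only one of them ever gets a value.
    let a: u32;
    let b: f32;

    let value: &dyn Display = if use_integer {
        a = 42;
        &a
    } else {
        b = 1.5;
        &b
    };

    format!("value = {value}")
}
```

Declaring `a` and `b` outside the `if` is what lets the reference outlive the branch in which it was created; the same shape works with `&mut dyn` if the variables are declared `let mut`.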

Looking at the disassembly (when compiled for thumbv7em-none-eabihf, which is a 32-bit architecture I happen to target often), we only use 12 bytes of stack. The compiler is smart enough to know that the use of a and b is disjoint, so both a and b consume the first 4 bytes of the stack frame.

I find this trick useful in two situations:

Unified Canadian Aboriginal Syllabics block

There's a Reddit post from before Go introduced generics that I appreciate a lot:

Rust is equally liberal about what characters are allowed in identifiers, so if you wanted to, you could do this in Rust as well!
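As a sketch, and assuming the trick in question is the one that borrows letters resembling angle brackets from the Unified Canadian Aboriginal Syllabics block, a fake "generic" type might look like this in Rust. The type name is hypothetical, and rustc may emit lint warnings about unusual characters in identifiers:

```rust
// ᐸ and ᐳ are letters from the Unified Canadian Aboriginal Syllabics block,
// not angle brackets, so `Stackᐸi32ᐳ` is one ordinary (non-generic) identifier.
struct Stackᐸi32ᐳ {
    items: Vec<i32>,
}

impl Stackᐸi32ᐳ {
    fn new() -> Self {
        Self { items: Vec::new() }
    }

    fn push(&mut self, item: i32) {
        self.items.push(item);
    }

    fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }
}
```

Since Rust has real generics, this is purely for show.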

