Memory Corruption Due to Lambda Type Name Collision #1743

@bowars

Summary

A heap-buffer-overflow occurs during Lua state destruction after multiple capturing lambdas are registered with set_function(). Different lambda types receive identical metatable names because, on GCC, demangle<T>() produces the same string for lambdas with identical signatures but different captures. The __gc metamethod installed for the first type is then reused for the second, so the wrong destructor runs during garbage collection, resulting in memory corruption and crashes.

Affected Version: Tested against commit c1f95a773c6f8f4fde8ca3efe872e7286afe4444

Environment

  • Compiler: GCC 13.3.0
  • Platform: Linux x86_64
  • Lua version: 5.4.8

Reproduction

#include <sol/sol.hpp>
#include <string>

int main() {
    sol::state lua;
    lua.open_libraries(sol::lib::base);

    std::string strValue;  // empty string, 32 bytes
    int iValue = 42;       // 4 bytes

    auto fn1 = [strValue](sol::this_state, sol::object) -> bool {
        (void)strValue;
        return false;
    };
    auto fn2 = [iValue](sol::this_state, sol::object) -> bool {
        (void)iValue;
        return false;
    };

    sol::table pr = lua.create_table();
    pr.set_function("fn1", fn1);  // FIRST: 32-byte capture
    pr.set_function("fn2", fn2);  // SECOND: 4-byte capture
    lua["pr"] = pr;

    return 0;  // CRASH during lua_close()
}

Compile and Run

g++ -fsanitize=address -static-libasan -g -O0 -std=c++20 \
    -I /path/to/sol2/include \
    -I /path/to/lua \
    test.cpp -llua -lm -ldl -o test

./test

Expected Result

Clean exit with no errors.

Actual Result

==PID==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x506000000358 at pc 0x... bp 0x... sp 0x...
READ of size 8 at 0x506000000358 thread T0
    #0 std::__cxx11::basic_string<...>::_M_data() const
    #1 std::__cxx11::basic_string<...>::_M_is_local() const
    #2 std::__cxx11::basic_string<...>::_M_dispose()
    #3 std::__cxx11::basic_string<...>::~basic_string()
    #4 ~<lambda>
    #5 ~functor_function sol/function_types_stateful.hpp:32
    #6 destroy_at<sol::function_detail::functor_function<...>>
    #7 user_alloc_destroy<sol::function_detail::functor_function<...>> sol/stack_core.hpp:460
    ...

Root Cause

The Problem: Identical Metatable Names for Different Types

When sol2 registers a lambda via set_function(), it:

  1. Wraps the lambda in functor_function<Lambda, false, true>
  2. Allocates Lua userdata via user_allocate<T>() with size from aligned_space_for<T>()
  3. Creates/reuses a metatable named from usertype_traits<T>::user_gc_metatable()
  4. Sets the __gc metamethod to user_alloc_destroy<T>

The metatable name is generated in usertype_traits.hpp:49-52:

static const std::string& user_gc_metatable() {
    static const std::string u_g_m = std::string("sol.")
        .append(detail::demangle<T>())
        .append(".user\xE2\x99\xBB");
    return u_g_m;
}

The demangle<T>() function uses __PRETTY_FUNCTION__ to extract type names. The problem is that lambdas with the same signature produce identical demangled names, even though they are distinct types with different sizes:

// Both lambdas demangle to the SAME string:
// "sol::function_detail::functor_function<main()::<lambda(sol::this_state, sol::object)>, false, true>"

using Functor1 = functor_function<decltype(fn1), false, true>;  // 32 bytes, align 8
using Functor2 = functor_function<decltype(fn2), false, true>;  // 4 bytes, align 4

// But they produce identical metatable names!
usertype_traits<Functor1>::user_gc_metatable();  // "sol....user♻"
usertype_traits<Functor2>::user_gc_metatable();  // "sol....user♻"  <- SAME!

The Crash Sequence

  1. fn1 registered:

    • user_allocate<Functor1>() allocates aligned_space_for<Functor1>() = 39 bytes
    • luaL_newmetatable(L, "sol...user♻") creates NEW metatable (returns 1)
    • Sets __gc = user_alloc_destroy<Functor1> (expects 32-byte type with 8-byte alignment)
  2. fn2 registered:

    • user_allocate<Functor2>() allocates aligned_space_for<Functor2>() = 7 bytes
    • luaL_newmetatable(L, "sol...user♻") finds EXISTING metatable (returns 0)
    • __gc is not updated — still points to user_alloc_destroy<Functor1>
    • fn2's 7-byte userdata gets fn1's metatable
  3. Destruction (lua_close):

    • Lua GC calls __gc on fn2's userdata
    • user_alloc_destroy<Functor1> is invoked on 7-byte allocation
    • Calls align_user<Functor1>(memory) — aligns for 8-byte type
    • Attempts to destroy 32-byte Functor1 (containing std::string)
    • Reads/writes past the 7-byte allocation → heap-buffer-overflow
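The first two steps of this sequence can be modeled without Lua. The following is a minimal sketch, not sol2 code: Registry, new_metatable, register_type, destroy_as, BigFunctor, and SmallFunctor are hypothetical names, and the map only imitates the create-once, reuse-afterwards behavior of luaL_newmetatable described above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the two functor wrappers in the report.
struct BigFunctor { std::string captured; };
struct SmallFunctor { int captured; };

using Destructor = void (*)(void*);

// Plays the role of user_alloc_destroy<T>: destroys the object as a T.
template <typename T>
void destroy_as(void* memory) {
    static_cast<T*>(memory)->~T();
}

// Toy model of the Lua registry: metatable name -> __gc function.
struct Registry {
    std::unordered_map<std::string, Destructor> gc_by_name;

    // Imitates luaL_newmetatable: returns true only when the name is new.
    bool new_metatable(const std::string& name) {
        return gc_by_name.emplace(name, nullptr).second;
    }

    // Imitates push_with: __gc is set only for a newly created metatable.
    template <typename T>
    Destructor register_type(const std::string& name) {
        if (new_metatable(name)) {
            gc_by_name[name] = &destroy_as<T>;
        }
        return gc_by_name[name];  // the destructor this userdata will get
    }
};

// Registers both types under one colliding name and reports whether the
// second type ended up with the first type's destructor.
inline bool second_type_gets_wrong_destructor() {
    Registry registry;
    const std::string colliding_name = "sol.<same demangled name>.user";
    Destructor first = registry.register_type<BigFunctor>(colliding_name);
    assert(first == &destroy_as<BigFunctor>);  // first registration wins
    (void)first;
    Destructor second = registry.register_type<SmallFunctor>(colliding_name);
    return second == &destroy_as<BigFunctor>;
}
```

The sketch never calls the mismatched destructor, since doing so is exactly the undefined behavior the report describes; it only shows which function pointer would be stored.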

Relevant Code Paths

Metatable creation (stack_push.hpp:665-687):

template <bool with_meta = true, typename Key, typename... Args>
static int push_with(lua_State* L, Key&& name, Args&&... args) {
    T* data = detail::user_allocate<T>(L);
    if (with_meta) {
        if (luaL_newmetatable(L, name) != 0) {  // Only enters if NEW metatable
            lua_CFunction cdel = detail::user_alloc_destroy<T>;
            lua_pushcclosure(L, cdel, 0);
            lua_setfield(L, -2, "__gc");
        }
        lua_setmetatable(L, -2);  // Always sets metatable (even if reused!)
    }
    // ...
}

Destructor (stack_core.hpp:454-462):

template <typename T>
int user_alloc_destroy(lua_State* L) noexcept {
    void* memory = lua_touserdata(L, 1);
    void* aligned_memory = align_user<T>(memory);  // Aligns for type T
    T* typed_memory = static_cast<T*>(aligned_memory);
    std::allocator<T> alloc;
    std::allocator_traits<std::allocator<T>>::destroy(alloc, typed_memory);  // Destroys as T
    return 0;
}
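To see why destroying the block as the larger type must run past its end, the size arithmetic can be checked in isolation. This is a rough sketch under stated assumptions: Large and Small are hypothetical stand-ins with the sizes and alignments quoted in the report, worst_case_space assumes the 39- and 7-byte figures come from sizeof(T) + alignof(T) - 1, and std::align is used here only as a stand-in for whatever align_user<T> does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins with the sizes and alignments quoted in the report.
struct Large { alignas(8) unsigned char bytes[32]; };
struct Small { alignas(4) unsigned char bytes[4]; };

// Worst-case space needed to place a T at an arbitrarily aligned address.
// Assumption: this formula is what yields the 39- and 7-byte figures.
template <typename T>
constexpr std::size_t worst_case_space() {
    return sizeof(T) + alignof(T) - 1;
}

// Returns true if a T fits, suitably aligned, inside `available` bytes
// that start at a deliberately misaligned address (the worst case).
template <typename T>
bool fits_in_block(std::size_t available) {
    alignas(16) unsigned char storage[64] = {};
    assert(available + 1 <= sizeof(storage));
    void* cursor = storage + 1;
    std::size_t space = available;
    // std::align only does pointer arithmetic; it never touches the memory.
    return std::align(alignof(T), sizeof(T), cursor, space) != nullptr;
}
```

With these definitions, fits_in_block<Large>(worst_case_space<Small>()) is false: there is no way to place a 32-byte object inside a 7-byte block, so any destructor that treats the block as a Large has to access memory outside it.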

Why __PRETTY_FUNCTION__ Fails for Lambdas

In C++, each lambda expression creates a unique anonymous type. The compiler internally distinguishes them:

auto a = [x](int) { };  // GCC type: main::{lambda(int)#1}
auto b = [y](int) { };  // GCC type: main::{lambda(int)#2}

However, __PRETTY_FUNCTION__ (used by sol2's demangle<T>()) represents both as:

"main()::<lambda(int)>"

The unique #1/#2 suffix is lost in the string representation that sol2 parses, causing different lambda types to produce identical metatable names.
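The collision can be probed outside sol2 with a small standalone check. This is a sketch only: pretty_name, Comparison, and compare_two_lambdas are hypothetical names, and the template is merely similar in spirit to a __PRETTY_FUNCTION__-based name extractor, not sol2's actual demangle<T>(). It assumes GCC or Clang; MSVC does not provide __PRETTY_FUNCTION__.

```cpp
#include <string>
#include <type_traits>

// Returns the compiler-generated signature of this instantiation, which
// embeds the spelling the compiler chose for T.
template <typename T>
std::string pretty_name() {
    return __PRETTY_FUNCTION__;
}

struct Comparison {
    bool distinct_types;    // true for any two separate lambda expressions
    bool same_pretty_name;  // depends on the compiler
};

// Two lambdas with the same call signature but different captures.
inline Comparison compare_two_lambdas() {
    int x = 1;
    double y = 2.0;
    auto a = [x](int) { return x; };
    auto b = [y](int) { return static_cast<int>(y); };
    (void)a(0);
    (void)b(0);
    return Comparison{
        !std::is_same_v<decltype(a), decltype(b)>,
        pretty_name<decltype(a)>() == pretty_name<decltype(b)>()
    };
}
```

distinct_types is true on every conforming compiler. Based on the behavior described in this report, same_pretty_name would be expected to be true on GCC 13.3 and false on Clang 18.1, but that is worth confirming on the toolchain in question.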

Compiler-Specific Behavior

Whether this bug manifests depends on how the compiler represents lambda types.

GCC 13.3 (affected): Represents lambdas by signature only, e.g., main()::<lambda(sol::this_state, sol::object)>. Multiple lambdas with the same signature produce identical type names, causing the collision. Tested and confirmed to crash.

Clang 18.1 (not affected): Includes source location in lambda names, e.g., (lambda at /path/file.cpp:10:16). The file, line, and column make each lambda unique. Tested and confirmed to work correctly.

MSVC (untested): Uses __FUNCSIG__ instead of __PRETTY_FUNCTION__. Has not been tested.

Conditions

Configuration                           | Result
fn1=string(32B), fn2=int(4B)            | CRASH
fn1=vector, fn2=int                     | CRASH
fn1=int, fn2=string (order swapped)     | No crash (memory leak[1])
fn1=string, fn2=empty capture []        | OK (different code path[2])
Only fn1 registered (no fn2)            | OK
Only fn2 registered (no fn1)            | OK

[1] When the smaller type is registered first, the smaller type's destructor runs on the larger object, so the larger type's own destructor is never called. This leaks memory but does not crash, because all reads stay within the larger allocation.

[2] Stateless lambdas (empty capture) are convertible to function pointers and take a different code path that doesn't use functor_function wrappers or userdata metatables.
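The language rule behind footnote [2] can be checked directly. The sketch below uses hypothetical names (Callback, stateless_converts, capturing_converts) and only illustrates the convertibility difference; it says nothing about how sol2 itself selects the code path.

```cpp
#include <type_traits>

using Callback = bool (*)(int);

// A lambda with an empty capture list converts to a plain function pointer.
inline bool stateless_converts() {
    auto stateless = [](int) { return false; };
    Callback as_pointer = stateless;  // implicit conversion
    (void)as_pointer(0);
    return std::is_convertible_v<decltype(stateless), Callback>;
}

// A capturing lambda carries state, so no such conversion exists.
inline bool capturing_converts() {
    int captured = 42;
    auto capturing = [captured](int) { return captured != 0; };
    (void)capturing(0);
    return std::is_convertible_v<decltype(capturing), Callback>;
}
```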

Key observations:

  1. Order matters: The crash occurs when the larger type is registered FIRST (the smaller allocation is then destroyed with the larger type's destructor)
  2. Both lambdas required: Removing either one prevents the collision
  3. Same signature required: Both lambdas must have the same function signature (e.g., (sol::this_state, sol::object) -> bool)
  4. Size difference required: The functor wrappers must have different sizes/alignments

Fix

Proposed Solution

Use a static variable address as a unique type identifier. Each template instantiation gets its own static variable with a guaranteed-unique address, providing robust type discrimination without requiring RTTI.

File: include/sol/usertype_traits.hpp

Add a type ID helper in the sol::detail namespace:

namespace detail {
    // Each instantiation of type_id<T> has a unique address for 'tag',
    // providing a guaranteed-unique identifier per type without requiring RTTI
    template <typename T>
    struct type_id {
        static inline const char tag = '\0';
    };
}

Then modify user_gc_metatable():

static const std::string& user_gc_metatable() {
    // Use unique type address to disambiguate types with identical demangled names
    // (e.g., lambdas with same signature but different captures)
    static const std::string u_g_m = std::string("sol.")
        .append(detail::demangle<T>())
        .append(".user\xE2\x99\xBB.")
        .append(std::to_string(reinterpret_cast<std::uintptr_t>(&detail::type_id<T>::tag)));
    return u_g_m;
}
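The uniqueness property the fix relies on can be exercised without sol2. In this sketch the sketch namespace, unique_suffix, and lambdas_get_distinct_suffixes are hypothetical names; only type_id mirrors the helper proposed above. It needs C++17 for the inline static data member.

```cpp
#include <cstdint>
#include <string>

namespace sketch {
    // Mirrors the proposed helper: one 'tag' object per instantiation.
    template <typename T>
    struct type_id {
        static inline const char tag = '\0';
    };

    // Hypothetical helper: builds a name suffix from the tag's address.
    template <typename T>
    std::string unique_suffix() {
        return std::to_string(
            reinterpret_cast<std::uintptr_t>(&type_id<T>::tag));
    }
}

// Two lambdas with the same call signature but different captures,
// shaped like fn1 and fn2 in the reproduction.
inline bool lambdas_get_distinct_suffixes() {
    std::string large;
    int small = 42;
    auto fn1 = [large](int) { return large.empty(); };
    auto fn2 = [small](int) { return small == 0; };
    (void)fn1(0);
    (void)fn2(0);
    return sketch::unique_suffix<decltype(fn1)>()
        != sketch::unique_suffix<decltype(fn2)>();
}
```

The numeric suffix is an address, so it may differ from one run to the next; it is suitable as an in-process key but not as a name to store or compare across runs.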

Verification

After applying the fix:

=== Metatable names ===
User1 metatable: sol...user♻.100358201542387
User2 metatable: sol...user♻.100358201542388

=== Type sizes ===
sizeof(Functor1) = 32, alignof = 8
sizeof(Functor2) = 4, alignof = 4

The test passes with AddressSanitizer enabled:

$ ./test_fixed
$ echo $?
0

Alternative Fixes Considered

  1. Use sizeof(T) and alignof(T) in metatable name — Insufficient because two distinct types can have identical size and alignment but different internal structures. For example, struct { unique_ptr<int> } and struct { double } are both 8 bytes with 8-byte alignment, but require different destructors.

  2. Use typeid(T).hash_code() — Requires RTTI. Fails to compile with -fno-rtti (GCC/Clang) or /GR- (MSVC).

  3. Always update __gc when reusing metatable — Would apply the new destructor to all existing userdata sharing that metatable, breaking previously registered types.

  4. Check if __gc function pointer matches before reusing — Feasible, but if there's a mismatch you still need a unique identifier to create a separate metatable.
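The objection to the first alternative can be made concrete. The following sketch uses hypothetical names (OwnsPointer, PlainDouble, same_size_and_alignment, needs_destructor) and assumes a typical 64-bit platform where a std::unique_ptr<int> with the default deleter is the size of a raw pointer.

```cpp
#include <memory>
#include <type_traits>

// Two types shaped like the example in alternative 1.
struct OwnsPointer { std::unique_ptr<int> value; };
struct PlainDouble { double value; };

// True when a size/alignment-based name could not tell A and B apart.
template <typename A, typename B>
constexpr bool same_size_and_alignment() {
    return sizeof(A) == sizeof(B) && alignof(A) == alignof(B);
}

// True when destroying a T does real work (here: freeing the owned int).
template <typename T>
constexpr bool needs_destructor() {
    return !std::is_trivially_destructible_v<T>;
}
```

On such a platform same_size_and_alignment<OwnsPointer, PlainDouble>() holds while only OwnsPointer needs a destructor, so a name built from size and alignment alone would still let one type's __gc be applied to the other.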

The static variable address approach is chosen because:

  • Guaranteed unique per type by C++ standard (each template instantiation has distinct static members)
  • Does not require RTTI
  • Minimal code change
  • One-time string initialization per type (the address does not change while the program runs)
  • Works for all types, not just those with different sizes

Scope

Other functions in usertype_traits.hpp also derive names from demangle<T>():

  • metatable() and gc_table() — Used with new_usertype<T>() for registering classes. These are not affected because named class types (e.g., MyClass) have distinct demangled names. The bug only manifests with lambdas, which have colliding names on GCC.

  • user_metatable() — Used for call syntax detection. Does not store a __gc destructor, so even if a collision occurred, it would not cause memory corruption.

The proposed fix only modifies user_gc_metatable() because it is the only function where lambda type collisions occur in practice.
