Alrighty folks, buckle up and get ready to learn some x86 assembly and V8 internals. I want to tell you about how I spent yesterday beating my head against a problem we ran into while upgrading the version of Chromium used in Electron, and finally figured out the fix.

This was originally an internal essay published to the Electron Maintainers group. For context, Electron continually tracks the latest version of Chromium, generally landing an update into the main branch once every week or two. Some upgrades are more difficult than others.

Let’s set the scene. Chromium is probably the largest and most complex C++ codebase in the world. Even not counting any of its 300+ dependencies, Chromium itself contains over 11 megalines of C and C++. So it’s not terribly surprising then that its build configuration is similarly complex. Chromium’s build system uses a version of Clang and LLVM taken from tip-of-tree, the latest and greatest in compiler technology. It depends on features in Clang that aren’t yet in released versions of Clang — the latest Chromium code will fail to build on stable versions of Clang. And if that wasn’t cutting-edge enough, Chromium also declines to use the standard C++ runtime library, opting instead to use a fairly new reimplementation of the C++ standard library called, confusingly, libc++. The reasons for this are varied and mostly boring. But, having tried it, we also know that Chromium does not build without this fancy custom C++ standard library.

So, okay, Chromium’s build process is complicated. But that’s fine, Chromium comes with instructions for how to build it, in the form of GN files. We follow those instructions, we use the fancy compiler and the custom standard library when we build Electron. It works fine. What’s the problem?

Well, let me show you. In the latest Chromium upgrade PR in Electron, we started seeing this error:

# Fatal error in ../../v8/src/api/api-inl.h, line 131
# Debug check failed: that == nullptr || v8::internal::Object( *reinterpret_cast<const v8::internal::Address*>(that)).IsJSArrayBuffer().

This error was showing up in the tests we have that check that native node modules work. Your alarm bells might be ringing already: are the native node modules also built with the fancy compiler and the custom standard library and all that stuff? Dear reader, they are not. But, uh, mostly it works fine anyway? This test used to pass, in any case. So what’s going on here?

Let’s look at the test that’s failing. It’s a nan test, so it’s under third_party/nan/test. It has two parts, a JavaScript part and a C++ part. The relevant JavaScript part looks like this:

var zeros = new Uint8Array(5)
t.same(bindings.ReadU8(zeros), [0,0,0,0,0])

And the corresponding C++ part that is the code behind bindings.ReadU8 in the above is:

NAN_METHOD(ReadU8) {
  TypedArrayContents<uint8_t> data(info[0]);
  // … this function has other things in it, but as we’ll see in a
  // moment, the above line is where the failure happens.
}

So let’s look at the stack trace. I ran the failing test under lldb to get a more detailed stack trace, and here’s what I saw.

* frame #0: Abort() at platform-posix.cc:502:5 [opt]
frame #1: V8_Fatal() at logging.cc:167:3 [opt]
frame #2: DefaultDcheckHandler() at logging.cc:57:3 [opt]
frame #3: GetBackingStore() [inlined] OpenHandle at api-inl.h:131:1 [opt]
frame #4: GetBackingStore() at api.cc:3784 [opt]
frame #5: Nan::TypedArrayContents::TypedArrayContents() at nan_typedarray_contents.h:45:41 [opt]
frame #6: ReadU8() [inlined] at nan_typedarray_contents.h:20:31 [opt]
frame #7: ReadU8() at typedarrays.cpp:17 [opt]
[…]

Alrighty, so we’re crashing in the bindings.ReadU8() call. Something in the constructor of Nan::TypedArrayContents is calling v8::ArrayBuffer::GetBackingStore(), and something in there is hitting a DCHECK. Specifically, the DCHECK that’s failing is v8::internal::Object(*reinterpret_cast<const v8::internal::Address*>(that)).IsJSArrayBuffer(), which is basically checking that the thing passed to it is actually an ArrayBuffer.

It is, of course, impossible for this DCHECK to be firing.

Here’s the calling code, in nan_typedarray_contents.h, slightly abridged:

if (from->IsArrayBufferView()) {
  v8::Local<v8::ArrayBufferView> array =
      v8::Local<v8::ArrayBufferView>::Cast(from);
  v8::Local<v8::ArrayBuffer> buffer = array->Buffer();
  data = static_cast<char*>(buffer->GetBackingStore()->Data());

i.e. we (1) check that from is an ArrayBufferView, (2) get the underlying ArrayBuffer by calling array->Buffer(), then (3) call buffer->GetBackingStore() on that object. It is step (3) here which triggers the DCHECK: GetBackingStore() thinks this is not an ArrayBuffer. It absolutely, definitely is, though. I verified this in a few different ways:

  • I printed out the result of calling buffer->IsArrayBuffer(). It’s true.
  • I compiled V8 with its internal debug printing code enabled, and asked the debugger to print the value of buffer. It replied that buffer was a JSArrayBuffer.
  • I changed the test function in C++ to return its first argument, then printed it out from JS, and saw that it was a Uint8Array.

So why is this DCHECK failing? How could it possibly be failing? The thing is definitely a JSArrayBuffer, so why is IsJSArrayBuffer returning false???

At this time I paged Samuel Attard who kindly Zoomed with me while I sobbed into the debugger. And by “debugger”, I of course mean printf. I added two printf calls, one in nan_typedarray_contents.h right before the failing call to GetBackingStore(), and one inside GetBackingStore() itself. To explain what I saw I have to take a ̶q̶u̶i̶c̶k̶ (lol sorry) detour into some V8 internals. It’ll only hurt a bit, I promise.

When interacting with V8 from C++, it’s very common to want to have a handle to an object, kind of like a pointer. v8::Local is what represents this kind of handle in C++. So what exactly is v8::Local? Is it a pointer? To what does it point? Let’s take a look at the source code and find out:

// in v8/include/v8.h
template <class T>
class Local {
 public:
  V8_INLINE T* operator->() const { return val_; }
  // … etc.
 private:
  T* val_;
};

That’s all! Local<T> wraps a pointer to T. It overrides operator-> so that if you have a Local<T> called x, then x->Foo() calls T::Foo() on the underlying pointer. In our case, T is ArrayBuffer. So what does ArrayBuffer::GetBackingStore do? (src)

std::shared_ptr<v8::BackingStore> v8::ArrayBuffer::GetBackingStore() {
  i::Handle<i::JSArrayBuffer> self = Utils::OpenHandle(this);
  std::shared_ptr<i::BackingStore> backing_store =
      self->GetBackingStore();
  // …
}

Interesting! The only thing it does with this is to call Utils::OpenHandle on it. Simplifying a little macro expansion, this is what Utils::OpenHandle looks like:

v8::internal::Handle<v8::internal::JSArrayBuffer>
Utils::OpenHandle(const v8::ArrayBuffer* that) {
  DCHECK(that == nullptr ||
         v8::internal::Object(
             *reinterpret_cast<const v8::internal::Address*>(that))
             .IsJSArrayBuffer());
  return v8::internal::Handle<v8::internal::JSArrayBuffer>(
      reinterpret_cast<v8::internal::Address*>(
          const_cast<v8::ArrayBuffer*>(that)));
}

Whoa! That’s our DCHECK that’s failing! And … what’s this reinterpret_cast?? It looks like it takes our v8::ArrayBuffer*, which is this from ArrayBuffer::GetBackingStore(), and interprets it as a pointer to a v8::internal::Address. What’s a v8::internal::Address?

typedef uintptr_t Address;

uintptr_t is an unsigned integer type large enough to hold a pointer. So Address is … a number?? That’s a bit surprising! So, to recap, here’s what’s going on:

  1. Local<ArrayBuffer> is roughly the same as ArrayBuffer*
  2. ArrayBuffer::GetBackingStore takes this and passes it to Utils::OpenHandle
  3. Utils::OpenHandle reinterprets this, which is of type ArrayBuffer*, as Address*
  4. Address is the same as uintptr_t
  5. So Utils::OpenHandle is interpreting this as a uintptr_t*

So basically ArrayBuffer is the same as uintptr_t??? Local<ArrayBuffer> is really just a pointer to an integer in disguise!
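To make the recap concrete, here is a minimal sketch of the same trick outside V8. The types below are stand-ins I made up for illustration (they borrow V8’s names but are not the real V8 definitions), and MakeLocal, ReadSlot and Example are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the V8 types discussed above; NOT the real V8 definitions.
using Address = std::uintptr_t;

// Opaque public type: never instantiated, only ever pointed to.
class ArrayBuffer;

// A Local-like handle: a thin wrapper around a T*.
template <class T>
class Local {
 public:
  explicit Local(T* val) : val_(val) {}
  T* operator->() const { return val_; }
  T* operator*() const { return val_; }

 private:
  T* val_;
};

// Hypothetical helper: build a handle that really points at an Address slot.
inline Local<ArrayBuffer> MakeLocal(Address* slot) {
  return Local<ArrayBuffer>(reinterpret_cast<ArrayBuffer*>(slot));
}

// What the OpenHandle check effectively does: treat the ArrayBuffer* as an
// Address* and read the integer stored there.
inline Address ReadSlot(const ArrayBuffer* that) {
  return *reinterpret_cast<const Address*>(that);
}

// Usage: the "object" behind the handle is just an integer in memory.
inline Address Example() {
  Address slot = 0x1235;  // illustrative value, not a real tagged address
  Local<ArrayBuffer> buffer = MakeLocal(&slot);
  assert(ReadSlot(*buffer) == slot);
  return ReadSlot(*buffer);
}
```

Nothing of type ArrayBuffer ever exists here. The class name only serves to give the pointer a distinct static type.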

“Wait, it’s all just numbers?” / “Always has been” (astronaut meme)

Okay, so what does Utils::OpenHandle do next? It dereferences the uintptr_t*, and passes the resulting integer to v8::internal::Object(), then calls IsJSArrayBuffer on the Object.

We’re uh, pretty deep in the weeds here, so I wanted to take a moment to remember what drove us to this clearly insane point. We have some code that calls an ArrayBuffer method on an ArrayBuffer, but that method insists it’s actually not being called on an ArrayBuffer. It uses IsJSArrayBuffer to check that. So somehow IsJSArrayBuffer is returning the wrong thing. Okay. Probably take a moment to breathe and get a coffee or something. I’ll wait.

☕️️️ ☕️ ☕️

Alright, feeling refreshed? Here we go again. Let’s look at the definition of IsJSArrayBuffer. It’s actually defined by some macros, but I’ll expand them for convenience here:

bool Object::IsJSArrayBuffer() const {
  return IsHeapObject() &&
         HeapObject::cast(*this).IsJSArrayBuffer();
}

Using a debugger, I stepped very carefully through this code, instruction by instruction, and discovered that we never even get to HeapObject::IsJSArrayBuffer. IsHeapObject() returns false! What does Object::IsHeapObject do? (src 1, src 2)

// Returns true if this tagged value is a strong pointer to a HeapObject.
constexpr inline bool IsHeapObject() const { return IsStrong(); }
constexpr inline bool IsStrong() const {
  CONSTEXPR_DCHECK(kCanBeWeak ||
                   (!IsSmi() == HAS_STRONG_HEAP_OBJECT_TAG(ptr_)));
  return kCanBeWeak ? HAS_STRONG_HEAP_OBJECT_TAG(ptr_) : !IsSmi();
}

In this particular case kCanBeWeak is false (proving this is left as an exercise to the reader), so this is the same as calling !IsSmi(). I’m leaving aside for the moment the details of exactly what a “heap object” or a “tagged value” or a “strong pointer” are, because frankly, I have no idea. I’m just following the code where it leads. So let’s take a look at IsSmi(): (src 1, src 2, src 3)

// Returns true if this tagged value is a Smi.
constexpr bool IsSmi() const { return HAS_SMI_TAG(ptr_); }
// ...
#define HAS_SMI_TAG(value) \
  ((static_cast<i::Tagged_t>(value) & ::i::kSmiTagMask) == ::i::kSmiTag)
// ...
// Tag information for Smi.
const int kSmiTag = 0;
const int kSmiTagSize = 1;
const intptr_t kSmiTagMask = (1 << kSmiTagSize) - 1;

This is some bit twiddling fanciness that boils down to: if the lowest bit of ptr_ is 0, then this is a “smi”. Otherwise it’s not.

We know that IsHeapObject is returning false for us. Which means that !IsSmi() is false. Which means that IsSmi() is returning true!!

At this point I googled “v8 smi” and it turns out “smi” in V8’s parlance means “small integer”, i.e. a JS number that’s an integer within a certain range. So, somehow, this ArrayBuffer object, which is definitely an ArrayBuffer, is being seen to be a JS number.

ngl, I was pretty completely flummoxed by now. But I have this stubborn belief that everything that happens on a computer is, ultimately, possible to understand. So I kept going.

Okay, fine. So the lowest bit of ptr_ is 0, which causes IsSmi() to return true. Where does ptr_ come from?

Remember this from Utils::OpenHandle?

DCHECK(that == nullptr ||
       v8::internal::Object(
           *reinterpret_cast<const v8::internal::Address*>(that))
           .IsJSArrayBuffer());

Let’s look at the definition of v8::internal::Object():

class Object
    : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
 public:
  explicit constexpr Object(Address ptr) : TaggedImpl(ptr) {}

That passes its argument (promisingly called ptr!) to TaggedImpl(): (src)

template <HeapObjectReferenceType kRefType, typename StorageType>
class TaggedImpl {
 public:
  explicit constexpr TaggedImpl(StorageType ptr) : ptr_(ptr) {}

There it is! So ptr_ is the v8::internal::Address, aka uintptr_t, that was passed to v8::internal::Object!

Now we have enough background to understand what it was I printed out with those two printf calls, and why the answer made me even more confused.

Here’s the first printf call I added, in nan_typedarray_contents.h, before we call out to GetBackingStore():

 v8::Local<v8::ArrayBuffer> buffer = array->Buffer();
+v8::ArrayBuffer* addr = *buffer;
+uintptr_t* addr_as_uintptr = reinterpret_cast<uintptr_t*>(addr);
+
+fprintf(stderr, "addr of buffer = %08lx\n", *addr_as_uintptr);
 data = static_cast<char*>(buffer->GetBackingStore()->Data())
     + byte_offset;

The effect of this is to print out the value of the v8::internal::Address that’s underlying buffer.

The second printf call was inside v8::ArrayBuffer::GetBackingStore. For this one I just copied the code out of the failing DCHECK in Utils::OpenHandle:

 std::shared_ptr<v8::BackingStore>
v8::ArrayBuffer::GetBackingStore() {
+ fprintf(stderr, "addr inside v8::AB::GBS = %lx\n",
+ *reinterpret_cast<const v8::internal::Address*>(this));
i::Handle<i::JSArrayBuffer> self = Utils::OpenHandle(this);

The effect of this is also to print out the value of the v8::internal::Address that’s underlying the ArrayBuffer.

These two printf calls should print the same value.

Here’s what they printed when I recompiled and ran the test:

addr of buffer = 4f084b8c9d
addr inside v8::AB::GBS = 4f00000070

I logged off.

The next day, after having regained my composure, and having reaffirmed my belief that everything on a computer is ultimately graspable, I decided that C++ must be lying to me. You know what never lies? x86 assembly. It was time to peel away the illusion of pointers and objects and methods, and get down to the movs and jmps. In mov, veritas.

Also, as a side note, we now know why IsSmi was returning true: the low bit of 0x4f00000070 is 0, so it passes the HAS_SMI_TAG check. It’s just that that’s … not the Address of our object.
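Replaying the tag check on the two printed values shows the difference directly. This is a simplified sketch: the constants mirror the V8 snippet quoted earlier, but HasSmiTag and IsHeapObject here are my own free-function stand-ins, and I assume a plain 64-bit integer rather than V8’s Tagged_t cast:

```cpp
#include <cstdint>

// Constants mirroring the V8 snippet quoted earlier.
constexpr std::uint64_t kSmiTag = 0;
constexpr int kSmiTagSize = 1;
constexpr std::uint64_t kSmiTagMask = (std::uint64_t{1} << kSmiTagSize) - 1;

// Simplified stand-in for HAS_SMI_TAG: true when the lowest bit is 0.
constexpr bool HasSmiTag(std::uint64_t value) {
  return (value & kSmiTagMask) == kSmiTag;
}

// With kCanBeWeak false, IsHeapObject() boils down to !IsSmi().
constexpr bool IsHeapObject(std::uint64_t value) { return !HasSmiTag(value); }

// The two values printed by the fprintf calls.
constexpr std::uint64_t kPrintedInNan = 0x4f084b8c9d;  // ends in ...1101
constexpr std::uint64_t kPrintedInV8 = 0x4f00000070;   // ends in ...0000

static_assert(IsHeapObject(kPrintedInNan), "low bit 1: looks like a heap object");
static_assert(HasSmiTag(kPrintedInV8), "low bit 0: looks like a Smi");
static_assert(!IsHeapObject(kPrintedInV8), "so the DCHECK would fail on it");
```

Both assertions hold at compile time, which matches what the debugger showed: the value seen on the nan side passes the check and the value seen inside V8 does not.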

I loaded up the test in lldb:

$ (cd ../third_party/nan/test/js; lldb ~/work/electron/src/out/Testing/Electron.app/Contents/MacOS/Electron -- typedarrays-test.js)
(lldb) ▊

I set a breakpoint immediately before the call to GetBackingStore():

(lldb) b nan_typedarray_contents.h:39

and ran the test:

(lldb) r

It stopped at my breakpoint, and I typed the command to lift the veil:

(lldb) disassemble
typedarrays.node`Nan::TypedArrayContents<unsigned char>::TypedArrayContents:
[...]
0x11b7af8c8 <+104>: movq 0x739(%rip), %rax ; (void *): __stderrp
0x11b7af8cf <+111>: movq (%rax), %rdi
0x11b7af8d2 <+114>: movq (%rbx), %rdx
0x11b7af8d5 <+117>: leaq 0x641(%rip), %rsi ; "addr of buffer = %08lx\n"
0x11b7af8dc <+124>: xorl %eax, %eax
0x11b7af8de <+126>: callq 0x11b7afd62 ; symbol stub for: fprintf
0x11b7af8e3 <+131>: leaq -0x38(%rbp), %rdi
-> 0x11b7af8e7 <+135>: movq %rbx, %rsi
0x11b7af8ea <+138>: callq 0x11b7afca2 ; symbol stub for: v8::ArrayBuffer::GetBackingStore()
[...]

Now we’re talking. Three quick things about x86 assembly, if you’ve not seen it much before. First, things prefixed with % are registers, so %rip, %rax, %rdi etc. are all registers, and each holds a 64-bit integer. Second, in this (AT&T) syntax, most instructions put their result in the second operand. So for example, movq %rbx, %rsi copies the value of the register %rbx into the register %rsi. Third, (%rax) means “the value in memory at the address contained in %rax”, i.e. it’s a dereference, like *foo in C++.
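If it helps, the two mov instructions that matter most in that listing can be modelled in C++ roughly like this. It’s only a mental model: the Registers struct and the Load, Simulate and Example helpers are made up for illustration, and the slot value is arbitrary:

```cpp
#include <cassert>
#include <cstdint>

// A toy register file holding just the registers we care about.
struct Registers {
  std::uintptr_t rbx = 0;
  std::uintptr_t rdx = 0;
  std::uintptr_t rsi = 0;
};

// Models a parenthesised operand such as (%rbx): read memory at that address.
inline std::uintptr_t Load(std::uintptr_t address) {
  return *reinterpret_cast<const std::uintptr_t*>(address);
}

inline Registers Simulate(const std::uintptr_t* slot) {
  Registers r;
  r.rbx = reinterpret_cast<std::uintptr_t>(slot);  // rbx holds an address
  r.rdx = Load(r.rbx);  // movq (%rbx), %rdx : dereference, result in rdx
  r.rsi = r.rbx;        // movq %rbx, %rsi   : plain copy, result in rsi
  return r;
}

inline void Example() {
  std::uintptr_t slot = 0x1235;  // arbitrary illustrative value
  Registers r = Simulate(&slot);
  assert(r.rdx == 0x1235);  // the value stored in memory
  assert(r.rsi == r.rbx);   // the address itself, copied
}
```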

Looking at the above disassembly, we can guess a couple of things:

  • %rbx contains the uintptr_t* that we called addr_as_uintptr. This isn’t the v8::internal::Address itself, it’s a pointer to a bit of memory that holds that address. movq (%rbx), %rdx fetches the Address out of RAM and puts it in %rdx. We guess this because it happens before callq 0x11b7afd62, which lldb has helpfully annotated as the symbol stub for fprintf, and because %rbx is also referenced before the call to v8::ArrayBuffer::GetBackingStore.
  • This code is passing this to v8::ArrayBuffer::GetBackingStore in the register %rsi. We guess this because right before the callq instruction that jumps to the code for v8::ArrayBuffer::GetBackingStore, we see movq %rbx, %rsi, which copies the value of %rbx into %rsi. So the calling code is “setting up” for the GetBackingStore() to run by putting the value this in the right place.

This might sound weird because this isn’t something you pass to a function in C++! But that information still has to get to the function somehow. Under the calling convention used on x86-64 macOS and Linux (the System V AMD64 ABI), it’s passed as if it were the first argument to the function, and the first integer or pointer argument goes, also by convention, in the %rdi register. (But wait, we’re seeing this passed through %rsi…? Our spidey senses are tingling now.)
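The “hidden first argument” idea is visible from within C++ itself, without looking at registers. In the sketch below, Counter and the helper functions are hypothetical names of my own. std::mem_fn turns a member function into something you call with the object pointer as an ordinary first argument:

```cpp
#include <cassert>
#include <functional>

struct Counter {
  int value;
  int Get() const { return value; }  // implicitly receives `this`
};

// The same function with the hidden parameter spelled out.
inline int CounterGet(const Counter* self) { return self->value; }

// std::mem_fn exposes the member function as a callable whose first
// argument is the object pointer.
inline int CallThroughMemberPointer(const Counter* counter) {
  return std::mem_fn(&Counter::Get)(counter);
}

inline void Example() {
  Counter c{42};
  assert(c.Get() == 42);
  assert(CounterGet(&c) == 42);
  assert(CallThroughMemberPointer(&c) == 42);
}
```

Which register that first argument lands in is decided by the ABI, not by anything you can see in the source, and that is exactly where caller and callee can disagree.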

We can confirm that %rbx points to the same address that was printed out by our debugging printf call by asking lldb what’s in all the registers:

(lldb) register read
General Purpose Registers:
rax = 0x000000000000001c
rbx = 0x000000011b8713d0
rcx = 0xecc0c702c36f00e1
rdx = 0x0000000000000000
rdi = 0x00007ffeefbfdf98
rsi = 0x00000000000120a8
[...]

rbx contains the value 0x000000011b8713d0, which isn’t what we expected (0x4f084b8c9d), but that’s because it’s not the uintptr_t itself, it’s a pointer to that (uintptr_t*). Let’s dereference it:

(lldb) p (void*)*(uintptr_t*)0x000000011b8713d0
(void *) $0 = 0x0000004f084b8c9d

Ah, there we go. (The (void*) is there to get lldb to print out the value in hex.)

Alrighty, so %rbx contains the address of our address, and it’s going to get put into %rsi and then we’re going to jump into v8::ArrayBuffer::GetBackingStore. Let’s see what happens when we do, stepping one instruction at a time with si:

(lldb) si

After a detour through libdyld.dylib to resolve the function address, we eventually end up in v8::ArrayBuffer::GetBackingStore, where we run disassemble again:

(lldb) disassemble
Electron Framework`::GetBackingStore():
[...]
0x101cf725a <+10>: movq %rdi, %r14
0x101cf725d <+13>: movq 0x9bacc04(%rip), %rax ; (void *): __stderrp
0x101cf7264 <+20>: movq (%rax), %rdi
0x101cf7267 <+23>: movq (%r14), %rdx
0x101cf726a <+26>: leaq 0x897e8fd(%rip), %rsi ; "v8::AB::GBS %lx\n"
0x101cf7271 <+33>: xorl %eax, %eax
-> 0x101cf7273 <+35>: callq 0x10a4c6ed6 ; symbol stub for: fprintf

… NOW WAIT A DANG MINUTE. The Nan::TypedArrayContents code put this into %rsi. But this code doesn’t read %rsi at all! In fact it overwrites %rsi at <+26>! This code is printing the value of %rdi!

No wonder it’s printing different things! It’s because it’s printing different things! 😬

How could this happen? When I write this in ArrayBuffer::GetBackingStore(), the compiler generates code that reads the value of the register %rdi, but the calling code in Nan::TypedArrayContents thinks it’s supposed to put this into %rsi. What’s going on here???

This would be a good moment to recall that Nan::TypedArrayContents is part of the native module we’re building, and is built by the system clang++ compiler, using the GNU libstdc++ standard library, whereas ArrayBuffer::GetBackingStore is built by Chromium’s clang++ compiler, using the LLVM libc++ standard library.

But… this used to work, right? This only started failing when we rolled chromium last week! What happened? Well, let’s check the logs of the roll which broke it (which Samuel Attard kindly bisected for me): it broke when we rolled chromium from 1f252b391..90.0.4415.0.

… uh okay, there are over 5,000 commits in that range. What if we just look for changes in the //build directory? That’s where compiler configuration is mostly stored, so hopefully, this looking like a compiler configuration thing, what broke it is in there somewhere.

And, ding ding ding! Here we are, look at this!

3f200c0d Roll src/buildtools/third_party/libc++/trunk/ d9040c75c..69897abe2 (1149 commits) by Reid Kleckner · 8 weeks ago

Holy heck! This Chromium roll included an update of the bundled libc++ library that included over one thousand commits to libc++! That’s over a year of work in libc++. No wonder something broke. In fact, things broke in Chromium too, because you can see in the changelog that they reverted this commit after it landed, then fixed some stuff and landed it again.

Alright, so Chromium updated libc++. That seems like it could be the cause of this breakage. But how do we confirm that? We could try reverting this libc++ update in Chromium and rebuilding, but we can’t resist progress forever. Instead, let’s try building the native module with the same compiler and standard library as Electron itself. If it’s a problem with mismatched compiler and/or standard library, then that should fix it, right?

A “proper” fix for this would mean teaching node-gyp how to use the Chromium clang++ and libc++, but I have no idea how node-gyp works so I’m just going to hack it. I know that make respects environment variables called CXX and CXXFLAGS to set the path to the C++ compiler executable and extra flags to pass to it, so I’ll use those. I want to set it to use the clang++ compiler that’s used to build Electron, and all the same -fwhatever and -DWHATEVER flags that GN passes to the compiler. After a little trial and error, and copy-pasting things out of out/Testing/obj/electron/electron_lib.ninja, I ended up with something like this:

$ export CXX=".../src/third_party/llvm-build/Release+Asserts/bin/clang++"
$ export CXXFLAGS="-DV8_DEPRECATION_WARNINGS -DDCHECK_ALWAYS_ON=1 -D_LIBCPP_HAS_NO_ALIGNED_ALLOCATION -DCR_XCODE_VERSION=1240 -DCR_CLANG_REVISION=\"llvmorg-13-init-3462-gfe5c2c3c-2\" -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -D_LIBCPP_ABI_UNSTABLE -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCXXABI_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCPP_ENABLE_NODISCARD -D_LIBCPP_HAS_NO_VENDOR_AVAILABILITY_ANNOTATIONS -D_LIBCPP_DEBUG=0 -DCR_LIBCXX_REVISION=8fa87946779682841e21e2da977eccfb6cb3bded -D__ASSERT_MACROS_DEFINE_VERSIONS_WITHOUT_UNDERSCORES=0 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DV8_USE_EXTERNAL_STARTUP_DATA -DNODE_WANT_INTERNALS=1 -DELECTRON_PRODUCT_NAME=\"Electron\" -DELECTRON_PROJECT_NAME=\"electron\" -DENABLE_IPC_FUZZER -DWEBP_EXTERN=extern -DUSE_EGL -D_WTL_NO_AUTOMATIC_NAMESPACE -DTOOLKIT_VIEWS=1 -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_ENABLE_TRACING=1 -DU_ENABLE_RESOURCE_TRACING=0 -DU_STATIC_IMPLEMENTATION -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DSK_FAVOR_WUFFS_V_0_3_OVER_V_0_2 -DSK_CODEC_DECODES_PNG -DSK_CODEC_DECODES_WEBP -DSK_ENCODE_PNG -DSK_ENCODE_WEBP -DSK_USER_CONFIG_HEADER=\"../../skia/config/SkUserConfig.h\" -DSK_GL -DSK_CODEC_DECODES_JPEG -DSK_ENCODE_JPEG -DSK_HAS_WUFFS_LIBRARY -DSK_SUPPORT_GPU=1 -DSK_GPU_WORKAROUNDS_HEADER=\"gpu/config/gpu_driver_bug_workaround_autogen.h\" -DSK_BUILD_FOR_MAC -DSK_METAL -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DHAVE_PTHREAD -DWEBRTC_ENABLE_AVX2 -DWEBRTC_NON_STATIC_TRACE_EVENT_HANDLERS=0 -DWEBRTC_CHROMIUM_BUILD -DWEBRTC_POSIX -DWEBRTC_MAC -DABSL_ALLOCATOR_NOTHROW=1 -DWEBRTC_USE_BUILTIN_ISAC_FIX=0 -DWEBRTC_USE_BUILTIN_ISAC_FLOAT=1 -DWEBRTC_HAVE_SCTP -DNO_MAIN_THREAD_WRAPPING -DLEVELDB_PLATFORM_CHROMIUM=1 -DLEVELDB_PLATFORM_CHROMIUM=1 -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DV8_ENABLE_CHECKS 
-DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DHAVE_INSPECTOR=1 -DHAVE_OPENSSL=1 -DOPENSSL_NO_SSL_TRACE=1 -DNODE_HAVE_I18N_SUPPORT=1 -DNODE_USE_V8_PLATFORM=0 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_DARWIN_USE_64_BIT_INODE=1 -DCRASHPAD_ZLIB_SOURCE_EXTERNAL -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing -fstack-protector -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 -fcomplete-member-pointers -arch x86_64 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -Xclang -fdebug-compilation-dir -Xclang . -no-canonical-prefixes -Wall -Werror -Wextra -Wimplicit-fallthrough -Wunreachable-code -Wthread-safety -Wextra-semi -Wunguarded-availability -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-unneeded-internal-declaration -Wno-undefined-var-template -Wno-psabi -Wno-ignored-pragma-optimize -Wno-implicit-int-float-conversion -Wno-final-dtor-non-final-class -Wno-builtin-assume-aligned-alignment -Wno-deprecated-copy -Wno-non-c-typedef-for-linkage -Wno-max-tokens -O2 -fno-omit-frame-pointer -gdwarf-4 -g1 -isysroot sdk/xcode_links/MacOSX11.1.sdk -mmacosx-version-min=10.11.0 -ftrivial-auto-var-init=pattern -fvisibility=hidden -Xclang -add-plugin -Xclang find-bad-constructs -Xclang -plugin-arg-find-bad-constructs -Xclang checked-ptr-as-trivial-member -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -DPROTOBUF_ALLOW_DEPRECATED=1 -Wno-shorten-64-to-32 -Wno-microsoft-include"
$ make -C ~/work/electron/src/third_party/nan/test/build typedarrays

Holy robot vomit. This failed at first because it turns on the find-bad-constructs clang plugin which doesn’t like some things in nan, so I removed those flags and a couple of others, and eventually it built a typedarrays.node module. I ran the test again and…

… it worked!! THANK FUCK.

Okay, so which of those ten million flags is actually responsible for fixing this problem? I tried building with just the Chromium clang++ and none of the fancy flags, and the error came back, so one of those flags must be important. I narrowed it down by deleting half the flags, rebuilding and rerunning until I zeroed in on the flag responsible:

-D_LIBCPP_ABI_UNSTABLE

Building with Chromium’s clang++ and this flag present is enough to produce a typedarrays.node that functions at least well enough to pass nan’s tests. Phew! So uh, how do we turn this into something we can commit to the roller branch?

Let’s figure out what’s responsible for setting this flag. Searching on source.chromium.org leads us to this line in //build/config/c++/BUILD.gn:

if (libcxx_abi_unstable) {
  defines += [ "_LIBCPP_ABI_UNSTABLE" ]
}

Well, interesting. Can we convince libcxx_abi_unstable to be false? That would mean Electron would be building without this flag, so maybe a node module also built without this flag would work?

libcxx_abi_unstable turns out to be a build argument, so we can try setting libcxx_abi_unstable = false in args.gn to build Electron without the _LIBCPP_ABI_UNSTABLE flag.

And there you have it:

--- a/build/args/all.gn
+++ b/build/args/all.gn
@@ -19,4 +19,7 @@
dawn_enable_vulkan_validation_layers = false
+# This breaks native node modules
+libcxx_abi_unstable = false
+
is_cfi = false

This fixes the problem (and exposes a different problem, but this essay is already about ten times too long, so I’m not going to go into that).

🎉

Ultimately, it is asking for trouble to be building dynamically-linked native node modules that use C++ features across the dynamic link boundary with a different compiler. The C++ ABI is huge and complicated, and as we have discovered at great length today, varies between compilers, and even based on what #defines are used.

There are two ways to resolve this:

  1. Build C++ native modules using the same clang++ and libc++ as Electron itself, or
  2. Don’t call C++ things across the dynamic link boundary.

Honestly, I think (2) is the best way forward. N-API in Node.js provides exactly this: a stable C (not C++) ABI for interacting with V8 and Node without relying on the enormous, varied and moving target that is the C++ ABI. Modules built with N-API would not encounter this kind of problem, because they stick to the C ABI, which is much smaller, simpler and more stable. This is an ecosystem problem, though, and so it will take a long time to shift people away from C++-based systems like Nan and towards N-API.
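To illustrate what “stick to the C ABI” means in practice, here is a small sketch of a C-style boundary. Every name in it (demo_buffer and friends) is hypothetical and this is not the N-API surface. The point is only that pointers to an opaque struct and plain integers cross the boundary, while the std::vector stays on one side:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Public surface: C linkage, opaque handle, no C++ types in any signature.
extern "C" {
typedef struct demo_buffer demo_buffer;
demo_buffer* demo_buffer_create(std::size_t length);
const std::uint8_t* demo_buffer_data(const demo_buffer* buffer);
std::size_t demo_buffer_length(const demo_buffer* buffer);
void demo_buffer_destroy(demo_buffer* buffer);
}

// Private implementation: free to use any C++ standard library it likes.
struct demo_buffer {
  std::vector<std::uint8_t> bytes;
};

extern "C" demo_buffer* demo_buffer_create(std::size_t length) {
  try {
    return new demo_buffer{std::vector<std::uint8_t>(length, 0)};
  } catch (...) {
    return nullptr;  // exceptions must not cross a C boundary
  }
}

extern "C" const std::uint8_t* demo_buffer_data(const demo_buffer* buffer) {
  return buffer->bytes.data();
}

extern "C" std::size_t demo_buffer_length(const demo_buffer* buffer) {
  return buffer->bytes.size();
}

extern "C" void demo_buffer_destroy(demo_buffer* buffer) { delete buffer; }

// Caller side: only raw pointers and integers are exchanged.
inline bool CallerSeesFiveZeros() {
  demo_buffer* buffer = demo_buffer_create(5);
  assert(buffer != nullptr);
  bool ok = demo_buffer_length(buffer) == 5;
  for (std::size_t i = 0; ok && i < 5; ++i) {
    ok = demo_buffer_data(buffer)[i] == 0;
  }
  demo_buffer_destroy(buffer);
  return ok;
}
```

Compare that with GetBackingStore(), which returns a std::shared_ptr by value: there, both sides have to agree on how that particular class is laid out and passed.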

(1) is also possible, though. I think we can provide some support in tools like electron-rebuild to download & configure the Chromium compiler toolchain, instead of using the system one. It will significantly complicate electron-rebuild, but it will also probably eliminate most issues like this one.

I’m still curious though. I want to know what the heck _LIBCPP_ABI_UNSTABLE does, and why it causes a mismatch in what register this is passed in. As mentioned, this essay is already too long so I’m going to leave some tantalizing hints and let you follow up on them if you desire. _LIBCPP_ABI_UNSTABLE is a proxy for a whole bunch of other #defines. I tried building the native module without _LIBCPP_ABI_UNSTABLE but with each of those individual flags in turn, and it turned out _LIBCPP_ABI_ENABLE_SHARED_PTR_TRIVIAL_ABI was enough to generate compatible code. That’s only referenced in one place in libc++, where it causes __attribute__((trivial_abi)) to be added to shared_ptr and weak_ptr. What is trivial_abi? Well, it’s a relatively new clang feature that involves %rsi and %rdi. Hmmmm!
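For anyone following the hint, here is one reading that fits the disassembly above; treat it as a sketch rather than a verified diagnosis. Under the Itanium C++ ABI, a class with a non-trivial copy constructor, move constructor or destructor is returned through a hidden pointer that the caller passes as the very first argument, ahead of this. On x86-64 that puts the return slot in %rdi and pushes this to %rsi. trivial_abi tells clang to treat the type as trivial for calls anyway, which would put this back in %rdi. The trait below is my own approximation of the “trivial for calls” rule, and it cannot see the attribute:

```cpp
#include <memory>
#include <type_traits>

// Approximation of "trivial for the purposes of calls": no non-trivial
// copy constructor, move constructor or destructor. (The real ABI rule has
// extra cases, and [[clang::trivial_abi]] overrides it.)
template <class T>
struct IsTrivialForCalls
    : std::integral_constant<
          bool, std::is_trivially_copy_constructible<T>::value &&
                    std::is_trivially_move_constructible<T>::value &&
                    std::is_trivially_destructible<T>::value> {};

// A hypothetical struct that just wraps a raw pointer.
struct PlainPointer {
  int* p;
};

// A plain struct can travel in registers.
static_assert(IsTrivialForCalls<PlainPointer>::value,
              "plain struct: no hidden return pointer needed");

// shared_ptr has a user-provided destructor, so without trivial_abi a
// function returning it by value gets a hidden return-slot argument.
static_assert(!IsTrivialForCalls<std::shared_ptr<int>>::value,
              "shared_ptr: returned through a hidden pointer by default");
```

If that reading is right, the leaq -0x38(%rbp), %rdi just before the call in the nan-side disassembly would be the caller setting up that hidden return slot.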

Thanks to Keeley Hammond for reviewing drafts of this article, and to Deepak Mohan for encouraging me to publish it as a blog post.

Nullius in verba.