A libc++ Odyssey
Alrighty folks, buckle up and get ready to learn some x86 assembly and V8 internals. I want to tell you about how I spent yesterday beating my head against a problem we ran into while upgrading the version of Chromium used in Electron, and finally figured out the fix.
This was originally an internal essay published to the Electron Maintainers group. For context, Electron continually tracks the latest version of Chromium, generally landing an update into the main branch once every week or two. Some upgrades are more difficult than others.
Let’s set the scene. Chromium is probably the largest and most complex C++ codebase in the world. Even without counting any of its 300+ dependencies, Chromium itself contains over 11 megalines of C and C++. So it’s not terribly surprising that its build configuration is similarly complex. Chromium’s build system uses a version of Clang and LLVM taken from tip-of-tree, the latest and greatest in compiler technology. It depends on features in Clang that aren’t yet in released versions of Clang, so the latest Chromium code will fail to build on stable versions of Clang. And if that wasn’t cutting-edge enough, Chromium also declines to use the system’s C++ standard library, opting instead to build its own copy of a fairly new reimplementation of the C++ standard library called, confusingly, libc++. The reasons for this are varied and mostly boring. But, having tried it, we also know that Chromium does not build without this fancy custom C++ standard library.
So, okay, Chromium’s build process is complicated. But that’s fine, Chromium comes with instructions for how to build it, in the form of GN files. We follow those instructions, we use the fancy compiler and the custom standard library when we build Electron. It works fine. What’s the problem?
Well, let me show you. In the latest Chromium upgrade PR in Electron, we started seeing this error:
# Fatal error in ../../v8/src/api/api-inl.h, line 131
# Debug check failed: that == nullptr || v8::internal::Object( *reinterpret_cast<const v8::internal::Address*>(that)).IsJSArrayBuffer().
This error was showing up in the tests we have that check that native node modules work. Your alarm bells might be ringing already: are the native node modules also built with the fancy compiler and the custom standard library and all that stuff? Dear reader, they are not. But, uh, mostly it works fine anyway? This test used to pass, in any case. So what’s going on here?
Let’s look at the test that’s failing. It’s a nan test with two parts: a JavaScript part and a C++ part. The relevant JavaScript part looks like this:
var zeros = new Uint8Array(5)
t.same(bindings.ReadU8(zeros), [0,0,0,0,0])
And here’s the corresponding C++ code behind bindings.ReadU8:
NAN_METHOD(ReadU8) {
  TypedArrayContents<uint8_t> data(info[0]);
  // … this function has other things in it, but as we’ll see in a
  // moment, the above line is where the failure happens.
}
So let’s look at the stack trace. I ran the failing test under lldb to get a more detailed stack trace, and here’s what I saw.
* frame #0: Abort() at platform-posix.cc:502:5 [opt]
frame #1: V8_Fatal() at logging.cc:167:3 [opt]
frame #2: DefaultDcheckHandler() at logging.cc:57:3 [opt]
frame #3: GetBackingStore() [inlined] OpenHandle at api-inl.h:131:1 [opt]
frame #4: GetBackingStore() at api.cc:3784 [opt]
frame #5: Nan::TypedArrayContents::TypedArrayContents() at nan_typedarray_contents.h:45:41 [opt]
frame #6: ReadU8() [inlined] at nan_typedarray_contents.h:20:31 [opt]
frame #7: ReadU8() at typedarrays.cpp:17 [opt]
[…]
Alrighty, so we’re crashing in the bindings.ReadU8() call. Something in the constructor of Nan::TypedArrayContents is calling v8::ArrayBuffer::GetBackingStore(), and something in there is hitting a DCHECK. Specifically, the DCHECK that’s failing is v8::internal::Object(*reinterpret_cast<const v8::internal::Address*>(that)).IsJSArrayBuffer(), which is basically checking that the thing passed to it is actually an ArrayBuffer.
It is, of course, impossible for this DCHECK to be firing.
Stage 1: Denial
Here’s the calling code, in nan_typedarray_contents.h, slightly abridged:
if (from->IsArrayBufferView()) {
  v8::Local<v8::ArrayBufferView> array =
      v8::Local<v8::ArrayBufferView>::Cast(from);
  v8::Local<v8::ArrayBuffer> buffer = array->Buffer();
  data = static_cast<char*>(buffer->GetBackingStore()->Data());
  // …
}
That is, we (1) check that from is an ArrayBufferView, (2) get the underlying ArrayBuffer by calling array->Buffer(), then (3) call buffer->GetBackingStore() on that object. It is step (3) here which triggers the DCHECK: GetBackingStore() thinks this is not an ArrayBuffer. It absolutely, definitely is, though. I verified this in a few different ways:
- I printed out the result of calling buffer->IsArrayBuffer(). It’s true.
- I compiled V8 with its internal debug printing code enabled, and asked the debugger to print the value of buffer. It replied that buffer was a JSArrayBuffer.
- I changed the test function in C++ to return its first argument, then printed it out from JS, and saw that it was a Uint8Array.
So why is this DCHECK failing? How could it possibly be failing? The thing is definitely a JSArrayBuffer, so why is IsJSArrayBuffer returning false inside of V8???
At this point I paged Samuel Attard, who kindly Zoomed with me while I sobbed into the debugger. And by “debugger”, I of course mean printf. I added two printf calls, one in nan_typedarray_contents.h right before the failing call to GetBackingStore(), and one inside GetBackingStore() itself. To explain what I saw, I have to take a q̶u̶i̶c̶k (lol sorry) detour into some V8 internals. It’ll only hurt a bit, I promise.
Integers All the Way Down
When interacting with V8 from C++, it’s very common to want to have a handle to an object, kind of like a pointer. v8::Local is what represents this kind of handle in C++. So what exactly is v8::Local? Is it a pointer? To what does it point? Let’s take a look at the source code and find out:
// in v8/include/v8.h
template <class T>
class Local {
 public:
  V8_INLINE T* operator->() const { return val_; }
  // … etc.
 private:
  T* val_;
};
That’s all! Local<T> wraps a pointer to T. It overloads operator-> so that if you have a Local<T> called x, then x->Foo() calls T::Foo() on the underlying pointer. In our case, T is ArrayBuffer. So what does ArrayBuffer::GetBackingStore do? (src)
std::shared_ptr<v8::BackingStore> v8::ArrayBuffer::GetBackingStore() {
i::Handle<i::JSArrayBuffer> self = Utils::OpenHandle(this);
std::shared_ptr<i::BackingStore> backing_store =
self->GetBackingStore();
// …
}
Interesting! The only thing it does with this is to call Utils::OpenHandle on it. Simplifying a little macro expansion, this is what Utils::OpenHandle looks like:
v8::internal::Handle<v8::internal::JSArrayBuffer>
Utils::OpenHandle(const v8::ArrayBuffer* that) {
DCHECK(that == nullptr ||
v8::internal::Object(
*reinterpret_cast<const v8::internal::Address*>(that))
.IsJSArrayBuffer());
return v8::internal::Handle<v8::internal::JSArrayBuffer>(
reinterpret_cast<v8::internal::Address*>(
const_cast<v8::ArrayBuffer*>(that)));
}
Whoa! That’s our DCHECK that’s failing! And … what’s this reinterpret_cast?? It looks like it takes our v8::ArrayBuffer*, which is this from ArrayBuffer::GetBackingStore(), and interprets it as a pointer to a v8::internal::Address. What’s a v8::internal::Address?
typedef uintptr_t Address;
uintptr_t is an unsigned integer type large enough to hold a pointer. So Address is … a number?? That’s a bit surprising! So, to recap, here’s what’s going on:
- Local<ArrayBuffer> is roughly the same as ArrayBuffer*
- ArrayBuffer::GetBackingStore takes this and passes it to Utils::OpenHandle
- Utils::OpenHandle reinterprets this, which is of type ArrayBuffer*, as Address*
- Address is the same as uintptr_t
- So Utils::OpenHandle is interpreting this as a uintptr_t*
So basically ArrayBuffer is the same as uintptr_t??? Local<ArrayBuffer> is really just a pointer to an integer in disguise!
Okay, so what does Utils::OpenHandle do next? It dereferences the uintptr_t*, passes the resulting integer to v8::internal::Object(), then calls IsJSArrayBuffer on the Object.
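Here’s a toy model of that chain that compiles outside V8. FakeArrayBuffer, MakeHandle and OpenHandleAddress are hypothetical names invented for this sketch, not V8 APIs; the only part that mirrors real code is the cast from Utils::OpenHandle, where a pointer to an API object is reinterpreted as a pointer to an Address and dereferenced.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: these are NOT V8's classes, just enough to show the
// "a handle is a pointer to an integer" trick.
using Address = std::uintptr_t;  // plays the role of v8::internal::Address

// Stand-in for v8::ArrayBuffer: an API type with no data of its own.
class FakeArrayBuffer {};

// Minimal Local<T>: it wraps a T*, and operator-> / operator* hand it back.
template <class T>
class Local {
 public:
  explicit Local(T* val) : val_(val) {}
  T* operator->() const { return val_; }
  T* operator*() const { return val_; }

 private:
  T* val_;
};

// Hypothetical helper: build a handle whose T* really points at a slot
// that holds an Address.
Local<FakeArrayBuffer> MakeHandle(Address* slot) {
  assert(slot != nullptr);
  return Local<FakeArrayBuffer>(reinterpret_cast<FakeArrayBuffer*>(slot));
}

// Same shape as the expression inside Utils::OpenHandle's DCHECK:
// reinterpret the API pointer as an Address* and read the integer behind it.
Address OpenHandleAddress(const FakeArrayBuffer* that) {
  return *reinterpret_cast<const Address*>(that);
}
```

With Address slot = 0x10001;, calling OpenHandleAddress(*MakeHandle(&slot)) returns 0x10001. The FakeArrayBuffer* never points at a FakeArrayBuffer at all, only at the slot holding the integer.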
We’re, uh, pretty deep in the weeds here, so I want to take a moment to recall what drove us to this clearly insane point. We have some code that calls an ArrayBuffer method on an ArrayBuffer, but that method insists it’s actually not being called on an ArrayBuffer. It uses IsJSArrayBuffer to check that. So somehow IsJSArrayBuffer is returning the wrong thing. Okay. Probably take a moment to breathe and get a coffee or something. I’ll wait.
☕️ ☕️ ☕️
Ultimately Graspable
Alright, feeling refreshed? Here we go again. Let’s look at the definition of IsJSArrayBuffer. It’s actually defined by some macros, but I’ll expand them for clarity here:
bool Object::IsJSArrayBuffer() const {
return IsHeapObject() &&
HeapObject::cast(*this).IsJSArrayBuffer();
}
Using a debugger, I stepped very carefully through this code, instruction by instruction, and discovered that we never even get to HeapObject::IsJSArrayBuffer. IsHeapObject() returns false! What does Object::IsHeapObject do? (src 1, src 2)
// Returns true if this tagged value is a strong pointer to a HeapObject.
constexpr inline bool IsHeapObject() const { return IsStrong(); }
constexpr inline bool IsStrong() const {
CONSTEXPR_DCHECK(kCanBeWeak ||
(!IsSmi() == HAS_STRONG_HEAP_OBJECT_TAG(ptr_)));
return kCanBeWeak ? HAS_STRONG_HEAP_OBJECT_TAG(ptr_) : !IsSmi();
}
In this particular case kCanBeWeak is false (proving this is left as an exercise to the reader), so this is the same as calling !IsSmi(). I’m leaving aside for the moment the details of exactly what a “heap object” or a “tagged value” or a “strong pointer” is, because frankly, I have no idea. I’m just following the code where it leads. So let’s take a look at IsSmi(): (src 1, src 2, src 3)
// Returns true if this tagged value is a Smi.
constexpr bool IsSmi() const { return HAS_SMI_TAG(ptr_); }
// ...
#define HAS_SMI_TAG(value) \
  ((static_cast<i::Tagged_t>(value) & ::i::kSmiTagMask) == ::i::kSmiTag)
// ...
// Tag information for Smi.
const int kSmiTag = 0;
const int kSmiTagSize = 1;
const intptr_t kSmiTagMask = (1 << kSmiTagSize) - 1;
This is some bit-twiddling fanciness that boils down to: if the lowest bit of ptr_ is 0, then this is a “smi”. Otherwise it’s not.
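As a sketch, here is that check rewritten as ordinary functions you can run outside V8. The constants are the ones quoted above. Treating Tagged_t as a 32-bit integer is an assumption, based on V8 builds with pointer compression (V8_COMPRESS_POINTERS); only the lowest bit matters for this check either way. EncodeSmi is a hypothetical helper showing how a small integer ends up with a 0 in its lowest bit.

```cpp
#include <cassert>
#include <cstdint>

// Constants copied from the V8 snippet above (kSmiTagMask works out to 1).
const int kSmiTag = 0;
const int kSmiTagSize = 1;
const std::intptr_t kSmiTagMask = (1 << kSmiTagSize) - 1;

// Assumption: with pointer compression, V8's Tagged_t is a 32-bit value.
using Tagged_t = std::uint32_t;

// Same test as HAS_SMI_TAG: a value whose lowest bit is 0 is a Smi.
bool IsSmi(std::uintptr_t value) {
  return (static_cast<Tagged_t>(value) & kSmiTagMask) == kSmiTag;
}

// With kCanBeWeak == false, IsHeapObject() boils down to !IsSmi().
// Real heap object pointers carry V8's kHeapObjectTag (1) in the low bit,
// so they fail the Smi check.
bool IsHeapObject(std::uintptr_t value) { return !IsSmi(value); }

// Hypothetical helper for a 31-bit Smi: shift the integer left past the
// tag bit, which leaves that bit as 0.
std::uintptr_t EncodeSmi(std::int32_t n) {
  return static_cast<std::uint32_t>(n) << kSmiTagSize;
}
```

EncodeSmi(42) produces 84 (0b1010100), whose lowest bit is 0, so IsSmi says yes; any odd value, like a tagged heap pointer, makes IsSmi say no.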
We know that IsHeapObject is returning false for us. Which means that !IsSmi() is false. Which means that IsSmi() is returning true!!
Our ArrayBuffer is a Smi?????
At this point I googled “v8 smi” and it turns out “smi” in V8’s parlance means “small integer”, i.e. a JS number that’s an integer within a certain range. So, somehow, this ArrayBuffer object, which is definitely an ArrayBuffer, is being seen to be a JS number.
ngl, I was pretty completely flummoxed by now. But I have this stubborn belief that everything that happens on a computer is, ultimately, possible to understand. So I kept going.
What Even Are Computers
Okay, fine. So the lowest bit of ptr_ is 0, which causes IsSmi() to return true. Where does ptr_ come from?
Remember this from Utils::OpenHandle?
DCHECK(that == nullptr ||
v8::internal::Object(
*reinterpret_cast<const v8::internal::Address*>(that))
.IsJSArrayBuffer());
Let’s look at the definition of v8::internal::Object():
class Object
: public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
public:
explicit constexpr Object(Address ptr) : TaggedImpl(ptr) {}
That passes its argument (promisingly called ptr!) to TaggedImpl(): (src)
template <HeapObjectReferenceType kRefType, typename StorageType>
class TaggedImpl {
public:
explicit constexpr TaggedImpl(StorageType ptr) : ptr_(ptr) {}
There it is! So ptr_ is the v8::internal::Address, aka uintptr_t, that was passed to v8::internal::Object!
Now we have enough background to understand what it was I printed out with those two printf calls, and why the answer made me even more confused.
Advanced Debugging Techniques
Here’s the first printf call I added, in nan_typedarray_contents.h, before we call out to GetBackingStore():
v8::Local<v8::ArrayBuffer> buffer = array->Buffer();
+v8::ArrayBuffer* addr = *buffer;
+uintptr_t* addr_as_uintptr = reinterpret_cast<uintptr_t*>(addr);
+
+fprintf(stderr, "addr of buffer = %08lx\n", *addr_as_uintptr);
data = static_cast<char*>(buffer->GetBackingStore()->Data())
    + byte_offset;
The effect of this is to print out the value of the v8::internal::Address that’s underlying buffer.
The second printf call was inside v8::ArrayBuffer::GetBackingStore. For this one I just copied the code out of the failing DCHECK in Utils::OpenHandle:
std::shared_ptr<v8::BackingStore>
v8::ArrayBuffer::GetBackingStore() {
+ fprintf(stderr, "addr inside v8::AB::GBS = %lx\n",
+         *reinterpret_cast<const v8::internal::Address*>(this));
i::Handle<i::JSArrayBuffer> self = Utils::OpenHandle(this);
The effect of this is also to print out the value of the v8::internal::Address that’s underlying the ArrayBuffer.
These two printf calls should print the same value.
Here’s what they printed when I recompiled and ran the test:
addr of buffer = 4f084b8c9d
addr inside v8::AB::GBS = 4f00000070
I logged off.
In mov, Veritas
The next day, after having regained my composure, and having reaffirmed my belief that everything on a computer is ultimately graspable, I decided that C++ must be lying to me. You know what never lies? x86 assembly. It was time to peel away the illusion of pointers and objects and methods, and get down to the movs and jmps. In mov, veritas.
Also, as a side note, we now know why IsSmi was returning true: the low bit of 0x4f00000070 is 0, so it passes the HAS_SMI_TAG check. It’s just that that’s … not the Address of our object.
I loaded up the test in lldb:
$ (cd ../third_party/nan/test/js; lldb ~/work/electron/src/out/Testing/Electron.app/Contents/MacOS/Electron -- typedarrays-test.js)
(lldb) ▊
I set a breakpoint immediately before the call to GetBackingStore():
(lldb) b nan_typedarray_contents.h:39
and ran the test:
(lldb) r
It stopped at my breakpoint, and I typed the command to lift the veil:
(lldb) disassemble
typedarrays.node`Nan::TypedArrayContents<unsigned char>::TypedArrayContents:
[...]
0x11b7af8c8 <+104>: movq 0x739(%rip), %rax ; (void *): __stderrp
0x11b7af8cf <+111>: movq (%rax), %rdi
0x11b7af8d2 <+114>: movq (%rbx), %rdx
0x11b7af8d5 <+117>: leaq 0x641(%rip), %rsi ; "addr of buffer = %08lx\n"
0x11b7af8dc <+124>: xorl %eax, %eax
0x11b7af8de <+126>: callq 0x11b7afd62 ; symbol stub for: fprintf
0x11b7af8e3 <+131>: leaq -0x38(%rbp), %rdi
-> 0x11b7af8e7 <+135>: movq %rbx, %rsi
0x11b7af8ea <+138>: callq 0x11b7afca2 ; symbol stub for: v8::ArrayBuffer::GetBackingStore()
[...]
Now we’re talking. Three quick things about x86 assembly, if you’ve not seen it much before. First, things prefixed with % are registers, so %rip, %rax, %rdi, etc. are all registers, and each holds a 64-bit integer. Second, in the AT&T syntax that lldb shows by default, most instructions put their result in the second operand. So for example, movq %rbx, %rsi copies the value of the register %rbx into the register %rsi. Third, (%rax) means “the value in memory at the address contained in %rax”, i.e. it’s a dereference, like *foo in C++.
Looking at the above disassembly, we can guess a couple of things:
- %rbx contains the uintptr_t* that we called addr_as_uintptr. This isn’t the v8::internal::Address itself, it’s a pointer to a bit of memory that holds that address. movq (%rbx), %rdx fetches the Address out of RAM and puts it in %rdx. We guess this because it happens before callq 0x11b7afd62, which lldb has helpfully annotated as the address of the fprintf function, and because %rbx is also referenced before the call to v8::ArrayBuffer::GetBackingStore.
- This code is passing this to v8::ArrayBuffer::GetBackingStore in the register %rsi. We guess this because right before the callq instruction that jumps to the code for v8::ArrayBuffer::GetBackingStore, we see movq %rbx, %rsi, which copies the value of %rbx into %rsi. So the calling code is “setting up” for GetBackingStore() to run by putting the value of this in the right place.
This might sound weird, because this isn’t something you pass to a function in C++! But that information still has to get to the function somehow. On x86-64 macOS and Linux, which follow the System V calling convention, this is passed as if it were the first argument to the function, and the first integer or pointer argument goes, also by convention, in the %rdi register. (But wait, we’re seeing this passed through %rsi…? Our spidey senses are tingling now.)
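To make the “this is the first argument” idea concrete, here is a small illustration; Counter and Counter_Add are made-up names. The member function and the free function do the same work, and on x86-64 System V targets a compiler will typically place the object pointer in %rdi and k in %esi for both. That register claim describes typical codegen, so it’s worth checking with clang++ -O2 -S rather than taking it on faith.

```cpp
#include <cassert>

// Illustration only: Counter and Counter_Add are made-up names.
struct Counter {
  int n = 0;
  // `this` is an implicit parameter: on x86-64 System V it typically
  // arrives in %rdi, and `k` in %esi.
  int Add(int k) { return n += k; }
};

// The same operation with the object pointer written out as the first
// parameter; typical codegen uses the same registers (%rdi, %esi).
int Counter_Add(Counter* self, int k) { return self->n += k; }
```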
We can confirm that %rbx points to the same address that was printed out by our debugging printf call by asking lldb what’s in all the registers:
(lldb) register read
General Purpose Registers:
rax = 0x000000000000001c
rbx = 0x000000011b8713d0
rcx = 0xecc0c702c36f00e1
rdx = 0x0000000000000000
rdi = 0x00007ffeefbfdf98
rsi = 0x00000000000120a8
[...]
%rbx contains the value 0x000000011b8713d0, which isn’t what we expected (0x4f084b8c9d), but that’s because it’s not the uintptr_t itself, it’s a pointer to that (uintptr_t*). Let’s dereference it:
(lldb) p (void*)*(uintptr_t*)0x000000011b8713d0
(void *) $0 = 0x0000004f084b8c9d
Ah, there we go. (The (void*) is there to get lldb to print out the value in hex.)
Alrighty, so %rbx contains the address of our Address, and it’s going to get put into %rsi, and then we’re going to jump into v8::ArrayBuffer::GetBackingStore. Let’s see what happens when we do, stepping one instruction at a time with si:
(lldb) si
After a detour through libdyld.dylib to resolve the function address, we eventually end up in v8::ArrayBuffer::GetBackingStore, and run disassemble again:
(lldb) disassemble
Electron Framework`::GetBackingStore():
[...]
0x101cf725a <+10>: movq %rdi, %r14
0x101cf725d <+13>: movq 0x9bacc04(%rip), %rax ; (void *): __stderrp
0x101cf7264 <+20>: movq (%rax), %rdi
0x101cf7267 <+23>: movq (%r14), %rdx
0x101cf726a <+26>: leaq 0x897e8fd(%rip), %rsi ; "v8::AB::GBS %lx\n"
0x101cf7271 <+33>: xorl %eax, %eax
-> 0x101cf7273 <+35>: callq 0x10a4c6ed6 ; symbol stub for: fprintf
… NOW WAIT A DANG MINUTE. The Nan::TypedArrayContents code put this into %rsi. But this code doesn’t read %rsi at all! In fact, it overwrites %rsi at <+26>! This code copies %rdi into %r14 and prints the value %rdi points to!
No wonder it’s printing different things! It’s because it’s printing different things! 😬
At Least it’s Not DNS
How could this happen? When I write this in ArrayBuffer::GetBackingStore(), the compiler generates code that reads the value of the register %rdi, but the calling code in Nan::TypedArrayContents thinks it’s supposed to put this into %rsi. What’s going on here???
This would be a good moment to recall that Nan::TypedArrayContents is part of the native module we’re building, and is built by the system clang++ compiler against the system’s C++ standard library (on macOS, Apple’s own build of libc++), whereas ArrayBuffer::GetBackingStore is built by Chromium’s clang++ compiler against Chromium’s bundled copy of LLVM’s libc++, with Chromium’s configuration.
But… this used to work, right? This only started failing when we rolled Chromium last week! What happened? Well, let’s check the logs of the roll that broke it (which Samuel Attard kindly bisected for me): it broke when we rolled Chromium across 1f252b391..90.0.4415.0.
… uh, okay, there are over 5,000 commits in that range. What if we just look for changes in the //build directory? That’s where most of the compiler configuration lives, and since this looks like a compiler configuration problem, hopefully whatever broke it is in there somewhere.
And, ding ding ding! Here we are, look at this!
3f200c0d Roll src/buildtools/third_party/libc++/trunk/ d9040c75c..69897abe2 (1149 commits) by Reid Kleckner · 8 weeks ago
Holy heck! This Chromium roll pulled in an update to the bundled libc++ spanning over one thousand commits! That’s over a year of work in libc++. No wonder something broke. In fact, things broke in Chromium too: you can see in the changelog that they reverted this commit after it landed, then fixed some stuff and landed it again.
Alright, so Chromium updated libc++. That seems like it could be the cause of this breakage, maybe. But how do we confirm that? We could try reverting this libc++ update in Chromium and rebuilding, but we can’t resist progress forever. Instead, let’s try building the native module with the same compiler and standard library as Electron itself. If it’s a problem with mismatched compiler and/or standard library, then that should fix it, right?
A “proper” fix for this would mean teaching node-gyp how to use the Chromium clang++ and libc++, but I have no idea how node-gyp works, so I’m just going to hack it. I know that make respects environment variables called CXX and CXXFLAGS, which set the path to the C++ compiler executable and extra flags to pass to it, so I’ll use those. I want to set them to use the clang++ compiler that’s used to build Electron, and all the same -fwhatever and -DWHATEVER flags that GN passes to the compiler. After a little trial and error, and copy-pasting things out of out/Testing/obj/electron/electron_lib.ninja, I ended up with something like this:
$ export CXX=".../src/third_party/llvm-build/Release+Asserts/bin/clang++"
$ export CXXFLAGS="-DV8_DEPRECATION_WARNINGS -DDCHECK_ALWAYS_ON=1 -D_LIBCPP_HAS_NO_ALIGNED_ALLOCATION -DCR_XCODE_VERSION=1240 -DCR_CLANG_REVISION=\"llvmorg-13-init-3462-gfe5c2c3c-2\" -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -D_LIBCPP_ABI_UNSTABLE -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCXXABI_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCPP_ENABLE_NODISCARD -D_LIBCPP_HAS_NO_VENDOR_AVAILABILITY_ANNOTATIONS -D_LIBCPP_DEBUG=0 -DCR_LIBCXX_REVISION=8fa87946779682841e21e2da977eccfb6cb3bded -D__ASSERT_MACROS_DEFINE_VERSIONS_WITHOUT_UNDERSCORES=0 -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DV8_USE_EXTERNAL_STARTUP_DATA -DNODE_WANT_INTERNALS=1 -DELECTRON_PRODUCT_NAME=\"Electron\" -DELECTRON_PROJECT_NAME=\"electron\" -DENABLE_IPC_FUZZER -DWEBP_EXTERN=extern -DUSE_EGL -D_WTL_NO_AUTOMATIC_NAMESPACE -DTOOLKIT_VIEWS=1 -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_ENABLE_TRACING=1 -DU_ENABLE_RESOURCE_TRACING=0 -DU_STATIC_IMPLEMENTATION -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DSK_FAVOR_WUFFS_V_0_3_OVER_V_0_2 -DSK_CODEC_DECODES_PNG -DSK_CODEC_DECODES_WEBP -DSK_ENCODE_PNG -DSK_ENCODE_WEBP -DSK_USER_CONFIG_HEADER=\"../../skia/config/SkUserConfig.h\" -DSK_GL -DSK_CODEC_DECODES_JPEG -DSK_ENCODE_JPEG -DSK_HAS_WUFFS_LIBRARY -DSK_SUPPORT_GPU=1 -DSK_GPU_WORKAROUNDS_HEADER=\"gpu/config/gpu_driver_bug_workaround_autogen.h\" -DSK_BUILD_FOR_MAC -DSK_METAL -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DHAVE_PTHREAD -DWEBRTC_ENABLE_AVX2 -DWEBRTC_NON_STATIC_TRACE_EVENT_HANDLERS=0 -DWEBRTC_CHROMIUM_BUILD -DWEBRTC_POSIX -DWEBRTC_MAC -DABSL_ALLOCATOR_NOTHROW=1 -DWEBRTC_USE_BUILTIN_ISAC_FIX=0 -DWEBRTC_USE_BUILTIN_ISAC_FLOAT=1 -DWEBRTC_HAVE_SCTP -DNO_MAIN_THREAD_WRAPPING -DLEVELDB_PLATFORM_CHROMIUM=1 -DLEVELDB_PLATFORM_CHROMIUM=1 -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DV8_ENABLE_CHECKS 
-DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_DEPRECATION_WARNINGS -DCPPGC_CAGED_HEAP -DHAVE_INSPECTOR=1 -DHAVE_OPENSSL=1 -DOPENSSL_NO_SSL_TRACE=1 -DNODE_HAVE_I18N_SUPPORT=1 -DNODE_USE_V8_PLATFORM=0 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_DARWIN_USE_64_BIT_INODE=1 -DCRASHPAD_ZLIB_SOURCE_EXTERNAL -DFLATBUFFERS_LOCALE_INDEPENDENT=0 -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing -fstack-protector -fcolor-diagnostics -fmerge-all-constants -fcrash-diagnostics-dir=../../tools/clang/crashreports -mllvm -instcombine-lower-dbg-declare=0 -fcomplete-member-pointers -arch x86_64 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -Xclang -fdebug-compilation-dir -Xclang . -no-canonical-prefixes -Wall -Werror -Wextra -Wimplicit-fallthrough -Wunreachable-code -Wthread-safety -Wextra-semi -Wunguarded-availability -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-unneeded-internal-declaration -Wno-undefined-var-template -Wno-psabi -Wno-ignored-pragma-optimize -Wno-implicit-int-float-conversion -Wno-final-dtor-non-final-class -Wno-builtin-assume-aligned-alignment -Wno-deprecated-copy -Wno-non-c-typedef-for-linkage -Wno-max-tokens -O2 -fno-omit-frame-pointer -gdwarf-4 -g1 -isysroot sdk/xcode_links/MacOSX11.1.sdk -mmacosx-version-min=10.11.0 -ftrivial-auto-var-init=pattern -fvisibility=hidden -Xclang -add-plugin -Xclang find-bad-constructs -Xclang -plugin-arg-find-bad-constructs -Xclang checked-ptr-as-trivial-member -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -DPROTOBUF_ALLOW_DEPRECATED=1 -Wno-shorten-64-to-32 -Wno-microsoft-include"
$ make -C ~/work/electron/src/third_party/nan/test/build typedarrays
Holy robot vomit. This failed at first because it turns on the find-bad-constructs clang plugin, which doesn’t like some things in nan, so I removed those flags and a couple of others, and eventually it built a typedarrays.node module. I ran the test again and…
… it worked!! THANK FUCK.
The Road to Victory
Okay, so which of those ten million flags is actually responsible for fixing this problem? I tried building with just the Chromium clang++ and none of the fancy flags, and the error came back, so one of those flags must be important. I narrowed it down by deleting half the flags, rebuilding and rerunning until I zeroed in on the flag responsible:
-D_LIBCPP_ABI_UNSTABLE
Building with Chromium’s clang++ and this flag present is enough to produce a typedarrays.node that functions at least well enough to pass nan’s tests. Phew! So, uh, how do we turn this into something we can commit to the roller branch?
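The halving search used to find that flag can be sketched like this. FindResponsibleFlag and the passes callback are hypothetical: in real life, passes stands for “rebuild typedarrays.node with exactly these flags and rerun the test”, which takes minutes per call. The sketch assumes exactly one flag is responsible and that flags don’t depend on each other.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Flags = std::vector<std::string>;

// Hypothetical helper. `passes(flags)` stands in for "rebuild the native
// module with exactly these flags and rerun the test". Assumes exactly one
// flag is responsible for the fix and that flags don't interact. If `builds`
// is non-null, it counts how many rebuilds the search needed.
std::string FindResponsibleFlag(Flags candidates,
                                const std::function<bool(const Flags&)>& passes,
                                int* builds = nullptr) {
  assert(!candidates.empty());
  while (candidates.size() > 1) {
    const auto half = static_cast<std::ptrdiff_t>(candidates.size() / 2);
    Flags kept(candidates.begin(), candidates.begin() + half);
    Flags deleted(candidates.begin() + half, candidates.end());
    if (builds != nullptr) ++*builds;
    // Still passing with only the kept half? Then the culprit is in it;
    // otherwise it must be in the half we deleted.
    candidates = passes(kept) ? std::move(kept) : std::move(deleted);
  }
  return candidates.front();
}
```

Each loop iteration costs one rebuild, so n candidate flags take about log2(n) rebuilds rather than one per flag, which matters a lot when the list looks like the CXXFLAGS above.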
Let’s figure out what’s responsible for setting this flag. Searching on source.chromium.org (s.c.o) leads us to this line in //build/config/c++/BUILD.gn:
if (libcxx_abi_unstable) {
  defines += [ "_LIBCPP_ABI_UNSTABLE" ]
}
Well, interesting. Can we convince libcxx_abi_unstable to be false? That would mean Electron would be building without this flag, so maybe a node module also built without this flag would work?
libcxx_abi_unstable turns out to be a build argument, so we can try setting libcxx_abi_unstable = false in args.gn to build Electron without the _LIBCPP_ABI_UNSTABLE flag.
And there you have it:
--- a/build/args/all.gn
+++ b/build/args/all.gn
@@ -19,4 +19,7 @@
dawn_enable_vulkan_validation_layers = false
+# This breaks native node modules
+libcxx_abi_unstable = false
+
is_cfi = false
This fixes the problem (and exposes a different problem, but this essay is already about ten times too long, so I’m not going to go into that).
🎉
What Now?
Ultimately, it is asking for trouble to build dynamically-linked native node modules that use C++ features across the dynamic link boundary with a different compiler. The C++ ABI is huge and complicated, and as we have discovered at great length today, it varies between compilers, and even based on which #defines are used.
There are two ways to resolve this:
- Build C++ native modules using the same clang++ and libc++ as Electron itself, or
- Don’t call C++ things across the dynamic link boundary.
Honestly, I think (2) is the best way forward. N-API in Node.js provides exactly this: a stable C (not C++) ABI for interacting with V8 and Node without relying on the enormous, varied and moving target that is the C++ ABI. Modules built with N-API would not encounter this kind of problem, because they stick to the C ABI, which is much smaller, simpler and more stable. This is an ecosystem problem, though, and so it will take a long time to shift people away from C++-based systems like Nan and towards N-API.
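Here is a sketch of what “don’t call C++ things across the dynamic link boundary” looks like in practice. The byte_buffer_* functions are hypothetical and not part of N-API; the point is the shape. The exported surface is extern "C" and only uses opaque pointers, integers and raw pointers, while std::vector stays on the implementation side. N-API follows the same idea: napi_get_typedarray_info, for example, hands back a raw data pointer and a length rather than a std::shared_ptr<v8::BackingStore>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-ABI surface (not N-API's real functions). Only opaque
// pointers, integers and raw pointers cross it, never std:: classes whose
// layout or calling convention depends on compiler settings.
extern "C" {
typedef struct byte_buffer byte_buffer;  // opaque to callers

byte_buffer* byte_buffer_create(std::size_t length);
void byte_buffer_destroy(byte_buffer* buf);
// Hands back a raw pointer plus a length instead of a C++ smart pointer.
unsigned char* byte_buffer_data(byte_buffer* buf, std::size_t* length_out);
}

// Implementation side: free to use any C++ it likes, because none of these
// types appear in the exported signatures.
struct byte_buffer {
  std::vector<unsigned char> bytes;
};

extern "C" byte_buffer* byte_buffer_create(std::size_t length) {
  // Zero-filled, like `new Uint8Array(length)` in JS.
  return new byte_buffer{std::vector<unsigned char>(length, 0)};
}

extern "C" void byte_buffer_destroy(byte_buffer* buf) { delete buf; }

extern "C" unsigned char* byte_buffer_data(byte_buffer* buf,
                                           std::size_t* length_out) {
  assert(buf != nullptr && length_out != nullptr);
  *length_out = buf->bytes.size();
  return buf->bytes.data();
}
```

A module compiled by any C++ compiler, against any standard library, can call these three functions safely, because the C calling convention for pointers and integers doesn’t depend on libc++ configuration.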
(1) is also possible, though. I think we can provide some support in tools like electron-rebuild to download & configure the Chromium compiler toolchain, instead of using the system one. It will significantly complicate electron-rebuild, but it will also probably eliminate most issues like this one.
Extra Credit
I’m still curious though. I want to know what the heck _LIBCPP_ABI_UNSTABLE does, and why it causes a mismatch in which register this is passed in. As mentioned, this essay is already too long, so I’m going to leave some tantalizing hints and let you follow up on them if you desire. _LIBCPP_ABI_UNSTABLE is a proxy for a whole bunch of other #defines. I tried building the native module without _LIBCPP_ABI_UNSTABLE but with each of those individual flags instead, and it turned out _LIBCPP_ABI_ENABLE_SHARED_PTR_TRIVIAL_ABI was enough to generate compatible code. That’s only referenced in one place in libc++, where it causes __attribute__((trivial_abi)) to be added to shared_ptr and weak_ptr. What is trivial_abi? Well, it’s a relatively new clang feature that involves %rsi and %rdi. Hmmmm!
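For anyone who wants to chase those hints, here is a sketch of what we believe is going on, based on our reading of clang’s trivial_abi documentation and the Itanium C++ ABI as used on x86-64 macOS and Linux. A class with a user-written copy constructor or destructor is normally returned through a hidden pointer to storage the caller provides, and that pointer is passed as the first argument, in %rdi, which bumps this into %rsi. Marking the class trivial_abi lets clang return a two-pointer class like shared_ptr in %rax and %rdx instead, so this stays in %rdi. That would line up with the disassembly: the nan code loaded a stack address into %rdi (leaq -0x38(%rbp), %rdi) and this into %rsi, while the V8 code read this from %rdi. PlainRef, TrivialAbiRef, Widget and Holder are hypothetical toy types, not libc++’s shared_ptr.

```cpp
#include <cassert>

// Use clang's attribute when available; GCC has no trivial_abi, so there
// the second type simply behaves like the first.
#if defined(__clang__)
#define TOY_TRIVIAL_ABI [[clang::trivial_abi]]
#else
#define TOY_TRIVIAL_ABI
#endif

struct Widget {
  int refs = 0;
};

// Toy two-word "smart pointer" (object pointer + control-block pointer, the
// same size as libc++'s shared_ptr). The user-written copy constructor and
// destructor make it non-trivial for the purposes of calls, so it is
// returned through a hidden pointer to caller-provided storage.
struct PlainRef {
  Widget* ptr;
  void* ctrl;
  explicit PlainRef(Widget* w) : ptr(w), ctrl(nullptr) { ++ptr->refs; }
  PlainRef(const PlainRef& other) : ptr(other.ptr), ctrl(other.ctrl) { ++ptr->refs; }
  PlainRef& operator=(const PlainRef&) = delete;
  ~PlainRef() { --ptr->refs; }
};

// Identical layout and behaviour, but trivial_abi lets clang pass and return
// it in registers as if it were a plain struct of two pointers.
struct TOY_TRIVIAL_ABI TrivialAbiRef {
  Widget* ptr;
  void* ctrl;
  explicit TrivialAbiRef(Widget* w) : ptr(w), ctrl(nullptr) { ++ptr->refs; }
  TrivialAbiRef(const TrivialAbiRef& other) : ptr(other.ptr), ctrl(other.ctrl) { ++ptr->refs; }
  TrivialAbiRef& operator=(const TrivialAbiRef&) = delete;
  ~TrivialAbiRef() { --ptr->refs; }
};

struct Holder {
  Widget w;
  PlainRef GetPlain();
  TrivialAbiRef GetTrivial();
};

// Typical x86-64 System V codegen: %rdi = hidden return slot, %rsi = this.
PlainRef Holder::GetPlain() { return PlainRef(&w); }

// With clang's trivial_abi: %rdi = this, result comes back in %rax:%rdx.
TrivialAbiRef Holder::GetTrivial() { return TrivialAbiRef(&w); }
```

Both types behave identically at runtime; the difference only shows up in the generated code. Compiling with clang++ -O2 -S on x86-64 and comparing Holder::GetPlain with Holder::GetTrivial should show it, and a caller built with one convention talking to a callee built with the other is exactly the kind of disagreement we saw between nan and V8.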
Thanks to Keeley Hammond for reviewing drafts of this article, and to Deepak Mohan for encouraging me to publish it as a blog post.