[{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nC++ build issues have always been a hot topic, especially in various language wars, where they are often used as a negative example. Interestingly, most C++ programmers are often involved in maintaining existing systems, facing highly solidified, unchangeable build processes. The number of people who actually need to set up a project from scratch is in the minority.\nThis leads to an awkward situation: when you really need to build from scratch and want to find reference cases, you\u0026rsquo;ll find that there is no so-called Best Practice, only various unsystematic workarounds, which is very frustrating.\nclice is also a C++ project started from scratch, and inevitably, we\u0026rsquo;ve made almost all the same mistakes our predecessors did. Recently, we\u0026rsquo;ve finally figured out a workflow that we consider quite elegant. So, we want to take this opportunity to share this solution and, by the way, popularize some of the principles and knowledge behind C++ builds. We hope it will be helpful to you!\nWhere does complexity come from? Before discussing solutions, let\u0026rsquo;s first analyze the problem. Where does the complexity of C++ builds actually come from? If there were a package manager, would all problems be solved?\nI believe the complexity mainly comes from two different dimensions: the toolchain and the build system.\nToolchain So what is a toolchain? Besides the compiler and linker, it also includes more details that are overlooked by most tutorials. We can break down these concepts with a simple command.\nConsider the following file. 
Executing clang++ -std=c++23 main.cpp -o main will give you an executable program.\n// main.cpp #include \u0026lt;print\u0026gt; int main () { std::println(\u0026#34;Hello world!\u0026#34;); return 0; } So, the first question is, we all know that the traditional C/C++ compilation model is divided into two processes: compile and link. First, the compiler is called to compile intermediate .obj files, and then the linker is used to link them into an executable. Why did we get it done with just one command here?\nThis is because clang++ is just a driver; it will call the compiler and linker for you to complete all the work. How can we verify this? clang has a command-line option -### that can be used to only output the underlying commands to be executed, without actually executing the tasks.\nFor example, executing clang++ -### -std=c++23 main.cpp -o main on my Linux environment produces the following output (unimportant information has been omitted with ...):\n\u0026#34;/usr/lib/llvm-20/bin/clang\u0026#34; \u0026#34;-cc1\u0026#34; ... \u0026#34;-triple\u0026#34; \u0026#34;x86_64-pc-linux-gnu\u0026#34; \u0026#34;-resource-dir\u0026#34; \u0026#34;/usr/lib/llvm-20/lib/clang/20\u0026#34; \u0026#34;-internal-isystem\u0026#34; \u0026#34;/usr/include/c++/14\u0026#34; \u0026#34;-internal-isystem\u0026#34; \u0026#34;/usr/include/x86_64-linux-gnu/c++/14\u0026#34; \u0026#34;-internal-isystem\u0026#34; \u0026#34;/usr/include/c++/14/backward\u0026#34; \u0026#34;-internal-isystem\u0026#34; \u0026#34;/usr/lib/llvm-20/lib/clang/20/include\u0026#34; \u0026#34;-internal-isystem\u0026#34; \u0026#34;/usr/local/include\u0026#34; \u0026#34;-internal-isystem\u0026#34; \u0026#34;/usr/x86_64-linux-gnu/include\u0026#34; \u0026#34;-internal-externc-isystem\u0026#34; \u0026#34;/usr/include/x86_64-linux-gnu\u0026#34; \u0026#34;-internal-externc-isystem\u0026#34; \u0026#34;/include\u0026#34; \u0026#34;-internal-externc-isystem\u0026#34; \u0026#34;/usr/include\u0026#34; ... 
\u0026#34;-std=c++23\u0026#34; ... \u0026#34;-o\u0026#34; \u0026#34;/tmp/main-a82bce.o\u0026#34; ... \u0026#34;main.cpp\u0026#34; \u0026#34;/usr/bin/ld\u0026#34; ... \u0026#34;-dynamic-linker\u0026#34; \u0026#34;/lib64/ld-linux-x86-64.so.2\u0026#34; ... \u0026#34;/usr/lib/x86_64-linux-gnu/Scrt1.o\u0026#34; \u0026#34;/usr/lib/x86_64-linux-gnu/crti.o\u0026#34; \u0026#34;/usr/lib/gcc/x86_64-linux-gnu/14/crtbeginS.o\u0026#34; \u0026#34;/usr/lib/gcc/x86_64-linux-gnu/14/crtendS.o\u0026#34; \u0026#34;/usr/lib/x86_64-linux-gnu/crtn.o\u0026#34; \u0026#34;-L/usr/lib/gcc/x86_64-linux-gnu/14\u0026#34; \u0026#34;-L/usr/lib64\u0026#34; \u0026#34;-L/usr/lib/x86_64-linux-gnu\u0026#34; \u0026#34;-L/usr/lib/llvm-20/lib\u0026#34; \u0026#34;-L/usr/lib\u0026#34; \u0026#34;-lstdc++\u0026#34; \u0026#34;-lm\u0026#34; \u0026#34;-lgcc_s\u0026#34; \u0026#34;-lgcc\u0026#34; \u0026#34;-lc\u0026#34; \u0026#34;/tmp/main-a82bce.o\u0026#34; You can see that clang++ indeed calls the compiler and linker underneath to complete the tasks. What\u0026rsquo;s more noteworthy is that it injects a large number of implicit flags! In fact, the often-ignored part of the toolchain is these implicit compilation parameters.\nGNU-style compilers like g++ and clang++ can often directly call the linker and inject these implicit parameters. Therefore, build systems will call them directly instead of the linker to perform linking. You can use options like -fuse-ld to switch the linker launched by the driver. This also explains why compiling C++ programs with clang instead of clang++ results in many undefined references to the C++ standard library. 
In fact, on many distributions both `clang` and `clang++` are just symbolic links to `/usr/lib/llvm-20/bin/clang`, and this single binary injects different implicit parameters based on the program name and other arguments.

On the other hand, MSVC-style compilers like `cl.exe` or `clang-cl` pass these implicit settings (such as `INCLUDE`, `LIB`, and `LIBPATH`) through environment variables. Therefore, before using these compilers you usually must first run the initialization script `vcvarsall.bat` provided by Visual Studio to "activate" the current terminal's environment, or build directly in the Developer Command Prompt. Otherwise, the compiler will report errors because it cannot find standard library headers or system libraries. In this case, the build system will generally also call the linker directly to perform linking.

A complete toolchain can be considered to consist of three parts: Tools, Runtime, and Environment.

**Tools** are the various utilities used during the build process, including:

- Compiler Drivers: responsible for orchestrating the entire process, such as `g++` and `clang++`.
- Translators: the actual compilers and assemblers that translate C++ code into machine code, such as `cc1` and `as`.
- Linkers: responsible for piecing together fragmented .o files and library files, such as `ld`, `lld`, and `mold`.
- Binutils: responsible for auxiliary tasks like archiving, format conversion, and symbol processing, such as `ar`, `objcopy`, `strip`, and `nm`.

**Runtime** refers to the various libraries implicitly linked in the options above. They are essential:

- C Runtime (CRT) Startup Objects: the `Scrt1.o`, `crti.o`, `crtn.o`, etc. seen in the log. After the operating system loads the program, the first address it jumps to is usually `_start` in the CRT. These object files are responsible for initializing the stack and heap, running global constructors (a C++ feature), and finally calling `main`. They also perform cleanup work after `main` returns.
- C Standard Library: corresponds to `-lc` in the log. This is the implementation of the C standard library, providing POSIX or system API wrappers that interact with the operating system kernel, such as `malloc`, `printf`, and `open`. Common implementations include GNU's glibc and musl, UCRT on Windows, and the still-developing llvm libc from the LLVM community.
- C++ Standard Library: corresponds to `-lstdc++` in the log. It provides implementations of high-level C++ standard library features like `std::vector` and `std::iostream`. It's worth noting that it usually depends on lower-level Compiler Support Libraries to implement features like exceptions and RTTI. The main implementations are libstdc++ (gcc's standard library), libc++ (clang's standard library), and MSVC STL.
- Compiler Support Libraries: corresponds to `-lgcc_s` in the log. This is a class of easily overlooked but crucial libraries. They are mainly responsible for two things:
  - Builtins: handling operations that the target CPU's instruction set cannot directly support, for example a 64-bit division on a 32-bit CPU, or soft-float operations on a CPU without floating-point support. The compiler translates these operations into calls to functions like `__udivdi3`.
  - Language Runtime Support: the implementation of some advanced C++ features. For example, exception handling (exception catching and stack unwinding) is usually provided by libunwind or libgcc_eh, while the C++ ABI (for features like `dynamic_cast` and RTTI) is provided by libcxxabi or libsupc++. In the Windows MSVC environment, these are bundled together in vcruntime140.dll.
- Sanitizer Runtimes: the libraries (like `libclang_rt.asan.so`) linked when you enable `-fsanitize=address/thread/memory`. They work by inserting instrumentation code at compile time and taking over the memory allocator (`malloc`/`free`) at runtime, using Shadow Memory techniques to detect errors such as out-of-bounds accesses and data races.

**Environment** is the context in which compilation executes, including:

- Target Triple: corresponds to `-triple x86_64-pc-linux-gnu` in the log. It defines the detailed "identity" of the target platform, usually in the format `<arch>-<vendor>-<sys>-<abi>`. It determines what instruction set the compiler generates (x86 vs ARM), what object format to use (ELF vs PE), and the details of calling conventions.
- Cross Compilation: a very important concept in modern builds. When the Host (the machine running the compiler) differs from the Target (the machine running the product), you are cross-compiling. This difference is not just about CPU architecture (e.g., compiling for ARM on x86); it can also be about the operating system or even the C runtime library version (e.g., compiling a product that depends on glibc 2.17 on a system running glibc 2.35).
- Sysroot (System Root): to solve the problem of environment pollution during cross-compilation, the sysroot was created. It is a logical root directory that simulates the file system structure of the target machine. When you specify `--sysroot=/path/to/sysroot`, the compiler ignores the host system's `/usr/include` and looks for dependencies in the sysroot instead.

It's worth noting that most platforms have a default toolchain: the MSVC toolchain on Windows (a complete set of compiler, linker, utilities, and runtime libraries), the GNU toolchain on Linux, and the Apple Clang toolchain on macOS.
Many platforms have more than one; Windows also has the MinGW toolchain, and all of these can be partially switched over to the LLVM toolchain.

### Build System

Having solved the toolchain problem for a single file, we've managed compilation and linking through the compiler driver. But in the real world, projects often contain thousands of source files. The core task of a build system is to efficiently and correctly direct the toolchain to assemble those thousands of files into the final product.

We can examine the development of C++ build systems as an evolution of "complexity" over time:

#### 1. The Primitive Era: Shell Scripts

In the very beginning, building a project meant writing a shell script. The logic was crude: list all .c files, hardcode the compiler path, and call it directly. As the project grew, every time a single line of code was modified, hundreds of files had to be recompiled from scratch. The waiting time grew from a few seconds to tens of minutes, a terrible development experience.

#### 2. The Cornerstone of Build Systems: Make (1976)

To eliminate redundant compilation, Stuart Feldman at Bell Labs created Make. It introduced the dependency graph and incremental builds, based on comparing file timestamps (mtime): if main.cpp's modification time is later than main.o's, it gets recompiled; otherwise it is skipped. This simple rule laid the foundation for build systems.

#### 3. The Portability Crisis (1990s)

In the 90s, operating systems blossomed (Solaris, HP-UX, Linux, BSD, Windows). Although Make solved automation, Makefiles were not portable: different OSs had completely different shell commands, compiler flags, and library paths. The world split into two camps:

- Unix camp (GNU Autotools): the famous `./configure && make`. Its core idea is "probing": run a large number of scripts before the build to scan the system environment (is there a unistd.h? where is libz?), then dynamically generate a Makefile adapted to the current system.
- IDE camp (Visual Studio / Xcode): Windows and Mac chose another path, deeply integrating the build system with the editor. Visual Studio's .sln and Xcode's .xcodeproj provided an out-of-the-box experience, but at the cost of automation, flexibility, and any cross-platform support.

#### 4. True Cross-Platform: CMake (2000s)

With the explosion of open-source software, code needed to run on both Linux servers and Windows desktops. To end the nightmare of maintaining two sets of build scripts, CMake was born. CMake is not a build tool; it is a meta-build system, or generator. Developers write an abstract CMakeLists.txt, and CMake "translates" it into the native dialect of each platform, generating .sln files on Windows, .xcodeproj on Mac, and Makefiles on Linux.

#### 5. Modern Engineering: Scale and Reproducibility (2010s to present)

In the era of mobile internet and cloud-native software, giants like Google and Meta saw their code repositories swell to hundreds of millions of lines (monorepos), and polyglot programming became the norm. New scenarios brought new problems:

- Build speed: Makefiles parse too slowly and do not support distribution. We need to shard compilation tasks and send them to a cluster (Distributed Build), and implement Remote Caching: if colleague A has already compiled base_lib, colleague B should just download the cache instead of wasting local CPU recompiling it.
- Environment consistency (Hermetic Build): "it works on my machine" but fails on CI or on other machines. This is the biggest pain point in modern development, usually caused by depending on host system directories (like /usr/include) with inconsistent versions. Modern builds pursue hermeticity: the build must run as if in a sandbox, strictly prohibiting access to undeclared system libraries, to ensure reproducible builds.
- Polyglot: a modern project often uses C++ for the backend, Python for glue code, Rust for security components, and TypeScript for the frontend. CMake is very painful to use for non-C/C++ languages.
- Dependency management: whether a project is large or small, it often needs third-party libraries. However, C++ has long lacked a unified package manager like Rust's Cargo or Node's npm. Developers must manually handle source downloads, matching build parameters (Debug/Release, static/shared), and complex ABI compatibility issues. Traditional git submodules or system-level package managers (like apt/brew) are often inadequate in cross-platform, multi-version scenarios.

To solve these problems, many new tools have emerged:

- Ninja: a new build backend and Make replacement with extremely fast builds.
- FetchContent/Conan/vcpkg: aim to reduce the difficulty of introducing dependencies in CMake.
- ccache/sccache: compute a cache key from the compilation inputs (compiler version/flags, preprocessed results, etc.) to reuse results across projects and machines (sccache can also cache remotely).
- distcc/icecream: distributed builds that farm compilation tasks out to other machines.
- Bazel/Buck2: build systems from Google and Meta, shaped by their internal needs. They execute builds in a sandbox, ship a built-in build cache, and achieve good hermeticity and cross-language support.
- Meson/XMake: modern build systems with built-in package management, using a Python-like DSL or Lua as the build language, aiming for higher usability than CMake.
## Summary

Now we can answer the question from the beginning: where does the complexity of C++ builds come from? It comes from the combinatorial explosion caused by freedom. C++ has many toolchains and many build systems, and a build-system configuration that works with one toolchain can easily fail with another. Add the various implicit compiler flags that can mask problems, and you might not even notice the breakage. But by now you probably have an intuitive understanding.

## Purpose

Now we can formally discuss clice's build issues. First, we need to clarify our goals: we want the following three build environments.

- Develop: for developers' local work. We want local builds to be as fast as possible to reduce interruptions from waiting on compilation. We also need debug information preserved for easy debugging, and sanitizers such as AddressSanitizer enabled to catch errors early in the development process.
- CI: for automatic builds on platforms like GitHub Actions, running unit and integration tests to ensure reliability. Here too we want builds to be as fast as possible, and to test on as many platforms and environments as possible to prevent crashes caused by accidental reliance on platform-specific features. We also want the CI environment to stay consistent with the Develop environment so that CI errors can be reproduced locally.
- Release: for building the final distributable binary. We want the product to be as fast as possible, so it must be built with LTO. We want the program to print a call stack in the logs when it crashes, to help locate the scene when users file issues. We also want the binary shipped to users to be as small as possible, so we strip the debug information into a separate file (this reduces the program size by about two thirds); when needed, we can use the relative addresses to recover the corresponding symbols. Finally, we want as few runtime dependencies as possible, so we statically link the entire program.

First, consider clice's build dependencies. Currently they are llvm, libuv, spdlog, toml++, croaring, flatbuffers, and cpptrace. clice is built with C++23 and therefore depends on a recent C++ compiler. It uses a different C++ standard library on each platform:

- Windows: MSVC STL
- Linux: libstdc++
- macOS: libc++

As you can see, clice doesn't actually have many dependencies, so the complexity of dependency management is low. We have two build systems: CMake uses FetchContent to manage these dependencies, while XMake uses its built-in package manager, xrepo. Both support pulling source code and building dependencies locally from source, which meets our need for build consistency. Most dependencies have very few source files and have little impact on build speed, except for LLVM!

## Prebuilt Libraries

clice depends on the clang libraries to parse the AST. Even if we only build the necessary targets, the number of files to build is as high as 3,000, and building on GitHub CI takes an average of two hours. Since we want CI to be as fast as possible, we need to optimize LLVM's build. Two methods come to mind:

1. GitHub Actions supports caching. We can use ccache to cache LLVM's build results and reuse them between workflows. However, this method is not stable, especially since LLVM's build artifacts take up a lot of disk space, which can easily fill up GitHub's cache.
2. Pre-compile LLVM and publish the binaries on GitHub Releases, then simply download them during the build.
This way, not only can CI builds use them, but users who want to compile and develop clice locally can use them too. At first we used GitHub Actions caching, but after running into problems we decisively switched to maintaining pre-compiled binaries.

However, building pre-compiled binaries is not a simple matter. The biggest problem is ABI compatibility: the combinations of C++ toolchains and build parameters are numerous, and many options affect the ABI. For a discussion of the C++ ABI, you can refer to Thoroughly Understanding C++ ABI.

We need to support three platforms: Windows, Linux, and macOS. For each platform, we need three different build variants to meet our needs:

- Debug + AddressSanitizer, to expose undefined behavior in the code as early as possible.
- ReleaseWithDebInfo, to test code behavior with optimizations enabled.
- ReleaseWithDebInfo + LTO, to build the final binary product.

AddressSanitizer depends on compiler-rt, and different versions of compiler-rt cannot be mixed, let alone versions from different compilers. This means we have to lock down the compiler version. There is also the infamous glibc version problem: a program built against a new glibc will fail to run on an older glibc because it depends on versioned symbols that only the newer glibc provides. And our required C++ compiler version is very new, so the Linux distributions that ship it generally have new glibc versions too, like Ubuntu 24.04. How do we solve the glibc version problem for pre-compiled binaries, while also keeping the CI environment consistent with the local development environment? To solve this elegantly, we did a lot of exploration.

## Exploration

First, statically linking glibc is a highly discouraged practice for complex reasons, which you can read about in this discussion: Why is statically linking glibc discouraged?. In contrast, another C standard library, musl, is very friendly to static linking. But using it is not easy either: it requires building the C++ standard library, runtimes, and so on from scratch, and there might be performance degradation. Since we wanted a solution for mainstream Linux distributions, we still tried to tackle the glibc problem first.

### Docker

The most obvious solution is Docker. With Docker we could, in theory, unify the development environment across platforms by providing an image with all dependencies installed for each platform. However, due to the special nature of our requirements, a new C++ toolchain combined with an old glibc, we cannot use the C++ toolchains from existing Linux distributions, because their libstdc++ is compiled against a new glibc. How to solve this?

The initial idea was to compile an old glibc ourselves, use it to compile a new libstdc++, and then use those two artifacts to compile LLVM and clice. I felt this was too complex and error-prone; moreover, we are not very familiar with the build options of glibc and libstdc++, so we would likely hit pitfalls.

Besides, Docker's biggest pain point is its poor native cross-platform experience (on Windows and macOS it relies on a VM). clice needs to be compiled and run on Windows, Linux, and macOS, so maintaining images would mean one per platform. And since we frequently update the toolchain configuration and version, image rebuilds would become very frequent, making the maintenance cost very high. From my observation, most people who use Docker to manage development environments are in Linux-only scenarios with no cross-platform concerns; in that case the burden is much lighter.

In short, this solution is theoretically feasible, but the cumulative cost was too high, so I rejected it.
I decided to look for other, more lightweight methods.

### Zig

Zig is an emerging programming language positioned as a "better C". To enhance interoperability with C/C++, Zig integrates clang directly at the source level: with the `zig cc`/`zig c++` commands you can use Zig as a C/C++ compiler. Furthermore, Zig bundles sysroots for various targets into its installation package, making cross-compilation extremely convenient. You can check the support status for various targets at zig-bootstrap. For example, cross-compile with:

```
zig c++ -target x86_64-linux-gnu.2.17 main.cpp -o main
```

The generated main is compiled against glibc 2.17, without any extra setup. This is incredibly convenient. Since `zig c++` is just a wrapper for clang, it can also be used to compile clice. So we hoped to use Zig to unify the development environments across platforms while solving the glibc version problem.

However, after actually trying it, I failed, mainly for the following reasons:

- Zig bundles the headers for all glibc versions together and uses macros to control which ones are active. But C++17 supports `__has_include` to detect whether a header exists, and a header that shouldn't exist in an old glibc does exist among Zig's bundled headers. This makes `__has_include` misjudge, leading to compilation failures.
- Zig also directly integrates LLVM-ecosystem runtimes like libc++, libunwind, and libcxxabi, and compiles them on the fly. I tried various ways to switch to other runtimes, but none worked; looking at the source code, it forcibly injects the relevant compilation parameters, and there is currently no way to change them.
- After clice itself supports C++20 modules, we also plan to migrate the source code to modules. But Zig does not support `import std;`: since it implicitly forces a non-module build of libc++, I cannot make it use the libc++ module I built.
- Zig currently does not support cross-compilation for windows-msvc, and on macOS it forces the use of its own linker. At present, enabling LTO causes an error right at the command-line parsing stage.

In short, we ran into many problems, and for the sake of the project's future I decided not to use Zig. That said, if you don't hit the problems we did, it is still usable: `zig cc` is indeed a very convenient cross-compilation tool, especially when you need to release to many different platforms. But clice doesn't have a strong need for cross-compilation, so this advantage wasn't enough to outweigh the problems we encountered.

### Pixi

So I thought carefully about our problem. The main difficulty is the conflict on Linux between an old glibc and a new compiler, and building the combination ourselves is too much trouble. If professionals have already built it, can't we just use their work? With this idea in mind, I started searching for such a thing. AI told me I could use micromamba, which uses packages from conda-forge, where most software is compiled against glibc 2.17.

Conda? My impression of it came only from using Anaconda on Windows to install deep-learning dependencies: slow to install and slow to start. I had also been told not to use conda at work because it's a paid service. In short, all bad impressions: hard to use and costs money. But I decided to give it a try anyway, and found that it indeed has a sysroot_linux-64 package; simply specifying the version ==2.17 gets you the old glibc.
And the new compilers in its environment automatically use this sysroot without any extra configuration, just as convenient as Zig: out of the box.

After a closer look at Anaconda's pricing policy: the conda software itself is open source, and packages on community-maintained channels like conda-forge are also free; only the official default channel charges commercial companies. You can find related discussion in this blog post: Towards a Vendor-Lock-In-Free conda Experience.

Going a step further, I discovered pixi, a package manager based on conda-forge that installs packages declaratively. I then checked conda-forge carefully and found that its packages for Windows, Linux, and macOS are all very complete. So I immediately thought: we can use pixi to unify the development environments across all three platforms, and solve the glibc problem at the same time!

Write the following pixi.toml description file:

```toml
[workspace]
name = "clice"
version = "0.1.0"
channels = ["conda-forge"]
platforms = ["win-64", "linux-64", "osx-arm64"]

[dependencies]
python = ">=3.13"
cmake = ">=3.30"
ninja = "*"
clang = "==20.1.8"
clangxx = "==20.1.8"
lld = "==20.1.8"
llvm-tools = "==20.1.8"
compiler-rt = "==20.1.8"

[target.linux-64.dependencies]
sysroot_linux-64 = "==2.17"
gcc = "==14.2.0"
gxx = "==14.2.0"
```

Activate the environment with `pixi shell`, and it will automatically install the packages above on all three platforms; on Linux it also installs the old-glibc sysroot and a new libstdc++. Such a lightweight tool that unifies development environments across platforms: this is the perfect solution in my mind! Much better than Docker.

Not only does it solve the toolchain consistency problem; pixi also has many other practical features. First, it can manage Python dependencies as well (it integrates uv to manage PyPI dependencies). Since clice happens to use Python for some integration tests, we can use pixi to manage that too (before this, we used uv to install and manage Python; uv is quite good, but if one tool suffices, we don't want to install a second).

```toml
[feature.test.pypi-dependencies]
pytest = "*"
pytest-asyncio = ">=1.1.0"
pre-commit = ">=4.3.0"
```

In addition, pixi has a very flexible task runner based on deno_task_shell. I used to write local shell scripts to ease my development, but I never committed them to the repository because they couldn't be used on Windows. With pixi's tasks, I can easily define cross-platform convenience tasks (building, running unit tests, integration tests, and so on) that are also convenient for other developers.

```toml
[tasks.ci-cmake-configure]
args = ["build_type"]
cmd = [
  "cmake", "-B", "build", "-G", "Ninja",
  "-DCMAKE_BUILD_TYPE={{ build_type }}",
  "-DCMAKE_TOOLCHAIN_FILE=cmake/toolchain.cmake",
  "-DCLICE_ENABLE_TEST=ON",
  "-DCLICE_CI_ENVIRONMENT=ON",
]
```

Besides, pixi supports flexible environment combinations, letting you easily define different dependencies for different environments. In short, it fits our needs perfectly, so I immediately started using pixi to manage clice's development environment.
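The environment combinations mentioned above work by grouping optional dependencies into named features and composing features into environments. A minimal sketch of that mechanism (the `bench` feature and the `hyperfine` dependency here are hypothetical examples, not part of clice's actual configuration):

```toml
[feature.bench.dependencies]
# extra tooling wanted only when benchmarking
hyperfine = "*"

[environments]
# the default environment uses only [dependencies];
# `pixi shell -e bench` layers the bench feature on top
bench = ["bench"]
```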
After being able to easily ensure that the local environment and the CI environment are consistent, building pre-compiled binaries is no longer a difficult task. And so, finally, clice can run on operating systems with glibc as old as 2.17.\nSummary This article mainly discussed where the complexity of C++ builds comes from, and the series of toolchain version-related build problems encountered when trying to speed up CI builds with pre-compiled binaries. Finally, after continuous trial and error, we found that we can use pixi to lock down the toolchain version, thereby reducing complexity. The key to this workflow is to use pixi to create a reproducible build environment, while the actual building and package management are still handled by CMake/XMake. Now, developers can easily reproduce the CI environment, and we have already ensured the reliability of CI through countless tests. Thus, they can configure the environment very quickly for development, which also lowers the barrier for new developers to contribute.\nNow on Linux, we can compile without relying on any system toolchain, using only the toolchain installed by pixi, which can be said to be completely reproducible.\nBut it\u0026rsquo;s worth noting that, due to licensing issues, the Windows and macOS SDKs cannot be redistributed, so developers still need to have the relevant development tools installed on their computers. That is, to compile on these two platforms, developers must install and configure the system\u0026rsquo;s native build toolchain themselves (such as MSVC/Windows SDK on Windows or Xcode Command Line Tools on macOS). There is currently no perfect alternative for this problem. Perhaps when LLVM libc is officially released and mature, we can switch to the full LLVM toolchain, thereby completely eliminating the dependency on the operating system\u0026rsquo;s native SDK through toolchain bootstrapping. On the other hand, these two platforms have superior ABI stability and libc compatibility. 
Unlike the common glibc version dependency problems on Linux, even when building on the latest versions of Windows and macOS, the product can usually be made compatible with lower versions of the operating system with simple configuration.\nSo, is pixi a silver bullet? Obviously not. In fact, its isolation and reproducibility are not as good as solutions like Docker or Nix, as it\u0026rsquo;s just based on environment variables for some isolation. If someone hardcodes a system dependency in a build script or modifies the system configuration, pixi is of course powerless. But this is our trade-off between ease of use and reproducibility. Achieving such a high degree of cross-platform reproducibility at a low cost is already quite worthwhile.\nAnother point is that the topic of package managers, which many C++ developers care about, was only briefly mentioned in the article. Why? As mentioned earlier, there are already many C++ package management tools, but the usability of a package manager depends on whether there are enough reliable people to package things. The chaotic state of C++ toolchains and build systems dictates this outcome: a centralized repository will never be able to meet everyone\u0026rsquo;s diverse needs. However, for personal development, using a tool like XMake is already quite sufficient.\nMy personal view is that although a centralized package manager is not very realistic, defining some standards to reduce the cost of communication between different ecosystems is very feasible and has great value. For example, many developers may not even consider different toolchains when writing a build system, and they hardcode compilation options. When you switch to a different toolchain, it breaks. In this situation, the person packaging it can only patch the build system to solve the problem, which is very inefficient. 
If there were some kind of standardized toolchain here—the content would be simple, just the intersection of mainstream toolchains—and you wanted to add a feature, like enabling sanitizers, instead of directly adding compilation options in a CMake string, you would have a standardized interface that automatically selects the correct switch for different toolchains. Wouldn\u0026rsquo;t that be convenient?\nXMake actually has a toolchain abstraction and some set_policy options that can achieve the effect I mentioned above, although there aren\u0026rsquo;t many. But what I want to say is that this is actually a process of joint effort between upstream and downstream. Relying solely on the build system side to do abstraction can easily encounter some corner cases, and at that point, it requires the upstream to be able to fix the relevant toolchain errors in a timely manner.\nSimilarly, although package management cannot be centralized, can packages from different build systems be used conveniently with each other? It\u0026rsquo;s actually not that difficult. The main way C++ uses dependencies is still include + lib, which is simple. The key is to provide some extra metadata to ensure the usability of the package. There is currently such a standard, the Common Package Specification (CPS), but it is not widely recognized by the C++ community.\n","permalink":"https://www.ykiko.me/en/articles/1985940996270339378/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eC++ build issues have always been a hot topic, especially in various language wars, where they are often used as a negative example. Interestingly, most C++ programmers are often involved in maintaining existing systems, facing highly solidified, unchangeable build processes. 
The number of people who actually need to set up a project from scratch is in the minority.\u003c/p\u003e","title":"Building an Elegant C++ Cross-Platform Development and Build Workflow"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nWhy do we need AOT for CuTe DSL? CUTLASS C++ is a library for writing high-performance CUDA operators, known for its complexity and difficulty. To reduce the learning curve, NVIDIA introduced the Python-based CuTe DSL. Using Python instead of C++ templates for metaprogramming offers many benefits. First, users no longer have to struggle with the obscure template errors of C++, which is a major headache for C++ beginners; now they can focus on the code logic. Additionally, nvcc compilation is slow, and most of that time is spent in the compiler frontend, parsing C++ code. Especially for template-heavy libraries like CUTLASS, most of the time is spent processing template instantiations. Using CuTe DSL can bypass this issue. Compared to C++ code using CUTLASS, its compilation speed can be tens or even hundreds of times faster. Furthermore, operators and unit tests can now be written together in Python, which is much more convenient.\nUsing Python for prototyping is excellent, but when deploying inference services, we want dependencies to be as simple as possible. Having a large number of Python dependencies that could crash due to version issues is undesirable. It would be great if operators written with CuTe DSL could be compiled into a library for C++ code to call. This is precisely why we want to support AOT for CuTe DSL.\nExport Binary CuTe DSL in v4.3 added options to export the ptx and cubin for compiled kernels. 
Set the following environment variables:\nexport CUTE_DSL_KEEP_PTX=1 export CUTE_DSL_KEEP_CUBIN=1 export CUTE_DSL_DUMP_DIR=/tmp Alternatively, you can directly access the __ptx__ or __cubin__ attributes of the compiled kernel to get the corresponding values:\ncompiled_foo = cute.compile(foo, ...) print(f\u0026#34;PTX: {compiled_foo.__ptx__}\u0026#34;) with open(\u0026#34;foo.cubin\u0026#34;, \u0026#34;wb\u0026#34;) as f: f.write(compiled_foo.__cubin__) So now we have the cubin file for the operator. The remaining questions are:\n1. How to load cubin-formatted operators in C++ code. 2. How to embed the cubin file into C++ code and compile it into a library. 3. How to generate a .h header file for downstream users to call.\nCUDA Driver API For question 1, we can use the CUDA Driver API.\nCUresult CUDAAPI cuModuleLoadData(CUmodule *module, const void *image); CUresult CUDAAPI cuModuleGetFunction(CUfunction *hfunc, CUmodule hmod, const char *name); Load the cubin file with cuModuleLoadData and get the kernel function with cuModuleGetFunction.\nCUresult CUDAAPI cuLaunchKernel(CUfunction f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, CUstream hStream, void **kernelParams, void **extra); Then launch the kernel with cuLaunchKernel. It\u0026rsquo;s worth noting that kernel parameters are passed via void**, i.e., an array of void* where each element points to one argument\u0026rsquo;s storage, which means we need to know the kernel\u0026rsquo;s function signature to launch it.\nEmbed Binary For question 2, we need a way to embed binary files into C++ files, and then reference the kernel data directly in the C++ file. The discussion of how to embed binary files in C++ code deserves a separate article, so I won\u0026rsquo;t elaborate too much here. I\u0026rsquo;ll just mention the method I chose. 
Use objcopy to convert the binary file into an ELF-formatted file, and at the same time, it will insert several symbols for referencing the binary data, for example:\nobjcopy -I binary test.txt -O elf64-x86-64 -B i386:x86-64 test.o Then use nm test.o to view the symbols within:\n000000000000000d D _binary_test_txt_end 000000000000000d A _binary_test_txt_size 0000000000000000 D _binary_test_txt_start Note that the generated symbol names are related to the input file path; all / and . in the input path will be replaced with _. It is recommended to use relative paths to obtain controllable symbol names.\nYou just need to declare these symbols like _binary_test_txt_start in C++, and finally link the test.o file with the source file.\n/// main.cpp #include \u0026lt;iostream\u0026gt; #include \u0026lt;string_view\u0026gt; extern \u0026#34;C\u0026#34; { extern const char _binary_test_txt_start[]; extern const char _binary_test_txt_end[]; } int main() { std::cout \u0026lt;\u0026lt; std::string_view(_binary_test_txt_start, _binary_test_txt_end - _binary_test_txt_start) \u0026lt;\u0026lt; std::endl; return 0; } Compile and run with the following commands, and it will output the content of test.txt:\n$ g++ -std=c++17 main.cpp test.o -o main $ ./main Function Signature From the discussion above, it\u0026rsquo;s clear that whether exporting header files for kernel functions or passing kernel functions to cuLaunchKernel, we need to obtain the kernel\u0026rsquo;s function signature. However, in CuTe DSL v4.3, this cannot be done perfectly. 
Consider this simple example:\nimport torch import cutlass.cute as cute @cute.kernel def test_kernel(tensor): cute.printf(tensor) @cute.jit def test(tensor): kernel = test_kernel(tensor) kernel.launch((1, 1, 1), (1, 1, 1)) a = torch.zeros([4, 3, 5]).to(\u0026#34;cuda\u0026#34;) kernel = cute.compile(test, a) print(kernel.__ptx__) According to the official documentation, if torch.Tensor is used directly to instantiate the function for compilation, it will be treated as a dynamic layout by default. Inspecting the generated ptx reveals that the kernel\u0026rsquo;s signature is:\n.visible .entry kernel_cutlass_test_kernel_tensorptrf32_gmem_o_1_0( .param .align 8 .b8 kernel_cutlass_test_kernel_tensorptrf32_gmem_o_1_0_param_0[40] ) This is a 40-byte struct, where the first 8 bytes are clearly a float pointer. What about the remaining 32 bytes? Further analysis of the assembly shows that shape uses 3 u32s for parameters, followed by 4 bytes of padding. stride uses two u64s for passing, and since the stride of the last dimension is 1, it is omitted. Well\u0026hellip; this is actually a very simple case. For situations where dynamic and static layouts are mixed, I haven\u0026rsquo;t found a general method to automatically generate reliable signatures.\nBesides Tensor directly serving as a function signature, there are other issues. For example, in the official flash attention operator example, the operator\u0026rsquo;s function signature is like this:\n@cute.kernel def kernel( self, mQ: cute.Tensor, mK: cute.Tensor, mV: cute.Tensor, mO: cute.Tensor, softmax_scale_log2: cutlass.Float32, sQ_layout: cute.ComposedLayout, sKV_layout: cute.ComposedLayout, sO_layout: cute.ComposedLayout, gmem_tiled_copy_QKV: cute.TiledCopy, gmem_tiled_copy_O: cute.TiledCopy, tiled_mma: cute.TiledMma, SharedStorage: cutlass.Constexpr, ): Among these many function parameters, which ones are constants that will be preserved, and which ones are variables that will not? 
Unfortunately, these latter parameters are opaque on the Python side and cannot be determined because they are types bound from the C++ side via nanobind. If you debug and look at the kernel\u0026rsquo;s initial MLIR, you will find that parameters are indeed generated for these types, but they are deleted in subsequent passes, and these passes are also opaque. So I gave up on the idea of automatically generating function signatures for kernels.\nFinal Effect The workaround adopted is to manually specify the signature. For example, we can artificially restrict all operator signatures to use cutlass.Pointer and cute.Integer and then create the tensor inside the kernel. The effect is the same, it just manually reduces the complexity of the function signature. Or, one could directly hardcode the signature by looking at the generated ptx. Based on this assumption and the previous steps, we can ultimately achieve the following effect:\ncc = Compiler() t = from_dlpack(torch.randn(M, N, device=\u0026#34;cuda\u0026#34;, dtype=torch.bfloat16), assumed_align=16) cc.compile(naive_elementwise_add, [ (\u0026#34;nv_bfloat16*\u0026#34;, \u0026#34;a\u0026#34;), (\u0026#34;nv_bfloat16*\u0026#34;, \u0026#34;b\u0026#34;), (\u0026#34;nv_bfloat16*\u0026#34;, \u0026#34;o\u0026#34;)], t, t, t) t = from_dlpack(torch.randn(M, N, device=\u0026#34;cuda\u0026#34;, dtype=torch.float32), assumed_align=16) cc.compile(naive_elementwise_add, [ (\u0026#34;float*\u0026#34;, \u0026#34;a\u0026#34;), (\u0026#34;float*\u0026#34;, \u0026#34;b\u0026#34;), (\u0026#34;float*\u0026#34;, \u0026#34;o\u0026#34;)], t, t, t) cc.link() compile collects the cubin generated for the corresponding kernel and the function names within the cubin. link converts the cubin into .o files and then generates a C++ file containing symbols for all these binary arrays. It will generate a corresponding wrapper for each kernel, which calls cuLaunchKernel to execute the respective kernel. 
Finally, nvcc compiles them together into a dynamic library.\nThis will ultimately generate a header file and a dynamic library, for C++ programs to call.\nnamespace cutedsl_aot { struct LaunchParams { dim3 gridDim; dim3 blockDim; unsigned int sharedMemBytes = 0; cudaStream_t hStream = nullptr; }; void naive_elementwise_add(const LaunchParams\u0026amp; params, nv_bfloat16* a, nv_bfloat16* b, nv_bfloat16* o); void naive_elementwise_add(const LaunchParams\u0026amp; params, float* a, float* b, float* o); } // namespace cutedsl_aot This implementation is not very elegant, but from the user\u0026rsquo;s perspective, it seems to be the best we can do. According to unofficial sources, AOT for CuTe DSL is currently being supported. Let\u0026rsquo;s look forward to future updates!\n","permalink":"https://www.ykiko.me/en/articles/1971691994037334904/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"why-do-we-need-aot-for-cute-dsl\"\u003eWhy do we need AOT for CuTe DSL?\u003c/h2\u003e\n\u003cp\u003eCUTLASS C++ is a library for writing high-performance CUDA operators, known for its complexity and difficulty. To reduce the learning curve, NVIDIA introduced the Python-based \u003ca href=\"https://docs.nvidia.com/cutlass/latest/media/docs/pythonDSL/overview.html\"\u003eCuTe DSL\u003c/a\u003e. Using Python instead of C++ templates for metaprogramming offers many benefits. First, users no longer have to struggle with the obscure template errors of C++, which is a major headache for C++ beginners; now they can focus on the code logic. Additionally, \u003ccode\u003envcc\u003c/code\u003e compilation is slow, and most of that time is spent in the compiler frontend, parsing C++ code. Especially for template-heavy libraries like CUTLASS, most of the time is spent processing template instantiations. 
Using CuTe DSL can bypass this issue. Compared to C++ code using CUTLASS, its compilation speed can be tens or even hundreds of times faster. Furthermore, operators and unit tests can now be written together in Python, which is much more convenient.\u003c/p\u003e","title":"Support AOT for CuTe DSL"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nLast night, I bought the domain name clice.io and deployed clice\u0026rsquo;s documentation website on it. There\u0026rsquo;s an indescribable joy in my heart. On one hand, I really like the clice.io domain; it looks very refined and beautiful. I actually had a few other candidates, such as .dev, but in the end, I chose this one, which is relatively more expensive (500 per year) but looks better. On the other hand, it signifies that clice has entered a new phase.\nDoes this new phase mean clice is ready for preliminary use? Unfortunately, not yet. So when will clice be usable? This is indeed the most frequently asked question I\u0026rsquo;ve received recently. Seven months have passed since the first blog post about clice was published. According to the estimates at the time, clice should be highly usable by now, but things don\u0026rsquo;t always go as planned. So, I\u0026rsquo;m writing this article to update readers who are following clice\u0026rsquo;s progress on what\u0026rsquo;s happened in the past few months.\nTime Constraints First and foremost, the main reason is a lack of time. I\u0026rsquo;m a junior university student. In the first semester of my junior year, I didn\u0026rsquo;t have many classes, and I\u0026rsquo;m quite a homebody, so I spent a lot of time coding in my dorm when I had nothing else to do. This meant I could dedicate a significant amount of time to clice, roughly 6 to 10 hours a day. 
At that time, I also didn\u0026rsquo;t have anything else to do, so I could fully immerse myself in clice, spending my days reading and writing code without the need for additional context switching to handle different tasks. The winter break was largely the same.\nThen came the second semester of my junior year, from March to June. In March, I was mainly looking for a summer internship. My thinking at the time was quite simple: since I\u0026rsquo;m not from a computer science background (I studied chemistry), if I wanted to find a job related to computers in the future, a summer internship would be more convincing. Also, I had never interned before and wanted to experience what work was like. Fortunately, the internship search went quite smoothly. A senior\u0026rsquo;s team happened to be recruiting interns, working on C++ compilers and toolchains, which is a field I\u0026rsquo;m familiar with and very interested in, so I applied for that position. The interview process was very smooth, and the interview experience was excellent; there were no rote \u0026ldquo;eight-legged essay\u0026rdquo; questions, but rather questions based on my resume, and the algorithm problems were also simple (a topological sort in the first round, no algorithm problems in the second, and two easy problems in the third). The only minor hiccup was that they hoped I could start as soon as possible, rather than waiting until summer. I had quite a few classes scheduled for the second semester of my junior year, but considering it was a rare opportunity, we eventually negotiated for me to intern four days a week. I skipped classes I could, and most unskippable classes were in the morning, while our department generally only required arrival by 10:30 AM, so sometimes I\u0026rsquo;d go to work after my 8 AM class. The good news was that the company was very close to the school; a ten-minute subway ride from the school gate, so commuting didn\u0026rsquo;t take much time. 
Thus, from April to June, I balanced classes and the internship. Although I didn\u0026rsquo;t work on weekends, sometimes I had school matters to attend to, so very little time was left for clice. Moreover, constantly switching between different contexts was far less efficient than when I was focused on one thing.\nLooking back now, I was indeed very lucky. My first resume submission and first interview led to an offer, the location was ideal, and the work content was very interesting.\nNow it\u0026rsquo;s July, summer vacation has started, so school matters should be over, right? But then I was informed that I had to attend \u0026ldquo;mini-semester\u0026rdquo; courses from July 13th to August 4th, which are hard to miss. After much thought, I decided to resign and return to finish my studies, also to catch my breath; balancing classes and the internship had been exhausting. My last day was July 11th. The \u0026ldquo;mini-semester\u0026rdquo; essentially involved listening to lectures, so I brought my laptop and worked on clice during class.\nSo, after August, there should be nothing much left with school, right? That\u0026rsquo;s what I thought, but then August and September mean getting busy with autumn recruitment; it\u0026rsquo;s time to look for a job! Actually, I\u0026rsquo;ve submitted some resumes recently and taken some quantitative firm coding tests, which made me feel quite defeated. Every day is spent browsing Maimai (a Chinese professional social network), submitting resumes, practicing coding problems, and memorizing interview boilerplate, feeling incredibly anxious. Yesterday, I discussed it with some friends in a group chat and suddenly wondered why I was doing this. Why apply for positions I don\u0026rsquo;t want, and why force myself to memorize boilerplate and grind coding problems? It feels like self-inflicted misery, do I enjoy being a masochist? So, my strategy for autumn recruitment will probably be to go with the flow. 
I\u0026rsquo;ll still submit resumes, but whether I pass coding tests/interviews will be left to fate. I don\u0026rsquo;t want to memorize boilerplate or solve many hard problems anymore; it\u0026rsquo;s torturous. If a company/department insists on difficult algorithm questions or boilerplate, even if I got in, it would probably be a miserable experience. Finding a job also depends on luck and mutual selection, so I\u0026rsquo;d rather spend more time on clice!\nSince I\u0026rsquo;ve brought it up, I\u0026rsquo;ll also put in a plug for myself: if there are any interesting positions, please consider me! I\u0026rsquo;m mainly familiar with C++ toolchains and compilers, but I\u0026rsquo;m not necessarily limited to this direction. For other directions, if you don\u0026rsquo;t mind my lack of experience, I\u0026rsquo;d also consider them!\nOptimistic Estimates Besides personal time constraints, another factor is probably a common mistake programmers make: underestimating project progress. In \u0026ldquo;The Mythical Man-Month,\u0026rdquo; the author points out that we tend to use \u0026ldquo;man-months,\u0026rdquo; a seemingly linear unit, to measure nonlinear, complex software development work. The effort to implement a feature is not just simple coding; it also includes a large amount of hidden costs like debugging, testing, and integration with existing systems. I discovered a very serious flaw in my previous index design in practice, to the extent that this part of the code needs to be rewritten. The good news, however, is that our project has no delivery date, so we have ample time to find the best solution.\nDespite being busy and not having time to write specific code, I can still use my free time to reflect on the shortcomings of previous code designs, record them, and think about how to solve them later. 
These are things that can be done anytime.\nAfter several months of slow progress, clice is now basically usable, handling document events, highlighting, code completion, and other functions. However, it still requires a lot of testing and refinement before the first version can be released. I think I\u0026rsquo;ll only consider releasing the first version after I\u0026rsquo;ve personally switched from clangd to clice for a period of time.\nThere\u0026rsquo;s also the issue of multi-person collaboration. I\u0026rsquo;ve always been somewhat unconfident about it. For a complex project like clice, especially a few months ago when the code might still undergo large-scale changes, it wasn\u0026rsquo;t very suitable for collaboration. However, the good news is that after these past few months, it has reached a point where collaboration is relatively easy; the overall code framework has stabilized. Indeed, some group members have already deeply participated and are responsible for implementing some features, and everything is moving in a good direction! I aim to release the first version within the next two months and completely replace clangd before I graduate from university!\n","permalink":"https://www.ykiko.me/en/articles/1931855290430624907/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eLast night, I bought the domain name \u003ca href=\"https://clice.io/\"\u003eclice.io\u003c/a\u003e and deployed clice\u0026rsquo;s documentation website on it. There\u0026rsquo;s an indescribable joy in my heart. On one hand, I really like the \u003ccode\u003eclice.io\u003c/code\u003e domain; it looks very refined and beautiful. 
I actually had a few other candidates, such as \u003ccode\u003e.dev\u003c/code\u003e, but in the end, I chose this one, which is relatively more expensive (500 per year) but looks better. On the other hand, it signifies that clice has entered a new phase.\u003c/p\u003e","title":"Clice, how have you been lately?"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nAt the C++26 Sofia meeting, which just concluded yesterday, seven proposals related to Static Reflection:\nReflection for C++26 Function Parameter Reflection Annotations for Reflection Splicing a base class subobject Expansion Statements definestatic{string,object,array} Error Handling in Reflection All passed plenary and were officially incorporated into the C++26 standard. This is an exciting moment. In my opinion, static reflection is undoubtedly the most important new feature in C++ in 20 years. It completely changes the previous pattern of metaprogramming using templates, making meta programming code as easy to read, write, and use as ordinary code logic, rather than the template-based DSLs of the past.\nMore than a year ago, when P2996R1 was released, I wrote an article introducing this exciting proposal for static reflection. After such a long time, the content of the static reflection proposal itself has changed significantly, the content of the article above is outdated, and many new auxiliary proposals have been added. So I decided to write a new article to introduce static reflection and its auxiliary proposals.\nIf you want to try static reflection, there are two ways: one is through the Compiler Explorer online editor, just set the compiler to P2996 clang. The other is to compile the P2996 branch of clang and libc++ from https://github.com/bloomberg/clang-p2996/tree/p2996 yourself. Then, refer to the use libc++ page and use the newly compiled libc++ as the standard library during compilation. 
Remember to enable the C++26 standard.\nWhat is Static Reflection? First, what does reflection mean? This term, like many other idiomatic terms in computer science, does not have a detailed and precise definition. My reflection column discusses this issue in more detail; interested readers can read it themselves. The focus of this article is C++\u0026rsquo;s static reflection. Why emphasize \u0026ldquo;static\u0026rdquo;? Mainly because when we usually talk about reflection, we almost always refer to reflection in languages like Java, C#, and Python, and their implementations all involve type erasure and querying metadata at runtime. This approach, of course, has unavoidable runtime overhead, and this overhead clearly violates the C++ principle of zero-cost abstraction. To distinguish it from their reflection, the qualifier \u0026ldquo;static\u0026rdquo; is added, also indicating that C++\u0026rsquo;s reflection is completed at compile time.\nEverything as Value Static reflection introduces two new syntaxes. The reflection operator ^^ can map most name entities to std::meta::info:\nconstexpr std::meta::info rint = ^^int; std::meta::info is a new, special, consteval-only builtin type. It can only exist at compile time. You can think of it as a handle to this name entity within the compiler, and subsequent operations can be performed based on this opaque handle.\nSpecifically, ^^ supports the following four types of name entities:\n::: global namespace namespace-name: ordinary namespace type-id: type id-expression: most named things, such as variables, static member variables, fields, functions, templates, enums, etc. So, can this handle be converted back? Yes: the splicer [: :] converts a std::meta::info back into a name entity.\nFor example:\nconstexpr std::meta::info rint = ^^int; using int2 = [:rint:]; Using [:rint:] maps rint back to the int type. The same applies to other name entities; splicing their handles maps them back. 
Note that in some contexts that might cause ambiguity, typename or template keywords need to be added before [: :] to resolve the ambiguity.\nAmbiguous situations basically still involve dependent names. That is, when r is a template parameter, it\u0026rsquo;s impossible to directly determine whether [:r:] is an expression, a type, or a template, so manual disambiguation is required.\nIn summary, static reflection introduces two new operators: ^^ to get the handle of a name entity, and [: :] to map the handle back to the corresponding name entity.\nMeta Function As we all know, merely obtaining a handle is not very useful; the key lies in operations based on that handle. For example, if you get a file handle, you can read its content or close the file based on that handle. In static reflection, these operations on handles are meta functions. The \u0026lt;meta\u0026gt; header provides a very wide range of functions for operating on these handles. Some of the most commonly used meta functions are introduced below.\nReflection currently uses compile-time exceptions to handle errors encountered in meta functions.\nmembers namespace std::meta { consteval vector\u0026lt;info\u0026gt; members_of(info r, access_context ctx); consteval vector\u0026lt;info\u0026gt; bases_of(info type, access_context ctx); consteval vector\u0026lt;info\u0026gt; static_data_members_of(info type, access_context ctx); consteval vector\u0026lt;info\u0026gt; nonstatic_data_members_of(info type, access_context ctx); consteval vector\u0026lt;info\u0026gt; enumerators_of(info type_enum); consteval bool has_parent(info r); consteval info parent_of(info r); } A common requirement in serialization and deserialization is to get the members of a struct and then recursively serialize them. Before static reflection, we could only achieve this through various hacks, and it was not perfect. 
For example, reflect-cpp supports getting data members of aggregate classes under C++20, and magic-enum supports enum members with values in the range [-127, 128]. The implementation methods are very hacky and unfriendly to compilers, instantiating a large number of templates, leading to slower compilation, and also having many limitations.\nNow, with static reflection, we can easily use these meta functions to get members of namespaces or types, and not just data members; member functions and aliases can also be easily obtained. It also supports getting base class information, which was previously impossible. Reverse operations are also supported, using parent_of to get the parent of a member, which is the namespace, class, or function that defines this entity.\nstruct Point { int x; int y; }; int main() { Point p = {1, 2}; constexpr auto no_check = meta::access_context::unchecked(); constexpr auto rx = meta::nonstatic_data_members_of(^^Point, no_check)[0]; constexpr auto ry = meta::nonstatic_data_members_of(^^Point, no_check)[1]; p.[:rx:] = 3; p.[:ry:] = 4; std::println(\u0026#34;p: {}, {}\u0026#34;, p.x, p.y); } Output p: 3, 4, successfully accessing members via reflection!\nThe access_context parameter is used to control access permissions. It determines whether we can \u0026ldquo;see\u0026rdquo; private or protected members. unchecked() means full access, i.e., no access checks are performed. Besides unchecked, there is also current, which means using the access permissions of the current scope, and unprivileged, which can only access non-private members. 
The meta functions for getting members mentioned above will filter the returned results according to the access_context.\nidentifiers namespace std::meta { consteval bool has_identifier(info r); consteval string_view identifier_of(info r); consteval u8string_view u8identifier_of(info r); consteval string_view display_string_of(info r); consteval u8string_view u8display_string_of(info r); consteval source_location source_location_of(info r); } This feature is also something C++ programmers have long wished for: getting variable names, function names, and field names.\nconstexpr auto rx = meta::nonstatic_data_members_of(^^Point, no_check)[0]; constexpr auto ry = meta::nonstatic_data_members_of(^^Point, no_check)[1]; static_assert(meta::identifier_of(rx) == \u0026#34;x\u0026#34;); static_assert(meta::identifier_of(ry) == \u0026#34;y\u0026#34;); This makes it easy to serialize to formats like JSON that require field names. identifier_of generally only applies to entities with simple names and directly returns the unqualified name of the named entity. display_string_of, on the other hand, might be more inclined to return the fully qualified name, such as its namespace prefix, and can also be used to handle template specializations like vector\u0026lt;int\u0026gt;. source_location_of further breaks the limitation of C++20\u0026rsquo;s std::source_location::current() which can only get the current source location.\noffsets namespace std::meta { struct member_offset { ptrdiff_t bytes; ptrdiff_t bits; constexpr ptrdiff_t total_bits() const { return CHAR_BIT * bytes + bits; } auto operator\u0026lt;=\u0026gt;(const member_offset\u0026amp;) const = default; }; consteval member_offset offset_of(info r); consteval size_t size_of(info r); consteval size_t alignment_of(info r); consteval size_t bit_size_of(info r); } offset_of returns the offset information for a given field, consisting of two parts: bytes and bits. total_bits can be used to get the specific offset. 
This design primarily considers that fields might be bit-fields, so the offset is not necessarily just the number of bytes. size_of and alignment_of, as their names suggest, get the size and alignment. bit_size_of gets the size of a bit-field.\nWith this set of meta functions, there is no longer a need to use various hacky methods to get field offsets, such as bit_cast member pointers to get offsets based on ABI details. This is very useful in certain binary serialization scenarios.\ntype operations Next are operations related to types, which are key to simplifying template metaprogramming. Before this, since types could only be template parameters, we had to rely on ugly template DSLs to perform computations on types. A purely functional, variable-less, ugly DSL that uses template specialization for branching and template recursion for looping, which is why template metaprogramming has long been criticized. Now with static reflection, we can map types to values, operate on values, and simply write consteval functions, which are no different from normal code logic, except that the handle becomes std::meta::info.\nFirst, let\u0026rsquo;s talk about the equality of std::meta::info. Consider the following code:\nusing int1 = int; constexpr auto rint = ^^int; constexpr auto rint1 = ^^int1; Should rint and rint1 be equal here? Undoubtedly, they represent the same type, but as we said before, std::meta::info is a handle representing an internal compiler representation. Clearly, the compiler tracks type alias information separately, so rint and rint1 are actually handles for different name entities, meaning they are not equal. The complete rules for determining whether two std::meta::info are equal are omitted here; there are other cases to consider, and specific details can be found later in cppreference or the standard draft. 
For the examples in this article, understanding the alias example above is sufficient.\nnamespace std::meta { consteval auto type_of(info r) -\u0026gt; info; consteval auto dealias(info r) -\u0026gt; info; } You can use type_of to get the type of a typed entity like a struct field, and dealias to get the underlying entity of an alias, such as the original entity of a type alias or namespace alias. This process is recursive and will resolve all aliases.\nFor example:\nusing X = int; using Y = X; static_assert(^^int == dealias(^^Y)); The template-form traits originally defined in the \u0026lt;type_traits\u0026gt; header now have corresponding reflection versions in \u0026lt;meta\u0026gt;. The naming convention is to change the suffix from _v to _type, for example, is_same_v becomes is_same_type, and the _t suffix is simply removed.\nThere are too many functions in this part, so here are some as examples:\nnamespace std::meta { consteval info remove_const(info type); consteval info remove_volatile(info type); consteval info remove_cv(info type); consteval info add_const(info type); consteval info add_volatile(info type); consteval info add_cv(info type); consteval info remove_pointer(info type); consteval info add_pointer(info type); consteval info remove_cvref(info type); consteval info decay(info type); } So now it\u0026rsquo;s convenient to replace the previous type_traits versions with equivalent reflection versions. 
The code will be much easier to understand, and I will provide a few such examples at the end of the article.\ntemplate arguments In addition to the type operations mentioned above, we can now conveniently operate on templates:\nnamespace std::meta { consteval info template_of(info r); consteval vector\u0026lt;info\u0026gt; template_arguments_of(info r); template \u0026lt;reflection_range R = initializer_list\u0026lt;info\u0026gt;\u0026gt; consteval bool can_substitute(info templ, R\u0026amp;\u0026amp; arguments); template \u0026lt;reflection_range R = initializer_list\u0026lt;info\u0026gt;\u0026gt; consteval info substitute(info templ, R\u0026amp;\u0026amp; arguments); } Assuming r is a template specialization, template_of returns its template, and template_arguments_of returns its template arguments. substitute returns the reflection of the template specialization resulting from the given template and arguments (without triggering instantiation). With this set of functions, we no longer need to extract template arguments of template specializations through partial specialization; we can easily get the argument list.\nWe can also use them to write an is_specialization_of to determine if a type is a specialization of a certain template, which was previously impossible:\nconsteval bool is_specialization_of(info templ, info type) { return templ == template_of(dealias(type)); } Why was this impossible before? This is because template parameters can be types (typename), values (auto), or template template parameters (template), and you couldn\u0026rsquo;t enumerate all combinations of these three types of parameters. In that case, when writing is_specialization_of, the template signature to be checked would be fixed. 
For example, if it were \u0026lt;typename T, template\u0026lt;typename...\u0026gt; HKT\u0026gt;, then HKT could only be filled with type template parameters, and it wouldn\u0026rsquo;t be able to handle std::array.\nreflect value namespace std::meta { template\u0026lt;typename T\u0026gt; consteval auto reflect_constant(const T\u0026amp; expr) -\u0026gt; info; template\u0026lt;typename T\u0026gt; consteval auto reflect_object(T\u0026amp; expr) -\u0026gt; info; template\u0026lt;typename T\u0026gt; consteval auto reflect_function(T\u0026amp; expr) -\u0026gt; info; template\u0026lt;typename T\u0026gt; consteval auto extract(info) -\u0026gt; T; } These meta functions produce a reflection of the evaluated result of the provided expression. One of the most common use cases for this type of reflection is as an argument to std::meta::substitute to construct a template specialization.\nreflect_constant(expr) is equivalent to the following code:\ntemplate \u0026lt;auto P\u0026gt; struct C {}; Then we have:\nstatic_assert(reflect_constant(V) == template_arguments_of(^^C\u0026lt;V\u0026gt;)[0]); constexpr auto rarray5 = substitute(^^std::array, {^^int, std::meta::reflect_constant(5)}); static_assert(rarray5 == ^^std::array\u0026lt;int, 5\u0026gt;); reflect_object(expr) produces a reflection of the object referred to by expr. This is often used to obtain a reflection of a subobject, which can then be used as a non-type template parameter of a reference type.\ntemplate \u0026lt;int \u0026amp;\u0026gt; void fn(); int p[2]; constexpr auto r = substitute(^^fn, {std::meta::reflect_object(p[1])}); reflect_function(expr) produces a reflection of the function referred to by expr. 
It is very useful for reflecting the properties of a function when only a reference to the function is available.\nconsteval bool is_global_with_external_linkage(void(*fn)()) { std::meta::info rfn = std::meta::reflect_function(*fn); return (has_external_linkage(rfn) \u0026amp;\u0026amp; parent_of(rfn) == ^^::); } extract is the reverse operation of the reflect_xxx series mentioned above, and can be used to restore a value\u0026rsquo;s reflection to its corresponding C++ value.\nIf r is a reflection of a value, extract\u0026lt;ValueType\u0026gt;(r) returns that value. If r is a reflection of an object, extract\u0026lt;ObjectType\u0026amp;\u0026gt;(r) returns a reference to that object. If r is a reflection of a function, extract\u0026lt;FuncPtrType\u0026gt;(r) returns a pointer to that function. If r is a reflection of a non-static member, extract\u0026lt;MemberPtrType\u0026gt;(r) returns a member pointer. define aggregate namespace std::meta { struct data_member_options { struct name_type { template \u0026lt;typename T\u0026gt; requires constructible_from\u0026lt;u8string, T\u0026gt; consteval name_type(T \u0026amp;\u0026amp;); template \u0026lt;typename T\u0026gt; requires constructible_from\u0026lt;string, T\u0026gt; consteval name_type(T \u0026amp;\u0026amp;); }; optional\u0026lt;name_type\u0026gt; name; optional\u0026lt;int\u0026gt; alignment; optional\u0026lt;int\u0026gt; bit_width; bool no_unique_address = false; }; consteval auto data_member_spec(info type, data_member_options options) -\u0026gt; info; template \u0026lt;reflection_range R = initializer_list\u0026lt;info\u0026gt;\u0026gt; consteval auto define_aggregate(info type_class, R\u0026amp;\u0026amp;) -\u0026gt; info; } define_aggregate can be used to generate member definitions for an incomplete type, which is useful for implementing types with a variable number of members like tuple or variant. 
For example:\nunion U; consteval { define_aggregate(^^U, { data_member_spec(^^int), data_member_spec(^^char), data_member_spec(^^double), }); } This is equivalent to:\nunion U { int _0; char _1; double _2; }; This makes it easy to implement a variant type without any template recursion instantiation.\nother functions In addition to the functions listed above, there are many more functions for querying certain properties of r, which are mostly self-explanatory. Only a few are listed:\nconsteval auto is_public(info r) -\u0026gt; bool; consteval auto is_protected(info r) -\u0026gt; bool; consteval auto is_private(info r) -\u0026gt; bool; consteval auto is_virtual(info r) -\u0026gt; bool; consteval auto is_pure_virtual(info r) -\u0026gt; bool; consteval auto is_override(info r) -\u0026gt; bool; consteval auto is_final(info r) -\u0026gt; bool; consteval auto is_deleted(info r) -\u0026gt; bool; consteval auto is_defaulted(info r) -\u0026gt; bool; consteval auto is_explicit(info r) -\u0026gt; bool; consteval auto is_noexcept(info r) -\u0026gt; bool; consteval auto is_bit_field(info r) -\u0026gt; bool; consteval auto is_enumerator(info r) -\u0026gt; bool; consteval auto is_const(info r) -\u0026gt; bool; consteval auto is_volatile(info r) -\u0026gt; bool; consteval auto is_mutable_member(info r) -\u0026gt; bool; consteval auto is_lvalue_reference_qualified(info r) -\u0026gt; bool; consteval auto is_rvalue_reference_qualified(info r) -\u0026gt; bool; consteval auto has_static_storage_duration(info r) -\u0026gt; bool; consteval auto has_thread_storage_duration(info r) -\u0026gt; bool; consteval auto has_automatic_storage_duration(info r) -\u0026gt; bool; consteval auto has_internal_linkage(info r) -\u0026gt; bool; consteval auto has_module_linkage(info r) -\u0026gt; bool; consteval auto has_external_linkage(info r) -\u0026gt; bool; consteval auto has_linkage(info r) -\u0026gt; bool; consteval auto is_class_member(info r) -\u0026gt; bool; consteval auto 
is_namespace_member(info r) -\u0026gt; bool; consteval auto is_nonstatic_data_member(info r) -\u0026gt; bool; consteval auto is_static_member(info r) -\u0026gt; bool; consteval auto is_base(info r) -\u0026gt; bool; consteval auto is_data_member_spec(info r) -\u0026gt; bool; consteval auto is_namespace(info r) -\u0026gt; bool; consteval auto is_function(info r) -\u0026gt; bool; consteval auto is_variable(info r) -\u0026gt; bool; consteval auto is_type(info r) -\u0026gt; bool; consteval auto is_type_alias(info r) -\u0026gt; bool; consteval auto is_namespace_alias(info r) -\u0026gt; bool; consteval auto is_complete_type(info r) -\u0026gt; bool; consteval auto is_enumerable_type(info r) -\u0026gt; bool; consteval auto is_template(info r) -\u0026gt; bool; consteval auto is_function_template(info r) -\u0026gt; bool; consteval auto is_variable_template(info r) -\u0026gt; bool; consteval auto is_class_template(info r) -\u0026gt; bool; consteval auto is_alias_template(info r) -\u0026gt; bool; consteval auto is_conversion_function_template(info r) -\u0026gt; bool; consteval auto is_operator_function_template(info r) -\u0026gt; bool; consteval auto is_literal_operator_template(info r) -\u0026gt; bool; consteval auto is_constructor_template(info r) -\u0026gt; bool; consteval auto is_concept(info r) -\u0026gt; bool; consteval auto is_structured_binding(info r) -\u0026gt; bool; consteval auto is_value(info r) -\u0026gt; bool; consteval auto is_object(info r) -\u0026gt; bool; consteval auto has_template_arguments(info r) -\u0026gt; bool; consteval auto has_default_member_initializer(info r) -\u0026gt; bool; consteval auto is_special_member_function(info r) -\u0026gt; bool; consteval auto is_conversion_function(info r) -\u0026gt; bool; consteval auto is_operator_function(info r) -\u0026gt; bool; consteval auto is_literal_operator(info r) -\u0026gt; bool; consteval auto is_constructor(info r) -\u0026gt; bool; consteval auto is_default_constructor(info r) -\u0026gt; bool; consteval 
auto is_copy_constructor(info r) -\u0026gt; bool; consteval auto is_move_constructor(info r) -\u0026gt; bool; consteval auto is_assignment(info r) -\u0026gt; bool; consteval auto is_copy_assignment(info r) -\u0026gt; bool; consteval auto is_move_assignment(info r) -\u0026gt; bool; consteval auto is_destructor(info r) -\u0026gt; bool; consteval auto is_user_provided(info r) -\u0026gt; bool; consteval auto is_user_declared(info r) -\u0026gt; bool; As you can see, a vast amount of information can be queried, including storage class and linkage, and even information like user_declared and user_provided.\nFunction Reflection The main reflection proposal discussed above did not cover function parameter reflection, meaning you couldn\u0026rsquo;t get information like injected function parameter names. However, this information is very useful in certain scenarios, such as when binding C++ functions to Python using pybind11. P3096R12 introduced the following meta functions to allow reflection of function parameters:\nnamespace std::meta { consteval vector\u0026lt;info\u0026gt; parameters_of(info r); consteval info variable_of(info r); consteval info return_type_of(info r); } If r is a reflection of a function or function type, then return_type_of returns the reflection of its return type, and parameters_of returns the reflection of its function parameters. For example:\nvoid foo(int x, float y); constexpr auto param0 = meta::parameters_of(^^foo)[0]; static_assert(identifier_of(param0) == \u0026#34;x\u0026#34;); static_assert(type_of(param0) == ^^int); constexpr auto param1 = meta::parameters_of(^^foo)[1]; static_assert(identifier_of(param1) == \u0026#34;y\u0026#34;); static_assert(type_of(param1) == ^^float); static_assert(return_type_of(^^foo) == ^^void); Since we can already get parameter names and types, what is variable_of used for? 
variable_of can only be used inside the reflected function, to get the reflection of the variable that corresponds to a function parameter in the function\u0026rsquo;s definition. For example:\nvoid foo(const int x, float y) { constexpr auto param0 = meta::parameters_of(^^foo)[0]; static_assert(type_of(param0) == ^^int); static_assert(param0 != ^^x); constexpr auto var0 = meta::variable_of(param0); static_assert(type_of(var0) == ^^const int); static_assert(var0 == ^^x); } This example shows the difference between the two. In function types, C++ implicitly ignores top-level const on parameters; for example, decltype(foo) is void(int, float), so the const can never be observed from outside the function. parameters_of reflects the function\u0026rsquo;s interface, observing the function from the outside, and behaves accordingly. variable_of, on the other hand, reflects the function\u0026rsquo;s definition, observing the function from the inside. Inside foo, decltype(x) is const int, with the const preserved, and variable_of behaves the same way.\nThere are other subtle differences. For example, across multiple declarations of the same function, a parameter\u0026rsquo;s name may differ:\nvoid foo(int x); void foo(int y); In this case, identifier_of(parameter) fails to evaluate, since it cannot know which of the candidate names to choose. identifier_of(variable_of(parameter)), however, does not fail: it returns the name of the variable as declared in the function\u0026rsquo;s definition.\nnamespace std::meta { consteval bool is_function_parameter(info r); consteval bool is_explicit_object_parameter(info r); consteval bool has_ellipsis_parameter(info r); consteval bool has_default_argument(info r); } The remaining functions query properties of function parameters, and their names are self-explanatory:\nis_function_parameter: Determines if a reflection is a function parameter reflection. 
is_explicit_object_parameter: Determines if a function parameter reflection is an explicit object parameter newly added in C++23. has_ellipsis_parameter: Determines if a function or function type contains ..., i.e., C-style variadic arguments, such as C\u0026rsquo;s printf(const char*, ...). has_default_argument: Checks if a parameter has a default value. Annotations The purpose of metaprogramming is to write generic code, such as automatically generating serialization code logic for a certain type, so that serialization can be done with a single line of code, for example:\nstruct Point { int x; int y; }; Point p = {1, 2}; auto data = json::serialize(p); With static reflection, json::serialize can traverse the fields of Point and automatically generate serialization logic, thus completing serialization with a single line of code. We no longer need to write repetitive, tedious boilerplate serialization code ourselves. Generality is good, but sometimes we also want some customization capabilities.\nStill using the JSON serialization example above, suppose the JSON field name we receive from the server is \u0026quot;first-name\u0026quot;, but C++ identifiers cannot contain -, so we might name the member first_name. It would be great if we could handle it specially during serialization, renaming the first_name member to \u0026quot;first-name\u0026quot;.\nIn other languages, metadata can be attached via attribute or annotation, and then read in the code. C++ also added attribute, with the syntax [[...]], such as [[nodiscard]]. However, its primary design intent is to provide additional information to the compiler, not to allow users to attach and retrieve additional metadata.\nTo solve this problem, P3394R4 (Annotations for Reflection) proposes the introduction of reflectable annotations for C++26. Its syntax is very intuitive, using [[=...]] to add an annotation to an entity. 
Any constant expression that can be a template argument can serve as the content of an annotation.\nFor example:\nstruct [[=\u0026#34;A simple point struct\u0026#34;]] Point { [[=serde::rename(\u0026#34;point_x\u0026#34;)]] int x; [[=serde::rename(\u0026#34;point_y\u0026#34;)]] int y; }; The proposal additionally adds these three functions for interacting with annotations:\nnamespace std::meta { consteval bool is_annotation(info); consteval vector\u0026lt;info\u0026gt; annotations_of(info item); consteval vector\u0026lt;info\u0026gt; annotations_of_with_type(info item, info type); } is_annotation determines whether a reflection is an annotation reflection. annotations_of gets reflections of all annotations on a given entity, and annotations_of_with_type gets reflections of all annotations of a given type on a given entity. Once an annotation is obtained, extract can be used to unwrap its value.\nFor example:\nstruct Info { int a; int b; }; [[=Info(1, 2)]] int x = 1; constexpr auto rs = annotations_of(^^x)[0]; constexpr auto info = std::meta::extract\u0026lt;Info\u0026gt;(rs); static_assert(info.a == 1 \u0026amp;\u0026amp; info.b == 2); This way, a serialization library can pre-define some types, such as serde::rename in the earlier example, and then check whether the user\u0026rsquo;s fields carry these annotations to perform special processing. This preserves overall generality while allowing local customization.\nExpansion Statement Traditional range-for loops iterate over runtime sequences, while in metaprogramming the need to iterate over compile-time sequences is increasingly common. Take iterating over a tuple as an example: the biggest difference between such compile-time sequences and runtime sequences is that the element types may differ.\nBefore C++17, we could only accomplish such iteration through template recursion. 
C++17\u0026rsquo;s addition of fold expressions slightly alleviated this situation, but still required writing a lot of complex template code to achieve this goal. Given how common iterating over compile-time sequences is, P1306R5 (Expansion Statements) introduced a new template for syntax to solve this problem.\nNow you can easily and intuitively iterate over a tuple. In effect, it\u0026rsquo;s equivalent to compile-time loop unrolling, instantiating the loop body once for each element.\nvoid print_all(std::tuple\u0026lt;int, char\u0026gt; xs) { template for (auto elem : xs) { std::println(\u0026#34;{}\u0026#34;, elem); } } The precise syntax definition is as follows:\ntemplate for (init-statement(opt) for-range-declaration : expansion-initializer) compound-statement init-statement(opt): Optional preceding initialization statement. for-range-declaration: Declaration of the loop variable. expansion-initializer: The sequence to iterate over. template for supports three different types of sequences, in descending order of precedence:\nExpression List: { expression-list }, iterates over each element in the list. template for (auto elem : {1, \u0026#34;hello\u0026#34;, true}) { ... } Pack expansion is also supported, and you can easily add content to parameter packs:\nvoid foo(auto\u0026amp;\u0026amp; ...args) { template for (auto elem : {args...}) { ... } template for (auto elem : {0, args..., 1}) { ... } } Constant Range: Requires the range length to be compile-time determined.\nvoid foo() { constexpr static std::array arr = {1, 2, 3}; constexpr static std::span\u0026lt;const int\u0026gt; view = arr; template for (constexpr auto elem : view) { ... } } Tuple-like Destructuring: If neither of the above two conditions is met, the compiler will try to treat expansion-initializer as a tuple-like entity and destructure it (like structured binding auto [a, b] = ...).\nstd::tuple t(1, \u0026#34;hello\u0026#34;, true); template for (auto elem : t) { ... 
} The loop variable declaration has an optional constexpr. If marked, it requires every element in the loop to be constexpr.\ntemplate for also supports continue and break statements, which can skip the remaining uninstantiated code.\ndefine static array Okay, you\u0026rsquo;ve learned template for, so you\u0026rsquo;re eager to write a function that can print any struct for debugging:\nvoid print_struct(auto\u0026amp;\u0026amp; value) { constexpr auto info = meta::remove_cvref(^^decltype(value)); constexpr auto no_check = meta::access_context::unchecked(); template for (constexpr auto e : meta::nonstatic_data_members_of(info, no_check)) { constexpr auto type = type_of(e); auto\u0026amp;\u0026amp; member = value.[:e:]; if constexpr (is_class_type(type)) { print_struct(member); } else { std::println(\u0026#34;{} {}\u0026#34;, identifier_of(e), member); } } } You find an error, saying that the initialization expression of template for is not a constant expression. Why is this? This is a long story. You\u0026rsquo;ll find that the return value of nonstatic_data_members_of is actually a vector. We said earlier that C++ reflection is done at compile time. Is vector even usable at compile time? Indeed it is; C++20 allows dynamic memory allocation at compile time, so you can use vector in constexpr/consteval functions to handle intermediate states. However, the limitation is that memory allocated at compile time must be deallocated within the same compile-time evaluation context. If there is unreleased memory in a single compile-time evaluation, it will lead to a compilation error. This is understandable, after all, memory allocated at compile time has no meaning if it persists into runtime, right? 
And each top-level constexpr variable, template parameter, etc., including the initialization expression of template for, is treated as a separate constant evaluation.\nSo the error above is easy to understand: the template for initializer is its own constant evaluation, and the returned vector leaves compile-time memory unreleased, hence the error. How to solve it? P3491R3 (define_static_{string,object,array}) introduces a set of functions as a temporary solution:\nnamespace std { template \u0026lt;ranges::input_range R\u0026gt; consteval const ranges::range_value_t\u0026lt;R\u0026gt;* define_static_string(R\u0026amp;\u0026amp; r); template \u0026lt;ranges::input_range R\u0026gt; consteval span\u0026lt;const ranges::range_value_t\u0026lt;R\u0026gt;\u0026gt; define_static_array(R\u0026amp;\u0026amp; r); template \u0026lt;class T\u0026gt; consteval const remove_cvref_t\u0026lt;T\u0026gt;* define_static_object(T\u0026amp;\u0026amp; r); } They promote compile-time allocated memory to static storage duration, i.e. the same storage duration as global variables, and return a pointer or span referring to that static storage, thereby solving the problem. So the code above only needs std::define_static_array to convert the vector to a span when getting the members:\nvoid print_struct(auto\u0026amp;\u0026amp; value) { constexpr auto info = meta::remove_cvref(^^decltype(value)); constexpr auto no_check = meta::access_context::unchecked(); constexpr auto members = std::define_static_array(meta::nonstatic_data_members_of(info, no_check)); template for (constexpr auto e : members) { constexpr auto type = type_of(e); auto\u0026amp;\u0026amp; member = value.[:e:]; if constexpr (is_class_type(type)) { print_struct(member); } else { std::println(\u0026#34;{} {}\u0026#34;, identifier_of(e), member); } } } Every vector and template for site needs this dance, which seems a bit redundant. 
However, there\u0026rsquo;s no other way; this is actually just a temporary workaround. The truly complete solution is persistent constexpr allocation, which can automatically elevate unreleased compile-time content to static storage, but for various reasons, it hasn\u0026rsquo;t progressed. Another article could be written about it, but I won\u0026rsquo;t go into it further here. Interested readers can read: The History of constexpr in C++! (Part Two).\nExample Finally, let\u0026rsquo;s write a simple to_string function as a conclusion:\n#include \u0026lt;meta\u0026gt; #include \u0026lt;print\u0026gt; #include \u0026lt;string\u0026gt; #include \u0026lt;vector\u0026gt; namespace meta = std::meta; namespace print_utility { struct skip_t {}; constexpr inline static skip_t skip; struct rename_t { const char* name; }; consteval rename_t rename(std::string_view name) { return rename_t(std::define_static_string(name)); } } // namespace print_utility /// annotations_of =\u0026gt; annotations_of_with_type consteval std::optional\u0026lt;std::meta::info\u0026gt; get_annotation(std::meta::info entity, std::meta::info type) { auto annotations = meta::annotations_of_with_type(entity, type); if (annotations.empty()) { return {}; } else if (annotations.size() == 1) { return annotations.front(); } else { throw \u0026#34;too many annotations!\u0026#34;; } } consteval auto fields_of(std::meta::info type) { return std::define_static_array( meta::nonstatic_data_members_of(type, meta::access_context::unchecked())); } template \u0026lt;typename T\u0026gt; auto to_string(const T\u0026amp; value) -\u0026gt; std::string { constexpr auto type = meta::remove_cvref(^^T); if constexpr (!meta::is_class_type(type)) { return std::format(\u0026#34;{}\u0026#34;, value); } else if constexpr (meta::is_same_type(type, ^^std::string)) { return value; } else { std::string result; result += meta::identifier_of(type); result += \u0026#34; { \u0026#34;; bool first = true; template for (constexpr auto 
member : fields_of(type)) { if constexpr (get_annotation(member, ^^print_utility::skip_t)){ continue; } if (!first) { result += \u0026#34;, \u0026#34;; } first = false; std::string_view field_name = meta::identifier_of(member); constexpr auto rename = get_annotation(member, ^^print_utility::rename_t); if constexpr (rename) { constexpr auto annotation = *rename; field_name = meta::extract\u0026lt;print_utility::rename_t\u0026gt;(annotation).name; } result += std::format(\u0026#34;{}: {}\u0026#34;, field_name, to_string(value.[:member:])); } result += \u0026#34; }\u0026#34;; return result; } } Our simple to_string function supports two types of annotations: skip to skip printing a field, and rename to rename that field. get_annotation is used to determine if a given entity has exactly one annotation of a given type; if so, it returns that annotation, otherwise it returns empty or throws an error. The processing logic in the to_string function is also straightforward: if value is a fundamental type or string, it simply calls format and returns the result. Otherwise, it recursively converts its fields, first checking if the field has the skip annotation, and if so, skipping it. If not, it checks if it has rename, and if so, uses the rename\u0026rsquo;s name, otherwise uses the field name.\nAttempt to use:\nstruct User { int id; std::string username; [[= print_utility::skip]] std::string password_hash; }; struct Order { int order_id; [[= print_utility::rename(\u0026#34;buyer\u0026#34;)]] User user_info; }; int main() { User u = {101, \u0026#34;Alice\u0026#34;, \u0026#34;abcdefg\u0026#34;}; Order o = {20240621, u}; std::println(\u0026#34;{}\u0026#34;, to_string(u)); std::println(\u0026#34;{}\u0026#34;, to_string(o)); } Output:\nUser { id: 101, username: Alice } Order { order_id: 20240621, buyer: User { id: 101, username: Alice } } As expected! The code is available on Compiler Explorer.\nConclusion This introductory article on static reflection concludes here. 
I have tried to cover some of the more important core features of reflection and provided suitable examples. Static reflection is concise, powerful, and easy to understand. It also symbolizes a significant milestone in C++\u0026rsquo;s decades-long evolution of constexpr. In closing, let me quote Herb Sutter to end this article:\nUntil today, perhaps the most important single feature vote in C++ history was in July 2007 in Toronto, which decided to incorporate Bjarne Stroustrup and Gabriel Dos Reis’s first “constexpr” proposal into the C++11 draft. Looking back, we can see what a huge structural shift that brought to C++.\nI firmly believe that many years from now, when we look back at today, the day this reflection feature was first adopted into standard C++, we will view it as a pivotal date in the language’s history. Reflection will fundamentally improve how we write C++ code, and its extension to the language’s expressiveness will exceed any feature we’ve seen in at least 20 years, and will greatly simplify real-world C++ toolchains and environments. Even with just the partial reflection capabilities we have today, we can already reflect C++ types and use that information, plus ordinary std::cout, to generate arbitrary additional C++ source code, which is based on reflection information and can be compiled and linked into the same program at build time (in the future we will also get token injection capabilities, allowing us to directly generate C++ source code within the same source file). But we can actually generate anything: arbitrary binary metadata, such as .WINMD files; arbitrary other language code, such as automatically generating Python or JS bindings to wrap C++ types. All of this can be achieved with portable standard C++.\nThis is a very big deal. Listen, everyone knows I’m biased towards C++, but I don’t like hyperbole, and I’ve never said anything like this. 
Today is truly unique: the transformative power of reflection is greater than the sum of all other 10 major features we’ve ever voted into the standard. Over the next decade (and beyond), it will dominate C++ development, and we will refine this feature by adding more capabilities (just as we’ve added capabilities to constexpr over time to make it complete), and learn how to use it in our programs and build environments.\n","permalink":"https://www.ykiko.me/en/articles/1919923607997518115/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAt the C++26 Sofia meeting, which just concluded yesterday, seven proposals related to \u003cstrong\u003eStatic Reflection\u003c/strong\u003e:\u003c/p\u003e","title":"Reflection for C++26!!!"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nAfter the article about clice was published, the response I received far exceeded my expectations, and many friends expressed a desire to participate in the development. While this enthusiasm is good, the barrier to entry is not low, mainly due to the difficulty of interacting with clang\u0026rsquo;s API. What makes it difficult? On the one hand, there is relatively little information about clang on the internet, whether in Chinese or English communities (this is expected, as there is very little demand for it, so naturally few people discuss it). On the other hand, due to the complexity of the C++ language itself, many details require a deeper understanding of the language before they can be grasped, and connecting theory with implementation is not easy.\nTherefore, I decided to write two articles about clang. 
The first article will introduce how to write code generation or static analysis tools based on clang, along with a comprehensive introduction to the clang AST. The second article will delve deeper into clang\u0026rsquo;s architecture as a compiler, specifically the implementation details of various processes, and how clice uses clang. I hope these two articles will remove obstacles for readers who wish to participate in clice\u0026rsquo;s development. If you are interested in contributing to clang or writing clang tools, you can also continue reading.\nDevelopment environment The first step is to set up the development environment. clang is a subproject of LLVM, and the LLVM project\u0026rsquo;s build system is written in CMake and is relatively outdated, only allowing inclusion via find_package. This means we need to compile and install LLVM and clang beforehand. One way is to download pre-compiled binaries; LLVM\u0026rsquo;s releases provide pre-built packages for various platforms.\nHowever, I highly recommend building a Debug version from source for easier debugging and development. The specific build instructions can be found in the official LLVM documentation, which I won\u0026rsquo;t elaborate on here.\nFrom the documentation, it\u0026rsquo;s clear that there are many build parameters, and this is where it\u0026rsquo;s easy to run into issues. I\u0026rsquo;ll highlight some of the more important parameters here:\nLLVM_ENABLE_PROJECTS is used to specify which subprojects, besides LLVM itself, should be built. Here, we only need clang. LLVM_TARGETS_TO_BUILD is used to specify the target platforms supported by the compiler. The more you enable, the longer LLVM will take to compile. For development, X86 is usually sufficient. LLVM_BUILD_LLVM_DYLIB is used to specify whether LLVM should be built as a single dynamic library (the sample command below uses the related BUILD_SHARED_LIBS option, which instead builds every component as its own shared library). Building dynamic libraries is highly recommended: it makes linking very fast, improving the development experience.
Furthermore, if this option is not enabled, the binaries built in debug mode will be very large, potentially hundreds of GBs. Unfortunately, this option is currently not supported on Windows with the MSVC target, and related work is still in progress; see llvm-windows-support. Therefore, it is recommended to develop on Linux or Windows using MinGW or WSL. LLVM_USE_SANITIZER is used to specify which sanitizer to enable. This option is also largely unavailable with the MSVC target. Also, absolutely do not use GNU\u0026rsquo;s ld for linking; its memory consumption is very high, and it can easily run out of memory during concurrent linking. As a reference, my build command on Linux is as follows:\ncmake \\ -G Ninja -S ./llvm \\ -B build-debug \\ -DLLVM_USE_LINKER=lld \\ -DCMAKE_C_COMPILER=clang \\ -DCMAKE_CXX_COMPILER=clang++ \\ -DCMAKE_BUILD_TYPE=Debug \\ -DBUILD_SHARED_LIBS=ON \\ -DLLVM_TARGETS_TO_BUILD=X86 \\ -DLLVM_USE_SANITIZER=Address \\ -DLLVM_ENABLE_PROJECTS=\u0026#34;clang\u0026#34; \\ -DCMAKE_INSTALL_PREFIX=./build-debug-install If the build is successful, the final binaries should be located in the build-debug-install directory of the LLVM project. 
Then, create a new directory for writing your tool\u0026rsquo;s code, and in that directory, create a CMakeLists.txt file with the following content:\ncmake_minimum_required(VERSION 3.10) project(clang-tutorial VERSION 1.0) set(CMAKE_CXX_STANDARD 17) set(CMAKE_CXX_STANDARD_REQUIRED True) set(CMAKE_EXPORT_COMPILE_COMMANDS ON) set(CMAKE_PREFIX_PATH \u0026#34;${LLVM_INSTALL_PATH}\u0026#34;) find_package(LLVM REQUIRED CONFIG) find_package(Clang REQUIRED CONFIG) message(STATUS \u0026#34;Found LLVM ${LLVM_INCLUDE_DIRS}\u0026#34;) add_executable(tooling main.cpp) set(CMAKE_CXX_FLAGS \u0026#34;${CMAKE_CXX_FLAGS} -fno-rtti -fno-exceptions -fsanitize=address\u0026#34;) set(CMAKE_EXE_LINKER_FLAGS \u0026#34;${CMAKE_EXE_LINKER_FLAGS} -fsanitize=address\u0026#34;) target_include_directories(tooling PRIVATE ${LLVM_INCLUDE_DIRS}) target_link_libraries(tooling PRIVATE LLVMSupport clangAST clangBasic clangLex clangFrontend clangSerialization clangTooling ) The content is simple: it includes LLVM and clang header and library files. Note that LLVM_INSTALL_PATH is the path where you just installed LLVM. 
You can check if the path output by the message function is as expected.\nNext, create a main.cpp file with the following content:\n#include \u0026#34;clang/Tooling/Tooling.h\u0026#34; class ToolingASTConsumer : public clang::ASTConsumer { public: void HandleTranslationUnit(clang::ASTContext\u0026amp; context) override { context.getTranslationUnitDecl()-\u0026gt;dump(); } }; class ToolingAction : public clang::ASTFrontendAction { public: std::unique_ptr\u0026lt;clang::ASTConsumer\u0026gt; CreateASTConsumer(clang::CompilerInstance\u0026amp; instance, llvm::StringRef file) override { return std::make_unique\u0026lt;ToolingASTConsumer\u0026gt;(); } }; int main() { const char* content = R\u0026#34;( int main() { return 0; } )\u0026#34;; bool success = clang::tooling::runToolOnCode(std::make_unique\u0026lt;ToolingAction\u0026gt;(), content, \u0026#34;main.cpp\u0026#34;); return !success; } Compile and run. The expected output should be:\nTranslationUnitDecl 0x7dabb8df5508 \u0026lt;\u0026lt;invalid sloc\u0026gt;\u0026gt; \u0026lt;invalid sloc\u0026gt; |-TypedefDecl 0x7dabb8e4e238 \u0026lt;\u0026lt;invalid sloc\u0026gt;\u0026gt; \u0026lt;invalid sloc\u0026gt; implicit __int128_t \u0026#39;__int128\u0026#39; | `-BuiltinType 0x7dabb8df5d90 \u0026#39;__int128\u0026#39; |-TypedefDecl 0x7dabb8e4e2b0 \u0026lt;\u0026lt;invalid sloc\u0026gt;\u0026gt; \u0026lt;invalid sloc\u0026gt; implicit __uint128_t \u0026#39;unsigned __int128\u0026#39; | `-BuiltinType 0x7dabb8df5dc0 \u0026#39;unsigned __int128\u0026#39; |-TypedefDecl 0x7dabb8e4e698 \u0026lt;\u0026lt;invalid sloc\u0026gt;\u0026gt; \u0026lt;invalid sloc\u0026gt; implicit __NSConstantString \u0026#39;__NSConstantString_tag\u0026#39; | `-RecordType 0x7dabb8e4e3b0 \u0026#39;__NSConstantString_tag\u0026#39; | `-CXXRecord 0x7dabb8e4e310 \u0026#39;__NSConstantString_tag\u0026#39; |-TypedefDecl 0x7dabb8df61e0 \u0026lt;\u0026lt;invalid sloc\u0026gt;\u0026gt; \u0026lt;invalid sloc\u0026gt; implicit __builtin_ms_va_list \u0026#39;char 
*\u0026#39; | `-PointerType 0x7dabb8df6190 \u0026#39;char *\u0026#39; | `-BuiltinType 0x7dabb8df55e0 \u0026#39;char\u0026#39; |-TypedefDecl 0x7dabb8e4e1c0 \u0026lt;\u0026lt;invalid sloc\u0026gt;\u0026gt; \u0026lt;invalid sloc\u0026gt; implicit __builtin_va_list \u0026#39;__va_list_tag[1]\u0026#39; | `-ConstantArrayType 0x7dabb8e4e160 \u0026#39;__va_list_tag[1]\u0026#39; 1 | `-RecordType 0x7dabb8df62e0 \u0026#39;__va_list_tag\u0026#39; | `-CXXRecord 0x7dabb8df6240 \u0026#39;__va_list_tag\u0026#39; `-FunctionDecl 0x7dabb8e4e768 \u0026lt;main.cpp:2:5, line:4:5\u0026gt; line:2:9 main \u0026#39;int ()\u0026#39; `-CompoundStmt 0x7dabb8e4e8e8 \u0026lt;col:16, line:4:5\u0026gt; `-ReturnStmt 0x7dabb8e4e8d0 \u0026lt;line:3:9, col:16\u0026gt; `-IntegerLiteral 0x7dabb8e4e8a8 \u0026lt;col:16\u0026gt; \u0026#39;int\u0026#39; 0 With this, the development environment is set up, and you can happily proceed with clang development.\nAST AST (Abstract Syntax Tree) is a data structure generated by the compiler during compilation to represent the grammatical structure of source code. It is an abstract layer of the source code, used to capture its syntactic information while removing specific details like semicolons, parentheses, etc. The small tool we wrote above compiles the input string and prints its AST. In fact, both static analysis tools and code generation tools are implemented by manipulating the AST. We need to know how to filter out nodes of interest from the AST and how to further obtain more detailed information about those nodes.\nAll AST nodes have the same lifecycle and may have complex inter-referencing relationships. For clang AST, although it\u0026rsquo;s nominally called an Abstract Syntax Tree, it\u0026rsquo;s actually a cyclic Graph. In this special scenario, using a memory pool to uniformly allocate memory for all AST nodes can greatly simplify node lifecycle management. clang also does this; all AST nodes are allocated via clang::ASTContext. 
Through ASTContext::getTranslationUnitDecl, we can obtain the root node of the AST, which, as the name suggests, represents a translation unit.\nBefore traversing the AST, we first need to understand the structure of the clang AST. The two most basic node types in clang are Decl and Stmt, with Expr being a subclass of Stmt. Decl represents declarations, such as variable declarations, function declarations, etc. Stmt represents statements, such as assignment statements, function call statements, etc. Expr represents expressions, such as addition expressions, function call expressions, etc.\nTaking int x = (1 + 2) * 3; as an example, its AST structure is as follows:\n`-VarDecl 0x7e0b3974e710 \u0026lt;main.cpp:2:1, col:19\u0026gt; col:5 x \u0026#39;int\u0026#39; cinit `-BinaryOperator 0x7e0b3974e898 \u0026lt;col:9, col:19\u0026gt; \u0026#39;int\u0026#39; \u0026#39;*\u0026#39; |-ParenExpr 0x7e0b3974e848 \u0026lt;col:9, col:15\u0026gt; \u0026#39;int\u0026#39; | `-BinaryOperator 0x7e0b3974e820 \u0026lt;col:10, col:14\u0026gt; \u0026#39;int\u0026#39; \u0026#39;+\u0026#39; | |-IntegerLiteral 0x7e0b3974e7d0 \u0026lt;col:10\u0026gt; \u0026#39;int\u0026#39; 1 | `-IntegerLiteral 0x7e0b3974e7f8 \u0026lt;col:14\u0026gt; \u0026#39;int\u0026#39; 2 `-IntegerLiteral 0x7e0b3974e870 \u0026lt;col:19\u0026gt; \u0026#39;int\u0026#39; 3 As you can see, it\u0026rsquo;s very clear and corresponds directly to the grammatical structure in the source code. Due to the complexity of C++ syntax, there are naturally many corresponding node types. You can find the inheritance graph of all node types in the clang source directory under clang/AST/DeclNodes.td and clang/AST/StmtNodes.td. Note that this is the source directory, not the installation directory. Files with the .td suffix are LLVM TableGen language, a special configuration file format used for code generation and similar tasks. 
Since LLVM disables exceptions and RTTI, but type conversions like dynamic_cast are very common when operating on AST nodes, LLVM implements its own similar mechanism through code generation. For related content, refer to the LLVM Programmer\u0026rsquo;s Manual.\nBelow, I will introduce some of the more important nodes and APIs in the AST (currently only a few, more will be added based on feedback).\nCast First, the most basic and important operation is downcasting node types. As mentioned earlier, related operations in LLVM share a common mechanism. The most commonly used API is llvm::dyn_cast, which returns nullptr if the cast fails. For example, if we want to check whether a declaration is a function declaration:\nvoid foo(clang::Decl* decl) { if(auto FD = llvm::dyn_cast\u0026lt;clang::FunctionDecl\u0026gt;(decl)) { llvm::outs() \u0026lt;\u0026lt; FD-\u0026gt;getName() \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; } } Its usage is almost identical to the language\u0026rsquo;s built-in dynamic_cast.\nDeclContext In C++, there are declarations inside which other declarations can be defined, for example:\nnamespace foo { int x = 1; } Here, foo acts as the declaration context for x. To describe this relationship, in clang, all Decls that can serve as a declaration context inherit from DeclContext. A typical example is the NamespaceDecl mentioned above. You can get all declarations within the context using the DeclContext::decls() member.\nTemplate Handling templates is arguably the most complex part of the clang AST. Here\u0026rsquo;s a brief introduction. All uninstantiated template declarations inherit from the clang::TemplateDecl class. By looking at the inheritance graph, there are types such as ClassTemplate, FunctionTemplate, VarTemplate, TypeAliasTemplate, TemplateTemplateParm, and Concept. These correspond to class templates, function templates, variable templates, type alias templates, template template parameters, and concepts in the C++ standard, respectively.\nWait, I have a question.
How are member functions of class templates represented?\ntemplate \u0026lt;typename T\u0026gt; class Foo { void bar(); }; For example, what node is bar in the AST here? Dumping the AST reveals it\u0026rsquo;s a regular CXXMethodDecl, just like ordinary member functions. In fact, TemplateDecl has a member getTemplatedDecl, which retrieves the underlying declaration that the template wraps (for a class template, the CXXRecordDecl). It essentially treats T as a normal type. Once the underlying declaration is obtained, everything is the same as with ordinary non-template types. The Parser only has some special handling when parsing dependent names.\nAll template instantiations are also represented in the AST, whether explicit or implicit. You can use getSpecializationKind to get the type of template instantiation. TemplateSpecializationKind is an enumeration with the values TSK_Undeclared, TSK_ImplicitInstantiation, TSK_ExplicitSpecialization, TSK_ExplicitInstantiationDeclaration, and TSK_ExplicitInstantiationDefinition. These correspond to not yet instantiated (template instantiation is lazy), implicit instantiation, explicit specialization, explicit instantiation declaration, and explicit instantiation definition, respectively.\nVisitor After gaining some basic understanding of the AST structure, we can now traverse the AST to collect information. The key class for this step is clang::RecursiveASTVisitor.\nRecursiveASTVisitor is a typical application of CRTP. It performs a depth-first traversal of the entire AST by default and provides a set of interfaces that subclasses can override to change the default behavior.
For example, the following code will traverse the AST and dump all function declarations.\n#include \u0026#34;clang/Tooling/Tooling.h\u0026#34; #include \u0026#34;clang/AST/RecursiveASTVisitor.h\u0026#34; class ToolingASTVisitor : public clang::RecursiveASTVisitor\u0026lt;ToolingASTVisitor\u0026gt; { public: bool VisitFunctionDecl(clang::FunctionDecl* FD) { FD-\u0026gt;dump(); return true; } }; class ToolingASTConsumer : public clang::ASTConsumer { public: void HandleTranslationUnit(clang::ASTContext\u0026amp; context) override { ToolingASTVisitor visitor; visitor.TraverseAST(context); } }; I won\u0026rsquo;t go into too much detail about the working principle of RecursiveASTVisitor here; its documentation comments are sufficiently detailed. In short, if you only want to visit a certain node, override VisitFoo; if you want to customize the traversal behavior, for example, to filter out uninteresting nodes to speed up traversal, just override TraverseFoo, where Foo is the node type.\nNote that intuitively, traversal should be a read-only operation and thus thread-safe. However, clang AST caches some results during traversal, so concurrent multi-threaded traversal of the same AST is not thread-safe.\nPreprocess As you can see, there are no macro-related nodes in the AST node definitions. Nor are there any preprocessor directive nodes. In fact, clang\u0026rsquo;s AST is built after complete preprocessing; both macro expansion and preprocessor directives have already been processed. 
But what if we want to get related information?\nclang provides us with the PPCallbacks class, which allows us to override its relevant interfaces to obtain some information.\n#include \u0026#34;clang/Tooling/Tooling.h\u0026#34; #include \u0026#34;clang/AST/RecursiveASTVisitor.h\u0026#34; class ToolingPPCallbacks : public clang::PPCallbacks { public: void MacroDefined(const clang::Token\u0026amp; MacroNameTok, const clang::MacroDirective* MD) override { llvm::outs() \u0026lt;\u0026lt; \u0026#34;MacroDefined: \u0026#34; \u0026lt;\u0026lt; MacroNameTok.getIdentifierInfo()-\u0026gt;getName() \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; } }; class ToolingAction : public clang::ASTFrontendAction { public: std::unique_ptr\u0026lt;clang::ASTConsumer\u0026gt; CreateASTConsumer(clang::CompilerInstance\u0026amp; instance, llvm::StringRef file) override { return std::make_unique\u0026lt;ToolingASTConsumer\u0026gt;(); } bool BeginSourceFileAction(clang::CompilerInstance\u0026amp; instance) override { llvm::outs() \u0026lt;\u0026lt; \u0026#34;BeginSourceFileAction\\n\u0026#34;; instance.getPreprocessor().addPPCallbacks( std::make_unique\u0026lt;ToolingPPCallbacks\u0026gt;()); return true; } }; The example above prints all macro definitions. PPCallbacks also provides many other interfaces, and the related comments are quite detailed, so you can use them as needed.\nLocation clang records detailed location information for nodes in the AST. The core class for representing location information is clang::SourceLocation. To store as much information as possible while reducing memory usage, it itself is just an ID, with the size of an int, making it very lightweight. The actual location information is stored in clang::SourceManager. When detailed information is needed, it must be decoded via SourceManager.\nYou can get the corresponding SourceManager via ASTContext::getSourceManager. 
Then we can dump the node\u0026rsquo;s location information with the following code:\nvoid dump(clang::SourceManager\u0026amp; SM, clang::FunctionDecl* FD) { FD-\u0026gt;getLocation().dump(SM); } Among the member functions of SourceManager, you will find many APIs prefixed with spelling or expansion. For example, getSpellingLineNumber and getExpansionLineNumber. What do these mean? First, it\u0026rsquo;s important to realize that all SourceLocations in the AST represent the starting position of a token. A token can originate in two ways: either directly corresponding to a token in the source code, or from a macro expansion. You can use SourceLocation::isMacroID to determine if a token\u0026rsquo;s location is generated by a macro expansion.\nFor tokens generated by macro expansion, clang tracks two pieces of information: the location of the macro expansion, which is ExpansionLocation, and the location of the token that produced this expanded token, which is SpellingLocation.\nFor example, consider the following code:\n#define Self(name) name int Self(x) = 1; Using the following code to print the location information of the variable declaration:\nvoid dump(clang::SourceManager\u0026amp; SM, clang::SourceLocation location) { llvm::outs() \u0026lt;\u0026lt; \u0026#34;is from macro expansion: \u0026#34; \u0026lt;\u0026lt; location.isMacroID() \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; llvm::outs() \u0026lt;\u0026lt; \u0026#34;expansion location: \u0026#34;; SM.getExpansionLoc(location).dump(SM); llvm::outs() \u0026lt;\u0026lt; \u0026#34;spelling location: \u0026#34;; SM.getSpellingLoc(location).dump(SM); } The expected output is:\nis from macro expansion: 1 expansion location: main.cpp:2:5 spelling location: main.cpp:2:10 In this variable declaration, x is generated by a macro expansion, so isMacroID is true. Its expansion location is the position of the macro expansion, which is the starting position of Self(x). 
The spelling location, on the other hand, is the position of the token that produced x during expansion, which is the position of x within Self(x).\nIn addition, according to the C++ standard, the #line preprocessor directive can be used to change line numbers and filenames, for example:\n#include \u0026lt;string_view\u0026gt; #line 10 \u0026#34;fake.cpp\u0026#34; static_assert(__LINE__ == 10); static_assert(__FILE__ == std::string_view(\u0026#34;fake.cpp\u0026#34;)); Does this affect clang\u0026rsquo;s location recording? The answer is yes. So, what if I want to get the real line number and filename? You can use getPresumedLoc to get the location information, whether modified by #line directives or not.\nvoid dump(clang::SourceManager\u0026amp; SM, clang::SourceLocation location) { /// The second argument determines whether the location is modified by /// `#line` directives. auto loc = SM.getPresumedLoc(location, true); llvm::outs() \u0026lt;\u0026lt; loc.getFilename() \u0026lt;\u0026lt; \u0026#34;:\u0026#34; \u0026lt;\u0026lt; loc.getLine() \u0026lt;\u0026lt; \u0026#34;:\u0026#34; \u0026lt;\u0026lt; loc.getColumn() \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; loc = SM.getPresumedLoc(location, false); llvm::outs() \u0026lt;\u0026lt; loc.getFilename() \u0026lt;\u0026lt; \u0026#34;:\u0026#34; \u0026lt;\u0026lt; loc.getLine() \u0026lt;\u0026lt; \u0026#34;:\u0026#34; \u0026lt;\u0026lt; loc.getColumn() \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; } The first output is the location information modified by the #line directive, while the second output is the real location information. Note that clang uses UTF-8 as the default file encoding, and line and column calculations are also based on UTF-8. What if I want to get the column information in UTF-16 encoding? VS Code actually uses this by default. You can use getDecomposedLoc to decompose SourceLocation into a clang::FileID and an offset relative to the start of the file. 
With the offset and the source file\u0026rsquo;s text content, we can then calculate the column ourselves based on UTF-16 encoding.\nSince clang::FileID was mentioned, let\u0026rsquo;s continue with it. Like SourceLocation, it is also an ID, but it represents a file. We can use getIncludeLoc to get the location where the file was included (i.e., the position of the #include directive for the current file). Use getFileEntryRefForID to get information about the file it refers to, including its name, size, etc.\nvoid dump(clang::SourceManager\u0026amp; SM, clang::SourceLocation location) { auto [fid, offset] = SM.getDecomposedLoc(location); auto loc = SM.getIncludeLoc(fid); llvm::outs() \u0026lt;\u0026lt; SM.getFileEntryRefForID(fid)-\u0026gt;getName() \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; } A header file might be included multiple times. Each inclusion results in a new FileID, but they all refer to the same underlying file.\nConclusion After understanding the above content, readers should find it easy to write a clang-based tool, such as a clang-based reflection code generator.\nFor clice, many language server requests are fulfilled by traversing the AST. For example, SemanticTokens, which provides code kind and modifier decorations for tokens in the source code, allowing editors to further highlight them based on themes.\nHow is this implemented? It\u0026rsquo;s simply by traversing the AST and, based on the node type, returning the kind defined in the LSP standard, such as variable and function. Finally, the tokens are sorted by their position. The principle is very simple; the remaining task is to handle the many corner cases arising from the complexity of C++ syntax.\nThis concludes the article. Thank you for reading. 
Here are some additional reference documents from the official clang documentation:\nLibTooling AST Matcher Transformer Introduction to the Clang AST Clang CFE Internals Manual ","permalink":"https://www.ykiko.me/en/articles/21319978959/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAfter the \u003ca href=\"https://www.ykiko.me/en/articles/13394352064\"\u003earticle\u003c/a\u003e about clice was published, the response I received far exceeded my expectations, and many friends expressed a desire to participate in the development. While this enthusiasm is good, the barrier to entry is not low, mainly due to the difficulty of interacting with clang\u0026rsquo;s API. What makes it difficult? On the one hand, there is relatively little information about clang on the internet, whether in Chinese or English communities (this is expected, as there is very little demand for it, so naturally few people discuss it). On the other hand, due to the complexity of the C++ language itself, many details require a deeper understanding of the language before they can be grasped, and connecting theory with implementation is not easy.\u003c/p\u003e","title":"Deep Dive into Clang (Part 1)"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nIt\u0026rsquo;s been several months since my last blog post. The reason for this long hiatus is that I\u0026rsquo;ve been busy working on clice – a brand new C++ language server.\nSome readers might be unfamiliar with the concept of a language server. However, you\u0026rsquo;ve certainly used IDEs like Visual Studio or CLion and experienced features such as code completion, navigation, and refactoring. 
In traditional IDEs, these features were implemented by IDE plugins or built-in functionalities. This approach meant that each language required separate support development for every editor, leading to high maintenance costs. When Microsoft released Visual Studio Code, they aimed to solve this problem and thus introduced the LSP (Language Server Protocol) concept. LSP proposes a standard client-server model. Language features are provided by an independent language server, and VSCode only needs to implement a generic client to communicate with it. This method decouples the editor from language support, allowing VSCode to easily support multiple languages. For example, if you want to view the implementation of a method, your editor will send a go to implementation request to the language server. The specific content of this request might include the file path and the current cursor position in the source file. The language server processes this request and returns a file location and coordinates, which the editor then uses to open the corresponding file and navigate.\nclice is precisely such a language server, designed to handle requests related to C/C++ code. The name comes from my avatar, Alice; by replacing the initial \u0026lsquo;A\u0026rsquo; with \u0026lsquo;C\u0026rsquo; (representing C/C++), I got clice.\nAfter several months of design and development, the project has taken shape, but it is expected to require a few more months for refinement before it can be put into use. The main purpose of this article is to introduce the design and implementation of clice, serving as my personal interim summary of the current development. Although it\u0026rsquo;s about language servers, it actually involves a lot of popular science knowledge related to C/C++ compilation. Interested readers can continue reading.\nMeanwhile, if you have any feature requests or suggestions, feel free to leave a comment and discuss. 
I will try my best to consider them in the next phase of development.\nWhy a new language server? So, the first question is, why develop a new language server? Is it necessary to reinvent the wheel?\nThis question deserves a serious answer. Before this project, I had written many projects, big and small. However, most of them were toy projects, written merely to validate an idea or for personal learning, and didn\u0026rsquo;t solve any real-world problems. clice is different; it genuinely aims to solve existing issues (more on specific problems later), rather than being a rewrite for the sake of rewriting.\nEarlier this year, I wanted to get involved in the development of the LLVM project. I wanted to start with something I was relatively familiar with, C++, specifically Clang. But without a specific need, I couldn\u0026rsquo;t just stare at the source code. The usual process in such cases is to start with some \u0026ldquo;first issues\u0026rdquo; and gradually familiarize oneself with the project. However, I found that boring; I wanted to tackle something significant right away, like implementing a feature from a new C++ standard. But I found there was little room for me to contribute; new features were almost always implemented by a few core Clang developers. Alright, since there wasn\u0026rsquo;t much opportunity there, I looked elsewhere. My attention naturally shifted to clangd, as I primarily use VSCode for development, and clangd is the best C++ language server available for VSCode.\nAt the time, I knew nothing about clangd, except that it seemed to incorrectly highlight keywords. So, I started reading clangd\u0026rsquo;s source code while browsing through its numerous issues to see if there was anything I could solve. After going through hundreds of issues, I found quite a few problems. I was particularly interested in an issue related to code completion within templates. Why was I interested in this? 
Readers familiar with my work might know that I\u0026rsquo;m an experienced metaprogramming enthusiast, and I\u0026rsquo;ve written many related articles before. Naturally, I was curious not only about how template metaprogramming itself works but also about how Clang, as a compiler, implements the related features. This issue seemed like an excellent entry point for me. After spending a few weeks exploring a prototype implementation, I initially resolved that issue, but then I realized there was no one available to review the related code!\nAfter some investigation, I found that clangd\u0026rsquo;s current situation is quite dire. Let\u0026rsquo;s trace the timeline: clangd initially started as a simple small project within LLVM, not excelling in functionality or usability. As MaskRay mentioned in his ccls blog post, clangd at the time could only handle single compilation units, and cross-compilation-unit requests were unmanageable. This blog post was published in 2017, which was one reason MaskRay chose to write ccls. ccls was also a C/C++ language server and was superior to clangd at that point. Later, however, Google began assigning people to improve clangd to meet the needs of their internal large codebases. Concurrently, the LSP standard was continuously expanding, and clangd kept pace with the new standard\u0026rsquo;s content, while the author of ccls seemed to gradually become busy with other things and had less time to maintain ccls. Consequently, clangd eventually surpassed ccls overall. The turning point occurred around 2023; clangd seemed to have reached a highly usable state for Google internally, and the employees originally responsible for clangd were reassigned to other tasks. Currently, clangd\u0026rsquo;s issues are handled almost solely by HighCommander4, who does so purely out of passion; no one employs him for this role.
Since he isn\u0026rsquo;t specifically hired to maintain clangd, he can only address issues in his limited free time, and his contributions are restricted to answering questions and very limited reviews. As he mentioned in this comment:\nThe other part of the reason is lack of resources to pursue the ideas we do have, such as the idea mentioned above of trying to shift more of the burden to disk usage through more aggressive preamble caching. I\u0026rsquo;m a casual contributor, and the limited time I have to spend on clangd is mostly taken up by answering questions, some code reviews, and the occasional small fix / improvement; I haven\u0026rsquo;t had the bandwidth to drive this type of performance-related experimentation.\nGiven this situation, it\u0026rsquo;s no surprise that a large PR like initial C++20 module support for clangd has been delayed for nearly a year. After realizing this state of affairs, I conceived the idea of writing my own language server. I estimated the project size, excluding test code, to be around 20,000 lines, which is a manageable workload for one person over a period of time, and there are precedents like ccls and rust-analyzer. Another point is that clangd\u0026rsquo;s codebase is quite old; despite numerous comments, the underlying logic is still very convoluted, and making extensive modifications might take longer than a complete rewrite.\nSo, I got to work. I categorized hundreds of clangd issues to see whether any had been difficult to solve because of early architectural design flaws and had consequently been shelved. If so, could these problems be addressed during a redesign? I found that there indeed were some! Consequently, over the next two months, I dedicated myself to studying Clang\u0026rsquo;s internal mechanisms, exploring solutions to related problems, and prototyping implementations.
After confirming that these issues could largely be resolved, I officially began developing clice.\nImportant improvements Having said all that, let\u0026rsquo;s first look at which significant existing clangd problems clice actually solves. The focus here will be on feature introduction; the implementation principles will be covered in the Design section. Besides these important improvements, there are, of course, many minor functional enhancements, which I won\u0026rsquo;t list individually here.\nBetter template support First and foremost, there\u0026rsquo;s better template support, which was the feature I initially wanted clangd to support. Specifically, what are the current problems with handling templates?\nTaking code completion as an example, consider the following code, where ^ represents the cursor position:\ntemplate \u0026lt;typename T\u0026gt; void foo(std::vector\u0026lt;T\u0026gt; vec) { vec.^ } In C++, if a type depends on template parameters, we cannot make any accurate assumptions about it before template instantiation. For example, vector here could be the primary template or a partial specialization like vector\u0026lt;bool\u0026gt;. Which one should be chosen? For compilation, accuracy is always paramount; the compiler can never act on a result that might be wrong. However, for a language server, offering plausible results is often better than offering nothing at all. We can assume that users more often use the primary template rather than partial specializations, and thus provide code completion results based on the primary template.
Currently, clangd indeed does this, offering code completion based on the primary vector template in the situation described above.\nConsider a more complex example:\ntemplate \u0026lt;typename T\u0026gt; void foo(std::vector\u0026lt;std::vector\u0026lt;T\u0026gt;\u0026gt; vec2) { vec2[0].^ } From a user\u0026rsquo;s perspective, completion should also be provided here, since the type of vec2[0] is vector\u0026lt;T\u0026gt;, just like in the previous example. However, clangd doesn\u0026rsquo;t offer any completion here. What\u0026rsquo;s the problem? According to the C++ standard, the return type of std::vector\u0026lt;T\u0026gt;\u0026rsquo;s operator[] is std::vector\u0026lt;T\u0026gt;::reference, which is actually a dependent name. Its result seems quite straightforward: T\u0026amp;. But in libstdc++, its definition is nested through a dozen layers of templates, apparently for compatibility with older standards. So why can\u0026rsquo;t clangd handle this situation? There are three reasons. First, it relies on the primary-template assumption, and ignoring partial specializations can cause name lookup to dead-end. Second, it only performs name lookup without template instantiation, so even if the final result is found, it cannot be mapped back to the original template parameters. Third, it doesn\u0026rsquo;t consider default template arguments, so it cannot handle dependent names introduced by them.\nAlthough we could create \u0026ldquo;holes\u0026rdquo; for standard library types to provide relevant support, I want user code to have the same standing as standard library code. Therefore, we need a generic algorithm to handle dependent types. To solve this problem, I developed a pseudo instantiator. It can instantiate dependent types without concrete types and thereby simplify them.
For example, std::vector\u0026lt;std::vector\u0026lt;T\u0026gt;\u0026gt;::reference in the example above can be simplified to std::vector\u0026lt;T\u0026gt;\u0026amp;, which then allows providing code completion options to the user.\nHeader context For clangd to function correctly, users often need to provide a compile_commands.json file (hereinafter referred to as CDB). The basic compilation unit in C++\u0026rsquo;s traditional compilation model is a source file (e.g., .c and .cpp files), and #include simply pastes the contents of a header file into the corresponding location in the source file. The aforementioned CDB file stores the compilation commands for each source file. When you open a source file, clangd uses its corresponding compilation command from the CDB to compile that file.\nNaturally, a question arises: since the CDB only contains compilation commands for source files, not header files, how does clangd handle header files? In fact, clangd treats header files as source files and then, based on certain rules, uses the compilation command of a source file in the corresponding directory as the compilation command for that header file. This model is simple and effective but overlooks some situations.\nSince header files are part of source files, their content can vary depending on the preceding content in the source file. For example:\n// a.h #ifdef TEST struct X { ... }; #else struct Y { ... }; #endif // b.cpp #define TEST #include \u0026#34;a.h\u0026#34; // c.cpp #include \u0026#34;a.h\u0026#34; Clearly, a.h has different states in b.cpp and c.cpp; one defines X, and the other defines Y. If a.h is simply treated as a source file, only Y will be visible.\nA more extreme case is non-self-contained header files, for example:\n// a.h struct Y { X x; }; // b.cpp struct X {}; #include \u0026#34;a.h\u0026#34; a.h itself cannot be compiled, but it compiles normally when embedded in b.cpp. 
In this scenario, clangd would report an error in a.h, stating that X\u0026rsquo;s definition cannot be found. This is clearly because it treats a.h as an independent source file. Many such header files exist in libstdc++ code, and some popular C++ header-only libraries also contain similar code, which clangd currently cannot handle.\nclice will support header context, allowing both automatic and user-initiated switching of a header file\u0026rsquo;s state, and naturally, it will also support non-self-contained header files. We aim to achieve the following effect, using the initial code example: when you jump from b.cpp to a.h, b.cpp will be used as the context for a.h. Similarly, when you jump from c.cpp to a.h, c.cpp will be used as the context for a.h.\nFull C++20 module support C++20 introduced the module feature to accelerate compilation. Unlike traditional compilation models, module units can have dependencies on each other, which requires additional handling. Although the PR for initial module support in clangd has been merged, it is still in a very early state: precompiled modules are not shared between different files, leading to redundant module compilation; other accompanying LSP features have not kept pace, such as highlighting and navigation for module names, or completion similar to that for header files; and only Clang is supported, not other compilers. clice will provide compiler- and build-system-agnostic C++20 module support, and the project itself will fully migrate to modules later.\nBetter index format Some ccls users might complain that even though both tools pre-index the entire project, ccls allows instant navigation upon opening a file, while clangd still requires waiting for the file to be parsed. Why does this happen? This is actually due to a design flaw in clangd\u0026rsquo;s index format. What is an index?
Since C/C++ supports forward declarations, declarations and definitions can be in different source files, so we need to handle symbol relationships across compilation units.\nHowever, parsing files is a very time-consuming operation. If we waited to parse files until a query arrived, the query time would be astronomical. To support fast lookup of symbol relationships, language servers generally pre-index the entire project. But what format should be used to store the relevant data? There is no standard for this.\nclice has drawn extensively on existing index designs and developed a more efficient index format. It can also achieve the same effect as ccls: if a project is pre-indexed, responses can be obtained immediately without waiting.\nDesign This section will discuss the design and implementation of clice in more detail.\nServer First, a language server is also a server, and in this regard, it\u0026rsquo;s no different from a traditional server. It uses an event-driven programming model to receive and process requests. Since C++20 is available, it\u0026rsquo;s natural to use stackless coroutines for asynchronous programming. clangd\u0026rsquo;s code contains a large number of callback functions, making that part of the code quite difficult to read. Using stackless coroutines avoids similar callback hell.\nIt\u0026rsquo;s worth noting that regarding library selection, I didn\u0026rsquo;t choose an off-the-shelf coroutine library. Instead, I used C++20\u0026rsquo;s coroutine facilities to wrap libuv and create a simple coroutine library myself. The reasons are as follows:\nThe LLVM project does not use exceptions. We try to maintain consistency with this, and directly wrapping C libraries allows us better control over this aspect. The event model of a language server is quite simple, with one-to-one connections. Handling I/O-related requests on the main thread and using a thread pool for time-consuming tasks is entirely sufficient.
In this model, there\u0026rsquo;s no need for synchronization primitives like locks for inter-thread communication. Therefore, the models of general network libraries are overly complex for clice. In the end, I successfully replicated in C++ an asynchronous programming experience similar to that of Python and JavaScript, which was very pleasant and effortless.\nHow does it work? Next, let\u0026rsquo;s discuss in detail how clice handles certain specific requests.\nFirst, when a user opens or updates a file in the editor, the editor sends the relevant notifications to clice. Upon receiving the request, clice parses the file; more specifically, it parses the file into an AST (Abstract Syntax Tree). Since C++ grammar is quite complex, writing a parser from scratch is impractical. Like clangd, we choose to use the interfaces provided by Clang to parse source files.\nAfter obtaining the AST, we traverse it to collect the information we are interested in. Taking SemanticTokens as an example, we need to traverse the AST to attach semantic information to each token in the source code: Is it a variable or a function? Is it const? Is it static? And so on. In short, all this information can be obtained from the AST. For a deeper understanding of this, you can read an introductory article I previously wrote about the Clang AST.\nMost requests can be implemented in a manner similar to the one described above. Code Completion and Signature Help are somewhat special. The syntax at the completion point might be incomplete, and in a regular compilation Clang might treat an incomplete syntax node as an error node, discard it entirely, or even terminate compilation with a fatal error. In any case, these outcomes are unacceptable to us. Generally, to implement code completion, the parser needs to make \u0026ldquo;holes\u0026rdquo; for special handling.
Clang is no exception; it provides a special code completion mode, which obtains relevant information by inheriting CodeCompleteConsumer and overriding its related methods.\nYou can experience this feature with a special compilation option:\n-std=c++20 -fsyntax-only -Xclang -code-completion-at=\u0026#34;example.cpp:1:3\u0026#34; Assuming the source file is\ncon Then the expected output is\nCOMPLETION: const COMPLETION: consteval COMPLETION: constexpr COMPLETION: constinit It can be seen that the result is the completion of four C++ keywords, without any errors or warnings.\nWell, that\u0026rsquo;s the whole process. Doesn\u0026rsquo;t it sound quite simple? Indeed, the logic for traversing the AST in this part is quite clear. It\u0026rsquo;s just that there are many corner cases to consider; it simply requires gradually investing time to implement features and then iteratively fixing bugs.\nIncremental compilation Since users might frequently change files, if the entire file needs to be re-parsed every time, parsing can be very slow for large files, leading to very long response times (considering that #include is just copy-pasting, it\u0026rsquo;s easy to create a huge file). Imagine how terrible the experience would be if you typed a letter and had to wait several seconds for code completion results to appear!\nWhat to do then? The answer is Incremental Compilation. You might have heard this term when learning about build tools like CMake, but there are some differences. Incremental compilation for build tools operates at the granularity of a file, recompiling only changed files. 
However, this is clearly insufficient for us; the most basic request unit in LSP is a file, and we need finer-grained incremental compilation.\nClang provides a mechanism called Precompiled Header (PCH), which can be used to compile a segment of code into an AST, serialize it to disk, and then reuse it during subsequent compilations.\nFor example:\n#include \u0026lt;vector\u0026gt; #include \u0026lt;string\u0026gt; #include \u0026lt;iostream\u0026gt; int main() { std::vector\u0026lt;int\u0026gt; vec; std::string str; std::cout \u0026lt;\u0026lt; \u0026#34;Hello, World!\u0026#34; \u0026lt;\u0026lt; std::endl; } We can compile the first three lines of this file into a PCH and cache it. This way, even if the user frequently modifies the file content, as long as those first three lines are unchanged, we can directly reuse the PCH for compilation, significantly reducing compilation time. This part of the code is called the preamble. If the preamble changes, a new PCH file needs to be regenerated. Now you should understand why clangd takes a long time to respond when a file is opened for the first time, but subsequent responses are very fast; it\u0026rsquo;s precisely this preamble optimization at work. If you want to optimize your project\u0026rsquo;s build time, you can also consider using a PCH; not only Clang but also GCC and MSVC support similar mechanisms.\nPCH is good, but its dependencies can only be linear. You can use one PCH to build a new PCH, as long as the existing one covers the very beginning of the other file. However, you cannot use two PCHs to build a new PCH. So what if the dependencies form a directed acyclic graph? The answer is C++20 modules.
C++20 modules are essentially a \u0026ldquo;PCH Pro\u0026rdquo; version; their implementation principle is entirely similar, but they relax the limitations of dependency chains, allowing a module to depend on several other modules.\nAs for how to support C++20 modules? That\u0026rsquo;s a broad topic, deserving a separate article for discussion, so I won\u0026rsquo;t elaborate on it here.\nConclusion Well, I\u0026rsquo;ll stop here for now. There are actually many topics I haven\u0026rsquo;t covered, but upon reflection, each one could easily become a lengthy article on its own. I\u0026rsquo;ll save them for future additions; consider this article an introduction. I also regularly update progress in the project\u0026rsquo;s issues, so interested readers can follow along.\n","permalink":"https://www.ykiko.me/en/articles/13394352064/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIt\u0026rsquo;s been several months since my last blog post. The reason for this long hiatus is that I\u0026rsquo;ve been busy working on \u003ca href=\"https://github.com/clice-project/clice\"\u003eclice\u003c/a\u003e – a brand new C++ language server.\u003c/p\u003e","title":"Design and Implementation of a New C++ Language Server"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nDue to a series of coincidences, I participated in last week\u0026rsquo;s WG21 meeting (the C++ Standards Committee meeting). Although I often browse new proposals for the C++ standard, I never expected to one day actually attend a WG21 meeting and get real-time updates on the latest progress of the C++ standard. Of course, this was my first time attending, and I was very excited. 
I\u0026rsquo;m writing this to record my feelings and the progress of the meeting.\nHow It Started It all started this January when I was figuring out how to write an efficient small_vector. I looked at the LLVM source code for reference and found that it had a specialized implementation for types that are trivially destructible, using bitwise copy for operations like resizing. At the time, I didn\u0026rsquo;t quite understand why this was possible. Later, I learned about the concept of trivially copyable and then the concept of relocatable. After reading a few related proposals, I wrote this article discussing trivially relocatable.\nA few days later, a good friend of mine, blueloveTH, asked if I could write a lightweight small_vector for his project. The project is pocketpy, a lightweight Python interpreter. I thought, what a coincidence! I had just been researching this very thing a few days ago. So, I spent a few hours and wrote a very lightweight small_vector with support for trivially relocatable optimization. Coincidentally, this was also the project I applied to for this year\u0026rsquo;s GSoC.\nOn May 1st, I received two emails. One was from the GSoC committee informing me that my application was accepted. The other was from Arthur O\u0026rsquo;Dwyer, the author of P1144 (trivially relocatable). I was very confused at the time. Why would he suddenly email me? I didn\u0026rsquo;t know him at all. It turns out he regularly searches GitHub for C++ projects using the keyword trivially relocatable to exchange ideas with the project authors. He found the code in pocketpy, which is why he emailed us. It seems he also found the article on my personal blog discussing trivially relocatable. We had a brief exchange via email at first, and later we discussed some of the proposal\u0026rsquo;s content on Slack.\nAt the end of our discussion, he invited me to attend this WG21 meeting. 
The reason was that the current situation for trivially relocatable in C++ was that the committee was planning to adopt a flawed proposal, P2786, instead of the more complete proposal, P1144. Arthur O\u0026rsquo;Dwyer hoped that we, as supporters of P1144, could express our approval. So, I wrote an email to ISO to apply to attend the meeting online as a guest. After three weeks with no reply, I was starting to think I wouldn\u0026rsquo;t be able to attend. Then, three days before the meeting started, Herb Sutter finally replied to my email, saying he thought all emails had been answered but had somehow missed mine. He then said my application was approved and welcomed me to the meeting.\nThere was a small mishap here. During the opening session, Herb Sutter was counting the number of participating countries. He did this by calling out each country, and attendees would raise their hands. When he called \u0026lsquo;China,\u0026rsquo; I got a bit flustered and couldn\u0026rsquo;t find the \u0026lsquo;raise hand\u0026rsquo; button. In the end, when he saw no one raised their hand, he even commented that he was sure there was a participant from China in this meeting.\nHow the C++ Standard Evolves To make it easier to explain the meeting\u0026rsquo;s progress later, I\u0026rsquo;ll first briefly introduce how the C++ committee operates.\nC++ has 23 study groups, from SG1 to SG23, each responsible for discussing different topics. For example, compile-time metaprogramming is discussed by the SG7 group.\nAfter a proposal passes a study group, it is forwarded to either the EWG (Evolution Working Group) or the LEWG (Library Evolution Working Group) for review, depending on whether it concerns a language feature or a standard library feature. 
If the review is successful, it is then submitted to the CWG (Core Working Group) or LWG (Library Working Group) to refine the wording in the proposal so that it can be incorporated into the C++ standard.\nFinally, proposals that pass CWG or LWG are voted on in a plenary session. If the vote passes, they are officially added to the C++ standard.\nThe schedule for this St. Louis meeting was: opening session on Monday morning. In the afternoon, the various groups began discussing their respective agendas, all happening concurrently. I spent most of my time in the EWG meeting room. Guests are allowed to participate in group polls but cannot vote in the final plenary session.\nMeeting Progress First, I\u0026rsquo;ll briefly talk about the proposals that were confirmed to pass, and then discuss the current progress of some important proposals.\nApproved Proposals For the core language, the main proposals that passed were:\nconstexpr placement new supports using placement new directly in constant evaluation to call an object\u0026rsquo;s constructor. Before this, one could only use std::construct_at, which is essentially a parenthesized version of placement new. For a detailed discussion on this, you can read my blog post on the history of constexpr. deleting a pointer to an incomplete type should be ill-formed deleting a pointer to an incomplete type will now result in a compile error instead of causing undefined behavior. ordering of constraints involving fold expressions clarifies the partial ordering rules for constraints involving fold expressions. structured binding declaration as a condition structured bindings can now be used in the condition of an if statement. For the standard library, the main proposals that passed were:\ninplace_vector Note that inplace_vector is different from small_vector. The latter performs dynamic memory allocation when its SBO capacity is insufficient, while the former does not. 
It\u0026rsquo;s like a dynamic array and can be conveniently used as a buffer. std::is_virtual_base_of used to determine if one class is a virtual base class of another. std::optional range support adds range support for std::optional. std::execution The long-debated std::execution has finally made it into the standard. Proposals with Significant Progress I spent almost all my time in the EWG meeting room these past few days, so I\u0026rsquo;ll mainly talk about some progress on the core language side.\nOn Monday afternoon and all of Tuesday, EWG was discussing Contracts. Compared to the last meeting in Tokyo, some consensus was reached on certain debates about Contracts, but there are still areas without consensus. I personally think the chances of it being included in C++26 are still slim.\nOn Wednesday morning, EWG discussed Reflection for C++26. It was ultimately passed with 0 votes against (including my own \u0026lsquo;super favor\u0026rsquo; vote) and was forwarded to CWG for wording revisions to be included in the C++ standard. On Thursday and Friday, CWG reviewed a portion of the content, but the proposal is too large to be finished. If all goes well, it is expected to be officially added to C++26 after two or three more meetings. The voting results show that everyone believes C++ needs reflection, and it has a very high chance of being included in C++26.\nOn Friday morning, EWG mainly discussed trivially relocatable. In a previous meeting, P2786 had already passed the EWG vote and was forwarded to CWG. However, it is incomplete and has many issues. Adding such a proposal to the standard would undoubtedly be detrimental to the development of C++. Since the last Tokyo meeting, several new proposals discussing trivially relocatable have emerged:\nIssues with P2786 Please reject P2786 and adopt P1144 Analysis of interaction between relocation, assignment, and swap Clearly, they all took aim at P2786. 
After the authors of these proposals gave their presentations, a vote was held to decide whether to return P2786 from CWG to EWG, which means reconsidering the trivially relocatable model for C++26. In the end, P2786 was returned to EWG by an overwhelming majority. Of course, I voted \u0026lsquo;super favor,\u0026rsquo; as this was the main reason I attended the meeting. As for P1144, we\u0026rsquo;ll probably have to wait for the next meeting; it wasn\u0026rsquo;t discussed this time.\nThe rest is progress on some smaller proposals. It\u0026rsquo;s worth mentioning that many constexpr-related proposals passed EWG, namely:\nLess transient constexpr allocation Allowing exception throwing in constant-evaluation Emitting messages at compile time However, it\u0026rsquo;s hard to say what their chances are of passing CWG later. If you\u0026rsquo;re interested in the latest progress of a specific proposal, you can just search for the proposal number in the issues on the ISO C++ GitHub, which will have detailed records of its latest progress.\nSome Impressions That\u0026rsquo;s it for the meeting progress. Now, I\u0026rsquo;d like to share some of my personal feelings.\nFirst, regarding the vote on trivially relocatable, I actually felt a sudden sense of guilt after casting my vote. The reason is that before the final vote, the author of P2786 said:\nif other people want to make modifications and bring forward their own paper, you know, as an author, I am not going to say, \u0026lsquo;No, don\u0026rsquo;t; it\u0026rsquo;s my paper.\u0026rsquo; If it\u0026rsquo;s a good change, you know, that\u0026rsquo;s good.\nI could clearly hear his voice was trembling as he said this. Putting myself in his shoes, I think I can understand his feelings. He must have poured a lot of effort into this proposal, and having it withdrawn in such a dishonorable way is hard to accept. 
But in reality, the author of P1144 put in even more effort; the proposal is already at version R11 and is more complete, yet it has been consistently ignored. I find it hard to understand why this situation occurred.\nAnother thing was the situation during the plenary vote for the proposal Structured Bindings can introduce a Pack, which is about introducing parameter packs in structured bindings:\nauto [x, ...pack] = std::tuple{1, 2, 3, 4}; It had already passed CWG, but during the plenary session, a compiler vendor pointed out at the last minute that some examples in the wording would cause a compiler crash with the proposal\u0026rsquo;s reference implementation. As a result, it failed the final plenary vote.\nA similar situation happened with std::execution. Before the plenary vote, someone argued that std::execution should not be added to C++26, claiming it is too complex and not mature enough, that the authors were just talking in the abstract without considering practical application scenarios. Furthermore, its heavy use of templates leads to very slow compilation speeds and frequently causes internal compiler errors. Although the final vote had more in favor than against, the C++ committee emphasizes achieving consensus, not simple majority rule. The ratio has to reach a certain threshold to pass. So, logically, the proposal shouldn\u0026rsquo;t have passed in this meeting, but it did for certain reasons. I\u0026rsquo;m not sure about the exact details, as I wasn\u0026rsquo;t paying close attention at that moment, so it\u0026rsquo;s hard for me to say.\nTo be honest, I didn\u0026rsquo;t gain much in terms of knowledge from attending this meeting, but I certainly broadened my horizons. Some of the debates felt no different from online arguments. From their respective perspectives, both sides\u0026rsquo; views are correct, which is reasonable. 
Not everything has an absolute right or wrong, and many problems don\u0026rsquo;t have a perfect solution, especially in the field of software engineering. So, to get into the standard, compromises must be made. But where to compromise, and who compromises for whom? This is often accompanied by heated debates and is sometimes even decided by other external factors (outside of the issue itself).\nIf I have the chance in the future, I might attend again, preferably in person. But I definitely won\u0026rsquo;t be attending every single session on time every day like I did this time (I was a bit too excited for my first time). I\u0026rsquo;ll probably just listen in on the parts that interest me. I really, really want to see the moment reflection gets into C++26.\nAlright, that\u0026rsquo;s the end of the article. Thanks for reading.\n","permalink":"https://www.ykiko.me/en/articles/706509748/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eDue to a series of coincidences, I participated in last week\u0026rsquo;s WG21 meeting (the C++ Standards Committee meeting). Although I often browse new proposals for the C++ standard, I never expected to one day actually attend a WG21 meeting and get real-time updates on the latest progress of the C++ standard. Of course, this was my first time attending, and I was very excited. I\u0026rsquo;m writing this to record my feelings and the progress of the meeting.\u003c/p\u003e","title":"St. Louis WG21 Meeting Review"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nI participated in Google Summer of Code 2024. The main task was to implement a pybind11-compatible interface for a Python interpreter. 
Saying \u0026ldquo;implement a compatible interface\u0026rdquo; is somewhat of an understatement — it was essentially a rewrite of pybind11, so I\u0026rsquo;ve been spending a lot of time reading through its source code lately.\nSome readers might not be familiar with pybind11. Simply put, pybind11 is middleware that facilitates interaction between Python and C++ code. For example, it allows embedding a Python interpreter in C++ or compiling C++ code into a dynamic library for Python to call. For more details, please refer to the official documentation.\nRecently, I\u0026rsquo;ve basically clarified the overall operational logic of the framework. Looking back now, pybind11 truly lives up to its reputation as the de facto standard for C++ and Python binding, featuring many ingenious designs. Its interaction logic could also be fully applied to interactions between C++ and other GC-enabled languages, such as JS and C# (although there aren\u0026rsquo;t things like jsbind11 and csharpbind11 yet). I might write a series of related articles soon, stripping away some tedious details to introduce some of the common ideas.\nThis article primarily discusses some interesting aspects of pybind11\u0026rsquo;s object design.\nPyObject As we all know, in Python, everything is an object. However, pybind11 actually needs to interact with specific Python implementations like CPython. So, how does \u0026ldquo;everything is an object\u0026rdquo; manifest in CPython? The answer is PyObject*. Let\u0026rsquo;s look inside CPython to understand how actual Python code operates.\nCreating an object is essentially creating a PyObject*\nx = [1, 2, 3] CPython has dedicated APIs to create objects of built-in types. 
The above statement would likely be translated into:\nPyObject* x = PyList_New(3); PyList_SetItem(x, 0, PyLong_FromLong(1)); PyList_SetItem(x, 1, PyLong_FromLong(2)); PyList_SetItem(x, 2, PyLong_FromLong(3)); In this way, the role of is becomes easy to understand: it\u0026rsquo;s used to determine if the values of two pointers are the same. The reason for so-called default shallow copying is simply that default assignment is just pointer assignment, not involving the elements it points to.\nCPython also provides a series of APIs to operate on objects pointed to by PyObject*, for example:\nPyObject* PyObject_CallObject(PyObject *callable_object, PyObject *args); PyObject* PyObject_CallFunction(PyObject *callable_object, const char *format, ...); PyObject* PyObject_CallMethod(PyObject *o, const char *method, const char *format, ...); PyObject* PyObject_CallFunctionObjArgs(PyObject *callable, ...); PyObject* PyObject_CallMethodObjArgs(PyObject *o, PyObject *name, ...); PyObject* PyObject_GetAttrString(PyObject *o, const char *attr_name); PyObject* PyObject_SetAttrString(PyObject *o, const char *attr_name, PyObject *v); int PyObject_HasAttrString(PyObject *o, const char *attr_name); PyObject* PyObject_GetAttr(PyObject *o, PyObject *attr_name); int PyObject_SetAttr(PyObject *o, PyObject *attr_name, PyObject *v); int PyObject_HasAttr(PyObject *o, PyObject *attr_name); PyObject* PyObject_GetItem(PyObject *o, PyObject *key); int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *v); int PyObject_DelItem(PyObject *o, PyObject *key); These functions generally have direct counterparts in Python, and their names indicate their purpose.\nhandle Since pybind11 needs to support operating on Python objects in C++, the primary task is to encapsulate these C-style APIs. This is specifically done by the handle type. 
handle is a simple wrapper around PyObject* and encapsulates some member functions, for example:\nRoughly like this:\nclass handle { protected: PyObject* m_ptr; public: handle(PyObject* ptr) : m_ptr(ptr) {} friend bool operator==(const handle\u0026amp; lhs, const handle\u0026amp; rhs) { return PyObject_RichCompareBool(lhs.m_ptr, rhs.m_ptr, Py_EQ); } friend bool operator!=(const handle\u0026amp; lhs, const handle\u0026amp; rhs) { return PyObject_RichCompareBool(lhs.m_ptr, rhs.m_ptr, Py_NE); } // ... }; Most functions are simply wrapped like the above, but some functions are special.\nget/set According to Bjarne Stroustrup, the father of C++, in \u0026ldquo;The Design and Evolution of C++\u0026rdquo;, one reason for introducing reference (lvalue) types was to allow users to assign to return values, making operator overloading for [] more natural. For example:\nstd::vector\u0026lt;int\u0026gt; v = {1, 2, 3}; int x = v[0]; // get v[0] = 4; // set Without references, one would have to return pointers, and the above code would have to be written like this:\nstd::vector\u0026lt;int\u0026gt; v = {1, 2, 3}; int x = *v[0]; // get *v[0] = 4; // set In comparison, isn\u0026rsquo;t using references much more elegant? This problem also exists in other programming languages, but not all languages adopt this solution. For example, Rust chooses automatic dereferencing, where the compiler automatically adds * to dereference at appropriate times, thus eliminating the need to explicitly write *. However, neither of these methods works for Python, because Python fundamentally has no concept of dereferencing, nor does it distinguish between lvalues and rvalues. So what\u0026rsquo;s the solution? 
The answer is to distinguish between getter and setter.\nFor example, to overload []:\nclass List: def __getitem__(self, key): print(\u0026#34;__getitem__\u0026#34;) return 1 def __setitem__(self, key, value): print(\u0026#34;__setitem__\u0026#34;) a = List() x = a[0] # __getitem__ a[0] = 1 # __setitem__ Python checks the syntactic structure; if [] appears on the left side of =, __setitem__ will be called, otherwise __getitem__ will be called. Many languages actually adopt similar designs, such as C#\u0026rsquo;s this[] operator overloading.\nEven the . operator can be overloaded, simply by overriding __getattr__ and __setattr__:\nclass Point: def __getattr__(self, key): print(f\u0026#34;__getattr__\u0026#34;) return 1 def __setattr__(self, key, value): print(f\u0026#34;__setattr__\u0026#34;) p = Point() x = p.x # __getattr__ p.x = 1 # __setattr__ pybind11 aims for handle to achieve a similar effect, i.e., calling __getitem__ and __setitem__ at appropriate times. For example:\npy::handle obj = py::list(1, 2, 3); obj[0] = 4; // __setitem__ auto x = obj[0]; // __getitem__ x = py::int_(1); The corresponding Python code is:\nobj = [1, 2, 3] obj[0] = 4 x = obj[0] x = 1 accessor Next, let\u0026rsquo;s focus on how to achieve this effect. First, consider the return value of operator[]. Since __setitem__ might need to be called, we return a proxy object here. It will store the key for subsequent calls.\nclass accessor { handle m_obj; ssize_t m_key; handle m_value; public: accessor(handle obj, ssize_t key) : m_obj(obj), m_key(key) { m_value = PyObject_GetItem(obj.ptr(), key); } }; The next problem is how to distinguish between obj[0] = 4 and x = int_(1), so that the former calls __setitem__ and the latter is a simple assignment to x. Notice the key difference between the two cases above: lvalue and rvalue.\nobj[0] = 4; // assign to rvalue auto x = obj[0]; x = 1; // assign to lvalue How can operator= call different functions based on the value category of its operand? 
This requires a somewhat less common trick: we know that a const qualifier can be added to a member function, allowing it to be called on a const object.\nstruct A { void foo() {} void bar() const {} }; int main() { const A a{}; a.foo(); // error a.bar(); // ok } Besides this, reference qualifiers \u0026amp; and \u0026amp;\u0026amp; can also be added. The effect is to require that the expr in expr.f() be an lvalue or an rvalue, respectively. This way, we can call different functions depending on whether the object expression is an lvalue or an rvalue.\nstruct A { void foo() \u0026amp; {} void bar() \u0026amp;\u0026amp; {} }; int main() { A a; a.foo(); // ok a.bar(); // error A().bar(); // ok A().foo(); // error } Using this feature, we can achieve the above effect:\nclass accessor { handle m_obj; ssize_t m_key; handle m_value; public: accessor(handle obj, ssize_t key) : m_obj(obj), m_key(key) { m_value = PyObject_GetItem(obj.ptr(), key); } // assign to rvalue void operator=(handle value) \u0026amp;\u0026amp; { PyObject_SetItem(m_obj.ptr(), m_key, value.ptr()); } // assign to lvalue void operator=(handle value) \u0026amp; { m_value = value; } }; lazy evaluation Furthermore, we want this proxy object to behave just like a handle, capable of using all of handle\u0026rsquo;s methods. This is simple: just inherit from handle.\nclass accessor : public handle { handle m_obj; ssize_t m_key; public: accessor(handle obj, ssize_t key) : m_obj(obj), m_key(key) { m_ptr = PyObject_GetItem(obj.ptr(), key); } // assign to rvalue void operator=(handle value) \u0026amp;\u0026amp; { PyObject_SetItem(m_obj.ptr(), m_key, value.ptr()); } // assign to lvalue void operator=(handle value) \u0026amp; { m_ptr = value.ptr(); } }; It seems to end here, but notice that our __getitem__ is called in the constructor, meaning it will be invoked even if the retrieved value is not used later. There seems to be room for further optimization: can we make this evaluation lazy through some means? 
Only calling __getitem__ when functions within handle actually need to be called?\nDirectly inheriting handle as it is currently won\u0026rsquo;t work; it\u0026rsquo;s impossible to insert a check before every member function call to decide whether to invoke __getitem__. We can have both handle and accessor inherit from a base class, which provides an interface to actually retrieve the pointer to be operated on.\nclass object_api{ public: virtual PyObject* get() = 0; bool operator==(const handle\u0026amp; rhs) { return PyObject_RichCompareBool(get(), rhs.ptr(), Py_EQ); } // ... }; Then, both handle and accessor inherit from this base class, and accessor can perform lazy evaluation of __getitem__ here.\nclass handle : public object_api { PyObject* get() override { return m_ptr; } }; class accessor : public object_api { PyObject* get() override { if (!m_ptr) { m_ptr = PyObject_GetItem(m_obj.ptr(), m_key); } return m_ptr; } }; This doesn\u0026rsquo;t involve type erasure; it merely requires subclasses to expose an interface. Therefore, we can naturally use CRTP to devirtualize.\ntemplate \u0026lt;typename Derived\u0026gt; class object_api { public: PyObject* get() { return static_cast\u0026lt;Derived*\u0026gt;(this)-\u0026gt;get(); } bool operator==(const handle\u0026amp; rhs) { return PyObject_RichCompareBool(get(), rhs.ptr(), Py_EQ); } // ... }; class handle : public object_api\u0026lt;handle\u0026gt; { PyObject* get() { return m_ptr; } }; class accessor : public object_api\u0026lt;accessor\u0026gt; { PyObject* get() { if (!m_ptr) { m_ptr = PyObject_GetItem(m_obj.ptr(), m_key); } return m_ptr; } }; This way, we\u0026rsquo;ve made the __getitem__ call lazy without introducing additional runtime overhead.\nConclusion We often say that C++ is too complex, with too many dazzling features that often clash with each other. 
But looking at it from another perspective, having many features means users have more choices and more design space, allowing them to assemble brilliant designs like the one described above. I think it would be difficult for another language to achieve such an effect. Perhaps this is the charm of C++.\nThis article concludes here. Thank you for reading, and feel free to discuss in the comments section.\n","permalink":"https://www.ykiko.me/en/articles/702197261/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eI participated in \u003ca href=\"https://summerofcode.withgoogle.com/programs/2024/projects/Ji2Mi97o\"\u003eGoogle Summer of Code 2024\u003c/a\u003e. The main task was to implement a \u003ca href=\"https://github.com/pybind/pybind11\"\u003epybind11\u003c/a\u003e-compatible interface for a \u003ca href=\"https://pocketpy.dev/\"\u003ePython interpreter\u003c/a\u003e. Saying \u0026ldquo;implement a compatible interface\u0026rdquo; is somewhat of an understatement — it was essentially a rewrite of pybind11, so I\u0026rsquo;ve been spending a lot of time reading through its source code lately.\u003c/p\u003e","title":"The Perfect Combination of Python and C++: Object Design in pybind11"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nSingleton Pattern is a common design pattern, often applied in scenarios such as configuration systems, logging systems, and database connection pools, where object uniqueness must be ensured. But can the Singleton Pattern truly guarantee a single instance? What are the consequences if uniqueness is not guaranteed?\nSince I\u0026rsquo;ve written this article, the answer is definitely no. 
There have been many related discussions on Zhihu, such as Will C++ Singleton Pattern across DLLs cause problems? and Singleton Pattern BUG when mixing dynamic and static libraries. However, most of them just post solutions after encountering problems, which are scattered and lack a systematic analysis of the root causes. Therefore, I wrote this article to discuss this issue in detail.\nClarifying the Problem First, let\u0026rsquo;s clarify the problem we are discussing, taking a common C++11 Singleton Pattern implementation as an example:\nclass Singleton { public: Singleton(const Singleton\u0026amp;) = delete; Singleton\u0026amp; operator=(const Singleton\u0026amp;) = delete; static Singleton\u0026amp; instance() { static Singleton instance; return instance; } private: Singleton() = default; }; We set the default constructor to private and explicitly delete the copy constructor and assignment operator, so users can only obtain our pre-created object through the instance function and cannot create an object themselves via a constructor. The use of a static local variable is to ensure thread-safe initialization of this variable.\nHowever, in reality, a singleton object is no different from a regular global variable. In C++, both belong to static storage duration, and the compiler treats them similarly (with slight differences in initialization methods). The so-called Singleton Pattern merely uses language-level mechanisms to prevent users from accidentally creating multiple objects.\nSo, the problem we are discussing can actually be equivalent to: Are global variables in C++ unique?\nA Single Definition First, we need to distinguish between variable declaration and definition. As we all know, variable definitions generally cannot be written in header files. Otherwise, if the header file is included by multiple source files, multiple definitions will occur, leading to a multiple definition of variable error during linking. 
Therefore, we usually use extern to declare variables in header files and then define them in the corresponding source files.\nSo, how does the compiler handle global variable definitions?\nSuppose we define a global variable\nint x = 1; This actually doesn\u0026rsquo;t generate any instructions; the compiler will add a symbol x to the symbol table of the compilation unit\u0026rsquo;s (each source file\u0026rsquo;s) compiled output. It reserves 4 bytes of space for the symbol x in static storage (the specific implementation might be the bss section or rdata section, etc.). The way this memory block is filled with data depends on the initialization method (static initialization or dynamic initialization).\nSince there is only one definition, this situation is certainly globally unique.\nMultiple Definitions As we all know, C++ doesn\u0026rsquo;t have an official build system, and different libraries using different build systems make it inconvenient to use them together (CMake is the de facto standard currently). This situation has made header-only libraries increasingly popular; include and use, who doesn\u0026rsquo;t like that? However, header-only also means all code is written in header files. How can one define variables in header files such that they can be directly included by multiple source files without causing linking errors?\nBefore C++17, there was no direct way. But there were some indirect methods, considering that inline functions or template functions can have their definitions appear in multiple source files, and the C++ standard guarantees they have the same address (for related discussion, refer to Where exactly does C++ code bloat occur?). 
Thus, by defining static local variables within these functions, it effectively becomes equivalent to defining variables in header files.\ninline int\u0026amp; x() { static int x = 1; return x; } template\u0026lt;typename T = void\u0026gt; int\u0026amp; y() { static int y = 1; return y; } After C++17, we can directly use inline to mark variables, allowing their definitions to appear in multiple source files. Using it, we can directly define variables in header files.\ninline int x = 1; We know that marking a variable as static also allows its definition to appear in multiple source files. So, what\u0026rsquo;s the difference between inline and static? The key difference is that static variables have internal linkage; each compilation unit has its own instance, and their addresses will differ across compilation units. Conversely, inline variables have external linkage, and the C++ standard guarantees that the address of the same inline variable will be identical across different compilation units.\nTruly a Singleton? Practice is the sole criterion for testing truth. Let\u0026rsquo;s experiment to see if the C++ standard is deceiving us.\nExample code is as follows\n// src.cpp #include \u0026lt;cstdio\u0026gt; inline int x = 1; void foo() { printf(\u0026#34;addreress of x in src: %p\\n\u0026#34;, \u0026amp;x); } // main.cpp #include \u0026lt;cstdio\u0026gt; inline int x = 1; extern void foo(); int main() { printf(\u0026#34;addreress of x in main: %p\\n\u0026#34;, \u0026amp;x); foo(); } Let\u0026rsquo;s start simple: compile these two source files together into a single executable, and try it on Windows (MSVC) and Linux (GCC) respectively.\n# Windows: addreress of x in main: 00007FF7CF84C000 addreress of x in src: 00007FF7CF84C000 # Linux: addreress of x in main: 0x404018 addreress of x in src: 0x404018 We can see that the addresses are indeed the same. 
Next, let\u0026rsquo;s try compiling src.cpp into a dynamic library, and main.cpp links to this library, then compile and run. Let\u0026rsquo;s see if it fails when dynamic libraries are involved, as many people claim. Note that on Windows, __declspec(dllexport) must be explicitly added to foo, otherwise the dynamic library will not export this symbol.\n# Windows: addreress of x in main: 00007FF72F3FC000 addreress of x in src: 00007FFC4D91C000 # Linux: addreress of x in main: 0x404020 addreress of x in src: 0x404020 Oh no, why are the situations different for Windows and Linux?\nSymbol Export Initially, I simply thought it was a problem with the dynamic library\u0026rsquo;s default symbol export rules. Because when GCC compiles dynamic libraries, it exports all symbols by default. MSVC, on the other hand, does the opposite; it exports no symbols by default, and all must be exported manually. Clearly, only when a symbol is exported can the linker \u0026lsquo;see\u0026rsquo; it and then merge symbols from different dynamic libraries.\nWith this idea, I tried to find ways to customize symbol export on GCC and eventually found Visibility - GCC Wiki. When compiling, using -fvisibility=hidden makes all symbols hidden (not exported) by default. Then, use __attribute__((visibility(\u0026quot;default\u0026quot;))) or its C++ equivalent [[gnu::visibility(\u0026quot;default\u0026quot;)]] to explicitly mark symbols that need to be exported. So I modified the code\n// src.cpp #include \u0026lt;cstdio\u0026gt; inline int x = 1; [[gnu::visibility(\u0026#34;default\u0026#34;)]] void foo () { printf(\u0026#34;addreress of x in src: %p\\n\u0026#34;, \u0026amp;x); } // main.cpp #include \u0026lt;cstdio\u0026gt; inline int x = 1; extern void foo(); int main() { printf(\u0026#34;addreress of x in main: %p\\n\u0026#34;, \u0026amp;x); foo(); } Note that I only exported foo for function calls; neither of the inline variables were exported. 
Compile and run\naddreress of x in main: 0x404020 addreress of x in src: 0x7f5a45513010 As we expected, the addresses are indeed different. This verifies that: symbol export is a necessary condition for the linker to merge symbols, but not a sufficient one. If, on Windows, changing the default symbol export rules could lead to inline variables having the same address, then sufficiency would be verified. When I excitedly started trying, I found that things were not that simple.\nI noticed that GCC on Windows (MinGW64 toolchain) still exports all symbols by default, so according to my hypothesis, the variable addresses should be the same. The results of the attempt are as follows\naddreress of x in main: 00007ff664a68130 addreress of x in src: 00007ffef4348110 It can be seen that the results are not the same. I didn\u0026rsquo;t understand why and considered it a compiler bug. I then switched to MSVC and found that CMake provides a CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS option, which, when enabled, automatically exports all symbols (implemented via dumpbin). So I tried it, compiled and ran, and the results are as follows\naddreress of x in main: 00007FF60B11C000 addreress of x in src: 00007FFEF434C000 Oh, the results are still different. I realized my hypothesis was flawed. But after searching for a long time, I couldn\u0026rsquo;t find out why. Later, I asked in a C++ group on TG and finally got the answer.\nSimply put, in ELF, it doesn\u0026rsquo;t distinguish which .so a symbol comes from; it uses whichever is loaded first. So, when encountering multiple inline variables, it uses the first one loaded. However, the symbol table of PE files specifies which dll a certain symbol is imported from. This means that as long as a variable is dllexported, that DLL will definitely use its own variable. 
Even if multiple dlls simultaneously dllexport the same variable, they cannot be merged; the DLL format on Windows restricts this from happening.\nThe problem of symbol resolution during dynamic library linking can actually be much more complex, with many other scenarios, such as actively loading dynamic libraries via functions like dlopen. If I have time later, I might write a dedicated article to analyze this issue, so I won\u0026rsquo;t elaborate further here.\nWhat if Not Unique? Why is it necessary to ensure the uniqueness of \u0026lsquo;singleton\u0026rsquo; variables? Let\u0026rsquo;s take the C++ standard library as an example.\nAs we all know, type_info can be used to distinguish different types at runtime, and type-erasure facilities like std::function and std::any in the standard library rely on it for implementation. Its constructor and operator= are deleted, so we can only obtain a reference to the corresponding type_info object via typeid(T), with object creation handled by the compiler.\nWell, doesn\u0026rsquo;t that perfectly fit the Singleton Pattern? The next question is, how does the compiler determine if two type_info objects are the same? A typical implementation is as follows\n#if _PLATFORM_SUPPORTS_UNIQUE_TYPEINFO bool operator==(const type_info\u0026amp; __rhs) const { return __mangled_name == __rhs.__mangled_name; } #else bool operator==(const type_info\u0026amp; __rhs) const { return __mangled_name == __rhs.__mangled_name || strcmp(__mangled_name, __rhs.__mangled_name) == 0; } #endif The code above is easy to understand: if the address of type_info is guaranteed to be unique, then directly comparing __mangled_name is sufficient (since it\u0026rsquo;s const char*, it\u0026rsquo;s a pointer comparison). Otherwise, compare the addresses first, then the type names. Specifically, for the implementations of the three major standard libraries:\nlibstdc++ uses __GXX_MERGED_TYPEINFO_NAMES to control whether it\u0026rsquo;s enabled. 
libc++ uses _LIBCPP_TYPEINFO_COMPARISON_IMPLEMENTATION to determine the approach (there\u0026rsquo;s also a special BIT_FLAG mode). The MSVC STL (crt/src/vcruntime/std_type_info.cpp) always uses the second approach due to the aforementioned DLL limitations on Windows. The purpose of this example is to illustrate that the uniqueness of a singleton variable\u0026rsquo;s address affects how we write our code. If it\u0026rsquo;s not unique, we may be forced to write defensive code, which can hurt performance; if we omit that code, we risk outright logical errors.\nSolution Just raising problems isn\u0026rsquo;t enough; they need to be solved. How can we ensure singleton uniqueness?\nOn Linux, it\u0026rsquo;s simple: if the same variable appears in multiple dynamic libraries, you just need to ensure that all of these libraries make the symbol externally visible. Since the compiler\u0026rsquo;s default behavior is to make symbols externally visible, there\u0026rsquo;s generally nothing to worry about.\nWhat about Windows? It\u0026rsquo;s very troublesome. You must ensure that exactly one DLL uses dllexport to export the symbol, and all other DLLs use dllimport. This is often not easy to do; as the code grows, you might forget which DLL is responsible for exporting the symbol. What to do then? The solution is to use a dedicated DLL to manage all singleton variables: this DLL dllexports every singleton variable, and all other DLLs simply dllimport them. Subsequent additions and modifications are then made within this one DLL, which is easier to manage.\nThis concludes the article. Honestly, I\u0026rsquo;m not sure if the discussion above covers all scenarios. 
If there are any errors, feel free to leave a comment for discussion.\n","permalink":"https://www.ykiko.me/en/articles/696878184/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003e\u003cstrong\u003eSingleton Pattern\u003c/strong\u003e is a common design pattern, often applied in scenarios such as configuration systems, logging systems, and database connection pools, where object uniqueness must be ensured. But can the Singleton Pattern truly guarantee a single instance? What are the consequences if uniqueness is not guaranteed?\u003c/p\u003e","title":"Is the singleton pattern in C++ truly a 'singleton'?"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nCompiler Explorer is a very popular online C++ compiler, which can be used to test different compilation and execution environments, or to share code. As a C++ enthusiast, I interact with it almost every day, and its frequency of use far exceeds my imagination. At the same time, I am also a heavy VSCode user, completing almost everything within VSCode. Considering that I often write code locally and then copy it to Compiler Explorer, it always felt uncomfortable. Sometimes I would directly modify it on its web editor, but without code completion, that was also uncomfortable. 
Therefore, I collaborated with @iiirhe to write this extension Compiler Explorer for VSCode, which integrates Compiler Explorer into VSCode based on the API provided by Compiler Explorer, allowing users to directly use Compiler Explorer\u0026rsquo;s features within VSCode.\nYou can now search for this extension in the VSCode Marketplace.\nDemo Single File Support Let\u0026rsquo;s introduce them from top to bottom.\nThe functions of these three buttons are, in order:\nCompile All: Compiles all compiler instances Add New: Adds a new compiler instance Share Link: Generates a link based on the current compiler instance and copies it to the clipboard The functions of these four buttons are, in order:\nAdd CMake: Adds a CMake compiler instance (more details later) Clear All: Closes all webview panels used for display Load Link: Loads compiler instance information based on the input link Remove All: Removes all compiler instances The functions of these three buttons are, in order:\nRun: Compiles this compiler instance Clone: Clones this compiler instance Remove: Removes this compiler instance The following parameters are used to configure compiler instances:\nCompiler: Click the button on the right to select the compiler version Input: Select the source code file, default is active (the currently active editor) Output: Output file for compilation results, webview by default Options: Compilation options, click the button on the right to open the input box Execute Arguments: Arguments passed to the executable Stdin: Buffer for standard input Filters: Some options Multi-file Support You can add a CMake compiler instance using the Add CMake button, which can be used to compile multiple files.\nMost options are the same as for single-file compiler instances, with two additional ones:\nCMake Arguments: Arguments passed to CMake Source: Path to the folder containing CMakeLists.txt Note that since multi-file compilation requires uploading all used files to the server, we will by 
default read all files (regardless of extension) in the directory you specify. Therefore, please do not specify folders with too many files for now. Options to allow users to filter out some files might be added later, but are not available yet.\nUser Settings compiler-explorer.default.options: Default parameters when creating a compiler using the + sign\n\u0026#34;compiler-explorer.default.options\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;object\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;The default compiler configuration\u0026#34;, \u0026#34;default\u0026#34;: { \u0026#34;compiler\u0026#34;: \u0026#34;x86-64 gcc 13.2\u0026#34;, \u0026#34;language\u0026#34;: \u0026#34;c++\u0026#34;, \u0026#34;options\u0026#34;: \u0026#34;-std=c++17\u0026#34;, \u0026#34;exec\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;stdin\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;cmakeArgs\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;src\u0026#34;: \u0026#34;workspace\u0026#34;, \u0026#34;filters\u0026#34;: { \u0026#34;binaryObject\u0026#34;: false, \u0026#34;binary\u0026#34;: false, \u0026#34;execute\u0026#34;: false, \u0026#34;intel\u0026#34;: true, \u0026#34;demangle\u0026#34;: true, \u0026#34;labels\u0026#34;: true, \u0026#34;libraryCode\u0026#34;: true, \u0026#34;directives\u0026#34;: true, \u0026#34;commentOnly\u0026#34;: true, \u0026#34;trim\u0026#34;: false, \u0026#34;debugCalls\u0026#34;: false } } } compiler-explorer.default.color: Used to specify the color for highlighting assembly code\n\u0026#34;compiler-explorer.default.color\u0026#34;:{ \u0026#34;symbol\u0026#34;: \u0026#34;#61AFEF\u0026#34;, \u0026#34;string\u0026#34;: \u0026#34;#98C379\u0026#34;, \u0026#34;number\u0026#34;: \u0026#34;#D19A66\u0026#34;, \u0026#34;register\u0026#34;: \u0026#34;#E5C07B\u0026#34;, \u0026#34;instruction\u0026#34;: \u0026#34;#C678DD\u0026#34;, \u0026#34;comment\u0026#34;: \u0026#34;#7F848E\u0026#34;, \u0026#34;operator\u0026#34;: \u0026#34;#ABB2BF\u0026#34; } 
compiler-explorer.default.url: The default link loaded when opening the extension, empty by default\n\u0026#34;compiler-explorer.default.url\u0026#34;: { \u0026#34;default\u0026#34;: \u0026#34;\u0026#34; } Feedback This extension is still in its early stages. If you encounter any problems during use, or have any suggestions, please feel free to leave a message and discuss it on GitHub. Alternatively, join the QQ group: 662499937.\nhttps://qm.qq.com/q/DiO6rvnbHi (QR code auto-recognition)\nAdditionally, the Output window may provide some useful information, which you can check.\n","permalink":"https://www.ykiko.me/en/articles/694365783/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003e\u003ca href=\"https://godbolt.org/\"\u003eCompiler Explorer\u003c/a\u003e is a very popular online C++ compiler, which can be used to test different compilation and execution environments, or to share code. As a C++ enthusiast, I interact with it almost every day, and its frequency of use far exceeds my imagination. At the same time, I am also a heavy VSCode user, completing almost everything within VSCode. Considering that I often write code locally and then copy it to Compiler Explorer, it always felt uncomfortable. Sometimes I would directly modify it on its web editor, but without code completion, that was also uncomfortable. 
Therefore, I collaborated with \u003ca href=\"https://www.zhihu.com/people/32ffceca937677f7950b64e5186bb998\"\u003e@iiirhe\u003c/a\u003e to write this extension \u003ca href=\"https://marketplace.visualstudio.com/items?itemName=ykiko.vscode-compiler-explorer\"\u003eCompiler Explorer for VSCode\u003c/a\u003e, which integrates Compiler Explorer into VSCode based on the \u003ca href=\"https://github.com/compiler-explorer/compiler-explorer/blob/main/docs/API.md\"\u003eAPI\u003c/a\u003e provided by Compiler Explorer, allowing users to directly use Compiler Explorer\u0026rsquo;s features within VSCode.\u003c/p\u003e","title":"Super easy-to-use C++ Online Compiler (VSCode Version)"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nApplication Binary Interface, or ABI as we commonly call it, is a concept that feels both familiar and unfamiliar. Familiar in what sense? It\u0026rsquo;s often discussed when troubleshooting, frequently mentioned in articles, and sometimes we even have to deal with compatibility issues it causes. Unfamiliar in what sense? If someone asks you what an ABI is, you\u0026rsquo;ll find that you know what it\u0026rsquo;s about, but describing it in precise language is quite difficult. In the end, you might just resort to saying, as WIKI does: an ABI is an interface between two binary program modules. Is there a problem with that? No, as a general description, it\u0026rsquo;s sufficient. But it can feel a bit hollow.\nThis situation is not uncommon in the field of Computer Science. The author encountered the exact same situation in a previous article discussing reflection. Fundamentally, CS is not a discipline that strives for absolute rigor; many concepts lack strict definitions and are more often conventional understandings. 
So, instead of getting bogged down in definitions, let\u0026rsquo;s look at what these so-called binary interfaces actually are and what factors affect their stability.\nCPU \u0026amp; OS The final executable file ultimately runs on a specific operating system on a specific CPU. If the CPU instruction sets are different, it will certainly lead to binary incompatibility. For example, programs on ARM cannot run directly on x64 processors (unless some virtualization technology is used). What if the instruction sets are compatible? For instance, x64 processors are compatible with the x86 instruction set. Does that mean an x86 program can definitely run on an x64 operating system? This is where the operating system comes into play. Specifically, factors such as Object File Format, Data Representation, Function Calling Convention, and Runtime Library must be considered. These points can be regarded as ABI regulations at the operating system level. We will discuss the fourth point in a dedicated section later. Below, taking the x64 platform as an example, we will discuss the first three points.\nx64, x86-64, x86_64, AMD64, and Intel 64 all refer to the 64-bit version of the x86 instruction set.\nThere are two main common ABIs on the x64 platform:\nWindows x64 ABI for 64-bit Windows operating systems x86-64 System V ABI for 64-bit Linux and various UNIX-like operating systems Calling a function from a dynamic library can be simply viewed as the following three steps:\nParse the dynamic library according to a certain format. Look up the function address from the parsed result based on the symbol name. Pass function parameters and call the function. Object File Format How to parse a dynamic library? This is where the ABI\u0026rsquo;s regulations on Object File Format come into play. If you want to write your own linker, the final executable file must meet the format requirements of the corresponding platform. 
Windows x64 uses the PE32+ executable file format, which is the 64-bit version of PE32 (Portable Executable 32-bit). The System V ABI uses the ELF (Executable Linkable Format) executable file format. By using parsing libraries (or writing your own if interested), such as pe-parse and elfio, to parse actual executable files and obtain their symbol tables, we can get the mapping between function names and function addresses.\nData Representation After obtaining the function address, the next step is how to call it. Before calling, parameters must be passed, right? When passing parameters, special attention must be paid to the consistency of Data Representation. What does this mean?\nSuppose I compile the following file into a dynamic library:\nstruct X{ int a; int b; }; int foo(X x){ return x.a + x.b; } Then, a subsequent version upgrade changes the structure content, and the structure definition seen in the user\u0026rsquo;s code becomes:\nstruct X{ int a; int b; int c; }; And then it still tries to link to the dynamic library compiled from the old version code and call its function:\nint main(){ int n = foo({1, 2, 3}); printf(\u0026#34;%d\\n\u0026#34;, n); } Will it succeed? Of course, it will fail. This type of error can be considered a so-called ODR (One Definition Rule) violation. More examples will be discussed in later sections.\nThe above situation is an ODR violation caused by the user actively changing the code. But what if I don\u0026rsquo;t actively change the code, can I ensure the stability of the structure layout? This is guaranteed by the Data Representation in the ABI. For example, it specifies the size and alignment of basic types. Windows x64 specifies long as 32 bits, while System V specifies long as 64 bits. It also specifies the size and alignment of struct and union, and so on.\nNote that the C language standard still does not specify an ABI. 
For the System V ABI, it is primarily written using C language terminology and concepts, so it can be considered to provide an ABI for the C language. The Windows x64 ABI does not have a very clear boundary between C and C++.\nFunction Calling Convention Next, we come to the step of passing function parameters. We know that a function is just a piece of binary data. Executing a function simply means jumping to the function\u0026rsquo;s entry address, executing that piece of code, and then jumping back when finished. Parameter passing is nothing more than finding a place to store data, so that this location can be accessed to retrieve data both before and after the call. What locations can be chosen? There are mainly four options:\nglobal (global variables) heap (heap) register (registers) stack (stack) Using global variables for parameter passing sounds magical, but in practice, when writing code, parameters that need to be passed repeatedly, such as config, are often changed to global variables. However, it\u0026rsquo;s clear that not all parameters are suitable for global variable passing, and thread safety needs to be considered even more carefully.\nUsing the heap for parameter passing also seems incredible, but in fact, C++20\u0026rsquo;s stackless coroutines store coroutine states (function parameters, local variables) on the heap. However, for ordinary function calls, if dynamic memory allocation is required every time parameters are passed, it is indeed a bit extravagant.\nSo we mainly consider using registers and the stack for parameter passing. Having more options is always good, but not here. If the caller thinks parameters should be passed via registers, it stores the parameters in registers. But the callee thinks parameters should be passed via the stack, so it retrieves data from the stack. 
Inconsistency arises, and it\u0026rsquo;s very likely that garbage values are read from the stack, leading to logical errors in the code and program crashes.\nHow to ensure that the caller and callee pass parameters to the same location? I believe you\u0026rsquo;ve already guessed: this is where the Function Calling Convention comes into play.\nSpecifically, the calling convention specifies the following:\nOrder of function parameter passing: left-to-right or right-to-left? Method of function parameter and return value passing: via stack or registers? Which registers remain unchanged before and after the caller\u0026rsquo;s call? Who is responsible for cleaning up the stack frame: the caller or the callee? How to handle C language variadic functions? ... In 32-bit programs, there were many calling conventions, such as __cdecl, __stdcall, __fastcall, __thiscall, etc., and programs at that time suffered greatly from compatibility issues. In 64-bit programs, unification has largely been achieved. There are mainly two calling conventions, those specified by the Windows x64 ABI and the x86-64 System V ABI respectively (though they don\u0026rsquo;t have formal names). It needs to be emphasized that the function parameter passing method is only related to the calling convention, not to the code optimization level. You wouldn\u0026rsquo;t want code compiled with different optimization levels to fail when linked together, would you?\nIntroducing specific regulations can be tedious. Interested readers can refer to the relevant sections of the corresponding documentation. Below, we mainly discuss some more interesting topics.\nNote: The following discussions only apply when function calls actually occur. If a function is fully inlined, the act of passing function parameters does not happen. Currently, C++ code inlining optimization mainly occurs within the same compilation unit (single file). For code across compilation units, LTO (Link Time Optimization) must be enabled. 
Code across dynamic libraries cannot be inlined yet.\nPassing struct values smaller than 16 bytes is more efficient than passing by reference. This statement has been around for a long time, but I\u0026rsquo;ve never found the basis for it. Finally, while researching calling conventions recently, I found the reason. First, if the struct size is less than or equal to 8 bytes, it can be directly placed into a 64-bit register for parameter passing. Passing parameters via registers involves fewer memory accesses than passing by reference, making it more efficient, which is fine. What about 16 bytes? The System V ABI allows a 16-byte struct to be split into two 8-byte parts and then passed using registers separately. In this case, passing by value is indeed more efficient than passing by reference. Observe the following code:\n#include \u0026lt;cstdio\u0026gt; struct X { size_t x; size_t y; }; extern void f(X); extern void g(const X\u0026amp;); int main() { f({1, 2}); // pass by value g({1, 2}); // pass by reference } The generated code is as follows:\nmain: sub rsp, 24 mov edi, 1 mov esi, 2 call f(X) movdqa xmm0, XMMWORD PTR .LC0[rip] mov rdi, rsp movaps XMMWORD PTR [rsp], xmm0 call g(X const\u0026amp;) xor eax, eax add rsp, 24 ret .LC0: .quad 1 .quad 2 The System V ABI specifies that the first six integer parameters can be passed using rdi, rsi, rdx, rcx, r8, r9 registers, respectively. The Windows x64 ABI specifies that the first four integer parameters can be passed using rcx, rdx, r8, r9 registers, respectively. If registers are exhausted, parameters are passed via the stack. Integer parameters include char, short, int, long, long long, and other basic integer types, plus pointer types. Floating-point parameters and SIMD type parameters have dedicated registers, which are not covered in detail here.\nIt can be seen that 1 and 2 are passed to function f via registers edi and esi respectively, while g is passed the address of a temporary variable. 
However, this is only for the System V ABI. For the Windows x64 ABI, a struct whose size is not 1, 2, 4, or 8 bytes can only be passed by reference; in particular, any struct larger than 8 bytes is. The same code, compiled on Windows, yields the following result:\nmain: sub rsp, 56 lea rcx, QWORD PTR [rsp+32] mov QWORD PTR [rsp+32], 1 mov QWORD PTR [rsp+40], 2 call void f(X) lea rcx, QWORD PTR [rsp+32] mov QWORD PTR [rsp+32], 1 mov QWORD PTR [rsp+40], 2 call void g(X const \u0026amp;) xor eax, eax add rsp, 56 ret 0 It can be seen that the code generated for both function calls is exactly the same. This means that for the Windows x64 ABI, whether a struct larger than 8 bytes is passed by reference or by value, the generated code is identical.\nunique_ptr and raw_ptr have exactly the same efficiency. Well, before this, I always firmly believed this; after all, unique_ptr is just a simple wrapper around a raw pointer. It wasn\u0026rsquo;t until I watched the thought-provoking CppCon talk There are no zero-cost abstractions that I realized I had been taking it for granted. Here, we won\u0026rsquo;t discuss the extra overhead caused by exceptions (destructors requiring the compiler to generate additional code for stack frame cleanup). Let\u0026rsquo;s just discuss whether a C++ object (no larger than 8 bytes) can be passed via registers. For a completely trivial type, it\u0026rsquo;s fine; it behaves almost exactly like a C language struct. But what if it\u0026rsquo;s not trivial?\nFor example, if a custom copy constructor is defined, can it still be placed in a register? Logically, it cannot. Why? We know that C++ allows us to take the address of function parameters. If an integer parameter is passed via a register, where does the result of taking its address come from? 
Let\u0026rsquo;s experiment and find out:\n#include \u0026lt;cstdio\u0026gt; extern void f(int\u0026amp;); int g(int x) { f(x); return x; } The corresponding assembly generated is as follows:\ng(int): sub rsp, 24 mov DWORD PTR [rsp+12], edi lea rdi, [rsp+12] call f(int\u0026amp;) mov eax, DWORD PTR [rsp+12] add rsp, 24 ret It can be seen that the value in edi (used to pass the first integer parameter) is copied to the address rsp+12, which is on the stack, and then this address is passed to f. This means that if a function parameter is passed via a register, and its address is needed in some situations, the compiler will copy this parameter to the stack. However, users cannot observe these copy processes, because their copy constructors are trivial. Any optimization that does not affect the final execution result of the code complies with the as-if rule.\nNow, if this object has a user-defined copy constructor, and assuming the parameter is passed via a register, it might lead to additional copy constructor calls, and the user can observe this side effect. This is clearly unreasonable, so objects with user-defined copy constructors are not allowed to be passed via registers. What about passing via the stack? In fact, similar copying dilemmas would also be encountered. Therefore, such objects can ultimately only be passed by reference. Note that explicitly marking a copy constructor as delete also counts as a user-defined copy constructor.\nSo for unique_ptr, it can only be passed by reference. Regardless of whether you write the function signature as void f(unique_ptr\u0026lt;int\u0026gt;) or void f(unique_ptr\u0026lt;int\u0026gt;\u0026amp;), the binary code generated at the parameter passing point will be the same. However, raw pointers can be safely passed via registers. 
In summary, the efficiency of unique_ptr and raw pointers is not exactly the same.\nIn reality, the actual situation of whether a non-trivial C++ object can be passed via registers is more complex. Relevant details can be found in the corresponding sections of the respective ABIs, and are not described in detail here. Furthermore, it is not entirely clear whether the rules for how C++ objects are passed belong to the operating system\u0026rsquo;s ABI or the C++ compiler\u0026rsquo;s ABI.\nC++ Standard Finally, we\u0026rsquo;ve covered the guarantees at the operating system level. Since it leans towards the low-level, involving a lot of assembly, it might be difficult for readers not so familiar with assembly. However, the following content is basically unrelated to assembly, so you can read it with confidence.\nWe all know that the C++ standard does not explicitly specify an ABI, but it\u0026rsquo;s not entirely without rules. It does have some requirements for compiler implementations, such as:\nStruct member addresses increase according to their declaration order (explanation), which ensures that compilers do not reorder struct members. Structs satisfying the Standard Layout constraint must be layout-compatible with corresponding C structs. Structs satisfying the Trivially Copyable constraint can be copied using memmove or memcpy to obtain an identical new object. ... Furthermore, as C++ continues to release new versions, if I compile the same code using a new standard and an old standard respectively, will the results be the same (ignoring the impact of using macros to control C++ versions for conditional compilation)? This depends on the C++ standard\u0026rsquo;s guarantees for ABI compatibility. In fact, the C++ standard strives to ensure backward compatibility. 
That is, the same piece of code, compiled under the old standard and the new standard, should produce the same binary interface.\nHowever, there are a very few exceptions (I could only find these; feel free to add more in the comments):\nC++17 made noexcept part of the function type, which affects the mangled name generated for the function. C++20 introduced no_unique_address, which MSVC still doesn\u0026rsquo;t directly support because it would cause an ABI break. More often, new C++ versions introduce new language features along with new ABIs, without affecting old code. For example, two new features added in C++23:\nExplicit Object Parameter Before C++23, there was no legal way to get the address of a member function. The only thing we could do was get a member pointer (for what a member pointer is, you can refer to this article):\nstruct X { void f(int); }; auto p = \u0026amp;X::f; // p is a pointer to member function of X // type of p is void (X::*)(int) To use a member function as a callback, you could only wrap it with a lambda expression:\nstruct X { void f(int); }; using Fn = void(*)(X*, int); Fn p = [](X* self, int x) { self-\u0026gt;f(x); }; This is actually quite cumbersome and unnecessary, and this wrapping layer might lead to additional function call overhead. To some extent, this is a historical issue; on 32-bit systems, the calling convention for member functions was somewhat special (the well-known thiscall), and C++ did not have calling convention-related content, so a member function pointer was created. Old code cannot be changed for ABI compatibility, but new code can. 
C++23 added explicit object parameters, so we can now clearly define how this is passed, and even use pass-by-value:\nstruct X { // The \u0026#39;this\u0026#39; here is just a marker to distinguish it from old syntax void f(this X self, int x); // pass by value void g(this X\u0026amp; self, int x); // pass by reference }; Functions marked with explicit this can also directly obtain their function addresses, just like ordinary functions:\nauto f = \u0026amp;X::f; // type of f is void(*)(X, int) auto g = \u0026amp;X::g; // type of g is void(*)(X\u0026amp;, int) So new code can adopt this writing style, which only brings benefits and no drawbacks.\nStatic Operator() Some function objects in the standard library have no members other than an operator(), such as std::hash:\ntemplate \u0026lt;class T\u0026gt; struct hash { std::size_t operator()(T const\u0026amp; t) const; }; Although this is an empty struct, because operator() is a member function, it has an implicit this parameter. In the case of non-inlined calls, a useless this pointer still needs to be passed. This problem was solved in C++23, where a static operator() can be defined directly to avoid this issue:\ntemplate \u0026lt;class T\u0026gt; struct hash { static std::size_t operator()(T const\u0026amp; t); }; static means this is a static function, and its usage remains the same as before:\nstd::hash\u0026lt;int\u0026gt; h; std::size_t n = h(42); However, hash is just an example here. In reality, standard library code will not be modified for ABI compatibility. New code can use this feature to avoid unnecessary this passing.\nCompiler Specific Now for the main event: the implementation-defined parts. This section seems to be the most criticized content. But is that really the case? 
Let\u0026rsquo;s look at it piece by piece.\nDe Facto Standard Some abstractions in C++ ultimately need to be implemented, and if the standard doesn\u0026rsquo;t specify how to implement them, then this part is left to the compiler\u0026rsquo;s discretion. For example:\nName mangling rules (for implementing function overloading and template functions) Layout of complex types (e.g., those with virtual inheritance) Virtual function table layout RTTI implementation Exception handling ... If compilers implement these parts differently, then the binary products compiled by different compilers will naturally be incompatible and cannot be mixed.\nIn the 1990s, which was the golden age of C++ development, various vendors were dedicated to implementing their own compilers and expanding their user base, competing for users. Due to this competition, it was common for different compilers to use different ABIs. As time progressed, most of them have exited the historical stage, either stopping updates or only maintaining existing versions, no longer keeping up with new C++ standards. After the wave, only GCC, Clang, and MSVC remain as the three major compilers.\nToday, the C++ compiler ABI has largely been unified, with only two main ABIs:\nItanium C++ ABI, with publicly available documentation MSVC C++ ABI, which does not have official documentation, but there is an unofficial version available Although named Itanium C++ ABI, it is actually a cross-architecture ABI for C++. Almost all C++ compilers except MSVC use it, although there are slight differences in exception handling details. Historically, C++ compilers handled the C++ ABI in their own ways. When Intel heavily promoted Itanium, they wanted to avoid incompatibility issues, so they created a standardized ABI for all C++ vendors on Itanium. 
Later, for various reasons, GCC needed to modify its internal ABI, and given that it already supported the Itanium ABI (for Itanium processors), they chose to extend the ABI definition to all architectures instead of creating their own. Since then, all major compilers except MSVC have adopted the cross-architecture Itanium ABI, and even though the Itanium processor itself no longer receives maintenance, the ABI is still maintained.\nOn Linux platforms, both GCC and Clang use the Itanium ABI, so code compiled by these two compilers is interoperable and can be linked together and run. On Windows platforms, the situation is slightly more complex. The default MSVC toolchain uses its own ABI. However, in addition to the MSVC toolchain, GCC has also been ported to Windows, known as the MinGW toolchain, which still uses the Itanium ABI. These two ABIs are incompatible, and code compiled by them cannot be directly linked together. Clang on Windows can control which of these two ABIs to use via compilation options.\nNote: Since MinGW runs on Windows, its generated code\u0026rsquo;s calling convention naturally tries to comply with the Windows x64 ABI, and the final executable file format is also PE32+. However, the C++ ABI it uses is still the Itanium ABI, and there is no necessary connection between the two.\nConsidering the huge C++ codebase, these two C++ ABIs have largely stabilized and will not change further. Therefore, we can now actually say that C++ compilers have stable ABIs. How about that, isn\u0026rsquo;t it different from the mainstream view online? But the facts are indeed right here.\nMSVC has guaranteed ABI stability from its 2015 version onwards. GCC started using the Itanium ABI from 3.4 and has guaranteed ABI stability since then.\nWorkaround Although the basic ABI no longer changes, upgrading compiler versions can still lead to ABI breaks in compiled libraries. Why?\nThis is not difficult to understand. 
First, compilers are also software, and all software can have bugs. Sometimes, to fix bugs, some ABI breaks are forced (usually explained in detail in the release notes of new versions). For example, GCC has a compilation option -fabi-version specifically to control these different versions. Some of its contents are as follows:\nVersion 7 first appeared in G++ 4.8, treating nullptr_t as a built-in type and correcting the name encoding of lambda expressions in default argument scope. Version 8 first appeared in G++ 4.9, correcting the substitution behavior of function types with function CV qualifiers. Version 9 first appeared in G++ 5.2, correcting the alignment of nullptr_t. Additionally, users might have written some special code to work around compiler bugs, which we generally call a workaround. When the bug is fixed, these workarounds might have an adverse effect, leading to ABI incompatibility.\nImportant Options In addition, compilers provide a series of options to control their behavior, and these options may affect the ABI, such as:\n-fno-strict-aliasing: Disable strict aliasing -fno-exceptions: Disable exceptions -fno-rtti: Disable RTTI ... When linking libraries compiled with different options, compatibility issues must be especially considered. For example, if your code disables strict aliasing, but a dependent external library enables strict aliasing, pointer propagation errors are very likely to occur, leading to program errors.\nI recently encountered this situation. I was writing Python wrappers for some LLVM functions using pybind11. Pybind11 requires RTTI to be enabled, but LLVM\u0026rsquo;s default build disables exceptions and RTTI, so the code couldn\u0026rsquo;t link together. Initially, I compiled a version of LLVM with RTTI enabled, which caused binary bloat. Later, I realized this was unnecessary. 
I wasn\u0026rsquo;t actually using RTTI information for LLVM types; it was just that because they were written in the same file, the compiler thought I was using it. So, I solved it by compiling the LLVM-dependent part of the code into a separate dynamic library and then linking it with the pybind11-dependent part of the code.\nRuntime \u0026amp; Library This subsection mainly discusses the ABI stability of libraries that a C++ program depends on. Ideally, for an executable program, replacing an old version of a dynamic library with a new version should not affect its operation.\nThe three major C++ compilers each have their own standard libraries:\nMSVC corresponds to msvc stl GCC corresponds to libstdc++ Clang corresponds to libc++ As we mentioned earlier, the C++ standard tries to ensure ABI backward compatibility. Even with major updates like C++98 to C++11, the ABI of old code was not significantly affected; at the language level there is essentially no documented breaking ABI change.\nHowever, for the C++ standard library, the situation is somewhat different. From C++98 to C++11, the standard library underwent a major ABI break. C++11 tightened the requirements on some container implementations, most famously std::string: the widely used COW (copy-on-write) implementation no longer conformed to the new standard, so a new implementation had to be adopted in C++11. This resulted in an ABI break between C++98 and C++11 standard libraries. However, since then, the standard library\u0026rsquo;s ABI has generally been relatively stable, and each implementation tries to ensure this. 
Refer to the relevant pages for stl, libstdc++, and libc++ for detailed information.\nAdditionally, since RTTI and Exception can generally be turned off, these two features might be handled by separate runtime libraries, such as MSVC\u0026rsquo;s vcruntime and libc++\u0026rsquo;s libcxxabi.\nIt\u0026rsquo;s worth mentioning that libcxxabi also includes support for static local variable initialization, primarily involving the functions __cxa_guard_acquire and __cxa_guard_release. These are used to ensure that static local variables are initialized only once at runtime. If you are curious about the specific implementation, you can consult the relevant source code.\nThere are also runtime libraries responsible for some low-level functions, such as libgcc and compiler-rt.\nBesides the standard library, C++ programs generally also need to link to the C runtime:\nOn Windows, CRT must be linked. On Linux, depending on the distribution and compilation environment used, it might link to glibc or musl. The C runtime, in addition to providing the implementation of the C standard library, is also responsible for program initialization and cleanup. It is responsible for calling the main function and managing the program\u0026rsquo;s startup and termination process, including performing necessary initialization and cleanup tasks. For most software running on an operating system, linking to it is essential.\nThe ideal state is naturally to upgrade these corresponding runtime library versions when upgrading the compiler to avoid unnecessary trouble. However, in actual projects, dependencies can be very complex, potentially triggering a chain reaction.\nUser Code Finally, let\u0026rsquo;s talk about ABI issues caused by changes in user code itself. 
If you want to distribute your library in binary form, then ABI compatibility becomes very important once the user base reaches a certain size.\nIn the first subsection discussing calling conventions, we mentioned ABI incompatibility caused by changes in struct definitions. So, what if you want to ensure ABI compatibility while also leaving room for future expansion? The answer is to handle it at runtime:\nstruct X{ size_t x; size_t y; void* reserved; }; A void* pointer is used to reserve space for future extensions. Based on it, different versions can be distinguished, for example:\nvoid f(X* x) { Reserved* r = static_cast\u0026lt;Reserved*\u0026gt;(x-\u0026gt;reserved); if (r-\u0026gt;version == ...) { // do something } else if (r-\u0026gt;version == ...) { // do something else } } This way, new features can be added without affecting existing code.\nWhen exposing interfaces, special attention should be paid to types with custom destructors in function parameters. Suppose we want to expose std::vector as a return value. For example, compile the simple code below into a dynamic library, and use the /MT option to statically link the Windows CRT.\n__declspec(dllexport) std::vector\u0026lt;int\u0026gt; f() { return {1, 2, 3}; } Then we write a source file, link it to the dynamic library just compiled, and call this function:\n#include \u0026lt;vector\u0026gt; std::vector\u0026lt;int\u0026gt; f(); int main() { auto vec = f(); } Compile and run, and it crashes directly. If we recompile the dynamic library with /MD instead (dynamically linking the CRT) and then run it, everything works fine. It\u0026rsquo;s strange: why would a dependent dynamic library statically linking the CRT cause the code to crash?\nThinking about the code above, it\u0026rsquo;s not hard to see that vec\u0026rsquo;s construction actually happens inside the dynamic library, while its destruction happens inside the main function. More precisely, memory is allocated inside the dynamic library and freed inside the main function. 
However, each CRT has its own malloc and free (similar to memory between different processes). You cannot free memory allocated by CRT A with CRT B. This is the root of the problem. So, after no longer statically linking the CRT, everything is fine; both sides use the same malloc and free. This applies not only to the Windows CRT but also to glibc or musl on Linux. Example code is available here; feel free to try it yourself.\nextern \u0026ldquo;C\u0026rdquo; The situation described above can occur for any C++ type with a custom destructor. For various reasons, constructor and destructor calls crossing dynamic library boundaries break the RAII contract, leading to serious errors.\nHow to solve this? Naturally, function parameters and return values should not use types with destructors, but only POD types.\nFor example, the above example needs to be changed to:\nusing Vec = void*; __declspec(dllexport) Vec create_Vec() { return new std::vector\u0026lt;int\u0026gt;; } __declspec(dllexport) void destroy_Vec(Vec vec) { delete static_cast\u0026lt;std::vector\u0026lt;int\u0026gt;*\u0026gt;(vec); } And then usage would be like this:\nusing Vec = void*; Vec create_Vec(); void destroy_Vec(Vec vec); int main() { Vec vec = create_Vec(); destroy_Vec(vec); } In fact, we are replacing RAII with explicit C-style create/destroy functions, so that allocation and deallocation both happen on the library\u0026rsquo;s side of the boundary. Furthermore, if you want to solve the linking problem between C and C++ due to different mangling, you can use extern \u0026quot;C\u0026quot; to decorate the function:\nextern \u0026#34;C\u0026#34; { Vec create_Vec(); void destroy_Vec(Vec vec); } This way, C language can also use the exported functions mentioned above.\nHowever, if the codebase is large, encapsulating all functions into such an API is clearly unrealistic. In that case, C++ types must be exposed in the exported interfaces, and dependencies must be carefully managed (e.g., all dependent libraries are statically linked). 
The specific choice depends on the project size and complexity.\nConclusion Here, we have finally discussed the main factors affecting the ABI of C++ programs. It is clear that the C++ standard, compiler vendors, and runtime libraries are all striving to maintain ABI stability. The C++ ABI is not as bad or unstable as many people claim. For small projects, static linking with source code almost eliminates any compatibility issues. For large, long-standing projects, due to complex dependencies, upgrading certain library versions might cause program crashes. But this is not C++\u0026rsquo;s fault. Managing large projects goes beyond the language level itself; one cannot expect to solve these problems by simply changing programming languages. In fact, learning software engineering is about learning how to deal with immense complexity and how to ensure the stability of complex systems.\nThe article ends here. Thank you for reading. The author\u0026rsquo;s expertise is limited, and this article covers a wide range of topics. Please feel free to leave comments and discuss any errors.\nSome other references:\nAn Overview of ABI in Different Platforms Windows x64 ABI System V x64 ABI Itanium C++ ABI MinGW x64 Software Convention MacOS x64 ABI ARM ABI Windows ARM64 ABI RISCV ABI Go Internal ABI ","permalink":"https://www.ykiko.me/en/articles/692886292/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eApplication Binary Interface, or ABI as we commonly call it, is a concept that feels both familiar and unfamiliar. Familiar in what sense? It\u0026rsquo;s often discussed when troubleshooting, frequently mentioned in articles, and sometimes we even have to deal with compatibility issues it causes. Unfamiliar in what sense? 
If someone asks you what an ABI is, you\u0026rsquo;ll find that you know what it\u0026rsquo;s about, but describing it in precise language is quite difficult. In the end, you might just resort to saying, as \u003ca href=\"https://en.wikipedia.org/wiki/Application_binary_interface\"\u003eWIKI\u003c/a\u003e does: an ABI is an interface between two binary program modules. Is there a problem with that? No, as a general description, it\u0026rsquo;s sufficient. But it can feel a bit hollow.\u003c/p\u003e","title":"Thoroughly Understanding C++ ABI"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nReaders probably often hear people say that C++ code suffers from severe binary bloat, but usually few people point out the specific reasons. After a search online, I found that there aren\u0026rsquo;t many articles that delve deeply into this issue. The above statement is more like part of a cliché, passed down by word of mouth, but few can explain why. Today, your editor ykiko will take everyone on a journey to explore the ins and outs of C++ code bloat (^ω^)\nFirst, let\u0026rsquo;s discuss, what is code bloat? If a function is heavily inlined, the final executable file will be larger compared to not being inlined, right? Does that count as bloat? I don\u0026rsquo;t think so; this is within our expectations, acceptable, and normal behavior. Conversely, code bloat that is outside our expectations, theoretically eliminable but not eliminated due to existing implementations, I call \u0026ldquo;true code bloat\u0026rdquo;. All discussions of bloat in the following text refer to this meaning.\nDoes marking a function with inline cause bloat? First, it\u0026rsquo;s important to clarify that inline here refers to C++\u0026rsquo;s inline, whose semantic meaning as defined by the standard is to allow a function to be defined in multiple source files. 
Functions marked with inline can be defined directly in header files, and even if #included by multiple source files, they will not cause linking errors, thus conveniently supporting header-only libraries.\nCase of multiple instances Since it can be defined in multiple source files, does that mean each source file will have a separate code instance, potentially leading to code bloat?\nConsider the following example, where the comments at the beginning indicate the filename:\n// src1.cpp inline int add(int a, int b) { return a + b; } int g1(int a, int b) { return add(a, b); } // src2.cpp inline int add(int a, int b) { return a + b; } int g2(int a, int b) { return add(a, b); } // main.cpp #include \u0026lt;cstdio\u0026gt; extern int g1(int, int); extern int g2(int, int); int main() { return g1(1, 2) + g2(3, 4); } Let\u0026rsquo;s first try compiling the first two files without optimization to see if they each retain an instance of the add function.\n$ g++ -c src1.cpp -o src1.o $ g++ -c src2.cpp -o src2.o Let\u0026rsquo;s disassemble these two files and inspect the symbols they contain.\n$ objdump -d src1.o | c++filt $ objdump -d src2.o | c++filt Local verification can be done directly with the commands above. However, for convenience of demonstration, I will include the corresponding Godbolt link and screenshot, which omits many non-critical symbols that affect readability, making it clearer.\nAs you can see, these two source files each retain an instance of the add function. Then we compile main.cpp and link everything into an executable file.\n$ g++ main.cpp src1.o src2.o -o main.exe $ objdump -d main.exe | c++filt The result is shown below.\nWe find that the linker only keeps one of the two add instances, so there is no additional code bloat. Furthermore, the C++ standard requires that the definitions of inline functions in different translation units must be identical, so it makes no difference which copy of the code is kept.
But if you ask: what if the definitions are different? That would lead to an ODR violation, which is strictly speaking undefined behavior. Which one is kept might depend on the specific implementation, and even on the linking order. I might write a separate article about ODR violations soon, so I won\u0026rsquo;t go into too much detail here. Just know that the C++ standard requires inline functions to have identical definitions across different translation units; it is the programmer\u0026rsquo;s job to ensure this.\nCase of complete inlining Earlier, I specifically emphasized compiling without optimization. What happens if optimization is enabled? Using the same code as above, let\u0026rsquo;s try enabling O2 optimization. The final result is shown below.\nIt might be a bit surprising, but after enabling -O2 optimization, the add call is completely inlined. The compiler doesn\u0026rsquo;t even generate a symbol for add in the end, so naturally, there\u0026rsquo;s no add during linking. According to our previous definition, this kind of function inlining is not considered code bloat, so there is no additional binary bloat overhead.\nTo digress slightly, since neither of these files generates the add symbol, wouldn\u0026rsquo;t linking fail if another file referenced the add symbol?\nConsider the following code:\n// src1.cpp inline int add(int a, int b) { return a + b; } int g1(int a, int b) { return add(a, b); } // main.cpp inline int add(int a, int b); int main() { return g1(1, 2) + add(3, 4); } Let\u0026rsquo;s try compiling and linking the above code. We find that linking succeeds without optimization. With optimization enabled, linking will fail. The linker will tell you undefined reference to add(int, int). This is the behavior of all three major compilers.
The specific reason has been explained above: after optimization is enabled, the compiler simply doesn\u0026rsquo;t generate the add symbol, so it cannot be found during linking.\nBut what we want to know is, does this comply with the C++ standard?\nSince all three major compilers behave this way, it seems there\u0026rsquo;s no reason for it to be non-compliant. However, it\u0026rsquo;s not explicitly stated in the inline section, but the One Definition Rule states the following two sentences:\nFor an inline function or inline variable(since C++17), a definition is required in every translation unit where it is odr-used. a function is odr-used if a function call to it is made or its address is taken What do these two sentences mean? It means that if an inline function is odr-used in a certain translation unit, then that translation unit must have the definition of that function. What constitutes odr-use? The next sentence explains that if a function is called or its address is taken, it is considered odr-used.\nSo, looking at our previous code, an inline function is called in main.cpp but not defined there, which actually violates the C++ standard\u0026rsquo;s agreement. At this point, it\u0026rsquo;s a relief. Although it\u0026rsquo;s a bit counter-intuitive, it is indeed the case, and all three major compilers are correct!\nOther cases In this subsection, we mainly discussed two situations:\nIn the first case, the inline function has instances in multiple translation units (generating symbols). In this scenario, most mainstream linkers will only choose to keep one copy, so there will be no additional code bloat. The second case is when the inline function is completely inlined and no symbol is generated. In this situation, just like a regular function being inlined, it does not constitute \u0026ldquo;additional overhead\u0026rdquo;. Some might feel that C++ optimization has too many rules. 
But in reality, there\u0026rsquo;s only one core rule: the as-if rule, which means the compiler can perform any optimization as long as the final generated code behaves the same as if it were not optimized. Compilers mostly optimize according to this principle, with only a few exceptions where this principle might not be met. The optimization of inline functions mentioned above also adheres to this principle; if the address of an inline function is not explicitly taken, there\u0026rsquo;s indeed no need to retain its symbol.\nAdditionally, although inline no longer carries a mandatory inlining semantic at the standard level, it actually provides hints to the compiler, making the function more likely to be inlined. How does this hint work? As mentioned earlier, the standard wording indicates that inline functions may not generate symbols. In contrast, functions without any specifiers are implicitly marked as extern and must generate symbols. Compilers are certainly more inclined to inline functions that do not need to generate symbols. From this perspective, you might guess that static would also have a similar hint effect, and indeed it does. Of course, these are just one aspect; in reality, the calculation to determine whether a function is inlined is much more complex.\nNote: This subsection only discussed functions marked solely with inline. There are also combinations like inline static and inline extern. Interested readers can consult official documentation or experiment themselves to see their effects.\nThe true reason for code bloat caused by templates? If someone gives a reason for C++ binary bloat, their answer will almost certainly be templates. Is that really the case? How exactly do templates cause binary bloat? Under what circumstances does it occur? 
Does using them automatically lead to it?\nImplicit instantiation is like inline marking We know that template instantiation happens in the current translation unit, and each instantiation generates a copy of the code. Consider the following example:\n// src1.cpp template \u0026lt;typename T\u0026gt; int add(T a, T b) { return a + b; } float g1() { return add(1, 2) + add(3.0, 4.0); } // src2.cpp template \u0026lt;typename T\u0026gt; int add(T a, T b) { return a + b; } float g2() { return add(1, 2) + add(3.0, 4.0); } // main.cpp extern float g1(); extern float g2(); int main() { return g1() + g2(); } Still without optimization, let\u0026rsquo;s try compiling. The compilation result is as follows:\nAs you can see, just like functions marked with inline, both translation units instantiate add\u0026lt;int, int\u0026gt; and add\u0026lt;double, double\u0026gt;, each having a copy of the code. Then, during final linking, the linker only keeps one copy of the code for each template instantiation. Now let\u0026rsquo;s try enabling -O2 and see what happens. The result is as follows:\nSimilar to the effect of inline marking, the compiler directly inlines the function and discards the symbols of the instantiated functions. In this case, either the function is inlined and no symbol is generated, or a symbol is generated and the functions are eventually merged. Like inline, this situation doesn\u0026rsquo;t seem to have additional bloat. So, where exactly is the often-mentioned template bloat?\nExplicit instantiation and extern templates Before introducing the true reasons for bloat, let\u0026rsquo;s first discuss explicit instantiation.\nAlthough the linker can eventually merge multiple identical template instantiations. However, parsing template definitions, template instantiation, generating the final binary code, and the linker removing duplicate code all take compilation time. 
Sometimes, we know that we only use instantiations with a few fixed template parameters; for example, the standard library basic_string almost exclusively uses a few fixed types as template parameters. If every file that uses them has to perform template instantiation, it can significantly increase compilation time.\nCan we, like non-template functions, put the implementation in one source file and have other files reference functions from that source file? From the discussion in the previous subsection, since symbols are generated, there should be a way to link to them. But it\u0026rsquo;s not guaranteed that symbols will always be generated. Is there a way to ensure symbol generation?\nThe answer is: explicit instantiation!\nWhat is explicit instantiation? Simply put, if you use a template directly, without naming the specific specialization beforehand, and the compiler generates the definition for you, that\u0026rsquo;s implicit instantiation. If you request a specific instantiation yourself, that\u0026rsquo;s explicit instantiation. Taking a function template as an example:\ntemplate \u0026lt;typename T\u0026gt; T f(T a, T b) { return a + b; } template int f\u0026lt;int\u0026gt;(int, int); // Explicitly instantiate the f\u0026lt;int\u0026gt; definition void g() { f(1, 2); // Call the previously explicitly instantiated f\u0026lt;int\u0026gt; f(1.0, 2.0); // Implicitly instantiate f\u0026lt;double\u0026gt; } I believe it\u0026rsquo;s still easy to understand, and with an explicit instantiation definition, the compiler will definitely retain the symbol for you. Next, how to link to this explicitly instantiated function from outside?
There are two ways:\nOne way is to pair the template declaration with an explicit instantiation:\ntemplate \u0026lt;typename T\u0026gt; T f(T a, T b); template int f\u0026lt;int\u0026gt;(int, int); // Explicitly instantiate the f\u0026lt;int\u0026gt; declaration only Another way is to use the extern keyword on the explicit instantiation:\ntemplate \u0026lt;typename T\u0026gt; T f(T a, T b) { return a + b; } extern template int f\u0026lt;int\u0026gt;(int, int); // Explicit instantiation declaration // Note that without extern, this would be an explicit instantiation definition. Both of these methods can correctly reference the function f above, allowing you to call template instantiations from other files!\nThe true overhead of template bloat Now for the most important part: we will introduce the true reasons for template bloat. Due to some historical legacy issues, the three types char, unsigned char, and signed char are always distinct in C++.\nstatic_assert(!std::is_same_v\u0026lt;char, unsigned char\u0026gt;); static_assert(!std::is_same_v\u0026lt;char, signed char\u0026gt;); static_assert(!std::is_same_v\u0026lt;unsigned char, signed char\u0026gt;); However, when it comes to the compiler\u0026rsquo;s final implementation, char is either signed or unsigned. Suppose we write a template function:\ntemplate \u0026lt;typename T\u0026gt; T f(T a, T b) { return a + b; } void g() { f\u0026lt;char\u0026gt;(\u0026#39;a\u0026#39;, \u0026#39;a\u0026#39;); f\u0026lt;unsigned char\u0026gt;(\u0026#39;a\u0026#39;, \u0026#39;a\u0026#39;); f\u0026lt;signed char\u0026gt;(\u0026#39;a\u0026#39;, \u0026#39;a\u0026#39;); } Instantiating this function template for these three types means that two of the instantiations will inevitably have identical code. Will the compiler merge two functions that have different function types but generate identical binary code? Let\u0026rsquo;s try it.
The result is as follows:\nAs you can see, two identical functions are generated here, but they are not merged. Of course, if we enable -O2 optimization, such short functions will be inlined and no final symbols will be generated. As discussed in the first subsection, there would be no \u0026ldquo;template bloat overhead\u0026rdquo;. In actual code, there are many such short template functions, such as end, begin, and operator[] for containers like vector. They are highly likely to be completely inlined, thus incurring no \u0026ldquo;additional bloat\u0026rdquo; overhead.\nNow the question is, what if the function is not inlined? Suppose the template function is more complex and has a larger body. For demonstration purposes, we will temporarily use GCC\u0026rsquo;s [[gnu::noinline]] attribute to achieve this effect, then enable O2, and compile the code again:\nAs you can see, even though optimization left only one instruction, the compiler still generated three copies of the function. In reality, functions that are truly not inlined by the compiler might have a larger body, and the situation could be much worse than this \u0026ldquo;disguised large function\u0026rdquo;. Thus, this is where the so-called \u0026ldquo;template bloat\u0026rdquo; arises. Code that could have been merged was not, and this is where the true overhead of template bloat lies.\nWhat if we really want the compiler/linker to merge these identical binary codes? GNU ld does not perform such merging, but several linkers support it under the name identical code folding: gold and lld via --icf=all, and the MSVC linker via /OPT:ICF. The demonstration below uses gold, which can only link ELF-format executables, so it cannot be run on Windows.
Below, I will demonstrate how to use it to merge identical binary code:\n// main.cpp #include \u0026lt;cstdio\u0026gt; #include \u0026lt;utility\u0026gt; template \u0026lt;std::size_t I\u0026gt; struct X { std::size_t x; [[gnu::noinline]] void f() { printf(\u0026#34;X\u0026lt;%zu\u0026gt;::f() called\\n\u0026#34;, x); } }; template \u0026lt;std::size_t... Is\u0026gt; void call_f(std::index_sequence\u0026lt;Is...\u0026gt;) { ((X\u0026lt;Is\u0026gt;{Is}).f(), ...); } int main(int argc, char *argv[]) { call_f(std::make_index_sequence\u0026lt;100\u0026gt;{}); return 0; } Here, I generated 100 different types using templates, but in reality, their underlying type is size_t, so the final compiled binary code generated is completely identical. Try compiling it with the following commands:\n$ g++ -O2 -ffunction-sections -fuse-ld=gold -Wl,--icf=all main.cpp -o main.o $ objdump -d main.o | c++filt Use -fuse-ld=gold to specify the linker, and -Wl,--icf=all to specify linker options. icf stands for identical code folding. Since the linker only operates at the section level, GCC needs to be used with -ffunction-sections enabled. The compiler above can also be replaced with clang.\n0000000000000740 \u0026lt;X\u0026lt;99ul\u0026gt;::f() [clone .isra.0]\u0026gt;: 740: 48 89 fa mov %rdi,%rdx 743: 48 8d 35 1a 04 00 00 lea 0x41a(%rip),%rsi 74a: bf 01 00 00 00 mov $0x1,%edi 74f: 31 c0 xor %eax,%eax 751: e9 ca fe ff ff jmp 620 \u0026lt;_init+0x68\u0026gt; 756: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) 75d: 00 00 00 0000000000000760 \u0026lt;void call_f\u0026lt;0..99\u0026gt;(std::integer_sequence\u0026lt;unsigned long, 0..99\u0026gt;) [clone .isra.0]\u0026gt;: 760: 48 83 ec 08 sub $0x8,%rsp 764: 31 ff xor %edi,%edi 766: e8 d5 ff ff ff call 740 \u0026lt;X\u0026lt;99ul\u0026gt;::f() [clone .isra.0]\u0026gt; ... 
# repeated 98 times b48: e9 f3 fb ff ff jmp 740 \u0026lt;X\u0026lt;99ul\u0026gt;::f() [clone .isra.0]\u0026gt; b4d: 0f 1f 00 nopl (%rax) After some filtering of the output, it can be seen that gold merged 100 identical template functions into one, and the so-called \u0026ldquo;template bloat\u0026rdquo; disappeared. In contrast, linkers that do not perform such merging naturally incur this additional overhead.\nHowever, gold is not a panacea; it cannot handle some situations well. Suppose that for these 100 functions, the first 90% of the code is identical, but the last 10% is different; then it would be powerless. It simply compares the final generated binaries and merges functions that are completely identical. Are there other solutions? If there\u0026rsquo;s no automatic option, we still have manual: we C++ programmers aren\u0026rsquo;t good at much else, but we are good at driving stick.\nManually optimizing template bloat Below, taking the most commonly used vector as an example, I will demonstrate the main idea for solving template bloat. As mentioned earlier, short functions like iterator interfaces don\u0026rsquo;t need our attention. We mainly deal with functions with more complex logic. For vector, the primary candidate is the growth function.\nSuppose we have the following vector code:\ntemplate \u0026lt;typename T\u0026gt; struct vector { T* m_Begin; T* m_End; T* m_Capacity; void grow(std::size_t n); }; Consider a naive implementation of vector growth, temporarily ignoring exception safety:\ntemplate \u0026lt;typename T\u0026gt; void vector\u0026lt;T\u0026gt;::grow(std::size_t n) { std::size_t old_size = m_End - m_Begin; T* new_data = static_cast\u0026lt;T*\u0026gt;(::operator new(n * sizeof(T))); if constexpr (std::is_move_constructible_v\u0026lt;T\u0026gt;) { std::uninitialized_move(m_Begin, m_End, new_data); } else { std::uninitialized_copy(m_Begin, m_End, new_data); } std::destroy(m_Begin, m_End); ::operator delete(m_Begin); m_Begin = new_data; m_End = new_data + old_size; m_Capacity = new_data + n; } The logic seems quite simple.
But undoubtedly, it\u0026rsquo;s a relatively complex function; especially if the object\u0026rsquo;s constructor is inlined, the amount of code can be quite large. So, how to merge it? Note that the prerequisite for merging templates is to find common parts among different template instantiations. If a function generates completely different code for different types, it cannot be merged.\nFor vector, if the element types T are different, can the growth logic still be the same? Considering constructor calls, it seems there\u0026rsquo;s no way. Here\u0026rsquo;s the key point: we need to introduce the concept of trivially_relocatable. For a detailed discussion, you can refer to: A brand new constructor, the relocate constructor in C++.\nHere, we\u0026rsquo;ll just state the result: if a type is trivially_relocatable, then memcpy can be used to move it from old memory to new memory, without needing to call constructors.\nConsider writing the following growth function (the pointers are char*, so all arithmetic is already in bytes, and the byte count must be captured before begin is reassigned):\nvoid trivially_grow(char*\u0026amp; begin, char*\u0026amp; end, char*\u0026amp; capacity, std::size_t n, std::size_t size) { std::size_t used = end - begin; char* new_data = static_cast\u0026lt;char*\u0026gt;(::operator new(n * size)); std::memcpy(new_data, begin, used); ::operator delete(begin); begin = new_data; end = new_data + used; capacity = new_data + n * size; } Then, forward the original grow implementation to this function:\ntemplate \u0026lt;typename T\u0026gt; void vector\u0026lt;T\u0026gt;::grow(std::size_t n) { if constexpr (is_trivially_relocatable_v\u0026lt;T\u0026gt;) { trivially_grow(reinterpret_cast\u0026lt;char*\u0026amp;\u0026gt;(m_Begin), reinterpret_cast\u0026lt;char*\u0026amp;\u0026gt;(m_End), reinterpret_cast\u0026lt;char*\u0026amp;\u0026gt;(m_Capacity), n, sizeof(T)); } else { // Original implementation } } This completes the extraction of common logic. Thus, all Ts that satisfy trivially_relocatable can share a single copy of this code.
And almost all types that do not contain self-references meet this condition, so 99% of types use the same growth logic! The optimization effect is very significant! In fact, many LLVM container source codes, such as SmallVector, StringMap, etc., use this technique. Additionally, if you feel that the reinterpret_cast above violates strict aliasing and makes you a bit uneasy, you can achieve the same effect through inheritance (using void* for base class members). The specific code will not be shown here.\nCode bloat caused by exceptions! Why does LLVM source code disable exceptions? Many people might subconsciously think the reason is that exceptions are slow and inefficient. But in fact, according to the LLVM Coding Standard, the main purpose of disabling exceptions and RTTI is to reduce binary size. It is said that enabling exceptions and RTTI can cause LLVM\u0026rsquo;s compiled output to bloat by 10%-15%. So, what is the actual situation?\nCurrently, there are two main exception implementations: the Itanium ABI implementation and the MS ABI implementation. Simply put, the MS ABI uses a runtime lookup approach, which incurs additional runtime overhead even for exceptions in the happy path, but its advantage is that the final generated binary code is relatively smaller. The Itanium ABI, on the other hand, is our focus today. It claims zero-cost exceptions, meaning no additional runtime overhead in the happy path. But Gul\u0026rsquo;dan, what is the cost? The cost is very severe binary bloat. Why does bloat occur? Simply put, if you don\u0026rsquo;t want to wait until runtime for lookup, you have to pre-generate tables. Due to the implicit propagation nature of exceptions, these tables can occupy a large amount of space. The specific implementation details are very complex and not the topic of this article. Here\u0026rsquo;s an image to give you a general idea:\nSo, what are we mainly discussing? There\u0026rsquo;s no doubt that exceptions cause binary bloat. 
We will mainly look at how to reduce binary bloat caused by exceptions, using the Itanium ABI as an example.\nLet\u0026rsquo;s first look at the following example code:\n#include \u0026lt;vector\u0026gt; void foo(); // Externally linked function, might throw an exception void bar() { std::vector\u0026lt;int\u0026gt; v(12); // Has a non-trivial destructor foo(); } Note that foo here is an externally linked function that might throw an exception. Also, the vector\u0026rsquo;s destructor call is after foo. If foo throws an exception, and the control flow jumps to an unknown location, the vector\u0026rsquo;s destructor might be skipped. If the compiler doesn\u0026rsquo;t handle this specially, it will lead to a memory leak. Let\u0026rsquo;s first enable only -O2 and see the program\u0026rsquo;s compilation result:\nbar(): ... call operator new(unsigned long) ... call foo() ... jmp operator delete(void*, unsigned long) mov rbp, rax jmp .L2 bar() [clone .cold]: .L2: mov rdi, rbx mov esi, 48 call operator delete(void*, unsigned long) mov rdi, rbp call _Unwind_Resume Omitting the unimportant parts, it\u0026rsquo;s roughly the same as what we just guessed. So what is this .L2 for? This .L2 is actually where the program jumps after an exception is handled by catch to complete any unfinished work (here, destructing objects that haven\u0026rsquo;t been destructed yet), and then Resumes back to the previous location.\nLet\u0026rsquo;s slightly adjust the code, moving the foo call before the vector construction, keeping everything else the same:\nbar(): sub rsp, 8 call foo() mov edi, 48 call operator new(unsigned long) ... jmp operator delete(void*, unsigned long) We can see that no stack cleanup code is generated, which is reasonable. The reason is simple: if foo throws an exception, control flow jumps away directly, and vector hasn\u0026rsquo;t even been constructed, so naturally, it doesn\u0026rsquo;t need to be destructed. Simply adjusting the call order reduces binary size! 
However, dependency relationships are only obvious in such particularly simple cases. If there are many functions that actually throw exceptions, it becomes difficult to analyze.\nnoexcept Let\u0026rsquo;s first discuss noexcept, introduced in C++11. Note that even with noexcept, this function might still throw an exception. If it does, the program will terminate directly. So you might ask, what\u0026rsquo;s the use of this thing? If I throw an exception and don\u0026rsquo;t catch it, doesn\u0026rsquo;t it also terminate?\nActually, this is somewhat similar to const. If you want to modify a const variable, although it\u0026rsquo;s undefined behavior, you can freely modify it at runtime with few restrictions. So you might ask, what\u0026rsquo;s the point of const? One important meaning is to provide optimization hints to the compiler. The compiler can use this for constant folding and common subexpression elimination.\nnoexcept is similar; it allows the compiler to assume that the function will not throw exceptions, thereby enabling some additional optimizations. Taking the code from the first example again, the only change is declaring the foo function as noexcept, and then compiling again:\nbar(): push rbx mov edi, 48 call operator new(unsigned long) ... call foo() ... jmp operator delete(void*, unsigned long) As you can see, the code path for exception handling is also gone. This is thanks to noexcept.\nfno-exceptions Finally, we come to the main event: -fno-exceptions. Note that this option is non-standard. However, all three major compilers provide it, though the specific implementation effects may vary slightly. There doesn\u0026rsquo;t seem to be very detailed documentation. Based on my experience with GCC, this option prohibits the use of keywords like try, catch, throw in user code, leading to a compilation error if used. However, it specifically allows the use of the standard library. 
If an exception is thrown, just like with noexcept, the program will terminate directly. Therefore, if this option is enabled, GCC will by default assume that all functions do not throw exceptions.\nUsing the same example as above, let\u0026rsquo;s try enabling -fno-exceptions and then compiling again:\nbar(): push rbx mov edi, 48 call operator new(unsigned long) ... call foo() ... jmp operator delete(void*, unsigned long) As you can see, the effect is similar to that produced by noexcept: both make the compiler assume that a certain function will not throw exceptions, thus eliminating the need to generate additional stack cleanup code and achieving a reduction in program binary size.\nThis article covers a wide range of topics, so errors in some places are inevitable. Discussions and exchanges in the comments section are welcome.\n","permalink":"https://www.ykiko.me/en/articles/686296374/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eReaders probably often hear people say that C++ code suffers from severe binary bloat, but usually few people point out the specific reasons. After a search online, I found that there aren\u0026rsquo;t many articles that delve deeply into this issue. The above statement is more like part of a cliché, passed down by word of mouth, but few can explain why. Today, your editor ykiko will take everyone on a journey to explore the ins and outs of C++ code bloat (^ω^)\u003c/p\u003e","title":"Where exactly does C++ code bloat occur?"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nPrequel: The History of constexpr in C++! 
(Part One)\n2015-2016: Syntactic Sugar for Templates In C++, there are many templates that support full specialization, but not many that support partial specialization. In fact, only class templates and variable templates support it. Variable templates can actually be seen as syntactic sugar for class templates, so rounding it up, only class templates truly support partial specialization. The lack of partial specialization can make some code very difficult to write.\nSuppose we want to implement a destroy_at function whose effect is to call the object\u0026rsquo;s destructor. Specifically, if the destructor is trivial, we omit this meaningless destructor call.\nIntuitively, we could write code like this:\ntemplate\u0026lt;typename T, bool value = std::is_trivially_destructible_v\u0026lt;T\u0026gt;\u0026gt; void destroy_at(T* p) { p-\u0026gt;~T(); } template\u0026lt;typename T\u0026gt; void destroy_at\u0026lt;T, true\u0026gt;(T* p) {} Unfortunately, clangd can already smartly remind you: Function template partial specialization is not allowed. Function templates cannot be partially specialized, so what to do? Of course, you can wrap it in a class template to solve the problem, but having to add an extra layer every time this situation arises is truly unacceptable.\nThe old-fashioned way to solve this problem was to use SFINAE:\ntemplate\u0026lt;typename T, std::enable_if_t\u0026lt;(!std::is_trivially_destructible_v\u0026lt;T\u0026gt;)\u0026gt;* = nullptr\u0026gt; void destroy_at(T* p) { p-\u0026gt;~T(); } template\u0026lt;typename T, std::enable_if_t\u0026lt;std::is_trivially_destructible_v\u0026lt;T\u0026gt;\u0026gt;* = nullptr\u0026gt; void destroy_at(T* p) {} The specific principle will not be elaborated here. Although it reduces one layer of wrapping, there are still many elements unrelated to the code\u0026rsquo;s logic. 
std::enable_if_t here is a typical example, severely impacting code readability.\nProposal N4461 aimed to introduce static_if (borrowed from D language) to control code generation at compile time, compiling only the branches actually used into the final binary. This would allow writing code like the following, where the condition for static_if must be a constant expression:\ntemplate\u0026lt;typename T\u0026gt; void destroy_at(T* p){ static_if(!std::is_trivially_destructible_v\u0026lt;T\u0026gt;){ p-\u0026gt;~T(); } } The logic is very clear, but the committee is generally cautious about adding new keywords. Later, static_if was renamed to constexpr_if, and then it evolved into the form we are familiar with today and entered C++17:\nif constexpr (...){...} else if constexpr (...){...} else {...} This cleverly avoids a new keyword; the C++ committee really does like keyword reuse.\n2015: constexpr lambda Proposal N4487 discussed the possibility of supporting constexpr lambdas, especially hoping to use lambda expressions in constexpr computations, and included an experimental implementation.\nIn fact, supporting constexpr lambda expressions is not difficult. We all know that lambdas are very transparent in C++, essentially just anonymous function objects. If function objects can be constexpr, then supporting constexpr lambdas is a natural consequence.\nThe only thing to note is that lambdas can capture variables. What happens when a constexpr variable is captured?\nvoid foo() { constexpr int x = 3; constexpr auto foo = [=]() { return x + 1; }; static_assert(sizeof(foo) == 1); } Intuitively, since x is a constant expression, there\u0026rsquo;s no need to allocate space for it. Thus, the lambda foo would have no members, and in C++, the size of an empty class is at least 1.
The code above seems reasonable, but as mentioned in the previous part of the article, constexpr variables can also occupy memory, and we can explicitly take their address.\nvoid foo() { constexpr int x = 3; constexpr auto foo = [=]() { return \u0026amp;x + 1; }; static_assert(sizeof(foo) == 4); } In this case, the compiler has to allocate memory for x. The actual rules for this are more complex; interested readers can refer to lambda capture. This proposal was eventually accepted and entered C++17.\n2017-2019: Compile-time and Run-time\u0026hellip; Different? By continuously relaxing constexpr restrictions, more and more functions can be executed at compile time. However, functions whose definitions are not visible to the compiler (i.e., external functions, such as those shipped in precompiled libraries) cannot be executed at compile time under any circumstances. Most functions inherited from C are like this, such as memcpy, memmove, etc.\nSuppose I wrote a constexpr memcpy:\ntemplate \u0026lt;typename T\u0026gt; constexpr T* memcpy(T* dest, const T* src, std::size_t count) { for(std::size_t i = 0; i \u0026lt; count; ++i) { dest[i] = src[i]; } return dest; } It can be used at compile time, where execution efficiency is not a concern; at run time, however, it would certainly be slower than the standard library\u0026rsquo;s implementation. It would be ideal if my implementation could be used at compile time and the externally linked standard library functions at run time.\nProposal P0595 aimed to add a new magic function, constexpr(), to determine if the current function is being executed at compile time. It was later renamed to is_constant_evaluated and entered C++20. Its usage is as follows:\nconstexpr int foo(int x) { if(std::is_constant_evaluated()) { return x; } else { return x + 1; } } This way, different logic can be implemented for compile-time and run-time.
We can wrap externally linked functions, exposing them internally as constexpr function interfaces, which allows for code reuse and ensures run-time efficiency, achieving the best of both worlds.\nThe only problem is that if the foo above runs at run time, the first branch is still compiled. Although the compiler will likely optimize away the if(false) branch eventually, the branch still undergoes syntax checking and similar work. If templates are used inside it, template instantiation will still be triggered (potentially even leading to unexpected instantiations causing compilation errors), which is clearly not what we want. What if we try to rewrite the above code using if constexpr?\nconstexpr int foo(int x) { if constexpr(std::is_constant_evaluated()) { // ... } } This way of writing is obviously incorrect, because the condition of if constexpr can only be evaluated at compile time, so is_constant_evaluated will always return true here, which contradicts our initial goal. Therefore, proposal P1938R3 proposed new syntax to solve this problem:\nif consteval /* !consteval */ { // ... } else { // ... } The code looks straightforward: one branch for compile time, one for run time. This upgraded version was eventually accepted and added to C++23.\n2017-2019: Efficient Debugging One of the most criticized problems with C++ templates is that error messages are very poor and difficult to debug. After an inner template instantiation fails, the entire instantiation stack is printed, easily generating hundreds or thousands of lines of errors.
However, things haven\u0026rsquo;t really improved for constexpr functions; if a constexpr function\u0026rsquo;s constant evaluation fails, the entire function call stack is also printed.\nconstexpr int foo(){ return 13 + 2147483647; } constexpr int bar() { return foo(); } constexpr auto x = bar(); Error message:\nin \u0026#39;constexpr\u0026#39; expansion of \u0026#39;bar()\u0026#39; in \u0026#39;constexpr\u0026#39; expansion of \u0026#39;foo()\u0026#39; error: overflow in constant expression [-fpermissive] 233 | constexpr auto x = bar(); If functions are nested too deeply, the error messages are also very bad. Unlike templates, constexpr functions can also run at run time. So, we could debug the code at run time and then execute it at compile time. However, considering the is_constant_evaluated added in the previous section, this approach isn\u0026rsquo;t entirely feasible because the code logic might differ between compile-time and run-time. Proposal P0596 aimed to introduce constexpr_trace and constexpr_assert to facilitate compile-time debugging. Although the vote was unanimously in favor, it has not yet entered the C++ standard.
So, proposal P0597 came up with a compromise: first provide a magic container called std::constexpr_vector, implemented by the compiler, which supports use and modification in constexpr functions.\nconstexpr constexpr_vector\u0026lt;int\u0026gt; x; // ok constexpr constexpr_vector\u0026lt;int\u0026gt; y{ 1, 2, 3 }; // ok constexpr auto series(int n) { std::constexpr_vector\u0026lt;int\u0026gt; r{}; for(int k = 0; k \u0026lt; n; ++k) { r.push_back(k); } return r; } This didn\u0026rsquo;t completely solve the problem; users still needed to rewrite their code to support constant evaluation. Judging from the section on supporting loops in constexpr functions, such additions that increase language inconsistency are unlikely to be added to the standard. Eventually, a better proposal replaced it, which will be mentioned later.\n2018: True Compile-time Polymorphism? Proposal P1064R0 aimed to support virtual function calls in constant evaluation. Oh, dynamic memory allocation isn\u0026rsquo;t even supported yet, so why virtual function calls? Actually, polymorphic pointers can be created without relying on dynamic memory allocation; they can point to objects on the stack or static storage.\nstruct Base { virtual int foo() const { return 1; } }; struct Derived : Base { int foo() const override { return 2; } }; constexpr auto foo() { Base* p; Derived d; p = \u0026amp;d; return p-\u0026gt;foo(); } There seems to be no reason to reject the compilation of the above code. Since it\u0026rsquo;s executed at compile time, the compiler can certainly know that p points to Derived, and then call Derived::foo, which presents no practical difficulty. Indeed, a new proposal P1327R1 further aimed for dynamic_cast and typeid to also be usable in constant evaluation. Ultimately, both were accepted and added to C++20, and these features can now be freely used at compile time.
In the demo video constexpr everything, an example of processing JSON objects at compile time was shown:\nconstexpr auto jsv = R\u0026#34;({ \u0026#34;feature-x-enabled\u0026#34;: true, \u0026#34;value-of-y\u0026#34;: 1729, \u0026#34;z-options\u0026#34;: {\u0026#34;a\u0026#34;: null, \u0026#34;b\u0026#34;: \u0026#34;220 and 284\u0026#34;, \u0026#34;c\u0026#34;: [6, 28, 496]} })\u0026#34;_json; if constexpr (jsv[\u0026#34;feature-x-enabled\u0026#34;]) { // feature x } else { // feature y } The hope was to directly use constant string parsing to act as configuration files (string literals can be introduced via #include). The authors were severely hampered by the inability to use STL containers and had to write their own alternatives. By using std::array to implement containers like std::vector and std::map, without dynamic memory allocation, they could only pre-calculate the required size (potentially leading to multiple traversals) or allocate a large block of memory on the stack.\nProposal P0784R7 revisited the possibility of supporting standard library containers in constant evaluation.\nThere were three main difficulties:\nDestructors cannot be declared constexpr (for constexpr objects, they must be trivial). Inability to perform dynamic memory allocation/deallocation. Inability to use placement new to call object constructors in constant evaluation. Regarding the first issue, the authors quickly discussed and resolved it with frontend developers from MSVC, GCC, Clang, EDG, and others. Starting with C++20, types that meet the literal type requirements can have constexpr destructors, rather than strictly requiring trivial destructors.\nThe second issue was not simple to address. Many undefined behaviors in C++ are caused by incorrect memory handling; scripting languages, which cannot directly manipulate memory, are much safer by comparison. However, for code reuse, the constant evaluator in C++ compilers had to directly manipulate memory.
Since all information is known at compile time, it is theoretically possible to guarantee that memory errors (out of range, double free, memory leak, \u0026hellip;) will not occur during constant evaluation. If they do, compilation should be aborted and an error reported.\nThe constant evaluator needs to track meta-information for many objects to find these errors:\nRecord which field of a union is active; accessing an inactive member leads to undefined behavior, as clarified by P1330. Correctly record the object\u0026rsquo;s lifetime; accessing uninitialized memory or already destructed objects is not allowed. At the time, converting void* to T* was not allowed in constant evaluation, so naturally:\nvoid* operator new(std::size_t); was not supported in constant evaluation. Instead, the following was used:\n// new =\u0026gt; initialize when allocate auto pa = new int(42); delete pa; // std::allocator =\u0026gt; initialize after allocate std::allocator\u0026lt;int\u0026gt; alloc; auto pb = alloc.allocate(1); alloc.deallocate(pb, 1); Both return T* and are implemented by the compiler, which was sufficient for supporting standard library containers.\nFor the third issue, a magic function, std::construct_at, was added. Its purpose is to call an object\u0026rsquo;s constructor at a specified memory location, replacing placement new in constant evaluation. This allows us to first allocate memory via std::allocator and then construct objects via std::construct_at. This proposal was eventually accepted and entered C++20, simultaneously making std::vector and std::string available in constant evaluation (other containers are theoretically possible, but current implementations don\u0026rsquo;t support them yet; if you really want them, you\u0026rsquo;ll have to roll your own).\nAlthough dynamic memory allocation is supported, it\u0026rsquo;s not without restrictions. 
Memory allocated during a constant evaluation must be fully deallocated before that constant evaluation ends; there must be no memory leaks, otherwise it will result in a compilation error. This type of memory allocation is called transient constexpr allocation. The proposal also discussed non-transient allocation, where memory not released at compile time would be converted to static storage (essentially residing in the data segment, like global variables). However, the committee deemed this possibility \u0026ldquo;too brittle\u0026rdquo; and, for various reasons, it has not yet been adopted.\n2018: More constexpr At the time, many proposals merely aimed to mark certain parts of the standard library as constexpr. These were not discussed in this article because they followed the same pattern.\nProposal P1002 aimed to support try-catch blocks in constexpr functions. However, throw itself was still not allowed; the intent was to enable more member functions of standard library containers to be marked as constexpr.\nconstexpr int foo(){ throw 1; return 1; } constexpr auto x = foo(); // error // expression \u0026#39;\u0026lt;throw-expression\u0026gt;\u0026#39; is not a constant expression // 233 | throw 1; If a throw is reached at compile time, it directly leads to a compilation error. Since a throw can never actually happen, no exception ever needs to be caught.\n2018: Guarantee Compile-time Execution! Sometimes we want to guarantee that a function executes at compile time:\nextern int foo(int x); constexpr int bar(int x){ return x; } foo(bar(1)); // evaluate at compile time ? In fact, bar(1) could theoretically execute at either compile time or run time. To guarantee its compile-time execution, we would need to write more code:\nconstexpr auto x = bar(1); foo(x); This guarantees bar(1) executes at compile time. However, having to write such an otherwise meaningless local variable every time is tedious. Proposal P1073 aimed to add a constexpr!
specifier to ensure a function executes at compile time, causing a compilation error if not met. This specifier was eventually renamed to consteval and entered C++20.\nextern int foo(int x); consteval int bar(int x){ return x; } foo(bar(1)); // ensure evaluation at compile time Pointers and references to consteval functions cannot escape into non-constant-evaluation contexts, so the compiler backend neither needs nor should be aware of the existence of these functions. In fact, this proposal also laid the groundwork for static reflection, which is planned for future inclusion in the standard, and will add many functions that can only be executed at compile time.\n2018: Default constexpr? Proposal P1235 aimed to make all functions implicitly constexpr, with the following gradations:\nno specifier: the function is marked constexpr if possible. constexpr: Same as current behavior. constexpr(false): Cannot be called at compile time. constexpr(true): Can only be called at compile time. This proposal was ultimately not accepted.\n2020: Stronger Dynamic Memory Allocation? As previously mentioned, memory allocation is now allowed in constexpr functions, and containers like std::vector can also be used in constexpr functions.
However, due to transient memory allocation, global std::vectors cannot be created:\nconstexpr std::vector\u0026lt;int\u0026gt; v{1, 2, 3}; // error Therefore, if a constexpr function returns a std::vector, the only option is to add an extra wrapper layer that converts the std::vector into a std::array, which can then be used as a global variable:\nconstexpr auto f() { return std::vector\u0026lt;int\u0026gt;{1, 2, 3}; } constexpr auto arr = [](){ constexpr auto len = f().size(); std::array\u0026lt;int, len\u0026gt; result{}; auto temp = f(); for(std::size_t i = 0; i \u0026lt; len; ++i){ result[i] = temp[i]; } return result; }(); Proposal P1974 proposed using propconst to support non-transient memory allocation, thus eliminating the need for the aforementioned extra wrapping code.\nThe principle of non-transient memory allocation is simple:\nconstexpr std::vector vec = {1, 2, 3}; The compiler would compile the above code into something similar to this:\nconstexpr int data[3] = {1, 2, 3}; constexpr std::vector vec{ .begin = data, .end = data + 3, .capacity = data + 3 }; Essentially, it changes pointers that would normally point to dynamically allocated memory to point to static memory. The principle is not complex; the real challenge is ensuring program correctness. Clearly, the vec above should not have its destructor called even at program termination, otherwise it would lead to a segmentation fault. This problem is simple to solve: we can stipulate that any variable marked constexpr will not have its destructor called.\nHowever, consider the following scenario:\nconstexpr unique_ptr\u0026lt;unique_ptr\u0026lt;int\u0026gt;\u0026gt; ppi { new unique_ptr\u0026lt;int\u0026gt; { new int { 42 } } }; int main(){ ppi.reset(new int { 43 }); // error, ppi is const auto\u0026amp; pi = *ppi; pi.reset(new int { 43 }); // ok } Since ppi is constexpr, its destructor should not be called.
Attempting to call reset on ppi is not allowed because a constexpr marked variable implies const, and reset is not a const method. However, calling reset on pi is allowed because the outer const does not affect inner pointers.\nIf pi were allowed to call reset, this would clearly be a run-time call, performing dynamic memory allocation at run time. And since ppi\u0026rsquo;s destructor is not called, pi\u0026rsquo;s destructor inside it would also not be called, leading to a memory leak. This should clearly not be allowed.\nThe solution, naturally, is to find a way to prohibit pi from calling reset. The proposal introduced the propconst keyword, which propagates the outer const through to the inner parts, making pi also const, thus preventing reset from being called and avoiding this problem.\nUnfortunately, it has not yet been accepted into the standard. Since then, there have been new proposals hoping to support this feature, such as P2670R1, and related discussions are ongoing.\n2021: constexpr Classes Many types in the C++ standard library, such as vector, string, and unique_ptr, have all their methods marked as constexpr and can truly execute at compile time. Naturally, we hope to directly mark an entire class as constexpr, which would save the repetitive writing of specifiers.\nProposal P2350 aimed to support this feature, where all methods in a class marked constexpr are implicitly marked as constexpr:\n// before struct SomeType { constexpr bool empty() const { /* */ } constexpr auto size() const { /* */ } constexpr void clear() { /* */ } }; // after constexpr struct SomeType { bool empty() const { /* */ } auto size() const { /* */ } void clear() { /* */ } }; There\u0026rsquo;s an interesting story related to this proposal – before knowing of its existence, I (the original author of the article) proposed the same idea on stdcpp.ru.\nDuring the standardization process, many nearly identical proposals can emerge almost simultaneously.
This demonstrates the correctness of the theory of multiple discovery: certain ideas or concepts appear independently among different groups of people, as if they are floating in the air, and who discovers them first is not important. If the community is large enough, these ideas or concepts will naturally evolve.\n2023: Compile-time Type Erasure! In constant evaluation, converting void* to T* has always been disallowed, which prevented type-erased containers like std::any and std::function from being used in constant evaluation. The reason is that void* could be used to bypass the type system, converting one type to an unrelated type:\nint* p = new int(42); float* p1 = static_cast\u0026lt;float*\u0026gt;(static_cast\u0026lt;void*\u0026gt;(p)); Dereferencing p1 would actually be undefined behavior, so this conversion was prohibited (note that reinterpret_cast has always been disabled in constant evaluation). However, this approach clearly harmed correct usage, because implementations like std::any would obviously not convert a void* to an unrelated type, but rather convert it back to its original type. Completely disallowing this conversion was unreasonable. Proposal P2738R0 aimed to support this conversion in constant evaluation. Theoretically, the compiler could record the original type of a void* pointer at compile time and report an error if the conversion was not to the original type.\nThis proposal was eventually accepted and added to C++26. Now, T* -\u0026gt; void* -\u0026gt; T* conversions are allowed:\nconstexpr void f(){ int x = 42; void* p = \u0026amp;x; int* p1 = static_cast\u0026lt;int*\u0026gt;(p); // ok float* p2 = static_cast\u0026lt;float*\u0026gt;(p); // error } 2023: Support for placement new? As mentioned earlier, to support vector in constant evaluation, construct_at was added to call constructors in constant evaluation. It has the following form:\ntemplate\u0026lt;typename T, typename...
Args\u0026gt; constexpr T* construct_at(T* p, Args\u0026amp;\u0026amp;... args); While it solved the problem to some extent, it doesn\u0026rsquo;t fully provide the functionality of placement new:\nvalue initialization new (p) T(args...) // placement new version construct_at(p, args...) // construct_at version default initialization new (p) T // placement new version std::default_construct_at(p) // P2283R1 list initialization new (p) T{args...} // placement new version // construct_at version doesn\u0026#39;t exist designated initialization new (p) T{.x = 1, .y = 2} // placement new version // construct_at version cannot exist Proposal P2747R1 aims to directly support placement new in constant evaluation. It has not yet been added to the standard.\n2024-∞: The Future is Limitless! As of now, C++\u0026rsquo;s constant evaluation supports a rich set of features, including conditions, variables, loops, virtual function calls, dynamic memory allocation, and more. However, depending on the C++ version used in daily development, many features might not be available yet. You can conveniently check which version supports which features here.\nThere are still many possibilities for constexpr in the future. For example, perhaps functions like memcpy could also be used in constant evaluation? Also, some current small_vector implementations cannot become constexpr without code changes, because they use char arrays to provide storage for objects on the stack (to avoid default construction):\nconstexpr void foo(){ std::byte buf[100]; std::construct_at(reinterpret_cast\u0026lt;int*\u0026gt;(buf), 42); // error, no matter what } However, currently, objects cannot be directly constructed on char arrays in constant evaluation. Furthermore, could the implicit lifetime introduced in C++20 manifest in constant evaluation? These are theoretically possible to implement, only requiring the compiler to record more meta-information. And in the future, anything is possible!
Ultimately, we might truly be able to constexpr everything!\n","permalink":"https://www.ykiko.me/en/articles/683463723/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003ePrequel: \u003ca href=\"https://www.ykiko.me/en/articles/682031684\"\u003eThe History of constexpr in C++! (Part One)\u003c/a\u003e\u003c/p\u003e\n\u003ch2 id=\"2015-2016-syntactic-sugar-for-templates\"\u003e2015-2016: Syntactic Sugar for Templates\u003c/h2\u003e\n\u003cp\u003eIn C++, there are many templates that support \u003ca href=\"https://en.cppreference.com/w/cpp/language/template_specialization\"\u003efull specialization\u003c/a\u003e, but not many that support \u003ca href=\"https://en.cppreference.com/w/cpp/language/partial_specialization\"\u003epartial specialization\u003c/a\u003e. In fact, only class templates and variable templates support it. Variable templates can actually be seen as syntactic sugar for class templates, so rounding it up, only class templates truly support partial specialization. The lack of partial specialization can make some code very difficult to write.\u003c/p\u003e","title":"The History of constexpr in C++! (Part Two)"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nA few months ago, I wrote an article introducing C++ templates: Looking at Flowers in a Fog: A True Understanding of C++ Templates.\nIt clarified the position of templates in modern C++. Among the changes, using constexpr functions to replace templates for compile-time computation is arguably one of the most important improvements in modern C++. constexpr itself is not difficult to understand; it\u0026rsquo;s very intuitive. 
However, because it has been improved in almost every version of C++, the features available in different C++ versions vary greatly, which can sometimes give a feeling of inconsistency.\nCoincidentally, I recently came across this article: Design and evolution of constexpr in C++, which provides a comprehensive history of the development of constexpr in C++. It\u0026rsquo;s very well-written. So, I decided to translate it for the Chinese-speaking community.\nBut interestingly, this article is also a translation. The original author is Russian, and the article was first published on a Russian forum. Here is the author\u0026rsquo;s email: izaronplatz@gmail.com. I have already contacted him, and he replied:\nIt\u0026rsquo;s always good to spread knowledge in more languages.\nThis means translation is permitted. However, I don\u0026rsquo;t understand Russian, so I mainly followed the structure of the original article, but the main body is mostly my own re-narration.\nThe original content is quite long, so it is divided into two parts. This is the first part.\nIs it magical? constexpr is one of the most magical keywords in modern C++. It allows certain code to be executed at compile time.\nOver time, the capabilities of constexpr have become increasingly powerful. Now, almost all features of the standard library can be used in compile-time computations.\nThe history of constexpr can be traced back to the early versions of C++. 
By studying standard proposals and compiler source code, we can understand how this language feature was built step by step, why it exists in its current form, how constexpr expressions are actually computed, what future features might be possible, and which features could have existed but were not included in the standard.\nThis article is suitable for everyone, whether you are familiar with constexpr or not!\nC++98/03: I\u0026rsquo;m more const than you In C++, some places require integer constants (like the length of a built-in array type), and these values must be determined at compile time. The C++ standard allows constants to be constructed through simple expressions, for example:\nenum EPlants{ APRICOT = 1 \u0026lt;\u0026lt; 0, LIME = 1 \u0026lt;\u0026lt; 1, PAPAYA = 1 \u0026lt;\u0026lt; 2, TOMATO = 1 \u0026lt;\u0026lt; 3, PEPPER = 1 \u0026lt;\u0026lt; 4, FRUIT = APRICOT | LIME | PAPAYA, VEGETABLE = TOMATO | PEPPER, }; template \u0026lt;int V\u0026gt; int foo(int v = 0){ switch(v){ case 1 + 4 + 7: case 1 \u0026lt;\u0026lt; (5 | sizeof(int)): case (12 \u0026amp; 15) + PEPPER: return v; } } int f1 = foo\u0026lt;1 + 2 + 3\u0026gt;(); int f2 = foo\u0026lt;((1 \u0026lt; 2) ? 10 * 11 : VEGETABLE)\u0026gt;(); These expressions are defined in the [expr.const] section and are called constant expressions. They can only contain:\nLiterals: 1, 'A', true, ... Enum values Template parameters of integer or enum type (e.g., v in template\u0026lt;int v\u0026gt;) sizeof expressions const variables initialized by a constant expression The first few items are easy to understand, but the last one is a bit more complex. If a variable has static storage duration, its memory is normally filled with 0 and then changed when the program starts executing. 
But for the variables mentioned above, this is too late; their values need to be computed before compilation ends.\nIn C++98/03, there were two types of static initialization:\nZero initialization: memory is filled with 0 and then changed during program execution. Constant initialization: initialized with a constant expression, and the memory (if needed) is immediately filled with the computed value. All other initializations are called dynamic initialization, which we will not consider here.\nLet\u0026rsquo;s look at an example that includes both types of static initialization:\nint foo() { return 13; } const int v1 = 1 + 2 + 3 + 4; // const initialization const int v2 = 15 * v1 + 8; // const initialization const int v3 = foo() + 5; // zero initialization const int v4 = (1 \u0026lt; 2) ? 10 * v3 : 12345; // zero initialization const int v5 = (1 \u0026gt; 2) ? 10 * v3 : 12345; // const initialization The variables v1, v2, and v5 can be used as constant expressions, serving as template arguments, switch case labels, enum values, etc. v3 and v4 cannot. Even though we can clearly see that the value of foo() + 5 is 18, there was no suitable semantics to express this at the time.\nSince constant expressions are defined recursively, if any part of an expression is not a constant expression, the entire expression is not a constant expression. In this evaluation process, only the actually computed expressions are considered, which is why v5 is a constant expression, but v4 is not.\nIf the address of a constantly initialized variable is not taken, the compiler may not allocate memory for it. So we can force the compiler to reserve memory for a constantly initialized variable by taking its address (in fact, even ordinary local variables might be optimized away if their address is not explicitly taken; any optimization that does not violate the as-if rule is allowed. 
You can consider using the [[gnu::used]] attribute to prevent a variable from being optimized away).\nint main() { std::cout \u0026lt;\u0026lt; v1 \u0026lt;\u0026lt; \u0026amp;v1 \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; v2 \u0026lt;\u0026lt; \u0026amp;v2 \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; v3 \u0026lt;\u0026lt; \u0026amp;v3 \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; v4 \u0026lt;\u0026lt; \u0026amp;v4 \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; v5 \u0026lt;\u0026lt; \u0026amp;v5 \u0026lt;\u0026lt; std::endl; } Compile the above code and check the symbol table (environment is Windows x86-64):\n$ g++ --std=c++98 -c main.cpp $ objdump -t -C main.o (sec 6)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000000000 v1 (sec 6)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000000004 v2 (sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000000000 v3 (sec 3)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000000004 v4 (sec 6)(fl 0x00)(ty 0)(scl 3) (nx 0) 0x0000000000000008 v5 ---------------------------------------------------------------- (sec 3)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x0000000000000000 .bss (sec 4)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x0000000000000000 .xdata (sec 5)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x0000000000000000 .pdata (sec 6)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x0000000000000000 .rdata You can see that on my GCC 14, the zero-initialized variables v3 and v4 are placed in the .bss section, while the constant-initialized variables v1, v2, and v5 are placed in the .rdata section. The operating system protects the .rdata section, making it read-only; attempting to write to it will cause a segmentation fault.\nFrom the differences above, it\u0026rsquo;s clear that some const variables are more const than others. 
But at the time, we had no way to detect this difference (later, C++20 introduced constinit to ensure a variable undergoes constant initialization).\n0-∞: The Constant Evaluator in the Compiler To understand how constant expressions are evaluated, we need a brief overview of compiler construction. The process is roughly the same across different compilers; we will use Clang/LLVM as an example.\nIn general, a compiler can be seen as consisting of three parts:\nFront-end: Converts source code like C/C++/Rust into LLVM IR (a special intermediate representation). Clang is the compiler front-end for the C language family. Middle-end: Optimizes the LLVM IR based on relevant settings. Back-end: Converts the LLVM IR into machine code for a specific platform: x86/Arm/PowerPC, etc. For a simple programming language, you can implement a compiler in 1000 lines by calling LLVM. You only need to be responsible for implementing the language front-end; the back-end is handled by LLVM. You can even consider using existing parser generators like lex/yacc for the front-end.\nSpecifically, the work of a compiler front-end, like Clang here, can be divided into three stages:\nLexical analysis: Converts the source file into a Token Stream. For example, []() { return 13 + 37; } is converted to [, ], (, ), {, return, 13, +, 37, ;, }. Syntax analysis: Produces an Abstract Syntax Tree (AST), which converts the Token Stream from the previous step into a recursive tree-like structure like the one below. lambda-expr └── body └── return-expr └── plus-expr ├── number 13 └── number 37 Code generation: Generates LLVM IR from the given AST. Therefore, the computation of constant expressions (and related matters, like template instantiation) happens strictly in the front-end of the C++ compiler, and LLVM is not involved in such work. 
The tool that handles this processing of constant expressions (from the simple expressions of C++98 to the complex ones of C++23) is called a constant evaluator.\nOver the years, the restrictions on constant expressions have been continuously relaxed, and Clang\u0026rsquo;s constant evaluator has correspondingly become more and more complex, to the point of managing a memory model. There is an old document that describes constant evaluation in C++98/03. Since constant expressions were very simple back then, they were handled by analyzing the syntax tree for constant folding. Because all arithmetic expressions in the syntax tree have already been parsed into the form of subtrees, computing constants was simply a matter of traversing the subtrees.\nThe source code for the constant evaluator is located in lib/AST/ExprConstant.cpp, which has grown to nearly 17,000 lines at the time of this writing. Over time, it has learned to interpret many things, such as loops (EvaluateLoopBody), all of which are done on the syntax tree.\nConstant expressions have an important difference from runtime code: they must not cause undefined behavior. If the constant evaluator encounters undefined behavior, the compilation will fail.\nerror: constexpr variable \u0026#39;foo\u0026#39; must be initialized by a constant expression 2 | constexpr int foo = 13 + 2147483647; | ^ ~~~~~~~~~~~~~~~ note: value 2147483660 is outside the range of representable values of type \u0026#39;int\u0026#39; 2 | constexpr int foo = 13 + 2147483647; Therefore, they can sometimes be used to detect potential errors in a program.\n2003: Can we really be macro-free? Changes to the standard are made through proposals.\nWhere can proposals be found? What are they composed of?\nAll proposals related to the C++ standard can be found on open-std.org. Most of them have detailed descriptions and are easy to read. 
They usually consist of the following parts: - The problem currently being faced - Links to relevant wording in the standard - A solution to the aforementioned problem - Suggested changes to the standard\u0026rsquo;s wording - Links to related proposals (a proposal may have multiple versions or need to be compared with other proposals) - In advanced proposals, links to experimental implementations are often included\nYou can use these proposals to understand how each part of C++ has evolved. Not all proposals in the archive are ultimately accepted, but they all have a significant impact on the development of C++.\nAnyone can participate in the evolution of C++ by submitting new proposals.\nThe 2003 proposal N1521 Generalized Constant Expressions pointed out a problem. If a part of an expression contains a function call, the entire expression cannot be a constant expression, even if the function could ultimately be constant-folded. This forced people to use macros when dealing with complex constant expressions, and to some extent, led to the abuse of macros.\ninline int square(int x) { return x * x; } #define SQUARE(x) ((x) * (x)) square(9) std::numeric_limits\u0026lt;int\u0026gt;::max() // Theoretically usable in constant expressions, but actually not SQUARE(9) INT_MAX // Forced to use macros instead Therefore, it was proposed to introduce the concept of constant-valued functions, allowing these functions to be used in constant expressions. For a function to be a constant-valued function, it must satisfy:\ninline, non-recursive, and its return type is not void Consists only of a single return expr statement, and after replacing the function parameters in expr with constant expressions, the result is still a constant expression. 
If such a function is called with constant expression arguments, the function call expression is also a constant expression.\nint square(int x) { return x * x; } // constant-valued long long_max(int x) { return 2147483647; } // constant-valued int abs(int x) { return x \u0026lt; 0 ? -x : x; } // constant-valued int next(int x) { return ++x; } // non constant-valued With this, without modifying any code, v3 and v4 from the initial example could also be used as constant expressions, because foo would be considered a constant-valued function.\nThe proposal suggested that further support for the following situation could be considered:\nstruct cayley{ const int value; cayley(int a, int b) : value(square(a) + square(b)) {} operator int() const { return value; } }; std::bitset\u0026lt;cayley(98, -23)\u0026gt; s; // same as bitset\u0026lt;10133\u0026gt; Because the member value is totally constant, initialized in the constructor by two calls to constant-valued functions. In other words, according to the general logic of the proposal, this code could be roughly transformed into the following form (moving variables and functions outside the struct):\n// Simulate the constructor call and operator int() of cayley::cayley(98, -23) const int cayley_98_m23_value = square(98) + square(-23); int cayley_98_m23_operator_int() { return cayley_98_m23_value; } // Create bitset std::bitset\u0026lt;cayley_98_m23_operator_int()\u0026gt; s; // same as bitset\u0026lt;10133\u0026gt; But just like with variables, programmers cannot be certain whether a function is a constant-valued function; only the compiler knows.\nProposals usually do not delve into the details of how compilers should implement them. The above proposal stated that implementing it should not present any difficulties, only requiring a slight change to the constant folding that exists in most compilers. However, proposals are closely related to compiler implementation. 
If a proposal cannot be implemented in a reasonable amount of time, it is unlikely to be adopted. From a later perspective, many large proposals were eventually broken down into several smaller proposals and implemented gradually.\n2006-2007: When Everything Comes to the Surface Fortunately, three years later, a subsequent revision of this proposal, N2235, recognized that too many implicit features are bad, and programmers should have a way to ensure that a variable can be used as a constant, with a compilation error resulting if the corresponding conditions are not met.\nstruct S{ static const int size; }; const int limit = 2 * S::size; // dynamic initialization const int S::size = 256; // const initialization const int z = std::numeric_limits\u0026lt;int\u0026gt;::max(); // dynamic initialization According to the programmer\u0026rsquo;s intention, limit should be constantly initialized, but this is not the case because S::size is defined after limit, which is too late. This can be verified with constinit, which was added in C++20. constinit guarantees that a variable undergoes constant initialization, and if it cannot, a compilation error will occur.\nIn the new proposal, constant-valued functions were renamed to constexpr functions, and the requirements for them remained the same. But now, to be able to use them in constant expressions, they must be declared with the constexpr keyword. Furthermore, if the function body does not meet the relevant requirements, compilation will fail. It was also suggested that some standard library functions (like those in std::numeric_limits) be marked as constexpr, as they meet the relevant requirements. Variables or class members can also be declared as constexpr, in which case, if the variable is not initialized with a constant expression, compilation will fail.\nconstexpr constructors for user-defined classes were also legalized. 
Such a constructor must have an empty function body and initialize members with constant expressions. Implicitly generated constructors will be marked as constexpr whenever possible. For constexpr objects, the destructor must be trivial, because non-trivial destructors usually make changes in the context of a running program, and no such context exists in constexpr computation.\nHere is an example class containing constexpr:\nstruct complex { constexpr complex(double r, double i) : re(r), im(i) { } constexpr double real() { return re; } constexpr double imag() { return im; } private: double re; double im; }; constexpr complex I(0, 1); // OK In the proposal, objects like I were called user-defined literals. \u0026ldquo;Literals\u0026rdquo; are fundamental entities in C++. Just as \u0026ldquo;simple\u0026rdquo; literals (numbers, characters, etc.) are immediately embedded into assembly instructions, and string literals are stored in sections like .rodata, user-defined literals also have their place there.\nNow constexpr variables can be not only numbers and enums, but also of a literal type, which was introduced in this proposal (reference types were not yet supported). A literal type is a type that can be passed to a constexpr function, and these types are simple enough for the compiler to support them in constant computation.\nThe constexpr keyword eventually became a specifier, similar to override, used only as a marker. After discussion, it was decided not to create a new storage duration type or a new type qualifier, and it was also decided not to allow its use on function parameters, to avoid making the function overload resolution rules overly complex.\n2007: Trying to make the standard library more constexpr? In this year, proposal N2349 Constant Expressions in the Standard Library was put forward, which marked some functions and constants as constexpr, as well as some container functions, for example:\ntemplate\u0026lt;size_t N\u0026gt; class bitset{ // ... 
constexpr bitset(); constexpr bitset(unsigned long); // ... constexpr size_t size(); // ... constexpr bool operator[](size_t) const; }; The constructors initialize the class members via constant-expressions, and the other functions contain a single return statement, conforming to the current rules.\nOf all the proposals about constexpr, more than half suggest marking certain functions in the standard library as constexpr. In terms of content, they are not very interesting because they do not lead to changes in the core language rules.\n2008: Halting\u0026hellip; problem? I don\u0026rsquo;t care! constexpr unsigned int factorial(unsigned int n){ return n == 0 ? 1 : n * factorial(n - 1); } Initially, the proposal authors wanted to allow recursive calls in constexpr functions, but this was forbidden out of caution. However, during the review process, due to a change in wording, this practice was accidentally allowed. The CWG believed that recursion has enough use cases that it should be allowed. If mutual recursion between functions is allowed, then forward declarations of constexpr functions must also be allowed. When an undefined constexpr function is called in a context that requires constant evaluation, a diagnostic should be issued. This was clarified in N2826.\nSince there is recursion, infinite recursion is possible. Will a function actually recurse infinitely? In some simple cases, static analysis tools can determine if infinite recursion will occur. But in the general case, this is actually the halting problem, which is unsolvable.\nGenerally, compilers set a default recursion depth. 
If the recursion depth exceeds this default, compilation will fail.

constexpr int foo(){ return foo() + 1; }
constexpr int x = foo();

The above code results in a compilation error:

error: 'constexpr' evaluation depth exceeds maximum of 512 (use '-fconstexpr-depth=' to increase the maximum)
24 | constexpr int x = foo();

In Clang, the default depth is 512, which can be changed with -fconstexpr-depth. In fact, template instantiation has a similar depth limit. In effect, this limit can be seen as analogous to the stack size for runtime function calls; exceeding this size results in a "stack overflow," which is quite reasonable.

2010: References or Pointers?

At the time, many functions could not be marked as constexpr because their parameters contained references.

template <class T> constexpr const T& max(const T& a, const T& b); // error
constexpr pair(); // ok
pair(const T1& x, const T2& y); // error

Proposal N3039 Constexpr functions with const reference parameters aimed to allow constant references as function parameters and return values.
In fact, this was a huge change. Before this, constant evaluation only involved values, not references (or pointers). It was enough to simply operate on values. The introduction of references forced the constant evaluator to build a memory model. To support const T&, the compiler needs to create a temporary object at compile time and then bind the reference to it.
Any illegal access to this object should result in a compilation error.

template <typename T> constexpr T self(const T& a) { return *(&a); }
template <typename T> constexpr const T* self_ptr(const T& a) { return &a; }
template <typename T> constexpr const T& self_ref(const T& a) { return *(&a); }
template <typename T> constexpr const T& near_ref(const T& a) { return *(&a + 1); }

constexpr auto test1 = self(123);     // OK
constexpr auto test2 = self_ptr(123); // Fails, a pointer to a temporary object is not a constant expression
constexpr auto test3 = self_ref(123); // OK
constexpr auto test4 = near_ref(123); // Fails, out-of-bounds pointer access

2011: Why no declarations?

As mentioned earlier, a constexpr function could only consist of a single return statement. This meant that even declarations that did not affect the evaluation were not allowed. But at least three types of declarations would be helpful for writing such functions: static assertions, type aliases, and local variables initialized by constant expressions.

constexpr int f(int x){
    constexpr int magic = 42;
    return x + magic; // should be ok
}

Proposal N3268 static_assert and list-initialization in constexpr functions aimed to support these static declarations in constexpr functions.

2012: I need branches!
There are many simple functions that one would want to compute at compile time, such as calculating a to the power of n:\nint pow(int a, int n){ if (n \u0026lt; 0) throw std::range_error(\u0026#34;negative exponent for integer power\u0026#34;); if (n == 0) return 1; int sqrt = pow(a, n / 2); int result = sqrt * sqrt; if (n % 2) return result * a; return result; } However, at that time (C++11), to make it constexpr, programmers had to write a completely new version in a pure functional style (no local variables or loops):\nconstexpr int pow_helper(int a, int n, int sqrt) { return sqrt * sqrt * ((n % 2) ? a : 1); } constexpr int pow(int a, int n){ return (n \u0026lt; 0) ? throw std::range_error(\u0026#34;negative exponent for integer power\u0026#34;) : (n == 0) ? 1 : pow_helper(a, n, pow(a, n / 2)); } Proposal N3444 Relaxing syntactic constraints on constexpr functions aimed to further relax the constraints on constexpr functions to allow writing more arbitrary code.\nAllow declaration of local variables of literal type. If they are initialized via a constructor, that constructor must also be marked as constexpr. This allows the constant evaluator to cache these variables, avoiding re-evaluation of the same expressions and improving the efficiency of the constant evaluator. However, modifying these variables is not allowed. Allow local type declarations. Allow the use of if and multiple return statements, requiring each branch to have at least one return statement. Allow expression statements (statements consisting only of an expression). Allow the address or reference of a static variable as a constant expression. constexpr mutex\u0026amp; get_mutex(bool which){ static mutex m1, m2; if (which) return m1; else return m2; } constexpr mutex\u0026amp; m = get_mutex(true); // OK However, for/while loops, goto, switch, and try were not allowed, as these could create complex control flow and even infinite loops.\n2013: Only kids make choices, I want loops too! 
However, the CWG believed that supporting loops (at least for) in constexpr functions was essential. In 2013, a revised version of the proposal, Relaxing constraints on constexpr functions, was published.\nFour options were considered for implementing constexpr for.\nAdd a completely new loop syntax that interacts well with the functional programming style required by constexpr. While this solves the lack of loops, it does not eliminate programmers\u0026rsquo; dissatisfaction with the existing language (having to rewrite existing code to support constexpr). Only support traditional C-style for loops. For this, at least, changes to variables within constexpr functions would need to be supported. Only support the range-based for loop. Such loops cannot be used with user-defined iterator types unless language rules are further relaxed. Allow a consistent and broad subset of C++ to be used in constexpr functions, potentially including all of C++. The last option was chosen, which greatly influenced the subsequent development of constexpr in C++.\nTo support this option, we had to introduce mutability for variables in constexpr functions, i.e., support modifying the value of variables. According to the proposal, objects created during constant evaluation can now be changed until the end of the evaluation process or the object\u0026rsquo;s lifetime. These evaluation processes will take place in a sandbox-like virtual machine and will not affect external code. 
Therefore, in theory, the same constexpr arguments will produce the same result.\nconstexpr int f(int a){ int n = a; ++n; // ++n is not a constant expression return n * a; } int k = f(4); // OK, this is a constant expression // n in f can be modified because its lifetime // begins during the evaluation of the expression constexpr int k2 = ++k; // Error, not a constant expression, cannot modify k // because its lifetime did not begin within this expression struct X{ constexpr X() : n(5){ n *= 2; // not a constant expression } int n; }; constexpr int g(){ X x; // initialization of x is a constant expression return x.n; } constexpr int k3 = g(); // OK, this is a constant expression // x.n can be modified because // the lifetime of x begins during the evaluation of g() Additionally, I want to point out that code like this can now also compile:\nconstexpr void add(X\u0026amp; x) { x.n++; } constexpr int g(){ X x; add(x); return x.n; } Local side effects are also allowed in constant evaluation!\n2013: constexpr is not a subset of const! Currently, constexpr functions of a class are automatically marked as const.\nThe proposal constexpr member functions and implicit const points out that if a member function is constexpr, it does not necessarily have to be const. As mutability in constexpr computations becomes more important, this point becomes more prominent. But even before this, it hindered the use of the same function in both constexpr and non-constexpr code:\nstruct B{ A a; constexpr B() : a() {} constexpr const A\u0026amp; getA() const /*implicit*/ { return a; } A\u0026amp; getA() { return a; } // code duplication }; Interestingly, the proposal provided three options, and the second one was chosen:\nMaintain the status quo -\u0026gt; leads to code duplication. A function marked constexpr is not implicitly const -\u0026gt; breaks ABI, as the const signature of a member function is part of the function\u0026rsquo;s type. 
Use mutable for marking: constexpr A& getA() mutable { return a; } -> even more inconsistent.

Ultimately, option 2 was accepted. Now, if a member function is marked constexpr, it does not mean it is an implicitly const member function.
The next part is here: The History of constexpr in C++ (Part 2).

(End of "The History of constexpr in C++! (Part One)", https://www.ykiko.me/en/articles/682031684/)

How to elegantly convert enum to string in C++?

no hard code

Define an enum:

enum Color { RED, GREEN, BLUE };

Try to print:

Color color = RED;
std::cout << color << std::endl; // output => 0

If we need enums as log output, we don't want to manually look up the corresponding string based on the enum value when viewing logs, which is troublesome and not intuitive. We want to directly output the string corresponding to the enum value, such as RED, GREEN, BLUE.
Manually write a switch to convert enum to string:

std::string enum_to_string(Color color) {
    switch(color) {
        case Color::RED: return "RED";
        case Color::GREEN: return "GREEN";
        case Color::BLUE: return "BLUE";
    }
    return "Unknown";
}

However, when there are many enums, manual writing is not convenient and very tedious.
Specifically, if we want to add several enum definitions, the corresponding content in the string mapping table also needs to be modified. When the number reaches hundreds, omissions are very likely. Or if we take over someone else\u0026rsquo;s project and find that they have a lot of enums, with too much content, manual writing is very time-consuming.\nWe need to find a solution that can automatically make the relevant modifications. In other languages, such as Java, C#, and Python, this functionality can be easily achieved through reflection. However, C++ currently does not have reflection, so this path is blocked. Currently, there are three main solutions to this problem.\ntemplate The content introduced in this section has already been encapsulated by others, and you can directly use the magic_enum library. The following mainly analyzes the principle of this library. For convenience of demonstration, it will be implemented with C++20, but C++17 is actually sufficient.\nIn the three major mainstream compilers, there are some special macro variables. __PRETTY_FUNCTION__ in GCC and Clang, and __FUNCSIG__ in MSVC. These macro variables will be replaced with the function signature during compilation. 
If the function is a template function, the template instantiation information will also be output (you can also use source_location added to the C++20 standard, which has a similar effect to these macros).\ntemplate \u0026lt;typename T\u0026gt; void print_fn(){ #if __GNUC__ || __clang__ std::cout \u0026lt;\u0026lt; __PRETTY_FUNCTION__ \u0026lt;\u0026lt; std::endl; #elif _MSC_VER std::cout \u0026lt;\u0026lt; __FUNCSIG__ \u0026lt;\u0026lt; std::endl; #endif } print_fn\u0026lt;int\u0026gt;(); // gcc and clang =\u0026gt; void print_fn() [with T = int] // msvc =\u0026gt; void __cdecl print_fn\u0026lt;int\u0026gt;(void) In particular, when the template parameter is an enum constant, the name of the enum constant will be output.\ntemplate \u0026lt;auto T\u0026gt; void print_fn(){ #if __GNUC__ || __clang__ std::cout \u0026lt;\u0026lt; __PRETTY_FUNCTION__ \u0026lt;\u0026lt; std::endl; #elif _MSC_VER std::cout \u0026lt;\u0026lt; __FUNCSIG__ \u0026lt;\u0026lt; std::endl; #endif } enum Color { RED, GREEN, BLUE }; print_fn\u0026lt;RED\u0026gt;(); // gcc and clang =\u0026gt; void print_fn() [with auto T = RED] // msvc =\u0026gt; void __cdecl print_fn\u0026lt;RED\u0026gt;(void) As you can see, the enum name appears in a specific position. By simple string trimming, we can get the content we want.\ntemplate\u0026lt;auto value\u0026gt; constexpr auto enum_name(){ std::string_view name; #if __GNUC__ || __clang__ name = __PRETTY_FUNCTION__; std::size_t start = name.find(\u0026#39;=\u0026#39;) + 2; std::size_t end = name.size() - 1; name = std::string_view{ name.data() + start, end - start }; start = name.rfind(\u0026#34;::\u0026#34;); #elif _MSC_VER name = __FUNCSIG__; std::size_t start = name.find(\u0026#39;\u0026lt;\u0026#39;) + 1; std::size_t end = name.rfind(\u0026#34;\u0026gt;(\u0026#34;); name = std::string_view{ name.data() + start, end - start }; start = name.rfind(\u0026#34;::\u0026#34;); #endif return start == std::string_view::npos ? 
name : std::string_view{ name.data() + start + 2, name.size() - start - 2 }; } Test it\nenum Color { RED, GREEN, BLUE }; int main(){ std::cout \u0026lt;\u0026lt; enum_name\u0026lt;RED\u0026gt;() \u0026lt;\u0026lt; std::endl; // output =\u0026gt; RED } This successfully meets our needs. But the story doesn\u0026rsquo;t end here; this form requires the enum to be a template parameter, which means it only supports compile-time constants. However, most of the time, the enums we use are runtime variables. What to do? To convert static to dynamic, we just need to create a lookup table. Consider generating an array through template metaprogramming, where each element is the string representation of the enum corresponding to its index. One problem is how large this array should be, which requires us to get the number of enum items. A more direct approach is to define a pair of start and end markers directly within the enum, so that subtracting them directly gives the maximum number of enums. However, often we cannot modify the enum definition. Fortunately, there is a small trick to solve this problem.\nconstexpr Color color = static_cast\u0026lt;Color\u0026gt;(-1); std::cout \u0026lt;\u0026lt; enum_name\u0026lt;color\u0026gt;() \u0026lt;\u0026lt; std::endl; // output =\u0026gt; (Color)2 As you can see, if an integer does not have a corresponding enum item, then the corresponding enum name will not be output, but rather a parenthesized cast expression. This way, we just need to check if the resulting string contains ) to know if the corresponding enum item exists. 
We can recursively determine the largest enum value (this search method has limited applicability, e.g., for scattered enum values, it might be relatively difficult).\ntemplate\u0026lt;typename T, std::size_t N = 0\u0026gt; constexpr auto enum_max(){ constexpr auto value = static_cast\u0026lt;T\u0026gt;(N); if constexpr (enum_name\u0026lt;value\u0026gt;().find(\u0026#34;)\u0026#34;) == std::string_view::npos) return enum_max\u0026lt;T, N + 1\u0026gt;(); else return N; } Then, generate a corresponding length array using make_index_sequence.\ntemplate\u0026lt;typename T\u0026gt; requires std::is_enum_v\u0026lt;T\u0026gt; constexpr auto enum_name(T value){ constexpr auto num = enum_max\u0026lt;T\u0026gt;(); constexpr auto names = []\u0026lt;std::size_t... Is\u0026gt;(std::index_sequence\u0026lt;Is...\u0026gt;){ return std::array\u0026lt;std::string_view, num\u0026gt;{ enum_name\u0026lt;static_cast\u0026lt;T\u0026gt;(Is)\u0026gt;()... }; }(std::make_index_sequence\u0026lt;num\u0026gt;{}); return names[static_cast\u0026lt;std::size_t\u0026gt;(value)]; } Let\u0026rsquo;s test it.\nenum Color { RED, GREEN, BLUE }; int main(){ Color color = RED; std::cout \u0026lt;\u0026lt; enum_name(color) \u0026lt;\u0026lt; std::endl; // output =\u0026gt; RED } Further, we could consider supporting bitwidth enums, i.e., enums of the form RED | BLUE, but we won\u0026rsquo;t go into that here.\nThe disadvantage of this method is obvious: generating a lookup table through template instantiation can significantly slow down compilation. If the number of items in the enum is large, on some compilers with low constant evaluation efficiency, such as MSVC, it may increase compilation time by tens of seconds or even longer. Therefore, it is generally only suitable for small enums. The advantage is that it is lightweight and ready to use, requiring no other actions.\ncode generation Since manually writing string-to-enum conversions is troublesome, why not write a script to generate the code? 
Indeed, we can easily accomplish this using libclang's Python bindings. For details on how to use this tool, you can refer to Use clang tools to freely control C++ code. Below, only the code demonstrating the effect is shown.

import clang.cindex as CX

def generate_enum_to_string(enum: CX.Cursor):
    branches = ""
    for child in enum.get_children():
        branches += f'case {child.enum_value}: return "{child.spelling}";\n'
    code = f"""
std::string_view {enum.spelling}_to_string({enum.spelling} color) {{
    switch(color) {{
        {branches}
    }}
}}"""
    return code

def traverse(node: CX.Cursor):
    if node.kind == CX.CursorKind.ENUM_DECL:
        print(generate_enum_to_string(node))
        return
    for child in node.get_children():
        traverse(child)

index = CX.Index.create()
tu = index.parse('main.cpp')
traverse(tu.cursor)

Test code:

// main.cpp
enum Color { RED, GREEN, BLUE };

This is the final generated code. You can directly generate a .cpp file, place it in a fixed directory, and then run this script before building.

std::string_view Color_to_string(Color color) {
    switch(color) {
        case 0: return "RED";
        case 1: return "GREEN";
        case 2: return "BLUE";
    }
}

Advantages: Non-intrusive, can be used for a large number of enums. Disadvantages: Has external dependencies, requires integrating code generation into the build process. This might make the build process very complex.

xmacro

The above two methods are non-intrusive. That is, you might get someone else's library and cannot modify its code, so you have to do it this way. What if you define the enums yourself entirely? In fact, you can handle them specially during the definition phase to facilitate subsequent use. For example (comments at the beginning of the code indicate the current filename):

// Color.def
#ifndef COLOR_ENUM
#define COLOR_ENUM(...)
#endif

COLOR_ENUM(RED)
COLOR_ENUM(GREEN)
COLOR_ENUM(BLUE)

#undef COLOR_ENUM

Then, where it needs to be used, you can generate code by modifying the macro definition.

// Color.h
enum Color {
#define COLOR_ENUM(x) x,
#include "Color.def"
};

std::string_view color_to_string(Color value){
    switch(value){
#define COLOR_ENUM(x) case x: return #x;
#include "Color.def"
    }
    return "Unknown";
}

This way, you only need to add and modify relevant content in the def file. If you need to iterate through the enum later, you can also directly define a macro to generate the code, which is very convenient. In fact, for a large number of enums, many open-source projects adopt this solution. For example, when Clang defines TokenKind, it does so. Please refer to Token.def for the relevant code. Since Clang needs to adapt to multiple language frontends, the total number of TokenKinds reaches several hundred. If this approach were not used, adding and modifying Tokens would be extremely difficult.

conclusion

Non-intrusive and a small number of enums, compilation speed is not very important: use template lookup tables (requires at least C++17).
Non-intrusive and a large number of enums, compilation speed is important: use external code generation.
Intrusive: directly use macros.

Year after year, we await reflection, still unsure when it will enter the standard. For those interested in learning about C++ static reflection in advance, you can read Analysis of C++26 Static Reflection Proposal. Or for those who don't know what reflection is, you can refer to this article: Reflection Tutorial for C++ Programmers.

(End of "How to elegantly convert enum to string in C++?", https://www.ykiko.me/en/articles/680412313/)

As is well known, there are currently two special constructors in C++: the copy constructor and the move constructor.
The copy constructor was added in C++98 to copy an object.
For resource-owning types like vector, copying involves copying the resources it owns.\nstd::vector\u0026lt;int\u0026gt; v1 = {1, 2, 3}; std::vector\u0026lt;int\u0026gt; v2 = v1; // copy Of course, the overhead of copying can sometimes be very large and completely unnecessary. Therefore, C++11 introduced the move constructor to transfer an object\u0026rsquo;s resources to another object. This results in much lower overhead compared to direct copying.\nstd::vector\u0026lt;int\u0026gt; v1 = {1, 2, 3}; std::vector\u0026lt;int\u0026gt; v2 = std::move(v1); // move Note that move in C++ is called non-destructive move. The C++ standard specifies that the state of an object after being moved is a valid state, and the implementation must ensure that its destructor can be called normally. A moved-from object may still be used again (whether it can be used depends on the implementation).\nIs that all? Are these two constructors enough? Certainly not. In fact, there\u0026rsquo;s another widely used operation that can be called relocate. Consider the following scenario:\nSuppose you are implementing a vector, and resizing is necessary. So you write a private member function grow for resizing (the following code example temporarily ignores exception safety).\nvoid grow(std::size_t new_capacity) { auto new_data = static_cast\u0026lt;T*\u0026gt;(malloc(new_capacity * sizeof(T))); for (std::size_t i = 0; i \u0026lt; m_Size; ++i) { new (new_data + i) T(std::move(m_Data[i])); m_Data[i].~T(); } free(m_Data); m_Data = new_data; m_Capacity = new_capacity; } The code above is simple: first, allocate new memory using malloc (cast to T* so that the pointer arithmetic and the later assignment to m_Data are well-formed), then initialize objects in the newly allocated memory by calling the move constructor via placement new. Note, as mentioned earlier: move in C++ is non-destructive, so after calling the move constructor, the original object still needs its destructor called to correctly end its lifetime.
Finally, free the original memory and update the member variables.\nNote: The construction and destruction steps can also use std::construct_at and std::destroy_at introduced in C++20, which are essentially wrappers for placement new and destroy.\nHowever, this implementation is not efficient. In C++, there is a concept called trivially copyable, which can be checked using the is_trivially_copyable trait. For types satisfying this constraint, a new object can be created by directly using memcpy or memmove. Consider this example:\nstruct Point { int x; int y; }; static_assert(std::is_trivially_copyable_v\u0026lt;Point\u0026gt;); Point points[3] = {{1, 2}, {3, 4}, {5, 6}}; Point new_points[3]; std::memcpy(new_points, points, sizeof(points)); This not only saves multiple function calls, but memcpy and memmove themselves are highly optimized builtin functions (which can be vectorized using SIMD). Therefore, their efficiency is much higher compared to direct copying via copy constructors.\nTo make our vector faster, we can also apply this optimization. 
Using if constexpr introduced in C++17 for compile-time checks, we can easily write the following code:\nvoid grow(std::size_t new_capacity) { auto new_data = static_cast\u0026lt;T*\u0026gt;(malloc(new_capacity * sizeof(T))); if constexpr (std::is_trivially_copyable_v\u0026lt;T\u0026gt;) { std::memcpy(new_data, m_Data, m_Size * sizeof(T)); } else if constexpr (std::is_move_constructible_v\u0026lt;T\u0026gt;) { for (std::size_t i = 0; i \u0026lt; m_Size; ++i) { std::construct_at(new_data + i, std::move(m_Data[i])); std::destroy_at(m_Data + i); } } else if constexpr (std::is_copy_constructible_v\u0026lt;T\u0026gt;) { for (std::size_t i = 0; i \u0026lt; m_Size; ++i) { std::construct_at(new_data + i, m_Data[i]); std::destroy_at(m_Data + i); } } free(m_Data); m_Data = new_data; m_Capacity = new_capacity; } Note: One could also consider directly using uninitialized_move_n and destroy_n introduced in C++17 to avoid reinventing the wheel, as these functions already include similar optimizations. However, due to pointer aliasing issues, they might at most optimize to memmove. In the context of vector resizing, it can be further optimized to memcpy, so optimizing it ourselves yields better results.\nOverkill This feels a bit strange. Our main goal is to move all objects from old memory to new memory, but we are using the trivially copyable trait, which seems too restrictive. Creating a completely new object and relocating an existing object to a new position feel quite different. Consider the following example. It seems that directly memcpying types like std::string is also possible.
Since we manually manage memory and manually call destructors, there won\u0026rsquo;t be multiple destructor calls.\nalignas(std::string) std::byte buffer[sizeof(std::string)]; auto\u0026amp; str1 = *std::construct_at((std::string*) buffer, \u0026#34;hello world\u0026#34;); alignas(std::string) std::byte new_buffer[sizeof(std::string)]; std::memcpy(new_buffer, buffer, sizeof(std::string)); auto\u0026amp; str2 = *(std::string*) new_buffer; str2.~basic_string(); Carefully considering the data flow and destructor calls, we find nothing amiss. It seems we should look for a concept called \u0026ldquo;trivially moveable\u0026rdquo; to relax the conditions and allow more types to be optimized. Unfortunately, there is no such concept in the current C++ standard. To distinguish it from the existing C++ move operation, we call this operation \u0026ldquo;relocate,\u0026rdquo; meaning to place the original object in a completely new location.\nIn fact, many famous open-source components have also implemented similar functionalities through template specialization, such as:\nBSL\u0026rsquo;s bslmf::IsBitwiseMoveable\u0026lt;T\u0026gt; Folly\u0026rsquo;s folly::IsRelocatable\u0026lt;T\u0026gt; QT\u0026rsquo;s QTypeInfo\u0026lt;T\u0026gt;::isRelocatable By marking specific types, they can benefit from this optimization. However, the above optimization is only logically equivalent in our minds; strictly speaking, writing it this way is currently undefined behavior in C++. So what to do? We can only try to introduce a new proposal and modify the standard wording to support the above optimization.\nCurrent Status First, this problem was discovered a long time ago. For example, there have been related discussions on Zhihu for a while:\nCompared to malloc new / free old, how much performance advantage does realloc have? Why doesn\u0026rsquo;t C++ vector\u0026rsquo;s push_back resizing mechanism consider allocating memory after the tail element? There are quite a few similar issues.
realloc attempts to resize in place; if it fails, it tries to allocate a new block of memory and then uses memcpy to copy the original data to the new memory. So, in the current C++ standard, if you want to use realloc directly for resizing, you must ensure that the object is trivially copyable. Of course, as mentioned earlier, this condition is quite strict, and a new concept needs to be introduced to relax it.\nRelated proposals were first put forward in 2015. The main active proposals in 2023 (all targeting C++26) are the following four:\nstd::is_trivially_relocatable Trivial Relocatability For C++26 Relocating prvalues Nontrivial Relocation via a New owning reference Type They can roughly be divided into two factions: conservatives and radicals.\nConservatives The conservative solution is to add the concepts of relocatable and trivially-relocatable, along with corresponding traits for checking.\nA type is relocatable if it is move-constructible and destructible.\nA type is trivially-relocatable if it satisfies one of the following conditions:\nIt is a trivially-copyable type. It is an array of trivially-relocatable types. It is a class type declared with the trivially_relocatable attribute having a true value. It is a class type satisfying the following conditions: No user-provided move constructor or move assignment operator. No user-provided copy constructor or copy assignment operator. No user-provided destructor. No virtual member functions. No virtual base classes. Every member is a reference or a trivially-relocatable type, and all base classes are trivially-relocatable types. A new attribute, trivially_relocatable, can be used to explicitly mark a type as trivially-relocatable. 
It can take a constant expression as an argument to support generic types.\ntemplate\u0026lt;typename T\u0026gt; struct [[trivially_relocatable(std::is_trivially_relocatable_v\u0026lt;T\u0026gt;)]] X { T t; }; Some new operations have also been added:\ntemplate\u0026lt;class T\u0026gt; T *relocate_at(T* source, T* dest); template\u0026lt;class T\u0026gt; [[nodiscard]] remove_cv_t\u0026lt;T\u0026gt; relocate(T* source); // ... template\u0026lt;class InputIterator, class Size, class NoThrowForwardIterator\u0026gt; auto uninitialized_relocate_n(InputIterator first, Size n, NoThrowForwardIterator result); These functions are implemented by the compiler and effectively perform a move + destroy of the original object. They also allow the compiler, under the as-if rule, to optimize operations on trivially_relocatable types into memcpy or memmove. For structures that cannot be optimized, such as those containing self-references, the move constructor + destructor is called normally. This way, when implementing vector, using these standard library functions directly allows for optimization.\nThis proposal is called conservative primarily because it does not affect existing APIs or ABIs, offering strong compatibility and ease of introduction.\nRadicals The more radical approach, which is the main topic today, advocates for introducing a relocate constructor and a new keyword reloc.\nreloc is a unary operator that can be used on non-static local variables of functions. reloc performs the following operations:\nIf the variable is a reference type, it performs perfect forwarding. Otherwise, it converts the source object into a prvalue and returns it. Furthermore, using an object again after it has been relocated is a compile-time error (the actual rules for determination are more detailed; see the relevant sections in the proposal).\nA new constructor, the relocate constructor, is introduced with the form T(T), where the function parameter is a prvalue of type T.
This signature was chosen to complete the C++ value category system. Under this scheme, copy constructors create objects from lvalues, move constructors create objects from xvalues, and relocate constructors create objects from prvalues. This completely covers all value categories, is very friendly to overload resolution, and is semantically harmonious.\nstruct X { std::string s; X(X x): s(std::move(x.s)) {} }; Another benefit is that this form of constructor T(T) is currently disallowed, so it won\u0026rsquo;t conflict with existing code. One point to note: you might have heard people explain why copy constructor parameters must be references. The reason given is that if it\u0026rsquo;s not a reference, function arguments would also need to be copied, leading to infinite recursion.\nIn fact, this explanation is outdated. Due to mandatory copy elision introduced in C++17, even if a type has no copy or move constructor, it can be constructed directly from a prvalue without any copy/move constructor calls.\nstruct X { X() = default; X(const X\u0026amp;) = delete; X(X\u0026amp;\u0026amp;) = delete; }; X f(){ return X{}; } X x = f(); The above code compiles successfully with major compilers when C++17 is enabled. Therefore, the T(T) form of constructor will not lead to infinite recursion here. This proposal also introduces a relocate assignment operator, with the form T\u0026amp; operator=(T), where the function parameter is a prvalue of type T. Of course, there is also the concept of trivially-relocatable, which allows relocate constructors satisfying this condition to be optimized to memcpy. However, whether a type qualifies is determined by fixed rules, just like the relocate constructor itself, and users cannot explicitly mark it with an attribute. I think this is not ideal; users should be allowed to manually mark a type as trivially-relocatable.
tuple cannot be trivially-copyable due to current implementation limitations, as it must have a constructor, and pair is also not trivially-copyable, which is clearly unreasonable. So I hope this proposal will eventually support marking a type as trivially-relocatable via an attribute.\nI personally quite like this proposal. With it, I even feel that the C++ value category system can be associated with elegance. Before this, I always thought the value category system was chaotic and evil, a messy patch to maintain compatibility with old code. But if this proposal passes:\nLvalue — Copy construction Xvalue — Move construction Prvalue — Relocate construction This has a sense of complete logical self-consistency and beauty. Other details in the proposal are more trivial, so I will omit them here. Interested readers can read them themselves.\nWhy has it taken so long to enter the standard? Regarding why this problem has not been solved after so many years, it\u0026rsquo;s actually a rather long history, caused by flaws in the C++ object model. Until the implicit lifetime proposal was accepted in C++20, even optimizing trivially-copyable types to memcpy in the initial grow function implementation was undefined behavior.\nOf course, don\u0026rsquo;t be afraid of \u0026ldquo;undefined behavior\u0026rdquo; as if it\u0026rsquo;s an insurmountable obstacle. In fact, this has always been considered a defect in the standard. This optimization has long been widely practiced in various codebases, and its reliability has been verified. It\u0026rsquo;s just that the C++ standard has never had appropriate wording to describe this situation. Considering it completely UB is certainly incorrect, and using it without restrictions is also incorrect. So the key is how to find an appropriate boundary between the two. 
I will write a dedicated article soon to introduce C++ object model related content, so I won\u0026rsquo;t elaborate here.\nOther Languages C++ certainly has its shortcomings. Considering historical compatibility and other factors, its design is constrained. What about new languages? How do they solve these problems?\nRust First, let\u0026rsquo;s look at Rust, which has been quite popular recently. In fact, as long as a structure does not contain self-referential members, using memcpy to move an old object to new memory is almost always feasible. Additionally, Rust doesn\u0026rsquo;t have things like multiple inheritance, virtual functions (complex vtable structures), or virtual inheritance (which are quite strange and rarely used in practice), so almost all types can directly use memcpy to create a new object from an old one. Conveniently, the move semantic in Safe Rust is a destructive move, so its default implementation of move is directly memcpy, which is much cleaner.\nHowever, the default move can only move local non-static variables. If a variable is a reference, you cannot move it. But thankfully, Safe Rust provides a std::mem::take function to solve this problem:\nuse std::mem; let mut v: Vec\u0026lt;i32\u0026gt; = vec![1, 2]; let old_v = mem::take(\u0026amp;mut v); assert_eq!(vec![1, 2], old_v); assert!(v.is_empty()); The effect is move + empty the original object, which is quite similar to C++\u0026rsquo;s move. There are also std::mem::swap and std::mem::replace for other scenarios where moving from a reference is needed.\nAlthough it might not happen often, what if a type contains a self-referential structure? In fact, allowing users to define custom constructors is a relatively simple solution, but the Rust community seems quite averse to it. The current solution is through Pin, but the Rust community also seems dissatisfied with this solution; it\u0026rsquo;s hard to understand and hard to use. 
Future new designs should be related to linear types; relevant discussions can be found in Changing the rules of Rust.\nMojo This language was also promoted on Zhihu some time ago, but it is still in a very early state. However, from the beginning, it considered providing four constructors:\n__init__() __copy__() __move__() __take__() Among them, copy is similar to the copy constructor, move is similar to the relocate constructor, and take is similar to the current move constructor. More details are currently unavailable.\n","permalink":"https://www.ykiko.me/en/articles/679782886/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAs is well known, there are currently two special constructors in C++: the copy constructor and the move constructor.\u003c/p\u003e","title":"Relocate Semantics in C++"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nIntroduction In C++17, a feature called \u0026ldquo;structured binding\u0026rdquo; was introduced. 
This feature is similar to pattern matching in other languages and allows us to conveniently access members of a struct.\nstruct Point { int x; int y; }; Point p = {1, 2}; auto [x, y] = p; // x = 1, y = 2 Using it, we can implement some interesting functionalities, including limited reflection capabilities for structs, such as implementing a for_each function.\nvoid for_each(auto\u0026amp;\u0026amp; object, auto\u0026amp;\u0026amp; func) { using T = std::remove_cvref_t\u0026lt;decltype(object)\u0026gt;; if constexpr(std::is_aggregate_v\u0026lt;T\u0026gt;) { auto\u0026amp;\u0026amp; [x, y] = object; for_each(x, func); for_each(y, func); } else { func(object); } } This way, for any aggregate type with two members, we can iterate over it.\nstruct Line { Point start; Point end; }; Line line = { {1, 2}, {3, 4}, }; int main() { for_each(line, [](auto\u0026amp;\u0026amp; object) { std::cout \u0026lt;\u0026lt; object \u0026lt;\u0026lt; std::endl; }); return 0; } However, there\u0026rsquo;s a problem: it only recursively supports structs with 2 fields. If you try to use a struct with 3 fields, the compiler will throw a hard error. An incorrect number of structured bindings cannot be handled by SFINAE or requires; it directly causes compilation to abort.\nstruct Vec3 { float x; float y; float z; }; template \u0026lt;typename T\u0026gt; constexpr bool two = requires { []() { auto [x, y] = T{1, 2, 3}; }; }; static_assert(two\u0026lt;Vec3\u0026gt;); // !hard error We can solve this problem through manual dispatch.\nif constexpr(N == 1) { auto\u0026amp;\u0026amp; [x] = object; // ... } else if constexpr(N == 2) { auto\u0026amp;\u0026amp; [x, y] = object; // ... } else if constexpr(N == 3) { auto\u0026amp;\u0026amp; [x, y, z] = object; // ... } else { // ... } You can freely enumerate up to the number you want to support. Here, N is the number of struct fields.
You might need to explicitly pass it as a template parameter, or specialize a template for each type to store its field count. But this is still cumbersome. So, is there a way for the compiler to automatically calculate the number of fields in a struct for us?\nAntony Polukhin A preliminary solution has already been provided in boost/pfr. Its author, Antony Polukhin, gave a detailed introduction to this at CppCon2016 and CppCon2018. However, the version used by the author was C++14/17, and the code was quite obscure and difficult to understand. After I rewrote it using C++20, readability improved significantly.\nFirst, in C++, we can write an Any type, which supports conversion to any type. Essentially, you just need to write its type conversion function as a template function.\nstruct Any { constexpr Any(int) {}; template \u0026lt;typename T\u0026gt; constexpr operator T () const; }; static_assert(std::is_convertible_v\u0026lt;Any, int\u0026gt;); // true static_assert(std::is_convertible_v\u0026lt;Any, std::string\u0026gt;); // true Next, we can leverage the property of aggregate initialization, which is that for expressions exceeding the maximum number of aggregate initializers, the requires statement will return false.\ntemplate \u0026lt;typename T, std::size_t N\u0026gt; constexpr auto test() { return []\u0026lt;std::size_t... I\u0026gt;(std::index_sequence\u0026lt;I...\u0026gt;) { return requires { T{Any(I)...}; }; }(std::make_index_sequence\u0026lt;N\u0026gt;{}); } static_assert(test\u0026lt;Point, 0\u0026gt;()); // true static_assert(test\u0026lt;Point, 1\u0026gt;()); // true static_assert(test\u0026lt;Point, 2\u0026gt;()); // true static_assert(!test\u0026lt;Point, 3\u0026gt;()); // false Note that Point only has two members here. When we pass three arguments to the initializer list, requires will return false. 
Using this property, we can change the above trial process to be recursive, i.e., linearly search this sequence until false is found.\ntemplate \u0026lt;typename T, int N = 0\u0026gt; constexpr auto member_count() { if constexpr(!test\u0026lt;T, N\u0026gt;()) { return N - 1; } else { return member_count\u0026lt;T, N + 1\u0026gt;(); } } If test\u0026lt;T, N\u0026gt; is true, it means N arguments can successfully construct T. Then we recursively try N + 1 arguments until test\u0026lt;T, N\u0026gt; is false. Then N - 1 is the number of members of T. This way, we can get the number of members of T using member_count\u0026lt;T\u0026gt;(). Let\u0026rsquo;s test the effect.\nstruct A { std::string a; }; static_assert(member_count\u0026lt;A\u0026gt;() == 1); struct B { std::string a; int b; }; static_assert(member_count\u0026lt;B\u0026gt;() == 2); Great, a big success! Does it end here?\nJoão Baptista Consider the following three examples:\nLValue Reference struct A { int\u0026amp; x; }; static_assert(member_count\u0026lt;A\u0026gt;() == 1); /// error Default Constructor Deleted struct A { A() = delete; }; struct B { A a1; A a2; }; static_assert(member_count\u0026lt;B\u0026gt;() == 2); // error Array struct C { int x[2]; }; static_assert(member_count\u0026lt;C\u0026gt;() == 1); // error In these three cases, the original method completely fails. Why is that?\nThe main content of this subsection is based on two blog posts by João Baptista:\nCounting the number of fields in an aggregate in C++20 Counting the number of fields in an aggregate in C++20 — part 2 He summarized the issues in boost/pfr and proposed solutions, addressing the three problems mentioned above.\nLValue Reference The first problem is relatively easy to understand, mainly because conversions produced by T() types are all prvalues. 
Lvalue references cannot bind to prvalues, but rvalue references can.\nstatic_assert(!std::is_constructible_v\u0026lt;int\u0026amp;, Any\u0026gt;); // false static_assert(std::is_constructible_v\u0026lt;int\u0026amp;\u0026amp;, Any\u0026gt;); // true What to do? Actually, there\u0026rsquo;s a clever way to solve this problem.\nstruct Any { constexpr Any(int) {} template \u0026lt;typename T\u0026gt; constexpr operator T\u0026amp; () const; template \u0026lt;typename T\u0026gt; constexpr operator T\u0026amp;\u0026amp; () const; }; One converts to an lvalue reference, the other to an rvalue reference. If only one of them can match, that one will be chosen. If both can match, the lvalue reference conversion has higher precedence than the rvalue reference conversion and will be preferred, avoiding overload resolution issues.\nstatic_assert(std::is_constructible_v\u0026lt;int, Any\u0026gt;); // true static_assert(std::is_constructible_v\u0026lt;int\u0026amp;, Any\u0026gt;); // true static_assert(std::is_constructible_v\u0026lt;int\u0026amp;\u0026amp;, Any\u0026gt;); // true static_assert(std::is_constructible_v\u0026lt;const int\u0026amp;, Any\u0026gt;); // true Great, first problem solved!\nDefault Constructor Why does it fail if the default constructor is deleted? Do you remember our initial Point type?\nstruct Point { int x; int y; }; Our test results show that 0, 1, 2 work, but 3 does not. I can understand why it fails if the number of elements in {} exceeds the number of Point members. But why does it succeed if the number of elements is less than the number of members? The reason is simple: members that are not explicitly initialized will be value-initialized. Therefore, the number of arguments in {} can be less than the actual number of fields. 
However, if a field\u0026rsquo;s default constructor is forbidden, value initialization cannot occur, leading to a compilation error.\nstruct A { A() = delete; }; struct B { A a1; A a2; int x; }; For the type above, if we try with Any, it should be that 0 and 1 don\u0026rsquo;t work, 2 and 3 work, and 4, 5, \u0026hellip; don\u0026rsquo;t work. This means that at least all members that cannot be default-initialized must be initialized. If a type supports default initialization, its valid search range is [0, N], where N is its maximum number of fields. If it does not support default initialization, then the search range becomes [M, N], where M is the minimum number of initializers required to initialize all non-default-initializable members.\nOur previous search strategy started from 0; if the current one was true, it would try the next, until false was encountered. Clearly, this search strategy is not suitable for the current situation, because in the range [0, M), it would also match the previous search strategy\u0026rsquo;s failure condition. We now need to change it so that it stops searching only if the current one is true AND the next one is false. This way, we can precisely find the upper bound of this range.\ntemplate \u0026lt;typename T, int N = 0\u0026gt; constexpr auto member_count() { if constexpr(test\u0026lt;T, N\u0026gt;() \u0026amp;\u0026amp; !test\u0026lt;T, N + 1\u0026gt;()) { return N; } else { return member_count\u0026lt;T, N + 1\u0026gt;(); } } Let\u0026rsquo;s test it.\nstruct A { int\u0026amp; x; }; static_assert(member_count\u0026lt;A\u0026gt;() == 1); struct B { A a1; A a2; }; static_assert(member_count\u0026lt;B\u0026gt;() == 2); OK, the second problem is also solved, that\u0026rsquo;s really cool!\nBuiltin Array If there is an array among the struct\u0026rsquo;s members, then the final result of the calculation will treat each member of the array as a separate field.
This is essentially because aggregate initialization for standard arrays has a \u0026lsquo;backdoor\u0026rsquo;.\nstruct Array { int x[2]; }; Array array{1, 2}; // ok Notice that there\u0026rsquo;s only one field, yet two values can be filled. However, this \u0026lsquo;hole\u0026rsquo; for arrays leads to a dilemma: if a struct contains an array, it will ultimately result in an incorrect count. Is there any way to solve this problem?\nNote: This part might be a bit difficult to understand.\nConsider the following example:\nstruct D { int x; int y[2]; int z[2]; }; For example, let\u0026rsquo;s look at its initialization cases:\nD{ 1, 2, 3, 4, 5 } // ok // Position 0 D{ {1}, 2, 3, 4, 5 } // ok, position 0 can hold at most 1 element D{ {1, 2}, 3, 4, 5 } // error // Position 1 D{ 1, {2}, 3, 4, 5 } // error D{ 1, {2, 3}, 4, 5 } // ok, position 1 can hold at most 2 elements D{ 1, {2, 3, 4}, 5 } // error // Position 3 D{ 1, 2, 3, {4}, 5} // error D{ 1, 2, 3, {4, 5} } // ok, position 3 can hold at most 2 elements That\u0026rsquo;s right, we can use nested initialization to solve this problem! First, we use the original method to find the maximum possible number of struct fields (including array expansion, which is 5 here). Then, at each position, we try to insert the original sequence into this nested initialization. By continuously trying, we can find the maximum number of elements that can be placed at that position. If the maximum number exceeds 1, it means this position is an array. This maximum number is the number of elements in the array. We just subtract the excess quantity from the final result.\nSounds simple, but it\u0026rsquo;s a bit complex to implement.\nFirst, let\u0026rsquo;s write a helper function. By filling in different N1, N2, N3, we can correspond to the different cases above.
Note that Any at I2 is nested initialization, with an extra pair of braces.\ntemplate \u0026lt;typename T, std::size_t N1, std::size_t N2, std::size_t N3\u0026gt; constexpr bool test_three_parts() { return []\u0026lt;std::size_t... I1, std::size_t... I2, std::size_t... I3\u0026gt;(std::index_sequence\u0026lt;I1...\u0026gt;, std::index_sequence\u0026lt;I2...\u0026gt;, std::index_sequence\u0026lt;I3...\u0026gt;) { return requires { T{Any(I1)..., {Any(I2)...}, Any(I3)...}; }; }(std::make_index_sequence\u0026lt;N1\u0026gt;{}, std::make_index_sequence\u0026lt;N2\u0026gt;{}, std::make_index_sequence\u0026lt;N3\u0026gt;{}); } Next, we need to write a function to test if it\u0026rsquo;s feasible to place N elements using two layers of {} at a specified position.\ntemplate \u0026lt;typename T, std::size_t position, std::size_t N\u0026gt; constexpr bool try_place_n_in_pos() { // Maximum possible number of fields constexpr auto Total = member_count\u0026lt;T\u0026gt;(); if constexpr(N == 0) { // Placing 0 elements has the same effect as original, definitely feasible return true; } else if constexpr(position + N \u0026lt;= Total) { // The sum of elements definitely cannot exceed the total return test_three_parts\u0026lt;T, position, N, Total - position - N\u0026gt;(); } else { return false; } } Since there\u0026rsquo;s a lot of content, it might be a bit difficult to understand. Let\u0026rsquo;s first show the test results of this function for easier understanding. This way, even if you don\u0026rsquo;t understand the function implementation, it\u0026rsquo;s fine. Let\u0026rsquo;s use the previous struct D as an example again.\ntry_place_n_in_pos\u0026lt;D, 0, 1\u0026gt;(); // This is testing the case D{ {1}, 2, 3, 4, 5 } // placing 1 element at position 0 try_place_n_in_pos\u0026lt;D, 1, 2\u0026gt;(); // This is testing the case D{ 1, {2, 3}, 4, 5 } // placing 2 elements at position 1 Alright, just understand what this function is doing. 
It keeps trying at a certain position, and then it can find the maximum number of elements that can be placed at that position.\ntemplate \u0026lt;typename T, std::size_t pos, std::size_t N = 0\u0026gt; constexpr auto search_max_in_pos() { constexpr auto Total = member_count\u0026lt;T\u0026gt;(); std::size_t result = 0; [\u0026amp;]\u0026lt;std::size_t... Is\u0026gt;(std::index_sequence\u0026lt;Is...\u0026gt;) { ((try_place_n_in_pos\u0026lt;T, pos, Is\u0026gt;() ? result = Is : 0), ...); }(std::make_index_sequence\u0026lt;Total + 1\u0026gt;()); return result; } This is where we search for the maximum number of elements that can be placed at this position.\nstatic_assert(search_max_in_pos\u0026lt;D, 0\u0026gt;() == 1); // Position 0 can hold at most 1 element static_assert(search_max_in_pos\u0026lt;D, 1\u0026gt;() == 2); // Position 1 can hold at most 2 elements static_assert(search_max_in_pos\u0026lt;D, 3\u0026gt;() == 2); // Position 3 can hold at most 2 elements This is consistent with our initial manual test results. Next, we iterate through all positions, find all additional array element counts, and then subtract these excess amounts from the initial maximum count.\ntemplate \u0026lt;typename T, std::size_t N = 0\u0026gt; constexpr auto search_all_extra_index(auto\u0026amp;\u0026amp; array) { constexpr auto total = member_count\u0026lt;T\u0026gt;(); constexpr auto num = search_max_in_pos\u0026lt;T, N\u0026gt;(); constexpr auto value = num \u0026gt; 1 ? num : 1; array[N] = value; if constexpr(N + value \u0026lt; total) { search_all_extra_index\u0026lt;T, N + value\u0026gt;(array); } } Here, it\u0026rsquo;s a recursive search, with results stored in an array. Note N + value here. If two elements are found here, we can directly skip two positions forward. 
For example, if position 1 can hold 2 elements, I can directly look for position 3, no need to check position 2.\nNext, we just store all results in an array and then subtract the excess.\ntemplate \u0026lt;typename T\u0026gt; constexpr auto true_member_count() { constexpr auto Total = member_count\u0026lt;T\u0026gt;(); if constexpr(Total == 0) { return 0; } else { std::array\u0026lt;std::size_t, Total\u0026gt; indices = {1}; search_all_extra_index\u0026lt;T\u0026gt;(indices); std::size_t result = Total; std::size_t index = 0; while(index \u0026lt; Total) { auto n = indices[index]; result -= n - 1; index += n; } return result; } } Let\u0026rsquo;s test the result.\nstruct D { int x; int y[2]; int z[2]; }; static_assert(true_member_count\u0026lt;D\u0026gt;() == 3); struct E { int\u0026amp; x; int y[2][2]; int z[2]; int\u0026amp;\u0026amp; w; }; static_assert(true_member_count\u0026lt;E\u0026gt;() == 4); Let\u0026rsquo;s take the array generated by type E as an example here; we can print them all out to see.\nindex: 0 num: 1 // Position 0 corresponds to x, count is 1, reasonable index: 1 num: 4 // Position 1 corresponds to y, count is 4, reasonable index: 5 num: 2 // Position 5 corresponds to z, count is 2, reasonable index: 7 num: 1 // Position 7 corresponds to w, count is 1, reasonable A perfect curtain call! I really admire the author\u0026rsquo;s idea; it\u0026rsquo;s so ingenious and breathtaking. However, at the end of the article, he said,\nAs it could be seen, I ran into some inconsistencies between gcc and clang (and for some reason I haven’t managed to make it work on MSVC at all, but that is another story).\nHe said he encountered inconsistencies in behavior between Clang and GCC, and couldn\u0026rsquo;t get this method to work on MSVC at all.\nIt seems things are far from over!\nYKIKO I spent some time understanding the previous author\u0026rsquo;s article. To be honest, his template code was very difficult for me to read. 
He didn\u0026rsquo;t like using if constexpr for branching, instead using many specializations for selection, which greatly impacted readability. Therefore, the code presented earlier is not entirely the original author\u0026rsquo;s code, but rather my interpretation in a more readable form.\nWhat situations would break the second author\u0026rsquo;s code?\nMove constructor deleted struct X { X(X\u0026amp;\u0026amp;) = delete; }; struct F { X x; }; static_assert(true_member_count\u0026lt;F\u0026gt;() == 1); // error Struct contains other struct members struct Y { int x; int y; }; struct G { Y x; int y; }; static_assert(true_member_count\u0026lt;G\u0026gt;() == 2); // error MSVC and GCC bugs Move Constructor All of this stems from a new rule introduced in C++17 regarding copy elision.\nSince C++17, a prvalue is not materialized until needed, and then it is constructed directly into the storage of its final destination. This sometimes means that even when the language syntax visually suggests a copy/move (e.g. copy initialization), no copy/move is performed — which means the type need not have an accessible copy/move constructor at all.\nWhat does this mean? An example will make it clearest.\nstruct M { M() = default; M(M\u0026amp;\u0026amp;) = delete; }; M m1 = M(); // ok in C++17, error in C++14 M m2 = std::move(M()); // error Huh? Why is this happening? The first one compiles, but the second one doesn\u0026rsquo;t. Did I write std::move unnecessarily?\nThe reason the second one fails to compile is quite understandable: because the move constructor was deleted, so it couldn\u0026rsquo;t be called, leading to a compilation failure. Note that the behavior of the first case is different in C++14 and C++17. In C++14, a temporary object is first created, then the move constructor is called to initialize m1. However, this behavior is actually redundant, so the compiler might optimize away this unnecessary step. 
But there was still a possibility of calling the move constructor, so if it was deleted, it would GG (game over) and fail to compile. In C++17, this optimization became a language-mandated requirement, so there is no move construction step at all, and naturally, no accessible constructor is needed, thus it compiles successfully in C++17.\nThis also means there\u0026rsquo;s a difference even among rvalues. prvalue, i.e., pure rvalues, can directly construct objects via copy elision (for example, the return value of a non-reference type function here is a prvalue), but xvalue, i.e., expiring values, must have a callable move constructor and cannot undergo copy elision (the return value of an rvalue reference type function is an xvalue). Therefore, std::move actually had a negative effect here.\nBack to our problem, note that Any has a conversion function that converts to an rvalue reference type, so if this situation is encountered, there\u0026rsquo;s no way around it. But with another clever modification, this problem can be solved again:\nstruct Any { constexpr Any(int) {} template \u0026lt;typename T\u0026gt; requires std::is_copy_constructible_v\u0026lt;T\u0026gt; operator T\u0026amp; (); template \u0026lt;typename T\u0026gt; requires std::is_move_constructible_v\u0026lt;T\u0026gt; operator T\u0026amp;\u0026amp; (); template \u0026lt;typename T\u0026gt; requires (!std::is_copy_constructible_v\u0026lt;T\u0026gt; \u0026amp;\u0026amp; !std::is_move_constructible_v\u0026lt;T\u0026gt;) operator T (); }; Note that we\u0026rsquo;ve added constraints to the types here. If it\u0026rsquo;s a non-movable type (move constructor deleted), it corresponds to the last type conversion function, directly producing a prvalue to construct the object. This cleverly solves the problem. 
The constraint for copy construction is to prevent overload resolution ambiguity (and can also fix an MSVC bug at the end).\nNested Struct In fact, the author\u0026rsquo;s original idea was good, but overlooked a problem, which is that not only array types can be initialized with double braces {{}}, structs can too.\nstruct A { int x; int y; }; struct B { A x; int y; }; B{{1, 2}, 3}; // ok Therefore, if this position is a struct member, it will lead to an incorrect count. So we need to first determine if this position is a struct. If it is, there\u0026rsquo;s no need to try to find the maximum number of elements to place at this position; just move on to the next position.\nSo how do we determine if the member at the current position is a struct? Consider the following example:\nstruct A { int x; int y; }; struct B { A x; int y[2]; }; Manually enumerate the test cases:\nB{any, any, any}; // ok B{{any}, any, any}; // ok B{{any, any}, any, any}; // ok B{any, {any}, any}; // error B{any, {any, any}, any}; // error OK, the answer is quite obvious: if the current position is a struct, extra elements can be added to it. Note that the original Total, i.e., the maximum possible number of elements, is 3. But if the current position is a struct, placing 4 elements is also possible, but it\u0026rsquo;s not possible if it\u0026rsquo;s an array. We use this property to determine if the current position is a struct. If it is, we skip to the next position; if not, we search for the maximum number of elements that can be placed at this position.\nEssentially, it\u0026rsquo;s recursively trying to place elements at this position. But there\u0026rsquo;s a problem here: the struct member at the current position might still contain members that cannot be default-initialized. So how many elements need to be placed to determine if this position can be initialized? This is still uncertain. I\u0026rsquo;ve set the maximum upper limit here to 10. 
If the non-default-initializable members in the sub-struct are located after 10, this method will fail.\ntemplate \u0026lt;typename T, std::size_t pos, std::size_t N = 0, std::size_t Max = 10\u0026gt; constexpr bool has_extra_elements() { constexpr auto Total = member_count\u0026lt;T\u0026gt;(); if constexpr(test_three_parts\u0026lt;T, pos, N, Total - pos - 1\u0026gt;()) { return false; } else if constexpr(N + 1 \u0026lt;= Max) { return has_extra_elements\u0026lt;T, pos, N + 1\u0026gt;(); } else { return true; } } With this function, we just need to slightly modify the logic of the original search function.\ntemplate \u0026lt;typename T, std::size_t pos, std::size_t N = 0\u0026gt; constexpr auto search_max_in_pos() { constexpr auto Total = member_count\u0026lt;T\u0026gt;(); if constexpr(!has_extra_elements\u0026lt;T, pos\u0026gt;()) { return 1; } else { // ... unchanged } } It\u0026rsquo;s just adding a branch condition: if there are no extra elements at the current position, it directly returns 1; if there are, it searches for the maximum boundary (of the array). This way, the problem in the original author\u0026rsquo;s code is solved.\nLet\u0026rsquo;s test it again.\nstruct Y { int x; int y; }; struct G { Y x; int y; }; static_assert(true_member_count\u0026lt;G\u0026gt;() == 2); // ok Success!\nCompiler Bug I also found the GCC and MSVC issues mentioned by the author in the original article. MSVC currently has a defect:\nstruct Any { template \u0026lt;typename T\u0026gt; // requires std::is_copy_constructible_v\u0026lt;T\u0026gt; operator T\u0026amp; () const; }; struct A { int x[2]; }; A a{Any{}}; The code above compiles successfully, which means MSVC allows aggregate initialization of array members directly from an array reference. However, this is not allowed by the C++ standard. This bug leads to incorrect member counting on MSVC. 
The solution is actually very simple: we\u0026rsquo;ve already incidentally solved this problem earlier; just uncomment that line. Because arrays are non-copy-constructible types, the constraint will exclude this overloaded function, thus preventing this issue.\nGCC13 also has a serious defect, which directly leads to an ICE (Internal Compiler Error). This bug can be reproduced with the following lines of code:\nstruct Number { int x; operator int\u0026amp; () { return x; } }; struct X { int\u0026amp; x; }; template \u0026lt;typename T\u0026gt; concept F = requires { T{{Number{}}}; }; int main() { static_assert(!F\u0026lt;X\u0026gt;); // internal compiler error } This clearly shouldn\u0026rsquo;t lead to an ICE, and it\u0026rsquo;s very strange that this bug only exists in GCC13. The test code is on godbolt. Clang has no issues, but GCC directly throws an internal compiler error. And GCC12 and Clang have different compilation results, but clang is actually correct. This is where the original author\u0026rsquo;s article mentioned inconsistencies between Clang and GCC.\nNote: As reminded in the comments section, Clang15 also encounters similar internal compiler errors.\nAfterword Later, after some discussion with people in the comments section, the above handling still had some shortcomings. A typical example is when a member variable\u0026rsquo;s constructor is a template function, it will cause an error, for example, std::any. The reason is that it doesn\u0026rsquo;t know whether to call the type conversion function or the template constructor (overload resolution failure).\nstd::any any = Any(0); // conversion from \u0026#39;Any\u0026#39; to \u0026#39;std::any\u0026#39; is ambiguous // candidate: \u0026#39;Any::operator T\u0026amp;() [with T = std::any]\u0026#39; // candidate: \u0026#39;std::any::any(_Tp\u0026amp;\u0026amp;) However, there is currently no perfect solution to this problem. 
Directly checking if T can be constructed from Any cannot solve this problem, as it would involve recursive constraints, ultimately leading to unsolvable problems and compilation errors. Here, a rather clever trick was used:\nstruct Any { constexpr Any(int) {} template \u0026lt;typename T\u0026gt; requires (std::is_copy_constructible_v\u0026lt;T\u0026gt;) operator T\u0026amp; (); template \u0026lt;typename T\u0026gt; requires (std::is_move_constructible_v\u0026lt;T\u0026gt; \u0026amp;\u0026amp; !std::is_copy_constructible_v\u0026lt;T\u0026gt;) operator T\u0026amp;\u0026amp; (); struct Empty {}; template \u0026lt;typename T\u0026gt; requires (!std::is_copy_constructible_v\u0026lt;T\u0026gt; \u0026amp;\u0026amp; !std::is_move_constructible_v\u0026lt;T\u0026gt; \u0026amp;\u0026amp; !std::is_constructible_v\u0026lt;T, Empty\u0026gt;) operator T (); }; It declares an empty class, and then tries to see if this empty class can be converted to type T. If not, it means T\u0026rsquo;s constructor should not be a template function, and thus the type conversion can take effect. If it can, it means T\u0026rsquo;s constructor is a template function, and this type conversion function should be excluded. Of course, if T\u0026rsquo;s constructor has some strange constraints, for example, explicitly excluding Empty but accepting Any, this would still lead to an error. But this would be intentional. Under normal circumstances, this problem is unlikely to be encountered, so this problem can be considered solved.\nIn addition, there\u0026rsquo;s another issue related to references: if a struct contains a reference member of a non-copyable/non-movable type, it will also fail. Let\u0026rsquo;s take an lvalue reference as an example below.\nstruct CanNotCopy { CanNotCopy(const CanNotCopy\u0026amp;) = delete; }; struct X { CanNotCopy\u0026amp; x; }; X x{Any(0)}; // error Here, T will be instantiated as CanNotCopy type. 
Clearly, because it\u0026rsquo;s non-copyable, overload resolution selects operator T(), but the actual result is an rvalue that cannot bind to an lvalue reference, leading to a compilation error. Can this problem be solved? It\u0026rsquo;s very difficult. In fact, we cannot make the following two expressions simultaneously valid:\nstruct X { CanNotCopy\u0026amp; x; }; struct Y { CanNotCopy x; }; X x{Any(0)}; Y y{Any(0)}; In these two aggregate initializations, the T instantiated by the type conversion function is CanNotCopy type. But if we want both x and y to be well-formed, it means for the same T, two different overloaded functions must be chosen: the first choosing operator T\u0026amp;(), the second choosing operator T(). However, there is no precedence between these two functions, and C++ cannot overload based on return type, so this cannot be done. One possible solution is to write three Any types, converting to T\u0026amp;, T\u0026amp;\u0026amp;, and T respectively, and then try these three at each position. This way, the problem could indeed be solved, but it could lead to an exponential increase in the number of template instantiations at a rate of 3^N. This implementation would have a greater overhead than all previous iteration methods combined, so I won\u0026rsquo;t demonstrate it here. Theoretically feasible, but practically it would exhaust compilers.\nConclusion All the code in this article is available on Compiler Explorer and passes on all three major compilers (GCC version 12). There is a lot of test code. If you find other corner cases, feel free to leave a comment and discuss.\nAlright, this article ends here. If you\u0026rsquo;ve patiently read through the entire article, you\u0026rsquo;re probably like me, enjoying these interesting things. The most interesting aspect of this kind of thing is using the small interfaces exposed by C++ to gradually extend it and finally achieve very elegant interfaces. 
Of course, for the author, it\u0026rsquo;s not actually elegant OvO. In short, this kind of thing is like a game, a daily pastime. Finding bugs in C++ compilers and delving into these obscure features is also a pleasure. If we must talk about practical value, this kind of thing is almost impossible to use in a real-world production code environment. Firstly, finding the number of struct fields by instantiating a large number of templates will significantly slow down compilation speed. Moreover, even with such a great effort, it only achieves iteration over aggregate types and does not support non-aggregate types. The side effects are heavy, and the core capability on offer is limited. Weighing the trade-offs, it is simply not worth it. For such needs resembling reflection, before C++ adds static reflection, the currently truly feasible automated solution is to use code generation for this task.\nI also have related articles that detail the principles, providing solutions that don\u0026rsquo;t rely on these clever tricks and are truly usable in real-world projects: A Reflection Tutorial for C++ Programmers.\nOf course, if these functionalities are used merely for logging, debugging, or learning the principles of how templates work, and not for any core code parts, and you don\u0026rsquo;t want to introduce heavy dependencies, then using these things might not be a bad idea.\n","permalink":"https://www.ykiko.me/en/articles/674157958/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn \u003ccode\u003eC++17\u003c/code\u003e, a feature called \u0026ldquo;\u003cstrong\u003estructured binding\u003c/strong\u003e\u0026rdquo; was introduced. 
This feature is similar to pattern matching in other languages and allows us to conveniently access members of a struct.\u003c/p\u003e","title":"A 7-Year Relay Race: Getting the Number of Fields in a C++ Struct"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nStatic and Dynamic The terms static typing and dynamic typing are probably familiar to everyone. The key to distinguishing between them lies in the timing of type checking. What does that mean?\nSuppose we have the following C++ code:\nstd::string s = \u0026#34;123\u0026#34;; int a = s + 1; As we know, a string cannot be directly added to an int, so there should be a TypeError here. C++ checks for type errors at compile time, so this code will trigger a compile time error.\nConsider the corresponding Python code:\ns = \u0026#34;123\u0026#34; a = s + 1 Python, on the other hand, checks for errors at runtime, so the above code will actually produce a runtime error.\nIt\u0026rsquo;s necessary to emphasize the meaning of compile time and runtime here. These terms might be encountered frequently, but their meanings can vary in different contexts. In our context:\ncompile time: Generally refers to the process of compiling code into target code, before the program actually starts running.\nFor AOT-compiled languages, such as C++, it\u0026rsquo;s the process of compiling C++ into machine code. For JIT-compiled languages, such as C#/Java, it generally refers to the process of compiling source code into IR. For transpiled languages, such as TypeScript, it\u0026rsquo;s the process of compiling TypeScript into JavaScript. runtime: Generally refers to when the program is actually running, for example, when machine code is executed on a CPU, or bytecode is executed on a virtual machine.\nTherefore, C++, Java, C#, and TypeScript are called statically typed languages. 
Although Python also has a stage where source code is compiled into bytecode, type checking is not performed at this stage, so Python is called a dynamically typed language.\nHowever, this is not absolute; the boundary between static and dynamic languages is not so clear. Although C++, Java, C#, and TypeScript are statically typed languages, they all provide several ways to bypass static type checking, such as C++\u0026rsquo;s pointer, Java/C#\u0026rsquo;s Object, and TypeScript\u0026rsquo;s Any. Dynamically typed languages are also gradually introducing static type checking, such as Python\u0026rsquo;s type hint and JavaScript\u0026rsquo;s TypeScript, etc. Both are borrowing features from each other.\nCurrently, C++ only provides std::any for type erasure, but often it\u0026rsquo;s not flexible enough. We want more advanced features, such as accessing members by field name, calling functions by function name, and creating class instances by type name. The goal of this article is to build a dynamic type in C++ similar to Java/C#\u0026rsquo;s Object.\nMeta Type Here, we don\u0026rsquo;t adopt an intrusive design like Java/C#\u0026rsquo;s Object (inheritance), but rather a non-intrusive design called a fat pointer. A fat pointer is essentially a struct that contains a pointer to the actual data and a pointer to type information. 
In the case of inheritance, the vtable pointer would be present in the object header.\nclass Any { Type* type; // type info, similar to vtable void* data; // pointer to the data uint8_t flag; // special flag public: Any() : type(nullptr), data(nullptr), flag(0) {} Any(Type* type, void* data) : type(type), data(data), flag(0B00000001) {} Any(const Any\u0026amp; other); Any(Any\u0026amp;\u0026amp; other); ~Any(); template \u0026lt;typename T\u0026gt; Any(T\u0026amp;\u0026amp; value); // box value to Any template \u0026lt;typename T\u0026gt; T\u0026amp; cast(); // unbox Any to value Type* GetType() const { return type; } // get type info Any invoke(std::string_view name, std::span\u0026lt;Any\u0026gt; args); // call method void foreach(const std::function\u0026lt;void(std::string_view, Any\u0026amp;)\u0026gt;\u0026amp; fn); // iterate fields }; The member functions will be implemented step by step in later sections. Next, let\u0026rsquo;s consider what is stored inside this Type type.\nMeta Information struct Type { std::string_view name; // type name void (*destroy)(void*); // destructor void* (*copy)(const void*); // copy constructor void* (*move)(void*); // move constructor using Field = std::pair\u0026lt;Type*, std::size_t\u0026gt;; // type and offset using Method = Any (*)(void*, std::span\u0026lt;Any\u0026gt;); // method std::unordered_map\u0026lt;std::string_view, Field\u0026gt; fields; // field info std::unordered_map\u0026lt;std::string_view, Method\u0026gt; methods; // method info }; The content here is simple: in Type, we store the type name, destructor, move constructor, copy constructor, field information, and method information. Field information stores the field type and field name, and method information stores the method name and function address. If we want to extend it further, we could also store parent class information and overloaded function information. 
Since this is just an example, we won\u0026rsquo;t consider them for now.\nFunction Type Erasure To store member functions of different types in the same container, we must perform function type erasure. All types of functions are erased into the type Any(*)(void*, std::span\u0026lt;Any\u0026gt;). Here, Any is the Any type we defined above, void* actually represents the this pointer, and std::span\u0026lt;Any\u0026gt; is the function\u0026rsquo;s parameter list. Now we need to consider how to perform this function type erasure.\nLet\u0026rsquo;s take the given member function say as an example:\nstruct Person { std::string_view name; std::size_t age; void say(std::string_view msg) { std::cout \u0026lt;\u0026lt; name \u0026lt;\u0026lt; \u0026#34; say: \u0026#34; \u0026lt;\u0026lt; msg \u0026lt;\u0026lt; std::endl; } }; First, for convenience, let\u0026rsquo;s implement Any\u0026rsquo;s cast method:\ntemplate \u0026lt;typename T\u0026gt; Type* type_of(); // type_of\u0026lt;T\u0026gt; returns type info of T template \u0026lt;typename T\u0026gt; T\u0026amp; Any::cast() { if(type != type_of\u0026lt;T\u0026gt;()) { throw std::runtime_error{\u0026#34;type mismatch\u0026#34;}; } return *static_cast\u0026lt;T*\u0026gt;(data); } Leveraging the C++ feature where a non-capturing lambda can be implicitly converted to a function pointer, this erasure can be easily achieved.\nauto f = +[](void* object, std::span\u0026lt;Any\u0026gt; args) { auto\u0026amp; self = *static_cast\u0026lt;Person*\u0026gt;(object); self.say(args[0].cast\u0026lt;std::string_view\u0026gt;()); return Any{}; }; The principle is actually very simple: just write a wrapper function to perform type conversion and then forward the call. However, manually writing such a large block of forwarding code for each member function is still cumbersome. 
We can consider using template metaprogramming for code generation to automatically generate the above code, simplifying the type erasure process.\ntemplate \u0026lt;typename T\u0026gt; struct member_fn_traits; template \u0026lt;typename R, typename C, typename... Args\u0026gt; struct member_fn_traits\u0026lt;R (C::*)(Args...)\u0026gt; { using return_type = R; using class_type = C; using args_type = std::tuple\u0026lt;Args...\u0026gt;; }; template \u0026lt;auto ptr\u0026gt; auto* type_ensure() { using traits = member_fn_traits\u0026lt;decltype(ptr)\u0026gt;; using class_type = typename traits::class_type; using result_type = typename traits::return_type; using args_type = typename traits::args_type; return +[](void* object, std::span\u0026lt;Any\u0026gt; args) -\u0026gt; Any { auto self = static_cast\u0026lt;class_type*\u0026gt;(object); return [=]\u0026lt;std::size_t... Is\u0026gt;(std::index_sequence\u0026lt;Is...\u0026gt;) { if constexpr(std::is_void_v\u0026lt;result_type\u0026gt;) { (self-\u0026gt;*ptr)(args[Is].cast\u0026lt;std::tuple_element_t\u0026lt;Is, args_type\u0026gt;\u0026gt;()...); return Any{}; } else { return Any{(self-\u0026gt;*ptr)(args[Is].cast\u0026lt;std::tuple_element_t\u0026lt;Is, args_type\u0026gt;\u0026gt;()...)}; } }(std::make_index_sequence\u0026lt;std::tuple_size_v\u0026lt;args_type\u0026gt;\u0026gt;{}); }; } I won\u0026rsquo;t explain the code here; it\u0026rsquo;s okay if you don\u0026rsquo;t understand it. Essentially, it automates the process of member function type erasure using template metaprogramming. You just need to know how to use it, and it\u0026rsquo;s very simple to use. 
\u0026amp;Person::say here is the syntax for a pointer to member; if you\u0026rsquo;re not familiar with it, you can refer to Complete Analysis of C++ Member Pointers.\nauto f = type_ensure\u0026lt;\u0026amp;Person::say\u0026gt;(); // decltype(f) =\u0026gt; Any (*)(void*, std::span\u0026lt;Any\u0026gt;) Type Information Registration In fact, we need to generate a corresponding Type struct for each type to store its information, so that it can be accessed correctly. This functionality is handled by the type_of function mentioned above.\ntemplate \u0026lt;typename T\u0026gt; Type* type_of() { static Type type; type.name = typeid(T).name(); type.destroy = [](void* obj) { delete static_cast\u0026lt;T*\u0026gt;(obj); }; type.copy = [](const void* obj) { return (void*)(new T(*static_cast\u0026lt;const T*\u0026gt;(obj))); }; type.move = [](void* obj) { return (void*)(new T(std::move(*static_cast\u0026lt;T*\u0026gt;(obj)))); }; return \u0026amp;type; } template \u0026lt;\u0026gt; Type* type_of\u0026lt;Person\u0026gt;() { static Type type; type.name = \u0026#34;Person\u0026#34;; type.destroy = [](void* obj) { delete static_cast\u0026lt;Person*\u0026gt;(obj); }; type.copy = [](const void* obj) { return (void*)(new Person(*static_cast\u0026lt;const Person*\u0026gt;(obj))); }; type.move = [](void* obj) { return (void*)(new Person(std::move(*static_cast\u0026lt;Person*\u0026gt;(obj)))); }; type.fields.insert({\u0026#34;name\u0026#34;, {type_of\u0026lt;std::string_view\u0026gt;(), offsetof(Person, name)}}); type.fields.insert({\u0026#34;age\u0026#34;, {type_of\u0026lt;std::size_t\u0026gt;(), offsetof(Person, age)}}); type.methods.insert({\u0026#34;say\u0026#34;, type_ensure\u0026lt;\u0026amp;Person::say\u0026gt;()}); return \u0026amp;type; }; We provide a default implementation so that if built-in basic types are used, some information can be automatically registered. Then, through specialization, we can provide implementations for custom types. 
Now that we have this meta-information, we can complete the implementation of Any\u0026rsquo;s member functions.\nComplete Any Implementation Any::Any(const Any\u0026amp; other) { type = other.type; data = type-\u0026gt;copy(other.data); flag = 0; } Any::Any(Any\u0026amp;\u0026amp; other) { type = other.type; data = type-\u0026gt;move(other.data); flag = 0; } template \u0026lt;typename T\u0026gt; Any::Any(T\u0026amp;\u0026amp; value) { type = type_of\u0026lt;std::decay_t\u0026lt;T\u0026gt;\u0026gt;(); data = new std::decay_t\u0026lt;T\u0026gt;(std::forward\u0026lt;T\u0026gt;(value)); flag = 0; } Any::~Any() { if(!(flag \u0026amp; 0B00000001) \u0026amp;\u0026amp; data \u0026amp;\u0026amp; type) { type-\u0026gt;destroy(data); } } void Any::foreach(const std::function\u0026lt;void(std::string_view, Any\u0026amp;)\u0026gt;\u0026amp; fn) { for(auto\u0026amp; [name, field]: type-\u0026gt;fields) { Any any = Any{field.first, static_cast\u0026lt;char*\u0026gt;(data) + field.second}; fn(name, any); } } Any Any::invoke(std::string_view name, std::span\u0026lt;Any\u0026gt; args) { auto it = type-\u0026gt;methods.find(name); if(it == type-\u0026gt;methods.end()) { throw std::runtime_error{\u0026#34;method not found\u0026#34;}; } return it-\u0026gt;second(data, args); } The foreach implementation iterates through all Fields, gets their offset and type, and then wraps them into an Any type. Note that this is just a simple wrapper; in fact, because we set a flag, this wrapping will not lead to multiple destructions. 
invoke finds the corresponding function from the list of member functions and then calls it.\nExample Code int main() { Any person = Person{\u0026#34;Tom\u0026#34;, 18}; std::vector\u0026lt;Any\u0026gt; args = {std::string_view{\u0026#34;Hello\u0026#34;}}; person.invoke(\u0026#34;say\u0026#34;, args); // =\u0026gt; Tom say: Hello auto f = [](std::string_view name, Any\u0026amp; value) { if(value.GetType() == type_of\u0026lt;std::string_view\u0026gt;()) { std::cout \u0026lt;\u0026lt; name \u0026lt;\u0026lt; \u0026#34; = \u0026#34; \u0026lt;\u0026lt; value.cast\u0026lt;std::string_view\u0026gt;() \u0026lt;\u0026lt; std::endl; } else if(value.GetType() == type_of\u0026lt;std::size_t\u0026gt;()) { std::cout \u0026lt;\u0026lt; name \u0026lt;\u0026lt; \u0026#34; = \u0026#34; \u0026lt;\u0026lt; value.cast\u0026lt;std::size_t\u0026gt;() \u0026lt;\u0026lt; std::endl; } }; person.foreach(f); // name = Tom // age = 18 return 0; } The complete code is available on Github. With this, we have implemented an extremely dynamic, non-intrusive Any.\nExtensions and Optimizations This article provides only a very simple introduction to the principles, and the scenarios considered are also quite basic. For example, inheritance and function overloading are not considered here, and there are several areas where runtime efficiency could be optimized. Nevertheless, the features I\u0026rsquo;ve written might still be excessive for your needs. The main point this article aims to convey is that for a performance-oriented language like C++, there are indeed scenarios where these more dynamic features are required. However, efficiency and generality are often contradictory; at the language level, because generality must be considered, efficiency is often not ideal. For instance, RTTI and dynamic_cast are often complained about, but fortunately, compilers provide options to disable them. 
Similarly, my implementation may not perfectly fit your scenario, but once you understand these not-so-difficult principles, you can certainly implement a version that is more suitable for your specific needs.\nPoints for extension:\nSupport modifying members by name Add a global map to record information for all types, thereby supporting the creation of class instances by class name ... Points for optimization:\nReduce the number of new calls, or implement your own object pool Or, if too much meta-information is currently stored, trim it according to your own needs In addition, a current pain point is that all this meta-information has to be written manually, making it difficult to maintain. If internal class definitions are modified, these registration codes must also be modified, otherwise errors will occur. A practical solution here is to use a code generator to automatically generate this boilerplate code. For information on how to perform these operations, you can refer to other articles in this series.\n","permalink":"https://www.ykiko.me/en/articles/670191053/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"static-and-dynamic\"\u003eStatic and Dynamic\u003c/h2\u003e\n\u003cp\u003eThe terms static typing and dynamic typing are probably familiar to everyone. The key to distinguishing between them lies in the timing of type checking. What does that mean?\u003c/p\u003e","title":"Implement Object in C++!"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nFirst, what is metadata? Consider the following python code. 
We want to automatically modify the corresponding field value based on the input string.\nclass Person: def __init__(self, age, name): self.age = age self.name = name person = Person(10, \u0026#34;xiaohong\u0026#34;) setattr(person, \u0026#34;age\u0026#34;, 12) setattr(person, \u0026#34;name\u0026#34;, \u0026#34;xiaoming\u0026#34;) print(f\u0026#34;name: {person.name}, age: {person.age}\u0026#34;) # =\u0026gt; name: xiaoming, age: 12 setattr is a built-in python function that perfectly meets our needs. It modifies the corresponding value based on the input field name.\nWhat if we want to implement this in C++? C++ does not have a built-in function like setattr. Here\u0026rsquo;s a code example. (For now, let\u0026rsquo;s only consider types that can be directly memcpy\u0026rsquo;d, i.e., trivially copyable types).\nstruct Person { int age; std::string_view name; }; // Name -\u0026gt; field offset, field size std::map\u0026lt;std::string_view, std::pair\u0026lt;std::size_t, std::size_t\u0026gt;\u0026gt; fieldInfo = { {\u0026#34;age\u0026#34;, {offsetof(Person, age), sizeof(int)}}, {\u0026#34;name\u0026#34;, {offsetof(Person, name), sizeof(std::string_view)}}, }; void setattr(Person* point, std::string_view name, void* data) { if (!fieldInfo.contains(name)) { throw std::runtime_error(\u0026#34;Field not found\u0026#34;); } auto\u0026amp; [offset, size] = fieldInfo[name]; std::memcpy(reinterpret_cast\u0026lt;char*\u0026gt;(point) + offset, data, size); } int main() { Person person = {.age = 1, .name = \u0026#34;xiaoming\u0026#34;}; int age = 10; std::string_view name = \u0026#34;xiaohong\u0026#34;; setattr(\u0026amp;person, \u0026#34;age\u0026#34;, \u0026amp;age); setattr(\u0026amp;person, \u0026#34;name\u0026#34;, \u0026amp;name); std::cout \u0026lt;\u0026lt; person.age \u0026lt;\u0026lt; \u0026#34; \u0026#34; \u0026lt;\u0026lt; person.name \u0026lt;\u0026lt; std::endl; // =\u0026gt; 10 xiaohong } We can see that we basically implemented the setattr function 
ourselves, and this implementation seems to be generic. We just need to provide the fieldInfo for specific types. This fieldInfo stores the field name, field offset, and field type size. It can be regarded as metadata. In addition, metadata might include variable names, function names, and so on. This information does not directly participate in program execution but provides additional information about program structure, data, types, etc. The contents stored in metadata also seem to follow fixed patterns, all known to us, because they exist in the program\u0026rsquo;s source code. Does the C/C++ compiler provide such functionality? The answer is: for programs in debug mode, some information might be retained for debugging, but in release mode, nothing is stored. The advantage of this is obvious: this information is not essential for program execution, and not retaining it can significantly reduce the size of the binary executable.\nWhy is this information unnecessary, and when is it needed? Next, I will use the C language as an example to map its source code to its binary representation. What information is actually needed to execute the code?\nVariable Definition int value; In fact, a variable definition does not have a directly corresponding binary representation; it merely tells the compiler to allocate a block of space to store a variable named value. The exact amount of memory to allocate is determined by its type. Therefore, if the type size is unknown at the point of variable definition, a compilation error will occur.\nstruct A; A x; // error: storage size of \u0026#39;x\u0026#39; isn\u0026#39;t known A* y; // ok: the size of a pointer is always known struct Node { int val; Node next; }; // error: Node is not a complete type // This essentially means that when defining the Node type, its size is still unknown struct Node { int val; Node* next; }; // ok I believe you might have thought that this is somewhat similar to malloc, and indeed it is. 
The difference is that malloc allocates memory on the heap at runtime. Direct variable declarations generally allocate memory in the data segment or on the stack. The compiler might internally maintain a symbol table that maps variable names to their addresses. When you subsequently operate on this variable, you are actually operating on this memory region.\nBuilt-in Operators Built-in operators in C language generally correspond directly to CPU instructions. To understand how the CPU implements these operations, you can study digital electronics. Taking x86_64 as an example, the possible correspondences are as follows:\n| Operator | Meaning | Operator | Meaning | |----------|---------|----------|---------| | + | add | * | mul | | - | sub | / | div | | % | div | \u0026amp; | and | | \\| | or | ^ | xor | | ~ | not | \u0026lt;\u0026lt; | shl | | \u0026gt;\u0026gt; | shr | \u0026amp;\u0026amp; | and | | || | or | ! | not | | == | cmp | != | cmp | | \u0026gt; | cmp | \u0026gt;= | cmp | | \u0026lt; | cmp | \u0026lt;= | cmp | | ++ | inc | -- | dec | Assignment might be done via the mov instruction, for example:\na = 3; // mov [addressof(a)] 3 Structures struct Point { int x; int y; } int main() { Point point; point.x = 1; point.y = 2; } The size of a structure can generally be calculated from its members according to specific rules, often considering memory alignment, and is determined by the compiler. For example, msvc. But in any case, the size of the structure is known at compile time, and we can also get the size of a type or variable using sizeof. So, the Point point variable definition here is easy to understand: the type size is known, and a block of memory is allocated on the stack.\nNow let\u0026rsquo;s focus on structure member access. 
In fact, the C language has a macro that can get the offset of a structure member relative to the structure\u0026rsquo;s starting address, called offsetof (even if we can\u0026rsquo;t get it, the compiler will calculate the field offsets, so offset information is always known to the compiler). For example, here offsetof(Point, x) is 0, and offsetof(Point, y) is 4. So the above code can be understood as:\nint main() { char point[sizeof(Point)]; // 8 = sizeof(Point) *(int*)(point + offsetof(Point, x)) = 1; // point.x = 1 *(int*)(point + offsetof(Point, y)) = 2; // point.y = 2 } The compiler might also maintain a symbol table of field name -\u0026gt; offset, and the field name will eventually be replaced by the offset. There is no need to keep it in the program.\nFunction Calls Generally implemented through the function call stack, which is too common to elaborate on. The function name will eventually be directly replaced by the function address.\nSummary Through the above analysis, I believe you have discovered that symbol names, type names, variable names, function names, structure field names, and other information in C language are all replaced by numbers, addresses, offsets, etc. The absence of these has no impact on program execution. Therefore, they are discarded to reduce the size of the binary file. For C++, the situation is basically similar. C++ only retains some metadata in special cases, such as type_info, and you can manually choose to disable RTTI to ensure that this information is not generated.\nSo when do we need to use this information? Obviously, the setattr introduced at the beginning requires it. When debugging a program, we need to know the variable name, function name, member name, etc., corresponding to an address to facilitate debugging, so we need it then. When serializing a structure to json, we need to know its field names, so we also need this information. 
After type erasure to void*, we still need to know what its actual corresponding type is, so we need it then as well. In short, whenever we need to distinguish at runtime what a blob of binary content originally was, we need this information (and of course, if we want to use this information for code generation at compile time, it is needed there too).\nHow to get this information? C/C++ compilers do not provide us with interfaces to obtain this information, but as mentioned earlier, it is plainly visible in the source code: variable names, function names, type names, field names. We could read the code ourselves and write the metadata out by hand; with thousands of classes, each with dozens of member functions, that might take a few months. Just kidding. We could instead write some programs, such as regular-expression matchers, to help extract this information. However, there is actually a better option: obtaining it through the AST.\nAST (Abstract Syntax Tree) AST is an abbreviation for Abstract Syntax Tree. It is a data structure used in programming language processing to represent the abstract syntactic structure of source code. An AST is the result of source code being processed by a parser. It captures the grammatical structure of the code but does not include every detail, such as whitespace or comments. In an AST, each node represents a syntactic construct in the source code, such as a variable declaration, function call, or loop. These nodes are connected by parent-child and sibling relationships, forming a tree-like structure that is easier for computer programs to understand and process. 
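As a quick taste of what a parser gives you, here is a minimal sketch using Python's built-in ast module (mentioned again below) to pull field names out of a class definition, much like the fieldInfo table we built by hand earlier. The Person class in the source string is just made-up example input:

```python
import ast

# A small (hypothetical) class definition, held as source text.
source = """
class Person:
    def __init__(self, age, name):
        self.age = age
        self.name = name
"""

# Parse the source into an abstract syntax tree.
tree = ast.parse(source)

# Walk every node in the tree and collect attribute names that are
# being assigned to (ast.Store context), i.e. the "field names" we
# would later record as metadata.
fields = []
for node in ast.walk(tree):
    if isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Store):
        fields.append(node.attr)

print(fields)  # the assigned attributes: ['age', 'name']
```

The same idea carries over to C++: the tree and node kinds come from clang instead of the ast module, but the workflow of "parse, traverse, filter, record" is identical.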
If you have the clang compiler installed on your computer, you can use the following command to view the syntax tree of a source file:\nclang -Xclang -ast-dump -fsyntax-only \u0026lt;your.cpp\u0026gt; The output is as follows; I have filtered out the important information, and irrelevant parts have been removed:\n|-CXXRecordDecl 0x2103cd9c318 \u0026lt;col:1, col:8\u0026gt; col:8 implicit struct Point |-FieldDecl 0x2103cd9c3c0 \u0026lt;line:4:5, col:9\u0026gt; col:9 referenced x \u0026#39;int\u0026#39; |-FieldDecl 0x2103e8661f0 \u0026lt;line:5:5, col:9\u0026gt; col:9 referenced y \u0026#39;int\u0026#39; `-FunctionDecl 0x2103e8662b0 \u0026lt;line:8:1, line:13:1\u0026gt; line:8:5 main \u0026#39;int ()\u0026#39; `-CompoundStmt 0x2103e866c68 \u0026lt;line:9:1, line:13:1\u0026gt; |-DeclStmt 0x2103e866b30 \u0026lt;line:10:5, col:16\u0026gt; | `-VarDecl 0x2103e866410 \u0026lt;col:5, col:11\u0026gt; col:11 used point \u0026#39;Point\u0026#39;:\u0026#39;Point\u0026#39; callinit | `-CXXConstructExpr 0x2103e866b08 \u0026lt;col:11\u0026gt; \u0026#39;Point\u0026#39;:\u0026#39;Point\u0026#39; \u0026#39;void () noexcept\u0026#39; |-BinaryOperator 0x2103e866bb8 \u0026lt;line:11:5, col:15\u0026gt; \u0026#39;int\u0026#39; lvalue \u0026#39;=\u0026#39; | |-MemberExpr 0x2103e866b68 \u0026lt;col:5, col:11\u0026gt; \u0026#39;int\u0026#39; lvalue .x 0x2103cd9c3c0 | | `-DeclRefExpr 0x2103e866b48 \u0026lt;col:5\u0026gt; \u0026#39;Point\u0026#39;:\u0026#39;Point\u0026#39; lvalue Var 0x2103e866410 \u0026#39;point\u0026#39; \u0026#39;Point\u0026#39;:\u0026#39;Point\u0026#39; | `-IntegerLiteral 0x2103e866b98 \u0026lt;col:15\u0026gt; \u0026#39;int\u0026#39; 1 `-BinaryOperator 0x2103e866c48 \u0026lt;line:12:5, col:15\u0026gt; \u0026#39;int\u0026#39; lvalue \u0026#39;=\u0026#39; |-MemberExpr 0x2103e866bf8 \u0026lt;col:5, col:11\u0026gt; \u0026#39;int\u0026#39; lvalue .y 0x2103e8661f0 | `-DeclRefExpr 0x2103e866bd8 \u0026lt;col:5\u0026gt; \u0026#39;Point\u0026#39;:\u0026#39;Point\u0026#39; lvalue Var 
0x2103e866410 \u0026#39;point\u0026#39; \u0026#39;Point\u0026#39;:\u0026#39;Point\u0026#39; `-IntegerLiteral 0x2103e866c28 \u0026lt;col:15\u0026gt; \u0026#39;int\u0026#39; 2 Or, if your vscode has the clangd plugin installed, you can right-click a block of code and then right-click show AST to view the AST of that code snippet. You can see that the source code content is indeed presented to us in a tree structure. Since it is a tree, we can freely traverse the nodes of the tree and filter to get the information we want. The two examples above are visual outputs; usually, there are also direct code interfaces to obtain them. For example, python has a built-in ast module to get them, and C++ generally obtains this content through clang-related tools. If you want to know how to use clang tools specifically, you can refer to the article: Let\u0026rsquo;s freely control C++ code with clang tools!\nIf you are curious about how the compiler transforms source code into an AST, you can study the frontend content of compiler design.\nHow to store this information? This question sounds a bit confusing, but in reality, only C++ programmers might need to consider it.\nIn fact, it\u0026rsquo;s all caused by constexpr. 
Storing the information like this:\nstruct FieldInfo { std::string_view name; std::size_t offset; std::size_t size; }; struct Point { int x; int y; }; constexpr std::array\u0026lt;FieldInfo, 2\u0026gt; fieldInfos = {{ {\u0026#34;x\u0026#34;, offsetof(Point, x), sizeof(int)}, {\u0026#34;y\u0026#34;, offsetof(Point, y), sizeof(int)}, }}; means that we can query this information not only at runtime but also at compile time.\nEven more, it can be stored in template parameters, allowing types to be stored as well:\ntemplate\u0026lt;fixed_string name, std::size_t offset, typename Type\u0026gt; struct Field{}; using FieldInfos = std::tuple \u0026lt; Field\u0026lt;\u0026#34;x\u0026#34;, offsetof(Point, x), int\u0026gt;, Field\u0026lt;\u0026#34;y\u0026#34;, offsetof(Point, y), int\u0026gt; \u0026gt;; This undoubtedly gives us greater operational flexibility. So, with this information, what should we do next? In fact, we can choose to generate code based on this information. Related content can be found in other sections of this series of articles.\n","permalink":"https://www.ykiko.me/en/articles/670190357/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"first-what-is-metadata\"\u003eFirst, what is metadata?\u003c/h2\u003e\n\u003cp\u003eConsider the following \u003ccode\u003epython\u003c/code\u003e code. We want to automatically modify the corresponding field value based on the input string.\u003c/p\u003e","title":"Why is it said that C/C++ compilers do not preserve metadata?"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nClang is a C-language family compiler frontend provided by the LLVM project. 
It was originally developed to replace the C language frontend of the GNU Compiler Collection (GCC), with the goal of providing faster compilation speeds, better diagnostic information, and a more flexible architecture. Clang includes C, C++, and Objective-C compiler frontends, which are designed to be embedded in other projects. A key feature of Clang is its modular architecture, which makes it easier for developers to extend and customize compiler functionality. Clang is widely used in many projects, including LLVM itself, the development of some operating system kernels, and the implementation of compilers for some programming languages.\nIn addition to being used as a compiler, Clang can also be provided as a library, allowing developers to leverage compiler features in their applications, such as source code analysis and generation. Clang can be used to obtain the Abstract Syntax Tree (AST) of C++ source files for further processing. This article will introduce how to use Clang tools.\nInstallation \u0026amp; Usage Currently, Clang is divided into the following libraries and tools: libsupport, libsystem, libbasic, libast, liblex, libparse, libsema, libcodegen, librewrite, libanalysis. Since Clang itself is written in C++, the related interfaces are all C++. However, due to the complexity and instability of the C++ interface itself (e.g., a DLL compiled by GCC on Windows cannot be used by MSVC, or API changes due to Clang version upgrades leading to incompatibility), the official recommendation is not to prioritize the C++ interface.\nIn addition to the C++ interface, the official project also provides a C language interface called libclang. This interface is not only relatively simple to use but also quite stable. The only drawback is that it cannot obtain a complete C++ Abstract Syntax Tree (AST). 
However, given that a complete C++ AST is inherently extremely complex, and often we only need a small portion of its information, this issue can usually be ignored unless you genuinely have such a requirement.\nIf you want to use libclang, you need to install LLVM and Clang first. On the LLVM Release page, several pre-built binary packages are available for download. If you have custom requirements, please refer to the Getting Started page for manual compilation. After installation, you only need to link libclang.dll from the llvm/lib directory to your program and include the clang-c/Index.h header file from the llvm/include directory to use it.\nHowever, since the C language lacks some high-level abstractions, manipulating strings can be cumbersome. For large-scale use, we would need to wrap it with a C++ layer ourselves. Fortunately, the official project also provides a Python binding based on this C interface, namely the clang package, which makes it even more convenient to use. However, the official Python binding does not package the libclang DLL, so you still need to manually configure the LLVM environment on your computer, which can be a bit troublesome. Nevertheless, someone in the community has provided a packaged version on PyPI: libclang.\nSo, if you want to use libclang to get a C++ syntax tree, you just need to\npip install libclang No extra steps are required. This article will introduce it based on this Python binding version. The C version API and the Python version API are basically identical. If you feel that Python\u0026rsquo;s performance is insufficient, you can also refer to this tutorial to write C version code. Additionally, the official package does not provide type hints, which means there\u0026rsquo;s no code completion when writing in Python, making it less comfortable to use. 
I\u0026rsquo;ve added a type-hinted cindex.pyi; just download it and place it in the same folder to get code completion.\nQuick Start The example C++ source file code is as follows:\n// main.cpp struct Person { int age; const char* name; }; int main() { Person person = {1, \u0026#34;John\u0026#34;}; return 0; } The Python code to parse it is as follows:\nimport clang.cindex as CX def traverse(node: CX.Cursor, prefix=\u0026#34;\u0026#34;, is_last=True): branch = \u0026#34;└──\u0026#34; if is_last else \u0026#34;├──\u0026#34; text = f\u0026#34;{str(node.kind).removeprefix(\u0026#39;CursorKind.\u0026#39;)}: {node.spelling}\u0026#34; if node.kind == CX.CursorKind.INTEGER_LITERAL: value = list(node.get_tokens())[0].spelling text = f\u0026#34;{text}{value}\u0026#34; print(f\u0026#34;{prefix}{branch} {text}\u0026#34;) new_prefix = prefix + (\u0026#34; \u0026#34; if is_last else \u0026#34;│ \u0026#34;) children = list(node.get_children()) for child in children: traverse(child, new_prefix, child is children[-1]) index = CX.Index.create(excludeDecls=True) tu = index.parse(\u0026#39;main.cpp\u0026#39;, args=[\u0026#39;-std=c++20\u0026#39;]) traverse(tu.cursor) The output is as follows:\nTRANSLATION_UNIT: main.cpp ├── STRUCT_DECL: Person │ ├── FIELD_DECL: age │ └── FIELD_DECL: name └── FUNCTION_DECL: main └── COMPOUND_STMT: ├── DECL_STMT: │ └── VAR_DECL: person │ ├── TYPE_REF: struct Person │ └── INIT_LIST_EXPR: │ ├── INTEGER_LITERAL: 1 │ └── STRING_LITERAL: \u0026#34;John\u0026#34; └── RETURN_STMT: └── INTEGER_LITERAL: 0 The first part is the syntax tree node type, and the second part is the node content. As you can see, it\u0026rsquo;s very clear and almost corresponds one-to-one with the source code.\nBasic Types Note that this article assumes the reader has some understanding of syntax trees and will not go into too much detail here. If you don\u0026rsquo;t know what a syntax tree is, you can refer to Why C/C++ Compilers Don\u0026rsquo;t Retain Metadata. 
Below are some common types in cindex.\nCursor Equivalent to a basic node in the syntax tree, the entire syntax tree is composed of Cursors. The kind attribute returns a CursorKind enumeration value, which represents the actual type corresponding to this node.\nfor kind in CursorKind.get_all_kinds(): print(kind) This can print all supported node types, or you can directly check the source code. Cursor also has other attributes and methods for us to use, commonly including the following:\n@property def spelling(self) -\u0026gt; str: @property def displayname(self) -\u0026gt; str: @property def mangled_name(self) -\u0026gt; str: Gets the name of the node. For example, for a variable declaration node, its spelling is the name of the variable. displayname is the short name of the node, which is the same as spelling most of the time. However, there are sometimes differences; for example, a function\u0026rsquo;s spelling might include parameter types, such as func(int), but its displayname would just be func. mangled_name is the name of the symbol after name mangling, used for linking.\n@property def type(self) -\u0026gt; Type: The type of the node element. For example, for a variable declaration node, its type is the type of the variable. Or for a field declaration node, its type is the type of the field. The return type is Type.\n@property def location(self) -\u0026gt; SourceLocation: The location information of the node, returning a SourceLocation type, which carries information such as the line number, column number, and filename of the node in the source code.\n@property def extent(self) -\u0026gt; SourceRange: The range information of the node, returning a SourceRange type, composed of two SourceLocations, which carry the start and end positions of the node in the source code.\n@property def access_specifier(self) -\u0026gt; AccessSpecifier: The access specifier of the node, returning an AccessSpecifier type. 
There are five types: PUBLIC, PROTECTED, PRIVATE, NONE, INVALID.\ndef get_children(self) -\u0026gt; iterable[Cursor]: Gets all child nodes, returning an iterable of Cursors. This function is the most commonly used because we can traverse the entire syntax tree recursively.\ndef get_tokens(self) -\u0026gt; iterable[Token]: Gets all tokens representing this node, returning an iterable of Tokens. A token is the smallest unit of a syntax tree. For example, for a variable declaration node, its tokens are int, a, ;. This function can be used to obtain detailed information, such as the values of integer and floating-point literals.\ndef is_definition(self) -\u0026gt; bool: def is_const_method(self) -\u0026gt; bool: def is_converting_constructor(self) -\u0026gt; bool: def is_copy_constructor(self) -\u0026gt; bool: def is_default_constructor(self) -\u0026gt; bool: def is_move_constructor(self) -\u0026gt; bool: def is_default_method(self) -\u0026gt; bool: def is_deleted_method(self) -\u0026gt; bool: def is_copy_assignment_operator_method(self) -\u0026gt; bool: def is_move_assignment_operator_method(self) -\u0026gt; bool: def is_mutable_field(self) -\u0026gt; bool: def is_pure_virtual_method(self) -\u0026gt; bool: def is_static_method(self) -\u0026gt; bool: def is_virtual_method(self) -\u0026gt; bool: def is_abstract_record(self) -\u0026gt; bool: def is_scoped_enum(self) -\u0026gt; bool: These functions are mostly self-explanatory. For example, is_definition checks if the node is a definition, and is_const_method checks if the node is a const method.\nType If the node has a type, this represents the type of that node. Common attributes include:\n@property def kind(self) -\u0026gt; TypeKind: The kind of the type, returning a TypeKind. 
For example, INT, FLOAT, POINTER, FUNCTIONPROTO, etc.\n@property def spelling(self) -\u0026gt; str: The name of the type, for example, int, float, void, etc.\ndef get_align(self) -\u0026gt; int: def get_size(self) -\u0026gt; int: def get_offset(self, fieldname: str) -\u0026gt; int: Gets the alignment, size, field offset, etc., of the type.\nAnd some is prefixed functions, such as is_const_qualified, is_function_variadic, is_pod, etc. We won\u0026rsquo;t elaborate on them here.\nTranslationUnit Generally, a C++ source file represents a TranslationUnit, which is what we commonly refer to as a compilation unit.\nCommonly used are:\n@property def cursor(self) -\u0026gt; Cursor: Gets the root node of this TranslationUnit, which is a Cursor of type TRANSLATION_UNIT.\n@property def spelling(self) -\u0026gt; str: Gets the filename of this TranslationUnit.\ndef get_includes(self, depth: int = -1) -\u0026gt; iterable[FileInclusion]: Gets all includes of this TranslationUnit, returning a list of FileInclusions. Note that since included files might contain other files, you can use the depth parameter to limit this. For example, if you only want to get the first level, i.e., directly included header files, you can write it like this:\nindex = CX.Index.create() tu = index.parse(\u0026#39;main.cpp\u0026#39;, args=[\u0026#39;-std=c++20\u0026#39;]) for file in tu.get_includes(): if file.depth == 1: print(file.include.name) This will print all directly used header files.\nIndex An Index is a collection of TranslationUnits that are ultimately linked together to form an executable or library.\nThere is a static method create to create a new Index, and then the member method parse can parse a C++ source file, returning a TranslationUnit.\ndef parse(self, path: str, args: list[str] | None = ..., unsaved_files: list[tuple[str, str]] | None = ..., options: int = ...) 
-\u0026gt; TranslationUnit: path is the source file path, args are compilation arguments, unsaved_files are unsaved files, and options are parameters defined in TranslationUnit.PARSE_XXX, such as PARSE_SKIP_FUNCTION_BODIES and PARSE_INCOMPLETE. These can be used to customize the parsing process, speed up parsing, or retain macro information, etc.\nExamples Namespace Since Clang expands all header files during parsing, the complete output is too extensive. However, we might primarily be interested in information from our own code. In such cases, we can use namespaces for filtering. Here\u0026rsquo;s an example:\n#include \u0026lt;iostream\u0026gt; namespace local { struct Person { int age; std::string name; }; } The parsing code is as follows:\nimport clang.cindex as CX def traverse_my(node: CX.Cursor): if node.kind == CX.CursorKind.NAMESPACE: if node.spelling == \u0026#34;local\u0026#34;: traverse(node) # forward to the previous function for child in node.get_children(): traverse_my(child) index = CX.Index.create() tu = index.parse(\u0026#39;main.cpp\u0026#39;, args=[\u0026#39;-std=c++20\u0026#39;]) traverse_my(tu.cursor) We write a function to filter by namespace name and then forward to our previous function. This way, only the content within the desired namespace will be output.\nClass \u0026amp; Struct We mainly want to get their field names, types, method names, types, etc. 
Here\u0026rsquo;s an example:\nstruct Person { int age; const char* name; void say_hello(int a, char b); }; The parsing code is as follows:\ndef traverse_class(node: CX.Cursor): match node.kind: case CX.CursorKind.STRUCT_DECL | CX.CursorKind.CLASS_DECL: print(f\u0026#34;Class: {node.spelling}:\u0026#34;) case CX.CursorKind.FIELD_DECL: print(f\u0026#34; Field: {node.spelling}: {node.type.spelling}\u0026#34;) case CX.CursorKind.CXX_METHOD: print(f\u0026#34; Method: {node.spelling}: {node.type.spelling}\u0026#34;) for arg in node.get_arguments(): print(f\u0026#34; Param: {arg.spelling}: {arg.type.spelling}\u0026#34;) for child in node.get_children(): traverse_class(child) # Class: Person: # Field: age: int # Field: name: const char * # Method: say_hello: void (int, char) # Param: a: int # Param: b: char Comment Doxygen-style comments can be retrieved:\n@property def brief_comment(self) -\u0026gt; str: @property def raw_comment(self) -\u0026gt; str: brief_comment gets the content after @brief, and raw_comment gets the entire comment content.\n/** * @brief func description * @param param1 * @return int */ int func(int param1){ return param1 + 10000000; } The parsing code is as follows:\ndef traverse_comment(node: CX.Cursor): if node.brief_comment: print(f\u0026#34;brief_comment =\u0026gt; {node.brief_comment}\u0026#34;) if node.raw_comment: print(f\u0026#34;raw_comment =\u0026gt; {node.raw_comment}\u0026#34;) for child in node.get_children(): traverse_comment(child) # brief_comment =\u0026gt; func description # raw_comment =\u0026gt; /** # * @brief func description # * @param param1 # * @return int # */ Enum Get the enum name, its corresponding enum constant values, and its underlying type.\nenum class Color{ RED = 0, GREEN, BLUE }; The parsing code is as follows:\ndef traverse_enum(node: CX.Cursor): if node.kind == CX.CursorKind.ENUM_DECL: print(f\u0026#34;enum: {node.spelling}, underlying type: {node.enum_type.spelling}\u0026#34;) print(f\u0026#34;is scoped?: 
{node.is_scoped_enum()}\u0026#34;) for child in node.get_children(): print(f\u0026#34; enum_value: {child.spelling}: {child.enum_value}\u0026#34;) for child in node.get_children(): traverse_enum(child) # enum: Color, underlying type: int # is scoped?: True # enum_value: RED: 0 # enum_value: GREEN: 1 # enum_value: BLUE: 2 Attribute C++11 introduced new attribute syntax: [[ ... ]], which can be used to add extra information to functions or variables. Examples include [[nodiscard]] and [[deprecated]]. However, sometimes we define our own markers for our pre-processing tools, such as marking whether a type needs metadata generation. We also hope that these markers can be recognized by libclang. Unfortunately, if non-standard attributes are written directly, they will be ignored by libclang, meaning they won\u0026rsquo;t appear in the final AST.\nstruct [[Reflect]] Person{}; // ignored A feasible solution is to use get_tokens to retrieve all tokens in the declaration and then manually extract the desired information. For example, the result obtained here would be struct, [, [, Reflect, ], ], Person, {, }, from which we can extract the information we want.\nHowever, Clang provides a better way: using the clang::annotate(...) Clang extension attribute. For example, like this:\n#define Reflect clang::annotate(\u0026#34;reflect\u0026#34;) struct [[Reflect]] A {}; With this, for the A Cursor, its child nodes will include an ANNOTATE_ATTR type Cursor, and its spelling will contain the stored information, which is reflect in this case. This allows us to easily retrieve our custom attributes. Furthermore, the C++ standard specifies that when a compiler encounters an unrecognized attribute, it ignores it rather than reporting an error. This means the attribute only affects our preprocessor and does not interfere with normal compilation.\nMacro Before actually parsing the syntax tree, Clang replaces all preprocessor directives with actual code. 
Therefore, these directives are no longer present in the final syntax tree information. However, sometimes we do want to obtain this information, for example, if we want to get information about #define. To do this, the options parameter of parse needs to be set to TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD. If you want to get the content of a macro, just use get_tokens.\n#define CONCAT(a, b) a#b auto x = CONCAT(1, 2); The parsing code is as follows:\ndef traverse_macro(node: CX.Cursor): if node.kind == CX.CursorKind.MACRO_DEFINITION: if not node.spelling.startswith(\u0026#39;_\u0026#39;): # Exclude internal macros print(f\u0026#34;MACRO: {node.spelling}\u0026#34;) print([token.spelling for token in node.get_tokens()]) elif node.kind == CX.CursorKind.MACRO_INSTANTIATION: print(f\u0026#34;MACRO_INSTANTIATION: {node.spelling}\u0026#34;) print([token.spelling for token in node.get_tokens()]) for child in node.get_children(): traverse_macro(child) # MACRO: CONCAT # [\u0026#39;CONCAT\u0026#39;, \u0026#39;(\u0026#39;, \u0026#39;a\u0026#39;, \u0026#39;,\u0026#39;, \u0026#39;b\u0026#39;, \u0026#39;)\u0026#39;, \u0026#39;a\u0026#39;, \u0026#39;#\u0026#39;, \u0026#39;b\u0026#39;] # MACRO_INSTANTIATION: CONCAT # [\u0026#39;CONCAT\u0026#39;, \u0026#39;(\u0026#39;, \u0026#39;1\u0026#39;, \u0026#39;,\u0026#39;, \u0026#39;2\u0026#39;, \u0026#39;)\u0026#39;] Rewrite Sometimes we want to make simple modifications to the source code, such as inserting or deleting a piece of code at a certain position. In such cases, we can use the Rewriter class. 
Here\u0026rsquo;s an example:\nvoid func(){ int a = 1; int b = 2; int c = 3; } Use the following code to modify the source file:\ndef rewrite(node: CX.Cursor, rewriter: CX.Rewriter): if node.kind == CX.CursorKind.VAR_DECL: if node.spelling == \u0026#34;a\u0026#34;: rewriter.replace_text(node.extent, \u0026#34;int a = 100\u0026#34;) elif node.spelling == \u0026#34;b\u0026#34;: rewriter.remove_text(node.extent) elif node.spelling == \u0026#34;c\u0026#34;: rewriter.insert_text_before(node.extent.start, \u0026#34;[[maybe_unused]]\u0026#34;) for child in node.get_children(): rewrite(child, rewriter) index = CX.Index.create() tu = index.parse(\u0026#39;main.cpp\u0026#39;, args=[\u0026#39;-std=c++20\u0026#39;]) rewriter = CX.Rewriter.create(tu) rewrite(tu.cursor, rewriter) rewriter.overwrite_changed_files() After running, the content of main.cpp becomes:\nvoid func() { int a = 100; ; [[maybe_unused]] int c = 3; } Conclusion When retrieving ABI-related information such as type size, align, and offset, it\u0026rsquo;s important to consider the platform. Their values might differ across different ABIs; for example, MSVC and GCC generally have different values for these. You can specify the target platform by using -target in the compilation arguments. If you need results consistent with MSVC, you can use --target=x86_64-pc-windows-msvc. For GCC, you can use --target=x86_64-pc-linux-gnu.\nAs mentioned earlier, libclang cannot provide a complete C++ syntax tree. For instance, it lacks many interfaces for parsing Expressions. This means that if you need to parse specific expression content, using its C++ interface might be more suitable, as it provides a complete and complex syntax tree.\nThere are relatively few articles in China specifically on the practical use of Clang tools. This article attempts to provide a concrete introduction to some common functionalities, although it is not entirely comprehensive. 
If you have any questions, you can directly read the Index.h source code, which contains very detailed comments. Alternatively, you can leave a comment in the comments section, and I will do my best to answer. Furthermore, if you need information not provided by libclang, you can use the get_tokens function to retrieve it yourself. For example, libclang does not support getting the values of integer and floating-point literals, in which case you can manually retrieve them via get_tokens.\nAfter extracting this information from the syntax tree, you can further process it, such as generating metadata or directly generating code. Of course, these are subsequent steps, depending on your specific needs.\nThis article concludes here. This is one of the articles in the reflection series; feel free to read other articles in the series!\n","permalink":"https://www.ykiko.me/en/articles/669360731/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eClang is a C-language family compiler frontend provided by the LLVM project. It was originally developed to replace the C language frontend of the GNU Compiler Collection (GCC), with the goal of providing faster compilation speeds, better diagnostic information, and a more flexible architecture. Clang includes C, C++, and Objective-C compiler frontends, which are designed to be embedded in other projects. A key feature of Clang is its modular architecture, which makes it easier for developers to extend and customize compiler functionality. 
Clang is widely used in many projects, including LLVM itself, the development of some operating system kernels, and the implementation of compilers for some programming languages.\u003c/p\u003e","title":"Master your C++ code with Clang tools."},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nIntroduction Let\u0026rsquo;s take a recent requirement as an introduction. We all know that Markdown can use lang to fill in code blocks and supports code highlighting. However, I wanted to support my own custom code highlighting rules and encountered the following problems:\nSome websites render Markdown statically and cannot run scripts, so it\u0026rsquo;s impossible to directly call those JavaScript code highlighting libraries. For example, this is how Markdown files are rendered on GitHub. Which languages are supported is generally determined by the rendering engine. For example, GitHub\u0026rsquo;s rendering support differs from that of other engines. If you want to write extensions for different rendering engines, you\u0026rsquo;d have to write one for each, which is too much work, and there\u0026rsquo;s very little related documentation. Is there really no way? Well, there is. Fortunately, most engines support direct use of HTML, such as \u0026lt;code\u0026gt; tags, for rendering.\n\u0026lt;code style=\u0026#34;color: #5C6370;font-style: italic;\u0026#34;\u0026gt; # this is a variable named \u0026amp;#x27;a\u0026amp;#x27; \u0026lt;/code\u0026gt; This gives us a way to add custom styles. But we can\u0026rsquo;t manually write this kind of code in our Markdown source files. If a single statement like let a = 3; needs three different colors, we\u0026rsquo;d have to write three different \u0026lt;span\u0026gt; tags for just one line.
It\u0026rsquo;s very difficult to write and hard to maintain later.\nIn fact, we can do this: read the Markdown source file, which is written according to normal Markdown syntax. Then, when we read it and encounter lang, we extract the text, hand it over to a rendering library (I chose highlight.js) to render it into DOM text. Then, we replace the original text and output it separately into a new folder. For example, if the original folder is src, the new one is out. This way, the source file doesn\u0026rsquo;t need any modification, and the content in the out folder is what actually gets rendered. Every time we modify the source file, we just run this program to perform the conversion.\nWhat is Code Generation? The case above is actually a typical example of solving a problem using \u0026ldquo;code generation\u0026rdquo;. So, what exactly is code generation? This is also a very broad term. Generally speaking,\nCode generation refers to the process of using computer programs to generate other programs or code.\nThis includes, but is not limited to:\nCompilers generating target code: This is the most typical example, where a compiler translates source code written in a high-level programming language into machine-executable target code. Generating code using configuration files or DSLs: Actual code is generated through specific configuration files or Domain-Specific Languages (DSLs). An example is using XML configuration files to define a UI interface and then generating the corresponding code. Language built-in features generating code: Some programming languages have built-in features, such as macros, generics, etc., that can generate code at compile time or runtime. Such mechanisms can improve code flexibility and reusability. External code generators: Some frameworks or libraries use external code generators to create the required code. 
For example, the Qt framework uses the Meta-Object Compiler (MOC) to process the meta-object system and generate code related to signals and slots. Below are some specific examples for these points:\nCompile-time Code Generation Macros C language\u0026rsquo;s macro is one of the most classic and simplest compile-time code generation techniques. It\u0026rsquo;s pure text replacement. For example, if we want to repeat the string \u0026quot;Hello World\u0026quot; 100 times. What should we do? Obviously, we don\u0026rsquo;t want to manually copy and paste. Consider using macros to accomplish this task.\n#define REPEAT(x) (REPEAT1(x) REPEAT1(x) REPEAT1(x) REPEAT1(x) REPEAT1(x)) #define REPEAT1(x) REPEAT2(x) REPEAT2(x) REPEAT2(x) REPEAT2(x) REPEAT2(x) #define REPEAT2(x) x x x x int main(){ const char* str = REPEAT(\u0026#34;Hello world \u0026#34;); } This primarily uses a feature in C language where \u0026quot;a\u0026quot;\u0026quot;b\u0026quot; is equivalent to \u0026quot;ab\u0026quot;. Then, through macro expansion, 5*5*4 times, which is exactly one hundred, the task is easily completed. Of course, C language macros are very limited in functionality because they are essentially just token replacements and do not allow users to obtain the token stream for input analysis. Nevertheless, there are some interesting uses. If you\u0026rsquo;re interested, you can read this article: The Art of C/C++ Macro Programming. Of course, macros are not exclusive to C; other programming languages also have them, and they can support even stronger features. For example, Rust\u0026rsquo;s macros are much more flexible than C\u0026rsquo;s, mainly because Rust allows you to analyze the input Token Stream, rather than simply performing replacements. You can choose to generate different code based on different input tokens. 
Even more so, macros in languages like Lisp are super flexible.\nGenerics/Templates In some programming languages, Generics are also considered a code generation technique, generating actually different code based on different types. Of course, this is the most basic. Some programming languages also support more powerful features, such as C++\u0026rsquo;s template metaprogramming for advanced code generation. A typical case is building a function pointer table (jump table) at compile time.\ntemplate\u0026lt;std::size_t N, typename T, typename F\u0026gt; void helper(T t, F f) { f(std::get\u0026lt;N\u0026gt;(t)); } template\u0026lt;typename Tuple, typename Func\u0026gt; constexpr void access(std::size_t index, Tuple\u0026amp;\u0026amp; tuple, Func\u0026amp;\u0026amp; f){ constexpr auto length = std::tuple_size\u0026lt;std::decay_t\u0026lt;decltype(tuple)\u0026gt;\u0026gt;::value; using FuncType = void (*)(decltype(tuple), decltype(f)); constexpr auto fn_table = []\u0026lt;std::size_t... I\u0026gt;(std::index_sequence\u0026lt;I...\u0026gt;){ std::array\u0026lt;FuncType, length\u0026gt; table = { helper\u0026lt;I, decltype(tuple), decltype(f)\u0026gt;... }; return table; }(std::make_index_sequence\u0026lt;length\u0026gt;{}); return fn_table[index](std::forward\u0026lt;Tuple\u0026gt;(tuple), std::forward\u0026lt;Func\u0026gt;(f)); } int main(){ std::tuple a = { 1, \u0026#39;a\u0026#39;, \u0026#34;123\u0026#34; }; auto f = [](auto\u0026amp;\u0026amp; v) { std::cout \u0026lt;\u0026lt; v \u0026lt;\u0026lt; std::endl; }; std::size_t index = 0; access(index, a, f); // =\u0026gt; 1 index = 2; access(index, a, f); // =\u0026gt; 123 } This way, we have achieved the effect of accessing elements in a tuple based on a runtime index. The specific principle is to manually create a function pointer table and then dispatch based on the index.\nCode Generators The two points above discuss language built-in features. 
However, in many scenarios, language built-in features are not flexible enough and cannot meet our needs. For example, in C++, if you want to generate entire functions and types, neither macros nor templates can achieve this.\nBut code is just strings in source files. Based on this idea, we can completely write a dedicated program to generate such strings. For example, write a Python code to generate the C program that prints \u0026ldquo;Hello World\u0026rdquo; 100 times.\ns = \u0026#34;\u0026#34;; for i in range(100): s += \u0026#39;\u0026#34;Hello World \u0026#34;\u0026#39; code = f\u0026#34;\u0026#34;\u0026#34; int main() {{ const char* str = {s}; }}\u0026#34;\u0026#34;\u0026#34; with open(\u0026#34;hello.c\u0026#34;, \u0026#34;w\u0026#34;) as file: file.write(code) Well, this generates the source file mentioned above. Of course, this is just the simplest application. Alternatively, we can use Protocol Buffer to automatically generate serialization and deserialization code. Or, we can obtain information from the AST, and even the type\u0026rsquo;s meta-information is generated by the code generator. The principle of such programs is very simple: string concatenation, and its upper limit completely depends on how your code is written.\nHowever, more often, language built-in features are more convenient to use. Using external code generators can complicate the compilation process. Nevertheless, some languages have incorporated this feature as one of their built-in characteristics, such as C#\u0026rsquo;s code generation.\nRuntime Code Generation exec Alright, we\u0026rsquo;ve talked a lot about static language features. Now let\u0026rsquo;s look at sufficiently dynamic code generation. First up are features like eval and exec in languages like Python/JavaScript, which allow us to load strings directly as code and execute them at runtime.\neval is a mechanism that parses a string into executable code. 
In Python, the eval function can accept a string as an argument, execute the expression within it, and return the result. This provides powerful tools for dynamic computation and code generation. result = eval(\u0026#34;2 + 3\u0026#34;) print(result) # Output: 5 exec, unlike eval, can execute multiple statements, even including function and class definitions. code_block = \u0026#34;\u0026#34;\u0026#34; def multiply(x, y): return x * y result = multiply(4, 5) \u0026#34;\u0026#34;\u0026#34; exec(code_block) print(result) # Output: 20 Undoubtedly, generating code at runtime merely by string concatenation can easily fulfill some demanding requirements when used in appropriate scenarios.\nDynamic Compilation Now, there\u0026rsquo;s a question: can C language achieve the dynamic compilation features mentioned above? You might say we could implement a C language interpreter, and that would naturally do it. But in fact, there\u0026rsquo;s a simpler way.\nThere are two main points:\nCompile code at runtime If you have gcc installed on your computer, you can run the following two commands:\n# Compile the source file into an object file gcc -c source.c -o source.o # Extract the .text section from the object file to generate a binary file objcopy -O binary -j .text source.o source.bin This way, you can obtain the binary form of the code in source.c. But having only the code is not enough; we need to execute it.\nAllocate executable memory Code is also binary data. Couldn\u0026rsquo;t we just write the code data we just obtained into a block of memory and then jmp to it to execute? The idea is straightforward, but unfortunately, most operating systems protect memory, and generally allocated memory is not executable. If you try to write data and then execute it, it will directly cause a segmentation fault.
However, we can use VirtualAlloc or mmap to allocate a block of memory with execution permissions, then write the code into it, and execute it.\n// Windows VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); // Linux mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); Combining these two points and making some minor adjustments, we can achieve reading code and input from the command line and then directly running and outputting the result.\n#include \u0026lt;fstream\u0026gt; #include \u0026lt;iostream\u0026gt; #include \u0026lt;string\u0026gt; #include \u0026lt;cstdlib\u0026gt; #include \u0026lt;cstring\u0026gt; #ifdef _WIN32 #include \u0026lt;Windows.h\u0026gt; #define Alloc(size) VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE) #elif __linux__ #include \u0026lt;sys/mman.h\u0026gt; #define Alloc(size) mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) #endif int main(int argc, char* argv[]) { std::ofstream(\u0026#34;source.c\u0026#34;) \u0026lt;\u0026lt; argv[1]; system(\u0026#34;gcc -c source.c \u0026amp;\u0026amp; objcopy -O binary -j .text source.o source.bin\u0026#34;); std::ifstream file(\u0026#34;source.bin\u0026#34;, std::ios::binary); std::string source((std::istreambuf_iterator\u0026lt;char\u0026gt;(file)), {}); auto p = Alloc(source.size()); memcpy(p, source.c_str(), source.size()); using Fn = int (*)(int, int); std::cout \u0026lt;\u0026lt; reinterpret_cast\u0026lt;Fn\u0026gt;(p)(std::stoi(argv[2]), std::stoi(argv[3])) \u0026lt;\u0026lt; std::endl; return 0; } The final effect:\n.\\main.exe \u0026#34;int f(int a, int b){ return a + b; }\u0026#34; 1 2 # output: 3 .\\main.exe \u0026#34;int f(int a, int b){ return a - b; }\u0026#34; 1 2 # output: -1 Perfectly implemented.\nConclusion This article mainly introduced some basic concepts and examples of code generation, as well as some simple applications. Code generation is a very powerful technique.
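To tie the two halves of this article together, here is a minimal Python sketch of the generate-then-execute idea: build the source text of a function from data, just like the external code generators above, then load it at runtime with exec. The function name and field list are invented for illustration.

```python
# Fields we want the generated function to sum over; purely illustrative.
fields = ["x", "y", "z"]

# Step 1: code generation -- assemble the function's source as a string.
source = "def sum_fields(record):\n"
source += "    return " + " + ".join(f"record[{f!r}]" for f in fields) + "\n"

# Step 2: runtime loading -- exec compiles and defines the function.
namespace = {}
exec(source, namespace)
sum_fields = namespace["sum_fields"]

print(sum_fields({"x": 1, "y": 2, "z": 3}))  # prints 6
```

As noted above, the upper limit of this approach depends entirely on how the string is assembled; the mechanism itself is nothing more than concatenation plus exec.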
If we limit our perspective to only the built-in features of programming languages, many times we cannot fulfill complex requirements. If we broaden our perspective, we will unexpectedly discover a new world. This is one article in a series on reflection; welcome to read other articles in the series!\n","permalink":"https://www.ykiko.me/en/articles/669359855/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eLet\u0026rsquo;s take a recent requirement as an introduction. We all know that Markdown can use \u003ccode\u003elang\u003c/code\u003e to fill in code blocks and supports code highlighting. However, I wanted to support my own custom code highlighting rules and encountered the following problems:\u003c/p\u003e","title":"Various approaches to code generation"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nWhat is Reflection? The term Reflection is probably not new to anyone; perhaps you haven\u0026rsquo;t used it, but you\u0026rsquo;ve certainly heard of it. However, like many other idiomatic terms in the CS field, there isn\u0026rsquo;t a clear and precise definition for reflection. This leads to a situation where, for languages like C#, Java, and Python that have reflection, discussing it naturally brings to mind related facilities, APIs, and code examples in those languages, making it very concrete. But for languages like C, C++, and Rust, which don\u0026rsquo;t have reflection, when reflection is discussed, people are often unsure what the other person is referring to, making it very abstract. 
For example, someone might tell me that Rust has reflection, and the example they provide is the introduction to std::Any module in Rust\u0026rsquo;s official documentation. It mentions:\nUtilities for dynamic typing or type reflection\nBut the awkwardness lies in this: if you call it reflection, its functionality is very limited; if you say it\u0026rsquo;s not, it\u0026rsquo;s not entirely wrong to argue that it exhibits some form of it.\nSimilar situations often occur in C++. I\u0026rsquo;m sure you often hear views like these: C++ only has very weak reflection, namely RTTI (Run Time Type Information), but some C++ frameworks like QT and UE implement their own reflection. In recent discussions, online blogs, or C++ new standard proposals, you might again hear about:\nStatic reflection Dynamic reflection Compile-time reflection Runtime reflection Such a plethora of terms can be utterly confusing and disorienting. Moreover, prefixes like static, dynamic, compile time, and runtime are themselves idiomatic terms, often combined with various words, and have many meanings depending on the context.\nSome readers might say, \u0026ldquo;I checked WIKI, and reflection clearly has a definition, as follows:\u0026rdquo;\nIn computer science, reflective programming or reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.\nFirst, WIKI is also written by people and does not possess absolute authority; if you are not satisfied with this definition, you can modify it yourself. Secondly, the wording here is also very vague. What does \u0026ldquo;introspect\u0026rdquo; mean? Self-reflection, what does this term mean in CS? So this definition is also very awkward. What to do then? 
I choose to break it down into several processes for explanation, so we don\u0026rsquo;t have to dwell on the conceptual question of \u0026ldquo;what exactly is reflection.\u0026rdquo; Instead, by understanding these processes, you will naturally understand what reflection does.\nHow to Understand Reflection? Reflection in all languages can be seen as the following three steps:\nGenerate Metadata First, what is metadata? When we write code, we give names to variables, types, struct fields, etc. These names are primarily for the convenience of programmers to understand and maintain the source code. For C/C++, these names are usually discarded after compilation to save binary space, which is understandable. For a detailed discussion, please see Why C/C++ compilers do not retain metadata.\nHowever, gradually, we found that in some cases, this data is needed. For example, when serializing a struct into json, the struct field names are required, and when printing logs, we don\u0026rsquo;t want to print enum values but rather the corresponding enum names directly. What to do? Early on, it could only be done through hard coding, i.e., manual writing, or perhaps some macros for more advanced cases. This is actually very inconvenient and not conducive to subsequent code maintenance.\nLater, some languages, such as Java and C#, emerged. Their compilers retain a lot of data, including these names, during compilation. This data is called metadata. At the same time, there are also means to allow programmers to attach metadata to certain structures themselves, such as C#\u0026rsquo;s attribute and Java\u0026rsquo;s annotation.\nWhat about C++? Currently, C++ compilers only retain type names for implementing RTTI, i.e., the related facilities of std::type_info in the standard. Other information is erased by the compiler. What to do? 
Manually writing metadata is acceptable for a small number of classes, but when the project scale increases, for example, with dozens or hundreds of classes, it becomes very tedious and error-prone. In fact, we can run a script before actual compilation to generate this data, which is called Code Generation. For related content, please refer to Freely control C++ code with clang tools.\nQuery Metadata After generation, the next step is to query the metadata. Many languages\u0026rsquo; built-in reflection modules, such as Python\u0026rsquo;s inspect, Java\u0026rsquo;s Reflection, and C#\u0026rsquo;s System.Reflection, actually encapsulate some operations, making it more convenient for users to avoid direct contact with raw metadata.\nIt is worth noting that the queries in the above cases all occur at runtime. Searching and matching based on strings at runtime is actually a relatively slow process, which is why we often say that calling methods via reflection is slower than calling them normally.\nFor C++, the compiler provides some limited interfaces for us to access (reflect) some information at compile time. For example, decltype can be used to get the type of a variable, and further determine whether two variable types are equal, whether they are a subclass of a certain type, etc., but the functionality is very limited.\nHowever, you can generate metadata yourself using the method from the previous subsection, mark them all as constexpr, and then query them at compile time. In fact, C++26\u0026rsquo;s static reflection follows this idea: the compiler generates metadata and exposes some interfaces for users to query. For related content, please see Analysis of C++26 Static Reflection Proposal. The timing of the query is the distinction between so-called dynamic reflection and static reflection.\nOf course, what can be done at compile time is certainly not as much as at runtime. 
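The runtime "query metadata" step described above can be made concrete with Python's built-in inspect module. A minimal sketch, with an invented Point class: field and method names survive as strings at runtime, and a reflective call first looks a method up by its string name and then invokes it, which is exactly the lookup cost that makes reflective calls slower than direct ones.

```python
import inspect

# An illustrative class; these names are what a C/C++ compiler would normally discard.
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def norm(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

# Query metadata: method names are searchable as strings.
method_names = [name for name, _ in inspect.getmembers(Point, inspect.isfunction)]
print(method_names)  # ['__init__', 'norm']

p = Point(3.0, 4.0)
print(vars(p))  # field names survive: {'x': 3.0, 'y': 4.0}

# Reflective call: string lookup first, then invocation.
norm = getattr(p, "norm")
print(norm())  # 5.0
```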
For example, if you want to create a class instance based on a runtime type name, it\u0026rsquo;s impossible to do so at compile time. But you can build dynamic reflection based on this static metadata. For related content, please see Implementing Object in C++.\nOperate Metadata Then, further operations are performed based on the metadata, such as code generation. In C++, this can be understood as compile-time code generation, while in Java and C#, it can be considered runtime code generation. See Various ways to generate code for details.\nConclusion Finally, let\u0026rsquo;s use the three steps above to break down reflection in different languages:\nPython, JavaScript, Java, C#: Metadata is generated by the compiler/interpreter, standard libraries provide interfaces, users can query metadata at runtime, and thanks to the Virtual Machine (VM), code can be conveniently generated. Go: Metadata is generated by the compiler, standard libraries provide interfaces, users can query metadata at runtime. However, since Go is primarily AOT (Ahead-of-Time) compiled, runtime code generation is not convenient. Zig, C++26 Static Reflection: Metadata is generated by the compiler, standard libraries provide interfaces, users can query metadata at compile time. Similarly, since it\u0026rsquo;s AOT compiled, runtime code generation is not convenient, but code generation can be performed at compile time. QT and UE, on the other hand, generate their own metadata through code generation, encapsulate interfaces, and allow users to query metadata at runtime. The underlying principle is similar to Go\u0026rsquo;s reflection.\nI hope this series of tutorials is helpful to you! If there are any errors, please feel free to discuss them in the comments section. Thank you for reading.\n","permalink":"https://www.ykiko.me/en/articles/669358870/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. 
Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"what-is-reflection\"\u003eWhat is Reflection?\u003c/h2\u003e\n\u003cp\u003eThe term Reflection is probably not new to anyone; perhaps you haven\u0026rsquo;t used it, but you\u0026rsquo;ve certainly heard of it. However, like many other \u003cstrong\u003eidiomatic terms\u003c/strong\u003e in the CS field, there isn\u0026rsquo;t a clear and precise definition for reflection. This leads to a situation where, for languages like C#, Java, and Python that have reflection, discussing it naturally brings to mind related facilities, APIs, and code examples in those languages, making it very concrete. But for languages like C, C++, and Rust, which don\u0026rsquo;t have reflection, when reflection is discussed, people are often unsure what the other person is referring to, making it very abstract. For example, someone might tell me that Rust has reflection, and the example they provide is the introduction to \u003ca href=\"https://doc.rust-lang.org/stable/std/any/index.html\"\u003estd::Any module\u003c/a\u003e in Rust\u0026rsquo;s official documentation. It mentions:\u003c/p\u003e","title":"A Reflection Tutorial for C++ Programmers"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nRecently, I\u0026rsquo;ve been planning to write a series of articles discussing the concept of reflection in detail. Coincidentally, C++26 has a new reflection proposal, and I noticed there aren\u0026rsquo;t many related articles on Zhihu, despite this topic being frequently discussed. So, I\u0026rsquo;m taking this opportunity to talk about static reflection in C++, as a warm-up for the series.\nThis article is outdated. Static reflection has officially entered C++26. Please refer to Reflection for C++26!!!\nWhat is Static Reflection? First, what exactly is reflection? 
This term, like many other idioms in computer science, doesn\u0026rsquo;t have a detailed and precise definition. I won\u0026rsquo;t discuss this question in this article; I\u0026rsquo;ll explain it in detail in subsequent articles. The focus of this article is C++\u0026rsquo;s static reflection. Why emphasize \u0026ldquo;static\u0026rdquo;? Mainly because when we usually talk about reflection, we almost always refer to reflection in languages like Java, C#, and Python, and their implementations invariably involve type erasure and querying information at runtime. This approach, of course, has unavoidable runtime overhead, which clearly violates C++\u0026rsquo;s zero-cost abstraction principle. To distinguish it from their reflection, the qualifier \u0026ldquo;static\u0026rdquo; is added, also indicating that C++\u0026rsquo;s reflection is completed at compile time. Of course, this statement still lacks some rigor. A detailed discussion will be provided in subsequent articles; you just need to know that C++\u0026rsquo;s static reflection is different from Java, C#, and Python\u0026rsquo;s reflection, and it is primarily completed at compile time.\nWhat can static reflection do? Type as Value As we all know, with the continuous updates of C++ versions, the capabilities of compile-time computation have been constantly enhanced. Through constexpr/consteval functions, we can largely reuse runtime code directly, making compile-time computation convenient. This has completely replaced the method of using template metaprogramming for compile-time computation from a long time ago. 
It\u0026rsquo;s not only easier to write but also compiles faster.\nObserve the following snippets of code for compile-time factorial calculation:\nIn C++03/98, we could only achieve this through recursive template instantiation, and the code could not be reused at runtime.\ntemplate\u0026lt;int N\u0026gt; struct factorial { enum { value = N * factorial\u0026lt;N - 1\u0026gt;::value }; }; template\u0026lt;\u0026gt; struct factorial\u0026lt;0\u0026gt; { enum { value = 1 }; }; C++11 first introduced the concept of constexpr functions, allowing us to write code that can be reused at both compile time and runtime. However, there were many restrictions; without variables and loops, we could only write code in a purely functional style.\nconstexpr int factorial(int n) { return n == 0 ? 1 : n * factorial(n - 1); } int main() { constexpr std::size_t a = factorial(5); // Compile-time calculation std::size_t\u0026amp; n = *new std::size_t(6); std::size_t b = factorial(n); // Runtime calculation std::cout \u0026lt;\u0026lt; a \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; b \u0026lt;\u0026lt; std::endl; } With the arrival of C++14/17, the restrictions in constexpr functions were further relaxed. Now, local variables and loops can be used in constexpr functions, as shown below:\nconstexpr std::size_t factorial(std::size_t N) { std::size_t result = 1; for (std::size_t i = 1; i \u0026lt;= N; ++i) { result *= i; } return result; } After C++20, we can also use new/delete at compile time, allowing us to use vector in compile-time code. Many runtime codes can be directly reused at compile time without any changes, simply by adding a constexpr marker before the function. There\u0026rsquo;s no longer a need to use obscure template metaprogramming for compile-time calculations. However, the examples above only apply to values. 
In C++, besides values, there are also types and higher-kind types.\ntemplate\u0026lt;typename ...Ts\u0026gt; struct type_list; template\u0026lt;typename T, typename U, typename ...Ts\u0026gt; struct find_first_of { constexpr static auto value = find_first_of\u0026lt;T, Ts...\u0026gt;::value + 1; }; template\u0026lt;typename T, typename ...Ts\u0026gt; struct find_first_of\u0026lt;T, T, Ts...\u0026gt; { constexpr static std::size_t value = 0; }; static_assert(find_first_of\u0026lt;int, double, char, int, char\u0026gt;::value == 2); Since types and higher-kind types can only be template arguments, they still have to be processed through recursive template matching. It would be great if we could manipulate them like values, so constexpr functions could also handle them. But C++ is not a language like Zig, where type is value. What to do? No problem, we can just map types to values, right? To achieve the effect of type as value. Before static reflection was added, we could achieve this effect through some tricks. We could map types to type names at compile time, and then just compute on the type names. For how to perform this mapping, you can refer to How to elegantly convert enum to string in C++.\ntemplate\u0026lt;typename ...Ts\u0026gt; struct type_list{}; template\u0026lt;typename T, typename ...Ts\u0026gt; constexpr std::size_t find(type_list\u0026lt;Ts...\u0026gt;) { // type_name is used to get the compile-time type name std::array arr{ type_name\u0026lt;Ts\u0026gt;()... }; for(auto i = 0; i \u0026lt; arr.size(); i++) { if(arr[i] == type_name\u0026lt;T\u0026gt;()) { return i; } } } This code is very intuitive, but it\u0026rsquo;s more difficult if we want to map values back to types. 
However, it doesn\u0026rsquo;t matter, in the upcoming static reflection, this bidirectional mapping between types and values has become a language feature, and we no longer need to handle it manually.\nUse the ^ operator to map a type to a value.\nconstexpr std::meta::info value = ^int; Use [: ... :] to map it back. Note that this is a symbol-level mapping.\nusing Int = typename[:value:]; // In this context, typename can be omitted typename[:value:] a = 3; // Equivalent to int a = 3; Now we can write code like this:\ntemplate\u0026lt;typename ...Ts\u0026gt; struct type_list { constexpr static std::array types = {^Ts...}; template\u0026lt;std::size_t N\u0026gt; using at = typename[:types[N]:]; }; using Second = typename type_list\u0026lt;int, double, char\u0026gt;::at\u0026lt;1\u0026gt;; static_assert(std::is_same_v\u0026lt;Second, double\u0026gt;); No more recursive matching; we can compute types like values. Once you understand this mapping relationship, writing code becomes very simple. Template metaprogramming for type computation can now retire!\nIn fact, ^ can not only map types, but also has the following main functions:\n^:: —— Represents the global namespace ^namespace-name —— Namespace name ^type-id —— Type ^cast-expression —— Special expressions, currently including: Primary expression representing a function or member function Primary expression representing a variable, static member variable, or structured binding Primary expression representing a non-static member Primary expression representing a template Constant expression Similarly, [: ... :] can restore to the corresponding entities. 
Note that it restores to the corresponding symbols, so this operator is called a Splicer.\n[: r :] —— Restores to the corresponding entity or expression typename[: r :] —— Restores to the corresponding type template[: r :] —— Restores to template arguments namespace[: r :] —— Restores to a namespace [:r:]:: —— Restores to the corresponding namespace, class, or enum nested specifier See the following example:\nint x = 0; void g() { [:^x:] = 42; // Okay. Same as: x = 42; } If the restored entity is different from what was originally stored, it will result in a compilation error.\ntypename[: ^:: :] x = 0; // Error metainfo Just the feature above is enough to be exciting. However, there\u0026rsquo;s much more; the ability to obtain metadata for entities like class is also available.\nMost basically, getting the type name (variable name, field name can all use this function):\nnamespace std::meta { consteval auto name_of(info r) -\u0026gt; string_view; consteval auto display_name_of(info r) -\u0026gt; string_view; } For example:\ndisplay_name_of(^std::vector\u0026lt;int\u0026gt;) // =\u0026gt; std::vector\u0026lt;int\u0026gt; name_of(^std::vector\u0026lt;int\u0026gt;) // =\u0026gt; std::vector\u0026lt;int, std::allocator\u0026lt;int\u0026gt;\u0026gt; Determine if a template is a specialization of another higher-order template and extract the parameters from the higher-order template:\nnamespace std::meta { consteval auto template_of(info r) -\u0026gt; info; consteval auto template_arguments_of(info r) -\u0026gt; vector\u0026lt;info\u0026gt;; } std::vector\u0026lt;int\u0026gt; v = {1, 2, 3}; static_assert(template_of(type_of(^v)) == ^std::vector); static_assert(template_arguments_of(type_of(^v))[0] == ^int); Fill template parameters into a higher-order template:\nnamespace std::meta { consteval auto substitute(info templ, span\u0026lt;info const\u0026gt; args) -\u0026gt; info; } constexpr auto r = substitute(^std::vector, std::vector{^int}); using T = [:r:]; // Ok, T is 
std::vector\u0026lt;int\u0026gt; Get member information for struct, class, union, enum:\nnamespace std::meta{ template\u0026lt;typename ...Fs\u0026gt; consteval auto members_of(info class_type, Fs ...filters) -\u0026gt; vector\u0026lt;info\u0026gt;; template\u0026lt;typename ...Fs\u0026gt; consteval auto nonstatic_data_members_of(info class_type, Fs ...filters) -\u0026gt; vector\u0026lt;info\u0026gt; { return members_of(class_type, is_nonstatic_data_member, filters...); } template\u0026lt;typename ...Fs\u0026gt; consteval auto bases_of(info class_type, Fs ...filters) -\u0026gt; vector\u0026lt;info\u0026gt; { return members_of(class_type, is_base, filters...); } template\u0026lt;typename ...Fs\u0026gt; consteval auto enumerators_of(info class_type, Fs ...filters) -\u0026gt; vector\u0026lt;info\u0026gt;; template\u0026lt;typename ...Fs\u0026gt; consteval auto subobjects_of(info class_type, Fs ...filters) -\u0026gt; vector\u0026lt;info\u0026gt;; } With this, we can implement features like iterating through structs and enums. Further, we can implement advanced features like serialization and deserialization. Some examples will be given later. In addition, there are other compile-time functions for various features; only a part of the content is shown above. More APIs can be found in the proposal. Since functions are provided to directly get parameters from higher-order templates, there is no longer a need to use templates for type extraction! Template metaprogramming for type extraction can also retire.\nBetter compile facilities The main part of reflection has been introduced; now let\u0026rsquo;s talk about other things. Although these are contents of other proposals, they can make code easier to write and give it stronger expressive power.\ntemplate for How to generate a large number of code snippets in C++ is a very difficult problem to solve. 
Thanks to C++\u0026rsquo;s unique (and amazing) mechanism, current code snippet generation is almost entirely based on lambda expressions + variadic pack expansion. Look at the example below:\nconstexpr auto dynamic_tuple_get(std::size_t N, auto\u0026amp; tuple) { constexpr auto size = std::tuple_size_v\u0026lt;std::decay_t\u0026lt;decltype(tuple)\u0026gt;\u0026gt;; [\u0026amp;]\u0026lt;std::size_t ...Is\u0026gt;(std::index_sequence\u0026lt;Is...\u0026gt;) { auto f = [\u0026amp;]\u0026lt;std::size_t Index\u0026gt; { if(Index == N) { std::cout \u0026lt;\u0026lt; std::get\u0026lt;Index\u0026gt;(tuple) \u0026lt;\u0026lt; std::endl; } }; (f.template operator()\u0026lt;Is\u0026gt;(), ...); }(std::make_index_sequence\u0026lt;size\u0026gt;{}); } int main() { std::tuple tuple = {1, \u0026#34;Hello\u0026#34;, 3.14, 42}; auto n1 = 0; dynamic_tuple_get(n1, tuple); // 1 auto n2 = 3; dynamic_tuple_get(n2, tuple); // 42 } A classic example, the principle is to distribute runtime variables to compile-time constants through multiple branch judgments. This achieves accessing elements in a tuple based on a runtime index. Note: A more efficient way here would be to generate an array of function pointers at compile time and then jump directly based on the index, but this is just for demonstration, don\u0026rsquo;t dwell on it too much.\nThe expanded code above is equivalent to:\nconstexpr auto dynamic_tuple_get(std::size_t N, auto\u0026amp; tuple) { if(N == 0) { std::cout \u0026lt;\u0026lt; std::get\u0026lt;0\u0026gt;(tuple) \u0026lt;\u0026lt; std::endl; } // ... if(N == 3) { std::cout \u0026lt;\u0026lt; std::get\u0026lt;3\u0026gt;(tuple) \u0026lt;\u0026lt; std::endl; } } It can be seen that we used an extremely awkward way to achieve an extremely simple effect. Moreover, since a lambda is essentially a function, it cannot directly return to the parent function from within the lambda. 
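The more efficient function-pointer-table approach mentioned in the note can be sketched like this (a hypothetical variant that writes to a stream parameter instead of `std::cout`, so the result is observable; the names are illustrative):

```cpp
#include <array>
#include <cstddef>
#include <ostream>
#include <sstream>   // for the usage example
#include <tuple>
#include <utility>

// Build an array of function pointers at compile time, one entry per tuple
// element, then jump directly by index — no chain of if comparisons.
template <typename Tuple>
void dynamic_tuple_get(std::size_t n, Tuple& tuple, std::ostream& os) {
    using Fn = void (*)(Tuple&, std::ostream&);
    constexpr auto size = std::tuple_size_v<Tuple>;
    // The immediately-invoked templated lambda expands one printer per index;
    // unary + converts each capture-less lambda to a plain function pointer.
    static constexpr auto table = []<std::size_t... Is>(std::index_sequence<Is...>) {
        return std::array<Fn, size>{
            +[](Tuple& t, std::ostream& s) { s << std::get<Is>(t); }...
        };
    }(std::make_index_sequence<size>{});
    table[n](tuple, os);
}
```

With `std::tuple tuple{1, std::string("Hello"), 3.14, 42}`, calling `dynamic_tuple_get(3, tuple, std::cout)` dispatches straight through the fourth table entry, in constant time regardless of the tuple's size.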
This leads to us doing a lot of redundant if checks.\nWith template for, the code looks much cleaner:\nconstexpr void dynamic_tuple_get(std::size_t N, auto\u0026amp; tuple) { constexpr auto size = std::tuple_size_v\u0026lt;std::decay_t\u0026lt;decltype(tuple)\u0026gt;\u0026gt;; template for(constexpr auto num : std::views::iota(0, size)) { if(num == N) { std::cout \u0026lt;\u0026lt; std::get\u0026lt;num\u0026gt;(tuple) \u0026lt;\u0026lt; std::endl; return; } } } template for can be considered an enhanced syntactic sugar for lambda expansion, and it\u0026rsquo;s very useful. If this is added, using template metaprogramming to generate functions (code) can retire.\nnon-transient constexpr allocation This proposal mainly discusses two issues together.\nC++ can reserve space in the data segment at compile time by controlling template instantiation of static members, which can be seen as compile-time memory allocation. template\u0026lt;auto... items\u0026gt; struct make_array { using type = std::common_type_t\u0026lt;decltype(items)...\u0026gt;; static inline type value[sizeof ...(items)] = {items...}; }; template\u0026lt;auto... items\u0026gt; constexpr auto make_array_v = make_array\u0026lt;items...\u0026gt;::value; int main() { constexpr auto arr = make_array_v\u0026lt;1, 2, 3, 4, 5\u0026gt;; std::cout \u0026lt;\u0026lt; arr[0] \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; arr[1] \u0026lt;\u0026lt; std::endl; // Successfully reserves space in the data segment, storing 1 2 3 4 5 } C++20 allows new in constexpr, but memory newed at compile time must be deleted at compile time. constexpr auto size(auto... Is) { std::vector\u0026lt;int\u0026gt; v = {Is...}; return v.size(); } So, can\u0026rsquo;t we new at compile time and not delete it? And store the actual data in the data segment? This is the problem this proposal aims to solve. 
It hopes we can use:\nconstexpr std::vector\u0026lt;int\u0026gt; v = {1, 2, 3, 4, 5}; // Global The main difficulty is that memory allocated in the data segment does not have ownership like memory on the heap, and does not require delete. As long as this problem is solved, we can use compile-time std::map and std::vector and retain them at runtime. The author\u0026rsquo;s approach is to use tagging. The specific details will not be discussed here. If this is added, using template metaprogramming to create constant tables can also retire.\nSome examples Alright, after all that, let\u0026rsquo;s see what we can do with reflection.\nprint any type template\u0026lt;typename T\u0026gt; constexpr auto print(const T\u0026amp; t) { template for(constexpr auto member : nonstatic_data_members_of(type_of(^t))) { if constexpr (is_class(type_of(member))) { // If it\u0026#39;s a class, recursively iterate through members println(\u0026#34;{}= \u0026#34;, name_of(member)); print(t.[:member:]); } else { // Non-class types can be printed directly std::println(\u0026#34;{}= {}\u0026#34;, name_of(member), t.[:member:]); } } } enum to string template \u0026lt;typename E\u0026gt; requires std::is_enum_v\u0026lt;E\u0026gt; constexpr std::string enum_to_string(E value) { template for (constexpr auto e : std::meta::members_of(^E)) { if (value == [:e:]) { return std::string(std::meta::name_of(e)); } } return \u0026#34;\u0026lt;unnamed\u0026gt;\u0026#34;; } conclusion I\u0026rsquo;ve spent a long time introducing C++\u0026rsquo;s static reflection. In fact, I\u0026rsquo;m very fond of C++\u0026rsquo;s compile-time computation, and I\u0026rsquo;m also very interested in its history. C++\u0026rsquo;s compile-time computation has been explored step by step, with many wise masters proposing their unique ideas, making the impossible a reality. 
From the abnormal template metaprogramming of C++03, to constexpr variables in C++11, to the gradual relaxation of restrictions in constexpr functions from C++14 to C++23, moving more and more operations to compile time. And now, with static reflection, C++ is gradually breaking free from the clutches of template metaprogramming. All those old-fashioned template metaprogramming styles can be eliminated! If you haven\u0026rsquo;t written old-style template metaprogramming code before, you probably can\u0026rsquo;t appreciate how terrible it was.\nTo get static reflection into the standard sooner, the author team specifically selected a core subset of the original proposal. I hope, as the author wishes, that static reflection can enter the standard in C++26! Of course, the core part will enter first, and then more useful features will be added, so this is by no means the entirety of reflection.\nExperimental compiler:\nTry online: https://godbolt.org/z/13anqE1Pa Build locally: clang-p2996 Reflection series articles: Reflection Tutorial for C++ Programmers\n","permalink":"https://www.ykiko.me/en/articles/661692275/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eRecently, I\u0026rsquo;ve been planning to write a series of articles discussing the concept of reflection in detail. Coincidentally, C++26 has a new reflection proposal, and I noticed there aren\u0026rsquo;t many related articles on Zhihu, despite this topic being frequently discussed. So, I\u0026rsquo;m taking this opportunity to talk about static reflection in C++, as a warm-up for the series.\u003c/p\u003e","title":"C++26 Static Reflection Proposal Analysis"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. 
Minor inaccuracies may remain.\nIntroduction In C++, an expression like \u0026amp;T::name returns a pointer to member. It\u0026rsquo;s occasionally used when writing code, but this concept might not be familiar to many. Consider the following code:\nstruct Point { int x; int y; }; int main() { Point point; *(int*)((char*)\u0026amp;point + offsetof(Point, x)) = 20; *(int*)((char*)\u0026amp;point + offsetof(Point, y)) = 20; } In C, we often access struct members by calculating their offsets in this manner. If encapsulated into a function, it could even be used to dynamically access struct members based on passed parameters. However, the code above results in undefined behavior in C++. For specific reasons, you can refer to this discussion on Stack Overflow. But if we indeed have such a requirement, how can we implement it legally? C++ provides an abstraction for us: pointers to members, which allows such operations legally.\nUsage pointer to data member A pointer to a non-static member m of class C can be initialized with \u0026amp;C::m. When \u0026amp;C::m is used inside a member function of C, it can be ambiguous. That is, it could refer to taking the address of member m (\u0026amp;this-\u0026gt;m), or it could refer to a pointer to member. To resolve this, the standard specifies that \u0026amp;C::m denotes a pointer to member, while \u0026amp;(C::m) or \u0026amp;m denotes taking the address of the member m. The corresponding member can be accessed using the operators .* and -\u0026gt;*. 
The example code is as follows:\nstruct C { int m; void foo() { int C::*x1 = \u0026amp;C::m; // pointer to member m of C int* x2 = \u0026amp;(C::m); // pointer to member this-\u0026gt;m } }; int main() { int C::*p = \u0026amp;C::m; // type of a member pointer is: T U::* // T is the type of the member, U is the class type // here, T is int, U is C C c = {7}; std::cout \u0026lt;\u0026lt; c.*p \u0026lt;\u0026lt; \u0026#39;\\n\u0026#39;; // same as c.m, print 7 C* cp = \u0026amp;c; cp-\u0026gt;m = 10; std::cout \u0026lt;\u0026lt; cp-\u0026gt;*p \u0026lt;\u0026lt; \u0026#39;\\n\u0026#39;; // same as cp-\u0026gt;m, print 10 } A pointer to a data member of a base class can be implicitly converted to a pointer to a data member of a non-virtually inherited derived class. struct Base { int m; }; struct Derived1 : Base {}; // non-virtual inheritance struct Derived2 : virtual Base {}; // virtual inheritance int main() { int Base::*bp = \u0026amp;Base::m; int Derived1::*dp = bp; // ok, implicit cast int Derived2::*dp2 = bp; // error Derived1 d; d.m = 1; std::cout \u0026lt;\u0026lt; d.*dp \u0026lt;\u0026lt; \u0026#39; \u0026#39; \u0026lt;\u0026lt; d.*bp \u0026lt;\u0026lt; \u0026#39;\\n\u0026#39;; // ok, prints 1 1 } Dynamically access struct fields based on the passed pointer. struct Point { int x; int y; }; auto\u0026amp; access(Point\u0026amp; point, auto pm) { return point.*pm; } int main() { Point point; access(point, \u0026amp;Point::x) = 10; access(point, \u0026amp;Point::y) = 20; std::cout \u0026lt;\u0026lt; point.x \u0026lt;\u0026lt; \u0026#39; \u0026#39; \u0026lt;\u0026lt; point.y \u0026lt;\u0026lt; \u0026#39;\\n\u0026#39;; // 10 20 }} pointer to member function A pointer to a non-static member function f can be initialized with \u0026amp;C::f. Since the address of a non-static member function cannot be taken, \u0026amp;(C::f) and \u0026amp;f mean nothing. Similarly, the corresponding member function can be accessed using the operators .* and -\u0026gt;*. 
If the member function is an overloaded function, to get the corresponding member function pointer, please refer to How to get the address of an overloaded function. The example code is as follows:\nstruct C { void foo(int x) { std::cout \u0026lt;\u0026lt; x \u0026lt;\u0026lt; std::endl; } }; int main() { using F = void(int); // function type using MP = F C::*; // pointer to member function using T = void (C::*)(int); // pointer to member function static_assert(std::is_same_v\u0026lt;MP, T\u0026gt;); auto mp = \u0026amp;C::foo; T mp2 = \u0026amp;C::foo; static_assert(std::is_same_v\u0026lt;decltype(mp), T\u0026gt;); C c; (c.*mp)(1); // call foo, print 1 C* cp = \u0026amp;c; (cp-\u0026gt;*mp)(2); // call foo, print 2 } A pointer to a member function of a base class can be implicitly converted to a pointer to a member function of a non-virtually inherited derived class. struct Base { void f(int) {} }; struct Derived1 : Base {}; // non-virtual inheritance struct Derived2 : virtual Base {}; // virtual inheritance int main() { void (Base::*bp)(int) = \u0026amp;Base::f; void (Derived1::*dp)(int) = bp; // ok, implicit cast void (Derived2::*dp2)(int) = bp; // error Derived1 d; (d.*dp)(1); // ok } Dynamically call member functions based on passed parameters. struct C { void f(int x) { std::cout \u0026lt;\u0026lt; x \u0026lt;\u0026lt; std::endl;} void g(int x) { std::cout \u0026lt;\u0026lt; x + 1 \u0026lt;\u0026lt; std::endl;} }; auto access(C\u0026amp; c, auto pm, auto... args){ return (c.*pm)(args...); } int main(){ C c; access(c, \u0026amp;C::f, 1); // 1 access(c, \u0026amp;C::g, 1); // 2 } Implementation First, it must be clear that the C++ standard does not specify how member pointers are implemented. In this regard, it\u0026rsquo;s similar to virtual functions; the standard does not specify how virtual functions are implemented, only their behavior. Therefore, the implementation of member pointers is entirely implementation defined. 
Originally, it would be sufficient to understand how to use them without caring about the underlying implementation. However, there are too many incorrect articles on this topic online that have severely misled people, so clarification is necessary.\nFor the three major compilers, GCC follows the Itanium C++ ABI, MSVC follows the MSVC C++ ABI, and Clang can be configured for either ABI through different compilation options. For a detailed discussion of ABIs, please refer to Thoroughly Understanding C++ ABI and How to make dynamic libraries generated by MSVC and GCC interchangeable; we won\u0026rsquo;t go into too much detail here.\nItanium ABI has public documentation, and the following descriptions mainly refer to this document. MSVC ABI has no public documentation, and the following descriptions mainly refer to the blog post MSVC C++ ABI Member Function Pointers. Please note: This article is time-sensitive; future implementations may change. It is for reference only, and official documentation should be considered authoritative.\nFirst, let\u0026rsquo;s try to print the value of a member pointer:\nstruct C { int m; void foo(int x) { std::cout \u0026lt;\u0026lt; x \u0026lt;\u0026lt; std::endl;} }; int main(){ int C::* p = \u0026amp;C::m; void (C::* p2)(int) = \u0026amp;C::foo; std::cout \u0026lt;\u0026lt; p \u0026lt;\u0026lt; std::endl; // 1 std::cout \u0026lt;\u0026lt; p2 \u0026lt;\u0026lt; std::endl; // 1 } The output is 1 for both. If you hover over \u0026lt;\u0026lt;, you\u0026rsquo;ll find that an implicit type conversion to bool occurred. \u0026lt;\u0026lt; is not overloaded for member pointer types. We can only inspect their binary representation through other means.\nItanium C++ ABI pointer to data member Generally, a data member pointer can be represented by the following struct, indicating the offset relative to the object\u0026rsquo;s base address. If it\u0026rsquo;s nullptr, it stores -1. 
In this case, the size of the member pointer is sizeof(ptrdiff_t).\nstruct data_member_pointer{ ptrdiff_t offset; }; As mentioned earlier, the C++ standard does not allow member pointer conversion along virtual inheritance chains. Therefore, the offset required for conversion can be calculated at compile time based on the inheritance relationship, without needing to look up the vtable at runtime.\nstruct A { int a; }; struct B { int b; }; struct C : A, B {}; void log(auto mp) { std::cout \u0026lt;\u0026lt; \u0026#34;offset is \u0026#34; \u0026lt;\u0026lt; *reinterpret_cast\u0026lt;ptrdiff_t*\u0026gt;(\u0026amp;mp) // or use std::bit_cast after C++20 // std::bit_cast\u0026lt;std::ptrdiff_t\u0026gt;(mp) \u0026lt;\u0026lt; std::endl; } int main() { auto a = \u0026amp;A::a; log(a); // offset is 0 auto b = \u0026amp;B::b; log(b); // offset is 0 int C::*c = a; log(c); // offset is 0 // implicit cast int C::*c2 = b; log(c2); // offset is 4 } pointer to member function On mainstream platforms, a member function pointer can generally be represented by the following struct:\nstruct member_function_pointer { std::ptrdiff_t ptr; // function address or vtable offset // if low bit is 0, it\u0026#39;s a function address, otherwise it\u0026#39;s a vtable offset ptrdiff_t offset; // offset to the base(unless multiple inheritance, it\u0026#39;s always 0) }; This implementation relies on some assumptions made by most platforms:\nConsidering address alignment, the lowest bit of a non-static member function\u0026rsquo;s address is almost always 0. A null function pointer is 0, so a null function pointer can be distinguished from a vtable offset. The architecture is byte-addressable, and pointer size is even, so the vtable offset is even. As long as the vtable address, vtable offset, and function type are known, a function call can be made; the specific implementation details are determined by the compiler according to the ABI. 
Of course, some platforms do not satisfy the above assumptions, such as certain cases on ARM32 platforms, where the implementation method differs from what we just described. So now you should better understand what \u0026ldquo;implementation-defined behavior\u0026rdquo; means: even with the same compiler, the implementation might differ across target platforms.\nIn my environment, x64 Windows, it conforms to the requirements of mainstream implementations. So, based on this ABI, a \u0026ldquo;de-sugaring\u0026rdquo; was performed.\nstruct member_func_pointer { std::size_t ptr; ptrdiff_t offset; }; template \u0026lt;typename Derived, typename Ret, typename Base, typename... Args\u0026gt; Ret invoke(Derived\u0026amp; object, Ret (Base::*ptr)(Args...), Args... args) { Ret (Derived::*dptr)(Args...) = ptr; member_func_pointer mfp = *(member_func_pointer*)(\u0026amp;dptr); using func = Ret (*)(void*, Args...); void* self = (char*)\u0026amp;object + mfp.offset; func fp = nullptr; bool is_virtual = mfp.ptr \u0026amp; 1; if(is_virtual) { auto vptr = (char*)(*(void***)self); auto voffset = mfp.ptr - 1; auto address = *(void**)(vptr + voffset); fp = (func)address; } else { fp = (func)mfp.ptr; } return fp(self, args...); } struct A { int a; A(int a) : a(a) {} virtual void foo(int b) { std::cout \u0026lt;\u0026lt; \u0026#34;A::foo \u0026#34; \u0026lt;\u0026lt; a \u0026lt;\u0026lt; b \u0026lt;\u0026lt; std::endl; } void bar(int b) { std::cout \u0026lt;\u0026lt; \u0026#34;A::bar \u0026#34; \u0026lt;\u0026lt; a \u0026lt;\u0026lt; b \u0026lt;\u0026lt; std::endl; } }; int main() { A a = {4}; invoke(a, \u0026amp;A::foo, 3); // A::foo 43 invoke(a, \u0026amp;A::bar, 3); // A::bar 43 } MSVC C++ ABI MSVC\u0026rsquo;s implementation for this is very complex and also extends the C++ standard. 
If you want a detailed and comprehensive understanding, it is still recommended to read the blog post mentioned above.\nThe C++ standard does not allow conversion of virtual base class member pointers to derived class member pointers, but MSVC does.\nstruct Base { int m; }; struct Derived1 : Base {}; // non-virtual inheritance struct Derived2 : virtual Base {}; // virtual inheritance int main() { int Base::*bp = \u0026amp;Base::m; int Derived1::*dp = bp; // ok, implicit cast int Derived2::*dp2 = bp; // ok in MSVC， error in GCC } To avoid wasting space, even within the same program, MSVC\u0026rsquo;s member pointer size can vary (whereas in Itanium, due to uniform implementation, they are always the same size). MSVC handles different situations differently.\nAlso note that MSVC\u0026rsquo;s implementation of virtual inheritance differs from Itanium\u0026rsquo;s. You can refer to the relevant introduction in the article C++ Virtual Function and Virtual Inheritance Memory Model.\npointer to data member For non-virtual inheritance, the implementation is similar to GCC\u0026rsquo;s, except for some size differences. In 64-bit programs, GCC uses 8 bytes, while MSVC uses 4 bytes. Both use -1 to represent nullptr.\nstruct data_member_pointer { int offset; }; For virtual inheritance (a standard extension), an additional voffset needs to be stored. 
This is used at runtime to find the offset of the corresponding virtual base class member from the vtable.\nstruct Base { int m; }; struct Base2 { int n; }; struct Base3 { int n; }; struct Derived : virtual Base, Base2, Base3 {}; struct dmp { int offset; int voffset; }; template \u0026lt;typename T\u0026gt; void log(T mp) { dmp d = *reinterpret_cast\u0026lt;dmp*\u0026gt;(\u0026amp;mp); std::cout \u0026lt;\u0026lt; \u0026#34;offset is \u0026#34; \u0026lt;\u0026lt; d.offset \u0026lt;\u0026lt; \u0026#34;, voffset is \u0026#34; \u0026lt;\u0026lt; d.voffset \u0026lt;\u0026lt; std::endl; } int main() { int Derived::*dp = \u0026amp;Base::m; log(dp); // offset is 0, voffset is 4 dp = \u0026amp;Base3::n; log(dp); // offset is 4, voffset is 0 } pointer to member function Member function pointers are even more complex, with four cases:\nNon-virtual inheritance, non-multiple inheritance struct member_function_ptr{ void* address; }; Non-virtual inheritance, multiple inheritance struct member_function_ptr{ void* address; int offset; }; Virtual inheritance, multiple inheritance struct member_function_ptr{ void* address; int offset; int vindex; }; Unknown inheritance struct member_function_ptr{ void* address; int offset; int vadjust; // use to find vptr int vindex; }; Also note: In 32-bit programs, the calling convention for member functions is different from ordinary functions. So, if you want to convert to a function pointer and call it, you need to specify the calling convention in the function pointer, otherwise the call will fail.\nConclusion When discussing C++ issues, never take things for granted; your test results on a specific platform do not represent all possible implementations. Moreover, MSVC has already told you that even within the same program, your tests might not cover all cases. I was startled when I first discovered that MSVC\u0026rsquo;s member function pointer sizes varied, thinking there was an issue with my code. 
If you wish to write a std::function-like container and want to perform SBO optimization, it\u0026rsquo;s best to set the SBO size to 16 bytes or more to cover most member function pointers.\nIf member functions are needed as callbacks, it is recommended to wrap them with a lambda expression, like this:\nstruct A { int a; void bar(int b) { std::cout \u0026lt;\u0026lt; \u0026#34;A::bar \u0026#34; \u0026lt;\u0026lt; a \u0026lt;\u0026lt; b \u0026lt;\u0026lt; std::endl; } }; int main() { auto f = +[](A\u0026amp; a, int b) { a.bar(b); }; // + is unary plus operator, use to cast a non-capturing lambda to a function pointer // f is function pointer } After C++23, if member functions are defined using explicit this, then \u0026amp;C::f can directly obtain the function pointer for the corresponding member function, without needing an extra wrapper like above.\nstruct A { void bar(this A\u0026amp; self, int b); }; auto p = \u0026amp;A::bar; // p is function pointer, rather than member function pointer ","permalink":"https://www.ykiko.me/en/articles/659510753/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn C++, an expression like \u003ccode\u003e\u0026amp;T::name\u003c/code\u003e returns a pointer to member. It\u0026rsquo;s occasionally used when writing code, but this concept might not be familiar to many. Consider the following code:\u003c/p\u003e","title":"C++ Pointers to Members: A Comprehensive Guide"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nThe concept of templates in C++ has existed for over twenty years. As one of the language\u0026rsquo;s most important constructs, there is no shortage of related discussion. 
Unfortunately, truly in-depth and valuable discussions are rare — especially those that examine the feature from multiple perspectives. Many articles on templates tend to entangle the topic with various syntactic details, easily leaving readers with a hazy impression. Similar things happen elsewhere: introductions to coroutines often mix them with all kinds of I/O concerns, and discussions of reflection seem confined to reflection in Java or C#. This isn\u0026rsquo;t unreasonable, but it often leaves readers unable to grasp the essence. After consuming a lot of content, one still can\u0026rsquo;t get to the heart of the matter, and it becomes easy to conflate different concepts.\nPersonally, I prefer to discuss a topic from multiple levels and angles rather than limiting myself to one particular aspect. This leads to a better understanding of the subject itself and prevents one\u0026rsquo;s perspective from becoming too narrow. This article will therefore attempt to trace the evolution of templates in C++ from their origins, examining the feature through four lenses. Note that this is not a tutorial — it will not dive into syntactic minutiae. The focus is on design philosophy and trade-offs. A basic familiarity with templates is sufficient to follow along. Some rigor may be sacrificed as a result; if there are any errors, please feel free to discuss them in the comments.\nWe will primarily discuss four themes:\nCode Generation Type Constraint Compile-time Computing Type Manipulation The first theme is generally considered to be ordinary Template usage. The latter three are usually grouped under TMP — Template Meta-programming. 
Because the original intent of template design was not to realize these three functions, yet they ended up being implemented through some clever (and often arcane) tricks, with the resulting code being quite obscure and hard to read, they are generally called metaprogramming.\nCode Generation Generic Generic programming, which means writing the same code for different types to achieve code reuse. Before templates were introduced, we could only simulate generics using macros. Consider the simple example below:\n#define add(T) _ADD_IMPL_##T #define ADD_IMPL(T) \\ T add(T)(T a, T b) { \\ return a + b; \\ } ADD_IMPL(int); ADD_IMPL(float); int main() { add(int)(1, 2); add(float)(1.0f, 2.0f); } Its principle is very simple: it replaces the type in a regular function with a macro parameter and uses macro symbol concatenation to generate different names for different type parameters. Then, the IMPL macro is used to generate definitions for specific functions. This process can be called instantiation.\nOf course, this is just the simplest example, and perhaps it looks fine to you. But what if you wanted to implement a vector using macros? Just thinking about it is a bit scary. Specifically, using macros to implement generics has the following disadvantages:\nPoor code readability: Macro concatenation and code logic are coupled, making error messages difficult to read. Difficult to debug: Breakpoints can only be set at the macro expansion location, not inside the macro definition. Requires explicit writing of type parameters: If there are many parameters, it becomes very verbose. Requires manual instantiation of function definitions: In larger codebases, a generic might have dozens of instantiations, and writing them all manually is too cumbersome. 
These problems are all solved by templates:

template <typename T>
T add(T a, T b) {
    return a + b;
}

template int add<>(int, int);  // explicit instantiation

int main() {
    add(1, 2);         // T deduced automatically
    add(1.0f, 2.0f);   // implicit instantiation
    add<float>(1, 2);  // T specified explicitly
}

- Template parameters are placeholders; no character concatenation is needed, and apart from the template parameter declaration the code is indistinguishable from ordinary code.
- Errors and debuggers can point accurately into the template definition, not just the instantiation site.
- Template arguments can be deduced automatically, eliminating the need to spell out type parameters, while explicit specification is still supported.
- Implicit instantiation is supported: the compiler automatically instantiates the functions that are actually used. Explicit (manual) instantiation is also available.

In addition, there is a whole series of features, such as partial specialization, full specialization, variadic templates, and variable templates, none of which can be achieved with macros alone. It is precisely the advent of templates that made generic libraries like the STL possible.

Table Gen

The generics above can be seen as the most direct use of templates. On top of them we can build more advanced code generation, for example generating a fixed table at compile time for runtime lookup. The implementation of std::visit in the standard library uses this technique; below is a simple simulation of it:

template <typename T, typename Variant, typename Callback>
void wrapper(Variant& variant, Callback& callback) {
    callback(std::get<T>(variant));
}

template <typename... Ts, typename Callback>
void visit(std::variant<Ts...>& variant, Callback&& callback) {
    using Variant = std::variant<Ts...>;
    constexpr static std::array table = {&wrapper<Ts, Variant, Callback>...};
    table[variant.index()](variant, callback);
}

int main() {
    auto callback = [](auto& value) { std::cout << value << std::endl; };
    std::variant<int, float, std::string> variant = 42;
    visit(variant, callback);
    variant = 3.14f;
    visit(variant, callback);
    variant = "Hello, World!";
    visit(variant, callback);
    return 0;
}

Although the type of the element stored in a variant is only known at runtime, the set of types it may hold is known at compile time. So we instantiate a corresponding wrapper function for each possible type in the set, store their addresses in an array, and at runtime use the variant's index to pick the matching entry and complete the call.

Of course, using the fold expressions introduced in C++17, there is actually a better approach:

template <typename... Ts, typename Callback>
void visit(std::variant<Ts...>& variant, Callback&& callback) {
    auto foreach = []<typename T>(std::variant<Ts...>& variant, Callback& callback) {
        if(auto value = std::get_if<T>(&variant)) {
            callback(*value);
            return true;
        }
        return false;
    };
    (foreach.template operator()<Ts>(variant, callback) || ...);
}

By exploiting the short-circuiting of logical operators, we can exit the evaluation of the remaining fold operands early, and shorter functions are more amenable to inlining.

Type Constraint

"I agree with everything else, but template error messages are clearly not easy to read!"
Compared to macros, isn't this just a case of "fifty paces laughing at a hundred paces"? Perhaps even worse: easily producing hundreds or thousands of lines of errors is something only C++ templates can do.

This is the next problem to discuss: why are C++ compilation error messages so long, and sometimes so difficult to understand?

Function Overload

Consider this simple example of only a few lines:

struct A {};

int main() {
    std::cout << A{} << std::endl;
    return 0;
}

On my GCC compiler, it produced a whopping 239 lines of error messages. The good news is that GCC highlighted the critical part, as shown below:

no match for 'operator<<' (operand types are 'std::ostream' {aka 'std::basic_ostream<char>'} and 'A')
    9 |     std::cout << A{} << std::endl;
      |     ~~~~~~~~~ ^~ ~~~
      |     |            |
      |     |            A
      |     std::ostream {aka std::basic_ostream<char>}

That much is understandable: no matching overloaded function was found, so we need to overload operator<< for A. What we are curious about is what the remaining two hundred lines of errors are doing. The key lies in overload resolution. Let's look at one piece of the output:

note: template argument deduction/substitution failed:
note: cannot convert 'A()' (type 'A') to type 'const char*'
    9 |     std::cout << A{} << std::endl;

This means an attempt was made to match type A against the const char* overload (via implicit conversion), and it failed. Standard library functions like this have many overloads; operator<<, for example, is overloaded for int, bool, long, double, and so on, nearly dozens of functions.
The error message then lists the reasons why each of these overloads failed, easily reaching hundreds of lines. Coupled with the cryptic naming conventions of the standard library, it reads like gibberish.

Instantiation Stack

Function overloading is one reason error messages are hard to read, but not the only one. As shown above, merely enumerating all candidates yields only a few hundred lines, yet errors can still run to thousands of lines; a difference of that magnitude needs its own explanation. Moreover, this section is about type constraints, so what do they have to do with compiler errors? Consider the following example:

struct A {};
struct B {};

template <typename T>
void test(T a, T b) {
    std::cout << a << b << std::endl;
}

int main() {
    test(A{}, B{});  // #1: a few lines of errors
    test(A{}, A{});  // #2: hundreds of lines of errors
}

In the example above, #1 produces only a few lines of error messages, while #2 produces hundreds. Why such a large difference? Remember the two advantages of templates over macros discussed in the first part: automatic type deduction and implicit instantiation. Only when template argument deduction succeeds is instantiation triggered, and only then are errors in the function body checked.

In test(A{}, B{}), template argument deduction fails, because the signature of test implies an important condition: a and b must have the same type. So the compiler simply reports that no matching function was found. For test(A{}, A{}), deduction succeeds and instantiation begins: T has been deduced as A, and an error occurs when substituting A into the function body.
Therefore, the compiler has to list the reasons for the substitution failure inside the function body.

This leads to a problem: with many layers of nested templates, an error may occur in the innermost template function, yet the compiler has to print the entire template instantiation stack.

So what is the use of constraining types? Look at the example below:

struct A {};

template <typename T>
void print1(T x) {
    std::cout << x << std::endl;
}

template <typename T>
// requires requires (T x) { std::cout << x; }
void print2(T x) {
    print1(x);
    std::cout << x << std::endl;
}

int main() {
    print2(A{});
    return 0;
}

Just a few lines, but on my GCC this produced 700 lines of compilation errors. If we make a slight change and uncomment the commented line, the error message shrinks to just a few lines:

In substitution of 'template<class T> requires requires(T x) {std::cout << x;} void print2(T) [with T = A]':
required from here
required by the constraints of 'template<class T> requires requires(T x) {std::cout << x;} void print2(T)'
in requirements with 'T x' [with T = A]
note: the required expression '(std::cout << x)' is invalid
   15 |     requires requires (T x) { std::cout << x; }

This says that an instance x of type A does not satisfy the requires clause std::cout << x. With this syntax we can confine the error to the deduction phase, before instantiation ever happens, which yields far more concise error messages.

In other words, requires lets us stop compilation errors from propagating. Unfortunately, however, constraint-related syntax was only added in C++20.
What about before that?

Before C++20

Before C++20 we had no such convenient mechanism; similar functionality (constraining types) could only be achieved through a technique called SFINAE. For example, the constraint above had to be written like this:

template <typename T, typename = decltype(std::cout << std::declval<T>())>
void print2(T x) {
    print1(x);
    std::cout << x << std::endl;
}

I won't go into the specific rules here; if you're interested, you can search for related articles.

The result is:

typename = decltype(std::cout << std::declval<T>())

This line of code is baffling; it is completely unclear what it is trying to express, and only after a deep understanding of C++ template rules can one comprehend what it does. For why requires was only added in C++20, you can read the autobiography written by the creator of C++ himself.

Compile-time Computing

Meaning

First, it must be affirmed that compile-time computation is definitely useful; how significant it is in a given scenario cannot be generalized. Many people dread compile-time computation, calling it hard to understand, a "dragon-slaying skill," or worthless, which can easily mislead beginners. In fact, the demand genuinely exists: if a programming language lacks the feature but the need is there, programmers will find ways to implement it by other means.

I will give two examples to illustrate.

First, the compiler's optimization of constant expressions, which everyone is familiar with. In extremely simple cases, like 1+1+x, the compiler will fold it to 2+x. In fact, modern compilers can perform many optimizations of this kind, such as in this question.
The questioner asked whether the C language's strlen function, when applied to a constant string, would be optimized directly into a constant: would strlen("hello") be folded to 5? Judging from experiments with mainstream compilers, the answer is yes. Countless similar situations exist; you are using compile-time computation without even realizing it, only it is filed under compiler optimization. However, a compiler's optimization ability always has limits, and letting users define such rules themselves is more flexible and free. For example, in C++, if strlen is explicitly constexpr, this folding is guaranteed to occur.

Second, in the early days of programming languages, when compiler optimizations were weaker, external scripting languages were already widely used to pre-compute data (or even generate code) to reduce runtime overhead. A typical example is computing constant tables, such as trigonometric tables, which can then be used directly at runtime; for instance, running a script to generate the necessary code before compiling the main program.

C++'s compile-time computation has clear semantic guarantees and is embedded in the language itself, so it interacts well with the rest of the code. From this perspective, it effectively solves both problems mentioned above. Of course, many people's criticisms are not without reason: compile-time computation performed through template metaprogramming yields ugly, obscure code, involves many syntactic details, significantly slows down compilation, and bloats binaries. Undeniably, these problems do exist.
However, with successive C++ versions, compile-time computation has become very easy to understand. It no longer requires complex template metaprogramming, and even beginners can pick it up quickly because it looks almost identical to runtime code. We will clarify this gradually by tracing its development history.

History

Historically, TMP was an accident. During the standardization of C++ it was discovered that the template system is Turing-complete, meaning it can in principle compute anything computable. The first concrete demonstration was a program written by Erwin Unruh that computed prime numbers, although it didn't actually compile: the list of primes was part of the error messages the compiler produced while trying to compile the code. For a specific example, please refer to here.

As an introductory example, here is a compile-time factorial:

template <int N>
struct factorial {
    enum { value = N * factorial<N - 1>::value };
};

template <>
struct factorial<0> {
    enum { value = 1 };
};

constexpr auto value = factorial<5>::value;  // => 120

This technique worked even before C++11. Since then, C++ has introduced many features to simplify compile-time computation, the most important being the constexpr keyword. Notice that before C++11 we had no proper way to express the concept of a compile-time constant and had to borrow enum for it. After C++11 we can write:

template <int N>
struct factorial {
    constexpr static int value = N * factorial<N - 1>::value;
};

template <>
struct factorial<0> {
    constexpr static int value = 1;
};

Despite some simplification, we are still relying on templates for compile-time computation.
Code written this way is difficult to read, mainly for two reasons:

- Template parameters can only be compile-time constants; there is no notion of a compile-time variable, neither global nor local.
- Programming can only be done through recursion, not loops.

Imagine if variables and loops were forbidden in your everyday coding; how uncomfortable would that be?

Are there languages built around these two characteristics? Yes: languages satisfying both points are generally called pure functional programming languages, Haskell being a typical example. However, Haskell has powerful pattern matching, and once you are familiar with the Haskell way of thinking you can write concise and elegant code (Haskell can even simulate local variables with do notation, since using a local variable is essentially threading it down through function parameters). C++ templates have none of this; they inherit all of the disadvantages and none of the advantages. Fortunately, all these problems are solved by constexpr functions.

constexpr std::size_t factorial(std::size_t N) {
    std::size_t result = 1;
    for(std::size_t i = 1; i <= N; ++i) {
        result *= i;
    }
    return result;
}

int main() {
    constexpr auto a = factorial(5);       // compile time
    std::size_t& n = *new std::size_t(6);
    auto b = factorial(n);                 // run time
}

C++ allows a function to be marked directly with the constexpr keyword, indicating that it can be called both at compile time and at runtime, with almost no changes to the function body itself. This lets us reuse runtime code at compile time, and we can program with loops and local variables, making it indistinguishable from ordinary code. Quite astonishing, isn't it? So compile-time computation has long been commonplace in C++, and users no longer need to write complex template metaprogramming.
After C++20, almost all standard library functions are also constexpr, so we can easily call them, for example for compile-time sorting:

constexpr auto sort(auto&& range) {
    std::sort(std::begin(range), std::end(range));
    return range;
}

int main() {
    constexpr auto arr = sort(std::array{1, 3, 4, 2, 3});
    for(auto i: arr) {
        std::cout << i;
    }
}

True code reuse! If you want a function to execute only at compile time, mark it with consteval. Additionally, C++20 allows compile-time dynamic memory allocation: you can use new in a constexpr function, but memory allocated at compile time must also be deallocated at compile time. You can even use containers like vector and string at compile time. And note that constexpr functions compile much faster than template-based compile-time computation. If you are curious how the compiler implements this powerful feature, you can picture the C++ compiler embedding a small interpreter: when it encounters a constexpr function in a constant expression, it interprets the function and substitutes the computed result.

I believe you have now fully witnessed C++'s efforts in compile-time computation. It has long been decoupled from template metaprogramming and has become a very natural feature, requiring no special syntax yet wielding powerful capabilities. So from now on, don't panic when C++ compile-time computation is mentioned, thinking it is some "dragon-slaying skill." It has become quite gentle and approachable.

Although compile-time computation has escaped the clutches of template metaprogramming, C++ itself has not. There are still two situations where we are forced to write awkward template metaprogramming code.

Type Manipulation

Match Type

How do you determine whether two types are equal, or rather, whether the types of two variables are equal?
Some might think this question is redundant: variable types are known at compile time, so why would we need to check? The question actually arose with generic programming. Consider the following example:

template <typename T>
void test() {
    if(T == int) { /* ... */ }
}

Such code matches our intuition, but unfortunately C++ does not let you write it. In languages like Python or Java similar syntax does exist, but the checks are mostly performed at runtime. C++ does allow us to operate on types at compile time, but types are not first-class citizens: they cannot be treated as ordinary values and can only appear as template parameters. We can only write code like this:

template <typename T>
void test() {
    if constexpr(std::is_same_v<T, int>) { /* ... */ }
}

Because types can live only in template parameters, all the advantages of constexpr compile-time computation from the previous section evaporate. We are back in the Stone Age, without variables or loops.

Below is code checking whether one type_list is a subsequence of another:

template <typename... Ts>
struct type_list {};

template <typename SubFirst, typename... SubRest,
          typename SuperFirst, typename... SuperRest>
constexpr auto is_subsequence_of_impl(type_list<SubFirst, SubRest...>,
                                      type_list<SuperFirst, SuperRest...>) {
    if constexpr(std::is_same_v<SubFirst, SuperFirst>)
        if constexpr(sizeof...(SubRest) == 0)
            return true;
        else
            return is_subsequence_of(type_list<SubRest...>{},
                                     type_list<SuperRest...>{});
    else if constexpr(sizeof...(SuperRest) == 0)
        return false;
    else
        return is_subsequence_of(type_list<SubFirst, SubRest...>{},
                                 type_list<SuperRest...>{});
}

template <typename... Sub, typename... Super>
constexpr auto is_subsequence_of(type_list<Sub...>, type_list<Super...>) {
    if constexpr(sizeof...(Sub) == 0)
        return true;
    else if constexpr(sizeof...(Super) == 0)
        return false;
    else
        return is_subsequence_of_impl(type_list<Sub...>{}, type_list<Super...>{});
}

int main() {
    static_assert(is_subsequence_of(type_list<int, double>{},
                                    type_list<int, double, float>{}));
    static_assert(!is_subsequence_of(type_list<int, double>{},
                                     type_list<double, long, char, double>{}));
    static_assert(is_subsequence_of(type_list<>{}, type_list<>{}));
}

It is very uncomfortable to write. If I express the same logic as a constexpr function, replacing the type parameters with std::size_t values:

constexpr bool is_subsequence_of(auto&& sub, auto&& super) {
    std::size_t index = 0;
    for(std::size_t i = 0; index < sub.size() && i < super.size(); i++) {
        if(super[i] == sub[index]) {
            index++;
        }
    }
    return index == sub.size();
}

static_assert(is_subsequence_of(std::array{1, 2}, std::array{1, 2, 3}));
static_assert(!is_subsequence_of(std::array{1, 2, 4}, std::array{1, 2, 3}));

It instantly becomes a million times cleaner, and the only reason is that types are not first-class citizens in C++ and can only be template parameters. Whenever a computation involves types, we are forced to write cumbersome template metaprogramming code. In fact, the need to compute with types has always existed; a typical example is std::variant. When writing operator=, we need to find a given type in a type list (the variant's template parameter list) and return its index, which is essentially finding an element satisfying a condition in an array. The related implementation will not be shown here.
The truly terrible thing is not using template metaprogramming itself, but that for C++, a change as radical as treating types as values is completely unacceptable. This situation will therefore persist indefinitely, with no fundamental change in sight, and that fact is the most disheartening part. Still, it is worth realizing that not many languages support computing with types at all; Rust, for example, has almost no support in this area. C++ code may be awkward to write, but at least it can be written.

Thankfully, there is another path: mapping types to values by some means. For example, map each type to a string; then matching types reduces to matching strings, and computing with strings gives us a certain degree of "type as value." Before C++23 there was no standardized way to perform this mapping, but it could be done through special compiler extensions; you can refer to How to elegantly convert enum to string in C++.

template <typename... Ts>
struct type_list {};

template <typename T, typename... Ts>
constexpr std::size_t find(type_list<Ts...>) {
    // type_name<T>() returns the name of the type (via a compiler extension)
    std::array arr = {type_name<Ts>()...};
    for(std::size_t i = 0; i < arr.size(); i++) {
        if(arr[i] == type_name<T>()) {
            return i;
        }
    }
    return sizeof...(Ts);  // not found (falling off the end would be ill-formed in constant evaluation)
}

After C++23, typeid can also be used directly for the mapping, instead of going through strings. However, while mapping types to values is simple, mapping values back to types is not simple at all, unless you use black magic like STMP. But if static reflection is introduced in the future, this bidirectional mapping between types and values will become very simple. In that case, although C++ still won't directly support treating types as values, it will be pretty close.
However, there is still a long way to go, and when static reflection will be added to the standard is still unknown. If you are interested in static reflection, you can read Analysis of the C++26 Static Reflection Proposal.

Comptime Variable

Besides the type computations above, if you need to instantiate templates in the middle of compile-time computation, you are also forced back into template metaprogramming.

consteval auto test(std::size_t length) {
    return std::array<std::size_t, length>{};  // error: length is not a constant expression
}

The error says that length is not a compile-time constant; it is, so to speak, a compile-time variable. This is quite annoying. Consider the following requirement: we want a fully type-safe format function, one that constrains the number of subsequent arguments based on the content of the first constant string. For example, if the string is "{}", format should accept exactly one argument.

consteval auto count(std::string_view fmt) {
    std::size_t num = 0;
    for(std::size_t i = 0; i < fmt.length(); i++) {
        if(fmt[i] == '{' && i + 1 < fmt.length()) {
            if(fmt[i + 1] == '}') {
                num += 1;
            }
        }
    }
    return num;
}

template <typename... Args>
constexpr auto format(std::string_view fmt, Args&&... args)
    requires (sizeof...(Args) == count(fmt))  // error: fmt is not a constant expression
{ /* ... */ }

In fact, we have no way to require that a function parameter be a compile-time constant, so the code above cannot compile. To obtain a compile-time constant, the string must be moved into a template parameter; the function above might eventually be rewritten so that it is called as format<"{}">(1). Although the difference is only one of form, it undoubtedly creates friction for the user.
From this perspective, it is not hard to understand why things like std::make_index_sequence are so prevalent. True compile-time variables that can flow into template parameters can also be achieved through black magic like STMP, but as mentioned earlier, it is difficult to actually use that in everyday programming.

Type is Value

It is worth mentioning a relatively new language called Zig. It solves the problems mentioned above, supporting not only compile-time variables but also types as first-class citizens. This is thanks to Zig's unique comptime mechanism: variables or code blocks marked with it are executed at compile time. It allows us to write code like this:

const std = @import("std");

fn is_subsequence_of(comptime sub: anytype, comptime super: anytype) bool {
    comptime {
        var subIndex = 0;
        var superIndex = 0;
        while(superIndex < super.len and subIndex < sub.len) : (superIndex += 1) {
            if(sub[subIndex] == super[superIndex]) {
                subIndex += 1;
            }
        }
        return subIndex == sub.len;
    }
}

pub fn main() !void {
    comptime var sub = [_]type{ i32, f32, i64 };
    comptime var super = [_]type{ i32, f32, i64, i32, f32, i64 };
    std.debug.print("{}\n", .{comptime is_subsequence_of(sub, super)});

    comptime var sub2 = [_]type{ i32, f32, bool, i64 };
    comptime var super2 = [_]type{ i32, f32, i64, i32, f32 };
    std.debug.print("{}\n", .{comptime is_subsequence_of(sub2, super2)});
}

This is the code we have dreamed of writing; it is truly elegant! In type computation, Zig completely outperforms current C++. Interested readers can check the Zig official website. In areas other than type computation, however, such as generics and code generation, Zig actually does not do as well.
This is not the focus of this article, so I won't discuss it here.

Conclusion

As we have seen, templates initially took on too many roles, most of which were not what was intended in their original design; they were used as tricks to compensate for the language's lack of expressive power. With the continuous development of C++, these additional roles have gradually been replaced by simpler, more direct, easier-to-understand syntax: type constraints by concept and requires, compile-time computation by constexpr, and type manipulation by future static reflection. Templates are gradually returning to their original role of code generation. The obscure and difficult-to-understand workarounds are also gradually being phased out, which is a good sign, although we often still have to deal with legacy code. But at least we know that the future will be better!

permalink: https://www.ykiko.me/en/articles/655902377/
title: Seeing Through the Fog: A True Understanding of C++ Templates

This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.

In the previous article, we gained a preliminary understanding of the principles of STMP and used it to implement a simple compile-time counter. However, its power extends far beyond that. This article will discuss some advanced applications based on STMP.

type <=> value

In C++, the need for computations on types has always existed, for example:

- std::variant allows duplicate template parameters, but then it must be constructed with in_place_index, which is cumbersome. We can deduplicate the type list before instantiating the variant to solve this problem.
- Sorting variant type lists is useful: after sorting, equivalent instantiations, such as std::variant<int, double> and std::variant<double, int>, can share the same code, reducing binary bloat.
- Getting a type from a type list by a given index.
- Mapping function parameters in a reordered way, often used for automatic cross-language binding generation.

And so on; I won't list them all here. However, in C++, types are not first-class citizens and can only be passed as template parameters, so to perform computations on types we often have to resort to obscure template metaprogramming. It would be great if types could be passed to constexpr functions and computed on just like values, making type computations much simpler. Passing them directly is certainly impossible. Consider instead establishing a one-to-one mapping between types and values: map types to values before the computation, and map the values back to types afterwards.
This can also fulfill our requirements.\ntype -\u0026gt; value First, consider mapping types to values:\nstruct identity { int size; }; using meta_value = const identity*; template \u0026lt;typename T\u0026gt; struct storage { constexpr inline static identity value = {sizeof(T)}; }; template \u0026lt;typename T\u0026gt; consteval meta_value value_of() { return \u0026amp;storage\u0026lt;T\u0026gt;::value; } By leveraging the characteristic that static variable addresses of different template instantiations are also different, we can easily map types to unique values (addresses).\nvalue -\u0026gt; type How do we map values back to types? Consider using naive template specialization:\ntemplate \u0026lt;meta_value value\u0026gt; struct type_of; template \u0026lt;\u0026gt; struct type_of\u0026lt;value_of\u0026lt;int\u0026gt;()\u0026gt; { using type = int; }; // ... This certainly works, but it requires us to specialize all types we intend to use beforehand, which is impractical for most programs. Is there a way to add this specialization at evaluation time? 
The answer is the friend injection we mentioned in the previous article.\ntemplate \u0026lt;typename T\u0026gt; struct self { using type = T; }; template \u0026lt;meta_value value\u0026gt; struct reader { friend consteval auto to_type(reader); }; template \u0026lt;meta_value value, typename T\u0026gt; struct setter { friend consteval auto to_type(reader\u0026lt;value\u0026gt;) { return self\u0026lt;T\u0026gt;{}; } }; Then, we just need to instantiate a setter simultaneously with value_of to complete the registration:\ntemplate \u0026lt;typename T\u0026gt; consteval meta_value value_of() { constexpr auto value = \u0026amp;storage\u0026lt;T\u0026gt;::value; setter\u0026lt;value, T\u0026gt; setter; return value; } Finally, type_of can be implemented by directly reading the registered result through reader:\ntemplate \u0026lt;meta_value value\u0026gt; using type_of = typename decltype(to_type(reader\u0026lt;value\u0026gt;{}))::type; sort types! Without further ado, let\u0026rsquo;s try to sort a type_list using std::sort:\n#include \u0026lt;array\u0026gt; #include \u0026lt;algorithm\u0026gt; template \u0026lt;typename... Ts\u0026gt; struct type_list {}; template \u0026lt;std::array types, typename = std::make_index_sequence\u0026lt;types.size()\u0026gt;\u0026gt; struct array_to_list; template \u0026lt;std::array types, std::size_t... Is\u0026gt; struct array_to_list\u0026lt;types, std::index_sequence\u0026lt;Is...\u0026gt;\u0026gt; { using result = type_list\u0026lt;type_of\u0026lt;types[Is]\u0026gt;...\u0026gt;; }; template \u0026lt;typename List\u0026gt; struct sort_list; template \u0026lt;typename... 
Ts\u0026gt; struct sort_list\u0026lt;type_list\u0026lt;Ts...\u0026gt;\u0026gt; { constexpr inline static std::array sorted_types = [] { std::array types{value_of\u0026lt;Ts\u0026gt;()...}; std::ranges::sort(types, [](auto lhs, auto rhs) { return lhs-\u0026gt;size \u0026lt; rhs-\u0026gt;size; }); return types; }(); using result = typename array_to_list\u0026lt;sorted_types\u0026gt;::result; }; type_list is a simple type container. array_to_list is used to map types from std::array back to type_list. sort_list is the specific implementation of sorting. The process is to first map all types into a std::array, then sort this array using std::ranges::sort, and finally map the sorted std::array back to type_list.\nLet\u0026rsquo;s test it:\nusing list = type_list\u0026lt;int, char, int, double, char, char, double\u0026gt;; using sorted = typename sort_list\u0026lt;list\u0026gt;::result; using expected = type_list\u0026lt;char, char, char, int, int, double, double\u0026gt;; static_assert(std::is_same_v\u0026lt;sorted, expected\u0026gt;); All three major compilers compile this successfully with C++20! The code is available on Compiler Explorer. To prevent link rot, a copy is also available on Github.\nIt\u0026rsquo;s worth noting that this bidirectional mapping between types and values has become a built-in language feature in Reflection for C++26. We no longer need to use clever tricks like friend injection; we can directly use the ^^ and [: :] operators to achieve the mapping. See Reflection for C++26!!! for details.\nthe true any std::any is often used for type erasure, allowing completely different types to be stored in the same container. However, erasure is easy, but restoration is difficult, especially when you want to print the object stored in any; you have to cast each type individually. Is there a possibility of writing a \u0026ldquo;true\u0026rdquo; any type? 
One that doesn\u0026rsquo;t require us to manually cast and can directly call member functions corresponding to the type it holds?\nFor a single compilation unit, this is entirely possible, because the set of types constructed into any within a single compilation unit is determined at compile time. We only need to record all instantiated types and then automatically try each type using template metaprogramming.\ntype register First, let\u0026rsquo;s consider how to register types:\ntemplate \u0026lt;typename T\u0026gt; struct self { using type = T; }; template \u0026lt;int N\u0026gt; struct reader { friend consteval auto at(reader); }; template \u0026lt;int N, typename T\u0026gt; struct setter { friend consteval auto at(reader\u0026lt;N\u0026gt;) { return self\u0026lt;T\u0026gt;{}; } }; template \u0026lt;typename T, int N = 0\u0026gt; consteval int lookup() { constexpr bool exist = requires { at(reader\u0026lt;N\u0026gt;{}); }; if constexpr(exist) { using type = decltype(at(reader\u0026lt;N\u0026gt;{}))::type; if constexpr(std::is_same_v\u0026lt;T, type\u0026gt;) { return N; } else { return lookup\u0026lt;T, N + 1\u0026gt;(); } } else { setter\u0026lt;N, T\u0026gt; setter{}; return N; } } template \u0026lt;int N = 0, auto seed = [] {}\u0026gt; consteval int count() { constexpr bool exist = requires { at(reader\u0026lt;N\u0026gt;{}); }; if constexpr(exist) { return count\u0026lt;N + 1, seed\u0026gt;(); } else { return N; } } We still use setter to register types. lookup is used to find the index of a type in the type set. The principle is to iterate through the set, compare each type with is_same_v, and return the corresponding index if found. If not found by the end, a new type is registered. 
count is used to calculate the size of the type set.\nany type Next, we define a simple any type and a make_any function to construct any objects:\nstruct any { void* data; void (*destructor)(void*); std::size_t index; constexpr any(void* data, void (*destructor)(void*), std::size_t index) noexcept : data(data), destructor(destructor), index(index) {} constexpr any(any\u0026amp;\u0026amp; other) noexcept : data(other.data), destructor(other.destructor), index(other.index) { other.data = nullptr; other.destructor = nullptr; } constexpr ~any() { if(data \u0026amp;\u0026amp; destructor) { destructor(data); } } }; template \u0026lt;typename T, typename Decay = std::decay_t\u0026lt;T\u0026gt;\u0026gt; auto make_any(T\u0026amp;\u0026amp; value) { constexpr int index = lookup\u0026lt;Decay\u0026gt;(); auto data = new Decay(std::forward\u0026lt;T\u0026gt;(value)); auto destructor = [](void* data) { delete static_cast\u0026lt;Decay*\u0026gt;(data); }; return any{data, destructor, index}; } Why write a separate make_any instead of directly writing a template constructor? This is because after my actual attempts, I found that the three major compilers instantiate template constructors in different and sometimes strange locations, leading to different evaluation results. However, for ordinary template functions, the instantiation locations are consistent, so I wrote it as a separate function.\nvisit it! Here comes the highlight: we can implement a function similar to std::visit to access any objects. It takes a callback function, then iterates through the any object\u0026rsquo;s type set. If it finds the corresponding type, it converts any to that type and then calls the callback function.\ntemplate \u0026lt;typename Callback, auto seed = [] {}\u0026gt; constexpr void visit(any\u0026amp; any, Callback\u0026amp;\u0026amp; callback) { constexpr std::size_t n = count\u0026lt;0, seed\u0026gt;(); [\u0026amp;]\u0026lt;std::size_t... 
Is\u0026gt;(std::index_sequence\u0026lt;Is...\u0026gt;) { auto for_each = [\u0026amp;]\u0026lt;std::size_t I\u0026gt;() { if(any.index == I) { callback(*static_cast\u0026lt;type_at\u0026lt;I\u0026gt;*\u0026gt;(any.data)); return true; } return false; }; return (for_each.template operator()\u0026lt;Is\u0026gt;() || ...); }(std::make_index_sequence\u0026lt;n\u0026gt;{}); } Now let\u0026rsquo;s try it:\nstruct String { std::string value; friend std::ostream\u0026amp; operator\u0026lt;\u0026lt; (std::ostream\u0026amp; os, const String\u0026amp; string) { return os \u0026lt;\u0026lt; string.value; } }; int main() { std::vector\u0026lt;any\u0026gt; vec; vec.push_back(make_any(42)); vec.push_back(make_any(std::string{\u0026#34;Hello world\u0026#34;})); vec.push_back(make_any(3.14)); for(auto\u0026amp; any: vec) { visit(any, [](auto\u0026amp; value) { std::cout \u0026lt;\u0026lt; value \u0026lt;\u0026lt; \u0026#39; \u0026#39;; }); // =\u0026gt; 42 Hello world 3.14 } std::cout \u0026lt;\u0026lt; \u0026#34;\\n-----------------------------------------------------\\n\u0026#34;; vec.push_back(make_any(String{\u0026#34;\\nPowerful Stateful Template Metaprogramming!!!\u0026#34;})); for(auto\u0026amp; any: vec) { visit(any, [](auto\u0026amp; value) { std::cout \u0026lt;\u0026lt; value \u0026lt;\u0026lt; \u0026#39; \u0026#39;; }); // =\u0026gt; 42 Hello world 3.14 // =\u0026gt; Powerful Stateful Template Metaprogramming!!! } return 0; } All three major compilers output the results as we expected! The code is also available on Compiler Explorer and Github.\nconclusion These two articles on STMP fulfill a long-standing wish of mine. Before this, I had been thinking about how to implement a \u0026ldquo;true\u0026rdquo; any type, like the code above, without requiring the user to register types beforehand. I tried many methods, but none succeeded. However, the emergence of STMP gave me hope. 
After realizing the heights it could reach, I immediately stayed up all night to write the articles and examples.\nOf course, it\u0026rsquo;s not recommended to use this technique in actual projects. Because this kind of code relies heavily on the instantiation location of templates, it can easily lead to ODR violations, and repeated instantiations will significantly increase compilation time. For such stateful code requirements, we can often transform them into stateless code, but pure manual implementation might be extremely laborious. It\u0026rsquo;s more recommended to use code generators for additional code generation to fulfill this requirement. For example, we could use libclang to collect all any instantiation information across compilation units and then generate a corresponding table.\n","permalink":"https://www.ykiko.me/en/articles/646812253/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eIn the previous \u003ca href=\"https://www.ykiko.me/en/articles/646752343\"\u003earticle\u003c/a\u003e, we gained a preliminary understanding of the principles of STMP and used it to implement a simple compile-time counter. However, its power extends far beyond that. This article will discuss some advanced applications based on STMP.\u003c/p\u003e","title":"C++ Forbidden Black Magic: STMP (Part 2)"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nAs is well known, traditional C++ constant expression evaluation neither depends on nor changes the global state of the program. For any identical input, its output is always the same, and it is considered purely functional. Template Meta Programming, as a subset of constant evaluation, should also adhere to this rule.\nBut is this really the case? 
Without violating the C++ standard, could the following code compile?\nconstexpr auto a = value(); constexpr auto b = value(); static_assert(a != b); Could a compile-time counter like the following be implemented?\nconstexpr auto a = next(); constexpr auto b = next(); constexpr auto c = next(); static_assert(a == 0 \u0026amp;\u0026amp; b == 1 \u0026amp;\u0026amp; c == 2); Each time a constant expression is evaluated, a different result is obtained, indicating that the evaluation has changed the global state. Metaprogramming whose evaluation depends on and mutates such global state is called stateful metaprogramming. If it is further associated with templates, it is called Stateful Template Meta Programming (STMP).\nIn fact, with the help of some compiler-built-in macros, we can achieve such an effect, for example:\nconstexpr auto a = __COUNTER__; constexpr auto b = __COUNTER__; constexpr auto c = __COUNTER__; static_assert(a == 0 \u0026amp;\u0026amp; b == 1 \u0026amp;\u0026amp; c == 2); During preprocessing, the compiler increments the replacement result of the __COUNTER__ macro. If you preprocess the source file, you will find that the source file becomes like this:\nconstexpr auto a = 0; constexpr auto b = 1; constexpr auto c = 2; static_assert(a == 0 \u0026amp;\u0026amp; b == 1 \u0026amp;\u0026amp; c == 2); This is still quite different from the effect we want to achieve, as preprocessing does not involve the semantic part of a C++ program. Moreover, such a counter is globally unique, and we cannot create many counters. Is there another way?\nThe answer is yes. Unbelievable as it may seem, relevant discussions actually existed as early as 2015, and there is also a related article on Zhihu. However, that article was published in 2017 and used C++14; much of its content is now outdated. Moreover, with C++26 standards now being drafted, many things need to be re-discussed. 
The version we will choose is C++20.\nIf you are only interested in the code, I have placed the relevant code on Compiler Explorer. All three major compilers compile it successfully with C++20, and you can directly see the compiler\u0026rsquo;s output. To prevent link rot, it\u0026rsquo;s also available on GitHub. If you want to understand its principles, please continue reading. The C++ standard is very complex, and the author cannot guarantee that the content of this article is entirely correct. If there are any errors, feel free to discuss them in the comments section.\nAccording to CWG 2118, the related code is considered ill-formed. However, the later introduction of C++26 static reflection, whose proposal itself provides similar counter examples, seems to affirm this approach. Overall, I believe this is an inherent flaw caused by C++ distinguishing declaration order. If, like many modern programming languages, it performed lazy parsing, didn\u0026rsquo;t distinguish declaration order, and used a two-pass scan, perhaps this compile-time mutable state could truly be eliminated. If you intend to try this in your code, proceed with extreme caution, as STMP can easily lead to ODR violations.\nobservable state Before we can change it, we must first be able to observe changes in the global state at compile time. Because C++ supports forward declaration, a struct is considered an incomplete type before its definition is seen, meaning the completeness of a class differs in different contexts.\nThe C++ standard stipulates that sizeof can only be used on complete types (after all, an incomplete type has no definition and its size cannot be calculated). Using it on an incomplete type will result in a compilation error, and this error is not a hard error, so we can use SFINAE or requires to catch this error. 
Thus, we can detect the completeness of a class in the following way:\ntemplate \u0026lt;typename T\u0026gt; constexpr inline bool is_complete_v = requires { sizeof(T); }; Some readers might ask, why not use concepts in C++20? Using concepts here would lead to some strange effects, caused by the wording in the standard regarding atomic constraints. We won\u0026rsquo;t delve into it deeply, but interested readers can try it themselves.\nLet\u0026rsquo;s try using it to observe type completeness:\nstruct X; static_assert(!is_complete_v\u0026lt;X\u0026gt;); struct X {}; static_assert(is_complete_v\u0026lt;X\u0026gt;); In fact, the code above will result in a compilation error; the second static assertion fails. That\u0026rsquo;s strange, what\u0026rsquo;s going on? Let\u0026rsquo;s try them separately:\n// first time struct X; static_assert(!is_complete_v\u0026lt;X\u0026gt;); struct X {}; // second time struct X; struct X {}; static_assert(is_complete_v\u0026lt;X\u0026gt;); Trying them separately works, but together it doesn\u0026rsquo;t. Why is this? This is because the compiler caches the result of the first template instantiation, and subsequent encounters with the same template will directly use that cached result. In the initial example, the second is_complete_v\u0026lt;X\u0026gt; still used the result of the first template instantiation, so it still evaluated to false, causing compilation to fail. Is the compiler\u0026rsquo;s behavior reasonable? Yes, it is. Because templates can ultimately produce symbols with external linkage, if two instantiations yield different results, which one should be chosen during linking? However, this does affect our ability to observe compile-time state. How can we solve this? 
The answer is to add a template parameter as a seed, and provide a different parameter each time it\u0026rsquo;s evaluated, forcing the compiler to instantiate a new template:\ntemplate \u0026lt;typename T, int seed = 0\u0026gt; constexpr inline bool is_complete_v = requires { sizeof(T); }; struct X; static_assert(!is_complete_v\u0026lt;X, 0\u0026gt;); struct X {}; static_assert(is_complete_v\u0026lt;X, 1\u0026gt;); Manually entering a different parameter each time is cumbersome. Is there a way to automatically fill it in?\nNote that if a lambda expression is used as a default Non-Type Template Parameter (NTTP), the template will be a different type each time it is instantiated:\n#include \u0026lt;iostream\u0026gt; template \u0026lt;auto seed = [] {}\u0026gt; void test() { std::cout \u0026lt;\u0026lt; typeid(seed).name() \u0026lt;\u0026lt; std::endl; } int main() { test(); // class \u0026lt;lambda_1\u0026gt; test(); // class \u0026lt;lambda_2\u0026gt; test(); // class \u0026lt;lambda_3\u0026gt; return 0; } This feature perfectly meets our needs, as it can automatically fill in a different seed each time. Thus, the final is_complete_v implementation is as follows:\ntemplate \u0026lt;typename T, auto seed = [] {}\u0026gt; constexpr inline bool is_complete_v = requires { sizeof(T); }; Let\u0026rsquo;s try using it again to observe type completeness:\nstruct X; static_assert(!is_complete_v\u0026lt;X\u0026gt;); struct X {}; static_assert(is_complete_v\u0026lt;X\u0026gt;); Compilation successful! At this point, we have successfully observed changes in the global state at compile time.\nmodifiable state After being able to observe state changes, we now need to consider whether we can actively change the state through code. Unfortunately, for most declarations, the only way to change their state is by modifying the source code to add a definition; there are no other means to achieve this effect.\nThe only exception is friend functions. 
But before considering how friend functions work, let\u0026rsquo;s first consider how to observe whether a function has been defined. For most functions, this is not observable, given that a function might be defined in another compilation unit, and calling a function does not require its definition to be visible.\nThe exception is functions with an auto return type; if their function definition is not visible, the return type cannot be deduced, and thus the function cannot be called. The following code can detect whether the foo function is defined:\ntemplate \u0026lt;auto seed = [] {}\u0026gt; constexpr inline bool is_complete_v = requires { foo(seed); }; auto foo(auto); static_assert(!is_complete_v\u0026lt;\u0026gt;); auto foo(auto value) { return sizeof(value); } static_assert(is_complete_v\u0026lt;\u0026gt;); Next, let\u0026rsquo;s discuss how to change the global state using friend functions.\nThe biggest difference between friend functions and ordinary functions is that the function definition and function declaration are not required to be in the same scope. Consider the following example:\nstruct X { friend auto foo(X); }; struct Y { friend auto foo(X) { return 42; } }; int x = foo(X{}); The code above compiles successfully with all three major compilers and fully conforms to the C++ standard. This gives us room to maneuver: we can instantiate a class template and simultaneously instantiate its internally defined friend function, thereby adding a definition to a function declaration located elsewhere. This technique is also known as friend injection.\nauto foo(auto); template \u0026lt;typename T\u0026gt; struct X { friend auto foo(auto value) { return sizeof(value); } }; static_assert(!is_complete_v\u0026lt;\u0026gt;); // #1 X\u0026lt;void\u0026gt; x; // #2 static_assert(is_complete_v\u0026lt;\u0026gt;); // #3 Note that at #1, template X has not been instantiated, so the foo function is not yet defined, and is_complete_v returns false. 
At #2, we instantiate an X\u0026lt;void\u0026gt;, which in turn causes the foo function within X to be instantiated, adding a definition for foo. Consequently, at #3, is_complete_v returns true. Of course, a function can have at most one definition; if you try to instantiate another X\u0026lt;int\u0026gt;, the compiler will report an error that foo is redefined.\nconstant switch Combining the techniques mentioned above, we can easily instantiate a compile-time switch:\nauto flag(auto); template \u0026lt;auto value\u0026gt; struct setter { friend auto flag(auto) {} }; template \u0026lt;auto N = 0, auto seed = [] {}\u0026gt; consteval auto value() { constexpr bool exist = requires { flag(N); }; if constexpr(!exist) { setter\u0026lt;exist\u0026gt; setter; } return exist; } int main() { constexpr auto a = value(); constexpr auto b = value(); static_assert(a != b); } Its principle is simple. The first time, setter has not been instantiated, so the flag function is not defined. Thus, exist evaluates to false, entering the if constexpr branch, instantiating a setter\u0026lt;false\u0026gt;, and returning false. The second time, setter has been instantiated, and the flag function is defined. Thus, exist evaluates to true, and true is returned directly.\nNote that the type of N here must be auto, not std::size_t. Only then will flag(N) be a dependent name, allowing requires to check the validity of the expression. 
Due to two-phase lookup for templates, if written as flag(0), it would be looked up in the first phase, fail to be called, and produce a hard error, leading to a compilation error.\nconstant counter Furthermore, we can directly implement a compile-time counter:\ntemplate \u0026lt;int N\u0026gt; struct reader { friend auto flag(reader); }; template \u0026lt;int N\u0026gt; struct setter { friend auto flag(reader\u0026lt;N\u0026gt;) {} }; template \u0026lt;int N = 0, auto seed = [] {}\u0026gt; consteval auto next() { constexpr bool exist = requires { flag(reader\u0026lt;N\u0026gt;{}); }; if constexpr(!exist) { setter\u0026lt;N\u0026gt; setter; return N; } else { return next\u0026lt;N + 1\u0026gt;(); } } int main() { constexpr auto a = next(); constexpr auto b = next(); constexpr auto c = next(); static_assert(a == 0 \u0026amp;\u0026amp; b == 1 \u0026amp;\u0026amp; c == 2); } Its logic is as follows: starting with N at 0, it checks if flag(reader\u0026lt;N\u0026gt;) is defined. If it\u0026rsquo;s not defined, it instantiates a setter\u0026lt;N\u0026gt;, which means adding a definition for flag(reader\u0026lt;N\u0026gt;), and then returns N. Otherwise, it recursively calls next\u0026lt;N + 1\u0026gt;() to check the N+1 case. Therefore, this counter actually records the number of setter instantiations.\n§: access private First, it\u0026rsquo;s important to clarify a point: class access specifiers private, public, protected only apply to compile-time checks. If there\u0026rsquo;s a way to bypass this compile-time check, then any member of the class can be legally accessed.\nSo, does such a method exist? Yes: template explicit instantiation ignores class scope access permissions:\nThe C++11/14 standards state the following in note 14.7.2/12 [temp.explicit]: The usual access checking rules do not apply to names used to specify explicit instantiations. 
[ Note: In particular, the template arguments and names used in the function declarator (including parameter types, return types and exception speciﬁcations) may be private types or objects which would normally not be accessible and the template may be a member template or member function which would not normally be accessible. — end note ]\nThis means that during explicit instantiation of a template, we can directly access private members of a class.\n#include \u0026lt;iostream\u0026gt; class Bank { double money = 999\u0026#39;999\u0026#39;999\u0026#39;999; public: void check() const { std::cout \u0026lt;\u0026lt; money \u0026lt;\u0026lt; std::endl; } }; template \u0026lt;auto mp\u0026gt; struct Thief { friend double\u0026amp; steal(Bank\u0026amp; bank) { return bank.*mp; } }; double\u0026amp; steal(Bank\u0026amp; bank); // #1 template struct Thief\u0026lt;\u0026amp;Bank::money\u0026gt;; // #2 int main() { Bank bank; steal(bank) = 100; // #3 bank.check(); // 100 return 0; } The syntax at #2 is template explicit instantiation, allowing us to directly access the private member money of Bank. By using \u0026amp;Bank::money, we obtain the member pointer corresponding to that member. Concurrently, through explicit template instantiation, a definition is added to the steal function at #1, allowing us to directly call this function at #3 and obtain a reference to money. Finally, 100 is successfully output.\n","permalink":"https://www.ykiko.me/en/articles/646752343/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003eAs is well known, traditional C++ constant expression evaluation neither depends on nor changes the global state of the program. For any identical input, its output is always the same, and it is considered \u003cstrong\u003epurely functional\u003c/strong\u003e. 
\u003cstrong\u003eTemplate Meta Programming\u003c/strong\u003e, as a subset of constant evaluation, should also adhere to this rule.\u003c/p\u003e","title":"C++ Forbidden Black Magic: STMP (Part 1)"},{"content":" This article was translated by AI using Gemini 2.5 Pro from the original Chinese version. Minor inaccuracies may remain.\nstd::variant was added to the standard library in C++17. This article will discuss the background of its inclusion and some issues related to its usage.\nsum type First, let\u0026rsquo;s discuss sum types, also known as tagged unions. A sum type is a type that can hold a value of only one of several possible types.\nFor example, if we have the following two types:\nstruct Circle { double radius; }; struct Rectangle { double width; double height; }; Then a sum type of Circle and Rectangle, let\u0026rsquo;s call it Shape, can be implemented in C as follows:\nstruct Shape { enum Type { Circle, Rectangle } type; union { struct Circle circle; struct Rectangle rectangle; }; }; This uses a feature called anonymous union, which is equivalent to declaring a union member of the corresponding type and injecting its field names into the current scope.\nThis way, we can assign different types of values to a Shape variable, while also updating the type to record the type of the assigned value. When accessing, we can then use the type to determine which type to access it as. For example:\nvoid foo(Shape shape) { if(shape.type == Shape::Circle) { Circle c = shape.circle; printf(\u0026#34;circle: radius is %f\\n\u0026#34;, c.radius); } else if(shape.type == Shape::Rectangle) { Rectangle r = shape.rectangle; printf(\u0026#34;rectangle: width is %f, height is %f\\n\u0026#34;, r.width, r.height); } } int main() { Shape shape; shape.type = Shape::Circle; shape.circle.radius = 1.0; foo(shape); shape.type = Shape::Rectangle; shape.rectangle.width = 1.0; shape.rectangle.height = 2.0; foo(shape); } not trivial However, things are not so simple in C++. 
Consider the following code:\nstruct Settings { enum class Type { int_, double_, string } type; union { int i; double d; std::string s; }; }; int main(){ Settings settings; settings.type = Settings::Type::string; settings.s = std::string(\u0026#34;hello\u0026#34;); } This code actually won\u0026rsquo;t compile. The compiler will report an error: use of deleted function Settings::Settings(). Why is the constructor for Settings deleted? This is because std::string has a non-trivial constructor. When a union contains members of non-trivial types, the compiler cannot correctly generate constructors and destructors (it doesn\u0026rsquo;t know which member you intend to initialize or destroy). For more details, you can refer to the cppreference documentation on union.\nHow to solve this? We need to define the union\u0026rsquo;s constructor and destructor ourselves. For example, we can define an empty constructor and destructor for it, meaning they do nothing:\nunion Value { int i; double d; std::string s; Value() {} ~Value() {} }; struct Settings { enum class Type { int_, double_, string } type; Value value; }; When using it, we are required to explicitly call the constructor to initialize a member using placement new. Similarly, we must manually call the destructor to destroy a member.\nint main(){ Settings settings; settings.type = Settings::Type::string; new (\u0026amp;settings.value.s) std::string(\u0026#34;hello\u0026#34;); std::cout \u0026lt;\u0026lt; settings.value.s \u0026lt;\u0026lt; std::endl; settings.value.s.~basic_string(); settings.type = Settings::Type::int_; new (\u0026amp;settings.value.i) int(1); std::cout \u0026lt;\u0026lt; settings.value.i \u0026lt;\u0026lt; std::endl; } (No destructor call is needed for the int member, since int is trivially destructible; in fact, settings.value.i.~int() would not even compile, because a pseudo-destructor call requires a type-name such as a typedef rather than the int keyword.) Note that you cannot directly assign here. 
This is because an assignment operation actually calls the member function operator=, and member functions can only be called on objects that have already been initialized.\nFrom the code above, it\u0026rsquo;s clear that directly using a union to represent a sum type in C++ is very cumbersome. Not only do you need to update type promptly, but you also need to correctly call constructors and destructors, and pay attention to the timing of assignments. Forgetting any of these steps can lead to undefined behavior, which is a major headache. Fortunately, C++17 provides std::variant to solve this problem.\nstd::variant Let\u0026rsquo;s look directly at the code:\n#include \u0026lt;string\u0026gt; #include \u0026lt;variant\u0026gt; using Settings = std::variant\u0026lt;int, bool, std::string\u0026gt;; int main() { Settings s = {1}; s = true; s = std::string(\u0026#34;hello\u0026#34;); } The code above is completely well-defined. Through template metaprogramming, variant handles object construction and destruction at the appropriate times.\nIt has an index member function that can retrieve the index of the current active type within the list of types you provided.\nSettings s; s = std::string(\u0026#34;hello\u0026#34;); // s.index() =\u0026gt; 2 s = 1; // s.index() =\u0026gt; 0 s = true; // s.index() =\u0026gt; 1 You can use std::get to retrieve the corresponding value from the variant.\nSettings s; s = std::string(\u0026#34;hello\u0026#34;); std::cout \u0026lt;\u0026lt; std::get\u0026lt;std::string\u0026gt;(s); // =\u0026gt; hello Some might wonder, \u0026ldquo;If I already know it stores a string, why would I use std::variant?\u0026rdquo; Notice that get also has an overload where the template parameter is an integer. Can that solve this problem?\nstd::cout \u0026lt;\u0026lt; std::get\u0026lt;2\u0026gt;(s); // =\u0026gt; hello Oh, I see. 
Since I can get it directly using index, why not just write it like this?\nstd::cout \u0026lt;\u0026lt; std::get\u0026lt;s.index()\u0026gt;(s); Unfortunately, while the idea is good, this won\u0026rsquo;t compile. Template parameters must be compile-time constants, while a variant, as a form of type erasure, only knows its index at runtime. What to do then? To go from dynamic back to static, you have to dispatch case by case. For example:\nif (s.index() == 0){ std::cout \u0026lt;\u0026lt; std::get\u0026lt;0\u0026gt;(s) \u0026lt;\u0026lt; std::endl; } else if (s.index() == 1){ std::cout \u0026lt;\u0026lt; std::get\u0026lt;1\u0026gt;(s) \u0026lt;\u0026lt; std::endl; } else if (s.index() == 2){ std::cout \u0026lt;\u0026lt; std::get\u0026lt;2\u0026gt;(s) \u0026lt;\u0026lt; std::endl; } Dispatching on bare numbers reads poorly. We can use std::holds_alternative to check by type instead:\nif (std::holds_alternative\u0026lt;std::string\u0026gt;(s)){ std::cout \u0026lt;\u0026lt; std::get\u0026lt;std::string\u0026gt;(s) \u0026lt;\u0026lt; std::endl; } else if (std::holds_alternative\u0026lt;int\u0026gt;(s)){ std::cout \u0026lt;\u0026lt; std::get\u0026lt;int\u0026gt;(s) \u0026lt;\u0026lt; std::endl; } else if (std::holds_alternative\u0026lt;bool\u0026gt;(s)){ std::cout \u0026lt;\u0026lt; std::get\u0026lt;bool\u0026gt;(s) \u0026lt;\u0026lt; std::endl; } While it works, there\u0026rsquo;s too much redundant code. Is there a better way to operate on the value inside a variant?\nstd::visit The name visit actually comes from the visitor design pattern. Using it, we can write code like this:\nSettings s; s = std::string(\u0026#34;hello\u0026#34;); auto callback = [](auto\u0026amp;\u0026amp; value){ std::cout \u0026lt;\u0026lt; value \u0026lt;\u0026lt; std::endl; }; std::visit(callback, s); // =\u0026gt; hello s = 1; std::visit(callback, s); // =\u0026gt; 1 Isn\u0026rsquo;t that amazing? 
You just need to pass a callback, and you can directly access the value inside the variant without any manual dispatch. There\u0026rsquo;s an iron rule in software engineering: complexity doesn\u0026rsquo;t disappear, it just moves around, and this is no exception. In fact, visit internally instantiates a function for each type within the variant based on your callback, pre-builds a function table, and then at runtime, directly calls the function from that table based on the index.\nMore often, however, we want to do different things based on different types. This can be conveniently achieved through pattern matching in other languages:\nHaskell:\ndata Settings = IntValue Int | BoolValue Bool | StringValue String deriving (Show, Eq) match :: Settings -\u0026gt; IO () match (IntValue x) = putStrLn $ \u0026#34;Int: \u0026#34; ++ show (x + 1) match (BoolValue x) = putStrLn $ \u0026#34;Bool: \u0026#34; ++ show (not x) match (StringValue x) = putStrLn $ \u0026#34;String: \u0026#34; ++ (x ++ \u0026#34; \u0026#34;) Rust:\nenum Settings{ Int(i32), Bool(bool), String(String), } fn main(){ let settings = Settings::Int(1); match settings{ Settings::Int(x) =\u0026gt; println!(\u0026#34;Int: {}\u0026#34;, x + 1), Settings::Bool(x) =\u0026gt; println!(\u0026#34;Bool: {}\u0026#34;, !x), Settings::String(x) =\u0026gt; println!(\u0026#34;String: {}\u0026#34;, x + \u0026#34; \u0026#34;), } } Unfortunately, as of C++23, C++ still lacks pattern matching. To achieve an effect similar to the code above in C++, there are currently two ways to simulate it:\nfunction overload:\ntemplate\u0026lt;typename ...Ts\u0026gt; struct Overload : Ts... { using Ts::operator()...; }; template\u0026lt;typename ...Ts\u0026gt; Overload(Ts...) 
-\u0026gt; Overload\u0026lt;Ts...\u0026gt;; int main() { using Settings = std::variant\u0026lt;int, bool, std::string\u0026gt;; Overload overloads{ [](int x) { std::cout \u0026lt;\u0026lt; \u0026#34;Int: \u0026#34; \u0026lt;\u0026lt; x \u0026lt;\u0026lt; std::endl; }, [](bool x) { std::cout \u0026lt;\u0026lt; \u0026#34;Bool: \u0026#34; \u0026lt;\u0026lt; std::boolalpha \u0026lt;\u0026lt; x \u0026lt;\u0026lt; std::endl; }, [](std::string x) { std::cout \u0026lt;\u0026lt; \u0026#34;String: \u0026#34; \u0026lt;\u0026lt; x \u0026lt;\u0026lt; std::endl; }, }; Settings settings = 1; std::visit(overloads, settings); } if constexpr:\nint main() { using Settings = std::variant\u0026lt;int, bool, std::string\u0026gt;; auto callback = [](auto\u0026amp;\u0026amp; value) { using type = std::decay_t\u0026lt;decltype(value)\u0026gt;; if constexpr(std::is_same_v\u0026lt;type, int\u0026gt;) { std::cout \u0026lt;\u0026lt; \u0026#34;Int: \u0026#34; \u0026lt;\u0026lt; value + 1 \u0026lt;\u0026lt; std::endl; } else if constexpr(std::is_same_v\u0026lt;type, bool\u0026gt;) { std::cout \u0026lt;\u0026lt; \u0026#34;Bool: \u0026#34; \u0026lt;\u0026lt; !value \u0026lt;\u0026lt; std::endl; } else if constexpr(std::is_same_v\u0026lt;type, std::string\u0026gt;) { std::cout \u0026lt;\u0026lt; \u0026#34;String: \u0026#34; \u0026lt;\u0026lt; value \u0026lt;\u0026lt; std::endl; } }; Settings settings = 1; std::visit(callback, settings); } Both methods are quite awkward. Using templates for such tricks not only slows down compilation but also results in less readable error messages. This also means that the current variant is very difficult to use, lacking accompanying language features to simplify its operations, and is deeply entangled with templates, making it daunting for users.\n","permalink":"https://www.ykiko.me/en/articles/645810896/","summary":"\u003cblockquote\u003e\n\u003cp\u003eThis article was translated by AI using Gemini 2.5 Pro from the original Chinese version. 
Minor inaccuracies may remain.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003cp\u003e\u003ccode\u003estd::variant\u003c/code\u003e was added to the standard library in C++17. This article will discuss the background of its inclusion and some issues related to its usage.\u003c/p\u003e","title":"std::variant is hard to use!"}]