Portability of newer assembly instructions

I was reading through Arm64 instructions and saw that some instructions (e.g. CTZ) are optional and become mandatory with some extensions (e.g. Armv8.9 for CTZ).
I am now wondering about portability of using such instructions. If I am compiling some C code to use it on my machine, then the compiler could check what my hardware actually supports.
What if I want to pre-compile some binaries for a release of a program? Do compilers usually refrain from using very recent extensions, based on some statistical knowledge of CPUs and the instruction sets they implement? Or is there some compatibility trick for checking support at runtime that I am unaware of?
(For clarification, I know CTZ can be replaced by RBIT and CLZ and is probably not that big of a deal, I am just wondering about a more general case.)

Answer

Compilers use the ISA feature set you tell them they can assume for the target, e.g. -march=armv8.9-a, or an ISA level plus some specific features like -march=armv8.9-a+sve. For ARM / AArch64 (but not most other ISAs), GCC and Clang also have a -mcpu option (e.g. -mcpu=cortex-a720), which enables everything that core supports and also tunes for it.
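To make that concrete, here is a minimal sketch of how this usually looks from the C side: you write portable code and the compiler picks the instructions based on the target options. The function name count_trailing_zeros is just an illustration, and __builtin_ctzll is a GCC/Clang builtin, so other compilers would need a different spelling. With a baseline AArch64 target I would expect rbit + clz here; with a target that includes the CSSC extension (such as -march=armv8.9-a) a recent enough compiler should be able to emit ctz instead, but check the generated asm rather than taking that on faith.

```c
#include <stdint.h>

/* Count trailing zero bits of a 64-bit value; returns 64 for x == 0.
   __builtin_ctzll (GCC/Clang) is undefined for 0, hence the explicit check.
   Which instructions this becomes depends on -march / -mcpu. */
unsigned count_trailing_zeros(uint64_t x)
{
    if (x == 0)
        return 64;
    return (unsigned)__builtin_ctzll(x);
}
```

Compiling the same file with different -march values and comparing the output of -S is an easy way to see what each feature set changes.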

For other ISAs like x86-64, -march takes CPU names: -march=znver4, for example, implies all features of Zen 4 plus -mtune=znver4. (More recently there are also x86-64 feature levels like -march=x86-64-v3, which is AVX2 + FMA + BMI1/2 plus some more obscure but still widespread extensions, roughly what Haswell had, but without Intel-only things like TSX transactional memory.)

The default target config (if you don't pass any options) is often quite old. Usually it is the earliest version of the ISA, like ARMv8.0 or first-gen x86-64. For 32-bit x86, compilers are usually configured with i686 (Pentium Pro) as the default baseline rather than 386, so cmov and similar instructions are available. (More recently, distros often configure compilers to use SSE2 by default for 32-bit x86.)

If you want the compiler to check your hardware and make a binary that uses everything it has, use -march=native with GCC or Clang. That is not the default.
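If you're curious what a given set of options (or the default, or -march=native) actually lets the compiler assume, one way is to look at the predefined feature macros. The sketch below checks a handful of them; report_assumed_features is a made-up name, and the list of macros is only a small sample of what GCC and Clang define, not a complete inventory.

```c
#include <stdio.h>

/* Prints a few of the ISA features the compiler was told it may assume
   for this translation unit. Returns how many of them were enabled. */
int report_assumed_features(void)
{
    int n = 0;
#ifdef __ARM_NEON
    puts("AArch64/ARM: NEON (Advanced SIMD)"); n++;
#endif
#ifdef __ARM_FEATURE_SVE
    puts("AArch64: SVE"); n++;
#endif
#ifdef __SSE2__
    puts("x86: SSE2"); n++;
#endif
#ifdef __AVX2__
    puts("x86: AVX2"); n++;
#endif
#ifdef __BMI__
    puts("x86: BMI1"); n++;
#endif
    return n;
}
```

Building this once with no options and once with -march=native should show the difference between the conservative default baseline and what your own machine supports.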

It's also possible to write code that checks features at run-time and dispatches to different versions of a function, e.g. to take advantage of CPUs with different SIMD features, or in your example a scalar loop that's slightly faster with ctz instead of rbit/clz. This has overhead, and the dispatched call of course can't inline, so wrapping just a single ctz would defeat the purpose; you need to multiversion a function that contains a loop.
This can be done fully manually with an array of function pointers you initialize at program startup, or with some help from the compiler, like GCC's ifunc support, where you use __attribute__((target("whatever"))) on different definitions of the same function.
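As a rough sketch of the fully manual approach, here is a single function pointer (rather than an array) that gets set once at startup. All the names (sum_tz, init_dispatch, and so on) are invented for illustration. I'm using x86-64 for the feature check because __builtin_cpu_supports is a GCC/Clang builtin there; on AArch64 Linux the equivalent check would typically go through getauxval(AT_HWCAP) / getauxval(AT_HWCAP2) instead, which is not shown. On any other target the code just keeps the generic version.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline version: built for whatever -march the file is compiled with. */
static uint64_t sum_tz_generic(const uint64_t *v, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] != 0)                      /* ctz of 0 is undefined */
            sum += (uint64_t)__builtin_ctzll(v[i]);
    return sum;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Same loop, but the compiler may use BMI1 (tzcnt) in this function only. */
__attribute__((target("bmi")))
static uint64_t sum_tz_bmi(const uint64_t *v, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] != 0)
            sum += (uint64_t)__builtin_ctzll(v[i]);
    return sum;
}
#endif

typedef uint64_t (*sum_tz_fn)(const uint64_t *, size_t);

/* Starts out pointing at the safe version, so calls work even if
   init_dispatch() is never run. */
static sum_tz_fn sum_tz_impl = sum_tz_generic;

/* Call once at program startup, before any threads use sum_tz(). */
void init_dispatch(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("bmi"))
        sum_tz_impl = sum_tz_bmi;
#endif
}

/* Public entry point: one indirect call per whole loop, not per element. */
uint64_t sum_tz(const uint64_t *v, size_t n)
{
    return sum_tz_impl(v, n);
}
```

The important design point is that the indirect call sits outside the loop, so its cost is paid once per array instead of once per ctz.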

This answer is primarily about GCC and Clang; other compilers will have different names for their options, but the basics are generally similar.
