libc/docs/dev/printf_behavior.rst - toolchain/llvm-project - Git at Google

 .. _printf_behavior:

 ====================================
 Printf Behavior Under All Conditions
 ====================================

 Introduction:
 =============
 On the "defining undefined behavior" page, I said you should write down your
 decisions regarding undefined behavior in your functions. This is that document
 for my printf implementation.

 Unless otherwise specified, the functionality described is aligned with the ISO
 C standard and POSIX standard. If any behavior is not mentioned here, it should
 be assumed to follow the behavior described in those standards.

 The LLVM-libc codebase is under active development, and may change. This
 document was last updated [January 8, 2024] by [michaelrj] and may
 not be accurate after this point.

 The behavior of LLVM-libc's printf is heavily influenced by compile-time flags.
 Make sure to check what flags are defined before filing a bug report. It is also
 not relevant to any other libc implementation of printf, which may or may not
 share the same behavior.

 This document assumes familiarity with the definition of the printf function and
 is intended as a reference, not a replacement for the original standards.

 --------------
 General Flags:
 --------------
 These compile-time flags will change the behavior of LLVM-libc's printf when it
 is compiled. Combinations of flags that are incompatible will be marked.

 LIBC_COPT_STDIO_USE_SYSTEM_FILE
 -------------------------------
 When set, this flag changes fprintf and printf to use the FILE API from the
 system's libc, instead of LLVM-libc's internal FILE API. This is set by default
 when LLVM-libc is built in overlay mode.

 LIBC_COPT_PRINTF_DISABLE_INDEX_MODE
 -----------------------------------
 When set, this flag disables support for the POSIX "%n$" format, hereafter
 referred to as "index mode"; conversions using the index mode format will be
 treated as invalid. This reduces code size.

 LIBC_COPT_PRINTF_INDEX_ARR_LEN
 ------------------------------
 This flag takes a positive integer value, defaulting to 128. This flag
 determines the number of entries the parser's type descriptor array has. This is
 used in index mode to avoid re-parsing the format string to determine types when
 an index lower than the previously specified one is requested. This has no
 effect when index mode is disabled.

 LIBC_COPT_PRINTF_DISABLE_WRITE_INT
 ----------------------------------
 When set, this flag disables support for the C Standard "%n" conversion; any
 "%n" conversion will be treated as invalid. This is set by default to improve
 security.

 LIBC_COPT_PRINTF_DISABLE_FLOAT
 ------------------------------
 When set, this flag disables support for floating point numbers and all their
 conversions (%a, %f, %e, %g); any floating point number conversion will be
 treated as invalid. This reduces code size.

 LIBC_COPT_PRINTF_DISABLE_FIXED_POINT
 ------------------------------------
 When set, this flag disables support for fixed point numbers and all their
 conversions (%r, %k); any fixed point number conversion will be treated as
 invalid. This reduces code size. This has no effect if the current compiler does
 not support fixed point numbers.

 LIBC_COPT_PRINTF_NO_NULLPTR_CHECKS
 ----------------------------------
 When set, this flag disables the nullptr checks in %n and %s.

 LIBC_COPT_PRINTF_CONV_ATLAS
 ---------------------------
 When set, this flag changes the include path for the "converter atlas" which is
 a header that includes all the files containing the conversion functions. This
 is not recommended to be set without careful consideration.

 LIBC_COPT_PRINTF_HEX_LONG_DOUBLE
 --------------------------------
 When set, this flag replaces all decimal long double conversions (%Lf, %Le, %Lg)
 with hexadecimal long double conversions (%La). This will improve performance
 significantly, but may cause some tests to fail. This has no effect when float
 conversions are disabled.

 --------------------------------
 Float Conversion Internal Flags:
 --------------------------------
 The following floating point conversion flags are provided for reference, but
 are not recommended to be adjusted except by persons familiar with the Printf
 Ryu Algorithm. Additionally they have no effect when float conversions are
 disabled.

 LIBC_COPT_FLOAT_TO_STR_NO_SPECIALIZE_LD
 ---------------------------------------
 This flag disables the separate long double conversion implementation. It is
 not based on the Ryu algorithm, instead generating the digits by
 multiplying/dividing the written-out number by 10^9 to get blocks. It's
 significantly faster than INT_CALC, only about 10x slower than MEGA_TABLE,
 and is small in binary size. Its downside is that it always calculates all
 of the digits above the decimal point, making it slightly inefficient for %e
 calls with large exponents. This is the default. This specialization overrides
 other flags, so this flag must be set for other flags to effect the long double
 behavior.

 LIBC_COPT_FLOAT_TO_STR_USE_MEGA_LONG_DOUBLE_TABLE
 -------------------------------------------------
 When set, the float to string decimal conversion algorithm will use a larger
 table to accelerate long double conversions. This larger table is around 5MB of
 size when compiled.

 LIBC_COPT_FLOAT_TO_STR_USE_DYADIC_FLOAT
 ---------------------------------------
 When set, the float to string decimal conversion algorithm will use dyadic
 floats instead of a table when performing floating point conversions. This
 results in ~50 digits of accuracy in the result, then zeroes for the remaining
 values. This may improve performance but may also cause some tests to fail. The
 flag ending in _LD is the same, but only applies to long double decimal
 conversions.

 LIBC_COPT_FLOAT_TO_STR_USE_INT_CALC
 -----------------------------------
 When set, the float to string decimal conversion algorithm will use wide
 integers instead of a table when performing floating point conversions. This
 gives the same results as the table, but is very slow at the extreme ends of
 the long double range.

 LIBC_COPT_FLOAT_TO_STR_NO_TABLE
 -------------------------------
 When set, the float to string decimal conversion algorithm will not use either
 the mega table or the normal table for any conversions. Instead it will set
 algorithmic constants to improve performance when using calculation algorithms.
 If this flag is set without any calculation algorithm flag set, an error will
 occur.

 --------
 Parsing:
 --------

 When printf encounters an invalid conversion specification, the entire
 conversion specification will be passed literally to the output string.
 As an example, printf("%Z") would display "%Z".

 If an index mode conversion is requested for index "n" and there exists a number
 in [1,n) that does not have a conversion specified in the format string, then
 the conversion for index "n" is considered invalid.

 If a non-index mode (also referred to as sequential mode) conversion is
 specified after an index mode conversion, the next argument will be read but the
 current index will not be incremented. From this point on, the arguments
 selected by each conversion may or may not be correct. This is considered
 dangerously undefined and may change without warning.

 If a conversion specification is provided an invalid type modifier, that type
 modifier will be ignored, and the default type for that conversion will be used.
 In the case of the length modifier "L" and integer conversions, it will be
 treated as if it was "ll" (lowercase LL). For this purpose the list of integer
 conversions is d, i, u, o, x, X, b, B, n.

 If a conversion specification ending in % has any options that consume arguments
 (e.g. "%*.*%") those arguments will be consumed as normal, but their values will
 be ignored.

 If a conversion specification ends in a null byte ('\0') then it shall be
 treated as an invalid conversion followed by a null byte.

 If a number passed as a field width or precision value is out of range for an
 int, then it will be treated as the largest value in the int range
 (e.g. "%-999999999999.999999999999s" is the same as "%-2147483647.2147483647s").

 If the field width is set to INT_MIN by using the '*' form,
 e.g. printf("%*d", INT_MIN, 1), it will be treated as INT_MAX, since -INT_MIN is
 not representable as an int.

 If a number passed as a bit width is less than or equal to zero, the conversion
 is considered invalid. If the provided bit width is larger than the width of
 uintmax_t, it will be clamped to the width of uintmax_t.

 ----------
 Conversion
 ----------
 Any conversion specification that contains a flag or option that it does not
 have defined behavior for will ignore that flag or option (e.g. %.5c is the same
 as %c).

 If a conversion specification ends in %, then it will be treated as if it is
 "%%", ignoring all options.

 If a null pointer is passed to a %s conversion specification and null pointer
 checks are enabled, it will be treated as if the provided string is "null".

 If a null pointer is passed to a %n conversion specification and null pointer
 checks are enabled, the conversion will fail and printf will return a negative
 value.

 If a null pointer is passed to a %p conversion specification, the string
 "(nullptr)" will be returned instead of an integer value.

 The %p conversion will display any non-null pointer as if it was a uintptr value
 passed to a "%#tx" conversion, with all other options remaining the same as the
 original conversion.

 The %p conversion will display a null pointer as if it was the string
 "(nullptr)" passed to a "%s" conversion, with all other options remaining the
 same as the original conversion.

 The %r, %R, %k, and %K fixed point number format specifiers are accepted as
 defined in ISO/IEC TR 18037 (the fixed point number extension). These are
 available when the compiler is detected as having support for fixed point
 numbers and the LIBC_COPT_PRINTF_DISABLE_FIXED_POINT flag is not set.
	.. _printf_behavior:

	====================================
	Printf Behavior Under All Conditions
	====================================

	Introduction:
	=============
	On the "defining undefined behavior" page, I said you should write down your
	decisions regarding undefined behavior in your functions. This is that document
	for my printf implementation.

	Unless otherwise specified, the functionality described is aligned with the ISO
	C standard and POSIX standard. If any behavior is not mentioned here, it should
	be assumed to follow the behavior described in those standards.

	The LLVM-libc codebase is under active development, and may change. This
	document was last updated [January 8, 2024] by [michaelrj] and may
	not be accurate after this point.

	The behavior of LLVM-libc's printf is heavily influenced by compile-time flags.
	Make sure to check what flags are defined before filing a bug report. It is also
	not relevant to any other libc implementation of printf, which may or may not
	share the same behavior.

	This document assumes familiarity with the definition of the printf function and
	is intended as a reference, not a replacement for the original standards.

	--------------
	General Flags:
	--------------
	These compile-time flags will change the behavior of LLVM-libc's printf when it
	is compiled. Combinations of flags that are incompatible will be marked.

	LIBC_COPT_STDIO_USE_SYSTEM_FILE
	-------------------------------
	When set, this flag changes fprintf and printf to use the FILE API from the
	system's libc, instead of LLVM-libc's internal FILE API. This is set by default
	when LLVM-libc is built in overlay mode.

	LIBC_COPT_PRINTF_DISABLE_INDEX_MODE
	-----------------------------------
	When set, this flag disables support for the POSIX "%n$" format, hereafter
	referred to as "index mode"; conversions using the index mode format will be
	treated as invalid. This reduces code size.

	LIBC_COPT_PRINTF_INDEX_ARR_LEN
	------------------------------
	This flag takes a positive integer value, defaulting to 128. This flag
	determines the number of entries the parser's type descriptor array has. This is
	used in index mode to avoid re-parsing the format string to determine types when
	an index lower than the previously specified one is requested. This has no
	effect when index mode is disabled.

	LIBC_COPT_PRINTF_DISABLE_WRITE_INT
	----------------------------------
	When set, this flag disables support for the C Standard "%n" conversion; any
	"%n" conversion will be treated as invalid. This is set by default to improve
	security.

	LIBC_COPT_PRINTF_DISABLE_FLOAT
	------------------------------
	When set, this flag disables support for floating point numbers and all their
	conversions (%a, %f, %e, %g); any floating point number conversion will be
	treated as invalid. This reduces code size.

	LIBC_COPT_PRINTF_DISABLE_FIXED_POINT
	------------------------------------
	When set, this flag disables support for fixed point numbers and all their
	conversions (%r, %k); any fixed point number conversion will be treated as
	invalid. This reduces code size. This has no effect if the current compiler does
	not support fixed point numbers.

	LIBC_COPT_PRINTF_NO_NULLPTR_CHECKS
	----------------------------------
	When set, this flag disables the nullptr checks in %n and %s.

	LIBC_COPT_PRINTF_CONV_ATLAS
	---------------------------
	When set, this flag changes the include path for the "converter atlas" which is
	a header that includes all the files containing the conversion functions. This
	is not recommended to be set without careful consideration.

	LIBC_COPT_PRINTF_HEX_LONG_DOUBLE
	--------------------------------
	When set, this flag replaces all decimal long double conversions (%Lf, %Le, %Lg)
	with hexadecimal long double conversions (%La). This will improve performance
	significantly, but may cause some tests to fail. This has no effect when float
	conversions are disabled.

	--------------------------------
	Float Conversion Internal Flags:
	--------------------------------
	The following floating point conversion flags are provided for reference, but
	are not recommended to be adjusted except by persons familiar with the Printf
	Ryu Algorithm. Additionally they have no effect when float conversions are
	disabled.

	LIBC_COPT_FLOAT_TO_STR_NO_SPECIALIZE_LD
	---------------------------------------
	This flag disables the separate long double conversion implementation. It is
	not based on the Ryu algorithm, instead generating the digits by
	multiplying/dividing the written-out number by 10^9 to get blocks. It's
	significantly faster than INT_CALC, only about 10x slower than MEGA_TABLE,
	and is small in binary size. Its downside is that it always calculates all
	of the digits above the decimal point, making it slightly inefficient for %e
	calls with large exponents. This is the default. This specialization overrides
	other flags, so this flag must be set for other flags to effect the long double
	behavior.

	LIBC_COPT_FLOAT_TO_STR_USE_MEGA_LONG_DOUBLE_TABLE
	-------------------------------------------------
	When set, the float to string decimal conversion algorithm will use a larger
	table to accelerate long double conversions. This larger table is around 5MB of
	size when compiled.

	LIBC_COPT_FLOAT_TO_STR_USE_DYADIC_FLOAT
	---------------------------------------
	When set, the float to string decimal conversion algorithm will use dyadic
	floats instead of a table when performing floating point conversions. This
	results in ~50 digits of accuracy in the result, then zeroes for the remaining
	values. This may improve performance but may also cause some tests to fail. The
	flag ending in _LD is the same, but only applies to long double decimal
	conversions.

	LIBC_COPT_FLOAT_TO_STR_USE_INT_CALC
	-----------------------------------
	When set, the float to string decimal conversion algorithm will use wide
	integers instead of a table when performing floating point conversions. This
	gives the same results as the table, but is very slow at the extreme ends of
	the long double range.

	LIBC_COPT_FLOAT_TO_STR_NO_TABLE
	-------------------------------
	When set, the float to string decimal conversion algorithm will not use either
	the mega table or the normal table for any conversions. Instead it will set
	algorithmic constants to improve performance when using calculation algorithms.
	If this flag is set without any calculation algorithm flag set, an error will
	occur.

	--------
	Parsing:
	--------

	When printf encounters an invalid conversion specification, the entire
	conversion specification will be passed literally to the output string.
	As an example, printf("%Z") would display "%Z".

	If an index mode conversion is requested for index "n" and there exists a number
	in [1,n) that does not have a conversion specified in the format string, then
	the conversion for index "n" is considered invalid.

	If a non-index mode (also referred to as sequential mode) conversion is
	specified after an index mode conversion, the next argument will be read but the
	current index will not be incremented. From this point on, the arguments
	selected by each conversion may or may not be correct. This is considered
	dangerously undefined and may change without warning.

	If a conversion specification is provided an invalid type modifier, that type
	modifier will be ignored, and the default type for that conversion will be used.
	In the case of the length modifier "L" and integer conversions, it will be
	treated as if it was "ll" (lowercase LL). For this purpose the list of integer
	conversions is d, i, u, o, x, X, b, B, n.

	If a conversion specification ending in % has any options that consume arguments
	(e.g. "%.%") those arguments will be consumed as normal, but their values will
	be ignored.

	If a conversion specification ends in a null byte ('\0') then it shall be
	treated as an invalid conversion followed by a null byte.

	If a number passed as a field width or precision value is out of range for an
	int, then it will be treated as the largest value in the int range
	(e.g. "%-999999999999.999999999999s" is the same as "%-2147483647.2147483647s").

	If the field width is set to INT_MIN by using the '*' form,
	e.g. printf("%*d", INT_MIN, 1), it will be treated as INT_MAX, since -INT_MIN is
	not representable as an int.

	If a number passed as a bit width is less than or equal to zero, the conversion
	is considered invalid. If the provided bit width is larger than the width of
	uintmax_t, it will be clamped to the width of uintmax_t.

	----------
	Conversion
	----------
	Any conversion specification that contains a flag or option that it does not
	have defined behavior for will ignore that flag or option (e.g. %.5c is the same
	as %c).

	If a conversion specification ends in %, then it will be treated as if it is
	"%%", ignoring all options.

	If a null pointer is passed to a %s conversion specification and null pointer
	checks are enabled, it will be treated as if the provided string is "null".

	If a null pointer is passed to a %n conversion specification and null pointer
	checks are enabled, the conversion will fail and printf will return a negative
	value.

	If a null pointer is passed to a %p conversion specification, the string
	"(nullptr)" will be returned instead of an integer value.

	The %p conversion will display any non-null pointer as if it was a uintptr value
	passed to a "%#tx" conversion, with all other options remaining the same as the
	original conversion.

	The %p conversion will display a null pointer as if it was the string
	"(nullptr)" passed to a "%s" conversion, with all other options remaining the
	same as the original conversion.

	The %r, %R, %k, and %K fixed point number format specifiers are accepted as
	defined in ISO/IEC TR 18037 (the fixed point number extension). These are
	available when the compiler is detected as having support for fixed point
	numbers and the LIBC_COPT_PRINTF_DISABLE_FIXED_POINT flag is not set.