docs/source/basics.rst - platform/external/python/ecdsa - Git at Google

 ======================
 Basics of ECC handling
 ======================

 The :term:`ECC`, as any asymmetric cryptography system, deals with private
 keys and public keys. Private keys are generally used to create signatures,
 and are kept, as the name suggest, private. That's because possession of a
 private key allows creating a signature that can be verified with a public key.
 If the public key is associated with an identity (like a person or an
 institution), possession of the private key will allow to impersonate
 that identity.

 The public keys on the other hand are widely distributed, and they don't
 have to be kept private. The primary purpose of them, is to allow
 checking if a given signature was made with the associated private key.

 Number representations
 ======================

 On a more low level, the private key is a single number, usually the
 size of the curve size: a NIST P-256 private key will have a size of 256 bits,
 though as it needs to be selected randomly, it may be a slightly smaller
 number (255-bit, 248-bit, etc.).
 Public points are a pair of numbers. That pair specifies a point on an
 elliptic curve (a pair of integers that satisfy the curve equation).
 Those two numbers are similarly close in size to the curve size, so both the
 ``x`` and ``y`` coordinate of a NIST P-256 curve will also be around 256 bit in
 size.

 .. note::
    To be more precise, the size of the private key is related to the
    curve *order*, i.e. the number of points on a curve. The coordinates
    of the curve depend on the *field* of the curve, which usually means the
    size of the *prime* used for operations on points. While the *order* and
    the *prime* size are related and fairly close in size, it's possible
    to have a curve where either of them is larger by a bit (i.e.
    it's possible to have a curve that uses a 256 bit *prime* that has a 257 bit
    *order*).

 Since normally computers work with much smaller numbers, like 32 bit or 64 bit,
 we need to use special approaches to represent numbers that are hundreds of
 bits large.

 First is to decide if the numbers should be stored in a big
 endian format, or in little endian format. In big endian, the most
 significant bits are stored first, so a number like :math:`2^{16}` is saved
 as a three bytes: byte with value 1 and two bytes with value 0.
 In little endian format the least significant bits are stored first, so
 the number like :math:`2^{16}` would be stored as three bytes:
 first two bytes with value 0, than a byte with value 1.

 For :term:`ECDSA` big endian encoding is usually used, for :term:`EdDSA`
 little endian encoding is usually used.

 Secondly, we need to decide if the numbers need to be stored as fixed length
 strings (zero padded if necessary), or if they should be stored with
 minimal number of bytes necessary.
 That depends on the format and place it's used, some require strict
 sizes (so even if the number encoded is 1, but the curve used is 128 bit large,
 that number 1 still needs to be encoded with 16 bytes, with fifteen most
 significant bytes equal zero).

 Public key encoding
 ===================

 Generally, public keys (i.e. points) are expressed as fixed size byte strings.

 While public keys can be saved as two integers, one to represent the
 ``x`` coordinate and one to represent ``y`` coordinate, that actually
 provides a lot of redundancy. Because of the specifics of elliptic curves,
 for every valid ``x`` value there are only two valid ``y`` values.
 Moreover, if you have an ``x`` value, you can compute those two possible
 ``y`` values (if they exist).
 As such, it's possible to save just the ``x`` coordinate and the sign
 of the ``y`` coordinate (as the two possible values are negatives of
 each-other: :math:`y_1 == -y_2`).

 That gives us few options to represent the public point, the most common are:

 1. As a concatenation of two fixed-length big-endian integers, so called
    :term:`raw encoding`.
 2. As a concatenation of two fixed-length big-endian integers prefixed with
    the type of the encoding, so called :term:`uncompressed` point
    representation (the type is represented by a 0x04 byte).
 3. As a fixed-length big-endian integer representing the ``x`` coordinate
    prefixed with the byte representing the combined type of the encoding
    and the sign of the ``y`` coordinate, so called :term:`compressed`
    point representation (the type is then represented by a 0x02 or a 0x03
    byte).

 Interoperable file formats
 ==========================

 Now, while we can save the byte strings as-is and "remember" which curve
 was used to generate those private and public keys, interoperability usually
 requires to also save information about the curve together with the
 corresponding key. Here too there are many ways to do it:
 save the parameters of the used curve explicitly, use the name of the
 well-known curve as a string, use a numerical identifier of the well-known
 curve, etc.

 For public keys the most interoperable format is the one described
 in RFC5912 (look for SubjectPublicKeyInfo structure).
 For private keys, the RFC5915 format (also known as the ssleay format)
 and the PKCS#8 format (described in RFC5958) are the most popular.

 All three formats effectively support two ways of providing the information
 about the curve used: by specifying the curve parameters explicitly or
 by specifying the curve using ASN.1 OBJECT IDENTIFIER (OID), which is
 called ``named_curve``. ASN.1 OIDs are a hierarchical system of representing
 types of objects, for example, NIST P-256 curve is identified by the
 1.2.840.10045.3.1.7 OID (in dotted-decimal formatting of the OID, also
 known by the ``prime256v1`` OID node name or short name). Those OIDs
 uniquely, identify a particular curve, but the receiver needs to know
 which numerical OID maps to which curve parameters. Thus the prospect of
 using the explicit encoding, where all the needed parameters are provided
 is tempting, the downside is that curve parameters may specify a *weak*
 curve, which is easy to attack and break (that is to deduce the private key
 from the public key). To verify curve parameters is complex and computationally
 expensive, thus generally protocols use few specific curves and require
 all implementations to carry the parameters of them. As such, use of
 ``named_curve`` parameters is generally recommended.

 All of the mentioned formats specify a binary encoding, called DER. That
 encoding uses bytes with all possible numerical values, which means it's not
 possible to embed it directly in text files. For uses where it's useful to
 limit bytes to printable characters, so that the keys can be embedded in text
 files or text-only protocols (like email), the PEM formatting of the
 DER-encoded data can be used. The PEM formatting is just a base64 encoding
 with appropriate header and footer.

 Signature formats
 =================

 Finally, ECDSA signatures at the lowest level are a pair of numbers, usually
 called ``r`` and ``s``. While they are the ``x`` coordinates of special
 points on the curve, they are saved modulo *order* of the curve, not
 modulo *prime* of the curve (as a coordinate needs to be).

 That again means we have multiple ways of encoding those two numbers.
 The two most popular formats are to save them as a concatenation of big-endian
 integers of fixed size (determined by the curve *order*) or as a DER
 structure with two INTEGERS.
 The first of those is called the :term:``raw encoding`` inside the Python
 ecdsa library.

 As ASN.1 signature format requires the encoding of INTEGERS, and DER INTEGERs
 must use the fewest possible number of bytes, a numerically small value of
 ``r`` or ``s`` will require fewer
 bytes to represent in the DER structure. Thus, DER encoding isn't fixed
 size for a given curve, but has a maximum possible size.

 .. note::

     As DER INTEGER uses so-called two's complement representation of
     numbers, the most significant bit of the most significant byte
     represents the *sign* of the number. If that bit is set, then the
     number is considered to be negative. Thus, to represent a number like
     255, which in binary representation is 0b11111111 (i.e. a byte with all
     bits set high), the DER encoding of it will require two bytes, one
     zero byte to make sure the sign bit is 0, and a byte with value 255 to
     encode the numerical value of the integer.
	======================
	Basics of ECC handling
	======================

	The :term:`ECC`, as any asymmetric cryptography system, deals with private
	keys and public keys. Private keys are generally used to create signatures,
	and are kept, as the name suggest, private. That's because possession of a
	private key allows creating a signature that can be verified with a public key.
	If the public key is associated with an identity (like a person or an
	institution), possession of the private key will allow to impersonate
	that identity.

	The public keys on the other hand are widely distributed, and they don't
	have to be kept private. The primary purpose of them, is to allow
	checking if a given signature was made with the associated private key.

	Number representations
	======================

	On a more low level, the private key is a single number, usually the
	size of the curve size: a NIST P-256 private key will have a size of 256 bits,
	though as it needs to be selected randomly, it may be a slightly smaller
	number (255-bit, 248-bit, etc.).
	Public points are a pair of numbers. That pair specifies a point on an
	elliptic curve (a pair of integers that satisfy the curve equation).
	Those two numbers are similarly close in size to the curve size, so both the
	``x`` and ``y`` coordinate of a NIST P-256 curve will also be around 256 bit in
	size.

	.. note::
	To be more precise, the size of the private key is related to the
	curve order, i.e. the number of points on a curve. The coordinates
	of the curve depend on the field of the curve, which usually means the
	size of the prime used for operations on points. While the order and
	the prime size are related and fairly close in size, it's possible
	to have a curve where either of them is larger by a bit (i.e.
	it's possible to have a curve that uses a 256 bit prime that has a 257 bit
	order).

	Since normally computers work with much smaller numbers, like 32 bit or 64 bit,
	we need to use special approaches to represent numbers that are hundreds of
	bits large.

	First is to decide if the numbers should be stored in a big
	endian format, or in little endian format. In big endian, the most
	significant bits are stored first, so a number like :math:`2^{16}` is saved
	as a three bytes: byte with value 1 and two bytes with value 0.
	In little endian format the least significant bits are stored first, so
	the number like :math:`2^{16}` would be stored as three bytes:
	first two bytes with value 0, than a byte with value 1.

	For :term:`ECDSA` big endian encoding is usually used, for :term:`EdDSA`
	little endian encoding is usually used.

	Secondly, we need to decide if the numbers need to be stored as fixed length
	strings (zero padded if necessary), or if they should be stored with
	minimal number of bytes necessary.
	That depends on the format and place it's used, some require strict
	sizes (so even if the number encoded is 1, but the curve used is 128 bit large,
	that number 1 still needs to be encoded with 16 bytes, with fifteen most
	significant bytes equal zero).

	Public key encoding
	===================

	Generally, public keys (i.e. points) are expressed as fixed size byte strings.

	While public keys can be saved as two integers, one to represent the
	``x`` coordinate and one to represent ``y`` coordinate, that actually
	provides a lot of redundancy. Because of the specifics of elliptic curves,
	for every valid ``x`` value there are only two valid ``y`` values.
	Moreover, if you have an ``x`` value, you can compute those two possible
	``y`` values (if they exist).
	As such, it's possible to save just the ``x`` coordinate and the sign
	of the ``y`` coordinate (as the two possible values are negatives of
	each-other: :math:`y_1 == -y_2`).

	That gives us few options to represent the public point, the most common are:

	1. As a concatenation of two fixed-length big-endian integers, so called
	:term:`raw encoding`.
	2. As a concatenation of two fixed-length big-endian integers prefixed with
	the type of the encoding, so called :term:`uncompressed` point
	representation (the type is represented by a 0x04 byte).
	3. As a fixed-length big-endian integer representing the ``x`` coordinate
	prefixed with the byte representing the combined type of the encoding
	and the sign of the ``y`` coordinate, so called :term:`compressed`
	point representation (the type is then represented by a 0x02 or a 0x03
	byte).

	Interoperable file formats
	==========================

	Now, while we can save the byte strings as-is and "remember" which curve
	was used to generate those private and public keys, interoperability usually
	requires to also save information about the curve together with the
	corresponding key. Here too there are many ways to do it:
	save the parameters of the used curve explicitly, use the name of the
	well-known curve as a string, use a numerical identifier of the well-known
	curve, etc.

	For public keys the most interoperable format is the one described
	in RFC5912 (look for SubjectPublicKeyInfo structure).
	For private keys, the RFC5915 format (also known as the ssleay format)
	and the PKCS#8 format (described in RFC5958) are the most popular.

	All three formats effectively support two ways of providing the information
	about the curve used: by specifying the curve parameters explicitly or
	by specifying the curve using ASN.1 OBJECT IDENTIFIER (OID), which is
	called ``named_curve``. ASN.1 OIDs are a hierarchical system of representing
	types of objects, for example, NIST P-256 curve is identified by the
	1.2.840.10045.3.1.7 OID (in dotted-decimal formatting of the OID, also
	known by the ``prime256v1`` OID node name or short name). Those OIDs
	uniquely, identify a particular curve, but the receiver needs to know
	which numerical OID maps to which curve parameters. Thus the prospect of
	using the explicit encoding, where all the needed parameters are provided
	is tempting, the downside is that curve parameters may specify a weak
	curve, which is easy to attack and break (that is to deduce the private key
	from the public key). To verify curve parameters is complex and computationally
	expensive, thus generally protocols use few specific curves and require
	all implementations to carry the parameters of them. As such, use of
	``named_curve`` parameters is generally recommended.

	All of the mentioned formats specify a binary encoding, called DER. That
	encoding uses bytes with all possible numerical values, which means it's not
	possible to embed it directly in text files. For uses where it's useful to
	limit bytes to printable characters, so that the keys can be embedded in text
	files or text-only protocols (like email), the PEM formatting of the
	DER-encoded data can be used. The PEM formatting is just a base64 encoding
	with appropriate header and footer.

	Signature formats
	=================

	Finally, ECDSA signatures at the lowest level are a pair of numbers, usually
	called ``r`` and ``s``. While they are the ``x`` coordinates of special
	points on the curve, they are saved modulo order of the curve, not
	modulo prime of the curve (as a coordinate needs to be).

	That again means we have multiple ways of encoding those two numbers.
	The two most popular formats are to save them as a concatenation of big-endian
	integers of fixed size (determined by the curve order) or as a DER
	structure with two INTEGERS.
	The first of those is called the :term:``raw encoding`` inside the Python
	ecdsa library.

	As ASN.1 signature format requires the encoding of INTEGERS, and DER INTEGERs
	must use the fewest possible number of bytes, a numerically small value of
	``r`` or ``s`` will require fewer
	bytes to represent in the DER structure. Thus, DER encoding isn't fixed
	size for a given curve, but has a maximum possible size.

	.. note::

	As DER INTEGER uses so-called two's complement representation of
	numbers, the most significant bit of the most significant byte
	represents the sign of the number. If that bit is set, then the
	number is considered to be negative. Thus, to represent a number like
	255, which in binary representation is 0b11111111 (i.e. a byte with all
	bits set high), the DER encoding of it will require two bytes, one
	zero byte to make sure the sign bit is 0, and a byte with value 255 to
	encode the numerical value of the integer.