Hash#
Everyone should be relatively familiar with how hash functions work. The hash functions used in cryptography are called crypto-graphic hash functions
.
It has two important properties: one is called collision resistance. Here, collision refers to hash collisions. If there are two inputs x, y
such that x ≠ y
, and the hash function is H(v)
, but H(x) = H(y)
, this is called a hash collision. The hash values calculated from two different inputs are equal. Hash collisions are quite common. For example, we encounter hash collisions when using hash tables. Different inputs may be mapped to the same position in the hash table. Generally speaking, hash collisions are unavoidable. This is because the input space is much larger than the output space. For instance, if we have a 256-bit hash value, how large is the output space? The total possibilities for all hash values are 2 to the power of 256, which is the size of the output space. However, the input space can be infinitely large. Therefore, there are infinitely many possibilities. According to the pigeonhole principle, there will inevitably be cases where two inputs are mapped to the same output. So when we talk about collision resistance
, it means that hash collisions do not occur. Some 📖 refer to this property as collision free
. I don't particularly like this term because it can easily mislead people into thinking that collisions won't happen. In reality, collisions objectively exist. What it means is that there is no efficient method to artificially create hash collisions. Given an x
, there is no good way to find another y
such that the hash values H(x)
and H(y)
are exactly equal. There is no efficient way to find it. If you insist on finding it, you can use a brute-force method. For example, for x
and y
, you would traverse all possible inputs and see which one produces a hash value that is exactly equal. This is called brute-force
. You traverse all possible values and eventually find a hash value that collides. If the input space is large, for instance, for a 256-bit hash value, using brute-force in practice is not feasible due to the enormous workload.
So what is the use of the property collision resistance
? It can be used to compute a digest
for a message
. For example, if we have a message
called m
, we take its hash value H(m)
. This hash value can be considered the digest
of this message, used to detect tampering with the message
. For instance, if someone alters the content of this message
, its hash value will change. The property of collision resistance
means that you cannot find another m'
such that H(m')
is exactly equal to H(m)
. There is no way to tamper with the content without being detected. For example, if you have a large file that you want to store on a cloud storage service, and you want to download it later, how do you know that the version you downloaded is the same as the version you uploaded? This is where the collision resistance property of hash functions comes into play. Before uploading the file, you compute a hash value. This hash value is stored locally. Later, after downloading, you compute another hash value and compare it with the original stored hash value. If they are the same, it indicates that the uploaded file has not been tampered with, and the downloaded version is indeed the original. This is one application of collision resistance.
One thing to note is that there is no hash function that can be mathematically proven to be collision resistant. In other words, this important property we just discussed cannot be theoretically proven. It can only rely on practical experience. Some hash functions have been tested over a long period of time. With so many cryptography experts in the world, no one has been able to find a way to artificially create hash collisions. Therefore, we consider these hash functions to be collision resistant based on practical experience. However, there are also some hash functions that we once thought were collision resistant, but later methods to create hash collisions were discovered. A significant example of this is MD5, which was once a popular hash function that was thought to be secure, but is no longer considered so, as we now know how to artificially create hash collisions.
The hash functions used in cryptography also have a second property: hiding. What does hiding mean? Hiding means that the computation process of the hash function is one-way and irreversible. Given an x
, you can compute its hash value H(x)
, but from the hash value H(x)
, there is no way to reverse-engineer the original input x
. In other words, this hash value H(x)
does not leak any information about the input x
. This is called hiding. However, if you think about it, if you want to know the input, there is a way to do it. How? You can still use brute force. You can traverse all possible values of the input and see which input value produces a hash H(x)
that matches the original H(x)
. This way, you can guess what the original input x
is. So brute-force is one method. The premise for the hiding property to hold is that the input space must be large enough so that this brute-force method is not feasible, and the distribution of inputs must be relatively uniform. If the input space is large but most values are concentrated in a few small values, it can also be relatively easy to crack.
What is the use of the hiding property? It can be combined with the collision resistance property to achieve digital commitment. This digital commitment is also called the digital equivalent of a sealed envelope.
Let's talk about what a sealed envelope is used for in real life. For example, if someone can predict the stock market and knows which stocks will hit the limit up the next day, how can they prove that their prediction is accurate? One way is for this person to publicly announce their prediction on television a day in advance (I predict that xxx stock will hit the limit up tomorrow). After the market closes the next day, you can check whether the stock actually hit the limit up to determine the accuracy of the prediction. Is there a problem with this approach? It seems like a method to verify the accuracy of the prediction. What’s the issue? If you publicly announce the prediction results in advance, it may affect the stock market. For instance, if this person is very famous and people think of them as a stock guru, the stock might not have hit the limit up, but because of their public prediction, everyone rushes to buy it, causing it to hit the limit up. Of course, the opposite could also happen. The stock might genuinely need to hit the limit up, but someone might want to sabotage it. If you predicted it would hit the limit up, they might try to prevent it from doing so by selling heavily. This could happen as well. This shows that the prediction results cannot be publicly announced in advance. However, if the prediction results are not announced in advance, and you wait until after the market closes to announce them, how do you know that the prediction results have not been tampered with? How do you know that the final announced result is the one you produced a day earlier? This is where the concept of a sealed envelope comes into play. You write your prediction on a piece of paper, put it in an envelope, and seal it. This envelope must be given to a third-party impartial organization for safekeeping. The next day, after the market closes, the envelope is opened to verify whether the result is accurate. This is what a sealed envelope is in reality.
So how do I create a digital sealed envelope in the electronic world? I take the prediction result as input x
to compute a hash value. Then I can publish this hash value. Because of the hiding property, you cannot know what the prediction result is from this hash value. Then, after the market closes the next day, I publish the prediction result. Because of the collision resistance property, my prediction result cannot be tampered with. If it is altered, it will not match the originally published hash value. This serves the function of a sealed envelope. In practical operations, there are some details to pay attention to. We mentioned that the premise for the hiding property is that the input space must be large enough and the distribution must be relatively uniform. If this input does not meet this property, like in the example of predicting which stock will hit the limit up the next day, the input space is not large enough since there are only a few thousand stocks. A common method is to append a random number (nonce) to the input and then hash it together. So it is no longer just x
, but x
concatenated with a nonce, and then the entire thing is hashed: H(x | Nonce)
. This nonce is a randomly chosen number that ensures that after selection, the entire input is sufficiently random, and the distribution is also sufficiently uniform. These are some practical details to pay attention to.
In addition to the two properties required in cryptography, the hash function used in Bitcoin also requires a third property called puzzle friendly. This means that the computation of the hash is unpredictably difficult in advance. Just by looking at the input, it is hard to know what the hash value is. So if you want to know that the hash value falls within a certain range, you have no good way to do it; you can only try one by one to see which input produces a hash value that falls within the required range. For example, if you want to get a hash value where the first k
bits are all 0s: 0000000…0000000xxxxxxx..xxx
, the entire thing is 256 bits and must start with k
zeros. What input will produce this hash value? You don’t know in advance; the puzzle friendly property means that you cannot predict which input is more likely to produce this hash value. To obtain this hash value, you must try one by one. There are no shortcuts. This property is called puzzle friendly because later we will discuss the process of Bitcoin mining. You may have heard the term mining. Mining actually involves finding a nonce. Finding such a random number, this nonce is combined with other information in the block header. As input, it produces a hash that must be less than or equal to a specified target threshold: H(block header) ≤ target
. Bitcoin is a blockchain, and a blockchain is a linked list composed of blocks, each block has a block header. The block header contains many fields, one of which is a randomly settable nonce. The mining process involves continuously trying various nonces so that the entire block header, when hashed, falls within the specified range. For example, this is the entire output space. We require that the computed hash value is only valid in this small area, which is the target space. The puzzle friendly property means that there is no shortcut in the mining process. You can only continuously try a large number of nonces to find a solution that meets the requirements. Therefore, this process can be used as proof of work. When you mine and find a nonce that meets the requirements, it is certainly because you have done a lot of work, as there are no other shortcuts.
Note that although the mining process requires a lot of work to find a nonce that meets the requirements, once someone finds such a nonce and publishes it, it is very easy for others to verify whether this nonce meets the requirements. You just need to compute the hash once. This nonce, as part of the header, is hashed once to see if it is less than or equal to the target threshold. Mining is hard to do but easy to verify. This property is called difficult to solve, but easy to verify. We need to pay attention to this property when designing such mining puzzles.
The hash function used in Bitcoin is SHA-256, which stands for secure hash algorithm. The three properties we discussed are all satisfied. Some students may feel that puzzle friendly and collision resistance are quite similar. These two properties are related but not exactly the same. We said that Bitcoin utilizes two functions from cryptography: one is hash and the other is signature.
Signature#
Now that we have completed the discussion on the first function, hash, let's move on to signatures.
To discuss signatures, I need to talk about account management in the Bitcoin system. In daily life, if you want to open an account, you would take your identification to the bank to complete the account opening procedures. This is the account management method in a centralized system. However, Bitcoin is decentralized; it does not have banks or similar institutions. So how do you open an account? Each user decides to open an account themselves without needing anyone's approval. The process of opening an account is simple: it involves creating a pair of public and private keys (public key, private key). You create a public-private key pair locally, which represents an account in Bitcoin. The concept of public and private keys comes from asymmetric encryption. The earliest encryption systems were symmetric, known as symmetric encryption algorithms. For example, if two people want to communicate, and I want to send you some information, but the communication network might be eavesdropped on, what do we do? We agree on a key in advance, called the encryption key. I encrypt the information and send it to you. You receive it and then decrypt it using this key. Since both encryption and decryption use the same key, this is called a symmetric encryption system. The premise of this system assumes that there is a secure channel to distribute this key to both parties. Clearly, you cannot transmit this key in plaintext over the network, as we assume the network itself is insecure and may be eavesdropped on. This is a weakness of symmetric encryption systems: key distribution is not very convenient. To solve this problem, asymmetric encryption systems propose using a pair of keys instead of one. There is a public key and a private key. The public key is used for encryption, and the private key is used for decryption. For example, if I want to send you some information, I encrypt it using your public key, and you decrypt it using your private key to obtain the original information. Note that this encryption and decryption use the public and private keys of the same person, which is the recipient.
What are the benefits of this? The public key does not need to be kept secret; the public key used for encryption can be shared with everyone. Some people list their public keys on their homepage for everyone to see. The private key must be kept secret; decryption requires the private key, but the private key only needs to be stored locally. There is no need to transmit it to the other party. The person you are communicating with does not need to know your private key. They encrypt using your public key, and if you want to reply, you encrypt using their public key. There is no need to know the other party's private key. This solves the inconvenience of key distribution in symmetric encryption systems. In the Bitcoin system, to create an account, you generate a pair of public and private keys locally. The public key is equivalent to your bank account number. If someone wants to transfer money to you, they only need to know your public key. The private key is like your account password; knowing this private key allows you to transfer money from that account.
Now, there is a question: we previously mentioned that the Bitcoin system is not encrypted. It is called cryptocurrency, but it is not encrypted. The information is all public. So what do I need the public and private keys for? They are actually used for signing.
For example, if I want to transfer 10 bitcoins to you, and I publish this transaction on the blockchain, how can others know that this transaction was initiated by me? Could someone impersonate me and secretly transfer money from my account? This is where I need to sign the transaction using my private key when I publish it. Then, when others receive this transaction, they can use my public key to verify the correctness of the signature. The signature uses the private key, while the verification of the signature uses the public key of the person.
Still, it is the same person. Since each person independently generates their account and locally generates their public-private key pair without needing anyone's approval, what if two people generate the same public-private key pair? For example, if someone wants to steal bitcoins, one method could be to continuously generate public-private keys and compare the generated public key with an existing public key on the blockchain. If they match, they could use the private key to steal money from that account. This attack method seems theoretically possible, but in practice, it is not feasible. For instance, if you are dealing with 256-bit hash values, the probability of generating the same public-private key pair is extremely low. Even if you have a supercomputer generating a large number of public-private key pairs every day, the probability of two people having the same public-private key pair is negligible. This probability is smaller than the probability of the Earth exploding. So far, there have been no known cases of someone successfully attacking using this method. It is important to emphasize that we assume there is a good source of randomness when generating public-private keys. This is called a good source of randomness. The process of generating public-private keys must be random. If the chosen source of randomness is poor, the previous analysis does not hold, and it could result in two people generating the same public-private key pair. The signature algorithm used in Bitcoin not only requires a good source of randomness when generating public-private keys but also requires a good source of randomness for every signature. If any single signature uses a poor source of randomness, it could potentially leak the private key, and that would be disastrous. This is something everyone must pay attention to.
We have discussed two functions: one is hash and the other is signature. These two functions can be combined. In the Bitcoin system, it is common to first compute a hash of a message and then sign this hash value.