How much entropy is lost alphabetising your mnemonics?

Bitcoin Stack Exchange


TL;DR: you get roughly a factor 500 million attack speedup for 12 words, which does not quite bring it into the realm of feasible attacks, but it is getting close. For 24 words, the speedup is roughly a factor of half a septillion, but that still leaves it far beyond the realm of feasible attacks.

Repetition of words or not?

First I need to ask: are words permitted to occur multiple times within the same sorted phrase? If not, Mike D's answer is correct. If they are, the question just got a lot more complicated, at least if you care about the exact numbers.

The difficulty with allowing repetition

Once you allow repetition, the distribution you obtain by sorting uniformly chosen phrases is no longer uniform. For example, the sorted phrase "act act act act act act act act act act act zoo" can be obtained from 12 distinct unsorted phrases ("act ... act zoo", "act ... act zoo act", "act ... act zoo act act", ...), while the sorted phrase "mad mad mad mad mad mad mad mad mad mad mad mad" can only be obtained in one way. Thus, if you start with a random unsorted phrase and sort it, the "act ... act zoo" phrase is 12 times more likely than the "mad ... mad" phrase.
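To make that counting concrete, here is a minimal Python sketch (the helper name distinct_orderings is mine, not from the question) that computes how many unsorted phrases map to a given sorted phrase:

from collections import Counter
from math import factorial

def distinct_orderings(words):
    """Number of distinct ordered phrases that sort to this multiset of words."""
    n = factorial(len(words))
    for count in Counter(words).values():
        n //= factorial(count)  # identical words are interchangeable
    return n

print(distinct_orderings(["act"] * 11 + ["zoo"]))  # 12
print(distinct_orderings(["mad"] * 12))            # 1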

This non-uniformity has a subtle but complicated impact on our question. If the phrases are uniformly distributed, with N possibilities, then it holds that:

  1. The number of distinct phrases is N.
  2. The entropy (in bits) of a sampled phrase is log2(N).
  3. The average number of tries an attacker needs to brute-force guess a sampled phrase is (N + 1) / 2.

When the phrases are not uniform (as is the case when sorting uniform phrases with repetition), the answers to (2) and (3) can no longer be expressed simply as a function of N. They depend on exactly how non-uniform the distribution is. Worse, you cannot even compute (3) from (2) or the other way around, because entropy is not a measure of how hard something is to guess, but of how compressible it is. The two are strongly correlated, but not exactly the same, and the difference between them again depends on the exact distribution.
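A toy example (mine, not from the original answer) shows why the two metrics are not interchangeable: a three-outcome distribution with probabilities 1/2, 1/4, 1/4 has 1.5 bits of entropy but takes only 1.75 guesses on average, whereas a uniform distribution with 1.5 bits of entropy would take about 1.91 guesses:

from math import log2

p = [0.5, 0.25, 0.25]  # toy distribution, most likely outcome first

entropy = -sum(x * log2(x) for x in p)               # 1.5 bits
mean_tries = sum(i * x for i, x in enumerate(p, 1))  # 1.75 guesses

# A uniform distribution with the same entropy has 2**1.5 ~= 2.83 outcomes
# and would need (2**1.5 + 1) / 2 ~= 1.91 guesses on average.
print(entropy, mean_tries, (2**entropy + 1) / 2)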

This is all to say that the following three questions have distinct answers, and which one matters depends on what you're interested in:

  1. How many distinct sorted phrases are there?
  2. How much entropy do phrases obtained by sorting uniform unsorted phrases have?
  3. How hard is it for an attacker to guess a phrase obtained by sorting a uniform unsorted phrase?

1. How many distinct sorted phrases are there?

The number of distinct multisets of k elements drawn from a set of N elements is the multiset coefficient C(N + k - 1, k) = (N + k - 1)! / (k! · (N - 1)!).

  • For k=12 and N=2048 that is 11737947382912650038702322439680, or ~2^103.211.
  • For k=24 and N=2048 that is 54640990776272965318633918350257589558711775320768326400, or ~2^185.156.

Note that these numbers only differ slightly from the corresponding numbers when repetition is not allowed (2^103.118 and 2^184.767, respectively).
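Both sets of counts are easy to reproduce; a minimal Python check, using the same N and k as above:

from math import comb, log2

N = 2048  # size of the BIP39 wordlist

for k in (12, 24):
    with_repetition = comb(N + k - 1, k)  # multisets of k words
    without_repetition = comb(N, k)       # k distinct words, order ignored
    print(k, log2(with_repetition), log2(without_repetition))
    # k=12: ~103.211 and ~103.118 bits; k=24: ~185.156 and ~184.767 bits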

2. How much entropy do phrases obtained by sorting uniform unsorted phrases have?

The entropy of a distribution is how many bits on average an optimal compression scheme needs per element to encode a (long) list of randomly sampled elements from that distribution. For a discrete distribution taking on possibilities a1, a2, ..., an with respective probabilities p1, p2, ..., pn, that is H = -(p1·log2(p1) + p2·log2(p2) + ... + pn·log2(pn)). By applying this formula to the exact probability distribution of sorted phrases, you get:

  • For 12 words: ~103.196 bits
  • For 24 words: ~185.097 bits
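For those who want to reproduce these figures, here is a Python sketch (the helper names are mine) that computes the entropy exactly by grouping sorted phrases according to their multiplicity pattern, i.e. the integer partition of k given by how often each word repeats. All phrases with the same pattern are equally likely, so only the roughly 1,600 partitions of 24 need to be enumerated rather than ~2^185 phrases:

from collections import Counter
from math import factorial, log2

N = 2048  # BIP39 wordlist size

def partitions(k, max_part=None):
    """Yield all integer partitions of k as non-increasing tuples."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def multiset_count(pattern, N):
    """Number of sorted phrases whose word multiplicities equal `pattern`."""
    count = 1
    for i in range(len(pattern)):        # choose the distinct words involved
        count *= N - i
    for reps in Counter(pattern).values():
        count //= factorial(reps)        # equal multiplicities are interchangeable
    return count

def phrase_probability(pattern, k, N):
    """Probability of one such sorted phrase when sorting a uniform k-word phrase."""
    orderings = factorial(k)
    for m in pattern:
        orderings //= factorial(m)
    return orderings / N**k

def entropy_bits(k, N=N):
    h = 0.0
    for pat in partitions(k):
        p = phrase_probability(pat, k, N)
        h -= multiset_count(pat, N) * p * log2(p)
    return h

print(entropy_bits(12))  # ~103.196
print(entropy_bits(24))  # ~185.097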

3. How hard is it for an attacker to guess a phrase obtained by sorting a uniform unsorted phrase?

The optimal strategy for a brute-forcing attacker who knows the probability distribution is to try values from most likely to least likely. The expected number of tries under such a strategy is 1·p1 + 2·p2 + ... + n·pn, assuming the pi are sorted from high to low. Applying this formula to the exact probability distribution in question here, we get:

  • For 12 words: ~(2^103.166 + 1) / 2
  • For 24 words: ~(2^184.985 + 1) / 2

In other words, the average time an attacker needs to guess these sorted-phrase distributions is as if they were uniform distributions with ~103.166 and ~184.985 bits of entropy, respectively (but, as shown in the previous section, that is not exactly the entropy the sorted-phrase distributions have).
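Continuing the sketch from the previous section (it reuses the partitions, multiset_count and phrase_probability helpers defined there), the expected number of guesses can be computed by sorting the blocks of equally likely phrases from most to least probable:

def expected_guesses(k, N=N):
    """Expected tries for an attacker guessing sorted phrases best-first."""
    blocks = [(phrase_probability(pat, k, N), multiset_count(pat, N))
              for pat in partitions(k)]
    blocks.sort(key=lambda b: b[0], reverse=True)   # most likely phrases first
    expected = 0.0
    tried = 0                                       # phrases already guessed
    for p, cnt in blocks:
        # the next `cnt` phrases occupy ranks tried+1 .. tried+cnt, each with probability p
        expected += p * cnt * (2 * tried + cnt + 1) / 2
        tried += cnt
    return expected

for k in (12, 24):
    e = expected_guesses(k)
    print(k, log2(2 * e - 1))  # the value v such that e = (2**v + 1) / 2
    # k=12: ~103.166 ; k=24: ~184.985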

Summary

The following tables summarize the numbers for all 4 scenarios (repetition allowed or not, sorted or not), under all 3 metrics (number of possibilities, entropy, and expected number of tries for an attacker):

12 words                       | # possibilities | entropy       | mean tries
Normal, repetition             | 2^132           | 132 bits      | (2^132 + 1) / 2
Normal, no repetition          | ~2^131.953      | ~131.953 bits | ~(2^131.953 + 1) / 2
After sorting, repetition (*)  | ~2^103.211      | ~103.196 bits | ~(2^103.166 + 1) / 2
After sorting, no repetition   | ~2^103.118      | ~103.118 bits | ~(2^103.118 + 1) / 2

24 words                       | # possibilities | entropy       | mean tries
Normal, repetition             | 2^264           | 264 bits      | (2^264 + 1) / 2
Normal, no repetition          | ~2^263.805      | ~263.805 bits | ~(2^263.805 + 1) / 2
After sorting, repetition (*)  | ~2^185.156      | ~185.097 bits | ~(2^184.985 + 1) / 2
After sorting, no repetition   | ~2^184.767      | ~184.767 bits | ~(2^184.767 + 1) / 2

(*) Note that only the "after sorting, repetition" rows have distinct numbers in the three columns. All other rows follow the pattern 2^v, v bits, (2^v + 1) / 2 for some value v exactly, because they correspond to uniform distributions.

