`puid-js` uses `crypto.randomBytes` as the default entropy source. Options can be used to configure a specific entropy source:
- `entropyBytes`: any function of the form `(n: number): Uint8Array`, such as `crypto.randomBytes`
- `entropyValues`: any function of the form `(buf: Uint8Array): void`, such as `crypto.getRandomValues`
These options make it easy to use `puid-js` in either a `nodejs` or web environment. They also allow the use of any suitable third-party entropy source. The project includes an experimental PRNG random bytes generator, `prngBytes`, and the test code uses the `entropyBytes` option for deterministic testing.
The optional `PuidConfig` object has the following fields:
- `entropyBytes`: a function with the same form as `crypto.randomBytes`
- `entropyValues`: a function with the same form as `crypto.getRandomValues`
- Only one of `entropyBytes` or `entropyValues` can be set
##### Defaults
- `bits`: 128
- `chars`: `Chars.Safe64`
- `entropyBytes`: `crypto.randomBytes`
#### PuidInfo
The `puid` generator function includes an `info` field that displays generator configuration:
- `bits`: ID entropy
- `bitsPerChar`: Entropy bits per ID character
Any string of up to 256 unique characters, including unicode, can be used for **`puid`** generation.
[TOC](#TOC)
In this case, the entropy of the string **`'18f6303a'`** is 1 bit. That's it; 1 bit.
> _**Entropy is a measure of the uncertainty of an event, independent of the representation of that uncertainty**_
In information theory you would state the above process has two symbols, **`18f6303a`** and **`1`**, and the outcome is equally likely to be either symbol. Hence there is 1 bit of entropy in the process. The symbols don't really matter. It would be much more likely to see the symbols **`T`** and **`F`**, or **`0`** and **`1`**, or even **`ON`** and **`OFF`**, but regardless, the process _produces_ 1 bit of entropy and the symbols used to _represent_ that entropy do not affect the entropy itself.
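As a quick check of that claim, Shannon entropy depends only on the outcome probabilities, never on the symbols used:

```js
// Shannon entropy, in bits, of a list of outcome probabilities
const entropy = (probs) => probs.reduce((h, p) => h - p * Math.log2(p), 0)

entropy([0.5, 0.5]) // => 1 bit, whether the symbols are 18f6303a/1 or ON/OFF
```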
#### Entropy source
> _**Random strings do not produce unique IDs**_
Recall that entropy is the measure of uncertainty in the possible outcomes of an event. It is critical that the uncertainty of each event is *independent* of all prior events. This means two separate events *can* produce the same result (i.e., the same ID); otherwise the process isn't random. You could, of course, compare each generated random string to all prior IDs and thereby achieve uniqueness. But some such post-processing must occur to ensure random IDs are truly unique.
Deterministic uniqueness checks, however, incur significant processing overhead and are rarely used. Instead, developers (knowingly?) relax the requirement that random IDs are truly, deterministically unique for a much lesser standard, one of probabilistic uniqueness. We "trust" that randomly generated IDs are unique by virtue of the chance of a repeated ID being very low.
Fortunately, there is a mathematical correlation between entropy and the probability of uniqueness. This correlation is often explored via the [Birthday Paradox](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem). Why paradox? Because the relationship, when cast as a problem of unique birthdays in some number of people, is initially quite surprising. But nonetheless, the relationship exists, it is well-known, and `puid-js` will take care of the math for us.
At this point we can now note that rather than say "*these IDs have **N** bits of entropy*", we actually want to say "_generating **T** of these IDs has a risk **R** of a repeat_". And fortunately, `puid-js` allows straightforward specification of that very statement for random ID generation. Using `puid-js`, you can easily specify "*I want **T** random IDs with a risk **R** of repeat*". `puid-js` will take care of using the correct entropy in efficiently generating the IDs.
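That correlation can be sketched with the usual birthday-bound approximation (shown here for illustration; it is not the exact calculation `puid-js` performs): generating **T** IDs with a 1 in **R** risk of repeat needs roughly `2*log2(T) + log2(R) - 1` bits of entropy:

```js
// Approximate entropy bits needed so generating `total` IDs carries
// about a 1 in `risk` chance of any repeat (birthday bound)
const bitsForUniqueness = (total, risk) =>
  2 * Math.log2(total) + Math.log2(risk) - 1

bitsForUniqueness(1000, 1e15) // ≈ 68.8 bits
```

Rounded up to a whole number of characters at 6 bits per `Safe64` character, that specification lands on a 12-character, 72-bit ID.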
[TOC](#TOC)
The efficiency of generating random IDs has no bearing on the statistical characteristics of the IDs themselves.
As previously stated, random ID generation is basically a *transformation* of an entropy source into a character *representation* of captured entropy. But the entropy of the source and the entropy of the captured ID *are not the same thing*.
To understand the difference, we'll investigate an example that is, surprisingly, quite common. Consider the following strategy for generating random strings: using a fixed list of **k** characters, use a random uniform integer **i**, `0 <= i < k`, as an index into the list to select a character. Repeat this **n** times, where **n** is the length of the desired string. In JavaScript this might look like:
```js
const commonId = (n) => {
  const chars = 'abcdefghijklmnopqrstuvwxyz'
  let id = ''
  for (let i = 0; i < n; i++) {
    id += chars.charAt(Math.floor(Math.random() * chars.length))
  }
  return id
}

commonId(8)
// => an 8-character lower-case ID
```

First, consider the amount of source entropy used in the code above. The JavaScript `Math.random` function produces a double with 53 bits of precision, so each call consumes 53 bits of source entropy. Generating an 8-character ID therefore uses 8 * 53 = 424 bits of source entropy.
Second, consider how much entropy was captured by the ID. Given there are 26 characters, each character represents log<sub>2</sub>(26) = 4.7 bits of entropy. So each generated ID represents 8 * 4.7 = 37.6 bits of entropy.
Hmmm. That means the ratio of ID entropy to source entropy is 37.6 / 424 = 0.09, or a whopping **9%**. That's not an efficiency most developers would be comfortable with. Granted, this is a particularly egregious example, but most random ID generation suffers such inefficient use of source entropy.
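The arithmetic above, spelled out (assuming each `Math.random` call draws on the full 53 bits of double precision):

```js
const idBits = 8 * Math.log2(26)       // ≈ 37.6 bits captured by the 8-char ID
const sourceBits = 8 * 53              // 424 bits consumed by 8 Math.random calls
const efficiency = idBits / sourceBits // ≈ 0.09, i.e. roughly 9%
```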
Without delving into the specifics (see the code?), `puid-js` employs various means to maximize the use of source entropy. By comparison, `puid-js` uses **87.5%** of source entropy in generating random IDs using lower case alpha characters. For character sets with counts equal to a power of 2, `puid-js` uses 100% of source entropy.
#### Characters
As previously noted, the entropy of a random string is equal to the entropy per character times the length of the string. Using this value leads to an easy calculation of **entropy representation efficiency** (`ere`). We can define `ere` as the ratio of random string entropy to the number of bits required to represent the string. For example, the lower case alphabet has an entropy per character of 4.7, so an ID of length 8 using those characters has 37.6 bits of entropy. Since each lower case character requires 1 byte, this leads to an `ere` of 37.6 / 64 = 0.59, or 59%. Non-ascii characters, of course, occupy more than 1 byte. `puid-js` uses the `Buffer.byteLength` function to compute `ere`.
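As a sketch, the calculation amounts to the following (the `ere` helper below is hypothetical, not the `puid-js` API; `Buffer.byteLength` counts the UTF-8 bytes of the string):

```js
// Entropy representation efficiency: string entropy over the bits
// required to store the string
const ere = (id, bitsPerChar) =>
  (id.length * bitsPerChar) / (Buffer.byteLength(id) * 8)

ere('segments', Math.log2(26)) // ≈ 0.59 for an 8-char lower-case ID
```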
<a name="UUIDCharacters"></a>
The total entropy of a string is the product of the entropy per character times the string length *only* if each character in the final string is equally probable. This is always the case for `puid-js`, and is usually the case for other random string generators. There is, however, a notable exception: the version 4 string representation of a `uuid`. As defined in [RFC 4122, Section 4.4](https://tools.ietf.org/html/rfc4122#section-4.4), a v4 `uuid` uses a total of 32 hex and 4 hyphen characters. Although the hex characters can represent 4 bits of entropy each, 6 bits of the hex representation in a `uuid` are actually fixed, so there are only `32*4 - 6 = 122` bits of entropy (not 128). The 4 fixed-position hyphen characters contribute zero entropy. So a 36 character `uuid` has an `ere` of `122 / (36*8) = 0.42`, or **42%**. Compare that to, say, the default `puid-js` generator, which has slightly higher entropy (132 bits) and yet yields an `ere` of 0.75, or **75%**. Who doesn't love efficiency?
[TOC](#TOC)
Or why not be a bit more reasonable and think about the problem for a moment. Suppose you expect to generate 1000 IDs and want a 1 in a quadrillion risk of a repeat:

```js
const { generator: randId } = puid({ total: 1000, risk: 1e15 })
randId()
// => 'c1DVnnbI3RTr'
```
The resulting IDs have 72 bits of entropy. But guess what? You don't care. What you care about is that you explicitly stated you expect to have 1000 IDs with a repeat risk of 1 in a quadrillion. It's right there in the code. And as an added bonus, the IDs are only 12 characters long, not 36. Who doesn't like ease, control and efficiency?
#### Under specify
Now, suppose you are tasked to maintain this code:

```js
const { generator: randId } = puid({ total: 500000, risk: 1e12, chars: Chars.AlphaNumLower })
randId()
// => 'u4a4fbhhwlsikct'
```
Hmmm. Looks like there are 500,000 IDs expected and the repeat risk is 1 in a trillion. No guessing. The code is explicit. Oh, and by the way, the IDs are 15 characters long. But who cares? It's the ID randomness that matters, not the length.
[TOC](#TOC)
`Puid` employs a number of efficiencies for random ID generation:
- Only the bytes necessary to generate the next `puid` are fetched from the entropy source
- Each `puid` character is generated by slicing the minimum number of entropy bits possible
- Any left-over bits are carried forward and used in generating the next `puid`
- All characters are equally probable to maximize captured entropy
- Only characters that represent entropy are present in the final ID
### <a name="tl;dr"></a>tl;dr
`Puid` is a simple, flexible and efficient random ID generator:
- **Ease**
Maximum use of system entropy
- **Compact**
ID strings represent maximum entropy for characters used