Commit ed66f7d: README updates

1 parent ddc54df

2 files changed: +25 / -28 lines


CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@

  ### Added
  - Chars.Symbol and encoder
- - Encoders for Chars.AlphanumLower and Chars.ALphanumUpper
+ - Encoders for Chars.AlphanumLower and Chars.AlphanumUpper

  ### Testing
  - Test for above fixes, changes and additions

README.md

Lines changed: 24 additions & 27 deletions
@@ -68,7 +68,7 @@ Creating a random ID generator using `puid-js` is as simple as:

  `puid-js` uses `crypto.randomBytes` as the default entropy source. Options can be used to configure a specific entropy source:

  - `entropyBytes`: any function of the form `(n: number): Uint8Array`, such as `crypto.randomBytes`
- - `entropyValues` any function of the form `(buf: Uint8Array): void`, such as `crypto.getRandomValues`
+ - `entropyValues`: any function of the form `(buf: Uint8Array): void`, such as `crypto.getRandomValues`

  These options make it easy to use `puid-js` in either a `nodejs` or web environment. They also allow for using any suitable third-party entropy source. The project includes an experimental PRNG random bytes generator, `prngBytes`, and the test code uses the `entropyBytes` option for deterministic testing.
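
To make the two option shapes concrete, here is a minimal sketch; the `puid(config)` call returning `{ generator }` follows the usage shown earlier in the README and is assumed here, as is `crypto.webcrypto` for a web-style source in Node:

```js
// Sketch only: assumes puid(config) returns { generator } per the README usage.
const crypto = require('crypto')
const { puid } = require('puid-js')

// entropyBytes: crypto.randomBytes already matches (n: number) => Uint8Array
const { generator: nodeId } = puid({ entropyBytes: crypto.randomBytes })

// entropyValues: a (buf: Uint8Array) => void source such as getRandomValues
// (crypto.webcrypto is available in Node 15+; in browsers use crypto directly)
const { generator: webId } = puid({
  entropyValues: (buf) => crypto.webcrypto.getRandomValues(buf),
})

console.log(nodeId(), webId())
```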

@@ -163,14 +163,15 @@ The optional `PuidConfig` object has the following fields:

  - `entropyBytes` has the form of the function `crypto.randomBytes`
  - `entropyValues` has the form of the function `crypto.getRandomValues`
  - Only one of `entropyBytes` or `entropyValues` can be set
- - Defaults
-   - `bits`: 128
-   - `chars`: `Chars.Safe64`
-   - `entropyBytes`: `crypto.randomBytes`
+
+ ##### Defaults
+ - `bits`: 128
+ - `chars`: `Chars.Safe64`
+ - `entropyBytes`: `crypto.randomBytes`

  #### PuidInfo

- The HOF `puid` includes an `info` field that displays generator configuration:
+ The `puid` generator function includes an `info` field that displays generator configuration:

  - `bits`: ID entropy
  - `bitsPerChar`: Entropy bits per ID character
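
To illustrate the defaults and the `info` field just described, a hedged example follows; the `{ generator }` return shape and the exact `info` field values are assumptions for illustration, not output captured from the library:

```js
// Sketch: defaults (128 bits, Chars.Safe64) and the info field on the generator.
const { puid, Chars } = require('puid-js')

const { generator: randId } = puid()
console.log(randId.info)
// e.g. { bits: 132, bitsPerChar: 6, chars: '...Safe64 characters...', ... } (illustrative)

const { generator: lowerId } = puid({ bits: 96, chars: Chars.AlphaLower })
console.log(lowerId())  // 21 lower-case characters: ceil(96 / 4.7) = 21
```
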
@@ -210,10 +211,10 @@ There are 16 pre-defined character sets:

  | Alpha | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz |
  | AlphaLower | abcdefghijklmnopqrstuvwxyz |
  | AlphaUpper | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
- | Alphanum | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 |
- | AlphanumLower | abcdefghijklmnopqrstuvwxyz0123456789 |
- | AlphanumUpper | ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 |
- | Base32 | ABCDEFGHIJKLMNOPQRSTUVWXYZ234567 |
+ | Alphanum | 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz |
+ | AlphanumLower | 0123456789abcdefghijklmnopqrstuvwxyz |
+ | AlphanumUpper | 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ |
+ | Base32 | 234567ABCDEFGHIJKLMNOPQRSTUVWXYZ |
  | Base32Hex | 0123456789abcdefghijklmnopqrstuv |
  | Base32HexUpper | 0123456789ABCDEFGHIJKLMNOPQRSTUV |
  | Decimal | 0123456789 |
@@ -224,7 +225,7 @@ There are 16 pre-defined character sets:

  | Safe64 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_ |
  | Symbol | !#$%&()*+,-./:;<=>?@[]^_{\|}~ |

- Any string of up to 256 unique characters can be used for **`puid`** generation.
+ Any string of up to 256 unique characters, including Unicode, can be used for **`puid`** generation.

  [TOC](#TOC)
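
As a hedged illustration of the custom-characters line added above (the alphabets below are made-up examples; the `chars` option accepting any string of unique characters follows the `PuidConfig` description):

```js
// Sketch: custom character sets, ASCII and Unicode (example alphabets only).
const { puid } = require('puid-js')

const { generator: dingoskyId } = puid({ chars: 'dingosky' })    // 8 unique chars, 3 bits each
const { generator: unicodeId } = puid({ chars: 'dîñgøskyåéü' })  // Unicode characters work too

console.log(dingoskyId(), unicodeId())
```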

@@ -260,7 +261,7 @@ In this case, the entropy of the string **`'18f6303a'`** is 1 bit. That's it; 1

  > _**Entropy is a measure of the uncertainty in an event, independent of the representation of that uncertainty**_

- In information theory you would state the above process has two symbols, **`18f6303a`** and **`1`**, and the outcome is equally likely to be either symbol. Hence there is 1 bit of entropy in the process. The symbols don't really matter. It would be much more likely to see the symbols **`T`** and **`F`**, or **`0`** and **`1`**, or even **`ON`** and **`OFF`**, but regardless, the process _produces_ 1 bit of entropy and symbols used to _represent_ that entropy does not effect the entropy itself.
+ In information theory you would state the above process has two symbols, **`18f6303a`** and **`1`**, and the outcome is equally likely to be either symbol. Hence there is 1 bit of entropy in the process. The symbols don't really matter. It would be much more likely to see the symbols **`T`** and **`F`**, or **`0`** and **`1`**, or even **`ON`** and **`OFF`**, but regardless, the process _produces_ 1 bit of entropy and the symbols used to _represent_ that entropy do not affect the entropy itself.

  #### Entropy source

@@ -286,7 +287,7 @@ The goal of `puid-js` is to provide simple, intuitive random ID generation using

  > _**Random strings do not produce unique IDs**_

- Recall that entropy is the measure of uncertainty in the possible outcomes of an event. It is critical that the uncertainty of each event is *independent* of all prior events. This means two separate events *can* produce the same result (i.e., the same ID); otherwise the process isn't random. You could, of course, compare each generated random string with all prior IDs and thereby achieve uniqueness. But some such post-processing must occur to ensure random IDs are truly unique.
+ Recall that entropy is the measure of uncertainty in the possible outcomes of an event. It is critical that the uncertainty of each event is *independent* of all prior events. This means two separate events *can* produce the same result (i.e., the same ID); otherwise the process isn't random. You could, of course, compare each generated random string to all prior IDs and thereby achieve uniqueness. But some such post-processing must occur to ensure random IDs are truly unique.

  Deterministic uniqueness checks, however, incur significant processing overhead and are rarely used. Instead, developers (knowingly?) relax the requirement that random IDs be truly, deterministically unique to a much lesser standard, one of probabilistic uniqueness. We "trust" that randomly generated IDs are unique by virtue of the chance of a repeated ID being very low.
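
A short sketch of the post-processing described above; `randomId` is any hypothetical ID generator and the `Set` lookup is just one way to perform the check:

```js
// Hypothetical deterministic-uniqueness wrapper: compare each new ID to all prior IDs.
const issued = new Set()

const uniqueId = (randomId) => {
  let id = randomId()
  while (issued.has(id)) id = randomId()  // regenerate on a repeat
  issued.add(id)
  return id
}
```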

@@ -308,7 +309,7 @@ And here again we hit another subtlety. It turns out the question, as posed, is

  Fortunately, there is a mathematical correlation between entropy and the probability of uniqueness. This correlation is often explored via the [Birthday Paradox](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem). Why paradox? Because the relationship, when cast as a problem of unique birthdays in some number of people, is initially quite surprising. But nonetheless, the relationship exists, it is well-known, and `puid-js` will take care of the math for us.

- At this point we can now note that rather than say "*these IDs have **N** bits of entropy*", we actually want to say "_generating **T** of these IDs has a risk **R** of a repeat_". And fortunately, `puid-js` allows straightforward specification of that very statement for our random IDs. Using `puid-js`, you can easily specify "*I want **T** random IDs with a risk **R** of repeat*". `puid-js` will take care of using the correct entropy in efficiently generating the IDs.
+ At this point we can now note that rather than say "*these IDs have **N** bits of entropy*", we actually want to say "_generating **T** of these IDs has a risk **R** of a repeat_". And fortunately, `puid-js` allows straightforward specification of that very statement for random ID generation. Using `puid-js`, you can easily specify "*I want **T** random IDs with a risk **R** of repeat*". `puid-js` will take care of using the correct entropy in efficiently generating the IDs.

  [TOC](#TOC)
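
The "**T** IDs with repeat risk **R**" relationship can be sketched with the usual birthday-problem approximation; this is illustrative arithmetic, not the library's internal calculation:

```js
// Approximate bits of entropy needed so that `total` IDs carry a 1-in-`risk`
// chance of any repeat (birthday-bound approximation).
const bitsForTotalRisk = (total, risk) =>
  Math.log2(total) + Math.log2(total - 1) + Math.log2(risk) - 1

bitsForTotalRisk(1e3, 1e15)  // ≈ 68.8 bits; 12 Safe64 characters (72 bits) cover it
bitsForTotalRisk(5e5, 1e12)  // ≈ 76.7 bits
```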

@@ -320,7 +321,7 @@ The efficiency of generating random IDs has no bearing on the statistical charac

  As previously stated, random ID generation is basically a *transformation* of an entropy source into a character *representation* of captured entropy. But the entropy of the source and the entropy of the captured ID *are not the same thing*.

- To understand the difference, we'll investigate an example that is, surprisingly, quite common. Consider the following strategy for generating random strings: using a fixed list of **k** characters, generate a random uniform integer **i**, `0 <= i < k`, as an index into the list to select a character. Repeat this **n** times, where **n** is the length of the desired string. In JavaScript this might look like:
+ To understand the difference, we'll investigate an example that is, surprisingly, quite common. Consider the following strategy for generating random strings: using a fixed list of **k** characters, use a random uniform integer **i**, `0 <= i < k`, as an index into the list to select a character. Repeat this **n** times, where **n** is the length of the desired string. In JavaScript this might look like:

  ```js
  const commonId = (n) => {
@@ -340,17 +341,17 @@ First, consider the amount of source entropy used in the code above. The JavaScr

  Second, consider how much entropy was captured by the ID. Given there are 26 characters, each character represents log<sub>2</sub>(26) = 4.7 bits of entropy. So each generated ID represents 8 * 4.7 = 37.6 bits of entropy.

- Hmmmm. That means the ratio of ID entropy to source entropy is 37.6 / 424 = 0.09, or a whopping **9%**. That's not an efficiency most developers would be comfortable with. Granted, this is a particularly egregious example, but most random ID generation suffers such inefficient use of source entropy.
+ Hmmm. That means the ratio of ID entropy to source entropy is 37.6 / 424 = 0.09, or a whopping **9%**. That's not an efficiency most developers would be comfortable with. Granted, this is a particularly egregious example, but most random ID generation suffers such inefficient use of source entropy.

- Without delving into the specifics (see the code?), `puid-js` employs various means to maximize the use of source entropy. In comparison, `puid-js` uses **87.5%** of source entropy in generating random IDs using lower case alpha characters. For character sets with counts equal a power of 2, `puid-js` uses 100% of source entropy.
+ Without delving into the specifics (see the code?), `puid-js` employs various means to maximize the use of source entropy. As a comparison, `puid-js` uses **87.5%** of source entropy in generating random IDs using lower case alpha characters. For character sets with counts equal to a power of 2, `puid-js` uses 100% of source entropy.

  #### Characters

- As previous noted, the entropy of a random string is equal to the entropy per character times the length of the string. Using this value leads to an easy calculation of **entropy representation efficiency** (`ere`). We can define `ere` as the ratio of random string entropy to the number of bits required to represent the string. As example, the lower case alphabet has an entropy per character of 4.7, so an ID of length 8 using those characters has 37.6 bits of entropy. Since each lower case character requires 1 byte, this leads to an `ere` of 37.6 / 64 = 0.59, or 59%. Non-ascii characters, of course, occupy more than 1 byte. `puid-js` uses the `Buffer.byteLength` function to compute `ere`.
+ As previously noted, the entropy of a random string is equal to the entropy per character times the length of the string. Using this value leads to an easy calculation of **entropy representation efficiency** (`ere`). We can define `ere` as the ratio of random string entropy to the number of bits required to represent the string. For example, the lower case alphabet has an entropy per character of 4.7, so an ID of length 8 using those characters has 37.6 bits of entropy. Since each lower case character requires 1 byte, this leads to an `ere` of 37.6 / 64 = 0.59, or 59%. Non-ASCII characters, of course, occupy more than 1 byte. `puid-js` uses the `Buffer.byteLength` function to compute `ere`.

  <a name="UUIDCharacters"></a>

- The total entropy of a string is the product of the entropy per character times the string length *only* if each character in the final string is equally probable. This is always the case for `puid-js`, and is usually the case for other random string generators. There is, however, a notable exception: the version 4 string representation of a `uuid`. As defined in [RFC 4122, Section 4.4](https://tools.ietf.org/html/rfc4122#section-4.4), a v4 `uuid` uses a total of 32 hex and 4 hyphen characters. Although the hex characters can represent 4 bits of entropy each, 6 bits of the hex representation in a `uuid` are actually fixed, so there is only `32*4 - 6 = 122`-bits of entropy (not 128). The 4 fixed-position hyphen characters contribute zero entropy. So a 36 character `uuid` has an `ere` of `122 / (36*8) = 0.40`, or **40%**. Compare that to, say, the default `puid-js` generator, which has slightly higher entropy (128 bits) and yet yields an `ere` of 0.75, or **75%**. Who doesn't love efficiency?
+ The total entropy of a string is the product of the entropy per character times the string length *only* if each character in the final string is equally probable. This is always the case for `puid-js`, and is usually the case for other random string generators. There is, however, a notable exception: the version 4 string representation of a `uuid`. As defined in [RFC 4122, Section 4.4](https://tools.ietf.org/html/rfc4122#section-4.4), a v4 `uuid` uses a total of 32 hex and 4 hyphen characters. Although the hex characters can represent 4 bits of entropy each, 6 bits of the hex representation in a `uuid` are actually fixed, so there is only `32*4 - 6 = 122` bits of entropy (not 128). The 4 fixed-position hyphen characters contribute zero entropy. So a 36 character `uuid` has an `ere` of `122 / (36*8) = 0.42`, or **42%**. Compare that to, say, the default `puid-js` generator, which has slightly higher entropy (132 bits) and yet yields an `ere` of 0.75, or **75%**. Who doesn't love efficiency?

  [TOC](#TOC)
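
The efficiency figures above reduce to a few lines of arithmetic; the 53-bit figure per `Math.random()` call is the double-precision assumption behind the 424-bit total, and the numbers are purely illustrative:

```js
// Source entropy vs. captured entropy for the commonId example.
const sourceBits = 8 * 53                      // 8 Math.random() calls × ~53 bits = 424
const idBits = 8 * Math.log2(26)               // 8 chars from a 26-char alphabet ≈ 37.6
const efficiency = idBits / sourceBits         // ≈ 0.09, the "whopping 9%"

// Entropy representation efficiency (ere) for the same 8-char lower-alpha ID.
const ere = idBits / (8 * 8)                   // 37.6 / 64 ≈ 0.59

// And for a v4 uuid string: 122 bits of entropy across 36 single-byte characters.
const uuidEre = 122 / (36 * 8)                 // ≈ 0.42
```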

@@ -371,7 +372,7 @@ Or why not be a bit more reasonable and think about the problem for a moment. Su

  // => 'c1DVnnbI3RTr'
  ```

- The resulting ID have 72 bits of entropy. But guess what? You don't care. What you care is having explicitly stated you expect to have 1000 IDs and your level of repeat risk is 1 in a quadrillion. And as added bonus, the IDs are only 12 characters long, not 36. Who doesn't like ease, control and efficiency?
+ The resulting IDs have 72 bits of entropy. But guess what? You don't care. What you care about is having explicitly stated you expect to have 1000 IDs and your level of repeat risk is 1 in a quadrillion. It's right there in the code. And as an added bonus, the IDs are only 12 characters long, not 36. Who doesn't like ease, control and efficiency?

  #### Under specify
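
The call that produced the 12-character ID above is not shown in this hunk; the following is only a hedged sketch of the kind of specification the paragraph describes, with `total` and `risk` as assumed `PuidConfig` option names:

```js
// Sketch only: total/risk option names assumed from the surrounding discussion.
const { puid } = require('puid-js')

// "I want 1000 IDs with a 1-in-a-quadrillion risk of a repeat"
const { generator: safeId } = puid({ total: 1000, risk: 1e15 })
safeId()  // a 12-character Safe64 ID, e.g. 'c1DVnnbI3RTr'
```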

@@ -388,7 +389,7 @@ Now, suppose you are tasked to maintain this code:

  // => 'u4a4fbhhwlsikct'
  ```

- Hmmm. Looks like there are 500,000 IDs expected and the repeat risk is 1 in a trillion. No guessing. It's right there in the code. Oh, and by the way, the IDs are 15 characters long. But who cares? It's the ID randomness that matters, not the length.
+ Hmmm. Looks like there are 500,000 IDs expected and the repeat risk is 1 in a trillion. No guessing. The code is explicit. Oh, and by the way, the IDs are 15 characters long. But who cares? It's the ID randomness that matters, not the length.

  [TOC](#TOC)

@@ -397,7 +398,7 @@ Hmmm. Looks like there are 500,000 IDs expected and the repeat risk is 1 in a tr

  `Puid` employs a number of efficiencies for random ID generation:

  - Only the bytes necessary to generate the next `puid` are fetched from the entropy source
- - Each `puid` character is generated by slicing the minimum number of bits possible
+ - Each `puid` character is generated by slicing the minimum number of entropy bits possible
  - Any left-over bits are carried forward and used in generating the next `puid`
  - All characters are equally probable to maximize captured entropy
  - Only characters that represent entropy are present in the final ID
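
A rough illustration of the first two bullets (the arithmetic is illustrative; the actual slicing and carry logic is internal to `puid-js`):

```js
// Bytes fetched for one 12-character Safe64 ID: only what the ID needs.
const bitsPerChar = Math.log2(64)                            // 6 bits per Safe64 character
const idLength = 12
const bytesNeeded = Math.ceil((idLength * bitsPerChar) / 8)  // 9 bytes, not a fixed 16
```
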
@@ -407,7 +408,7 @@ Hmmm. Looks like there are 500,000 IDs expected and the repeat risk is 1 in a tr

  ### <a name="tl;dr"></a>tl;dr

- `Puid` is a simple, fast, flexible and efficient random ID generator:
+ `Puid` is a simple, flexible and efficient random ID generator:

  - **Ease**

@@ -425,10 +426,6 @@ Hmmm. Looks like there are 500,000 IDs expected and the repeat risk is 1 in a tr

    Maximum use of system entropy

- - **Secure**
-
-   Defaults to a secure source of entropy
-
  - **Compact**

    ID strings represent maximum entropy for characters used
