Generating Random Numbers under a Maximum

Recently I asked a question about generating random numbers.

My first approach to generating random numbers lower than 1000 was to take ten random bits (which can represent 0..1023).

If that random number was less than 1000, then that was the result. If it was not, I would repeat the process until the ten bits yielded a result less than 1000. (Reusing any of those ten bits introduces bias.) This process is known as sample-and-reject.

Stef pointed out that if the result was greater than 999, then we had log2(24) bits of unused entropy.

This was an awkward amount of entropy to reuse, so I started to think if I could avoid taking it in the first place….

The algorithm below is an entropy efficient way of generating unbiased random numbers, less than a maximum value “max”, from a stream of random bits.

It is more entropy-efficient than sample-and-reject if and only if “max” has a factor which is a positive power of two.

It makes use of the fact that if the maximum has a factor which is a positive power of two, then we can decide whether a number will be too big by taking a number of random bits which are fewer than the length of the maximum (in binary). The difference in the numbers of bits is the exponent in the maximum’s power-of-two factor.

Example:

If max is 1000, it has a factor of two-to-the-power-of-three (8). 1000 is a ten bit number. But we only need to take seven bits to decide if a number 0..1023 is less than 1000.

        two_power == 3    log2(8)
             bits == 10   log2(1024)
 significant_bits == 7    10 - 3

We first generate 7 bits and shift them left three places, to give us:

                r == ..._000000??_?????000

We compare this with 1000:

              max == ..._00000011_11101000

So long as r >= max, we have to try again with another 7 bits.

Once r < max, we fill in the final three bits of precision:

                r == ..._000000??_????????

Entropy Efficient Algorithm for Generating Random Numbers under a Maximum

const BITS_IN_U64: u32 = 64;

fn random_under_max(max: u64) -> u64 {
    let mut rng = rand::thread_rng();

    // How many bits do we need?
    let two_power = max.trailing_zeros();
    let bits = BITS_IN_U64 - max.leading_zeros();
    let significant_bits = bits - two_power;

    let mut r = u64::MAX;

    while r >= max {    
        // First collect the minimal amount of random bits which
        // will let us decide whether the random number is going to be less
        // than max.
        r = 0;
        for _ in 0..significant_bits {
            // Generate one random bit
            let b: bool = rng.gen();

            // and shift it into our imprecise random number.
            r = r << 1 | b as u64;
        }

        // Shift the inexact random number into the correct position.
        r = r << two_power;
    }

    // Now fill-in the precise value of this random number.
    r = r >> two_power; // Scrub out the insignificant zeros.
    for _ in 0..two_power {
        // Generate one random bit
        let b: bool = rng.gen();

        // and shift it into our random number.
        r = r << 1 | b as u64;
    }

    // Return the generated number.
    r
}

Installing Python 2.7 on Debian 12 (Bookworm)

There is no Python 2 or Python 2.7 package in Debian 12 (Bookworm).

You can successfully install it from the Debian 11 (Bullseye) packages, but it has several dependecies which are also not provided by Debian 12 (Bookworm).

Note: I have been warned that this will create a Frankendebian. It is unlikely that Debian will release clashing Debian 12 (Bookworm) versions of libffi7, libssl1.1 or the python2.7 packages. Follow these instructions at your own risk and remove the packages as soon as you have finished using python2.7 and before upgrading to Debian 13.

Download these packages:

Install them with sudo dpkg -i <filenames>.

fadedbee@box:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

fadedbee@box:~$ python2.7
bash: python2.7: command not found

fadedbee@box:~$ cd ~/Downloads/

fadedbee@box:~/Downloads$ ls *.deb
code_1.85.0-1701902998_amd64.deb                 
libssl1.1_1.1.1w-0+deb11u1_amd64.deb
libffi7_3.3-6_amd64.deb                          
python2.7_2.7.18-8+deb11u1_amd64.deb
libpython2.7-minimal_2.7.18-8+deb11u1_amd64.deb  
python2.7-minimal_2.7.18-8+deb11u1_amd64.deb
libpython2.7-stdlib_2.7.18-8+deb11u1_amd64.deb

fadedbee@box:~/Downloads$ sudo dpkg -i libffi7_3.3-6_amd64.deb libssl1.1_1.1.1w-0+deb11u1_amd64.deb libpython2.7-minimal_2.7.18-8+deb11u1_amd64.deb python2.7-minimal_2.7.18-8+deb11u1_amd64.deb libpython2.7-stdlib_2.7.18-8+deb11u1_amd64.deb python2.7_2.7.18-8+deb11u1_amd64.deb 
[sudo] password for fadedbee: 
Selecting previously unselected package libffi7:amd64.
(Reading database ... 241776 files and directories currently installed.)
Preparing to unpack libffi7_3.3-6_amd64.deb ...
Unpacking libffi7:amd64 (3.3-6) ...
Selecting previously unselected package libssl1.1:amd64.
Preparing to unpack libssl1.1_1.1.1w-0+deb11u1_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1w-0+deb11u1) ...
Selecting previously unselected package libpython2.7-minimal:amd64.
Preparing to unpack libpython2.7-minimal_2.7.18-8+deb11u1_amd64.deb ...
Unpacking libpython2.7-minimal:amd64 (2.7.18-8+deb11u1) ...
Selecting previously unselected package python2.7-minimal.
Preparing to unpack python2.7-minimal_2.7.18-8+deb11u1_amd64.deb ...
Unpacking python2.7-minimal (2.7.18-8+deb11u1) ...
Selecting previously unselected package libpython2.7-stdlib:amd64.
Preparing to unpack libpython2.7-stdlib_2.7.18-8+deb11u1_amd64.deb ...
Unpacking libpython2.7-stdlib:amd64 (2.7.18-8+deb11u1) ...
Selecting previously unselected package python2.7.
Preparing to unpack python2.7_2.7.18-8+deb11u1_amd64.deb ...
Unpacking python2.7 (2.7.18-8+deb11u1) ...
Setting up libffi7:amd64 (3.3-6) ...
Setting up libssl1.1:amd64 (1.1.1w-0+deb11u1) ...
Setting up libpython2.7-minimal:amd64 (2.7.18-8+deb11u1) ...
Setting up python2.7-minimal (2.7.18-8+deb11u1) ...
Linking and byte-compiling packages for runtime python2.7...
Setting up libpython2.7-stdlib:amd64 (2.7.18-8+deb11u1) ...
Setting up python2.7 (2.7.18-8+deb11u1) ...
Processing triggers for libc-bin (2.36-9+deb12u3) ...
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for gnome-menus (3.36.0-1.1) ...
Processing triggers for desktop-file-utils (0.26-1) ...
Processing triggers for mailcap (3.70+nmu1) ...

fadedbee@box:~/Downloads$ python2.7
Python 2.7.18 (default, Sep 19 2023, 07:10:59) 
[GCC 10.2.1 20210110] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

CSS Colours and Fenceposts

In CSS, RGB colours can be written as #rgb or #rrggbb, where r, g and b are hex digits.

The digits in the #rgb format map values of 0-15 to the range 0-255 in such a way that 0 maps to 0 and 15 maps to 255.

This is a fencepost system, where each of the smaller values is a post.

It happens that 16 posts and 15 rails can fill the interval 0-255, with each rail having a length of 16.

0   1   2   3         12  13  14  15
|---|---|---|-... ...-|---|---|---|
0   17  34  51        209 226 243 255

Any number in the small ranges maps to one 17 times larger in the large range. The smallest and largest numbers map to the smallest and largest in the large range and all are equally spaced.

In hex, the relation is trivial:

0   1   2   3         C   D   E   F
|---|---|---|-... ...-|---|---|---|
0   11h 22h 33h       CC  DD  EE  FF

Cause

First I thought that this worked because both 16 and 256 are powers of two.

This was quickly disproved as there’s no post-and-rail way of mapping 16 onto ranges of 32, 64 or 128 values. (4,096, 65,536 and higher are possible.)

After some thought, it can be seen that any small range can neatly map onto any larger range whose size is an integer power of the size of the small range.

In CSS colours, these are 16 and 16-squared (256).

Other examples

5 and 5-squared (25):

0   1   2   3   4
|---|---|---|---|
0   6   12  18  24

3 and 3-cubed (27):

0   1   2
|---|---|
0   13  26

Size of the multiplier

If the size of the small range is S, and the larger range is S**N, then the multiplier to map between the ranges is:

 N
___
\   S**n
/__
n=0

This sum corresponds exactly to repeating digits in base S.

e.g. The 3-cubed example can be written in base-3 as:

0   1   2
|---|---|
0   111 222

Auto-completion

I’ve recently started a little project called ‘completion’. It’s purpose is to allow auto-completion of user input, according to a grammar. It’s readily understandable, if you already use Pyparsing. I’d like to merge this functionality into Pyparsing at some point.

Here’s the alpha code, with explanatory doctests:

#!/usr/bin/python
"""
Completion is a module for performing autocompletion of text according to a 
programatically expressed grammar.
 
If you don't get any of that ;-) have a look at the doctests below...
 
>>> foo = Literal('foo')
>>> print foo.parse('f')
ParseResult(False, input=None, completions=['oo'], message=None)
 
>>> foo = Literal('foo')
>>> print foo.parse('ba')
ParseResult(False, input=None, completions=[], message=None)
 
>>> foobar = Literal('foobar')
>>> food = Literal('food')
>>> foobar_or_food = Or(foobar, food)
>>> print foobar_or_food.parse('f')
ParseResult(False, input=None, completions=['oobar', 'ood'], message=None)
>>> print foobar_or_food.parse('g')
ParseResult(False, input=None, completions=[], message=None)
>>> print foobar_or_food.parse('food')
ParseResult(True, input=, completions=[], message=None)
>>> print foobar_or_food.parse('foo')
ParseResult(False, input=None, completions=['bar', 'd'], message=None)
 
>>> foobar_then_food = Then(foobar, WhiteSpace(), food)
>>> print foobar_then_food.parse('fo')
ParseResult(False, input=None, completions=['obar'], message=None)
>>> print foobar_then_food.parse('')
ParseResult(False, input=None, completions=['foobar'], message=None)
>>> print foobar_then_food.parse('foobar f')
ParseResult(False, input=None, completions=['ood'], message=None)
>>> print foobar_then_food.parse('foobar')
ParseResult(False, input=, completions=[' ', '\\t'], message=None)
 
>>> optional_foo = Occurances(foo, min=0, max=1)
>>> optional_foo.parse("")
ParseResult(True, input=None, completions=[], message=None)
>>> optional_foo.parse("fo")
ParseResult(True, input=None, completions=[], message=None)
 
>>> three_or_four_foos = Occurances(foo, min=3, max=4)
>>> three_or_four_foos.parse("")
ParseResult(False, input=None, completions=['foo'], message=None)
>>> three_or_four_foos.parse("foof")
ParseResult(False, input=None, completions=['oo'], message=None)
 
>>> name = Alphas()
>>> name.parse("")
ParseResult(False, input=, completions=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], message=None)
>>> name.parse("abc")
ParseResult(True, input=, completions=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'], message=None)
 
>>> bar = Literal('bar')
>>> complex = Then(name, WhiteSpace(), bar)
>>> complex.parse('f')
ParseResult(False, input=, completions=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ', '\\t'], message=None)
>>> complex = Or(Or(foo, food, foobar), Then(name, WhiteSpace(), bar))
>>> complex.parse('f')
ParseResult(False, input=None, completions=['oo', 'ood', 'oobar', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ', '\\t'], message=None)
 
"""
 
import sys
 
class ParseResult:
    def __init__(self, success, input=None, completions=[], message=None):
        self.success = success
        self.input = input
        self.completions = completions
        self.message = message
 
    def __repr__(self):
        return "ParseResult(%s, input=%s, completions=%s, message=%s)" % (
            str(self.success),
            str(self.input),
            str(self.completions),
            str(self.message),
        )
 
class Parser:
    def parse(self, input):
        pass
 
class Literal(Parser):
    def __init__(self, text):
        self.text = text
 
    def parse(self, input):
        if input.startswith(self.text):
            return ParseResult(True, input[len(self.text):])
        # if the input isn't at the beginning of self.text, then this 
        # cannot be autocompleted
        if not self.text.startswith(input):
            return ParseResult(False)
        return ParseResult(False, completions=[self.text[len(input):]])
 
class Occurances(Parser):
    def __init__(self, parser, min=0, max=1):
        self.parser = parser
        if min < 0:
            raise Exception("Min must not be negative.")
        if max < min:
            raise Exception("Max must not be less than min.")
        if max < 1:
            raise Exception("Max must be more than zero.")
        self.min = min
        self.max = max
 
    def parse(self, input):
        # see how many times parser can be applied, up to max
        for i in range(self.max):
            parse_result = self.parser.parse(input)
            input = parse_result.input
            if not parse_result.success:
                break
        #print i, self.min
        if i < self.min and self.min > 0:
            return parse_result
        return ParseResult(True, input)
 
 
class Chars(Parser):
    def __init__(self, chars):
        self.chars = chars
 
    def parse(self, input):
        if len(input) == 0:
            return ParseResult(False, input, list(self.chars))
        if input[0] not in self.chars:
            return ParseResult(False, input)
        while len(input) > 0 and input[0] in self.chars:
            input = input[1:]
        if len(input) == 0:
            return ParseResult(True, input, list(self.chars))
        return ParseResult(True, input)
 
class WhiteSpace(Chars):
    def __init__(self):
        self.chars = ' \t'
 
class Alphas(Chars):
    def __init__(self):
        self.chars = 'abcdefghijklmnopqrstuvwxyz'
 
class Alphanums(Chars):
    def __init__(self):
        self.chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
 
class Then(Parser):
    def __init__(self, *parsers):
        self.parsers = parsers
 
    def parse(self, input):
        # try all the parsers in order, until one fails
        completions = []
        for parser in self.parsers:
            parse_result = parser.parse(input)
            completions += parse_result.completions
            if not parse_result.success:
                # add the completions for successful parsings which can take more
                parse_result.completions = completions
                return parse_result
            input = parse_result.input
        return ParseResult(True, input)
 
class Or(Parser):
    def __init__(self, *parsers):
        self.parsers = parsers
 
    def parse(self, input):
        # try all the parsers and see whether any work
        parse_results = map(lambda x: x.parse(input), self.parsers)
 
        # were any successful?
        successful_result = None
        completions = []
        for parse_result in parse_results:
            if parse_result.success and not successful_result:
                # first matched
                successful_result = parse_result
            completions += parse_result.completions
 
        if successful_result:
            return successful_result
 
        # otherwise return all the completions
        return ParseResult(False, None, completions) 
 
 
 
def _test():
    """Test with doctest."""
    import doctest
    doctest.testmod()
 
 
if __name__ == "__main__":
    _test()

Patch creation hints.

I have a mental block on how to create and apply a patch. I’m forever needing to go back to the man pages, which is frustrating.

Here are the commands which I use:

    diff -ruNx .svn olddir newdir > patch
    cd targetdir ; patch -p1 < ../patch

I run the diff from the comment parent directory of the two trees I wish to diff. The .svn parameter is the name of subdirectories I do not want included in the diff. (I use subversion, if you hadn’t guessed.) The -p1, on the patch command, causes the first component of the path (olddir/newdir) to be ignored.