Integer representation of K-mers

This package relies on representing K-mers as integers for indexing.

For DNA, each non-ambiguous nucleotide is assigned a number between 0 and 3:

| Nucleotide | Base-4 | Base-2 |
|------------|--------|--------|
| A          | 0      | 00     |
| C          | 1      | 01     |
| G          | 2      | 10     |
| T          | 3      | 11     |

Any ordering works, but this is the one used by BioSequences.jl. It also has some nice properties: the bases are in alphabetical order, and XOR-ing a base's value with 3 gives the value of its complement.
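The complement property can be checked with a minimal sketch in plain Julia. The names `BASES`, `base_code`, and `complement_code` are hypothetical helpers written for this illustration; they are not part of BioSequences.jl or of this package.

```julia
# Hypothetical helpers for illustration only.
# The position in this vector, minus 1, is the code from the table above.
const BASES = ['A', 'C', 'G', 'T']

# Map a non-ambiguous nucleotide to its 2-bit code (A=0, C=1, G=2, T=3).
function base_code(c::Char)
    i = findfirst(==(c), BASES)
    i === nothing && throw(ArgumentError("ambiguous or invalid nucleotide: $c"))
    return UInt8(i - 1)
end

# XOR with 3 (0b11) flips both bits, which swaps A<->T and C<->G.
complement_code(code::UInt8) = xor(code, 0x03)

for c in BASES
    comp = BASES[complement_code(base_code(c)) + 1]
    println(c, " -> ", comp)
end
```

The loop prints each base next to its complement, so A pairs with T and C pairs with G.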

We could theoretically convert any DNA sequence to an integer, but each base takes 2 bits, so a 64-bit unsigned integer limits us to 32-mers.
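The 32-mer limit follows from simple arithmetic on the bit width. As a rough sketch, with `max_k` being a hypothetical helper name used only here:

```julia
# Hypothetical helper: each non-ambiguous base takes 2 bits,
# so an unsigned type with n bits can hold n ÷ 2 bases.
max_k(::Type{T}) where {T<:Unsigned} = (8 * sizeof(T)) ÷ 2

for T in (UInt8, UInt16, UInt32, UInt64)
    println(T, " holds up to ", max_k(T), "-mers")
end

# The largest 32-mer value (all T) sets every bit of a UInt64.
@assert big(4)^32 - 1 == typemax(UInt64)
```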

Consider the DNA sequence GATTACA. If we replace each base with its base-4 digit from the table above and read the result as a single number, we get $2033010_4 = 10001111000100_2 = 9156_{10}$, so the integer value of GATTACA is 9156. Since Julia uses 1-based indexing, we add 1 to this value to get the position of the entry for GATTACA in a vector indexed by K-mer, which here is 9157.
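The GATTACA conversion can be reproduced with a short sketch in plain Julia. The function `kmer_to_integer` and the `counts` vector are hypothetical, written for this illustration under the encoding in the table; they are not this package's API.

```julia
# Hypothetical helper for illustration only.
function kmer_to_integer(seq::AbstractString)
    length(seq) <= 32 || throw(ArgumentError("a UInt64 holds at most 32 bases"))
    value = UInt64(0)
    for c in seq
        code = c == 'A' ? 0x00 :
               c == 'C' ? 0x01 :
               c == 'G' ? 0x02 :
               c == 'T' ? 0x03 :
               throw(ArgumentError("ambiguous or invalid nucleotide: $c"))
        # Shift the bases read so far two bits left, then append the new base.
        value = (value << 2) | code
    end
    return value
end

value = kmer_to_integer("GATTACA")
println(Int(value))               # 9156
println(string(value, base = 4))  # 2033010
println(string(value, base = 2))  # 10001111000100

# One slot per possible 7-mer (4^7 = 16384 entries).
counts = zeros(Int, 4^7)
counts[value + 1] += 1            # GATTACA lives at index 9157
```

Shifting by two bits per base is the same as multiplying by 4, so the loop is evaluating the base-4 number digit by digit from left to right.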