
<h3>Code structure as of 7/20/2012</h3>

written by Jim.
<p>
Ok -- once again, I have messed with the structure.  My goal is a structure that is flexible and efficient.
It's similar to the stuff before, but better because it makes things like Euclid's
method much cleaner.
<p>
I think we're ready to hack.
<p>
<p>
<hr>
<h3>Files</h3>
<UL>
<LI> <a href=GNUmakefile><b>GNUmakefile</b></a>: Makefile
<LI> <a href=README><b>README</b></a>: Empty readme
<LI> <a href=explanation.html><b>explanation.html</b></a>: This file.
<LI> <a href=gf.c><b>gf.c</b></a>: Main gf routines
<LI> <a href=gf.h><b>gf.h</b></a>: Main gf prototypes and typedefs
<LI> <a href=gf_int.h><b>gf_int.h</b></a>: Prototypes and typedefs for common routines for the
    internal gf implementations.
<LI> <a href=gf_method.c><b>gf_method.c</b></a>: Code to help parse argc/argv to define the method.
    This way, various programs can be consistent with how they handle the command line.
<LI> <a href=gf_method.h><b>gf_method.h</b></a>: Prototypes for <b>gf_method.c</b>.
<LI> <a href=gf_methods.c><b>gf_methods.c</b></a>: This program prints out how to define
    the various methods on the command line.  My idea is to beef this up so that you can
    give it a method spec on the command line, and it will tell you whether it's valid, or
    why it's invalid.  I haven't written that part yet.
<LI> <a href=gf_mult.c><b>gf_mult.c</b></a>: Program to do single multiplication.
<LI> <a href=gf_div.c><b>gf_div.c</b></a>: Program to do single divisions -- it's created
    in the makefile with a sed script on gf_mult.c.
<LI> <a href=gf_time.c><b>gf_time.c</b></a>: Time tester
<LI> <a href=gf_unit.c><b>gf_unit.c</b></a>: Unit tester
<LI> <a href=gf_54.c><b>gf_54.c</b></a>: A simple example program that multiplies
    5 and 4 in GF(2^4).
<LI> <a href=gf_w4.c><b>gf_w4.c</b></a>: Implementation of code for <i>w</i> = 4.
(For now, only SHIFT and LOG, plus EUCLID & MATRIX).
<LI> <a href=gf_w8.c><b>gf_w8.c</b></a>: Implementation of code for <i>w</i> = 8.
(For now, only SHIFT plus EUCLID & MATRIX).
<LI> <a href=gf_w16.c><b>gf_w16.c</b></a>: Implementation of code for <i>w</i> = 16.
(For now, only SHIFT plus EUCLID & MATRIX).
<LI> <a href=gf_w32.c><b>gf_w32.c</b></a>: Implementation of code for <i>w</i> = 32.
(For now, only SHIFT plus EUCLID & MATRIX).
<LI> <a href=gf_w64.c><b>gf_w64.c</b></a>: Implementation of code for <i>w</i> = 64.
(For now, only SHIFT and EUCLID).
<LI> I don't have gf_w128.c or gf_gen.c yet.
</UL>

<hr>
<h3>Prototypes and typedefs in gf.h</h3>

The main structure that users will see is in <b>gf.h</b>, and it is of type
<b>gf_t</b>:

<p><center><table border=3 cellpadding=3><td><pre>
typedef struct gf {
  gf_func_a_b    multiply;
  gf_func_a_b    divide;
  gf_func_a      inverse;
  gf_region      multiply_region;
  void           *scratch;
} gf_t;
</pre></td></table></center><p>

We can beef it up later with buf-buf or buf-acc.  The problem is that the paper is
already bloated, so right now, I want to keep it lean.
<p>
The types of the procedures are big unions, so that they work with the following
types of arguments:

<p><center><table border=3 cellpadding=3><td><pre>
typedef uint8_t     gf_val_4_t;
typedef uint8_t     gf_val_8_t;
typedef uint16_t    gf_val_16_t;
typedef uint32_t    gf_val_32_t;
typedef uint64_t    gf_val_64_t;
typedef uint64_t    *gf_val_128_t;
typedef uint32_t    gf_val_gen_t;   /* The intent here is for general values <= 32 */
</pre></td></table></center><p>
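<p>
To make the "big unions" idea concrete, here is a minimal standalone sketch of a union of function pointers, one per word size. Every name starting with <b>sketch_</b> is hypothetical and only mirrors the shape of <b>gf_t</b>; the real definitions live in <b>gf.h</b>. The multiplier is a plain shift-and-add in GF(2^4), assuming the polynomial 0x13.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

struct sketch_gf;

/* One function pointer per word size; the caller picks the field matching w. */
typedef union {
  uint8_t  (*w4)(struct sketch_gf *gf, uint8_t a, uint8_t b);
  uint8_t  (*w8)(struct sketch_gf *gf, uint8_t a, uint8_t b);
  uint16_t (*w16)(struct sketch_gf *gf, uint16_t a, uint16_t b);
} sketch_func_a_b;

typedef struct sketch_gf {
  sketch_func_a_b multiply;
  void *scratch;
} sketch_gf_t;

/* Shift-and-add multiply in GF(2^4) with x^4 + x + 1 (0x13). */
static uint8_t sketch_w4_shift_multiply(sketch_gf_t *gf, uint8_t a, uint8_t b)
{
  uint8_t product = 0;
  int i;

  (void) gf;                       /* this technique needs no scratch data */
  for (i = 0; i < 4; i++) {
    if (b & 1) product ^= a;       /* add a * x^i when bit i of b is set */
    b >>= 1;
    a <<= 1;                       /* a = a * x */
    if (a & 0x10) a ^= 0x13;       /* reduce when the degree reaches 4 */
  }
  return product;
}

/* Fill in the w4 member only; the other members are simply not used. */
static void sketch_init_w4(sketch_gf_t *gf)
{
  assert(gf != NULL);
  gf->scratch = NULL;
  gf->multiply.w4 = sketch_w4_shift_multiply;
}
```

With this layout, a call looks like <b>gf.multiply.w4(&amp;gf, 5, 4)</b>, and the same struct could carry a <b>w8</b> or <b>w16</b> multiplier instead.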

To use a <b>gf_t</b>, you need to create one with <b>gf_init_easy()</b> or
<b>gf_init_hard()</b>.  Let's concentrate on the former:

<p><center><table border=3 cellpadding=3><td><pre>
extern int gf_init_easy(gf_t *gf, int w, int mult_type);
</pre></td></table></center><p>

You pass it memory for a <b>gf_t</b>, a value of <b>w</b> and
a variable that says how to do multiplication.  The valid values of <b>mult_type</b>
are enumerated in <b>gf.h</b>:

<p><center><table border=3 cellpadding=3><td><pre>
typedef enum {GF_MULT_DEFAULT,
              GF_MULT_SHIFT,
              GF_MULT_GROUP,
              GF_MULT_BYTWO_p,
              GF_MULT_BYTWO_b,
              GF_MULT_TABLE,
              GF_MULT_LOG_TABLE,
              GF_MULT_SPLIT_TABLE,
              GF_MULT_COMPOSITE } gf_mult_type_t;
</pre></td></table></center><p>

After creating the <b>gf_t</b>, you use its <b>multiply</b> method
to multiply, selecting the union field that matches your <i>w</i>.
It looks easier than my explanation.  For example, suppose you wanted to multiply 5 and 4 in <i>GF(2<sup>4</sup>)</i>.
You can do it as in
<b><a href=gf_54.c>gf_54.c</a></b>:

<p><center><table border=3 cellpadding=3><td><pre>
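#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "gf.h"

int main()
{
  gf_t gf;

  /* Set up GF(2^4) with the default multiplication technique. */
  gf_init_easy(&amp;gf, 4, GF_MULT_DEFAULT);
  /* The w4 field of the multiply union is the multiplier for w = 4. */
  printf("%d\n", gf.multiply.w4(&amp;gf, 5, 4));
  exit(0);
}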
</pre></td></table></center><p>


If you wanted to multiply in <i>GF(2<sup>8</sup>)</i>, then you'd have to use 8 as a parameter
to <b>gf_init_easy</b>, and call the multiplier as <b>gf.multiply.w8()</b>.
<p>
When you're done with your <b>gf_t</b>, call <b>gf_free()</b> on it to release
the memory that it has allocated.  We'll talk more about memory later; for now, note that
<b>gf_init_easy()</b> calls <b>malloc()</b>, so
if you care about freeing memory, you'll have to call <b>gf_free()</b>.
<p>

<hr>
<h3>Memory allocation</h3>

Each implementation of a multiplication technique keeps around its
own data.  For example, <b>GF_MULT_TABLE</b> keeps around
multiplication and division tables, and <b>GF_MULT_LOG_TABLE</b> maintains log and
antilog tables.  This data is stored in the pointer <b>scratch</b>.  My intent
is that the memory in <b>scratch</b> is all that an implementation ever needs.  In other
words, the <b>multiply()</b>, <b>divide()</b>, <b>inverse()</b> and
<b>multiply_region()</b> calls don't do any memory allocation.
Moreover, <b>gf_init_easy()</b> only allocates one chunk of memory --
the one in <b>scratch</b>.
<p>
If you don't want to have the initialization call allocate memory, you can use <b>gf_init_hard()</b>:

<p><center><table border=3 cellpadding=3><td><pre>
extern int gf_init_hard(gf_t *gf,
                        int w,
                        int mult_type,
                        int region_type,
                        int divide_type,
                        uint64_t prim_poly,
                        int arg1,
                        int arg2,
                        gf_t *base_gf,
                        void *scratch_memory);
</pre></td></table></center><p>

The first three parameters are the same as <b>gf_init_easy()</b>.
The <b>region_type</b> and <b>divide_type</b> arguments select how
<b>multiply_region</b> and division are performed.
Their values are also defined in <b>gf.h</b>.  The <b>region_type</b> values
are bit flags, so you can mix them (e.g. "DOUBLE" and "SSE"):

<p><center><table border=3 cellpadding=3><td><pre>
#define GF_REGION_DEFAULT      (0x0)
#define GF_REGION_SINGLE_TABLE (0x1)
#define GF_REGION_DOUBLE_TABLE (0x2)
#define GF_REGION_QUAD_TABLE   (0x4)
#define GF_REGION_LAZY         (0x8)
#define GF_REGION_SSE          (0x10)
#define GF_REGION_NOSSE        (0x20)
#define GF_REGION_STDMAP       (0x40)
#define GF_REGION_ALTMAP       (0x80)
#define GF_REGION_CAUCHY       (0x100)

typedef uint32_t gf_region_type_t;

typedef enum { GF_DIVIDE_DEFAULT,
               GF_DIVIDE_MATRIX,
               GF_DIVIDE_EUCLID } gf_division_type_t;
</pre></td></table></center><p>
You can change
the primitive polynomial with <b>prim_poly</b>, give additional arguments with
<b>arg1</b> and <b>arg2</b>, and give a base Galois Field for composite fields with <b>base_gf</b>.
Finally, you can pass it a pointer to memory in <b>scratch_memory</b>.  That
way, you can avoid having <b>gf_init_hard()</b> call <b>malloc()</b>.
<p>
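<p>
As an illustration of mixing <b>region_type</b> values, the sketch below ORs flags together and tests them with a mask. The flag values are copied from the <b>gf.h</b> excerpt above. The helper <b>sketch_region_conflict()</b> is hypothetical, and which pairs are really mutually exclusive is an assumption made here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the gf.h excerpt. */
#define GF_REGION_DEFAULT      (0x0)
#define GF_REGION_SINGLE_TABLE (0x1)
#define GF_REGION_DOUBLE_TABLE (0x2)
#define GF_REGION_QUAD_TABLE   (0x4)
#define GF_REGION_LAZY         (0x8)
#define GF_REGION_SSE          (0x10)
#define GF_REGION_NOSSE        (0x20)
#define GF_REGION_STDMAP       (0x40)
#define GF_REGION_ALTMAP       (0x80)
#define GF_REGION_CAUCHY       (0x100)

typedef uint32_t gf_region_type_t;

/* Hypothetical helper, not part of gf.h: returns 1 when the flag set asks
   for two options that this sketch assumes cannot be combined. */
static int sketch_region_conflict(gf_region_type_t r)
{
  if ((r & GF_REGION_SSE) && (r & GF_REGION_NOSSE)) return 1;
  if ((r & GF_REGION_STDMAP) && (r & GF_REGION_ALTMAP)) return 1;
  return 0;
}

/* Hypothetical helper: the "DOUBLE" and "SSE" mix from the text. */
static gf_region_type_t sketch_double_sse(void)
{
  return GF_REGION_DOUBLE_TABLE | GF_REGION_SSE;
}
```

Because each flag owns one bit, <b>GF_REGION_DEFAULT</b> is simply the empty set, and a single integer argument is enough to carry any combination.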
There is a procedure called <b>gf_scratch_size()</b> that lets you know the minimum
size for <b>scratch_memory</b>, depending on <i>w</i>, the multiplication type
and the arguments:

<p><center><table border=3 cellpadding=3><td><pre>
extern int gf_scratch_size(int w,
                           int mult_type,
                           int region_type,
                           int divide_type,
                           int arg1,
                           int arg2);
</pre></td></table></center><p>

You can specify default arguments in <b>gf_init_hard()</b>:
<UL>
<LI> <b>region_type</b> = <b>GF_REGION_DEFAULT</b>
<LI> <b>divide_type</b> = <b>GF_DIVIDE_DEFAULT</b>
<LI> <b>prim_poly</b> = 0
<LI> <b>arg1</b> = 0
<LI> <b>arg2</b> = 0
<LI> <b>base_gf</b> = <b>NULL</b>
<LI> <b>scratch_memory</b> = <b>NULL</b>
</UL>
If any argument is equal to its default, then default actions are taken (e.g. a
standard primitive polynomial is used, or memory is allocated for <b>scratch_memory</b>).
In fact, <b>gf_init_easy()</b> simply calls <b>gf_init_hard()</b> with the default
parameters.
<p>
<b>gf_free()</b> frees memory that was allocated with <b>gf_init_easy()</b>
or <b>gf_init_hard()</b>.  The <b>recursive</b> parameter is in case you
use composite fields, and want to recursively free the base fields.
If you pass <b>scratch_memory</b> to <b>gf_init_hard()</b>, then you typically
don't need to call <b>gf_free()</b>.  It won't hurt to call it, though.
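<p>
The caller-managed memory pattern described in this section can be sketched as follows. This is a simplified standalone model, not the library code: the <b>sketch_</b> names, the reduced header struct, and the single <b>table_bytes</b> parameter are all assumptions. It shows the three pieces working together: a size query, an init routine that only calls <b>malloc()</b> when no buffer is supplied, and a free routine that honors <b>free_me</b>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for the header kept at the start of the scratch. */
typedef struct {
  int w;
  int free_me;          /* 1 when the init routine called malloc() itself */
  void *private_data;   /* start of the implementation's tables */
} sketch_internal_t;

typedef struct { void *scratch; } sketch_gf_t;

/* Header plus whatever the technique needs for its tables. */
static int sketch_scratch_size(int table_bytes)
{
  if (table_bytes < 0) return -1;
  return (int) sizeof(sketch_internal_t) + table_bytes;
}

/* scratch_memory must be suitably aligned and at least
   sketch_scratch_size(table_bytes) bytes long, or NULL to use malloc(). */
static int sketch_init(sketch_gf_t *gf, int w, int table_bytes, void *scratch_memory)
{
  sketch_internal_t *h;
  int sz = sketch_scratch_size(table_bytes);

  assert(gf != NULL);
  if (sz <= 0) return 0;
  if (scratch_memory == NULL) {
    h = (sketch_internal_t *) malloc((size_t) sz);
    if (h == NULL) return 0;
    h->free_me = 1;
  } else {
    h = (sketch_internal_t *) scratch_memory;
    h->free_me = 0;
  }
  h->w = w;
  h->private_data = (char *) h + sizeof(sketch_internal_t);
  gf->scratch = h;
  return 1;
}

/* Safe to call in both cases: it only frees what sketch_init() allocated. */
static void sketch_free(sketch_gf_t *gf)
{
  sketch_internal_t *h = (sketch_internal_t *) gf->scratch;

  if (h != NULL && h->free_me) free(h);
  gf->scratch = NULL;
}
```

A caller who wants no allocation at all would query the size, reserve a buffer (for example a static array), and pass it as the last argument.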

<hr>
<h3>gf_mult and gf_div</h3>

For the moment, I have few things completely implemented, but that's because I want
to be able to explain the structure, and how to specify methods.  In particular, for
<i>w=4</i>, I have implemented <b>SHIFT</b> and <b>LOG</b>.  For <i>w=8, 16, 32, 64</i>
I have implemented <b>SHIFT</b>.  For all <i>w &le; 32</i>, I have implemented both
Euclid's algorithm for inversion, and the matrix method for inversion.  For
<i>w=64</i>, it's just Euclid.  You can
test these all with <b>gf_mult</b> and <b>gf_div</b>.  Here are a few calls:

<pre>
UNIX> <font color=darkred><b>gf_mult 7 11 4</b></font>                - Default
4
UNIX> <font color=darkred><b>gf_mult 7 11 4 SHIFT - -</b></font>      - Use shift
4
UNIX> <font color=darkred><b>gf_mult 7 11 4 LOG - -</b></font>        - Use logs
4
UNIX> <font color=darkred><b>gf_div 4 7 4</b></font>                  - Default
11
UNIX> <font color=darkred><b>gf_div 4 7 4 LOG - -</b></font>          - Use logs
11
UNIX> <font color=darkred><b>gf_div 4 7 4 LOG - EUCLID</b></font>     - Use Euclid instead of logs
11
UNIX> <font color=darkred><b>gf_div 4 7 4 LOG - MATRIX</b></font>     - Use Matrix inversion instead of logs
11
UNIX> <font color=darkred><b>gf_div 4 7 4 SHIFT - -</b></font>        - Default
11
UNIX> <font color=darkred><b>gf_div 4 7 4 SHIFT - EUCLID</b></font>   - Use Euclid (which is the default)
11
UNIX> <font color=darkred><b>gf_div 4 7 4 SHIFT - MATRIX</b></font>   - Use Matrix inversion instead of Euclid
11
UNIX> <font color=darkred><b>gf_mult 200 211 8</b></font>        - The remainder are shift/Euclid
201
UNIX> <font color=darkred><b>gf_div 201 211 8</b></font>
200
UNIX> <font color=darkred><b>gf_mult 60000 65111 16</b></font>
63515
UNIX> <font color=darkred><b>gf_div 63515 65111 16</b></font>
60000
UNIX> <font color=darkred><b>gf_mult abcd0001 9afbf788 32h</b></font>
b0359681
UNIX> <font color=darkred><b>gf_div b0359681 9afbf788 32h</b></font>
abcd0001
UNIX> <font color=darkred><b>gf_mult abcd00018c8b8c8a 9afbf7887f6d8e5b 64h</b></font>
3a7def35185bd571
UNIX> <font color=darkred><b>gf_div 3a7def35185bd571 9afbf7887f6d8e5b 64h</b></font>
abcd00018c8b8c8a
UNIX> <font color=darkred><b></b></font>
</pre>
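<p>
The small-field results in the transcript above can be reproduced by hand with a shift-and-add multiplier. The sketch below is a standalone illustration, not the code in <b>gf_w4.c</b> or <b>gf_w8.c</b>. The function names are hypothetical, and the polynomials (0x13 for w=4, 0x11d for w=8) are assumptions chosen because they reproduce the outputs shown. Division here is a brute-force search, which is only useful for checking small fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply a and b in GF(2^w), 2 <= w <= 16.
   prim_poly includes the leading x^w term. */
static uint32_t sketch_shift_multiply(uint32_t a, uint32_t b, int w, uint32_t prim_poly)
{
  uint32_t product = 0;
  int i;

  assert(w >= 2 && w <= 16);
  for (i = 0; i < w; i++) {
    if (b & 1) product ^= a;             /* add a * x^i when bit i of b is set */
    b >>= 1;
    a <<= 1;                             /* a = a * x */
    if (a & (1u << w)) a ^= prim_poly;   /* reduce when the degree reaches w */
  }
  return product;
}

/* Hypothetical helper: find q with q * b == a by trying every element.
   Returns 0 when b is 0, where division is undefined. */
static uint32_t sketch_search_divide(uint32_t a, uint32_t b, int w, uint32_t prim_poly)
{
  uint32_t q;

  if (b == 0) return 0;
  for (q = 0; q < (1u << w); q++) {
    if (sketch_shift_multiply(q, b, w, prim_poly) == a) return q;
  }
  return 0;
}
```

For instance, <b>sketch_shift_multiply(7, 11, 4, 0x13)</b> corresponds to the <b>gf_mult 7 11 4</b> call, and <b>sketch_search_divide(4, 7, 4, 0x13)</b> to <b>gf_div 4 7 4</b>.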

You can see all the methods with <b>gf_methods</b>.  We have a lot of implementing to do:

<pre>
UNIX> <font color=darkred><b>gf_methods</b></font>
To specify the methods, do one of the following:
       - leave empty to use defaults
       - use a single dash to use defaults
       - specify MULTIPLY REGION DIVIDE

Legal values of MULTIPLY:
       SHIFT: shift
       GROUP g_mult g_reduce: the Group technique - see the paper
       BYTWO_p: BYTWO doubling the product.
       BYTWO_b: BYTWO doubling b (more efficient thatn BYTWO_p)
       TABLE: Full multiplication table
       LOG:   Discrete logs
       LOG_ZERO: Discrete logs with a large table for zeros
       SPLIT g_a g_b: Split tables defined by g_a and g_b
       COMPOSITE k l [METHOD]: Composite field, recursively specify the
                               method of the base field in GF(2^l)

Legal values of REGION: Specify multiples with commas e.g. 'DOUBLE,LAZY'
       -: Use defaults
       SINGLE/DOUBLE/QUAD: Expand tables
       LAZY: Lazily create table (only applies to TABLE and SPLIT)
       SSE/NOSSE: Use 128-bit SSE instructions if you can
       CAUCHY/ALTMAP/STDMAP: Use different memory mappings

Legal values of DIVIDE:
       -: Use defaults
       MATRIX: Use matrix inversion
       EUCLID: Use the extended Euclidian algorithm.

See the user's manual for more information.
There are many restrictions, so it is better to simply use defaults in most cases.
UNIX> <font color=darkred><b></b></font>
</pre>

<hr>
<h3>gf_unit and gf_time</h3>

<b><a href=gf_unit.c>gf_unit.c</a></b> is a unit tester, and
<b><a href=gf_time.c>gf_time.c</a></b> is a time tester.

They are called as follows:

<p><center><table border=3 cellpadding=3><td><pre>
UNIX> <font color=darkred><b>gf_unit w tests seed [METHOD] </b></font>
UNIX> <font color=darkred><b>gf_time w tests seed size(bytes) iterations [METHOD] </b></font>
</pre></td></table></center><p>

The <b>tests</b> parameter is one or more of the following characters:

<UL>
<LI>        A: Do all tests
<LI>        S: Test only single operations (multiplication/division)
<LI>        R: Test only region operations
<LI>        V: Verbose Output
</UL>

<b>seed</b> is a seed for <b>srand48()</b> -- using -1 defaults to the current time.
<p>
For example, testing <b>LOG</b> and <b>SHIFT</b> with w=4:

<pre>
UNIX> <font color=darkred><b>gf_unit 4 AV 1 LOG - -</b></font>
Seed: 1
Testing single multiplications/divisions.
Testing Inversions.
Testing buffer-constant, src != dest, xor = 0
Testing buffer-constant, src != dest, xor = 1
Testing buffer-constant, src == dest, xor = 0
Testing buffer-constant, src == dest, xor = 1
UNIX> <font color=darkred><b>gf_unit 4 AV 1 SHIFT - -</b></font>
Seed: 1
Testing single multiplications/divisions.
Testing Inversions.
No multiply_region.
UNIX> <font color=darkred><b></b></font>
</pre>

There is no <b>multiply_region()</b> method defined for <b>SHIFT</b>.
Thus, the procedures are <b>NULL</b> and the unit tester ignores them.
<p>
At the moment, I only have the unit tester working for w=4.
<p>
<b>gf_time</b> takes the size of an array (in bytes) and a number of iterations, and
tests the speed of both single and region operations.  The tests are:

<UL>
<LI> A: All
<LI> S: All Single Operations
<LI> R: All Region Operations
<LI> M: Single: Multiplications
<LI> D: Single: Divisions
<LI> I: Single: Inverses
<LI> B: Region: Multiply_Region
</UL>

Here are some examples with <b>SHIFT</b> and <b>LOG</b> on my mac.

<pre>
UNIX> <font color=darkred><b>gf_time 4 A 1 102400 1024 LOG - -</b></font>
Seed: 1
Multiply:   0.538126 s      185.830 Mega-ops/s
Divide:     0.520825 s      192.003 Mega-ops/s
Inverse:    0.631198 s      158.429 Mega-ops/s
Buffer-Const,s!=d,xor=0:    0.478395 s      209.032 MB/s
Buffer-Const,s!=d,xor=1:    0.524245 s      190.751 MB/s
Buffer-Const,s==d,xor=0:    0.471851 s      211.931 MB/s
Buffer-Const,s==d,xor=1:    0.528275 s      189.295 MB/s
UNIX> <font color=darkred><b>gf_time 4 A 1 102400 1024 LOG - EUCLID</b></font>
Seed: 1
Multiply:   0.555512 s      180.014 Mega-ops/s
Divide:     5.359434 s       18.659 Mega-ops/s
Inverse:    4.911719 s       20.359 Mega-ops/s
Buffer-Const,s!=d,xor=0:    0.496097 s      201.573 MB/s
Buffer-Const,s!=d,xor=1:    0.538536 s      185.689 MB/s
Buffer-Const,s==d,xor=0:    0.485564 s      205.946 MB/s
Buffer-Const,s==d,xor=1:    0.540227 s      185.107 MB/s
UNIX> <font color=darkred><b>gf_time 4 A 1 102400 1024 LOG - MATRIX</b></font>
Seed: 1
Multiply:   0.544005 s      183.822 Mega-ops/s
Divide:     7.602822 s       13.153 Mega-ops/s
Inverse:    7.000564 s       14.285 Mega-ops/s
Buffer-Const,s!=d,xor=0:    0.474868 s      210.585 MB/s
Buffer-Const,s!=d,xor=1:    0.527588 s      189.542 MB/s
Buffer-Const,s==d,xor=0:    0.473130 s      211.358 MB/s
Buffer-Const,s==d,xor=1:    0.529877 s      188.723 MB/s
UNIX> <font color=darkred><b>gf_time 4 A 1 102400 1024 SHIFT - -</b></font>
Seed: 1
Multiply:   2.708842 s       36.916 Mega-ops/s
Divide:     8.756882 s       11.420 Mega-ops/s
Inverse:    5.695511 s       17.558 Mega-ops/s
UNIX> <font color=darkred><b></b></font>
</pre>
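<p>
The rates in these runs are consistent with a simple calculation: the array size times the number of iterations, divided by 2^20, divided by the elapsed seconds. With 102400 bytes and 1024 iterations that is exactly 100 "mega" units per test. The sketch below encodes that reading; it is inferred from the numbers above rather than taken from <b>gf_time.c</b>, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: units processed per second, in multiples of 2^20.
   Assumes one unit (operation or byte) per byte of the array per iteration. */
static double sketch_mega_per_second(long size_bytes, long iterations, double seconds)
{
  double total;

  assert(seconds > 0.0);
  total = (double) size_bytes * (double) iterations;
  return total / (1024.0 * 1024.0) / seconds;
}
```

Plugging in the first <b>LOG</b> run, 0.538126 seconds gives about 185.83, which matches the reported Multiply rate, and 0.478395 seconds gives about 209.03, which matches the first Buffer-Const line.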

At the moment, I only have the timer working for w=4.

<hr>
<h3>Walking you through <b>LOG</b></h3>

To see how <b>scratch</b> is used to store data, let's look at what happens when
you call <b>gf_init_easy(&amp;gf, 4, GF_MULT_LOG_TABLE);</b>
First, <b>gf_init_easy()</b> calls <b>gf_init_hard()</b> with default parameters.
This is in <b><a href=gf.c>gf.c</a></b>.
<p>
<b>gf_init_hard()</b>'s first job is to set up the scratch.
The scratch's type is <b>gf_internal_t</b>, defined in
<b><a href=gf_int.h>gf_int.h</a></b>:

<p><center><table border=3 cellpadding=3><td><pre>
typedef struct {
  int mult_type;
  int region_type;
  int divide_type;
  int w;
  uint64_t prim_poly;
  int free_me;
  int arg1;
  int arg2;
  gf_t *base_gf;
  void *private;
} gf_internal_t;
</pre></td></table></center><p>

All the fields are straightforward, with the exception of <b>private</b>.  That is
a <b>(void *)</b> which points to the implementation's private data.
<p>
Here's the code for
<b>gf_init_hard()</b>:

<p><center><table border=3 cellpadding=3><td><pre>
int gf_init_hard(gf_t *gf, int w, int mult_type,
                        int region_type,
                        int divide_type,
                        uint64_t prim_poly,
                        int arg1, int arg2,
                        gf_t *base_gf,
                        void *scratch_memory)
{
  int sz;
  gf_internal_t *h;


  if (scratch_memory == NULL) {
    sz = gf_scratch_size(w, mult_type, region_type, divide_type, arg1, arg2);
    if (sz &lt;= 0) return 0;
    h = (gf_internal_t *) malloc(sz);
    if (h == NULL) return 0;
    h-&gt;free_me = 1;
  } else {
    h = scratch_memory;
    h-&gt;free_me = 0;
  }
  gf-&gt;scratch = (void *) h;
  h-&gt;mult_type = mult_type;
  h-&gt;region_type = region_type;
  h-&gt;divide_type = divide_type;
  h-&gt;w = w;
  h-&gt;prim_poly = prim_poly;
  h-&gt;arg1 = arg1;
  h-&gt;arg2 = arg2;
  h-&gt;base_gf = base_gf;
  /* The private data starts right after the gf_internal_t header. */
  h-&gt;private = (void *) ((char *) h + sizeof(gf_internal_t));

  switch(w) {
    case 4: return gf_w4_init(gf);
    case 8: return gf_w8_init(gf);
    case 16: return gf_w16_init(gf);
    case 32: return gf_w32_init(gf);
    case 64: return gf_w64_init(gf);
    case 128: return gf_dummy_init(gf);
    default: return 0;
  }
}
</pre></td></table></center><p>

The first thing it does is determine if it has to allocate space for <b>scratch</b>.
If it must, it uses <b>gf_scratch_size()</b> to figure out how big the space must be.
It then sets <b>gf-&gt;scratch</b> to this space, and sets all of the fields of the
scratch to the arguments in <b>gf_init_hard()</b>.  The <b>private</b> pointer is
set to the space just after the <b>gf_internal_t</b> at the start of <b>gf-&gt;scratch</b>.   Again, it is up to
<b>gf_scratch_size()</b> to make sure there is enough space for the scratch, and
for all of the private data needed by the implementation.
<p>
Once the scratch is set up, <b>gf_init_hard()</b> calls <b>gf_w4_init()</b>.  This is
in <b><a href=gf_w4.c>gf_w4.c</a></b>, and it is a
simple dispatcher to the various initialization routines.  It also
installs the <b>EUCLID</b> or <b>MATRIX</b> routines for division and inversion if they were requested:

<p><center><table border=3 cellpadding=3><td><pre>
int gf_w4_init(gf_t *gf)
{
  gf_internal_t *h;

  h = (gf_internal_t *) gf-&gt;scratch;
  if (h-&gt;prim_poly == 0) h-&gt;prim_poly = 0x13;

  gf-&gt;multiply.w4 = NULL;
  gf-&gt;divide.w4 = NULL;
  gf-&gt;inverse.w4 = NULL;
  gf-&gt;multiply_region.w4 = NULL;

  switch(h-&gt;mult_type) {
    case GF_MULT_SHIFT:     if (gf_w4_shift_init(gf) == 0) return 0; break;
    case GF_MULT_LOG_TABLE: if (gf_w4_log_init(gf) == 0) return 0; break;
    case GF_MULT_DEFAULT:   if (gf_w4_log_init(gf) == 0) return 0; break;
    default: return 0;
  }
  if (h-&gt;divide_type == GF_DIVIDE_EUCLID) {
    gf-&gt;divide.w4 = gf_w4_divide_from_inverse;
    gf-&gt;inverse.w4 = gf_w4_euclid;
  } else if (h-&gt;divide_type == GF_DIVIDE_MATRIX) {
    gf-&gt;divide.w4 = gf_w4_divide_from_inverse;
    gf-&gt;inverse.w4 = gf_w4_matrix;
  }

  if (gf-&gt;inverse.w4 != NULL && gf-&gt;divide.w4 == NULL) {
    gf-&gt;divide.w4 = gf_w4_divide_from_inverse;
  }
  if (gf-&gt;inverse.w4 == NULL && gf-&gt;divide.w4 != NULL) {
    gf-&gt;inverse.w4 = gf_w4_inverse_from_divide;
  }
  return 1;
}
</pre></td></table></center><p>
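<p>
The last part of <b>gf_w4_init()</b> relies on two identities: a divide can be built from an inverse (a / b = a * b<sup>-1</sup>), and an inverse can be built from a divide (a<sup>-1</sup> = 1 / a). Here is a standalone sketch of both, with an extended Euclidean inverse for GF(2^4). The <b>sketch_</b> names are hypothetical, the polynomial 0x13 is taken from the default above, and the real <b>gf_w4_euclid()</b> may be written differently. Returning 0 for an inverse of 0 is an arbitrary choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply in GF(2^4) with x^4 + x + 1 (0x13). */
static uint8_t sketch_w4_multiply(uint8_t a, uint8_t b)
{
  uint8_t product = 0;
  int i;

  for (i = 0; i < 4; i++) {
    if (b & 1) product ^= a;
    b >>= 1;
    a <<= 1;
    if (a & 0x10) a ^= 0x13;
  }
  return product;
}

/* Degree of a polynomial over GF(2) stored as bits; -1 for zero. */
static int sketch_degree(uint32_t v)
{
  int d = -1;

  while (v != 0) { v >>= 1; d++; }
  return d;
}

/* Extended Euclid on polynomials over GF(2).
   Invariants: g1 * a == u and g2 * a == v (mod 0x13). */
static uint8_t sketch_w4_euclid(uint8_t a)
{
  uint32_t u = a, v = 0x13, g1 = 1, g2 = 0, tmp;
  int j;

  assert(a < 16);
  if (a == 0) return 0;
  while (u != 1) {
    j = sketch_degree(u) - sketch_degree(v);
    if (j < 0) {                     /* keep u as the higher-degree one */
      tmp = u; u = v; v = tmp;
      tmp = g1; g1 = g2; g2 = tmp;
      j = -j;
    }
    u ^= v << j;                     /* cancel the leading term of u */
    g1 ^= g2 << j;                   /* apply the same step to the cofactor */
  }
  return (uint8_t) g1;
}

/* a / b built from an inverse routine. */
static uint8_t sketch_w4_divide_from_inverse(uint8_t a, uint8_t b)
{
  return sketch_w4_multiply(a, sketch_w4_euclid(b));
}

/* 1 / a built from any divide routine. */
typedef uint8_t (*sketch_divide_fn)(uint8_t a, uint8_t b);

static uint8_t sketch_w4_inverse_from_divide(sketch_divide_fn divide, uint8_t a)
{
  return divide(1, a);
}
```

This is why a technique only has to supply one of the two: the dispatcher can fill in the missing routine from the one that exists.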

The code in <b>gf_w4_log_init()</b> sets up the log and antilog tables, and sets
the <b>multiply.w4</b>, <b>divide.w4</b> etc. routines to be the ones for logs.  The
tables are put into the <b>private</b> area of the scratch, which is typecast to a <b>struct
gf_logtable_data *</b>:

<p><center><table border=3 cellpadding=3><td><pre>
struct gf_logtable_data {
    gf_val_4_t      log_tbl[GF_FIELD_SIZE];
    gf_val_4_t      antilog_tbl[GF_FIELD_SIZE * 2];
    gf_val_4_t      *antilog_tbl_div;
};
.......

static
int gf_w4_log_init(gf_t *gf)
{
  gf_internal_t *h;
  struct gf_logtable_data *ltd;
  int i, b;

  h = (gf_internal_t *) gf-&gt;scratch;
  ltd = h-&gt;private;

  ltd-&gt;log_tbl[0] = 0;

  ltd-&gt;antilog_tbl_div = ltd-&gt;antilog_tbl + (GF_FIELD_SIZE-1);
  b = 1;
  for (i = 0; i &lt; GF_FIELD_SIZE-1; i++) {
      ltd-&gt;log_tbl[b] = (gf_val_8_t)i;
      ltd-&gt;antilog_tbl[i] = (gf_val_8_t)b;
      ltd-&gt;antilog_tbl[i+GF_FIELD_SIZE-1] = (gf_val_8_t)b;
      b &lt;&lt;= 1;
      if (b & GF_FIELD_SIZE) {
          b = b ^ h-&gt;prim_poly;
      }
  }

  gf-&gt;inverse.w4 = gf_w4_inverse_from_divide;
  gf-&gt;divide.w4 = gf_w4_log_divide;
  gf-&gt;multiply.w4 = gf_w4_log_multiply;
  gf-&gt;multiply_region.w4 = gf_w4_log_multiply_region;
  return 1;
}
</pre></td></table></center><p>

And of course the individual routines use <b>h->private</b> to access the tables:

<p><center><table border=3 cellpadding=3><td><pre>
static
inline
gf_val_8_t gf_w4_log_multiply (gf_t *gf, gf_val_8_t a, gf_val_8_t b)
{
  struct gf_logtable_data *ltd;

  ltd = (struct gf_logtable_data *) ((gf_internal_t *) (gf-&gt;scratch))-&gt;private;
  return (a == 0 || b == 0) ? 0 : ltd-&gt;antilog_tbl[(unsigned)(ltd-&gt;log_tbl[a] + ltd-&gt;log_tbl[b])];
}
</pre></td></table></center><p>
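<p>
To see why the antilog table is twice the field size and why <b>antilog_tbl_div</b> points into its middle, here is a standalone version of the same tables for GF(2^4). The table construction follows <b>gf_w4_log_init()</b> above. The divide routine is an assumption about how <b>antilog_tbl_div</b> is meant to be used, since <b>gf_w4_log_divide()</b> is not shown here, and all <b>sketch_</b> names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FIELD_SIZE 16     /* 2^4 elements */
#define SKETCH_PRIM_POLY  0x13   /* x^4 + x + 1 */

struct sketch_log_tables {
  uint8_t log_tbl[SKETCH_FIELD_SIZE];
  uint8_t antilog_tbl[SKETCH_FIELD_SIZE * 2];
  uint8_t *antilog_tbl_div;      /* antilog_tbl shifted by FIELD_SIZE-1 */
};

static void sketch_log_init(struct sketch_log_tables *t)
{
  int i, b = 1;

  t->log_tbl[0] = 0;             /* log of 0 is undefined; callers test for 0 */
  t->antilog_tbl_div = t->antilog_tbl + (SKETCH_FIELD_SIZE - 1);
  for (i = 0; i < SKETCH_FIELD_SIZE - 1; i++) {
    t->log_tbl[b] = (uint8_t) i;
    t->antilog_tbl[i] = (uint8_t) b;
    /* Second copy, so a sum of two logs needs no "mod 15". */
    t->antilog_tbl[i + SKETCH_FIELD_SIZE - 1] = (uint8_t) b;
    b <<= 1;
    if (b & SKETCH_FIELD_SIZE) b ^= SKETCH_PRIM_POLY;
  }
}

/* log(a) + log(b) is at most 28, which stays inside the doubled table. */
static uint8_t sketch_log_multiply(const struct sketch_log_tables *t, uint8_t a, uint8_t b)
{
  assert(a < SKETCH_FIELD_SIZE && b < SKETCH_FIELD_SIZE);
  if (a == 0 || b == 0) return 0;
  return t->antilog_tbl[t->log_tbl[a] + t->log_tbl[b]];
}

/* log(a) - log(b) ranges over -14..14; the shifted pointer makes the
   negative indices land on valid entries of antilog_tbl. */
static uint8_t sketch_log_divide(const struct sketch_log_tables *t, uint8_t a, uint8_t b)
{
  assert(a < SKETCH_FIELD_SIZE && b < SKETCH_FIELD_SIZE);
  if (a == 0 || b == 0) return 0;
  return t->antilog_tbl_div[t->log_tbl[a] - t->log_tbl[b]];
}
```

Both operations cost two or three table lookups and one integer add or subtract, with no modular reduction in the hot path.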

Finally, it's important that the proper sizes are put into
<b>gf_w4_scratch_size()</b> for each implementation:

<p><center><table border=3 cellpadding=3><td><pre>
int gf_w4_scratch_size(int mult_type, int region_type, int divide_type, int arg1, int arg2)
{
  int region_tbl_size;
  switch(mult_type)
  {
    case GF_MULT_DEFAULT:
    case GF_MULT_LOG_TABLE:
      return sizeof(gf_internal_t) + sizeof(struct gf_logtable_data) + 64;
      break;
    case GF_MULT_SHIFT:
      return sizeof(gf_internal_t);
      break;
    default:
      return -1;
   }
}
</pre></td></table></center><p>
I hope that's enough explanation for y'all to start implementing.  Let me know if you have
problems -- thanks -- Jim

<hr>
The initial structure has been set for w=4, 8, 16, 32 and 64, with implementations of SHIFT and EUCLID, and for w &lt;= 32, MATRIX.  There are some weird caveats:

<UL>
<LI> For w=32 and w=64, the primitive polynomial does not have the leading one.
<LI> I'd like for naming to be:
<p>
<UL>
      <b>gf_w</b><i>w</i><b>_</b><i>technique</i><b>_</b><i>functionality</i><b>()</b>.
</UL>
<p>
For example, the log techniques for w=4 are:
<pre>
gf_w4_log_multiply()
gf_w4_log_divide()
gf_w4_log_multiply_region()
gf_w4_log_init()
</pre>
<p>
<LI> I'd also like a header block on implementations that says who wrote it.
</UL>

<hr>
<h3>Things we need to Implement: <i>w=4</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> SHIFT </td> <td> Done - Jim </td> </tr>
<tr> <td> BYTWO_p </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_p, SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b, SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> Single TABLE </td> <td> Done - Jim </td> </tr>
<tr> <td> Double TABLE </td> <td> Done - Jim </td> </tr>
<tr> <td> Double TABLE, SSE </td> <td> Done - Jim </td> </tr>
<tr> <td> Quad TABLE </td> <td>Done - Jim</td> </tr>
<tr> <td> Lazy Quad TABLE </td> <td>Done - Jim</td> </tr>
<tr> <td> LOG </td> <td> Done - Jim </td> </tr>
</table><p>

<hr>
<h3>Things we need to Implement: <i>w=8</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> SHIFT </td> <td> Done - Jim </td> </tr>
<tr> <td> BYTWO_p </td> <td>Done - Jim </td> </tr>
<tr> <td> BYTWO_b </td> <td>Done - Jim </td> </tr>
<tr> <td> BYTWO_p, SSE </td> <td>Done - Jim </td> </tr>
<tr> <td> BYTWO_b, SSE </td> <td>Done - Jim </td> </tr>
<tr> <td> Single TABLE </td> <td> Done - Kevin </td> </tr>
<tr> <td> Double TABLE </td> <td> Done - Jim </td> </tr>
<tr> <td> Lazy Double TABLE </td> <td> Done - Jim </td> </tr>
<tr> <td> Split 2 1 (Half) SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> Composite, k=2 </td> <td> Done - Kevin (alt mapping not passing unit test) </td> </tr>
<tr> <td> LOG </td> <td> Done - Kevin </td> </tr>
<tr> <td> LOG ZERO</td> <td> Done - Jim</td> </tr>
</table><p>

<hr>
<h3>Things we need to Implement: <i>w=16</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> SHIFT </td> <td> Done - Jim </td> </tr>
<tr> <td> BYTWO_p </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_p, SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b, SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> Lazy TABLE </td> <td>Done - Jim</td> </tr>
<tr> <td> Split 4 16 No-SSE, lazy </td> <td>Done - Jim</td> </tr>
<tr> <td> Split 4 16 SSE, lazy </td> <td>Done - Jim</td> </tr>
<tr> <td> Split 4 16 SSE, lazy, alternate mapping </td> <td>Done - Jim</td> </tr>
<tr> <td> Split 8 16, lazy </td> <td>Done - Jim</td> </tr>
<tr> <td> Composite, k=2, stdmap recursive </td> <td> Done - Kevin</td> </tr>
<tr> <td> Composite, k=2, altmap recursive </td> <td> Done - Kevin</td> </tr>
<tr> <td> Composite, k=2, stdmap inline </td> <td> Done - Kevin</td> </tr>
<tr> <td> LOG </td> <td> Done - Kevin </td> </tr>
<tr> <td> LOG ZERO</td> <td> Done - Kevin </td> </tr>
<tr> <td> Group 4 4 </td> <td>Done - Jim: I don't see a reason to implement others, although 4-8 will be faster, and 8 8 will have faster region ops.  They'll never beat SPLIT.</td> </tr>
</table><p>

<hr>
<h3>Things we need to Implement: <i>w=32</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> SHIFT </td> <td> Done - Jim </td> </tr>
<tr> <td> BYTWO_p </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_p, SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b, SSE </td> <td>Done - Jim</td> </tr>
<tr> <td> Split 2 32,lazy </td> <td>Done  - Jim</td> </tr>
<tr> <td> Split 2 32, SSE, lazy </td> <td>Done  - Jim</td> </tr>
<tr> <td> Split 4 32, lazy </td> <td>Done  - Jim</td> </tr>
<tr> <td> Split 4 32, SSE,ALTMAP lazy </td> <td>Done  - Jim</td> </tr>
<tr> <td> Split 4 32, SSE, lazy </td> <td>Done  - Jim</td> </tr>
<tr> <td> Split 8 8 </td> <td>Done - Jim </td> </tr>
<tr> <td> Group, g_s == g_r </td> <td>Done - Jim</td></tr>
<tr> <td> Group, any g_s and g_r</td> <td>Done - Jim</td></tr>
<tr> <td> Composite, k=2, stdmap recursive </td> <td> Done - Kevin</td> </tr>
<tr> <td> Composite, k=2, altmap recursive </td> <td> Done - Kevin</td> </tr>
<tr> <td> Composite, k=2, stdmap inline </td> <td> Done - Kevin</td> </tr>
</table><p>
<hr>
<h3>Things we need to Implement: <i>w=64</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> SHIFT </td> <td> Done - Jim </td> </tr>
<tr> <td> BYTWO_p </td> <td> - </td> </tr>
<tr> <td> BYTWO_b </td> <td> - </td> </tr>
<tr> <td> BYTWO_p, SSE </td> <td> - </td> </tr>
<tr> <td> BYTWO_b, SSE </td> <td> - </td> </tr>
<tr> <td> Split 16 1 SSE, maybe lazy </td> <td> - </td> </tr>
<tr> <td> Split 8 1 lazy </td> <td> - </td> </tr>
<tr> <td> Split 8 8 </td> <td> - </td> </tr>
<tr> <td> Split 8 8 lazy </td> <td> - </td> </tr>
<tr> <td> Group </td> <td> - </td> </tr>
<tr> <td> Composite, k=2, alternate mapping </td> <td> - </td> </tr>
</table><p>
<hr>
<h3>Things we need to Implement: <i>w=128</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> SHIFT </td> <td> Done - Will </td> </tr>
<tr> <td> BYTWO_p </td> <td> - </td> </tr>
<tr> <td> BYTWO_b </td> <td> - </td> </tr>
<tr> <td> BYTWO_p, SSE </td> <td> - </td> </tr>
<tr> <td> BYTWO_b, SSE </td> <td> - </td> </tr>
<tr> <td> Split 32 1 SSE, maybe lazy </td> <td> - </td> </tr>
<tr> <td> Split 16 1 lazy </td> <td> - </td> </tr>
<tr> <td> Split 16 16 - Maybe that's insanity</td> <td> - </td> </tr>
<tr> <td> Split 16 16 lazy </td> <td> - </td> </tr>
<tr> <td> Group (SSE) </td> <td> - </td> </tr>
<tr> <td> Composite, k=?, alternate mapping </td> <td> - </td> </tr>
</table><p>
<hr>
<h3>Things we need to Implement: <i>w=general between 1 & 32</i></h3>

<p><table border=3 cellpadding=2>
<tr> <td> CAUCHY Region (SSE XOR)</td> <td> Done - Jim </td> </tr>
<tr> <td> SHIFT </td> <td> Done - Jim </td> </tr>
<tr> <td> TABLE </td> <td> Done - Jim </td> </tr>
<tr> <td> LOG </td> <td> Done - Jim </td> </tr>
<tr> <td> BYTWO_p </td> <td>Done - Jim</td> </tr>
<tr> <td> BYTWO_b </td> <td>Done - Jim</td> </tr>
<tr> <td> Group, g_s == g_r </td> <td>Done - Jim</td></tr>
<tr> <td> Group, any g_s and g_r</td> <td>Done - Jim</td></tr>
<tr> <td> Split - do we need it?</td> <td>Done - Jim</td></tr>
<tr> <td> Composite - do we need it?</td> <td> - </td></tr>
<tr> <td> Split - do we need it?</td> <td> - </td></tr>
<tr> <td> Logzero?</td> <td> - </td></tr>
</table><p>