Stupid Translation Layer:

mapping = 16b:
4b magic
4b block number
4b version number
4b crc32

[block] phys block = 512
[cluster] mapping unit = 4096? = X phys blocks, X=8
index block = phys block
N = index block size / 16 = phys block size / 16 = 32
[segment] sequence of N mapping units and 1 physical block
[erase unit] device erase unit (or management unit in case of FTLed flash like USB/SD)

* maintain block mappings in RAM
* reserve at least N*(N*X+1) phys blocks for defragmentation
* scan each (N*X+1)th device block during mount
* => for the case of 512 byte sector mappings will eat 128MB of 4GB flash, plus 528KB reserved space
* => for 4096b sector AND 4096b index block mappings = 16MB/4GB, but reserved space = 256MB!!!
* => for 4096b sector and 512b index block mappings = 16MB/4GB, reserved space = 4MB
* first just write next available map unit
* commit mappings each N blocks or each 1 second
* mark blocks having old version numbers as unused (only in RAM, do not touch the flash itself!)
* N unused blocks = "free block sequence"
* When we wrap around the ring buffer end, we must find free place to continue writing.
  (and ideally it should be exactly after the previous end). There will always be enough
  reserved space to move blocks, because each partially occupied segment has at least 1
  free block, and we have N-1 segments reserved. We just find first available segments that
  have at least N free blocks in total, and move them to reserved space. If there is an
  offset between first moved block and the previous end of ring buffer, we decide between
  moving or skipping blocks based on <skip cost> and <full move cost>.
  For example, if the offset to first partially free segment is VERY BIG, we won't move anything.
  But we ALWAYS take first available partially free segments - because the increasing offset
  cost is almost always greater than the decrease of moving cost.

  Here are the cleaning costs:
  Cost of skipping some segments is determined by the idea that else we could write
  them and gain more performance (totally true for FTLed devices like USB flash drive or SD
  card; but the expression differs for raw NAND).

  E = erase unit size in blocks
  L = number of last written segment
  O = number of first moved segment
  S = N*X+1 = segment size in blocks
  <move cost> = (blocks occupied in sequence)*(READ + WRITE) + min(<skip cost>, <full move cost>)
  <full move cost> = (O-L)*S*(READ + WRITE)
  <skip cost> = WRITE*(int(O*S/E) > int(L*S/E) ? (E-(L*S)%E) + ((O*S)%E) : S*(O-L))
  <skip cost for raw NAND> = WRITE*(int(O*S/E) > int(L*S/E) ? ((O*S)%E) : 0)

Data structures:
* Mapping/version array: 8b * block count ~~ 8MB for 4GB flash and 4096/512 map/phys sizes
* Next block pointer: exactly 1 integer because STL flash is a ring buffer
  filled with number of first free block followed or included in an empty sequence
* Free segment counter
* Free block counter
* (MAYBE) R-B tree / list of segments having at least one free block

USB flash read/write cost:
* Write <8Kb cost = 4 * (Random read <8Kb cost)
* Write >=16Kb cost = 2 * (Random read >=16Kb cost)
* Best speed is achieved with I/O size >=16Kb, ideally 32Kb; bigger values aren't that better.
* Erase unit is usually around 1MB