s-bsdipa, a mutation of BSDiff
==============================

Colin Percival's BSDiff, imported from FreeBSD and transformed into
a library that creates and applies difference patches; please see the
header comment of s-bsdipa-lib.h for more.  In general:

- One includes s-bsdipa-lib.h, and uses the all-in-memory s_bsdipa_diff()
  and s_bsdipa_patch() functions to create and apply patches.
  That is to say, (for example mmap(2)ed) memory in, (heap) memory out.

- Patches need to be compressed; preparing them for storage is easily
  achieved by including s-bsdipa-io.h after defining s_BSDIPA_IO as
  desired, and then using the corresponding s_bsdipa_io_write_*() and
  s_bsdipa_io_read_*() functions.  These still do not perform direct
  I/O: the write functions call a supplied hook with fully prepared
  buffers, the read functions store in (heap) memory.  Multiple _IO
  methods are provided.

- In general the lib/ directory of the source repository is self-
  contained, and may be copied for inclusion in other projects.

  If the s_BSDIPA_SMALL approach is taken the directory
  lib/libdivsufsort may also be removed.

- Please see the introductory header comments of s-bsdipa-lib.h and
  s-bsdipa-io.h for more.

- The directory s-bsdipa contains a self-contained (except for
  compression libraries) program which can create and apply patches
  (like a combined FreeBSD bsdiff and bspatch program).
  It times execution and tracks memory usage on stderr.

- The directory perl contains the self-contained BsDiPa CPAN module.
  (Perl ships with ZLIB; libbz2/BZ2, liblzma/XZ, and libzstd/ZSTD
  support is detected at compile time.)
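
The hook-based I/O idea from the list above can be pictured with a few
lines of Python.  This is only a sketch of the pattern, built on
Python's zlib module and not on s-bsdipa-io.h; the names
write_with_hook() and read_to_memory() are hypothetical and are not
part of the library.

```python
import zlib


def write_with_hook(data, hook, chunk_size=4096):
    """Compress data and hand each fully prepared buffer to hook().

    No file I/O happens here; what to do with a buffer is up to the hook.
    """
    comp = zlib.compressobj()
    for start in range(0, len(data), chunk_size):
        buf = comp.compress(data[start:start + chunk_size])
        if buf:
            hook(buf)
    tail = comp.flush()
    if tail:
        hook(tail)


def read_to_memory(stored):
    """Decompress the stored bytes completely into (heap) memory."""
    return zlib.decompress(stored)


if __name__ == "__main__":
    # Stand-in for patch data; a real patch would come from a diff run.
    patch = b"control data, diff data, extra data " * 1000
    collected = []
    write_with_hook(patch, collected.append)
    stored = b"".join(collected)
    assert read_to_memory(stored) == patch
    print(len(patch), "->", len(stored), "bytes")
```

The hook here merely appends to a list; it could equally write to
a file descriptor or a socket, which is why the writer itself never has
to perform I/O.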

Licenses (full text included in s-bsdipa-lib.h):
  libdivsufsort(/LICENSE): MIT
  s-bsdiff.c: BSD-2-clause
  s-bspatch.c, s-bsdipa-lib.h, s-bsdipa-io.h, s-bsdipa.c: ISC

Repository:
  browse: https?://git.sdaoden.eu/browse/s-bsdipa.git
  clone:  https?://git.sdaoden.eu/scm/s-bsdipa.git
  (Alternatively: https://github.com/sdaoden/s-bsdipa)
  We are Coverity.com project 31371 / s-bsdipa.

Contact: steffen at sdaoden dot eu.

1. Numbers, Numbers
2. Releases

1. Numbers, Numbers
-------------------

Examples 1 and 2 for an s-bsdipa executable built like

  CFLAGS='-O2 ' make s_BSDIPA_CFLAGS='$(SUFX)' \
    s_BSDIPA_32=y s_BSDIPA_TEXT=y all

Example 3 built and tested like

  CFLAGS='-O3 -DNDEBUG' \
    make s_BSDIPA_32=y s_BSDIPA_CFLAGS='$(SUFX)' clean all
  mv s-bsdipa/s-bsdipa /tmp/s-bsdipa32
  CFLAGS='-O3 -DNDEBUG' \
    make s_BSDIPA_CFLAGS='$(SUFX)' clean all
  mv s-bsdipa/s-bsdipa /tmp/s-bsdipa64
  make distclean

  for c in -J -j -Z -z; do
    for b in 32 64; do
      echo $b/$c
      ./s-bsdipa$b $c -f7 diff .1 .2 .P$b$c
      ./s-bsdipa$b -f patch .2 .P$b$c .R$b$c
    done
  done

Example 1: manuals of s-nail v14.9.25 vs v14.10.0-alpha, roff (mdoc)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  -rw-r----- 1 steffen steffen 428420 May 18 20:00 .B1
  -rw-r----- 1 steffen steffen 390770 May 18 20:00 .B2

  diff -e .B1 .B2 | wc -c
  241233
  diff -u .B1 .B2 | wc -c
  593019

BSDiff:

  # 67368 result bytes | 121 allocs: all=9956597 peek=8130345 curr=485317
  # Code 0:210 secs, BZ2 I/O 0:022 secs
  # data: ctrl=54816 (4568 entries) diff=304307 extra=124113

  # 65216 result bytes | 131 allocs: all=389444036 peek=387617784 curr=485317
  # Code 0:209 secs, XZ I/O 0:138 secs
  # data: ctrl=54816 (4568 entries) diff=304307 extra=124113

  # 74830 result bytes | 122 allocs: all=2706657 peek=2254673 curr=485317
  # Code 0:213 secs, ZLIB I/O 0:122 secs
  # data: ctrl=54816 (4568 entries) diff=304307 extra=124113

  # 69444 result bytes | 119 allocs: all=96286768 peek=94460516 curr=485317
  # Code 0:210 secs, ZSTD I/O 0:145 secs
  # data: ctrl=54816 (4568 entries) diff=304307 extra=124113

Textual:

  # 81943 result bytes | 182 allocs: all=8765557 peek=8129837 curr=484809
  # Code 0:012 secs, BZ2 I/O 0:126 secs
  # data: ctrl=54456 (4538 entries) diff=202309 extra=226111

  # 82364 result bytes | 192 allocs: all=388252996 peek=387617276 curr=484809
  # Code 0:011 secs, XZ I/O 0:254 secs
  # data: ctrl=54456 (4538 entries) diff=202309 extra=226111

  # 97355 result bytes | 183 allocs: all=1515617 peek=1120529 curr=484809
  # Code 0:011 secs, ZLIB I/O 0:179 secs
  # data: ctrl=54456 (4538 entries) diff=202309 extra=226111

  # 86767 result bytes | 180 allocs: all=95095728 peek=94460008 curr=484809
  # Code 0:012 secs, ZSTD I/O 0:230 secs
  # data: ctrl=54456 (4538 entries) diff=202309 extra=226111
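
The "# data:" lines above can be cross-checked with a little
arithmetic.  The sketch below assumes that a control entry is BSDiff's
usual triple of offsets and that s_BSDIPA_32=y makes each offset
4 bytes wide; both are assumptions of this example.  It then checks
that the control size equals entries * 3 * 4, and that the diff and
extra sizes add up to 428420, the size of .B1.  The helper names are
hypothetical.

```python
import re

STAT = re.compile(r"ctrl=(\d+) \((\d+) entries\) diff=(\d+) extra=(\d+)")


def parse_data_line(line):
    """Return (ctrl_bytes, entries, diff_bytes, extra_bytes) of a data line."""
    match = STAT.search(line)
    if match is None:
        raise ValueError("not a data line: %r" % line)
    return tuple(int(group) for group in match.groups())


def check(line, offset_size, file_size):
    """Return (control size matches, diff+extra matches file size)."""
    ctrl, entries, diff, extra = parse_data_line(line)
    # Assumption: one control entry consists of three offsets.
    triple_ok = ctrl == entries * 3 * offset_size
    # Diff and extra data together account for every byte of one file.
    cover_ok = diff + extra == file_size
    return triple_ok, cover_ok


if __name__ == "__main__":
    lines = (
        "# data: ctrl=54816 (4568 entries) diff=304307 extra=124113",
        "# data: ctrl=54456 (4538 entries) diff=202309 extra=226111",
    )
    for line in lines:
        print(line, check(line, offset_size=4, file_size=428420))
```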

Without header (-H), textual diff, ZLIB:

  file .BP
  .BP: zlib compressed data
  base64 < .BP | wc -c
  131516

Example 2: two emails, as sent out to, and received reencoded back from a ML:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  -rw-r----- 1 steffen steffen 4621 May 18 20:00 .S1
  -rw-r----- 1 steffen steffen 5309 May 18 20:00 .S2

  diff -e .S1 .S2 | wc -c
  2326
  diff -u .S1 .S2 | wc -c
  4979

BSDiff:

  # 950 result bytes | 9 allocs: all=7582046 peek=7539566 curr=5130
  # Code 0:003 secs, BZ2 I/O 0:001 secs
  # data: ctrl=216 (18 entries) diff=3554 extra=1067

  # 884 result bytes | 19 allocs: all=387069485 peek=387027005 curr=5130
  # Code 0:003 secs, XZ I/O 0:127 secs
  # data: ctrl=216 (18 entries) diff=3554 extra=1067

  # 857 result bytes | 10 allocs: all=332106 peek=289626 curr=5130
  # Code 0:002 secs, ZLIB I/O 0:000 secs
  # data: ctrl=216 (18 entries) diff=3554 extra=1067

  # 875 result bytes | 7 allocs: all=93912217 peek=93869737 curr=5130
  # Code 0:003 secs, ZSTD I/O 0:144 secs
  # data: ctrl=216 (18 entries) diff=3554 extra=1067

Textual:

  # 1260 result bytes | 9 allocs: all=7569958 peek=7539566 curr=5130
  # Code 0:000 secs, BZ2 I/O 0:002 secs
  # data: ctrl=108 (9 entries) diff=3043 extra=1578

  # 1144 result bytes | 19 allocs: all=387057397 peek=387027005 curr=5130
  # Code 0:000 secs, XZ I/O 0:130 secs
  # data: ctrl=108 (9 entries) diff=3043 extra=1578

  # 1089 result bytes | 10 allocs: all=320018 peek=289626 curr=5130
  # Code 0:000 secs, ZLIB I/O 0:000 secs
  # data: ctrl=108 (9 entries) diff=3043 extra=1578

  # 1116 result bytes | 7 allocs: all=93900129 peek=93869737 curr=5130
  # Code 0:000 secs, ZSTD I/O 0:147 secs
  # data: ctrl=108 (9 entries) diff=3043 extra=1578

Without header (-H), textual diff, ZLIB:

  file .SP
  .SP: zlib compressed data
  base64 < .SP | wc -c
  1472
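
The base64 sizes of examples 1 and 2 follow from simple arithmetic:
every 3 input bytes become 4 characters, and one newline is added per
output line.  The sketch below assumes that the base64 utility wraps at
76 characters (the GNU coreutils default), and that .BP and .SP are as
large as the ZLIB textual results above (97355 and 1089 bytes); their
sizes are not stated here, so this is an assumption the numbers are
merely consistent with.  base64_cli_size() is a hypothetical helper.

```python
def base64_cli_size(raw_len, wrap=76):
    """Bytes printed when raw_len bytes are base64-encoded and wrapped."""
    encoded = 4 * ((raw_len + 2) // 3)        # 3 input bytes -> 4 characters
    newlines = (encoded + wrap - 1) // wrap   # one newline per (partial) line
    return encoded + newlines


if __name__ == "__main__":
    for name, size in ((".BP", 97355), (".SP", 1089)):
        print(name, size, "->", base64_cli_size(size))
```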

Example 3: binary btrfs-progs#6.19.1 vs #7.0-1 (btrfs; Linux x86-64):
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  -rwxr-x--- 1 steffen steffen 1243312 May 20 19:49 .1*
  -rwxr-x--- 1 steffen steffen 1251504 May 20 19:49 .2*

Ending up with

  # data: ctrl=11904 (496 entries) diff=1217413 extra=25899

  32/-J
  # 71324 result bytes | 32 allocs: all=201586036 peek=196316848
  # Code 0:151 secs, XZ I/O 0:272 secs
  # 1243312 result bytes | 11 allocs: all=19305425 peek=18062000
  # Code 0:002 secs, XZ I/O 0:004 secs
  64/-J
  # 71508 result bytes | 32 allocs: all=206861620 peek=196323244
  # Code 0:154 secs, XZ I/O 0:275 secs
  # 1243312 result bytes | 11 allocs: all=19311377 peek=18067952
  # Code 0:002 secs, XZ I/O 0:004 secs

  32/-j
  # 77363 result bytes | 22 allocs: all=13436581 peek=8167393
  # Code 0:147 secs, BZ2 I/O 0:026 secs
  # 1243312 result bytes | 4 allocs: all=5356721 peek=4113408
  # Code 0:003 secs, BZ2 I/O 0:010 secs
  64/-j
  # 76847 result bytes | 22 allocs: all=18712165 peek=11781689
  # Code 0:152 secs, BZ2 I/O 0:026 secs
  # 1243312 result bytes | 4 allocs: all=5362673 peek=4119360
  # Code 0:003 secs, BZ2 I/O 0:011 secs

  32/-Z
  # 79990 result bytes | 20 allocs: all=46196322 peek=40927134
  # Code 0:146 secs, ZSTD I/O 0:106 secs
  # 1243312 result bytes | 4 allocs: all=7176153 peek=5932840
  # Code 0:002 secs, ZSTD I/O 0:001 secs
  64/-Z
  # 80168 result bytes | 20 allocs: all=51471906 peek=40933530
  # Code 0:155 secs, ZSTD I/O 0:105 secs
  # 1243312 result bytes | 4 allocs: all=7182105 peek=5938792
  # Code 0:004 secs, ZSTD I/O 0:001 secs

  32/-z
  # 85292 result bytes | 23 allocs: all=7786641 peek=6512501
  # Code 0:146 secs, ZLIB I/O 0:029 secs
  # 1243312 result bytes | 4 allocs: all=2532505 peek=2492577
  # Code 0:002 secs, ZLIB I/O 0:002 secs
  64/-z
  # 85643 result bytes | 23 allocs: all=13062225 peek=11781689
  # Code 0:153 secs, ZLIB I/O 0:029 secs
  # 1243312 result bytes | 4 allocs: all=2538457 peek=2498529
  # Code 0:002 secs, ZLIB I/O 0:002 secs

2. Releases
-----------

v0.10.0, 2026-06-22:
  - Add optional (s_BSDIPA_TEXT) simple textual line-based mode.

v0.9.1, 2026-05-09:
  + perl/, s-bsdipa/: detect availability of I/O layers via compilation
    test, instead of through existence of command line utilities.

  - s-bspatch.c: verify all control tuples were consumed, plus tweaks.
    This file is now ISC copyright.
  - Add optional zstd (Zstandard, libzstd) _IO method support.

v0.9.0, 2025-12-24:
  + Breaks backward compatibility of bsdipa_patch() as that assumes
    patches satisfy content constraints that are only satisfied by
    bsdipa_diff() of v0.9.0!!

  + Import of Colin Percival's original qsufsort() algorithm; it was
    replaced with libdivsufsort in FreeBSD, and Colin Percival pointed
    me to this because of existing bugfixes.  The original variant is
    smaller code, but suffers from a performance penalty on large files
    (about 15% for unrelated 5 megabyte binaries) -- it is faster for
    small files, however, so having it around is very beneficial.
  ++ By default both algorithms are compiled in, their usage depends on
     the data size.

v0.8.1, 2025-12-20:
  + Notes:
    Not released, bsdipa_patch() includes a patch content constraint
    test that is not satisfied before v0.9.0!
  + Warning:
    Miscompilations (of libdivsufsort) with gcc 15.2.0 and -O3 and above!
    As well as in sanitizer compilations.
    (clang 21.1.6 ok.)  (All on Linux.)
  - Tighten tested constraints on patch content.
  - Optimize away needless work when applying patch.
  - Fix beflen!=0 aftlen==0 "algorithm error" in BSDiff resulting in SEGV.
  - ZLIB I/O: allow compression config via cookie.
  - perl: make official core_try_oneshot_set(), add core_diff_level_set().
  - Add optional BZ2 (BZIP2, libbz2) _IO method support.

v0.8.0, 2025-07-03:
  - ABI and API breakage.
  - Fixes a cast that could have lost bits on systems with a 64-bit
    bsdipa_off_t and a 32-bit size_t (if any).
  - Adds an "is equal data" state.
  - Adds optional XZ (LZMA2, liblzma, XZ utils) _IO method support.
  - Adds I/O cookie support (so far only for XZ): a cookie can be reused
    by successive calls to diff/patch, which can dramatically reduce
    resource acquire/release cycles.
  - s-bsdipa is now a real program, with options, manual, test, etc.
  - Coverity.com (project 31371 / s-bsdipa) still sees us 0.0.

v0.7.0, 2025-02-19:
  - CHANGE: honour s_bsdipa_patch_ctx::pc_max_allowed_restored_len
    already on the s-bsdipa-io.h layer, directly after having called
    s_bsdipa_patch_parse_header().  (Ie, before the ".pc_patch_dat
    allocation" even.)
  - FIX for s-bsdipa example program: when compiled without NDEBUG
    it would munmap(2) an invalidated pointer/length combo.
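
The early pc_max_allowed_restored_len check of v0.7.0 can be pictured
as follows.  This is a sketch in Python, not the library code: the
header layout (four big-endian 64-bit lengths), the PatchRejected
exception, and the convention that a limit of 0 means "no limit" are
all assumptions made for illustration.

```python
import struct

# Hypothetical layout, for illustration only: four big-endian signed
# 64-bit lengths (control, diff, extra, restored data).
HEADER = struct.Struct(">qqqq")


class PatchRejected(Exception):
    """Raised when a patch header fails validation."""


def parse_header(patch, max_allowed_restored_len=0):
    """Validate the header before any patch data buffer is allocated."""
    if len(patch) < HEADER.size:
        raise PatchRejected("truncated header")
    ctrl_len, diff_len, extra_len, restored_len = HEADER.unpack_from(patch)
    if min(ctrl_len, diff_len, extra_len, restored_len) < 0:
        raise PatchRejected("negative length")
    # The early check; in this sketch a limit of 0 means "no limit".
    if max_allowed_restored_len and restored_len > max_allowed_restored_len:
        raise PatchRejected("restored data would exceed the allowed size")
    return ctrl_len, diff_len, extra_len, restored_len


if __name__ == "__main__":
    header = HEADER.pack(12, 100, 20, 120)
    print(parse_header(header, max_allowed_restored_len=120))
    try:
        parse_header(header, max_allowed_restored_len=119)
    except PatchRejected as exc:
        print("rejected:", exc)
```

Rejecting an oversized patch at this point means no memory is spent on
input that would be refused anyway.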

v0.6.1, 2025-02-17:
  - Coverity.com (project 31371 / s-bsdipa) FIXes for the s-bsdipa
    example program: one unused value, one fd resource leak.
    (Tool design changed without that being adopted in early design
    stage: obviously not enough iterations and/or too much fuzz.)
  - bsdipa_patch() CHANGE: until now field lengths were not verified
    in the (unusual) .pc_patch_dat==NULL case, as the user was expected
    to have done this before; instead, always check everything.
  -- Do not increment minor number nonetheless, no ABI change.

v0.6.0, 2025-01-31:
  - Adds struct s_bsdipa_patch_ctx::pc_max_allowed_restored_len, which
    allows configuring the maximum allowed size of the restored data.
    (Mostly for perl or other possible script/xy interfaces, the
    C interface as such has s_bsdipa_header::h_before_len ...)

v0.5.3, 2025-01-17:
  - FIXes totally false buffer usage blindly introduced to fix
    (correct .. but nonetheless false) cpantesters.org assertion
    failure.  (That is: it is binary data so NUL termination is a fake,
    .. but that is how it has to be, stupid!)
    What a mess.

v0.5.2, 2025-01-09:
  - CHANGE/FIX: ensure patch fits in _OFF_MAX, including control data.
    s_bsdipa_patch_parse_header() did verify that on the patch side,
    but on the diff side we did not yet care, as in theory the data
    could have been stored in individual chunks.
  - FIX: perl CPAN testers started failing (in a second round?)
    due to assertion failures regarding a missing SV_HAS_TRAILING_NUL.
    Therefore ensure our memory results have one additional byte, and
    always NUL terminate them.
  - more perl module creation related tweaks.

v0.5.1, 2025-01-05:
  - perl module creation related tweaks.

v0.5.0, 2024-12-26: (first release)

# s-ts-mode
