HeavyThing - toplip/toplip.asm

Jeff Marrison

	; ------------------------------------------------------------------------
	; HeavyThing x86_64 assembly language library and showcase programs
	; Copyright © 2015-2018 2 Ton Digital 
	; Homepage: https://2ton.com.au/
	; Author: Jeff Marrison <jeff@2ton.com.au>
	;       
	; This file is part of the HeavyThing library.
	;       
	; HeavyThing is free software: you can redistribute it and/or modify
	; it under the terms of the GNU General Public License, or
	; (at your option) any later version.
	;       
	; HeavyThing is distributed in the hope that it will be useful, 
	; but WITHOUT ANY WARRANTY; without even the implied warranty of
	; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
	; GNU General Public License for more details.
	;       
	; You should have received a copy of the GNU General Public License along
	; with the HeavyThing library. If not, see <http://www.gnu.org/licenses/>.
	; ------------------------------------------------------------------------
	;
	; toplip.asm: Command line file encryption/decryption utility that includes
	;   plausible deniability, PNG/JPG embedding, very strong anti-brute-forcing,
	;   very strong encryption.
	;
	; Full commentary below.
	;
	; usage: ./toplip [-b] [-d] [-r] [-m mediafile] [[-nomix|-drbg][-1][-c COUNT][-i ITER ]-alt inputfile] [-nomix|-drbg][-1][-c COUNT][-i ITER ]inputfile
	; -b == input/output in base64 (see below notes)
	; -d == decrypt the inputfile
	; -r == generate (and display of course) one time 48 byte each pass phrases as base64
	; -m mediafile == for encrypting only, merge the output into the specified mediafile.
	;      Valid media types: PNG, JPG (plain JFIF or EXIF).
	;      (Note that decrypting will auto-detect and attempt to extract if the inputfile for 
	;      decryption is given a media file).
	; -1 == for each input file (-alt or main), this option disables the use of cascaded
	;      AES256, and instead uses a single AES256 context (two for the XTS-AES stage).
	; -c COUNT == for each input file (-alt or main), this option overrides the default count
	;      of one (1) passphrase. Specifying a higher count here will ask for this many actual
	;      passphrases, and generate this number of separate key material and crypto contexts
	;      that are then used over-top of each other.
	; -i ITER == for each input file (-alt or main), specify an alternate iteration count
	;      for scrypt's internal use of PBKDF2-SHA512 (default is 1). For the initial 8192
	;      bytes of key material, and before one-way AES key grinding of same, we use scrypt
	;      and this option overrides how many iterations of PBKDF2-SHA512 it will perform
	;      for each passphrase. (NOTE: this can _dramatically_ increase the calc times).
	;      Hex values or decimal values permitted (e.g. 10, 0xfff, etc).
	; -drbg == for each input file, by default the 8192 bytes of key material is xor'd with
	;      TLSv1.2 PRF(SHA256) of the supplied passphrase(s). This option will mix the key
	;      material with HMAC_DRBG(SHA256) instead.
	; -nomix == for each input file (see -drbg), this option specifies no additional mixing
	;      of the scrypt generated 8192 byte key material.
	; -alt inputfile == generate one or more "Plausible Deniability" file (encrypting only)
	;      This will ask for another set of passphrases, which MUST NOT be the same.
	;      Without this option, three alternate contents are randomly generated such that it is
	;      impossible to tell by examining the encrypted output whether there is or is not
	;      anything other than pure random. See the -noalt option for what happens without
	;      this option. This option can be specified up to 3 times (for a max of 4 files).
	; -noalt == Do not generate additional random data. By default, extra random data is
	;      inserted into the encrypted output such that forensic analysis (with a valid set
	;      of passphrases) on a given encrypted output does not cover all of the ciphertext
	;      present. See further commentary below about why the default setting is a good thing.
	;      Specifying this option means that no extra random data is inserted into the output
	;      (and this might be useful if you do not need plausible deniability, or you are
	;      dealing with very large files).
	; if -b is specified for encrypt, base64 of the encrypted goods is output to stdout
	; if -b is specified for decrypt, it is assumed the input is base64, and plaintext is output to stdout
	;
	; NOTE: for base64 and media merging, all crypto input/output must be _BUFFERED IN MEMORY_.
	; NOTE 2: for plausible-deniability automatic extras, they will be IN MEMORY also, so for very large
	;      inputs, make sure you have enough VM.
	;
	; passphrase acquisition is done via stdin with stderr prompts if -r not specified
	;
	;
	; The reason this code exists is because there is a distinct lack of command-line crypto
	; that has the following features (top-level overview, keep reading for a more technical explanation):
	;    1) Plausible deniability: The ability to embed multiple payloads with different
	;       passphrase materials inside the same crypto block in a way that makes it impossible
	;       to conclusively identify the number of payloads that exist (if any). While this
	;       of course does not cover actual "Rubber-hose cryptanalysis", what it does provide
	;       are ways to plausibly open a crypted bundle with controlled exposure risk, and no
	;       way outside the rubber-hose method for an attacker to know positively whether MORE
	;       payloads exist or not.
	;    2) Multiple passphrase protection: The ability to, at encryption time, specify the number
	;       and complexity (iterations, mixing) of passphrases for a given payload. Used properly, this
	;       dramatically increases the difficulty level for password-based brute force recovery.
	;    3) No easily identified output markers or alignment: No "file header", and no fixed
	;       alignment for output means that it is impossible to determine by quick examination
	;       whether a file with otherwise-random data was produced by this code or not.
	;    4) The ability to embed and extract crypted materials in common image types (PNG/JPG):
	;       Unless you are working on crypto, it is unlikely that you have a reason to have a
	;       slew of files that look like random data in your posession. Placing important or
	;       otherwise private encrypted materials inside images means that a casual observer will
	;       not discover there is any extraneous data inside the image to begin with (and the same
	;       images can be opened/viewed without revealing they are carrying payloads).
	;    5) Simplified protection against brute force recovery: OpenSSL's enc/dec command line
	;       certainly works, but as has been documented repeatedly over the years, its key
	;       derivation function appears to still use MD5, and it still appears that we can't set
	;       even the iteration count it uses. We employ scrypt-SHA512 for the base key derivation
	;       and depending on command-line options, mix in either TLSv1.2 PRF or HMAC-DRBG with it,
	;       and allow command-line options to specify the PBKDF2 iteration count (that scrypt
	;       itself uses). In addition to this, we use the resultant 8192 bytes of key material
	;       as one-way sets of AES256 keys, and then use AES256 as a CSPRNG to further grind
	;       on the 8192 bytes of key material. Combined with the option to specify an arbitrary
	;       number of multiple passphrases per encrypted input file, this effectively renders
	;       password-based brute force recovery a very difficult exercise.
	;
	; Cryptographic design motivations:
	; 1) Passphrase-based key derivation
	;    Computationally expensive key derivation, combined with optional cascading of multiple keys
	;    was the primary objective. This was specifically to render high-speed passphrase brute forcing
	;    difficult, and in a user-specified way (with command line options that obviously need to be
	;    remembered along with passphrases themselves).
	; 
	;    Thanks to cryptocurrency proliferation, there are now a great many hash accelerators. Things
	;    like hashcat.net (hats off to that) and others mean that "simple" or straightforward hash based
	;    KDF don't provide the same level of protection for passphrases. By modifying scrypt to make use
	;    of HMAC-SHA512 in its initial and final stages of PBKDF2, and further by allowing arbitrary
	;    iterations counts for same, we can effectively render "purist" hash brute forcers ineffective.
	;    Further, that we obviously do not store any part of the "computed hash" (key material) in the
	;    output, means that all brute force attempts will require a minimum number of calculations,
	;    much in the same way that traditional disk encryption uses. By combining the output from
	;    scrypt-SHA512 with other PRFs and performing one-way AES256 key "grinding", the task for
	;    brute forcing is considerably more expensive than any single PBKDF method.
	;
	; 2) Plausible deniability payload discovery
	;    The ability to optionally embed multiple payloads inside the same contiguous output without
	;    disclosing the existence of same, each with their own separate passphrases/key materials.
	;
	; 3) Payload confidentiality
	;    Making use of XTS-AES in combination with our cascaded AES256, as well as cascaded XTS-AES
	;    itself.
	;
	; Technical design commentary:
	; 1) Underlying crypto methods
	;    No "new" or "homebrew" crypto has been used here. AES256, HMAC-SHA512, scrypt (and thus
	;    PBKDF2), TLSv1.2 PRF(SHA256), HMAC-DRBG(SHA256), and XTS-AES (via htxts) have been put together
	;    in carefully considered ways to achieve the aforementioned design goals.
	;
	;    As explained briefly above, each passphrase supplied is passed to scrypt-sha512 to derive 8192
	;    bytes of key material. Command line options allow for specifying a >1 PBKDF2 iteration count
	;    that the underlying scrypt function uses. Note that the SALT used for scrypt (and other mixing
	;    functions) is 32 bytes of PRNG output.
	;
	;    Depending on command line options, this 8192 bytes is then further manipulated (listed per opt):
	;      -nomix: Nothing further is done.
	;      -drbg: HMAC_DRBG(SHA256) is used, seeded with SALT || SHA512(passphrase), to generate an 
	;             additional 8192 bytes of key material, which is then xor'd with the scrypt output.
	;      default: TLSv1.2 PRF(SHA256) is used, secret = passphase, label = 'key derivation', seed =
	;               SALT to generate an additional 8192 bytes of key material, which is then xor'd with
	;               the scrypt output.
	;
	;    Note that our use of scrypt, TLSv1.2 PRF, and HMAC_DRBG for key derivation are not NIST/FIPS
	;    approved methods of key derivation.
	;
	;    Once the above has derived 8192 bytes of key material, that is passed to the "htcrypt" set of
	;    functions, which is an encapsulation of 256 separate AES256 encryption and decryption contexts.
	;    As explained in the htcrypt.inc commentary, htcrypt's init then takes the first 64 bytes of
	;    the scrypt-generated key material, and uses it as its "main sequencer." It then initializes
	;    256 AES256 encryption contexts using 32 bytes each of the original 8192 bytes (32x256 == 8192).
	;    Then, using the last 64 bytes of the original 8192 bytes of key material as a "temporary
	;    sequencer", it encrypts the full 8192 bytes of key material 64 times, using the temporary
	;    sequencer bytes as indexes to the AES256 encryption contexts initialized before. It performs
	;    this "reencrypting the 8192 bytes of key material 64 times" a full 1024 times. This is
	;    effectively using AES256 as a CSPRNG to further arrive at a computationally difficult final
	;    set of 8192 bytes key material.
	;
	;    Once the final 8192 bytes of key material is arrived at, htcrypt then initializes all 256
	;    separate AES256 encryption and decryption contexts using 32 bytes each in succession. Unless
	;    the -1 option is specified, for later-done encrypt operations, each call to htcrypt$encrypt 
	;    actually results in 64 separate AES256 block encryptions, and the contexts used are determined 
	;    by the initial "main sequencer" noted earlier in forward order. For decrypt operations, the 
	;    "main sequencer" bytes are used in reverse for the underlying AES256 decrypt calls. If the -1
	;    option is set, only one AES256 context is used (at the end of the 8192 bytes of generated
	;    key material), and thus does not use cascaded AES256 like the default.
	;
	;    For the 128 bytes of HEADER_OR_IV described below, htcrypt encrypt/decrypt is then used in
	;    a CBC manner. The 16 byte IV is randomly generated, and then for each set of htcrypt
	;    contexts (one per passphrase set), the HEADER = HEADER xor IV, htcrypt$encrypt HEADER, 
	;    IV = IV xor HEADER, htcrypt$encrypt IV. (see below for more detail)
	;
	;    For the actual payload encryption, we make use of the "htxts" set of functions, which is
	;    identical to AES-XTS except for the use of htcrypt$encrypt and htcrypt$decrypt instead of
	;    a single AES256 encrypt/decrypt. The initial XTS 16 byte tweak value is the unencrypted
	;    first 16 byte IV (randomly generated). The first block of htxts plaintext is prepended with
	;    64 bytes of additional random IV.
	;
	;    For integrity verification, an HMAC-SHA512 is appended to the output, along with a partial
	;    randomly sized garbage block (to prevent all outputs from being 16-byte aligned).
	;
	; 2) Effective Security
	;    Reiterated:
	;    No "new" or "homebrew" crypto has been used here. AES256, HMAC-SHA512, scrypt (and thus
	;    PBKDF2), TLSv1.2 PRF(SHA256), HMAC-DRBG(SHA256), and XTS-AES (via htxts) have been put together
	;    in carefully considered ways to achieve the aforementioned design goals.
	;
	;    The point of cascading AES256 was not necessarily to increase the security of AES256 in and
	;    of itself, but to mandate the use and generation of a full 8192 bytes of key material. At best
	;    this dramatically increases its security, and at worst it falls back to a single-key AES256.
	;    If -1 is specified, thereby disabling cascaded AES256, then all of the encrypt/decrypt ops
	;    are "by the book" and the key material used for the single (and double for XTS) AES256 is
	;    taken from the end of the 8192 bytes of key material, still enforcing the use of 8192 bytes.
	;    If -1 is not specified (the default), then the effective security at the block level for the
	;    HEADER_OR_IV portion is somewhere above a single 32 byte AES256 key, and for the XTS portion
	;    somewhere above two 32 byte keys, and for our requirements this is more than sufficient.
	;    Noting here that there is ongoing debate about what the actual effective security gains are
	;    from performing cascaded AES256, but for the purposes herein, regardless of how that debate
	;    turns out, the design goals are satisfied.
	;
	;    At the time of this writing:
	;    Since related key attacks do not apply to this design, at the worst our HEADER_OR_IV (see below)
	;    security sits at 2**254, and if cascaded AES256 is enabled possibly much higher still. For the
	;    XTS payload encryption, since it uses two full sets of AES256 keys, and at worst we are at a
	;    "non-key-halved" full security of XTS-AES. If indeed cascaded AES256 lends to a security
	;    increase, it could further be argued that the use of the 64 bytes of sequencing material
	;    also contribute to the overall security. Either way, our baseline effective security is more
	;    than sufficient for the design goals herein.
	;
	;    The command line options for constructing key material complexity provide for a much higher than
	;    "typical" level of passphrase-based security, and at the end of the day, the effective security
	;    is most likely tied to the passphrase(s) as it should be. (Noting of course you could provide
	;    extremely high entropy passphrases and shift it the other way.)
	;
	; 3) Encryption output format
	;    Each output starts with a 32 byte SALT, followed by eight 16 byte blocks, each of which can be
	;    used as an IV or a HEADER block. These eight blocks are randomly ordered and selected at encrypt
	;    time, and a HEADER consists of the start and end offsets in the overall crypto stream. Which 2
	;    blocks get used by a given stream is chosen at random. If plausible deniability is enabled (and
	;    thus more than one encryption stream exists), each one chooses its own 2 random blocks. For
	;    unused blocks, they are simply PRNG initialized.
	;
	;    The order of input files is randomized before encryption begins (which may include "bogus"
	;    files, and/or alternates).
	;
	;    Following the aforementioned 32+128 bytes, up to four separate htxts encrypted payloads follow.
	;    Each plaintext payload is prepended with 64 bytes of PRNG, appended with a block sized
	;    padding, followed by an HMAC_SHA512 of the plaintext, followed by a random length (1..15) of
	;    garbage data.
	;
	;    Note that the 64 byte PRNG "preamble" does not increase security in any way, because we are not
	;    making use of block chaining (XTS-AES method is used). We do however include it in the HMAC
	;    calculation, as well as use the trailing 4 bits as our padding indicator.
	;
	;    Thus:
	;          [0..31] == SALT
	;          [32..47] == HEADER_OR_IV[0]
	;          [48..63] == HEADER_OR_IV[1]
	;          [64..79] == HEADER_OR_IV[2]
	;          [80..95] == HEADER_OR_IV[3]
	;          [96..111] == HEADER_OR_IV[4]
	;          [112..127] == HEADER_OR_IV[5]
	;          [128..143] == HEADER_OR_IV[6]
	;          [144..159] == HEADER_OR_IV[7]
	;          [160...] == crypted materials and PRNG mix (one or more)
	;          crypted material HTXTS(
	;            [0..63] == PRNG output
	;            [64..X] == plaintext
	;            [X..block-size-padded] == PRNG
	;            HMAC-SHA512(plaintext)
	;            garbage (random length 1..15)
	;
	;    HEADER and IV indexes are randomly selected (the list of 8 is scrambled initially)
	;
	;    All of the HEADER_OR_IV entries are initialized with PRNG output, and then for each input file
	;    specified (depending on -alt, -noalt, etc), an IV and HEADER block is selected.
	;
	;    If -alt is specified once or more (thus plausible deniability, aka multiple sets of valid keys/files),
	;    for each file specified an additional HEADER and IV is chosen. The list of input files is also 
	;    randomized. Four minus the number of input files "bogus" files are added and are PRNG-only output.
	;
	;    If -alt is not specified, and -noalt is also not specified (thus, one set of keys and one file),
	;    then three additional PRNG "bogus" files are added to the list and is PRNG-only output.
	;
	;    If -noalt is specified, then the a single header/IV is chosen, but the actual encrypted contents
	;    begin at offset 160 and no extra random data (other than the garbage/padding above) is added.
	;
	; 4) Decryption discovery/HEADER validity checking pseudocode
	;    For x in 0..7
	;      For y in 0..7
	;        if y <> x
	;          copy HEADER_OR_IV[x] to tempbuf[0]
	;          copy HEADER_OR_IV[y] to tempbuf[1]
	;          foreach key in reverse-order keys (keys == passphrase-derived htcrypt contexts)
	;            key.decrypt(tempbuf[0])
	;            xor tempbuf[0] with tempbuf[1]
	;            key.decrypt(tempbuf[1])
	;            xor tempbuf[1] with tempbuf[0]
	;          if (both qwords of tempbuf[1] are within bounds of filesize, and low qword < high qword)
	;            initial XTS tweak = tempbuf[0]
	;            start_ofs = low qword of tempbuf[1]
	;            end_ofs = high qword of tempbuf[1]
	;            goto success
	;          end if
	;        end if
	;    if we made it through the loop and arrived here, fail.
	;    success:
	;    previous_key = null
	;    foreach key in forward-order keys (keys == passphrase-derived htcrypt contexts)
	;      key.tweak = initial XTS tweak (from above discovery loop result)
	;      if previous_key <> null
	;        previous_key.encrypt(key.tweak)
	;      end if
	;      previous_key = key
	;
	; 5) Hacking and brute force recovery notes
	;   5.1) Password/passphrase based
	;     Key derivation is intentionally expensive, and with command line options to increase its
	;     complexity manifold. At a minimum, for each set of derived key material, the discovery process
	;     outlined in section #4 above would need to be completed in order to validate a given set
	;     of passphrase inputs. (Of course, you could skip the discovery process and attempt the
	;     XTS-AES sections instead, but then you would be forced to guess the location and extents of
	;     the actual payload, versus the HEADER/IV discovery process in section #4 which would yield
	;     those values anyway and is simpler compared to the XTS-AES process itself.)
	;
	;     The defaults contained herein, like many other key derivation settings, are a compromise
	;     between acceptable user-experience delays and the associated computational difficulty of
	;     brute forcing. At these levels on a local development machine, approximately 1.7 attempts per
	;     second per CPU core are possible.
	;
	;     For non-default settings, especially where multiple passphrases and high iteration counts
	;     are specified, things quickly become intractable (at least on consumer-grade hardware). By
	;     increasing the iteration count to 100,000 on the same local development machine the time
	;     per attempt goes up to 23 seconds per CPU core per passphrase.
	;
	;   5.2) Encryption and XTS description
	;     The SALT is only used for key material generation, and is not used at all for any subsequent
	;     encryption operations. Inside htcrypt.inc, if you enable htcrypt_debug_keymaterial, it will
	;     output the sequence and 256 AES256 keys that it uses to stderr.
	;
	;     Per the discovery method outlined in #4 above, the IV and HEADER are "CBC-style" encrypted,
	;     and the PRNG-generated plaintext IV is used as the initial XTS tweak block for actual payload
	;     recovery. Further to this, 64 bytes of random data is prepended to the first XTS block's
	;     plaintext (though as mentioned above, these 64 bytes do not affect security).
	;
	;     What this effectively means is that the XTS enc/decrypt operations are linked to the CBC-style
	;     IV and HEADER portion due to the use of the decrypted IV as the initial XTS tweak value.
	;
	;     At the time of encryption, the plaintext IV (which is then used as the initial XTS tweak) is
	;     PRNG output. The "CBC-style" means that the IV + HEADER are encrypted in the following way
	;     (per key in forward order):
	;         xor HEADER with IV (which is initial random plaintext IV, or last ciphertext IV)
	;         encrypt HEADER (set last ciphertext to result)
	;         xor IV with HEADER (which is last ciphertext)
	;         encrypt IV (set last ciphertext to result)
	;     then if there are more keys in use, the IV is still the "last ciphertext", per normal CBC-mode
	;     operation, despite reiterating over the same two blocks. (See section #4 for more details.)
	;
	;     For an attacker who has nothing but output from this code to work with, and is attempting to
	;     extract plaintext payload(s), there is no way to discern which of the 8 initial 16 byte blocks
	;     are IV or HEADER, so the same discovery process outlined in #4 would have to apply to any
	;     attempts on the HEADER_OR_IV blocks themselves.
	;
	;     In order to recover any plaintext payload(s), all sets of key material (and their respective
	;     AES256 contexts along with their sequencing), the decrypted IV, and the decrypted file offsets
	;     (start/end) are required.
	;
	;     If -1 was specified for a given inputfile, then the "htxts" operations here do not use
	;     cascaded AES256, and instead do "normal" single AES256 encryption. The tweak encryption key
	;     and the data encryption key in this mode are chosen from the end of the final 8192 bytes of
	;     key material. If -1 was not specified, thus cascaded AES256 is enabled, then the "htxts"
	;     methods used here are still done per the XTS-AES standard, but instead of using a single
	;     AES256, we use our cascaded AES256 htcrypt operations described above, and the tweak key
	;     is the first unused (in the main sequence) AES256 context, thus providing at most 65 full
	;     AES256 contexts for a normal htxts operation. "At most" being that the htcrypt sequencing
	;     may actually contain duplicate indexes.
	;
	;     Note that if multiple passphrases (and thus key material) were specified for a given inputfile,
	;     that even if -1 is specified, these do end up in a cascaded manner for both the HEADER_OR_IV
	;     and the XTS portion as outlined above.
	;
	;     Unlike traditional disk-based XTS-AES, we use the initial 16 byte plaintext (PRNG-generated)
	;     IV for the initial tweak value. This has the pleasant side effect that two plaintexts encrypted
	;     with the same set of key materials will not result in identical outputs.
	;
	;     htxts encrypt does (per block, which is 2048 bytes each):
	;       AES256 encrypt tweak with htcrypt.aeskeys[tweak key index] (not cascaded)
	;       for i in 0..127
	;         xor subblock[i] with tweak
	;         htcrypt.encrypt(subblock[i])
	;         xor subblock[i] with tweak
	;         LSFRshift(tweak)
	;     so to encrypt a block, the above is run in forward order for each set of keys, modifying the
	;     tweak (per key, each key has its own unique tweak) in-place through all blocks.
	;
	;   5.3) scrypt key derivation implementation and mixing strategies
	;     The HeavyThing scrypt implementation is a reference one, with N=1024, r=1, p=1 except for the
	;     fact that instead of using SHA256 as scrypt-proper does, we use SHA512 to initialize the state,
	;     and then for the final output stage, again we use SHA512. Note that the PBKDF2 iteration counts
	;     that scrypt uses for its init and final output stages we allow to override the default of 1.
	;
	;     The reason that mixing strategies of the final scrypt key material are employed is twofold:
	;     First, that there may be some issue with the scrypt output itself, and secondly that it forces
	;     any attacker to implement multiple key generation techniques as we have done here.
	;
	;   5.4) PRNG implementation
	;     The underlying HeavyThing library uses a modified version of Agner Fog's SFMT and Mother-of-all
	;     generators. He specifically states that they are safe to use _provided_ that only the combined
	;     output is accessible to an attacker, and that a complete subsequence of the output (in our case
	;     1408 bytes) is never revealed. Care is taken to satisfy these requirements.
	;
	;   5.5) HMAC-SHA512 integrity verification
	;     The primary purpose for appending HMAC-SHA512 is to verify the integrity of the encrypted
	;     payload to ensure that no tampering or other bit errors occur. Since the payload encryption is
	;     done with XTS-AES, single bit errors do not corrupt the entirety that follows, and as such may
	;     not necessarily be self-evident without employing integrity verification.
	;

include '../ht_defaults.inc'
include '../ht.inc'

passphrase_default_count = 1
passphrase_default_iter = 1



globals
{
	do_enc		dd	1
	do_b64		dd	0
	do_pwd		dd	0	; bool as to whether we generate one-offs for passphrases
	do_cascaded	dd	1	; bool as to whether we use cascaded AES256 in htcrypt or not, default true.
	pcount		dd	passphrase_default_count	; how many passphrases we'll get
	piter		dd	passphrase_default_iter		; how many PBKDF2-SHA512 iterations we do for scrypt
	pmix		dd	2	; default key material mixer
	noalt		dd	0	; set if -noalt is passed
	next_is_inputfile	dd	0	; bool for argscan using list$foreach
	next_is_mediafile	dd	0	; bool for argscan using list$foreach
	next_is_count	dd	0	; bool for argscan
	next_is_iter	dd	0	; bool for argscan
	outbuf		dq	0	; buffer if we are outputting anything other than straight to stdout
	outmedia	dq	0	; if -m mediafile was specified, this is the source media to merge output with (privmapped)
	termios		dq	0	; heap allocated spot to save our initial termios (so we can +/-ECHO)
	inputfiles dq	0		; list of inputfiles to deal with
	rofs		dq	128
	headerblocks	dq	0	; list of 0..7, shuffled for encryption block selection
	salt		dq	0	; buffer to hold the 32 byte SALT
	headerbuf	dq	0	; buffer to hold the 128 byte HEADER_OR_IV section
	firstkey	dd	1	; flag as to whether we are getting the first key or not
}

	; this is called setup for a syscall_write
falign
output:
	cmp	qword [outmedia], 0
	jne	.output_buffer
	cmp	dword [do_b64], 0
	jne	.output_buffer
	; otherwise, syscall it is
	syscall
	ret
.output_buffer:
	; rsi + rdx is our desired output
	mov	rdi, [outbuf]
	call	buffer$append
	ret

calign
output_flush:
	cmp	qword [outmedia], 0
	jne	.tomedia
	cmp	dword [do_b64], 0
	jne	.tobase64
	ret
.tomedia:
	call	outmedia$merge
	ret
.tobase64:
	; convert outbuf to base64 and send to stdout
	call	buffer$new
	mov	r8, [outbuf]
	push	rax
	mov	rdi, rax
	mov	rsi, [r8+buffer_itself_ofs]
	mov	rdx, [r8+buffer_length_ofs]
	xor	ecx, ecx
	call	buffer$append_bintobase64_latin1
	mov	rcx, [rsp]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, [rcx+buffer_itself_ofs]
	mov	rdx, [rcx+buffer_length_ofs]
	syscall
	pop	rdi
	call	buffer$destroy
	ret


inputfile_name_ofs = 0
inputfile_privmapped_ofs = 8
inputfile_buffer_ofs = 16
inputfile_bogus_ofs = 24
inputfile_size_ofs = 32
inputfile_garbage_ofs = 40
inputfile_padlen_ofs = 44
inputfile_start_ofs = 48
inputfile_end_ofs = 56
inputfile_keys_ofs = 64
inputfile_mac_ofs = 72
inputfile_srcptr_ofs = 80
inputfile_totalsize_ofs = 88
inputfile_macbuf_ofs = 96
inputfile_pcount_ofs = 104
inputfile_piter_ofs = 112
inputfile_mix_ofs = 120
inputfile_cascaded_ofs = 128

inputfile_size = 136



	; single argument in rdi: 
falign
public inputfile$new
inputfile$new:
	push	rdi					; save a copy of hte name
	xor	esi, esi
	call	privmapped$new
	test	rax, rax
	jz	.nodeal
	push	rax					; save privmapped
	mov	edi, inputfile_size
	call	heap$alloc_clear
	pop	rdi rsi
	mov	[rax+inputfile_name_ofs], rsi
	mov	[rax+inputfile_privmapped_ofs], rdi
	mov	rcx, [rdi+privmapped_size_ofs]
	mov	[rax+inputfile_size_ofs], rcx
	add	rcx, 0xf
	and	rcx, not 0xf
	sub	rcx, [rax+inputfile_size_ofs]
	mov	[rax+inputfile_padlen_ofs], ecx
	; set pcount and piter to whatever is currently in the global context
	mov	r8d, [pcount]
	mov	r9d, [piter]
	mov	r10d, [pmix]
	mov	r11d, [do_cascaded]
	mov	[rax+inputfile_pcount_ofs], r8d
	mov	[rax+inputfile_piter_ofs], r9d
	mov	[rax+inputfile_mix_ofs], r10d
	mov	[rax+inputfile_cascaded_ofs], r11d
	; reset pcount and piter to their defaults
	mov	[pcount], passphrase_default_count
	mov	[piter], passphrase_default_iter
	mov	[pmix], 2
	mov	[do_cascaded], 1
	;
	push	rax
	mov	edi, 1
	mov	esi, 15
	call	rng$int
	mov	rcx, rax
	pop	rax
	mov	[rax+inputfile_garbage_ofs], ecx
	add	ecx, [rax+inputfile_padlen_ofs]
	add	ecx, 128
	add	rcx, [rax+inputfile_size_ofs]
	mov	[rax+inputfile_totalsize_ofs], rcx
	ret
calign
.nodeal:
	pop	rdi
	ret


falign
public inputfile$destroy
inputfile$destroy:
	push	rbx
	mov	rbx, rdi
	cmp	dword [rdi+inputfile_bogus_ofs], 0
	jne	.bogus
	mov	rdi, [rdi+inputfile_mac_ofs]
	call	hmac$destroy
	mov	rdi, [rbx+inputfile_macbuf_ofs]
	test	rdi, rdi
	jz	.nomacbuf
	call	heap$free_clear
.nomacbuf:
	mov	rdi, [rbx+inputfile_privmapped_ofs]
	call	privmapped$destroy
	mov	rdi, [rbx+inputfile_buffer_ofs]
	test	rdi, rdi
	jz	.nobuffer
	call	buffer$destroy
.nobuffer:
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, htcrypt$destroy
	call	list$clear
	mov	rdi, [rbx+inputfile_keys_ofs]
	call	heap$free
	lea	rdi, [rbx+inputfile_start_ofs]
	mov	esi, 16
	call	rng$block
	; leave the rest
	pop	rbx
	ret
calign
.bogus:
	; even though our bogus file is already PRNG, scramble it again
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rsi, [rdi+buffer_length_ofs]
	mov	rdi, [rdi+buffer_itself_ofs]
	call	rng$block
	mov	rdi, [rbx+inputfile_buffer_ofs]
	call	buffer$destroy
	; leave the rest
	pop	rbx
	ret


	; single argument in rdi: the size of the other actual input file, which we'll randomly pick our bogus size from
falign
public inputfile$new_bogus
inputfile$new_bogus:
	; so the actual size of a normal encryption is:
	; size rounded up to nearest 16 == padlen
	; + 64 IV
	; + 64 HMAC
	; + RNG(1..15 bytes)
	; so if we get a random value of our input file size
	; between 1 and our input file size, then calc the extra required space:
	mov	rsi, rdi
	mov	edi, 1
	call	rng$int
	mov	rcx, rax
	add	rcx, 0xf
	and	rcx, not 0xf
	; rcx now has our file size rounded up to the nearest blocklen
	add	rcx, 128
	push	rcx
	mov	edi, 1
	mov	esi, 15
	call	rng$int
	pop	rcx
	add	rax, rcx
	; now we have an accurate randomized bogus size
	push	rbx
	mov	rbx, rax
	mov	edi, inputfile_size
	call	heap$alloc_clear
	mov	rsi, rbx
	mov	rbx, rax
	mov	dword [rax+inputfile_bogus_ofs], 1
	mov	[rax+inputfile_size_ofs], rsi
	mov	[rax+inputfile_totalsize_ofs], rsi
	; next we need srcptr, and a random buffer
	call	buffer$new
	mov	[rbx+inputfile_buffer_ofs], rax
	mov	rdi, rax
	mov	rsi, [rbx+inputfile_size_ofs]
	call	buffer$reserve
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rsi, [rbx+inputfile_size_ofs]
	call	buffer$append_nocopy
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rsi, [rbx+inputfile_size_ofs]
	mov	rdi, [rdi+buffer_itself_ofs]
	mov	[rbx+inputfile_srcptr_ofs], rdi
	call	rng$block
	mov	rax, rbx
	pop	rbx
	ret



falign
public inputfile$load
inputfile$load:
	mov	rsi, [rdi+inputfile_privmapped_ofs]
	mov	rdx, [rsi+privmapped_base_ofs]
	mov	[rdi+inputfile_srcptr_ofs], rdx
	; if do_enc == 0 && do_b64 is 1, we need to base64 decode our input
	cmp	dword [do_enc], 0
	jne	.nothingtodo
	cmp	dword [do_b64], 1
	jne	.mediacheck
	; otherwise, base64 decode our goods
	push	rbx
	mov	rbx, rdi
	call	buffer$new
	mov	[rbx+inputfile_buffer_ofs], rax
	mov	rdi, rax
	mov	rsi, [rbx+inputfile_srcptr_ofs]
	mov	rdx, [rbx+inputfile_size_ofs]
	xor	ecx, ecx
	call	buffer$append_base64tobin_latin1
	; that returned us the number of bytes it wrote
	mov	[rbx+inputfile_size_ofs], rax
	; set our new srcptr to the buffer
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rsi, [rdi+buffer_itself_ofs]
	mov	[rbx+inputfile_srcptr_ofs], rsi
	add	rax, 0xf
	and	rax, not 0xf
	sub	rax, [rbx+inputfile_size_ofs]
	mov	[rbx+inputfile_padlen_ofs], eax
	add	eax, [rbx+inputfile_garbage_ofs]
	add	eax, 128
	add	rax, [rbx+inputfile_size_ofs]
	mov	[rbx+inputfile_totalsize_ofs], rax

	; populate the SALT with the first 32 bytes
	mov	rcx, [salt]
	mov	rdi, [rcx+buffer_itself_ofs]
	; rsi is still set to the srcptr
	mov	edx, 32
	mov	qword [rcx+buffer_length_ofs], 32
	lea	r8, [rdi+32]
	mov	[rcx+buffer_endptr_ofs], r8
	call	memcpy

	pop	rbx
.nothingtodo:
	ret
calign
.mediacheck:
	; not base64, so check and see if it is a recognized media type, and if so, see if we can
	; parse out our goods, otherwise leave it alone
	push	rbx r12 r13
	mov	rbx, rdi
	mov	r12, rdx		; privmapped_base_ofs
	mov	r13, [rsi+privmapped_size_ofs]
	mov	rax, 0xa1a0a0d474e5089
	cmp	r13, 41
	jb	.media_asis
	; see if it starts with a PNG header
	cmp	[r12], rax
	je	.maybe_png
	cmp	dword [r12], 0xe0ffd8ff
	je	.maybe_jfif
	cmp	dword [r12], 0xe1ffd8ff
	je	.maybe_exif
calign
.media_asis:
	; we still need to load up the salt:
	mov	rcx, [salt]
	mov	rdi, [rcx+buffer_itself_ofs]
	mov	rsi, r12
	mov	edx, 32
	mov	qword [rcx+buffer_length_ofs], 32
	lea	r8, [rdi+32]
	mov	[rcx+buffer_endptr_ofs], r8
	call	memcpy
	pop	r13 r12 rbx
	ret
calign
.maybe_exif:
	cmp	dword [r12+6], 'EXIF'
	je	.exif
	cmp	dword [r12+6], 'Exif'
	jne	.media_asis
.exif:
	; we do basically the same thing as for a normal JFIF
	call	buffer$new
	mov	[rbx+inputfile_buffer_ofs], rax
	; skip the app1
	movzx	eax, word [r12+4]
	xchg	ah, al
	add	eax, 4			; +2 for length, +2 for SOI
	add	r12, rax
	sub	r13, rax
calign
.exif_scan:
	cmp	word [r12], 0xecff
	je	.exif_app12
	; see if this is our app2 or SOS
	cmp	word [r12], 0xe2ff
	je	.jfif_done
	cmp	word [r12], 0xdaff
	je	.jfif_done
.exif_skip:
	; otherwise, skip this one, making sure we don't run past the end
	movzx	eax, word [r12+2]
	xchg	ah, al
	add	eax, 2
	cmp	rax, r13
	ja	.undo_jfif
	add	r12, rax
	sub	r13, rax
	jmp	.exif_scan
calign
.exif_app12:
	; make sure the byte at [r12+11] is 0
	cmp	byte [r12+11], 0
	jne	.exif_skip
	mov	rdi, [rbx+inputfile_buffer_ofs]
	lea	rsi, [r12+12]
	movzx	edx, word [r12+2]
	xchg	dh, dl
	sub	edx, 10
	call	buffer$append
	jmp	.exif_skip

calign
.maybe_jfif:
	cmp	dword [r12+6], 'JFIF'
	jne	.media_asis
	; it is a JFIF, create an input buffer, and scan the image
	call	buffer$new
	mov	[rbx+inputfile_buffer_ofs], rax
	; skip the app0
	movzx	eax, word [r12+4]
	xchg	ah, al
	add	eax, 4			; +2 for length, +2 for SOI
	add	r12, rax
	sub	r13, rax
calign
.jfif_scan:
	cmp	word [r12], 0xecff
	je	.jfif_app12
	; see if this is our SOS
	cmp	word [r12], 0xdaff
	je	.jfif_done
.jfif_skip:
	; otherwise, skip this one, making sure we don't run past the end
	movzx	eax, word [r12+2]
	xchg	ah, al
	add	eax, 2
	cmp	rax, r13
	ja	.undo_jfif
	add	r12, rax
	sub	r13, rax
	jmp	.jfif_scan
dalign
.ducky:
	db	'Ducky',0
dalign
.pictureinfo:
	db	'PictureInfo',0
calign
.jfif_app12:
	; if the identifier (at [r12+4]) == 'Ducky'0 or 'PictureInfo'0, skip
	lea	rdi, [r12+4]
	mov	rsi, .ducky
	mov	edx, 6
	call	memcmp
	test	eax, eax
	jz	.jfif_skip
	lea	rdi, [r12+4]
	mov	rsi, .pictureinfo
	mov	edx, 12
	call	memcmp
	test	eax, eax
	jz	.jfif_skip
	; make sure the byte at [r12+11] is 0
	cmp	byte [r12+11], 0
	jne	.jfif_skip
	mov	rdi, [rbx+inputfile_buffer_ofs]
	lea	rsi, [r12+12]
	movzx	edx, word [r12+2]
	xchg	dh, dl
	sub	edx, 10
	call	buffer$append
	jmp	.jfif_skip
calign
.undo_jfif:
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	qword [rbx+inputfile_buffer_ofs], 0
	call	buffer$destroy
	pop	r13 r12 rbx
	ret
calign
.jfif_done:
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rax, [rdi+buffer_length_ofs]
	mov	[rbx+inputfile_size_ofs], rax
	; set our new srcptr to the buffer
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rsi, [rdi+buffer_itself_ofs]
	mov	[rbx+inputfile_srcptr_ofs], rsi
	add	rax, 0xf
	and	rax, not 0xf
	sub	rax, [rbx+inputfile_size_ofs]
	mov	[rbx+inputfile_padlen_ofs], eax
	add	eax, [rbx+inputfile_garbage_ofs]
	add	eax, 128
	add	rax, [rbx+inputfile_size_ofs]
	mov	[rbx+inputfile_totalsize_ofs], rax

	; populate the SALT with the first 32 bytes
	mov	rcx, [salt]
	mov	rdi, [rcx+buffer_itself_ofs]
	; rsi is still set to the srcptr
	mov	edx, 32
	mov	qword [rcx+buffer_length_ofs], 32
	lea	r8, [rdi+32]
	mov	[rcx+buffer_endptr_ofs], r8
	call	memcpy

	pop	r13 r12 rbx
	ret

calign
.maybe_png:
	cmp	dword [r12+12], 'IHDR'
	jne	.media_asis
	; otherwise, we have a PNG file... create an inputfile buffer to store the goods
	; and then walk forward until we find a private chunk that matches our chunk naming
	; convention
	call	buffer$new
	mov	[rbx+inputfile_buffer_ofs], rax
	add	r12, 8
	sub	r13, 16
	mov	ecx, [r12]		; length of the IHDR chunk
	bswap	ecx
	add	r12, 8
	cmp	rcx, r13
	jae	.png_bad
	cmp	ecx, 13
	jne	.png_bad
	; otherwise, 13+4 bytes for the crc to skip the IHDR
	add	r12, 17
	sub	r13, 17
	; commence scanning for our crypto'd chunk
calign
.png_scan:
	cmp	r13, 12
	jb	.png_bad
	mov	eax, dword [r12]
	bswap	eax
	mov	r10d, eax
	add	eax, 12
	cmp	rax, r13
	ja	.png_bad
	movzx	ecx, byte [r12+4]
	movzx	edx, byte [r12+5]
	movzx	r8d, byte [r12+6]
	movzx	r9d, byte [r12+7]
	cmp	ecx, 'a'
	jb	.png_next
	cmp	ecx, 'z'
	ja	.png_next
	cmp	edx, 'a'
	jb	.png_next
	cmp	edx, 'z'
	ja	.png_next
	cmp	r8d, 'Z'
	ja	.png_next
	cmp	r8d, 'A'
	jb	.png_next
	cmp	r9d, 'a'
	jb	.png_next
	cmp	r9d, 'z'
	ja	.png_next
	; otherwise, we have a private chunk that fits the bill
	; its data is at r12+8, its length is sitting in r10
	mov	rdi, [rbx+inputfile_buffer_ofs]
	lea	rsi, [r12+8]
	mov	edx, r10d
	push	r10
	call	buffer$append
	pop	rax
	mov	[rbx+inputfile_size_ofs], rax
	; set our new srcptr to the buffer
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	rsi, [rdi+buffer_itself_ofs]
	mov	[rbx+inputfile_srcptr_ofs], rsi
	add	rax, 0xf
	and	rax, not 0xf
	sub	rax, [rbx+inputfile_size_ofs]
	mov	[rbx+inputfile_padlen_ofs], eax
	add	eax, [rbx+inputfile_garbage_ofs]
	add	eax, 128
	add	rax, [rbx+inputfile_size_ofs]
	mov	[rbx+inputfile_totalsize_ofs], rax

	; populate the SALT with the first 32 bytes
	mov	rcx, [salt]
	mov	rdi, [rcx+buffer_itself_ofs]
	; rsi is still set to the srcptr
	mov	edx, 32
	mov	qword [rcx+buffer_length_ofs], 32
	lea	r8, [rdi+32]
	mov	[rcx+buffer_endptr_ofs], r8
	call	memcpy

	pop	r13 r12 rbx
	ret
calign
.png_next:
	add	r12, rax
	sub	r13, rax
	jz	.undo_png
	jmp	.png_scan
calign
.undo_png:
	; in the _extremely_ unlikely case that we matched a PNG header, but it was
	; really crypto output (haha), just undo our buffering and let the decode
	; proceed with the input as-is
	mov	rdi, [rbx+inputfile_buffer_ofs]
	mov	qword [rbx+inputfile_buffer_ofs], 0
	call	buffer$destroy
	pop	r13 r12 rbx
	ret
calign
.png_bad:
	mov	rdi, .pngcorrupted
	call	string$to_stderrln
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .pngcorrupted, 'PNG image format parse of inputfile failed.'
	


	; single argument in rdi, the inputfile we are hanging the keys on
falign
public inputfile$keygen
inputfile$keygen:
	cmp	dword [rdi+inputfile_bogus_ofs], 0
	jne	.nothingtodo
	push	rbx r12 r13 r14 r15
	mov	rbx, rdi
	mov	r12d, [rdi+inputfile_pcount_ofs]	; how many passphrases we are getting/generating
	mov	r13d, 1					; the current # for display purposes

	; create our keys list
	call	list$new
	mov	[rbx+inputfile_keys_ofs], rax

	; make room for a full block on our stack
	sub	rsp, htxts_blocksize

	cmp	dword [do_pwd], 0
	je	.pwd_acquire
calign
.generateloop:
	cmp	dword [firstkey], 1
	je	.skiplf1
	mov	eax, syscall_write
	mov	edi, 2
	mov	dword [rsp], 10
	mov	rsi, rsp
	mov	edx, 1
	syscall
.skiplf1:
	mov	dword [firstkey], 0

	mov	rdi, [rbx+inputfile_name_ofs]
	call	string$to_stderr

	mov	rdi, .passphrase_preface
	call	string$to_stderr

	; generate our current passphrase # and display that
	mov	edi, r13d
	mov	esi, 10
	call	string$from_unsigned
	push	rax
	mov	rdi, rax
	call	string$to_stderr
	pop	rdi
	call	heap$free

	mov	rdi, .passphrase_postface
	call	string$to_stderr

	; generate 48 bytes of RNG for each and display them
	lea	rdi, [rsp+(htxts_blocksize-96)]
	mov	esi, 96
	call	rng$block
	; encode the first as our passphrase
	lea	rdi, [rsp+(htxts_blocksize-96)]
	mov	esi, 48
	mov	rdx, rsp
	xor	ecx, ecx
	call	base64$encode_latin1
	; base64_linebreaks is set by default for the HeavyThing library
	; which means it added a CRLF to the end, we need to change it to a single LF
	sub	rax, 1
	mov	byte [rsp+rax-1], 10
	; dump that to stderr
	mov	rdx, rax
	mov	eax, syscall_write
	mov	edi, 2
	mov	rsi, rsp
	push	rdx
	sub	rdx, 1		; skip the LF
	syscall
	pop	rax

	mov	r11, [salt]
	mov	rdx, rsp	; passphrase
	mov	ecx, eax	; length of same
	mov	r8, [r11+buffer_itself_ofs] ; salt
	mov	r9d, 32		; length of same
	mov	r10d, [rbx+inputfile_piter_ofs]
	; save the location of the original passphrase for mixing
	mov	r14, rsp
	mov	r15d, eax
	sub	rsp, 8192
	mov	rdi, rsp
	mov	esi, 8192
	call	scrypt_iter
	; deal with mixing
	mov	rdi, rsp
	mov	eax, [rbx+inputfile_mix_ofs]
	shl	eax, 3
	mov	rsi, [rax+.mixdispatch]
	call	rsi
	mov	rdi, rsp
	call	htcrypt$new_keymaterial
	; if cascaded AES256 is disabled, set the htcrypt's x var to 255
	mov	ecx, [rax+htcrypt_x_ofs]
	mov	edx, 255
	cmp	dword [rbx+inputfile_cascaded_ofs], 0
	cmove	ecx, edx
	mov	[rax+htcrypt_x_ofs], ecx
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, rax
	call	list$push_back
	; clear our stack key material
	mov	rdi, rsp
	mov	esi, 8192
	call	rng$block
	add	rsp, 8192

	; randomize our block/stackframe (which also discards htxts_blocksize worth of RNG state/sequence)
	mov	rdi, rsp
	mov	esi, htxts_blocksize
	call	rng$block

	; update counters, proceed
	add	r13d, 1
	sub	r12d, 1
	jnz	.generateloop

	add	rsp, htxts_blocksize
	pop	r15 r14 r13 r12 rbx
	ret
cleartext .passphrase_preface, '   Passphrase #'
cleartext .passphrase_postface, ': '
cleartext .badpass, 10,'unable to acquire passphrase'
cleartext .keygen, 'generating keys...'
cleartext .donemsg, 'Done'
dalign
.mixdispatch:
	dq	.nomix, .drbgmix, .tlsprfmix
falign
.nomix:
	ret
falign
.drbgmix:
	; rdi == pointer to 8192 bytes of key material output from scrypt
	; r14 == pointer to original passphrase
	; r15d == length of same

	; we want to seed an HMAC_DRBG(SHA256) with a SHA512 of our passphrase
	; and then generate a _separate_ 8192 bytes of key material with that
	; and then xor mix it in with the scrypt output
	push	r12 r13
	sub	rsp, 8192
	mov	r12, rdi		; save our original scrypt output pointer
	
	; copy the salt to the first 32 bytes of the stack
	mov	rcx, [salt]
	mov	rdi, rsp
	mov	rsi, [rcx+buffer_itself_ofs]
	mov	edx, 32
	call	memcpy

	call	sha512$new
	mov	r13, rax
	mov	rdi, rax
	mov	rsi, r14
	mov	edx, r15d
	call	sha512$update
	mov	rdi, r13
	lea	rsi, [rsp+32]
	mov	edx, 1			; cleanup the sha512 state
	call	sha512$final

	; next up, init an HMAC_DRBG and seed it with the 64 byte sha512 final
	mov	rdi, hmac$init_sha256
	mov	rsi, rsp
	mov	edx, 96
	call	hmac_drbg$new
	mov	r13, rax
	; next up, generate 8192 bytes with that
	mov	rdi, rax
	mov	rsi, rsp
	mov	edx, 8192
	call	hmac_drbg$generate
	; cleanup our drbg
	mov	rdi, r13
	call	hmac_drbg$destroy
	; mix (xor) the results
	mov	rdi, r12
	mov	rsi, rsp
	mov	edx, 8192
	call	memxor
	; scramble our stackframe
	mov	rdi, rsp
	mov	esi, 8192
	call	rng$block
	; done, dusted.
	add	rsp, 8192
	pop	r13 r12
	ret
dalign
.tlsprflabel:
	db	'key derivation'	; 14 bytes
falign
.tlsprfmix:
	; rdi == pointer to 8192 bytes of key material output from scrypt
	; r14 == pointer to original passphrase
	; r15d == length of same

	; we want to mix our scrypt output with the TLSv1.2 PRF(SHA256)
	; secret = user supplied passphrase
	; label = 'key derivation'
	; seed = SALT
	push	r12 r13
	sub	rsp, 80
	; our concatenated data starts with our label for 14 bytes
	mov	rax, qword [.tlsprflabel]
	mov	rcx, qword [.tlsprflabel+8]
	mov	[rsp], rax
	mov	[rsp+8], rcx
	mov	r12, rdi		; save our original scrypt output pointer

	; copy our SALT to [rsp+14]
	mov	rcx, [salt]
	lea	rdi, [rsp+14]
	mov	rsi, [rcx+buffer_itself_ofs]
	mov	edx, 32
	call	memcpy
	
	; next up, create an HMAC_SHA256 (so we can use the phash function of it, which is TLSv1.2 PRF)
	call	hmac$new_sha256
	mov	r13, rax
	; the hmac key gets set to our user supplied passphrase
	mov	rdi, rax
	mov	rsi, r14
	mov	edx, r15d
	call	hmac$key
	; now we can call phash_xor directly
	mov	rdi, r13
	mov	rsi, r12		; the scrypt original
	mov	edx, 8192
	mov	rcx, rsp
	mov	r8d, 46			; 14 bytes for label, 32 bytes for SALT
	call	hmac$phash_xor
	; cleanup our hmac state
	mov	rdi, r13
	call	hmac$destroy
	; randomize our stack
	mov	rdi, rsp
	mov	esi, 80
	call	rng$block
	; done, dusted.
	add	rsp, 80
	pop	r13 r12
	ret

calign
.badpassphrase:
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, htcrypt$destroy
	call	list$clear
	mov	rdi, rsp
	mov	esi, htxts_blocksize
	call	rng$block
	mov	rdi, .badpass
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
calign
.pwd_acquire:
	; acquire passphrase
	cmp	dword [firstkey], 1
	je	.skiplf2
	mov	eax, syscall_write
	mov	edi, 2
	mov	dword [rsp], 10
	mov	rsi, rsp
	mov	edx, 1
	syscall
.skiplf2:
	mov	dword [firstkey], 0

	mov	rdi, [rbx+inputfile_name_ofs]
	call	string$to_stderr

	mov	rdi, .passphrase_preface
	call	string$to_stderr

	; generate our current passphrase # and display that
	mov	edi, r13d
	mov	esi, 10
	call	string$from_unsigned
	push	rax
	mov	rdi, rax
	call	string$to_stderr
	pop	rdi
	call	heap$free

	mov	rdi, .passphrase_postface
	call	string$to_stderr

	mov	eax, syscall_read
	mov	edi, 0
	mov	rsi, rsp
	mov	edx, htxts_blocksize
	syscall
	cmp	rax, 0
	jle	.badpassphrase
	push	rax
	mov	rdi, .keygen
	call	string$to_stderr
	pop	rax

	mov	r11, [salt]
	mov	rdx, rsp	; passphrase
	mov	ecx, eax	; length of same
	mov	r8, [r11+buffer_itself_ofs] ; salt
	mov	r9d, 32		; length of same
	mov	r10d, [rbx+inputfile_piter_ofs]
	; save the location of the original passphrase for mixing
	mov	r14, rsp
	mov	r15d, eax
	sub	rsp, 8192
	mov	rdi, rsp
	mov	esi, 8192
	call	scrypt_iter
	; deal with mixing
	mov	rdi, rsp
	mov	eax, [rbx+inputfile_mix_ofs]
	shl	eax, 3
	mov	rsi, [rax+.mixdispatch]
	call	rsi
	mov	rdi, rsp
	call	htcrypt$new_keymaterial
	; if cascaded AES256 is disabled, set the htcrypt's x var to 255
	mov	ecx, [rax+htcrypt_x_ofs]
	mov	edx, 255
	cmp	dword [rbx+inputfile_cascaded_ofs], 0
	cmove	ecx, edx
	mov	[rax+htcrypt_x_ofs], ecx
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, rax
	call	list$push_back
	; clear our stack key material
	mov	rdi, rsp
	mov	esi, 8192
	call	rng$block
	add	rsp, 8192

	mov	rdi, .donemsg
	call	string$to_stderr

	; randomize our block/stackframe (which also discards htxts_blocksize worth of RNG state/sequence)
	mov	rdi, rsp
	mov	esi, htxts_blocksize
	call	rng$block

	; update our counters/proceed
	add	r13d, 1
	sub	r12d, 1
	jnz	.pwd_acquire

	add	rsp, htxts_blocksize
	pop	r15 r14 r13 r12 rbx
	ret
calign
.nothingtodo:
	ret




	; single argument in rdi: inputfile
falign
public inputfile$extents
inputfile$extents:
	mov	rax, [rofs]
	mov	rcx, [rdi+inputfile_totalsize_ofs]
	mov	[rdi+inputfile_start_ofs], rax
	add	rax, rcx
	mov	[rdi+inputfile_end_ofs], rax
	mov	[rofs], rax
	ret




	; single argument in rdi: inputfile
falign
public inputfile$headeriv
inputfile$headeriv:
	; so, everything has been setup, headerblocks is shuffled, and headerbuf contains
	; the 128 byte RNG that we are ultimately messing with
	cmp	dword [rdi+inputfile_bogus_ofs], 0
	jne	.nothingtodo		; bogus files == we don't touch any HEADER_OR_IV blocks, and leave them as PRNG
	push	rbx r12 r13 r14 r15
	mov	rbx, rdi
	; get our IV location
	mov	rdi, [headerblocks]
	call	list$pop_front

	mov	rdi, [headerbuf]
	shl	eax, 4			; index * 16 is the offset
	mov	rsi, [rdi+buffer_itself_ofs]
	lea	r12, [rsi+rax]		; pointer offset into the 128 byte HEADER_OR_IV blocks for our IV
	; do the same again for our HEADER spot
	mov	rdi, [headerblocks]
	call	list$pop_front

	mov	rdi, [headerbuf]
	shl	eax, 4
	mov	rsi, [rdi+buffer_itself_ofs]
	lea	r13, [rsi+rax]		; pointer offset into the 128 byte HEADER_OR_IV blocks for our HEADER

	; grab our tweak xor value as the unmolested initial 16 byte PRNG output
	mov	rax, [r12]
	mov	rcx, [r12+8]
	; set our HEADER qwords to our start and end offsets
	mov	r8, [rbx+inputfile_start_ofs]
	mov	r9, [rbx+inputfile_end_ofs]
	; store them in the right spot (r13)
	mov	[r13], r8
	mov	[r13+8], r9

	; for each set of keys, initial tweaks and do our xor + encrypt, which goes (foreach key):
	; (populate and/or encrypt the initial tweak)
	; xor HEADER with IV
	; encrypt HEADER with current key
	; xor IV with resultant HEADER
	; encrypt IV with current key
	; (move to next key or done)

	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	r14, [rdi+_list_first_ofs]
	xor	r15d, r15d
	; note: we do not use list$foreach here because we need to pass multiple args
	; (r12 and r13)
	; store rax/rcx on the stack so we can propagate it per-key
	push	rcx rax
calign
.foreach_key:
	; first up, populate and possibly encrypt this key's initial tweak
	mov	rdi, [r14+_list_valueofs]
	mov	rax, [rsp]
	mov	rcx, [rsp+8]
	mov	[rdi+htcrypt_user_ofs], rax
	mov	[rdi+htcrypt_user_ofs+8], rcx
	; if r15 is set, then there is a previous set of keys, use those to encrypt it
	; such that every htcrypt context gets a different initial tweak
	test	r15, r15
	jz	.foreach_key_skiptweakencrypt
	lea	rsi, [rdi+htcrypt_user_ofs]
	mov	rdi, [r15+_list_valueofs]
	call	htcrypt$encrypt
.foreach_key_skiptweakencrypt:

	; IV is at [r12], HEADER is at [r13]
	; xor the HEADER with the IV
	mov	rax, [r12]
	mov	rcx, [r12+8]
	xor	[r13], rax
	xor	[r13+8], rcx
	; encrypt the header with this key
	mov	rdi, [r14+_list_valueofs]
	mov	rsi, r13
	call	htcrypt$encrypt
	; xor IV with the resultant HEADER
	mov	rax, [r13]
	mov	rcx, [r13+8]
	xor	[r12], rax
	xor	[r12+8], rcx
	; encrypt IV with this key
	mov	rdi, [r14+_list_valueofs]
	mov	rsi, r12
	call	htcrypt$encrypt
	; next, or done
	mov	r15, r14
	mov	r14, [r14+_list_nextofs]
	test	r14, r14
	jnz	.foreach_key
	pop	rax rcx

	; allocate our hmac
	sub	rsp, 32
	call	hmac$new_sha512
	mov	[rbx+inputfile_mac_ofs], rax

	; get our first set of keys out so we can read back the unencrypted initial tweak
	mov	rdi, [rbx+inputfile_keys_ofs]
	xor	esi, esi
	call	list$index
	mov	r11, rax

	; copy our HEADER and our initial tweak and use that as our HMAC key, noting here
	; that the HMAC we are using is only for integrity checking and not for authenticity

	mov	r8, [rbx+inputfile_start_ofs]
	mov	r9, [rbx+inputfile_end_ofs]
	mov	r10, [r11+htcrypt_user_ofs]
	mov	r11, [r11+htcrypt_user_ofs+8]
	mov	[rsp], r8
	mov	[rsp+8], r10
	mov	[rsp+16], r11
	mov	[rsp+24], r9
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	edx, 32
	xor	r10d, r10d
	xor	r11d, r11d
	call	hmac$key
	mov	rdi, rsp
	mov	esi, 32
	call	rng$block
	add	rsp, 32

	; go ahead and compute our HMAC
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, [rbx+inputfile_srcptr_ofs]
	mov	rdx, [rbx+inputfile_size_ofs]
	call	hmac$data

	; done, dusted.
	pop	r15 r14 r13 r12 rbx
	ret
calign
.nothingtodo:
	ret




	; single argument in rdi: our inputfile
falign
public inputfile$encrypt
inputfile$encrypt:
	cmp	dword [rdi+inputfile_bogus_ofs], 0
	jne	.bogus
	push	rbp rbx r12 r13 r14 r15
	mov	rbx, rdi
	mov	rbp, [rdi+inputfile_srcptr_ofs]
	mov	r15, [rdi+inputfile_size_ofs]

	sub	rsp, htxts_blocksize

	; first up, do our 64 byte IV at the start, and then a partial htxts block
	mov	rdi, rsp
	mov	esi, 64
	call	rng$block

	; we need to store our padding byte value at [63] so that when decrypt occurs
	; it can reverse-determine the correct output length, but rather than clear
	; the upper bits, we'll just set the lower 4 bits of byte 63

	mov	eax, [rbx+inputfile_padlen_ofs]
	movzx	ecx, byte [rsp+63]
	and	ecx, 0xf0
	or	ecx, eax
	mov	byte [rsp+63], cl

	; add the 64 bytes preamble to the _end_ of the HMAC
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	edx, 64
	call	hmac$data
	; allocate and compute our mac final
	mov	edi, 64
	call	heap$alloc
	mov	[rbx+inputfile_macbuf_ofs], rax
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rax
	call	hmac$final

	; now we can fill the remainder of the block with our input
	mov	rcx, htxts_blocksize - 64
	lea	rdi, [rsp+64]
	mov	rsi, rbp
	mov	rdx, r15
	cmp	r15, rcx
	cmova	rdx, rcx
	lea	r14, [rdx+64]	; save the total length of the first block
	cmp	r15, rdx
	je	.encrypt_lastblock
	; otherwise, not the last block, so encrypt this one
	call	memcpy

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output to stdout
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	rdx, r14
	call	output
	; our r15/rbp needs updating by the r14-64 result
	sub	r14, 64
	add	rbp, r14
	sub	r15, r14
	; now we can proceed with the rest of the blocks, we know that r15 is nonzero
calign
.encrypt_loop:
	mov	ecx, htxts_blocksize
	mov	rdi, rsp
	mov	rsi, rbp
	mov	rdx, r15
	cmp	r15, rcx
	cmova	rdx, rcx
	mov	r14, rdx	; save the length of this block
	cmp	r15, rdx
	je	.encrypt_lastblock
	; otherwise, not the last block, so encrypt this one
	call	memcpy

	; encrypt the block with all the keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output to stdout
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	rdx, r14
	call	output
	; update r15/rbp
	add	rbp, r14
	sub	r15, r14
	; we know it is nonzero
	jmp	.encrypt_loop
calign
.encrypt_lastblock:
	; memcpy has not occurred yet, but we can go ahead and let it:
	call	memcpy
	; just in case we are splitting a block, allocate 64 bytes
	mov	edi, 64
	call	heap$alloc
	mov	r15, rax
	; so r14 has the length that we populated
	; several case scenarios:
	; 1) we are sitting neatly on an end of htxts_blocksize
	; 2) we don't have enough room to add our HMAC + garbage
	; 3) we have enough room
	mov	eax, htxts_blocksize
	sub	eax, r14d
	jz	.encrypt_fullblock	; this can only happen if padding length == 0
	; otherwise, we need padding length + 64 + garbage block worth of room left
	; eax is how much space is left in our block
	cmp	eax, [rbx+inputfile_padlen_ofs]
	je	.encrypt_fullblock	; padding length neatly hit the end of a block
	mov	ecx, [rbx+inputfile_padlen_ofs]
	; we need room for paddinglength + 64 + 16
	add	ecx, 80
	cmp	ecx, eax
	ja	.encrypt_splitblock
	; otherwise, there is room left for all our goods, we need to place our hmac
	; at rsp+r14+paddinglength for 64 bytes
	; place our hmac final
	mov	ecx, [rbx+inputfile_padlen_ofs]
	lea	rdi, [rsp+r14]
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, 64
	add	rdi, rcx
	call	memcpy

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; now determine our final write length, which is r14+paddinglength+64+garbagelength
	mov	r8d, [rbx+inputfile_padlen_ofs]
	mov	ecx, [rbx+inputfile_garbage_ofs]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp

	mov	rdx, r14
	add	rdx, r8
	add	rdx, 64
	add	rdx, rcx
	call	output

	; done, dusted.
	jmp	.done
calign
.encrypt_fullblock:
	; padding length was zero, _or_ padding lenght neatly put us at the end of a full block

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output the full block to stdout
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, htxts_blocksize
	call	output

	; now our hmac goes at the start of our new block
	mov	rdi, rsp
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, 64
	call	memcpy
	; encrypt this as an entire new block, noting here that we are re-encrypting the remaining
	; contents of the previous encrypted block (which will remain in our garbage padding area)

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output 64 + our garbage length
	mov	ecx, [rbx+inputfile_garbage_ofs]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	rdx, 64
	add	rdx, rcx
	call	output
	; we are all done.
	jmp	.done
calign
.encrypt_splitblock:
	; there is not enough room left in the current block to hold 80 more bytes after padding
	sub	eax, [rbx+inputfile_padlen_ofs]
	; we need to add same to r14 so we know how much is used
	add	r14d, [rbx+inputfile_padlen_ofs]
	; we know that eax is nonzero, or fullblock would have been the result
	; so the next step is to put our hmac output in a temporary, and then copy bits and pieces
	cmp	eax, 64
	jb	.encrypt_reallysplit
	; otherwise, there is room for our entire hmac, but _not_ enough room for the garbage afterwards
	lea	rdi, [rsp+r14]
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, 64
	call	memcpy
	; encrypt this full block

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output the full block to stdout
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, htxts_blocksize
	call	output
	; now encrypt the full block once more, noting we are only producing garbage

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output only the garbage length
	mov	edx, [rbx+inputfile_garbage_ofs]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	call	output
	; we are all done.
	jmp	.done
calign
.encrypt_reallysplit:
	lea	rdi, [rsp+r14]
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, eax		; how many bytes are actually left
	mov	r15d, 64
	sub	r15d, eax
	lea	rbp, [rsi+rax]
	call	memcpy
	; encrypt and send the full block

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output the full block to stdout
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, htxts_blocksize
	call	output

	; put what is left of our hmac
	mov	rdi, rsp
	mov	rsi, rbp
	mov	edx, r15d
	call	memcpy
	; encrypt this entire block

	; encrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .encrypt_block
	mov	rdx, rsp
	call	list$foreach_arg

	; output the remainder + garbage length
	mov	edx, [rbx+inputfile_garbage_ofs]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	add	rdx, r15
	call	output
	; we are all done, fallthrough:
calign
.done:
	; whatever is sitting on the stack is already public
	; so we don't need to worry about randomizing it again
	add	rsp, htxts_blocksize
	pop	r15 r14 r13 r12 rbx rbp
	ret

	; despite being inline here, called as external function from list$foreach_arg
falign
.encrypt_block:
	; rdi == our htcrypt context, rsi == pointer to block we are encrypting
	lea	rdx, [rdi+htcrypt_user_ofs]
	call	htxts$encrypt
	ret


calign
.bogus:
	; all we have to do is output our buffer
	mov	rcx, [rdi+inputfile_buffer_ofs]
	mov	rdx, [rdi+inputfile_totalsize_ofs]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, [rcx+buffer_itself_ofs]
	call	output
	ret






	; single argument in rdi: our inputfile
falign
public inputfile$decrypt
inputfile$decrypt:
	cmp	qword [rdi+inputfile_size_ofs], 290	; our abso-minimum size == salt + header + 130 bytes
	jb	.notenough
	push	rbp rbx r12 r13 r14 r15
	sub	rsp, htxts_blocksize
	mov	rbx, rdi
	mov	rbp, [rdi+inputfile_srcptr_ofs]
	mov	r15, [rdi+inputfile_size_ofs]

	; skip over the SALT 32 bytes:
	add	rbp, 32
	sub	r15, 32

	; our first order of business is doing our discovery/HEADER validity checking
	; read: brute force attempts have to do this bit.
	xor	r12d, r12d			; our IV offset
	xor	r13d, r13d			; our HEADER offset

	mov	rdi, .decrypting
	call	string$to_stderr
calign
.disco_loop:
	cmp	r13d, r12d			; if IV == HEADER, no sense in checking this one
	je	.disco_inner_next
	; copy the 16 bytes at IV
	mov	rax, [rbp+r12]
	mov	rcx, [rbp+r12+8]
	; copy the 16 bytes at HEADER
	mov	rdx, [rbp+r13]
	mov	r8, [rbp+r13+8]
	mov	[rsp], rax
	mov	[rsp+8], rcx
	mov	[rsp+16], rdx
	mov	[rsp+24], r8
	; IV temporary copy is at [rsp]
	; HEADER temporary copy is at [rsp+16]

	; so now, for each set of keys in REVERSE, do our decrypt + xor which goes (foreach key):
	; decrypt IV with current key
	; xor IV with HEADER
	; decrypt HEADER with current key
	; xor HEADER with IV
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .decrypt_iv_header
	mov	rdx, rsp
	call	list$reverse_foreach_arg

	; so at this point, if everything went perfect, our initial tweak is sitting in
	; IV, and our HEADER will contain valid start/end offsets
	mov	rdx, [rsp+16]
	mov	r8, [rsp+24]
	cmp	rdx, r8
	jae	.disco_inner_next		; if the start >= end, no deal
	cmp	rdx, [rbx+inputfile_size_ofs]
	jae	.disco_inner_next		; if the start >= filesize, no deal
	cmp	r8, [rbx+inputfile_size_ofs]
	ja	.disco_inner_next		; if the end > filesize, no deal
	mov	r9, r8
	sub	r9, rdx
	cmp	r9, 130
	jae	.decrypt			; if the end - start is >= 130, looks like we're sweet
	; otherwise, no deal, fallthrough to .disco_inner_next
calign
.disco_inner_next:
	add	r13d, 16
	cmp	r13d, 128
	jb	.disco_loop
	xor	r13d, r13d
	add	r12d, 16
	cmp	r12d, 128
	jb	.disco_loop
	; if we made it to here, no deal

	mov	rdi, .sorry
	call	string$to_stderrln

	; destroy our htcrypt contexts before we die
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, htcrypt$destroy
	call	list$clear
	call	termreset

	mov	eax, syscall_exit
	mov	edi, 1
	syscall

	; despite being declared inline here, called as a function from list$reverse_foreach_arg
	; to decrypt the IV + header
falign
.decrypt_iv_header:
	; rdi == our htcrypt context, rsi == 32 bytes, [0] == IV, [16] == HEADER
	push	r12 r13
	mov	r12, rdi
	mov	r13, rsi
	; decrypt IV with current key
	call	htcrypt$decrypt
	; xor IV with HEADER
	mov	rax, [r13+16]
	mov	rcx, [r13+24]
	xor	[r13], rax
	xor	[r13+8], rcx
	; decrypt HEADER with current key
	mov	rdi, r12
	lea	rsi, [r13+16]
	call	htcrypt$decrypt
	; xor HEADER with IV
	mov	rax, [r13]
	mov	rcx, [r13+8]
	xor	[r13+16], rax
	xor	[r13+24], rcx
	pop	r13 r12
	ret
cleartext .sorry, 'Invalid keys and/or input'
cleartext .decrypting, 10,'Decrypting...'
calign
.decrypt:
	; so our IV is sitting in [rsp] for 16 bytes
	; and our HEADER is sitting in rdx/r8
	mov	[rbx+inputfile_start_ofs], rdx
	mov	[rbx+inputfile_end_ofs], r8

	; we need to populate and initialize all of the tweaks for our keys
	; in FORWARD ORDER (just like the encrypt does)
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	r12, [rdi+_list_first_ofs]
	xor	r13d, r13d
calign
.tweakpopulate:
	mov	rdi, [r12+_list_valueofs]
	mov	rax, [rsp]
	mov	rcx, [rsp+8]
	mov	[rdi+htcrypt_user_ofs], rax
	mov	[rdi+htcrypt_user_ofs+8], rcx
	; if there was a previous one, use it to encrypt this one, otherwise we leave it
	test	r13, r13
	jz	.tweakpopulate_skipencrypt
	lea	rsi, [rdi+htcrypt_user_ofs]
	mov	rdi, [r13+_list_valueofs]
	call	htcrypt$encrypt
.tweakpopulate_skipencrypt:
	mov	r13, r12
	mov	r12, [r12+_list_nextofs]
	test	r12, r12
	jnz	.tweakpopulate
	
	; get our IV back and HEADER back:
	mov	rax, [rsp]
	mov	rcx, [rsp+8]
	mov	rdx, [rsp+16]
	mov	r8, [rsp+24]

	; compute our hmac key, reordering the goods
	mov	[rsp], rdx
	mov	[rsp+8], rax
	mov	[rsp+16], rcx
	mov	[rsp+24], r8

	call	hmac$new_sha512
	mov	[rbx+inputfile_mac_ofs], rax
	mov	rdi, rax
	mov	rsi, rsp
	mov	edx, 32
	call	hmac$key
	mov	rdi, rsp
	mov	esi, 32
	call	rng$block

	; allocate a 64 byte buffer to hold the first block's 64 byte preamble
	mov	edi, 64
	call	heap$alloc
	mov	[rbx+inputfile_macbuf_ofs], rax
	
	; adjust our rbp/r15 markers
	mov	r15, [rbx+inputfile_end_ofs]
	add	rbp, [rbx+inputfile_start_ofs]
	sub	r15, [rbx+inputfile_start_ofs]
	; save our "new" srcptr and overwrite the original
	mov	[rbx+inputfile_srcptr_ofs], rbp
	; save our "new" size and overwrite the original
	mov	[rbx+inputfile_size_ofs], r15
calign
.decrypt_loop:
	; so, r15 == total number of bytes we have to deal with
	; rbp == pointer to source
	; copy up to a full block into rsp, decrypt it, then figure out
	; what we are dealing with
	mov	ecx, htxts_blocksize
	mov	rdi, rsp
	mov	rsi, rbp
	mov	rdx, r15
	cmp	r15, rcx
	cmova	rdx, rcx
	mov	r14d, edx
	call	memcpy

	; decrypt this full block, even if we didn't populate all of it
	; decrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .decrypt_block
	mov	rdx, rsp
	call	list$reverse_foreach_arg

	; now, several possibilities that we have to deal with
	cmp	r14, r15
	je	.decrypt_lastblock
	; if we are not the last block, see if we are the first block
	cmp	rbp, [rbx+inputfile_srcptr_ofs]
	je	.decrypt_first_not_last
	; we are not the first and not the last, make sure we are not a split block
	mov	rcx, r15		; how much total is left including this block
	mov	rax, [rbx+inputfile_size_ofs]
	and	eax, 0xf		; our garbage amount
	sub	rcx, r14		; - this block == how much is left for the _next_
	cmp	eax, ecx		; is all thats left garbage?
	je	.decrypt_lastblock_evenly	; if so, treat this as the last block and be happy
	; see if there is at least garbage length + 64 in the _next_ block
	add	eax, 64
	cmp	rcx, rax
	jb	.decrypt_last_split
	; otherwise, there is at least 64 + garbage length in the next block
	; we are _not_ the first block, and we are _not_ the last block
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	rdx, r14
	call	hmac$data
	; output what we have to stdout
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, r14d
	call	output
	; udpate our pointers and keep going
	add	rbp, r14
	sub	r15, r14
	jmp	.decrypt_loop
calign
.decrypt_lastblock_evenly:
	; all that remains _after_ this block is garbage
	cmp	rbp, [rbx+inputfile_srcptr_ofs]
	je	.decrypt_first_and_last
	; otherwise, last block, but _not_ the first block, and our hmac is _exactly_ at the end
	; of this block
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	edx, htxts_blocksize - 64
	sub	edx, [rbx+inputfile_padlen_ofs]
	call	hmac$data
	; add the 64 byte preamble
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, 64
	call	hmac$data
	; output the goods
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, htxts_blocksize - 64
	sub	edx, [rbx+inputfile_padlen_ofs]
	call	output
	; we can blast our data now with our hmac final
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	call	hmac$final
	mov	rdi, rsp
	lea	rsi, [rsp+(htxts_blocksize - 64)]
	mov	edx, 64
	call	memcmp
	test	eax, eax
	jnz	.decrypt_done_badhmac
	jmp	.done
calign
.decrypt_lastblock:
	cmp	rbp, [rbx+inputfile_srcptr_ofs]
	je	.decrypt_first_and_last
	; otherwise, last block, but _not_ the first block, and we are _not_ a split last block
	; so, determine how much actual data we have, update our mac, then compare against the
	; we already have our padding length sitting in pincount
	mov	rcx, [rbx+inputfile_size_ofs]
	mov	eax, r14d		; how much data we put into this block
	sub	eax, [rbx+inputfile_padlen_ofs]	; less the padding length
	sub	eax, 64			; less our HMAC length
	and	ecx, 0xf
	sub	eax, ecx
	; eax is now our data length
	mov	r8d, [rbx+inputfile_padlen_ofs]
	lea	rbp, [rsp+rax]		; the end of the data
	add	rbp, r8			; + padding == start of HMAC decrypted
	; so now we can update the actual hmac
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	rdx, rax
	mov	r15, rax		; save the length of the data so we can output it
	call	hmac$data
	; add the preamble 64 bytes
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, 64
	call	hmac$data
	; output to stdout the data
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	rdx, r15
	call	output
	; now, we need to hmac_final and compare the result
	sub	rsp, 64
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	call	hmac$final
	mov	rdi, rsp
	mov	rsi, rbp
	mov	edx, 64
	call	memcmp
	add	rsp, 64
	test	eax, eax
	jnz	.decrypt_done_badhmac
	; otherwise, all good
	jmp	.done
calign
.decrypt_first_and_last:
	; extract our padding length
	movzx	ecx, byte [rsp+63]
	and	ecx, 0xf
	mov	[rbx+inputfile_padlen_ofs], ecx
	; verify that our filesize is legit
	mov	r11, [rbx+inputfile_size_ofs]
	mov	r10, r11
	and	r10d, 0xf
	; r10d == garbage amount, ecx == padding amount
	sub	r11, rcx
	sub	r11, r10
	sub	r11, 128
	; so r11 is now our computed length according to garbage + padlen as indicated in [rsp+63]
	; lets recompute the total length and verify that it matches
	mov	r9, r11
	add	r9, 0xf
	and	r9, not 0xf
	add	r9, r10
	add	r9, 128
	; r9 should match our actual size
	cmp	r9, [rbx+inputfile_size_ofs]
	jne	.decrypt_failed
	; update our hmac with the real data
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64]
	mov	edx, r14d		; total length we put into the block
	sub	edx, 128		; less 64 RNG, 64 HMAC
	sub	edx, ecx		; less padding
	; and finally less our garbage amount
	mov	r8d, r14d
	and	r8d, 0xf
	sub	edx, r8d
	; save our length
	mov	r15d, edx
	; get the position of our HMAC
	lea	rbp, [rsi+rdx]
	; add the padding length to that
	add	rbp, rcx
	call	hmac$data
	; append the preamble to the end
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	edx, 64
	call	hmac$data
	; output our result to stdout
	mov	eax, syscall_write
	mov	edi, 1
	lea	rsi, [rsp+64]
	mov	edx, r15d
	call	output
	; do our final hmac and compare the result
	sub	rsp, 64
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	call	hmac$final
	mov	rdi, rsp
	mov	rsi, rbp
	mov	edx, 64
	call	memcmp
	add	rsp, 64
	test	eax, eax
	jnz	.decrypt_done_badhmac
	; otherwise, all good
	jmp	.done
calign
.decrypt_first_not_last:
	mov	rcx, r15		; how much total is left including this block
	mov	rax, [rbx+inputfile_size_ofs]
	and	eax, 0xf		; our garbage amount
	sub	rcx, r14		; - this block == how much is left for the _next_
	cmp	eax, ecx		; is all thats left garbage?
	je	.decrypt_first_and_last
	; see if there is at least garbage length + 64 in the _next_ block
	add	eax, 64
	cmp	rcx, rax
	jb	.decrypt_first_and_last_split
	; otherwise, there is at least 64 + garbage length in the next block
	; if there is _exactly_ 64 + garbage length in the next block, then this block contains
	; our padding
	je	.decrypt_first_evenly
	; extract our padding length first up
	movzx	ecx, byte [rsp+63]
	and	ecx, 0xf
	mov	[rbx+inputfile_padlen_ofs], ecx
	; verify that our filesize is legit
	mov	r11, [rbx+inputfile_size_ofs]
	mov	r10, r11
	and	r10d, 0xf
	; r10d == garbage amount, ecx == padding amount
	sub	r11, rcx
	sub	r11, r10
	sub	r11, 128
	; so r11 is now our computed length according to garbage + padlen as indicated in [rsp+63]
	; lets recompute the total length and verify that it matches
	mov	r9, r11
	add	r9, 0xf
	and	r9, not 0xf
	add	r9, r10
	add	r9, 128
	; r9 should match our actual size
	cmp	r9, [rbx+inputfile_size_ofs]
	jne	.decrypt_failed

	; update our hmac with all of the rest of the data
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64]
	mov	edx, htxts_blocksize - 64
	call	hmac$data

	; set our macbuf to the preamble so we can add it at the end
	mov	rdi, [rbx+inputfile_macbuf_ofs]
	mov	rsi, rsp
	mov	edx, 64
	call	memcpy
	; output all of this block to stdout
	mov	eax, syscall_write
	mov	edi, 1
	lea	rsi, [rsp+64]
	mov	edx, htxts_blocksize - 64
	call	output
	; udpate our pointers and keep going
	add	rbp, r14
	sub	r15, r14
	jmp	.decrypt_loop
calign
.decrypt_first_evenly:
	; this block _contains_ our padding, and is the end, and the next block's start is our HMAC
	; extract our padding length first up
	movzx	ecx, byte [rsp+63]
	and	ecx, 0xf
	mov	[rbx+inputfile_padlen_ofs], ecx
	; verify that our filesize is legit
	mov	r11, [rbx+inputfile_size_ofs]
	mov	r10, r11
	and	r10d, 0xf
	; r10d == garbage amount, ecx == padding amount
	sub	r11, rcx
	sub	r11, r10
	sub	r11, 128
	; so r11 is now our computed length according to garbage + padlen as indicated in [rsp+63]
	; lets recompute the total length and verify that it matches
	mov	r9, r11
	add	r9, 0xf
	and	r9, not 0xf
	add	r9, r10
	add	r9, 128
	; r9 should match our actual size
	cmp	r9, [rbx+inputfile_size_ofs]
	jne	.decrypt_failed
	; update our hmac with all of the data sans padding
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64]
	mov	edx, htxts_blocksize - 64
	sub	edx, ecx
	call	hmac$data
	; update our hmac with the preamble
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	edx, 64
	call	hmac$data
	; output that to stdout
	mov	eax, syscall_write
	mov	edi, 1
	lea	rsi, [rsp+64]
	mov	edx, htxts_blocksize - 64
	sub	edx, [rbx+inputfile_padlen_ofs]
	call	output
	; in order to verify the hmac, we first need to decrypt the next block
	add	rbp, r14
	sub	r15, r14
	mov	rdi, rsp
	mov	rsi, rbp
	mov	edx, r15d
	call	memcpy

	; decrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .decrypt_block
	mov	rdx, rsp
	call	list$reverse_foreach_arg

	sub	rsp, 64
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	call	hmac$final
	mov	rdi, rsp
	lea	rsi, [rsp+64]
	mov	edx, 64
	call	memcmp
	add	rsp, 64
	test	eax, eax
	jnz	.decrypt_done_badhmac
	; otherwise, done and dusted
	jmp	.done
calign
.decrypt_first_and_last_notsplit:
	; this block contains all of the hmac, but only some/part/maybenotany of the garbage
	; garbage amount is in edx
	; eax is what is left in the next block
	; ecx is our padding bytecount
	; edx is our garbage count
	; so, ALL data that remains - garbage count - 64 == HMAC start
	; and ALL data that remains - garbage count - 64 - padding count == end of data
	mov	r8d, r15d	; all the data that remains
	sub	r8d, edx	; less garbage count
	sub	r8d, 64		; less HMAC count
	; so our hmac is at rsp+r8
	mov	r9d, r8d
	sub	r9d, ecx
	; our data end is at r9
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64]	; start of data
	push	r8 r9
	mov	edx, r9d
	sub	edx, 64
	mov	[rsp], rdx	; length of our data
	call	hmac$data
	; update the hmac with our preamble
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+16]
	mov	edx, 64
	call	hmac$data
	; output our data
	mov	eax, syscall_write
	mov	edi, 1
	lea	rsi, [rsp+64+16]	; start of data
	mov	rdx, [rsp]
	call	output
	; last but not least, compare hmacs
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+16]
	call	hmac$final
	pop	r9 r8
	mov	rdi, rsp
	lea	rsi, [rsp+r8]
	mov	edx, 64
	call	memcmp
	test	eax, eax
	jnz	.decrypt_done_badhmac
	; otherwise, done
	jmp	.done
calign
.decrypt_first_and_last_split:
	; so this block contains our padding, _and_ part of the hmac
	; extract our padding length first up
	movzx	ecx, byte [rsp+63]
	and	ecx, 0xf
	mov	[rbx+inputfile_padlen_ofs], ecx
	; verify that our filesize is legit
	mov	r11, [rbx+inputfile_size_ofs]
	mov	r10, r11
	and	r10d, 0xf
	; r10d == garbage amount, ecx == padding amount
	sub	r11, rcx
	sub	r11, r10
	sub	r11, 128
	; so r11 is now our computed length according to garbage + padlen as indicated in [rsp+63]
	; lets recompute the total length and verify that it matches
	mov	r9, r11
	add	r9, 0xf
	and	r9, not 0xf
	add	r9, r10
	add	r9, 128
	; r9 should match our actual size
	cmp	r9, [rbx+inputfile_size_ofs]
	jne	.decrypt_failed
	; we need to update our hmac and output the data we have gotten so far
	mov	rax, r15
	sub	rax, r14
	; rax now has what is left in the next block, we know this one was a full one
	mov	rdx, [rbx+inputfile_size_ofs]
	and	edx, 0xf
	; edx == garbage length
	cmp	edx, eax		; if the next block is only garbage
	jae	.decrypt_first_and_last_notsplit	; there may be some garbage in this block too
	; otherwise, the next block contains _some_ of the hmac
	sub	eax, edx		; == how many bytes in the next block remain less the garbage, which is how many of our hmac is there
	mov	r8d, 64
	sub	r8d, eax		; how many bytes of the hmac are in _this_ block
	; so now we can update our running hmac, and output the data
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64]
	mov	edx, htxts_blocksize - 64
	sub	edx, r8d
	sub	edx, ecx
	lea	r9, [rsi+rdx]
	add	r9, rcx			; start of hmac, after data+padding
	push	rax r8 rdx r9
	call	hmac$data
	; update hmac with the preamble
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+32]
	mov	edx, 64
	call	hmac$data
	; rax was # of bytes remaining in next block less the garbge
	; r8 was how many bytes of HMAC are in _this_ block
	; r9 == address of HMAC start in this block
	; rdx == length of the data we updated
	
	; first up, output the data
	mov	eax, syscall_write
	mov	edi, 1
	lea	rsi, [rsp+64+32]
	mov	rdx, [rsp+8]
	call	output
	; now we need a temporary spot to hold our decrypted hmac
	mov	rsi, [rsp]		; address of start of hmac
	mov	rdx, [rsp+16]		; length of hmac in this block
	sub	rsp, 64
	mov	rdi, rsp
	call	memcpy
	; now we need to copy the next block and decrypt it so we can get the rest of the encrypted HMAC
	lea	rdi, [rsp+64+32]
	lea	rsi, [rbp+r14]
	mov	rdx, [rsp+64+24]
	call	memcpy
	; now decrypt that full block at rsp+64+32

	; decrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .decrypt_block
	lea	rdx, [rsp+64+32]
	call	list$reverse_foreach_arg

	; now we have to copy whats left and append it to our hmac
	mov	rax, [rsp+64+16]	; how many bytes we _got_
	lea	rdi, [rsp+rax]
	lea	rsi, [rsp+64+32]
	mov	rdx, [rsp+64+24]
	call	memcpy
	; now we can hmac$final into rsp+64+32 and then compare them both
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64+32]
	call	hmac$final
	mov	rdi, rsp
	lea	rsi, [rsp+64+32]
	mov	edx, 64
	call	memcmp
	add	rsp, 64+32
	test	eax, eax
	jnz	.decrypt_done_badhmac
	; otherwise, we are sweet
	jmp	.done
calign
.decrypt_last_split:
	; so the next block contains < (64 + garbage) bytes, which means this block contains _some_ of the hmac
	; and the next block contains the rest of it
	; we need to update our hmac and output the data we have gotten so far
	mov	ecx, [rbx+inputfile_padlen_ofs]
	mov	rax, r15
	sub	rax, r14
	; rax now has what is left in the next block, we know this one was a full one
	mov	rdx, [rbx+inputfile_size_ofs]
	and	edx, 0xf
	; edx == garbage length
	; the next block contains _some_ of the hmac
	sub	eax, edx		; == how many bytes in the next block remain less the garbage, which is how many of our hmac is there
	mov	r8d, 64
	sub	r8d, eax		; how many bytes of the hmac are in _this_ block
	; so now we can update our running hmac, and output the data
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, rsp
	mov	edx, htxts_blocksize
	sub	edx, r8d
	sub	edx, ecx
	lea	r9, [rsi+rdx]
	add	r9, rcx			; start of hmac, after data+padding
	push	rax r8 rdx r9
	call	hmac$data
	; add our preamble 64 bytes
	mov	rdi, [rbx+inputfile_mac_ofs]
	mov	rsi, [rbx+inputfile_macbuf_ofs]
	mov	edx, 64
	call	hmac$data

	; rax was # of bytes remaining in next block less the garbge
	; r8 was how many bytes of HMAC are in _this_ block
	; r9 == address of HMAC start in this block
	; rdx == length of the data we updated
	
	; first up, output the data
	mov	eax, syscall_write
	mov	edi, 1
	lea	rsi, [rsp+32]
	mov	rdx, [rsp+8]
	call	output
	; now we need a temporary spot to hold our decrypted hmac
	mov	rsi, [rsp]		; address of start of hmac
	mov	rdx, [rsp+16]		; length of hmac in this block
	sub	rsp, 64
	mov	rdi, rsp
	call	memcpy
	; now we need to copy the next block and decrypt it so we can get the rest of the encrypted HMAC
	lea	rdi, [rsp+64+32]
	lea	rsi, [rbp+r14]
	mov	rdx, [rsp+64+24]
	call	memcpy
	; now decrypt that full block at rsp+64+32

	; decrypt the block with all keys/tweaks
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, .decrypt_block
	lea	rdx, [rsp+64+32]
	call	list$reverse_foreach_arg

	; now we have to copy whats left and append it to our hmac
	mov	rax, [rsp+64+16]	; how many bytes we _got_
	lea	rdi, [rsp+rax]
	lea	rsi, [rsp+64+32]
	mov	rdx, [rsp+64+24]
	call	memcpy
	; now we can hmac$final into rsp+64+32 and then compare them both
	mov	rdi, [rbx+inputfile_mac_ofs]
	lea	rsi, [rsp+64+32]
	call	hmac$final
	mov	rdi, rsp
	lea	rsi, [rsp+64+32]
	mov	edx, 64
	call	memcmp
	add	rsp, 64+32
	test	eax, eax
	jnz	.decrypt_done_badhmac
	; otherwise, we are sweet
	; fallthrough to done.
calign
.done:
	; remove any cleartext remaining on the stack:
	mov	rdi, rsp
	mov	esi, htxts_blocksize
	call	rng$block
	; normal/okay return from here will result in cleanup of keys/etc

	add	rsp, htxts_blocksize
	pop	r15 r14 r13 r12 rbx rbp
	ret

	; despite being inline here, called as external function from list$reverse_foreach_arg
falign
.decrypt_block:
	; rdi == our htcrypt context, rsi == pointer to block we are decrypting
	lea	rdx, [rdi+htcrypt_user_ofs]
	call	htxts$decrypt
	ret

calign
.decrypt_done_badhmac:
	; remove any cleartext remaining on the stack (even though the HMAC failed and it is
	; likely garbage anyway)
	mov	rdi, rsp
	mov	esi, htxts_blocksize
	call	rng$block
	; get rid of our keys
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, htcrypt$destroy
	call	list$clear
	lea	rdi, [rbx+inputfile_start_ofs]
	mov	esi, 16
	call	rng$block
	mov	rdi, .donebadhmacmsg
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
calign
.decrypt_failed:
	; remove any cleartext remaining on the stack (even though the HMAC failed and it is
	; likely garbage anyway)
	mov	rdi, rsp
	mov	esi, htxts_blocksize
	call	rng$block
	; get rid of our keys
	mov	rdi, [rbx+inputfile_keys_ofs]
	mov	rsi, htcrypt$destroy
	call	list$clear
	lea	rdi, [rbx+inputfile_start_ofs]
	mov	esi, 16
	call	rng$block
	mov	rdi, .sorry
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall

cleartext .donebadhmacmsg, 'Done, HMAC FAIL'
cleartext .notenoughdata, 10,'insufficient input length'
calign
.notenough:
	mov	rdi, .notenoughdata
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall






falign
argscan:
	cmp	dword [next_is_inputfile], 0
	jne	.do_inputfile
	cmp	dword [next_is_mediafile], 0
	jne	.do_mediafile
	cmp	dword [next_is_count], 0
	jne	.do_count
	cmp	dword [next_is_iter], 0
	jne	.do_iter
	push	rdi
	mov	rsi, .dashd
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashd
	push	rdi
	mov	rsi, .dashb
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashb
	push	rdi
	mov	rsi, .dashr
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashr
	push	rdi
	mov	rsi, .dash1
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dash1
	push	rdi
	mov	rsi, .dashalt
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashalt
	push	rdi
	mov	rsi, .dashm
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashm
	push	rdi
	mov	rsi, .dashnoalt
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashnoalt
	push	rdi
	mov	rsi, .dashc
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashc
	push	rdi
	mov	rsi, .dashi
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_dashi
	push	rdi
	mov	rsi, .dashdrbg
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_drbg
	push	rdi
	mov	rsi, .dashnomix
	call	string$equals
	pop	rdi
	test	eax, eax
	jnz	.do_nomix
	; otherwise, unrecognized option
	push	rdi
	mov	rdi, .invalidarg
	call	string$to_stderr
	pop	rdi
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
	; not reached
cleartext .invalidarg, 'invalid argument: '
calign
.do_inputfile:
	push	rdi
	mov	dword [next_is_inputfile], 0
	call	inputfile$new
	pop	rdi
	test	rax, rax
	jz	.bad_inputfile
	mov	rdi, [inputfiles]
	mov	rsi, rax
	call	list$push_back
	ret
calign
.do_mediafile:
	push	rdi
	mov	dword [next_is_mediafile], 0
	call	privmapped$new
	pop	rdi
	test	rax, rax
	jz	.bad_mediafile
	mov	[outmedia], rax
	call	outmedia$identify
	; that won't come back if it failed
	ret
calign
.do_count:
	push	rdi
	mov	dword [next_is_count], 0
	call	string$to_unsigned
	pop	rdi
	cmp	rax, 0
	jle	.bad_count
	mov	dword [pcount], eax
	call	heap$free
	ret
calign
.do_iter:
	push	rdi
	mov	dword [next_is_iter], 0
	call	string$to_unsigned
	pop	rdi
	cmp	rax, 0
	jle	.bad_iter
	mov	dword [piter], eax
	call	heap$free
	ret
calign
.bad_inputfile:
	push	rdi
	mov	rdi, .badfile
	call	string$to_stderr
	pop	rdi
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .badfile, 'input file error: '
calign
.bad_mediafile:
	push	rdi
	mov	rdi, .badmedia
	call	string$to_stderr
	pop	rdi
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .badmedia, 'media file error: '
calign
.bad_count:
	push	rdi
	mov	rdi, .badcount
	call	string$to_stderr
	pop	rdi
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .badcount, 'invalid -c COUNT specified: '
calign
.bad_iter:
	push	rdi
	mov	rdi, .baditer
	call	string$to_stderr
	pop	rdi
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .baditer, 'invalid -i ITER specified: '
calign
.do_dashd:
	mov	ecx, 1
	sub	ecx, eax
	mov	[do_enc], ecx
	ret
calign
.do_dashb:
	mov	[do_b64], eax
	ret
calign
.do_dashr:
	mov	[do_pwd], eax
	ret
calign
.do_dashalt:
	mov	dword [next_is_inputfile], 1
	ret
calign
.do_dashm:
	mov	dword [next_is_mediafile], 1
	ret
calign
.do_dashnoalt:
	mov	dword [noalt], 1
	ret
calign
.do_dashc:
	mov	dword [next_is_count], 1
	ret
calign
.do_dashi:
	mov	dword [next_is_iter], 1
	ret
calign
.do_drbg:
	mov	[pmix], 1
	ret
calign
.do_dash1:
	mov	[do_cascaded], 0
	ret
calign
.do_nomix:
	mov	[pmix], 0
	ret
cleartext .dashd, '-d'
cleartext .dashb, '-b'
cleartext .dashr, '-r'
cleartext .dash1, '-1'
cleartext .dashm, '-m'
cleartext .dashalt, '-alt'
cleartext .dashnoalt, '-noalt'
cleartext .dashc, '-c'
cleartext .dashi, '-i'
cleartext .dashdrbg, '-drbg'
cleartext .dashnomix, '-nomix'


	; because of the [silly] way I decided to do the arguments, special handling
	; is required for the -c COUNT and/or -i ITER for the main input file
	; along with -drbg and -nomix
	; (versus the argscan above which works fine for -alt inputfiles)
falign
mainargopts:
	; so, the idea here is to parse out -i and -c in reverse order until we hit
	; a -alt or run out of options, and apply them to the main inputfile
	; which was already list$pop_back'd from [argv]
	push	rbx r12
	mov	rdi, [argv]
	mov	rbx, [_list_last]
	test	rbx, rbx
	jz	.alldone
	xor	r12d, r12d
calign
.iter:
	mov	rdi, [rbx]
	mov	rsi, .dashi
	call	string$equals
	test	eax, eax
	jnz	.do_dashi
	mov	rdi, [rbx]
	mov	rsi, .dashc
	call	string$equals
	test	eax, eax
	jnz	.do_dashc
	mov	rdi, [rbx]
	mov	rsi, .dashdrbg
	call	string$equals
	test	eax, eax
	jnz	.do_drbg
	mov	rdi, [rbx]
	mov	rsi, .dash1
	call	string$equals
	test	eax, eax
	jnz	.do_dash1
	mov	rdi, [rbx]
	mov	rsi, .dashnomix
	call	string$equals
	test	eax, eax
	jnz	.do_nomix
	mov	rdi, [rbx]
	mov	rsi, .dashalt
	call	string$equals
	test	eax, eax
	jnz	.alldone
	; otherwise, set r12 to rbx and keep going
calign
.next:
	mov	r12, rbx
	mov	rbx, [rbx+_list_prevofs]
	test	rbx, rbx
	jnz	.iter
	pop	r12 rbx
	ret
cleartext .dashc, '-c'
cleartext .dashi, '-i'
cleartext .dash1, '-1'
cleartext .dashalt, '-alt'
cleartext .dashdrbg, '-drbg'
cleartext .dashnomix, '-nomix'
cleartext .missingdashi, 'missing argument to -i'
cleartext .missingdashc, 'missing argument to -c'
cleartext .baddashi, 'invalid argument to -i'
cleartext .baddashc, 'invalid argument to -c'
calign
.alldone:
	pop	r12 rbx
	ret
calign
.error:
	call	string$to_stderrln
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
calign
.do_dashi:
	mov	rdi, .missingdashi
	test	r12, r12
	jz	.error
	mov	rdi, [r12]
	call	string$to_unsigned
	mov	rdi, .baddashi
	cmp	rax, 0
	jle	.error
	mov	rdi, [inputfiles]
	mov	rdi, [_list_first]
	mov	rdi, [rdi]
	mov	[rdi+inputfile_piter_ofs], eax
	jmp	.next
calign
.do_dashc:
	mov	rdi, .missingdashc
	test	r12, r12
	jz	.error
	mov	rdi, [r12]
	call	string$to_unsigned
	mov	rdi, .baddashc
	cmp	rax, 0
	jle	.error
	mov	rdi, [inputfiles]
	mov	rdi, [_list_first]
	mov	rdi, [rdi]
	mov	[rdi+inputfile_pcount_ofs], eax
	jmp	.next
calign
.do_drbg:
	mov	rdi, [inputfiles]
	mov	rdi, [_list_first]
	mov	rdi, [rdi]
	mov	qword [rdi+inputfile_mix_ofs], 1
	jmp	.next
calign
.do_nomix:
	mov	rdi, [inputfiles]
	mov	rdi, [_list_first]
	mov	rdi, [rdi]
	mov	qword [rdi+inputfile_mix_ofs], 0
	jmp	.next
calign
.do_dash1:
	mov	rdi, [inputfiles]
	mov	rdi, [_list_first]
	mov	rdi, [rdi]
	mov	qword [rdi+inputfile_cascaded_ofs], 0
	jmp	.next



falign
termreset:
	xor	edi, edi
	mov	esi, 0x5404		; TCSETSF
	mov	rdx, [termios]
	mov	eax, syscall_ioctl
	syscall
	ret



outmedia_png = 0
outmedia_jfif = 1
outmedia_exif = 2


falign
public outmedia$identify
outmedia$identify:
	; [outmedia] is a valid privmapped object, see if we can identify what it is
	; and if we can't, bailout with an error
	push	rbx r12 r13 r14
	mov	rbx, [outmedia]
	mov	r12, [rbx+privmapped_base_ofs]
	mov	r13, [rbx+privmapped_size_ofs]
	cmp	r13, 41				; min size required for a PNG parse
	jb	.err_unrecognized
	mov	rax, 0xa1a0a0d474e5089
	cmp	[r12], rax
	je	.maybe_png
	cmp	dword [r12], 0xe0ffd8ff
	je	.maybe_jfif
	cmp	dword [r12], 0xe1ffd8ff
	je	.maybe_exif
calign
.err_unrecognized:
	mov	rdi, .unrecognized
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .unrecognized, 'Unrecognized or unsupported mediafile type.'
calign
.maybe_exif:
	cmp	dword [r12+6], 'EXIF'
	je	.exif
	cmp	dword [r12+6], 'Exif'
	jne	.err_unrecognized
.exif:
	; EXIF it is, we'll let the merge function deal with the rest
	mov	qword [rbx+privmapped_user_ofs], outmedia_exif
	pop	r14 r13 r12 rbx
	ret
calign
.maybe_jfif:
	cmp	dword [r12+6], 'JFIF'
	jne	.err_unrecognized
	; JFIF it is, we'll let the merge function deal with the rest
	mov	qword [rbx+privmapped_user_ofs], outmedia_jfif
	pop	r14 r13 r12 rbx
	ret
calign
.maybe_png:
	; make sure the first chunk is an IHDR
	cmp	dword [r12+12], 'IHDR'
	jne	.err_unrecognized
	; otherwise, it is a PNG
	mov	qword [rbx+privmapped_user_ofs], outmedia_png
	; count the number of chunks that exist after the IHDR, and validate it as we go
	add	r12, 8
	sub	r13, 16
	xor	r14d, r14d
	mov	ecx, [r12]			; length of the header chunk
	bswap	ecx
	add	r12, 8
	cmp	rcx, r13
	jae	.err_unrecognized
	cmp	ecx, 13				; IHDR is 13 bytes
	jne	.err_unrecognized
	; 13 + 4 bytes for the CRC to skip:
	add	r12, 17
	sub	r13, 17
	; commence chunk counting, and store the location of the first chunk after the header in outbuf
	mov	rsi, [outbuf]
	mov	[rsi+buffer_user_ofs], r12
calign
.png_chunkscan:
	cmp	r13, 12
	jb	.err_unrecognized
	mov	eax, [r12]
	bswap	eax
	add	eax, 12
	cmp	rax, r13
	ja	.err_unrecognized
	; dword at [r12+4] is our chunk type, if it is IEND, bailout, otherwise, increment r14d and keep going
.png_chunkscan_normal:
	cmp	dword [r12+4], 'IEND'
	je	.png_done
	add	r14d, 1
	cmp	dword [r12+4], 'IDAT'
	je	.png_idat
	add	r12, rax
	sub	r13, rax
	jz	.err_unrecognized
	jmp	.png_chunkscan
calign
.png_idat:
	; spec says multiple IDAT chunks must be consecutive, so we can't treat them as separate and inject our goods
	add	r12, rax
	sub	r13, rax
	jz	.err_unrecognized
	; otherwise, fall into continuous IDAT scanning
calign
.png_idatscan:
	cmp	r13, 12
	jb	.err_unrecognized
	mov	eax, [r12]
	bswap	eax
	add	eax, 12
	cmp	rax, r13
	ja	.err_unrecognized
	cmp	dword [r12+4], 'IDAT'
	jne	.png_chunkscan_normal
	add	r12, rax
	sub	r13, rax
	jz	.err_unrecognized
	jmp	.png_idatscan
calign
.png_done:
	; r14d has our # of chunks that exist _after_ IHDR
	; if it is _zero_, something went horribly wrong
	test	r14d, r14d
	jz	.err_unrecognized
	mov	dword [rbx+privmapped_user_ofs+4], r14d
	pop	r14 r13 r12 rbx
	ret





falign
public outmedia$merge
outmedia$merge:
	; called when all output is sitting in the outbuf ready to go
	mov	rdi, [outmedia]
	mov	eax, [rdi+privmapped_user_ofs]
	jmp	qword [rax*8+.dispatch]
dalign
.dispatch:
	dq	.png, .jfif, .exif
calign
.png:
	; outmedia is a PNG file, the dword in [rdi+privmapped_user_ofs+4] is our total PNG chunk count
	; outbuf's buffer_user_ofs is a pointer to the start of the first chunk after the header
	; we want our encrypted goods to be placed in a _randomly_ located chunk somewhere after IHDR
	; and before IEND, in a per-specification randomized chunk identifier
	push	rbx r12 r13 r14 r15
	sub	rsp, 8
	mov	rbx, [outbuf]
	mov	r12, [rdi+privmapped_base_ofs]
	mov	r13, [rbx+buffer_user_ofs]		; start of first chunk after IHDR
	mov	r14, [rdi+privmapped_size_ofs]
	add	r14, r12				; pointer to the end of the PNG
	mov	eax, [rbx+buffer_length_ofs]		; note: >4GB not gonna fly here, hahah, but that seems unreasonable anyway, fine by me
	bswap	eax
	mov	[rsp], eax				; store our byteswapped length

	mov	esi, [rdi+privmapped_user_ofs+4]	; # of chunks between IHDR and IEND
	xor	edi, edi
	call	rng$int
	; so now we have a random skip count, we know our mediafile is good, so skip forward this many
	test	eax, eax
	jz	.png_skipdone
calign
.png_skip:
	mov	ecx, [r13]
	bswap	ecx
	add	ecx, 12
.png_skip_normal:
	cmp	dword [r13+4], 'IDAT'
	je	.png_skip_idats
	add	r13, rcx
	sub	eax, 1
	jnz	.png_skip
	; fallthrough to png_skipdone:
calign
.png_skipdone:
	; so now, we can safely output between r13 and r12 to stdout:
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, r12
	mov	rdx, r13
	sub	rdx, r12
	syscall
	; now we can generate our random chunk type, and that must be:
	; first and second letters must be random lowercase, third spec says must be upper case
	; fourth must be lowercase
	xor	edi, edi
	mov	esi, 25
	call	rng$int
	and	eax, 0xff
	add	eax, 'a'
	mov	[rsp+4], al
	xor	edi, edi
	mov	esi, 25
	call	rng$int
	and	eax, 0xff
	add	eax, 'a'
	mov	[rsp+5], al
	xor	edi, edi
	mov	esi, 25
	call	rng$int
	and	eax, 0xff
	add	eax, 'A'
	mov	[rsp+6], al
	xor	edi, edi
	mov	esi, 25
	call	rng$int
	and	eax, 0xff
	add	eax, 'a'
	mov	[rsp+7], al
	; now our preface 8 bytes is complete, next we need to calculate our CRC to append to the outbuf before we send it
	; CRC is calculated with the chunk type and the data, but not the length
	xor	edi, edi
	lea	rsi, [rsp+4]
	mov	edx, 4
	call	crc$32
	mov	edi, eax
	mov	rsi, [rbx+buffer_itself_ofs]
	mov	edx, [rbx+buffer_length_ofs]
	call	crc$32
	; spec says network byte order:
	bswap	eax
	mov	rdi, rbx
	mov	esi, eax
	call	buffer$append_dword
	; so now we can output our 8 byte chunk preface + outbuf
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, 8
	syscall
	add	rsp, 8
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, [rbx+buffer_itself_ofs]
	mov	edx, [rbx+buffer_length_ofs]
	syscall
	; and finally, r13 for r14-r13 bytes of the remaining PNG
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, r13
	mov	rdx, r14
	sub	rdx, r13
	syscall
	pop	r15 r14 r13 r12 rbx
	ret
calign
.png_skip_idats:
	; spec says multiple IDAT must be consecutive so we have to treat these all as one if more than one exists
	; skip over the first IDAT and decrement our chunk counter
	add	r13, rcx
	sub	eax, 1
calign
.png_skip_idatscan:
	mov	ecx, [r13]
	bswap	ecx
	add	ecx, 12
	cmp	dword [r13+4], 'IDAT'
	jne	.png_skip_idatdone
	add	r13, rcx
	jmp	.png_skip_idatscan
calign
.png_skip_idatdone:
	test	eax, eax
	jz	.png_skipdone
	jmp	.png_skip_normal

calign
.exif:
	push	rbx r12 r13 r14 r15
	mov	rbx, [outbuf]				; our encrypted materials to embed
	mov	r12, [rdi+privmapped_base_ofs]		; sourceptr of our outmedia
	mov	r13, [rdi+privmapped_size_ofs]		; size of our outmedia
	add	r13, r12				; ptr to end of our outmedia
	mov	r14, [rbx+buffer_itself_ofs]		; crypto start
	mov	r15, [rbx+buffer_length_ofs]		; crypto length
	; for EXIF, we skip the APP1 initial segment, and inject our APP12 straight after it
	movzx	eax, word [r12+4]			; Length
	xchg	ah, al
	add	eax, 4					; +2 for length, +2 for SOI
	add	r12, rax
	; so at this point, we can output up to r12
	mov	eax, syscall_write
	mov	rsi, [rdi+privmapped_base_ofs]
	mov	edi, 1
	mov	rdx, r12
	sub	rdx, rsi
	syscall
	sub	rsp, 16
	; the remainder will be same-same as we do for JFIF
	jmp	.jfif_outloop
calign
.jfif:
	push	rbx r12 r13 r14 r15
	mov	rbx, [outbuf]				; our encrypted materials to embed
	mov	r12, [rdi+privmapped_base_ofs]		; sourceptr of our outmedia
	mov	r13, [rdi+privmapped_size_ofs]		; size of our outmedia
	add	r13, r12				; ptr to end of our outmedia
	mov	r14, [rbx+buffer_itself_ofs]		; crypto start
	mov	r15, [rbx+buffer_length_ofs]		; crypto length
	; we can safely skip the JFIF APP0 segment, and see if the next one is a JFXX
	; and skip that one too if it is
	movzx	eax, word [r12+4]			; Length
	xchg	ah, al
	add	eax, 4					; +2 for length, +2 for SOI
	add	r12, rax
	; so if the next two bytes are also ff, e0, and dword at [4] == JFXX, skip that too
	cmp	word [r12], 0xe0ff
	jne	.jfif_nojfxx
	cmp	dword [r12+4], 0x5858464a
	jne	.jfif_nojfxx
	; otherwise, JFXX is sitting here, and it is sposed to be adjacent to the JFIF APP0
	; so skip this one too
	movzx	eax, word [r12+2]			; Length
	xchg	ah, al
	add	eax, 2
	add	r12, rax
calign
.jfif_nojfxx:
	; we'll output APP12 (0xecff) segments, with our identifier being a random 7 byte & 0x7f characters,
	; null terminated. In the wild, the only APP12 markers I have in my fairly large stash are Ducky
	; or PictureInfo, so during the parse we skip those and treat the rest as though we created them
	
	; so at this point, we can output up to r12
	mov	eax, syscall_write
	mov	rsi, [rdi+privmapped_base_ofs]
	mov	edi, 1
	mov	rdx, r12
	sub	rdx, rsi
	syscall

	; now we can commence our segment creation, we'll need 12 bytes for our APP12 + length + string
	; and each actual crypto segment can be at most 65535 - 2(length) - 8 (identifier), 65525
	sub	rsp, 16
calign
.jfif_outloop:
	; get a 64 bit RNG for our identifier
	call	rng$u64
	mov	word [rsp], 0xecff		; APP12
	mov	rcx, r15
	mov	edx, 65525
	cmp	rcx, rdx
	cmova	rcx, rdx
	; save this length so we can mess with r14/r15
	mov	r8d, ecx
	mov	r9, qword [.jfif_idmask]
	; add 8 for our identifier and 2 for our length
	add	ecx, 10
	; byteswap and put into our length
	xchg	ch, cl
	mov	word [rsp+2], cx
	; construct our identifier
	and	rax, r9
	; add our identifier
	mov	[rsp+4], rax
	; output our 12 bytes, but save our length modifier
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, rsp
	mov	edx, 12
	push	r8
	syscall
	pop	rdx
	; output our rdx worth of bytes from r14
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, r14
	; update r14/r15
	add	r14, rdx
	sub	r15, rdx
	syscall
	; if we have data remaining, repeat
	test	r15, r15
	jnz	.jfif_outloop
	; otherwise, we are all done adding our segments
	; now we can output the remainder of the media
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, r12
	mov	rdx, r13
	sub	rdx, r12
	syscall

	; done, dusted.
	add	rsp, 16
	pop	r15 r14 r13 r12 rbx
	ret
dalign
.jfif_idmask:
	dq	0x007f7f7f7f7f7f7f



cleartext banner, 'This is toplip v1.24 ',0xc2,0xa9,' 2015-2018 2 Ton Digital. Author: Jeff Marrison',10,'A showcase piece for the HeavyThing library. Commercial support available',10,'Proudly made in Cooroy, Australia. More info: https://2ton.com.au/toplip',10

falign
public _start
_start:
	; every HeavyThing program needs to start wiht a call to initialise it:
	call	ht$init
	
	cmp	dword [argc], 1
	je	.needinputfile

	; get our termios goods happening
	mov	edi, 64
	call	heap$alloc_clear
	mov	[termios], rax
	xor	edi, edi
	mov	esi, 0x5401		; TCGETS
	mov	rdx, rax
	mov	eax, syscall_ioctl
	syscall
	; copy that to the stack
	sub	rsp, 64
	mov	rdi, rsp
	mov	rsi, [termios]
	mov	edx, 60			; sizeof(struct termios) == 60
	call	memcpy
	; clear the ECHO
	and     dword [rsp+0xC], 0xfffffff7	; c_lflag &= ~(ECHO)
	; TCSETSF next
	xor	edi, edi
	mov	esi, 0x5404		; TCSETSF
	mov	rdx, rsp
	mov	eax, syscall_ioctl
	syscall
	; done with the stack
	add	rsp, 64

	; create our inputfiles list
	call	list$new
	mov	[inputfiles], rax

	; remove our program name (argv[0]) from the args
	mov	rdi, [argv]
	call	list$pop_front
	mov	rdi, rax
	call	heap$free

	; regardless of whether we are encrypting or decrypting, last arg must be inputfile
	mov	rdi, [argv]
	call	list$pop_back

	; fake out the argscanner and call that an input file
	mov	dword [next_is_inputfile], 1
	mov	rdi, rax
	call	argscan
	; that will have bailed out if it failed

	; due to the [silly] way that I decided to do arg handling, deal with -i and -c to main inputfile first
	call	mainargopts

	; create an output buffer
	call	buffer$new
	mov	[outbuf], rax

	; create a salt buffer
	call	buffer$new
	mov	[salt], rax

	; create a header buffer
	call	buffer$new
	mov	[headerbuf], rax

	; argscan to get our flags and other input files
	mov	rdi, [argv]
	mov	rsi, argscan
	call	list$foreach

	; do some sanity checking of our args
	mov	rdi, .toomanyinputfiles
	mov	rsi, [inputfiles]
	cmp	qword [rsi+_list_size_ofs], 4
	ja	.error
	mov	rdi, .inputfile
	cmp	qword [rsi+_list_size_ofs], 0
	je	.error

	; if we are decrypting, and input files > 0, puke
	cmp	[do_enc], 0
	jne	.skip_decfilecheck
	mov	rdi, .toomanyinputfiles
	cmp	qword [rsi+_list_size_ofs], 1
	jne	.error
.skip_decfilecheck:
	; now, load/parse/do whatever to our inputfiles before we go any further
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$load
	call	list$foreach

	; now we go our separate ways depending on whether we are encrypting or decrypting

	; dump our banner to stderr
	mov	rdi, banner
	call	string$to_stderr

	cmp	[do_enc], 0
	je	.decrypt

	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; encrypt
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------

	; if -noalt was not specified, _and_ if inputfiles count is 1, we need to add a bogus
	; inputfile to the mix
	cmp	dword [noalt], 0
	jne	.skip_bogus

	; otherwise, fill our inputfiles up to 4 with bogus material, random size based on main input file
.bogus_fill:
	mov	rdi, [inputfiles]
	cmp	qword [rdi+_list_size_ofs], 4
	je	.skip_bogus
	mov	rsi, [rdi+_list_first_ofs]
	mov	rdx, [rsi+_list_valueofs]
	mov	rdi, [rdx+inputfile_size_ofs]
	call	inputfile$new_bogus
	mov	rdi, [inputfiles]
	mov	rsi, rax
	call	list$push_back
	jmp	.bogus_fill

.skip_bogus:

	; generate our 32 byte SALT, we know that the default buffer size is 256, more than we need
	mov	rdi, [salt]
	add	qword [rdi+buffer_length_ofs], 32
	add	qword [rdi+buffer_endptr_ofs], 32
	mov	rdi, [rdi+buffer_itself_ofs]
	mov	esi, 32
	call	rng$block

	; each inputfile's totalsize is already set, garbage is already set, padlen is already set
	; first thing we have to do is acquire (or generate) the keys for each input file
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$keygen
	call	list$foreach

	mov	rdi, .encrypting
	call	string$to_stderr

	; next step is randomizing the inputfile list (this determines what order they appear in the output)
	mov	rdi, [inputfiles]
	call	list$shuffle

	; determine our offsets into our output (which is required before we can generate HEADER blocks)
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$extents
	call	list$foreach

	; generate our 128 byte HEADER_OR_IV block, we know that the default buffer
	; size is 256, more than we need so we can write directly to it
	mov	rdi, [headerbuf]
	add	qword [rdi+buffer_length_ofs], 128
	add	qword [rdi+buffer_endptr_ofs], 128
	mov	rdi, [rdi+buffer_itself_ofs]
	mov	esi, 128
	call	rng$block
	
	; generate a list of 8 header entries
	call	list$new
	mov	[headerblocks], rax
	repeat 8
		mov	rdi, [headerblocks]
		mov	esi, % - 1
		call	list$push_back
	end repeat
	mov	rdi, [headerblocks]
	call	list$shuffle

	; now we can let each input file calculate its HEADER and IV
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$headeriv
	call	list$foreach

	; output the 32 byte SALT to stdout
	mov	rcx, [salt]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, [rcx+buffer_itself_ofs]
	mov	edx, 32
	call	output

	; output the 128 byte headerbuf to stdout
	mov	rcx, [headerbuf]
	mov	eax, syscall_write
	mov	edi, 1
	mov	rsi, [rcx+buffer_itself_ofs]
	mov	edx, 128
	call	output

	; encrypt each output file
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$encrypt
	call	list$foreach

	; flush the output buffer
	call	output_flush

	; cleanup after ourselves
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$destroy
	call	list$clear

	; done, dusted.
	mov	rdi, .donemsg
	call	string$to_stderrln

	call	termreset
	mov	eax, syscall_exit
	xor	edi, edi
	syscall
cleartext .donemsg, 'Done'
cleartext .encrypting, 10,'Encrypting...'
calign
.decrypt:
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; decrypt
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; -----------------------------------------------------------------------------------------
	; if do_b64 was set, clear it either way because we have already converted the input
	; and we don't want our plaintext output to be base64
	mov	[do_b64], 0

	; the 32 byte SALT was set by the inputfile$load function call earlier

	; first thing we have to do is acquire the keys for the input file
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$keygen
	call	list$foreach

	; now just call decrypt, which will scan/find/do the deed
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$decrypt
	call	list$foreach

	; flush the output buffer
	call	output_flush
	
	; cleanup after ourselves
	mov	rdi, [inputfiles]
	mov	rsi, inputfile$destroy
	call	list$clear

	; done, dusted
	mov	rdi, .donemsg
	call	string$to_stderrln

	call	termreset
	mov	eax, syscall_exit
	xor	edi, edi
	syscall

.needinputfile:
	mov	rdi, banner
	call	string$to_stderr
	mov	eax, syscall_write
	mov	edi, 2
	mov	rsi, .msg_usage
	mov	edx, .msg_usage_len
	syscall
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
calign
.error:
	call	string$to_stderrln
	call	termreset
	mov	eax, syscall_exit
	mov	edi, 1
	syscall
cleartext .inputfile, 'input file required'
cleartext .toomanyinputfiles, 'too many input files'
dalign
.msg_usage:
db	'Usage: toplip [-b] [-d] [-r] [-m mediafile] [[-nomix|-drbg][-1][-c COUNT][-i ITER ]-alt inputfile] [-nomix|-drbg][-1][-c COUNT][-i ITER ]inputfile',10,\
	'  -b == input/output in base64 (see below notes)',10,\
	'  -d == decrypt the inputfile',10,\
	'  -r == generate (and display of course) one time 48 byte each pass phrases as base64',10,\
	'  -m mediafile == for encrypting only, merge the output into the specified mediafile.',10,\
	'       Valid media types: PNG, JPG (plain JFIF or EXIF).',10,\
	'       (Note that decrypting will auto-detect and attempt to extract if the inputfile for',10,\
	'       decryption is given a media file).',10,\
	'  -1 == for each input file (-alt or main), this option disables the use of cascaded',10,\
	'       AES256, and instead uses a single AES256 context (two for the XTS-AES stage).',10,\
	'  -c COUNT == for each input file (-alt or main), this option overrides the default count',10,\
	'       of one (1) passphrase. Specifying a higher count here will ask for this many actual',10,\
	'       passphrases, and generate this number of separate key material and crypto contexts',10,\
	'       that are then used over-top of each other (cascaded).',10,\
	'  -i ITER == for each input file (-alt or main), specify an alternate iteration count',10,\
	'       for scrypt',0x27,'s internal use of PBKDF2-SHA512 (default is 1). For the initial 8192',10,\
	'       bytes of key material, and before one-way AES key grinding of same, we use scrypt',10,\
	'       and this option overrides how many iterations of PBKDF2-SHA512 it will perform',10,\
	'       for each passphrase. (NOTE: this can _dramatically_ increase the calc times).',10,\
	'       Hex values or decimal values permitted (e.g. 10, 0xfff, etc).',10,\
	'  -drbg == for each input file, by default the 8192 bytes of key material is xor',0x27,'d with',10,\
	'       TLSv1.2 PRF(SHA256) of the supplied passphrase(s). This option will mix the key',10,\
	'       material with HMAC_DRBG(SHA256) instead.',10,\
	'  -nomix == for each input file (see -drbg), this option specifies no additional mixing',10,\
	'       of the scrypt generated 8192 byte key material.',10,\
	'  -alt inputfile == generate one or more "Plausible Deniability" file (encrypting only)',10,\
	'       This will ask for another set of passphrases, which MUST NOT be the same.',10,\
	'       Without this option, three alternate contents are randomly generated such that it is',10,\
	'       impossible to tell by examining the encrypted output whether there is or is not',10,\
	'       anything other than pure random. See the -noalt option for what happens without',10,\
	'       this option. This option can be specified up to 3 times (for a max of 4 files).',10,\
	'  -noalt == Do not generate additional random data. By default, extra random data is',10,\
	'       inserted into the encrypted output such that forensic analysis (with a valid set',10,\
	'       of passphrases) on a given encrypted output does not cover all of the ciphertext',10,\
	'       present. See further commentary below about why the default setting is a good thing.',10,\
	'       Specifying this option means that no extra random data is inserted into the output',10,\
	'       (and this might be useful if you do not need plausible deniability, or you are',10,\
	'       dealing with very large files).',10,\
	'  if -b is specified for encrypt, base64 of the encrypted goods is output to stdout',10,\
	'  if -b is specified for decrypt, it is assumed the input is base64, and plaintext is output to stdout',10
.msg_usage_len = $ - .msg_usage



include '../ht_data.inc'