Kleen
Suite of extra Perl scripts distributed with UEFwalk
Greg Cook, 13/Jun/2023

CONTENTS

bitclean    Converts between audio signal and bit stream
bitparse    Marks up a bit stream as data blocks, high tone, gaps etc.
bitsplit    Divides a bit stream into several files based on markers
oetm        Puts out UEF origin, manual and cover scan chunks
readme.txt  Informal documentation (the file you are reading now)
writeuef    Translates a marked-up bit stream into binary UEF chunks

INTRODUCTION

The UEFwalk utility was written to test my understanding of the UEF
format, and publish Perl code that could interpret UEF files. It was
also a way to retrofit UEF capability into a suite of scripts I had
written earlier to preserve BBC cassette data, before I had discovered
Thomas Harte's work. Thanks to him there was a standard way of
exchanging Acorn 8-bit media verbatim, with plenty of scope to record
incidental details such as instruction manuals and cover artwork.

The original text-based format is still present in these scripts as the
waveform analyser (bitclean) is not accurate enough to create UEF files
in one step. There are other UEF writing utilities available (MakeUEF,
to name but one) if you desire ease-of-use...

The Kleen suite has the following features:

* Reads raw PCM audio samples, gives good results even without
  filtering.
* Recognises standard and non-standard data, security waves and high
  tones - automatically and with a minimum of loss. The parser can be
  applied to bit streams from .tape files and from older UEF files.
* Recognises and compacts 300 baud data recorded with 2400Hz/1200Hz
  tones - as produced by the BBC Micro - also automatically.
* Hysteresis signal detection plus amplitude threshold - immune to DC
  bias, low levels of mains hum and of noise.
* Analyses and synthesises signals in any of four user-selected phases,
  resolves single cycles. The analysis includes enough information to
  calculate the mean base frequency if required.
* The analyser does not reject corrupted data blocks (UEFwalk can find
  these later). Useful if only a single sampling is available.
* You can patch the data stream using only a text editor. Most single
  bit errors are ready-marked for you to select the appropriate
  correction.
* Position markers can be added using a text editor.
* You can splice two or more recordings together, add a title, software
  usage instructions and other data chunks.

NAME

bitclean - clean (convert) audio signals to bit stream and vice versa

SYNOPSIS

bitclean [-doivnm][--debug][--text-output][--text-input]
         [--verbose][--negative][--middle] FILE

DESCRIPTION

Bitclean performs signal processing and reconditioning on waveforms of
audio cassette signals generated by the BBC Microcomputer and Acorn
Electron. It can accept audio data and put audio data out, thereby
generating a purified audio signal which can be highly compressed by
lossless algorithms. It can also accept data in an internal text-based
'bit stream' format and put it out, recalculating the time duration of
the waveform it represents. Its chief uses however are to decode a
waveform into a bit stream for processing and use in Acorn emulators,
and to play back Acorn data as a waveform that can be read by a BBC
Micro or an Electron.

The cycles directive selects between the Acorn data encoding at 1200
baud (when set to 1) or the CUTS data encoding at 300 baud (when set to
4). Both encodings use 1200 Hz and 2400 Hz cycles. The precise base
frequency is not preserved. Apart from that, bitclean currently
preserves only gap directives.

The waveform analyser is sensitive to the phase of the input signal; if
it cannot synchronise to the signal, try --negative, --middle or both.

SWITCHES

-d --debug
    Print internal variables and progress messages on standard error
    during execution. Implies --verbose.
-o --text-output
    Put out data in the bit stream (text) format. The default is to put
    out audio data. See DATA FORMATS below.
-i --text-input
    Accept data in the bit stream (text) format. The default is to
    accept audio data. See DATA FORMATS below.
-v --verbose
    Print progress messages on standard error during execution. The
    default is quiet operation.
-n --negative
    Reverse the polarity of all input and output waveforms. The
    frequency will change at the trough (-) or on the rising edge
    (-/+). The default is to change at the peak (+) or on the falling
    edge (+/-). This switch does not affect the timing or length of the
    output.
-m --middle
    Change frequencies at zero volts. The default is to change
    frequencies at the peak or trough. Due to rounding errors and the
    insertion of quarter waves between a gap and a wavetrain, the
    length and timing of the output depend on this switch. This affects
    both audio and bit stream output. The --middle switch is important
    if you are recording the output to a cassette tape. See CAVEATS
    below.

ARGUMENT

FILE
    The name of the file to read. FILE cannot be omitted. If bitclean
    is to read from standard input, FILE must be - (dash).

DATA FORMATS

Audio waveforms are encoded as raw, single channel, 44100 Hz, signed
word, PCM data in the local byte order. This can be converted to WAV
(for example) with the following arguments to SoX:

    sox -D -t raw -c 1 -r 44100 -b 16 -e signed-integer \
        output.raw -t wav output.wav

To present arbitrary WAV files in the correct format, use the following
SoX command line:

    sox -D input.wav -t raw -c 1 -r 44100 -b 16 \
        -e signed-integer - | bitclean ...

A bit stream consists of (long) lines of ASCII text, each ending in the
local system's conventional newline. Empty lines are ignored. For a
full description of the bit stream format, see the documentation for
UEFwalk. This format is intended to show how to decode the UEF cassette
chunks into strings of binary data using Perl code. It is a throwaway
format and may be radically changed or discontinued in future scripts.
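
As a rough illustration of the raw audio format described above, the
following sketch decodes a buffer of raw bytes as signed 16-bit samples
in the local byte order and works out how many samples make up one
cycle of each tone at 44100 Hz. It is written in Python purely for
illustration and is not part of the Kleen suite (which is Perl); the
names read_samples and samples_per_cycle are made up for this example
and say nothing about how bitclean works internally.

```python
import array

SAMPLE_RATE = 44100  # Hz, as stated under DATA FORMATS


def read_samples(raw):
    """Decode raw bytes as signed 16-bit PCM in the local byte order."""
    samples = array.array("h")  # "h" = signed short, native byte order
    # Assumption of this sketch: a trailing partial sample is dropped.
    usable = len(raw) - len(raw) % samples.itemsize
    samples.frombytes(raw[:usable])
    return samples


def samples_per_cycle(frequency_hz):
    """Number of samples (possibly fractional) in one cycle of a tone."""
    return SAMPLE_RATE / frequency_hz


def duration_seconds(samples):
    """Playing time of a mono sample buffer at SAMPLE_RATE."""
    return len(samples) / SAMPLE_RATE
```

A cycle of the 1200 Hz tone therefore spans 36.75 samples and a cycle
of the 2400 Hz tone 18.375 samples, so cycle boundaries do not fall on
whole sample positions.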

NAME

bitparse - parse a bit stream and add markup for conversion to a UEF
file

SYNOPSIS

bitparse [FILE...]

DESCRIPTION

Bitparse accepts a text-based bit stream representing an Acorn audio
cassette data signal, and parses the stream to prepare it for
conversion to a UEF file. It then puts out the data in the same
text-based format (see DATA FORMATS above) but with extra characters
serving as markup for the UEF writing script, writeuef.

The parser uses Perl regular expressions to recognise the three
component parts of a wavetrain: standard data blocks, generic data
blocks and high tones. When a match is found a newline and one or two
characters are inserted, indicating the type of data found.

Bitparse is designed to pass unchanged a bit stream it has already
processed. It operates on each line separately (lines of over 300K
characters are handled without problems). Avoid inserting newlines
therefore; if you must insert one, do so only at the end of a high tone
(a string of at least forty-eight 1s).

NAME

bitsplit - divide marked-up data into separate files according to
position markers

SYNOPSIS

bitsplit [FILE...]

DESCRIPTION

Bitsplit reads a bit stream from standard input and creates a number of
files in the current directory, each containing a segment of the bit
stream delimited by position marker directives.

Initially each line of the input is copied to a file named _bof.bit
(standing for Beginning of File). When a marker directive is met (see
DATA FORMATS under bitclean above) the current output file is closed
and a new one, named after the marker text plus an extension of .bit,
is opened. The marker directive and all subsequent lines are then
copied into this new file, until another marker is found as before or
until the input ends. Then the current file is closed.

Therefore, each line in the input appears in exactly one of the files
created by bitsplit.
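
The splitting rule described above can be sketched as follows. This is
an illustrative Python sketch, not the bitsplit source. The marker
syntax it matches, "[marker:NAME]" at the start of a line, is
HYPOTHETICAL and invented for the example; the real directive syntax is
defined in the UEFwalk documentation. To keep the sketch small it
returns the lines grouped by output file name instead of writing files,
and it assumes that a repeated marker name appends to the same group,
which the text above does not specify.

```python
import re

# HYPOTHETICAL marker syntax, for illustration only. The real position
# marker directive is described in the UEFwalk documentation.
MARKER_RE = re.compile(r"^\[marker:(?P<name>[^\]]+)\]")


def marker_name(line):
    """Return the marker text if the line starts with a marker, else None."""
    match = MARKER_RE.match(line)
    return match.group("name") if match else None


def split_stream(lines):
    """Group lines by the output file that the splitting rule assigns."""
    files = {}
    current = "_bof.bit"  # lines before the first marker go here
    for line in lines:
        name = marker_name(line)
        if name is not None:
            # The marker line itself is the first line of the new file.
            current = name + ".bit"
        files.setdefault(current, []).append(line)
    return files
```

Because every line is appended to exactly one group, the total number
of lines across all groups equals the number of input lines.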

NAME

oetm - put out UEF Origin, extra data, Title and Manual chunks

SYNOPSIS

oetm ORIGIN_FILE EXTRA_BYTES TITLE_FILE MANUAL_FILE

DESCRIPTION

Oetm is a simple tool for generating UEF chunks to append to files
under construction. It will put on standard output any, all or none of
the following, in the specified order:

    User-defined binary data
    Short Title (0x0009) chunk
    Game Instructions / Manual (0x0001) chunk
    Origin Information (0x0000) chunk

ARGUMENTS

All the arguments are compulsory. To disable a function, give an empty
string, "", as the corresponding argument.

ORIGIN_FILE
    The name of the file containing the UEF Origin string. If given,
    puts out an Origin Information (0x0000) chunk whose body is the
    contents of the file, minus all carriage return (0x0D) characters.

EXTRA_BYTES
    A hexadecimal string representing bytes to be written at the start
    of the output. EXTRA_BYTES must consist only of hexadecimal digits
    (0-9, a-f) and optional spaces; if spaces are used, the argument
    must be enclosed in quotes. The digits are taken in pairs, starting
    with the leftmost pair, to form bytes; spaces are ignored, but must
    not come between the digits of each pair. The first digit of each
    pair is the most significant digit. Signifiers such as '0x', '$' or
    '&' are forbidden.

TITLE_FILE
    The name of a file whose first line contains the title, to be
    placed in a Short Title (0x0009) chunk and appended to the output.
    Only ASCII characters are allowed. The title is read from a file
    for ease of use in shell scripts.

MANUAL_FILE
    The name of the file containing the text of the instruction manual.
    If given, puts out a Game Instructions / Manual (0x0001) chunk
    whose body is the contents of the file, minus all carriage return
    (0x0D) characters.

NAME

writeuef - translate marked-up data into binary UEF format

SYNOPSIS

writeuef EXTRA_BYTES [FILE...]

DESCRIPTION

Writeuef accepts a text-based bit stream representing an Acorn audio
cassette data signal, and translates each line into at most one UEF
chunk having equivalent contents. To generate complete UEF files a
valid UEF header must be supplied via the EXTRA_BYTES argument.
Alternatively, append the output to a valid, uncompressed UEF file.

ARGUMENTS

EXTRA_BYTES
    A hexadecimal string representing bytes to be written at the start
    of the output. EXTRA_BYTES is compulsory. If no extra bytes are
    required, give an empty argument, "". EXTRA_BYTES must consist only
    of hexadecimal digits (0-9, a-f) and optional spaces; if spaces are
    used, the argument must be enclosed in quotes. The digits are taken
    in pairs, starting with the leftmost pair, to form bytes; spaces
    are ignored, but must not come between the digits of each pair. The
    first digit of each pair is the most significant digit. Signifiers
    such as '0x', '$' or '&' are forbidden.

FILE
    The name(s) of the marked-up file(s) to read. If it is - (a single
    dash), or if no file name is given, then standard input will be
    read.

EXAMPLES

To set an environment variable in Bash, containing the UEF header in
hexadecimal:

    UEF_HEADER='5545 4620 4669 6C65 2100 0A00'

This variable will be used in the subsequent examples.
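
To see which bytes the header string above stands for, the EXTRA_BYTES
rules can be sketched as below. This is an illustrative Python sketch,
not the code used by writeuef or oetm, and parse_extra_bytes is a
made-up name. One assumption is flagged in the code: it accepts
upper-case as well as lower-case hexadecimal digits, since the header
above contains '6C'.

```python
import re

# Zero or more digit pairs, each optionally preceded by spaces, with
# optional trailing spaces. A space between the two digits of a pair,
# or a prefix such as 0x, $ or &, makes the whole string invalid.
# ASSUMPTION: both upper-case and lower-case digits are accepted.
PAIRS_RE = re.compile(r"(?: *[0-9A-Fa-f]{2})* *")


def parse_extra_bytes(text):
    """Convert an EXTRA_BYTES style string to bytes, leftmost pair first."""
    if not PAIRS_RE.fullmatch(text):
        raise ValueError("not a valid string of hexadecimal digit pairs")
    # The first digit of each pair is the most significant digit.
    return bytes.fromhex(text.replace(" ", ""))
```

Applied to the header string, this yields the ten bytes of the
NUL-terminated text "UEF File!" followed by the bytes 0x0A and 0x00.
An empty argument yields no bytes at all.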

To extract the bit stream from a .wav file:

    sox -D tape.wav -t raw -c 1 -r 44100 -b 16 \
        -e signed-integer - \
        | bitclean --text-output - > tape.bit

To convert the bit stream to a UEF file:

    bitparse tape.bit | writeuef "$UEF_HEADER" | gzip -cn \
        > tape.hq.uef

To merge two bit streams into a UEF file:

    bitparse side1.bit side2.bit | writeuef "$UEF_HEADER" \
        > merge.hq.uef

Alternatively:

    bitparse side1.bit | writeuef "$UEF_HEADER" > merge.hq.uef
    bitparse side2.bit | writeuef '' >> merge.hq.uef

To append extra information to an uncompressed UEF file (merge.hq.uef)
using oetm:

    echo -n 'Your name here http://www.example.org/' \
        > origin.txt
    echo -n 'Example - Game' > title.txt
    oetm origin.txt '' title.txt manual.txt >> merge.hq.uef
    gzip -cn merge.hq.uef > game.hq.uef

To put Wouter Scholten's .tape format into a UEF 'wrapper':

    perl -0777 -ne 'print pack("Z10C2vVC", "UEF File!", ' \
        -e '0x0A, 0x00, 0x0102, length()+1, 0x08), $_;' \
        fortress.tape | gzip -cn > fortress.uef

To convert a .tape file to a UEF file with standard data chunks as
required by some BBC emulators:

    perl -0777 -e 'print "G",unpack("b*",<>),"\n"' \
        fortress.tape \
        | bitparse \
        | writeuef "$UEF_HEADER" | gzip -cn > fortress.uef

(If the '| bitparse' is omitted, this is a less efficient equivalent of
the previous 'wrapper' example.)

To produce a .wav file from a UEF file:

    uefwalk --output=bitstream --quiet fortress.uef \
        | bitclean --verbose --text-input - \
        | sox -D -t raw -c 1 -r 44100 -b 16 \
              -e signed-integer - -t wav -b 16 fortress.wav

CAVEATS (for all files)

Do not try to listen to the audio signals. In fact do not play ANY
computer data signals on loudspeakers or headphones. It is almost
always recorded at full gain and may be surprisingly loud - PERMANENTLY
damaging your hearing and/or your audio amplifier.

On the same note (no pun intended) be extremely careful when playing
back the signals to another computer - it can also be damaged by
over-amplified output.
Start with zero volume and load the appropriate Welcome cassette,
increasing the volume up to approximately 50 per cent (guidance only)
or until the computer recognises the signal.

Acorn Electrons are sensitive to the phase of the tape input signal.
Supply the --middle switch to bitclean if you are recording the signal
on to an (analogue) cassette tape which will be played back to an
Electron. If very poor reliability results, toggle the --middle switch.

Do not feed bitclean its own audio output repeatedly. Rounding errors
and drifting will result, especially in the lengths of high tones and
gaps.

All newlines in the input to bitparse will be preserved. This means
newlines may break up what is otherwise a single standard data chunk.
It is safe, and in most cases essential, to pass long lines of text to
the parser.

The Kleen suite only resolves whole cycles, for historical reasons.
Bits are treated as having zero width; this increases gap lengths by
the length of the first bit. Calculation of the mean absolute base
frequency awaits a future release.

AUTHOR

Greg Cook
debounce@yahoo.co.uk
http://regregex.bbcmicro.net/

DISCLAIMER

The software and documentation included in Kleen is supplied without
warranty, not even the implied warranties of merchantability or fitness
for a particular purpose. In no event shall the author or his suppliers
be liable for any loss, damage, injury or death, of any nature and
howsoever caused, arising from the use of, or failure, inability or
unwillingness to use, this software or documentation.

-END-