NAME
    uefwalk -- UEF file extractor and summariser

SYNOPSIS
    uefwalk [--output=OUTPUT-OPTION] [-v|--verbose|-q|--quiet]
            [--number-all|--no-number-first] [--test-crcs|--no-test-crcs]
            [--strip-uef|--no-strip-uef] [-?|-h|-u|--help|--usage]
            [FILE...]

DESCRIPTION
    Uefwalk reads a UEF stream from the named file(s) or from standard
    input, if the file name is a single dash (-) or no file name is given.
    A UEF stream on standard input must be uncompressed, but named regular
    files may or may not be compressed.

    The script will either silently test the input (the default
    behaviour), print a summary of the contents, extract a specified class
    of information to standard output, or extract all understood
    information into separate files.

    The file will be interpreted according to the UEF version recorded in
    its header -- uefwalk conforms to versions 0.8, 0.9 and 0.10, and will
    accept all other versions. In particular, the time units of the tape
    chunks, and the syntax of the generic chunk, are selected according to
    the version.

    If uefwalk detects a violation of the UEF specification (for instance,
    if it cannot 'walk' along the chunks, or finds an invalid chunk ID, or
    has one of numerous chunk-specific tests fail) it will exit and report
    the error.
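
    As an illustration of what 'walking' along the chunks involves, here
    is a minimal Python sketch. It is not uefwalk's own code (uefwalk is a
    Perl script). It assumes the container layout given in the UEF
    specification: a ten-byte signature, a minor and a major version
    byte, then chunks made of a two-byte ID, a four-byte length (both
    little-endian) and the data. The names walk_chunks, uef_version and
    read_uef are hypothetical.

```python
import gzip
import struct

UEF_MAGIC = b"UEF File!\x00"  # ten-byte signature, assumed from the UEF spec


def uef_version(data):
    """Return (major, minor) from the two bytes after the signature."""
    if data[:10] != UEF_MAGIC or len(data) < 12:
        raise ValueError("not a UEF stream: bad or truncated header")
    minor, major = data[10], data[11]  # the minor version byte comes first
    return major, minor


def walk_chunks(data):
    """Yield (chunk_id, payload) pairs from an uncompressed UEF byte string."""
    uef_version(data)  # validates the header
    pos = 12
    while pos < len(data):
        if pos + 6 > len(data):
            raise ValueError("truncated chunk header at offset %d" % pos)
        chunk_id, length = struct.unpack_from("<HI", data, pos)
        pos += 6
        if pos + length > len(data):
            raise ValueError("chunk 0x%04X overruns the stream" % chunk_id)
        yield chunk_id, data[pos:pos + length]
        pos += length


def read_uef(path):
    """Read a named file, decompressing it first if it is gzipped."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw
```

    A caller might loop over walk_chunks(read_uef("demo.uef")) and stop
    at the first ValueError, which roughly corresponds to the 'exit and
    report the error' behaviour described above. The chunk-specific
    tests that uefwalk performs are not modelled here.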

    Uefwalk can currently extract information from the following types of
    chunk:

        0x0001, game instructions / manual
        0x0003, inlay scan
        0x0100, implicit start/stop bit tape data block
        0x0102, explicit tape data block
        0x0104, defined tape format data block
        0x0110, high tone
        0x0111, high tone with dummy byte
        0x0112, baudwise gap
        0x0113, change of base frequency
        0x0114, security waves
        0x0115, polarity / phase change
        0x0116, floating point gap
        0x0117, data encoding format change
        0x0120, position marker
        0x0130, tape set info
        0x0131, start of tape side

    The program can display summaries of the following types of chunk:

        0x0000, origin information
        0x0001, game instructions / manual
        0x0003, inlay scan
        0x0005, target platform
        0x0009, short title
        0x000A, visible area
        0x0120, position marker
        0x0200, disc info
        0x0201, single implicit disc side
        0x0210, explicit disc track
        0xFF00, emulator identification string

    Tape chunks (0x0100..0x01FF) are not summarised because a file
    typically contains large numbers of them, which would clutter the
    summary listing. Other chunks are simply listed with their ID and
    length, and not processed further.

OUTPUT FORMATS
    Here is the specification of each form of output that uefwalk will
    produce:

  Summary and Instruction manuals
    Output as ASCII text with the local system's conventional newlines.
    .txt is the extension for manuals (see below).

  Inlay images
    Cassette inlays are output in Portable Graymap (PGM) or Portable
    Pixmap (PPM) format. These are highly simplified formats which can be
    converted to other types using the PBMPLUS suite (alias Libgr), its
    successor NetPBM, and many image viewers running under Linux. The
    extension is either .pgm or .ppm according to whether the image is in
    greyscale or in colour. The binary or ASCII forms of the PBMPLUS
    format (see below) will be used depending on the colour depth.
    Unix-style newlines are always used.
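
    To show how simple the PGM and PPM formats are, here is a Python
    sketch that encodes an image in any of the four forms. It is an
    illustration only, not uefwalk's code: the function names and
    parameters are hypothetical, and the rule uefwalk uses to choose
    between the binary and ASCII forms for a given colour depth is not
    reproduced here, so the caller selects the form explicitly.

```python
def encode_pnm(width, height, samples, colour=False, ascii_form=False,
               maxval=255):
    """Return a PGM (greyscale) or PPM (colour) image as bytes.

    samples is a flat list of integers in row-major order: one per pixel
    for greyscale, or three (red, green, blue) per pixel for colour.
    """
    per_pixel = 3 if colour else 1
    if len(samples) != width * height * per_pixel:
        raise ValueError("sample count does not match image dimensions")
    if any(not 0 <= s <= maxval for s in samples):
        raise ValueError("sample out of range")
    if not ascii_form and maxval > 255:
        raise ValueError("this sketch writes one byte per binary sample")
    # P2/P3 are the ASCII forms and P5/P6 the binary forms.
    magic = {(False, True): "P2", (True, True): "P3",
             (False, False): "P5", (True, False): "P6"}[(colour, ascii_form)]
    # Unix-style newlines throughout, whatever the local convention is.
    header = ("%s\n%d %d\n%d\n" % (magic, width, height, maxval)).encode("ascii")
    if ascii_form:
        # One pixel per line keeps every line short.
        lines = [" ".join(str(s) for s in samples[i:i + per_pixel])
                 for i in range(0, len(samples), per_pixel)]
        return header + "".join(line + "\n" for line in lines).encode("ascii")
    return header + bytes(samples)


def pnm_extension(colour):
    """Pick the file extension from the image type, as described above."""
    return ".ppm" if colour else ".pgm"
```

    For example, encode_pnm(2, 1, [0, 255]) yields a two-pixel binary
    greymap, and passing colour=True with ascii_form=True yields an ASCII
    pixmap. The result should be written to a file opened in binary mode
    so that the newlines are not translated.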

  Cassette bit stream
    All cassette chunks are written to standard output or to one file
    whose extension is .bit. A bit stream consists of (long) lines of
    ASCII text each ending in the local system's conventional newlines.
    Empty lines are ignored.

    Lines beginning with . (dot) contain directives that can be ignored by
    simple applications. A directive has one of the following forms:

    .rem
        A comment which is to be ignored.

    .
        A gap in the tape. The number on the line gives the number of CDDA
        samples (1/44100ths of a second) to leave blank. The rest of the
        line is reserved; any text following the number must be ignored.

    .basefreq
        Specifies the base frequency to use hereafter. The default is
        1200 Hz, and the replacement value can be non-integer.

    .cycles
        Specifies the data encoding format. 1 is for 1200 baud Acorn data,
        4 is for 300 baud CUTS data.

    .marker
        Marks this point in the tape with the given label.

    .mediaset
        Describes the source media. The media type can be 'generic',
        'tape', 'disc', 'dat', '8-trak' (without the quotes) or a code
        digit, which only serves to select the vocabulary. The count is
        the number of tapes, discs or cartridges. The channel layout can
        be 'mono', 'stereo', 'quad' (without the quotes) or the number of
        tracks or channels per side.

    .phase
        Sets the phase of the following signal. The argument is an integer
        between 0 and 359 inclusive, representing the phase shift angle in
        degrees. The value 0 means each cycle of the signal spans from
        rising edge to rising edge (-/+), 90 means from peak to peak (+).
        The default phase angle is 180. This definition conforms to
        Draft 23 of the UEF spec.

    .platform
        Makes a platform and keyboard recommendation. The platform can be
        'a', 'b', 'electron', 'master', 'atom' (without the quotes) or a
        UEF code digit. The keyboard can be 'any', 'physical', 'logical'
        or a UEF code digit.

    .position
        Marks the start of a track or channel. The unit is the unit number
        starting from 1. The side can be A, B, 1 or 2. The track can be
        'left', 'right', 'mono' (without the quotes) or the track number
        starting from 1. An optional description of the side or track may
        also be given.
        Unlike the .marker directive this marks a location that can be
        found by the media player itself.

    .sign
        The phase of the signal is rotated by 180 degrees at this point.
        Use of this directive is deprecated.

    .title
        Gives a short title suitable for a title bar, for example.

    Lines not beginning with a dot contain the digit stream in text form.
    Each line corresponds to at most one UEF chunk, and must start with
    one of the following strings categorising the binary sequence on that
    line:

    S
        Implicit start/stop bit tape data block (0x0100). The bit stream
        includes the start and stop bits, which will be discarded when it
        is converted to a binary UEF chunk.

    H
        High tone (0x0110). All but the last character are assumed to be
        1s, and are counted only. The last character is I to indicate a
        half bit or 1 for a full bit.

    DH
        High tone with dummy byte (0x0111). The rest of the line contains
        the high tone before the dummy byte. Following that is an S or G
        line containing the dummy byte, then an H line giving the trailing
        high tone. When converted to UEF format the three lines are
        treated as a unit, and the dummy byte is discarded (it is assumed
        to be a standard 0xAA).

    G
        Explicit tape data block (0x0102). Used if the contents of the
        line are non-standard and contain only whole bits.

    CCifr
        Security waves (0x0114). Used if the contents of the line are
        non-standard or have not been parsed. The CC is followed by three
        letters having the following meanings:

               i                 f                r
               Initial unit      Final unit       Reserved
            D  Omit first half   Omit last half   ---
            H  Play first half   Play last half   ---
            G  Unknown           Unknown          Valid

        This line is a direct translation of the Security Wave (0x0114)
        chunk and so shares its semantics. In this case a 'unit'
        corresponds to one binary 0 or half a binary 1 at the current baud
        rate. If only one unit is present then at least one half of it
        must be played. The rest of the line contains the security waves.

    Chdps
        Defined tape format data block (0x0104).
        The C is followed by four letters having the following meanings:

               h                d        p        s
               Stop half-bits   Data     Parity   Stop bits
            C  --- [sec. wave]  6 bits   None     0 bits
            D  0 half bits      7 bits   Even     1 bit
            H  1 half bit       8 bits   Odd      2 bits
            S  ---              9 bits   Present  3 bits
            G  Unknown          Unknown  Unknown  Unknown

        Stop half-bits, if specified, follow the stop bit(s) at the end of
        every field. Parity present is an assertion that each field
        contains a parity bit and the entire line fulfils all odd or all
        even parity. Only nine of the combinations are valid within a UEF
        0x0104 chunk.

        The line continues and ends with the formatted data. The start,
        parity and stop bits are included and will be discarded on
        conversion to UEF.

    The remainder of the line is data, one ASCII character per bit. The
    meanings are affected by the current baud rate:

    Bit stream       Freq.   Cycles   Cycles  0x0114 chunk     Other chks.
    ASCII character          @1200bd  @300bd  Binary digit(s)  Binary dig.
    "0"              1200Hz  1        4       1                0
    "I"              2400Hz  1        4       0                n/a
    "1"              2400Hz  2        8       00               1

    Characters "E", "U" and "i" are synonyms for "1". Characters "O", "e"
    and "u" are synonyms for "0". Character "I" must not appear
    immediately before "1", its synonyms, or another "I". Simple
    applications may choose to treat it as a synonym for "1".

    This format is intended to show how to decode the UEF cassette chunks
    into strings of binary data using Perl code. It is a throwaway format
    and may be radically changed or discontinued in future scripts.

OPTIONS
    --output=OUTPUT-OPTION
        Controls what is written to standard output. Choose one of the
        following:

        --output=none
            Print nothing to standard output. This is the default
            behaviour.

        --output=summary
            Print a summary of the contents, including all chunks except
            cassette chunks (0x0100..0x01FF). Forces --quiet.

        --output=bitstream
            Decode the cassette chunks and print an ASCII representation
            of the bit stream. The output is suitable for conversion into
            a waveform that can be fed into the cassette interface of an
            Acorn computer.

        --output=inlays
            Decode all inlay images in the file and write each one to
            standard output in Portable Graymap (PGM) or Portable Pixmap
            (PPM) format.

        --output=manuals
            Extract all instruction text to standard output. Individual
            instruction chunks lose their identity as their contents are
            catenated.

        --output=to-files
            Write nothing to standard output; instead extract the
            bitstream to a file and each inlay image and instruction chunk
            to its own file. The file names are based on the name given on
            the command line (minus the trailing .uef, if there was one)
            or on 'stdin' if standard input was used.

        --output=file-dump
            Write nothing to standard output; instead dump the contents of
            all standard tape blocks to files, named after the blocks
            themselves. If no file names are duplicated and all blocks are
            valid and in the correct order, the dumps will contain the
            files that were saved on the cassette. UEFwalk will append the
            data to any existing files having the same name. USE WITH
            CAUTION.

SWITCHES
    -v
    --verbose
        Print the summary of contents on standard error. This switch has
        no effect when --output=summary is specified.

    -q
    --quiet
        Disable printing of the summary of contents on standard error.
        This is the default behaviour.

    --number-all
        Effective only with --output=to-files. The bitstream file is named
        'NAME.EXT' and all generated inlay and instruction files have
        names of the form 'NAME.NNNN.EXT', where:

        NAME is the file name given on the command line, minus the '.uef'
             extension if there was one; if standard input was used then
             NAME is 'stdin';

        NNNN is one of two four-digit counters starting from 0000, one
             counter for inlays and one for instructions;

        EXT  is the extension appropriate to the file contents, for
             example 'bit', 'pgm', 'ppm' or 'txt'.

        Example filenames that would be generated from 'demo.uef'
        therefore include demo.bit, demo.0000.txt, demo.0000.ppm,
        demo.0001.pgm.

    --no-number-first
        As --number-all, but if the appropriate counter is 0000, then use
        a file name of the form 'NAME.EXT', that is, omit the number. This
        is the default. A command sequence such as

            uefwalk --output=to-files demo.uef

        would create files named demo.bit, demo.txt, demo.ppm,
        demo.0001.pgm.

    --test-crcs
        Calculate the Cyclic Redundancy Check (CRC) codes for cassette
        data blocks and compare them with the stored CRCs. If they do not
        match, the mismatch is reported and the program continues. To find
        the stored CRCs the filename must be parsed properly; otherwise,
        the block is reported as non-standard and the program continues.
        Noise (three or fewer bytes of data) is ignored. This is the
        default behaviour.

    --no-test-crcs
        Do not test the Cyclic Redundancy Check codes or warn of
        non-standard blocks.

    --strip-uef
        If the file name ends in .uef, strip this extension when building
        the names of the new files. This is the default.

    --no-strip-uef
        This prevents uefwalk stripping the .uef extension from the file
        name when building the names of the new files (so that you can
        create demo.uef.txt from demo.uef if you want to).

    -?
    -h
    -u
    --help
    --usage
        Print brief usage instructions on standard output, then exit.

    --
        Forcibly ends option and switch processing; only arguments follow
        this switch.

ARGUMENTS
    FILE...
        The name(s) of the file(s) to read. If it is - (a single dash), or
        if no file name is given, then standard input will be read. The
        file's name does not have to have a .uef extension, but if it
        does, so must the argument.

CAVEATS
    Some chunks are not yet analysed -- a successful completion of uefwalk
    does not guarantee that a UEF file is error-free.

HISTORY
    v1.53 2024-05-06
        * Data bits in security wave chunks are now interpreted according
          to Draft 28; that is, similarly to other chunks and contrary to
          ITU-T Recommendation V.1.

    v1.52 2023-06-13
        * Bitclean: acquires initial waves correctly under -mn.
        * Kleen README: syntax of example sox commands modernised.

    v1.51 2016-05-20
        * Uefwalk: removed deprecated option from gzip call.
        * Kleen: fixed parsing of .cycles directives.

    v1.50 2008-03-25
        * Security wave chunks (0x0114) have the first cycle in the MSB,
          as per Draft 28.
        * Atom format streams are now represented as "8N-1" in UEF chunk
          &0104, as per Draft 28. The bitstream representation is
          unchanged (CHHCD).

    v1.40 2008-03-02
        * Support for media set chunks, two years after they were added to
          the UEF spec at my request.
        * Added .title, .mediaset, .position and .platform bitstream
          directives to generate metadata chunks.
        * Replaced all mention of 'baud rate' with either 'data encoding
          format' or 'base frequency'. All two byte 'baud rate' chunks
          change the data encoding format; the four byte chunk of v0.10
          sets the base frequency. The latter is preserved but not used in
          UEFwalk.
        * Tolerates standard data blocks with an accidental extra copy of
          the last CRC byte.

    v1.35 2005-06-03
        * Added missing declaration of private array in unpack_float()
        * Corrected the test for parity errors in bitparse
        * Replaced parity indicators in bitparse output with plain bits

    v1.34 2005-02-08
        * Improved combining of half bits in bitparse
        * Disabled printing of baud rate in uefwalk summary

    v1.33 2004-12-05
        * Prints catalogue of tape files in .inf format.

    v1.32 2004-12-04
        * Reduced gain of wave output to 95% linear
        * Fixed splitting of gaps etc. into chunks containing even values,
          for applications that do not understand half bits
        * Numerous documentation revisions

    v1.31 2004-08-25
        * Support for phase records and half stop bits in fixed format
          data

    v1.30b3 2004-08-18
        * Small fix to a regular expression in bitparse

    v1.30b2 2004-08-09
        * Fixed bitclean and bitparse regarding 300 baud operation and
          internal substitute characters

    v1.30b 2004-08-07
        * Support for floating point baud rate and gap chunks
        * Deleted non-standard and noise block entries from summary

    v1.24 2004-08-05
        * Accepts, but does not support, floating point baud rate chunks
          as in v0.10.

    v1.23 2004-07-29
        * Validates top bit of bytes containing 7-bit fixed format data.

    v1.22 2004-07-20
        * Updated documentation and Kleen: the current baud rate does
          apply to Security Wave chunks.

    v1.21 2004-07-07
        * Deleted standard block summary, which cluttered output (and was
          called at incorrect point)

    v1.20 2004-07-05
        * Support for UEF version 0.10 including new definitions of high
          tone and baudwise gap chunks
        * Added file-dump mode

    v1.13 2004-01-12
        * Corrected known version to 0.9; draft 16 has been released and
          names new version 0.10
        * Corrected comment regarding 8-bit colour output: r,g,b

    v1.12 2003-09-27
        * Bitstream file format defined in more detail

    v1.11 2003-08-06
        * Added --output=none option, made default for better behaviour of
          --quiet
        * --dont-* switches renamed --no-*
        * --always-number switch renamed --number-all
        * redundant variable declaration removed

    v1.10 2003-04-12
        * Updated to UEF Specs 0.9 draft 14
        * Ignores junk after first NUL in text chunks

    v1.01 2002-12-03
        * Limited release
          2003-01-05
        * First public release
          2003-03-11
        * Reset counters between input files

    v1.00 2002-11-30 Named Uefwalk
        * First limited release

AUTHOR
    Greg Cook, debounce@yahoo.co.uk
    http://regregex.bbcmicro.net/

ACKNOWLEDGEMENTS
    Many thanks to:

    Thomas Harte, as author and maintainer of ElectrEm and the UEF format,
    and for his kind assistance in clarifying the specifications.

    Fraser Ross, author of MakeUEF and CSW, for contributing the v0.10
    expansions and introducing the square wave approach to cassette
    decoding.

    Thomas Harte, Dave Moore and Fraser Ross, for discussions and test
    material relating to the UEF expansion to cover phase shifts.

    Wouter Scholten, for raising awareness of the unique Atom data format.

REFERENCES
    UEF File Format Specification (version 0.10 / document draft 28,
    revised 10/Jan/2006)
    http://electrem.emuunlim.com/UEFSpecs.htm

    The BBC Micro Mailing List Archive:
    http://nelsonit.net/~jon/BBCMicro/
    To subscribe:
    http://www.nvg.org/bbc/netresource.php3#mail

DISCLAIMER
    The software and documentation included in Uefwalk are supplied
    without warranty, not even the implied warranties of merchantability
    or fitness for a particular purpose. In no event shall the author or
    his suppliers be liable for any loss, damage, injury or death, of any
    nature and howsoever caused, arising from the use of, or failure,
    inability or unwillingness to use, this software or documentation.

-END-