OKsort

 

sorts files with fixed or variable record size in three passes respecting Czech language rules.

Brief description

Syntax

Input and output file names are expected as OKsort command line parameters. Long file names are not supported. The second file name can be omitted; in that case the sorted text goes to the screen. When both file names are omitted, OKsort works as a standard input/output filter.

When the output file already exists, OKsort asks for permission to overwrite it: Výstupní soubor už existuje. Přepsat (A/N) ? ("Output file already exists. Overwrite (Y/N)?"). Press A (=yes) to allow, or use parameter /O (=do not ask).

Parameters may start with slash / or hyphen - and are not case sensitive. Some parameters require a value which may be written as in any of the following examples:

 /PositionOfKey=10  /p10   -pos: 10  -p0x0A    /P0Ah
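
The five spellings above all set the key position to 10. The following Python sketch shows one way such a value could be parsed. It is not OKsort's actual parser: the function name parse_param is hypothetical, and it assumes that only the first letter of a parameter name is significant and that a value is decimal unless it carries a 0x prefix or an h suffix.

```python
import re

# Hypothetical pattern: slash or hyphen, a name made of letters, an optional
# "=" or ":" separator, then a value that starts with a digit.
_PARAM = re.compile(r"^[/-]([A-Za-z]+)\s*[=:]?\s*([0-9][0-9A-Fa-fXxHh]*)$")


def parse_param(arg: str):
    """Return (first letter of the name in upper case, integer value)."""
    match = _PARAM.match(arg.strip())
    if match is None:
        raise ValueError(f"unrecognized parameter: {arg!r}")
    name, raw = match.groups()
    raw = raw.lower()
    if raw.startswith("0x"):      # C-style hexadecimal, e.g. 0x0A
        value = int(raw[2:], 16)
    elif raw.endswith("h"):       # assembler-style hexadecimal, e.g. 0Ah
        value = int(raw[:-1], 16)
    else:                         # plain decimal
        value = int(raw, 10)
    # Assumption: only the first letter identifies the parameter.
    return name[0].upper(), value


for spelling in ["/PositionOfKey=10", "/p10", "-pos: 10", "-p0x0A", "/P0Ah"]:
    print(spelling, "->", parse_param(spelling))
```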

File types

A sorting program changes the order of records in a file. Records have a fixed length, which is set with the parameter /R(ecordSize)=.

OKsort also works with text files with variable record size. In this case the /R= parameter must not be specified. A text record (i.e. a line) is a row of characters terminated with LineFeed (ASCII 10).

Sort key definition

Sorting reorders records in such a way that the sort keys form a monotonic progression. By default the key is identical to the whole record, but it can be made shorter (/SizeOfKey=) and can start at any offset within the record. The default offset is /Position=0. Sort key size is limited to 16 384 characters.

Sort order

The program sorts in ascending order by default; this can be reversed with the /D(escending) parameter.
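
The effect of the key position, key size and descending order can be illustrated with a short Python sketch. The function names extract_key and sort_records are hypothetical, and the sketch compares raw bytes only; it does not apply the Czech collation rules that OKsort uses.

```python
def extract_key(record: bytes, position: int = 0, size=None) -> bytes:
    """Return the sort key: `size` bytes starting at offset `position`.

    With size=None the key runs to the end of the record, which mirrors
    the documented default.
    """
    end = None if size is None else position + size
    return record[position:end]


def sort_records(records, position=0, size=None, descending=False):
    """Sort records by their key, comparing plain byte values."""
    return sorted(
        records,
        key=lambda record: extract_key(record, position, size),
        reverse=descending,
    )


records = [b"002 pear", b"001 zebra", b"003 apple"]
print(sort_records(records, position=4))                  # key is the fruit name
print(sort_records(records, size=3, descending=True))     # key is the number
```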

In the Czech language the digraphs "ch", "Ch" and "CH" are sorted between the letters "H" and "I". This rule can be suppressed with the /C parameter.

OKsort condenses redundant spaces (two or more adjacent) into a single space before collation. This can be suppressed with /V.
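
The two rules above can be sketched in Python as follows. This is a deliberately simplified, single-pass illustration, not OKsort's three-pass algorithm: the function name collation_key is hypothetical, comparison is assumed to be case-insensitive, and accented letters are not handled.

```python
import re


def collation_key(text: str, ch_digraph: bool = True, condense: bool = True):
    """Build a list of weights that places "ch" between "h" and "i"."""
    if condense:
        # Replace every run of two or more spaces with a single space.
        text = re.sub(r" {2,}", " ", text)
    lowered = text.lower()
    weights = []
    i = 0
    while i < len(lowered):
        if ch_digraph and lowered.startswith("ch", i):
            # (ord("h"), 1) sorts after plain "h" and before "i".
            weights.append((ord("h"), 1))
            i += 2
        else:
            weights.append((ord(lowered[i]), 0))
            i += 1
    return weights


words = ["chata", "ihned", "hrad", "cesta"]
print(sorted(words, key=collation_key))
print(sorted(words, key=lambda w: collation_key(w, ch_digraph=False)))
```

With the digraph rule enabled, "chata" lands after "hrad"; with the rule disabled, it is treated as starting with "c" and lands after "cesta".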

Head of file

Some file formats (e.g. dBIII) start with a header which should stay intact. Use parameter /H= to specify the number of bytes which are to be copied to the output untouched.

In the case of text files (/R= omitted), the parameter /Header= specifies how many lines of text are to be copied verbatim.
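
The two meanings of the header size can be sketched in Python like this. The function name split_header is hypothetical; the sketch only shows how the header could be separated from the part that gets sorted.

```python
def split_header(data: bytes, header: int, record_size=None):
    """Split data into (header, body).

    With a record size given, `header` counts bytes; without one
    (text file), `header` counts lines terminated by LineFeed.
    """
    if record_size is not None:
        return data[:header], data[header:]
    pos = 0
    for _ in range(header):
        newline = data.find(b"\n", pos)
        if newline == -1:          # fewer lines than requested
            pos = len(data)
            break
        pos = newline + 1
    return data[:pos], data[pos:]


print(split_header(b"title\n-----\nb\na\n", 2))
print(split_header(b"HDRxxyy", 3, record_size=2))
```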

Tail of file

If the input file size is not an integer multiple of the record size (/R=), the last incomplete record is virtually completed with ASCII (0) bytes.

If the last line of a text file is not terminated with ASCII (10), the missing LineFeed is virtually appended to it.

The EOF marker ASCII (26) is appended to the output only if it was found in the input file. Parameter /Y tells OKsort never to append the EOF marker; parameter /Z tells it always to append one.
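
The tail rules above can be sketched in Python as follows. The function name split_records is hypothetical. The sketch assumes that the EOF marker, when present, is the very last byte of the input, and it makes the padding real instead of "virtual"; whether OKsort writes the padding to the output is not stated here.

```python
EOF_MARK = b"\x1a"   # ASCII 26, the DOS end-of-file marker


def split_records(body: bytes, record_size=None):
    """Return (records, had_eof) for fixed-size or text input."""
    had_eof = body.endswith(EOF_MARK)      # assumption: marker is the last byte
    if had_eof:
        body = body[:-1]
    if record_size is not None:
        records = [body[i:i + record_size]
                   for i in range(0, len(body), record_size)]
        if records and len(records[-1]) < record_size:
            # Complete the last record with ASCII 0 bytes.
            records[-1] = records[-1].ljust(record_size, b"\0")
    else:
        lines = body.split(b"\n")
        if lines[-1] == b"":               # input ended with LineFeed
            lines.pop()
        # Every line gets a LineFeed, including an unterminated last one.
        records = [line + b"\n" for line in lines]
    return records, had_eof


print(split_records(b"abcde", record_size=2))
print(split_records(b"b\na\x1a"))
```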

National character sorting

One of the parameters /ASCII7bit, /ISO8859-2, /Kamenicky, /Latin2 or /WindowsCP1250 tells OKsort which codepage the sorted file uses. If none of these parameters is specified, the codepage is autodetected from the input text.

OKsort can use an external three-pass sort weight table specified with /T=file. The size of such a file must be 3*256=768 bytes. You can use the SortKit utility to create an external table.
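
How a 768-byte table might drive a three-pass comparison can be sketched in Python. The layout is an assumption, not taken from the OKsort or SortKit documentation: the sketch treats the file as three consecutive 256-byte tables, one per pass, each mapping a byte value to its weight. The function name three_pass_key is hypothetical.

```python
def three_pass_key(key: bytes, table: bytes):
    """Translate a key through three assumed 256-byte weight tables.

    Tuples compare element by element, so the second pass only decides
    ties of the first pass, and the third pass ties of the second.
    """
    if len(table) != 3 * 256:
        raise ValueError("weight table must be exactly 768 bytes")
    passes = [table[i * 256:(i + 1) * 256] for i in range(3)]
    return tuple(key.translate(weights) for weights in passes)


# Illustrative table: pass 1 ignores ASCII case, passes 2 and 3 are identity.
identity = bytes(range(256))
table = identity.upper() + identity + identity

words = [b"b", b"a", b"B", b"A"]
print(sorted(words, key=lambda w: three_pass_key(w, table)))
```

In this illustration the first pass groups "A" with "a" and "B" with "b", and the second pass then orders each pair by byte value.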

Diagnostic messages

After the input file is read, OKsort reports some diagnostic information on the standard error output. This information can be suppressed with /N. Example:

Vstup        : *Standard                    input
Výstup       : *Standard                    output
Kódování     : *PC Latin2                   detected codepage
Velikost dat :    507580 B                  file size
Počet záznamů:     10065                    number of records
Spotřeba XMS :       656 KB                 XMS memory allocated
Předtříděno  :       100 %                  pre-sort phase completed
Zatříděno    :       100 %                  merge phase completed
Trvání       :         5 s                  time elapsed

Memory management

The program requires DOS or a compatible emulation, an Intel 386 or higher CPU, 256 KB of conventional memory and some additional memory. The size of the required additional memory can be calculated as the input file size plus 4 bytes per record for fixed-size records or 8 bytes per record for variable-size records. OKsort allocates this additional memory from DOS/VDM resources in the following order:

  1. XMS - extended memory (HIMEM)
  2. EMS - expanded memory (EMM386)
  3. VMS - virtual filesystem memory (RAMDRIVE)
  4. DMS - disk cache memory (SMARTDRV)
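
The memory estimate described above can be written as a short Python calculation. The function name additional_memory is hypothetical, and the sketch implements only the stated formula; the amount OKsort actually allocates may differ, for example through rounding to allocation units.

```python
def additional_memory(file_size: int, records: int, fixed: bool) -> int:
    """Estimated additional memory in bytes.

    Input file size plus 4 bytes per fixed-size record
    or 8 bytes per variable-size record.
    """
    per_record = 4 if fixed else 8
    return file_size + per_record * records


# Figures taken from the diagnostic example: 507580 bytes of text, 10065 lines.
needed = additional_memory(507580, 10065, fixed=False)
print(needed, "bytes, about", round(needed / 1024), "KB")
```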

OKsort limits:

max. record size          268 402 688 bytes
max. sort key length           16 384 characters
max. number of records     13 420 544 (variable size)
max. number of records     44 736 512 (fixed size)

Alphabetical list of parameters

/A codepage is ASCII 7bit
/C do not treat CH as a digraph
/D descending order
/E= limit EMS allocation to the given amount of KB
/H= header size in bytes (/R= specified) or in lines (/R= not specified)
/I codepage is ISO-8859-2
/K codepage is Kamenických CP 895
/L codepage is PC Latin2 CP 852
/M output Czech manual
/N suppress diagnostic information
/O overwrite output file
/P= Position of key within record (default /P=0)
/R= record size
/S= Size of sort key (default is to the end of record)
/T= external sort weight table file
/V do not condense redundant spaces in sort key before compare
/W codepage is Windows CP 1250
/X= limit XMS allocation to the given amount of KB
/Y never append EOF marker at the end of output file
/Z always append EOF marker (^Z)