Data Description, Inc.

SHOE Spec Level 1

This level defines the overall structure of a Shoe file, independent of its meaning, i.e., how objects are composed of properties.


This level makes it possible for a reader to skip over parts it doesn't understand. The Shoe format is designed to have no limits on file size and to be writable in a single pass. This means that complex objects cannot be preceded by their size (the writer doesn't know the size until the object is finished), although simple data blocks can be. So this level must completely specify how complex objects are constructed out of simple ones. The basic structure used is objects with properties.

Simplified summary:

a file consists of a header and a list of objects.
an object consists of a type and a list of properties.
a property consists of a type and the property data, which can be one of five kinds: simple data block, object, list of objects, object reference, or list of references.

Most of this is built using the 'arbitrary size integer'. Since there is no limit on magnitude, it can't occupy a fixed number of bytes.
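The summary above maps naturally onto a small in-memory tree. The sketch below is one possible Python model, not part of the specification; every class and field name is hypothetical. Following the notes at the end of this page, it stores a single object as an object list of length 1 and a single reference as a reference list of length 1, so only three value kinds remain.

```python
# Hypothetical in-memory model of a Level 1 Shoe file (a sketch, not the spec's API).
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class SimpleData:
    """A simple data block: n elements of 2**elsz bits each."""
    data: bytes
    format: int = 0
    elsz: int = 3   # default 3 means 8-bit elements
    n: int = 1


@dataclass
class References:
    """Dictionary entry ids; a single reference is a list of length 1."""
    entry_ids: List[int]


@dataclass
class ShoeObject:
    object_type: int = 0
    entry_id: Optional[int] = None   # dictionary entry id, if the object has one
    properties: List["Property"] = field(default_factory=list)


@dataclass
class Property:
    property_id: int
    # A single object property is stored as an object list of length 1.
    value: Union[SimpleData, List[ShoeObject], References]


@dataclass
class ShoeFile:
    level1_version: int = 0
    format_family: int = 0
    format_variations: List[tuple] = field(default_factory=list)  # (id, value) pairs
    objects: List[ShoeObject] = field(default_factory=list)
```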


file

  • 4 bytes = 'Shoe' (hex: 53 68 6F 65) = magic format identifier
  • 1 byte - file flags
    • bit 0 = have level 1 version
    • bit 1 = have format family
    • bit 2 = have format variation list
    • bit 3 = have checksum
    • bits 4-7 clr
  • if bit 0 of file flags set then
    • arbitrary size integer = level 1 version

    else level 1 version = 0
    [if level 1 version is not 0, then none of the rest of this specification applies]

  • if bit 1 of file flags set then
    • arbitrary size integer = format family

    else format family = 0

  • if bit 2 of file flags set then
    • list of format variations
    • byte = 0x03 = termination of variations list
  • list of objects
  • byte = 0x01 = termination of objects list
  • if bit 3 of file flags set then
    • arbitrary size integer = checksum value
      (the checksum covers the bytes after the format variation list, up to but not including the checksum value)
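As a minimal sketch of the first step of reading a file, the Python function below checks the magic identifier and decodes the file flags byte. The function name and the returned dictionary keys are hypothetical, and rejecting a file whose bits 4-7 are set is a strictness choice of this sketch. Reading the optional fields themselves needs an arbitrary size integer reader, so it is left out here.

```python
# Hypothetical helper (a sketch): validate the magic and decode the file flags byte.
import io

SHOE_MAGIC = b"Shoe"  # hex 53 68 6F 65


def read_header_flags(stream):
    """Check the magic identifier, then report which optional header fields follow."""
    magic = stream.read(4)
    if magic != SHOE_MAGIC:
        raise ValueError("not a Shoe file: magic is %r" % magic)
    raw = stream.read(1)
    if len(raw) != 1:
        raise EOFError("file ends before the file flags byte")
    flags = raw[0]
    if flags & 0xF0:  # strictness choice: bits 4-7 must be clear
        raise ValueError("file flags bits 4-7 must be clear")
    return {
        "has_level1_version": bool(flags & 0x01),     # bit 0
        "has_format_family": bool(flags & 0x02),      # bit 1
        "has_format_variations": bool(flags & 0x04),  # bit 2
        "has_checksum": bool(flags & 0x08),           # bit 3
    }


# Usage: flags 0x09 = bits 0 and 3, so a version and a checksum are present.
print(read_header_flags(io.BytesIO(b"Shoe\x09")))
```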

object

  • byte = object flags
    • bit 0 clr to indicate not terminator
    • bit 1 set to indicate object
    • bit 2 = have object type
    • bit 3 = have dictionary label
    • bits 4-7 = nestingcheck (number of enclosing objects mod 16)
      • bits 1 and 4-7 are just a check; their correct value is determined by the position in the file
  • if bit 2 of object flags set then
    • arbitrary size integer = object type

    else object type = 0

  • if bit 3 of object flags set then
    • arbitrary size integer = dictionary entry id (cannot be nan)
  • list of properties
  • byte = (0x07 + (nestingcheck << 4)) = termination of property list
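The object flags and property list terminator bytes can be computed as in this Python sketch; the function names are hypothetical. It treats bit 2 as announcing the object type, as the rule for reading the object type states, and `depth` is the number of enclosing objects (0 for a top-level object). The check function shows how a reader that tracks its depth can use bits 1 and 4-7 to notice it is no longer reading a valid object.

```python
# Hypothetical helpers (a sketch): build and check object flags and terminators.

def object_flags(has_type, has_entry_id, depth):
    """Build an object flags byte; depth = number of enclosing objects."""
    flags = 0x02                   # bit 0 clear (not a terminator), bit 1 set (object)
    if has_type:
        flags |= 0x04              # bit 2: object type follows
    if has_entry_id:
        flags |= 0x08              # bit 3: dictionary entry id follows
    return flags | ((depth % 16) << 4)   # bits 4-7: nesting check


def property_list_terminator(depth):
    """Byte that ends the property list of an object at this depth."""
    return 0x07 + ((depth % 16) << 4)


def check_object_flags(flags, depth):
    """Validate the check bits against the position; return (has_type, has_entry_id)."""
    if flags & 0x01 or not flags & 0x02:
        raise ValueError("0x%02X is not an object flags byte" % flags)
    if flags >> 4 != depth % 16:
        raise ValueError("nesting check mismatch: lost in the stream?")
    return bool(flags & 0x04), bool(flags & 0x08)
```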

property

  • byte = property flags
    • bit 0 clr to indicate not terminator
    • bit 1 clr to indicate property
      • this is just a check; the correct value is determined by the position in the file
    • bit 2 = have property id
    • bits 3-7 = property kind and optional fields, described below
  • if bit 2 of property flags set then
    • arbitrary size integer = property id

    else property id = 0

  • if bit 3 of property flags clr, then simple data block:
    • if bit 4 set
      • arbitrary size integer = format
        (else format = 0)
    • if bit 5 set
      • arbitrary size integer = elsz, the log2 of the element size in bits (cannot be nan or inf) (else elsz = 3)
    • if bit 6 set
      • arbitrary size integer = n, the number of elements (cannot be nan or inf) (else n = 1)
    • if bit 7 set
      • arbitrary size integer = padding size (cannot be nan or inf), followed by that many bytes of padding; padding must be zero.
    • followed by the data:
      • length in bits: nbits = n * (2 ^ elsz)
      • length in bytes = (nbits + 7) DIV 8
      • if the element size is less than a byte, lower-order bits are used first.
      • if nbits is not a multiple of 8, padding is added to fill out the high-order bits of the last byte (padding bits must be zero).

    else (bit 3 set)

    • if (bit 4 clr) (bit 5 clr) (bit 6 clr) (bit 7 clr)
      • object

      else if (bit 4 clr) (bit 5 set) (bit 6 clr) (bit 7 clr)

      • object list: list of objects, terminated by 0x2D

      else if (bit 4 set) (bit 5 clr) (bit 6 clr) (bit 7 clr)

      • object reference: arbitrary size integer = dictionary entry id (cannot be nan)

      else if (bit 4 set) (bit 5 set) (bit 6 clr) (bit 7 clr)

      • object reference list: list of arbitrary size integers = dictionary entry ids (cannot be nan), terminated by the arbitrary size integer nan (0xFF)
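A reader must tell the five property kinds apart and, for a simple data block, know how many bytes to read or skip. The Python sketch below covers both; the function names are hypothetical. Unpacking is limited to elements of at most 8 bits, because this level only fixes the bit order for elements smaller than a byte; how wider elements are laid out is assumed here to be a Level 2 matter. Rejecting non-zero padding bits is a strictness choice of the sketch.

```python
# Hypothetical helpers (a sketch): classify property flags and unpack small elements.

def property_kind(flags):
    """Classify a property flags byte into one of the five property kinds."""
    if flags & 0x01 or flags & 0x02:
        raise ValueError("0x%02X is not a property flags byte" % flags)
    if not flags & 0x08:
        return "simple data block"          # bits 4-7 then flag optional fields
    kinds = {0x00: "object", 0x20: "object list",
             0x10: "object reference", 0x30: "object reference list"}
    if flags & 0xF0 not in kinds:
        raise ValueError("undefined property kind bits in 0x%02X" % flags)
    return kinds[flags & 0xF0]


def data_length_bytes(n, elsz):
    """Bytes occupied by n elements of 2**elsz bits, rounded up to whole bytes."""
    nbits = n * (1 << elsz)
    return (nbits + 7) // 8


def unpack_small_elements(data, n, elsz):
    """Split a block of elements of at most 8 bits, lower-order bits first."""
    if elsz > 3:
        raise ValueError("layout of elements wider than 8 bits is not set here")
    if len(data) != data_length_bytes(n, elsz):
        raise ValueError("block length does not match n and elsz")
    width = 1 << elsz
    bits = int.from_bytes(data, "little")   # byte 0 holds the first elements
    if bits >> (n * width):
        raise ValueError("padding bits must be zero")
    mask = (1 << width) - 1
    return [(bits >> (i * width)) & mask for i in range(n)]
```

For example, three 2-bit elements (elsz = 1) packed into the single byte 0x1B unpack as 3, 2, 1, with the top two bits as zero padding.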

format variation

  • byte = format variation flags
    • bit 0 clr to indicate not terminator
    • bit 1 = have format variation id
    • bit 2 = have format variation value
    • bits 3-7 clr
  • if bit 1 of variation flags set then
    • arbitrary size integer = format variation id (else format variation id = 0)
  • if bit 2 of variation flags set then
    • arbitrary size integer = format variation value (else format variation value = 1)
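The format variation list can be read with a loop like the Python sketch below; the function name is hypothetical. To keep the example independent of any integer decoder, the caller passes in `read_asi`, an assumed function that decodes one arbitrary size integer from the stream. The defaults (id 0, value 1) come straight from the rules above.

```python
# Hypothetical helper (a sketch): read the format variation list.
import io

VARIATIONS_END = 0x03   # terminates the format variation list


def read_format_variations(stream, read_asi):
    """Read (id, value) pairs up to the 0x03 terminator.

    read_asi(stream) is a caller-supplied (hypothetical) function that decodes
    one arbitrary size integer.
    """
    variations = []
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("format variation list is not terminated")
        flags = raw[0]
        if flags == VARIATIONS_END:
            return variations
        if flags & 0x01 or flags & 0xF8:     # bit 0 and bits 3-7 must be clear
            raise ValueError("bad format variation flags 0x%02X" % flags)
        var_id = read_asi(stream) if flags & 0x02 else 0   # default id 0
        value = read_asi(stream) if flags & 0x04 else 1    # default value 1
        variations.append((var_id, value))
```

For a quick test, a stand-in reader that only handles small constants (a single byte, 0 to 239) is enough, e.g. `lambda s: s.read(1)[0]`.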

arbitrary size integer

  • first byte
    • 0x00 - 0xEF small constant (0 to 239)
    • 0xF0 - 0xF7 less small constant (0 to 2047) followed by signed byte value = (byte2 * 8) + (byte1 - 0xF0)
    • 0xF8 - 0xFC size specification (0xF8 = 16, 0xF9 = 32, 0xFA = 64, 0xFB = 128, 0xFC = 256), followed by signed data of that many bits (least significant byte first)
    • 0xFD recursive, followed by log2 of the size in bits as an arbitrary size integer (error if negative), followed by signed data of that many bits (least significant byte first). (if the size is < 8 bits, the data is padded with zeros to 8 bits; this is legal, but useless)
    • 0xFE inf (infinity)
    • 0xFF nan (not a number)

    When reading an arbitrary size integer, if the value is too large to represent, act as if inf had been read.
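The encoding above can be sketched in Python as a decoder and a matching encoder that picks the shortest form it knows; the function names are hypothetical. Two assumptions are flagged. First, the spec gives the 0xF0-0xF7 range as 0 to 2047 but also calls the second byte signed, so the sketch defaults to the stated range (unsigned second byte) and exposes a switch, `LESS_SMALL_SIGNED`. Second, inf and nan are returned as Python float infinity and NaN. Because Python integers are unbounded, the "too large to represent" rule never triggers here, though a real reader should cap the 0xFD size before allocating.

```python
# Hypothetical encoder/decoder for arbitrary size integers (a sketch, not the spec's code).
import io
import math

# ASSUMPTION: the spec lists 0xF0-0xF7 as "0 to 2047" but also calls the second
# byte "signed". False follows the stated range; True reads the byte as signed
# (giving -1024 to 1023 instead).
LESS_SMALL_SIGNED = False

INF = math.inf
NAN = math.nan

# First bytes 0xF8-0xFC announce a signed value of this many bits, LSB first.
FIXED_SIZES = {0xF8: 16, 0xF9: 32, 0xFA: 64, 0xFB: 128, 0xFC: 256}


def _read_exact(stream, count):
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("truncated arbitrary size integer")
    return data


def read_asi(stream):
    """Decode one arbitrary size integer (or INF/NAN) from a binary stream."""
    first = _read_exact(stream, 1)[0]
    if first <= 0xEF:                               # small constant, 0..239
        return first
    if first <= 0xF7:                               # less small constant
        byte2 = int.from_bytes(_read_exact(stream, 1), "little",
                               signed=LESS_SMALL_SIGNED)
        return byte2 * 8 + (first - 0xF0)
    if first in FIXED_SIZES:
        nbytes = FIXED_SIZES[first] // 8
        return int.from_bytes(_read_exact(stream, nbytes), "little", signed=True)
    if first == 0xFD:                               # recursive size
        log2_size = read_asi(stream)
        if not isinstance(log2_size, int) or log2_size < 0:
            raise ValueError("0xFD needs a non-negative log2 size")
        size = 1 << log2_size                       # size in bits
        raw = int.from_bytes(_read_exact(stream, max(1, size // 8)), "little")
        raw &= (1 << size) - 1                      # drop zero padding below 8 bits
        # Reinterpret the low `size` bits as two's complement.
        return raw - (1 << size) if (raw >> (size - 1)) & 1 else raw
    return INF if first == 0xFE else NAN


def write_asi(value):
    """Encode an int, INF or NAN using the shortest form this sketch knows."""
    if isinstance(value, float):
        if math.isnan(value):
            return b"\xFF"
        if value == INF:
            return b"\xFE"
        raise ValueError("only INF and NAN can be encoded as floats")
    if 0 <= value <= 0xEF:
        return bytes([value])
    lo, hi = (-1024, 1023) if LESS_SMALL_SIGNED else (0, 2047)
    if lo <= value <= hi:
        # (value >> 3) & 0xFF is the two's-complement byte when signed.
        return bytes([0xF0 + (value & 7), (value >> 3) & 0xFF])
    for first, bits in FIXED_SIZES.items():
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bytes([first]) + value.to_bytes(bits // 8, "little", signed=True)
    log2_size = 9                                   # wider than 256 bits: use 0xFD
    while not -(1 << ((1 << log2_size) - 1)) <= value < (1 << ((1 << log2_size) - 1)):
        log2_size += 1
    size = 1 << log2_size
    return b"\xFD" + write_asi(log2_size) + value.to_bytes(size // 8, "little", signed=True)
```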


notes:
  • Some implementations of a Shoe reader need memory proportional to the maximum dictionary entry id. So when writing a Shoe file, it's best to use ids from 0 to n-1, where n is the number of ids used.
  • Properties can be written in any order that's convenient to the writer, and the reader must be able to read them in any order. (7/31/97 exception: if present, the arg1, arg2, arg3, etc. properties must be written in that order. For now, this ensures that the reader knows a variable's relation before it has to read the variable's value.)
  • A property id cannot be used more than once in each object. But if a reader reads a file in which a property id is defined twice, it must not crash or produce a corrupted file. Aside from this constraint, the results are undefined; the reader is not required to detect or report the error.
  • Two objects cannot have the same dictionary entry id. But if a reader reads a file in which a dictionary entry id is defined twice, it must not crash or produce a corrupted file. Aside from this constraint, the results are undefined; the reader is not required to detect or report the error.
  • An object property should be treated as equivalent to an object list property of length 1.
  • An object reference property should be treated as equivalent to an object reference list property of length 1.
  • The arbitrary size integer format starting with 0xFD is for completeness. Integers of 2^256 (about 1.2e77) or more are unlikely to be needed, but when we say Shoe has no limits on size, we mean it. 0xFD can be used with a size < 256 bits, but it is better to use one of 0xF8 through 0xFC instead.
  • Data Desk never writes the optional padding for simple data blocks (bit 7 of property flags), and isn't significantly affected by data alignment when reading. This option is for the convenience of other applications that may care. A Shoe writer can use whatever alignment is best for it; a Shoe reader must not require any alignment.
  • A serious issue with this interchange format is recovering damaged files: if the reader gets lost in the stream, it cannot read the rest of the file. An advantage of this format, however, is that each object is stored in a contiguous block, so if the reader can find an object's beginning, it can recover the object. One way to salvage a file is therefore to try reading an object at each file offset: if a valid object is read, save it; otherwise, try the next offset. To make this work well, the Shoe format has been defined to help detect when the reader is not reading a valid object, for example by making the list terminator different for different lists. (The Shoe format is less prone to corruption problems than Data Desk files, because a Shoe file is always written completely from scratch rather than by simply writing out the current contents of a data structure that has been continually modified.)
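The salvage strategy described in the last note is a simple scan. A minimal Python sketch follows; `salvage_objects` and the `try_read_object` parser it calls are hypothetical. Resuming after each recovered object, rather than at the next byte, is a design choice of this sketch: it avoids reporting nested objects a second time, at the risk that a false match hides a real object that starts inside it.

```python
# Hypothetical salvage scan (a sketch): try to parse an object at every offset.

def salvage_objects(data, try_read_object):
    """Try to read an object at every offset; keep the ones that parse.

    try_read_object(data, offset) is a caller-supplied (hypothetical) parser:
    it returns (obj, end_offset) for a valid object starting at offset, or None.
    """
    recovered = []
    offset = 0
    while offset < len(data):
        result = try_read_object(data, offset)
        if result is None:
            offset += 1                        # not an object here: try the next byte
        else:
            obj, end = result
            recovered.append((offset, obj))
            offset = max(end, offset + 1)      # resume after the recovered object
    return recovered
```

The stricter `try_read_object` is (checking terminators and nesting bits), the fewer false matches the scan produces.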
 

The Shoe specification consists of three levels:

level 0 - This level defines the storage and/or transmission of the stream of bytes upon which Shoe is based, e.g., file types, clipboard types.

level 1 - This level defines the overall structure of a Shoe file, independent of its meaning, i.e., how objects are composed of properties.

level 2 - This level defines the interpretation of the Shoe file, e.g., object types, property types, data types.