SHOE Spec Level 1
This level defines the overall structure of a Shoe file, independent
of its meaning, that is, how objects are composed of properties.
This level makes it possible for a reader to skip over
parts it doesn't understand. The Shoe format is designed to have no
limits on file size and to be writable in a single pass. This means
that complex objects cannot be preceded by their size, although simple data
blocks can be. So this level must completely specify
how complex objects are constructed out of simple ones. The basic structure
used is objects with properties.
Simplified summary:
- A file consists of a header and a list of objects.
- An object consists of a type and a list of properties.
- A property consists of a type and the property data, which can be
one of 5 kinds: simple data block, object, list of objects,
object reference, or list of references.
Most of this is built using the 'arbitrary size integer'.
Since there is no limit on magnitude, it can't occupy a fixed number of
bytes.
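As a rough illustration of the summary above, the sketch below models Level 1 in memory with Python dataclasses. The class names (`ShoeObject`, `Property`, `SimpleData`, `ObjectList`, `RefList`) and the `normalized` helper are hypothetical, not part of the spec. A lone object reference is represented as a plain `int` dictionary entry id, and the file header is left out because its contents are not described in this section. `normalized` applies the equivalences given in the notes: a single object behaves like an object list of length 1, and a single reference like a reference list of length 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class SimpleData:
    """A simple data block; defaults follow the property flags description."""
    data: bytes
    format: int = 0
    elsz: int = 3  # each element is 2**elsz bits, so the default is one byte
    n: int = 1     # number of elements


@dataclass
class ObjectList:
    objects: List[ShoeObject]


@dataclass
class RefList:
    ids: List[int]  # dictionary entry ids


@dataclass
class ShoeObject:
    obj_type: int = 0              # object type (0 when absent)
    dict_id: Optional[int] = None  # dictionary label, if any
    properties: List[Property] = field(default_factory=list)


# The five kinds of property data; a lone int stands for an object reference.
PropertyValue = Union[SimpleData, ShoeObject, ObjectList, int, RefList]


@dataclass
class Property:
    prop_id: int  # property id (0 when absent)
    value: PropertyValue


def normalized(value: PropertyValue) -> PropertyValue:
    """Rewrite a single object or reference as a list of length 1."""
    if isinstance(value, ShoeObject):
        return ObjectList([value])
    if isinstance(value, int):
        return RefList([value])
    return value
```

Normalizing on read lets the rest of a reader handle only three shapes of property data instead of five.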
file
object
- byte = object flags
  - bit 0 clr to indicate not terminator
  - bit 1 set to indicate object
  - bit 2 = have object type
  - bit 3 = have dictionary label
  - bits 4-7 = nestingcheck (# of enclosing objects mod 16)
  - bits 1 and 4-7 are just a check; their correct value is determined by the position in the file
- if bit 2 of object flags set then
  - arbitrary size integer = object type
  else object type = 0
- if bit 3 of object flags set then
  - arbitrary size integer = dictionary entry id (cannot be nan)
- list of properties
- byte = (0x07 + (nestingcheck << 4)) = termination of property list
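The object flags byte and the property list terminator can be computed directly from the bit layout above. This Python sketch (function names are hypothetical) builds and checks a flags byte for a given nesting depth, that is, the number of enclosing objects. Bit 2 is treated as the object-type flag, since the object type is what is read when bit 2 is set. Rejecting a byte whose nesting check does not match is a choice made here; it is the kind of check that helps a reader notice it has lost its place.

```python
from dataclasses import dataclass


@dataclass
class ObjectFlags:
    has_type: bool      # bit 2: an object type integer follows
    has_dict_id: bool   # bit 3: a dictionary entry id follows
    nesting_check: int  # bits 4-7: enclosing objects mod 16


def encode_object_flags(depth: int, has_type: bool, has_dict_id: bool) -> int:
    """Bit 0 clear (not a terminator), bit 1 set (object), nesting check in bits 4-7."""
    return (0x02
            | (0x04 if has_type else 0)
            | (0x08 if has_dict_id else 0)
            | ((depth % 16) << 4))


def decode_object_flags(b: int, depth: int) -> ObjectFlags:
    if b & 0x01:
        raise ValueError("bit 0 set: this is a terminator, not an object")
    if not b & 0x02:
        raise ValueError("bit 1 clear: this is not an object")
    check = b >> 4
    if check != depth % 16:
        raise ValueError(f"nesting check {check} does not match depth {depth}")
    return ObjectFlags(bool(b & 0x04), bool(b & 0x08), check)


def property_list_terminator(depth: int) -> int:
    """Byte that ends an object's property list: 0x07 + (nestingcheck << 4)."""
    return 0x07 + ((depth % 16) << 4)
```

For example, a top-level object with a type and no dictionary label starts with 0x06 and its property list ends with 0x07, while an object nested 17 levels deep ends its property list with 0x17.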
property
- byte = property flags
  - bit 0 clr to indicate not terminator
  - bit 1 clr to indicate property
    - this is just a check; the correct value is determined by the position in the file
  - bit 2 = have property id
  - bits 3-7 are used for other info
- if bit 2 of property flags set then
  - arbitrary size integer = property id
  else property id = 0
- if bit 3 of property flags clr then (simple data block):
  - if bit 4 set
    - arbitrary size integer = format
    (else format = 0)
  - if bit 5 set
    - arbitrary size integer = elsz (cannot be nan or inf)
    (else elsz = 3)
  - if bit 6 set
    - arbitrary size integer = n (cannot be nan or inf)
    (else n = 1)
  - if bit 7 set
    - arbitrary size integer = padding size (cannot be nan or inf), followed by that many bytes of padding. Padding must be zero.
  - followed by the data, whose length is:
    - nbits = n * (2 ^ elsz); length in bytes = (nbits + 7) DIV 8
    - if the element size is less than a byte, the lower-order bits are used first
    - if nbits is not a multiple of 8, padding is added to fill out the high-order bits of the last byte (padding bits must be zero)
  else (bit 3 set)
  - if (bit 4 clr) (bit 5 clr) (bit 6 clr) (bit 7 clr)
    - object
  else if (bit 4 clr) (bit 5 set) (bit 6 clr) (bit 7 clr)
    - object list: list of objects, terminated by 0x2D
  else if (bit 4 set) (bit 5 clr) (bit 6 clr) (bit 7 clr)
    - object reference: arbitrary size integer = dictionary entry id (cannot be nan)
  else if (bit 4 set) (bit 5 set) (bit 6 clr) (bit 7 clr)
    - object reference list: list of arbitrary size integers = entry ids (cannot be nan), terminated by arbitrary size integer nan (0xFF)
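To make the property flags concrete, here is a Python sketch (all names hypothetical) that classifies a property flags byte into one of the five kinds, lists which optional simple-data fields follow, computes the data length from nbits = n * 2^elsz, and unpacks sub-byte elements lower-order bits first. It treats the all-clear pattern of bits 4-7 under bit 3 as a single object, the one kind from the summary not otherwise accounted for. It does not read any arbitrary size integers, since their encoding is not given in this section.

```python
from __future__ import annotations

KINDS = {  # bits 4-7 when bit 3 is set
    0x00: "object",
    0x20: "object list",     # bit 5
    0x10: "reference",       # bit 4
    0x30: "reference list",  # bits 4 and 5
}
OPTIONAL_FIELDS = [(0x10, "format"), (0x20, "elsz"), (0x40, "n"), (0x80, "padding")]


def property_kind(flags: int) -> str:
    if flags & 0x01:
        raise ValueError("bit 0 set: this is a terminator, not a property")
    if flags & 0x02:
        raise ValueError("bit 1 set: this is an object, not a property")
    if not flags & 0x08:
        return "simple"
    try:
        return KINDS[flags & 0xF0]
    except KeyError:
        raise ValueError(f"undefined kind bits {flags & 0xF0:#04x}") from None


def simple_data_fields(flags: int) -> list[str]:
    """Optional integers that follow, in order, for a simple data block."""
    return [name for bit, name in OPTIONAL_FIELDS if flags & bit]


def data_length_bytes(n: int = 1, elsz: int = 3) -> int:
    nbits = n * (1 << elsz)  # 2 ** elsz bits per element
    return (nbits + 7) // 8


def unpack_small_elements(data: bytes, n: int, elsz: int) -> list[int]:
    """Elements narrower than a byte, lower-order bits first."""
    width = 1 << elsz
    if width >= 8:
        raise ValueError("this sketch only unpacks sub-byte elements")
    mask = (1 << width) - 1
    bits = int.from_bytes(data, "little")  # byte 0 holds the first elements
    return [(bits >> (i * width)) & mask for i in range(n)]
```

With the defaults elsz = 3 and n = 1 the data is a single byte, while 9 one-bit elements (elsz = 0) take 2 bytes, the high 7 bits of the second byte being zero padding.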
format variation
- byte = format variation flags
  - bit 0 clr to indicate not terminator
  - bit 1 = have format variation id
  - bit 2 = have format variation value
  - bits 3-7 clr
- if bit 1 of variation flags set then
  - arbitrary size integer = format variation id
  (else format variation id = 0)
- if bit 2 set then
  - arbitrary size integer = format variation value
  (else format variation value = 1)
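A single format variation record can be decoded as below. This is a sketch: `read_int` is a hypothetical callable standing in for an arbitrary size integer reader, since that encoding is not given in this section, and the defaults (id 0, value 1) come from the description above. Where format variations appear in a file is not stated here, so the sketch handles just one record.

```python
from typing import Callable, Tuple


def read_format_variation(flags: int, read_int: Callable[[], int]) -> Tuple[int, int]:
    """Return (variation id, variation value) for one format variation record."""
    if flags & 0x01:
        raise ValueError("bit 0 set: this is a terminator")
    if flags & 0xF8:
        raise ValueError("bits 3-7 must be clear")
    var_id = read_int() if flags & 0x02 else 0  # bit 1: id present
    value = read_int() if flags & 0x04 else 1   # bit 2: value present
    return var_id, value


# Example: both fields present, read from a stand-in sequence of integers.
sample = iter([9, 4])
print(read_format_variation(0x06, lambda: next(sample)))  # (9, 4)
```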
arbitrary size integer
notes:
- Some implementations of a Shoe reader need memory proportional to
the maximum dictionary entry id. So when writing a Shoe file, it's best
to use ids from 0 to n-1, where n is the number of ids used.
- Properties can be written in any order that's convenient for the
writer; the reader must be able to read them in any order. (7/31/97
exception: if present, the arg1, arg2, arg3, etc. properties must be
written in that order. For now, this is used to ensure that the reader
knows a variable's relation before it has to read the variable's value.)
- A property id cannot be used more than once for each object. But if
the reader reads a file in which a property id is defined twice, it
must not crash or produce a corrupted file; aside from this constraint,
the results are undefined. The reader is not required to detect or
report the error.
- Two objects cannot have the same dictionary entry id. But if the
reader reads a file in which a dictionary entry id is defined twice, it
must not crash or produce a corrupted file; aside from this constraint,
the results are undefined. The reader is not required to detect or
report the error.
- An object property should be treated as equivalent to an object list
property of length 1.
- An object reference property should be treated as equivalent to an
object reference list property of length 1.
- The arbitrary size integer format starting with 0xFD is there for
completeness. Integers >= 2^256 (about 1.2e77) are unlikely to be
needed, but when we say Shoe has no limits on size, we mean it. 0xFD
can be used for a size < 256 bits, but it is better to use one of 0xF8
through 0xFC instead.
- Data Desk never writes any optional padding for simple data blocks
(bit 7 of property flags), and isn't significantly affected by data
alignment when reading. This option is for the convenience of other
applications that may care. A Shoe writer can use whatever alignment is
best for it; a Shoe reader must not require any alignment.
- A serious issue with this interchange format is recovering damaged
files, because if the reader gets lost in the stream, it cannot read the
rest of the file. But an advantage of this format is that each object
is stored in a contiguous block, so if the reader can find the beginning
of an object, it can recover that object. So one way to salvage a file
is to try reading an object at each file offset: if a valid object is
read, save it; otherwise try the next offset. To make this work well,
the Shoe format has been defined to help detect when the reader is not
reading a valid object, for example by making the list terminator
different for different lists. (The Shoe format is less prone to
corruption problems than Data Desk files, because a Shoe file is always
written completely from scratch, rather than by simply writing out the
current contents of a data structure that has been continually modified.)
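The salvage strategy described in the last note can be sketched as a scan loop. Here `read_object` is a hypothetical callable that parses one complete object starting at an offset and returns the offset just past it, raising `ValueError` if the bytes there do not form a valid object. A real one needs the arbitrary size integer encoding, which this section does not give, so a toy reader that only accepts empty objects is used for the demonstration. The cheap pre-check on bits 0 and 1 of the flags byte comes from the object layout; the nesting-check bits are not tested because a recovered object may have been nested at an unknown depth. After a successful read, this sketch resumes just past the recovered object rather than at the next byte, a choice made here so that an object's contents are not re-read as separate objects.

```python
from typing import Callable, List, Tuple

# Hypothetical parser: returns the end offset of a valid object, else raises ValueError.
ReadObject = Callable[[bytes, int], int]


def could_start_object(b: int) -> bool:
    # Bit 0 clear (not a terminator) and bit 1 set (object).
    return (b & 0x03) == 0x02


def salvage(data: bytes, read_object: ReadObject) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every object that could be read."""
    found: List[Tuple[int, int]] = []
    offset = 0
    while offset < len(data):
        if could_start_object(data[offset]):
            try:
                end = read_object(data, offset)
            except ValueError:
                end = offset  # not a valid object here
            if end > offset:
                found.append((offset, end))
                offset = end  # skip past the recovered object
                continue
        offset += 1
    return found


def toy_read_object(data: bytes, offset: int) -> int:
    """Toy stand-in: accepts only objects with no type, no label, no properties."""
    flags = data[offset]
    if flags & 0x0C or offset + 1 >= len(data):
        raise ValueError("toy reader only handles empty objects")
    if data[offset + 1] != 0x07 + (flags & 0xF0):
        raise ValueError("wrong property list terminator")
    return offset + 2


damaged = bytes([0xAA, 0x02, 0x07, 0x55, 0x12, 0x17])
print(salvage(damaged, toy_read_object))  # [(1, 3), (4, 6)]
```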

The Shoe specification consists of three levels:
level 0 - This level defines the storage and/or transmission of the stream
of bytes upon which Shoe is based, e.g. file types and clipboard types.
level 1 - This level defines the overall structure of a Shoe file,
independent of its meaning, that is, how objects are composed of properties.
level 2 - This level defines the interpretation of the Shoe file, that is,
object types, property types, and data types.