Data File Format


ORCA data files consist of two main parts:


  1. a header

  2. the raw data


Starting with data version 2, the header was changed to XML and the data format was changed. See the Old Format section for documentation of the version 1 header and data formats.


Header


The header is written in XML and describes the configuration of the system at the time the data file was written. It also contains the information needed to decode the data records. A partial example of a header is shown below. The easiest way to learn what an ORCA header includes is to create one and open it in an editor. The header is most easily parsed using the Cocoa objects that created it, but it is relatively easy to write a parser of your own design.


Starting with ORCA v6.0, the header is preceded by two long words:


0000 0000 0000 00xx xxxx xxxx xxxx xxxx   1st word of the record

^^^^ ^^^^ ^^^^ ^^-----------------------  always set to all zeros.

                 ^^ ^^^^ ^^^^ ^^^^ ^^^^-  total length n of record (in longs)

xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx   2nd word of the record, length of header (bytes)
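As a minimal sketch, the two words can be decoded in Python as shown here. The function name read_header_prefix is hypothetical, and the byte order is an assumption (little-endian by default), since this page does not state it; a reader may need to try both.

```python
import struct

LENGTH_BITS = 18                      # low 18 bits hold the length in longs
LENGTH_MASK = (1 << LENGTH_BITS) - 1  # 0x3FFFF


def read_header_prefix(buf, byte_order="<"):
    """Decode the two long words that precede the XML header.

    byte_order is an assumption: "<" (little-endian) or ">" (big-endian).
    Returns (record_length_in_longs, header_length_in_bytes).
    """
    first, second = struct.unpack_from(byte_order + "II", buf, 0)
    if first >> LENGTH_BITS:
        # The top 14 bits are documented as always zero.
        raise ValueError("unexpected bits set in first word; wrong byte order?")
    return first & LENGTH_MASK, second
```

Because the top 14 bits must be zero, a failed check is a cheap hint that the byte order guess was wrong.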


The header proper follows immediately:


<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">

<plist version="1.0">

<dict>

<key>Document Info</key>

<dict>

<key>OrcaVersion</key>

<string>6.5.3h</string>

<key>dataVersion</key>

<integer>3</integer>

<key>date</key>

<string>2007-12-18 11:03:38 -0800</string>

<key>documentName</key>

<string>/Users/markhowe/Desktop/Untitled.Orca</string>

</dict>

<key>ObjectInfo</key>

<dict>

<key>Crates</key>

<array>

<dict>

<key>Cards</key>

<array>

<dict>

<key>Card</key>

<integer>14</integer>

<key>Class Name</key>

<string>ORJADCLModel</string>

</dict>

<dict>

<key>Card</key>

<integer>24</integer>

<key>Class Name</key>

<string>ORC111CModel</string>

</dict>

</array>

<key>ClassName</key>

<string>ORCamacCrateModel</string>

<key>CrateNumber</key>

<integer>0</integer>

<key>FirstSlot</key>

<integer>1</integer>

</dict>


...

data for other hardware omitted

...

</dict>

    <key>dataDescription</key>

    <dict>

        <key>ORRunModel</key>

        <dict>

             <key>Run</key>

             <dict>

                 <key>dataId</key>

                 <integer>20185088</integer>

                 <key>decoder</key>

                 <string>ORRunDecoderForRun</string>

                 <key>length</key>

                 <integer>4</integer>

                 <key>variable</key>

                 <false/>

             </dict>

        </dict>

        <key>ORShaperModel</key>

        <dict>

             <key>Scaler</key>

             <dict>

                 <key>dataId</key>

                 <integer>18612224</integer>

                 <key>decoder</key>

                 <string>ORShaperDecoderForScalers</string>

                 <key>length</key>

                 <integer>-1</integer>

                 <key>variable</key>

                 <true/>

             </dict>

             <key>Shaper</key>

             <dict>

                 <key>dataId</key>

                 <integer>-2147483648</integer>

                 <key>decoder</key>

                 <string>ORShaperDecoderForShaper</string>

                 <key>length</key>

                 <integer>1</integer>

                 <key>variable</key>

                 <false/>

             </dict>

        </dict>

...

other data description entries omitted

...

</plist>


Note that the header is padded with zeros out to the next long word boundary.
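The dataDescription section is what an external program needs in order to recognize records. The following Python sketch, which is only one possible approach and not the Cocoa code ORCA itself uses, parses the header with the standard plistlib module and builds a lookup table keyed by dataId. The function name data_id_table and the layout of the returned dictionary are hypothetical.

```python
import plistlib


def data_id_table(header_xml):
    """Map each unsigned dataId to its entry in the dataDescription section.

    header_xml is the XML header as bytes, with or without its zero padding.
    """
    # Strip the zero padding that follows the XML before parsing.
    header = plistlib.loads(header_xml.rstrip(b"\x00"))
    table = {}
    for class_name, records in header["dataDescription"].items():
        for label, desc in records.items():
            # The plist may store dataId as a signed integer (for example
            # -2147483648); reinterpret it as an unsigned 32-bit value.
            data_id = desc["dataId"] & 0xFFFFFFFF
            table[data_id] = {
                "class": class_name,
                "label": label,
                "decoder": desc["decoder"],
                "length": desc["length"],
                "variable": desc["variable"],
            }
    return table
```

With the example header above, the Shaper entry would be found under the key 0x80000000 rather than under the negative number printed in the XML.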


Custom Data in Header


It is possible for certain types of scripts to place custom data into the header. This can only be done if the script is executed automatically at certain points of the run start sequence. See RunControl.


Here is an example script:


function main() {

    doc = find(ORDocument);

    [doc addCustomRunParameters:"NormalThresholdFile.txt" forKey:@"thresholdFile"];


    anArray = @["param1","param2"]; //array

    [doc addCustomRunParameters: anArray forKey:"arrayTest"];


    aDictionary = @{"param1":11,"param2":1.222}; //dictionary

    [doc addCustomRunParameters: aDictionary forKey:"dictionaryTest"];

}


The resulting field will be in the “ObjectInfo” section under “Custom”:


<key>Custom</key>

<dict>

  <key>arrayTest</key>

  <array>

   <string>param1</string>

   <string>param2</string>

  </array>

  <key>dictionaryTest</key>

  <dict>

   <key>param1</key>

   <real>11</real>

   <key>param2</key>

   <real>1.222</real>

  </dict>

  <key>thresholdFile</key>

  <string>NormalThresholdFile.txt</string>

</dict>
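On the analysis side, the custom values can be read back once the header has been parsed. This is a small Python sketch using the standard plistlib module; the helper name custom_run_parameters is hypothetical, and the XML below is a cut-down fragment modeled on the example above rather than a complete header.

```python
import plistlib

# Fragment modeled on the example above; a real header holds much more.
HEADER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>ObjectInfo</key>
  <dict>
    <key>Custom</key>
    <dict>
      <key>thresholdFile</key>
      <string>NormalThresholdFile.txt</string>
    </dict>
  </dict>
</dict>
</plist>"""


def custom_run_parameters(header):
    """Return the "Custom" dictionary, or an empty dict if none was written."""
    return header.get("ObjectInfo", {}).get("Custom", {})


params = custom_run_parameters(plistlib.loads(HEADER_XML))
threshold_file = params.get("thresholdFile")
```

Returning an empty dictionary when the section is missing lets the same code handle runs in which no script added custom data.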


Raw Data Format


The raw data immediately follows the XML header.


With the release of data format version 2, the old format was deprecated. ORCA remains able to decode the old format, but always generates data in the new format. The new format is more self-describing and is not limited to 32 different types of data records.


Short form data format (always 32 bits)


xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx

^--------------------------------------- always set

^^^^ ^^--------------------------------- dataId from the data description

       ^^ ^^^^ ^^^^ ^^^^ ^^^^ ^^^^ ^^^^- raw data


Long form data format (variable length)


xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  1st word of the record

^--------------------------------------- always clear.

^^^^ ^^^^ ^^^^ ^^----------------------- dataId from the data description

                 ^^ ^^^^ ^^^^ ^^^^ ^^^^- total length n of record (in longs)

xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  2nd word of the record, 1st word of data

..

..

xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  nth word of the record


If the most significant bit in the first word of a data record is set, then the data ID uses the top 6 bits and the record is always one long word (4 bytes) long; the raw data occupies the remaining 26 bits. If the most significant bit is not set, then the data ID uses the top 14 bits and the length of the record in longs (including this first word) is encoded in the bottom 18 bits. Note that the length is always the number of longs (not bytes) in the record. In the long form, the raw data immediately follows the first word.
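The two layouts can be told apart with a few bit masks. The Python sketch below is illustrative only; the function and constant names are hypothetical. It keeps the data ID in place (not shifted down) so that it compares directly with the dataId values listed in the header, once those are taken as unsigned 32-bit numbers.

```python
SHORT_FORM_FLAG = 0x80000000   # most significant bit set: short form
SHORT_ID_MASK = 0xFC000000     # top 6 bits
SHORT_DATA_MASK = 0x03FFFFFF   # bottom 26 bits
LONG_ID_MASK = 0xFFFC0000      # top 14 bits
LONG_LENGTH_MASK = 0x0003FFFF  # bottom 18 bits


def decode_first_word(word):
    """Split the first word of a record into (data_id, length_in_longs, data).

    data is the 26-bit payload of a short-form record. It is None for a
    long-form record, whose payload is in the words that follow.
    """
    if word & SHORT_FORM_FLAG:
        return word & SHORT_ID_MASK, 1, word & SHORT_DATA_MASK
    return word & LONG_ID_MASK, word & LONG_LENGTH_MASK, None
```

For example, a first word of 20185088 | 4 decodes to the Run record from the header example (dataId 20185088) with a total length of 4 longs.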


Note that the object IDs are dynamically assigned and may vary from configuration to configuration. Objects that produce data that fits into one long word will request the short form; however, if more than 32 short records are assigned, then objects will be assigned the long form. External analysis applications must assume that an object that normally puts out the short form could put out the long form, and must be prepared to deal with either.
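Since either form can appear for any object, a reader typically walks the stream one record at a time and decides per record. The following Python generator is a sketch of that loop under two assumptions that this page does not confirm: the byte order (little-endian by default) and that the buffer starts at a record boundary. The name iter_records is hypothetical.

```python
import struct


def iter_records(buf, byte_order="<"):
    """Yield (data_id, words) for each record in a raw data buffer.

    byte_order is an assumption; this page does not state it.
    words is a tuple of all longs in the record, first word included.
    """
    offset = 0
    while offset + 4 <= len(buf):
        (first,) = struct.unpack_from(byte_order + "I", buf, offset)
        if first & 0x80000000:
            # Short form: 6-bit data ID, record is exactly one long.
            data_id, n_longs = first & 0xFC000000, 1
        else:
            # Long form: 14-bit data ID, length in longs in the low 18 bits.
            data_id, n_longs = first & 0xFFFC0000, first & 0x3FFFF
        if n_longs == 0 or offset + 4 * n_longs > len(buf):
            raise ValueError("corrupt or truncated record at byte %d" % offset)
        fmt = "%s%dI" % (byte_order, n_longs)
        yield data_id, struct.unpack_from(fmt, buf, offset)
        offset += 4 * n_longs
```

The data_id yielded here can be looked up in the dataDescription section of the header to find the decoder for the record, whichever form the object was assigned.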


Some of the objects that can ship data:


ORRunModel

ORShaperModel

ORTrigger32Model

ORHPPulserModel

OR754ScopeModel


That is by no means a complete list. To find the data record format for a particular object look at that object’s help page. If there is no help page, the record formats are documented in the object’s decoder source code.


Old Format (versions < 2)


Note that the old data format is deprecated and is no longer written out by ORCA - ever! The following information is included for historical reasons. As of ORCA version 3.x, ORCA can still read and decode the old data format, but at some point in the future this capability will probably be removed.


Header


The header is entirely ASCII and is thus human readable. It consists of the following parts:


data type word

data description section

record description section (for data version 1 only!)

parameter section


Each of these parts will be discussed below.


Data Type Word


In the original ORCA data format this was set to "DataStructure_LittleEndian" to give an indication of how to decode the raw data section. Starting with data version 1 this is set to "DataStructure version x", where x is the data version number.


Data Description Section


This section is started with "BeginDataDescriptionSection" and ended with "EndDataDescriptionSection". The content is a list of objects that have put data into the file and the size of the data records that those objects produce. A typical entry is:


ORShaperModel Shaper 0x18000000 0 4

^^^^^^^^^^^^^------------------------ data class object name

--------------^^^^^^----------------- data class description

---------------------^^^^^^^^^^------ data ID number

--------------------------------^---- variable length (1 == variable, 0 == fixed)

----------------------------------^-- record size in bytes (-1 if variable)


Data Class Object Name


The first field is the name of the data-producing class: 'ORShaperModel'. This is used internally by ORCA in forming the name of a data decoding object. Other applications can use this name combined with the class description and data ID to decide how to decode the record.


Data Class Description


The next field, 'Shaper', is a sub-label for the data record. Some objects put out several different types of data records. Each will have a different data ID. One thing to be aware of is that there is no ASCII space between the data class object name and the data class description. Instead, the data class object name is a C string, so it is followed by a '\0'.


Data Id Number format for versions 0 thru 1


The data ID number is the number located in the 5 most significant bits of the first word of a data record.


Note that the object IDs are dynamically assigned and may vary from configuration to configuration.


Note that this format of using five bits for the object ID puts a hard limit on the number of data producing objects that can be used in a particular experiment. This may be addressed as needed in future versions.


Variable Length and Data Size Fields


The next field can be '0' or '1'. A '0' means that the last field is the size, in bytes, of the raw data record that this object puts out. If the object puts out a variable-sized record, the last two fields are '1' followed by '-1'. If the data record is variable, the size is included in the data record itself.
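Putting the fields together, one entry of the data description section could be parsed along the following lines. This Python sketch assumes the entry arrives as bytes with a '\0' after the class name, as described above, and that the remaining fields are separated by whitespace; the function name and the keys of the returned dictionary are hypothetical.

```python
def parse_data_description_entry(entry):
    """Parse one old-format entry, e.g. b"ORShaperModel\\0Shaper 0x18000000 0 4"."""
    text = entry.decode("ascii")
    # The class name is a C string: it ends at the first '\0', not at a space.
    class_name, _, rest = text.partition("\0")
    label, data_id, variable, size = rest.split()
    data_id = int(data_id, 16)
    return {
        "class": class_name,
        "label": label,
        "data_id": data_id,
        "id_bits": data_id >> 27,     # the 5 most significant bits
        "variable": variable == "1",
        "size_bytes": int(size),      # -1 when the record is variable
    }
```

For the typical entry shown earlier, the data ID 0x18000000 corresponds to the value 3 in the top five bits of a record's first word.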


Record Description Section


This section was added starting with data version 1 but was deprecated and removed starting with version 2. In version 2, these records are treated the same as the data records that are described in the Data Description Section. Records are meant to be small information records that are put in the data stream on an occasional basis.


This section is started with "BeginRecordDescriptionSection" and ended with "EndRecordDescriptionSection". The content is a list of objects that can put records into the file and the size of the records that those objects produce. A typical entry is:


ORShaperModel Scalers 0x00020000 1 -1


The first field is the name of the data-producing class: 'ORShaperModel'. The next field, 'Scalers', is a label for the record. The next field is the record ID, of which only the top 16 bits are used. The next field can be '0' or '1'. A '0' means that the last field is the size, in bytes, of the raw data record that this object puts out. If the object puts out a variable-sized record, the last two fields are '1' followed by '-1'. If the data record is variable, the size is included in the data record itself.


Note that the record IDs are dynamically assigned and may vary from configuration to configuration.


One thing to be aware of is that there is no ASCII space between the class name and the record label. But the class name is a C string, so it is followed by a '\0'.


Parameter Section


This section is started with "BeginParameterSection" and ended with "EndParameterSection". The content is a list of parameter data. A typical entry is:


0x18000000 ORShaperModel Threshold 0 0 8 short 78


The first field is the object ID. This is the same ID that is in the data description section, with one exception. If the ID is 0xffffffff, the parameter is from an object that is not in the data description section. The second field is the object class name. The third field is the name of the parameter, followed by three hardware identification numbers. In this example, they encode the crate, card, and channel. The last two fields hold the parameter data type and the parameter value.
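A parameter entry can be split into these fields as sketched below in Python. The sketch assumes the fields are separated by whitespace exactly as printed in the typical entry, which this page does not spell out, and it keeps the value as a string because the set of parameter data types is not listed here. The function name and dictionary keys are hypothetical.

```python
NOT_IN_DATA_DESCRIPTION = 0xFFFFFFFF  # ID used by objects outside that section


def parse_parameter_entry(line):
    """Parse an entry such as "0x18000000 ORShaperModel Threshold 0 0 8 short 78"."""
    # Split into at most 8 fields so the value is kept whole.
    fields = line.split(None, 7)
    if len(fields) != 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    object_id = int(fields[0], 16)
    return {
        "object_id": object_id,
        "in_data_description": object_id != NOT_IN_DATA_DESCRIPTION,
        "class": fields[1],
        "parameter": fields[2],
        # Three hardware identification numbers (crate, card, channel
        # in the example above).
        "hardware_ids": tuple(int(f) for f in fields[3:6]),
        "type": fields[6],
        "value": fields[7],           # left as text; interpret using "type"
    }
```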


Raw Data


The data section begins with the C string "BeginData". The raw data format is set by each object that puts data into the data stream. In general, the data records are as compact as possible. See the individual objects to get the data formats (e.g. ORShaperModel).


The data section may also contain generic records, which differ from the regular raw data in that any object can place a record in the data stream. Records are meant to be small and only occasionally inserted into the data. For an example, see the ORShaperModel's Scaler Record format. No matter what a generic record contains, it is wrapped with a generic record header.


Some of the objects that can ship data:


ORRunModel

ORShaperModel

ORTrigger32Model

NcdMuxModel

NcdModel

ORHPPulserModel

OR754ScopeModel