Guide to tito

TITO (Trace-In-Trace-Out) processes geophysical surveys on a trace-by-trace basis. It is part of the GU toolkit written at the Geotechnology Research Institute.

Introduction
The in module
The out module
The stats module
The thdr module
The z2t module
The t2z_int module
The t2z_ave module
The agc module
Developer's Guide

Introduction

Tito is designed to perform a sequence of actions, called modules, on geophysical surveys. Each trace is processed independently. Processing modules are available to read a trace from an input survey, write a trace to an output survey, and so on.

Tito has its own parameters:

job=in,out: which processing modules to run
help=none: which processing modules to print help for

Processing modules include:

in: read from input
out: write to output
stats: collect statistics from trace data
thdr: put values in output trace header
z2t: convert depth traces to time traces
t2z_int: converts input time traces into output depth traces based on a velocity function
t2z_ave: converts input time traces into output depth traces based on average velocity traces
agc: automatic gain control

In and Out Modules

A tito job will end whenever any one of its modules ends. For instance, the 'in' module ends when end-of-survey is encountered.

Each processing module has its own parameters. 'In' and 'out' take the standard survey parameters to describe the input and output surveys, respectively. Using just 'in' and 'out' copies a survey. For example, a parameter file to copy a survey from a disk file to a tape could be:

	tito.job=in,out

	in.device=file
	in.names=/data/myfile

	out.device=tape
	out.names=mytape

If the input and output formats are different, the trace will be reformatted. For instance, if the above parameter file included:

	in.sample_type=ieee
	out.sample_type=ibm

then tito would convert the IEEE input traces to IBM output traces.
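To make the reformatting concrete, here is a minimal sketch of an IEEE-to-IBM sample conversion. It is not taken from the tito or GU sources; the function name and the handling of special values (denormals flushed to zero, Inf/NaN clamped, extra fraction bits truncated) are assumptions made for illustration. It assumes 'ieee' means 32-bit IEEE 754 and 'ibm' means 32-bit IBM hexadecimal floating point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 single-precision bit pattern to IBM hexadecimal
   single precision: 1 sign bit, 7-bit excess-64 base-16 exponent,
   24-bit fraction. */
uint32_t ieee_to_ibm_bits(uint32_t ieee)
{
    uint32_t sign = ieee & 0x80000000u;
    int exp2 = (int)((ieee >> 23) & 0xFFu);
    uint32_t frac = ieee & 0x007FFFFFu;
    int e2, shift;

    if (exp2 == 0)          /* zero or denormal: flush to zero (sketch choice) */
        return 0;
    if (exp2 == 255)        /* Inf or NaN: clamp to largest magnitude (sketch choice) */
        return sign | 0x7FFFFFFFu;

    frac |= 0x00800000u;    /* restore the hidden leading bit */
    e2 = exp2 - 126;        /* value = (frac / 2^24) * 2^e2 */

    /* Raise e2 to the next multiple of 4 and shift the fraction right
       to compensate, so the exponent can be expressed in base 16. */
    shift = (4 - ((e2 % 4) + 4) % 4) % 4;
    frac >>= shift;         /* low bits are truncated, not rounded */
    e2 += shift;

    return sign | ((uint32_t)(e2 / 4 + 64) << 24) | frac;
}

/* Convenience wrapper for a native float (assumed to be IEEE 754). */
uint32_t ieee_to_ibm(float x)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);
    return ieee_to_ibm_bits(bits);
}
```

For example, 1.0 becomes the IBM bit pattern 0x41100000. Because the IBM format has a base-16 exponent, up to three low-order fraction bits can be lost in the conversion.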

If no 'out' module is specified, no output will be written. This could be useful if all that is desired is statistics about the input survey, for example. Similarly, if no 'in' module is specified, no input data will be read. This could be used to create a test data set.

'In' ends the job when end-of-survey is encountered. 'Out' will not end the job.

Stats Module

The 'stats' module will print the minimum and maximum data values in the survey and give some idea of the sample distribution. It has two parameters which control the sample distribution chart:

stats.ninc=6: number of positive increments in distribution chart
stats.base=10: base to use for limits of distribution chart

Stats will not end the job.

Thdr Module

The 'thdr' module will put values into the output trace header. It has the following parameters:

thdr.trace_header=240: trace header length in bytes
thdr.map="": trace header map (see below)
thdr.values="": trace header values (see below)

The 'map' parameter has the format

"name loc,len [ ... ]"

where name is any of the following:

seqno: sequence number
nsamp: number of samples in the trace
pkey: primary key value (ex. line number)
skey: secondary key value (ex. shot number)
tkey: tertiary key value (ex. receiver offset)
cN: constant value N (ex. c0 is zero)

loc is the byte offset (starting from 1) of the value within the trace header, and len is the value's byte length (2 or 4). A name may appear more than once in the header map, in which case the value will be stored into more than one location.

For example, the parameter

	thdr.map="seqno 1,4
		  seqno 5,4
		  nsamp 115,2
		  c4000 117,2"

will put the sequence number in bytes 1-4 and 5-8, the number of samples in bytes 115-116, and the value 4000 in bytes 117-118.
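The map above can be pictured with a short sketch. This is not the thdr source; the function names are hypothetical, and storing values most significant byte first is an assumption (it matches SEG-Y convention, but the guide does not state the byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 'value' at 1-based byte offset 'loc' using 'len' bytes (2 or 4),
   most significant byte first (assumed byte order).
   Returns 0 on success, -1 if the field is invalid or does not fit. */
int put_header_value(unsigned char *hdr, size_t hdrlen,
                     size_t loc, size_t len, int32_t value)
{
    uint32_t v = (uint32_t)value;
    size_t i;

    assert(hdr != NULL);
    if ((len != 2 && len != 4) || loc < 1 || loc - 1 + len > hdrlen)
        return -1;
    for (i = 0; i < len; i++)
        hdr[loc - 1 + i] = (unsigned char)(v >> (8 * (len - 1 - i)));
    return 0;
}

/* Apply the example map:
   "seqno 1,4  seqno 5,4  nsamp 115,2  c4000 117,2" */
int fill_example_header(unsigned char *hdr, size_t hdrlen,
                        int32_t seqno, int32_t nsamp)
{
    if (put_header_value(hdr, hdrlen, 1, 4, seqno) != 0) return -1;
    if (put_header_value(hdr, hdrlen, 5, 4, seqno) != 0) return -1;
    if (put_header_value(hdr, hdrlen, 115, 2, nsamp) != 0) return -1;
    if (put_header_value(hdr, hdrlen, 117, 2, 4000) != 0) return -1;
    return 0;
}
```

Note that loc is 1-based, so "nsamp 115,2" lands in array elements 114 and 115 of a zero-based C buffer.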

The 'values' parameter has the format:

key first,last,incr [...]

where key is 'pkey', 'skey', or 'tkey', first is the first key value, last is the last key value, and incr is the key increment. For example, the parameter

	thdr.values="pkey 1,100,2
		     skey 5,50,1"

will create trace headers for lines 1 to 100 incremented by 2, where each line has shots 5 to 50 incremented by 1. The job will end when all lines and shots have been put into the headers. If no keys are specified, 'thdr' will not end the job.
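A sketch of how such key ranges could be walked, one trace at a time, is shown below. The type and function names are hypothetical, and two things are assumed rather than taken from the thdr source: increments are positive, and the secondary key varies fastest (which is what "each line has shots 5 to 50" suggests).

```c
#include <assert.h>

/* One "key first,last,incr" entry from thdr.values. */
typedef struct {
    int first, last, incr;
} KeyRange;

/* Number of values in first, first+incr, ... that do not exceed last. */
long key_count(KeyRange r)
{
    assert(r.incr > 0);     /* this sketch handles positive increments only */
    return r.last < r.first ? 0 : (long)(r.last - r.first) / r.incr + 1;
}

/* Compute the keys for the n-th trace (0-based), with skey varying
   fastest. Returns 1 if trace n exists, 0 once the ranges are
   exhausted (the point at which the job would end). */
int keys_for_trace(KeyRange p, KeyRange s, long n, int *pkey, int *skey)
{
    long np = key_count(p);
    long ns = key_count(s);

    if (n < 0 || ns == 0 || n >= np * ns)
        return 0;
    *pkey = p.first + (int)(n / ns) * p.incr;
    *skey = s.first + (int)(n % ns) * s.incr;
    return 1;
}
```

With "pkey 1,100,2 skey 5,50,1" this gives 50 lines (1, 3, ..., 99) of 46 shots each, or 2300 traces in all. Line 100 is never produced because the increment of 2 steps over it.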

As an example, the following parameter file will read a headerless IEEE data file and write out SEG-Y tapes:

	job=in,thdr,out

	in.device=file
	in.names=/me/myfile
	in.reel_headers=0
	in.trace_header=0
	in.sample_type=ieee
	in.nsamples=1500

	thdr.trace_header=240
	thdr.map="seqno 1,4 nsamp 115,2 pkey 109,2 skey 21,4"
	thdr.values="pkey 1,50,1 skey 1000,9000,100"

	out.device=tape
	out.names=mytape
	out.nkeys=2
	out.pkey_loc=109,2
	out.skey_loc=21,4

Z2t Module

The 'z2t' module will convert depth traces to time traces based on a trace-sequential velocity model. Z2t's parameters include:

z2t.z0=0.0: depth of first input sample
z2t.dz=1.0: depth increment
z2t.t0=0.0: time corresponding to first input sample
z2t.dt=1.0: time increment
z2t.tsamples=0: number of time samples to create
z2t.nte=1: number of input traces per velocity trace

In addition, the velocity model is specified as a survey, described by the standard survey parameters. These parameters should also be prefixed by 'z2t.'.

Input velocity traces must be interval velocities as a function of depth. The sample rate dz must be the same as that of the input survey.

Z2t has a limitation: the input depth trace will be truncated to the smaller of the velocity trace length and the output time trace length. This means that the 'tsamples' parameter should be at least the number of input depth samples and that, ideally, z2t.nsamples (the number of samples in each velocity trace) will be the same as the number of input depth samples.
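The general idea of a depth-to-time conversion can be sketched as follows. This is not the z2t source. It assumes two-way travel time, velocities in depth units per time unit, and linear interpolation onto the output grid; the function names are hypothetical, and the truncation behaviour described above is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Travel time to each depth sample, given interval velocities sampled
   at the same depth increment dz. Two-way time is assumed:
   t[0] = t0, t[i] = t[i-1] + 2*dz/vint[i-1]. */
void depth_sample_times(const float *vint, size_t nz, double dz,
                        double t0, double *t)
{
    size_t i;

    if (nz == 0)
        return;
    t[0] = t0;
    for (i = 1; i < nz; i++) {
        assert(vint[i - 1] > 0.0f);
        t[i] = t[i - 1] + 2.0 * dz / vint[i - 1];
    }
}

/* Resample a depth trace 'in' (nz samples, at times t[]) onto a regular
   time grid t0, t0+dt, ... by linear interpolation. Output samples that
   fall outside the input's time range are set to zero. */
void depth_to_time(const float *in, const double *t, size_t nz,
                   double t0, double dt, float *out, size_t nt)
{
    size_t j, k = 0;

    assert(nz >= 2 && dt > 0.0);
    for (j = 0; j < nt; j++) {
        double tj = t0 + (double)j * dt;
        double w;

        if (tj < t[0] || tj > t[nz - 1]) {
            out[j] = 0.0f;
            continue;
        }
        /* advance to the interval [t[k], t[k+1]] containing tj */
        while (k + 2 < nz && t[k + 1] < tj)
            k++;
        w = (tj - t[k]) / (t[k + 1] - t[k]);
        out[j] = (float)(in[k] + w * (in[k + 1] - in[k]));
    }
}
```

With a constant velocity of 2000 and dz=1000, each depth sample adds one time unit of two-way travel time, so a time grid with dt=0.5 produces two output samples per input sample.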

T2z_int Module

The 't2z_int' module will convert input time traces into output depth traces based on a velocity function. T2z_int's parameters include:

t2z_int.z0=0.0: first depth request
t2z_int.dz=1.0: depth increment
t2z_int.t0=0.0: first time in Time data sample
t2z_int.dt=1.0: time increment
t2z_int.zsamples=0: number of depth samples to create
(should be >= number of input time samples)

In addition, the velocity model is specified as a survey, described by the standard survey parameters. These parameters should also be prefixed by 't2z_int.'.

Input velocity traces must be interval velocities as a function of depth. The sample rate dz must be the same as that of the output survey.

T2z_ave Module

The 't2z_ave' module will convert input time traces into output depth traces based on average velocity traces. T2z_ave's parameters include:

t2z_ave.dz=1.0: depth increment
t2z_ave.dt=1.0: time increment
t2z_ave.zsamples=0: number of depth samples to create
(should be >= number of input time samples)

In addition, the velocity model is specified as a survey, described by the standard survey parameters. These parameters should also be prefixed by 't2z_ave.'.

Input velocity traces must be average velocities as a function of two-way travel time. The sample rate dt must be the same as that of the input survey.
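The relation between time and depth for an average velocity is simple enough to show directly: a sample at two-way time t lies at depth v_ave(t) * t / 2. The sketch below computes the depth of each time sample. It is not the t2z_ave source. The function name is hypothetical, the first sample is assumed to be at time zero (t2z_ave has no t0 parameter), and dt is assumed to be in the same time unit as the velocities.

```c
#include <assert.h>
#include <stddef.h>

/* Depth of each time sample given average velocities sampled at the
   same two-way time increment dt: z[i] = vavg[i] * (i * dt) / 2. */
void time_sample_depths(const float *vavg, size_t nt, double dt, double *z)
{
    size_t i;

    assert(dt > 0.0);
    for (i = 0; i < nt; i++)
        z[i] = 0.5 * (double)vavg[i] * ((double)i * dt);
}
```

The resulting depths are generally not evenly spaced, so a real implementation would still have to resample the trace onto the regular depth grid given by dz and zsamples.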

Agc Module

The 'agc' module performs automatic gain control on its input data. Its parameters are:

mean=127.5: mean data value after AGC.
tolerance=1e-30: minimum data value to consider zero.
window=0: AGC window as a number of samples. If zero, no AGC will be performed.
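The guide does not describe the AGC algorithm itself, so the following is only one plausible reading of the three parameters, not the module's actual code: each sample is scaled so that the mean absolute amplitude in a window centred on it equals 'mean', samples whose magnitude does not exceed 'tolerance' are treated as zero and left out of the average, and a zero window copies the data unchanged. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static double absval(double x)
{
    return x < 0.0 ? -x : x;
}

/* Sliding-window AGC sketch. 'in' and 'out' must not overlap, because
   each output sample reads neighbouring input samples. */
void agc_sketch(const float *in, float *out, size_t n,
                size_t window, double mean, double tolerance)
{
    size_t half = window / 2;
    size_t i, j;

    assert(in != out);
    for (i = 0; i < n; i++) {
        size_t lo = i > half ? i - half : 0;
        size_t hi = i + half < n ? i + half : n - 1;
        double sum = 0.0;
        size_t cnt = 0;

        if (window == 0) {      /* no AGC requested */
            out[i] = in[i];
            continue;
        }
        for (j = lo; j <= hi; j++) {
            double a = absval(in[j]);
            if (a > tolerance) {
                sum += a;
                cnt++;
            }
        }
        /* scale = mean / (sum / cnt); an all-zero window is left alone */
        out[i] = cnt > 0 ? (float)(in[i] * mean * (double)cnt / sum) : in[i];
    }
}
```

This version recomputes the window sum for every sample, which is simple but costs time proportional to n times the window length. A running sum would be the usual optimisation.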

Developer's Guide

TITO is designed for addition of new modules. All modules have a similar source code format which can be copied to provide a template. If you are developing a new TITO module, I suggest you copy an existing one and modify it to do what you want. This guide will explain what the different sections of TITO module source code do.

When adding a new module, you need to create a source code file for the module, modify the Makefile, and modify the configuration header file.

Create a source code file for the module. This is explained in detail later. For now, copy an existing module. The name of the new source code file should be the name of the module (for example, the name of the 'stats' module's source code file is 'stats.c').
Add the module's source code and object code file names to the Makefile. There are two lines that have the format
```
	SRCS = tito.c stats.c ...
	OBJS = tito.o stats.o ...
```
Just add your module's .c and .o files to the ends of these lists.
Edit the file tito_config.h to add your module. This is simple: just follow the examples already in the file. There will be a part with declarations such as
```
	extern TitoModuleRec titoStatsRec;
```
Just add a similar line for your module, substituting your module's name for "Stats". There will also be a list of modules with entries such as
```
	&titoStatsRec,
```
Again, add a similar line, substituting your module's name for "Stats". In all of these cases, order is unimportant.
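To show why a list of module records is all the configuration needs, here is a sketch of how a job string such as "in,stats,out" could be resolved against such a list. It does not reproduce tito.c or tito_config.h: DemoModuleRec is a cut-down stand-in for TitoModuleRec, and the table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for TitoModuleRec, reduced to the two text fields. */
typedef struct {
    const char *name;
    const char *description;
} DemoModuleRec;

static DemoModuleRec demoInRec    = { "in",    "read from input" };
static DemoModuleRec demoOutRec   = { "out",   "write to output" };
static DemoModuleRec demoStatsRec = { "stats", "collect trace statistics" };

/* The list of known modules; order is unimportant. */
static DemoModuleRec *demoModules[] = {
    &demoStatsRec,
    &demoInRec,
    &demoOutRec,
};

static DemoModuleRec *find_module(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof demoModules / sizeof demoModules[0]; i++)
        if (strlen(demoModules[i]->name) == len &&
            strncmp(demoModules[i]->name, name, len) == 0)
            return demoModules[i];
    return NULL;
}

/* Split 'job' on commas and look up each name. Fills seq[] in job
   order and returns the number of modules, or -1 if a name is unknown
   or more than maxseq modules are requested. */
int resolve_job(const char *job, DemoModuleRec **seq, int maxseq)
{
    const char *p = job;
    int n = 0;

    assert(job != NULL);
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        DemoModuleRec *m = find_module(p, len);

        if (m == NULL || n == maxseq)
            return -1;
        seq[n++] = m;
        p += len;
        if (*p == ',')
            p++;
    }
    return n;
}
```

Because modules are found by name, the order of entries in the list has no effect on the order in which a job runs them. That order comes from the job string.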

The Module Source Code File

Your module's source code file will have five basic parts. The first will be a static definition part defining your module's "module record." The remaining parts will be your module's help routine, initialization routine, main routine, and clean-up routine.

The first thing to do is include the "tito.h" header file which defines the module record type and such. "tito.h" includes "gu.h", and tito links with the GU library, so your module has full access to GU library routines (see the "Guide to the GU Library" for details). Then you will declare your module's routines and define its module record. See the example below.

	/* include GU library and TITO structures */
#include "tito.h"

	/* declare module routines */
/*
	Notes: the names don't matter as long as you keep track
	of them. Everything here is listed in ANSI C for
	clarity but tito currently uses K&R C for hysterical
	reasons.
*/
static int Initialize(TitoModule, TitoHeaders*, char**);
static int Run(TitoModule, int[]);
static void Cleanup(TitoModule);
static void Help(void);

	/* define the module record */
TitoModuleRec titoStatsRec = {
	"stats",			/* module name */
	"collect trace statistics",	/* short description */
	Initialize,			/* preproc */
	Run,				/* runproc */
	Cleanup,			/* postproc */
	Help				/* helpproc */
};

Write a similar section of code for your module. Note that "titoStatsRec" was the name specified in "tito_config.h". Just replace "Stats" with your module's name.

The module record has six fields that need to be statically initialized. The first is the module's name as it will be called by the user (this should be the same name that you used for your source code file name and for your module record name). The second field is a short (less than about twenty characters) description of the module for an abbreviated help message. The remaining four fields are the module's initialization routine, main routine, clean-up routine and help routine. These should be the routines declared earlier in the file.

The static definition part of your source code file may also define any private definitions needed, including static variables, GU parameter lists and structure definitions.

The help routine simply prints out help for a user who requests it. It has the form

static void Help(void)

A module's initialization routine initializes data private to the module, opening any devices and allocating any memory needed. The initialization routine is called once per job, before any traces have been processed. It has the form

static int Initialize(TitoModule mod, TitoHeaders *hdrs,
	char **argv)

The first argument is a pointer to the module's module record (TitoModule is equivalent to TitoModuleRec*). This is the same record statically defined earlier in the source code file, but now other fields will be dynamically initialized (explained below).

The last argument is the command-line argument vector for the job, allowing the module to use parameters specified by the user. Tito uses the GU parameter format. See existing modules for examples and the Guide to the GU Library for complete documentation.

The initialization routine may use or modify information about the current state of the survey's reel headers. The second argument is a pointer to a structure containing the following fields:

int nhdrs;	/* number of reel headers on the survey */
int *hdrsize;	/* size of each reel header in bytes */
char *hdr;	/* contents of all reel headers */

If the nhdrs field is zero, hdrsize and hdr will be NULL; otherwise they will be allocated appropriately. If the preproc changes nhdrs, it should realloc or free hdrsize and hdr to match.
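A preproc that adds or removes reel headers might manage the three fields along the following lines. This is a sketch only: DemoHeaders is a stand-in built from the fields listed above (the real structure is defined in tito.h and may contain more), the function names are hypothetical, and it is assumed that hdr holds the headers back to back in order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the structure behind TitoHeaders. */
typedef struct {
    int nhdrs;      /* number of reel headers on the survey */
    int *hdrsize;   /* size of each reel header in bytes */
    char *hdr;      /* contents of all reel headers */
} DemoHeaders;

/* Append one reel header of 'size' bytes. Returns 0 on success and -1
   on allocation failure, in which case nhdrs is left unchanged. */
int append_reel_header(DemoHeaders *h, const char *data, int size)
{
    size_t total = 0;
    int *sizes;
    char *buf;
    int i;

    assert(h != NULL && data != NULL && size > 0);
    for (i = 0; i < h->nhdrs; i++)
        total += (size_t)h->hdrsize[i];

    /* realloc(NULL, n) behaves like malloc(n), so nhdrs == 0 works too */
    sizes = realloc(h->hdrsize, ((size_t)h->nhdrs + 1) * sizeof *sizes);
    if (sizes == NULL)
        return -1;
    h->hdrsize = sizes;

    buf = realloc(h->hdr, total + (size_t)size);
    if (buf == NULL)
        return -1;
    h->hdr = buf;

    memcpy(buf + total, data, (size_t)size);
    sizes[h->nhdrs++] = size;
    return 0;
}

/* Remove all reel headers, restoring the nhdrs == 0 state. */
void clear_reel_headers(DemoHeaders *h)
{
    free(h->hdrsize);
    free(h->hdr);
    h->hdrsize = NULL;
    h->hdr = NULL;
    h->nhdrs = 0;
}
```

Setting both pointers back to NULL when the count drops to zero keeps the structure in the state that later modules expect.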

The preproc must initialize certain fields of the module record, called trace requirements. The following fields of the TitoModule structure must be set:

GuType sample_type; /* sample type required by module */
int nsamples;       /* number of samples in the trace */
int trace_header;   /* trace header length */
int nkeys;          /* number of keys in trace header */

Tito initializes the above fields to those of the previous module before calling the preproc, so that modules that aren't interested in a particular value can safely ignore it.

The preproc may also allocate private data for the module's use. This data should be hooked to the module through the following field of the TitoModule structure:

	char *private;

which can be assigned to the address of any dynamically allocated variable or structure.

The preproc should return 0 on success and -1 on error.

All modules must specify a preproc, even if it does nothing.

The module's main routine does whatever the module is supposed to do to a single input trace. The main routine is called many times in a job, once per trace. The main routine has the form

static int Run(TitoModule mod, int value[])

where 'mod' is the pointer to the module's module record. The module record fields initialized by the module's initialization routine are available for use by the main routine. The main routine should not change any module record fields other than the 'private' field.

What modules commonly do is define a structure containing all the private data they will need. The module's initialization routine then allocates an instance of this structure, assigning the pointer to the private field. The main routine can cast the private field to the appropriate pointer type and use the private data.
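The private-data pattern looks roughly like this. The sketch does not include "tito.h", so DemoModuleRec is a cut-down stand-in for TitoModuleRec containing only the fields needed here, the headers argument is passed as a plain pointer, and the trace-counting module itself is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for TitoModule / TitoModuleRec. */
typedef struct DemoModuleRec *DemoModule;
typedef struct DemoModuleRec {
    const char *name;
    int (*preproc)(DemoModule, void *, char **);
    int (*runproc)(DemoModule, int[]);
    void (*postproc)(DemoModule);
    char *private;      /* hook for module-private data */
} DemoModuleRec;

/* Everything this module needs to remember between traces. */
typedef struct {
    long ntraces;       /* traces seen so far */
    int last_pkey;      /* primary key of the most recent trace */
} CountPrivate;

static int Initialize(DemoModule mod, void *hdrs, char **argv)
{
    CountPrivate *p = calloc(1, sizeof *p);

    (void)hdrs;         /* unused in this sketch */
    (void)argv;
    if (p == NULL)
        return -1;      /* -1 reports an error to the caller */
    mod->private = (char *)p;
    return 0;
}

static int Run(DemoModule mod, int value[])
{
    CountPrivate *p = (CountPrivate *)mod->private;

    assert(p != NULL);
    p->ntraces++;
    p->last_pkey = value[0];
    return 0;           /* 0 means: continue the job */
}

static void Cleanup(DemoModule mod)
{
    free(mod->private);
    mod->private = NULL;
}

DemoModuleRec demoCountRec = {
    "count", Initialize, Run, Cleanup, NULL
};
```

Keeping all state in one structure means the module has no static variables of its own; everything it remembers hangs off the module record.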

The main routine also has access to another field in the module record:

	char *trace;

which contains the contents of the trace to be processed. For example, 'in' reads a trace into this space, and 'out' writes from this space.

The second argument 'value' contains the header keys for the trace. Tito does not use this array itself; it is provided for the modules to use however they want. For instance, the 'in' module replaces what is in value[] with the keys from the input trace, and the 'out' module puts what is in value[] into the trace header before writing the trace.

The main routine should return 0 to continue the job, -1 to stop the job with an error condition, and +1 to end the job successfully. Note that the clean-up routine (see below) will be called whenever the job ends, whether on error or success.
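The return-value convention implies a job loop of roughly the following shape. This is a sketch of the convention, not the loop in tito.c: the record type is a minimal stand-in, the two demonstration modules are hypothetical, and trace buffers are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a module record. */
typedef struct DemoModuleRec *DemoModule;
struct DemoModuleRec {
    int (*runproc)(DemoModule, int[]);
    void (*postproc)(DemoModule);
    long counter;       /* stands in for private data */
    int cleaned;        /* set once the clean-up routine has run */
};

/* A source that supplies 'counter' traces, then ends the job with +1. */
int source_run(DemoModule mod, int value[])
{
    (void)value;
    if (mod->counter == 0)
        return 1;
    mod->counter--;
    return 0;
}

/* A module that only counts traces and never ends the job. */
int count_run(DemoModule mod, int value[])
{
    (void)value;
    mod->counter++;
    return 0;
}

void mark_cleaned(DemoModule mod)
{
    mod->cleaned = 1;
}

/* Run each module in order on every trace until one returns nonzero,
   then call every clean-up routine. Returns 0 if the job ended with
   +1 (success) and -1 if it ended with -1 (error). */
int run_job(DemoModule *mods, int nmods, int value[])
{
    int status = 0;
    int i;

    assert(nmods > 0);
    while (status == 0)
        for (i = 0; i < nmods && status == 0; i++)
            status = mods[i]->runproc(mods[i], value);

    for (i = 0; i < nmods; i++)     /* always runs, error or not */
        mods[i]->postproc(mods[i]);
    return status > 0 ? 0 : -1;
}
```

In this sketch, modules after the one that ends the job are skipped for that pass, which fits a reader that hits end-of-survey with no trace to hand on. Note also that a job built only from modules like count_run would never leave the loop.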

The clean-up routine takes care of any loose ends, for example, closing open devices and freeing dynamic memory. The clean-up routine is called once per job after all traces have been processed. It has the form

static void Cleanup(TitoModule mod)

The TITO Source Code File

Module writers don't need to worry about the tito source code file. However, if someone wants to extend tito's basic functionality, fix bugs that are detected, or some such, they will need to modify the basic tito source code file 'tito.c'.

There are some implementation problems arising from tito's versatility. A few of these have not been satisfactorily solved.

One problem is the allocation of space for traces and the subsequent copying involved. Because each module can have a different trace size (resulting from a different trace header size for example), tito currently allocates separate space for each module. This means that the trace must be copied between each module. This is fine if the trace really does change, but if the module does not change the trace size, there is no reason to waste memory for another trace or to waste time copying an unchanged trace. Ideally, tito would check during initialization whether a new trace is actually called for or not. This is further complicated by the fact that some modules require native floating-point samples for calculations, but don't actually change the data besides converting it. Ideally, tito should keep track of which modules change the trace and which sample types the trace is available in.
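The check suggested above could start from something like the sketch below, which compares the trace requirements of neighbouring modules and counts how many distinct buffers a job would need. Nothing like this exists in tito as described. The structure is a stand-in holding three of the trace-requirement fields, sample types are reduced to plain integer codes, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the trace requirements a preproc sets. */
typedef struct {
    int sample_type;    /* integer code standing in for GuType */
    int nsamples;       /* number of samples in the trace */
    int trace_header;   /* trace header length in bytes */
} DemoTraceReq;

/* Two modules can share a trace buffer if their layouts are identical. */
int same_layout(const DemoTraceReq *a, const DemoTraceReq *b)
{
    return a->sample_type == b->sample_type &&
           a->nsamples == b->nsamples &&
           a->trace_header == b->trace_header;
}

/* Count the buffers needed if each module reuses its predecessor's
   buffer whenever the layout is unchanged. */
int buffers_needed(const DemoTraceReq *req, int nmods)
{
    int n = nmods > 0 ? 1 : 0;
    int i;

    assert(nmods == 0 || req != NULL);
    for (i = 1; i < nmods; i++)
        if (!same_layout(&req[i - 1], &req[i]))
            n++;
    return n;
}
```

Sharing is safe here only because modules run one after another on the same trace. It does not address the second difficulty mentioned above, where a module needs native floating-point samples temporarily but hands the trace on in its original type.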

A second problem arises from the fact that any module sequence may be specified. The problem is: when does tito stop? Some modules (such as "in") have logical stopping points while others (such as "stats") do not. If exactly one module has a stopping point, all is fine and dandy, but what if there are no modules with stopping points? What if there is more than one? Does it stop when any module stops, or when the first module in the list stops, or what? Currently, if the user does not include a module with a stopping point, the job simply will go on forever (or more accurately, until an error causes it to crash). If there is more than one stopping module, the job stops when any module stops. Ideally, tito should detect and reject infinite jobs, and should have clear reasons for choosing one stopping policy over another.

The GU library has varying degrees of complexity. The "device" level as represented by the GuDevice object is the lowest level, and handles local and remote tape and file I/O. A second level, the "survey" level represented by GuSurvey, handles quality control, statistics and some format handling. TITO can be considered a third level, providing job control.

How much overhead do the various layers add?

The overhead can be examined by running four commands. The "cp" program is built on UNIX system commands and only copies disk files. The "gudcp" program is built on the GU device layer and only copies local disk files. The "cpsegy" program is built on the GU survey layer and can copy surveys of any type. Finally, "tito" is also built on the GU survey layer but adds job control.

The following timings were made on May 5, 1994, during normal business hours, on GTRI's Silicon Graphics Challenge workstation. A large (270MB) file was copied from and to a local Seagate Elite disk drive. Another user was using the same disk at the same time. Each program was run three times. The results shown are an average of the three runs. The largest variation between runs of the same program was 15 elapsed seconds. Given that variation, the results show that while the overhead is identifiable in CPU time, it has an insignificant effect on elapsed time.

	|--------------------------------------------------|
	| Program | CPU seconds | Elapsed seconds | MB/sec |
	|---------+-------------+-----------------+--------|
	| cp      |     27.1    |      191.6      |  1.4   |
	|---------+-------------+-----------------+--------|
	| gudcp   |     31.7    |      183.7      |  1.5   |
	|---------+-------------+-----------------+--------|
	| cpsegy  |     32.5    |      188.6      |  1.4   |
	|---------+-------------+-----------------+--------|
	| tito    |     38.0    |      184.6      |  1.5   |
	|--------------------------------------------------|