Walter Kessinger: The SEG-Y Format

The SEG-Y Format for Geophysical Data

This page is terribly out-of-date. It refers to the original SEG-Y specification from 1975.

The SEG-Y format was updated in 2002. As of this writing, the revised SEG-Y standard is available on the SEG web site as SEG-Y rev. 1.

The SEG site has an index of various technical standards for recording and archiving geophysical data.

Written by Ken Gaillot Jr.
Last updated 10 June 1994
Based on Digital Tape Standards published by the Society of Exploration Geophysicists (SEG)

SEG-Y

The SEG-Y format is one of several tape standards developed by the Society of Exploration Geophysicists (SEG). It is the most common format used for seismic data in the exploration and production industry. However, it was created in 1973 (the specification was published in 1975) and many different 'modernized' flavors exist.

SEG-Y was designed for storing a single line of seismic data on IBM 9-track tapes attached to IBM mainframe computers. Most of the differences among modern SEG-Y flavors result from attempts to overcome the limitations of that design.

Some of the features of SEG-Y which are outdated today include:

EBCDIC descriptive header (rather than the now-standard ASCII)
IBM floating-point data (rather than the now-standard IEEE)
single line storage (rather than the now-common 3D surveys)

The official standard SEG-Y consists of the following components:

a 3200-byte EBCDIC descriptive reel header record
a 400-byte binary reel header record
trace records consisting of
- a 240-byte binary trace header
- trace data

As mentioned earlier there are many variations of the standard.
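The component sizes listed above are enough to work out where each trace begins. The following is a minimal sketch, assuming a plain byte stream with no tape record marks and traces that all have the same length (the standard does not guarantee either); the function name and the 4-byte default sample size are choices made for illustration.

```python
REEL_TEXT_BYTES = 3200    # EBCDIC descriptive reel header
REEL_BINARY_BYTES = 400   # binary reel header
TRACE_HEADER_BYTES = 240  # binary trace header


def trace_offset(index, samples_per_trace, bytes_per_sample=4):
    """Return the byte offset of the header of trace `index` (0-based).

    Assumption: every trace holds the same number of samples.
    """
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    return REEL_TEXT_BYTES + REEL_BINARY_BYTES + index * trace_bytes
```

With 1000 four-byte samples per trace, each trace record occupies 240 + 4000 = 4240 bytes, so the first trace starts at byte offset 3600 and the second at 7840.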

The SEG-Y EBCDIC Reel Header

The EBCDIC reel header is equivalent to 40 IBM punch-cards (EBCDIC? punchcards? Welcome to the 70's, man!). The official layout of these 80-character cards is the EBCDIC equivalent of the following:

12345678901234567890123456789012345678901234567890123456789012345678901234567890
C 1 CLIENT                        COMPANY                       CREW NO
C 2 LINE            AREA                        MAP ID
C 3 REEL NO           DAY-START OF REEL     YEAR      OBSERVER
C 4 INSTRUMENT: MFG            MODEL            SERIAL NO
C 5 DATA TRACES/RECORD        AUXILIARY TRACES/RECORD         CDP FOLD
C 6 SAMPLE INTERVAL         SAMPLES/TRACE       BITS/IN      BYTES/SAMPLE
C 7 RECORDING FORMAT        FORMAT THIS REEL        MEASUREMENT SYSTEM
C 8 SAMPLE CODE: FLOATING PT     FIXED PT     FIXED PT-GAIN     CORRELATED
C 9 GAIN  TYPE: FIXED     BINARY     FLOATING POINT     OTHER
C10 FILTERS: ALIAS     HZ  NOTCH     HZ  BAND     -     HZ  SLOPE    -    DB/OCT
C11 SOURCE: TYPE            NUMBER/POINT        POINT INTERVAL
C12     PATTERN:                           LENGTH        WIDTH
C13 SWEEP: START     HZ  END     HZ  LENGTH      MS  CHANNEL NO     TYPE
C14 TAPER: START LENGTH       MS  END LENGTH       MS  TYPE
C15 SPREAD: OFFSET        MAX DISTANCE        GROUP INTERVAL
C16 GEOPHONES: PER GROUP     SPACING     FREQUENCY     MFG          MODEL
C17     PATTERN:                           LENGTH        WIDTH
C18 TRACES SORTED BY: RECORD     CDP     OTHER
C19 AMPLITUDE RECOVERY: NONE      SPHERICAL DIV       AGC    OTHER
C20 MAP PROJECTION                      ZONE ID       COORDINATE UNITS
C21 PROCESSING:
C22 PROCESSING:
C23
C24
C25
C26
C27
C28
C29
C30
C31
C32
C33
C34
C35
C36
C37
C38
C39
C40 END EBCDIC

The blank spaces in the cards are fill-in-the-blanks. For example, the client's name is intended to go in the space after 'CLIENT' in the first card. Multiple-choice entries like 'SAMPLE CODE' in card 8 are intended to have the appropriate choice (such as 'FLOATING PT') marked with an 'X'.

Cards 21 through 40 are intended for general descriptions such as the data set's processing history.
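For readers who want to look at the cards on a modern machine, here is a minimal sketch that splits the 3200-byte header into its 40 cards. It assumes Python's "cp037" codec is an acceptable EBCDIC code page for the data at hand (other EBCDIC variants exist), and the function name is made up for this example.

```python
def decode_cards(header, encoding="cp037"):
    """Split a 3200-byte reel header into 40 strings of 80 characters.

    `encoding` defaults to cp037, one common EBCDIC code page; this is
    an assumption, not something the standard spells out.
    """
    if len(header) != 3200:
        raise ValueError("expected 3200 bytes, got %d" % len(header))
    text = header.decode(encoding)
    return [text[i:i + 80] for i in range(0, 3200, 80)]
```

Printing each returned string on its own line reproduces the card layout shown above.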

The SEG-Y Binary Reel Header

The binary reel header contains a good deal of information about the data. All of it is optional: no field is required to be valid, although some fields are strongly recommended.

The 400 bytes contain 2-byte and 4-byte integers in the following layout:

Bytes         Description

001 - 004     Job identification number.
005 - 008  *  Line number.
009 - 012  *  Reel number.
013 - 014  *  Number of data traces per record.
015 - 016  *  Number of auxiliary traces per record.
017 - 018  *  Sample interval of this reel's data in microseconds.
019 - 020     Sample interval of original field recording in microseconds.
021 - 022  *  Number of samples per trace for this reel's data.
023 - 024     Number of samples per trace in original field recording.
025 - 026  *  Data sample format code:
                  1 = 32-bit IBM floating point
                  2 = 32-bit fixed-point (integer)
                  3 = 16-bit fixed-point (integer)
                  4 = 32-bit fixed-point with gain code (integer)
027 - 028  *  CDP fold (expected number of data traces per ensemble).
029 - 030     Trace sorting code:
                  1 = as recorded
                  2 = CDP ensemble
                  3 = single fold continuous profile
                  4 = horizontally stacked
031 - 032     Vertical sum code (1 = no sum, 2 = two sum, ...)
033 - 034     Sweep frequency at start in Hertz.
035 - 036     Sweep frequency at end in Hertz.
037 - 038     Sweep length in milliseconds.
039 - 040     Sweep type code:
                  1 = linear
                  2 = parabolic
                  3 = exponential
                  4 = other
041 - 042     Trace number of sweep channel.
043 - 044     Sweep trace taper length at start in milliseconds.
045 - 046     Sweep trace taper length at end in milliseconds.
047 - 048     Taper type code:
                  1 = linear
                  2 = cosine squared
                  3 = other
049 - 050     Correlated data traces (1 = no, 2 = yes).
051 - 052     Binary gain recovered (1 = yes, 2 = no).
053 - 054     Amplitude recovery method code:
                  1 = none
                  2 = spherical divergence
                  3 = AGC
                  4 = other
055 - 056  *  Measurement system (1 = meters, 2 = feet).
057 - 058     Impulse signal polarity (increase in pressure or upward
              geophone case movement gives 1=negative or 2=positive number).
059 - 060     Vibratory polarity code (seismic lags pilot signal by):
                  1 = 337.5 to 22.5 degrees
                  2 = 22.5 to 67.5 degrees
                  3 = 67.5 to 112.5 degrees
                  4 = 112.5 to 157.5 degrees
                  5 = 157.5 to 202.5 degrees
                  6 = 202.5 to 247.5 degrees
                  7 = 247.5 to 292.5 degrees
                  8 = 292.5 to 337.5 degrees
061 - 400     Unassigned (for optional information).

* strongly recommended
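The table above translates fairly directly into code. The sketch below pulls out only the strongly recommended fields. It assumes the integers are big-endian and signed, which matches the format's IBM mainframe origins but is not stated in the table; the field names are invented for this example.

```python
import struct

# (name, first byte as numbered in the table, struct code)
# "h" is a 2-byte signed integer, "i" is a 4-byte signed integer.
BINARY_HEADER_FIELDS = [
    ("line_number", 5, "i"),
    ("reel_number", 9, "i"),
    ("traces_per_record", 13, "h"),
    ("aux_traces_per_record", 15, "h"),
    ("sample_interval_us", 17, "h"),
    ("samples_per_trace", 21, "h"),
    ("format_code", 25, "h"),
    ("cdp_fold", 27, "h"),
    ("measurement_system", 55, "h"),
]


def parse_binary_header(header):
    """Return a dict of the strongly recommended binary reel header fields."""
    if len(header) != 400:
        raise ValueError("expected 400 bytes, got %d" % len(header))
    values = {}
    for name, first_byte, code in BINARY_HEADER_FIELDS:
        # The table counts bytes from 1; Python offsets count from 0.
        # ">" selects big-endian byte order (an assumption, see above).
        (values[name],) = struct.unpack_from(">" + code, header, first_byte - 1)
    return values
```

Since none of these fields is required to be valid, a caller would normally sanity-check the result (for example, a zero sample count) before trusting it.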

The SEG-Y Trace Header

The 240-byte binary trace header consists of 2-byte and 4-byte integers in the following layout:

Bytes         Description

001 - 004  *  Trace sequence number within line.
005 - 008     Trace sequence number within reel.
009 - 012  *  Original field record number.
013 - 016  *  Trace sequence number within original field record.
017 - 020     Energy source point number.
021 - 024     CDP ensemble number.
025 - 028     Trace sequence number within CDP ensemble.
029 - 030  *  Trace identification code:
                  1 = seismic data
                  2 = dead
                  3 = dummy
                  4 = time break
                  5 = uphole
                  6 = sweep
                  7 = timing
                  8 = water break
                  9+ = optional use
031 - 032     Number of vertically summed traces yielding this trace.
033 - 034     Number of horizontally stacked traces yielding this trace.
035 - 036     Data use (1 = production, 2 = test).
037 - 040     Distance from source point to receiver group.
041 - 044     Receiver group elevation.
045 - 048     Surface elevation at source.
049 - 052     Source depth below surface.
053 - 056     Datum elevation at receiver group.
057 - 060     Datum elevation at source.
061 - 064     Water depth at source.
065 - 068     Water depth at receiver group.
069 - 070     Scalar for elevations and depths (+ = multiplier, - = divisor).
071 - 072     Scalar for coordinates (+ = multiplier, - = divisor).
073 - 076     X source coordinate.
077 - 080     Y source coordinate.
081 - 084     X receiver group coordinate.
085 - 088     Y receiver group coordinate.
089 - 090     Coordinate units (1 = length in meters or feet, 2 = arc seconds).
091 - 092     Weathering velocity.
093 - 094     Subweathering velocity.
095 - 096     Uphole time at source.
097 - 098     Uphole time at receiver group.
099 - 100     Source static correction.
101 - 102     Receiver group static correction.
103 - 104     Total static applied.
105 - 106     Lag time between end of header and time break in milliseconds.
107 - 108     Lag time between time break and shot in milliseconds.
109 - 110     Lag time between shot and recording start in milliseconds.
111 - 112     Start of mute time.
113 - 114     End of mute time.
115 - 116  *  Number of samples in this trace.
117 - 118  *  Sample interval of this trace in microseconds.
119 - 120     Field instrument gain type code:
                  1 = fixed
                  2 = binary
                  3 = floating point
                  4+ = optional use
121 - 122     Instrument gain constant.
123 - 124     Instrument early gain in decibels.
125 - 126     Correlated (1 = no, 2 = yes).
127 - 128     Sweep frequency at start.
129 - 130     Sweep frequency at end.
131 - 132     Sweep length in milliseconds.
133 - 134     Sweep type code:
                  1 = linear
                  2 = parabolic
                  3 = exponential
                  4 = other
135 - 136     Sweep trace taper length at start in milliseconds.
137 - 138     Sweep trace taper length at end in milliseconds.
139 - 140     Taper type code:
                  1 = linear
                  2 = cosine squared
                  3 = other
141 - 142     Alias filter frequency.
143 - 144     Alias filter slope.
145 - 146     Notch filter frequency.
147 - 148     Notch filter slope.
149 - 150     Low cut frequency.
151 - 152     High cut frequency.
153 - 154     Low cut slope.
155 - 156     High cut slope.
157 - 158     Year data recorded.
159 - 160     Day of year.
161 - 162     Hour of day (24-hour clock).
163 - 164     Minute of hour.
165 - 166     Second of minute.
167 - 168     Time basis (1 = local, 2 = GMT, 3 = other).
169 - 170     Trace weighting factor for fixed-point format data.
171 - 172     Geophone group number of roll switch position one.
173 - 174     Geophone group number of first trace of original field record.
175 - 176     Geophone group number of last trace of original field record.
177 - 178     Gap size (total number of groups dropped).
179 - 180     Overtravel associated with taper (1 = down/behind, 2 = up/ahead).
181 - 240     Unassigned (for optional information).

* strongly recommended
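The scalar fields in bytes 069 - 072 are easy to get wrong, so here is a minimal sketch of how the coordinate scalar might be applied to the source coordinates. It assumes big-endian signed integers, and it treats a scalar of zero as "leave the value alone", which is a guess since the table does not cover that case. The function names are made up for this example.

```python
import struct


def apply_scalar(value, scalar):
    """Apply a SEG-Y scalar: positive multiplies, negative divides."""
    if scalar > 0:
        return value * scalar
    if scalar < 0:
        return value / -scalar
    return value  # assumption: zero means no scaling


def source_xy(trace_header):
    """Return the scaled (x, y) source coordinates from a 240-byte header."""
    if len(trace_header) != 240:
        raise ValueError("expected 240 bytes, got %d" % len(trace_header))
    (scalar,) = struct.unpack_from(">h", trace_header, 70)  # bytes 071 - 072
    x, y = struct.unpack_from(">ii", trace_header, 72)      # bytes 073 - 080
    return apply_scalar(x, scalar), apply_scalar(y, scalar)
```

A scalar of -100, for instance, lets the 4-byte integer 45678912 stand for the coordinate 456789.12.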

The SEG-Y Trace Data

Seismic data is acquired by generating a loud sound at one location and recording the resulting rumblings at another location.

The source or shot which generates the sound is typically an explosion or vibration at the Earth's surface (land or sea). Each shot is recorded by many receivers. Generally a line of shots is fired. If one line is recorded, the data is a 2D survey, and if more than one line is recorded, the data is a 3D survey.

The object of recording is to infer geological subsurface structure from the strength (amplitude) of the recorded signal at different times in the recording.

A trace begins life as the recording from one receiver. The recording is sampled at some discrete interval, typically around 4 milliseconds, and lasts for some duration, typically 4 or more seconds. After the initial recording, the traces are processed in any number of ways. This processing usually changes the absolute amplitudes such that amplitude units are irrelevant, and only relative amplitudes are significant. Also, the trace may reflect a logical ordering different from the original (shot, receiver) pair.

But in the end, seismic data is almost always stored as a sequence of traces, each trace consisting of amplitude samples for one location (physical or logical).
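The sampling described above maps onto two header values: the sample interval in microseconds and the number of samples in the trace. As a small sketch (the function name and the start_ms parameter are made up for illustration), the time of each sample can be rebuilt like this:

```python
def sample_times_ms(num_samples, interval_us, start_ms=0.0):
    """Return the time in milliseconds of each sample in a trace.

    `interval_us` is in microseconds, as stored in the headers.
    `start_ms` is a hypothetical recording start time.
    """
    return [start_ms + i * interval_us / 1000.0 for i in range(num_samples)]
```

A 4-second trace sampled every 4 milliseconds has an interval of 4000 microseconds and, counting both the first and last instants, 1001 samples running from 0 to 4000 ms.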

SEG-Y Variations

Many variations of SEG-Y exist, most created to overcome SEG-Y's limitations.

The EBCDIC reel header is usually ignored completely. When it is used, it may or may not follow the standard template, and it may even be in ASCII.
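Because the descriptive header may turn up in either character set, a reader has to guess. One rough heuristic, offered here as a sketch and not as anything from the standard, is to count blanks: the space character is byte 0x40 in EBCDIC and 0x20 in ASCII, and a card image is mostly spaces. The function name is invented for this example.

```python
def guess_text_encoding(header):
    """Guess whether a descriptive header is EBCDIC or ASCII.

    Returns "cp037" (one EBCDIC code page) or "ascii". Heuristic only:
    whichever blank character is more common wins, and ties go to ASCII.
    """
    ebcdic_blanks = header.count(b"\x40")  # space in EBCDIC, "@" in ASCII
    ascii_blanks = header.count(b"\x20")   # space in ASCII
    return "cp037" if ebcdic_blanks > ascii_blanks else "ascii"
```

The returned name can be passed straight to bytes.decode(). A header filled with binary junk will fool this test, so it is best treated as a default that the user can override.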

The binary reel header is almost completely ignored. None of the fields should be assumed to be correct, although the number of samples per trace and the sample interval usually are. Often, programs that use the SEG-Y format will read values from the binary header by default but allow the user to override the header values.
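The "header by default, user override on request" behavior can be sketched in a few lines. The function and key names here are hypothetical, and the check for positive values is just one plausible sanity test.

```python
def resolve_parameters(header_values, overrides=None):
    """Merge header values with user overrides; overrides win.

    Both arguments are dicts keyed by hypothetical names such as
    "samples_per_trace" and "sample_interval_us".
    """
    merged = dict(header_values)
    merged.update(overrides or {})
    for name in ("samples_per_trace", "sample_interval_us"):
        if merged.get(name, 0) <= 0:
            raise ValueError("no usable value for %s" % name)
    return merged
```

If the header claims zero samples per trace, for example, the call fails unless the user supplies the real figure as an override.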

The trace header contains important information but not always in the locations specified by the standard. To adapt SEG-Y for 3D surveys, a line number field is often added somewhere in the trace header. Programs that use values from the SEG-Y trace header usually allow the user to specify the byte location and length of the values.
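Letting the user name the byte location and length usually comes down to a helper like the following sketch. It assumes big-endian signed integers and uses the same 1-based byte numbering as the tables on this page; the function name is made up.

```python
def read_header_value(trace_header, first_byte, length):
    """Read a 2- or 4-byte integer starting at 1-based byte `first_byte`."""
    if length not in (2, 4):
        raise ValueError("length must be 2 or 4")
    start = first_byte - 1
    field = trace_header[start:start + length]
    if len(field) != length:
        raise ValueError("field runs past the end of the header")
    # Big-endian and signed are assumptions, not stated in the tables.
    return int.from_bytes(field, byteorder="big", signed=True)
```

A 3D line number stashed in the unassigned area could then be fetched with, say, read_header_value(header, 189, 4), where byte 189 is an arbitrary location picked for illustration.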

The trace data is most often in 32-bit IBM floating point format. Occasionally 32-bit IEEE floating-point format is used.
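Since most modern machines use IEEE arithmetic, IBM samples have to be converted on the way in. An IBM 32-bit float has a sign bit, a 7-bit exponent that is a power of 16 biased by 64, and a 24-bit fraction with no hidden bit. The sketch below converts one big-endian sample at a time; it is written for clarity, not speed, and the function name is invented for this example.

```python
import struct


def ibm32_to_float(four_bytes):
    """Convert one big-endian 32-bit IBM floating-point value to a float."""
    (word,) = struct.unpack(">I", four_bytes)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F   # power of 16, biased by 64
    fraction = word & 0x00FFFFFF     # 24-bit fraction, no hidden bit
    return sign * (fraction / 2.0 ** 24) * 16.0 ** (exponent - 64)
```

As a check, the bytes 41 10 00 00 (hex) have exponent 65 and fraction 1/16, giving (1/16) * 16 = 1.0.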

Although the standard applies only to tapes, SEG-Y has been adapted for storing surveys on disk as well. Disk files have no record marks or file marks, so traditional methods of reading from tapes don't work with files. There are several SEG-Y disk adaptations: a binary file with 3200-byte and 400-byte headers followed by traces, a binary file with just traces, and Fortran sequential-access files which have 3200-byte and 400-byte headers followed by traces but with Fortran record marks separating them. A similar and common format is a flat file of trace data with no reel or trace headers.
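For the two plain binary adaptations (with or without the reel headers), the file size alone says how many traces are present, provided all traces have the same length. A minimal sketch, with a made-up function name and a 4-byte default sample size:

```python
def count_traces(file_size, samples_per_trace, bytes_per_sample=4,
                 has_reel_headers=True):
    """Return the number of fixed-length traces in a SEG-Y disk file.

    Raises ValueError if the size does not divide evenly, which may
    point to variable-length traces or Fortran record marks.
    """
    header_bytes = 3200 + 400 if has_reel_headers else 0
    trace_bytes = 240 + samples_per_trace * bytes_per_sample
    if file_size < header_bytes:
        raise ValueError("file is smaller than the reel headers")
    count, leftover = divmod(file_size - header_bytes, trace_bytes)
    if leftover:
        raise ValueError("%d bytes left over after %d traces" % (leftover, count))
    return count
```

The file size would typically come from os.path.getsize(). This does not handle the Fortran sequential-access variant, whose record marks add extra bytes around each record.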