2. Detailed specification

  2.1. Overall conventions

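  In the diagrams below, a box like this:

  +---+
  |   | <-- the vertical bars might be missing
  +---+

  represents one byte; a box like this:

  +==============+
  |              |
  +==============+

  represents a variable number of bytes.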

  Bytes stored within a computer do not have a "bit order", since
  they are always treated as a unit.  However, a byte considered as
  an integer between 0 and 255 does have a most- and least-
  significant bit, and since we write numbers with the most-
  significant digit on the left, we also write bytes with the most-
  significant bit on the left.  In the diagrams below, we number the
  bits of a byte so that bit 0 is the least-significant bit, i.e.,
  the bits are numbered:

  +--------+
  |76543210|
  +--------+

  This document does not address the issue of the order in which
  bits of a byte are transmitted on a bit-sequential medium, since
  the data format described here is byte- rather than bit-oriented.

  Within a computer, a number may occupy multiple bytes.  All
  multi-byte numbers in the format described here are stored with
  the least-significant byte first (at the lower memory address).
  For example, the decimal number 520 is stored as:

      0        1
  +--------+--------+
  |00001000|00000010|
  +--------+--------+
   ^        ^
   |        |
   |        + more significant byte = 2 x 256
   + less significant byte = 8
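
  As a minimal sketch of this byte order, the 520 example can be
  reproduced with Python's standard struct module.  The variable
  names are illustrative only and are not part of the format.

```python
import struct

# "<H" means little-endian unsigned 16-bit, matching the convention above.
encoded = struct.pack("<H", 520)
print(encoded.hex())  # 0802: the byte at offset 0 is 8, the byte at offset 1 is 2

# Decoding reverses the process: 8 + 2 * 256 == 520.
(decoded,) = struct.unpack("<H", encoded)
print(decoded)  # 520
```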

  2.2. File format

  A gzip file consists of a series of "members" (compressed data
  sets).  The format of each member is specified in the following
  section.  The members simply appear one after another in the file,
  with no additional information before, between, or after them.
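
  One way to see this in practice, assuming Python's standard gzip
  module as the reader, is to concatenate two independently
  compressed members and decompress the result in a single call.

```python
import gzip

first = gzip.compress(b"hello, ")
second = gzip.compress(b"world")

# Two members placed back to back, with nothing before, between, or after.
combined = first + second

# gzip.decompress reads every member in turn and joins the results.
result = gzip.decompress(combined)
print(result)  # b'hello, world'
```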

  2.3. Member format
  Each member has the following structure:

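  +---+---+---+---+---+---+---+---+---+---+
  |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
  +---+---+---+---+---+---+---+---+---+---+

  (if FLG.FEXTRA set)

  +---+---+=================================+
  | XLEN  |...XLEN bytes of "extra field"...| (more-->)
  +---+---+=================================+

  (if FLG.FNAME set)

  +=========================================+
  |...original file name, zero-terminated...| (more-->)
  +=========================================+

  (if FLG.FCOMMENT set)

  +===================================+
  |...file comment, zero-terminated...| (more-->)
  +===================================+

  (if FLG.FHCRC set)

  +---+---+
  | CRC16 |
  +---+---+

  +=======================+
  |...compressed blocks...| (more-->)
  +=======================+

    0   1   2   3   4   5   6   7
  +---+---+---+---+---+---+---+---+
  |     CRC32     |     ISIZE     |
  +---+---+---+---+---+---+---+---+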
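
  The fixed part of the header is ten bytes long: ID1, ID2, CM, FLG,
  a four-byte MTIME, XFL, and OS.  The following is a rough sketch
  of unpacking it with Python's struct module; the helper name
  parse_fixed_header and the dictionary layout are hypothetical,
  chosen only for illustration.

```python
import gzip
import struct


def parse_fixed_header(data: bytes) -> dict:
    """Unpack the ten fixed bytes at the start of a gzip member."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # "<" = little-endian; B = one byte, I = four bytes (MTIME).
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    return {"ID1": id1, "ID2": id2, "CM": cm, "FLG": flg,
            "MTIME": mtime, "XFL": xfl, "OS": os_code}


# Sample input produced by the standard library, with MTIME forced to 0.
sample = gzip.compress(b"example data", mtime=0)
header = parse_fixed_header(sample)
print(header)
```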

  2.3.1. Member header and trailer
  ID1 (IDentification 1) 
  ID2 (IDentification 2)
  These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
  (0x8b, \213), to identify the file as being in gzip format.

  CM (Compression Method)
  This identifies the compression method used in the file.  CM
  = 0-7 are reserved.  CM = 8 denotes the "deflate"
  compression method, which is the one customarily used by
  gzip and which is documented elsewhere.

  FLG (FLaGs)
  This flag byte is divided into individual bits as follows:

  bit 0  FTEXT
  bit 1  FHCRC
  bit 2  FEXTRA
  bit 3  FNAME
  bit 4  FCOMMENT
  bit 5  reserved
  bit 6  reserved
  bit 7  reserved

  If FTEXT is set, the file is probably ASCII text.  This is
  an optional indication, which the compressor may set by
  checking a small amount of the input data to see whether any
  non-ASCII characters are present.  In case of doubt, FTEXT
  is cleared, indicating binary data. For systems which have
  different file formats for ASCII text and binary data, the
  decompressor can use FTEXT to choose the appropriate format.
  We deliberately do not specify the algorithm used to set
  this bit, since a compressor always has the option of
  leaving it cleared and a decompressor always has the option
  of ignoring it and letting some other program handle issues
  of data conversion.

  If FHCRC is set, a CRC16 for the gzip header is present,
  immediately before the compressed data. The CRC16 consists
  of the two least significant bytes of the CRC32 for all
  bytes of the gzip header up to and not including the CRC16.
  [The FHCRC bit was never set by versions of gzip up to
  1.2.4, even though it was documented with a different
  meaning in gzip 1.2.4.]
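
  The CRC16 computation described above can be sketched as follows,
  assuming zlib.crc32 as the CRC-32 implementation.  The helper name
  header_crc16 and the sample header bytes are made up for the
  example.

```python
import zlib


def header_crc16(header_bytes: bytes) -> int:
    """Return the two least-significant bytes of the CRC-32 of the header."""
    return zlib.crc32(header_bytes) & 0xFFFF


# A hypothetical ten-byte header with only FHCRC (bit 1) set in FLG.
header = bytes([31, 139, 8, 0x02, 0, 0, 0, 0, 0, 255])
crc16 = header_crc16(header)

# Like every multi-byte number in the format, it is stored
# least-significant byte first.
stored = crc16.to_bytes(2, "little")
```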

  If FEXTRA is set, optional extra fields are present, as
  described in a following section.

  If FNAME is set, an original file name is present,
  terminated by a zero byte.  The name must consist of ISO
  8859-1 (LATIN-1) characters; on operating systems using
  EBCDIC or any other character set for file names, the name
  must be translated to the ISO LATIN-1 character set.  This
  is the original name of the file being compressed, with any
  directory components removed, and, if the file being
  compressed is on a file system with case insensitive names,
  forced to lower case. There is no original file name if the
  data was compressed from a source other than a named file;
  for example, if the source was stdin on a Unix system, there
  is no file name.

  If FCOMMENT is set, a zero-terminated file comment is
  present.  This comment is not interpreted; it is only
  intended for human consumption.  The comment must consist of
  ISO 8859-1 (LATIN-1) characters.  Line breaks should be
  denoted by a single line feed character (10 decimal).

  Reserved FLG bits must be zero.
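
  A small sketch of decoding the FLG byte, including the check on
  reserved bits, might look like this.  The mask constants follow
  the bit numbers listed above; the function name decode_flags is
  hypothetical.

```python
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10
RESERVED = 0xE0  # bits 5, 6 and 7


def decode_flags(flg: int) -> set:
    """Return the names of the flags set in FLG, rejecting reserved bits."""
    if flg & RESERVED:
        raise ValueError("reserved FLG bits must be zero")
    masks = {"FTEXT": FTEXT, "FHCRC": FHCRC, "FEXTRA": FEXTRA,
             "FNAME": FNAME, "FCOMMENT": FCOMMENT}
    return {name for name, mask in masks.items() if flg & mask}


print(sorted(decode_flags(0x0C)))  # ['FEXTRA', 'FNAME']
```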

  MTIME (Modification TIME)
  This gives the most recent modification time of the original
  file being compressed.  The time is in Unix format, i.e.,
  seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
  may cause problems for MS-DOS and other systems that use
  local rather than Universal time.)  If the compressed data
  did not come from a file, MTIME is set to the time at which
  compression started.  MTIME = 0 means no time stamp is
  available.
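
  As an illustration, MTIME can be turned into a calendar time with
  Python's datetime module.  The helper name mtime_to_datetime is
  hypothetical, and returning None for MTIME = 0 is a choice made
  for this sketch rather than something the format prescribes.

```python
from datetime import datetime, timezone


def mtime_to_datetime(mtime: int):
    """Return a UTC datetime, or None when MTIME is 0 (no time stamp)."""
    if mtime == 0:
        return None
    return datetime.fromtimestamp(mtime, tz=timezone.utc)


print(mtime_to_datetime(86400))  # 1970-01-02 00:00:00+00:00
```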

  XFL (eXtra FLags)
  These flags are available for use by specific compression
  methods.  The "deflate" method (CM = 8) sets these flags as
  follows:

  XFL = 2 - compressor used maximum compression,
            slowest algorithm
  XFL = 4 - compressor used fastest algorithm

  OS (Operating System)
  This identifies the type of file system on which compression
  took place.  This may be useful in determining end-of-line
  convention for text files.  The currently defined values are
  as follows:

  0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
  1 - Amiga
  2 - VMS (or OpenVMS)
  3 - Unix
  4 - VM/CMS
  5 - Atari TOS
  6 - HPFS filesystem (OS/2, NT)
  7 - Macintosh
  8 - Z-System
  9 - CP/M
  10 - TOPS-20
  11 - NTFS filesystem (NT)
  12 - QDOS
  13 - Acorn RISCOS
  255 - unknown

  XLEN (eXtra LENgth)
  If FLG.FEXTRA is set, this gives the length of the optional
  extra field.  See below for details.

  CRC32 (CRC-32)
  This contains a Cyclic Redundancy Check value of the
  uncompressed data computed according to the CRC-32 algorithm
  used in the ISO 3309 standard and in ITU-T recommendation
  V.42.
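
  A hedged sketch of checking the trailer follows.  It assumes that
  zlib.crc32 computes this CRC-32, and it also checks ISIZE, the
  other trailer field, described next.  The function name
  check_trailer is hypothetical.

```python
import gzip
import struct
import zlib


def check_trailer(member: bytes, uncompressed: bytes) -> bool:
    """Compare the last eight bytes of a member with values computed from the data."""
    crc32, isize = struct.unpack("<II", member[-8:])
    return (crc32 == zlib.crc32(uncompressed)
            and isize == len(uncompressed) % 2**32)


data = b"The quick brown fox"
member = gzip.compress(data)
print(check_trailer(member, data))  # True
```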

  ISIZE (Input SIZE)
  This contains the size of the original (uncompressed) input
  data modulo 2^32.

  2.3.1.1. Extra field

  If the FLG.FEXTRA bit is set, an "extra field" is present in
  the header, with total length XLEN bytes.  It consists of a
  series of subfields, each of the form:

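  +---+---+---+---+==================================+
  |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
  +---+---+---+---+==================================+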

  SI1 and SI2 provide a subfield ID, typically two ASCII letters
  with some mnemonic value.  Jean-Loup Gailly is maintaining a
  registry of subfield IDs; please send him any subfield ID you
  wish to use.  Subfield IDs with SI2 = 0 are reserved for future
  use.  The following IDs are currently defined:

  SI1         SI2         Data
  ----------  ----------  ----
  0x41 ('A')  0x70 ('P')  Apollo file type information
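
  Walking the subfields could be sketched like this, treating LEN as
  a two-byte little-endian count of the data bytes that follow the
  four-byte subfield header.  The function name parse_extra_field
  and the "Zz" subfield ID are invented for the example and are not
  registered IDs.

```python
import struct


def parse_extra_field(extra: bytes) -> list:
    """Split the XLEN bytes of an extra field into (SI1, SI2, data) tuples."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4
        if pos + length > len(extra):
            raise ValueError("subfield data runs past the end of the extra field")
        subfields.append((si1, si2, extra[pos:pos + length]))
        pos += length
    return subfields


# Hypothetical extra field holding one made-up subfield "Zz" with 3 data bytes.
extra = b"Zz" + struct.pack("<H", 3) + b"\x01\x02\x03"
print(parse_extra_field(extra))  # [(90, 122, b'\x01\x02\x03')]
```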

  LEN gives the length of the subfield data, excluding the 4
  initial bytes.

  2.3.1.2. Compliance

  A compliant compressor must produce files with correct ID1,
  ID2, CM, CRC32, and ISIZE, but may set all the other fields in
  the fixed-length part of the header to default values (255 for
  OS, 0 for all others).  The compressor must set all reserved
  bits to zero.
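
  A minimal sketch of a compressor along these lines, assuming
  Python's zlib module for the raw deflate stream, is shown below.
  It writes only the mandatory fields and leaves everything else at
  the default values.  The function name minimal_gzip_member is
  hypothetical.

```python
import gzip
import struct
import zlib


def minimal_gzip_member(data: bytes) -> bytes:
    """Build one gzip member using default values for the optional fields."""
    # ID1, ID2, CM = 8, FLG = 0, MTIME = 0, XFL = 0, OS = 255 (unknown).
    header = struct.pack("<BBBBIBB", 31, 139, 8, 0, 0, 0, 255)
    # wbits=-15 asks zlib for a raw deflate stream with no zlib wrapper.
    compressor = zlib.compressobj(wbits=-15)
    body = compressor.compress(data) + compressor.flush()
    trailer = struct.pack("<II", zlib.crc32(data), len(data) % 2**32)
    return header + body + trailer


member = minimal_gzip_member(b"minimal example")
print(gzip.decompress(member))  # b'minimal example'
```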

  A compliant decompressor must check ID1, ID2, and CM, and
  provide an error indication if any of these have incorrect
  values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
  at least so it can skip over the optional fields if they are
  present.  It need not examine any other part of the header or
  trailer; in particular, a decompressor may ignore FTEXT and OS
  and always produce binary output, and still be compliant.  A
  compliant decompressor must give an error indication if any
  reserved bit is non-zero, since such a bit could indicate the
  presence of a new field that would cause subsequent data to be
  interpreted incorrectly.
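
  The decompressor requirements can be sketched as follows for a
  single member held in memory.  This is an illustration under
  simplifying assumptions, not a complete implementation: it
  ignores the trailer and any further members, and the function
  name decompress_member is hypothetical.

```python
import gzip
import struct
import zlib

FHCRC, FEXTRA, FNAME, FCOMMENT, RESERVED = 0x02, 0x04, 0x08, 0x10, 0xE0


def decompress_member(buf: bytes) -> bytes:
    """Check the required header fields, skip the optional ones, inflate."""
    id1, id2, cm, flg = struct.unpack_from("<BBBB", buf, 0)
    if (id1, id2) != (31, 139):
        raise ValueError("not a gzip member")
    if cm != 8:
        raise ValueError("unsupported compression method")
    if flg & RESERVED:
        raise ValueError("reserved FLG bit set")
    pos = 10  # MTIME, XFL and OS are skipped without being examined
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + xlen
    if flg & FNAME:
        pos = buf.index(b"\x00", pos) + 1
    if flg & FCOMMENT:
        pos = buf.index(b"\x00", pos) + 1
    if flg & FHCRC:
        pos += 2
    # A window size of -15 selects a raw deflate stream; the trailer
    # bytes that follow it end up in unused_data and are ignored here.
    inflater = zlib.decompressobj(-15)
    return inflater.decompress(buf[pos:])


print(decompress_member(gzip.compress(b"round trip")))  # b'round trip'
```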

