oasis-root

Compiled tree of Oasis Linux based on own branch at <https://hacktivis.me/git/oasis/> git clone https://anongit.hacktivis.me/git/oasis-root.git

gitformat-chunk.5 (7564B)


  1. '\" t
  2. .\" Title: gitformat-chunk
  3. .\" Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
  4. .\" Generator: DocBook XSL Stylesheets v1.79.2 <http://docbook.sf.net/>
  5. .\" Date: 2025-03-14
  6. .\" Manual: Git Manual
  7. .\" Source: Git 2.49.0
  8. .\" Language: English
  9. .\"
  10. .TH "GITFORMAT\-CHUNK" "5" "2025-03-14" "Git 2\&.49\&.0" "Git Manual"
  11. .\" -----------------------------------------------------------------
  12. .\" * Define some portability stuff
  13. .\" -----------------------------------------------------------------
  14. .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  15. .\" http://bugs.debian.org/507673
  16. .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
  17. .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  18. .ie \n(.g .ds Aq \(aq
  19. .el .ds Aq '
  20. .\" -----------------------------------------------------------------
  21. .\" * set default formatting
  22. .\" -----------------------------------------------------------------
  23. .\" disable hyphenation
  24. .nh
  25. .\" disable justification (adjust text to left margin only)
  26. .ad l
  27. .\" -----------------------------------------------------------------
  28. .\" * MAIN CONTENT STARTS HERE *
  29. .\" -----------------------------------------------------------------
  30. .SH "NAME"
  31. gitformat-chunk \- Chunk\-based file formats
  32. .SH "SYNOPSIS"
  33. .sp
  34. Used by \fBgitformat-commit-graph\fR(5) and the "MIDX" format (see the pack format documentation in \fBgitformat-pack\fR(5))\&.
  35. .SH "DESCRIPTION"
  36. .sp
  37. Some file formats in Git use a common concept of "chunks" to describe sections of the file\&. This allows structured access to a large file by scanning a small "table of contents" for the remaining data\&. This common format is used by the \fBcommit\-graph\fR and \fBmulti\-pack\-index\fR files\&. See the \fBmulti\-pack\-index\fR format in \fBgitformat-pack\fR(5) and the \fBcommit\-graph\fR format in \fBgitformat-commit-graph\fR(5) for how they use the chunks to describe structured data\&.
  38. .sp
  39. A chunk\-based file format begins with some header information custom to that format\&. That header should include enough information to identify the file type, format version, and number of chunks in the file\&. From this information, a reader can determine where the chunk\-based region starts\&.
  40. .sp
  41. The chunk\-based region starts with a table of contents describing where each chunk starts and ends\&. This consists of (C+1) rows of 12 bytes each, where C is the number of chunks\&. Consider the following table:
  42. .sp
  43. .if n \{\
  44. .RS 4
  45. .\}
  46. .nf
  47. | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
  48. |\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|
  49. | ID[0] | OFFSET[0] |
  50. | \&.\&.\&. | \&.\&.\&. |
  51. | ID[C] | OFFSET[C] |
  52. | 0x00000000 | OFFSET[C+1] |
  53. .fi
  54. .if n \{\
  55. .RE
  56. .\}
  57. .sp
  58. Each row consists of a 4\-byte chunk identifier (ID) and an 8\-byte offset\&. Each integer is stored in network\-byte order\&.
  59. .sp
  60. The chunk identifier \fBID\fR[\fBi\fR] is a label for the data stored within this file from \fBOFFSET\fR[\fBi\fR] (inclusive) to \fBOFFSET\fR[\fBi+1\fR] (exclusive)\&. Thus, the size of the \fBi\fRth chunk is equal to the difference between \fBOFFSET\fR[\fBi+1\fR] and \fBOFFSET\fR[\fBi\fR]\&. This requires that the chunk data appears contiguously in the same order as the table of contents\&.
  61. .sp
  62. The final entry in the table of contents must be four zero bytes\&. This confirms that the table of contents is ending and provides the offset for the end of the chunk\-based data\&.
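The table layout described above can be sketched as a small standalone parser. This is only an illustration under stated assumptions, not Git's implementation: the names `toc_entry`, `read_be32`, `read_be64` and `parse_toc` are hypothetical, and the caller is assumed to have already read the chunk count `nr` and the table's starting offset from the format-specific header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

/* Hypothetical result type: one parsed row plus its derived size. */
struct toc_entry {
	uint32_t id;
	uint64_t offset;
	uint64_t size;
};

/* Integers are stored in network byte order (big-endian). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Parse `nr` chunk rows followed by the terminating row, starting at
 * `toc_offset` inside a file image of `file_len` bytes.
 * Returns 0 on success, -1 if the table is malformed.
 */
static int parse_toc(const unsigned char *file, size_t file_len,
		     size_t toc_offset, size_t nr, struct toc_entry *out)
{
	const unsigned char *row;
	size_t i;

	assert(file && out);
	/* There must be room for nr rows plus the terminating row. */
	if (toc_offset > file_len ||
	    (file_len - toc_offset) / TOC_ROW_SIZE <= nr)
		return -1;

	row = file + toc_offset;
	for (i = 0; i < nr; i++, row += TOC_ROW_SIZE) {
		uint64_t start = read_be64(row + 4);
		/* The next row's offset marks the (exclusive) end. */
		uint64_t end = read_be64(row + TOC_ROW_SIZE + 4);

		out[i].id = read_be32(row);
		if (!out[i].id || end < start || end > file_len)
			return -1;
		out[i].offset = start;
		out[i].size = end - start;
	}
	/* `row` now points at the terminator, whose ID must be zero. */
	return read_be32(row) ? -1 : 0;
}
```

The size of each chunk is never stored; the sketch derives it from two adjacent offsets, which is why the terminating row still needs a valid offset.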
  63. .sp
  64. Note: The chunk\-based format expects that the file contains \fIat least\fR a trailing hash after \fBOFFSET\fR[\fBC+1\fR]\&.
  65. .sp
  66. Functions for working with chunk\-based file formats are declared in \fBchunk\-format\&.h\fR\&. Using these functions provides extra checks that assist developers when creating new file formats\&.
  67. .SH "WRITING CHUNK\-BASED FILE FORMATS"
  68. .sp
  69. To write a chunk\-based file format, create a \fBstruct\fR \fBchunkfile\fR by calling \fBinit_chunkfile\fR() with a \fBstruct\fR \fBhashfile\fR pointer\&. The caller is responsible for opening the \fBhashfile\fR and writing header information so the file format is identifiable before the chunk\-based format begins\&.
  70. .sp
  71. Then, call \fBadd_chunk\fR() for each chunk that is intended for writing\&. This populates the \fBchunkfile\fR with information about the order and size of each chunk to write\&. Provide a \fBchunk_write_fn\fR function pointer to perform the write of the chunk data upon request\&.
  72. .sp
  73. Call \fBwrite_chunkfile\fR() to write the table of contents to the \fBhashfile\fR followed by each of the chunks\&. This will verify that each chunk wrote the expected amount of data so the table of contents is correct\&.
  74. .sp
  75. Finally, call \fBfree_chunkfile\fR() to clear the \fBstruct\fR \fBchunkfile\fR data\&. The caller is responsible for finalizing the \fBhashfile\fR by writing the trailing hash and closing the file\&.
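The two-phase flow above (register every chunk first, then emit the table of contents followed by the chunk data) can be sketched without Git's internals. This is a simplified stand-in, not the chunk-format API: `toc_writer`, `queue_chunk` and `flush_chunks` are hypothetical names, chunks are plain byte buffers rather than write callbacks, output goes to a memory buffer instead of a hashfile, and arithmetic overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNKS 8    /* arbitrary limit for this sketch */
#define TOC_ROW_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

struct pending_chunk {
	uint32_t id;
	const void *data;
	size_t size;
};

struct toc_writer {
	struct pending_chunk chunks[MAX_CHUNKS];
	size_t nr;
};

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static void put_be64(unsigned char *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/* Phase 1: record the ID, size and payload of a chunk to write later. */
static int queue_chunk(struct toc_writer *w, uint32_t id,
		       const void *data, size_t size)
{
	if (w->nr == MAX_CHUNKS)
		return -1;
	w->chunks[w->nr].id = id;
	w->chunks[w->nr].data = data;
	w->chunks[w->nr].size = size;
	w->nr++;
	return 0;
}

/*
 * Phase 2: write the table of contents at `pos` (just past the header),
 * then each chunk in the same order. Returns the offset one past the
 * last chunk byte, or 0 if `out` (capacity `cap`) is too small.
 */
static size_t flush_chunks(const struct toc_writer *w, unsigned char *out,
			   size_t cap, size_t pos)
{
	size_t i, total = pos + (w->nr + 1) * TOC_ROW_SIZE;
	uint64_t offset = total; /* first chunk starts right after the table */
	unsigned char *row = out + pos;

	assert(w && out);
	for (i = 0; i < w->nr; i++)
		total += w->chunks[i].size;
	if (total > cap)
		return 0;

	for (i = 0; i < w->nr; i++, row += TOC_ROW_SIZE) {
		put_be32(row, w->chunks[i].id);
		put_be64(row + 4, offset);
		offset += w->chunks[i].size;
	}
	/* Terminating row: zero ID plus the end of the chunk data. */
	put_be32(row, 0);
	put_be64(row + 4, offset);
	row += TOC_ROW_SIZE;

	for (i = 0; i < w->nr; i++) {
		memcpy(row, w->chunks[i].data, w->chunks[i].size);
		row += w->chunks[i].size;
	}
	return total;
}
```

Because all sizes are known before anything is written, every offset in the table can be computed up front. The trailing hash is deliberately left out, matching the text above, which makes finalizing the file the caller's job.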
  76. .SH "READING CHUNK\-BASED FILE FORMATS"
  77. .sp
  78. To read a chunk\-based file format, the file must be opened as a memory\-mapped region\&. The chunk\-format API expects that the entire file is mapped as a contiguous memory region\&.
  79. .sp
  80. Initialize a \fBstruct\fR \fBchunkfile\fR pointer with \fBinit_chunkfile\fR(\fBNULL\fR)\&.
  81. .sp
  82. After reading the header information from the beginning of the file, including the chunk count, call \fBread_table_of_contents\fR() to populate the \fBstruct\fR \fBchunkfile\fR with the list of chunks, their offsets, and their sizes\&.
  83. .sp
  84. Extract the data for each chunk using \fBpair_chunk\fR() or \fBread_chunk\fR():
  85. .sp
  86. .RS 4
  87. .ie n \{\
  88. \h'-04'\(bu\h'+03'\c
  89. .\}
  90. .el \{\
  91. .sp -1
  92. .IP \(bu 2.3
  93. .\}
  94. \fBpair_chunk\fR() assigns a given pointer with the location inside the memory\-mapped file corresponding to that chunk\(cqs offset\&. If the chunk does not exist, then the pointer is not modified\&.
  95. .RE
  96. .sp
  97. .RS 4
  98. .ie n \{\
  99. \h'-04'\(bu\h'+03'\c
  100. .\}
  101. .el \{\
  102. .sp -1
  103. .IP \(bu 2.3
  104. .\}
  105. \fBread_chunk\fR() takes a
  106. \fBchunk_read_fn\fR
  107. function pointer and calls it with the appropriate initial pointer and size information\&. The function is not called if the chunk does not exist\&. Use this method to read chunks if you need to perform immediate parsing or if you need to execute logic based on the size of the chunk\&.
  108. .RE
  109. .sp
  110. After calling these methods, call \fBfree_chunkfile\fR() to clear the \fBstruct\fR \fBchunkfile\fR data\&. This will not close the memory\-mapped region\&. Callers are expected to keep that region mapped for as long as pointers into it are needed\&.
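The difference between the two lookup styles described above can be illustrated with a standalone sketch. It only mirrors the documented behavior and is not Git's code: `chunk_info`, `pair_sketch`, `read_sketch`, `record_size` and `CHUNK_MISSING` are hypothetical names, and the chunk list is assumed to have been filled in from an already parsed table of contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one chunk inside the mapped file. */
struct chunk_info {
	uint32_t id;
	const unsigned char *start; /* points into the mapped region */
	size_t size;
};

/* Callback shape assumed for this sketch: start, size, caller data. */
typedef int (*chunk_visit_fn)(const unsigned char *start, size_t size,
			      void *data);

enum { CHUNK_MISSING = 1 }; /* sketch-only "not found" result */

static const struct chunk_info *lookup(const struct chunk_info *chunks,
				       size_t nr, uint32_t id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (chunks[i].id == id)
			return &chunks[i];
	return NULL;
}

/* Pairing style: hand back a pointer; leave *p alone if not found. */
static int pair_sketch(const struct chunk_info *chunks, size_t nr,
		       uint32_t id, const unsigned char **p, size_t *size)
{
	const struct chunk_info *c = lookup(chunks, nr, id);

	assert(p && size);
	if (!c)
		return CHUNK_MISSING;
	*p = c->start;
	*size = c->size;
	return 0;
}

/* Callback style: run `fn` on the chunk; skip the call if not found. */
static int read_sketch(const struct chunk_info *chunks, size_t nr,
		       uint32_t id, chunk_visit_fn fn, void *data)
{
	const struct chunk_info *c = lookup(chunks, nr, id);

	if (!c)
		return CHUNK_MISSING;
	return fn(c->start, c->size, data);
}

/* Example callback: react to the chunk size immediately. */
static int record_size(const unsigned char *start, size_t size, void *data)
{
	(void)start;
	*(size_t *)data = size;
	return 0;
}
```

The pairing style suits chunks that are read lazily later on, while the callback style fits cases where the size must be validated or the content parsed right away. In both cases the pointers stay valid only while the caller keeps the file mapped.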
  111. .SH "EXAMPLES"
  112. .sp
  113. These file formats use the chunk\-format API, and can be used as examples for future formats:
  114. .sp
  115. .RS 4
  116. .ie n \{\
  117. \h'-04'\(bu\h'+03'\c
  118. .\}
  119. .el \{\
  120. .sp -1
  121. .IP \(bu 2.3
  122. .\}
  123. \fBcommit\-graph:\fR
  124. see
  125. \fBwrite_commit_graph_file\fR() and
  126. \fBparse_commit_graph\fR() in
  127. \fBcommit\-graph\&.c\fR
  128. for how the chunk\-format API is used to write and parse the commit\-graph file format documented in
  129. \fBgitformat-commit-graph\fR(5)\&.
  130. .RE
  131. .sp
  132. .RS 4
  133. .ie n \{\
  134. \h'-04'\(bu\h'+03'\c
  135. .\}
  136. .el \{\
  137. .sp -1
  138. .IP \(bu 2.3
  139. .\}
  140. \fBmulti\-pack\-index:\fR
  141. see
  142. \fBwrite_midx_internal\fR() and
  143. \fBload_multi_pack_index\fR() in
  144. \fBmidx\&.c\fR
  145. for how the chunk\-format API is used to write and parse the multi\-pack\-index file format documented in the multi\-pack\-index file format section of
  146. \fBgitformat-pack\fR(5)\&.
  147. .RE
  148. .SH "GIT"
  149. .sp
  150. Part of the \fBgit\fR(1) suite