
Advanced GIT (Part 1): What is GIT?


GIT is a powerful tool for collaborating with a team and for storing work in a way that lets you replay its history. Sadly, many people interact with it only through GUI tools and never uncover its full potential. As in the famous XKCD comic, they learn only to commit and push, and when something goes wrong they prefer to delete the repository and download a fresh copy instead of solving the problem. That is rarely the best approach.

Figure: XKCD comic on the common usage of git

Requirements

  • A command line that supports UNIX-style commands
  • git newer than 2.0 (I use version 2.34.0)
  • A github.com account

Why use the command line?

Most GUI tools for git are built on top of the command line. Nowadays they have become very rich, but they still lack some of the functionality of the original CLI. For simple tasks this is not a big deal, but for complicated ones the missing functionality becomes critical.

A GUI can also introduce its own layer of abstraction that exists only in that proprietary tool, which ties you to it and limits your freedom to choose other software.

Last but not least, CLI commands can be automated with bash scripts whenever you need, and combining several commands can produce impressive results. With a GUI, you are fully dependent on vendor updates.

What is GIT?

GIT — Distributed Version Control System

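Let's unpack this. Distributed means that storing the data is the responsibility of several computers, not a single one: for example, a server, our own computer and our collaborator Bob's computer, each of which holds a full copy of the repository. Being distributed does not mean that these copies are synchronized automatically; we synchronize them explicitly, for example with push and pull.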

Version Control System means that every object (in our case, a file) has a recorded state at each point in time (a version), and we can manage these states, moving backward and forward through the history (the control system).

How does GIT store information?

  • At its core, GIT is like a key-value store.
  • The Value = Data
  • The Key = Hash of data
  • Given a key, you can retrieve the stored content.
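To see the key-value idea in action, here is a minimal bash sketch using two git plumbing commands: git hash-object -w stores a value and prints its key, and git cat-file -p looks the value up by that key. It assumes git is installed and works in a throwaway repository created with mktemp, so nothing real is touched.

```shell
#!/usr/bin/env bash
# Sketch: git's object database used as a key-value store.
set -euo pipefail

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Store a value: git prints the key (the SHA1 of the object).
key=$(echo 'Hello, world!' | git hash-object -w --stdin)
echo "key:   $key"

# Look the value up again by its key.
value=$(git cat-file -p "$key")
echo "value: $value"
```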

Figure: Key to content transformation

The key — SHA1

  • Cryptographic hash function.
  • Given a piece of data, it produces a 40-digit hexadecimal number.
  • The value is always the same for the same input.
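These properties are easy to check from a shell. This sketch assumes bash and either sha1sum (Linux) or shasum (macOS); sha1 is a hypothetical helper name that hides the difference. It hashes plain strings, without the header git adds, and uses abc because it is the standard SHA1 test vector, so the expected digest is known in advance.

```shell
#!/usr/bin/env bash
# Sketch: check the SHA1 properties on plain strings (no git header involved).
set -euo pipefail

# Hypothetical helper: SHA1 digest of stdin, via sha1sum (Linux) or shasum (macOS).
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

a=$(printf 'abc' | sha1)
b=$(printf 'abc' | sha1)   # same input a second time
c=$(printf 'abd' | sha1)   # one character changed

echo "$a"      # a9993e364706816aba3e25717850c26c9cd0d89d
echo "${#a}"   # 40 (hexadecimal digits)
if [ "$a" = "$b" ]; then echo "same input, same hash"; fi
if [ "$a" != "$c" ]; then echo "different input, different hash"; fi
```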

The value — BLOB

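GIT stores the data in a blob object: a header with metadata followed by the content, with the whole object zlib-compressed on disk. It consists of:

  • the identifier blob, followed by a space
  • the size of the content in bytes
  • a \0 (NUL) delimiter
  • the content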

Figure: Schema of the git value

This picture can be misleading: the four parts are not laid out on disk as separate blocks, as the diagram suggests, but stored one after another as a single sequence of bytes (which is then compressed).
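One way to see that the header and the content form a single sequence of bytes is to dump them with od -c. This is a sketch assuming bash, whose printf turns \0 into a NUL byte; raw_object is a hypothetical helper name. Note the single space between the type and the size, which is part of the header too.

```shell
#!/usr/bin/env bash
# Sketch: dump the exact bytes git hashes for the blob "Hello, world!\n".
set -euo pipefail

# Hypothetical helper: write the raw (uncompressed) object to stdout.
# "blob", a space, the size 14, a NUL byte, then the 14 content bytes.
raw_object() {
  printf 'blob 14\0Hello, world!\n'
}

raw_object | od -c              # header and content, one after another

total=$(raw_object | wc -c | tr -d ' ')
echo "$total"                   # 22 = 7 ("blob 14") + 1 (NUL) + 14 (content)
```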

Under the hood — GIT Hash-Object

Let's ask git for the SHA1 of some content:

                > echo 'Hello, world!' | git hash-object --stdin
            

Result

                af5626b4a114abcb82d63db7c8082c3c4756e51b
            

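Now let's compute the same hash ourselves by prepending the header. The content is 14 bytes long: 13 characters plus the newline that echo appends.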

                > printf 'blob 14\0Hello, world!\n' | openssl dgst -sha1 -binary | xxd -p
            

Result

                af5626b4a114abcb82d63db7c8082c3c4756e51b
            

It matches the hash that git hash-object returned.
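The same trick generalizes to any file: measure the size, prepend the header, and hash the result. Below is a minimal bash sketch; blob_sha1 and sha1 are hypothetical helper names, and it uses sha1sum or shasum instead of openssl and xxd so that it runs on a stock Linux or macOS system.

```shell
#!/usr/bin/env bash
# Sketch: compute a file's blob id by hand and compare it with git hash-object.
set -euo pipefail

# Hypothetical helper: SHA1 digest of stdin, via sha1sum (Linux) or shasum (macOS).
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

# Hypothetical helper: the id git would give the file as a blob.
blob_sha1() {
  local size
  size=$(( $(wc -c < "$1") ))                  # arithmetic expansion drops wc's padding
  { printf 'blob %d\0' "$size"; cat "$1"; } | sha1
}

tmp=$(mktemp)
echo 'Hello, world!' > "$tmp"
blob_sha1 "$tmp"        # manual calculation
git hash-object "$tmp"  # git's answer: both lines should be identical
```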

Where does GIT store its data?

                ~/projects/sample
❯ git init
Initialized empty Git repository in /Users/danga/projects/sample/.git/
            

The .git directory contains all the information about the repository.

                > echo 'Hello, world!' | git hash-object -w --stdin 
# -w flag means write
af5626b4a114abcb82d63db7c8082c3c4756e51b
> rm -rf .git/hooks # we don’t need this for now
> tree .git
.git
├── branches
├── config
├── description
├── HEAD
├── info
│   └── exclude
├── objects
│   ├── af
│   │   └── 5626b4a114abcb82d63db7c8082c3c4756e51b
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
9 directories, 5 files
            
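The object ended up under objects/af/: git splits the 40-digit SHA1 into a two-character directory name and a 38-character file name. A small bash sketch that derives this path from the hash, again in a throwaway repository created with mktemp:

```shell
#!/usr/bin/env bash
# Sketch: find the loose object file behind a SHA1.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"

sha=$(echo 'Hello, world!' | git hash-object -w --stdin)

# First 2 hex digits: directory name. Remaining 38: file name.
path=".git/objects/${sha:0:2}/${sha:2}"
echo "$path"          # .git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
ls -l "$path"         # the zlib-compressed object on disk
```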

We need more info

A blob lacks information about:

  • filenames
  • directory structure

Git manages this information in a tree.

Tree

A tree contains pointers (using SHA1):

  • to BLOBs
  • to trees

and metadata:

  • type of pointer (blob or tree)
  • filename or folder name
  • mode (executable file, symbolic link, …)

Figure: Example of the tree structure

Figure: Trees point to blobs and trees
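Trees can be built by hand with plumbing commands, without making a commit. This sketch, assuming a throwaway repository, stores two blobs, stages them under a mode and a path with git update-index --cacheinfo, and lets git write-tree turn the index into tree objects. README.md, bin/run.sh and their contents are made-up examples; because one path is inside a directory, the root tree ends up pointing to one blob and one subtree.

```shell
#!/usr/bin/env bash
# Sketch: build tree objects by hand with plumbing commands (no commit).
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"

# 1. Store two blobs and keep their SHA1s.
readme=$(echo '# Sample' | git hash-object -w --stdin)
script=$(printf '#!/bin/sh\necho hi\n' | git hash-object -w --stdin)

# 2. Stage them with a mode and a path (the metadata a tree records).
git update-index --add --cacheinfo 100644,"$readme",README.md
git update-index --add --cacheinfo 100755,"$script",bin/run.sh

# 3. Turn the index into tree objects and print them.
tree=$(git write-tree)
git cat-file -p "$tree"        # a blob entry for README.md, a tree entry for bin
git cat-file -p "$tree:bin"    # the subtree: an executable (100755) blob
```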

Figure: Identical content is stored only once
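Deduplication falls out of content addressing: two files with the same bytes produce the same blob SHA1, so git stores the blob once and both entries point to it. A sketch in a throwaway repository (a.txt and b.txt are made-up names):

```shell
#!/usr/bin/env bash
# Sketch: identical content is stored as a single blob.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"

echo 'same text' > a.txt
echo 'same text' > b.txt
git add a.txt b.txt

git ls-files --stage                    # both entries show the same blob SHA1
git count-objects -v | grep '^count:'   # count: 1, a single loose object
```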

Other optimizations — Packfiles and deltas

  • Git objects are compressed
  • As files change, their contents remain mostly similar.
  • Git optimizes for this by compressing these objects together into a Packfile
  • A Packfile stores some objects in full and others as “deltas”: the differences between an object and a similar one, such as another version of the same file.
  • Packfiles are generated when there are too many loose objects, when git gc runs (by hand, or automatically after commands such as git commit), or during a push to a remote
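The effect of git gc can be observed with git count-objects -v, which reports loose objects as count and packed objects as in-pack. This sketch, in a throwaway repository with a made-up Demo identity, makes three commits of a slowly changing file and then runs git gc. git verify-pack -v lists the packed objects; entries stored as deltas show two extra columns (the delta chain depth and the base object), although whether git actually deltifies such small files is left to its heuristics.

```shell
#!/usr/bin/env bash
# Sketch: loose objects are moved into a Packfile by git gc.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"
# Made-up identity, only for this sketch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Three commits of a file that changes only slightly each time.
for i in 1 2 3; do
  seq 1 200 > numbers.txt
  echo "change $i" >> numbers.txt
  git add numbers.txt
  git commit -q -m "version $i"
done

git count-objects -v   # "count" = loose objects, "in-pack" = packed objects
git gc -q
git count-objects -v   # the objects now appear under "in-pack"

# One line per packed object; delta entries carry a depth and a base SHA1.
git verify-pack -v .git/objects/pack/pack-*.idx
```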

Commits

Commit object

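A commit points to a tree

and contains metadata:

  • author and committer
  • dates (one for the author and one for the committer)
  • message
  • parent commits (none for the first commit, one for a regular commit, two or more for a merge)

The SHA1 of the commit is the hash of all this information.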
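Since a commit is just another object, its SHA1 can be recomputed the same way as for a blob: take the raw commit content (which git cat-file commit prints), prepend the header "commit <size>\0", and hash it. The sketch below assumes bash and a throwaway repository with a made-up Demo identity; commit_sha1 and sha1 are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: recompute a commit's SHA1 from its raw content.
set -euo pipefail

# Hypothetical helper: SHA1 digest of stdin, via sha1sum (Linux) or shasum (macOS).
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

# Hypothetical helper: hash "commit <size>\0" followed by the raw commit.
commit_sha1() {
  local raw size
  raw=$(mktemp)                       # a file keeps trailing newlines that $(...) would strip
  git cat-file commit "$1" > "$raw"   # tree, parents, author, committer, message
  size=$(( $(wc -c < "$raw") ))
  { printf 'commit %d\0' "$size"; cat "$raw"; } | sha1
  rm -f "$raw"
}

# Demo in a throwaway repository with a made-up identity.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
echo 'Hello world!' > hello.txt
git add hello.txt
git commit -q -m "initial commit"

commit_sha1 HEAD     # manual calculation
git rev-parse HEAD   # git's answer: both lines should be identical
```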

Figure: Commit structure

Figure: Commits point to parent commits and trees

A commit is a snapshot of the code (through its tree), not a list of changes.

Commits under the hood — make a commit

                > echo 'Hello world!' > hello.txt
> git add hello.txt
> git commit -m "initial commit" 
[master (root commit) 767c4fd] initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 hello.txt
            
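git commit is a convenience wrapper. The same kind of commit can be assembled from plumbing commands, which makes each object from the previous sections visible: update-index writes the blob and records it in the index, write-tree creates the tree, commit-tree creates the commit, and update-ref moves the branch. A sketch, assuming a throwaway repository and a made-up Demo identity:

```shell
#!/usr/bin/env bash
# Sketch: build the same kind of commit step by step with plumbing commands.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"
# Made-up identity, only for this sketch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'Hello world!' > hello.txt
git update-index --add hello.txt                       # blob + index entry (like git add)
tree=$(git write-tree)                                 # tree built from the index
commit=$(git commit-tree "$tree" -m "initial commit")  # commit pointing at that tree
git update-ref "$(git symbolic-ref HEAD)" "$commit"    # move the current branch to it

git log --oneline
```

The blob and tree ids should match the ones in the output above (cd08... and 96ee...), since they depend only on content; the commit id differs because the author and date differ.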

Commits under the hood — look in .git objects

                > tree .git/objects
.git/objects
├── 76
│   └── 7c4fd32dbcff87f8e040156e1a95a59f4bf730
├── 96
│   └── ee448816b927c395aa87a48734a41ab9a801b9
├── af
│   └── 5626b4a114abcb82d63db7c8082c3c4756e51b
├── cd
│   └── 0875583aabe89ee197ea133980a9085d08e497
├── info
└── pack
6 directories, 4 files
            

Commits under the hood — looking at objects

                cat .git/objects/cd/0875583aabe89ee197ea133980a9085d08e497
xK��OR04f�H���W(�/�IQ�IA�
            

Oops: remember that the content is compressed (with zlib).

git cat-file: -t (print the type) and -p (print the contents)

                > git cat-file -t cd08 # -t  print the type
blob
> git cat-file -p cd08 # -p  print the contents
Hello world!
> git cat-file -t 96ee
tree
> git cat-file -p 96ee
100644 blob cd0875583aabe89ee197ea133980a9085d08e497 hello.txt
> git cat-file -t 767c
commit
> git cat-file -p 767c
tree 96ee448816b927c395aa87a48734a41ab9a801b9
author d9nich <54186666+d9nchik@users.noreply.github.com> 1661151102 +0300
committer d9nich <54186666+d9nchik@users.noreply.github.com> 1661151102 +0300

initial commit
            
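The header fields (type and size) can also be read without printing the content: git cat-file -s prints the size, and git cat-file --batch-check reads object names from standard input and prints one "<sha1> <type> <size>" line per object. A sketch, assuming a throwaway repository with the same hello.txt commit and a made-up Demo identity:

```shell
#!/usr/bin/env bash
# Sketch: read the type and size from object headers without the content.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"
# Made-up identity, only for this sketch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'Hello world!' > hello.txt
git add hello.txt
git commit -q -m "initial commit"

git cat-file -s HEAD:hello.txt   # 13: "Hello world!" plus the newline

# One "<sha1> <type> <size>" line per object name read from stdin.
printf '%s\n' HEAD 'HEAD^{tree}' HEAD:hello.txt | git cat-file --batch-check
```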

Why can't we change commits?

  • If you change any data about the commit, the commit will have a new SHA1 hash
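  • Even if the files don't change, the commit date will
  • Because each commit stores its parent's SHA1, changing one commit also changes the hashes of all commits after it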
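A quick way to check this is git commit --amend, which replaces the last commit rather than editing it. The sketch below, in a throwaway repository with a made-up Demo identity, rewords the message so the new commit is guaranteed to differ even within the same second, and then shows that the old commit object is still in the database.

```shell
#!/usr/bin/env bash
# Sketch: "changing" a commit creates a new commit object with a new SHA1.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"
# Made-up identity, only for this sketch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'Hello world!' > hello.txt
git add hello.txt
git commit -q -m "initial commit"
before=$(git rev-parse HEAD)

# Same files, different message: --amend writes a brand-new commit.
git commit -q --amend -m "initial commit (reworded)"
after=$(git rev-parse HEAD)

echo "$before"
echo "$after"                                    # a different SHA1
git rev-parse "$before^{tree}" "$after^{tree}"   # the same tree twice
git cat-file -t "$before"                        # commit: the old object is still stored
```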

References (pointers to commits)

  • Tags
  • Branches
  • HEAD: pointer to the current commit, usually by way of the current branch

Figure: References to commits

References under the hood

                > tree .git
.git
├── HEAD
└── refs
    ├── heads
    │   └── master
    └── tags
            

refs/heads is where branches live.

Only part of the tree output is shown in this example.

                > git log --oneline
767c4fd (HEAD -> master) initial commit
> cat .git/refs/heads/master
767c4fd32dbcff87f8e040156e1a95a59f4bf730
            

.git/refs/heads/master contains the SHA1 of the commit the branch points to

                > cat .git/HEAD
ref: refs/heads/master
            

HEAD is usually a pointer to the current branch; in the "detached HEAD" state it holds a commit SHA1 directly.
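Because references are plain files (with git's default files backend), they can be created and inspected with a few plumbing commands. A sketch in a throwaway repository with a made-up Demo identity; feature and v1.0 are example names.

```shell
#!/usr/bin/env bash
# Sketch: branches, tags and HEAD are small files holding pointers.
set -euo pipefail

repo=$(mktemp -d)     # throwaway repository
git init -q "$repo"
cd "$repo"
# Made-up identity, only for this sketch.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
echo 'Hello world!' > hello.txt
git add hello.txt
git commit -q -m "initial commit"

# A branch is a file in refs/heads holding a commit SHA1.
git update-ref refs/heads/feature HEAD
cat .git/refs/heads/feature        # same SHA1 as: git rev-parse HEAD

# A lightweight tag is the same kind of file, in refs/tags.
git tag v1.0
cat .git/refs/tags/v1.0

# HEAD names the current branch. git switch rewrites it (and also updates
# the index and working tree); both branches point at the same commit here,
# so repointing HEAD alone is safe.
git symbolic-ref HEAD                      # e.g. refs/heads/master
git symbolic-ref HEAD refs/heads/feature
cat .git/HEAD                              # ref: refs/heads/feature
```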



Danylo Halaiko

Student

@d9nich