Git is a powerful tool for collaborating with a team and for storing work so that its history can be replayed at any time. Sadly, many people interact with it only through a GUI tool and never uncover its full potential. As in the famous meme, they learn only to commit and push, and when something goes wrong they prefer to clone a fresh copy rather than solve the problem. (That is never the optimal way.)
Why use the command line?
All GUI tools for Git are built on top of the CLI. Nowadays they have become very rich, but they still lack some of the functionality of the original CLI. For simple tasks this is not a big deal, but for complicated ones the missing functionality becomes vital.
A GUI can also add its own layer of abstraction that is available only in that proprietary tool, which limits your freedom to choose software.
Last but not least, CLI commands can be automated with bash whenever you need it, and combining several commands can give impressive results. With a GUI, you are fully dependent on vendor updates.
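As a small illustration of the automation point, here is a sketch that combines two standard Git commands into one helper. The function name repo_summary and its output format are made up for this example; git rev-parse --abbrev-ref and git rev-list --count are regular Git commands.

```shell
# Hypothetical helper (not part of Git): print "<branch> <number of commits>"
# for the repository in the current directory.
repo_summary() {
    # Name of the branch HEAD currently points to
    branch=$(git rev-parse --abbrev-ref HEAD) || return 1
    # Number of commits reachable from HEAD
    commits=$(git rev-list --count HEAD) || return 1
    printf '%s %s\n' "$branch" "$commits"
}
```

Called inside any repository that has at least one commit, repo_summary prints the branch name followed by the commit count. A helper like this can be dropped into a script or a shell profile, which is exactly what a GUI does not let you do.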
What is Git?
Git is a distributed version control system.
Let’s unpack this. Distributed means that storing the data is the responsibility of several computers, not a single one: for example, the server, our computer, and our collaborator Bob’s computer. Being distributed does not mean that the data is kept synchronized automatically.
Version control system means that every object (in our case, a file) has a recorded state at each point in time (a version), and that we can manage these states, moving backward and forward through them (the control system).
How does Git store information?
The key is the SHA1 hash of the content.
The value is a blob holding the content.
Git stores the compressed data in a blob, along with metadata in a header: the object type (blob), the size of the content in bytes, a \0 delimiter, and finally the content itself.
These four parts follow one another sequentially in a single stream; they do not lie on disk as separate blocks, even though a diagram may draw them that way.
Under the hood — GIT Hash-Object
Let’s ask Git for the SHA1 of some content:
> echo 'Hello, world!' | git hash-object --stdin
af5626b4a114abcb82d63db7c8082c3c4756e51b
Let’s try with metadata
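> printf 'blob 14\0Hello, world!\n' | openssl dgst -sha1 -binary | xxd -p
af5626b4a114abcb82d63db7c8082c3c4756e51b
The result matches the previous one. (14 is the size of the content in bytes: 13 characters plus the trailing newline.)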
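The same calculation can be wrapped in a small helper that works for any file. This is only a sketch: the function name git_blob_sha1 is made up, and it assumes sha1sum is installed (on macOS, shasum is the usual substitute).

```shell
# Sketch: compute the SHA1 that Git assigns to a file's content as a blob,
# using only printf, wc, cat and sha1sum (git itself is not involved).
git_blob_sha1() {
    file=$1
    # Size of the content in bytes; tr strips the padding some wc versions add
    size=$(wc -c < "$file" | tr -d ' ')
    # Header "blob <size>\0" followed by the raw content, hashed as one stream
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}
```

For a file that contains Hello, world! followed by a newline, git_blob_sha1 prints af5626b4a114abcb82d63db7c8082c3c4756e51b, the same hash as git hash-object above.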
Where does Git store its data?
~/projects/sample ❯ git init
Initialized empty Git repository in /Users/danga/projects/sample/.git/
The .git directory contains all the information about the repository.
> echo 'Hello, world!' | git hash-object -w --stdin # -w flag means write
af5626b4a114abcb82d63db7c8082c3c4756e51b
> rm -rf .git/hooks # we don't need this for now
> tree .git
.git
├── branches
├── config
├── description
├── HEAD
├── info
│   └── exclude
├── objects
│   ├── af
│   │   └── 5626b4a114abcb82d63db7c8082c3c4756e51b
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

9 directories, 5 files
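The tree output shows how the object database is laid out: the first two hex characters of the hash name a directory, and the remaining 38 name the file inside it. A tiny helper makes the mapping explicit; object_path is a hypothetical name used only here, and it covers loose objects only.

```shell
# Map a full 40-character SHA1 to the loose-object path Git uses for it.
# Illustration only: it formats a string and touches no files.
object_path() {
    hash=$1
    dir=$(printf '%s' "$hash" | cut -c 1-2)    # first two hex characters
    file=$(printf '%s' "$hash" | cut -c 3-)    # remaining 38 characters
    printf '.git/objects/%s/%s\n' "$dir" "$file"
}

object_path af5626b4a114abcb82d63db7c8082c3c4756e51b
# prints .git/objects/af/5626b4a114abcb82d63db7c8082c3c4756e51b
```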
We need more info
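A blob on its own lacks information: it stores neither the file name nor the directory structure.
Git manages this information in a tree.
A tree contains pointers (using SHA1) to blobs and to other trees, together with the type, the name, and the mode of each entry.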
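To see a tree that points to both a blob and another tree, the sketch below builds a throwaway repository with one file at the top level and one inside a directory. The names docs, readme.txt and show_demo_tree are made up for the example; git write-tree and git ls-tree are standard Git commands.

```shell
# Build a scratch repository and print the top-level tree of its staged content.
show_demo_tree() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        mkdir docs
        echo 'Hello world!' > hello.txt
        echo 'Read me' > docs/readme.txt
        git add hello.txt docs/readme.txt
        # write-tree stores the staged content as tree objects
        # and prints the hash of the root tree
        root=$(git write-tree)
        # ls-tree prints one "<mode> <type> <hash> <name>" line per entry
        git ls-tree "$root"
    )
    status=$?
    rm -rf "$repo"
    return $status
}

show_demo_tree
```

The listing should contain one entry of type tree (the docs directory) and one entry of type blob (hello.txt): the root tree points to a subtree for the directory and to a blob for the file.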
Other optimizations — Packfiles and deltas
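Git packs objects into packfiles at certain moments, for example during garbage collection (which can be triggered automatically after commands such as git commit), or during a push to a remote.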
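Packing can also be requested by hand with git gc. The sketch below creates a scratch repository with a few versions of one file and compares the object counts before and after; the file name, the identity and the function name pack_demo are made up, while git gc and git count-objects -v are standard commands.

```shell
# Create a few loose objects in a scratch repository, then pack them.
pack_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        for i in 1 2 3; do
            echo "version $i" > notes.txt
            git add notes.txt
            git -c user.name=demo -c user.email=demo@example.com \
                commit -q -m "version $i"
        done
        echo "before gc:"
        git count-objects -v | grep -E '^(count|in-pack|packs):'
        git gc --quiet
        echo "after gc:"
        git count-objects -v | grep -E '^(count|in-pack|packs):'
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

In the output of pack_demo, count is the number of loose objects and in-pack is the number of objects stored in packfiles. After git gc the loose objects should have moved into a single pack.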
A commit points to a tree and contains metadata: the author and committer, the date, the message, and the parent commit (or commits).
The SHA1 of the commit is the hash of all this information.
A commit is a code snapshot
Commits under the hood — make a commit
> echo 'Hello world!' > hello.txt
> git add hello.txt
> git commit -m "initial commit"
[master (root commit) 767c4fd] initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 hello.txt
Commits under the hood — look in .git objects
> tree .git/objects
.git/objects
├── 76
│   └── 7c4fd32dbcff87f8e040156e1a95a59f4bf730
├── 96
│   └── ee448816b927c395aa87a48734a41ab9a801b9
├── af
│   └── 5626b4a114abcb82d63db7c8082c3c4756e51b
├── cd
│   └── 0875583aabe89ee197ea133980a9085d08e497
├── info
└── pack

6 directories, 4 files
Commits under the hood — looking at objects
> cat .git/objects/cd/0875583aabe89ee197ea133980a9085d08e497
xK��OR04f�H���W(�/�IQ�IA�
Oops, remember that content is compressed.
git cat-file with -t (type) and -p (print the contents)
> git cat-file -t cd08 # -t print the type
blob
> git cat-file -p cd08 # -p print the contents
Hello world!
> git cat-file -t 96ee
tree
> git cat-file -p 96ee
100644 blob cd0875583aabe89ee197ea133980a9085d08e497 hello.txt
> git cat-file -t 767c
commit
> git cat-file -p 767c
tree 96ee448816b927c395aa87a48734a41ab9a801b9
author d9nich <firstname.lastname@example.org> 1661151102 +0300
committer d9nich <email@example.com> 1661151102 +0300

initial commit
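The claim that the SHA1 of a commit is the hash of all its information can be checked by hand, the same way as for blobs: put a "commit <size>\0" header in front of the raw commit text and hash the result. A sketch, assuming sha1sum is available and the repository uses the default SHA1 object format; commit_sha1 is a hypothetical helper name.

```shell
# Recompute the hash of a commit from its raw content.
# $1 is any commit reference (defaults to HEAD); run it inside a repository.
commit_sha1() {
    rev=${1:-HEAD}
    # cat-file -s prints the size of the object's content in bytes
    size=$(git cat-file -s "$rev") || return 1
    # "git cat-file commit" prints the raw, uncompressed commit text
    { printf 'commit %s\0' "$size"; git cat-file commit "$rev"; } |
        sha1sum | cut -d ' ' -f 1
}
```

Running commit_sha1 HEAD and git rev-parse HEAD in the same repository should print the same hash.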
Why can’t we change commits?
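A commit cannot be edited in place: its hash covers the tree, the metadata and the message, so any change yields a different hash, which means a different commit. The sketch below rewords a commit message in a scratch repository and prints the hashes before and after. The identity, the messages and the function name amend_demo are made up; git commit --amend and git rev-parse are standard commands.

```shell
# Reword a commit and show that the commit hash changes while the tree does not.
amend_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        # Made-up identity so the demo does not depend on global configuration
        git config user.name demo
        git config user.email demo@example.com
        echo 'Hello world!' > hello.txt
        git add hello.txt
        git commit -q -m "initial commit"
        echo "commit-before $(git rev-parse HEAD)"
        echo "tree-before $(git rev-parse 'HEAD^{tree}')"
        # Only the message changes; the files stay exactly the same
        git commit -q --amend -m "initial commit, reworded"
        echo "commit-after $(git rev-parse HEAD)"
        echo "tree-after $(git rev-parse 'HEAD^{tree}')"
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

The two tree lines should be identical, because the files did not change, while the two commit lines differ: the old commit was not modified, a new one replaced it.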
References are pointers to commits
References under the hood
> tree .git
.git
├── HEAD
└── refs
    ├── heads
    │   └── master
    └── tags
refs/heads is where branches live. (The example shows only part of the tree command output.)
> git log --oneline
767c4fd (HEAD -> master) initial commit
> cat .git/refs/heads/master
767c4fd32dbcff87f8e040156e1a95a59f4bf730
.git/refs/heads/master records which commit the branch points to.
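Since a branch is just a file holding a commit hash, creating one is cheap. The sketch below checks this in a scratch repository. The branch name feature, the identity and the function name branch_file_demo are arbitrary, and the sketch assumes Git's default file-based ref storage with refs that have not been packed into .git/packed-refs.

```shell
# Create a branch and show that it is a small file containing a commit hash.
branch_file_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q --allow-empty -m "initial commit"
        git branch feature             # new branch at the current commit
        cat .git/refs/heads/feature    # the new file holds a commit hash ...
        git rev-parse HEAD             # ... identical to the current commit
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

The two lines printed by branch_file_demo should be the same 40-character hash.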
> cat .git/HEAD
ref: refs/heads/master
HEAD is usually a pointer to the current branch. The exception is a detached HEAD, where it holds a commit hash directly.
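Following these pointers by hand takes only a few lines of shell. The helper below is a sketch with a made-up name, resolve_head. It reads plain files only and ignores .git/packed-refs, where Git may keep refs after packing, so treat it as an illustration rather than a replacement for git rev-parse.

```shell
# Resolve HEAD to a commit hash by reading files only, without calling git.
# $1 is the path to the .git directory (defaults to ./.git).
resolve_head() {
    git_dir=${1:-.git}
    head=$(cat "$git_dir/HEAD") || return 1
    case $head in
        # Symbolic HEAD: follow the branch file named after "ref: "
        "ref: "*) cat "$git_dir/${head#ref: }" ;;
        # Anything else: HEAD holds the commit hash itself
        *) printf '%s\n' "$head" ;;
    esac
}
```

In a repository with at least one commit, resolve_head .git should print the same hash as git rev-parse HEAD, both on a branch and after git checkout --detach.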