Git tricks

List of annotated commands to deal with file deletion.

Git is a blessing, but pushing repositories into the wild got me worried about two points: security and size. What follows is, of course, a summary of my experiences, with snippets found on other sites.

Size §

Over time, every git repository tends to get dirty. Heading in one direction and changing your mind a second later comes at a cost in versioned environments: references are kept and take up a lot of space, especially if you keep track of archives.

See how bad the situation is with:

git bundle create tmp.bundle --all  
du -sh tmp.bundle  
rm tmp.bundle  
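The same check can be wrapped in a small function, so the temporary bundle always lands in a scratch directory and is removed afterwards. This is a minimal sketch: the name repo_bundle_size is hypothetical, and it assumes a Git recent enough to support git -C (1.8.5 or later).

```shell
# Hypothetical helper: print the size in KiB of a bundle holding every ref
# of the repository at $1 (defaults to the current directory).
repo_bundle_size() {
    local repo="${1:-.}" tmpdir status=1
    tmpdir=$(mktemp -d) || return 1
    # --all puts every ref (branches, tags, ...) and everything they reach
    # into the bundle, so its size roughly matches what a full clone transfers.
    if git -C "$repo" bundle create "$tmpdir/tmp.bundle" --all >/dev/null 2>&1; then
        # du -k reports disk usage in KiB; keep only the number.
        du -k "$tmpdir/tmp.bundle" | cut -f 1
        status=0
    fi
    # Clean up the scratch directory whether or not the bundle succeeded.
    rm -rf "$tmpdir"
    return "$status"
}
```

Calling repo_bundle_size inside the repository (or repo_bundle_size path/to/repo) prints a single number, which makes before/after comparisons easy.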

We first need to find out which files are causing trouble. This takes the following three steps.

Get a list of all files in the history:

git rev-list --objects --all | sort -k 2 > allfileshas.txt

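Get a list of big files in decreasing size order (git gc packs the objects first, so that verify-pack can see them; egrep is obsolescent, hence grep -E):

git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt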

Now get the real file names:

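# For each big blob, join its size (from bigobjects.txt) with its path (from allfileshas.txt).
for SHA in $(cut -f 1 -d ' ' < bigobjects.txt); do
    echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
done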
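The three steps above can also be collapsed into a single pipeline. This avoids the intermediate files, keeps paths containing spaces intact (awk's $7 only keeps the first word), and also sees loose objects without running git gc first. Note that this sketch swaps git verify-pack for a different technique, git cat-file --batch-check with its %(rest) format atom; the function name biggest_blobs and its default of 10 results are illustrative choices, not part of the original recipe.

```shell
# Hypothetical helper: print "<sha> <size-in-bytes> <path>" for the N largest
# blobs reachable from any ref, biggest first (N defaults to 10).
biggest_blobs() {
    local count="${1:-10}"
    # rev-list prints "<sha> <path>" for every blob and tree in history;
    # cat-file turns each line into "<type> <sha> <uncompressed size> <path>",
    # where %(rest) is whatever followed the object name on the input line.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -k 2,2 -n -r |
        head -n "$count"
}
```

Run it from inside the repository, e.g. biggest_blobs 20; its output should line up with bigtosmall.txt, except that full paths are preserved.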

Cloning is caring §

The remedy comes with the following list. Step #1 is probably the most important one; cloning only serves to leave things clean, with hard links. Clone your repo locally to get clean references, then do:

Security §

