Git tricks

Written on 13 June 2017

Git is a blessing, but pushing repositories into the wild got me worried about two points: security and size. The following is, of course, a summary of my experiences, with snippets found on other sites.

Size

Over time, every git repository tends to get dirty. Heading in one direction and changing your mind a moment later comes at a cost in a versioned environment: the old objects stay in the history and take up a lot of space, especially if you keep track of archives.

See how bad the situation is with:

git bundle create tmp.bundle --all  
du -sh tmp.bundle  
rm tmp.bundle  

We first need to detect which files are causing trouble. That takes the following two commands and a short loop.

Get a list of all files in the history:

git rev-list --objects --all | sort -k 2 > allfileshas.txt

Get a list of big files, in decreasing order of size:

git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt

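Now match each big object to its real file name. Each line of bigtosmall.txt holds the object hash, its size in bytes and its path:

for SHA in $(cut -f 1 -d ' ' < bigobjects.txt)
do
   echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
done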
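
As an alternative to the three steps above, the same ranking can be sketched as a single pipeline built on git cat-file --batch-check, which reports object sizes without going through the pack index. The function name list_big_blobs and its two arguments are my own illustration, not something git provides, and paths containing unusual characters may be printed in quoted form.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): print the largest blobs
# found anywhere in the history, biggest first.
# Usage: list_big_blobs [repository] [count]
# Output columns: size in bytes, object hash, path.
list_big_blobs() {
    repo=${1:-.}
    count=${2:-10}
    # rev-list prints "<hash> <path>" for every object in the history;
    # cat-file turns each line into "<type> <size> <hash> <path>".
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file \
            --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only and drop the type column
        sort -n -r |             # numeric sort on the size, largest first
        head -n "$count"
}
```

Calling list_big_blobs . 20 from the top of a repository prints the twenty largest blobs. The sizes are uncompressed object sizes, so they rank the offenders rather than predict how much the packed repository will shrink.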

Cloning is caring

The remedy is the following list. Step #1 is probably the most important; the clone is only there to give you a clean copy to work on, and a local clone is cheap because it uses hard links. Clone your repository locally to get clean references, then run:

  • git filter-branch --index-filter 'git rm --cached --ignore-unmatch <FILE NAME>' Removes the file from all revisions of the current branch; append -- --all to rewrite every branch and tag.
    • ex. git filter-branch --index-filter 'git rm --cached --ignore-unmatch **/subdirectory/*' Every nested directory named subdirectory, wherever it appears in the tree.
    • ex. git filter-branch --index-filter 'git rm --cached --ignore-unmatch subdirectory/*.jpg' All jpg files under this subdirectory. Paths are relative to the repository root, without a leading slash.
    • ex. git filter-branch --index-filter 'git rm --cached --ignore-unmatch subdirectory/*' All files in this subdirectory, and consequently the subdirectory itself, because git doesn’t track empty directories.
    • ex. git filter-branch --index-filter 'git rm --cached --ignore-unmatch subdirectory/**/subdirectory2/*' Every subdirectory2 contained anywhere within subdirectory.
  • rm -rf .git/refs/original/ Removes the backup refs that filter-branch left behind.
  • git reflog expire --expire=now --all Expires every reflog entry, so nothing references the old commits any more.
  • git fsck --full --unreachable Lists the objects that are now unreachable.
  • git repack -A -d Repacks everything into a single pack; unreachable objects are left behind as loose objects.
  • git gc --aggressive --prune=now Finally removes those objects.
  • git push --force [remote] master You will need to force push, because the rewritten history no longer descends from what the remote has, so the remote will sort of think you went back in time. Just make sure you pulled before you started all of this.
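
The local steps of the list above can be sketched as one function. This is a minimal sketch, assuming it runs at the top level of a fresh clone with a clean working tree; the name purge_from_history is my own, the pathspec must not contain a single quote, and the -- --all suffix (rewrite every branch and tag, not just the current one) is an addition to the commands listed above. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that recent git versions print before running filter-branch.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): remove every file matching
# a pathspec from all branches and tags, then drop the unreachable objects.
# Usage: purge_from_history 'subdirectory/*.jpg'
purge_from_history() {
    pathspec=$1
    # Step 1: rewrite history. The inner single quotes keep the shell from
    # expanding the glob, so git itself matches it against each revision.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm --cached --ignore-unmatch -q -- '$pathspec'" \
        -- --all || return 1
    # Remove the backup refs that filter-branch keeps under refs/original.
    rm -rf "$(git rev-parse --git-dir)/refs/original"
    # Forget the reflog entries that still point at the old commits.
    git reflog expire --expire=now --all
    # Repack what is still reachable, then delete everything else.
    git repack -A -d -q
    git gc --quiet --aggressive --prune=now
}
```

The force push is deliberately left out of the function: it is the one step that cannot be undone on the remote, so it seems safer to inspect the result and measure the repository again before running it by hand.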

Security

Yet to come 😉