Git tricks
A list of annotated commands for deleting files from a repository's history.
Git is a blessing, but pushing repositories into the wild got me worried about two things: security and size. The following is, of course, a summary of my own experience with snippets found on other sites.
Size §
Over time every git repository tends to get dirty. Going in one direction and changing your mind a second later comes at a cost in a versioned environment: every object stays in the history and takes up a lot of space, especially if you keep track of archives.
See how bad the situation is with:
git bundle create tmp.bundle --all
du -sh tmp.bundle
rm tmp.bundle
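To repeat the measurement before and after the cleanup, the three commands above can be wrapped in a small helper. This is only a sketch: the function name bundle_size is made up for illustration, and it assumes the repository has at least one commit (git refuses to create an empty bundle).

```shell
# Print the size of a bundle that contains every ref of a repository.
# bundle_size is an illustrative name, not a git command.
bundle_size() {
    repo=${1:-.}
    tmpdir=$(mktemp -d) || return 1
    # --all packs every ref, so the bundle approximates what a full clone carries
    if git -C "$repo" bundle create "$tmpdir/tmp.bundle" --all >/dev/null 2>&1; then
        du -sh "$tmpdir/tmp.bundle" | cut -f 1
        status=0
    else
        echo "could not create a bundle (empty repository?)" >&2
        status=1
    fi
    rm -rf "$tmpdir"
    return $status
}
```

Calling bundle_size with no argument measures the repository in the current directory; the bundle is written to a temporary directory so nothing is left behind in the working tree.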
We first need to detect which files are causing trouble. That takes the following three steps:
git rev-list --objects --all | sort -k 2 > allfileshas.txt
to get a list of all files in the history.
git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
to get a list of the blobs in the pack, sorted by decreasing size.
Now get the real file names:
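# each output line holds: SHA, size in bytes, path (only the first word of a path containing spaces)
for SHA in $(cut -f 1 -d ' ' < bigobjects.txt);
do
echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt
done;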
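The same "SHA, size, path" listing can also be produced in a single pipeline, without the intermediate text files. This is a sketch, not what the steps above do internally: it relies on git cat-file --batch-check instead of git verify-pack, reports the uncompressed size of each blob, and the function name largest_blobs and its default of 10 rows are arbitrary choices made for this example.

```shell
# List the largest blobs reachable from any ref as: SHA, size in bytes, path.
# largest_blobs is an illustrative name; run it from inside the repository.
largest_blobs() {
    count=${1:-10}
    # rev-list prints "SHA path"; cat-file passes the path through as %(rest)
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k 2,2 -n -r |
        head -n "$count"
}
```

For instance, largest_blobs 20 prints the twenty biggest blobs, largest first. Each blob is listed once, under one of the paths it was reached through.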
Cloning is caring §
The remedy is the following list of commands. The first step, the git filter-branch rewrite, is probably the most important one; cloning only serves to work on a clean copy, which is cheap because a local clone uses hard links. Clone your repo locally to get clean references, then run:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <FILE NAME>'
Removes the file from all revisions. Paths are relative to the repository root, without a leading slash.
- ex.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch **/subdirectory/*'
Every directory named subdirectory that appears inside other directories.
- ex.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch subdirectory/*.jpg'
All jpg files in this subdirectory.
- ex.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch subdirectory/*'
All files in this subdirectory, and consequently the subdirectory itself, because git doesn't track empty directories.
- ex.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch subdirectory/**/subdirectory2/*'
Every subdirectory2 contained within subdirectory.
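rm -rf .git/refs/original/
Removes the backup of the original refs that git filter-branch keeps.
git reflog expire --expire=now --all
Expires all reflog entries, so that the old objects become unreachable.
git fsck --full --unreachable
Lists the objects that are now unreachable.
git repack -A -d
Repacks the repository; unreachable objects are left out of the new pack and become loose objects.
git gc --aggressive --prune=now
Finally removes those objects.
git push --force [remote] master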
You will need to force the push because, from the remote's point of view, the history has been rewritten: it looks as if you went back in time. Make sure you have pulled before starting any of this.
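Before forcing the push, it can be worth checking that the path really is gone from every ref. Here is a minimal sketch: the function name path_in_history is invented for this example, it has to be run from inside the repository, and it only looks at objects reachable from refs, so it keeps reporting the path for as long as .git/refs/original/ exists.

```shell
# Exit 0 if some object reachable from any ref is still recorded under the
# given path, exit 1 otherwise. path_in_history is an illustrative name.
path_in_history() {
    # rev-list prints "SHA path"; cut -s drops lines without a path (commits)
    git rev-list --objects --all |
        cut -s -d ' ' -f 2- |
        grep -q -x -F -- "$1"
}
```

The path must be spelled exactly as git records it, relative to the repository root. For example, path_in_history subdirectory/photo.jpg && echo "still referenced" prints the warning only while the file is still reachable (subdirectory/photo.jpg is a placeholder path).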
Security §
Yet to come 😉