Cleaning up a git-lfs repository that is too big for GitHub
I have a git repository with the following structure:
+ LICENSE
+ README.md
+ experiments
    + ... (large csv files stored with git-lfs)
+ reports
    + ... (pdf files stored with git-lfs)
+ demos
    + ... (small example scripts)
+ src
    + ... (main codebase)
+ tests
    + ... (unit tests)
My work involves running experiments and I use git-lfs to store the experiment results, both data (csv-files) and results (mostly data plots in pdf-form, pdf-presentations).
Recently, I ran a larger experiment and added a csv file several gigabytes in size to git-lfs. The official git-lfs implementation has no file size limit, so I thought it would be no problem to store this in git. I did however split the large csv into multiple small ones because I heard some git-lfs implementations have problems with files > 3GB.
Anyhow, adding this large file turned out to be a terrible mistake. Once everything was committed, I tried to push my changes to GitHub and got the following error message.
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Now, I am really not sure how to fix this. I am using GitHub to share the main code with other people, but the experiments don't need to be up there. So, my thought was to split the repository into two: one containing the main code (no git-lfs) and one containing the experiments and reports. There is no need to store the latter on GitHub, so I should be fine using it locally.
So, for the GitHub repository, I would like to delete the folders experiments and reports completely, including their commit history. I would also like to remove git-lfs completely. On the other hand, I would like to preserve the commit history for the rest of the repository.
Is that even possible? If so, how would I go about it? Which tools can I use?
Or is this situation too messed up, and would I be better off starting with a brand-new repository?
Answers
The problem here is not that Git LFS or GitHub can't handle your repository. It's simply that GitHub only provides 1 GB of free storage for Git LFS and you've used that already. If you want to store additional Git LFS data, then you'd need to pay for a data pack.
It is true that on Windows, Git itself has a limitation that prevents the normal smudge and clean mechanism from correctly creating very large Git LFS files in the working tree, which is probably the problem you heard about. There are workarounds for this, and it does not affect non-Windows systems. Git LFS itself handles such files fine, and the problem will go away automatically once Git itself is fixed.
However, in general, Git repositories are not good for storing the output of code, such as binary artifacts, with or without Git LFS. So you probably shouldn't store the PDF output in a repository at all. Storing it elsewhere, such as on an artifact server or a cloud bucket, is a better idea.
You cannot remove Git LFS without rewriting commit history. Adding or removing Git LFS for historical changes necessitates rewriting the repository, since Git LFS replaces each large file in the history with a small pointer file that refers to the object in question. Git LFS provides git lfs migrate import and git lfs migrate export to rewrite that history if you want to add or remove Git LFS.
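To make the pointer-file idea concrete, here is a rough sketch that rebuilds the text Git LFS would commit in place of a file: a version line, the SHA-256 of the content, and its size in bytes. The file name results.csv and its content are made up for illustration, and the sketch assumes sha256sum is available (on macOS, shasum -a 256 would be the usual substitute).

```shell
# Hypothetical stand-in for one of the large csv files.
tmpdir=$(mktemp -d)
printf 'a,b\n1,2\n' > "$tmpdir/results.csv"

# A pointer records the SHA-256 of the real content and its size in bytes.
oid=$(sha256sum "$tmpdir/results.csv" | cut -d' ' -f1)
size=$(wc -c < "$tmpdir/results.csv" | tr -d ' ')

# This three-line text is what ends up in the commit instead of the csv itself.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
printf '%s\n' "$pointer"
```

Because every historical commit contains pointers like this rather than the real data, converting back means rewriting each of those commits. Something along the lines of git lfs migrate export --include="*.csv,*.pdf" --everything should do that; the patterns are a guess based on the file types mentioned in the question.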
If you want to rewrite the history to remove those directories altogether, then you'll need a tool like git filter-repo (which is an external tool). If all of your Git LFS files are stored in those directories, then rewriting the history will also remove Git LFS from your repository.
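As a minimal sketch of that rewrite, the following builds a throwaway repository with the same directory names as in the question and strips experiments and reports from its history. The file names, the commit message and the demo identity are placeholders, and it assumes git filter-repo is installed; on the real repository you would run only the git filter-repo line, ideally in a fresh clone.

```shell
# Throwaway demo repository; the layout mirrors the question, the files are placeholders.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
mkdir -p src experiments reports
echo 'print("hi")' > src/main.py
echo 'a,b' > experiments/run1.csv
echo 'placeholder' > reports/plot.pdf
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "code and results"

if git filter-repo --version >/dev/null 2>&1; then
    # Remove both directories from every commit. --force is only needed here
    # because the demo repository is not a fresh clone.
    git filter-repo --force --invert-paths --path experiments/ --path reports/
    remaining=$(git ls-tree -r --name-only HEAD)
    printf 'files left in HEAD:\n%s\n' "$remaining"
else
    echo "git filter-repo is not installed; skipping the rewrite" >&2
fi
```

Keep an untouched copy of the original repository before doing this, since that copy is your local experiments archive. Note that git filter-repo removes the origin remote, so you would have to add it again and force-push the rewritten history to GitHub.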