Working with (large) Git Repositories using Git LFS or VFS for Git

Martin Tirion
Mar 23, 2021

Git is very good at keeping track of changes in text-based files like code, but it is not that good at tracking binary files. For instance, if you store a Photoshop image file (PSD) in a repository, the complete file is stored again in the history with every change. This can make the history of the Git repo very large, which makes cloning the repository more and more time-consuming.

Even if your repository doesn’t contain that many files, having a few binary files that change often can make your Git history explode. This turns a simple clone, push or pull into a very long process.

Combining this problem with a very large repository makes it even worse. And developer teams using a monorepository are not always interested in all files in the repository while working on one section of the complete project.

This article discusses two approaches to these problems: Git LFS and VFS for Git.

Git LFS

A solution for working with (large) binary files is Git LFS (Git Large File Storage). This is an extension to Git and must be installed separately. It can only be used with a repository platform that supports LFS. GitHub.com and Azure DevOps, for instance, are platforms that support LFS.

In short, a placeholder (pointer) file is stored in the repo with the information the LFS system needs to find the real file. Such a file looks something like this:

version https://git-lfs.github.com/spec/v1
oid sha256:a747cfbbef63fc0a3f5ffca332ae486ee7bf77c1d1b9b2de02e261ef97d085fe
size 4923023

The actual file is stored in a separate storage location. This way Git will track changes in this placeholder file, not the large file. The combination of using Git and Git LFS will hide this from the developer though. You will just work with the repository and files as before.
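To make the placeholder idea concrete, here is a minimal sketch that prints pointer text for a file. It only mimics the pointer layout (a version line, an `oid` line with the SHA-256 hash of the content, and a `size` line in bytes) and does not talk to Git LFS. The function name `lfs_pointer` and the demo file are made up for illustration, and the script assumes `sha256sum` from GNU coreutils is available.

```shell
#!/usr/bin/env bash
# Sketch only: build the pointer text that describes a file.
# lfs_pointer is a hypothetical helper, not part of Git LFS.
lfs_pointer() {
  local file="$1"
  local oid size
  # Hex-encoded SHA-256 of the file content (first field of sha256sum output)
  oid="$(sha256sum "$file" | cut -d' ' -f1)"
  # File size in bytes (tr strips padding some wc implementations add)
  size="$(wc -c < "$file" | tr -d ' ')"
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# A tiny stand-in for a large binary file
demo_file="$(mktemp)"
printf 'hello' > "$demo_file"
lfs_pointer "$demo_file"
```

However large the real file is, the pointer stays a few lines of text, which is why the Git history itself stays small.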

When you work with these large files yourself, you’ll still see the Git history grow on your own machine, as the versions of the large files you work with are still stored locally. But when someone else clones the repo, the history they download is actually pretty small. So it’s beneficial for others who are not working directly on the large files.

Installing and using LFS

Go to https://git-lfs.github.com and download and install the setup from there.

For every repository in which you want to use LFS, go through these steps:

  • Set up LFS for the repo:
git lfs install
  • Indicate which files have to be considered as large files (or binary files). As an example, to consider all Photoshop files to be large:
git lfs track "*.psd"

There are more fine-grained ways to select files, for example only those in a specific folder. See the Git LFS Documentation.

The track command creates a .gitattributes file which contains these settings and must be part of the repository. Make sure this file is added to the repository as well.
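As a rough illustration of what ends up in that file, the sketch below writes by hand the entry that `git lfs track "*.psd"` adds, and then asks Git which filter it would apply to a path. It runs in a throwaway repository and does not need Git LFS installed. The file names `design/banner.psd` and `README.md` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: inspect how Git resolves an LFS entry in .gitattributes.
lfs_repo="$(mktemp -d)"
git init -q "$lfs_repo"
cd "$lfs_repo" || exit 1

# The entry that `git lfs track "*.psd"` appends to .gitattributes
printf '*.psd filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

# Ask Git which filter applies to two (hypothetical) paths;
# the files do not need to exist for check-attr to answer.
psd_filter="$(git check-attr filter -- design/banner.psd)"
txt_filter="$(git check-attr filter -- README.md)"
echo "$psd_filter"   # design/banner.psd: filter: lfs
echo "$txt_filter"   # README.md: filter: unspecified
```

Running `git check-attr` like this is a quick way to confirm that a tracking pattern matches the files you expect before you commit them.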

From here on you just use the standard git commands to work in the repository. The rest will be handled by Git and Git LFS.

Problems with files over 128 MB?

Once we started using Git LFS, we encountered problems with files over 128 MB. A push with a changed binary file would hang for a long time and then fail with a 503 or 413 error. It turned out this was caused by the combination of the Git client, Azure DevOps and HTTP/2. The problem can be worked around by switching the client machine to HTTP/1.1 with this command:

git config http.version HTTP/1.1
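Without extra flags, this command writes the setting to the configuration of the repository you are currently in. The sketch below shows that in a throwaway repository and reads the value back. The `--global` variant in the comment applies the setting to all repositories of the current user. Whether you want that depends on your setup.

```shell
#!/usr/bin/env bash
# Sketch only: set http.version for one repository and verify it.
http_repo="$(mktemp -d)"
git init -q "$http_repo"

# Stored in this repository's .git/config only
git -C "$http_repo" config http.version HTTP/1.1

# Read the value back to confirm it is set
http_version="$(git -C "$http_repo" config --get http.version)"
echo "$http_version"

# To apply it to every repository for the current user instead:
# git config --global http.version HTTP/1.1
```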

VFS for Git

Imagine a large repository with multiple projects in it, for instance one project per feature: a monorepository. A developer in a specific team is working on just one feature and doesn’t need all files of the repo to do so. But with a normal Git clone, you get all files in the repo.

VFS for Git (or Virtual File System for Git) solves this problem, as it will only download what you need to your local machine. But if you look in the file system, e.g. with Windows Explorer, it will show all the folders and files including the correct file sizes.

When you use VFS for Git for a repository, a background process is started that does all the work of downloading file contents when they are needed, hidden from the user.

The Git platform must support GVFS to make this work. GitHub.com and Azure DevOps both support this out of the box.

Installing and using VFS for Git

Microsoft created VFS for Git and made it open source. It can be found at https://github.com/microsoft/VFSForGit. Currently it’s only available for Windows.

The necessary installers can be found at https://github.com/Microsoft/VFSForGit/releases

On the releases page you’ll find two important downloads:

  • Git 2.28.0.0 installer, which is a requirement for running VFS for Git. This is not the same as the standard Git for Windows install!
  • SetupGVFS installer.

Download those files and install them on your machine.

To be able to use VFS for Git for a repository, a .gitattributes file needs to be added to the repo with this line in it:

* -text
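In Git attribute terms, this line unsets the `text` attribute for every path, which tells Git not to apply line-ending conversion to any file. The sketch below checks that in a throwaway repository with plain Git, so VFS for Git is not needed to run it. The path `src/main.cs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: see how Git interprets the "* -text" entry.
vfs_repo="$(mktemp -d)"
git init -q "$vfs_repo"
cd "$vfs_repo" || exit 1

# The single line required in .gitattributes
printf '* -text\n' > .gitattributes

# Ask Git for the text attribute of a (hypothetical) path
text_attr="$(git check-attr text -- src/main.cs)"
echo "$text_attr"   # src/main.cs: text: unset
```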

Again, make sure this file is added to your repository. To clone a repository to your machine using VFS for Git, use gvfs instead of git, like so:

gvfs clone [URL] [dir]

Once this is done, you have a folder with a src subfolder that holds the contents of the repository. The extra level exists because of the practice of putting all build outputs outside of the source tree. This makes it easier to manage .gitignore files and keeps Git performant with lots of files.

For working with the repository you just use Git commands as before.

One thing to know is how to remove a VFS for Git repository from your machine. You also want to make sure the VFS process is stopped. This can be done by executing this command inside the main folder:

gvfs unmount

This will stop the process and unregister it. After that you can safely remove the folder.

Conclusion

Tools like Git LFS and VFS for Git can make your life easier with (large) binary files and/or large monorepositories. Currently, however, it is not possible to combine the two solutions, so make sure you select the option that best fits your scenario.

Martin Tirion

Senior Software Engineer at Microsoft working on Azure Services and Spatial Computing for enterprise customers around the world.