1. Background

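When cloning a remote repository with git clone, you often want only one directory or subdirectory. For example, when building on the open-source mindspore/models repository to implement a widedeep training task, you need its "official/recommend/wide_and_deep" directory locally as the code baseline for your own training and inference work.
The usual approach is to git clone the whole project. If you need to pin the code to an exact commitID, you run git reset --hard XXXX (the commitID) and then extract the target directory as your development baseline. This is fine for a handful of clones, but if you want to cover dozens of mindspore training models, do you really download the same project dozens of times? The mindspore project is not large, but with a repository over 1 GB and a slow connection, this becomes a real problem.
One might ask: why not download the repository once and take each model directory from that copy? That works if every model uses the same baseline version. If the versions differ (different commitIDs, for example mindspore r1.1, r1.2, r1.3, r1.4, r1.5, r1.6), a single copy of the repository is not enough.
This article addresses the following requirements: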

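  • Precise cloning: only one directory, or one nested subdirectory, of the remote repository is needed. This happens more than once, so time and storage costs are worth considering.
  • The downloaded directory must be version controlled; the latest code is not always what is wanted.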

2. Approach

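A web search turns up many articles of mixed quality. Some suggest tortoisesvn, which works but loses the git metadata, so version control is no longer possible. Others use git but do not cover nested subdirectories. In fact, a directory under the repository root and a nested subdirectory are handled the same way.
Git 1.7.0 introduced Sparse Checkout mode, which checks out only the specified files or folders into the working tree.
The walkthrough below checks out the wide_and_deep r1.5 training model code from the Gitee repository https://gitee.com/mindspore/models. A sample script is included at the end of the article.
Main steps: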

# create a test folder
mkdir test; cd test
# initialize a repository so the checkout stays under git version control
git init
git remote add origin https://gitee.com/mindspore/models.git
git config core.sparsecheckout true
# check out only the nested subdirectory "wide_and_deep"
echo 'official/recommend/wide_and_deep' >> .git/info/sparse-checkout
# master is used as the example branch; substitute another branch if needed
git pull origin master
# pin the code to the commitID that supports mindspore r1.5
git reset --hard 5a4ff4e3dc9bcb46dbb71b6b16fbadbb68c5e8dc

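Note: when the target directory sits directly under the repository root, write its name with a "/" on both sides, for example "/build/". This matches only the root-level directory. Without the leading "/", the pattern also matches same-named directories elsewhere in the tree and pulls in redundant files.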
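
The difference between the two pattern forms can be checked offline. The sketch below is illustrative only: it builds a throwaway repository under a temporary directory and pulls it twice, once with the unanchored pattern build/ and once with /build/. The build/, tools/build/ and docs/ paths and the sparse_pull helper are made up for the demo and are not taken from mindspore/models.

```shell
#!/bin/bash
# Illustration only: every path and file name below is invented for the demo.
set -e
demo_root=$(mktemp -d)

# Build a tiny source repository with a top-level build/ and a nested tools/build/.
git init -q "$demo_root/src"
cd "$demo_root/src"
git symbolic-ref HEAD refs/heads/master   # force the branch name used below
mkdir -p build tools/build docs
echo top > build/a.txt
echo nested > tools/build/b.txt
echo doc > docs/c.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "demo tree"

# sparse_pull PATTERN DEST: same steps as in the article, against the local repo.
sparse_pull() {
    git init -q "$2"
    git -C "$2" remote add origin "$demo_root/src"
    git -C "$2" config core.sparseCheckout true
    mkdir -p "$2/.git/info"
    echo "$1" >> "$2/.git/info/sparse-checkout"
    git -C "$2" pull -q origin master
}

sparse_pull 'build/'  "$demo_root/unanchored"
sparse_pull '/build/' "$demo_root/anchored"

# List the files that ended up in each working tree.
cd "$demo_root"
find unanchored anchored -type f -not -path '*/.git/*' | sort
```

With the unanchored pattern the listing should also include tools/build/b.txt; with /build/ only the root-level directory should appear.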

3. Session log

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp
$ mkdir test

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp
$ cd test

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test
$ git init
Initialized empty Git repository in E:/codes/tmp/test/.git/

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master)
$ git remote add  origin https://gitee.com/mindspore/models.git

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master)
$ git remote -v
origin  https://gitee.com/mindspore/models.git (fetch)
origin  https://gitee.com/mindspore/models.git (push)

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master)
$ git config core.sparsecheckout true

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ echo 'official/recommend/wide_and_deep' >> .git/info/sparse-checkout

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ cat .git/info/sparse-checkout
official/recommend/wide_and_deep

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ tree -a
.
`-- .git
    |-- HEAD
    |-- config
    |-- description
    |-- hooks
    |   |-- applypatch-msg.sample
    |   |-- commit-msg.sample
    |   |-- fsmonitor-watchman.sample
    |   |-- post-update.sample
    |   |-- pre-applypatch.sample
    |   |-- pre-commit.sample
    |   |-- pre-merge-commit.sample
    |   |-- pre-push.sample
    |   |-- pre-rebase.sample
    |   |-- pre-receive.sample
    |   |-- prepare-commit-msg.sample
    |   |-- push-to-checkout.sample
    |   `-- update.sample
    |-- info
    |   |-- exclude
    |   `-- sparse-checkout
    |-- objects
    |   |-- info
    |   `-- pack
    `-- refs
        |-- heads
        `-- tags

9 directories, 18 files

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ git pull origin master
remote: Enumerating objects: 5904, done.
remote: Counting objects: 100% (5904/5904), done.
remote: Compressing objects: 100% (2797/2797), done.
remote: Total 18385 (delta 3664), reused 4338 (delta 2996), pack-reused 12481
Receiving objects: 100% (18385/18385), 64.92 MiB | 5.93 MiB/s, done.
Resolving deltas: 100% (10353/10353), done.
From https://gitee.com/mindspore/models
 * branch              master     -> FETCH_HEAD
 * [new branch]        master     -> origin/master

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ tree -L 4
.
`-- official
    `-- recommend
        `-- wide_and_deep
            |-- README.md
            |-- README_CN.md
            |-- ascend310_infer
            |-- default_config.yaml
            |-- eval.py
            |-- export.py
            |-- mindspore_hub_conf.py
            |-- postprocess.py
            |-- preprocess.py
            |-- requirements.txt
            |-- script
            |-- src
            |-- train.py
            |-- train_and_eval.py
            |-- train_and_eval_auto_parallel.py
            |-- train_and_eval_distribute.py
            |-- train_and_eval_parameter_server_distribute.py
            `-- train_and_eval_parameter_server_standalone.py

6 directories, 15 files

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ git reset --hard 5a4ff4e3dc9bcb46dbb71b6b16fbadbb68c5e8dc
HEAD is now at 5a4ff4e3 !813 add ascend310 infer Merge pull request !813 from jkmopl/master

Administrator@DESKTOP-NRIERC2 MINGW64 /e/codes/tmp/test (master|SPARSE)
$ git log
commit 5a4ff4e3dc9bcb46dbb71b6b16fbadbb68c5e8dc (HEAD -> master)
Merge: 789f442c 48f30e33
Author: i-robot <huawei_ci_bot@163.com>
Date:   Mon Nov 22 07:53:50 2021 +0000

    !813 add ascend310 infer
    Merge pull request !813 from jkmopl/master

commit 789f442c1b2273989dbc4a4c2ce5c762ed5cff8f
Merge: 9c321da6 646c369f
Author: i-robot <huawei_ci_bot@163.com>
Date:   Mon Nov 22 01:53:51 2021 +0000

    !171 [哈尔滨工业大学威海][高校贡献][mindspore][deeplabv3plus]-310提交
    Merge pull request !171 from kzx2020/master
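
Note that in the log above, git pull still receives the objects for the whole branch (18385 objects, 64.92 MiB): sparse checkout limits what is written to the working tree, not what is transferred. If transfer size matters, Git 2.25 and later provide git clone --filter=blob:none --sparse and the git sparse-checkout command, which may avoid downloading file contents outside the selected directory when the server supports partial clone. Whether a particular host supports it is not verified here. The sketch below is a local demo under stated assumptions: the repository layout is invented, and the "server" is a throwaway local repository that is explicitly configured to accept filter requests.

```shell
#!/bin/bash
# Local demo only: the repository layout below is invented for illustration.
set -e
work=$(mktemp -d)

# Throwaway "remote" with two model directories.
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master
mkdir -p official/recommend/wide_and_deep official/cv/other_model
echo "print('wide_and_deep')" > official/recommend/wide_and_deep/train.py
echo "print('other_model')" > official/cv/other_model/train.py
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "demo tree"
# Let this local "server" honor partial-clone requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true

# Partial + sparse clone: file contents (blobs) are fetched only when needed.
git clone -q --filter=blob:none --sparse "file://$work/src" "$work/partial"
cd "$work/partial"
git sparse-checkout set official/recommend/wide_and_deep

# Files present in the working tree.
find . -type f -not -path './.git/*' | sort
# Objects that were never downloaded are printed with a leading "?".
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?' || true)
echo "objects not downloaded: $missing"
```

The history stays intact in such a clone, so git reset --hard to a specific commitID is still available; missing file contents are fetched from the remote on demand.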

4. The sparse-checkout file

See reference [1].
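
As a brief illustration of the file format: .git/info/sparse-checkout takes one pattern per line in the same syntax as .gitignore, and a line starting with "!" excludes paths that an earlier line included. The sketch below is a minimal local demo, not taken from the reference: the file names are invented (the directory layout only mimics the article's), and it uses git read-tree -mu HEAD to re-apply the patterns to a clone that was already fully checked out.

```shell
#!/bin/bash
# Illustration only: file names are invented; the layout mimics the article's example.
set -e
tmp=$(mktemp -d)

git init -q "$tmp/src"
cd "$tmp/src"
git symbolic-ref HEAD refs/heads/master
mkdir -p official/recommend/wide_and_deep/src
mkdir -p official/recommend/wide_and_deep/ascend310_infer official/cv
echo train > official/recommend/wide_and_deep/train.py
echo model > official/recommend/wide_and_deep/src/model.py
echo infer > official/recommend/wide_and_deep/ascend310_infer/main.cc
echo other > official/cv/readme.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "demo tree"

# Full local clone first, then narrow the working tree afterwards.
git clone -q "$tmp/src" "$tmp/clone"
cd "$tmp/clone"
git config core.sparseCheckout true
mkdir -p .git/info
# Include one directory, then carve a subdirectory back out with "!".
cat > .git/info/sparse-checkout <<'EOF'
/official/recommend/wide_and_deep/
!/official/recommend/wide_and_deep/ascend310_infer/
EOF
# Re-apply the patterns to the files already checked out.
git read-tree -mu HEAD

find . -type f -not -path './.git/*' | sort
```

The listing should contain only train.py and src/model.py under wide_and_deep; the ascend310_infer and official/cv files should be gone from the working tree while remaining in the repository history.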

5. Example application

Wrapper script git_clone_only_given_folder.sh:

#!/bin/bash

echo_help()
{
   echo "usage:"
   echo "  ./git_clone_only_given_folder.sh absolute_target_path download_url project_given_folder [branch]"
   echo "  # if project_given_folder is directly under the repository root, wrap it in '/', e.g. '/build/'"
   echo "  # branch defaults to master when omitted"
}

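# "${1:-}" keeps the test valid when the script is run without arguments
if [ "${1:-}" == "--help" ] || [ "${1:-}" == "-h" ];then
{
   echo_help
   exit 0
}
fi

# the first three arguments are required
if [ $# -lt 3 ];then
   echo_help
   exit 1
fi

path=$1
url=$2
folder=$3
branch=master
# the branch is the optional 4th argument
if [ $# -ge 4 ];then
    branch=$4
fi

# WARNING: this deletes anything that already exists at the target path
rm -rf "$path"
mkdir -p "$path"
cd "$path" || exit 1
git init
git remote add origin "$url"
git config core.sparsecheckout true
echo "$folder" >> .git/info/sparse-checkout
git pull origin "$branch"
cd ..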

Example invocation:

./git_clone_only_given_folder.sh '/home/test/' 'https://gitee.com/mindspore/models.git' 'official/recommend/wide_and_deep' master
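
The wrapper script stops at git pull and does not pin a commitID. A minimal sketch of how the pinning step from section 2 could be folded into a reusable function is shown below. The function name sparse_clone_at, its argument order, and the models/a and models/b layout are hypothetical; the demo runs against a throwaway local repository with two commits so that it needs no network access.

```shell
#!/bin/bash
# Illustration only: sparse_clone_at and the models/a, models/b layout are hypothetical.
set -e
root=$(mktemp -d)

# Throwaway source repository with two commits touching models/a.
git init -q "$root/src"
cd "$root/src"
git symbolic-ref HEAD refs/heads/master
mkdir -p models/a models/b
echo v1 > models/a/ver.txt
echo v1 > models/b/ver.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"
first_commit=$(git rev-parse HEAD)
echo v2 > models/a/ver.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "v2"
second_commit=$(git rev-parse HEAD)

# sparse_clone_at URL FOLDER BRANCH COMMIT DEST
# Sparse-checks out FOLDER from BRANCH into DEST, then pins it to COMMIT.
sparse_clone_at() {
    git init -q "$5"
    git -C "$5" remote add origin "$1"
    git -C "$5" config core.sparseCheckout true
    mkdir -p "$5/.git/info"
    echo "$2" >> "$5/.git/info/sparse-checkout"
    git -C "$5" pull -q origin "$3"
    git -C "$5" reset -q --hard "$4"
}

# Same directory, two different baselines.
sparse_clone_at "$root/src" '/models/a/' master "$first_commit"  "$root/a_v1"
sparse_clone_at "$root/src" '/models/a/' master "$second_commit" "$root/a_v2"

cat "$root/a_v1/models/a/ver.txt"
cat "$root/a_v2/models/a/ver.txt"
```

Each destination keeps its own .git directory, so every checkout can later be moved to another commitID independently of the others.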

6. References

[1] yanlong107, git sparse checkout (稀疏检出), https://www.jianshu.com/p/680f2c6c84de
