Import Book

One-time SVN repository import with SubGit

Chapter 1. Import configuration overview.

There are two approaches to importing an SVN repository:

  • the subgit import command alone, which lets you set all the import parameters and run the import with a single command;
  • subgit import preceded by subgit configure, which creates a SubGit configuration file where all the import parameters can be tuned.

Both approaches achieve the same result, but the second one is handier: it allows more precise tuning of the import parameters and lets you edit the configuration in a regular text editor, whereas the first approach requires every parameter to be set directly on the command line.

Chapter 2. Import by subgit import command.

As its name implies, SubGit’s import command imports an SVN repository. It allows all the import parameters to be set directly on the command line and has the following syntax:

subgit import [--default-domain DOMAIN] [--minimal-revision REV] [--authors-file FILE] [--trunk PATH] [--branches PATH] [--tags PATH] [--username SVN_USERNAME] [--password SVN_PASSWORD] [--non-interactive] [--trust-server-cert] [--private-key SVN_PRIVATE_KEY_PATH] [--private-key-passphrase SVN_PRIVATE_KEY_PASSPHRASE] --svn-url URL REPOS_PATH

The import command has a number of options; let’s discuss them more closely, starting with the last two in the list:

--svn-url URL

The URL of the SVN project, such as http://example.com/svn/repository/project or ssh://svn/repository/project. Note that the Git repositories SubGit creates correspond to SVN projects, not to SVN repositories. In other words, SubGit translates every single SVN project into a separate Git repository.

REPOS_PATH

The path to the local directory for the Git repository; the directory will be created if it does not exist.

--default-domain DOMAIN

This option sets the default domain used to generate a Git committer (or author) name from an SVN committer name when there’s no corresponding rule in the authors file. Say you have an SVN user johndoe and you set the default domain to example.com; if the authors file has no suitable Git user for johndoe, then johndoe <johndoe@example.com> will be used as the committer name for that SVN user.

--minimal-revision REV

This option sets the minimal SVN repository revision the import should start from. For example, if your SVN repository is at revision 100 and you don’t need revisions older than, say, 50, you can set this option to 50 and that revision will become the initial commit of the Git repository.

--authors-file FILE

The path to the authors mapping file (or an executable helper program) that maps SVN committer names to Git committers. The authors file is a simple text file with SVN-to-Git author mappings in the following format:

svn_user_name = Git Name <name@email.com>

That is, if you have an SVN user johndoe and you want that user to be mapped to the Git author John Doe <johndoe@example.com>, you can do it as follows:

johndoe = John Doe <johndoe@example.com>

-T [--trunk] PATH

The path to the SVN project trunk directory (or the one that plays an equivalent role), relative to the project root specified by the --svn-url option. If your SVN project follows the recommended SVN layout (that is, it has trunk, branches and tags directories), this path can be as simple as trunk; if you have your own layout, set the path to the directory that acts as trunk, e.g. main or even main/trunk. This relative PATH is appended to the URL set by the --svn-url option: say you’ve set http://example.com/svn as the SVN URL and main/trunk as the trunk path; then the resulting path to your trunk directory will be http://example.com/svn/main/trunk.

-b [--branches] PATH

Similarly to the trunk option, this option sets the relative path to the SVN project branches directory.

-t [--tags] PATH

Again, similarly to the trunk option, this option sets the relative path to the SVN project tags directory.

--username SVN_USERNAME

The username used to access the SVN repository. If the SVN repository allows anonymous access, this option can be omitted; otherwise, if no username is specified, SubGit will ask for it later.

--password SVN_PASSWORD

The SVN repository password; SubGit may ask for it later if no password is set.

--non-interactive

Suppress interactive prompts.

--trust-server-cert

Accept unknown SSL server certificates without additional confirmation (only with --non-interactive).

--private-key SVN_PRIVATE_KEY_PATH

The path to the private key. This option is required when the SVN server uses SSH keys to authorize users.

--private-key-passphrase SVN_PRIVATE_KEY_PASSPHRASE

The passphrase for the private key file used to access the Subversion repository; if no passphrase is specified, SubGit may ask for it later.

To illustrate how subgit import works, assume you have an SVN repository that follows the recommended structure, but whose directory names differ from those the SVN team recommends:

main/
features/
snapshots/

Here the main directory contains the “main line” of development; features contains divergent copies of development lines; and snapshots contains stable snapshots of particular lines of development. You want to import that SVN project silently (with no prompts) into the my_project directory in your home directory; you have prepared an authors file in your home directory and you access the SVN repository over HTTPS. In that case you can use the following command to perform the import:

subgit import --default-domain example.com --authors-file $HOME/authors_file --trunk main --branches features --tags snapshots --username <svn user name> --password <svn user password> --non-interactive --trust-server-cert --svn-url https://example.com/svn/repository/project $HOME/my_project

Here <svn user name> and <svn user password> are the actual SVN user name and password needed to access the SVN repository, https://example.com/svn/repository/project is your SVN project URL, and $HOME/my_project is the directory where the newly created Git repository will live.
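If the import is driven from a script, the command line above can be assembled programmatically. The following is a minimal Python sketch, not part of SubGit: the function name build_import_command and its keyword arguments are hypothetical, and only flags listed in the syntax above are used. It builds the argument list without running anything, and it deliberately leaves out the password so that it does not end up in a script.

```python
import shlex


def build_import_command(svn_url, repo_path, *, trunk=None, branches=None,
                         tags=None, authors_file=None, default_domain=None,
                         non_interactive=False):
    """Assemble a 'subgit import' argument list (hypothetical helper)."""
    cmd = ["subgit", "import"]
    optional = [
        ("--default-domain", default_domain),
        ("--authors-file", authors_file),
        ("--trunk", trunk),
        ("--branches", branches),
        ("--tags", tags),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    if non_interactive:
        cmd.append("--non-interactive")
    # --svn-url and the repository path come last, as in the syntax line.
    cmd += ["--svn-url", svn_url, repo_path]
    return cmd


cmd = build_import_command(
    "https://example.com/svn/repository/project",
    "/home/user/my_project",          # assumed home directory
    trunk="main", branches="features", tags="snapshots",
    authors_file="/home/user/authors_file",
    default_domain="example.com",
    non_interactive=True,
)
print(shlex.join(cmd))  # shell-quoted form, for review before running
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a machine where subgit is installed.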

subgit import is a fairly straightforward command and works perfectly well when your SVN project sticks to the recommended structure; however, it doesn’t provide much flexibility when the structure is more complex, and that is where the second approach helps.

Chapter 3. Import with preliminary configuration tuning.

3.1 Creating new Git repository and SubGit configuration file.

The subgit import command requires all the parameters to be set on the command line; under the hood, however, it creates a configuration file, fills it in according to the given parameters and performs the import using that configuration file. In fact, subgit import relies on the configuration file rather than on the command-line options; it is therefore possible to create the configuration file beforehand and then run subgit import with no command-line options. This is the second import approach.

This approach assumes the configuration file is created before subgit import is invoked. That can be done with the subgit configure command:

subgit configure SVN_PROJECT_URL [GIT_REPO_PATH]

As its name implies, the SVN_PROJECT_URL argument is a URL that points to a single project in an SVN repository, such as http://example.com/svn/project or ssh://svn/repository/project.

This command will not connect to the SVN repository; instead, it creates an empty Git repository into which the data from SVN will be imported. The GIT_REPO_PATH argument is the file system path where the new Git repository directory is created. It is optional: if it is omitted, subgit configure creates a <project name>.git repository directory in the directory where the command runs. For example, if you pass only the URL http://example.com/svn/some_project to subgit configure, SubGit creates a new some_project.git Git repository in the current directory; if you pass http://example.com/svn/some_project as SVN_PROJECT_URL and /path/to/new/git/repo as GIT_REPO_PATH, SubGit creates the new Git repository at /path/to/new/git/repo.
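The default naming rule can be illustrated with a few lines of Python. This is only a sketch of the rule as described above (last URL path component plus .git); the function name default_repo_dir is hypothetical, and SubGit’s own handling of unusual URLs may differ.

```python
import posixpath
from urllib.parse import urlparse


def default_repo_dir(svn_project_url):
    """Guess the '<project name>.git' directory name for a project URL."""
    # Take the last component of the URL path, ignoring a trailing slash.
    path = urlparse(svn_project_url).path.rstrip("/")
    return posixpath.basename(path) + ".git"


print(default_repo_dir("http://example.com/svn/some_project"))
```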

The newly created Git repository has the following structure:

[some_project.git]# ls -l
total 8
drwxr-xr-x. 2 root root   6 mar 20 19:15 branches
-rw-r--r--. 1 root root 140 mar 20 19:15 config
-rw-r--r--. 1 root root  23 mar 20 19:15 HEAD
drwxr-xr-x. 2 root root   6 mar 20 19:15 hooks
drwxr-xr-x. 3 root root  18 mar 20 19:15 logs
drwxr-xr-x. 4 root root  30 mar 20 19:15 objects
drwxr-xr-x. 4 root root  31 mar 20 19:15 refs
drwxr-xr-x. 5 root root 105 mar 20 19:16 subgit

Apart from the common Git files, the subgit configure command creates an additional subgit directory for SubGit’s own needs. The SubGit configuration file, called config, resides in this directory:

[some_project.git/subgit]# ls -l config
-rw-r--r--. 1 root root 10145 mar 20 19:16 config

That’s where SubGit reads its configuration settings from; the file is described in detail in chapter 3.2 below. Before proceeding to the configuration file, however, we should take a closer look at the subgit configure command, as it has more useful options, namely:

subgit configure [--minimal-revision REV] [--shared-daemon SHARED_DAEMON_PATH] [--layout MODE] [--trunk PATH] SVN_PROJECT_URL [GIT_REPO_PATH]

The first option,

--minimal-revision REV

does exactly the same as the corresponding subgit import option: it sets the minimal SVN repository revision the import should start from.

--shared-daemon SHARED_DAEMON_PATH

specifies the location of the shared polling process in which to register this repository. It is not relevant for import, so we won’t describe it further. The next two options, however, are rather interesting and useful:

--layout MODE
--trunk PATH

The layout option tells SubGit how the SVN project layout is organized or, more precisely, how SubGit should treat the SVN project layout. The option can take three possible values:

directory - this value makes SubGit treat SVN_PROJECT_URL as a single branch and translate only that particular branch. In fact, it creates the trunk = :refs/heads/master setting in the configuration file, which tells SubGit to import the SVN_PROJECT_URL directory into the Git master branch.

std - this is the default value; it assumes the standard trunk/branches/tags project layout. If subgit configure runs with the layout option set to std or without the layout option at all, SubGit assumes the standard SVN layout is used and thus creates the standard mapping scheme in the configuration file:

trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*

These are the SVN-to-Git mapping settings in the configuration file; they’re described in detail below in chapter 3.2.

auto - this is a special value that differs notably from the previous two. It requires another option, --trunk, to be set, and it is the only value that makes subgit configure connect to the SVN project. The trunk option:

--trunk PATH

tells SubGit the relative path to the directory that plays the role of trunk, or “main line” of development. Based on this PATH, SubGit tries to find out where branches and tags reside in your SVN project. Let’s assume you have an SVN repository that follows the recommended structure, but whose directory names differ from those the SVN team recommends:

main/
features/
snapshots/

Here the main directory contains the “main line” of development; features contains divergent copies of development lines; and snapshots contains stable snapshots of particular lines of development. You want to create the SubGit configuration file using the --layout auto option, so you run the following command:

subgit configure --layout auto --trunk main https://example.com/svn/some_repo/some_project ./Git_repo

Given that command, SubGit accesses the SVN repository (asking you for credentials if needed) and creates the mapping based on the SVN repository metadata and directory structure. The resulting mapping settings in the configuration file may look like this:

trunk = main:refs/heads/master
branches = features/*:refs/heads/*
tags = snapshots/*:refs/tags/*
shelves = shelves/*:refs/shelves/*

but the actual result depends on your SVN repository configuration and layout. In some cases the layout may be interpreted like this:

trunk = main:refs/heads/master
branches = snapshots/*:refs/heads/*
branches = features/*:refs/heads/features/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*

Of course, these are just settings in the configuration file, so you can change them as needed if the --layout auto output is not completely correct. The following chapter 3.2 describes how to tune your configuration.

3.2 Tuning SubGit configuration

There are four main sections in the configuration file:

  • core, which sets common options;
  • svn, which groups the SVN-to-Git mapping options;
  • auth, for SVN authentication options;
  • daemon, which lists options specific to the background translation process.

Not all configuration options are meaningful for import; some, like those in daemon, affect the mirroring process only, so we exclude them from the discussion here.

The file starts with the core options, and the first one is

shared = [false|true]

As shown, it can be set to either false or true. When core.shared is set to true, all files are made group-writable; it must be set to true when more than one operating system account accesses files in the repository, and all those accounts must share the same primary group. This option actually sets the core.sharedRepository option in the configuration file of the newly created Git repository.

The shared option doesn’t make much sense for import; it’s mainly intended for subgit install, where a bidirectional SVN-to-Git mirror is created, so it can be left untouched (false by default) for import.

The next setting defines where SubGit’s logs are saved:

logs = PATH_TO_LOGS

PATH_TO_LOGS is the file system path to the directory where the logs are saved. The path can be either absolute or relative. An absolute path is straightforward: you just set the full path to the desired directory. A relative path is resolved against the repository path: say the repository has been created at /path/to/repo and you want the logs to be saved in the /path/to/repo/subgit/logs directory. In that case you can just set PATH_TO_LOGS to subgit/logs, which is in fact the default setting. Or say you want to store the logs at /path/to/logs but don’t want to set an absolute path. In that case you can set PATH_TO_LOGS to ../logs; that relative path is equal to /path/to/repo/../logs which, in turn, is equivalent to /path/to/logs. In both cases, the account SubGit runs under must have write access to the directory.
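The path arithmetic described above can be checked with a short Python sketch. It only mirrors the resolution rule stated in this chapter (relative paths are resolved against the repository path); the function name resolve_logs_dir is hypothetical and this is not SubGit’s own code.

```python
import posixpath


def resolve_logs_dir(repo_path, logs_setting):
    """Resolve the logs setting against the repository path."""
    if posixpath.isabs(logs_setting):
        # Absolute paths are used as they are.
        return posixpath.normpath(logs_setting)
    # Relative paths are joined to the repository path and normalized.
    return posixpath.normpath(posixpath.join(repo_path, logs_setting))


print(resolve_logs_dir("/path/to/repo", "subgit/logs"))
print(resolve_logs_dir("/path/to/repo", "../logs"))
```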

The next setting is the authors mapping file or helper:

authorsFile = PATH_TO_AUTHOR_FILE

Again, the path can be either absolute or relative, and it points either to the authors mapping file or to an executable authors mapping helper program. As described earlier, the authors file is a simple text file with SVN-to-Git author mappings in the following format:

svn_user_name = Git Name <name@email.com>

That is, if you have an SVN user johndoe and you want that user to be mapped to the Git author John Doe <johndoe@example.com>, you can do it as follows:

johndoe = John Doe <johndoe@example.com>

Alternatively, the path can point to an executable authors mapping helper program. That program is expected to read data from the standard input and write its output to the standard output. The input/output format is:

for Git to Subversion mapping:
       INPUT:
        FULL AUTHOR NAME
        AUTHOR EMAIL
       OUTPUT:
        SUBVERSION USER NAME
for Subversion to Git mapping:
       INPUT:
        SUBVERSION USER NAME
       OUTPUT:
        FULL AUTHOR NAME
        AUTHOR EMAIL

A sample authors mapping script can be found in the subgit/samples directory.
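To make the helper contract more concrete, here is a minimal Python sketch of the Subversion-to-Git direction only. It assumes that the user name arrives as a single line on the standard input and that the full name and email are written on separate lines, as the layout of the format above suggests; the lookup table KNOWN_AUTHORS, the fallback domain and the function names are hypothetical. It is not the sample script shipped with SubGit.

```python
import io

# Hypothetical lookup table; a real helper might query LDAP or a database.
KNOWN_AUTHORS = {"johndoe": ("John Doe", "johndoe@example.com")}


def svn_to_git(svn_user, fallback_domain="example.com"):
    """Return (full name, email) for a Subversion user name."""
    fallback = (svn_user, f"{svn_user}@{fallback_domain}")
    return KNOWN_AUTHORS.get(svn_user, fallback)


def run_helper(stdin, stdout):
    """Read one Subversion user name, write full name and email."""
    svn_user = stdin.readline().strip()
    name, email = svn_to_git(svn_user)
    stdout.write(f"{name}\n{email}\n")


# Demonstration with in-memory streams; a real helper script would call
# run_helper(sys.stdin, sys.stdout) instead.
out = io.StringIO()
run_helper(io.StringIO("johndoe\n"), out)
print(out.getvalue(), end="")
```

The Git-to-Subversion direction would read two lines (name and email) and print one; how a helper tells the two directions apart is not covered here.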

The next setting is default domain:

defaultDomain = DOMAIN

Like the subgit import --default-domain option described above, this option sets the default domain used to generate a Git committer name from an SVN committer name when there’s no corresponding rule in the authors file. If you have an SVN user johndoe and you set the default domain to example.com, then johndoe <johndoe@example.com> will be used as the committer name for that SVN user, provided there’s no explicit mapping for that user in the authors mapping file.
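The interplay between the authors file and the default domain can be sketched in Python as follows. This is an illustration of the lookup order described above, not SubGit’s implementation; the function names and the user janeroe are hypothetical.

```python
def parse_authors(text):
    """Parse 'svn_user = Git Name <email>' lines into a dict."""
    authors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        svn_user, git_author = line.split("=", 1)
        authors[svn_user.strip()] = git_author.strip()
    return authors


def git_author(svn_user, authors, default_domain):
    """Use the explicit mapping if present, else the default domain."""
    if svn_user in authors:
        return authors[svn_user]
    return f"{svn_user} <{svn_user}@{default_domain}>"


authors = parse_authors("johndoe = John Doe <johndoe@example.com>\n")
print(git_author("johndoe", authors, "example.com"))  # explicit mapping
print(git_author("janeroe", authors, "example.com"))  # default-domain fallback
```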

The last setting in the core section is

pathEncoding = ENCODING

which sets the encoding used to store paths in Git tree objects. The default is UTF-8.

The next configuration file section is svn, which sets the SVN-to-Git mapping options; the very first setting here is the SVN project URL:

url = SVN_PROJECT_URL

This option sets the URL of the SVN project that will be imported into the Git repository. Note that SVN_PROJECT_URL is set to the very SVN_PROJECT_URL you passed to the subgit configure command earlier.

The next four options (trunk, branches, tags and shelves) map SVN entities to Git entities. Subversion recommends the following layout for projects in an SVN repository:

trunk/
branches/
tags/

You can use your own layout, but these entities are crucial: the “main line” of development, or trunk; branches, which are divergent copies of development lines; and tags, which are named, stable snapshots of a particular line of development. These four options tell SubGit where trunk, branches, tags and shelves (special branches SubGit creates in the Subversion repository to mirror Git anonymous branches) reside in the SVN project. The generic mapping syntax is:

<Subversion-Path-Pattern>:<Git-Reference-Pattern>

Subversion-Path-Pattern is an SVN directory path relative to the SVN project URL, SVN_PROJECT_URL. That is, if you’ve set SVN_PROJECT_URL to http://example.com/svn/some_project and you have the recommended SVN project layout, you can use trunk, branches and tags as Subversion-Path-Pattern; these are equivalent to the full paths http://example.com/svn/some_project/trunk, http://example.com/svn/some_project/branches and http://example.com/svn/some_project/tags respectively.

The mapping settings are rather flexible, but there are a few rules you must follow:

  • trunk must be mapped; the only exception is a configuration with no mapping defined at all (no trunk, branches, tags or shelves options), which may be used to import SVN projects with no branches.
  • there must be only one trunk mapping.
  • branches, tags and shelves are optional and may be omitted, individually or all together.
  • you can set as many branches and tags mappings as you need.
  • each mapping must be unique - that is, you can’t map different SVN branch paths to the same Git reference.
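The rules above lend themselves to a quick sanity check before running the import. The following Python sketch checks only the trunk and uniqueness rules on a list of (option, value) pairs; the function name check_mappings and the message texts are hypothetical, and SubGit performs its own validation independently of this.

```python
def check_mappings(mappings):
    """Check (option, 'svn/path:git/ref') pairs against the rules above."""
    problems = []
    trunks = [value for option, value in mappings if option == "trunk"]
    # An entirely empty mapping is allowed; otherwise trunk is required.
    if mappings and not trunks:
        problems.append("trunk must be mapped")
    if len(trunks) > 1:
        problems.append("only one trunk mapping is allowed")
    seen_refs = set()
    for option, value in mappings:
        _svn_path, _, git_ref = value.partition(":")
        if git_ref in seen_refs:
            problems.append(f"duplicate Git reference: {git_ref}")
        seen_refs.add(git_ref)
    return problems


good = [("trunk", "trunk:refs/heads/master"),
        ("branches", "branches/1.0.x:refs/heads/1.0.x")]
bad = [("branches", "a:refs/heads/x"),
       ("branches", "b:refs/heads/x")]
print(check_mappings(good))
print(check_mappings(bad))
```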

There are two approaches to setting the mappings: explicit mapping and wildcard syntax. The first one, as its name implies, requires all the paths to be set explicitly, e.g.:

trunk = trunk:refs/heads/master
branches = branches/1.0.x:refs/heads/1.0.x
tags = tags/1.0.1:refs/tags/1.0.1

This approach is useful if you want to import just some of the branches and tags, as in the example above; it can also be useful if you have a non-standard SVN project layout. Say you have the following project layout:

/trunk
/branch1
/branch2
/branch3
/tag1
/tag2
/tag3

then you can set mapping like this:

trunk = trunk:refs/heads/master
branches = branch1:refs/heads/branch1
branches = branch2:refs/heads/new_feature_branch
tags = tag1:refs/tags/tag1
tags = tag2:refs/tags/v.1.0.2
tags = tag3:refs/tags/v.2.0.rc2

You can omit some branches or tags if you don’t need them to be imported (in this example we do not want to import branch3, so we haven’t set a mapping for it), and you can set new names for branches and tags, as is done in the example.

Another mapping approach is to use wildcard syntax. This approach maps a set of SVN branch or tag directories to suitable Git references. For example, if you have the standard SVN project layout

trunk/
branches/
tags/

and you want to import all the branches and tags you have, then the mapping settings may be as simple as:

trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*

Wildcard mapping syntax brings additional rules to follow:

  • Each mapping may include zero or more ‘*’ on both sides of the definition (the sides are separated by ‘:’).
  • The number of asterisks (‘*’) on the left side of the mapping must equal the number of asterisks on the right side.
  • There must be only one asterisk within a single path section - that is, only one ‘*’ between two slashes. More than one asterisk within a section may cause ambiguity in the SVN-to-Git mirror and hence is not supported.
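The wildcard rules above can be expressed as a small checker. This Python sketch covers only the asterisk-count and one-asterisk-per-section rules; the function name check_wildcards and the message texts are hypothetical, and it is not a substitute for SubGit’s own validation.

```python
def check_wildcards(mapping):
    """Check one 'svn/pattern:git/pattern' mapping against the rules."""
    svn_pattern, _, git_pattern = mapping.partition(":")
    problems = []
    # Both sides must contain the same number of asterisks.
    if svn_pattern.count("*") != git_pattern.count("*"):
        problems.append("asterisk count differs between the two sides")
    # No path section (text between slashes) may hold more than one '*'.
    for side in (svn_pattern, git_pattern):
        if any(section.count("*") > 1 for section in side.split("/")):
            problems.append(f"more than one asterisk in a section of {side}")
    return problems


print(check_wildcards("branches/*:refs/heads/*"))
print(check_wildcards("branches/*/*:refs/heads/*"))
```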

An asterisk represents any number of any characters in a path, and hence it can represent a set of directories:

branches = branches/*:refs/heads/*
branches = branches/*/*:refs/heads/*/*
branches = branches/*/project:refs/heads/*

or it can represent a certain name pattern in the SVN directory name, in the Git reference, or in both:

branches = branches/feature_*_2015:refs/heads/features/*
branches = branches/feature_*_2015:refs/heads/2015/features_*
branches = branches/features/*:refs/heads/feature_*
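To see what such a mapping does to a concrete branch, the following Python sketch applies a wildcard mapping to an SVN path and returns the resulting Git reference. It assumes that each asterisk matches one or more characters within a single path section and that captured parts are substituted left to right; both are simplifying assumptions for illustration, the function name translate is hypothetical, and the branch name feature_login_2015 is made up.

```python
import re


def translate(mapping, svn_path):
    """Apply 'svn/pattern:git/pattern' to svn_path; None if no match."""
    svn_pattern, _, git_pattern = mapping.partition(":")
    # Turn each '*' into a capture group that does not cross a slash
    # (assumption), and escape everything else literally.
    regex = "".join(
        "([^/]+)" if part == "*" else re.escape(part)
        for part in re.split(r"(\*)", svn_pattern)
    )
    match = re.fullmatch(regex, svn_path)
    if match is None:
        return None
    captured = iter(match.groups())
    # Replace asterisks on the Git side with the captured parts, in order.
    return re.sub(r"\*", lambda _m: next(captured), git_pattern)


print(translate("branches/feature_*_2015:refs/heads/features/*",
                "branches/feature_login_2015"))
print(translate("branches/*/project:refs/heads/*", "branches/1.0/project"))
```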

As an example, say you have the non-standard SVN project layout shown earlier, with many branches and tags:

/trunk
/branch1
/branch2
.
.
/branchN
/tag1
/tag2
.
.
/tagN

You have many branches and tags and you want to import all (or most) of them; setting an explicit mapping for each would be tedious, so you want to use wildcard syntax. If the branch and tag names follow a certain pattern (which is obviously the case here), the mapping can be set as follows:

trunk = trunk:refs/heads/master
branches = branch*:refs/heads/branch*
tags = tag*:refs/tags/tag*

or

trunk = trunk:refs/heads/master
branches = branch*:refs/heads/branches/*
tags = tag*:refs/tags/*

Additionally, if you do not want to import some of the branches, you can exclude them using the option

excludeBranches = SIMPLE_PATTERN

SIMPLE_PATTERN is a path-matching pattern with zero or one ‘*’. During the import SubGit evaluates every path against this pattern and excludes any branch matching the given SIMPLE_PATTERN; in other words, SubGit appends SIMPLE_PATTERN to the svn.url path and excludes any branches that match the resulting pattern. As an example, let’s say you have the standard SVN project layout and you want to import all the branches except those related to some feature, which follow a certain name pattern. In that case your mapping may look like:

trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
excludeBranches = branches/feature_*

assuming that the branches you don’t want to import follow the feature_* name pattern.
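The effect of such a pattern can be previewed with a few lines of Python. The sketch below implements a pattern with zero or one asterisk as a plain prefix and suffix match; whether SubGit lets the asterisk span a slash is not stated here, so treat that detail as an assumption. The function name is_excluded and the branch names are hypothetical.

```python
def is_excluded(simple_pattern, branch_path):
    """Match a path against a pattern containing zero or one '*'."""
    if "*" not in simple_pattern:
        return branch_path == simple_pattern
    prefix, suffix = simple_pattern.split("*", 1)
    # The path must be long enough that prefix and suffix do not overlap.
    return (len(branch_path) >= len(prefix) + len(suffix)
            and branch_path.startswith(prefix)
            and branch_path.endswith(suffix))


print(is_excluded("branches/feature_*", "branches/feature_login"))
print(is_excluded("branches/feature_*", "branches/1.0.x"))
```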

Finally, it is possible to exclude or include particular files in the branches or tags directories using the following two options:

excludePath = PATTERN
includePath = PATTERN

PATTERN is an expression representing the file names to be excluded or included; the PATTERN format is described in detail in the mapping article.

Suppose you want to import all the branches, but do not want *.iso and *.exe files to be imported into the Git repository. In that case you can exclude them:

excludePath = *.iso
excludePath = *.exe

Given these options, SubGit won’t import any ‘.iso’ or ‘.exe’ files from the branches and tags directories and their subdirectories.

When branches, tags or paths are excluded from the translation with one of the above options, modifications made to those branches or paths in the Subversion project are not translated to Git, and changes made to those locations in the Git repository do not appear in Subversion.

The next options that matter for import are the authentication options. You can omit them if your SVN server allows anonymous access; if it doesn’t, you have to provide valid credentials or keys.

The first setting is auth, which is a list of authentication section IDs. SubGit creates one authentication section by default:

auth = default

and that should be enough for import, since you only need to access the SVN server once.

The authentication section itself starts with the [auth "SECTION_NAME"] directive; for the default section it is

[auth "default"]

There are a few options that can be set in the authentication section:

  • passwords - the relative path to the passwords file containing user/password pairs.
  • credentialHelper - the path to the credential helper program and its optional arguments; the program is expected to be non-interactive (no prompts) and to use the Git credential helper input/output format (refer to the git-credentials document for more details). A sample credential helper script can be found in the subgit/samples directory.
  • userName - the user name used to log into the Subversion repository.
  • password - the password to use with the explicitly specified user name.
  • sshKeyFile - for the svn+ssh protocol, the SSH private key to use; the OpenSSH format is supported.
  • sshKeyFilePassphrase - the passphrase used to read the SSH private key.
  • sslClientCertFile - for the SSL protocol, the SSL client certificate to use; the PKCS#12 format is supported.
  • sslClientCertPassphrase - the passphrase used to read the SSL client certificate.
  • subversionConfigurationDirectory - the path to the Subversion configuration directory, or ‘@default@’ to use the current user’s Subversion configuration and credentials cache.

There’s a little trick in case you don’t want to expose your password: leaving all the default settings untouched, you can set the userName option with no corresponding password, and you’ll be asked for the password when the import process starts.

All the remaining options in the configuration file have no meaning for import and can be left untouched.

3.3 Perform the import.

We are now ready to perform the import. To do so, run the following command:

subgit import GIT_REPO_PATH

Here GIT_REPO_PATH is the full path to the Git repository created in the previous step.

Given that command, SubGit accesses the SVN project and translates it into the Git repository based on the configuration you’ve set. After the import process finishes, you have a ready-to-use Git repository.

However, that’s not the end of the story. You can also run subgit install using the same configuration and Git repository, and thus start mirroring the SVN project to the Git repository and vice versa. It may require some additional settings, such as the authentication or daemon settings we left untouched, but it is possible to start mirroring instead of importing if you change your mind. subgit install and SVN-to-Git mirroring are described in detail in the Remote SubGit book.
