MODELLING AND SIMULATION, WEB ENGINEERING, USER INTERFACES
May 7th, 2009

Fun with Git

I met a bunch of smart people at the Montreal Ubuntu 9.04 release party, and I talked to one person a lot about the distributed SCM Git. I had heard of distributed SCMs but had never had any reason to try anything other than SVN. Still, I was intrigued, and so decided to look into it. Here are some useful links:

For an overview of what Git is, see Linus Torvalds’s tech talk at Google. For an overview of how to use Git, see Randal Schwartz’s talk at Google. For an overview of how I have so far successfully used it in my GSoC project, read on.

My goal was to use Git to solve a particular problem, which was as follows:

There is an SVN repository A, and a CVS repository B. The code in B was probably forked at some point from the code in A. B was iterated on in a private sandbox before being put into CVS, so B has no history, just HEAD. A has also changed since the time it was forked. I need to take the code in B and compare it to each of the major release tags in A, so as to determine which tag is the most similar to B, and therefore which tag B was most likely to have been forked from. This basically means diffing two different filesystem trees. Because Git is supposed to be very good at merging different branches based on content, I assumed it would be able to give me some very intelligent diff output.

In this context, A is the GWT SVN repo, and B is a project in Eclipse’s CVS repo. The Eclipse project org.eclipse.swt.e4.jcl appears to be forked from GWT, so I needed to determine when it was forked, what changes were made after, and (hopefully) why.

The person I met at the Ubuntu party, Derek, wrote me excellent instructions on how to do this, which I have gratuitously copied and pasted here.



Here are the general steps:



- Use "git svn clone" to clone the Subversion repository A.

- Check out the CVS HEAD of repository B into a separate workspace.

- Create a new branch in the git-svn repository. You can branch starting from any tag or commit, but the closer the branch origin is to the original fork tag, the fewer differences you will likely encounter. So, try to guess the fork tag correctly and branch from that guess.

- Overwrite the workspace associated with the new branch with all the files in the CVS HEAD workspace. Stage and then commit the changes into the Git repository.

- For every Subversion "tag" branch in Git (git-svn imports every Subversion tag as a Git branch, not as a Git tag), display the differences between the branch that contains CVS HEAD and the Git Subversion "tag" branch. Note the one that gives you the fewest differences.



Specific commands (untested):



# Clone the Subversion repository A. Option "--stdlayout" tells Git
# that the repository has the standard branches, tags, trunk layout
git svn clone --stdlayout http://domain.com/repositoryA



# Create branch "cvsbranch" starting from a tag
# (the new branch name comes first, then the start point)
git branch cvsbranch subversiontag



# Switch to branch "cvsbranch"

git checkout cvsbranch



# Overwrite files in Git "cvsbranch" workspace with those from CVS
# HEAD workspace. Make sure that you don't copy the CVS metadata
# directory in each workspace directory. Consider using rsync, which can
# ignore CVS and Subversion metadata directories.



# Stage all CVS changes

git add --all



# Commit CVS changes

git commit --message "Added changes from CVS repository B"



# List all Git Subversion tag branches (git-svn imports them as
# remote branches, so "-r" is needed to see them)
git branch -r | grep tags



# Compare each Git Subversion tag branch with branch "cvsbranch". Option
# "--name-only" displays only the names of the files that changed:
for tag in `git branch -r | grep tags`; do git diff --name-only $tag cvsbranch; done



# Note the tag that produced the fewest differing files.
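The overwrite step above is only described in a comment, so here is a minimal sketch of one way to do it with rsync. The function name copy_cvs_head and both paths are hypothetical, and the explicit excludes are my own choice rather than anything Derek specified.

```shell
# Hypothetical helper: copy a CVS checkout over a Git working tree,
# skipping version-control metadata directories.
copy_cvs_head() {
  src=$1   # CVS HEAD workspace (assumed path)
  dest=$2  # Git working tree with "cvsbranch" checked out (assumed path)
  # The trailing slash on "$src"/ copies the directory's contents,
  # not the directory itself. Each --exclude pattern ends in "/" so
  # that it only matches directories, at any depth.
  rsync -a --exclude=CVS/ --exclude=.svn/ --exclude=.git/ "$src"/ "$dest"/
}
```

It would be called as something like "copy_cvs_head ~/cvs-workspace/projectB ." from inside the Git working tree. Note that without rsync's --delete option, files that exist in the branch but not in CVS are left in place, so files removed in B would not show up as deletions in the later diff.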



While this mostly worked great, I ran into one gotcha using Git from the Ubuntu 8.04 repo (git version 1.5.4.3), which was that git-svn did not capture all of the tags from svn. When I tried it with git from the 8.10 repo (git version 1.5.6.something), it worked fine. Really, I should be git-cloning the latest git, but I haven’t quite figured out how to do that successfully. Nevertheless, beware.
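One way to catch a problem like this early is to compare the tags Subversion reports against the tag branches git-svn actually imported. The following is only a sketch: the helper name missing_tags and the two input files are hypothetical, and it assumes the tag branches show up in "git branch -r" with "tags/" in their names, as they did for me.

```shell
# Hypothetical helper: print tags that Subversion lists but git-svn
# did not import.
#   $1: file holding the output of `svn ls <repo-url>/tags`
#       (one "name/" per line)
#   $2: file holding the output of `git branch -r`
missing_tags() {
  svn_list=$(mktemp) || return 1
  git_list=$(mktemp) || return 1
  # Strip the trailing "/" that svn ls prints after directory names.
  sed 's|/$||' "$1" | sort -u > "$svn_list"
  # Keep only tag branches, drop everything up to "tags/" and any
  # "@revision" suffix that git-svn appends.
  sed -n 's|^.*tags/||p' "$2" | sed 's|@.*$||' | sort -u > "$git_list"
  # comm -23 prints lines that appear only in the first file.
  comm -23 "$svn_list" "$git_list"
  rm -f "$svn_list" "$git_list"
}
```

The inputs could be produced with "svn ls http://domain.com/repositoryA/tags > svn-tags.txt" and "git branch -r > git-branches.txt". An empty result means every Subversion tag has a matching branch.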

I also tried using svn2git, which is supposed to basically be git-svn with some manipulations to make the imported svn repo more git-like. I spent an hour trying to get this to work, and met with marginal success. It turned out not to be necessary.

In the end, I used the following commands to gauge similarity.

The first one just counts the number of files that changed between each tag and the cvs branch.



olpc@OLPC:~/workspace-gsoc/svn/user/super/com/google/gwt/emul/java$ for tag in `git-branch -a | grep tag`; do echo $tag; git-diff --relative --name-only $tag cvsbranch | wc; done;



tags/1.3.1
151 151 3767
tags/1.3.3
151 151 3767
tags/1.3.3@288
151 151 3767
tags/1.4.10
150 150 3746
tags/1.4.59
150 150 3746
tags/1.4.60
150 150 3746
tags/1.4.60@1399
150 150 3746
tags/1.4.61
150 150 3746
tags/1.4.61@1504
150 150 3746
tags/1.4.62
150 150 3746
tags/1.4.62@2104
150 150 3746
tags/1.5.0
78 78 1874
tags/1.5.0@2941
78 78 1874
tags/1.5.1
70 70 1709
tags/1.5.1@3391
70 70 1709
tags/1.5.2
70 70 1709
tags/1.5.2@3587
70 70 1709
tags/1.5.3
73 73 1756
tags/1.5.3@3776
73 73 1756
tags/1.6.0
78 78 1863
tags/1.6.0@4621
78 78 1863
tags/1.6.1
78 78 1863
tags/1.6.1@4846
78 78 1863
tags/1.6.2
78 78 1863
tags/1.6.2@5035
78 78 1863
tags/1.6.3
78 78 1863
tags/1.6.3@5110
78 78 1863
tags/1.6.4
78 78 1863
tags/1.6.4@5112
78 78 1863
tags/1.6.4@5189
78 78 1863

This second analysis actually counts the number of lines in each diff between each tag and the cvs branch.



olpc@OLPC:~/workspace-gsoc/svn/user/super/com/google/gwt/emul/java$ for tag in `git-branch -a | grep tag`; do echo $tag; git-diff --relative $tag cvsbranch | wc; done;



tags/1.3.1
18257 77427 515421
tags/1.3.3
18257 77427 515421
tags/1.3.3@288
18257 77427 515421
tags/1.4.10
18261 78017 524249
tags/1.4.59
18158 77486 520272
tags/1.4.60
18158 77486 520272
tags/1.4.60@1399
18158 77486 520272
tags/1.4.61
18158 77486 520272
tags/1.4.61@1504
18158 77486 520272
tags/1.4.62
18158 77486 520272
tags/1.4.62@2104
18158 77486 520272
tags/1.5.0
5667 22620 165036
tags/1.5.0@2941
5667 22620 165036
tags/1.5.1
4357 17759 127102
tags/1.5.1@3391
4357 17759 127102
tags/1.5.2
3920 15766 113373
tags/1.5.2@3587
3920 15766 113373
tags/1.5.3
4045 16322 117623
tags/1.5.3@3776
4045 16322 117623
tags/1.6.0
4974 19823 144457
tags/1.6.0@4621
4974 19823 144457
tags/1.6.1
4986 19881 144836
tags/1.6.1@4846
4986 19881 144836
tags/1.6.2
4986 19881 144836
tags/1.6.2@5035
4986 19881 144836
tags/1.6.3
4986 19881 144836
tags/1.6.3@5110
4986 19881 144836
tags/1.6.4
4986 19881 144836
tags/1.6.4@5112
4986 19881 144836
tags/1.6.4@5189
4974 19823 144457

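So, it looks like 1.5.2 is the closest match: it ties with 1.5.1 for the fewest changed files (70), and it has the smallest diff of any tag (3920 lines, against 4045 for 1.5.3 and 4357 for 1.5.1). The diff to 1.5.2 is attached to this email. It looked like 48 new files were added, 0 were removed and many were modified.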
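Picking the minimum out of a long listing by eye is error-prone, so the ranking can also be done mechanically. This is a sketch, not what I actually ran: the helper name rank_by_diff_size is hypothetical, and it assumes the input has the same shape as the transcripts above (a tag name on one line, then the three numbers from wc on the next).

```shell
# Hypothetical helper: read alternating "tag" / "wc output" lines on
# stdin and print "<line-count> <tag>" pairs, smallest diff first.
rank_by_diff_size() {
  # A line with one field is a tag name; a line with three fields is
  # the wc output (lines, words, bytes) for the tag seen just before.
  awk 'NF == 1 { tag = $1; next }
       NF == 3 { print $1, tag }' | sort -n
}
```

Piping the loop into it, as in "for tag in ...; do echo $tag; git diff --relative $tag cvsbranch | wc; done | rank_by_diff_size | head -n 3", lists the three closest candidates. With the numbers above, the 1.5.2 entries come first at 3920 lines.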

Once I had established which GWT SVN tag was most similar, it was trivial to diff the branches and get meaningful output:



olpc@OLPC:~/workspace-gsoc/svn/user/super/com/google/gwt/emul/java$ git-diff --relative cvsbranch tags/1.5.2 > 1-5-2.diff
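Before reading a diff of this size, it can help to tally how many files were added, removed and modified, which is the kind of summary I gave above. A minimal sketch, assuming the input is the output of "git diff --name-status"; the helper name count_by_status is hypothetical.

```shell
# Hypothetical helper: tally `git diff --name-status` output by status
# letter (A = added, D = deleted, M = modified, R = renamed, ...).
count_by_status() {
  # The first field is the status; only its first character is used,
  # so that a rename such as "R100" is counted under "R".
  awk '{ counts[substr($1, 1, 1)]++ }
       END { for (s in counts) print s, counts[s] }' | sort
}
```

It would be used as "git diff --relative --name-status tags/1.5.2 cvsbranch | count_by_status". The argument order matters: with the tag first, files that exist only in the CVS code are reported as added, and with the arguments swapped they are reported as deleted.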

I won’t post the resulting diff here, but suffice it to say that it was exactly what I was looking for, and I am now moving on to the task of analyzing the diff to figure out exactly what was changed, and why.

My final impressions of git: I’m very happy with it. The ability to import an entire SVN repo so that it can be used offline; the ease with which one may branch and merge; and the ability to push changes back up into the svn repo, are all things that will be very useful to me. I think I’m actually going to get to use Git a lot on my GSoC project.

This work is licensed under GPL - 2009