MODELLING AND SIMULATION, WEB ENGINEERING, USER INTERFACES
May 7th, 2009

Fun with Git

I met a bunch of smart people at the Montreal Ubuntu 9.04 release party, and I talked to one person a lot about the distributed SCM Git. I had heard of distributed SCMs but had never had any reason to try anything other than SVN. Still, I was intrigued, and so I decided to look into it. Here are some useful links:

For an overview of what Git is, see Linus Torvalds’s tech talk at Google. For an overview of how to use Git, see Randal Schwartz’s talk at Google. For an overview of how I have so far successfully used it in my GSoC project, read on.

My goal was to use Git to solve a particular problem, which was as follows:

There is an SVN repository A, and a CVS repository B. The code in B was probably forked at some point from the code in A. B was iterated on in a private sandbox before being put into CVS, so B has no history, just HEAD. A has also changed since the time it was forked. I need to take the code in B and compare it to each of the major release tags in A, so as to determine which tag is the most similar to B, and therefore which tag B was most likely to have been forked from. This basically means diffing two different filesystem trees. Because Git is supposed to be very good at merging different branches based on content, I assumed it would be able to give me some very intelligent diff output.
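For comparison, the plain (non-Git) way to diff two filesystem trees is a recursive `diff`. The sketch below counts the entries that differ between two directories while skipping version-control metadata. The function name `count_differing_files` and the paths in the usage comment are made up for illustration, and the `-x` option assumes GNU or BSD diff.

```shell
#!/bin/sh
# Count entries that differ between two directory trees, ignoring
# CVS and Subversion metadata directories.
# Hypothetical helper, not part of the original workflow.
count_differing_files() {
    # -r recurses into subdirectories
    # -q prints one line per differing file or one-sided entry
    # -x skips entries whose base name matches the pattern
    diff -rq -x CVS -x .svn "$1" "$2" | wc -l | tr -d ' '
}

# Usage (placeholder paths):
#   count_differing_files ~/svn-checkout-of-a-tag ~/cvs-workspace
```

This only gives a count per pair of trees, and each tag has to be checked out separately, which is the tedium the Git approach avoids.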

In this context, A is the GWT SVN repo, and B is a project in Eclipse’s CVS repo. The Eclipse project org.eclipse.swt.e4.jcl appears to be forked from GWT, so I needed to determine when it was forked, what changes were made after, and (hopefully) why.

The person I met at the Ubuntu party, Derek, wrote me excellent instructions on how to do this, which I have gratuitously copied and pasted here.


Here are the general steps:

- Use git svn clone to clone the Subversion repository A.

- Check out the CVS HEAD of repository B into a separate workspace.

- Create a new branch in the git-svn repository. You can branch starting from any tag or commit, but the closer the branch origin is to the original fork tag, the fewer differences you will likely encounter. So, try to guess the fork tag correctly and branch from that guess.

- Overwrite the workspace associated with the new branch with all the files in the CVS HEAD workspace. Stage and then commit the changes into the Git repository.

- For every Subversion "tag" branch in Git (git-svn imports every Subversion tag as a Git remote branch, not as a Git tag), display the differences between the branch that contains CVS HEAD and the Git Subversion "tag" branch. Note the one that gives you the fewest differences.

Specific commands (untested):

# Clone the Subversion repository A. Option "--stdlayout" tells Git
# that the repository has the standard branches, tags, trunk layout.
git svn clone --stdlayout http://domain.com/repositoryA

# Create branch "cvsbranch" starting from a Subversion tag
# (replace "subversiontag" with your guess at the fork tag)
git branch cvsbranch subversiontag

# Switch to branch "cvsbranch"
git checkout cvsbranch

# Overwrite files in Git "cvsbranch" workspace with those from CVS
# HEAD workspace. Make sure that you don't copy the CVS metadata
# directory in each workspace directory. Consider using rsync, which can
# ignore CVS and Subversion metadata directories.

# Stage all CVS changes
git add --all

# Commit CVS changes
git commit --message "Added changes from CVS repository B"

# List all Git Subversion tag branches (git-svn imports them as remote branches)
git branch -r | grep tags

# Compare each Git Subversion tag branch with branch "cvsbranch". Option
# "--name-only" displays only the names of the files that changed:
for tag in `git branch -r | grep tags`; do git diff --name-only $tag cvsbranch; done

# Note the tag that produced the fewest differing files.
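The copy step in the commands above is described but has no command attached. Here is a minimal sketch of it using rsync, under some assumptions: the helper name `copy_cvs_workspace` and the paths in the usage comment are placeholders, and the metadata directories are excluded explicitly rather than with rsync's `--cvs-exclude`, because that option also skips names such as `core` and `tags`, which can be legitimate source directories.

```shell
#!/bin/sh
# Copy a CVS checkout over a Git working tree without its metadata.
# Hypothetical helper; the paths passed in are placeholders.
copy_cvs_workspace() {
    src=$1   # CVS HEAD workspace of repository B
    dst=$2   # working tree of the git-svn clone, on branch "cvsbranch"
    # -a copies recursively and preserves file attributes.
    # --exclude=CVS/ and --exclude=.svn/ skip metadata directories at any depth.
    # The trailing slash on "$src/" copies its contents, not the directory itself.
    rsync -a --exclude=CVS/ --exclude=.svn/ "$src/" "$dst/"
}

# Usage (placeholder paths):
#   copy_cvs_workspace ~/cvs-workspace ~/repositoryA
```

Without `--delete`, files that exist only in the Git working tree are left in place, so files that were removed in B will not show up as removals in the later diffs.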

While this mostly worked great, I ran into one gotcha using Git from the Ubuntu 8.04 repo (git version 1.5.4.3), which was that git-svn did not capture all of the tags from svn. When I tried it with git from the 8.10 repo (git version 1.5.6.something), it worked fine. Really, I should be git-cloning the latest git, but I haven’t quite figured out how to do that successfully. Nevertheless, beware.

I also tried using svn2git, which is supposed to basically be git-svn with some manipulations to make the imported svn repo more git-like. I spent an hour trying to get this to work, and met with marginal success. It turned out not to be necessary.

In the end, I used the following code to gauge similarity.

The first one just counts the number of files that changed between each tag and the cvs branch.


olpc@OLPC:~/workspace-gsoc/svn/user/super/com/google/gwt/emul/java$ for tag in `git-branch -a | grep tag`; do echo $tag; git-diff --relative --name-only $tag cvsbranch | wc; done;

tags/1.3.1
   151     151    3767
tags/1.3.3
   151     151    3767
tags/1.3.3@288
   151     151    3767
tags/1.4.10
   150     150    3746
tags/1.4.59
   150     150    3746
tags/1.4.60
   150     150    3746
tags/1.4.60@1399
   150     150    3746
tags/1.4.61
   150     150    3746
tags/1.4.61@1504
   150     150    3746
tags/1.4.62
   150     150    3746
tags/1.4.62@2104
   150     150    3746
tags/1.5.0
    78      78    1874
tags/1.5.0@2941
    78      78    1874
tags/1.5.1
    70      70    1709
tags/1.5.1@3391
    70      70    1709
tags/1.5.2
    70      70    1709
tags/1.5.2@3587
    70      70    1709
tags/1.5.3
    73      73    1756
tags/1.5.3@3776
    73      73    1756
tags/1.6.0
    78      78    1863
tags/1.6.0@4621
    78      78    1863
tags/1.6.1
    78      78    1863
tags/1.6.1@4846
    78      78    1863
tags/1.6.2
    78      78    1863
tags/1.6.2@5035
    78      78    1863
tags/1.6.3
    78      78    1863
tags/1.6.3@5110
    78      78    1863
tags/1.6.4
    78      78    1863
tags/1.6.4@5112
    78      78    1863
tags/1.6.4@5189
    78      78    1863
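The same file count can be printed as one line per tag, which is easier to sort and compare. This is a sketch rather than the command used above: it assumes a Git recent enough that `git for-each-ref` supports `%(refname:short)`, it assumes the Subversion tags live under `refs/remotes`, and `count_changed_files` is a made-up name.

```shell
#!/bin/sh
# Print "<count> <ref>" for every remote ref whose name contains "tags/",
# where count is the number of files that differ between that ref and
# a branch (default: cvsbranch). Hypothetical helper.
count_changed_files() {
    branch=${1:-cvsbranch}
    git for-each-ref --format='%(refname:short)' refs/remotes |
        grep 'tags/' |
        while read -r tag; do
            # wc -l prints only the line count, i.e. the number of file names
            n=$(git diff --relative --name-only "$tag" "$branch" | wc -l)
            # $((n)) strips any padding that wc adds around the number
            printf '%s %s\n' "$((n))" "$tag"
        done
}

# Usage, run inside the git-svn clone; smallest counts come first:
#   count_changed_files cvsbranch | sort -n
```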

This second analysis actually counts the number of lines in each diff between each tag and the cvs branch.


olpc@OLPC:~/workspace-gsoc/svn/user/super/com/google/gwt/emul/java$ for tag in `git-branch -a | grep tag`; do echo $tag; git-diff --relative $tag cvsbranch | wc; done;

tags/1.3.1
 18257   77427  515421
tags/1.3.3
 18257   77427  515421
tags/1.3.3@288
 18257   77427  515421
tags/1.4.10
 18261   78017  524249
tags/1.4.59
 18158   77486  520272
tags/1.4.60
 18158   77486  520272
tags/1.4.60@1399
 18158   77486  520272
tags/1.4.61
 18158   77486  520272
tags/1.4.61@1504
 18158   77486  520272
tags/1.4.62
 18158   77486  520272
tags/1.4.62@2104
 18158   77486  520272
tags/1.5.0
  5667   22620  165036
tags/1.5.0@2941
  5667   22620  165036
tags/1.5.1
  4357   17759  127102
tags/1.5.1@3391
  4357   17759  127102
tags/1.5.2
  3920   15766  113373
tags/1.5.2@3587
  3920   15766  113373
tags/1.5.3
  4045   16322  117623
tags/1.5.3@3776
  4045   16322  117623
tags/1.6.0
  4974   19823  144457
tags/1.6.0@4621
  4974   19823  144457
tags/1.6.1
  4986   19881  144836
tags/1.6.1@4846
  4986   19881  144836
tags/1.6.2
  4986   19881  144836
tags/1.6.2@5035
  4986   19881  144836
tags/1.6.3
  4986   19881  144836
tags/1.6.3@5110
  4986   19881  144836
tags/1.6.4
  4986   19881  144836
tags/1.6.4@5112
  4986   19881  144836
tags/1.6.4@5189
  4974   19823  144457
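Rather than scanning the numbers by eye, the transcript can be ranked mechanically. The sketch below reads alternating tag lines and `wc` lines, as printed above, and sorts by the first `wc` column (the line count). The name `rank_tags` is made up for illustration.

```shell
#!/bin/sh
# Read alternating "tag" / "wc output" lines from stdin and print
# "<lines> <tag>" sorted with the smallest diff first.
# Hypothetical helper; blank lines in the input are ignored.
rank_tags() {
    awk 'NF == 1 { tag = $1; next }   # a tag name on its own line
         NF == 3 { print $1, tag }    # wc output: lines words chars
        ' | sort -n
}

# Usage: pipe the output of the loop above into it, for example
#   for tag in `git branch -a | grep tag`; do echo $tag; git diff --relative $tag cvsbranch | wc; done | rank_tags | head
```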

So, it looks like 1.5.2 has the fewest changed lines of any tag (3920 diff lines, versus 4045 for 1.5.3 and 4357 for 1.5.1), even though 1.5.1 and 1.5.2 tie on the number of changed files (70). Compared to 1.5.2, it basically looked like 48 new files were added, 0 were removed and many were modified.
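One way to get an added/removed/modified breakdown like this is to tally the status letters from `git diff --name-status`. This is a sketch, not necessarily how the numbers above were obtained; `summarize_status` is a made-up name, and the usage line assumes the tag-to-branch direction so that files present only in cvsbranch count as added.

```shell
#!/bin/sh
# Summarize `git diff --name-status` output read from stdin:
# count Added, Deleted and Modified paths by their status letter.
# Hypothetical helper.
summarize_status() {
    awk '{ count[substr($1, 1, 1)]++ }
         END { printf "added=%d deleted=%d modified=%d\n",
                      count["A"], count["D"], count["M"] }'
}

# Usage, run inside the git-svn clone:
#   git diff --relative --name-status tags/1.5.2 cvsbranch | summarize_status
```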

Once I had established which GWT SVN tag was most similar, it was trivial to diff the branches and get meaningful output:


olpc@OLPC:~/workspace-gsoc/svn/user/super/com/google/gwt/emul/java$ git-diff --relative cvsbranch tags/1.5.2 > 1-5-2.diff

I won’t post the resulting diff here, but suffice it to say that it was exactly what I was looking for, and I am now moving on to the task of analyzing the diff to figure out exactly what was changed, and why.

My final impressions of git: I’m very happy with it. The ability to import an entire SVN repo so that it can be used offline; the ease with which one may branch and merge; and the ability to push changes back up into the svn repo, are all things that will be very useful to me. I think I’m actually going to get to use Git a lot on my GSoC project.


This work is licensed under GPL - 2009 | Powered by Wordpress using the theme aav1