Saturday, October 30, 2004

Clarification

The nonexistent directories in the subversion db are the ones where the cvs repository holds no live files, only an Attic directory (or a directory tree ending in one).


SHHPOP!

That is the noise made by prying my eyes away from the political blogs. I feel much better :)

Some further tests of subversion. Here I compare a checkout from a Kde cvs repository on a local drive with a checkout from a subversion database on the same drive. The subversion db has only kdelibs. The working copies are in two different directories.

$ time svn checkout file:///home/extra/kde-svn/trunk
...
Checked out revision 55637.

real    4m33.469s
user    0m31.633s
sys     0m15.543s

$ time cvs -d /home/www/kde_repository checkout kdelibs
...
real    1m31.395s
user    0m2.975s
sys     0m5.156s

So at least for a local checkout, Subversion is quite a bit slower: about three times the wall-clock time (4m33s versus 1m31s). Would accessing a server show similar results?

The next test will be doing a diff of the two checked out working copies. A quick check shows missing subdirectories, but I will research further.
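A rough sketch of how the missing subdirectories could be listed, in Python. This is not what I ran; the function names and the two working copy paths are made up for illustration, and it only compares directory names, not file contents. It skips the CVS and .svn bookkeeping directories that each tool leaves in a working copy.

```python
import os

IGNORED = {"CVS", ".svn"}  # bookkeeping directories of each tool


def list_subdirs(root):
    """Return the set of subdirectory paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, _ in os.walk(root):
        # Prune bookkeeping directories so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED]
        for name in dirnames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_subdirs(cvs_copy, svn_copy):
    """Directories present in the cvs working copy but absent from the svn one."""
    return sorted(list_subdirs(cvs_copy) - list_subdirs(svn_copy))
```

Calling missing_subdirs() with the two checkout directories would give a sorted list of relative paths to look into.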


Wednesday, October 27, 2004

More Subversion

Well, this is going to be interesting.

cvs2svn (http://cvs2svn.tigris.org/) is a Python script that converts cvs repositories to svn format. I ran the script to create a kdelibs subversion repository. It goes through the cvs data and builds a list of atomic commits. Then it checks for tags and branches. make_it_cool seems to be a popular name for both a branch and a tag. With some command-line switches all this is resolved. It then commits to the subversion repository.

Cvs keeps branches and tags in the same file. Each change is a patch to the previous version. A tag is simply a version number, i.e. KDE_3_3_RELEASE refers to a version number in each file. A branch is again a patch to the previous version. In Subversion, each branch is a separate tree. Doing a list command on the kdelibs module gives us:

# svn list file:///home/extra/kde-svn/
branches/
tags/
trunk/

trunk is where HEAD is kept. tags contains all the tags defined in the cvs repository, e.g.

# svn list file:///home/extra/kde-svn/tags
ARTS_1_0_1_RELEASE/
ARTS_1_0_2_RELEASE/
ARTS_1_0_3_RELEASE/
...

Each tag contains all the files (or pointers to them, not sure) with that tag. The same goes for the branches directory. There are quite a number of branches that will need to be cleaned up, e.g. unlabeled-1.98.2/. In other words, for Kde to move to Subversion won't be a matter of simply running the convert script and living happily ever after. There will need to be a fair bit of cleanup.

Subversion contains some nice features that will be very useful. The list command gives a directory listing of the repository, so you can browse the directory structure, view the logs, diffs and cat files without checking them out. In cvs, each file starts with version 1.1, and each time the file is changed a version number is assigned to it, and to it only. If there are 3 revisions, they are 1.1, 1.2, 1.3. With subversion, if you commit a change to one file, the whole repository version is incremented. So individual file revision numbers are not consecutive. A file may have a version 20456, 30876 and 56876. It will take some getting used to.
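To make the numbering difference concrete, here is a toy model in Python. It is only an illustration of the two counting schemes described above, not how either tool stores its data; the class names are invented, and cvs branch numbers (like the 1.98.2 above) are ignored.

```python
class CvsStyleRepo:
    """Each file carries its own counter: 1.1, 1.2, 1.3, ..."""

    def __init__(self):
        self.minor = {}  # path -> last minor number used

    def commit(self, paths):
        result = {}
        for path in paths:
            self.minor[path] = self.minor.get(path, 0) + 1
            result[path] = f"1.{self.minor[path]}"
        return result


class SvnStyleRepo:
    """One counter for the whole repository, bumped once per commit."""

    def __init__(self):
        self.revision = 0
        self.history = {}  # path -> list of revisions that touched it

    def commit(self, paths):
        self.revision += 1
        for path in paths:
            self.history.setdefault(path, []).append(self.revision)
        return self.revision
```

Commit foo.h and foo.cpp together, then bar.cpp, then foo.h again: the cvs-style model leaves foo.h at 1.2, while the svn-style model records foo.h at revisions 1 and 3, with revision 2 belonging to a different file.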

I may have some of the terminology wrong, so don't quote me. I'm figuring this stuff out as I go.


Tuesday, October 26, 2004

Doofuses of the World UNITE!

I humbly submit that the majority of the developer base are doofuses.

I would even submit that many of the current developers were attracted to Kde because of its infinite flexibility. No developer desktop would even be remotely usable by anyone else.

I will outrageously submit that the metric that is important isn't user market share (although the various distributions that feature Kde may disagree). The important metric is developer share. After all, without developer share none of this would exist. We would have to argue over bicycle wheel clickers.

I submit as proof the fact that we have four media players in the Kde repository while the invoice template in Kspread doesn't even calculate the total, ie. quantity * price = total :)


Saturday, October 23, 2004

Subversive journey

Instead of twiddling my thumbs waiting for cvsup to do its magic, I installed subversion and started reading the documentation. Yes, reading the manual! Two distinct advantages became apparent. Advantages from my standpoint at least. To explain them, I'll describe what convolutions and hacks are involved in generating the digest.

When a developer commits a change, seldom does it involve only one file. Usually two or more files have been touched. So cvs gets the changes, commits and promptly forgets any connection. It becomes a new version in each file. We Digest readers want to see the whole thing together. How can the commit be reassembled? The magic happens in enzyme_index_cvs.pl, a perl script that goes through the whole repository, reassembles the commits and spits out index files for each module. Prior to the latest cvs software upgrade, this was easy as each file in a commit had the same timestamp. This is no longer the case, and a bit more logic is required. It checks the next commit (timewise) in the module for similarity, and builds the index. It works reasonably well, although there was a bug last week (which I haven't fixed). When we look for a specific commit, the php scripts search through the index for the file and version number, get a list of files and version numbers (and a bit of other stuff) and voila, an atomic commit. Still with me? If this sounds complicated, try reading the perl scripts.
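For the curious, the general idea can be sketched in a few lines of Python. This is not enzyme_index_cvs.pl (that is perl, and hairier); it is a minimal sketch that assumes "similarity" means same author and same log message within a time window. The names FileRevision and group_commits and the 300 second default are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FileRevision:
    path: str
    revision: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def group_commits(revisions, window=300):
    """Group per-file revisions into probable commits.

    Revisions by the same author with the same log message are merged
    while each one falls within `window` seconds of the previous one.
    """
    commits = []
    open_commits = {}  # (author, log) -> most recent commit for that key
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = open_commits.get(key)
        if current is not None and rev.timestamp - current[-1].timestamp <= window:
            current.append(rev)
        else:
            current = [rev]
            commits.append(current)
            open_commits[key] = current
    return commits
```

Each returned group is a list of file revisions that probably belong to one commit. A window that is too small splits slow commits; one that is too large merges separate commits that reuse a log message.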

Subversion stores all the changes in any number of files as one commit, one version number increment. So no more unmaintainable perl hacks. This is good.

Cvs has two substantially different ways of accessing the repository. The most familiar way is the client-server mode, where cvs up, or cvs co, or cvs diff works with your local working copy and the remote repository. In this mode, a local working copy of the code is required. If I want a cvs log of /foo/bar/foo.h, I need path_to_working_copy/foo/bar/foo.h on my local drive. But if I have access to the repository, I can rlog path_to_repository/foo/bar/foo.h,v to get what I want.

So I could either have 4 GB of working copy and a relatively slow link to a remote cvs repository, or a 12 GB repository with fast access. The indexing scheme described above takes hours with local access. It would probably take days to complete using a remote repository. So the only choice was a local repository. Hence the problems updating the local copy, ad nauseam. And the very large hosting requirements for the Digest.

Subversion doesn't need a local working copy of a file to get a diff or log. So presumably the digest could be built using a remote repository. Did I say anything negative about subversion? Or express bitterness? I feel much better about this whole situation. More tests to come.


Very late digest

It seems cvs is conspiring in its own demise. Something happened to make the cvsup process bork, so as we speak my local repository is getting updated. Thursday evening before bed I started the update, then found out Friday afternoon that it had failed. From the looks of things, it is around 40% complete after running all night. So expect the digest Sunday morning.


Friday, October 22, 2004

Some progress

Many people have written with suggestions of what they would like to see in the weekly Digest. Some have been implemented, others are on the way.

I added Google search, which allows searching all the archives and diffs. Search engine crawlers total about 15% of the traffic, so we might as well put them to use.

I plan to move to the new layout in 4 to 6 weeks, so any complaints, improvements and flames are welcome.


Note to self...

To quote Jeremy Bernstein:

Never speak more clearly than you think.


Sunday, October 17, 2004

A modest proposal

Richard Dale pointed us to Characterizing People as Non-Linear, First-Order Components in Software Development, a worthwhile read on the process of software development. Two points struck me that may help here. First is that people are 'Good at looking around', meaning if a task is available and they know how to do it, they probably will.

Maybe the problem with bugzilla is that except for a few, it is a black box. Bugs in, not much visible or quantifiable out. How does a bug get fixed anyway? Who does it, how, what if the bug report is unclear, etc. We learn best by watching, not reading directions. My reaction whenever I look there is ACK! I'm outta here! So what if there was something similar to the CVS-Digest for bugzilla? The report would include statistics and trends that can be computed from the data. But more importantly, it would highlight the interactions and flow that take place: reporter-developer conversations where a difficult bug is tracked down. Readers would get an idea of how the system works, and would be more likely to participate. There have been a few individuals who started contributing to Kde after reading the CVS-Digest. Looking at the highlighted patches you see that it isn't that hard. Seeing the dauntingly enormous task of building a desktop, and seeing many people working diligently catches the eye, and moves people to jump in. If someone is willing to work on such a report, I am willing to help host it, or include it in the weekly digest.


Saturday, October 16, 2004

For the right index finger impaired

For the many who pleaded with me to keep a one page digest format, your wishes have become reality.

cvs-digest.org/index.php?issue=oct152004&all

If you select this option, you will get a cookie, and next time the one page format will show up. If you want to go back to the broken up format, append &clearall or select the 'Not All in One Page' link. The cookie will be cleared and you will rejoin the throngs who enjoy their data in smaller pieces.

Other issues have been fixed, such as the non-compliance with any HTML standard, the horizontal scrolling and a few other things. I figured I'd better get things in reasonable shape before embarking on an exploration of subversion.


Wednesday, October 13, 2004

Ink

Nancie mentioned that she could tell when I refilled the ink cartridge for our printer by the ink spots on the calendar.

I don't know what happened last time. There is ink on the light fixture and on the calendar 5 feet away. Something to do with liquid, high pressure, small orifice and a heavy hand. My parsimonious soul rebels at paying $45 for 20 ml of black ink, hence the attempted refills. So we now are proud owners of a Samsung 1740, an inexpensive laser printer. No more liquid ink. Hmmm. They sell toner refill kits on ebay...

