Saturday, October 23, 2004
Instead of twiddling my thumbs waiting for cvsup to do its magic, I installed subversion and started reading the documentation. Yes, reading the manual! Two distinct advantages became apparent, at least from my standpoint. To explain them, I'll first describe the convolutions and hacks involved in generating the Digest.
When a developer commits a change, it seldom involves only one file; usually two or more files have been touched. cvs records the changes, commits them and promptly forgets any connection between them: each file simply gets a new version. We Digest readers want to see the whole thing together. So how can the commit be reassembled? The magic happens in enzyme_index_cvs.pl, a perl script that goes through the whole repository, reassembles the commits and spits out index files for each module. Before the latest cvs software upgrade this was easy, because every file in a commit had the same timestamp. That is no longer the case, so a bit more logic is required: the script checks the next commit (timewise) in the module for similarity and builds the index from that. It works reasonably well, although there was a bug last week (which I haven't fixed). When we look for a specific commit, the php scripts search the index for the file and version number, get a list of files and version numbers (and a bit of other stuff) and voila, an atomic commit. Still with me? If this sounds complicated, try reading the perl scripts.
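A minimal sketch of the regrouping idea, written in Python rather than the perl of enzyme_index_cvs.pl and not a transcription of it: per-file revisions are walked in time order, and a revision joins the current commit when it has the same author and log message as the one before it and arrived within a short window. The FileRevision record, group_commits, build_index and the 60-second window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRevision:
    """One per-file CVS revision (hypothetical record layout)."""
    path: str
    revision: str
    author: str
    date: datetime
    message: str


def group_commits(revisions, window=timedelta(seconds=60)):
    """Reassemble per-file revisions into logical commits.

    A revision joins the current group if it matches the previous
    revision (timewise) on author and log message and is no more than
    `window` later. The 60-second default is an assumption.
    """
    commits = []
    current = []
    for rev in sorted(revisions, key=lambda r: r.date):
        if current:
            last = current[-1]
            if (rev.author == last.author
                    and rev.message == last.message
                    and rev.date - last.date <= window):
                current.append(rev)
                continue
            commits.append(current)
        current = [rev]
    if current:
        commits.append(current)
    return commits


def build_index(commits):
    """Map (path, revision) to a commit number, so a lookup by file and
    version number returns every file in the same commit."""
    index = {}
    for commit_id, commit in enumerate(commits):
        for rev in commit:
            index[(rev.path, rev.revision)] = commit_id
    return index
```

Because each revision is compared only with its neighbour in time, two developers committing at the same moment would split each other's commits; a real indexer has to be more careful there.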
Subversion stores all the changes in any number of files as one commit, one version number increment. So no more unmaintainable perl hacks. This is good.
Cvs has two substantially different ways of accessing the repository. The most familiar is client-server mode, where cvs up, cvs co or cvs diff work between your local working copy and the remote repository. In this mode a local working copy of the code is required: if I want a cvs log of /foo/bar/foo.h, I need path_to_working_copy/foo/bar/foo.h on my local drive. The other way is to work on the repository files directly: if I have access to the repository, I can run rlog path_to_repository/foo/bar/foo.h,v to get what I want.
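For the repository-side route, rlog output can be parsed directly. This sketch assumes the classic RCS rlog layout (revisions separated by a line of 28 dashes, a `date: ...; author: ...;` line after each `revision` line, and a line of `=` signs at the end of the file) and that rlog is on the PATH; parse_rlog and run_rlog are illustrative names. Dates are kept as raw strings because their format may differ between RCS/CVS versions.

```python
import subprocess

REV_SEP = "-" * 28  # separator line rlog prints between revisions (assumed)


def _parse_revision(lines):
    """Turn one revision's lines into a dict, or None if malformed."""
    if len(lines) < 2 or not lines[0].startswith("revision "):
        return None
    rev = {"revision": lines[0].split()[1]}
    # e.g. "date: 2004/10/20 18:22:01;  author: dkite;  state: Exp;  lines: +2 -1"
    for field in lines[1].split(";"):
        key, sep, value = field.strip().partition(": ")
        if sep:
            rev[key] = value
    msg = lines[2:]
    if msg and msg[0].startswith("branches:"):
        msg = msg[1:]  # an optional branch list precedes the log message
    rev["message"] = "\n".join(msg)
    return rev


def parse_rlog(text):
    """Parse one file's rlog output into a list of revision dicts."""
    revisions = []
    chunk = None  # None while still reading the file header
    for line in text.splitlines():
        is_end = bool(line) and set(line) == {"="}
        if line == REV_SEP or is_end:
            if chunk:
                rev = _parse_revision(chunk)
                if rev is not None:
                    revisions.append(rev)
            chunk = None if is_end else []
        elif chunk is not None:
            chunk.append(line)
    if chunk:  # output without the trailing '=' line
        rev = _parse_revision(chunk)
        if rev is not None:
            revisions.append(rev)
    return revisions


def run_rlog(path):
    """Run rlog on a ,v file inside the repository (rlog assumed installed)."""
    result = subprocess.run(["rlog", path], capture_output=True, text=True, check=True)
    return parse_rlog(result.stdout)
```

Usage would look like `run_rlog("path_to_repository/foo/bar/foo.h,v")`, which needs no working copy but does need the repository itself on a local (or fast) disk.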
So I could have either a 4 GB working copy with a relatively slow link to a remote cvs repository, or a 12 GB local repository with fast access. The indexing scheme described above takes hours even with local access; against a remote repository it would probably take days. So the only choice was a local repository. Hence the problems updating the local copy, ad nauseam, and the very large hosting requirements for the Digest.
Subversion doesn't need a local working copy of a file to get a diff or log, so presumably the Digest could be built against a remote repository. Did I say anything negative about subversion? Or express bitterness? I feel much better about this whole situation. More tests to come.
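By contrast, here is a minimal sketch of pulling one revision straight from a remote subversion URL: `svn log --xml --verbose -r REV URL` lists every path changed in that revision, so the atomic commit arrives already assembled and no regrouping is needed. parse_svn_log and remote_log are hypothetical helpers, and the svn command-line client is assumed to be installed.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_svn_log(xml_text):
    """Parse `svn log --xml --verbose` output into one dict per revision."""
    commits = []
    for entry in ET.fromstring(xml_text).findall("logentry"):
        commits.append({
            "revision": int(entry.get("revision")),
            "author": entry.findtext("author", default=""),
            "date": entry.findtext("date", default=""),
            "message": entry.findtext("msg", default=""),
            # every path touched by this one commit, with its action (A/M/D/R)
            "paths": [(p.get("action"), p.text) for p in entry.findall("paths/path")],
        })
    return commits


def remote_log(url, revision):
    """Fetch one revision's log from a repository URL, no working copy needed."""
    result = subprocess.run(
        ["svn", "log", "--xml", "--verbose", "-r", str(revision), url],
        capture_output=True, check=True)
    return parse_svn_log(result.stdout)  # bytes are fine for ET.fromstring
```

Whether this is fast enough over a remote link for the whole Digest is exactly what the tests still have to show.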
What I said was: Wouldn't all that remote log/diff'ing take days in SVN just like it does in CVS? Actually I'm not sure why it would be so slow on CVS, so maybe I'm misunderstanding something.
Maybe what you need is a shell account on a machine with fast access and enough space (e.g. KTown).
To index it locally takes about 6 hours on a reasonably fast machine.
I couldn't see a way of doing it and maintaining good relationships with the server maintainer :)