Phil Hagelberg <phil@hagelb.org> writes:

> We had some folks pipe up in the #clojure channel today that were
> interested in helping out with the Clojars codebase. It would be great
> to come up with a list of things that could get hacked on; maybe a
> plan for attacking the higher-priority issues. I'd like to just get
> the ball rolling for that.
>
> My top hits would be:
>
> #1: take advantage of pom.xml inside jars if present
> #23: use lucene for searching (could steal code from lein-search for this)
> #24: keep snapshots and releases in separate repositories
> #2: browse interface
> #5: display dependencies (and possibly project.clj) on show page

All good ideas. A couple more:

* Something better for the account stuff. A "password reset" email
  thing would be a good start. (Or maybe even OpenID.)

* Some sort of integration with one or more of the various
  documentation sites that have sprung up.

> But of course if there's something specific that catches peoples' eye,
> then who am I to tell you what to do? =) I may factor out the lucene
> code from lein-search into its own project.
>
> Of course, writing new code is only helpful if it can be deployed. I
> remember talk of migrating off sqlite. As of the last discussion on
> this list that was written but not deployed. But the code in
> /home/clojars/prod seems to be up to date with the latest master
> branch, so is it currently running against couch?

I ditched the couchdb attempt. It was taking too long, added lots of
dependencies (Erlang etc) and stuff that needed configuring for little
benefit. I was also having problems with couchdb crashing, although
that's no doubt been fixed by now.

The "prod" branch from my github is what's in production, it should be
identical to "master" most of the time.

> If development were to proceed, should it happen from master?

Correct.

> There's also the question of deploying in general--what's the process
> for that? How are the processes daemonized?

There are two identical instances of the application running: "clojars"
and "clojars-backup". They're daemonized by Upstart (Ubuntu's
replacement for /etc/init.d) and set to kill -9 suicide (and thus be
respawned by Upstart) on Java out of memory errors (which never really
happens, it's just a safety habit).

The app runs with an embedded Jetty from the uberjar generated by lein.
The command looks kind of hideous but it's really just java -jar with a
bunch of extra logging options enabled.

    $ cat /etc/init/clojars.conf
    description "Clojars webapp (production)"
    respawn
    start on filesystem
    stop on shutdown
    chdir /home/clojars/prod
    exec su clojars -c 'java -Dnla.node=clojars -Xmx32m -server "-XX:OnOutOfMemoryError=kill -9 %p" -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCDateStamps -jar current-standalone.jar 8001 7601 2>&1 | /usr/bin/cronolog -S /logs/clojars.log /logs/%Y%m/clojars.%Y-%m-%d.log'

The stdout/err output is piped into cronolog for log rotation. The two
port numbers passed to -main are the web port (8001) and the nailgun
port for the SSH integration (7601). The backup instance uses 8002 and
7602.

You can stop and start clojars just like any other service on Ubuntu:

    sudo stop clojars
    sudo start clojars
    sudo restart clojars
    sudo status clojars

In normal operation clojars-backup is not hit at all. It's mainly there
as a safety measure in case the main app hangs, so that you can do a
sanity check when deploying before going live, and so that you can do an
outage-free deploy (although logged in users will lose their sessions as
they're not stored in the DB currently).

The process of deploying is just to pull from git, re-uberjar and then
restart clojars-backup. If it looks OK, then restart the primary instance.
I just use this shell script:

    ato@clojars:~/bin$ cat deploy-clojars
    #!/bin/bash
    set -e
    cd /home/clojars/prod
    sudo -H -u clojars git pull
    sudo -H -u clojars ~clojars/bin/lein uberjar
    sudo restart clojars-backup
    echo "Test changes at: http://clojars.org:8002/"
    echo "If ok run: sudo restart clojars"

The web site is fronted by nginx, which handles the failover between the
two instances and the serving of static content and the repository
itself. This means that even if the webapp is down people can still
download stuff from the repository.

    $ cat /etc/nginx/sites-available/clojars
    upstream clojars-web {
        server localhost:8001 max_fails=3;
        server localhost:8002 max_fails=3 backup;
    }

    server {
        listen 80;
        server_name clojars.org;
        root /home/clojars/prod/public;
        access_log /var/log/nginx/clojars.access.log;

        location / {
            # try static content first, then fall through to the webapp
            try_files $uri @clojars_webapp;
        }

        location /repo {
            root /home/clojars;
            autoindex on;
        }

        location @clojars_webapp {
            proxy_pass http://clojars-web;
        }

        ##
        ## Linked repositories
        ##
        location /repo/org/clojure {
            rewrite ^/repo/(.*)$ http://build.clojure.org/releases/$1 permanent;
            if ($uri ~ ".*-SNAPSHOT/.*") {
                rewrite ^/repo/(.*)$ http://build.clojure.org/snapshots/$1 permanent;
            }
        }

        location /repo/org/xerial {
            rewrite ^/repo/(.*)$ http://www.xerial.org/maven/repository/artifact/$1 permanent;
        }
    }

Finally that leaves the nailgun/scp socket. For failover of that I use a
deliciously simple TCP load balancer called 'balance'.

    http://www.inlab.de/balance.html

Again that just runs out of Upstart:

    $ cat /etc/init/clojars-scp-balance.conf
    description "Clojars scp balancer (production)"
    respawn
    start on filesystem
    stop on shutdown
    chdir /home/clojars
    exec balance -b 127.0.0.1 8700 localhost:7601 ! localhost:7602

The generated authorized_keys file for the clojars user points the
nailgun client at port 8700.

/etc/ssh/sshd_config turns off password prompts for the clojars user.

    Match User clojars,root
        PasswordAuthentication no

The Lucene indexing stuff is Sonatype's CLI nexus-indexer. I just run it
out of cron:

    # crontab -u clojars -l
    # m h dom mon dow command
    */15 * * * * java -jar ~/indexer/nexus-indexer-2.0.4-cli.jar -n clojars -i ~/indexer/index -d ~/repo/.index -r ~/repo -s -q -t min -l

Documented here:

    https://docs.sonatype.org/display/M2ECLIPSE/Nexus+Indexer
    http://www.sonatype.com/people/2009/06/nexus-indexer-api-part-1/

> Hugo mentioned being willing to help automate the deployment using
> Pallet--that would certainly make it easier for people to test out
> their changes by deploying to a local virtualbox.

Pallet is essentially something like Chef/Puppet, right?

Mmm. I didn't originally see the point in Pallet for Clojars. It's not
like we're ever going to need to spin up multiple servers for load
reasons. However your use case does make a lot of sense. Saves messing
with the SSH and nginx config on your development computer.

The setup procedure is going to be something like this:

    aptitude install openjdk-6-jdk nginx balance sqlite3 cronolog nailgun
    # install leiningen
    adduser clojars
    cd /home/clojars
    mkdir -p data repo .ssh
    git clone https://github.com/ato/clojars-web.git prod
    cd prod
    lein uberjar
    ln -s ../data/auth_keys /home/clojars/.ssh/authorized_keys
    sqlite3 /home/clojars/data/db < clojars.sql

Then chuck in the nginx, cron, SSH and upstart config I mentioned above.

> If this gets cleaned up and documented, it might even make a good
> resource for documenting how to deploy and run a Clojure webapp in
> general, since that's something that seems to be not very
> well-understood in general.

I'm not sure whether this is a good way of deploying a Clojure webapp or
not. I haven't put a huge amount of thought into it. I'd certainly be
interested if anyone's got any comments.

The traditional Java model is with an external servlet container
(Tomcat, Jetty etc) and WAR files.
This works quite well in a large shop with a dedicated ops team, lots of
monitoring, custom automation and such, but is not exactly simple or
friendly for those without a Java background.

An uberjar that just calls run-jetty like I do for Clojars is easy for
users, but requires a bit of effort from the developer, particularly if
you want to add important config options like the bind address, port,
path and "devel/test/prod" environment settings.

Personally I'd like to see someone do a Ring equivalent of Ruby's
"rackup". Just a simple little thing that provides your basic
command-line options, starts an embedded Jetty and can be used as an
uberjar main class.
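
To make the "rackup" idea a bit more concrete, here is a minimal sketch of what such a launcher might look like. It is not existing code from Clojars or Ring. It assumes a recent Clojure (1.10 or later, for requiring-resolve) and the ring-jetty-adapter library on the classpath. The namespace ringup.main, the flag names and the app.env system property are all made up for illustration, and the "path" option is left out.

```clojure
(ns ringup.main
  "Hypothetical rackup-style launcher. All names here are illustrative."
  (:gen-class))

;; Assumed defaults; pick whatever suits your app.
(def defaults {:host "127.0.0.1" :port 8080 :env "devel"})

(defn parse-args
  "Turns [\"--port\" \"8001\" \"my.app/handler\"] into an options map.
  Each flag takes exactly one value; a bare argument names the handler var."
  [args]
  (loop [opts defaults
         [arg & more] args]
    (cond
      (nil? arg)       opts
      (= arg "--host") (recur (assoc opts :host (first more)) (rest more))
      (= arg "--port") (recur (assoc opts :port (Integer/parseInt (first more)))
                              (rest more))
      (= arg "--env")  (recur (assoc opts :env (first more)) (rest more))
      :else            (recur (assoc opts :handler arg) more))))

(defn -main [& args]
  (let [{:keys [host port env handler]} (parse-args args)]
    (when-not handler
      (binding [*out* *err*]
        (println "usage: java -jar app.jar [--host H] [--port N] [--env E] ns/handler"))
      (System/exit 1))
    ;; Hypothetical convention: expose the environment name to the app.
    (System/setProperty "app.env" env)
    ;; Resolved at call time so parse-args can be tried in a bare REPL
    ;; without Ring on the classpath.
    (let [run-jetty (requiring-resolve 'ring.adapter.jetty/run-jetty)
          app       (or (requiring-resolve (symbol handler))
                        (throw (ex-info "Handler var not found" {:handler handler})))]
      (println (str "Starting " handler " (" env ") on " host ":" port))
      (run-jetty app {:host host :port port :join? true}))))
```

With that as the uberjar's main class, starting an app would look something like "java -jar myapp-standalone.jar --port 8001 --env prod my.app/handler", where my.app/handler stands in for whatever Ring handler var your project defines.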