Charles Engelke's Blog

July 25, 2008

OSCON Resources

Filed under: OSCON 2008 — Charles Engelke @ 8:27 pm

Presentation materials for many talks are now available on the OSCON site.  I expect more will show up in the coming days.  Some conference videos are available, too, and again, I expect more to show up soon.  I can highly recommend Robert Lefkowitz’s keynote video.

Closing day at OSCON

Filed under: OSCON 2008 — Charles Engelke @ 1:44 pm

The last day of the conference is a short one, just keynotes (or plenary sessions, I guess) and two technical sessions.  But it’s a strong one.  One of the keynotes was from Tim Bray who raised all sorts of interesting questions about where programming languages are going.  He didn’t have the answers, just questions.

Bray’s session was the only one that I found really interesting, but lots of the attendees were more interested in Sam Ramji‘s talk.  He’s from Microsoft, and was a target of plenty of pointless Microsoft bashing by “questioners”.  Microsoft is taking good actions now, and the only sensible thing to do is wait and watch its behavior going forward, instead of announcing that it will behave badly no matter what.

The sessions I’m going to are Roy Fielding’s talk on Open Architecture at REST, and Damian Conway’s The Twilight Perl.  I think they’ll be a very strong close to a good conference.

I skipped the last two OSCONs because the one three years ago was practically worthless, but the program looked so good this year that I gave it another chance.  I’m glad I did.

But I still wish they’d schedule a fuller last day.  It’s short so the west coast folks can catch a late flight home, but the majority of attendees have to fly east, and can’t do that until early tomorrow morning.

July 24, 2008

Closing out the OSCON day

Filed under: OSCON 2008 — Charles Engelke @ 9:31 pm

I’m waiting for the evening Perl talks to start, especially Larry Wall’s State of the Onion.  This afternoon I alternated between mostly technical and mostly entertaining talks, though they were all a mix of the two.

I started with Sam Ruby‘s talk about what to expect in Ruby 1.9.  It was a lot of detail about his experiences testing existing Ruby modules with the new developers’ release.  There are some small syntactic clean-ups in the language that break a lot of modules.  Most of the fixes are easy, but the module maintainers aren’t always responsive to applying them.  That’s a bit worrying for how the Ruby language is being handled.

I then went over to Robert Lefkowitz’s talk on open source as a liberal art.  It’s not really something I can put in a nutshell, but it was broad-ranging, thought-provoking, and fun.

After the snack break I learned about the new options for pluggable storage engines in MySQL 5.1 from Peter Zaitsev.  He literally wrote the book on high performance MySQL, so he really knows his stuff.  I learned a lot, but also ended up with more questions than I started with (because I now know enough to ask them).

I then went to the second half of the Perl Lightning Talks.  I always love these five minute presentations.  I saw two astounding talks in a row using the new Parrot compiler tools.  The first one defined a brand new language and created a compiler for it.  The second one built an Apache module to run that language persistently.  Each talk completed the entire task and demonstrated the result in its five-minute allotment.

Now I’m at the State of the Onion talk, and learning what Larry thinks about Perl 6.  He’s breaking his tradition by actually giving a technical talk about Perl, but he can’t help but be interesting.

Thursday morning at OSCON

Filed under: OSCON 2008 — Charles Engelke @ 4:50 pm

I went from Schwern‘s Skimmable Code talk to one on HDFS Under the Hood, given by Sanjay Radia of Yahoo!  It’s a complex topic, and he did a good job of making it all clear.  I don’t think we have a need for HDFS any time soon, but some of the concepts he showed us might fit our needs before long.

And that was my morning.  It’s going to be a long afternoon (until about 8:00 PM according to the schedule).

Skimmable Code

Filed under: OSCON 2008 — Charles Engelke @ 1:59 pm

Schwern is talking about how to make code easier to read.  Some simple things can make your code much more comprehensible, just like addingspacesbetweenwordsmakesiteasiertoreadnaturallanguage.

Lexical encapsulation – basically, making sure that everything related to the current lines of code is visible on the same page (editor window).  Lexical scope does this for you, so subroutines (which create a new scope) are useful for more than just code reuse.  They also help readability.

This is not news to me.  Back in the 1980s when I taught introduction to programming at UF I instituted an iron-clad rule: no procedure or function could be more than 25 lines of code long.  If any of them even had a 26th line, you lost a third of the score for the assignment.  Have a 31st line?  Lose two-thirds.

What a lot of complaining that led to!  And what a bunch of terribly illogical code decomposition I saw.  But brand new programmers needed that discipline, and by the end of the course their code was much cleaner, easier to read, and (not incidentally) much more likely to work right.  The best programmer I know was a student back then, and told me it was probably the single most useful thing I ever taught.

No links in this post.  Sorry, but the conference network is so bad I can’t even search and open pages to find the right links.  WordPress.com handles this really well, saving my content pretty much continuously, albeit slowly.

July 23, 2008

First OSCON Afternoon

Filed under: OSCON 2008 — Charles Engelke @ 7:49 pm

I had a conference call I had to make during the lunch break, but it ended early so I was able to eat.  The food was okay, but the dessert was a berry tart that was great.

But I’m here to learn and be inspired (and not about food) so the rest of the day was mostly attending sessions.  I attended a talk on Code Reviews for Fun and Profit.  Alex Martelli of Google was interesting and full of advice.  I think we’re just at the right point to get the most value from his pointers, and we’re going to implement some of them pretty soon if I can convince a few folks.  His slides are available for download; get them and browse them.  He’s got pointers to even more good resources on code reviews.  He even told us that the best book he knows on the subject is available for free from smartbearsoftware.com.  I’ve ordered it (it really is free) and I’ll post about it when I get it.

During the short gap between sessions I moved downstairs for the talk about Hypertable.  This is an open source project to create a tool similar to Google’s Bigtable, and is almost complete (it’s in alpha right now).  There’s a lot of interest in this; more than OSCON expected.  Every chair was taken, and I heard that there were at least 30 people out in the hall who wanted to attend.  The talk was interesting, but I certainly didn’t absorb anywhere near all the material.  Instead, I’ll be researching this further online.  The performance test results they’re getting are amazing.

We had a longer break to give us time to visit the exhibit hall and get some snacks.  The hall was pretty big and there were lots of t-shirt giveaways, just like at RailsConf.  I still view this as a tech economy indicator.  For at least open source related efforts, it’s booming.

Now I’m learning about Google’s open source efforts.  There’s a lot going on there and we’re all reaping the benefits.  After this I’m going to head downstairs again for my final session of the day, an “Illustrated History of Failure.”  And then I’ll try to integrate all the stuff I’ve been exposed to today.  This has been a diverse and very good program so far.

OSCON Begins

Filed under: OSCON 2008 — Charles Engelke @ 4:49 pm

The tutorial days are over, and the conference is beginning.  Actually, it began last night with some open source community awards and talks by Mark Shuttleworth, Robert (r0ml) Lefkowitz, and Damian Conway.  I don’t know why they do that at night (until 10:00 PM, or effectively 1:00 AM for us east coast folks) but they were really good.

This morning began with the typical OSCON sponsor keynotes.  These can be good but generally haven’t been at other OSCONs, and I feel their existence is actually disrespectful to paying attendees.  Our attention is not a commodity to be delivered to sponsors.  At RailsConf this year I saw how the conference got the sponsors and attendees together to benefit both sides, and was impressed.  OSCON doesn’t have a good history of doing that, even though the same company manages both conferences.  I wasn’t feeling well, so I slept in a bit and skipped those sponsored keynotes.  My colleagues tell me that I didn’t miss anything.

The first actual session didn’t start until 10:45 AM, and I’m at it now.  I’m starting with a panel discussion on open education.  The participants are extremely diverse, very knowledgeable, and interesting speakers.  The topic doesn’t directly relate to my own work, but it’s important.  The vast majority of people in the world are extremely poor.  They’re just as smart and capable as those of us in the wealthy countries, but don’t have the resources we have that let us prosper.  Knowledge is one of those resources they need, and the foundation of many other needed things.  Computers and the Internet can have a great impact on that, but only if the information itself is available and accessible.  The panel participants are working on making that happen.

My second session (conveniently in the same room) is on Metaprogramming in Ruby.  Now this is extremely germane to my work, so I hope it’s a great talk.  It’s starting well, and deserves my full attention.

And it kept going well and getting better.  Brian Sam-Bodden of Integrallis is a great speaker.  I loved the way he showed live demos – by recording them and including the recordings (at a nice, reasonable speed) in his slides, instead of jumping from window to window.  He’ll be posting his slides on his company’s website soon (probably on this page), and I recommend you go find them.

July 22, 2008

Catalyst: 21st Century Perl Web Development

Filed under: OSCON 2008 — Charles Engelke @ 5:19 pm

My last tutorial session is on Catalyst, a Perl MVC web framework, given by Matt Trout.  I’ve bought a book on Catalyst, but haven’t yet read it.  As a Perl programmer I’d normally be extremely excited about using Catalyst, but I’m pretty well sure that Ruby on Rails will fit my next few projects better.  And Ruby on Rails is just such an active technology, growing by leaps and bounds, that there’s an enormous set of resources available for it.  So this talk should be interesting for me, but its usefulness is likely to be seeing concepts I’ll apply elsewhere instead of directly learning Catalyst.

Catalyst leverages lots of tools already on CPAN instead of being one big tightly coupled environment.  It’s MVC by default, but doesn’t have to be.

The approach given in this talk is interesting.  It’s the exact opposite of how introductory Ruby on Rails talks usually go.  We’re jumping right to low-level internals, instead of starting with the proverbial “view from 10,000 feet”.  We’re seeing a lot of Perl code that uses Catalyst modules, and he’s moving very, very quickly.

I’m already lost, and I know Perl well.  I guess if I want to know about Catalyst I’m going to have to read that book, after all!  I’m following the details, but have no idea how to string them together.  I’ll be slipping out of this talk at the break, unless there’s a huge change soon.  The room is packed; there’s a lot of interest in the subject, but I doubt people are getting what they hoped from the talk.

At least I found three of my four tutorials to be pretty good this year.

Secrets of JavaScript Libraries

Filed under: OSCON 2008 — Charles Engelke @ 2:52 pm

Another conference morning, another tutorial.  This time, John Resig of Mozilla is talking about JavaScript libraries.  He’s the creator of jQuery and has also done a lot of work on Firefox and standards.  He’s written one book and is working on another.  So, I’m eagerly anticipating the session.

What libraries have in common: advanced use of JavaScript, cross-browser support, and best practices.  He likes Prototype, jQuery (surprise!), and base2 (which I hadn’t heard of; it adds missing JavaScript/DOM features to bring all browsers up to a common high level).

JavaScript testing can be painfully simple, so there’s no reason not to do it.  For example, he shows an assert that builds a list of results and then adds that list to the page after the page load is completed.  Delayed tests (asynchronous behavior like timeouts) are harder.

Cross-browser code.

Strategies: pick your browsers, know your enemies, write your code.

To pick browsers, he shows a cost/benefit chart for the most popular browsers.  IE 6 and Opera 9.5 are the only ones with costs exceeding benefit.  IE 6 has the highest cost, but a significant amount of benefit (due to its wide use).  IE 7 has the second highest cost, but much less than IE6, and it has even greater benefit.  Then we have Firefox 3 with a very low cost and very high benefit.  Safari 3 and Opera 9.5 are each pretty low cost, but also pretty low benefit.  Yahoo publishes a list of the level of support they provide to various browsers, based on their assessments of costs versus benefit.  Firefox is “A-grade” on every platform; Opera 9.5 is, too, except on Windows Vista.  The Yahoo list covers more than 99% of the actual users they see hitting their sites.

jQuery supports IE, Firefox, Safari and Opera in their previous (by one), current, and next versions.  The work is always on the current version, but they test both back one and forward one version.  But he notes that this strategy is ignoring market share, which may not be the best approach for others.

Knowing your enemies means know what browser bugs and missing features you’re going to have to work with.  But it also means know the mark-up and external code your pages work with.  And even bug fixes can be an enemy, since they can break the code you had to write to work around the old bug.

Browser bugs are generally the main concern here.  Have a very good test suite so you’ll know if library updates cause problems, and also apply the suite to browser pre-releases.  But what is a bug?  It’s only a bug if it’s an error in following specified behavior.  If you use unspecified functionality and it changes from version to version, it’s still not a bug.

External code can cause problems (or you can cause problems in it).  Working to encapsulate your code much more than JavaScript requires is an important way to protect against this.  For example, don’t extend existing objects; someone else might be trying the same thing.  Example: IE creates properties of form elements for each input field that has an id; the name of the property is the value of the id.  So that sometimes overrides standard properties and methods for those form objects.  Since form designers and JavaScript programmers are often different folks, this is particularly vexing.

JavaScript programmers know not to “sniff” browser versions but instead to detect browser capabilities.  Feature simulation is a more advanced way to do this than simple object detection.  It makes sure not only that an API is available, but also that it’s working as expected.  Write a function that uses the API and examine the result.  Save a flag based on the result for later use.  You don’t care about what the function is actually doing (so it should be invisible to users and the rest of your code); you just want to know whether the feature it implements is reliable so you can use it in real application code.  You can do more with this method.  Different browsers can do the same things in different ways.  Feature simulation lets you discover which way will work so you can use that capability.
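A sketch of the idea (the probe below is my own example, not from the talk): instead of merely checking that a method exists, run it once on throwaway input with a known correct answer and save the verdict.

```javascript
// Feature simulation: probe the API once and record whether the result
// matches the specification.  (Older IE famously dropped empty fields
// when splitting a string with a regular expression.)
function probeSplitRegex() {
  var parts = "a,b,,".split(/,/);
  return parts.length === 4;   // spec-compliant result: ["a", "b", "", ""]
}

// Run the probe once at startup; application code only reads the flag.
var SUPPORTS_SPLIT_REGEX = probeSplitRegex();
```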

Interesting example of a browser capability that’s hard to detect: IE 7 introduced a native XMLHttpRequest capability in addition to the ActiveX wrapped method previously used.  But the native one had a bug that the ActiveX one didn’t.  So object detection would lead programmers to use the native one (even though the ActiveX method was still available) and be exposed to a new subtle bug.  And there are some problems that just can’t be discovered, like how some code actually makes the page look, or whether a particular construct will crash a browser (since you’ll never get the result back if you make that happen).

Writing cross-browser code boils down to reducing the number of assumptions you make.  Ideally, assume nothing and check everything you want to use before using it.  But taken to the extreme, that’s unreasonable, so you have to draw the line at a point that works for you.

We’re taking an early half-break, and he’s planning another one later.  I think that’s smart; three hour-long parts work better for most of us than two 90-minute ones.  He’s also giving us a link to presentation slides.  But they’re not the ones for this talk!  They’re relevant, though.  And now he’s given us the corrected URL for the right slides.  The slides ask us to keep them private, so I’m not linking to them here.

Good JavaScript Code

If you understand three things, you’ll be able to write good JavaScript: functions, objects, and closures.

Functions can be defined much as in other languages, but also as anonymous functions assigned to variables or properties of objects:

function f(){return true;}
var f = function(){return true;}
window.f = function(){return true;}

Traditional named functions are available throughout the scope, even in code earlier than the definition, but functions assigned to variables or properties are only available after the assignment has executed.
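A quick illustration of that difference (my own example):

```javascript
// A function declaration is hoisted, so it can be called before the
// line where it appears in the source:
var early = declared();   // works

function declared() { return true; }

// A function assigned to a variable is not usable until the assignment
// actually runs; only the variable name itself is hoisted:
var late;
try {
  late = assigned();      // TypeError: assigned is still undefined here
} catch (e) {
  late = "not yet defined";
}
var assigned = function () { return true; };
```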

Okay… you can have named anonymous functions (anonymous named functions?):

var ninja = {
   yell: function hello(n){
      return n>0 ? yell(n-1)+"a" : "hiy";
   }
}

But the function name yell is only available within the scope where it’s defined (the definition of ninja).  The property name yell is available whenever you can refer to the object.  They’re the same thing here, but don’t have to remain that way (assignment to other objects, alteration of the ninja object).  The purpose of this is to allow recursion without problems in those cases.  If the function were truly anonymous it would have to call itself relative to the object it’s defined in; but if that’s assigned to a different object, that’s not what we want.

var x = ninja;
x.yell(3);
ninja = {};
x.yell(3);

This works if yell is an anonymous named function.  But if it were truly anonymous the second invocation would fail because it would try to call ninja.yell recursively, which no longer exists.

Since functions are full-fledged objects, they can have properties attached to them.  We see an example of self-memoization, where the function saves prior results and returns them instead of recomputing them when called again with the same arguments.
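The pattern looks roughly like this (the prime test below is my own stand-in for whatever function the talk actually used):

```javascript
// Self-memoization: the function caches prior results as a property on
// itself, so repeat calls with the same argument skip the computation.
function isPrime(n) {
  if (isPrime.cache[n] !== undefined) {
    return isPrime.cache[n];     // previously computed answer
  }
  var prime = n > 1;
  for (var i = 2; i * i <= n; i++) {
    if (n % i === 0) { prime = false; break; }
  }
  isPrime.cache[n] = prime;      // remember for next time
  return prime;
}
isPrime.cache = {};
```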

Functions exist in a context, and you can refer to the context as this.  I’m used to seeing that when the function is a method, and this refers to the object.  But it’s usable even for global named functions (this is the global context, so this.something is the global variable something).  The .call() and .apply() methods for all functions set the context of the function as well as invoke it.
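For example (my own, not from the slides):

```javascript
// .call() and .apply() invoke a function with an explicit context, so
// the same function can act as a method of different objects:
function whoAmI() { return this.name; }

var alice = { name: "alice" };
var bob   = { name: "bob" };

var a = whoAmI.call(alice);      // context is alice
var b = whoAmI.apply(bob, []);   // same, but arguments come as an array
```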

Declarations inside a function are private, but you can make them properties of this to make them externally available:

function Secret(){
   var f = function(){ return true; };
   this.g = function(){ return false; };
}

Outside code can refer to g on objects created with new Secret(), but not to f.  I think.  If I’m following this right.
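A quick check of that (my own example):

```javascript
function Secret() {
  var f = function () { return true; };   // private: just a local variable
  this.g = function () { return false; }; // public: property of the new object
}

var s = new Secret();
var visible = typeof s.g;   // "function"
var hidden  = typeof s.f;   // "undefined" -- f never leaves the constructor
```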

The last section of the talk covers a wide variety of advanced topics in a lot of small bites.  It’s reminiscent of the extremely advanced Perl talks that Mark-Jason Dominus gives, and it’s interesting, but way above what I expect to be using directly.  I’m much more likely to be the user of a library that requires these techniques inside it, than to write one.

A really excellent, useful talk, even if I did get bogged down a bit at the end.

July 21, 2008

Introduction to Django

Filed under: OSCON 2008,Uncategorized — Charles Engelke @ 7:47 pm

Well, this morning I learned about Rails and now I’m moving over to the Python world with an introduction to Django.  Maybe I’ll look at the Perl take on MVC web frameworks tomorrow by attending the Catalyst tutorial; I haven’t decided yet.  This tutorial is being given by Jacob Kaplan-Moss, and he’s put his presentation materials online.

I don’t know much Python, but I did create a reasonably large Google AppEngine project with it and learned as I went.  I ended up liking Python a lot, but there are big holes in my knowledge.  I’m glad I started with the simple AppEngine webapp framework instead of Django because I could see how the pieces fit together more easily.  But now I’m ready to move up.

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design.

That’s Django’s “elevator pitch”.  Django came out of a small, family run newspaper in Lawrence, Kansas.  That’s pretty interesting for such an important web tool.

We start by actually creating a Django application that will eventually be something useful.  The finished example application is online at cheeserater.com.  It’s as easy to start as Rails:

django-admin.py startproject cheeserater
cd cheeserater
manage.py runserver

And it starts a development web server on port 8000.

Models: this is one of the pillars of MVC, of course.  And we won’t be using SQL to define or manage them, also of course.  The speaker makes a great point that one of the (many) problems with SQL is that it’s not very compatible with version control.  Just doing an svn update on your working copy isn’t going to really update it.  It’s still a copy, but it’s probably not still “working”!

Since Django models are Python code, they’re managed like all your other code.  But the code can also manage fixing the database structure when the code changes, just like Rails migrations.  Django doesn’t have migrations, but you can do the same thing manually in it.

Unfortunately, the example application is pretty boring to me, and it uses unreleased Django features so I can’t play along with it.  The framework is powerful and straightforward, but just watching isn’t as interesting as getting into it yourself.

Now Jacob’s going into a rant about URL design.  It’s entertaining, and makes a good point: the URL should be designed from the user’s point of view, not the developer’s one.  So nothing like fetchpage.cgi?number=72.  Instead, it should be something like electioninfo.  Oddly enough (to me) newspapers are particularly bad at this.

Anyway, this means that Django doesn’t specify your URL structure for you, that’s your job.  I like that.  The Rails idea of one right way to do things often appeals to me, but not with the URL structure.  So with Django (and Google AppEngine) you give a list of regular expressions, and the first one that a URL matches is paired with a handler for the URL.
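The dispatch idea is simple enough to sketch in a few lines (my own illustration, in JavaScript rather than Python, so the details don’t match Django’s actual urls.py):

```javascript
// First-match-wins URL dispatch over a list of (regex, handler) pairs,
// in the spirit of Django's URL configuration:
var urlpatterns = [
  [/^\/election\/(\d{4})$/, function (m) { return "election " + m[1]; }],
  [/^\/about$/,             function ()  { return "about page"; }]
];

function dispatch(path) {
  for (var i = 0; i < urlpatterns.length; i++) {
    var match = urlpatterns[i][0].exec(path);
    if (match) {
      return urlpatterns[i][1](match);   // first matching pattern wins
    }
  }
  return "404 not found";
}
```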

The rest of the tutorial covers all the different pieces of the framework.  There’s a lot of detail, but conceptually there’s little new here for me.  And once again, I got a lot more out of the first half of the tutorial than the second.  I don’t seem to be able to absorb more than an hour (or maybe an hour and a half) of material unless I’m actively working with the material.  I can’t just watch.

Advanced ActiveRecord

Filed under: OSCON 2008 — Charles Engelke @ 2:58 pm

It’s the first morning at OSCON, and I’m taking the tutorial Advanced ActiveRecord.  It’s being given by Gregg Pollack and Jason Seifer of RailsEnvy.  We’re at the mid-session break, so I’ve got a few minutes to jot down my thoughts.

So far, this is really valuable.  We started with looking at the various ways you can use ActiveRecord and relationships between models in your Rails code.  For each approach, we saw how that was actually translated into database queries, and how different approaches in Rails gave very different performance as a result.  Obviously, this will change over time, but the idea of looking at how ActiveRecord methods actually execute will remain valuable.

We then looked at some performance tweaks, mostly consisting of adding indexes.  That’s nothing special for ActiveRecord, per se, but it’s good to see how to do it in Rails and how to notice when you’ll want it.

The rest of the pre-break session was looking at new methods of dealing with polymorphic relationships.  For example, if you’ve got packages that might be transported by car, truck, or bike, and want to know which car, truck, or bike is being used, you’d traditionally relate the package model to three different models, one for each transport type.  Now you can handle this kind of situation with a single model by adding a field (or fields, depending on the situation) and new relationship properties.

I’m not going to give examples here.  I’ve successfully been doing the exercises, but I’m not fluent enough with Rails to explain this coherently yet.  Some exercises and hints relating to this are available online.

After the break:

This is nice.  Rails 2.1 tracks dirty objects (changed but not yet saved).  This makes updates more efficient (it’s smart enough to save only the fields that have changed, not all of them).  And it doesn’t just mark an object as dirty or not; it even keeps the old and new values until the object has been saved.

One gotcha for that: dirtiness is triggered by assigning a value to a model field.  Change a field some other way and it’s not marked.
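The mechanism can be sketched like this (my own illustration in JavaScript, not Rails code): tracking hooks into the setter, which is exactly why changes that bypass the setter go unnoticed.

```javascript
// Dirty tracking, sketched: assignment through set() records the old
// and new values; save() reports (and clears) only the changed fields.
// Mutating this.fields directly would bypass the tracking -- the gotcha.
function TrackedRecord(fields) {
  this.fields = fields;
  this.changes = {};            // field name -> [old value, new value]
}

TrackedRecord.prototype.set = function (name, value) {
  if (this.fields[name] !== value) {
    this.changes[name] = [this.fields[name], value];
    this.fields[name] = value;
  }
};

TrackedRecord.prototype.isDirty = function () {
  return Object.keys(this.changes).length > 0;
};

TrackedRecord.prototype.save = function () {
  var changed = Object.keys(this.changes);  // only these need writing
  this.changes = {};
  return changed;
};
```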

Callbacks: don’t override the standard ones (someone else might do it, too).  Add new ones of your own.

Observers: classes that trigger when other classes change.

Unfortunately, I can’t do any of these after-break exercises because I’d have to remember too much detailed syntax that’s new to me.  That’s shown in the slides, but by the time I start the exercises the slides are gone.  The pre-break exercises all had some sample code on-line that helped with this.  It would be an enormous help to have copies of the slides they’re using so I could refer back to them.

Sharding: breaking one logical database into multiple physical ones, with different subsets of related records in each physical database.  FiveRuns has released a gem to do this for you.  This generated a lot of discussion this morning about whether this is appropriate for Rails to worry about instead of leaving it all to the database system.  Adam Keys of FiveRuns makes a good case for putting it in the application instead (and I’ve run into many cases where putting “database” stuff in my application code instead seems to make things much clearer, more maintainable, and faster, so this rings true to me).

The discussion led to the general concept of how to create a plug-in for Rails.  I think we’re about to have an exercise on this that there’s no chance I’ll be able to do during the talk.  Oh well, I can do them later when I have time to find references on the pieces I need.

All in all, this has been a very worthwhile talk, but it could have been better if:

  • The slides were available, preferably electronically
  • Solutions to the exercises were available for download along with the exercises
  • There was at least some reference material for the last three exercises like there is online for the first three.

I know I’ll be using a lot of this material at work soon.

Blog at WordPress.com.