Charles Engelke's Blog

April 15, 2011

MongoDB Windows Service trick

Filed under: Uncategorized — Charles Engelke @ 12:00 pm

I just spent a lot of time trying to reinstall MongoDB on my Windows 7 machine because I wanted to turn authentication on.  (I don't really feel a need for authentication in this development environment, but it seems that access from outside localhost requires it on Windows.)  Every time I installed the service, it seemed to work fine.  But when I tried to start the service, I kept getting an error message: "The system cannot find the file specified."

What file can’t it find?  I could run the server from the command line, why not as a service?

It turns out that the --install option apparently registers, as the service executable, exactly the name you typed in the command used to install it.  So, if you happen to be in the same directory as your mongod executable, and use the command:

mongod --install --auth --dbpath "somefolder" --directoryperdb --logpath "somefile"

the service is installed with just mongod as the executable, not mongod.exe.  In fact, you need to run the installation with the fully qualified filename, including the extension.  And put quotes around it if there are any spaces in the path.  In my case, that was:

"C:\Program Files\MongoDB\bin\mongod.exe" --install --auth --dbpath "somefolder" --directoryperdb --logpath "somefile"

That worked.  I'm putting it here because I found hints of this via searching, but they all left out at least one piece I needed.

March 9, 2011

Typical gray London day?

Filed under: Uncategorized — Charles Engelke @ 4:15 pm

This is not what I was expecting this week!

St. James's Park across from Buckingham Palace

February 19, 2011

Mediterranean Vacation Pics – Italy

Filed under: Uncategorized — Charles Engelke @ 9:01 pm

More catching up on organizing older vacation photos.  Today I started on the pictures from the Mediterranean cruise we took in late 2009.  The cruise went from Rome to Athens, visiting many Greek and Turkish ports, Cyprus, and a full day in Egypt to see the pyramids.  We spent an extra day in Rome before the cruise, and several days in Athens afterwards.

Digital cameras with large memory cards sure have changed vacation snapshots.  Laurie and I apparently took almost 4000 photos over three weeks, so it’s going to take some time to select the reasonably decent ones.  Today I sifted through the shots for Rome, Pompeii, and cruising by Stromboli.

In Rome for a day, we walked through the city by the Trevi Fountain and into the Pantheon, perhaps my favorite building:

Interior of the Pantheon

Most of the day we spent walking through the ancient Forum:

Ancient Roman Forum

The cruise’s first port was Sorrento, and we went on a tour to Pompeii:

Pompeii

We had a day at sea on our way to Greece and Turkey, and cruised by Stromboli, one of Italy's few active volcanoes, near Sicily:

Stromboli volcano

February 16, 2011

IE9 and Web Apps

Filed under: Uncategorized — Charles Engelke @ 12:15 pm

Yesterday, Paul Rouget, a Mozilla tech evangelist, wrote a blog entry stating that IE 9 is not a “modern browser”.  Not long after that, Ed Bott tweeted that the post was “surprisingly shrill”.  Several folks (including me) responded that the post made important points, and Bott asked for specific examples of real web sites that used the HTML5 features that IE9 is missing.  (I’m using “HTML5” to refer not only to the language itself, but also to the new APIs related to it.)

That’s hard to do, especially in a tweet.  If the most widely used web browser doesn’t support these features, even in its upcoming newest release, how many mainstream sites can use them?  They’ve been added to the HTML5 spec because there are strong use cases for them, and when users have browsers that support them sites can start taking advantage of them.  Of course, there are some sites that use these features, but Bott specifically said he didn’t want to hear about pilots or demos, which excludes a lot of them.

There's a chicken-and-egg problem here.  We can't make heavy use of HTML5 features in web sites unless web browsers support them, and Ed Bott seems to be saying that the upcoming IE9 doesn't need to support them because they aren't yet widely used.  That kind of problem is part of what stalled HTML and browser advances ten years ago.  The WHATWG didn't accept that, and pushed for what became HTML5.  I think that Google was a major help because it had the resources to improve browsers (first with the non-standard Gears plug-in, later with their standards-based Chrome web browser) in order to be able to develop more sophisticated web applications.  Their experimental ChromeOS devices like the Cr-48 show that Google is still very interested in the idea that the browser can be an application platform, not just a viewer of web sites.

For me, IE9 is most disappointing because it fails to implement many key HTML5 features that are essential to building good web apps.  (I use “web apps” to mean platform independent applications that live and run inside a modern browser, including many mobile browsers.)  Yes, IE9 makes a lot of advances and I appreciate them all, but some of what it leaves out is essential and does not seem nearly as hard to implement as some of what they included.  Consider some use cases that I actually encounter.

In a traditional web browser, no data persists in the browser between separate visits to a web page.  If I want to start working on something in my web browser and then finish it later, the browser has to send it to a server to remember it, and when I revisit the page in the future it has to fetch that information back from the server.  But what if I don't want to disclose that information to the server yet?  Maybe I'm preparing a tax form, and I don't want to give a third party a history of all the changes I'm making as I fill it out; I just want to submit the final filled-out form.  In a traditional web browser I can only do that if I perform all the work during a single page visit.

If only the browser could store the data I enter within the browser, so I could come back and work on the form over multiple visits without ever disclosing my work in progress.  Actually, HTML5 (and related technologies) lets you do that.  Web storage (including local storage and session storage), indexed database, and the file system API can each meet that need.  (So can web SQL databases, but that approach will likely not be in any final standard.)  Of these solutions, only web storage is widely available today.  It’s on all major current browsers, including IE8 and IE9.  Good for IE.

Now, suppose I want to work on my tax form and I don’t have an internet connection.  The data I need is in my browser, so shouldn’t I be able to do this?  If my web browser supports application cache, I can.  Every major web browser supports this, and most have for the last several versions of them.  Except for IE.  Not only does IE8 fail to support this, so does IE9.  If I try to work on my tax form in IE9 I’ll just get an error message that the page is unavailable.  Even though all the functionality of my tax form program lives inside the web browser I can’t get to it unless the server is reachable.  That’s a problem for an app.  This is my biggest disappointment with IE9, especially since application cache seems like a pretty easy extension of the caching all web browsers, including IE, already do.
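Concretely, application cache works from a plain-text manifest.  This is just a sketch with made-up file names: the page opts in by pointing at the manifest,

```
<!-- taxform.html -->
<html manifest="taxform.appcache">
```

and the manifest lists the resources the browser should keep available offline:

```
CACHE MANIFEST
# v1 -- change this comment to make browsers re-fetch everything
taxform.html
taxform.js
taxform.css
```

(The page that references the manifest is cached implicitly, but listing it is harmless.)  That's all it takes, which is why IE9's omission is so puzzling.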

But you might ask, so what?  This is a web app, and it’s not that big a problem if it only works when the server can be reached.  After all, it’s going to have to talk to that server sooner or later in order to submit the tax form.  But let’s switch to a different use case.  Suppose I want to do some photo editing.  The HTML5 canvas API gives me a lot of ways to do that.  I gave some talks last summer on HTML5 techniques and built an application that could resize photos and convert color photos to black and white or sepia toned.  The whole example took less than an hour to do.  This is an application that doesn’t need to ever talk to a server except for the initial installation.  It’s something that I could use on my machine with any modern web browser, so I can write it once and use it everywhere.  There are two big challenges for this application, though: getting photos into the code in my browser, and then getting the edited photos back out.

There’s no way to do that in an old-fashioned web browser.  If I’ve got a binary file on my PC and want to get it to the code in the browser, I have to use a form to upload that file to a server.  My browser code can then fetch it back from the server.  It goes through the browser to get to the server, but is inaccessible to code running inside the browser.  With the HTML5 File API, I no longer have that restriction.  I can select a file with a form and the code in the browser can directly read that file instead of sending it to the server.  That’s how I get a photo into my application.  Every current major browser supports the File API except for IE and Opera.  And Opera might add it in their next version (they haven’t said), but IE9 won’t have it.

Once I've edited the photo I need to get it back out.  What I need is either an img element (so the user can right-click and choose to save the image) or a simple link that the user can click to download the image.  The problem here is that for either of these methods to work, the photo has to be in a resource with a URL.  How do I get it there?  In an old-fashioned web browser, the code in the browser would send it to a server, which would save it and make it accessible at some specific URL.  Once again, my browser ends up having to send something to a server so that the browser code and browser user can share something.  With a Data URL, I can create a resource with a URL inside the browser so that no server is needed.  Data URLs are a lot older than HTML5 and have been supported in all major browsers.  However, until recently IE limited their size so much as to make them not very useful.  IE9 does allow large Data URLs, though.  Again, good for IE9.
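The idea behind a Data URL is simple: it just embeds the resource's bytes, base64-encoded, in the URL itself.  A quick sketch from the command line (the file name is made up):

```shell
# Write a tiny sample resource, then build a data: URL from its bytes.
printf 'hello' > /tmp/datauri-demo.txt
echo "data:text/plain;base64,$(base64 < /tmp/datauri-demo.txt | tr -d '\n')"
# prints: data:text/plain;base64,aGVsbG8=
```

In the photo editor itself, the browser builds the same kind of URL for you: canvas.toDataURL() returns the canvas contents as a data: URL you can assign to an img element or a link.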

So, for these use cases we need four key technologies: persistent storage in the browser, offline access, reading files, and creating resources and URLs for them in the browser.  Every modern web browser supports all of them (assuming the next version of Opera adds the File API).  IE9 supports only half of them, and can’t serve either use case.

That’s one reason we should not consider IE9 to be a “modern browser”.

February 13, 2011

Alaska Cruise Pictures

Filed under: Uncategorized — Charles Engelke @ 7:05 pm

Last weekend I did our taxes.  This weekend I organized photos from the Alaska cruise we took in July and August 2009 and posted selected ones on my Picasa web albums page.

Morning in Skagway

They’re organized by port; we visited Ketchikan, Skagway, Valdez, Seward, Kodiak, Hoonah, and Juneau.  There are also photos from our day in Glacier Bay, and the Princess Cruise’s Chef’s Table dinner during a day at sea.

Overlooking Glacier Bay

One truly bizarre thing about this cruise was that Laurie and I were among the most active folks on the ship.  We went ziplining in Ketchikan and Juneau:

Zip lining near Ketchikan

Rock climbing near Skagway:

Rock climbing

And hiked on a Glacier near Valdez:

Glacier hike

There was one other passenger along with us for one ziplining outing, one couple for the other zipline, and that passenger and couple for the rock climbing.  The glacier hike was better attended, though.

February 9, 2011

Source Control Basics, by Example

Filed under: Uncategorized — Charles Engelke @ 3:52 pm

Many non-developers understand the value of source code and realize that a source control system such as Subversion is extremely important, but don’t really understand how it should be used.  To a lot of people, it’s just a safe used to lock up this important asset.  But really, it’s a much more valuable tool than just a safe.  I’m going to try to describe how it can be used to aid release management, support, and maintenance of products by example.  These examples use Subversion, but the general principles apply to most source control systems.

Core principles

Subversion doesn't manage each file individually; it works on an entire directory tree of files at a time.  That's a good match for source code.  If you start with an empty Subversion repository, you can check it out to a working copy on your own computer, and then start adding your source files and directories to that working copy.

  • repository: the area on a Subversion server where every version of your source code directory tree is stored.
  • working copy: a local folder on your computer where the version of the source code you are working on is kept.

Whenever you want, you can commit your working copy to the repository.  In effect, Subversion stores a snapshot of your source code forever.  You can get a log showing every version that was ever committed, and you can check out a working copy of any version you want, at any time.

  • commit: make the Subversion server keep a snapshot of the source code that matches your current working copy.
  • check out: create a new working copy from any desired snapshot that Subversion has available.  Usually this is based on the latest snapshot, but doesn’t have to be.

Subversion simply numbers each version, or revision, sequentially, so you’ll see versions 1, 2, 3, and so on.  I recently noticed that one of our six year old projects is up to revision twelve thousand and something.  That means that on average, a new snapshot was saved once each business hour over the life of the project.

Before I move on, there are two more points to mention.  First, you don't have to check out and commit the whole repository at a time.  You can work with any subdirectory you want.  That's good for dividing up different kinds of work in a project that have little interaction, and it enables the management techniques I'll be talking about in a minute.  Second, you can't really commit "whenever you want".  You can only commit if nobody else has changed the same files you changed since your last checkout.  Otherwise, you need to update your working copy first, and possibly manually resolve any conflicts between your changes and the other folks' changes.  That sounds like a potential problem to a lot of people (including me), but in practice it works great.
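In command form, the basic cycle looks like this.  This is a sketch with made-up names, using a throwaway local repository (a file:// URL, so no server is needed) and assuming the svn command-line client is installed:

```shell
# Create a scratch repository and check out a working copy from it.
rm -rf /tmp/svn-basics /tmp/svn-basics-wc
svnadmin create /tmp/svn-basics
svn checkout -q file:///tmp/svn-basics /tmp/svn-basics-wc

# Add a file to the working copy and commit a snapshot.
echo "first draft" > /tmp/svn-basics-wc/notes.txt
svn add -q /tmp/svn-basics-wc/notes.txt
svn commit -q -m "Add notes.txt" /tmp/svn-basics-wc   # becomes revision 1

# The log shows every snapshot ever committed.
svn log file:///tmp/svn-basics
```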

Handling a Release

When you’re ready for a release, all you need to do is note the version number you’re building and packaging from.  That way, if you need to get that exact code back for support or maintenance, it’s extremely easy.  But it could be even easier.  Since you can work on subdirectories of your repository instead of the entire thing, just structure it a bit differently.  Don’t put your source code at the repository root, but in a subdirectory.  That subdirectory is conventionally called the trunk.  To do this, when you first create the repository immediately create a subdirectory called trunk.  Then instead of ever checking out the whole repository, just check out the trunk subdirectory.

The advantage of this is that you can now create a directory sibling to trunk, which will contain copies of all your releases.  By convention, this directory is called tags.  When you are ready to release your code, you copy the entire trunk directory tree to a new child of the tags directory.  Let's say this release is going to be 2.1beta2.  Then your repository will look something like:

Repository
   |
   +--trunk
   |    |
   |    +--your latest source tree
   |
   +--tags
        |
        +--2.1beta2
              |
              +--snapshot of trunk contents at time of release
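Creating that tag is a single server-side copy.  A sketch with illustrative paths, using a throwaway local repository and assuming the svn client is installed:

```shell
# Scratch repository with the conventional trunk/tags/branches layout.
rm -rf /tmp/svn-tagdemo
svnadmin create /tmp/svn-tagdemo
REPO=file:///tmp/svn-tagdemo
svn mkdir -q -m "Create standard layout" "$REPO/trunk" "$REPO/tags" "$REPO/branches"

# ...develop and commit on trunk as usual, then tag the release:
svn copy -q -m "Tag release 2.1beta2" "$REPO/trunk" "$REPO/tags/2.1beta2"
svn ls "$REPO/tags"    # lists 2.1beta2/
```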

Don’t worry about the storage needed to keep this new copy.  Remember, Subversion already needs to keep track of every version of your source tree, and it’s smart enough to store this new “copy” of a snapshot using almost no actual storage.  But even if it needed to use up enough space for a whole new copy, it would be worth it.  Storage is plentiful, and anything that helps you manage the history of your product’s source is priceless.

  • trunk: the subdirectory of your repository containing the current version of your source code (and every prior version, too).
  • tags: the subdirectory that contains other subdirectories, each of which is a copy of a particular version of the trunk.  Each subdirectory should have a meaningful name, and should never be updated (Subversion allows you to check out and update tags, but you should not do it).

Software Maintenance

Everything up to now is useful, important, well-known and widely followed.  But the next step, using source control for more effective software maintenance, seems to be less used, even among seasoned developers I’ve observed.  That’s a shame, because it’s easy to do and a big win.

Suppose you released your software a few weeks ago, and now a user reports a bug.  How are you going to fix it?

You could use your current working copy of the trunk, find the problem, fix it, and then do a build and package from that working copy.  Wait!  You’re using tags now, so you create a new tag that’s a copy of the trunk, and then build and release from that tag.

What’s wrong with that?  Well, your new release doesn’t contain the fixed version of the old release, it contains a fixed version of your trunk.  And that trunk probably has had all sorts of changes made to it in the weeks following the release that contained the bug.  It probably has some new errors in it.  It may even have partially finished new functions and other changes in it.  Even if you work hard to make every build green (passing all tests), you are risking pushing out new errors as you fix the old one.

What you should do instead is make the fix to the exact code you released (which is available in the tag).  Then you’ll know that the only changes between the prior release and your new corrected release were those needed to repair the reported problems.  New functions, restructured code, and other changes that you need to be making in the trunk, won’t affect the bug fix release.

We want to keep each tag frozen, representing exactly what we released.  Sure, we could update it and remember to go back to the proper version when we need to, but it's a lot easier to avoid problems if tags aren't changed.  So we deal with maintenance using branches.  A branch is pretty much like a tag, except that it is generally a copy of a tag, not the trunk, and it is intended to change.

  • branches: the repository subdirectory that contains other subdirectories (the branches), each of which is a copy of a tag.  Each branch will be updated as needed to make fixes in the release represented by its tag.

Specifically, you will create a subdirectory of the repository called branches, then copy the 2.1beta2 tag to a subdirectory of branches.  Say you call it 2.1beta2-maintenance.  Next, you will check out a working copy from that branch and do your programming work on it to fix the bug.  As you work on it you commit your changes, and when everything is ready, copy the latest version of the branch to a new tag, perhaps 2.1beta3 (or even 2.1beta2-patch1).  Build the new release from that tag and send it to your users.  You’ve fixed their bug with the least possible chance of creating new problems that didn’t already exist in their release.
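The whole flow can be sketched in a few commands.  The names are made up and the repository is a throwaway local one; the svn client is assumed to be installed:

```shell
# Scratch repository with layout and a release tag already in place.
rm -rf /tmp/svn-maint /tmp/svn-maint-wc
svnadmin create /tmp/svn-maint
REPO=file:///tmp/svn-maint
svn mkdir -q -m "Create layout" "$REPO/trunk" "$REPO/tags" "$REPO/branches"
svn copy -q -m "Tag release 2.1beta2" "$REPO/trunk" "$REPO/tags/2.1beta2"

# Branch from the tag, not from the trunk:
svn copy -q -m "Maintenance branch for 2.1beta2" \
    "$REPO/tags/2.1beta2" "$REPO/branches/2.1beta2-maintenance"

# Check out the branch, make the fix, and commit it:
svn checkout -q "$REPO/branches/2.1beta2-maintenance" /tmp/svn-maint-wc
echo "the fix" > /tmp/svn-maint-wc/bugfix.txt
svn add -q /tmp/svn-maint-wc/bugfix.txt
svn commit -q -m "Fix reported bug" /tmp/svn-maint-wc

# Tag the patched release; build and ship from this tag:
svn copy -q -m "Tag patched release 2.1beta3" \
    "$REPO/branches/2.1beta2-maintenance" "$REPO/tags/2.1beta3"
svn ls "$REPO/tags"    # lists 2.1beta2/ and 2.1beta3/
```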

Merging Fixes

There’s just one big problem.  The next time you do a new feature release, from a tag copied from the trunk, your fix won’t be in it.  You did all the work on a branch, instead.

Subversion (and other, similar tools) makes it easy to solve this problem, too.  You can get a report showing every single change you made on the branch, and then use that report to make the same changes to the trunk.  In fact, Subversion can even make the same changes for you.  This isn't just copying the changed files from the branch to the trunk, because each of them may have been changed in other ways while you were working on the branch.  This is just looking at what was changed in the branch (delete these lines, add these others) and making the same changes to the trunk.  With luck, the trunk hasn't diverged so much that the same changes won't fix the problem there, too.  But if it has, so what?  You're a developer, and using your head to figure out how to make the same effective changes without messing other things up is one of the things you're being paid for.
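Here's a sketch of svn merge replaying a branch fix onto the trunk, with made-up names in a throwaway local repository; it assumes a reasonably recent svn client (1.8 or later handles the merge direction automatically):

```shell
# Scratch repository with a trunk containing one file.
rm -rf /tmp/svn-merge /tmp/svn-trunk-wc /tmp/svn-branch-wc
svnadmin create /tmp/svn-merge
REPO=file:///tmp/svn-merge
svn mkdir -q -m "Create layout" "$REPO/trunk" "$REPO/branches"
svn checkout -q "$REPO/trunk" /tmp/svn-trunk-wc
echo "version 1" > /tmp/svn-trunk-wc/app.txt
svn add -q /tmp/svn-trunk-wc/app.txt
svn commit -q -m "Initial code" /tmp/svn-trunk-wc

# Branch, then fix the bug on the branch:
svn copy -q -m "Maintenance branch" "$REPO/trunk" "$REPO/branches/maint"
svn checkout -q "$REPO/branches/maint" /tmp/svn-branch-wc
echo "version 1, fixed" > /tmp/svn-branch-wc/app.txt
svn commit -q -m "Fix bug on branch" /tmp/svn-branch-wc

# Replay the branch changes onto a trunk working copy and commit:
svn update -q /tmp/svn-trunk-wc    # merge wants a single-revision working copy
svn merge "$REPO/branches/maint" /tmp/svn-trunk-wc
svn commit -q -m "Merge fix from maintenance branch" /tmp/svn-trunk-wc
cat /tmp/svn-trunk-wc/app.txt
```

Subversion records mergeinfo as it goes, so it knows which branch revisions have already been merged the next time you do this.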

Some people really worry a lot about the potential for duplication of effort in making a fix on a branch and then having to recreate the same fix on the trunk.  But in reality, this rarely requires any thought at all; the automated tools handle it perfectly.  And when they don’t, it’s still just not very hard to do it in both places.  This approach to branching and merging works much better than making the whole team roll back their work in progress, or freezing their changes, while you make a fix.  And it’s one of the biggest wins in using source control.

Summary

Source control tools like Subversion help you keep on top of exactly what source code went into each and every release.  Used properly, they also give you a way to do maintenance fixes with the least possible risk of new problems or errors creeping in.  They cost little or nothing to buy, and require very little effort to run, support, and use.  There are a lot of other ways they help developers, too (comments on the reason for each revision, seeing what was changed at the same time, and knowing who did what if you have a question).  For a manager who wants to know how the team can deal with fixes for multiple releases in an efficient and safe way, understanding tagging, branching, and merging as described here is essential.

Last Day at StrataConf

Filed under: Uncategorized — Charles Engelke @ 11:28 am

It’s been almost a week since StrataConf ended, but I’ve been busy recovering from the travel and catching up.  Before I forget too much about the last day, though, I want to get my notes down here.

The day opened with a bunch of short “keynotes” again, just like Wednesday, and they were of highly variable value (also just like Wednesday).  Ed Boyajian of EnterpriseDB presented nothing but straight marketing material, a commercial that I think influenced no one.  But DJ Patil of LinkedIn gave a very interesting talk focused on hiring extremely talented people and helping them do their best work, and Carol McCall of Tenzing Healthcare gave a not only interesting, but inspiring talk about how to start fixing the mess our country has made of healthcare (video here).

The day was shorter than Wednesday, but still pretty long, ending at about 6:00 PM.  I felt the sessions were, overall, weaker this day than on Wednesday, but they closed extremely strong.  The panel on Predicting the Future, chaired by Drew Conway and with short talks from Christopher Ahlberg, Robert McGrew, and Rion Snow followed by discussion, was fantastic.  The format of short talks to set the stage for the panel worked great.

All in all, StrataConf was eye opening to me.  I had very little background in using data these ways, and now I feel ready to explore much more deeply on my own.  Many of the presentations and some videos are available online, and they’re worth a look.  And if you ever get a chance to attend a talk by Drew Conway, Joseph Turian, or Hilary Mason, I recommend you take it.  They each have a lot of interesting things to say, and they’re very good at saying them.

February 3, 2011

Day One at StrataConf

Filed under: Uncategorized — Charles Engelke @ 3:24 pm

The schedule for day one was so packed, and continued until late at night, that I had no time to write anything as it happened.  And today, it’ll be just a quick recap.

The day started with keynotes.  These were the now-standard O'Reilly conference "keynotes" consisting of 10-15 minute presentations, some intrinsically interesting, some little more than infomercials the speakers' companies paid O'Reilly for.  I dislike the format (can you tell?), and would flat-out boycott them, but there are always a few moments of value in there.  And the first day's keynotes were no exception:

  • Hilary Mason (who participated in the prior day’s Data Bootcamp) of bit.ly opened with a breezy, interesting talk.  It’s available online now, too.  Nothing very deep in only ten minutes, of course.
  • James Powell of Thomson Reuters gave a talk that wasn’t very interesting, but it was certainly okay.
  • Mark Madsen's talk was fine and light.  I liked Hilary Mason's a lot better.
  • Werner Vogels of Amazon gave a short, informative talk about their web services.  It wasn't a sales pitch; it was actually interesting for its content.
  • Zane Adams of Microsoft presented the most blatant commercial, including a video from Microsoft’s marketing group that was simply embarrassing.
  • That was followed by a panel discussion on “Delivering Big Data”.  There’s a video available.  I didn’t think much of the session; you can’t have a worthwhile panel in ten minutes.
  • The closing talk wasn’t announced ahead of time.  Anthony Goldbloom of Kaggle talked about the $3 million prize for producing a good model for predicting who will need to go to the hospital in the coming year.  They acted like this was an announcement of the prize, but it was publicized at least a few days before.

Overall, the keynotes simply weren’t worth the time they took.  The sessions later in the day were better.  I’m not going to talk about them all, just a few highlights (to my mind):

That’s a bit more than half the sessions I attended.  The others weren’t bad, but just not as useful or interesting to me as the ones above.  I’ll update this later today or tomorrow with links to the material as I get a chance.

February 1, 2011

Strata Data Bootcamp

Filed under: Uncategorized — Charles Engelke @ 4:18 pm

It’s day one at O’Reilly’s Strata conference, and it’s off to a bit of a rocky start.  I’m attending the all day Data Bootcamp tutorial, which was supposed to start at 9:00.  So I grabbed a muffin from the hotel, and a cup of tea from the conference, and went to get settled in at 8:30.  I figured I’d catch up on email while waiting.  Nope.  They told the local staff to not open the rooms until 8:45.  But the door doesn’t open at 8:45.  It actually opens a bit after 9:00, at which point there is a huge line and crush, and everybody trying to get a spot near a power strip since this is supposed to be an all-day hands-on tutorial and it turns out most rows of seats aren’t near power.  The session doesn’t start until about 9:15, because the sound equipment (and possibly video, too) aren’t working right.

Eventually we get started.  And the first thing they do is put up a slide telling us to download conference materials from a git repository (via the command git clone https://github.com/drewconway/strata_bootcamp.git).  The network is already hosed; the download comes at 22KB/s.  [Later: the download took almost an hour.]

Within the repository, the initial slides are at slides/intro/viz_intro.pdf.  It’s a nice way to lay the foundation and I recommend you take a look.  Drew Conway is speaking, and I liked his comment on the Afghanistan slide: “This is taking a complex thing – which is a war – and representing it as a complex thing.”  Which doesn’t aid understanding.  The philosophy should be:

  1. Make complex ideas simple
  2. Extract small info from big data
  3. Present truth, don’t deceive

We will do hands on work with R and Python.

Okay, we just had the first hands-on tutorial.  It did not go great.  Flashing code for people to run over four or five slides, advancing very quickly, is not the way for anyone to keep up.  I’m a fast typist, but could not keep up.  I eventually got most of it.  Of course, this is somewhere in the downloaded material from the Git repository, which finally finished downloading after nearly an hour.  Now that I have the slides, I could follow the tutorial examples much better next time.  I’ll be reviewing them after the conference, because the material seems very good, just not set up to follow in real-time.

On to the session on Image Data, given by Jake Hoffman.  The slides are at slides/image_data/image_data.pdf and sample code and data at /code/image_data.  The first question to us from the speaker is how many people regularly work with image data?  With text data?  No surprise, the text data users are a much larger group.  But image data isn’t that hard to work with, and is valuable, so we will learn about it.  Text data this afternoon.

It's impossible to keep up with the code examples.  This tutorial is not structured clearly and simply enough to do so.  So I'm just going to run the sample code from the repository.  This is a failure for immediately learning how to do things, but I think it's a success in learning what I want to go study on my own.  There are interesting concepts that I'll find useful, but I'm going to have to research and learn them on my own, not here today.  It's not all new to me, though.  A speaker question: "how many here have worked with k-nearest neighbors?"  Very few hands up.  I realize that I should raise mine, because I used it for my CS master's thesis – more than 25 years ago.  I had forgotten.  I am ancient.

Now for the lunch break.  And as we break, they point us to a download of just the slides and code, but I can’t see it on the screen so I don’t know what it is.

For the afternoon, we start on working with text data. Hilary Mason is speaking, and the slides are at slides/text_data/strata_bootcamp.pptx.

Our first example uses curl from the command line to start getting data from a web server.  I wrote a curl cheat-sheet post a while ago, and really like using it.  If you want to talk via HTTP and explore as you go, curl is the way to go.  The speaker also shows using Beautiful Soup and lynx to grab data.

Now to e-mail.  Exchange servers are really hard to work with, but nobody in the room will admit to using one.  Most people seem to use GMail, either directly or for their own domain.  Others use the POP and IMAP protocols, which are old but widely available.  "IMAP sucks the least."  And GMail supports it, too.  Hilary thanks Google for making GMail accessible with IMAP, an open, though perhaps old-fashioned, protocol.  Example code is in code/text_data/email_analysis, and the programs have a dummy account and password baked into them.  That account works today, but will probably be disabled after the workshop.  I didn't want to risk my own account on an open network with it, but looking at the source I see that it is using SSL.

Hilary gives a nice example of Bayes Law.  Take a look at it in the slides.

What about classifying email (or web pages)?  She gives an example of a cuil search for herself that’s a total disaster.  (cuil is long gone; I wrote a post about it and the poor job it did searching for me.)

Clean data > More data > Fancier math

We close this sub-session with running the various sample programs with various test data.  Hilary shows how easy it is to create your own “Priority Inbox” feature if you first star some important messages.  These general techniques work well here.  And a final challenge to us: write a script to figure out who you’re waiting for replies from, and remind them after a certain amount of time.

Back from the afternoon break, a new topic: Big Data by Joseph Adler.  His slides are at slides/big_data/big data.pptx (there’s an embedded space there).

The first point: don't jump to using big data techniques.  Small data techniques are easier, so use them unless you can't.  And when you can't, try to do something to let you use small data techniques: shrink your data by using fewer variables or fewer observations, or get a bigger computer.  If nothing works, then move to big data methods.

There’s a lot of discussion on statistically valid sampling techniques, so you can run your analyses on a very small subset of your total data, yet still get good answers.
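The simplest of those techniques, a simple random sample, takes one line of Python (real work would want stratification and the other schemes discussed, but this shows the idea):

```python
import random

random.seed(2011)             # fixed seed so the example is reproducible
population = range(1000000)   # stand-in for the full data set
sample = random.sample(population, 1000)

# Run the analysis on the 1,000-row sample instead of the million rows;
# a statistically valid sampling scheme keeps the answer close to the
# full-data one at a tiny fraction of the cost.
print(len(sample))  # 1000
```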

Everything discussed in the Big Data session seems useful, but not particularly new or interesting to me.  Solid material, but it didn’t trigger many new connections in my mind.

And now we will close with a mash-up example they put together, plus questions and answers.  Most of the panel is participating.

All in all, a worthwhile survey of the material.  Not really a bootcamp, and not really hands-on, though.

January 4, 2011

Catching up on photos

Filed under: Trips — Charles Engelke @ 10:33 pm
Tags: , ,

I maxed out my free Picasa photo storage in early 2009, and figured I’d wait and see if Google raised the limit before buying storage.  Then I pretty much forgot about it.

Which is too bad, because 2009 was Laurie’s sabbatical year and we traveled more than ever before, which led to lots of photos.  I started going over them during the Christmas break and decided to catch up on posting some of them, and now the first ones are up.  Of course, I had to shell out all of $5 to get 20GB of space for a year…

First up, our quick spring break trip (before the sabbatical began) to Barcelona.  Here’s Laurie in Montserrat on a day trip outside the city:

Montserrat near Barcelona

July 23, 2010

Fantastic Friday at OSCON

Filed under: Uncategorized — Charles Engelke @ 3:43 pm
Tags:

Well, they saved the best for last.  In general I found this year’s OSCON to be pretty weak in content, but today has been great.  In particular, Simon Wardley’s keynote was excellent (and clearly much longer than it was expected to be, but well worth the time), and Tim Bray’s talk on Practical Concurrency was the best of the conference.

We will close out with what is sure to be an entertaining talk on the world’s worst inventions from Paul Fenwick, and then be on our own for the rest of the day until our red-eye flight home late tonight.

July 22, 2010

Node.js at OSCON

Filed under: Uncategorized — Charles Engelke @ 8:12 pm
Tags:

Tom Hughes-Croucher is going to tell us about Node.js, a JavaScript web server.  He starts by offering a doughnut to anyone asking non-awful questions.

Why server-side JavaScript?  Well, first there are a lot of JavaScript programmers.  Pretty much all web programmers use it, because that’s all they have available on the client.  So why not use it on the server, too?  And why write things twice, separately for the server and client sides?  And progressive enhancement is free (close enough to free).

JavaScript runtimes include V8 (Google, C++), SpiderMonkey (Mozilla, C/C++), Rhino (Mozilla, Java), and JavaScriptCore (Apple, C++).  V8 is significantly faster than SpiderMonkey (at the moment), but Mozilla is coming back with TraceMonkey.  Google’s success with V8 has sparked a speed war among JavaScript engine and browser builders.

Node.js is a server-side JavaScript process that uses V8.  It runs on anything POSIX-enough. (May be okay on Cygwin on Windows.)  It’s non-blocking and event driven.  It uses the CommonJS module format (we’ll find out soon what that means).  Node is very fast.  It’s almost as fast as nginx, which is all native C and highly optimized.

Here’s some code (I think I got it down right):

var http = require('http');

// Answer every request with a plain-text greeting, listening on 8124.
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello, World\n');
}).listen(8124, '127.0.0.1');
console.log('Server started.\n');

There are plenty of packages available for Node.js, which can be installed with NPM, the Node Package Manager.  Which is itself written in JavaScript.

He shows more examples, and explains how to use things.  A very good session.

JavaScript at OSCON

Filed under: Uncategorized — Charles Engelke @ 3:07 pm
Tags:

I’m starting day 2 here with session on JavaScript.  First up is Programming Web Sockets by Sean Sullivan, to be followed by a talk on jQuery UI by Mike Hostetler and Jonathan Sharp.

Web sockets are a lightweight way for web servers and clients to communicate instead of using full HTTP.  Think of it as a push technology.  We start with an example of a multi-player game.  There are two specs to learn: the API and the protocol.  As a programmer, we care more about the API, which is how we use the facility.  He gives it all on one slide, which I don’t have time to show here.  Basically, instantiate a new WebSocket object, set some handlers for various events on it (like receiving and handling data: onopen, onmessage, onerror, onclose), and put data into it with a send method.  Eventually, call the close method to stop using the web socket.  It does look quite simple.

But how do I program the server-side?  That’s more fluid right now; the protocol specification is changing in incompatible ways.  On the browser side, we have support in Chrome 4.0.249.0 and later, Safari 5.0, Firefox 4.0.  IE 9?  Still not known.  Apparently (per a tweet from yesterday) Apple used to support web sockets in iOS, but now no longer does.  On the server side, there’s an enhancement request in for Apache httpd.  There’s a Python extension called pywebsocket available, though.  Django supports web sockets, and maybe some Ruby stuff, too.  Jetty has it.

No actual coding examples, which is a disappointment to me.  We’re finishing early, and have a long gap until the jQuery UI talk (which I think may be pretty full).

It is pretty full, but not standing-room only.

We start with effects, which are pretty ways to show changes on elements.  There’s not much meat here.  Now we move on to interactions, which are more functional.  For example, make an element draggable and attach handlers to various events related to that.  There’s also making a list sortable.

This is all great stuff, but I think I see why I have a hard time getting into it.  There isn’t a grand core idea here, but instead an enormous number of small, focused helper tools.  So a talk like this touches on one and then moves on right away.  Forty minutes of that is hard to stay focused on.  But now we’re getting a more complete coding example.  I don’t know; I like the functionality and appearance, but the necessary coding seems very complex for the examples.

It’s convinced me to use it, anyway.

Android Hands-on at OSCON

Filed under: Uncategorized — Charles Engelke @ 12:37 am
Tags: ,

Well, I really didn’t think Google would do it (again) but they handed out free phones to everyone attending this three hour evening session.  We all got a Nexus One, on AT&T frequencies by default, but you could ask for a T-Mobile one.  It only matters for 3G data; 2G data and voice are the same on both.

So now I’ve received four free phones from Google over the last 14 months.  I’m using one as a phone, two others for development and sharing with co-workers, and now I have to figure out whether I’m going to migrate to this fourth one, or pass it on to a co-worker, and use it just for development.  I’m sure not complaining, though.

I took an Android development class at Google IO in May, but it only lasted an hour and really just gave us time to take an existing app and uncomment code bit by bit to see what it did.  I’m hoping to get a bit deeper tonight.

We’re starting with design principles, not jumping right into programming.  Mobile apps are a bit different from desktop ones, and Google says that they want people to create great ones.  Though I don’t know how much attention folks are paying to the speaker; they’re unboxing and setting up their phones.

Some good UI points: don’t just port your UI from other platforms; your app should behave like other apps on the same platform.  Don’t do positioning and sizing that won’t adjust to different devices.  Don’t overuse modal dialogs.  Of course, DO support different resolutions, handle orientation changes, support non-touch navigation, and make large, obvious tap targets.

[Yes, I swear I’m paying attention.  But I’m also updating my new phone to FroYo!]

Design philosophy: choose clear over “simple”.  Focus on content, not chrome.  Enhance the app through use of the cloud (Yes!).

Show feedback: have at least four states for all interactive UI elements: default, disabled, focused, pressed.

Avoid confirmation dialogs (are you sure?).  Instead, support undo.

(By the way, we’re back on the OSCON_BG wi-fi tonight, and it’s performing very badly again.  Of course, that might be due to everyone downloading FroYo to their phones via wi-fi, since they didn’t come with SIMs.)

Some new UI design patterns: Dashboard, Action Bar, and Quick Actions.  They’ll show them in the context of the new Android twitter app.

This is all good content, but I’m ready for the “hands-on” part.  Let’s build something.

Tim Bray just announced that Android is now shipping 160,000 phones a day, which comes to more than 50 million a year.  So developers should be interested in creating apps for it, and tools to create apps for it.  (Nobody’s mentioned AppInventor yet, though.)

Now we’re going to talk about how to have your app interact with RESTful services.  I missed that session at Google IO (too much competing content there) and I care deeply about this, so I’m really glad to see it again.  First issue: why a native REST client versus a mobile web app?  Basically because web apps can’t do all the things a native app can do.  Yet.  Google’s working on making web apps more and more capable, but there are still things you can’t do on your phone with a web app that you could with a native app.

First up for REST: how not to do it.  Start up a new thread to talk to the remote server, save results in memory, and then have your main thread take that data out of memory and use it.  Why is this bad?  Well, the OS can shut down the process any time the user navigates away from it (since the device has limited memory).  So if you haven’t finished the fetch, or processed the data fetched to memory, it’s gone and you’ll have to do it over.  Instead, you need to start a service for this, which the OS won’t kill as easily, and if it does kill it, the OS will save and later restore the state.

Now we’ll see step by step how to do that.  There’s a diagram showing with 11 steps involved to have your activity create and use a service to perform a REST method.  The specific example is getting a list of photos in an album, or saving a new photo to an album, accessed via a RESTful web service.  Some other tips:

  • Can use the new built-in JSON parser in FroYo
  • Always enable gzip encoding.  If the server supports it, it will not only download faster, but use the radio (and hence the battery) less.
  • Run the method in a worker thread
  • Use the Apache HTTP client library

The whole thing starts when the Service receives the Intent sent by the Service Helper and starts the correct REST method.  Then, when the Processor that actually makes the request is done, it triggers a callback, and the Service Helper invokes the “Service Helper binder callback”.  (I think I know what they mean by that.)  It can queue multiple downloads.

That was how the Service responds to the Service Helper.  Where does that come in?  It’s what your Activity actually invokes.  It’s a singleton that exposes a simple (asynchronous) API to be used by the user interface.

(Again, good information.  But where’s the hands-on?  I’ve already created the Hello, Android example and pushed it to the new phone, but I want to create something tonight!)

I’ve been to a couple of short sessions on programming Android, and created the baby Hello, World app, but I’m barely keeping up at this point.  I strongly suspect that this example is way too complicated for the majority of people here who just don’t have any Android context to put it in yet.

Okay, we’ve finally finished the description of how that would work.  Now we’re going to hear how to use the ContentProvider API.  Which is apparently much simpler.  Okay, now we have the background to understand all the stuff the Content Provider is doing for us.  And finally, the third option is to use something called a Sync adapter.  Those are new in Android 2.0+, and they are important.  Use them!  They’re battery efficient, but not instantaneous due to queueing.

Well, we’re moving on to the Android NDK (for native C/C++ code instead of Java).  I no longer believe that there will be any hands-on this evening.  Good information, though.  However, I have no intention of programming Android in C/C++.  Still, there are nice advances.  You can now debug NDK applications on retail devices, starting with Android 2.2.  And we’re now seeing the original Tetris code (from Russia, from a long time ago) running natively on an Android phone using it.  Cool, but I’m still not going to do it myself.

And that’s it.  No actual hands-on, except what I’ve done myself during gaps in the talks.  But still a great session (even setting aside the free phone, and FroYo is cool).

July 21, 2010

OSCON Infrastructure

Filed under: Uncategorized — Charles Engelke @ 5:29 pm
Tags:

The wi-fi is working great today.  (Maybe it wasn’t offered as wireless-N yesterday?  The access point was just called OSCONBG.)  But now the air conditioning is no good.  For the first time I’m uncomfortable.  It must be over 80.  The temperature isn’t so bad, but the air is completely stagnant.

Devops

Filed under: Uncategorized — Charles Engelke @ 4:55 pm
Tags:

Or is it DevOps?  Either way, it still looks like an ugly name to me.  I’ve been hearing this word for two days now, and nobody ever bothers to define it or give its derivation.  From context, the word is clearly a blend of Development and Operations, and seems to refer to managing operations for applications deployed in the cloud.

It seems like a good idea to me.  I know I have long found that developers with some operations experience, or at least perspective, really bring a lot to the creation of easily deployed and managed systems.

Mobile Apps with HTML, CSS, and JavaScript

Filed under: Uncategorized — Charles Engelke @ 4:51 pm
Tags:

This morning’s talk by Jonathan Stark was excellent.  In particular, I’m definitely going to be using jQTouch, and probably will use PhoneGap.  In fact, I think I’d like a version of PhoneGap for Windows – let me deploy a web application inside a native wrapper, with extensions that let me manipulate the machine via JavaScript.

OSCON Begins

Filed under: Uncategorized — Charles Engelke @ 1:43 pm
Tags:

Yesterday’s Cloud Summit was pretty good.  I didn’t get a lot of detailed, concrete information, but acquired a good overview of what’s going on and how the pieces all need to connect.  But today, the real conference begins.

The morning began with “keynotes”.  I put that in quotes, because they were more like lightning talks.  But since the speakers were mostly executives instead of technical staff, they each needed ten minutes to make their points, not just the five usually allocated to a lightning talk.  The talks were okay, and whenever one dragged it was over soon, but they didn’t add much.  If they skipped them and instead had one or two real, deeply interesting keynotes for the entire conference, that would be a lot better.  As it is, the conference itself doesn’t get started until nearly 11:00 AM.

For the rest of the morning (actually, the middle of the day) I’m going to attend sessions on programming for mobile devices.  First up is Android, the Whats & Wherefores by Dan Morrill of Google.  I’m already somewhat familiar with Android, so may not see much new.  But the second half of the session is about Building Mobile Apps with HTML, CSS, and JavaScript, by Jonathan Stark, and I am very deeply interested in that topic.  I like all mobile platforms, but I don’t want to have to master lots of different technologies.  I think web technologies are mature enough to meet my development needs.

July 20, 2010

Trying OSCON’s Wi-Fi Again

Filed under: Uncategorized — Charles Engelke @ 5:55 pm
Tags:

Let’s see if things go better this afternoon.  We’re going to have a few debates in the afternoon, starting with one about the importance of open standards for cloud computing.  Sam Johnston of Google (his blog is here) starts the debate, speaking in favor of the importance of open standards.  After he’s had 15 minutes to present his case, Benjamin Black will have 15 minutes to make the opposite case.  Then there will be 15 minutes of back and forth, between the speakers and with a “jury” and audience members.

The “for” argument isn’t very interesting to me, because I already agree with what he’s saying.  I need to hear some contrary information when the opposition comes on.  Which is now starting.  Black starts by pointing to the dysfunctional processes often behind defining and agreeing to standards with a Monty Python video (the fish-slapping dance).  Then: what’s important?  Utility.  If it doesn’t solve my problem, I don’t care about standards.  Then interoperability.  Then being free of vendor lock-in (independence).  Those three aren’t all equal.

Some problems don’t need to go past utility.  For example, SQL (in reality, not in theory).  His point seems to be that there is no meaningful interoperability between SQL implementations, yet we still use SQL.  Well, I don’t think I agree with the premise there.  A lack of perfect interoperability doesn’t mean that there isn’t any interoperability!

Suppose something new comes out with massive utility and a lot of imperfection.  People will adopt it rapidly.  Then you get lots of competition and exploration, and lots of “standards” that are all different from each other (think networking in the early days).  Eventually, the different islands begin to interoperate with each other as demanded by their users.  That’s where the cloud is now.  So it’s too early to define what the correct standards should be.

That happens in the next stage: maturation.  That’s where we worry about independence, not earlier.  Successful standards formalize what is already true.  “Standards are side effects of successful technology.” “All successful standards are de facto standards.”

All good points.  But is there nothing in cloud computing ready to benefit from the independence?  His next point is that, even if so, it’s too early.  Because as things become more standardized, the rate of innovation has to drop, and we aren’t ready for that to happen in the cloud.  Very good quote: Standardize too soon, and you lock in to the wrong thing.

Excellent speaker, and I agree with his points.  But not necessarily all his conclusions.  Mainly, I think some cloud issues are more mature than he seems to be saying, and are ready to improve interoperability, and perhaps even independence.  But he makes a great case.

There’s some back and forth and questions next.  The room seems to favor the “against” position.  But the question has changed a bit over the course of the talk: now people are agreeing that a priori standards are bad, when the original question was whether any standards are needed.

The next debate is on whether Open APIs are enough to prevent vendor lock-in.  George Reese will argue that they are; James Duncan will say that they aren’t.  Of course, the question starts with trying to determine just what would make an API “open”.  But that’s dismissed early on as not the core question.  It seems that the “pro” advocate is arguing against it: even if the APIs are open, if the platform itself isn’t, then you can’t take your top layer and move it elsewhere.

I don’t find this debate very interesting, though.  Nothing really new or useful for me.  But the first debate was excellent.  It’s a good format.

On the plus side, the conference Wi-fi is kind of working now.  It’s not great, but not dead, either.  I notice a lot of non-conference access points are now gone; I wonder if interference, rather than bandwidth, was the major problem.

No more blogging at OSCON

Filed under: Uncategorized — Charles Engelke @ 2:13 pm
Tags:

I’ve had to switch to my phone because wifi is unusable. They say they have 60Mb/s, which sounds like a lot until you divide it by a few thousand users.

Swype on android is great, but requires too much attention while trying to listen.
