Tuesday, November 1, 2011

From 262 to 26.2

I'm terribly nervous about running the NYC Marathon this Sunday, Nov. 6, 2011. I started training the first week of June. Over the past 5 months, I've run over 350 miles, about 70 miles a month. I've destroyed 1 pair of sneakers. Ran 15 miles in 100-degree heat. Ran across the Golden Gate Bridge. Ran on the Miami Beach Boardwalk at sunrise. Just this weekend, I ran in 3" of slush.

[Photo: 18 miles with Josh Anish in SF.]
And for what?

Nearly a year ago, rushing to the PATH station, I felt my stomach slump over my ice-cold belt buckle. "Holy Shit! How did I get so fat?" Early on in our relationship, my wife and I passed a man in the mall whose belly stuck out from under his shirt and over his pants. We agreed we would never let ourselves go like that. I was not keeping up my end of the bargain.

This is how I decided I needed to do something. Neither fear of diabetes nor fear of heart disease had been enough to scare me into fixing my diet and exercising. This simple, all too fleshy sensation was the trigger. I knew I needed a change.

Shortly after my epiphany, Tim Ferriss released The 4-Hour Body. The most important aspect of the book was that it illustrated how other overworked techies could fix their diets by making simple, repeatable changes. He also got me to think of my diet as a science project instead of something to dread. Cheat day was a godsend and got me through a couple of tough weeks of travel. At my heaviest, I was 268 pounds; at my lightest, I'm down to 222 (I really want to be ~210-215). I used to rush off to the train in the morning and grab a 600-calorie breakfast sandwich; now I make myself a couple of eggs with veggies and turkey bacon every morning. It doesn't take me any more time. And I lost 40 pounds.

[Photo: Sunrise run in Miami.]
Then, in March, @naveen from Foursquare retweeted a message from @dens about Camp Interactive looking for marathon runners for their team in the 2011 NYC Marathon. The year before, I had made running a marathon my New Year's Resolution, because I'm an idiot. Nevertheless, I filled out the form and was honest that I didn't think I could run 26.2 miles or raise $5,000.

In June, Camp Interactive accepted me anyway.

I hadn't been exercising or running regularly for over a year. I had no idea how I was going to run a marathon. Fortunately, as usual, our VP, Product Development, had a plan for me. Brian set up some simple, achievable goals for my monthly distances. I also found a first-time marathon plan with the minimum amount of running during the week.

[Photo: Final run in the snow?!?!]
Every run brought a new, faster time or a new, longer distance. I learned how to hydrate, when to eat Goo, and how to master a negative split (fingers crossed for getting any/all of those right this Sunday). But most of all, I learned that my body is an amazing, flexible (figuratively) piece of machinery, willing to be pushed well beyond my mind's breaking point.

In less than a year, I've dropped 40 pounds and have run at least 20 miles. Fingers crossed for making it the full 26.2 this Sunday. Thank you all for your support and donations (not too late for more), especially my amazing wife who's put up with me spending 1 day every weekend for the past 5 months disappearing in the woods for 2-4 hours and coming home to ice my knees and pass out on the couch.

For all of my friends in or near the city, I'd love to hear your cheers on Sunday. My bib number is: 44712.

Best of luck to all my fellow runners -- esp. Team Interactive and fellow Knerds: Ken and Jen!

Monday, February 14, 2011

3 things you forgot about when adding a new language to your runtime environment.

Many engineers seek to use the right tool for the job (and they should). Most tech managers get a bit freaked out at the idea of endless language proliferation (who cares?). To help your tech managers be a little less freaked out and ensure you never ever again get stuck in a J2EE stack, there are 3 things that you need to keep in mind when adding Python, Ruby, Haskell or Lisp to your runtime environment.

  1. The MySQL bindings will suck.
  2. The initial appserver config will be wrong.
  3. The exception handling and reporting will be useless.
[Photo: We'll miss you too, Duke.]

The MySQL bindings will suck.

I'm 0 for 3 (Java, Ruby, Python) at adding a new language and getting anything resembling smart idle-connection timeout handling. I've needed to implement wrappers in Java (hopefully this is no longer the case), patch the Ruby MySQL gems, and hunt for the one right Python wrapper that keeps my app from needing a restart once the server times out its connections. These are all solvable. Just make sure you test for these issues and resolve them before deploying to production.
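Here's a minimal sketch of the kind of wrapper I mean, assuming a DB-API 2.0 driver such as MySQLdb. ReconnectingConnection is a hypothetical name of mine, not a library class. It does two things. First, it reopens the connection if it has sat idle longer than max_idle; keep that below the server's wait_timeout, which defaults to 28800 seconds in MySQL. Second, it retries a query once on a fresh connection if the driver raises one of reconnect_errors.

```python
import time


class ReconnectingConnection(object):
    """Hypothetical wrapper around a DB-API connection factory that
    reopens connections the server has (probably) dropped."""

    def __init__(self, connect, max_idle=3600, reconnect_errors=(Exception,)):
        self._connect = connect            # zero-arg factory returning a DB-API connection
        self._max_idle = max_idle          # seconds; keep below the server's wait_timeout
        self._errors = reconnect_errors    # e.g. (MySQLdb.OperationalError,)
        self._conn = None
        self._last_used = 0.0

    def _reset(self):
        # Close the old connection (ignoring errors, the socket may be dead)
        # and open a fresh one.
        if self._conn is not None:
            try:
                self._conn.close()
            except Exception:
                pass
        self._conn = self._connect()

    def _ensure(self):
        # Proactively reconnect if we've been idle longer than max_idle.
        now = time.time()
        if self._conn is None or now - self._last_used > self._max_idle:
            self._reset()
        self._last_used = now

    def _run(self, sql, params):
        cur = self._conn.cursor()
        try:
            cur.execute(sql, params)
            return cur.fetchall()
        finally:
            cur.close()

    def execute(self, sql, params=()):
        """Run a query, retrying once on a new connection if it fails
        with one of the configured reconnect errors."""
        self._ensure()
        try:
            return self._run(sql, params)
        except self._errors:
            self._reset()
            return self._run(sql, params)
```

With MySQLdb, you might construct it as `ReconnectingConnection(lambda: MySQLdb.connect(host="localhost", user="app", db="app"), reconnect_errors=(MySQLdb.OperationalError,))`, where the connection arguments are placeholders. Blind retries are only safe for idempotent statements. For writes, you may prefer to reconnect and re-raise.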

The initial appserver config will be wrong.

Most languages run in some sort of appserver, typically a WSGI server or some Apache mod_ plug-in. But some appservers have strong language preferences, esp. in the Java world. Make sure you understand how your new language is going to run in a new appserver. Multithreaded or multiple processes? How many threads or processes do you need? Are database connections pooled?
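For a WSGI stack, one quick way to answer the first two questions is to deploy a tiny diagnostic app next to your real one. Here's a minimal sketch; whoami_app and serve are hypothetical names of mine. It reports the process ID and thread name that served each request, along with the wsgi.multithread and wsgi.multiprocess flags that PEP 333 requires every WSGI server to set.

```python
import os
import threading
from wsgiref.simple_server import make_server


def whoami_app(environ, start_response):
    """Hypothetical diagnostic WSGI app: reports which process and thread
    served the request, plus the concurrency flags the server declares."""
    body = ("pid=%d thread=%s multithread=%s multiprocess=%s\n" % (
        os.getpid(),
        threading.current_thread().name,
        environ.get("wsgi.multithread"),
        environ.get("wsgi.multiprocess"),
    )).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Try it locally with the stdlib reference server."""
    make_server("localhost", port, whoami_app).serve_forever()
```

Mount it under your real appserver and hit it a bunch of times. The flags tell you what the server claims. Watching the pid and thread name change across requests tells you what it actually does.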

The exception handling and reporting will be useless.

On our team at Knewton, we email exceptions and stack traces to all engineers. We've needed to implement this for both Ruby and Python. For both, we needed to add important debugging information (request params and headers) along with a usable stack trace. Another common issue we've encountered is that libraries for connecting to external resources (curl, for example) have useless default errors. Most of these libraries do not tell you explicitly what they were trying to connect to when the exception occurred. "Destination Unreachable" is not useful; which destination (hostname, IP, port)? We've patched curb in Ruby and are currently patching Python to tell us what we were unable to connect to.
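Here's a minimal sketch of both fixes in Python, using only the standard library. DestinationError, connect and format_error_report are hypothetical names, not our production code. The report string is what you'd hand to smtplib or logging.handlers.SMTPHandler to email out.

```python
import socket
import traceback


class DestinationError(Exception):
    """Hypothetical exception that remembers where we were trying to connect."""

    def __init__(self, host, port, cause):
        Exception.__init__(self, "could not connect to %s:%s (%s)" % (host, port, cause))
        self.host = host
        self.port = port
        self.cause = cause


def connect(host, port, timeout=5.0):
    """Open a TCP connection; on failure, say which destination failed."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError) as e:
        raise DestinationError(host, port, e)


def format_error_report(params, headers):
    """Build the body of an exception email: the current traceback plus
    the request params and headers. Call this from inside an except block."""
    lines = [traceback.format_exc(), "Request params:"]
    for name in sorted(params):
        lines.append("  %s = %r" % (name, params[name]))
    lines.append("Headers:")
    for name in sorted(headers):
        lines.append("  %s: %s" % (name, headers[name]))
    return "\n".join(lines)
```

The same idea applies to any client library. Catch its generic error at the boundary and re-raise with the hostname, IP and port you were actually using.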

These are the most common issues I've seen when adding a new language to an environment. Are there any I'm missing (or conveniently forgetting about)? If you can think of any, drop me a comment below.

Thursday, February 10, 2011

How-To: Turn a .pdf to plaintext using Google Docs (even if it's an image)

Every once in a while, I'll receive a large set of documents that I need to quickly read and categorize. Someday I hope to use NLP for those categorizations, but I still have much to learn. One document format that I always struggle to convert on Mac OS X with Python is .pdf. But not anymore...

Last year, Google Docs introduced the ability to do optical character recognition (OCR). Using a tiny bit of Python, I was able to upload a document and pull it back down as a plain text file. Here's how.

Step 1:

Install the gdata Python client library.

Step 2:

Create pdf2txt.py:
import os.path
import sys

import gdata.data
import gdata.docs.client

if __name__ == "__main__":
    # read in the pdf file (binary mode, since a .pdf is not text)
    f = open(sys.argv[1], "rb")

    # set up your Google Docs client
    client = gdata.docs.client.DocsClient(source='pdf2txt')
    client.ssl = True  # Force all API requests through HTTPS

    user = 'YOURUSERNAME@gmail.xxx'
    password = 'TE$T'

    # log in to Google Docs
    client.ClientLogin(user, password, client.source)

    # create the media source object for upload
    ms = gdata.data.MediaSource(file_handle=f, content_type="application/pdf",
                                content_length=os.path.getsize(f.name))

    # upload your pdf; ocr=true asks Google Docs to run OCR on it
    entry = client.Upload(ms, f.name,
                          folder_or_uri="https://docs.google.com/feeds/default/private/full?ocr=true")

    # get the file as text (the extension sets the format, can also be .doc)
    client.Export(entry, f.name + ".txt")
    f.close()

Step 3:

Run your new script:
> python pdf2txt.py yourpdf_file.pdf
This will create a plain text file next to your PDF, named after the original with ".txt" appended.

Step 4:

Check out your file:
yourpdf_file.pdf.txt

Use my code at your own risk, and feel free to submit even better code that uses getopt for command-line args.
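Taking my own advice, here's a sketch of the argument-parsing half using the standard library's getopt module. The -u/--user and -o/--output flags are hypothetical choices of mine, not something the script above supports yet. Wiring them in (and prompting for the password with getpass.getpass instead of hard-coding it) is left as an exercise.

```python
import getopt

USAGE = "usage: pdf2txt.py [-u username] [-o output.txt] file.pdf"


def parse_args(argv):
    """Parse command-line args with getopt; returns (pdf_path, options)."""
    opts, args = getopt.getopt(argv, "hu:o:", ["help", "user=", "output="])
    options = {"user": None, "output": None}
    for flag, value in opts:
        if flag in ("-h", "--help"):
            raise SystemExit(USAGE)
        elif flag in ("-u", "--user"):
            options["user"] = value
        elif flag in ("-o", "--output"):
            options["output"] = value
    if len(args) != 1:
        raise SystemExit(USAGE)
    pdf_path = args[0]
    if options["output"] is None:
        # same default as the script above: yourpdf_file.pdf.txt
        options["output"] = pdf_path + ".txt"
    return pdf_path, options
```

In the script, you'd call it as `pdf_path, options = parse_args(sys.argv[1:])`.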

Monday, January 24, 2011

How-To: SMS Q+A with Twilio, App Engine, and CherryPy

When learning a new API or programming language, I often get a bit nostalgic for the first programs I wrote in BASIC. Those programs looked a lot like this:

10 INPUT "What is your name? "; U$
20 PRINT "Hello, "; U$


To illustrate the simplicity of Twilio, Google's App Engine and CherryPy, I'll take this simple program and show you how to build it for an SMS interface.

Twilio allows you to automate placing and receiving phone calls, and sending and receiving SMS messages (or TXTs). For this tutorial, we'll focus on SMS. App Engine is a way to run web applications inexpensively (free to start) in Google's cloud. CherryPy is an extremely simple web framework and HTTP server for Python. CherryPy needs minimal configuration and reminds me a *lot* of the simplicity of the original Java Servlet specification.

Before we get started, you need to sign up for Twilio and Google App Engine accounts. Both are free and easy to set up. Once you've done that, swing on back, and we'll walk through the rest.

The Spec

  • A user sends an SMS to your sample Twilio account phone number with the text "?".
  • Twilio sends that request to your new Google App Engine App.
  • Google App Engine dispatches the request to your CherryPy app.
  • CherryPy reads the body of the request.
  • If the request field Body is "?", CherryPy responds with "What is your name?" 
  • If the request field Body is not "?", CherryPy responds with "Hello, [Body]."
  • The user that sent the original SMS message receives the response on their phone.
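The spec's reply rules are small enough to pull out of any web framework, which makes them easy to unit test before Twilio or App Engine gets involved. Here's a minimal sketch in plain Python. reply_for and to_twiml are hypothetical helper names, and the sketch follows the spec text exactly (the CherryPy version below also adds "Shall we play a game?").

```python
from xml.sax.saxutils import escape

MAX_SMS_LENGTH = 160  # same limit used in twilio_qa.py


def reply_for(body):
    """Apply the spec's rules to the incoming SMS Body."""
    if body == "?":
        text = "What is your name?"
    else:
        text = "Hello, %s." % body
    return text[:MAX_SMS_LENGTH]


def to_twiml(text):
    """Wrap reply text in the <Response><Sms> XML the app returns,
    escaping &, < and > so user input can't break the XML."""
    return ('<?xml version="1.0" encoding="UTF-8" ?>\n'
            "<Response>\n  <Sms>%s</Sms>\n</Response>\n" % escape(text))
```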

The Development Process

First, we're going to get a simple app running in CherryPy on your local machine. Then, we'll get it running on App Engine, locally and in the cloud. Finally, we'll connect Twilio to App Engine and you can watch it all in action.


Get your app running in CherryPy

First, create a directory for your app:
$ mkdir twilio_qa

Download CherryPy.

Unzip CherryPy in the twilio_qa directory.

Then, use your favorite editor to create twilio_qa.py.

#!/usr/bin/env python
import cherrypy
from xml.sax.saxutils import escape

class TwilioQA:
  # the main entry point, this is the method that will get called
  # when someone hits your server.
  def index(self, Body="?", **kwargs):
    if (Body == "?"):
      return self.instructions()
    else:
      return self.wrap("Hello, %s.\r\nShall we play a game?" % Body)

  # print out the instructions
  def instructions(self):
    s = "What is your name?"
    return self.wrap(s)

  # helper function to wrap response with Twilio SMS Response XML
  def wrap(self, response):
    # trim first, then escape &, < and > so user input can't break the XML
    s = """<?xml version="1.0" encoding="UTF-8" ?>
<Response>
  <Sms>%s</Sms>
</Response>
""" % (escape(self.trim(response)))
    return s

  MAX_SMS_LENGTH = 160

  # remove all chars after 160, otherwise Twilio will reject the
  # response.
  def trim(self, s):
    return s[0:self.MAX_SMS_LENGTH]

  # tell cherrypy which methods it can call.
  index.exposed = True

# setup cherrypy
app = cherrypy.tree.mount(TwilioQA(), "/")
cherrypy.quickstart(app)
Make twilio_qa.py executable and run it.

$ chmod +x twilio_qa.py
$ ./twilio_qa.py

Test your app

With your app running, open a browser and go to http://localhost:8080/.

The page should return the instructions:

What is your name?
Add the Body parameter to see how it responds: http://localhost:8080/?Body=Pete

Note that case matters for form fields: CherryPy automatically maps query parameters to the arguments of your index method, so ?body=Pete would just return the instructions again.

You should see the response:

Hello, Pete. Shall we play a game?
Viewing the source will show you the XML that you will be sending to Twilio in just a little bit.
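If you'd rather check that XML without a browser, a small script can POST the same form fields Twilio sends (Body and From) to your running app and pull out the reply text. This is a sketch for Python 3 (on Python 2, urlencode lives in urllib and urlopen in urllib2). The function names and the sample From number are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def sms_reply_text(twiml):
    """Return the text of the <Sms> element in a TwiML response, or None."""
    root = ET.fromstring(twiml)
    sms = root.find("Sms")
    return sms.text if sms is not None else None


def fake_twilio_post(url, body, sender="+15555550100"):
    """POST form fields the way Twilio's SMS callback does and return
    the reply text the app would send back."""
    data = urlencode({"Body": body, "From": sender}).encode("utf-8")
    with urlopen(url, data) as resp:  # passing data makes this a POST
        return sms_reply_text(resp.read())
```

With the app running, `print(fake_twilio_post("http://localhost:8080/", "Pete"))` should show the same greeting you saw in the browser.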

The CherryPy setup is really simple. You import cherrypy, mount your TwilioQA() class at the default URL "/", and run the server with the quickstart() method.

Run your app in App Engine

Download and install the App Engine SDK for Python.
Create an app.yaml in your twilio_qa directory:
application: twilio-qa-CHANGE-ME
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: twilio_qa.py

In twilio_qa.py, import wsgiref.handlers to work with App Engine:

...
import cherrypy
import wsgiref.handlers
...

In twilio_qa.py, comment out quickstart and add the App Engine handler:
...
app = cherrypy.tree.mount(TwilioQA(), "/")
# cherrypy.quickstart(app)
wsgiref.handlers.CGIHandler().run(app)

Run your app in App Engine's local development server:

$ dev_appserver.py --port=8080 ../twilio_qa/

Follow the steps in Test your app and scroll back here when you're done.

Deploy on App Engine

Now that you've tested locally, you can deploy to the cloud in a few easy steps.


Create a new application at appengine.google.com:
  1. Click [Create Application].
  2. Choose an [application identifier] (you'll use this for the application field in app.yaml later). This needs to be unique across the entire appspot.com domain. Good luck :-)
  3. Create a title.
  4. Update your app.yaml: set your application identifier as the value of application.
  5. Deploy your app by typing:
    $ appcfg.py update ../twilio_qa/
  6. Go test your app: instead of http://localhost:8080, use http://[your-application-id].appspot.com/

For more on App Engine, definitely check out their great knowledge base for getting started in Python.

Connect to Twilio

Now that you have your Twilio trial account and your app ready, you just need to connect the two.

In the Developer Tools section of your account, set your SMS URL to:

http://[your-application-id].appspot.com/

Click [Save].

Send a TXT to the Sandbox Phone Number you configured with the Body of "[PIN] ?".

You'll need to put the [PIN] in for the next question as well.


The end

Go forth and create inexpensive Question and Answer apps using SMS, Twilio and App Engine.

If you have any questions or issues, or create something awesome with this tutorial, please drop me a comment.