Last year, Google Docs introduced optical character recognition (OCR). Using a tiny bit of Python, I was able to upload a PDF and pull it back down as a plain text file. Here's how.
Step 1:
Install the gdata Python libraries.
Step 2:
Create pdf2txt.py:
```python
import os.path
import sys

import gdata.data
import gdata.docs.client

if __name__ == "__main__":
    # read in the pdf file (binary mode, since a PDF is not text)
    f = open(sys.argv[1], 'rb')

    # set up your Google Docs client
    client = gdata.docs.client.DocsClient(source='pdf2txt')
    client.ssl = True  # Force all API requests through HTTPS
    user = 'YOURUSERNAME@gmail.xxx'
    password = 'TE$T'

    # log in to Google Docs
    client.ClientLogin(user, password, client.source)

    # create the media source object for upload
    ms = gdata.data.MediaSource(file_handle=f,
                                content_type="application/pdf",
                                content_length=os.path.getsize(f.name))

    # upload your pdf with OCR turned on
    entry = client.Upload(ms, f.name,
                          folder_or_uri="https://docs.google.com/feeds/default/private/full?ocr=true")

    # get the file as text (the extension sets the format, can also be .doc)
    client.Export(entry, f.name + ".txt")
    f.close()
```
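Every value the script hands to gdata comes from the PDF's path, so it can help to check those values before you spend an upload. The sketch below is stdlib-only and never contacts Google; `looks_like_pdf()` and `upload_plan()` are hypothetical helpers (not part of gdata), and the assumption that `Upload()`'s second argument becomes the document title reflects how the script above calls it.

```python
import os.path

PDF_MAGIC = b"%PDF-"  # a well-formed PDF begins with these bytes


def looks_like_pdf(path):
    """Cheap sanity check before uploading: does the file start with %PDF-?"""
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


def upload_plan(pdf_path):
    """Hypothetical helper: the values pdf2txt.py derives from the PDF's path.

    Nothing here talks to Google; it only mirrors what the script passes
    to gdata.data.MediaSource, client.Upload and client.Export.
    """
    return {
        "content_type": "application/pdf",            # MIME type for MediaSource
        "content_length": os.path.getsize(pdf_path),  # file size in bytes
        "title": pdf_path,                            # script passes f.name to Upload
        "export_path": pdf_path + ".txt",             # extension selects the export format
    }
```

Calling `upload_plan("yourpdf_file.pdf")` shows exactly which file name the export step will write, which is the same `f.name + ".txt"` rule the script uses.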
Step 3:
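Run your new script:
> python pdf2txt.py yourpdf_file.pdf
This uploads the PDF to your Google Docs account with OCR turned on, then exports the recognized text to a new file in the same directory as the PDF (the directory you ran python from, if you pass a bare file name).
Step 4:
Check out your file: yourpdf_file.pdf.txt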
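To eyeball the OCR result from Python instead of an editor, a small hypothetical helper like `preview_text()` can print the first few lines. It assumes the export is UTF-8 text; the `utf-8-sig` codec also drops a leading byte-order mark if one is present, and `errors="replace"` keeps stray bytes from raising an exception.

```python
import io


def preview_text(path, max_lines=10):
    """Return up to max_lines lines of the exported OCR text (hypothetical helper)."""
    lines = []
    # Assumption: the exported .txt is UTF-8, possibly with a BOM.
    with io.open(path, encoding="utf-8-sig", errors="replace") as handle:
        for line in handle:
            if len(lines) >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

For example, `for line in preview_text("yourpdf_file.pdf.txt"): print(line)` shows the first ten recognized lines.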
Use my code at your own risk. Feel free to submit even better code that uses getopt for command-line arguments.
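Here is one possible take on that, a sketch using Python's standard `getopt` module. The `-u/--user` and `-o/--output` options are hypothetical additions, not part of the original script; the default output name keeps the original `.pdf.txt` rule.

```python
import getopt

USAGE = "usage: pdf2txt.py [-u user] [-o output.txt] file.pdf"


def parse_args(argv):
    """Parse pdf2txt.py-style arguments; returns (pdf_path, user, output_path)."""
    try:
        opts, args = getopt.getopt(argv, "u:o:h", ["user=", "output=", "help"])
    except getopt.GetoptError as err:
        raise SystemExit("%s\n%s" % (err, USAGE))

    user = None
    output = None
    for opt, value in opts:
        if opt in ("-h", "--help"):
            raise SystemExit(USAGE)
        elif opt in ("-u", "--user"):
            user = value
        elif opt in ("-o", "--output"):
            output = value

    # Exactly one positional argument: the PDF to convert.
    if len(args) != 1:
        raise SystemExit(USAGE)
    pdf_path = args[0]
    # Default mirrors the original script: append .txt to the PDF's name.
    return pdf_path, user, output or pdf_path + ".txt"
```

In the script you would call `parse_args(sys.argv[1:])`, and could prompt for the password with `getpass.getpass()` instead of hardcoding it. Note that `getopt` stops at the first non-option argument, so options must come before the file name.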
You, sir, are my hero! I can't tell you how long I've been looking for a solution like this!