TODO: Peachies

I am planning to update the code for my “peachies” newspaper mining script so that it uses only the highest-circulation newspapers according to this page:

I am only going to host the code (now on GitHub as jcrow), not a full-fledged web app as I did previously. Hosting is expensive, and I have been hacked a couple of times on privately served web hosting services and lost data. I am planning to use the Google Translate API in my code for the first time in order to mine newspapers in foreign languages. Also, I am initially seeking only term frequency across the lexicon and will post my word counts here on this WordPress site. If successful, I may initiate the natural language processing of:

for the first time, for the sake of doing the same with proper names. Also, I am using the Python library:

to strip HTML tags and get the lang attribute of the tag.
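
As a rough sketch of the plan above (strip the markup, read the lang attribute, count terms), something like the following could work. It is not the actual peachies code. It uses Python's standard-library html.parser rather than the library mentioned above, it assumes the lang attribute in question is the one on the html tag, and the names TextAndLangParser and term_frequencies are made up for illustration. The translation step is left out.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextAndLangParser(HTMLParser):
    """Collect visible text and the lang attribute of the <html> tag.

    Assumption: the page declares its language on <html lang="...">.
    """

    SKIP = {"script", "style"}  # contents of these tags are not article text

    def __init__(self):
        super().__init__()
        self.lang = None
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self.chunks.append(data)


def term_frequencies(html):
    """Return (lang, Counter of lowercased words) for one HTML page."""
    parser = TextAndLangParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.chunks).lower()
    # Runs of letters only (Unicode-aware); digits and punctuation split words.
    words = re.findall(r"[^\W\d_]+", text)
    return parser.lang, Counter(words)


if __name__ == "__main__":
    page = '<html lang="es"><body><p>El durazno y el <b>durazno</b></p></body></html>'
    lang, counts = term_frequencies(page)
    print(lang, counts.most_common(3))
```

The lang value returned here could be used to decide whether a page needs to go through translation before its counts are merged with the English ones.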

