TODO: Commodities in World News

algorithm to scan: http://www.inkdrop.net/news
english-speaking international newspapers, looking for spikes or drops in trading volumes or prices, predicted by a linear regression over term frequency in the lexicon
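
a minimal sketch of the regression idea, assuming numpy is available and that daily mention counts for a single term (e.g. 'oil') have already been extracted from the crawled articles; the function name, threshold, and sample counts are placeholders, not part of the original design:

    # fit a line to a term's daily frequency and flag a spike or a drop
    # when the fitted slope is large; daily_counts is a hypothetical list
    # of per-day mention counts for one lexicon term
    import numpy as np

    def trend(daily_counts, threshold=2.0):
        days = np.arange(len(daily_counts))
        slope, intercept = np.polyfit(days, daily_counts, 1)
        if slope > threshold:
            return 'spike'
        if slope < -threshold:
            return 'drop'
        return 'flat'

    print(trend([3, 4, 9, 15, 22]))  # rising mentions -> 'spike'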

interestingly, news sources such as allafrica.com display an aggregative approach, in their case a comprehensive, staffed and budgeted aggregation of news from across the continent of africa

according to previous attempts, some articles match others word for word. the 'datepublished' parameter in the http response header may therefore indicate proximity to the more timely source, in a journalistic sense a more trusted, reliable, or authenticated news source, while articles whose 'datepublished' headers come later may indicate lesser proximity, i.e. syndicators.
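
a rough sketch of the proximity idea, assuming the 'date' headers for a set of word-for-word matching articles have already been fetched; the urls and dates below are hypothetical placeholders:

    # the earliest published date among matching articles is treated as the
    # likely original source, and later dates as likely syndicators
    from email.utils import parsedate

    matches = {
        'http://example-wire.com/story':  'Tue, 12 Jan 2010 06:02:11 GMT',
        'http://example-paper.com/story': 'Tue, 12 Jan 2010 09:40:55 GMT',
    }

    ranked = sorted(matches.items(), key=lambda item: parsedate(item[1]))
    print('likely original source: ' + ranked[0][0])
    print('likely syndicators: ' + ', '.join(url for url, d in ranked[1:]))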

also, consider constructing a viable news aggregator based on this formula, one that can rely on search engine parameters or high term frequency to populate a headline or front page of news stories, positioning itself as a target for press.
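
a small sketch of the front-page idea, assuming a crawl has already produced candidate headlines and accumulated term counts in the lexicon; the headlines and counts below are invented placeholders:

    # score each headline by the lexicon frequency of its words and show
    # the top scorers as the aggregator's front page
    from collections import Counter

    headlines = [
        'oil prices spike after supply disruption',
        'local festival draws record crowds',
        'gold and oil trading volumes surge in asia',
    ]

    lexicon = Counter({'oil': 42, 'gold': 17, 'trading': 25, 'festival': 2})

    def score(headline):
        return sum(lexicon.get(word, 0) for word in headline.lower().split())

    front_page = sorted(headlines, key=score, reverse=True)
    for h in front_page[:2]:
        print(h)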


3 responses to this post.

  1. another possibility for utilizing the 'datepublished' parameter in the http response header is to take the returned documents in which the target keyword appears and rank them by date published; the earliest published dates may indicate greater newsworthiness and proximity to more valuable news sources.


  2. this code returns date published from any url entered:

    import urllib2

    def dateh(u):
        # fetch the url and return the 'date' field of the http response
        # header by name, used here as a proxy for the date published
        page = urllib2.urlopen(u)
        return page.info().getheader('date')

    print(dateh('http://www.myafghan.com'))


  3. my original design was to seek repetition of terms indicating commodities, such as 'oil'. however, given the capabilities of my initial implementation, i have discovered two new potential research avenues.

     the first is to seek document frequency by crawling the home page of each news site and indexing its articles, to establish an indicator of what i call 'global relevance'. for now i am ignoring term frequency and using only document frequency, the number of documents in the lexicon in which a term appears at least once; a high df value may indicate global relevance in this lexicon, in the sense of broad global news coverage. this is experimental and i have yet to finish the implementation.

     the second is another source of information on proximity to news sources: take the top-ranking document-frequency terms in the lexicon and rank the documents containing them by the date-published function from the previous reply, to answer who is getting the news first and who is merely syndicating it from their 'international' journalistic position. with this considerably brief script, i could host a news aggregator organized around proximity to news sources, or establish one as a competent news source that itself approximates the day's news. (a rough sketch of this document-frequency ranking appears after these comments.) i am also interested in adding multimedia sources such as images and youtube videos, which could populate a front page and a comprehensive selection of news articles from around the world that rank highly on the document-frequency measure, using keywords from high-ranking articles as query terms for youtube videos or other sites.

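a rough sketch of the document-frequency ranking described in the last comment, assuming the crawl already yields each document's text and its 'date' header; the urls, texts, and dates below are hypothetical placeholders:

    # document frequency (df): the number of documents in which a term
    # appears at least once; the documents containing the top-df term are
    # then ranked by date published, so the earliest carrier looks like a
    # source rather than a syndicator
    from collections import Counter
    from email.utils import parsedate

    docs = {
        'http://example-africa.com/a1': ('oil pipeline deal signed', 'Mon, 11 Jan 2010 05:00:00 GMT'),
        'http://example-asia.com/b7': ('oil prices and gold futures rise', 'Mon, 11 Jan 2010 11:30:00 GMT'),
        'http://example-eu.com/c3': ('football results and weather', 'Mon, 11 Jan 2010 08:00:00 GMT'),
    }

    # count each term once per document
    df = Counter()
    for url, (text, date) in docs.items():
        df.update(set(text.lower().split()))

    top_term = df.most_common(1)[0][0]

    # rank the documents carrying the top term by publication date
    carriers = [(url, date) for url, (text, date) in docs.items()
                if top_term in text.lower().split()]
    carriers.sort(key=lambda item: parsedate(item[1]))

    print('top term by document frequency: ' + top_term)
    print('earliest carrier (candidate source): ' + carriers[0][0])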
