TODO: NLP POS tagger for English and foreign-language newspapers to detect proper names in the native language

I am considering a Java version of NLP POS tagging to document the prevalence of proper names in high-circulation newspapers in various languages. Several of these languages can be processed with the available Stanford NLP software, and the results can be stored persistently in the original language. The Stanford POS tagger is available for Arabic, Chinese, French, Spanish, and German.
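As a rough sketch of the proper-name step, the Java below pulls runs of proper-noun tokens out of text that has already been tagged. It assumes the tagger output is in `word_TAG` form, and the tag set is an assumption too: NNP/NNPS (Penn Treebank), NR (Chinese Treebank), NE (German STTS), and NPP (French Treebank) are used here for illustration, and the tags a given model actually emits should be checked against that model's documentation. The class name `ProperNameExtractor` and the sample sentence are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ProperNameExtractor {
    // Assumed proper-noun tags; adjust to whatever the chosen tagger model emits.
    static final Set<String> PROPER_TAGS = Set.of("NNP", "NNPS", "NR", "NE", "NPP");

    // Takes text in word_TAG form and returns proper names in order of
    // appearance, joining adjacent proper-noun tokens into one name.
    static List<String> extract(String tagged) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            boolean proper = sep > 0 && PROPER_TAGS.contains(token.substring(sep + 1));
            if (proper) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token, 0, sep); // the word part, without the tag
            } else if (current.length() > 0) {
                names.add(current.toString()); // a non-proper token ends the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            names.add(current.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        String tagged = "Angela_NNP Merkel_NNP visited_VBD Cairo_NNP on_IN Monday_NNP ._.";
        System.out.println(extract(tagged)); // prints [Angela Merkel, Cairo, Monday]
    }
}
```

Because the words are never translated or transliterated, the extracted names stay in the script of the source newspaper. Note that day and month names also carry proper-noun tags in English, so some filtering would likely be needed.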


2 responses to this post.

  1. I am interested in documenting information such as first occurrences and keyword collocation, while avoiding translation and maintaining data on circulated media in the original language.


  2. One idea is performing a SWOT analysis of proper names appearing in Arabic newspapers such as in this listing: In a relational DB, maintaining a table of proper names along with metadata such as article listings, keyword occurrences, and links to LONGBLOB or structured documents (office spreadsheets or longer detailed documents) would be an interesting way to maintain awareness of current topics in Arab culture.
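To make the two comments above a little more concrete, here is a minimal in-memory sketch in Java of what one row of such a proper-name table might hold: the first occurrence, the list of articles, and counts of keywords seen alongside the name. It is only a stand-in for the relational table, not a schema. The names `ProperNameIndex`, `Entry`, and `record` are hypothetical, and the article ids, dates, and keywords in `main` are made-up sample data.

```java
import java.time.LocalDate;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ProperNameIndex {
    // One entry per proper name, roughly one row of the proposed table.
    static class Entry {
        LocalDate firstSeen;      // publication date of the earliest article seen
        String firstArticleId;    // id of that article
        final Set<String> articleIds = new LinkedHashSet<>();    // article listing
        final Map<String, Integer> collocates = new TreeMap<>(); // keyword counts
    }

    // Names are stored as-is in the original language; Java strings are Unicode.
    private final Map<String, Entry> entries = new TreeMap<>();

    // Records that `name` appeared in an article together with `keywords`.
    void record(String name, String articleId, LocalDate published, Iterable<String> keywords) {
        Entry e = entries.computeIfAbsent(name, k -> new Entry());
        if (e.firstSeen == null || published.isBefore(e.firstSeen)) {
            e.firstSeen = published;
            e.firstArticleId = articleId;
        }
        e.articleIds.add(articleId);
        for (String kw : keywords) {
            e.collocates.merge(kw, 1, Integer::sum);
        }
    }

    Entry get(String name) {
        return entries.get(name);
    }

    public static void main(String[] args) {
        ProperNameIndex index = new ProperNameIndex();
        // Made-up sample data, recorded out of date order on purpose.
        index.record("القاهرة", "a2", LocalDate.of(2012, 3, 5), List.of("election"));
        index.record("القاهرة", "a1", LocalDate.of(2012, 3, 1), List.of("election", "economy"));
        Entry e = index.get("القاهرة");
        System.out.println(e.firstSeen + " " + e.firstArticleId + " " + e.collocates);
    }
}
```

In a database, the article listing and keyword counts would more naturally live in their own tables keyed by name and article, with a separate column or table linking to the LONGBLOB or spreadsheet documents.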


