- WebExtension plugin for bookmarking pages
- Readability clone strips pages of ads
- Full-text search of bookmarked pages
- Natural language processing classifies pages and extracts entities
Bookmarks are so limited. Unless you can remember the page title or the URL, you may never find that oh-so-important page again: the one you know you saw once, with the one piece of information you need to finish your article or end a debate. To save myself the tedious task of trawling through my web history, I created Newsicles, a database of bookmarked news articles that strips them of advertising and other cruft (using a Python port of Readability) and provides full-text search of their content.
Then I added tagging to group the articles around subjects and story ideas. When even tagging became too tedious, I added a natural language processor that guessed at each article's main subject based on frequency and identified the people, places and organizations in the text. Eventually, I want to build a database of those newsworthy entities (the fundamental particles of news), because that is how I want to read the news: as clusters of information around different entities, rather than wading through a series of reports on whatever ephemera a journalist classified as an "event" on any particular day.
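Since the Newsicles code isn't available, here is only a minimal sketch of what "guessing the main subject based on frequency" can look like, assuming a plain word-count approach: lowercase the text, drop common stopwords and very short words, and take the most frequent remaining terms. The function name `guess_main_subject` and the tiny stopword list are hypothetical, chosen for illustration; this is not the project's actual implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real system would use a
# much fuller list.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "he", "in", "is", "it", "its", "of", "on", "or", "said", "she", "that",
    "the", "their", "they", "this", "to", "was", "were", "will", "with",
})


def guess_main_subject(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stopword terms in text.

    Ties keep the order in which terms first appear in the text.
    """
    # Crude tokenizer: runs of lowercase letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stopwords and words of two characters or fewer.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


if __name__ == "__main__":
    article = (
        "The council voted on the budget. Critics said the budget cuts "
        "would hurt schools. The council defended the budget."
    )
    print(guess_main_subject(article, top_n=2))
    # → [('budget', 3), ('council', 2)]
```

Raw frequency is a blunt instrument: it surfaces repeated words, not necessarily the story's real subject. Recognizing people, places and organizations is a separate task (named entity recognition) that typically needs a trained model, which this sketch does not attempt.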
The code isn't posted right now. I deleted my GitLab account recently and I haven't had time to start a new one. Stay tuned, or get in touch if you would like to know more about this.