We often build glossaries on online forums for the words typically used there. On a phpBB forum, users reply in the following format:
term: definition
The job is to collect all the unique entries and eliminate the noise. We typically copy the entire page and sort it so the noise is easier to remove:
Joined: Fri Jan 24, 2014 9:49 pm
Joined: Fri Jun 05, 2009 5:57 pm
Joined: Mon Jul 07, 2014 7:20 am
Joined: Mon Jul 07, 2014 7:20 am
Joined: Mon Nov 25, 2013 6:46 am
Posts: 49
Posts: 49
Posts: 49
Posts: 49
Posts: 5
Posts: 8152
Progessium: A light peptide necoliye
So how can a command-line tool or a Python script process the contents above, removing the noise and keeping only the entries, in alphabetical order, like this:
Progessium: A light peptide necoliye
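One possible approach, sketched in Python under a few assumptions: the pasted page has one item per line, every glossary entry looks like `term: definition`, and the noise consists of phpBB profile labels such as `Joined:` and `Posts:`. The `NOISE_LABELS` set, the `extract_entries` function, and the script name are hypothetical, not part of phpBB or of any existing tool, and the label list would need adjusting for a real forum.

```python
import re
import sys

# Assumed profile labels to treat as noise; extend this for your forum.
NOISE_LABELS = {"joined", "posts", "location", "post subject"}

# A "term: definition" line: a short term, a colon followed by whitespace,
# then a non-empty definition. URLs such as "http://..." do not match
# because their colon is not followed by whitespace.
ENTRY_RE = re.compile(r"^\s*([^:\n]{1,60}?)\s*:\s+(\S.*?)\s*$")


def extract_entries(text):
    """Return unique 'term: definition' lines, sorted case-insensitively."""
    entries = set()
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if not match:
            continue
        term, definition = match.groups()
        term = term.strip()
        # Skip empty terms and known profile labels.
        if not term or term.lower() in NOISE_LABELS:
            continue
        entries.add(f"{term}: {definition}")
    return sorted(entries, key=str.lower)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the saved page text from the file given on the command line.
    with open(sys.argv[1], encoding="utf-8") as handle:
        for entry in extract_entries(handle.read()):
            print(entry)
```

Saved as, say, `extract_entries.py`, it could be run as `python extract_entries.py page.txt`, where `page.txt` holds the copied page. Matching on a deny-list of labels is only a heuristic: any other `label: value` line on the page (for example a post subject that is not listed) would be kept as an entry until its label is added to `NOISE_LABELS`.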
http://nejc.skoberne.net/2011/02/phpbb-export-all-posts-for-a-user-into-a-file/ – STTR – 2014-11-12T04:36:24.133
We don't have admin access to it; we just want to scrape the HTML. – suuser – 2014-11-12T04:39:42.943
https://gist.github.com/evandhoffman/2030469 – STTR – 2014-11-12T04:39:47.150
http://stackoverflow.com/questions/3620875/how-to-download-all-posts-of-phpbb3-forum-if-i-am-not-admin – STTR – 2014-11-12T04:41:41.343
Use Adobe Acrobat Pro, as a variant. – STTR – 2014-11-12T04:43:59.260