The BBC News site has an annoying tendency to list numerous ancient stories in its ‘Most Popular’ sidebar. It seemed a good excuse to try out Python and gather some data on when this happens.
The Python script works as follows:
- Visit the BBC News homepage and scrape the ‘Most Popular’ sidebar.
- Visit the URL of each story.
- Collect the published date (from the page metadata and the front end).
- Calculate the difference between the current date and time and the published date.
- Store the data in a CSV file.
- Repeat the whole process every n minutes or seconds.
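The date arithmetic, CSV and repeat steps can be sketched with Python's standard library alone. This is a minimal sketch, not the original script: the `YYYY/MM/DD HH:MM:SS` published-date format and the `fetch_popular` callback are assumptions made for illustration. The column names follow the results table further down.

```python
import csv
import os
import time
from datetime import datetime

# ASSUMPTION: the published date is read as a "YYYY/MM/DD HH:MM:SS" string;
# adjust PUB_FORMAT to whatever the page actually provides.
PUB_FORMAT = "%Y/%m/%d %H:%M:%S"
# Dates are written out as DD/MM/YYYY HH:MM, matching the results table.
OUT_FORMAT = "%d/%m/%Y %H:%M"
FIELDS = ["Days Old", "Story Title", "URL", "Present D/T", "Original Pub Date"]


def story_row(title, url, published, now=None):
    """Build one CSV row for a story; 'Days Old' is whole days, rounded down."""
    if now is None:
        now = datetime.now()
    pub = datetime.strptime(published, PUB_FORMAT)
    age = now - pub  # timedelta between now and publication
    return {
        "Days Old": age.days,
        "Story Title": title,
        "URL": url,
        "Present D/T": now.strftime(OUT_FORMAT),
        "Original Pub Date": pub.strftime(OUT_FORMAT),
    }


def append_rows(path, rows):
    """Append rows to the CSV file, writing the header only if it is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def run_forever(fetch_popular, path, interval_seconds=600):
    """Scrape, record and sleep in a loop until interrupted (Ctrl+C).

    fetch_popular is a HYPOTHETICAL callable returning (title, url, published)
    tuples for the stories currently in the 'Most Popular' list.
    """
    while True:
        rows = [story_row(t, u, p) for t, u, p in fetch_popular()]
        append_rows(path, rows)
        time.sleep(interval_seconds)
```

For example, calling `story_row` for the "Pension system" story with `published="2014/02/14 11:31:00"` (the table's timestamp with seconds assumed to be zero) and `now=datetime(2014, 4, 23, 0, 11)` gives a `Days Old` of 67, the same value the table lists for that story.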
Beautiful Soup makes relatively light work of parsing the pages, alongside PrettyTable, the csv module, regular expressions and Requests.
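The script itself uses Beautiful Soup. To keep this example dependency-free, the sketch below swaps in the standard library's `html.parser` instead. It assumes, hypothetically, that the sidebar stories are `<a>` links inside an element with the id `most-popular`, and that each story page carries a `<meta name="OriginalPublicationDate" content="...">` tag. The real BBC markup may differ and has certainly changed since 2014. The regular expression matches the story URL shape seen in the results below (`/news/<section>-<digits>`).

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Story URLs in the results end in "-<digits>", e.g. /news/business-23273448.
STORY_URL = re.compile(r"/news/[a-z0-9-]+-\d+$")


class ContainerLinks(HTMLParser):
    """Collect (title, href) pairs for <a> tags inside the element with a given id.

    Assumes reasonably well-formed markup (every non-void tag is closed).
    """

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0    # nesting depth inside the container; 0 = outside
        self.href = None  # href of the <a> currently open inside the container
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_TAGS:
            if self.depth:
                self.depth += 1
            elif attrs.get("id") == self.container_id:
                self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            self.href, self.text = attrs["href"], []

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        if tag == "a" and self.href:
            title = " ".join("".join(self.text).split())  # collapse whitespace
            self.links.append((title, self.href))
            self.href = None
        self.depth -= 1

    def handle_data(self, data):
        if self.href:
            self.text.append(data)


class MetaContent(HTMLParser):
    """Record the content attribute of the first <meta name=...> that matches."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == self.name and self.value is None:
            self.value = attrs.get("content")


# ASSUMPTION: "most-popular" is a hypothetical container id.
def popular_stories(html, container_id="most-popular"):
    """Return (title, url) pairs for story links in the sidebar container."""
    parser = ContainerLinks(container_id)
    parser.feed(html)
    parser.close()
    return [(t, h) for t, h in parser.links if STORY_URL.search(h)]


# ASSUMPTION: "OriginalPublicationDate" is a hypothetical meta tag name.
def published_date(html, meta_name="OriginalPublicationDate"):
    """Return the published-date string from the page's <meta> tag, or None."""
    parser = MetaContent(meta_name)
    parser.feed(html)
    parser.close()
    return parser.value
```

In the full script the HTML would come from Requests, e.g. `requests.get(url).text`, and relative links would need joining to the site root, for example with `urllib.parse.urljoin`.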
The script was run in the IDLE shell, and some of the ‘old’ results collected over a 3-hour period are shown below (taken from the accompanying spreadsheet). Of the 46 stories that appeared in the ‘Most Popular’ list during this period, 21 were over 67 days old!
| Days old | Story title | URL | Present date/time | Original publication date |
|---:|---|---|---|---|
| 281 | Rent ‘unaffordable’ in third of UK | http://www.bbc.co.uk/news/business-23273448 | 23/04/2014 00:01 | 15/07/2013 06:35 |
| 852 | O’Donnell warns of UK challenges | http://www.bbc.co.uk/news/uk-politics-16295421 | 23/04/2014 00:01 | 22/12/2011 18:10 |
| 67 | Pension system ‘is not working’ | http://www.bbc.co.uk/news/business-26178113 | 23/04/2014 00:11 | 14/02/2014 11:31 |
| 593 | Union joint strike action warning | http://www.bbc.co.uk/news/business-19514195 | 23/04/2014 00:26 | 06/09/2012 23:40 |
| 854 | Government outlines pension deal | http://www.bbc.co.uk/news/business-16259238 | 23/04/2014 00:36 | 20/12/2011 22:20 |
| 244 | Millions ‘worse off’ on new pension | http://www.bbc.co.uk/news/business-23770327 | 23/04/2014 00:46 | 21/08/2013 12:35 |
| 508 | Plain packs for Australia smokers | http://www.bbc.co.uk/news/world-asia-20559585 | 23/04/2014 01:12 | 01/12/2012 01:08 |
| 509 | Energy Bill for ‘cleaner economy’ | http://www.bbc.co.uk/news/business-20539981 | 23/04/2014 01:17 | 29/11/2012 15:14 |
| 615 | Australia court backs tobacco law | http://www.bbc.co.uk/news/business-19264245 | 23/04/2014 01:32 | 15/08/2012 01:56 |
| 887 | Doctors call for car smoking ban | http://www.bbc.co.uk/news/health-15744352 | 23/04/2014 01:52 | 17/11/2011 17:10 |
| 435 | ‘Bedroom tax’s’ impact on the north | http://www.bbc.co.uk/news/uk-england-tyne-21412826 | 23/04/2014 02:02 | 11/02/2013 14:01 |
| 317 | Labour ‘would cap welfare spending’ | http://www.bbc.co.uk/news/uk-politics-22785282 | 23/04/2014 02:07 | 09/06/2013 16:37 |
| 323 | Ed Balls seeks to restore Labour’s economic credibility | http://www.bbc.co.uk/news/uk-politics-22753040 | 23/04/2014 02:17 | 03/06/2013 13:24 |
| 281 | Benefit cap ‘leads to more in work’ | http://www.bbc.co.uk/news/business-23306092 | 23/04/2014 02:17 | 15/07/2013 17:51 |
| 818 | Britons ‘becoming more dishonest’ | http://www.bbc.co.uk/news/uk-16714872 | 23/04/2014 02:43 | 25/01/2012 08:54 |
| 218 | Benefit cheats face 10-year terms | http://www.bbc.co.uk/news/uk-24104743 | 23/04/2014 02:48 | 16/09/2013 12:49 |
| 811 | Family life on benefits | http://www.bbc.co.uk/news/uk-16812185 | 23/04/2014 02:53 | 01/02/2012 12:47 |
| 135 | Most people in poverty are ‘in work’ | http://www.bbc.co.uk/news/uk-25287068 | 23/04/2014 03:23 | 08/12/2013 21:49 |
| 187 | Work ‘may be no way out of poverty’ | http://www.bbc.co.uk/news/uk-politics-24553611 | 23/04/2014 03:28 | 17/10/2013 14:59 |
| 692 | Professions ‘must be more open’ | http://www.bbc.co.uk/news/uk-politics-18254219 | 23/04/2014 03:39 | 30/05/2012 15:32 |
| 309 | Top unis ‘now less socially diverse’ | http://www.bbc.co.uk/news/education-22912609 | 23/04/2014 04:04 | 17/06/2013 07:38 |
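The ‘Days old’ column appears to be elapsed whole days, rounded down. The "Pension system" story, for instance, was 67 days, 12 hours and 40 minutes old when collected and is listed as 67. Python's `timedelta.days` behaves the same way for a positive difference, so the column can be checked against a few rows using only the minute-resolution timestamps shown. This is a consistency check, not necessarily how the original script computed the value; it may have worked from more precise timestamps.

```python
from datetime import datetime

# Timestamps as displayed in the table: DD/MM/YYYY HH:MM (no seconds).
TABLE_FORMAT = "%d/%m/%Y %H:%M"


def days_old(present, published):
    """Whole days between two table timestamps; timedelta.days rounds down."""
    delta = (datetime.strptime(present, TABLE_FORMAT)
             - datetime.strptime(published, TABLE_FORMAT))
    return delta.days


# (Present date/time, Original publication date, Days old) copied from the table.
rows = [
    ("23/04/2014 00:01", "15/07/2013 06:35", 281),
    ("23/04/2014 00:01", "22/12/2011 18:10", 852),
    ("23/04/2014 00:11", "14/02/2014 11:31", 67),
    ("23/04/2014 01:12", "01/12/2012 01:08", 508),
]
for present, published, expected in rows:
    assert days_old(present, published) == expected
```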