Linux.com

Using Python

Posted by: Anonymous Coward on March 31, 2006 12:47 AM
Here's how I'd do it, using Python:

#-------start of code --------------
import urllib, re, csv

URL = 'http://home.earthlink.net/~robreilly/index-demo.htm'
output = csv.writer(file('c:/temp/output.csv', 'wb'))

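# Fetch the page, then let the regexp pull out (text, page-view count) pairs
content = urllib.urlopen(URL).read()
data = re.findall(r'(.*).*\((.*) page views\)', content)
# Each tuple becomes one CSV row
output.writerows(data)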
#------- end of code --------------

This just creates the CSV file -- it doesn't open OpenOffice for you. Another difference: in my version, I let the regexp do the field separation.
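
For anyone on Python 3, where `urllib.urlopen` and the `file()` builtin no longer exist, a rough equivalent might look like the sketch below. It keeps the same regexp and the same URL from the post (which may no longer resolve). The function names `extract_rows`, `write_csv` and `main`, the relative output path, and the `latin-1` decoding are assumptions made for illustration; they are not from the original script.

```python
import csv
import re
import urllib.request

# URL copied from the post; it may no longer be reachable.
URL = 'http://home.earthlink.net/~robreilly/index-demo.htm'

# Same pattern as the original: group 1 is the text before the
# parenthesis, group 2 is the number in front of " page views".
PATTERN = re.compile(r'(.*).*\((.*) page views\)')


def extract_rows(content):
    """Return one (text, page-view count) tuple per matching line."""
    return PATTERN.findall(content)


def write_csv(rows, path):
    """Write the tuples to a CSV file, one tuple per row."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as handle:
        csv.writer(handle).writerows(rows)


def main(path='output.csv'):
    """Fetch the page and write the extracted rows to `path`."""
    # Assumption: the page's encoding is not stated in the post, so
    # latin-1 is used here because it never fails to decode.
    with urllib.request.urlopen(URL) as response:
        content = response.read().decode('latin-1')
    write_csv(extract_rows(content), path)
```

Calling `main()` fetches the page and writes `output.csv` in the current directory. Note that group 1 keeps any whitespace that precedes the opening parenthesis, just as the original pattern does; strip it before writing if that matters for your spreadsheet.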

Return to Extract data from the Internet with Web scraping