23 March, 2010

Grab All Links of a Web Page Using Python

I always try to write a useful program when I am learning a programming language. Right now I am learning Python, and I will post my practice code here. Today I will show you how to grab all the links on a web page using Python's "BeautifulSoup" module.

    import urllib
    import BeautifulSoup  # BeautifulSoup 3, running on Python 2

    def get_links(url):
        # Download the page and parse its HTML
        page = urllib.urlopen(url)
        soup = BeautifulSoup.BeautifulSoup(page.read())
        # Collect the href of every <a> tag that has one
        links = []
        for a in soup.findAll("a"):
            if a.has_key("href"):
                links.append(a['href'])
        return links

    print get_links("http://localhost/PRACTIECE/a2/")
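
The code above is for Python 2 with the old BeautifulSoup 3 module. For anyone on Python 3, here is a rough sketch of the same idea that swaps in the standard library's html.parser instead of BeautifulSoup, so nothing extra needs to be installed. The names LinkCollector and extract_links are my own, made up for this example, and the decoding step assumes UTF-8 when the server does not report a charset.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None
        # when the attribute is written without a value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def get_links(url):
    """Download a page and return the links found in it."""
    with urlopen(url) as page:
        # Assumption: fall back to UTF-8 if no charset is reported.
        charset = page.headers.get_content_charset() or "utf-8"
        html = page.read().decode(charset, errors="replace")
    return extract_links(html)


if __name__ == "__main__":
    sample = '<p><a href="/home">Home</a> <a name="top">anchor only</a></p>'
    print(extract_links(sample))
```

Keeping the parsing in extract_links, separate from the download in get_links, makes it possible to try the link extraction on a plain string without any network access. Like the original, it returns the href values exactly as written in the page, so relative links stay relative.
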
Enjoy!!!