Please help scrape a list of numbers from the source code of a webpage

Post by **krishna108** » Sat Sep 19, 2015 2:54 pm

1. I want the program to read the source of a web page and copy only the numbers that appear after the = sign, and only for links that look like mypage.php?REF=23273273.

2. Then I want to put those numbers in a list. A second piece of code will take each number from that list and insert it into a paragraph, producing one copy of the paragraph per number.

3. Finally, I want to write those paragraphs to a .txt file.

The desired output is

5646556
6564654
454654
4646546

and so on

This is the code I am working with.


from bs4 import BeautifulSoup
import urllib2  # Python 2 only; in Python 3 this lives in urllib.request
import re

url = "somewebsite"

# Send a browser-like User-Agent header with the request.
headers = { 'User-Agent' : 'Mozilla/5.0' }
html = urllib2.urlopen(urllib2.Request(url, None, headers)).read()
soup = BeautifulSoup(html, 'html.parser')

# Keep only <a> tags whose href looks like mypage.php?REF=<digits>.
links = soup.findAll('a', href=re.compile(r'mypage\.php\?REF=[0-9]+'))
template = """lasljasfkljaslkfj{}
slajfljasflk
aslkjfklasjflkasjf
alksjflkasjf;lk
"""

# Each link is a Tag object, not a string, so split its href attribute.
replace = [ link['href'].split("=")[1] for link in links ]

output = [template.format(r) for r in replace]

print output
with open('output.txt', 'w') as f_output:
    f_output.write(''.join(output))
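The extraction step described in this post (collect the number after `REF=` from every `mypage.php` link) can be sketched in Python 3 as below. This is only an illustration under stated assumptions: it swaps BeautifulSoup for the standard library's `html.parser` and `urllib.parse` so that it runs without third-party packages, and the names `RefCollector`, `extract_refs`, and `sample_html` are hypothetical, as is the sample markup that stands in for the downloaded page.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class RefCollector(HTMLParser):
    """Collect the REF query value from every <a> whose href targets mypage.php."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Skip links that do not point at mypage.php.
        if not parsed.path.endswith("mypage.php"):
            return
        # parse_qs maps each query key to a list of values.
        for value in parse_qs(parsed.query).get("REF", []):
            if value.isdigit():
                self.refs.append(value)


def extract_refs(html):
    """Return the REF numbers (as strings) in the order they appear in the page."""
    collector = RefCollector()
    collector.feed(html)
    return collector.refs


# Hypothetical markup standing in for the real page source.
sample_html = """
<a href="mypage.php?REF=5646556">one</a>
<a href="/site/mypage.php?REF=6564654">two</a>
<a href="other.php?REF=111">ignored</a>
<a href="mypage.php?REF=454654&amp;x=1">three</a>
"""

refs = extract_refs(sample_html)
print("\n".join(refs))
```

Parsing the query string, rather than splitting the whole href on `=`, keeps the result correct when a link carries extra parameters after `REF`.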

Here is the other half of the original program. It takes numbers from a list that you have to type in by hand, inserts each number into a paragraph, and repeats that paragraph once for every number in the list.


template = """fjajflakjfakjfl;kj REF={}
sklkasalsjklas
klajsl;kdajs;djas
aksljl;askjflka
"""

replace = [
    1131062,
    1140921,
    1141326,
    1141355,
    1141426,
    1141430,
    1141461,
    1141473,
    1141477,
    1141502,
    1141525,
    1141622,
    1141662,
    757053,
    989967,
]

# Fill the template once per number.
output = [template.format(r) for r in replace]

with open('output.txt', 'w') as f_output:
    f_output.write(''.join(output))
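To connect the two halves, the hand-typed list can be replaced by whatever list the scraping step produces. The sketch below is a minimal Python 3 illustration of that idea, not the original program: `TEMPLATE`, `render_paragraphs`, `write_paragraphs`, and the two sample numbers passed in as strings are assumptions made for the example, and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile

# Hypothetical two-line template; the {} placeholder receives one number per copy.
TEMPLATE = "first line REF={}\nsecond line\n"


def render_paragraphs(numbers, template=TEMPLATE):
    """Return one filled-in copy of the template per number, concatenated."""
    return "".join(template.format(n) for n in numbers)


def write_paragraphs(numbers, path, template=TEMPLATE):
    """Write the rendered paragraphs to a text file and return the text written."""
    text = render_paragraphs(numbers, template)
    with open(path, "w") as f_output:
        f_output.write(text)
    return text


# format() accepts strings (as scraped from an href) and integers (as typed by hand).
scraped = ["1131062", "1140921"]
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
written = write_paragraphs(scraped, out_path)
print(written)
```

Keeping the rendering in its own function means the same code serves both the typed list and a scraped one.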

Post by **Ayuto** » Sat Sep 19, 2015 9:34 pm

Just wondering: what's the question? Or where do you have problems?

Post by **krishna108** » Sun Sep 20, 2015 7:42 am

Ayuto wrote: Just wondering: what's the question? Or where do you have problems?

Which part was unclear? Please tell me and I will clarify.
Thank you.

Post by **Ayuto** » Sun Sep 20, 2015 3:09 pm

You just posted code, but didn't say what's wrong with it. What's your question?
