vtol • September 5, 2018
I only signed up to leave this comment.
Thank you so much, very good, on-point tutorial. This is really one of the best sources on the internet on this topic, and I believe it will be more than helpful to people like me who have just started web scraping with Python.
muhammettan28 • April 9, 2018
Great tutorial, it’s very useful. Thank you!
samueljhuskey • October 12, 2017
Thanks for this terrific tutorial, Lisa! The part about iterating over a series of result pages was especially helpful to me.
finestjava • July 22, 2017
The first part of the tutorial was fine, but then it falls apart. This is a very common problem: the author knows the material so well that they forget we have never seen most of the topic or task. Just getting the Z names and printing them to the terminal and the CSV file worked fine.
When it came to adding all the pages and writing the results to the CSV file, I started getting the dreaded Unicode encoding errors. I have worked with Beautiful Soup before, and I really liked how you started out.
```
File "nga_z_artists.py", line 29, in <module>
    f.writerow([names, links])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 16: ordinal not in range(128)

File "nga_z_artists.py", line 3
    f.writerow([names, links])
SyntaxError: invalid syntax
```
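For readers hitting the same `UnicodeEncodeError`: the `u'\xe4'` in the message suggests Python 2, whose `csv` module falls back to the ASCII codec on unicode strings. A minimal sketch of one common fix; the file name and values here are hypothetical stand-ins for the tutorial’s:

```python
import csv

# Hypothetical stand-ins for the scraped values; the "ä" (u'\xe4') is
# the kind of character that triggered the error.
names = u"K\xe4the Kollwitz"
links = "https://example.com/artist"

# Python 3: open the file with an explicit UTF-8 encoding so the csv
# module never has to fall back to the ASCII codec.
with open("artists.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([names, links])

# Python 2 equivalent: encode each unicode string to UTF-8 bytes first:
#   f.writerow([names.encode("utf-8"), links.encode("utf-8")])
```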
Arty Caiado • July 22, 2017
Thanks for the tutorial! I’ve been using the requests and BeautifulSoup libraries for a little while, but I always struggle with matching the regex. Previously I hadn’t found much good documentation out there, and I always spent hours on trial and error. This is pretty detailed and helpful. Thank you. I was able to populate this Python/Django website, SeekingBeer, using these libraries, but I spent forever messing around with the code.
totorikacfrm • July 3, 2022
Again, an amazing tutorial. Thank you.
waylandchin • January 20, 2021
I ran this tutorial on an iPad with Pythonista. The data prints to the console just fine; however, the resulting CSV is blank.
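A likely cause, not specific to Pythonista: a CSV file opened with a bare `open()` and never closed may never flush its buffered rows to disk. A minimal sketch of the safer pattern, using a hypothetical file name and row data:

```python
import csv

# Hypothetical rows standing in for the scraped data.
rows = [["name", "link"], ["Zabaglia, Niccola", "https://example.com/artist"]]

# A `with` block guarantees the file is flushed and closed when the
# block exits, so the rows actually reach disk.
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
```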
omkulkarni22 • June 8, 2020
Hello, this tutorial is great. But how can I run my scraping script 24/7 on DigitalOcean? Is there a tutorial for that?
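One common approach, not covered in this tutorial, is to keep the script alive in a loop on the Droplet under something like `nohup`, `tmux`, or a systemd unit; cron is the usual alternative for scheduled runs. A minimal Python-only sketch, where `run_scrape` is a hypothetical stand-in for the scraping logic:

```python
import time

def run_scrape():
    # Hypothetical placeholder for the actual scraping logic.
    print("scraping...")

# Re-run the scrape once an hour, indefinitely. On a server this script
# would typically be kept alive with nohup, tmux, or a systemd unit.
while True:
    run_scrape()
    time.sleep(60 * 60)
```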
rksrp96 • February 5, 2019
I have a small doubt: while grabbing the details from more than 10 pages, I got this error:

```
Traceback (most recent call last):
  File "<ipython-input-3-3844d097c07c>", line 1, in <module>
    runfile('C:/Users/user/Test/BS4/test_webscraping_2.py', wdir='C:/Users/user/Test/BS4')
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/user/Test/BS4/test_webscraping_2.py", line 35, in <module>
    actual_price = act_prices[0].text
IndexError: list index out of range
```
This is my code:

```python
from bs4 import BeautifulSoup as BS
import requests
import csv

pages = []

file = 'LedTv_List.csv'
f = csv.writer(open(file, 'w'))
f.writerow(['Brand Names'])

# Build the list of result-page URLs.
for i in range(1, 5):
    url = 'https://www.flipkart.com/search?q=led+tv&sid=ckf%2Cczl&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_0_3&otracker1=AS_QueryStore_OrganicAutoSuggest_0_3&as-pos=0&as-type=RECENT&as-searchtext=led+&page=' + str(i)
    pages.append(url)

# Scrape each page and pull the fields out of every product card.
for page in pages:
    web = requests.get(page)
    soup = BS(web.content, "html.parser")
    item = soup.findAll("div", {"class": "_1-2Iqu row"})
    for items in item:
        brand = items.find_all("div", {"class": "_3wU53n"})
        brand_name = brand[0].text
        act_prices = items.find_all("div", {"class": "_3auQ3N _2GcJzG"})
        actual_price = act_prices[0].text
        offers = items.find_all("div", {"class": "VGWI6T"})
        discount = offers[0].text
        prices = items.find_all("div", {"class": "_1vC4OE _2rQ-NK"})
        price = prices[0].text
```
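The `IndexError` above almost certainly means one of the `find_all` calls returned an empty list for some product card: not every listing carries a discount or a second price under those class names. A minimal sketch of a guard, reusing the class names from the code above; the `first_text` helper is hypothetical:

```python
# Hypothetical helper: return the text of the first matching <div>, or an
# empty string when find_all matches nothing, instead of indexing [0] blindly.
# `card` is one bs4 Tag from soup.findAll("div", {"class": "_1-2Iqu row"}).
def first_text(card, class_name):
    matches = card.find_all("div", {"class": class_name})
    return matches[0].text if matches else ""

# Inside the loop, each field then degrades gracefully:
#   actual_price = first_text(items, "_3auQ3N _2GcJzG")
#   discount = first_text(items, "VGWI6T")
```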