Extracting JSON from HTML with BeautifulSoup - 晓得博客


import json

import requests
from bs4 import BeautifulSoup


def json_from_html_using_bs4(base_url):
    # Fetch the page and parse it with the built-in HTML parser
    page = requests.get(base_url)
    soup = BeautifulSoup(page.text, "html.parser")

    # Each book on the page sits in an <li> with these grid classes
    books = soup.find_all('li', attrs={'class': 'col-xs-6 col-sm-4 col-md-3 col-lg-3'})
    star = ['One', 'Two', 'Three', 'Four', 'Five']
    res, book_no = [], 1

    # Iterate over the book entries and check for the given tags
    for book in books:
        title = book.find('img')['alt']
        # The first 37 characters of base_url are "https://books.toscrape.com/catalogue/"
        link = base_url[:37] + book.find('a')['href']

        # The rating is encoded in the class name, e.g. "star-rating Three"
        stars = None  # stays None if no rating class matches
        for index in range(5):
            find_stars = book.find('p', attrs={'class': 'star-rating ' + star[index]})
            if find_stars is not None:
                stars = star[index] + " out of 5"
                break

        price = book.find('p', attrs={'class': 'price_color'}).text
        instock = book.find('p', attrs={'class': 'instock availability'}).text.strip()
        data = {'book no': str(book_no), 'title': title, 'rating': stars,
                'price': price, 'link': link, 'stock': instock}

        # Append the dictionary to the list
        res.append(data)
        book_no += 1
    return res


# Main Function
if __name__ == "__main__":
    # Enter the url of website
    base_url = "https://books.toscrape.com/catalogue/page-1.html"
    res = json_from_html_using_bs4(base_url)

    # Write it to books.json file.
    with open('books.json', 'w', encoding='latin-1') as f:
        json.dump(res, f, indent=8, ensure_ascii=False)
    print("Created Json File")
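The script passes ensure_ascii=False to json.dump. As a rough illustration of what that flag changes, the sketch below serializes one made-up record (the title and values are hypothetical, only shaped like the dictionaries the scraper builds) with and without it. It uses only the standard json module and does not touch the network.

```python
import json

# Hypothetical record, shaped like one entry produced by the scraper.
record = {
    "book no": "1",
    "title": "Example Book",
    "rating": "Three out of 5",
    "price": "£51.77",
}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the pound sign is kept as a literal character.
literal = json.dumps(record, ensure_ascii=False)

print(escaped)
print(literal)

# Both strings decode back to the same dictionary.
assert json.loads(escaped) == json.loads(literal) == record
```

Either form is valid JSON. The literal form is easier to read in a text editor, but the file then has to be opened with an encoding that can represent every character being written.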
