I'm trying to get the value of a JavaScript variable from HTML source code using BeautifulSoup.

For example, I have:

<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>

I want something that returns the value of the variable "my" in Python.

How can I achieve that?

The simplest approach is to use a regular expression pattern to both locate the element via BeautifulSoup and extract the desired substring:

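import re
from bs4 import BeautifulSoup
data = """
<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>
"""
soup = BeautifulSoup(data, "html.parser")
# capture whatever sits between the quotes in the "var my = '...';" statement
pattern = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL)
# find the <script> element whose content matches the pattern
# (the string= argument was called text= before bs4 4.4)
script = soup.find("script", string=pattern)
print(pattern.search(script.text).group(1))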

Prints hello.
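
As a rough generalization of the pattern above, the sketch below wraps the search in a helper that takes the variable name as a parameter, tolerates flexible whitespace, and accepts either quote style. The helper name extract_js_string_var and the sample input are assumptions made for illustration. It operates on plain script text (for example script.text from the snippet above) and does not handle escaped quotes inside the literal.

```python
import re


def extract_js_string_var(source, name):
    """Return the string literal assigned to `var <name>` in source, or None.

    Hypothetical helper for illustration; not part of bs4 or re.
    """
    pattern = re.compile(
        r"\bvar\s+" + re.escape(name) + r"\s*=\s*(['\"])(.*?)\1\s*;"
    )
    match = pattern.search(source)
    # group(1) is the opening quote, group(2) the text between the quotes
    return match.group(2) if match else None


# Sample script text (an assumption); in practice this would come from bs4.
js = """
var my = 'hello';
var name = "hi";
var is='halo';
"""

print(extract_js_string_var(js, "my"))       # hello
print(extract_js_string_var(js, "name"))     # hi
print(extract_js_string_var(js, "missing"))  # None
```

Returning None for a missing variable avoids the AttributeError that calling .group(1) on a failed search would raise.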

Another option is to use a JavaScript parser: locate a variable declaration node, check that its identifier is the desired name, and extract the initializer. Example using the slimit parser:

from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor
data = """
<script>
var my = 'hello';
var name = 'hi';
var is = 'halo';
</script>
"""
soup = BeautifulSoup(data, "html.parser")
# pick the <script> element that declares the variable
# (the string= argument was called text= before bs4 4.4)
script = soup.find("script", string=lambda text: text and "var my" in text)
# parse js
parser = Parser()
tree = parser.parse(script.text)
# walk the syntax tree and look for the declaration of "my"
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
        print(node.initializer.value)

Prints hello.

The pattern in the answer above, pattern = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL), can return the wrong match. When re.MULTILINE and re.DOTALL are set at the same time, the end-of-line anchor $ has to be removed.

Tested with Python 3.6.4.
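
One case where the end-of-line anchor goes wrong is sketched below, under the assumption that two statements share a single line. The closing '; of the first statement is then not at the end of a line, so the lazy group keeps expanding until a later '; that is. The sample input is made up for illustration.

```python
import re

# Hypothetical input: the first two statements share one line.
js = "var my = 'hello'; var name = 'hi';\nvar is = 'halo';\n"

anchored = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL)
unanchored = re.compile(r"var my = '(.*?)';")

# With "$", the match must end at a line end, so it swallows the next statement.
anchored_value = anchored.search(js).group(1)
# Without "$", the lazy group stops at the first closing quote and semicolon.
unanchored_value = unanchored.search(js).group(1)

print(anchored_value)    # hello'; var name = 'hi
print(unanchored_value)  # hello
```

When every statement sits on its own line, as in the question, both patterns return hello.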

Building on @alecxe's answer, here is a more complex case: an array of dictionaries, that is, an array of flat JSON objects:

from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor
data = """
<script>
var my = [{'dic1key1':1}, {'dic2key1':1}];
var name = 'hi';
var is = 'halo';
</script>
"""
soup = BeautifulSoup(data, "html.parser")
# (the string= argument was called text= before bs4 4.4)
script = soup.find("script", string=lambda text: text and "var my" in text)
# parse js
parser = Parser()
tree = parser.parse(script.text)
array_items = []
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
        # node.initializer is the array literal; each item is one object literal
        for item in node.initializer.items:
            # strip surrounding quotes from keys and string values;
            # numeric values have no quotes and stay as strings such as '1'
            parsed_dict = {getattr(n.left, 'value', '').strip("'\""): getattr(n.right, 'value', '').strip("'\"")
                for n in nodevisitor.visit(item)
                if isinstance(n, ast.Assign)}
            # append inside the loop so every object is kept, not only the last
            array_items.append(parsed_dict)
print(array_items)
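
If the array literal also happens to be valid Python syntax (quoted keys, no true, false or null), a lighter alternative to a JavaScript parser is to capture the literal with a regular expression and hand it to ast.literal_eval from the standard library. This is a different technique from the slimit approach above, shown as a sketch under that assumption. Note that ast here is Python's own module, not slimit.ast, and the sample script text is made up for illustration.

```python
import ast
import re

# Sample script text (an assumption); in practice this would be script.text.
script_text = """
var my = [{'dic1key1':1}, {'dic2key1':1}];
var name = 'hi';
var is = 'halo';
"""

# Capture everything from "[" up to the first "]" that is followed by ";".
match = re.search(r"\bvar\s+my\s*=\s*(\[.*?\])\s*;", script_text, re.DOTALL)

# literal_eval only evaluates literals, so it does not run arbitrary code.
array_items = ast.literal_eval(match.group(1)) if match else []
print(array_items)  # [{'dic1key1': 1}, {'dic2key1': 1}]
```

Unlike the slimit version, numeric values come back as Python integers rather than strings.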
        
