Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I'm trying to get a JavaScript var value from an HTML source code using BeautifulSoup.
For example I have:
<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>
I want something to return the value of the var "my" in Python
How can I achieve that?
The simplest approach is to use a regular expression pattern to both locate the element via BeautifulSoup
and extract the desired substring:
import re
from bs4 import BeautifulSoup
data = """
<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>
soup = BeautifulSoup(data, "html.parser")
pattern = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))
Prints hello
.
Another idea would be to use a JavaScript parser and locate a variable declaration node, check the identifier to be of a desired value and extract the initializer. Example using slimit
parser:
from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor
data = """
<script>
var my = 'hello';
var name = 'hi';
var is = 'halo';
</script>
soup = BeautifulSoup(data, "html.parser")
script = soup.find("script", text=lambda text: text and "var my" in text)
# parse js
parser = Parser()
tree = parser.parse(script.text)
for node in nodevisitor.visit(tree):
if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
print(node.initializer.value)
Prints hello
.
the answer, pattern = re.compile(r"var my = '(.*?)';$", re.MULTILINE | re.DOTALL)
should get a wrong way, have to remove the line-end sign $
when set re.MULTILINE re.DOTALL
at same time.
try with python 3.6.4
Building on @alecxe's answer, but considering a more complex case of an array of dictionaries - or an array of flat json objects:
from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor
data = """
<script>
var my = [{'dic1key1':1}, {'dic2key1':1}];
var name = 'hi';
var is = 'halo';
</script>
soup = BeautifulSoup(data, "html.parser")
script = soup.find("script", text=lambda text: text and "var my" in text)
# parse js
parser = Parser()
tree = parser.parse(script.text)
array_items = []
for node in nodevisitor.visit(tree):
if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
for item in node.initializer.items:
parsed_dict = {getattr(n.left, 'value', '')[1:-1]: getattr(n.right, 'value', '')[1:-1]
for n in nodevisitor.visit(item)
if isinstance(n, slimit.ast.Assign)}
array_items.append(parsed_dict)
print(array_items)
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.