Given a URL to a text file, what is the simplest way to read the contents of the text file?

In Python, given the URL of a text file, what is the simplest way to access the contents of that text file and print them locally line by line, without saving a local copy of the file?

TargetURL = "http://www.myhost.com/SomeFile.txt"
#read the file
#print first line
#print second line

Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2

Actually the simplest way is:

import urllib2  # the lib that handles the url stuff
data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

You don't even need "readlines", as Will suggested. You could even shorten it to: *

import urllib2
for line in urllib2.urlopen(target_url):
    print line

But remember in Python, readability matters.

However, while this is the simplest way, it is not the safest. With network programming you often don't know whether the amount of data you expect will actually be respected, so you are generally better off reading a fixed, reasonable amount of data: enough for what you expect, but capped so your script cannot be flooded:

import urllib2
data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines
for line in data:
    print line

* Second example in Python 3:

import urllib.request  # the lib that handles the url stuff
for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is
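The bounded-read advice above can be sketched in Python 3 as well. In this sketch a data: URL (supported by urllib.request) stands in for a real target_url, just so the snippet is self-contained; swap in your actual URL.

```python
import urllib.request

# Stand-in URL so the example runs without a network connection;
# replace with your real target_url.
target_url = "data:text/plain,first%20line%0Asecond%20line"

with urllib.request.urlopen(target_url) as response:
    raw = response.read(20000)        # read at most 20 000 bytes
text = raw.decode("utf-8")
for line in text.split("\n"):
    print(line)
```

The cap on read() is the same flood-protection idea as in the Python 2 example; 20 000 is an illustrative number, not a recommendation.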

I'm a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is

import urllib.request
data = urllib.request.urlopen(target_url)
for line in data:
    print(line.decode('utf-8'))

or alternatively

from urllib.request import urlopen
data = urlopen(target_url)

Note that just import urllib does not work.
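That note is easy to verify: the urllib package does not implicitly import its request submodule, so the attribute only appears after an explicit import. A small demonstration (the sys.modules cleanup just simulates a fresh interpreter and is for illustration only):

```python
import sys

# Forget any previously imported urllib modules to simulate a fresh
# interpreter (demonstration only; don't do this in real code).
for name in [m for m in sys.modules if m.startswith("urllib")]:
    del sys.modules[name]

import urllib
before = hasattr(urllib, "request")   # False: submodule not loaded yet

import urllib.request
after = hasattr(urllib, "request")    # True: the explicit import sets it

print(before, after)
```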

The requests library has a simpler interface and works with both Python 2 and 3.

import requests
response = requests.get(target_url)
data = response.text

There's really no need to read line-by-line. You can get the whole thing like this:

import urllib
txt = urllib.urlopen(target_url).read()  # Python 2; in Python 3 use urllib.request.urlopen
Alternatively, with the urllib3 library:

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

This can be a better option than urllib since urllib3 boasts having

  • Thread safety.
  • Connection pooling.
  • Client-side SSL/TLS verification.
  • File uploads with multipart encoding.
  • Helpers for retrying requests and dealing with HTTP redirects.
  • Support for gzip and deflate encoding.
  • Proxy support for HTTP and SOCKS.
  • 100% test coverage.
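As a sketch of the retry helpers mentioned in the list above, urllib3 lets you attach a Retry policy to the pool. No request is made here; the numbers are illustrative, not recommended values.

```python
import urllib3
from urllib3.util.retry import Retry

# Illustrative retry policy: up to 3 attempts, exponential backoff,
# retrying on common transient gateway errors.
retry_policy = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503])
http = urllib3.PoolManager(retries=retry_policy)

# http.request('GET', target_url) would now retry transient failures
# automatically instead of raising on the first error.
print(retry_policy.total)
```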

    For me, none of the above responses worked out of the box. Instead, I had to do the following (Python 3):

    from urllib.request import urlopen
    data = urlopen("[your url goes here]").read().decode('utf-8')
    # Do what you need to do with the data.
    

    Just updating here the solution suggested by @ken-kinder for Python 2 to work with Python 3:

    import urllib.request
    urllib.request.urlopen(target_url).read()

    As @Andrew Mao suggested:

    import requests
    response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
    data = response.text
    for i, line in enumerate(data.split('\n')):
        print(f'{i}   {line}')
    
    0    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
    1    prices and the demand for clean air', J. Environ. Economics & Management,
    2    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
    3    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
    4    pages 244-261 of the latter.
    5    
    6    Variables in order:
    

    Check out the Kaggle notebook on how to extract a dataset/dataframe from a URL.

    I do think requests is the best option. Also note the possibility of setting encoding manually.

    import requests
    response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
    # response.encoding = "utf-8"
    hehe = response.text
    
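To see what setting the encoding actually changes, you can build a Response by hand, with no network involved. This pokes at a requests internal (_content) purely for illustration; after a real GET, requests fills these fields for you.

```python
import requests

# Build a Response by hand to show how `encoding` drives `.text`.
# Assigning _content directly is for demonstration only.
response = requests.Response()
response._content = "café".encode("utf-8")

response.encoding = "latin-1"
mojibake = response.text      # UTF-8 bytes decoded with the wrong charset

response.encoding = "utf-8"
correct = response.text       # decoded correctly

print(mojibake, correct)
```

This is why explicitly setting response.encoding (or checking response.apparent_encoding) matters when the server reports the wrong charset.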

    You can also use this simple approach, which saves the contents to a local file:

    import requests
    url_res = requests.get(url="http://www.myhost.com/SomeFile.txt")
    filename = "SomeFile"  # any local name you choose
    with open(filename + ".txt", "wb") as file:
        file.write(url_res.content)
    

    None of these answers worked for me in Python 3. I am using Python 3.9, and urllib2 (which dates back to at least 2010) does not even exist in Python 3.

    Here is how you read a text file located on a remote server given the url:

    import io
    import urllib.request

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
    url = 'https://server.com/path/hello world.txt'
    req = urllib.request.Request(url, headers=hdr)
    u = urllib.request.urlopen(req)
    file = io.TextIOWrapper(u, encoding='utf-8')
    file_contents = file.read()
    print(file_contents)
    

    Hope this is helpful to someone because it was very hard to find the answer.
