A small difference between urllib and urllib2

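I was using dict, a small command-line translation tool, and found that it could translate English but not Chinese. Reading its source, I saw that it uses the urlopen function from urllib2 to fetch results from the Youdao translation API.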

With Chinese input the returned result was garbled, yet urlopen from urllib returned a correct result, so I went through the Python source to find out why.

In urllib:

def urlopen(url, data=None, proxies=None):
    global _urlopener
    if proxies is not None:
        opener = FancyURLopener(proxies=proxies)
    elif not _urlopener:
        opener = FancyURLopener()
        _urlopener = opener
    else:
        opener = _urlopener
    if data is None:
        return opener.open(url)
    else:
        return opener.open(url, data)

It creates a FancyURLopener object (or reuses the cached one) and calls its open method:

def open(self, fullurl, data=None):
    fullurl = unwrap(toBytes(fullurl))
    fullurl = quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]|")
    if self.tempcache and fullurl in self.tempcache:
        filename, headers = self.tempcache[fullurl]
        fp = open(filename, 'rb')
        return addinfourl(fp, headers, fullurl)
    urltype, url = splittype(fullurl)
    if not urltype:
        urltype = 'file'
    if urltype in self.proxies:
        proxy = self.proxies[urltype]
        urltype, proxyhost = splittype(proxy)
        host, selector = splithost(proxyhost)
        url = (host, fullurl) # Signal special case to open_*()
    else:
        proxy = None
    name = 'open_' + urltype
    self.type = urltype
    name = name.replace('-', '_')
    if not hasattr(self, name):
        if proxy:
            return self.open_unknown_proxy(proxy, fullurl, data)
        else:
            return self.open_unknown(fullurl, data)
    try:
        if data is None:
            return getattr(self, name)(url)
        else:
            return getattr(self, name)(url, data)
    except socket.error, msg:
        raise IOError, ('socket error', msg), sys.exc_info()[2]

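The key is the call to quote on the third line of open: it percent-encodes the URL string, so the Youdao API can parse the request. urllib2 does not perform this encoding; I won't paste its code here.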
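
To make the effect of that quote call concrete, here is a minimal sketch, assuming Python 3, where quote lives in urllib.parse (in the Python 2 of this post it is urllib.quote). The helper name encode_url, the example.com host and the q parameter are placeholders made up for illustration; this is not the real Youdao endpoint or the tool's actual code.

```python
from urllib.parse import quote

# Same safe set that urllib's open() passes to quote in the listing above.
SAFE_CHARS = "%/:=&?~#+!$,;'@()*[]|"


def encode_url(url):
    """Percent-encode every character outside SAFE_CHARS.

    Non-ASCII text is encoded as UTF-8 bytes first (quote's default).
    Because '%' is in the safe set, escapes already present in the URL
    are left alone instead of being encoded a second time.
    """
    return quote(url, safe=SAFE_CHARS)


if __name__ == "__main__":
    # Placeholder URL, not the real Youdao API.
    print(encode_url("http://example.com/translate?q=中文"))
    # prints http://example.com/translate?q=%E4%B8%AD%E6%96%87
```

A caller that wants to keep using urllib2 could apply the same encoding to the URL before passing it to urlopen, which is presumably close to what the fix amounts to.
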
After a small rewrite, the tool now translates in both directions between Chinese and English: dict. Usage:

dict dick
###################################
# dick 迪克 (U: dik E: dik )
# n. 阴茎,鸡巴;侦探;誓言
###################################
dict 鸡巴
###################################
# 鸡巴 dick (拼音: jī bā )
# dick
###################################