Network programming for PyS60 (X)
by Marcelo Barros
To celebrate our tenth post, I will talk about the urllib module and one interesting application of it: how to retrieve blog statistics from WordPress accounts.
urllib is a versatile Python module for fetching data across the Internet. It has several interesting features, such as:
- opening URLs with an interface similar to the one used for file operations
- functions for processing URLs, such as escaping special characters and encoding parameters
- proxy and HTTP basic authentication support
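As a minimal sketch of the URL-processing helpers mentioned in the list above, the snippet below escapes a string with quote() and quote_plus() and restores it with unquote_plus(). The sample string is hypothetical, and the try/except import is only there so the same lines also run on a desktop Python 3, where these functions live in urllib.parse.

```python
# Works on Python 2 (the PyS60 flavour) and on Python 3.
try:
    from urllib import quote, quote_plus, unquote_plus        # Python 2
except ImportError:
    from urllib.parse import quote, quote_plus, unquote_plus  # Python 3

title = "My name is bond & co"      # hypothetical sample value

path_safe = quote(title)            # spaces become %20, & becomes %26
query_safe = quote_plus(title)      # spaces become +, & becomes %26
restored = unquote_plus(query_safe) # back to the original string

print(path_safe)
print(query_safe)
print(restored)
```

quote() is the usual choice for the path part of a URL, while quote_plus() matches the encoding used for query parameters.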
Powerful programs can be created with urllib. For instance, suppose you want to fetch the content of the page http://wiki.forum.nokia.com/ and save it to the local file wikiforumnokia.html. urllib can do this in a few lines of code:
    import urllib

    # returns a file-like interface
    furl = urllib.urlopen("http://wiki.forum.nokia.com/")
    # reading the "file"
    contents = furl.read()
    furl.close()
    # saving the page contents
    flocal = open("wikiforumnokia.html","wt")
    flocal.write(contents)
    flocal.close()
Or, if you prefer, the urlretrieve() function can do this job in just two lines of code:
    import urllib
    urllib.urlretrieve("http://wiki.forum.nokia.com/","wikiforumnokia.html")
In fact, urlopen() performs an HTTP request and fetches the contents of the desired page, stripping the HTTP header. If additional parameters are necessary in your request, they may be appended to the URL, as in a typical HTTP GET request.
    import urllib

    # encode the parameters as a query string
    params = urllib.urlencode({'name': 'My name is bond', 'phone':'007007'})
    url = "http://www.exemplo.com/yourname?" + params
    print url
The output is a URL with all parameters encoded, like the ones you see in the address bar when searching at Yahoo! or Google.
http://www.exemplo.com/yourname?phone=007007&name=My+name+is+bond
It is possible to simulate an HTTP POST request as well. POST requests do not use the URL to encode the parameters. Instead, parameters are included in the body of the request. Forms are a good example of POST requests: all form parameters travel inside the body of the HTTP request instead of appearing in the URL.
We can access www.exemplo.com with the same arguments as before, but now using POST, as the following code snippet demonstrates.
    import urllib

    params = urllib.urlencode({'name': 'My name is bond', 'phone':'007007'})
    # the second argument makes urlopen() send a POST request
    result = urllib.urlopen("http://www.exemplo.com/yourname",params).read()
In this case, an additional parameter must be supplied to urlopen, indicating the POST request parameters.
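To make the difference between the two request types visible without touching the network, here is a small sketch that only builds the requests and inspects them. It assumes the Request class from urllib2 (urllib.request on Python 3), which is not used elsewhere in this post. The host www.exemplo.com is the same placeholder as above.

```python
# Works on Python 2 and on Python 3; nothing is sent over the network.
try:
    from urllib import urlencode            # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode      # Python 3
    from urllib.request import Request

params = urlencode({'name': 'My name is bond', 'phone': '007007'})
url = "http://www.exemplo.com/yourname"

# GET: parameters are appended to the URL
get_req = Request(url + "?" + params)

# POST: parameters go into the request body, the URL stays clean
post_req = Request(url, params.encode("ascii"))

print(get_req.get_method())
print(get_req.get_full_url())
print(post_req.get_method())
print(post_req.get_full_url())
```

The presence of the data argument is what switches the method to POST, which mirrors what happens when the second argument is passed to urlopen().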
As an example, I will use the WordPress statistics service (http://stats.wordpress.com/csv.php) to retrieve information about blog views and post views. You need an api_key (instructions here) and an appropriate HTTP GET request (see this link for more details).
Blog views may be fetched with the following URL:
http://stats.wordpress.com/csv.php?api_key=your_api_key&blog_uri=http://yourblogname.wordpress.com&blog_id=0&table=views
All post views may be fetched with the following URL:
http://stats.wordpress.com/csv.php?api_key=your_api_key&blog_uri=http://yourblogname.wordpress.com&blog_id=0&table=postviews
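These URLs do not need to be assembled by hand. The sketch below builds them with urlencode(). The helper name build_stats_url is hypothetical, and the api_key and blog name are the same placeholders used above. Note that urlencode() percent-encodes the blog_uri value, so the result looks slightly different from the hand-written URLs.

```python
# Works on Python 2 and on Python 3.
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

STAT_URL = "http://stats.wordpress.com/csv.php?"

def build_stats_url(api_key, blog_uri, table, blog_id=0):
    """Hypothetical helper: returns the GET URL for one statistics table."""
    # A list of pairs (instead of a dict) keeps the parameter order stable.
    params = [("api_key", api_key),
              ("blog_uri", blog_uri),
              ("blog_id", blog_id),
              ("table", table)]
    return STAT_URL + urlencode(params)

url = build_stats_url("your_api_key",
                      "http://yourblogname.wordpress.com",
                      "views")
print(url)
```

Passing "postviews" instead of "views" as the table gives the second URL.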
WordPress can send the response in CSV (comma-separated values) or XML. I will use CSV, parsing just a few fields. A smarter strategy is needed to avoid problems with commas in post titles, for example (or you can switch to XML). Moreover, you may have a lot of headaches when converting HTML to Unicode (avoided here as well).
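One possible "smarter strategy" for the comma problem is the standard csv module, which understands quoted fields. This is only a sketch under assumptions: the sample text and its column layout (date, post_id, post_title, post_permalink, views) are made up for illustration, the function name parse_post_views is hypothetical, and the csv module must be available in your PyS60 build (it is not part of the older Python 2.2 based releases).

```python
import csv

# Hypothetical response text; the real column layout may differ.
sample = (
    'date,post_id,post_title,post_permalink,views\n'
    '2009-02-01,42,"Sockets, threads and PyS60",'
    'http://yourblogname.wordpress.com/?p=42,17\n'
    '2009-02-02,43,Plain title,'
    'http://yourblogname.wordpress.com/?p=43,5\n'
)

def parse_post_views(text):
    """Return a list of (date, post_id, title, views) tuples."""
    rows = list(csv.reader(text.splitlines()))
    result = []
    for row in rows[1:]:          # skip the column names
        if row:                   # ignore empty lines
            result.append((row[0], row[1], row[2], row[-1]))
    return result

# A plain split breaks the quoted title into two fields...
naive = sample.split("\n")[1].split(",")
# ...while csv.reader keeps it in one piece.
parsed = parse_post_views(sample)

print(len(naive))
print(parsed[0])
```

Titles containing line breaks would still need extra care, since the text is split into lines before it reaches csv.reader.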
The code is below and it is not difficult to understand after this urllib lesson.
    # -*- coding: utf-8 -*-
    # Marcelo Barros de Almeida
    # marcelobarrosalmeida (at) gmail.com

    import urllib
    from appuifw import *
    import e32

    class WPStats(object):
        """ This class uses urllib for accessing wordpress blog statistics.
            Only blogs hosted at wordpress.com may be used.
        """
        STAT_URL = "http://stats.wordpress.com/csv.php?"

        def __init__(self,api_key,blog_uri,blog_id=0,max_days=30):
            """ Init WPStats parameters. Please use:

                api_key: copy it from http://yourblogname.wordpress.com/wp-admin/profile.php
                blog_uri: your blog uri (http://yourblogname.wordpress.com)
                max_days: all accesses will provide statistics for the last max_days
            """
            self.api_key = api_key
            self.blog_uri = blog_uri
            self.blog_id = blog_id
            self.max_days = max_days

        def __request_stats(self,custom_params):
            """ Common request function. Additional parameters may be
                encoded for GET using custom_params dictionary
            """
            params = {"api_key":self.api_key,
                      "blog_id":self.blog_id,
                      "blog_uri":self.blog_uri,
                      "format":"csv",
                      "days":self.max_days}
            params.update(custom_params) # add custom_params values to params
            # network errors raised by urlopen() propagate to the caller
            f = urllib.urlopen(self.STAT_URL + urllib.urlencode(params))
            rsp = f.read()
            f.close()
            data = []
            if rsp:
                # this split may fail for post title with "\n" on it - improve it
                data = rsp.split("\n")[1:-1] # discard column names and last empty element
            return data

        def get_post_views(self,post_id = 0):
            """ Get the number of views for a given post id or
                number of views for all posts (post id = 0)

                Response is an array of tuples like below:
                [(date,post_id,views),(date,post_id,views),...]
            """
            params = {"table":"postviews"}
            if post_id:
                params['post_id'] = post_id
            data = self.__request_stats(params)
            res = []
            for d in data:
                # this split may fail for post title with "," on it
                row = d.split(",")
                res.append((row[0],row[1],row[-1]))
            return res

        def get_blog_views(self):
            """ Get the number of views

                Response format is an array of tuples like below:
                [(date,views),(date,views),...]
            """
            params = {"table":"views"}
            data = self.__request_stats(params)
            res = []
            for d in data:
                res.append(tuple(d.split(",")))
            return res

    class WPStatClient(object):
        """ Get statistics from wordpress
        """
        def __init__(self):
            self.lock = e32.Ao_lock()
            app.title = u"WP Stats demo"
            app.menu = [(u"Get blog views", self.blog_views),
                        (u"Get post views", self.post_views),
                        (u"About", self.about),
                        (u"Exit", self.close_app)]
            self.body = Listbox([(u"Please, update statistics",u"")])
            app.body = self.body
            app.screen = "normal"
            self.wpstats = WPStats("put_api_key_here","http://your_blog_name_here.wordpress.com")
            self.lock.wait()

        def blog_views(self):
            try:
                bv = self.wpstats.get_blog_views()
            except:
                note(u"Impossible to get stats","error")
            else:
                if bv:
                    items = []
                    for stat in bv:
                        items.append((unicode(stat[0]),
                                      u"Views:" + unicode(stat[1])))
                    self.body.set_list(items)
                else:
                    self.body.set_list([(u"",u"")])

        def post_views(self):
            try:
                pv = self.wpstats.get_post_views()
            except:
                note(u"Impossible to get stats","error")
            else:
                if pv:
                    items = []
                    for stat in pv:
                        items.append((unicode(stat[0]),
                                      u"PostID:"+unicode(stat[1]) + u" Views:"+unicode(stat[2])))
                    self.body.set_list(items)
                else:
                    self.body.set_list([(u"",u"")])

        def about(self):
            note(u"WP stats demo by Marcelo Barros (marcelobarrosalmeida@gmail.com)","info")

        def close_app(self):
            self.lock.signal()
            app.set_exit()

    if __name__ == "__main__":
        WPStatClient()
Reference: Bringing REST architecture to S60 devices with Python: a guided tutorial using Twitter API
Is it possible to download a file or a page?
If I save a page as HTML (as shown in the example), can I reconstruct a web page from it?
Thanks in advance
Yes. For files, it is better to use urllib.urlretrieve(). For HTML pages, however, if you want to reconstruct all the elements in the page, you need to download them too. I mean, it is necessary to parse the HTML page and download every element it references (images, for instance). The BeautifulSoup module may help you.
http://www.crummy.com/software/BeautifulSoup/documentation.html