Tag: page rank

  • Google Page Rank Python Script

    This isn’t my script but I thought it would appeal to the reader of this blog.  It’s a script that  will lookup the Google Page Rank for any website and uses the same interface as the Google Toolbar to do it. I’d like to thank Fred Cirera for writing it and you can checkout his blog about this script here.

    I’m not exactly sure what I would use this for but it might have applications for anyone who wants to do some really advanced SEO work and find a real way to accomplish Page Rank sculpting. Perhaps finding the best websites to put links on.

    The reason it is such an involved bit of math is that it need to compute a checksum in order to work. It should be pretty reliable since it doesn’t involve and scraping.

    Example usage:

    $ python pagerank.py http://www.google.com/
    PageRank: 10	URL: http://www.google.com/
    
    $ python pagerank.py http://www.mozilla.org/
    PageRank: 9	URL: http://www.mozilla.org/
    
    $ python pagerank.py http://halotis.com
    PageRange: 3   URL: http://www.halotis.com/
    

    And the script:

    #!/usr/bin/env python
    #
    #  Script for getting Google Page Rank of page
    #  Google Toolbar 3.0.x/4.0.x Pagerank Checksum Algorithm
    #
    #  original from http://pagerank.gamesaga.net/
    #  this version was adapted from http://www.djangosnippets.org/snippets/221/
    #  by Corey Goldberg - 2010
    #
    #  Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
    
    
    
    import urllib
    
    
    def get_pagerank(url):
        hsh = check_hash(hash_url(url))
        gurl = 'http://www.google.com/search?client=navclient-auto&features=Rank:&q=info:%s&ch=%s' % (urllib.quote(url), hsh)
        try:
            f = urllib.urlopen(gurl)
            rank = f.read().strip()[9:]
        except Exception:
            rank = 'N/A'
        if rank == '':
            rank = '0'
        return rank
        
        
    def  int_str(string, integer, factor):
        for i in range(len(string)) :
            integer *= factor
            integer &= 0xFFFFFFFF
            integer += ord(string[i])
        return integer
    
    
    def hash_url(string):
        c1 = int_str(string, 0x1505, 0x21)
        c2 = int_str(string, 0, 0x1003F)
    
        c1 >>= 2
        c1 = ((c1 >> 4) & 0x3FFFFC0) | (c1 & 0x3F)
        c1 = ((c1 >> 4) & 0x3FFC00) | (c1 & 0x3FF)
        c1 = ((c1 >> 4) & 0x3C000) | (c1 & 0x3FFF)
    
        t1 = (c1 & 0x3C0) < < 4
        t1 |= c1 & 0x3C
        t1 = (t1 << 2) | (c2 & 0xF0F)
    
        t2 = (c1 & 0xFFFFC000) << 4
        t2 |= c1 & 0x3C00
        t2 = (t2 << 0xA) | (c2 & 0xF0F0000)
    
        return (t1 | t2)
    
    
    def check_hash(hash_int):
        hash_str = '%u' % (hash_int)
        flag = 0
        check_byte = 0
    
        i = len(hash_str) - 1
        while i >= 0:
            byte = int(hash_str[i])
            if 1 == (flag % 2):
                byte *= 2;
                byte = byte / 10 + byte % 10
            check_byte += byte
            flag += 1
            i -= 1
    
        check_byte %= 10
        if 0 != check_byte:
            check_byte = 10 - check_byte
            if 1 == flag % 2:
                if 1 == check_byte % 2:
                    check_byte += 9
                check_byte >>= 1
    
        return '7' + str(check_byte) + hash_str
    
    
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            url = 'http://www.google.com/'
        else:
            url = sys.argv[1]
    
        print get_pagerank(url)