So we haven't kept you that updated on this blog recently.
We've launched a video search engine off our research, and launched a webmaster ad network to help you bring in more cash.
We've also created several webmaster tutorial videos on a whole range of topics, from how to put widgets on your blog to how to avoid banner blindness using virals.
Yes, it's been a very quick week (has it only been a week?), but perhaps the most exciting news to me as an SEO (Search Engine Optimiser) is that viralsauce.com now ranks #1 in Google for "viral video search engine". Not bad for a site that has only been properly launched in Alpha for a week!
Friday, 11 January 2008
Thursday, 25 October 2007
Using Flash? Here are some statistics...
We (team rubber) have recently posted a huge bunch of statistics covering what Adobe Flash Player Versions are being used around the web.
These statistics are taken from a sample of millions of unique users across a wide variety of our viral marketing campaigns, and give a really good insight into the spread of new versions of the flash player.
These statistics are taken from a sample of millions of unique users across a wide variety of our viral marketing campaigns, and give a really good insight into the spread of new versions of the flash player.
Tuesday, 11 September 2007
Because Speed matters: 11 ways to optimise your python code
In a post that is both longer and more technical normal I am going to cover 11 ways in which you can speed up python code, and in the name of speed, let's get started:
1) Use the profiler
Much of optimising your python code is counter-intuitive (as you will see) - so it is vital to really know how your changes affect the speed of your code. If you have not already done so, run your code through the python profiler.
2) Lookups aren't free
Python is a very dynamic language and, unlike in compiled C, variable look-ups take time. With this in mind it is a good idea to make local references to any global variables you are using regualarly.
In Python 2.2+, variables are looked up in the following order:
The function's local scope
Any lexically enclosing functions, from inner to outer
The Global scope
Built-in Functions
So, it is especially important to define local references to Global variables. For example the (rather contrived) function foo(), which modifies a globaly scoped list a:
def foo():
for i in range(len(a)):
a[i] = a[i] * 2
would be better written:
def foo():
loca = a
for i in range(len(loca)):
a[i] = a[i] * 2
3) Use the -O flag
By running the python interpreter using the -O flag the interpreter generates optimised byte-code, yeilding a minor performance increase. If you are running your code directly rather than from the command line, you can add this to your shebang line. e.g.
#!/usr/local/bin/python -O
If you don't require docstrings in your compiled code you can use the -OO flag to generate byte-code without docstrings.
4) Use List Comprehensions
In general, list comprehensions are fast. For example, on my machine:
for i in range(100000):
b = [i for i in a if i%2 == 0]
runs 25 percent faster than:
for i in range(100000):
b = []
for j in a:
if j%2 == 0:
b.append(j)
5)Use as much C as possible:
Any available C functions are probably faster than what you would write in python. For example, if you are doing any linear algebra, it is highly recommended that you try using Numpy / SciPy if you have the option. If you have a function that needs to run really quickly, think about writing a C extension module. (unless you want to keep your code pure python for portability)
6)Use .join for string concatination.
For an example, for a tupple of six strings
endstr = "".join(list(strings))
runs 10-20 percent faster on my machine than
endstr = ""
for s in strings:
endstr = endstr + s
and
endstr = "".join(["a","b","c","d","e","f"])
runs 20 percent faster than
endstr = "a"
endstr = endstr + "b"
endstr = endstr + "c"
endstr = endstr + "d"
endstr = endstr + "e"
endstr = endstr + "f"
7) Store references to functions
Just as variable lookup is not free in python, neither is looking up functions. As an important example that you are probably using while you optimise your code, defining
loctime = time.time
allows you to call loctime() instead of time.time(), with an increase in speed of between 10 and 50 percent. This is especially important while optimising, even if you are only calling time a couple of times, since the varying look-up time can make a significant difference to your tests.
As previously mentioned, the Built-in functions are in the final scope searched, so this is useful for commonly used built-in functions.
8) Inline your functions
If possible, try to inline your commonly called functions. This saves a great deal of look-up time, and could make a significant difference. Here the code above takes up to 2 times as long to run as the code below.
slow:
def foo(a):
return a*2
for i in range(1000000):
b = foo(i)
faster:
for i in range(1000000):
b = 2*i
9) Go Psyco
Try using Psyco - a JIT compiler for python - if it is available on your system. Even if it is not, it's worth trying to import the module if your code is going to be run on other machines that may have it. Psyco can speed up code by between 2 and 1000 times.
10) Go fourth and multiply
General good advice no matter what language you are coding in (from ASM to javascript), try to avoid floating point division where possible. This is because floating point multiplication can be done in a single operation in the processor, where division is much slower. Many optimising compilers will try to do this for you, but it is better to take control if you can. e.g.
b = i/2.
takes twice as much time to run (on my system) as
b = i * 0.5
11 ) Don't cast if you don't have to.
If i is an int, remember (for example) that division by a float will automatically cast the result to a float. This can be important, here
b = i/2.
runs 3 times faster than
b = float(i)/2.
due to the overhead of looking up and calling float().
I hope these have been useful, feel free to add any more in the comments.
1) Use the profiler
Much of optimising your python code is counter-intuitive (as you will see) - so it is vital to really know how your changes affect the speed of your code. If you have not already done so, run your code through the python profiler.
2) Lookups aren't free
Python is a very dynamic language and, unlike in compiled C, variable look-ups take time. With this in mind it is a good idea to make local references to any global variables you are using regualarly.
In Python 2.2+, variables are looked up in the following order:
The function's local scope
Any lexically enclosing functions, from inner to outer
The Global scope
Built-in Functions
So, it is especially important to define local references to Global variables. For example the (rather contrived) function foo(), which modifies a globaly scoped list a:
def foo():
for i in range(len(a)):
a[i] = a[i] * 2
would be better written:
def foo():
loca = a
for i in range(len(loca)):
a[i] = a[i] * 2
3) Use the -O flag
By running the python interpreter using the -O flag the interpreter generates optimised byte-code, yeilding a minor performance increase. If you are running your code directly rather than from the command line, you can add this to your shebang line. e.g.
#!/usr/local/bin/python -O
If you don't require docstrings in your compiled code you can use the -OO flag to generate byte-code without docstrings.
4) Use List Comprehensions
In general, list comprehensions are fast. For example, on my machine:
for i in range(100000):
b = [i for i in a if i%2 == 0]
runs 25 percent faster than:
for i in range(100000):
b = []
for j in a:
if j%2 == 0:
b.append(j)
5)Use as much C as possible:
Any available C functions are probably faster than what you would write in python. For example, if you are doing any linear algebra, it is highly recommended that you try using Numpy / SciPy if you have the option. If you have a function that needs to run really quickly, think about writing a C extension module. (unless you want to keep your code pure python for portability)
6)Use .join for string concatination.
For an example, for a tupple of six strings
endstr = "".join(list(strings))
runs 10-20 percent faster on my machine than
endstr = ""
for s in strings:
endstr = endstr + s
and
endstr = "".join(["a","b","c","d","e","f"])
runs 20 percent faster than
endstr = "a"
endstr = endstr + "b"
endstr = endstr + "c"
endstr = endstr + "d"
endstr = endstr + "e"
endstr = endstr + "f"
7) Store references to functions
Just as variable lookup is not free in python, neither is looking up functions. As an important example that you are probably using while you optimise your code, defining
loctime = time.time
allows you to call loctime() instead of time.time(), with an increase in speed of between 10 and 50 percent. This is especially important while optimising, even if you are only calling time a couple of times, since the varying look-up time can make a significant difference to your tests.
As previously mentioned, the Built-in functions are in the final scope searched, so this is useful for commonly used built-in functions.
8) Inline your functions
If possible, try to inline your commonly called functions. This saves a great deal of look-up time, and could make a significant difference. Here the code above takes up to 2 times as long to run as the code below.
slow:
def foo(a):
return a*2
for i in range(1000000):
b = foo(i)
faster:
for i in range(1000000):
b = 2*i
9) Go Psyco
Try using Psyco - a JIT compiler for python - if it is available on your system. Even if it is not, it's worth trying to import the module if your code is going to be run on other machines that may have it. Psyco can speed up code by between 2 and 1000 times.
10) Go fourth and multiply
General good advice no matter what language you are coding in (from ASM to javascript), try to avoid floating point division where possible. This is because floating point multiplication can be done in a single operation in the processor, where division is much slower. Many optimising compilers will try to do this for you, but it is better to take control if you can. e.g.
b = i/2.
takes twice as much time to run (on my system) as
b = i * 0.5
11 ) Don't cast if you don't have to.
If i is an int, remember (for example) that division by a float will automatically cast the result to a float. This can be important, here
b = i/2.
runs 3 times faster than
b = float(i)/2.
due to the overhead of looking up and calling float().
I hope these have been useful, feel free to add any more in the comments.
Tuesday, 21 August 2007
Viral Updates
Alrightey then here are some videos that have piqued my interest - some for their laugh-out-loud hilarity, others for being downright bad and shoddy. Sorry for my abysmal attempt at a rhyme, watch the first video in my list and you'll find out why.
The Good Un's:
Free NYC Rap:
Banned Condom Advert:
Fat People:
Fat People - Awesome video clips here
Talk, Talk:
And The Rotten Un's:
Fired Teacher Flips Out In Cafeteria - Watch more free videos
Joanna.
The Good Un's:
Free NYC Rap:
Banned Condom Advert:
Fat People:
Fat People - Awesome video clips here
Talk, Talk:
And The Rotten Un's:
Fired Teacher Flips Out In Cafeteria - Watch more free videos
Joanna.
Monday, 20 August 2007
Handling timeouts with urllib2 in python
On a more technical note than usual, I recently came across a bug that took a very long time to solve. The problem was related to using urllib2 in Python to fetch a url.
Essentially, urllib2 does not have an inbuilt timeout set, so by default your code will stall on a opener.open() call if the server timesout without telling you. This doesn't happen very often, so your code can run for several weeks before failing to respond.
What is more, there appears to be no direct access to set this timeout, however there is a way around it (ref). The trick is to set the default timeout for all sockets before creating a connection. In the same scope as urllib2, and before you begin using the module, enter the following lines:
import socket
socket.setdefaulttimeout(timeout)
(where timeout is the number of seconds you would like to wait for a timeout.)
This should set the timeout for the socket that will be opened by urllib2. Hope this is as useful to someone else as it was to me...
Essentially, urllib2 does not have an inbuilt timeout set, so by default your code will stall on a opener.open() call if the server timesout without telling you. This doesn't happen very often, so your code can run for several weeks before failing to respond.
What is more, there appears to be no direct access to set this timeout, however there is a way around it (ref). The trick is to set the default timeout for all sockets before creating a connection. In the same scope as urllib2, and before you begin using the module, enter the following lines:
import socket
socket.setdefaulttimeout(timeout)
(where timeout is the number of seconds you would like to wait for a timeout.)
This should set the timeout for the socket that will be opened by urllib2. Hope this is as useful to someone else as it was to me...
Tuesday, 14 August 2007
More v-rank hotties
Here are a couple more videos that have been picked by our new algorithm:
Vertical Football:
And the video with the highest v-rank almost continually for the past fourtnight,
Johnny Cash - Hurt: link
Vertical Football:
And the video with the highest v-rank almost continually for the past fourtnight,
Johnny Cash - Hurt: link
Monday, 6 August 2007
Summaries, summaries
You're a busy person, we all are. So what happens when you want to select from a range of items with really long descriptions? Sometimes it would be great if you could just read a quick summary of the text.
Here is a quick example of a brief summary. I also provide the full text that it was generated from:
Full Text:
A short clip from Surrealist film "The Inconspicuous Furry Banana." Mia
Crackpot and Iona Frisbee meet a wizard, who tells them a recipe to
freeze time. But things go horribly wrong- and instead of freezing time,
they accidentally replace time with a tangerine. Then suddenly, they
begin to experience vivid hallucinations. One of them believes they are
an inconspicuous furry banana, and the other believes that they are half
a metre long with wings that can fly. Together they believe that their
mission is to meet the wizard master- and to save him from the imaginary
sprites... Narration by Chewy Benson.
256 characters
Then suddenly, they begin to experience vivid hallucinations. One of
them believes they are an inconspicuous furry banana, and the other
believes that they are half a metre long with wings that can fly.
Narration by Chewy Benson.
150 characters
Then suddenly, they begin to experience vivid hallucinations. Narration
by Chewy Benson.
Now if you had a list of ten items, would you prefer 10 full descriptions, or 10 of the shorter versions?
If you are interested, here is the video that this text was describing:
Here is a quick example of a brief summary. I also provide the full text that it was generated from:
Full Text:
A short clip from Surrealist film "The Inconspicuous Furry Banana." Mia
Crackpot and Iona Frisbee meet a wizard, who tells them a recipe to
freeze time. But things go horribly wrong- and instead of freezing time,
they accidentally replace time with a tangerine. Then suddenly, they
begin to experience vivid hallucinations. One of them believes they are
an inconspicuous furry banana, and the other believes that they are half
a metre long with wings that can fly. Together they believe that their
mission is to meet the wizard master- and to save him from the imaginary
sprites... Narration by Chewy Benson.
256 characters
Then suddenly, they begin to experience vivid hallucinations. One of
them believes they are an inconspicuous furry banana, and the other
believes that they are half a metre long with wings that can fly.
Narration by Chewy Benson.
150 characters
Then suddenly, they begin to experience vivid hallucinations. Narration
by Chewy Benson.
Now if you had a list of ten items, would you prefer 10 full descriptions, or 10 of the shorter versions?
If you are interested, here is the video that this text was describing:
Subscribe to:
Posts (Atom)