Thursday 25 October 2007

Using Flash? Here are some statistics...

We (team rubber) have recently posted a huge bunch of statistics covering which Adobe Flash Player versions are being used around the web.

These statistics are taken from a sample of millions of unique users across a wide variety of our viral marketing campaigns, and give a really good insight into the spread of new versions of the Flash Player.

Tuesday 11 September 2007

Because Speed matters: 11 ways to optimise your python code

In a post that is both longer and more technical than normal, I am going to cover 11 ways in which you can speed up python code. In the name of speed, let's get started:

1) Use the profiler

Much of optimising your python code is counter-intuitive (as you will see) - so it is vital to really know how your changes affect the speed of your code. If you have not already done so, run your code through the python profiler.
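As a minimal sketch of what that looks like, the snippet below profiles a made-up function, slow_sum(), with the standard library's cProfile and pstats modules. The function, its workload and the helper profile_call() are assumptions for illustration only, and the sketch is written for a current Python 3 interpreter.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Hypothetical workload, only here to give the profiler something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    # Run func under the profiler and return (result, text report).
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten most expensive entries, sorted by cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100000)
print(report)
```

To profile a whole script instead, running python -m cProfile -s cumulative myscript.py (where myscript.py stands in for your own file) gives a similar report from the command line.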

2) Lookups aren't free
Python is a very dynamic language and, unlike in compiled C, variable look-ups take time. With this in mind it is a good idea to make local references to any global variables you are using regularly.

In Python 2.2+, variables are looked up in the following order:
The function's local scope
Any lexically enclosing functions, from inner to outer
The Global scope
Built-in Functions

So, it is especially important to define local references to global variables. For example the (rather contrived) function foo(), which modifies a globally scoped list a:

def foo():
    for i in range(len(a)):
        a[i] = a[i] * 2


would be better written:

def foo():
    loca = a  # local reference to the global list
    for i in range(len(loca)):
        loca[i] = loca[i] * 2

3) Use the -O flag
By running the python interpreter with the -O flag, the interpreter generates optimised byte-code, yielding a minor performance increase. If you run your script directly as an executable, rather than passing it to the interpreter on the command line, you can add the flag to your shebang line, e.g.

#!/usr/local/bin/python -O

If you don't require docstrings in your compiled code you can use the -OO flag to generate byte-code without docstrings.

4) Use List Comprehensions
In general, list comprehensions are fast. For example, on my machine:

for i in range(100000):
    b = [i for i in a if i%2 == 0]

runs 25 percent faster than:

for i in range(100000):
    b = []
    for j in a:
        if j%2 == 0:
            b.append(j)

5) Use as much C as possible
Any available C functions are probably faster than what you would write in python. For example, if you are doing any linear algebra, it is highly recommended that you try using Numpy / SciPy if you have the option. If you have a function that needs to run really quickly, think about writing a C extension module (unless you want to keep your code pure python for portability).
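A small illustration of the same idea without leaving the standard library: in the standard CPython interpreter, built-ins such as sum() are implemented in C, so they usually beat an equivalent hand-written loop. The function names and test data below are made up for the example, and the timings will vary from machine to machine.

```python
import timeit

values = list(range(10000))


def python_sum(seq):
    # Hand-written loop: every iteration runs as Python byte-code.
    total = 0
    for v in seq:
        total += v
    return total


def builtin_sum(seq):
    # sum() runs its loop in C inside the interpreter.
    return sum(seq)


loop_time = timeit.timeit(lambda: python_sum(values), number=200)
c_time = timeit.timeit(lambda: builtin_sum(values), number=200)
print("python loop: %.4fs, built-in sum: %.4fs" % (loop_time, c_time))
```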

6) Use .join for string concatenation

For example, for a tuple of six strings

endstr = "".join(list(strings))

runs 10-20 percent faster on my machine than

endstr = ""
for s in strings:
    endstr = endstr + s


and

endstr = "".join(["a","b","c","d","e","f"])

runs 20 percent faster than

endstr = "a"
endstr = endstr + "b"
endstr = endstr + "c"
endstr = endstr + "d"
endstr = endstr + "e"
endstr = endstr + "f"

7) Store references to functions

Just as variable lookup is not free in python, neither is looking up functions. As an important example that you are probably using while you optimise your code, defining

loctime = time.time

allows you to call loctime() instead of time.time(), with an increase in speed of between 10 and 50 percent. This is especially important while optimising, even if you are only calling time a couple of times, since the varying look-up time can make a significant difference to your tests.

As previously mentioned, the Built-in functions are in the final scope searched, so this is useful for commonly used built-in functions.
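The same trick applied to a built-in and a method might look like the sketch below. The function names and the word list are invented for illustration; the point is only that len and result.append are each looked up once rather than on every pass through the loop.

```python
def lengths_global(words):
    # Looks up len (built-in scope) and result.append on every iteration.
    result = []
    for w in words:
        result.append(len(w))
    return result


def lengths_local(words):
    # Bind the built-in and the bound method to local names once, up front.
    result = []
    local_len = len
    append = result.append
    for w in words:
        append(local_len(w))
    return result


print(lengths_local(["a", "bb", "ccc"]))
```

Whether the saving is worth the loss in readability depends on how hot the loop is, so it is best checked with the profiler first.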

8) Inline your functions
If possible, try to inline your commonly called functions. This saves the look-up and call overhead on every iteration, and could make a significant difference. Of the two versions below, the first takes up to twice as long to run as the second.

slow:

def foo(a):
    return a*2

for i in range(1000000):
    b = foo(i)

faster:

for i in range(1000000):
    b = 2*i

9) Go Psyco
Try using Psyco - a JIT compiler for python - if it is available on your system. Even if it is not, it's worth trying to import the module if your code is going to be run on other machines that may have it. Psyco can speed up code by between 2 and 1000 times.
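A minimal sketch of the "try to import it anyway" approach is shown below. It assumes Psyco may or may not be installed on the machine running the code; HAVE_PSYCO is just an illustrative flag name, not part of Psyco.

```python
try:
    import psyco  # assumed to be installed only on some machines
    psyco.full()  # ask Psyco to compile as much of the program as it can
    HAVE_PSYCO = True
except ImportError:
    # Psyco is not available here: carry on with the plain interpreter.
    HAVE_PSYCO = False

print("Psyco enabled: %s" % HAVE_PSYCO)
```

Placed at the top of the main script, this costs nothing on machines without Psyco and switches it on wherever it is present.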

10) Go forth and multiply
This is good general advice no matter what language you are coding in (from ASM to JavaScript): try to avoid floating point division where possible. Floating point multiplication is typically much cheaper for the processor than division. Many optimising compilers will try to do this for you, but it is better to take control if you can, e.g.

b = i/2.

takes twice as much time to run (on my system) as

b = i * 0.5

11) Don't cast if you don't have to

If i is an int, remember (for example) that division by a float will automatically give a float result. This can be important: here

b = i/2.

runs 3 times faster than

b = float(i)/2.

due to the overhead of looking up and calling float().

I hope these have been useful, feel free to add any more in the comments.

Tuesday 21 August 2007

Viral Updates

Alrightey then here are some videos that have piqued my interest - some for their laugh-out-loud hilarity, others for being downright bad and shoddy. Sorry for my abysmal attempt at a rhyme, watch the first video in my list and you'll find out why.

The Good Un's:

Free NYC Rap:


Banned Condom Advert:


Fat People:


Talk, Talk:


And The Rotten Un's:


Fired Teacher Flips Out In Cafeteria:



Joanna.

Monday 20 August 2007

Handling timeouts with urllib2 in python

On a more technical note than usual, I recently came across a bug that took a very long time to solve. The problem was related to using urllib2 in Python to fetch a url.

Essentially, urllib2 does not have an inbuilt timeout set, so by default your code will stall on an opener.open() call if the server times out without telling you. This doesn't happen very often, so your code can run for several weeks before failing to respond.

What is more, there appears to be no direct way to set this timeout. However, there is a way around it (ref): set the default timeout for all sockets before creating a connection. In the same scope as urllib2, and before you begin using the module, enter the following lines:

import socket
socket.setdefaulttimeout(timeout)

(where timeout is the number of seconds you would like to wait for a timeout.)

This should set the timeout for the socket that will be opened by urllib2. Hope this is as useful to someone else as it was to me...
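Put together, a fuller sketch might look like the following. The 10 second value, the fetch() helper and its "return None on failure" behaviour are all assumptions for illustration; the import fallback is only there so the same sketch also runs on Python 3, where the opener API lives in urllib.request.

```python
import socket

try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same opener API

TIMEOUT_SECONDS = 10  # illustrative value, not a recommendation

# Must run before any connection is opened.
socket.setdefaulttimeout(TIMEOUT_SECONDS)


def fetch(url):
    # Returns the response body, or None if the request failed or timed out.
    opener = urllib2.build_opener()
    try:
        return opener.open(url).read()
    except (IOError, socket.error):
        # URLError and socket.timeout both end up here.
        return None
```

Note that the default applies to every socket created afterwards in the process, not just those opened by urllib2, so it is worth picking a value that suits everything else your program connects to.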

Tuesday 14 August 2007

More v-rank hotties

Here are a couple more videos that have been picked by our new algorithm:

Vertical Football:



And the video with the highest v-rank almost continually for the past fortnight,
Johnny Cash - Hurt: link

Monday 6 August 2007

Summaries, summaries

You're a busy person, we all are. So what happens when you want to select from a range of items with really long descriptions? Sometimes it would be great if you could just read a quick summary of the text.

Here is a quick example of a brief summary. I also provide the full text that it was generated from:

Full Text:
A short clip from Surrealist film "The Inconspicuous Furry Banana." Mia
Crackpot and Iona Frisbee meet a wizard, who tells them a recipe to
freeze time. But things go horribly wrong- and instead of freezing time,
they accidentally replace time with a tangerine. Then suddenly, they
begin to experience vivid hallucinations. One of them believes they are
an inconspicuous furry banana, and the other believes that they are half
a metre long with wings that can fly. Together they believe that their
mission is to meet the wizard master- and to save him from the imaginary
sprites... Narration by Chewy Benson.


256 characters
Then suddenly, they begin to experience vivid hallucinations. One of
them believes they are an inconspicuous furry banana, and the other
believes that they are half a metre long with wings that can fly.
Narration by Chewy Benson.


150 characters
Then suddenly, they begin to experience vivid hallucinations. Narration
by Chewy Benson.


Now if you had a list of ten items, would you prefer 10 full descriptions, or 10 of the shorter versions?

If you are interested, here is the video that this text was describing:

Thursday 2 August 2007

Interesting vid sites

Surfing the interweb for the best video sites, I came across these two gems:

VodPod - apart from a cool sounding name, these guys basically allow you to create your own aggregated feed and create your own personalised channel with content you like (based on tags)

WeShow - these guys seem to be scouring the net for the best video around. They have a fun daily video round up and also have a community thing going on. I first read about this in the London Paper a couple of days back.

Q

Sunday 29 July 2007

Waterskiing Monkeys

Our new "v-rank" rating for content has picked out a great new video that I thought I had to share with all of our readers, in case you don't keep an eye on the main viral content network rankings (shame on you, go over there and add it to your favourites!):

Friday 27 July 2007

VCN, meet Jo, Jo meet VCN

Hey all, Joanna is in the Bristol Office for the next few weeks and is charged with feeding the penguins as much as possible. She has already done good work on the environmental stuff when she was with us a few weeks back and is taking a look around for some other bits. Today she went through some past B3TA newsletters to see if there were any gems among their weird websites. In addition to this, she's working hard on the Ad database, looking in particular at adding bloggers. It's a slow process but she's found some good people that we can use in the future.

Next week she should have a proper dabble in seeding a campaign too. The week after that, she'll have my job...

Have a good weekend, y'all.

Kirk.

Wednesday 25 July 2007

V-rank lives, a selection of the highest ranked videos so far

As promised, here is a selection of some of the highest ranked content according to our new method for ranking the viral quality of content. These have been chosen from a small test set, but they are certainly all worth watching! If you like what you see, stay glued to the viral content network.

The frustrated video dater:




Where's Waldo?:
A strange mix between Blair Witch and what us Brits will remember as "Where's Wally" - can you spot him?





And finally, just to show that it is impossible to predict what people will love, here's Falcor:

Tuesday 24 July 2007

Progress so far

I realised that I am still to post on the blog, so I thought I would pop a quick blog post up to mention some of the exciting recent developments we are having with the work on the viral content network:
  1. Our new backend database has a few thousand entries in it, which means lots of interesting statistics are coming out of it, and there is lots of data for us to test our system.
  2. Within the next few days we will be testing our new method of measuring the viral quality of content, which I am currently dubbing the content's "v-rank". Stay tuned for a look at what our new method says are the best viral videos and games in our system.
  3. The feedback from Kho's phone spree has started to find its way into the system.
That's enough for this time. Stay tuned for a first look at how our statistical wizardry works on selecting really great content, or even better, subscribe to our RSS feed.

Monday 16 July 2007

Cold-Calling-Tastic

To get the ball rolling I have finally worked up the courage to do some cold calling for VCN! Woo!

I have spent the afternoon contacting a few publishers to see if they would be interested in what VCN has to offer. Judging from their websites it looks promising. The people on the other end are friendly, although they don't really know what VCN is as it's so new. Hopefully the publisher pack that Chris provided should help. (I've hit some dead ends as well.)

However, more work needs to be done! We really do need more publisher websites to contact, and if you can think of any good sites/blogs that would love VCN and the free content, contact me immediately!

- Kho.

Wednesday 11 July 2007

VCN Lives!


The crack VCN team met up today in Bristol to kick things off properly. This blog is part of our new post-stealth mode team blitz thingy.
Here's a photo of the team in full note making mode . . .