Lua Performance Ricing.pdf

(149 KB) Pobierz
2
Lua Performance Tips
Roberto Ierusalimschy
In Lua, as in any other programming language, we should always follow the two
maxims of program optimization:
Rule #1:
Don’t do it.
Rule #2:
Don’t do it yet.
(for experts only)
Those rules are particularly relevant when programming in Lua. Lua is famous
for its performance, and it deserves its reputation among scripting languages.
Nevertheless, we all know that performance is a key ingredient of program-
ming. It is not by chance that problems with exponential time complexity are
called
intractable.
A too late result is a useless result. So, every good program-
mer should always balance the costs from spending resources to optimize a piece
of code against the gains of saving resources when running that code.
The first question regarding optimization a good programmer always asks is:
“Does the program needs to be optimized?” If the answer is positive (but only
then), the second question should be: “Where?”
To answer both questions we need some instrumentation. We should not
try to optimize software without proper measurements. The difference between
experienced programmers and novices is
not
that experienced programmers are
better at spotting where a program may be wasting its time: The difference is
that experienced programmers know they are not good at that task.
A few years ago, Noemi Rodriguez and I developed a prototype for a CORBA
ORB (Object Request Broker) in Lua, which later evolved into OiL (Orb
in
Lua).
As a first prototype, the implementation aimed at simplicity. To avoid
Copyright c 2008 by Roberto Ierusalimschy. Used by permission.
15
16
2
·
Lua Performance Tips
the need for extra C libraries, the prototype serialized integers using a few
arithmetic operations to isolate each byte (conversion to base 256). It did not
support floating-point values. Because CORBA handles strings as sequences of
characters, our ORB first converted Lua strings into a sequence (that is, a Lua
table) of characters and then handled the result like any other sequence.
When we finished that first prototype, we compared its performance against
a professional ORB implemented in C++. We expected our ORB to be somewhat
slower, as it was implemented in Lua, but we were disappointed by how much
slower it was. At first, we just laid the blame on Lua. Later, we suspected
that the culprit could be all those operations needed to serialize each number.
So, we decided to run the program under a profiler. We used a very simple
profiler, not unlike the one described in Chapter 23 of
Programming in Lua.
The profiler results shocked us. Against our gut feelings, the serialization of
numbers had no measurable impact on the performance, among other reasons
because there were not that many numbers to serialize. The serialization of
strings, however, was responsible for a huge part of the total time. Practically
every CORBA message has several strings, even when we are not manipulating
strings explicitly: object references, method names, and some other internal
values are all coded as strings. And the serialization of each string was an
expensive operation, because it needed to create a new table, fill it with each
individual character, and then serialize the resulting sequence, which involved
serializing each character one by one. Once we reimplemented the serialization
of strings as a special case (instead of using the generic code for sequences), we
got a respectable speed up. With just a few extra lines of code, the performance
of your implementation was comparable to the C++ implementation.
1
So, we should always measure when optimizing a program for performance.
Measure before, to know where to optimize. And measure after, to know whether
the “optimization” actually improved our code.
Once you decide that you really must optimize your Lua code, this text may
help you about how to optimize it, mainly by showing what is slow and what is
fast in Lua. I will not discuss here general techniques for optimization, such
as better algorithms. Of course you should know and use those techniques,
but there are several other places where you can learn them. In this article
I will discuss only techniques that are particular to Lua. Along the article, I will
constantly measure the time and space performance of small programs. Unless
stated otherwise, I do all measures on a Pentium IV 2.9 GHz with 1 GB of main
memory, running Ubuntu 7.10, Lua 5.1.1. Frequently I give actual measures
(e.g., 7 seconds), but what is relevant is the relationship between different
measures. When I say that a program is “X% times faster” than another it
means that it runs in
X%
less time. (A program 100% faster would take no time
to run.) When I say that a program is “X% times slower” than another I mean
that the other is
X%
faster. (A program 50% slower means that it takes twice
the time.)
1
Of
course our implementation was still slower, but not by an order of magnitude.
17
Basic facts
Before running any code, Lua translates (precompiles) the source into an in-
ternal format. This format is a sequence of instructions for a virtual machine,
similar to machine code for a real CPU. This internal format is then interpreted
by C code that is essentially a
while
loop with a large
switch
inside, one case for
each instruction.
Perhaps you have already read somewhere that, since version 5.0, Lua uses
a register-based virtual machine. The “registers” of this virtual machine do not
correspond to real registers in the CPU, because this correspondence would be
not portable and quite limited in the number of registers available. Instead,
Lua uses a stack (implemented as an array plus some indices) to accommodate
its registers. Each active function has an
activation record,
which is a stack
slice wherein the function stores its registers. So, each function has its own
registers
2
. Each function may use up to 250 registers, because each instruction
has only 8 bits to refer to a register.
Given that large number of registers, the Lua precompiler is able to store all
local variables in registers. The result is that access to local variables is very
fast in Lua. For instance, if
a
and
b
are local variables, a Lua statement like
a = a + b
generates one single instruction:
ADD 0 0 1
(assuming that
a
and
b
are in registers 0 and 1, respectively). For comparison, if both
a
and
b
were
globals, the code for that addition would be like this:
GETGLOBAL
GETGLOBAL
ADD
SETGLOBAL
0
1
0
0
0
1
0 1
0
; a
; b
; a
So, it is easy to justify one of the most important rules to improve the perfor-
mance of Lua programs:
use locals!
If you need to squeeze performance out of your program, there are several
places where you can use locals besides the obvious ones. For instance, if you
call a function within a long loop, you can assign the function to a local variable.
For instance, the code
for i = 1, 1000000 do
local x = math.sin(i)
end
runs 30% slower than this one:
local sin = math.sin
for i = 1, 1000000 do
local x = sin(i)
end
2
This
is similar to the
register windows
found in some CPUs.
18
2
·
Lua Performance Tips
Access to external locals (that is, variables that are local to an enclosing
function) is not as fast as access to local variables, but it is still faster than
access to globals. Consider the next fragment:
function foo (x)
for i = 1, 1000000 do
x = x + math.sin(i)
end
return x
end
print(foo(10))
We can optimize it by declaring
sin
once, outside function
foo:
local sin = math.sin
function foo (x)
for i = 1, 1000000 do
x = x + sin(i)
end
return x
end
print(foo(10))
This second code runs 30% faster than the original one.
Although the Lua compiler is quite efficient when compared with compilers
for other languages, compilation is a heavy task. So, you should avoid compiling
code in your program (e.g., function
loadstring)
whenever possible. Unless you
must run code that is really dynamic, such as code entered by an end user, you
seldom need to compile dynamic code.
As an example, consider the next code, which creates a table with functions
to return constant values from 1 to 100000:
local lim = 10000
local a = {}
for i = 1, lim do
a[i] = loadstring(string.format("return %d", i))
end
print(a[10]())
--> 10
This code runs in 1.4 seconds.
With closures, we avoid the dynamic compilation. The next code creates the
1
same 100000 functions in
10
of the time (0.14 seconds):
function fk (k)
return function () return k end
end
19
local lim = 100000
local a = {}
for i = 1, lim do a[i] = fk(i)
print(a[10]())
--> 10
end
About tables
Usually, you do not need to know anything about how Lua implement tables to
use them. Actually, Lua goes to great lengths to make sure that implementation
details do not surface to the user. However, these details show themselves
through the performance of table operations. So, to optimize programs that use
tables (that is, practically any Lua program), it is good to know a little about
how Lua implements tables.
The implementation of tables in Lua involves some clever algorithms. Every
table in Lua has two parts: the
array
part and the
hash
part. The array part
stores entries with integer keys in the range 1 to
n,
for some particular
n.
(We
will discuss how this
n
is computed in a moment.) All other entries (including
integer keys outside that range) go to the hash part.
As the name implies, the hash part uses a hash algorithm to store and find its
keys. It uses what is called an
open address
table, which means that all entries
are stored in the hash array itself. A hash function gives the primary index of a
key; if there is a collision (that is, if two keys are hashed to the same position),
the keys are linked in a list, with each element occupying one array entry.
When Lua needs to insert a new key into a table and the hash array is full,
Lua does a
rehash.
The first step in the rehash is to decide the sizes of the new
array part and the new hash part. So, Lua traverses all entries, counting and
classifying them, and then chooses as the size of the array part the largest power
of 2 such that more than half the elements of the array part are filled. The hash
size is then the smallest power of 2 that can accommodate all the remaining
entries (that is, those that did not fit into the array part).
When Lua creates an empty table, both parts have size 0 and, therefore,
there are no arrays allocated for them. Let us see what happens when we run
the following code:
local a = {}
for i = 1, 3 do
a[i] = true
end
It starts by creating an empty table
a.
In the first loop iteration, the assignment
a[1]=true
triggers a rehash; Lua then sets the size of the array part of the table
to 1 and keeps the hash part empty. In the second loop iteration, the assignment
a[2]=true
triggers another rehash, so that now the array part of the table has
size 2. Finally, the third iteration triggers yet another rehash, growing the size
of the array part to 4.
Zgłoś jeśli naruszono regulamin