s = s..x
)
can be quite expensive.
This note describes an efficient way to create a string
piece by piece in Lua.
Suppose you are building a string piecemeal, for instance reading a file line by line. Your typical code would look like this:
-- WARNING: bad code ahead!! local buff = "" while 1 do local line = read() if line == nil then break end buff = buff..line.."\n" endDespite its innocent look, this code in Lua can cause a huge performance penalty for large files: For instance, it takes almost a minute to read a 350 Kbyte file.
Frequently this is not a problem.
For small strings, the above loop is OK.
To read a whole file, you can use the "*all"
option,
that reads it at once.
But sometimes you have no such simple solution.
Then, the only solution is a more efficient algorithm for your problem.
Here we show one (algorithm, not problem).
function newBuffer () return {n=0} -- 'n' counts number of elements in the stack end function addString (stack, s) tinsert(stack, s) -- push 's' into the top of the stack for i=stack.n-1, 1, -1 do if strlen(stack[i]) > strlen(stack[i+1]) then break end stack[i] = stack[i]..tremove(stack) end endTo get the final contents of the buffer, we just need to concatenate all strings down the bottom:
function toString (stack) for i=stack.n-1, 1, -1 do stack[i] = stack[i]..tremove(stack) end return stack[1] end
Using this new data structure, we can rewrite our program as follows:
local s = newBuffer() while 1 do local line = read() if line == nil then break end addString(s, line.."\n") end s = toString(s)This new program reduces our original time to read a 350 Kbyte file from 40 seconds to 0.5 seconds. (The call
read"*all"
is still faster,
finishing the job in 0.02 seconds.)
To understand what happens with the naive approach,
let us assume that we are in the middle of the reading;
buff
is already a string with 50 Kbytes,
and each line has 20 bytes.
After the assignment
buff = buff..line.."\n"
buff
is a new string with 50,020 bytes,
and the old string in now garbage.
After two loop cycles, buff
is a string with 50,040 bytes,
and there are two old strings making a
total of more than 100 Kbytes of garbage.
Therefore, Lua decides, quite correctly,
that it is a good time to run its garbage collector,
and so it frees those 100 Kbytes.
The problem is that this will happen every two cycles,
and so Lua will run its garbage collector
two thousand times before finishing the loop.
Even with all this work,
its memory usage will be around three times the file size.
To make things worse,
each concatenation must copy the whole string content
(50 Kbytes and growing) into the new string.
This problem is not exclusive of Lua: other languages with true garbage collection, and where strings are immutable objects, present a similar behavior (Java being the most famous example).
Our original loop did a "linear" approach to the problem, concatenating small strings one by one into the accumulator. The new algorithm avoids this, using a binary approach. It concatenates many small strings among them, and once in a while it concatenates the resulting large strings into larger ones.