GNU bug report logs -
#14158
performance: Base64 -d is slow
Previous Next
To reply to this bug, email your comments to 14158 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#14158
; Package
coreutils
.
(Mon, 08 Apr 2013 07:39:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Ole Tange <ole <at> tange.dk>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Mon, 08 Apr 2013 07:39:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I was astonished to learn that:
perl -MMIME::Base64 -e 'while(read(STDIN,$buf,770*50)){print
decode_base64($buf)}'
is faster than:
base64 -d
My C-skills are quite limited, but it seems the current implementation
does a lot of computation and not table lookups.
Below are my thoughts on how this could be improved.
/Ole
= Designing a fast Base64 decoder =
By: Ole Tange <tange <at> gnu.org>
== Goal ==
We want something that is parallelizable since that can take advantage of:
* Multiple cores/CPUs
* Vector processing (MMX/SSE)
* Processors with pipelines
== Observations ==
In Base64 every 4 bytes represent 3 bytes of decoded information.
Binary input: aaaaaaaa bbbbbbbb cccccccc dddddddd
=>
Base64 index: 111111 222222 333333 444444
=>
Decoded: 11111122 22223333 33444444
Given input byte 1 and 2 we can determine output byte 1.
Given input byte 2 and 3 we can determine output byte 2.
Given input byte 3 and 4 we can determine output byte 3.
A way to make an algorithm parallelizable is by not making
computations based on previous results, but only base the computation
on the input.
== Idea ==
Make a lookup table for the output byte.
By making a 16-bit lookup table for the first 2 bytes, the middle 2
bytes and the last 2 bytes, conversion of 4 byte to 3 bytes is fast
and independent of previous computations.
The lookup table can also deal with the conversion to Base64 index -
thus skipping this step. It can even deal with all the non-conflicting
variations of which char codes for index 62 and 63.
out_byte1 = first[in_byte1][in_byte2];
out_byte2 = middle[in_byte2][in_byte3];
out_byte3 = last[in_byte3][in_byte4];
Each lookup table will at most be 64kbytes = 192kbytes total.
== Implementation ==
Here is the implementation in pseudo code:
byte compute_first_byte(int i, int j) {
/* Return the decoded value of the first byte, given the first 2
bytes are i and j */
}
byte compute_middle_byte(int i, int j) {
/* Return the decoded value of the middle byte, given the middle 2
bytes are i and j */
}
byte compute_last_byte(int i, int j) {
/* Return the decoded value of the last byte, given the last 2 bytes
are i and j */
}
/* Lookup tables */
static byte first[256][256];
static byte middle[256][256];
static byte last[256][256];
static bool already_computed = 0;
void decode(byte* in, int len, byte* out) {
/* in = pointer to input bytes with no garbage (\n and similar)
len = length of input (must in 4 byte blocks)
out = pointer to output buffer of length 3/4*len
*/
/* Initialize lookup tables */
if(! already_computed) {
parallel_for(int i = 0; i < 256; i++) {
parallel_for(int j = 0; j < 256; j++) {
first[i][j] = compute_first_byte(i,j);
middle[i][j] = compute_middle_byte(i,j);
last[i][j] = compute_last_byte(i,j);
}
}
already_computed = 1;
}
/* Lookup byte1+2, byte2+3, byte3+4 */
parallel_for(out_idx = 0, in_idx = 0; in_idx < len; in_idx += 4,
out_idx += 3) {
out[out_idx] = first[ in[in_idx] ][ in[in_idx+1] ];
out[out_idx+1] = middle[ in[in_idx+1] ][ in[in_idx+2] ];
out[out_idx+2] = last[ in[in_idx+2] ][ in[in_idx+3] ];
}
}
== Improvement ideas ==
Use SIMD to do multiple 4-byte blocks in parallel - maybe a line at a
time?
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#14158
; Package
coreutils
.
(Mon, 08 Apr 2013 20:02:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 14158 <at> debbugs.gnu.org (full text, mbox):
severity 14158 wishlist
tag 14158 + confirmed
thanks
Severity set to 'wishlist' from 'normal'
Request was from
Bob Proulx <bob <at> proulx.com>
to
control <at> debbugs.gnu.org
.
(Mon, 08 Apr 2013 22:01:01 GMT)
Full text and
rfc822 format available.
Added tag(s) confirmed.
Request was from
Bob Proulx <bob <at> proulx.com>
to
control <at> debbugs.gnu.org
.
(Mon, 08 Apr 2013 22:01:01 GMT)
Full text and
rfc822 format available.
Changed bug title to 'performance: Base64 -d is slow' from 'Base64 -d is slow'
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Fri, 19 Oct 2018 01:47:02 GMT)
Full text and
rfc822 format available.
This bug report was last modified 5 years and 188 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.