博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
用x64汇编优化8位S盒置换(三)
阅读量:6111 次
发布时间:2019-06-21

本文共 3035 字,大约阅读时间需要 10 分钟。

hot3.png

在本文的上一章,S盒置换的C语言实现的最佳方案是将8位S盒扩展至16位,确定了S盒数组的大小与数据单元的大小,首先回顾相应的C代码:

#include 
static uint8_t s_sbox8[256] = { 0xd6,0x90,0xe9,0xfe,0xcc,0xe1,0x3d,0xb7,0x16,0xb6,0x14,0xc2,0x28,0xfb,0x2c,0x05, 0x2b,0x67,0x9a,0x76,0x2a,0xbe,0x04,0xc3,0xaa,0x44,0x13,0x26,0x49,0x86,0x06,0x99, 0x9c,0x42,0x50,0xf4,0x91,0xef,0x98,0x7a,0x33,0x54,0x0b,0x43,0xed,0xcf,0xac,0x62, 0xe4,0xb3,0x1c,0xa9,0xc9,0x08,0xe8,0x95,0x80,0xdf,0x94,0xfa,0x75,0x8f,0x3f,0xa6, 0x47,0x07,0xa7,0xfc,0xf3,0x73,0x17,0xba,0x83,0x59,0x3c,0x19,0xe6,0x85,0x4f,0xa8, 0x68,0x6b,0x81,0xb2,0x71,0x64,0xda,0x8b,0xf8,0xeb,0x0f,0x4b,0x70,0x56,0x9d,0x35, 0x1e,0x24,0x0e,0x5e,0x63,0x58,0xd1,0xa2,0x25,0x22,0x7c,0x3b,0x01,0x21,0x78,0x87, 0xd4,0x00,0x46,0x57,0x9f,0xd3,0x27,0x52,0x4c,0x36,0x02,0xe7,0xa0,0xc4,0xc8,0x9e, 0xea,0xbf,0x8a,0xd2,0x40,0xc7,0x38,0xb5,0xa3,0xf7,0xf2,0xce,0xf9,0x61,0x15,0xa1, 0xe0,0xae,0x5d,0xa4,0x9b,0x34,0x1a,0x55,0xad,0x93,0x32,0x30,0xf5,0x8c,0xb1,0xe3, 0x1d,0xf6,0xe2,0x2e,0x82,0x66,0xca,0x60,0xc0,0x29,0x23,0xab,0x0d,0x53,0x4e,0x6f, 0xd5,0xdb,0x37,0x45,0xde,0xfd,0x8e,0x2f,0x03,0xff,0x6a,0x72,0x6d,0x6c,0x5b,0x51, 0x8d,0x1b,0xaf,0x92,0xbb,0xdd,0xbc,0x7f,0x11,0xd9,0x5c,0x41,0x1f,0x10,0x5a,0xd8, 0x0a,0xc1,0x31,0x88,0xa5,0xcd,0x7b,0xbd,0x2d,0x74,0xd0,0x12,0xb8,0xe5,0xb4,0xb0, 0x89,0x69,0x97,0x4a,0x0c,0x96,0x77,0x7e,0x65,0xb9,0xf1,0x09,0xc5,0x6e,0xc6,0x84, 0x18,0xf0,0x7d,0xec,0x3a,0xdc,0x4d,0x20,0x79,0xee,0x5f,0x3e,0xd7,0xcb,0x39,0x48};static uint16_t s_sbox[65536];void sbox_init(void){ uint64_t i; for(i = 0; i < 65536; i++) { s_sbox[i] = s_sbox8[i & 0xff] ^ (s_sbox8[i >> 8] << 8); }}uint32_t sbox(uint32_t src){ uint32_t dst; dst = s_sbox[src & 0xffff] ^ (s_sbox[(src >> 16)] << 16); return(dst);}

为了测试结果更加精确,将性能测试程序的循环次数从1亿增加到2亿次:

#include 
#include
#include
#include
#include
uint32_t sbox(uint32_t src);void sbox_init(void);int main(int argc, char *argv[]){ uint32_t i, data; sbox_init(); data = 0x00010203; for(i = 0; i < 200000000; i++) { data = sbox(data); } printf("data = %08x\n", data); exit(EXIT_SUCCESS);}

以此为基础,用x64汇编人工优化函数sbox()的实现,首先看GCC编译得到的汇编代码:

movl    %edi, %eax        movzwl  %di, %edi        shrl    $16, %eax        movzwl  s_sbox(%rdi,%rdi), %edx        mov     %eax, %eax        movzwl  s_sbox(%rax,%rax), %eax        sall    $16, %eax        xorl    %edx, %eax        ret

然后按照个人习惯精简一下得到:

movzwq  %di, %rax        shrq    $16, %rdi        movzwq  s_sbox(%rax,%rax), %rax        movzwq  s_sbox(%rdi,%rdi), %rdx        shlq    $16, %rdx        xorq    %rdx, %rax        ret

汇编代码修改前性能测试结果:

[root@sxy-lenovo step4]# time ./test_sboxdata = 5adc1e2breal	0m0.900suser	0m0.899ssys	0m0.002s

汇编代码修改后性能测试结果:

[root@sxy-lenovo step8]# time ./test_sboxdata = 5adc1e2breal	0m0.805suser	0m0.805ssys	0m0.000s

还是有相当成效的。

 

转载于:https://my.oschina.net/safedead/blog/833469

你可能感兴趣的文章
并发和并行的区别
查看>>
Java增强的for循环和普通for循环对比
查看>>
颜色渐变的算法
查看>>
第四次作业
查看>>
getLocation需要在app.json中声明permission字段,解决办法
查看>>
xtrabackup工具
查看>>
.NET下WPF学习之Socket通信
查看>>
【转载】APK反破解之一:Android Java混淆(ProGuard)
查看>>
安装SQL SERVER 2005 SP2后, IIS无法辨认SVC文件
查看>>
log4j配置后行号乱码显示为?问号
查看>>
HRBUST 1478 最长公共子序列的最小字典序
查看>>
MySQL所有函数及操作符
查看>>
常用快捷键
查看>>
js 四舍五入函数 toFixed(),小数位数精度
查看>>
正则表达式快速入门
查看>>
perference
查看>>
log4php使用及配置
查看>>
dataGridView加行标识方法与制作
查看>>
bzoj1079[SCOI2008]着色方案
查看>>
ABI(Application Binary Interface)
查看>>