jpype Performance Test

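Scenario

  1. The data files are binary stream files and have to be read with C.
  2. The records are run through the TRW algorithm, which is implemented in Java, to produce the results.
  3. Python glues the intermediate steps together.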

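Implementation

There are roughly three options:

  1. read_a_rec_per_time(): C reads one record at a time, and Python passes it to Java through jpype for judgment.
  2. read_a_file_per_time():
    • C reads a whole file in one pass and saves the result to another file; Python then uses jpype to call the Java class, which reads that file and makes the judgment.
    • C reads a whole file in one pass and saves the result to another file; Python then runs the Java class through a shell command to make the judgment.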
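For the shell-based variant, the Java class can be launched as a child process. Below is a minimal sketch in Python 3 (the original test ran on Python 2.7), assuming the class is named AttackerJudge and takes the data file name as its only argument. The helper names build_java_command and run_judge and the class_path parameter are hypothetical.

```python
import subprocess


def build_java_command(class_name, data_file, class_path=None):
    """Build the argument list for running a Java class on one data file."""
    cmd = ["java"]
    if class_path is not None:
        # -cp tells the JVM where to find the compiled class (assumed layout)
        cmd += ["-cp", class_path]
    cmd += [class_name, data_file]
    return cmd


def run_judge(data_file, class_name="AttackerJudge", class_path=None):
    """Run the Java class in a fresh JVM and return its exit code."""
    # Passing a list (no shell=True) avoids quoting problems with file names.
    return subprocess.call(build_java_command(class_name, data_file, class_path))
```

Each call to run_judge() starts a new JVM, so the JVM startup cost is paid once per file.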

Testing

1. Test environment

Ubuntu 14.04 Desktop x64, Python 2.7.6, jpype 0.5.4.2

2. Test tools

  • the shell command time
  • line_profiler

Installation:

# pip install line_profiler
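
line_profiler reports per-line timings for functions marked with @profile when the script is run through its kernprof launcher (kernprof -l -v script.py). Below is a minimal sketch in Python 3. The no-op fallback decorator and the function sum_squares are illustrative assumptions and are not part of the original test script.

```python
try:
    profile  # kernprof -l injects this name into builtins
except NameError:
    def profile(func):
        """No-op fallback so the script also runs without kernprof."""
        return func


@profile
def sum_squares(n):
    """Toy workload: sum of i*i for i in range(n)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_squares(1000))
```

With the fallback in place, the same file can be timed with the shell command time and profiled with kernprof without editing it in between.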

3. Test cases

Three files of different sizes were used for testing: 13M, 30M, and 80M.

Results

Timings for the shell approach:

                 13M                      30M                      80M
time             usr: 1.69s sys: 0.17s    usr: 2.33s sys: 0.34s    usr: 7.01s sys: 0.85s
line_profiler    0.9234s                  1.5148s                  4.1259s

Timings for the jpype approach:

                 13M                      30M                      80M
time             usr: 1.60s sys: 0.18s    usr: 2.14s sys: 0.32s    usr: 8.99s sys: 0.90s
line_profiler    0.8943s                  1.5208s                  5.5045s
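
To compare the two approaches at each file size, the line_profiler totals from the two tables can be divided. The small Python 3 sketch below only restates the figures above; the variable names are illustrative.

```python
# line_profiler totals in seconds, copied from the two tables above
shell_s = {"13M": 0.9234, "30M": 1.5148, "80M": 4.1259}
jpype_s = {"13M": 0.8943, "30M": 1.5208, "80M": 5.5045}

# A ratio above 1 means the jpype approach took longer than the shell approach.
ratios = {size: jpype_s[size] / shell_s[size] for size in shell_s}

for size in ("13M", "30M", "80M"):
    print("%s: jpype/shell = %.2f" % (size, ratios[size]))
```

The ratios come out at roughly 0.97, 1.00, and 1.33, so the two approaches are close for the smaller files and diverge at 80M.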

Profile of the shell approach:

Total time: 4.12598 s
File: flow_to_trw.py
Function: read_a_file_by_shell at line 107

Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
   107                                             @profile
   108                                             def read_a_file_by_shell():
   109
   110         1           57       57.0      0.0      target_filename_list = get_flow_file_name()
   111
   112         1           99       99.0      0.0      c_so = cdll.LoadLibrary(cLibPath+cLibName)
   113
   114         2           18        9.0      0.0      for target_filename in target_filename_list:
   115         1            4        4.0      0.0          target_filename_c = c_char_p(target_filename)
   116         1           30       30.0      0.0          fd = c_so.open_file(target_filename_c)
   117
   118         1      1265198  1265198.0     30.7          ret = c_so.read_a_file_per_time(fd, target_filename_c)
   119         1            3        3.0      0.0          if ret == 0:
   120         1            3        3.0      0.0              res_file_name = ResFilePath + target_filename + ResFilePostfix
   121         1           46       46.0      0.0              c_so.close_file(fd)
   122                                                         # print "reading complete"
   123                                                         # print "result file %s" % res_file_name
   124         1      2860518  2860518.0     69.3              val = os.system("java "+"AttackerJudge"+" "+target_filename)
   125                                                         # print val
   126                                                     else:
   127                                                         print "reading error"

Profile of the jpype approach:

Timer unit: 1e-06 s

Total time: 5.50454 s
File: flow_to_trw.py
Function: read_a_file_per_time at line 77

Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
    77                                             @profile
    78                                             def read_a_file_per_time():
    79                                                 """
    80                                                 fd_ret: file descriptor return from cLib `int open_file(const char *file_name)` function
    81                                                 """
    82         1         1855     1855.0      0.0      jvmPath = jpype.getDefaultJVMPath();
    83         1       862435   862435.0     15.7      jpype.startJVM(jvmPath, "-ea", "-Djava.class.path="+javaClassPath)
    84
    85         1        10590    10590.0      0.2      javaGenFile = jpype.JClass('AttackerJudge')
    86         1           99       99.0      0.0      javaClass = javaGenFile()
    87
    88         1           92       92.0      0.0      target_filename_list = get_flow_file_name()
    89
    90         1          152      152.0      0.0      c_so = cdll.LoadLibrary(cLibPath+cLibName)
    91
    92         2            5        2.5      0.0      for target_filename in target_filename_list:
    93         1            5        5.0      0.0          target_filename_c = c_char_p(target_filename)
    94         1           46       46.0      0.0          fd = c_so.open_file(target_filename_c)
    95
    96         1      1315676  1315676.0     23.9          ret = c_so.read_a_file_per_time(fd, target_filename_c)
    97         1            3        3.0      0.0          if ret == 0:
    98         1            3        3.0      0.0              res_file_name = ResFilePath + target_filename + ResFilePostfix
    99         1           48       48.0      0.0              c_so.close_file(fd)
   100                                                         # print "reading complete"
   101                                                         # print "result file %s" % res_file_name
   102         1      1787918  1787918.0     32.5              javaClass.judge_file(target_filename+ResFilePostfix)
   103         1      1525618  1525618.0     27.7              javaClass.writeResult(target_filename)
   104                                                     else:
   105                                                         print "reading error"

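Analysis

  1. As the files get larger, the shell approach becomes faster than the jpype approach: the two are about even at 13M and 30M, but at 80M the shell approach takes 4.13s against 5.50s for jpype.
  2. Part of the jpype time goes to the JVM lifecycle calls startJVM() and shutdownJVM(). In the 80M profile, startJVM() alone takes about 0.86s (15.7% of the total), while shutdownJVM() is not part of the profiled function.
  3. Calling the Java class through jpype is no more efficient than running it through the shell: at 80M, judge_file() and writeResult() together take about 3.31s through jpype, while the whole java process takes 2.86s through os.system().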
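
The 80M profiles can be grouped into reading in C, JVM startup, and Java work to see where the difference comes from. The Python 3 sketch below uses the Time values (in microseconds) from the two profile listings. The grouping and the names are illustrative assumptions. In the shell run, JVM startup cannot be separated from the Java work because both happen inside os.system().

```python
US = 1e-6  # line_profiler timer unit: microseconds to seconds

shell = {
    "c_read": 1265198 * US,               # c_so.read_a_file_per_time
    "java_incl_jvm_start": 2860518 * US,  # os.system("java ...")
}
jpype_run = {
    "c_read": 1315676 * US,                  # c_so.read_a_file_per_time
    "jvm_start": 862435 * US,                # jpype.startJVM
    "java_calls": (1787918 + 1525618) * US,  # judge_file + writeResult
}

# Total time spent on the Java side in each approach
java_shell = shell["java_incl_jvm_start"]
java_jpype = jpype_run["jvm_start"] + jpype_run["java_calls"]

print("Java side, shell: %.2f s" % java_shell)
print("Java side, jpype: %.2f s" % java_jpype)
print("C read differs by %.2f s" % abs(jpype_run["c_read"] - shell["c_read"]))
```

On these numbers the C read takes nearly the same time in both runs, so the gap at 80M comes from the Java side.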