How much faster is the native implementation of th

Posted 2019-02-06 19:10

I'm providing hashes for sets of data in order to fingerprint the data and identify it by hash - this is the core use case for fast hashes like SHA1 and MD5.

In .NET, there is an option to go with the native or managed implementations of some of these hashes (the SHA variants, anyway). I'm looking for a managed MD5 implementation, and there doesn't appear to be one in the .NET Framework. I wondered whether the wrapped native CSP is faster anyway, and whether I should just use it, confident that it will cause no performance problems. The top answer to "Why is there no managed MD5 implementation in the .NET framework?" indicates that faster performance could be the reason a managed variant doesn't exist.

Is this true, and if so, how much faster is the native CSP?

1 answer
[This account has been banned]
Reply #2 · 2019-02-06 19:18

Unfortunately, the wrapped native CSP for MD5 - MD5CryptoServiceProvider - is significantly slower than a pure managed implementation. The belief that native code is unequivocally faster than managed code is a stubborn one, but in many cases the opposite is true. This is such a case, at least in head-to-head measurements.

Using the translated reference MD5 implementation by David Anson, I constructed a quick performance test (source) which aims to measure any large differences in performance between the two implementations. For small data arrays the differences are negligible, as expected, but at around 16 kB the native implementation starts to show a potentially significant delay, on the order of milliseconds. This might not seem like much, but it is orders of magnitude slower than the pure managed implementation. The difference persists as the size of the data being hashed increases, and at the largest tested data array (~250 MB) the difference in CPU time was about 8.5 seconds. Considering that a hash like this is often used to fingerprint very large files, this extra delay would become noticeable, even against the often much larger delays from I/O.
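The measurement approach described above can be sketched roughly as follows. This is a minimal illustration in Python, not the C# test linked in the answer: the function names `time_hash`, `compare`, `md5_one_shot` and `md5_chunked` are hypothetical, and the two MD5 callables are only stand-ins for the two implementations under comparison. In .NET, the equivalent would be to time `ComputeHash` on `MD5CryptoServiceProvider` and on the managed port, each with the same buffers.

```python
import hashlib
import os
import time


def time_hash(hash_fn, data, repeats=3):
    """Return the best CPU time in seconds for one call of hash_fn(data)."""
    hash_fn(data)  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.process_time()  # CPU time, as in the answer's figures
        hash_fn(data)
        best = min(best, time.process_time() - start)
    return best


def compare(implementations, sizes, repeats=3):
    """Time every implementation on the same random buffer for each size."""
    results = {}
    for size in sizes:
        data = os.urandom(size)  # identical input for all implementations
        results[size] = {
            name: time_hash(fn, data, repeats)
            for name, fn in implementations.items()
        }
    return results


def md5_one_shot(data):
    # Stand-in implementation A: hash the whole buffer in one call.
    return hashlib.md5(data).digest()


def md5_chunked(data, chunk=4096):
    # Stand-in implementation B (assumption): feed the buffer in small chunks.
    h = hashlib.md5()
    for i in range(0, len(data), chunk):
        h.update(data[i:i + chunk])
    return h.digest()


if __name__ == "__main__":
    impls = {"one_shot": md5_one_shot, "chunked": md5_chunked}
    sizes = [2 ** k for k in range(10, 25, 2)]  # 1 KiB up to 16 MiB
    for size, row in compare(impls, sizes).items():
        print(size, {name: f"{t:.6f}s" for name, t in row.items()})
```

Doubling or quadrupling the buffer size at each step gives evenly spaced points on a logarithmic axis, which is what makes the two curves easy to compare. The CPU clock can be coarse on some platforms, so timings for the smallest buffers may read as zero.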

It's not entirely clear where the delay comes from, since a pure native test was not performed (one which would dispense with the wrapping of a CSP and its consumption in managed code). However, the graphs have nearly identical shapes on the log scale, which means the ratio between the two timings stays roughly constant across data sizes. It would therefore appear that the managed and native implementations have the same intrinsic performance, but that the native code is "shifted" down in performance, likely due to the cost of the interop between native and managed code at runtime. This performance difference between wrapped native CSPs and pure managed implementations has also been reproduced and documented by other investigators.

In addition to answering the question "how much faster is the native implementation" in this particular case, I hope this evidence serves to prompt more reflection and investigation when the question of native vs. managed arises, breaking the long-standing and pernicious reaction to similar questions that native code is always faster, and thus, somehow, better. Managed code is clearly very fast, even in this performance-sensitive domain of bulk data hashing.

[Figure: MD5 Hash Computation Time]
[Figure: MD5 Hash Computation Time (Logarithmic)]
