How can you convert a byte array to a hexadecimal string, and vice versa?
And for inserting into an SQL string (if you're not using command parameters):
Either:
or:
There are even more variants of doing it, for example here.
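For reference, here is a minimal sketch of the two encoding approaches mentioned above: appending each byte with AppendFormat, and BitConverter.ToString with the dashes removed. It also includes a hypothetical ToSqlBinaryLiteral helper for the SQL case, which assumes a T-SQL-style 0x binary literal. This is an illustration, not the original answer's code. Note that the "x2" format yields lower-case digits, while BitConverter yields upper case.

```csharp
using System;
using System.Text;

public static class HexEncodeSketch
{
    // Variant 1: StringBuilder sized for two chars per byte; "x2" gives lower-case digits.
    public static string ViaStringBuilder(byte[] bytes)
    {
        var hex = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
            hex.AppendFormat("{0:x2}", b);
        return hex.ToString();
    }

    // Variant 2: BitConverter.ToString yields "01-AB-FF"; remove the dashes.
    public static string ViaBitConverter(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // Hypothetical helper: a T-SQL-style binary literal such as 0x01ABFF.
    // Only meant for when command parameters are not an option.
    public static string ToSqlBinaryLiteral(byte[] bytes)
    {
        return "0x" + ViaBitConverter(bytes);
    }
}
```

Command parameters remain the better choice whenever they are available. The literal helper only covers the case where they are not.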
The reverse conversion would go like this:
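Here is a minimal sketch of the reverse conversion using Substring and Convert.ToByte. It assumes an even-length input that contains only hex digits (either case), with no 0x prefix and no separators. The class name is hypothetical.

```csharp
using System;

public static class HexDecodeSketch
{
    // Assumes an even-length string of hex digits (either case), no prefix or separators.
    public static byte[] ViaSubstring(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even length.", nameof(hex));

        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < hex.Length; i += 2)
            // Each two-digit hex numeral encodes one byte; base 16 accepts both cases.
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        return bytes;
    }
}
```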
Using Substring is the best option in combination with Convert.ToByte. See this answer for more information. If you need better performance, you must avoid Convert.ToByte before you can drop Substring.

This is an answer to revision 4 of Tomalak's highly popular answer (and subsequent edits).
I'll make the case that this edit is wrong, and explain why it could be reverted. Along the way, you might learn a thing or two about some internals, and see yet another example of what premature optimization really is and how it can bite you.
tl;dr: Just use Convert.ToByte and String.Substring if you're in a hurry ("Original code" below). It's the best combination if you don't want to re-implement Convert.ToByte. Use something more advanced (see other answers) that doesn't use Convert.ToByte if you need performance. Do not use anything other than String.Substring in combination with Convert.ToByte, unless someone has something interesting to say about this in the comments of this answer.

Warning: This answer may become obsolete if a Convert.ToByte(char[], Int32) overload is implemented in the framework. This is unlikely to happen soon.

As a general rule, I don't much like to say "don't optimize prematurely", because nobody knows when "premature" is. The only thing you must consider when deciding whether to optimize or not is: "Do I have the time and resources to investigate optimization approaches properly?". If you don't, then it's too soon. Wait until your project is more mature or until you need the performance (if there is a real need, then you will make the time). In the meantime, do the simplest thing that could possibly work instead.
Original code:
Revision 4:
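The revision's code is not shown here. As a rough sketch, reconstructed from the description in this answer, it might look like the following. The description calls for a StringReader, two Read calls per byte, a new two-char array on every iteration, and then a new string for Convert.ToByte. The class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class StringReaderDecodeSketch
{
    // Sketch of the StringReader-based approach as described in this answer,
    // not the revision's verbatim code. Assumes an even-length hex string.
    public static byte[] ViaStringReader(string hex)
    {
        byte[] bytes = new byte[hex.Length / 2];
        using (var reader = new StringReader(hex))
        {
            for (int i = 0; i < bytes.Length; i++)
            {
                // A fresh two-char array on every iteration.
                char[] pair = new char[2];
                // Two separate Read() calls, one char each.
                pair[0] = (char)reader.Read();
                pair[1] = (char)reader.Read();
                // Still allocates a string, because Convert.ToByte(String, Int32) needs one.
                bytes[i] = Convert.ToByte(new string(pair), 16);
            }
        }
        return bytes;
    }
}
```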
The revision avoids String.Substring and uses a StringReader instead. The given reason is:

Well, looking at the reference code for String.Substring, it's clearly "single-pass" already. And why shouldn't it be? It operates on raw chars (UTF-16 code units), not on surrogate pairs.

It does allocate a new string, however, but you need to allocate one to pass to Convert.ToByte anyway. Furthermore, the solution provided in the revision allocates yet another object on every iteration (the two-char array). You can safely move that allocation outside the loop and reuse the array.

Each hexadecimal numeral represents a single octet using two digits (symbols).

But then, why call StringReader.Read twice? Just call its second overload, ask it to read two characters into the two-char array at once, and halve the number of calls.

What you're left with is a string reader whose only added "value" is a parallel index (internal _pos) which you could have declared yourself (as j, for example), a redundant length variable (internal _length), and a redundant reference to the input string (internal _s). In other words, it's useless.

If you wonder how Read "reads", just look at the code: all it does is call String.CopyTo on the input string. The rest is just book-keeping overhead to maintain values we don't need.

So remove the string reader and call CopyTo yourself. It's simpler, clearer, and more efficient.

Do you really need a j index that increments in steps of two, parallel to i? Of course not. Just multiply i by two (which the compiler should be able to optimize to an addition).

What does the solution look like now? Exactly like it was at the beginning. The only difference is that, instead of using String.Substring to allocate the string and copy the data into it, you copy the hexadecimal numerals into an intermediary array. You then allocate the string yourself and copy the data a second time, from the array into the string (when you pass the array to the string constructor). The second copy might be optimized out if the string is already in the intern pool, but then String.Substring will also be able to avoid it in these cases.

In fact, if you look at String.Substring again, you can see that it uses low-level internal knowledge of how strings are constructed to allocate the string faster than you normally could. It also inlines the same code used by CopyTo directly to avoid the call overhead.

String.Substring

Manual method
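Here is a sketch of where the walkthrough ends up, assuming an even-length hex string. It uses no StringReader and no parallel j index, just one reusable two-char buffer filled with String.CopyTo. It still creates a new string per byte, because Convert.ToByte(String, Int32) needs one. The names are hypothetical. The point is to show the two copies the text describes, not to beat Substring.

```csharp
using System;

public static class CopyToDecodeSketch
{
    // Assumes an even-length hex string.
    public static byte[] ViaCopyTo(string hex)
    {
        byte[] bytes = new byte[hex.Length / 2];
        char[] pair = new char[2];              // allocated once, outside the loop
        for (int i = 0; i < bytes.Length; i++)
        {
            hex.CopyTo(i * 2, pair, 0, 2);      // first copy: string -> array (i * 2 replaces j)
            // Second copy: array -> new string, which Convert.ToByte requires.
            bytes[i] = Convert.ToByte(new string(pair), 16);
        }
        return bytes;
    }
}
```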
Conclusion? If you want to use Convert.ToByte(String, Int32) (because you don't want to re-implement that functionality yourself), there doesn't seem to be a way to beat String.Substring. All you do is run in circles, re-inventing the wheel (only with sub-optimal materials).

Note that using Convert.ToByte and String.Substring is a perfectly valid choice if you don't need extreme performance. Remember: only opt for an alternative if you have the time and resources to investigate how it works properly.

If there were a Convert.ToByte(char[], Int32), things would be different, of course. It would then be possible to do what I described above and avoid String completely.

I suspect that people who report better performance by "avoiding String.Substring" also avoid Convert.ToByte(String, Int32), which you should really be doing anyway if you need the performance. Look at the countless other answers to discover all the different approaches to doing that.

Disclaimer: I haven't decompiled the latest version of the framework to verify that the reference source is up to date. I assume it is.
Now, it all sounds good and logical, hopefully even obvious if you've managed to get so far. But is it true?
Yes!
Props to Partridge for the bench framework; it's easy to hack. The input used is the following SHA-1 hash repeated 5000 times, making a 200,000-character string that decodes to 100,000 bytes.
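The hash itself is not reproduced here. As a sketch of how such an input can be built, the placeholder below is the SHA-1 of the empty string, but any 40-digit hex string works the same way. A SHA-1 hash is 20 bytes, or 40 hex digits, so 5000 repetitions give 200,000 characters that decode to 100,000 bytes.

```csharp
using System;
using System.Linq;

public static class BenchInputSketch
{
    // Placeholder hash: SHA-1 of the empty string (40 hex digits = 20 bytes).
    // The hash used in the original benchmark is not reproduced here.
    public const string Sha1Hex = "da39a3ee5e6b4b0d3255bfef95601890afd80709";

    // 40 chars * 5000 repetitions = 200,000 hex chars, i.e. 100,000 bytes once decoded.
    public static string BuildInput(int repetitions = 5000)
    {
        return string.Concat(Enumerable.Repeat(Sha1Hex, repetitions));
    }
}
```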
Have fun! (But optimize with moderation.)
Performance Analysis
Note: new leader as of 2015-08-20.
I ran each of the various conversion methods through some crude Stopwatch performance testing: a run with a random sentence (n=61, 1000 iterations) and a run with a Project Gutenberg text (n=1,238,957, 150 iterations). Here are the results, roughly from fastest to slowest. All measurements are in ticks (10,000 ticks = 1 ms), and all relative notes are compared to the [slowest] StringBuilder implementation. For the code used, see below or the test framework repo, where I now maintain the code for running this.

Disclaimer
WARNING: Do not rely on these stats for anything concrete; they are simply a sample run of sample data. If you really need top-notch performance, please test these methods in an environment representative of your production needs with data representative of what you will use.
Results
- unsafe (via CodesInChaos) (added to test repo by airbreather)
- BitConverter (via Tomalak)
- {SoapHexBinary}.ToString (via Mykroft)
- {byte}.ToString("X2") (using foreach) (derived from Will Dean's answer)
- {byte}.ToString("X2") (using {IEnumerable}.Aggregate, requires System.Linq) (via Mark)
- Array.ConvertAll (using string.Join) (via Will Dean)
- Array.ConvertAll (using string.Concat, requires .NET 4.0) (via Will Dean)
- {StringBuilder}.AppendFormat (using foreach) (via Tomalak)
- {StringBuilder}.AppendFormat (using {IEnumerable}.Aggregate, requires System.Linq) (derived from Tomalak's answer)

Lookup tables have taken the lead over byte manipulation. Basically, these methods precompute what any given nibble or byte will be in hex. Then, as you rip through the data, you simply look up the next portion to see what hex string it would be. That value is then added to the resulting string output in some fashion. For a long time, byte manipulation, which some developers may find harder to read, was the top-performing approach.
Your best bet is still going to be finding some representative data and trying it out in a production-like environment. If you have different memory constraints, you may prefer a method with fewer allocations to one that would be faster but consume more memory.
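If you want to try candidates on your own data, a crude Stopwatch harness in the spirit of these tests might look like the sketch below. It treats each method as a Func<byte[], string>, the shape the test repo uses, and makes one warm-up call so that JIT compilation is not timed. It reports TimeSpan ticks (10,000 per millisecond), which are not the same unit as Stopwatch.ElapsedTicks. The class name and the two sample candidates are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public static class CrudeHarnessSketch
{
    // Runs each candidate `iterations` times on the same input and returns
    // the elapsed TimeSpan ticks per candidate name.
    public static Dictionary<string, long> Run(
        IDictionary<string, Func<byte[], string>> candidates, byte[] input, int iterations)
    {
        var results = new Dictionary<string, long>();
        foreach (var candidate in candidates)
        {
            candidate.Value(input); // warm-up call so JIT time is not measured
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                candidate.Value(input);
            sw.Stop();
            results[candidate.Key] = sw.Elapsed.Ticks;
        }
        return results;
    }

    // Two encoders from the results list, as example candidates (both upper case).
    public static Dictionary<string, Func<byte[], string>> SampleCandidates()
    {
        return new Dictionary<string, Func<byte[], string>>
        {
            ["BitConverter"] = b => BitConverter.ToString(b).Replace("-", ""),
            ["StringBuilder.AppendFormat"] = b =>
            {
                var sb = new StringBuilder(b.Length * 2);
                foreach (byte x in b) sb.AppendFormat("{0:X2}", x);
                return sb.ToString();
            },
        };
    }
}
```

Before comparing times, check that all candidates return identical strings. A fast method that produces different output is not a fair competitor.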
Testing Code
Feel free to play with the testing code I used. A version is included here but feel free to clone the repo and add your own methods. Please submit a pull request if you find anything interesting or want to help improve the testing framework it uses.
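To add a method to the tests:
- Add the new conversion method (a Func<byte[], string>) to /Tests/ConvertByteArrayToHexString/Test.cs.
- Add it to the TestCandidates return value in that same class.
- Choose the input (sentence or text) in GenerateTestInput in that same class.

Update (2010-01-13)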
Added Waleed's answer to analysis. Quite fast.
Update (2011-10-05)
Added the string.Concat Array.ConvertAll variant for completeness (requires .NET 4.0). It is on par with the string.Join version.

Update (2012-02-05)
The test repo includes more variants, such as StringBuilder.Append(b.ToString("X2")). None of them changed the results. For instance, foreach is faster than {IEnumerable}.Aggregate, but BitConverter still wins.

Update (2012-04-03)
Added Mykroft's SoapHexBinary answer to the analysis; it took over third place.

Update (2013-01-15)
Added CodesInChaos's byte manipulation answer, which took over first place (by a large margin on large blocks of text).
Update (2013-05-23)
Added Nathan Moinvaziri's lookup answer and the variant from Brian Lambert's blog. Both rather fast, but not taking the lead on the test machine I used (AMD Phenom 9750).
Update (2014-07-31)
Added @CodesInChaos's new byte-based lookup answer. It appears to have taken the lead on both the sentence tests and the full-text tests.
Update (2015-08-20)
Added airbreather's optimizations and unsafe variant to this answer's repo. If you want to play the unsafe game, you can get huge performance gains over any of the prior top winners on both short strings and large texts.

Complement to answer by @CodesInChaos (reversed method)
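The answer's code is not reproduced here. Below is a sketch reconstructed from the explanation that follows; the hi and lo formulas are taken from it. It assumes valid input (even length, only 0-9, A-F, a-f) and performs no validation. It also relies on C#'s default unchecked context for the final (byte) cast. The class name is hypothetical.

```csharp
using System;

public static class HexDecodeByteManipulationSketch
{
    // Branch-free decoding of hex digit pairs. Assumes valid input.
    public static byte[] Decode(string s)
    {
        byte[] bytes = new byte[s.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // High nibble: ch - 65 is negative for '0'..'9', so (hi >> 31) is -1
            // and & 7 adds 7, giving ch - 48. For letters the shift gives 0.
            int hi = s[i * 2] - 65;
            hi = hi + 10 + ((hi >> 31) & 7);

            // Low nibble: same arithmetic, then & 0x0f drops the extra 32 for 'a'..'f'.
            int lo = s[i * 2 + 1] - 65;
            lo = (lo + 10 + ((lo >> 31) & 7)) & 0x0f;

            // The (byte) cast keeps only the low 8 bits, so the extra 32 in a
            // lower-case hi (e.g. 42 for 'a') is shifted out and dropped.
            bytes[i] = (byte)(lo | (hi << 4));
        }
        return bytes;
    }
}
```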
Explanation:

& 0x0f is there to also support lower-case letters.

hi = hi + 10 + ((hi >> 31) & 7); is the same as:

hi = ch - 65 + 10 + (((ch - 65) >> 31) & 7);

For '0'..'9' it is the same as hi = ch - 65 + 10 + 7;, which is hi = ch - 48. This is because ch - 65 is negative, so the shift gives 0xffffffff, and 0xffffffff & 7 is 7.

For 'A'..'F' it is hi = ch - 65 + 10;. This is because ch - 65 is non-negative, so the shift gives 0x00000000, and 0x00000000 & 7 is 0.

For 'a'..'f' the numbers are 32 too big, so we must subtract 32 from the default version. & 0x0f does this by setting the higher bits to 0.

65 is the code for 'A'.
48 is the code for '0'.
7 is the number of characters between '9' and 'A' in the ASCII table (...456789:;<=>?@ABCD...).

This version of ByteArrayToHexViaByteManipulation could be faster.
From my reports:
...
And I think this one is an optimization: