Question:
I'm in a discussion at work over how to secure sensitive information (e.g. passwords) stored in a Java program. Per security requirements, memory containing sensitive information is cleared, e.g. by setting the values of the bytes to all zeroes. The concern is that an attacker can observe the memory associated with the application process, and so we want to limit as much as possible the window of time such sensitive information hangs around. Previously, projects involved C++, so a memset() sufficed.
(Incidentally, the use of memset() has been called into question because some compilers are known to optimize its use out of the resulting binary, on the assumption that since the memory is not used later, there is no need to zero it in the first place. This blurb is a disclaimer for those who Google for "memset" and "clear memory", etc.)
Now we have on our hands a Java project being pressed against this requirement.
For Java objects, my understanding is that:
- a nulled reference only changes the value of the reference; the memory on the heap for the object still contains data
- an immutable object like String would not be able to have its data modified (or at least not easily, within the confines of a VM with an appropriately enabled security manager)
- the generational garbage collectors may make copies of objects all over the place (as noted here)
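A minimal sketch of the first point, at the language level only: nulling a reference leaves the array contents in place, while overwriting the elements changes the one array that every reference sees. The class name `ClearVersusNull` and the sample password are made up for illustration, and nothing here addresses the copies a garbage collector may already have made.

```java
import java.util.Arrays;

final class ClearVersusNull {
    /** Overwrites every element in place; every reference to this array sees the zeroes. */
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'}; // hypothetical secret
        char[] alias = password;   // second reference to the same heap array

        password = null;           // only the reference changes; the array still holds the data
        System.out.println("still readable via alias: " + (alias[0] == 's'));

        wipe(alias);               // overwrites the array contents themselves
        System.out.println("zeroed: " + (alias[0] == '\0'));
    }
}
```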
And for primitives, my understanding is that:
- a primitive-type variable in a local method would get allocated on the stack, and:
- when you change its value, you modify it directly in memory (as opposed to using a reference to handle an object on the heap).
- copies can/would be made "behind the scenes" in some situations, such as passing it as an argument into methods or boxing (auto- or not) creating instances of the wrappers which contain another primitive variable holding the same value.
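The boxing case can be sketched as follows. This only shows language-level semantics: the wrapper object is a separate copy that overwriting the local variable does not touch. It says nothing about what the JIT or the VM actually does with the stack slot. `PrimitiveCopies` and the sample value are hypothetical.

```java
final class PrimitiveCopies {
    /** Returns a boxed copy of the value, as autoboxing would create implicitly. */
    static Integer box(int value) {
        return Integer.valueOf(value);
    }

    public static void main(String[] args) {
        int pin = 4821;            // hypothetical secret held in a local primitive
        Integer boxed = box(pin);  // a separate wrapper object now holds the same value
        pin = 0;                   // overwrites the local variable only
        System.out.println("pin=" + pin + ", boxed=" + boxed);
    }
}
```

The wrapper is immutable, so the only way to get rid of its value is to drop the reference and wait for the garbage collector.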
My coworker claims that Java primitives are immutable and that there is documentation from both the NSA and Oracle regarding the lack of support in Java for this requirement.
My position is that primitives can (at least in some situations) be zeroed by setting the value to zero (or boolean to false), and the memory is cleared that way.
I'm trying to verify if there's language in the JLS or other "official" documentation about the required behavior of JVMs when it comes to memory management with respect to primitives. The closest I could find was a "Secure Coding Guidelines for the Java Programming Language" on Oracle's site which mentions clearing char arrays after use.
I'd quibble over definitions when my coworker called primitives immutable, but I'm pretty sure he meant "memory cannot be appropriately zeroed" - let's not worry about that. We did not discuss whether he meant final variables - from context we were talking in general.
Are there any definitive answers or references on this? I'd appreciate anything that could show me where I'm wrong or confirm that I'm right.
Edit: After further discussion, I've been able to clarify that my coworker was thinking of the primitive wrappers, not the primitives themselves. So we are left with the original problem of how to clear memory securely, preferably of objects. Also, to clarify, the sensitive information is not just passwords, but also things like IP addresses or encryption keys.
Are there any commercial JVMs which offer a feature like priority handling of certain objects? (I imagine this would actually violate the Java spec, but I thought I'd ask just in case I'm wrong.)
Answer 1:
Edit: Actually I just had three ideas that may indeed work - for different values of "work" at least.
The first that is more or less documented would be ByteBuffer.allocateDirect! As I understand it allocateDirect allocates the buffer outside the usual java heap so won't be copied around. I can't find any hard guarantees about it not getting copied in all situations though - but for the current Hotspot VM that is actually the case (ie it's allocated in an extra heap) and I assume this will stay that way.
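A rough sketch of the direct-buffer idea, assuming the secret arrives as a `byte[]`. The class `DirectSecret` and its method names are invented for illustration. As noted above, nothing in the specification guarantees that the off-heap contents are never copied, so treat this as a mitigation rather than a guarantee.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class DirectSecret {
    /** Copies the secret into a direct (off-heap) buffer and wipes the caller's heap array. */
    static ByteBuffer store(byte[] secret) {
        ByteBuffer buf = ByteBuffer.allocateDirect(secret.length);
        buf.put(secret);
        buf.flip();                       // make the written bytes readable from position 0
        Arrays.fill(secret, (byte) 0);    // the on-heap original is no longer needed
        return buf;
    }

    /** Overwrites the whole buffer with zeroes. */
    static void wipe(ByteBuffer buf) {
        buf.clear();                      // resets position/limit only; does not erase data
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) 0);         // absolute put, leaves the position untouched
        }
    }

    public static void main(String[] args) {
        byte[] key = {0x13, 0x37, 0x42};  // hypothetical key material
        ByteBuffer buf = store(key);
        try {
            System.out.println("direct: " + buf.isDirect());
        } finally {
            wipe(buf);
        }
    }
}
```

Any API that needs a `byte[]` will force a copy back onto the heap, so this helps mostly when the consumer can read from the buffer directly.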
The second one is using the sun.misc.Unsafe class, which as the name says has some rather obvious problems, but at least it would be pretty much independent of the VM used: either it's supported (and it works) or it's not (and you get linking errors). The problem is that the code to use it gets horribly complicated pretty fast (just obtaining an Unsafe instance is nontrivial).
The third one would be to allocate a much, much, MUCH larger size than is actually needed, so that the object gets allocated in the old generation heap to begin:
"-XX:PretenureSizeThreshold= that can be set to limit the size of allocations in the young generation. Any allocation larger than this will not be attempted in the young generation and so will be allocated out of the old generation."
Well the drawback of THAT solution is obvious I think (default size seems to be about 64kb).
Anyway, here is the old answer:
Yep as I see it you pretty much cannot guarantee that the data stored on the heap is 100% removed without leaving a copy (that's even true if you don't want a general solution but one that'll work with say the current Hotspot VM and its default garbage collectors).
As said in your linked post (here), the garbage collector pretty much makes this impossible to guarantee. Actually, contrary to what the post says, the problem here isn't the generational GC, but the fact that the Hotspot VM (and now we're implementation-specific) uses some kind of stop-and-copy GC for its young generation by default.
This means that as soon as a garbage collection happens between storing the password in the char array and zeroing it out you'll get a copy of the data that will be overwritten only as soon as the next GC happens. Note that tenuring an object will have exactly the same effect, but instead of copying it to to-space it's copied to the old generation heap - we end up with a copy of the data in from space that isn't overwritten.
To avoid this problem we'd pretty much need some way to guarantee that either NO garbage collection happens between storing the password and zeroing it, OR that the char array is stored in the old generation heap from the get-go. Also note that this relies on the internals of the Hotspot VM, which may very well change (actually there are different garbage collectors where many more copies can be generated; iirc the Hotspot VM supports a concurrent GC using a train algorithm). "Luckily" it's impossible to guarantee either one of those (afaik every method call/return introduces a safepoint!), so you don't even get tempted to try (especially considering that I don't see any way to make sure the JIT doesn't optimize the zeroing out away) ;)
Seems like the only way to guarantee that the data is stored only in one location is to use the JNI for it.
PS: Note that while the above is only true for the Heap, you can't guarantee anything more for the stack (the JIT will likely optimize writes without reads to the stack away, so when you return from the function the data will still be on the stack)
Answer 2:
Tell your co-workers that this is a hopeless cause. What about the kernel socket buffers, just for a start?
If you cannot prevent unwanted programs from spying on memory on your machine, the passwords are compromised. Period.
Answer 3:
Weird, never thought of anything like this.
My first idea would be to make a char[100] to store your password in. Put that in there, use it for whatever, and then do a loop to set every char to blank.
The problem is, the password would at some point turn into a String inside of the database driver, which could live in memory for 0 to infinity seconds.
My second idea would be to have all authentication done through some kind of JNI call to C, but that would be really hard if you are trying to use something like JDBC....
Answer 4:
Just an aside, but in some places the Java core security libraries use char[] so that it can be zeroed. I imagine that you don't get a guarantee, though.
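One example of this in the standard library is `javax.crypto.spec.PBEKeySpec`, which takes a `char[]`, keeps its own internal copy, and offers `clearPassword()` to wipe that copy. A small sketch, with the class name `KeySpecClearing` and the sample password invented for illustration:

```java
import java.util.Arrays;
import javax.crypto.spec.PBEKeySpec;

final class KeySpecClearing {
    /** Returns true if the spec refuses to hand out its password, i.e. it has been cleared. */
    static boolean isCleared(PBEKeySpec spec) {
        try {
            char[] copy = spec.getPassword();   // returns a fresh copy while still set
            Arrays.fill(copy, '\0');            // do not leave that copy lying around
            return false;
        } catch (IllegalStateException e) {
            return true;                        // thrown once clearPassword() has run
        }
    }

    public static void main(String[] args) {
        char[] password = {'h', 'u', 'n', 't', 'e', 'r', '2'}; // hypothetical input
        PBEKeySpec spec = new PBEKeySpec(password);            // clones the array internally
        try {
            // ... hand the spec to a SecretKeyFactory here ...
        } finally {
            spec.clearPassword();               // wipes the spec's internal copy
            Arrays.fill(password, '\0');        // wipes the caller's array
        }
        System.out.println("cleared: " + isCleared(spec));
    }
}
```

This only clears the copies the program knows about; it does not change anything about copies the garbage collector may have left behind.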
Answer 5:
I have been trying to work out some similar issues with credentials.
Until now, my only answer is "not to use strings at all for secrets". The strings are comfortable to use and store in human terms, but computers can work well with byte arrays. Even the encryption primitives work with byte[].
When you don't need the password anymore, just fill the array with zeroes, and don't let the GC invent new ways to reuse your secrets.
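As a sketch of working with arrays only, the helper below turns a `char[]` into UTF-8 bytes without ever building a String, and wipes the intermediate buffer it can reach. `SecretBytes` and its method names are hypothetical, and the caveats about GC copies discussed in the other answers still apply.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SecretBytes {
    /** Encodes chars as UTF-8 without creating a String; wipes the encoder's buffer. */
    static byte[] toBytes(char[] chars) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        if (encoded.hasArray()) {
            Arrays.fill(encoded.array(), (byte) 0);  // wipe the temporary backing array
        }
        return out;
    }

    /** Overwrites the array with zeroes. */
    static void wipe(byte[] secret) {
        Arrays.fill(secret, (byte) 0);
    }

    public static void main(String[] args) {
        char[] password = {'a', 'b', 'c'};           // hypothetical input
        byte[] bytes = toBytes(password);
        try {
            System.out.println("encoded length: " + bytes.length);
        } finally {
            wipe(bytes);
            Arrays.fill(password, '\0');
        }
    }
}
```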
In another thread (Why can't strings be mutable in Java and .NET?) the assumption is that strings are immutable for security reasons, which I find very short-sighted. What was not foreseen is that operational problems are not the only ones that exist, and that security sometimes needs some flexibility and/or support to be effective, support that does not exist natively in Java.
To add to that: how could we read a password without using strings? Well... be creative, and don't use things like the Android EditText with the password input type; that is just not secure enough and forces you to go through strings.