Suppose there is a bug in a Java Card applet: a temporary byte array is stored in EEPROM instead of RAM. Moreover, suppose this byte array is overwritten with each APDU.
This bug should damage the card sooner or later.
What symptoms could we expect? Incorrect values in the array without any explicit warnings or errors? Some exceptions thrown when accessing this array? The applet unselectable? The whole card completely unresponsive?
Should the card be damaged "once and forever", or will these failures occur more and more often?
In my experiment (J2E145), the first failure occurred after 5 000 000 APDUs: the card did not send an R-APDU at all and just died. However, the next APDU was OK again. After that, approximately 1 APDU out of 10 000 failed (with increasing frequency), and finally, after 5 100 000 APDUs, the card stopped communicating forever.
Is there any standard that says what should happen in case of EEPROM damage? (I looked for one, but did not find any.)
I know the question is broad and the answer probably depends on the particular chip (I am especially interested in NXP chips), but I think your comments, answers and experience could help many Java Card developers who find a bug in their code after deployment.
Here is the picture from a native operating system: when writing a new value to non-volatile memory, the hardware routine itself checks whether the value was written correctly and returns an error status otherwise. This is translated to a SW1/SW2 of 65 81. The affected file or object is marked as corrupted, and future attempts to access it are cleanly rejected. If that file or object is essential to the application, the application will no longer be able to work.
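On the terminal side, a response like this can be recognised by looking at the last two bytes of the R-APDU. The following is only a minimal sketch in plain Java (no smart card library needed); the class and method names are made up for illustration, and it assumes the card OS really reports the write failure as 65 81, which ISO/IEC 7816-4 lists as "memory failure".

```java
// Hypothetical helper, not part of any card or terminal API.
final class StatusWords {
    static final int SW_OK = 0x9000;
    // SW1/SW2 = 65 81, "memory failure" in ISO/IEC 7816-4
    static final int SW_MEMORY_FAILURE = 0x6581;

    /** Returns SW1/SW2 (the last two bytes of a response APDU) as one int. */
    static int statusWord(byte[] responseApdu) {
        if (responseApdu == null || responseApdu.length < 2) {
            throw new IllegalArgumentException("R-APDU must contain at least SW1 and SW2");
        }
        int sw1 = responseApdu[responseApdu.length - 2] & 0xFF;
        int sw2 = responseApdu[responseApdu.length - 1] & 0xFF;
        return (sw1 << 8) | sw2;
    }

    /** True if the card reported a non-volatile memory write failure. */
    static boolean isMemoryFailure(byte[] responseApdu) {
        return statusWord(responseApdu) == SW_MEMORY_FAILURE;
    }
}
```

A terminal application could call isMemoryFailure on every response and take the card out of service on the first hit, instead of retrying the command.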
If I remember correctly, our hardware (non-NXP) even issues a pre-warning, indicating that although the value could be written correctly this time, the memory cell is about to reach its limits.
I guess the best shot at finding some non-NDA'd information is the Common Criteria security targets for the specific platforms.
An example of a hardware platform from NXP: NXP Secure Smart Card Controllers P5Cx128V0A/P5Cx145V0A, MSO (BSI-DSZ-CC-0645)
From the TOE overview:
From security feature SF.OPC:
From security feature SF.PHY:
So this hardware platform is capable of detecting EEPROM cell failures and can even automatically correct 1-bit errors within each byte. For all other detected errors it will raise an exception that can be handled by software.
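To illustrate what "correcting a 1-bit error within a byte" can mean, here is a small sketch of a Hamming(12,8) code in plain Java. This is purely an assumption for illustration: the security target does not say which error-correcting code the chip uses, and the class and method names are made up. The sketch only repairs a single flipped bit per code word; it does not model the detection of multi-bit errors that the hardware reports via an exception.

```java
// Illustrative only: NOT the actual ECC scheme of any NXP chip.
final class SingleBitEcc {
    // Code word positions are numbered 1..12; positions 1, 2, 4 and 8 hold
    // parity bits, the remaining eight positions hold the data bits.
    private static final int[] DATA_POSITIONS = {3, 5, 6, 7, 9, 10, 11, 12};

    /** XOR of the positions of all set bits; 0 for a valid code word. */
    static int syndrome(int word) {
        int s = 0;
        for (int pos = 1; pos <= 12; pos++) {
            if (((word >> (pos - 1)) & 1) != 0) {
                s ^= pos;
            }
        }
        return s;
    }

    /** Encodes one byte (0..255) into a 12-bit code word. */
    static int encode(int dataByte) {
        int word = 0;
        for (int i = 0; i < 8; i++) {
            if (((dataByte >> i) & 1) != 0) {
                word |= 1 << (DATA_POSITIONS[i] - 1);
            }
        }
        // Set the parity bits so that the syndrome of the full word is 0.
        int s = syndrome(word);
        for (int p = 1; p <= 8; p <<= 1) {
            if ((s & p) != 0) {
                word |= 1 << (p - 1);
            }
        }
        return word;
    }

    /** Decodes a code word, repairing at most one flipped bit. */
    static int decode(int word) {
        int s = syndrome(word);
        if (s >= 1 && s <= 12) {
            word ^= 1 << (s - 1); // the syndrome is the position of the bad bit
        }
        int data = 0;
        for (int i = 0; i < 8; i++) {
            if (((word >> (DATA_POSITIONS[i] - 1)) & 1) != 0) {
                data |= 1 << i;
            }
        }
        return data;
    }
}
```

With this scheme each stored byte costs four extra bits, and a worn-out cell that flips one bit of the 12-bit word is invisible to the software reading the byte back.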
That's for the hardware platform (without OS / JCRE). So let's see what a security target of JCOP tells us. I chose NXP J3A128 and J3A095 Secure Smart Card Controller Rev. 3 (BSI-DSZ-CC-0731).
From security feature SF.Audit:
From security feature SF.SecureManagement:
So this software platform is (again) capable of detecting EEPROM cell failures and can even automatically correct 1-bit errors within each byte. For all other detected EEPROM errors it will "lock the card session", which means that it simply stops processing and performs a reset. This seems to match your observation "the symptom was that the card did not send R-APDU at all and just died".