Symptoms of EEPROM damage

Posted 2019-02-05 09:35

Question:

Suppose there is a bug in a Java Card applet: a temporary byte array is stored in EEPROM instead of RAM. Moreover, suppose this byte array is overwritten with each APDU.

Sooner or later, this bug will damage the card.

What symptoms could we expect? Incorrect values in the array without any explicit warnings or errors? Some exceptions thrown when accessing this array? The applet unselectable? The whole card completely unresponsive?

Will the card fail "once and for all", or will these failures occur more and more often?

In my experiment (J2E145), the first failure occurred after 5 000 000 APDUs: the card did not send an R-APDU at all and just died. However, the next APDU was OK again; after that, approximately 1 APDU in 10 000 failed (with increasing frequency), and finally, after 5 100 000 APDUs, the card stopped communicating permanently.
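A measurement like the one above can be automated with a host-side loop that sends the same APDU over and over and records which commands get no response. The following is a minimal sketch in plain Java, assuming a hypothetical `ApduTransport` interface that throws `IOException` when no R-APDU arrives (with a real reader it would wrap something like `javax.smartcardio`'s `CardChannel.transmit`). The class name `EnduranceTest` and the `deadAfter` threshold for declaring the card dead are also illustrative assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical endurance-test harness; not tied to any real reader API. */
public class EnduranceTest {

    /** Hypothetical transport: returns the R-APDU or throws IOException if none arrives. */
    public interface ApduTransport {
        byte[] transmit(byte[] command) throws IOException;
    }

    /**
     * Sends the same command up to maxApdus times and returns the indices of the
     * APDUs that got no response. Stops early once deadAfter consecutive APDUs
     * have failed, treating the card as permanently dead.
     */
    public static List<Long> run(ApduTransport t, byte[] command, long maxApdus, int deadAfter) {
        List<Long> failures = new ArrayList<>();
        int consecutive = 0;
        for (long i = 0; i < maxApdus; i++) {
            try {
                t.transmit(command);
                consecutive = 0; // the card answered again after an isolated failure
            } catch (IOException noResponse) {
                failures.add(i);
                if (++consecutive >= deadAfter) {
                    break; // card no longer communicates at all
                }
            }
        }
        return failures;
    }

    /** Counts failures per window of windowSize APDUs, to see whether the failure rate grows. */
    public static long[] failuresPerWindow(List<Long> failures, long totalApdus, long windowSize) {
        int windows = (int) ((totalApdus + windowSize - 1) / windowSize);
        long[] counts = new long[windows];
        for (long index : failures) {
            counts[(int) (index / windowSize)]++;
        }
        return counts;
    }
}
```

Comparing the counts of consecutive windows shows whether failures become more frequent, as they did between 5 000 000 and 5 100 000 APDUs in the experiment.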

Is there a standard that specifies what should happen in case of EEPROM damage? (I looked for one, but I did not find any.)

I know the question is broad and the answer probably depends on the particular chip (I am especially interested in NXP chips), but I think your comments, answers and experience could help many Java Card developers who find a bug in their code after deployment.

Answer 1:

I guess the best shot at finding some non-NDA'd information is the Common Criteria security targets for the specific platforms.

An example for a hardware platform from NXP: NXP Secure Smart Card Controllers P5Cx128V0A/P5Cx145V0A, MSO (BSI-DSZ-CC-0645)

  • From the TOE overview:

    The non-volatile EEPROM [...] contains high reliability cells, which guarantee data integrity. [...] Security functionality protects the contents of all memories.

  • From security feature SF.OPC:

    An exception is forced by the [...] single fault injection detection circuitry. In case minor configuration option "Inverse EEPROM Error Correction" is enabled [...] the probability to detect fault injection errors increases and the error correction logic raises an exception when detecting an error.

  • From security feature SF.PHY:

    The EEPROM is able to correct a 1-bit error within each byte. [...] The EEPROM corrects errors automatically without user interaction [...]

So this hardware platform is capable of detecting EEPROM cell failures and can even automatically correct 1-bit errors within each byte. For all other detected errors it will raise an exception that can be handled by software.
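The security target does not say how this correction is implemented. As a hedged illustration only, the textbook way to correct one flipped bit and detect two flipped bits per byte is an extended Hamming (SEC-DED) code: 4 check bits plus 1 overall parity bit per data byte, 13 bits in total. The class name `ByteSecded` and the bit layout below are assumptions; NXP's actual EEPROM scheme may differ.

```java
/**
 * Textbook extended Hamming(13,8) SEC-DED code for one byte.
 * Illustration only: not the documented EEPROM scheme of any NXP chip.
 */
public class ByteSecded {

    /** Thrown when an error is detected that cannot be corrected (e.g. two flipped bits). */
    public static class UncorrectableEccError extends RuntimeException {
        public UncorrectableEccError(String message) { super(message); }
    }

    /** Positions 1, 2, 4 and 8 hold Hamming check bits; the other positions 1..12 hold data. */
    private static boolean isParityPosition(int pos) {
        return Integer.bitCount(pos) == 1;
    }

    /** Encodes one data byte: bit 0 = overall parity, bits 1..12 = Hamming(12,8) codeword. */
    public static int encode(int dataByte) {
        int code = 0;
        int d = 0;
        for (int pos = 1; pos <= 12; pos++) {
            if (!isParityPosition(pos)) {
                if (((dataByte >> d) & 1) != 0) code |= 1 << pos;
                d++;
            }
        }
        for (int p = 1; p <= 8; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= 12; pos++) {
                if ((pos & p) != 0) parity ^= (code >> pos) & 1;
            }
            code |= parity << p; // make each check group even
        }
        code |= Integer.bitCount(code) & 1; // overall parity makes the whole word even
        return code;
    }

    /** Decodes a codeword: silently corrects one flipped bit, throws on a detected double error. */
    public static int decode(int codeword) {
        int code = codeword & 0x1FFF;
        int syndrome = 0;
        for (int pos = 1; pos <= 12; pos++) {
            if (((code >> pos) & 1) != 0) syndrome ^= pos;
        }
        boolean overallOdd = (Integer.bitCount(code) & 1) != 0;
        if (overallOdd) {
            if (syndrome > 12) throw new UncorrectableEccError("multi-bit error, syndrome " + syndrome);
            code ^= 1 << syndrome; // single error at position 'syndrome' (0 = overall parity bit)
        } else if (syndrome != 0) {
            throw new UncorrectableEccError("double-bit error detected");
        }
        int data = 0;
        int d = 0;
        for (int pos = 1; pos <= 12; pos++) {
            if (!isParityPosition(pos)) {
                if (((code >> pos) & 1) != 0) data |= 1 << d;
                d++;
            }
        }
        return data;
    }

    public static void main(String[] args) {
        int stored = encode(0xA5);
        System.out.println(Integer.toHexString(decode(stored ^ (1 << 6)))); // prints a5
        try {
            decode(stored ^ (1 << 6) ^ (1 << 9)); // two bits flipped
        } catch (UncorrectableEccError e) {
            System.out.println("exception: " + e.getMessage());
        }
    }
}
```

A SEC-DED code only guarantees detection of two flipped bits; three or more flipped bits may be silently miscorrected.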

That's for the hardware platform (without OS / JCRE). So let's see what a security target of JCOP tells us. I chose NXP J3A128 and J3A095 Secure Smart Card Controller Rev. 3 (BSI-DSZ-CC-0731).

  • From security feature SF.Audit:

    The following reactions by the TOE based on indication of a potential violation of the TSP are possible:

    • Throw an exception
    • Terminate the card (Life cycle state: TERMINATED)
    • Reinitialize the Java Card System (warm reset)
    • Lock the card session (simply stops processing; escape with reset the session/Card tearing)

    Based on these types of response/reaction the events listed above will have the following mapping:

    • EEPROM failure audited through exceptions in the read/write operations and consistency/integrity check: Lock card session
    • self test mechanism on start-up: Lock card session
    • Corruption of check-summed objects: Lock card session
  • From security feature SF.SecureManagement:

    The TSF run a suite of self-tests during initial start-up (at each power on) to demonstrate the correct operation of the TSF, to verify the integrity of TSF data, and to verify the integrity of stored TSF executable code. This includes checking the EEPROM integrity. If an error is detected, the TOE enters into a secure state (lock card session)

    The TSF monitors user data D.APP_CODE, D.APP_I_DATA, D.PIN, D.APP_KEYs for integrity errors. If an error occurs, the TSF maintain a secure state (lock card session)

So this software platform is (again) capable of detecting EEPROM cell failures, and 1-bit errors within each byte are still corrected automatically by the underlying hardware. For all other detected EEPROM errors, it will "lock the card session", which means that it simply stops processing until the session is reset (or the card is torn). This seems to match your observation "the symptom was that the card did not send R-APDU at all and just died".
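From the terminal's point of view, a locked session looks exactly like the observed symptom: a command that never gets an R-APDU. A minimal sketch of how a host application might recover, assuming a hypothetical `ResettableTransport` interface (with `javax.smartcardio`, `reset()` would roughly correspond to `Card.disconnect(true)` followed by a new `connect` and a re-SELECT of the applet). The class and method names are illustrative, not an existing API.

```java
import java.io.IOException;

/** Hypothetical host-side helper; ResettableTransport is an illustration, not a real API. */
public class SessionLockRecovery {

    /** Hypothetical abstraction over a card reader. */
    public interface ResettableTransport {
        /** Sends a command APDU; throws IOException if no R-APDU arrives (e.g. timeout). */
        byte[] transmit(byte[] command) throws IOException;

        /** Resets the card and re-selects the applet (assumed to be done by the implementation). */
        void reset() throws IOException;
    }

    /**
     * Sends a command; if the card stays silent, as a locked session would,
     * resets the card and retries, up to maxResets times.
     */
    public static byte[] transmitWithReset(ResettableTransport t, byte[] command, int maxResets)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxResets; attempt++) {
            try {
                return t.transmit(command);
            } catch (IOException noResponse) {
                last = noResponse;
                if (attempt < maxResets) {
                    t.reset(); // a locked session only ends with a reset or card tearing
                }
            }
        }
        throw last; // card still silent after all resets: possibly dead for good
    }
}
```

Because the session stays locked until a reset, retrying without one would simply time out again; bounding the number of resets keeps a permanently dead card (as after 5 100 000 APDUs in the question) from causing an endless loop.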



Answer 2:

Here is the picture from a native operating system: when writing a new value to non-volatile memory, the hardware routine itself checks whether the value was written correctly and returns an error status otherwise. This is translated into the status word SW1-SW2 = 65 81 ("memory failure" in ISO/IEC 7816-4). The affected file or object is marked as corrupted, and future attempts to access it are cleanly rejected. If that file or object is essential to the application, the application will no longer work.
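The write-verify flow described in this answer can be modelled in plain Java. This is a hypothetical simulation, not code from any card OS: the class name `VerifiedNvmObject`, the `wearOut` test hook that makes a cell ignore writes, and the choice to answer 65 81 also for later access to a corrupted object are all assumptions (the answer only says such access is rejected).

```java
import java.util.Arrays;

/** Hypothetical simulation of a write-verify scheme for one NVM object; not real card OS code. */
public class VerifiedNvmObject {
    public static final int SW_OK = 0x9000;
    public static final int SW_MEMORY_FAILURE = 0x6581; // ISO/IEC 7816-4: memory failure

    private final byte[] cells;
    private final boolean[] stuck; // simulated worn-out cells that ignore writes (assumption)
    private boolean corrupted;

    public VerifiedNvmObject(int size) {
        cells = new byte[size];
        stuck = new boolean[size];
    }

    /** Test hook: simulate a worn-out cell that no longer accepts new values. */
    public void wearOut(int index) {
        stuck[index] = true;
    }

    /** Writes data, reads it back, and returns the status word the OS would send. */
    public int write(int offset, byte[] data) {
        if (corrupted) {
            return SW_MEMORY_FAILURE; // access to a corrupted object is rejected (SW assumed)
        }
        for (int i = 0; i < data.length; i++) {
            if (!stuck[offset + i]) {
                cells[offset + i] = data[i]; // simulated programming of the cell
            }
        }
        // Verify by reading back, as the hardware routine in the answer does
        byte[] readBack = Arrays.copyOfRange(cells, offset, offset + data.length);
        if (!Arrays.equals(readBack, data)) {
            corrupted = true; // mark the object so that later accesses fail cleanly
            return SW_MEMORY_FAILURE;
        }
        return SW_OK;
    }

    public boolean isCorrupted() {
        return corrupted;
    }
}
```

Note that a worn-out cell is only detected when a write actually tries to change its value; writing the value it already holds passes verification.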

If I remember correctly, our hardware (non-NXP) even issues an early warning indicating that, although the value was written correctly this time, the memory cell is about to reach its limits.