I have a strange memory corruption problem. After many hours of debugging and experimenting, I think I have found something.
For example, I do a simple string assignment:
sTest := 'SET LOCK_TIMEOUT ';
However, the result sometimes becomes:
sTest = 'SET LOCK'#0'TIMEOUT '
So the _ gets replaced by a 0 byte.
I have seen this happen once (reproducing it is tricky and timing-dependent) in the System.Move function, where it uses the FPU stack (fild, fistp) for a fast memory copy of 9 to 32 bytes:
...
@@SmallMove: {9..32 Byte Move}
fild qword ptr [eax+ecx] {Load Last 8}
fild qword ptr [eax] {Load First 8}
cmp ecx, 8
jle @@Small16
fild qword ptr [eax+8] {Load Second 8}
cmp ecx, 16
jle @@Small24
fild qword ptr [eax+16] {Load Third 8}
fistp qword ptr [edx+16] {Save Third 8}
...
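For readers less familiar with this trick, the 9..32 byte path loads every 8-byte chunk onto the FPU stack (first, second, third and last, where the last chunk may overlap the others) before it stores any of them. That ordering is what makes it safe when source and destination overlap. The C sketch below models only that ordering, using plain 64-bit integers instead of the FPU. It is an illustration rather than the RTL code, it assumes 9 <= count <= 32, and `small_move` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the 9..32 byte path of an FPU-based Move: all 8-byte chunks are
   read before any chunk is written, so overlapping buffers are handled.
   memcpy into a uint64_t stands in for FILD, memcpy out for FISTP. */
static void small_move(const void *src, void *dst, size_t count)
{
    assert(count >= 9 && count <= 32);
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint64_t last, first, second = 0, third = 0;

    memcpy(&last, s + count - 8, 8);           /* {Load Last 8} */
    memcpy(&first, s, 8);                      /* {Load First 8} */
    if (count > 16) memcpy(&second, s + 8, 8); /* {Load Second 8} */
    if (count > 24) memcpy(&third, s + 16, 8); /* {Load Third 8} */

    /* Stores start only after every load. Where chunks overlap in dst,
       they write the same source bytes, so the store order is harmless. */
    if (count > 24) memcpy(d + 16, &third, 8);
    if (count > 16) memcpy(d + 8, &second, 8);
    memcpy(d, &first, 8);
    memcpy(d + count - 8, &last, 8);
}
```

For example, `small_move(buf, buf + 3, 20)` shifts 20 bytes up by three positions in place, and `buf + 3` still ends up holding the original first 20 bytes.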
Using the FPU view and two memory debug views (Delphi -> View -> Debug -> CPU -> Memory), I saw it go wrong... once... but I could not reproduce it.
This morning I read something about the 8087 control word (8087CW), and yes: if it is changed to $27F, I get memory corruption! Normally it is $133F.
The difference between $133F and $027F is that $027F sets up the FPU for less precise calculations (limited to Double instead of Extended) and for a different infinity handling (which was used by older FPUs, but is not used any more).
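To see exactly which fields differ, the control word can be split into its parts. This C sketch uses the bit layout documented by Intel: exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11 and infinity control in bit 12. `X87Fields`, `decode_cw` and `print_cw` are names made up for this example. The code only decodes numbers and does not touch the real FPU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of the x87 FPU control word (bit layout per Intel's manuals). */
typedef struct {
    unsigned masks;     /* bits 0-5: IM DM ZM OM UM PM, 1 = exception masked */
    unsigned precision; /* bits 8-9: 0 = 24-bit, 2 = 53-bit, 3 = 64-bit mantissa */
    unsigned rounding;  /* bits 10-11: 0 = round to nearest */
    unsigned infinity;  /* bit 12: infinity control, only honoured by 8087/80287 */
} X87Fields;

static X87Fields decode_cw(uint16_t cw)
{
    X87Fields f;
    f.masks = cw & 0x3Fu;
    f.precision = (cw >> 8) & 0x3u;
    f.rounding = (cw >> 10) & 0x3u;
    f.infinity = (cw >> 12) & 0x1u;
    return f;
}

static const char *precision_name(unsigned pc)
{
    switch (pc) {
    case 0: return "Single";
    case 2: return "Double";
    case 3: return "Extended";
    default: return "reserved";
    }
}

/* Print the decoded fields of one control word value. */
static void print_cw(uint16_t cw)
{
    X87Fields f = decode_cw(cw);
    printf("$%04X: masks=$%02X precision=%s rounding=%u infinity=%u\n",
           (unsigned)cw, f.masks, precision_name(f.precision),
           f.rounding, f.infinity);
}
```

Decoding $133F and $027F gives the same exception masks ($3F, everything masked) and the same rounding mode. The values differ only in precision control (Extended vs Double), in the infinity bit, and in the reserved bit 6. For comparison, $1372 has masks of $32, which leaves the invalid-operation, zero-divide and overflow exceptions unmasked.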
Okay, now I know why, but not when!
I changed my AsmProfiler to perform a simple check, so that every function is checked on entry and on exit:
if Get8087CW = $27F then //normally $1372?
if MainThreadID = GetCurrentThreadId then //only check mainthread
DebugBreak;
I "profiled" some units and DLLs and, bingo (see the call stack):
Windows.StretchBlt(3372289943,0,0,514,345,4211154027,0,0,514,345,13369376)
pngimage.TPNGObject.DrawPartialTrans(4211154027,(0, 0, 514, 345, (0, 0), (514, 345)))
pngimage.TPNGObject.Draw($7FF62450,(0, 0, 514, 345, (0, 0), (514, 345)))
Graphics.TCanvas.StretchDraw((0, 0, 514, 345, (0, 0), (514, 345)),$7FECF3D0)
ExtCtrls.TImage.Paint
Controls.TGraphicControl.WMPaint((15, 4211154027, 0, 0))
So it is happening in StretchBlt...
What to do now? Is it a fault in Windows, or a bug in pngimage (included in D2007)?
Or is the System.Move function not fail-safe?
Note: simply trying to reproduce does not work:
Set8087CW($27F);
sSQL := 'SET LOCK_TIMEOUT ';
It seems to be more exotic... But by breaking on 'Get8087CW = $27F' (with DebugBreak), I could reproduce it on another string:
(Four screenshots of the FPU view followed here: parts 1 to 3, and the final state, in which the string is corrupt.)
Note 2: Maybe the FPU stack should be cleared in System.Move?
I haven't seen this particular issue, but Move can definitely get messed up if the FPU is in a bad state. Cisco's VPN driver can screw things up horribly, even if you're not doing anything network-related.
http://brianorr.blogspot.com/2006/11/intel-pentium-d-floating-point-unit.html [broken]
https://web.archive.org/web/20160601043520/http://www.dankohn.com/archives/343
http://blog.excastle.com/2007/08/28/delphi-bug-of-the-day-fpu-stack-leak/ (comments by Ritchie Annand)
In our case we detect the buggy VPN driver and swap out Move and FillChar with the Delphi 7 versions, replace IntToStr with a Pascal version (the Int64 version uses the FPU), and, since we're using FastMM, we also disable its custom fixed-size move routines, which are even more susceptible than System.Move.
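A toy model shows why values left on the FPU stack ("FPU stack leak") can break a FILD/FISTP-based Move. The C sketch below is a simplification and not real FPU code. It models only the 8-slot register stack and 64-bit FILD/FISTP, and it assumes the invalid-operation exception is masked (bit 0 of the control word set, as in both $133F and $027F). Under that assumption, Intel documents that a load onto a full stack yields the QNaN "indefinite", and storing it with FISTP writes the 64-bit integer indefinite ($8000000000000000) instead of the source bytes. `ToyX87`, `fild`, `fistp` and `fpu_copy8` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the x87 register stack, assuming the invalid-operation
   exception is masked. Only 64-bit FILD and FISTP are modelled. */
#define X87_SLOTS 8
#define X87_INT64_INDEFINITE INT64_MIN /* bytes 00 00 00 00 00 00 00 80 */

typedef struct {
    int64_t value[X87_SLOTS];
    bool in_use[X87_SLOTS]; /* tag word: slot holds a value */
    bool is_nan[X87_SLOTS]; /* slot holds the QNaN "indefinite" */
    int top;                /* physical index of ST(0) */
} ToyX87;

/* FILD: decrement TOP and load. If the target slot is still in use
   (stack overflow), the masked response loads the QNaN indefinite. */
static void fild(ToyX87 *f, int64_t v)
{
    f->top = (f->top + X87_SLOTS - 1) % X87_SLOTS;
    f->is_nan[f->top] = f->in_use[f->top];
    f->value[f->top] = v;
    f->in_use[f->top] = true;
}

/* FISTP: store ST(0) as int64 and pop. A NaN or an empty slot (underflow)
   is stored as the 64-bit integer indefinite. */
static int64_t fistp(ToyX87 *f)
{
    int64_t out = (!f->in_use[f->top] || f->is_nan[f->top])
                      ? X87_INT64_INDEFINITE
                      : f->value[f->top];
    f->in_use[f->top] = false;
    f->is_nan[f->top] = false;
    f->top = (f->top + 1) % X87_SLOTS;
    return out;
}

/* Copy 8 bytes through the toy FPU, like one FILD/FISTP pair in Move. */
static void fpu_copy8(ToyX87 *f, const void *src, void *dst)
{
    int64_t v;
    memcpy(&v, src, 8);
    fild(f, v);
    v = fistp(f);
    memcpy(dst, &v, 8);
}
```

With an empty stack, the copy is exact. After eight values have been pushed and never popped (for example by a driver), the same copy silently writes the integer indefinite instead of the source bytes.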
It might be a bug in your video driver that does not preserve the 8087 control word when it performs the StretchBlt operation.
In the past I have seen similar behaviour when using certain printer drivers. They think they own the 8087 CW and are wrong...
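If a driver really does change the control word behind your back, a common defensive pattern is to save the control word before calling into foreign code and restore it afterwards. In Delphi, that would be Get8087CW before the StretchBlt call and Set8087CW in a finally block. The C sketch below shows only the shape of that pattern. It uses a plain variable `fpu_cw` as a stand-in for the real x87 register, because reading the real one is compiler- and platform-specific. `buggy_driver_blt` and `call_preserving_cw` are hypothetical names. Like the AsmProfiler check in the question, the wrapper also reports whether the callee changed the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the real x87 control word register (hypothetical). */
static uint16_t fpu_cw = 0x1372;

/* Hypothetical driver routine that reprograms the control word for its
   own purposes and never puts the old value back. */
static void buggy_driver_blt(void)
{
    fpu_cw = 0x027F;
}

/* Save the control word, run foreign code, then restore the saved value.
   Returns true if the foreign code had changed the control word. */
static bool call_preserving_cw(void (*foreign)(void))
{
    uint16_t saved = fpu_cw;
    foreign();
    bool changed = (fpu_cw != saved);
    fpu_cw = saved;
    return changed;
}
```

Restoring unconditionally keeps the caller's settings intact. The returned flag can be logged to find out which foreign call is the culprit.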
Note that the default value of the 8087 CW in Delphi seems to be $1372. For a more detailed explanation of the CW values, see this article, which also explains a situation Michael Justin described when his 8087 CW got hosed.
--jeroen
Just for your information (in case someone else has the same problem): we did a software upgrade for a customer, and the whole touchscreen locked up when our application was started! Windows was completely frozen, and the PC had to be restarted by cutting the power. It took some time to figure out the cause of the freeze.
Fortunately we had one (only 1!) stack trace of an access violation (AV) in FastMove.LargeSSEMove. I disabled the use of SSE in FastMove, and the problem is gone.
By the way: the touchscreen has a VIA Nehemiah CPU with an S3 chipset.
So you can get not only memory corruption when using the FPU, but also a complete freeze!
For those still interested in this, there's yet another possible cause of problems: Delphi Rio still ships with a broken ASM version of Move.
I had the pleasure of running into that bug today; luckily, I had a reproducible test case. The issue is this piece of code:
(* ***** BEGIN LICENSE BLOCK *****
*
* The assembly function Move is licensed under the CodeGear license terms.
*
* The initial developer of the original code is Fastcode
*
* Portions created by the initial developer are Copyright (C) 2002-2004
* the initial developer. All Rights Reserved.
*
* Contributor(s): John O'Harrow
*
* ***** END LICENSE BLOCK ***** *)
// ... some less interesting parts omitted ...
@@LargeMove:
JNG @@LargeDone {Count < 0}
CMP EAX, EDX
JA @@LargeForwardMove
// the following overlap test is broken
// when size>uint(destaddr), EDX underflows to $FFxxxxxx, in which case
// we jump to @@LargeForwardMove even if a backward loop would be appropriate
// this will effectively shred everything at EDX + size
SUB EDX, ECX // when this underflows ...
CMP EAX, EDX // ... we also get CF=1 here (EAX is usually < $FFxxxxxx)
LEA EDX, [EDX+ECX] // (does not affect flags)
JNA @@LargeForwardMove // ... CF=1 so let's jump into disaster!
SUB ECX, 8 {Backward Move}
PUSH ECX
FILD QWORD PTR [EAX+ECX] {Last 8}
FILD QWORD PTR [EAX] {First 8}
ADD ECX, EDX
AND ECX, -8 {8-Byte Align Writes}
SUB ECX, EDX
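The faulty direction test can be reproduced outside the RTL. The following C sketch models the register arithmetic with uint32_t, so that SUB wraps exactly as EDX does. It uses the register roles from the listing: EAX = source, EDX = destination, ECX = count. For comparison, it also shows the usual memmove-style test, which compares the distance dst - src against count and therefore cannot underflow. The function names are made up, and the second function is an illustrative alternative, not an official fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit model of the direction test in @@LargeMove.
   Returns true when the routine would jump to @@LargeForwardMove. */
static bool broken_takes_forward(uint32_t src, uint32_t dst, uint32_t count)
{
    if (src > dst)                /* CMP EAX, EDX / JA */
        return true;
    uint32_t edx = dst - count;   /* SUB EDX, ECX: wraps to $FFxxxxxx if count > dst */
    return src <= edx;            /* CMP EAX, EDX / JNA (jumps if CF=1 or ZF=1) */
}

/* Overlap-safe choice: a forward copy is only wrong when dst lies inside
   (src, src + count). Comparing the distance avoids the underflow. */
static bool fixed_takes_forward(uint32_t src, uint32_t dst, uint32_t count)
{
    return (uint32_t)(dst - src) >= count;
}
```

Take source $00800000, destination $01000000 and count $01800000: the count is larger than the destination address, and the buffers overlap. In that case the modelled test picks the forward loop, which overwrites source bytes before they are read. The distance-based test picks the backward loop. For ordinary addresses, both tests agree.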
References
- Intel EFLAGS Cross-Reference and Condition Codes
- CMP operation
- SUB operation