Insert bytes into middle of a file (in windows

Posted 2019-01-22 06:52

Question:

I need a way to insert some file clusters into the middle of a file to insert some data.

Normally, I would just read the entire file and write it back out again with the changes, but the files are multiple gigabytes in size, and it takes 30 minutes just to read the file and write it back out again.
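
For reference, the rewrite described here can at least be limited to the bytes after the insertion point. The following is a minimal sketch in portable C, not a fix for the underlying problem: `insert_bytes` is a hypothetical helper, it still copies the whole tail of the file, and it uses `long` offsets, so a multi-gigabyte version would need 64-bit seeks.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Insert `len` bytes from `data` at byte offset `pos` of a file opened for
 * update ("r+b" or "w+b"). Only the bytes after `pos` are rewritten.
 * Returns 0 on success, -1 on error. Hypothetical helper for illustration. */
int insert_bytes(FILE *f, long pos, const void *data, long len)
{
    enum { CHUNK = 1 << 16 };
    char *buf;
    long size, remaining;

    if (fseek(f, 0, SEEK_END) != 0) return -1;
    size = ftell(f);
    if (size < 0 || pos < 0 || pos > size || len < 0) return -1;

    buf = malloc(CHUNK);
    if (!buf) return -1;

    /* Move the tail from back to front so no unread byte is overwritten. */
    remaining = size - pos;
    while (remaining > 0) {
        long n = remaining < CHUNK ? remaining : CHUNK;
        long src = pos + remaining - n;
        if (fseek(f, src, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)n, f) != (size_t)n ||
            fseek(f, src + len, SEEK_SET) != 0 ||
            fwrite(buf, 1, (size_t)n, f) != (size_t)n) {
            free(buf);
            return -1;
        }
        remaining -= n;
    }
    free(buf);

    /* Write the new bytes into the gap that was opened up. */
    if (fseek(f, pos, SEEK_SET) != 0) return -1;
    if (fwrite(data, 1, (size_t)len, f) != (size_t)len) return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

The cost is still proportional to the amount of data after the insertion point, which is why the question asks for a filesystem-level alternative.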

The cluster size doesn't bother me; I can essentially write out zeroes to the end of my inserted clusters, and it will still work in this file format.
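
The padding arithmetic implied here is small enough to write down. A sketch, assuming the cluster size is known (4096 is a common NTFS default, but that is an assumption, not something stated in the question); both function names are made up for illustration.

```c
#include <stdint.h>

/* Round `len` up to the next multiple of `cluster_size` (must be > 0). */
uint64_t round_up_to_cluster(uint64_t len, uint32_t cluster_size)
{
    return (len + cluster_size - 1) / cluster_size * cluster_size;
}

/* Zero bytes needed after `len` payload bytes to fill the last cluster. */
uint64_t zero_padding(uint64_t len, uint32_t cluster_size)
{
    return round_up_to_cluster(len, cluster_size) - len;
}
```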

How would I use the Windows File API (or some other mechanism) to modify the File Allocation Table of a file, inserting one or more unused clusters at a specified point in the middle of the file?

Answer 1:

[EDIT:]

Blah - I'm going to say "this ain't doable, at least not via MFT modification, without a LOT of pain"; first off, the NTFS MFT structures themselves are not 100% "open", so I'm starting to delve into reverse-engineering-territory, which has legal repercussions I'm in no mood to deal with. Also, doing this in .NET is a hyper-tedious process of mapping and marshalling structures based on a lot of guesswork (and don't get me started on the fact that most of the MFT structures are compressed in strange ways). Short story, while I did learn an awful lot about how NTFS "works", I'm no closer to a solution to this problem.

[/EDIT]

Ugh...sooo much Marshalling nonsense....

This struck me as "interesting", therefore I was compelled to poke around at the problem...it's still an "answer-in-progress", but wanted to post up what all I had to assist others in coming up with something. :)

Also, I have a rough sense that this would be FAR easier on FAT32, but given I've only got NTFS to work with...

So - lots of pinvoking and marshalling, so let's start there and work backwards:

As one might guess, the standard .NET File/IO apis aren't going to help you much here - we need device-level access:

[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
static extern SafeFileHandle CreateFile(
    string lpFileName,
    [MarshalAs(UnmanagedType.U4)] FileAccess dwDesiredAccess,
    [MarshalAs(UnmanagedType.U4)] FileShare dwShareMode,
    IntPtr lpSecurityAttributes,
    [MarshalAs(UnmanagedType.U4)] FileMode dwCreationDisposition,
    [MarshalAs(UnmanagedType.U4)] FileAttributes dwFlagsAndAttributes,
    IntPtr hTemplateFile);

[DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern bool ReadFile(
    SafeFileHandle hFile,      // handle to file
    byte[] pBuffer,        // data buffer, should be fixed
    int NumberOfBytesToRead,  // number of bytes to read
    IntPtr pNumberOfBytesRead,  // number of bytes read, provide NULL here
    ref NativeOverlapped lpOverlapped // should be fixed, if not null
);

[DllImport("Kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
public static extern bool SetFilePointerEx(
    SafeFileHandle hFile,
    long liDistanceToMove,
    out long lpNewFilePointer,
    SeekOrigin dwMoveMethod);

We'll use these nasty win32 beasts thusly:

// To the metal, baby!
using (var fileHandle = NativeMethods.CreateFile(
    // Magic "give me the device" syntax
    @"\\.\c:",
    // MUST explicitly provide both of these, not ReadWrite
    FileAccess.Read | FileAccess.Write,
    // MUST explicitly provide both of these, not ReadWrite
    FileShare.Write | FileShare.Read,
    IntPtr.Zero,
    FileMode.Open,
    FileAttributes.Normal,
    IntPtr.Zero))
{
    if (fileHandle.IsInvalid)
    {
        // Doh!
        throw new Win32Exception();
    }
    else
    {
        // Boot sector ~ 512 bytes long
        byte[] buffer = new byte[512];
        NativeOverlapped overlapped = new NativeOverlapped();
        NativeMethods.ReadFile(fileHandle, buffer, buffer.Length, IntPtr.Zero, ref overlapped);

        // Pin it so we can transmogrify it into a FAT structure
        var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // note, I've got an NTFS drive, change yours to suit
            var bootSector = (BootSector_NTFS)Marshal.PtrToStructure(
                 handle.AddrOfPinnedObject(), 
                 typeof(BootSector_NTFS));

Whoa, whoa whoa - what the heck is a BootSector_NTFS? It's a byte-mapped struct that fits as close as I can reckon to what the NTFS structure looks like (FAT32 included as well):

[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Ansi, Pack=0)]
public struct JumpBoot
{
    [MarshalAs(UnmanagedType.ByValArray, ArraySubType=UnmanagedType.U1, SizeConst=3)]
    public byte[] BS_jmpBoot;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=8)]
    public string BS_OEMName;
}

[StructLayout(LayoutKind.Explicit, CharSet = CharSet.Ansi, Pack = 0, Size = 512)]
public struct BootSector_NTFS
{
    [FieldOffset(0)]
    public JumpBoot JumpBoot;
    [FieldOffset(0xb)]
    public short BytesPerSector;
    [FieldOffset(0xd)]
    public byte SectorsPerCluster;
    [FieldOffset(0xe)]
    public short ReservedSectorCount;
    [FieldOffset(0x10)]
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
    public byte[] Reserved0_MUSTBEZEROs;
    [FieldOffset(0x15)]
    public byte BPB_Media;
    [FieldOffset(0x16)]
    public short Reserved1_MUSTBEZERO;
    [FieldOffset(0x18)]
    public short SectorsPerTrack;
    [FieldOffset(0x1A)]
    public short HeadCount;
    [FieldOffset(0x1c)]
    public int HiddenSectorCount;
    [FieldOffset(0x20)]
    public int LargeSectors;
    [FieldOffset(0x24)]
    public int Reserved6;
    [FieldOffset(0x28)]
    public long TotalSectors;
    [FieldOffset(0x30)]
    public long MftClusterNumber;
    [FieldOffset(0x38)]
    public long MftMirrorClusterNumber;
    [FieldOffset(0x40)]
    public byte ClustersPerMftRecord;
    [FieldOffset(0x41)]
    public byte Reserved7;
    [FieldOffset(0x42)]
    public short Reserved8;
    [FieldOffset(0x44)]
    public byte ClustersPerIndexBuffer;
    [FieldOffset(0x45)]
    public byte Reserved9;
    [FieldOffset(0x46)]
    public short ReservedA;
    [FieldOffset(0x48)]
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
    public byte[] SerialNumber;
    [FieldOffset(0x50)]
    public int Checksum;
    [FieldOffset(0x54)]
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 0x1AA)]
    public byte[] BootupCode;
    [FieldOffset(0x1FE)]
    public ushort EndOfSectorMarker;

    public long GetMftAbsoluteIndex(int recordIndex = 0)
    {
        return (BytesPerSector * SectorsPerCluster * MftClusterNumber) + (GetMftEntrySize() * recordIndex);
    }
    public long GetMftEntrySize()
    {
        return (BytesPerSector * SectorsPerCluster * ClustersPerMftRecord);
    }
}


// Note: don't have fat32, so can't verify all these...they *should* work, tho
// refs:
//    http://www.pjrc.com/tech/8051/ide/fat32.html
//    http://msdn.microsoft.com/en-US/windows/hardware/gg463084
[StructLayout(LayoutKind.Explicit, CharSet=CharSet.Auto, Pack=0, Size=90)]
public struct BootSector_FAT32
{
    [FieldOffset(0)]
    public JumpBoot JumpBoot;    
    [FieldOffset(11)]
    public short BPB_BytsPerSec;
    [FieldOffset(13)]
    public byte BPB_SecPerClus;
    [FieldOffset(14)]
    public short BPB_RsvdSecCnt;
    [FieldOffset(16)]
    public byte BPB_NumFATs;
    [FieldOffset(17)]
    public short BPB_RootEntCnt;
    [FieldOffset(19)]
    public short BPB_TotSec16;
    [FieldOffset(21)]
    public byte BPB_Media;
    [FieldOffset(22)]
    public short BPB_FATSz16;
    [FieldOffset(24)]
    public short BPB_SecPerTrk;
    [FieldOffset(26)]
    public short BPB_NumHeads;
    [FieldOffset(28)]
    public int BPB_HiddSec;
    [FieldOffset(32)]
    public int BPB_TotSec32;
    [FieldOffset(36)]
    public FAT32 FAT;
}

[StructLayout(LayoutKind.Sequential)]
public struct FAT32
{
    public int BPB_FATSz32;
    public short BPB_ExtFlags;
    public short BPB_FSVer;
    public int BPB_RootClus;
    public short BPB_FSInfo;
    public short BPB_BkBootSec;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst=12)]
    public byte[] BPB_Reserved;
    public byte BS_DrvNum;
    public byte BS_Reserved1;
    public byte BS_BootSig;
    public int BS_VolID;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=11)] 
    public string BS_VolLab;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=8)] 
    public string BS_FilSysType;
}

So now we can map a whole mess'o'bytes back to this structure:

// Pin it so we can transmogrify it into a FAT structure
var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {            
        // note, I've got an NTFS drive, change yours to suit
        var bootSector = (BootSector_NTFS)Marshal.PtrToStructure(
              handle.AddrOfPinnedObject(), 
              typeof(BootSector_NTFS));
        Console.WriteLine(
            "I think that the Master File Table is at absolute position:{0}, sector:{1}", 
            bootSector.GetMftAbsoluteIndex(),
            bootSector.GetMftAbsoluteIndex() / bootSector.BytesPerSector);

Which at this point outputs:

I think that the Master File Table is at 
absolute position:3221225472, sector:6291456

Let's confirm that quick using the OEM support tool nfi.exe:

C:\tools\OEMTools\nfi>nfi c:
NTFS File Sector Information Utility.
Copyright (C) Microsoft Corporation 1999. All rights reserved.


File 0
Master File Table ($Mft)
    $STANDARD_INFORMATION (resident)
    $FILE_NAME (resident)
    $DATA (nonresident)
        logical sectors 6291456-6487039 (0x600000-0x62fbff)
        logical sectors 366267960-369153591 (0x15d4ce38-0x1600d637)
    $BITMAP (nonresident)
        logical sectors 6291448-6291455 (0x5ffff8-0x5fffff)
        logical sectors 7273984-7274367 (0x6efe00-0x6eff7f)

Cool, looks like we're on the right track...onward!
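
The arithmetic behind that offset can be checked in isolation. Below is a minimal C sketch, assuming 512-byte sectors (which the printed position and sector number imply) and 8 sectors per cluster (an assumption; the post does not show that value). One detail worth knowing: as far as I can tell from open-source NTFS documentation, the boot sector field at 0x40 is a signed byte, and a negative value n means a record is 2^(-n) bytes rather than a cluster count. Treating it as an unsigned cluster count, as GetMftEntrySize does, may therefore overestimate the record size on typical volumes.

```c
#include <stdint.h>

/* Byte offset of the start of $MFT: cluster number times cluster size. */
uint64_t mft_byte_offset(uint16_t bytes_per_sector,
                         uint8_t sectors_per_cluster,
                         uint64_t mft_cluster)
{
    return (uint64_t)bytes_per_sector * sectors_per_cluster * mft_cluster;
}

/* Size in bytes of one MFT record, honouring the signed encoding:
 * positive = clusters per record, negative n = 2^(-n) bytes.
 * Returns 0 for values this sketch does not treat as valid. */
uint32_t mft_record_size(uint16_t bytes_per_sector,
                         uint8_t sectors_per_cluster,
                         int8_t clusters_per_record)
{
    if (clusters_per_record > 0)
        return (uint32_t)bytes_per_sector * sectors_per_cluster
               * (uint32_t)clusters_per_record;
    if (clusters_per_record < 0 && clusters_per_record > -32)
        return (uint32_t)1 << (-clusters_per_record);
    return 0;
}
```

With an MFT cluster number of 786432 this reproduces the position printed above; a field value of -10 (0xF6 as a raw byte) gives a 1024-byte record.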

            // If you've got LinqPad, uncomment this to look at boot sector
            bootSector.Dump();

    Console.WriteLine("Jumping to Master File Table...");
    long lpNewFilePointer;
    if (!NativeMethods.SetFilePointerEx(
            fileHandle, 
            bootSector.GetMftAbsoluteIndex(), 
            out lpNewFilePointer, 
            SeekOrigin.Begin))
    {
        throw new Win32Exception();
    }
    Console.WriteLine("Position now: {0}", lpNewFilePointer);

    // Read in one MFT entry
    byte[] mft_buffer = new byte[bootSector.GetMftEntrySize()];
    Console.WriteLine("Reading $MFT entry...calculated size: 0x{0}",
       bootSector.GetMftEntrySize().ToString("X"));

    var seekIndex = bootSector.GetMftAbsoluteIndex();
    overlapped.OffsetHigh = (int)(seekIndex >> 32);
    overlapped.OffsetLow = (int)seekIndex;
    NativeMethods.ReadFile(
          fileHandle, 
          mft_buffer, 
          mft_buffer.Length, 
          IntPtr.Zero, 
          ref overlapped);
    // Pin it for transmogrification
    var mft_handle = GCHandle.Alloc(mft_buffer, GCHandleType.Pinned);
    try
    {
        var mftRecords = (MFTSystemRecords)Marshal.PtrToStructure(
              mft_handle.AddrOfPinnedObject(), 
              typeof(MFTSystemRecords));
        mftRecords.Dump();
    }
    finally
    {
        // make sure we clean up
        mft_handle.Free();
    }
}
finally
{
    // make sure we clean up
    handle.Free();
}

Argh, more native structures to discuss - so the MFT is arranged such that the first 16 or so entries are "fixed":

[StructLayout(LayoutKind.Sequential)]
public struct MFTSystemRecords
{
    public MFTRecord Mft;
    public MFTRecord MftMirror;
    public MFTRecord LogFile;
    public MFTRecord Volume;
    public MFTRecord AttributeDefs;
    public MFTRecord RootFile;
    public MFTRecord ClusterBitmap;
    public MFTRecord BootSector;
    public MFTRecord BadClusterFile;
    public MFTRecord SecurityFile;
    public MFTRecord UpcaseTable;
    public MFTRecord ExtensionFile;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    public MFTRecord[] MftReserved;
    public MFTRecord MftFileExt;
}

Where MFTRecord is:

[StructLayout(LayoutKind.Sequential, Size = 1024)]
public struct MFTRecord
{
    const int BASE_RECORD_SIZE = 48;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 4)]
    public string Type;
    public short UsaOffset;
    public short UsaCount;
    public long Lsn;  /* $LogFile sequence number for this record. Changed every time the record is modified. */
    public short SequenceNumber; /* # of times this record has been reused */
    public short LinkCount;  /* Number of hard links, i.e. the number of directory entries referencing this record. */
    public short AttributeOffset; /* Byte offset to the first attribute in this mft record from the start of the mft record. */
    public short MftRecordFlags;
    public int BytesInUse;
    public int BytesAllocated;
    public long BaseFileRecord;
    public short NextAttributeNumber;
    public short Reserved;
    public int MftRecordNumber;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 976)]
    public byte[] Data;
    public byte[] SetData
    {
        get
        {
            return this.Data
               .Skip(AttributeOffset - BASE_RECORD_SIZE)
               .Take(BytesInUse - BASE_RECORD_SIZE)
               .ToArray();
        }
    }
    public MftAttribute[] Attributes
    {
        get
        {
            var idx = 0;
            var ret = new List<MftAttribute>();
            while (idx < SetData.Length)
            {
                var attr = MftAttribute.FromBytes(SetData.Skip(idx).ToArray());
                ret.Add(attr);
                idx += attr.Attribute.Length;
                // A special "END" attribute denotes the end of the list
                if (attr.Attribute.AttributeType == MftAttributeType.AT_END) break;
            }
            return ret.ToArray();
        }
    }
}
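
The attribute walk in the Attributes getter is easier to see without the marshalling. Here is a rough equivalent in plain C, assuming each attribute starts with a little-endian 32-bit type followed by a 32-bit length, and that the list ends at a type of 0xFFFFFFFF (the same assumptions the C# above makes). `list_attribute_types` is a hypothetical name; the extra bounds checks are there because a zero length would otherwise loop forever.

```c
#include <stddef.h>
#include <stdint.h>

#define AT_END 0xFFFFFFFFu

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the attribute headers in one MFT record buffer, starting at
 * `first_attr` (the record's AttributeOffset). Stores up to `max` type codes
 * in `types` and returns how many attributes were found before the end
 * marker, or -1 if the list is malformed. */
int list_attribute_types(const uint8_t *rec, size_t rec_len, size_t first_attr,
                         uint32_t *types, int max)
{
    size_t off = first_attr;
    int count = 0;

    while (off + 4 <= rec_len) {
        uint32_t type = le32(rec + off);
        uint32_t len;

        if (type == AT_END)
            return count;            /* end marker: stop before reading a length */
        if (off + 8 > rec_len)
            return -1;
        len = le32(rec + off + 4);
        /* A zero or oversized length would loop forever or run off the end. */
        if (len < 16 || len > rec_len - off)
            return -1;
        if (count < max)
            types[count] = type;
        count++;
        off += len;
    }
    return -1;                       /* ran out of buffer before the end marker */
}
```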

And...here's where I peter out for now; mainly because I want to eat dinner and such. I will come back to this, however!

References (partially for my own memory, partially to assist other investigators)

  • http://ntfs.com/ntfs-mft.htm
  • http://technet.microsoft.com/en-us/library/cc781134%28WS.10%29.aspx
  • http://waynes-world-it.blogspot.com/2008/03/viewing-ntfs-information-with-nfi-and.html
  • http://en.wikipedia.org/wiki/NTFS
  • http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx#win32_device_namespaces
  • http://www.pjrc.com/tech/8051/ide/fat32.html
  • http://msdn.microsoft.com/en-us/library/aa364572(VS.85).aspx

Full code dump a'following:

All the native mappings I glossed over above (due to post size limitations, not a full rehash):

public enum MftRecordFlags : ushort
{
    MFT_RECORD_IN_USE = 0x0001,
    MFT_RECORD_IS_DIRECTORY = 0x0002,
    MFT_RECORD_IN_EXTEND = 0x0004,
    MFT_RECORD_IS_VIEW_INDEX = 0x0008,
    MFT_REC_SPACE_FILLER = 0xffff
}
public enum MftAttributeType : uint
{
    AT_UNUSED = 0,
    AT_STANDARD_INFORMATION = 0x10,
    AT_ATTRIBUTE_LIST = 0x20,
    AT_FILENAME = 0x30,
    AT_OBJECT_ID = 0x40,
    AT_SECURITY_DESCRIPTOR = 0x50,
    AT_VOLUME_NAME = 0x60,
    AT_VOLUME_INFORMATION = 0x70,
    AT_DATA = 0x80,
    AT_INDEX_ROOT = 0x90,
    AT_INDEX_ALLOCATION = 0xa0,
    AT_BITMAP = 0xb0,
    AT_REPARSE_POINT = 0xc0,
    AT_EA_INFORMATION = 0xd0,
    AT_EA = 0xe0,
    AT_PROPERTY_SET = 0xf0,
    AT_LOGGED_UTILITY_STREAM = 0x100,
    AT_FIRST_USER_DEFINED_ATTRIBUTE = 0x1000,
    AT_END = 0xffffffff
}

public enum MftAttributeDefFlags : byte
{
    ATTR_DEF_INDEXABLE = 0x02, /* Attribute can be indexed. */
    ATTR_DEF_MULTIPLE = 0x04, /* Attribute type can be present multiple times in the mft records of an inode. */
    ATTR_DEF_NOT_ZERO = 0x08, /* Attribute value must contain at least one non-zero byte. */
    ATTR_DEF_INDEXED_UNIQUE = 0x10, /* Attribute must be indexed and the attribute value must be unique for the attribute type in all of the mft records of an inode. */
    ATTR_DEF_NAMED_UNIQUE = 0x20, /* Attribute must be named and the name must be unique for the attribute type in all of the mft records of an inode. */
    ATTR_DEF_RESIDENT = 0x40, /* Attribute must be resident. */
    ATTR_DEF_ALWAYS_LOG = 0x80, /* Always log modifications to this attribute, regardless of whether it is resident or
                non-resident.  Without this, only log modifications if the attribute is resident. */
}

[StructLayout(LayoutKind.Explicit)]
public struct MftInternalAttribute
{
    [FieldOffset(0)]
    public MftAttributeType AttributeType;
    [FieldOffset(4)]
    public int Length;
    [FieldOffset(8)]
    [MarshalAs(UnmanagedType.U1)] // one byte on disk; Bool would marshal as four
    public bool NonResident;
    [FieldOffset(9)]
    public byte NameLength;
    [FieldOffset(10)]
    public short NameOffset;
    [FieldOffset(12)]
    public short AttributeFlags; // two bytes; an int here would overlap Instance
    [FieldOffset(14)]
    public short Instance;
    [FieldOffset(16)]
    public ResidentAttribute ResidentAttribute;
    [FieldOffset(16)]
    public NonResidentAttribute NonResidentAttribute;
}

[StructLayout(LayoutKind.Sequential)]
public struct ResidentAttribute
{
    public int ValueLength;
    public short ValueOffset;
    public byte ResidentAttributeFlags;
    public byte Reserved;

    public override string ToString()
    {
        return string.Format("{0}:{1}:{2}:{3}", ValueLength, ValueOffset, ResidentAttributeFlags, Reserved);
    }
}
[StructLayout(LayoutKind.Sequential)]
public struct NonResidentAttribute
{
    public long LowestVcn;
    public long HighestVcn;
    public short MappingPairsOffset;
    public byte CompressionUnit;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
    public byte[] Reserved;
    public long AllocatedSize;
    public long DataSize;
    public long InitializedSize;
    public long CompressedSize;
    public override string ToString()
    {
        return string.Format("{0}:{1}:{2}:{3}:{4}:{5}:{6}:{7}", LowestVcn, HighestVcn, MappingPairsOffset, CompressionUnit, AllocatedSize, DataSize, InitializedSize, CompressedSize);
    }
}

public struct MftAttribute
{
    public MftInternalAttribute Attribute;

    [field: NonSerialized]
    public string Name;

    [field: NonSerialized]
    public byte[] Data;

    [field: NonSerialized]
    public object Payload;

    public static MftAttribute FromBytes(byte[] buffer)
    {
        var hnd = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            var attr = (MftInternalAttribute)Marshal.PtrToStructure(hnd.AddrOfPinnedObject(), typeof(MftInternalAttribute));
            var ret = new MftAttribute() { Attribute = attr };
            ret.Data = buffer.Skip(Marshal.SizeOf(attr)).Take(attr.Length).ToArray();
            if (ret.Attribute.AttributeType == MftAttributeType.AT_STANDARD_INFORMATION)
            {
                var payloadHnd = GCHandle.Alloc(ret.Data, GCHandleType.Pinned);
                try
                {
                    var payload = (MftStandardInformation)Marshal.PtrToStructure(payloadHnd.AddrOfPinnedObject(), typeof(MftStandardInformation));
                    ret.Payload = payload;
                }
                finally
                {
                    payloadHnd.Free();
                }
            }
            return ret;
        }
        finally
        {
            hnd.Free();
        }
    }
}

[StructLayout(LayoutKind.Sequential)]
public struct MftStandardInformation
{
    public ulong CreationTime;
    public ulong LastDataChangeTime;
    public ulong LastMftChangeTime;
    public ulong LastAccessTime;
    public int FileAttributes;
    public int MaximumVersions;
    public int VersionNumber;
    public int ClassId;
    public int OwnerId;
    public int SecurityId;
    public long QuotaChanged;
    public long Usn;
}

And the test harness:

class Program
{        
    static void Main(string[] args)
    {
        // To the metal, baby!
        using (var fileHandle = NativeMethods.CreateFile(
            // Magic "give me the device" syntax
            @"\\.\c:",
            // MUST explicitly provide both of these, not ReadWrite
            FileAccess.Read | FileAccess.Write,
            // MUST explicitly provide both of these, not ReadWrite
            FileShare.Write | FileShare.Read,
            IntPtr.Zero,
            FileMode.Open,
            FileAttributes.Normal,
            IntPtr.Zero))
        {
            if (fileHandle.IsInvalid)
            {
                // Doh!
                throw new Win32Exception();
            }
            else
            {
                // Boot sector ~ 512 bytes long
                byte[] buffer = new byte[512];
                NativeOverlapped overlapped = new NativeOverlapped();
                NativeMethods.ReadFile(fileHandle, buffer, buffer.Length, IntPtr.Zero, ref overlapped);

                // Pin it so we can transmogrify it into a FAT structure
                var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
                try
                {
                    // note, I've got an NTFS drive, change yours to suit
                    var bootSector = (BootSector_NTFS)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(BootSector_NTFS));
                    Console.WriteLine(
                        "I think that the Master File Table is at absolute position:{0}, sector:{1}",
                        bootSector.GetMftAbsoluteIndex(),
                        bootSector.GetMftAbsoluteIndex() / bootSector.BytesPerSector);
                    Console.WriteLine("MFT record size:{0}", bootSector.ClustersPerMftRecord * bootSector.SectorsPerCluster * bootSector.BytesPerSector);

                    // If you've got LinqPad, uncomment this to look at boot sector
                    bootSector.DumpToHtmlString();

                    Pause();

                    Console.WriteLine("Jumping to Master File Table...");
                    long lpNewFilePointer;
                    if (!NativeMethods.SetFilePointerEx(fileHandle, bootSector.GetMftAbsoluteIndex(), out lpNewFilePointer, SeekOrigin.Begin))
                    {
                        throw new Win32Exception();
                    }
                    Console.WriteLine("Position now: {0}", lpNewFilePointer);

                    // Read in one MFT entry
                    byte[] mft_buffer = new byte[bootSector.GetMftEntrySize()];
                    Console.WriteLine("Reading $MFT entry...calculated size: 0x{0}", bootSector.GetMftEntrySize().ToString("X"));

                    var seekIndex = bootSector.GetMftAbsoluteIndex();
                    overlapped.OffsetHigh = (int)(seekIndex >> 32);
                    overlapped.OffsetLow = (int)seekIndex;
                    NativeMethods.ReadFile(fileHandle, mft_buffer, mft_buffer.Length, IntPtr.Zero, ref overlapped);
                    // Pin it for transmogrification
                    var mft_handle = GCHandle.Alloc(mft_buffer, GCHandleType.Pinned);
                    try
                    {
                        var mftRecords = (MFTSystemRecords)Marshal.PtrToStructure(mft_handle.AddrOfPinnedObject(), typeof(MFTSystemRecords));
                        mftRecords.DumpToHtmlString();
                    }
                    finally
                    {
                        // make sure we clean up
                        mft_handle.Free();
                    }
                }
                finally
                {
                    // make sure we clean up
                    handle.Free();
                }
            }
        }
        Pause();
    }

    private static void Pause()
    {
        Console.WriteLine("Press enter to continue...");
        Console.ReadLine();
    }
}


public static class Dumper
{
    public static string DumpToHtmlString<T>(this T objectToSerialize)
    {
        string strHTML = "";
        try
        {
            var writer = LINQPad.Util.CreateXhtmlWriter(true);
            writer.Write(objectToSerialize);
            strHTML = writer.ToString();
        }
        catch (Exception exc)
        {
            Debug.Assert(false, "Investigate why ?" + exc);
        }

        var shower = new Thread(
            () =>
                {
                    var dumpWin = new Window();
                    var browser = new WebBrowser();
                    dumpWin.Content = browser;
                    browser.NavigateToString(strHTML);
                    dumpWin.ShowDialog();                        
                });
        shower.SetApartmentState(ApartmentState.STA);
        shower.Start();
        return strHTML;
    }

    public static string Dump(this object value)
    {
         return JsonConvert.SerializeObject(value, Formatting.Indented);
    }
}


Answer 2:

Robert, I don't think that what you want to achieve is really possible to do without actively manipulating file system data structures for a file system which, from the sounds of it, is mounted. I don't think I have to tell you how dangerous and unwise this sort of exercise is.

But if you need to do it, I guess I can give you a "sketch on the back of a napkin" to get you started:

You could leverage the "sparse file" support of NTFS to simply add "gaps" by tweaking the LCN/VCN mappings. Once you do, just open the file, seek to the new location and write your data. NTFS will transparently allocate the space and write the data in the middle of the file, where you created a hole.
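
To make the LCN/VCN mappings a little more concrete: on disk, NTFS is generally described (in open-source driver documentation, not an official spec) as storing them as a packed list of "mapping pairs", where each run has a header byte giving the byte widths of a length and a signed LCN delta, and a run with no delta is a sparse hole. The sketch below decodes such a list under those assumptions; `extent_t` and `decode_runs` are names made up for this example, and it only reads, it does not modify anything.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t vcn;      /* first virtual cluster of the run */
    uint64_t length;   /* run length in clusters */
    int64_t  lcn;      /* first logical cluster, or -1 for a sparse hole */
} extent_t;

/* Decode a mapping-pairs array into extents. Returns the number of runs
 * decoded, or -1 if the input is malformed or `out` is too small. */
int decode_runs(const uint8_t *p, size_t n, extent_t *out, int max)
{
    size_t i = 0;
    uint64_t vcn = 0;
    int64_t lcn = 0;
    int count = 0;

    while (i < n && p[i] != 0) {             /* a zero header ends the list */
        unsigned len_sz = p[i] & 0x0F;       /* low nibble: length width */
        unsigned off_sz = p[i] >> 4;         /* high nibble: LCN delta width */
        uint64_t length = 0;
        unsigned k;

        i++;
        if (len_sz == 0 || len_sz > 8 || off_sz > 8 ||
            i + len_sz + off_sz > n || count >= max)
            return -1;

        for (k = 0; k < len_sz; k++)
            length |= (uint64_t)p[i + k] << (8 * k);
        i += len_sz;

        if (off_sz > 0) {
            uint64_t raw = 0;
            for (k = 0; k < off_sz; k++)
                raw |= (uint64_t)p[i + k] << (8 * k);
            /* Sign-extend: the delta is relative to the previous run's LCN. */
            if (off_sz < 8 && (raw & ((uint64_t)1 << (8 * off_sz - 1))))
                raw |= ~(uint64_t)0 << (8 * off_sz);
            lcn += (int64_t)raw;
            i += off_sz;
        }

        out[count].vcn = vcn;
        out[count].length = length;
        out[count].lcn = off_sz > 0 ? lcn : -1;
        count++;
        vcn += length;
    }
    return count;
}
```

A "gap" of the kind described above would show up here as an extent whose lcn is -1.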

For more, look at this page about defragmentation support in NTFS for hints on how you can manipulate things a bit and allow you to insert clusters in the middle of the file. At least by using the sanctioned API for this sort of thing, you are unlikely to corrupt the filesystem beyond repair, although you can still horribly hose your file, I guess.

Get the retrieval pointers for the file that you want, split them where you need, to add as much extra space as you need, and move the file. There's an interesting chapter on this sort of thing in the Russinovich/Ionescu "Windows Internals" book (http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-Developer/dp/0735625301)
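
The "split them where you need" step can be modelled on an in-memory extent list. This is a sketch of the bookkeeping only: on a real volume the extents would come from FSCTL_GET_RETRIEVAL_POINTERS and clusters would be relocated with FSCTL_MOVE_FILE, and as far as I know neither call lets you renumber a file's VCNs, which is exactly the step this model performs. `extent_t` and `insert_extent` are hypothetical names, and the list is assumed to start at VCN 0 with no gaps.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t vcn;     /* first virtual cluster covered by this extent */
    uint64_t lcn;     /* first logical (on-disk) cluster */
    uint64_t length;  /* clusters in the extent */
} extent_t;

/* Insert `new_len` clusters located at `new_lcn` so that they begin at
 * virtual cluster `at_vcn`. An extent straddling `at_vcn` is split, and
 * everything from `at_vcn` onwards is renumbered. `ext` must have room for
 * n + 2 entries. Returns the new extent count, or -1 if `at_vcn` lies
 * beyond the end of the mapping. */
int insert_extent(extent_t *ext, int n, uint64_t at_vcn,
                  uint64_t new_lcn, uint64_t new_len)
{
    int i, j;

    /* Find the extent that contains (or starts at) at_vcn. */
    for (i = 0; i < n; i++)
        if (at_vcn < ext[i].vcn + ext[i].length)
            break;

    if (i == n) {
        /* Appending is only valid exactly at the end of the mapping. */
        uint64_t end = n > 0 ? ext[n - 1].vcn + ext[n - 1].length : 0;
        if (at_vcn != end)
            return -1;
    } else if (at_vcn > ext[i].vcn) {
        /* Split ext[i] into a head [vcn, at_vcn) and a tail [at_vcn, ...). */
        uint64_t head = at_vcn - ext[i].vcn;
        for (j = n; j > i + 1; j--)
            ext[j] = ext[j - 1];
        ext[i + 1].vcn = at_vcn;
        ext[i + 1].lcn = ext[i].lcn + head;
        ext[i + 1].length = ext[i].length - head;
        ext[i].length = head;
        n++;
        i++;
    }

    /* Open a slot at index i and push later extents up by new_len VCNs. */
    for (j = n; j > i; j--) {
        ext[j] = ext[j - 1];
        ext[j].vcn += new_len;
    }
    ext[i].vcn = at_vcn;
    ext[i].lcn = new_lcn;
    ext[i].length = new_len;
    return n + 1;
}
```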



Answer 3:

Abstract question, abstract answer:

It is certainly possible to do this in FAT, and probably in most other file systems. You would essentially be fragmenting the file, rather than performing the more common process of defragmenting.

FAT is organized around cluster pointers, which form a chain of cluster numbers where the data is stored. The first link's index is stored with the file record, the second is stored in the allocation table at index [the first link's number], and so on. It's possible to insert another link anywhere in the chain, as long as the data you're inserting ends at a cluster boundary.
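
The chain manipulation itself is an ordinary linked-list insert. A minimal sketch on an in-memory copy of a FAT32-style table, assuming 28-bit entries with values of 0x0FFFFFF8 and above marking the end of a chain; the function names are made up, and nothing here reads or writes a real volume.

```c
#include <stdint.h>

#define FAT_EOC 0x0FFFFFFFu   /* one of the FAT32 end-of-chain values */

/* Link `new_cluster` into a chain right after `prev`.
 * `fat` is an in-memory copy of the allocation table. */
void fat_insert_after(uint32_t *fat, uint32_t prev, uint32_t new_cluster)
{
    fat[new_cluster] = fat[prev];  /* new cluster inherits the old successor */
    fat[prev] = new_cluster;       /* predecessor now points at the new one */
}

/* Follow a chain from `first`, writing cluster numbers to `out`.
 * Returns the number of entries stored (at most `max`). */
int fat_walk(const uint32_t *fat, uint32_t first, uint32_t *out, int max)
{
    int n = 0;
    uint32_t c = first;

    while (c < 0x0FFFFFF8u && n < max) {
        out[n++] = c;
        c = fat[c] & 0x0FFFFFFFu;  /* the top four bits are reserved */
    }
    return n;
}
```

For a chain 2, 3, 4, calling fat_insert_after(fat, 3, 9) yields 2, 3, 9, 4. The file size in the directory entry would also have to grow by one cluster, and a second FAT copy, if present, would need the same change.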

Chances are you'll have a much easier time doing this in C by finding an open source library. While it's probably possible to do it in C# with P/Invoke, you won't find any good sample code floating around to get you started.

I suspect you don't have any control over the file format (video files?). If you do, it would be much easier to design your data storage to avoid the problem in the first place.



Answer 4:

No. What you are asking is not directly possible in Windows.

This is because in Windows, files are a logically contiguous collection of bytes, and it is not possible to insert bytes into the middle of the file without overwriting.

To understand why, let's conduct a thought experiment of what it would mean if it were possible.

Firstly, memory mapped files would suddenly become much more complicated. If we've mapped a file at a particular address, and then put some extra bytes in the middle of it, what would that mean for the memory mapping? Should the memory mapping now suddenly move? And if so, what happens to the program that doesn't expect it to?

Secondly, let's consider what happens to a saved file pointer if two handles are open to the same file, and one inserts extra bytes in the middle of that file. Let's suppose Process A has the file open for reading, and Process B has it open for reading and writing.

Process A wants to save its location whilst doing a few reads, so it writes some code a bit like

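#include <windows.h>

DWORD DoAndThenRewind(HANDLE hFile, FARPROC fp){
   DWORD result;
   LARGE_INTEGER zero = { 0 };
   LARGE_INTEGER li;
   // Moving by zero from FILE_CURRENT reports the current position in li
   SetFilePointerEx(hFile, zero, &li, FILE_CURRENT);

   result = (DWORD)fp();

   // Seek back to the saved position
   SetFilePointerEx(hFile, li, NULL, FILE_BEGIN);
   return result;
}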

Now what happens to this function if Process B wants to insert some extra bytes in the file? Well, if we add the bytes after where Process A currently is, all is fine - the file pointer (which is the linear address from the start of the file) remains the same before and after and all is well.

But if we add extra bytes in before where Process A is, well, suddenly our captured file pointers are all misaligned, and bad things start to happen.
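
Put as arithmetic, here is where a saved offset would have to move to keep pointing at the same byte. This is a toy model of the argument, not an API; `offset_after_insert` is a made-up name.

```c
#include <stdint.h>

/* Where a byte that used to live at `saved` ends up after `len` bytes are
 * inserted at `insert_pos`. Offsets before the insertion point are
 * unaffected; everything at or after it shifts by `len`. */
uint64_t offset_after_insert(uint64_t saved, uint64_t insert_pos, uint64_t len)
{
    return insert_pos <= saved ? saved + len : saved;
}
```

Process A has no way of knowing that it needs to apply this correction, which is the whole problem.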

Or to put it another way, adding bytes into the middle of the file means that we suddenly need to invent cleverer ways of describing where we are in the file for rewinding purposes, since files are no longer a logically contiguous collection of bytes.

So heretofore we've discussed why it's probably a bad idea for Windows to expose this kind of functionality; but that doesn't really answer the question "is it actually possible". The answer here is still no. It is not possible.

Why? Because no such functionality is exposed to user mode programs to do this. As a user-mode program you have one mechanism for getting a handle to a file (NtCreateFile/NtOpenFile), you can read and write to it via NtReadFile/NtWriteFile, you can seek it and rename it and delete it via NtSetInformationFile, and you can release the handle reference via NtClose.

Even from kernel mode, you don't have many more options. The filesystem API is abstracted away from you, and filesystems treat files as logically contiguous collections of bytes, not as linked lists of byte ranges or anything that would make it easy to expose a method for you to insert non-overwriting bytes in the middle of a file.

That's not to say that it's not possible per-se. As others have mentioned, it's possible for you to open up the disk itself, pretend to be NTFS and alter the disk clusters assigned to a particular FCB directly. But doing so is brave. NTFS is barely documented, is complex, subject to change, and difficult to modify even when it's not mounted by the OS, never mind when it is.

So the answer, I'm afraid, is no. It is not possible via normal safe Windows mechanisms to add extra bytes to the middle of a file as an insertion rather than as an overwriting operation.

Instead, consider whether it's appropriate for your problem to chunk up your files into smaller files and have an index file. That way you'll be able to modify the index file to insert extra chunks. By breaking your reliance on data needing to reside in one file, you'll find it easier to avoid the filesystem's requirement that a file is a logically contiguous collection of bytes. You'll then be able to add extra chunks to your "pseudofile" without needing to read the entire pseudofile into memory.
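
A rough sketch of the chunk-plus-index idea follows. The layout is an assumption made for the example, not a prescribed format: the index is a plain text file listing one chunk file name per line, and `ChunkedPseudofile`, `index.txt` and the chunk names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkedPseudofile
{
    // Reads the ordered list of chunk file names from the index (one name per line).
    public static List<string> LoadIndex(string indexPath)
    {
        return new List<string>(File.ReadAllLines(indexPath));
    }

    // Inserts a chunk at 'position' by writing one small file and rewriting only the index.
    public static void InsertChunk(string dir, string indexPath, int position,
                                   string chunkName, byte[] data)
    {
        File.WriteAllBytes(Path.Combine(dir, chunkName), data);
        List<string> index = LoadIndex(indexPath);
        index.Insert(position, chunkName);
        File.WriteAllLines(indexPath, index);
    }

    // Streams the logical pseudofile by visiting the chunks in index order.
    public static void CopyTo(string dir, string indexPath, Stream destination)
    {
        foreach (string name in LoadIndex(indexPath))
        {
            using (FileStream chunk = File.OpenRead(Path.Combine(dir, name)))
            {
                chunk.CopyTo(destination);
            }
        }
    }

    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string indexPath = Path.Combine(dir, "index.txt");

        File.WriteAllText(Path.Combine(dir, "a.bin"), "AAAA");
        File.WriteAllText(Path.Combine(dir, "b.bin"), "BBBB");
        File.WriteAllLines(indexPath, new[] { "a.bin", "b.bin" });

        // Insert two bytes between the existing chunks without touching either of them.
        InsertChunk(dir, indexPath, 1, "new.bin", Encoding.ASCII.GetBytes("xx"));

        using (var output = new MemoryStream())
        {
            CopyTo(dir, indexPath, output);
            Console.WriteLine(Encoding.ASCII.GetString(output.ToArray())); // AAAAxxBBBB
        }
        Directory.Delete(dir, true);
    }
}
```

The cost of an insertion is now proportional to the size of the index and the new chunk, not to the size of the whole pseudofile.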



Answer 5:

You don't need to (and probably can't) modify the file allocation table. You can achieve the same using a filter driver or a stackable FS. Let us consider a cluster size of 4K. I am merely writing out the design, for reasons I explain at the end.

  1. Creation of a new file will write a layout-map of the file in a header. The header will mention the number of entries and a list of entries. The size of the header will be the same as the size of the cluster. For simplicity let the header be of fixed size with 4K entries. For example, suppose there was a file of, say, 20KB; the header may mention: [DWORD:5][DWORD:1][DWORD:2][DWORD:3][DWORD:4][DWORD:5]. This file currently has had no insertions.

  2. Suppose someone inserts a cluster after cluster 3. You can add it to the end of the file (where it becomes cluster 6) and change the layout-map to: [6][1][2][3][6][4][5]

  3. Suppose someone needs to seek to cluster 4. You will need to access the layout-map, calculate the offset and then seek to it. It now comes after the first 4 clusters (1, 2, 3 and the inserted one), so it will start at 16K.

  4. Suppose someone reads or writes serially to the file. The reads and writes will have to map the same way.

  5. Suppose the header has only one more entry left: we will need to extend it by having a pointer to a new cluster at the end of the file using the same format as the other pointers above. To know that we have more than one cluster all we need to do is to look at the number of items and calculate the number of clusters that are needed to store it.
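
The bookkeeping in steps 1 to 3 can be sketched in a few lines. This is only an in-memory model of the layout-map, not a filter driver, and it makes assumptions the description above leaves open: the header occupies physical cluster 0, data clusters are numbered from 1, and clusters are never removed. The `LayoutMap` type and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class LayoutMap
{
    public const int ClusterSize = 4096;

    // entries[i] is the physical cluster that stores logical cluster i (0-based index).
    // Physical cluster 0 is assumed to hold the header, so data clusters start at 1.
    private readonly List<uint> entries = new List<uint>();

    public LayoutMap(uint clusterCount)
    {
        for (uint c = 1; c <= clusterCount; c++) entries.Add(c);
    }

    // Inserts a fresh cluster after the first 'logicalClusters' logical clusters.
    // The data itself is appended at the physical end of the file; only the map changes.
    public uint InsertAfter(int logicalClusters)
    {
        uint newPhysical = (uint)entries.Count + 1;
        entries.Insert(logicalClusters, newPhysical);
        return newPhysical;
    }

    // Translates a logical byte offset into a byte offset in the backing file.
    public long ToPhysicalOffset(long logicalOffset)
    {
        int logicalCluster = (int)(logicalOffset / ClusterSize);
        long within = logicalOffset % ClusterSize;
        return (long)entries[logicalCluster] * ClusterSize + within;
    }

    // Renders the header as the entry count followed by the entries.
    public override string ToString()
    {
        return "[" + entries.Count + "][" + string.Join("][", entries) + "]";
    }

    public static void Main()
    {
        var map = new LayoutMap(5);
        Console.WriteLine(map);                            // [5][1][2][3][4][5]
        map.InsertAfter(3);
        Console.WriteLine(map);                            // [6][1][2][3][6][4][5]
        Console.WriteLine(map.ToPhysicalOffset(3 * 4096)); // 24576 (the appended cluster)
        Console.WriteLine(map.ToPhysicalOffset(4 * 4096)); // 16384 (original cluster 4)
    }
}
```

Every read, write and seek that passes through the filter would go through something like `ToPhysicalOffset`, which is why the translation has to be cheap.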

You can implement all of the above using a filter driver on Windows or a stackable file-system (LKM) on Linux. Implementing the basic level of functionality is on the level of a grad-school mini project in difficulty. Getting this to work as a commercial filesystem can be quite challenging especially since you don't want to affect IO speeds.

Note that the above filter will not be affected by any change in disk layout / defragmentation etc. You can also defragment your own file if you think it will be helpful.



Answer 6:

Do you understand that it's nearly 99.99% impossible to insert non-aligned data in non-aligned places? (Maybe some hack based on compression can be used.) I think that you do.

The "easiest" solution is to create the sparse run records and then write over the sparse ranges.

  1. Do something with the NTFS cache. It's best to perform the operations on the offline/unmounted drive.
  2. Get the file record (@JerKimball's answer sounds helpful, but stops short of it). There may be problems if the file has overflowed with attributes and some of them are stored elsewhere.
  3. Get to the file's data run list. The data run concept and format is described here (http://inform.pucp.edu.pe/~inf232/Ntfs/ntfs_doc_v0.5/concepts/data_runs.html) and some other NTFS format data can be seen on the adjacent pages.
  4. Iterate through data runs, accumulating the file length, to find the correct insertion spot.
  5. You'll most probably find that your insertion point is in the middle of the run. You'll need to split the run which is not hard. (Just store away the two resulting runs for now.)
  6. Creating a sparse run record is very easy. It's just the run length (in clusters) preceded by a header byte whose lower 4 bits contain the byte size of the length field (the higher 4 bits should be zero to indicate a sparse run).
  7. Now you need to calculate how many additional bytes you have to insert in the data runs list, somehow make way for them and do the insertion/replacement.
  8. Then you need to fix the file size attribute to make it consistent with the runs.
  9. Finally you can mount the drive and write the inserted information over the sparse spots.
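
Step 6 is the only part small enough to show in isolation. The following sketch builds the bytes of a sparse run record as described there; it does not touch a disk, and `DataRuns.EncodeSparseRun` is a made-up helper name. Padding the length so that its top bit is clear is a cautious assumption on my part (some parsers read the field as signed), not something the format description above requires.

```csharp
using System;
using System.Collections.Generic;

public static class DataRuns
{
    // Encodes a sparse run covering 'clusters' clusters.
    // Header byte: low nibble = number of length bytes,
    //              high nibble = number of offset bytes (0 marks the run as sparse).
    public static byte[] EncodeSparseRun(long clusters)
    {
        if (clusters <= 0) throw new ArgumentOutOfRangeException(nameof(clusters));

        var lengthBytes = new List<byte>();
        long remaining = clusters;
        while (remaining != 0)
        {
            lengthBytes.Add((byte)(remaining & 0xFF)); // little-endian, lowest byte first
            remaining >>= 8;
        }

        // Assumption: keep the most significant bit clear in case a parser
        // treats the length as a signed value.
        if ((lengthBytes[lengthBytes.Count - 1] & 0x80) != 0) lengthBytes.Add(0);

        var record = new byte[1 + lengthBytes.Count];
        record[0] = (byte)lengthBytes.Count; // high nibble stays 0: no offset field
        lengthBytes.CopyTo(record, 1);
        return record;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(EncodeSparseRun(0x60))); // 01-60
        Console.WriteLine(BitConverter.ToString(EncodeSparseRun(0x80))); // 02-80-00
    }
}
```

Splicing these bytes into the run list, re-encoding the runs on either side of the split, and fixing up the size attributes (steps 5, 7 and 8) is where the real work lies.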


Answer 7:

It all really depends on what the original problem is, that is, what you're trying to achieve. Modification of a FAT / NTFS table is not the problem, it's a solution to your problem -- potentially elegant and efficient, but more likely highly dangerous and inappropriate. You mentioned that you have no control over the users' systems where it will be used, so presumably for at least some of them the administrator would object to hacking into the file system internals.

Anyways, let's get back to the problem. Given the incomplete information, several use cases may be imagined, and the solution will be either easy or difficult depending on the use case.

  1. If you know that after the edit the file won't be needed for some time, then saving the edit in half a second is easy -- just close the window and let the application finish saving in the background, even if it takes half an hour. I know this sounds dumb, but this is a frequent use case -- once you finish editing your file, you save it, close the program, and you don't need that file anymore for a long time.

  2. Unless you do. Maybe the user decides to edit some more, or maybe another user comes along. In both cases your application can easily detect that the file is in the process of being saved to hard disk (for example you may keep a hidden guard file around while the main file is being saved). In this case you would open the file as-is (partially saved), but present to the user a customized view of the file which makes it appear as if the file is in the final state. After all, you have all the information about which chunks of the file have to be moved where.

  3. Unless the user needs to open the file immediately in another editor (this is not a very common case, especially for a very specialized file format, but then who knows). If so, do you have access to the source code of that other editor? Or can you talk to the developers of that other editor and persuade them to treat the incompletely saved file as if it were in the final state (it's not that hard -- all it takes is to read the offset information from the guard file)? I would imagine the developers of that other editor are equally frustrated with long save times and would gladly embrace your solution as it would help their product.

  4. What else could we have? Maybe the user wants to immediately copy or move the file somewhere else. Microsoft probably won't change Windows Explorer for your benefit. In that case you would either need to implement a UMDF driver, or plainly forbid the user to do so (for example rename the original file and hide it, leaving a blank placeholder in its place; when the user tries to copy the file at least he'll know something went wrong).

  5. Another possibility, which doesn't fit in the above hierarchy 1-4 nicely, comes up if you know beforehand which files will be edited. In that case you can "pre-sparse" the file, inserting random gaps uniformly along the length of the file. This works because of the special nature of your file format that you mentioned: there can be gaps of no data, provided that the links correctly point to the next data chunks. If you know which files will be edited (not an unreasonable assumption -- how many 10 GB files lie around your hard drive?) you "inflate" the file before the user starts editing it (say, the night before), and then just move around these smaller chunks of data when you need to insert new data. This of course also relies on the assumption that you don't have to insert TOO much.

In any case, there's always more than one answer depending on what your users actually want. But my advice comes from a designer's perspective, not from a programmer's.



Answer 8:

Edited - another approach - how about switching to Mac for this task? They have superior editing capabilities, with automation capabilities!

Edited - the original specs suggested the file was being modified a lot; instead it is modified once. As others have pointed out, I suggest doing the operation in the background: copy to new file, delete old file, rename new file to old file.
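
A minimal sketch of that copy, delete, rename sequence, run on a background task. The `InsertByRewrite` class, the `.tmp` suffix and the 1 MB buffer size are arbitrary choices for the example, and there is no error recovery:

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class InsertByRewrite
{
    // Copies exactly 'count' bytes from source to destination through a reusable buffer.
    private static void CopyBytes(Stream source, Stream destination, long count, byte[] buffer)
    {
        while (count > 0)
        {
            int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0) throw new EndOfStreamException();
            destination.Write(buffer, 0, read);
            count -= read;
        }
    }

    // Rewrites 'path' with 'data' inserted at 'offset', streaming through a temp file.
    public static void Insert(string path, long offset, byte[] data)
    {
        string tempPath = path + ".tmp"; // hypothetical naming scheme
        var buffer = new byte[1 << 20];

        using (FileStream source = File.OpenRead(path))
        using (FileStream target = File.Create(tempPath))
        {
            CopyBytes(source, target, offset, buffer); // bytes before the insertion point
            target.Write(data, 0, data.Length);        // the inserted bytes
            source.CopyTo(target);                     // everything after it
        }

        File.Delete(path);         // delete old file
        File.Move(tempPath, path); // rename new file to old file
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "HelloWorld");

        // Run the slow rewrite off the calling thread; a real UI would not block on it.
        Task work = Task.Run(() => Insert(path, 5, Encoding.ASCII.GetBytes(", ")));
        work.Wait();

        Console.WriteLine(File.ReadAllText(path)); // Hello, World
        File.Delete(path);
    }
}
```

Note that a crash between the delete and the rename leaves only the temp file behind; `File.Replace` is one way to narrow that window.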

I would abandon this approach. A database is what you're looking for./YR