For every field in an object, the CLR allocates a special structure, FieldDesc, containing metadata for the runtime and Reflection. A FieldDesc contains information such as the field offset, whether the field is static or ThreadStatic, public or private, etc. To determine the layout of an object, we'll be looking specifically at the offset metadata.
Before we can determine the layout of an object, we of course need to know the layout of a FieldDesc. A FieldDesc contains 3 fields:
| Offset | Type         | Name                  |
|--------|--------------|-----------------------|
| 0      | MethodTable* | m_pMTOfEnclosingClass |
| 8      | DWORD        | (DWORD 1)             |
| 12     | DWORD        | (DWORD 2)             |
The CLR engineers designed their structures to be as small as possible; because of that, all the metadata is actually stored as bitfields in DWORD 1 and DWORD 2.
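DWORD 1:
| Bits | Name                  | Description                                                                                        |
|------|-----------------------|----------------------------------------------------------------------------------------------------|
| 24   | m_mb                  | MemberDef metadata. This metadata is eventually used in FieldInfo.MetadataToken after some manipulation. |
| 1    | m_isStatic            | Whether the field is static                                                                        |
| 1    | m_isThreadLocal       | Whether the field is decorated with a ThreadStatic attribute                                       |
| 1    | m_isRVA               | (Relative Virtual Address)                                                                         |
| 3    | m_prot                | Access level                                                                                       |
| 1    | m_requiresFullMbValue | Whether m_mb needs all 24 bits instead of using the packed layout                                  |
DWORD 2:
| Bits | Name       | Description                 |
|------|------------|-----------------------------|
| 27   | m_dwOffset | Field offset                |
| 5    | m_type     | CorElementType of the field |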
We can easily replicate a FieldDesc in C# using the StructLayout and FieldOffset attributes.
    [StructLayout(LayoutKind.Explicit)]
    public unsafe struct FieldDesc
    {
        [FieldOffset(0)]
        private readonly void* m_pMTOfEnclosingClass;

        // unsigned m_mb                  : 24;
        // unsigned m_isStatic            : 1;
        // unsigned m_isThreadLocal       : 1;
        // unsigned m_isRVA               : 1;
        // unsigned m_prot                : 3;
        // unsigned m_requiresFullMbValue : 1;
        [FieldOffset(8)]
        private readonly uint m_dword1;

        // unsigned m_dwOffset : 27;
        // unsigned m_type     : 5;
        [FieldOffset(12)]
        private readonly uint m_dword2;

        ...
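If you want to see explicit layout at work without touching any runtime structures, here is a minimal sketch. FieldDescMirror, its field names, and the helper FromRaw are made up for illustration; only the offsets 0, 8, and 12 come from the table above, and a little-endian machine is assumed.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical stand-in that uses the same offsets as FieldDesc on 64-bit.
[StructLayout(LayoutKind.Explicit)]
public struct FieldDescMirror
{
    [FieldOffset(0)]  public ulong EnclosingMethodTable; // pointer-sized slot
    [FieldOffset(8)]  public uint Dword1;
    [FieldOffset(12)] public uint Dword2;
}

public static class ExplicitLayoutDemo
{
    // Writes three values into a 16-byte buffer at offsets 0, 8, and 12,
    // then reinterprets the buffer as a FieldDescMirror.
    public static FieldDescMirror FromRaw(ulong methodTable, uint dword1, uint dword2)
    {
        Span<byte> raw = stackalloc byte[16];
        BinaryPrimitives.WriteUInt64LittleEndian(raw, methodTable);
        BinaryPrimitives.WriteUInt32LittleEndian(raw.Slice(8), dword1);
        BinaryPrimitives.WriteUInt32LittleEndian(raw.Slice(12), dword2);
        return MemoryMarshal.Read<FieldDescMirror>(raw);
    }
}
```

Each value comes back out of the field whose FieldOffset matches the position it was written to, which is exactly the property we rely on when we overlay our FieldDesc struct on the runtime's memory.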
Reading the bitfields themselves is easy using bitwise operations:
    /// <summary>
    /// Offset in memory
    /// </summary>
    public int Offset => (int) (m_dword2 & 0x7FFFFFF);

    public int MB => (int) (m_dword1 & 0xFFFFFF);

    // m_requiresFullMbValue follows 24 + 1 + 1 + 1 + 3 = 30 bits, so it is bit 30
    private bool RequiresFullMBValue => ReadBit(m_dword1, 30);

    ...
We perform a bitwise AND operation on m_dword2 to get the value of the 27 bits for m_dwOffset.
111111111111111111111111111 (27 bits) = 0x7FFFFFF
I also made a small function for reading bits for convenience:
    static bool ReadBit(uint b, int bitIndex)
    {
        // Use an unsigned 1 so the mask stays a uint, even for bit 31
        return (b & (1u << bitIndex)) != 0;
    }
We won't write the code for retrieving all of the bitfields' values because we're only interested in m_dwOffset, but if you're interested, you can view the code for that here. We'll also go back to MB and RequiresFullMBValue later.
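For reference, the remaining bitfields can be decoded with the same shift-and-mask approach. The following is only a sketch: the class and method names are mine, and it assumes the compiler packs bitfields from the least significant bit upward in declaration order (as MSVC, GCC, and Clang do on x86/x64), so each field's starting bit is the sum of the widths before it.

```csharp
using System;

public static class FieldDescBits
{
    // Extracts `width` bits starting at bit `shift` (bit 0 = least significant).
    public static uint Bits(uint value, int shift, int width)
        => (value >> shift) & ((1u << width) - 1);

    // DWORD 1: m_mb:24, m_isStatic:1, m_isThreadLocal:1, m_isRVA:1,
    //          m_prot:3, m_requiresFullMbValue:1
    public static uint Mb(uint dword1)                 => Bits(dword1, 0, 24);
    public static bool IsStatic(uint dword1)           => Bits(dword1, 24, 1) != 0;
    public static bool IsThreadLocal(uint dword1)      => Bits(dword1, 25, 1) != 0;
    public static bool IsRva(uint dword1)              => Bits(dword1, 26, 1) != 0;
    public static uint Protection(uint dword1)         => Bits(dword1, 27, 3);
    public static bool RequiresFullMbValue(uint dword1) => Bits(dword1, 30, 1) != 0;

    // DWORD 2: m_dwOffset:27, m_type:5
    public static uint Offset(uint dword2)      => Bits(dword2, 0, 27);
    public static uint ElementType(uint dword2) => Bits(dword2, 27, 5);
}
```

For example, a made-up DWORD 2 value of (8u << 27) | 8 decodes to an offset of 8 and an element type of 8.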
Thankfully, we don't have to do anything too hacky for retrieving a FieldDesc. Reflection actually already has a way of getting a FieldDesc:
    FieldInfo.FieldHandle.Value
Value points to a FieldInfo's corresponding FieldDesc, where it gets all of its metadata. Therefore, we can write a method to get a FieldInfo's FieldDesc counterpart.
    public static FieldDesc* GetFieldDescForFieldInfo(FieldInfo fi)
    {
        if (fi.IsLiteral) {
            throw new Exception("Const field");
        }

        FieldDesc* fd = (FieldDesc*) fi.FieldHandle.Value;
        return fd;
    }
Note: I throw an Exception when the FieldInfo is a literal because you can't access the FieldHandle of a literal (const) field.
We'll wrap the above method in another method so we can get the FieldDesc more easily.
    private const BindingFlags DefaultFlags = BindingFlags.Instance | BindingFlags.NonPublic |
                                              BindingFlags.Public | BindingFlags.Static;

    public static FieldDesc* GetFieldDesc(Type t, string name, BindingFlags flags = DefaultFlags)
    {
        if (t.IsArray) {
            throw new Exception("Arrays do not have fields");
        }

        FieldInfo fieldInfo = t.GetField(name, flags);

        // GetField returns null when no field matches the name and flags
        if (fieldInfo == null) {
            throw new Exception("Field not found: " + name);
        }

        return GetFieldDescForFieldInfo(fieldInfo);
    }
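To get a feel for which fields expose a handle at all, here is a small sketch that uses only safe Reflection calls. The Sample type and the FieldHandleDemo class are hypothetical; the sketch simply skips literal fields, as GetFieldDescForFieldInfo requires, and collects every field whose FieldHandle.Value is non-zero.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class FieldHandleDemo
{
    // Hypothetical sample type: one const, one instance, and one static field.
    public class Sample
    {
        public const int Constant = 1;
        public int Instance;
        public static long Shared;
    }

    // Returns the names of all fields whose FieldHandle can be read,
    // skipping literal (const) fields.
    public static List<string> FieldsWithHandles(Type t)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Static
                                 | BindingFlags.Public | BindingFlags.NonPublic;
        var names = new List<string>();
        foreach (FieldInfo fi in t.GetFields(flags))
        {
            if (fi.IsLiteral) continue;              // const fields have no usable handle
            IntPtr handle = fi.FieldHandle.Value;    // address we would cast to FieldDesc*
            if (handle != IntPtr.Zero) names.Add(fi.Name);
        }
        return names;
    }
}
```

Calling FieldHandleDemo.FieldsWithHandles(typeof(FieldHandleDemo.Sample)) should list Instance and Shared but not Constant, which shows that static fields have a FieldDesc too.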
Earlier in the article, I said that the bitfield m_mb is used for calculating a field's metadata token, which is used in FieldInfo.MetadataToken. However, it requires some calculation to get the proper token. If we look at field.h line 171 in the CoreCLR repo:
    mdFieldDef GetMemberDef() const
    {
        LIMITED_METHOD_DAC_CONTRACT;

        // Check if this FieldDesc is using the packed mb layout
        if (!m_requiresFullMbValue)
        {
            return TokenFromRid(m_mb & enum_packedMbLayout_MbMask, mdtFieldDef);
        }

        return TokenFromRid(m_mb, mdtFieldDef);
    }
We can replicate GetMemberDef like so:
    public int MemberDef
    {
        get
        {
            // Check if this FieldDesc is using the packed mb layout
            if (!RequiresFullMBValue) {
                return TokenFromRid(MB & (int) MbMask.PackedMbLayoutMbMask, CorTokenType.mdtFieldDef);
            }

            return TokenFromRid(MB, CorTokenType.mdtFieldDef);
        }
    }
MbMask:
    enum MbMask
    {
        PackedMbLayoutMbMask       = 0x01FFFF,
        PackedMbLayoutNameHashMask = 0xFE0000
    }
TokenFromRid can be replicated in C# like this:
    static int TokenFromRid(int rid, CorTokenType tktype)
    {
        return rid | (int) tktype;
    }
CorTokenType:
    enum CorTokenType
    {
        mdtModule   = 0x00000000,
        mdtTypeRef  = 0x01000000,
        mdtTypeDef  = 0x02000000,
        mdtFieldDef = 0x04000000,
        ...
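To make the token arithmetic concrete: a metadata token is a table identifier in the top byte combined with a row ID (RID) in the lower 24 bits. Here is a self-contained sketch of composing and splitting a token. The MetadataTokens class and its constant name are mine; RidFromToken and TypeFromToken mirror the macros of the same name in the CLR headers.

```csharp
using System;

public static class MetadataTokens
{
    // Table identifier for FieldDef tokens (same value as mdtFieldDef)
    public const int MdtFieldDef = 0x04000000;

    // Combines a row ID with a table identifier.
    public static int TokenFromRid(int rid, int tokenType) => rid | tokenType;

    // Lower 24 bits: the row ID within the table.
    public static int RidFromToken(int token) => token & 0x00FFFFFF;

    // Top 8 bits: the table identifier.
    public static int TypeFromToken(int token) => token & unchecked((int) 0xFF000000);
}
```

For example, TokenFromRid(1, MetadataTokens.MdtFieldDef) yields 0x04000001, and splitting that token again gives back a RID of 1 and a type of 0x04000000.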
Note: This was tested on 64-bit.
We'll make a struct for testing:
    struct Struct
    {
        private long l;
        private int i;

        public int Int => i;
    }
First, we'll make sure our metadata token matches the one Reflection has:
    var fd = GetFieldDesc(typeof(Struct), "l");
    var fi = typeof(Struct).GetField("l", BindingFlags.NonPublic | BindingFlags.Instance);
    Debug.Assert(fi.MetadataToken == fd->MemberDef); // passes!
Then we'll see how the runtime laid out Struct:
    Console.WriteLine(GetFieldDesc(typeof(Struct), "l")->Offset); // == 0
    Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset); // == 8
We'll verify we have the correct offset by writing an int to s's memory at the offset of i that i's FieldDesc gave us.
    Struct s = new Struct();
    IntPtr p = new IntPtr(&s);
    Marshal.WriteInt32(p, GetFieldDesc(typeof(Struct), "i")->Offset, 123);
    Debug.Assert(s.Int == 123); // passes!
i is at offset 8 because the CLR sometimes puts the largest members first in memory. However, there are some exceptions:
Let's see what happens when we put a larger value type inside Struct.
    struct Struct
    {
        private decimal d;
        private string s;
        private int i;
    }
This will cause the CLR to insert padding to align Struct:
    Console.WriteLine(GetFieldDesc(typeof(Struct), "d")->Offset); // == 16
    Console.WriteLine(GetFieldDesc(typeof(Struct), "s")->Offset); // == 0
    Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset); // == 8
This means there are 4 bytes of padding at offset 12: i ends at offset 12 (8 + 4), but d doesn't start until offset 16.
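The same reasoning can be automated. Here is a rough sketch that takes (offset, size) pairs and reports the gaps between fields. PaddingFinder and FindGaps are names I made up, and the sizes you feed it are your own assumptions (for the struct above: 8 bytes for the string reference on 64-bit, 4 for the int, 16 for the decimal).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddingFinder
{
    // Returns (offset, length) for every gap between consecutive fields.
    // Trailing padding after the last field is not reported.
    public static List<(int Offset, int Length)> FindGaps(
        IEnumerable<(int Offset, int Size)> fields)
    {
        var gaps = new List<(int Offset, int Length)>();
        int end = 0; // first byte not yet covered by any field

        foreach (var field in fields.OrderBy(f => f.Offset))
        {
            if (field.Offset > end)
            {
                gaps.Add((end, field.Offset - end));
            }
            end = Math.Max(end, field.Offset + field.Size);
        }

        return gaps;
    }
}
```

Feeding it the offsets printed above, (0, 8) for s, (8, 4) for i, and (16, 16) for d, reports a single gap of 4 bytes at offset 12.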
The CLR also doesn't insert padding at all if the struct is explicitly laid out:
    [StructLayout(LayoutKind.Explicit)]
    struct Struct
    {
        [FieldOffset(0)]
        private decimal d;

        [FieldOffset(16)]
        private int i;

        [FieldOffset(20)]
        private long l;
    }

    Console.WriteLine(GetFieldDesc(typeof(Struct), "d")->Offset); // == 0
    Console.WriteLine(GetFieldDesc(typeof(Struct), "l")->Offset); // == 20
    Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset); // == 16
Static fields still have offsets according to their FieldDescs. However, their offset will be a big number, like 96. Static fields are stored in the type's MethodTable (another internal structure).
You can make a method identical to C's offsetof macro:
    public static int OffsetOf<TType>(string fieldName)
    {
        return GetFieldDesc(typeof(TType), fieldName)->Offset;
    }
You may be wondering why we don't just use Marshal.OffsetOf. That method returns the marshaled offset, which can differ from the offset in managed memory, and it doesn't work with unmarshalable or reference types.
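A quick way to see the difference is a struct with two bool fields: a bool takes 1 byte in managed memory but is marshaled as a 4-byte BOOL by default. The sketch below is only an illustration. The OffsetComparison class and TwoFlags struct are hypothetical, and it assumes a runtime where System.Runtime.CompilerServices.Unsafe is available out of the box (.NET Core 3.0 or later).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class OffsetComparison
{
    // Hypothetical sample struct with two 1-byte managed fields.
    [StructLayout(LayoutKind.Sequential)]
    public struct TwoFlags
    {
        public bool First;
        public bool Second;
    }

    // Offset of Second in the marshaled (native) layout.
    public static int MarshaledOffset()
        => (int) Marshal.OffsetOf<TwoFlags>(nameof(TwoFlags.Second));

    // Offset of Second in managed memory, measured between two field references.
    public static int ManagedOffset()
    {
        var s = new TwoFlags();
        return (int) Unsafe.ByteOffset(ref s.First, ref s.Second);
    }
}
```

Here MarshaledOffset() returns 4 while ManagedOffset() returns 1, so only the managed value tells us where the field actually lives in the object.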
You can also make a class to print the layout of an object. I wrote one which can get the layout of any object (except arrays). You can get the code for that here.
    Struct s = new Struct();
    ObjectLayout<Struct> layout = new ObjectLayout<Struct>(ref s);
    Console.WriteLine(layout);
Output:
| Field Offset | Address      | Size | Type    | Name      | Value |
|--------------|--------------|------|---------|-----------|-------|
| 0            | 0xD04A3FEE60 | 16   | Decimal | d         | 0     |
| 16           | 0xD04A3FEE70 | 4    | Int32   | i         | 0     |
| 20           | 0xD04A3FEE74 | 4    | Byte    | (padding) | 0     |
| 24           | 0xD04A3FEE78 | 8    | Int64   | s         | 0     |