Metadata Internals

SunnyFan

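Introduction

.NET assemblies carry rich metadata: type definitions, member signatures, custom attributes, assembly references, and more. Understanding how this metadata is laid out and how it can be read is essential for building reflection-based features, serializers, AOP frameworks, and code generators.

Highlights

1. Metadata structure: the TypeDef, MethodDef, and FieldDef tables
2. How reflection works: what sits behind Type, MethodInfo, and FieldInfo
3. Custom attributes: how attributes are stored and read
4. MetadataLoadContext: lightweight, inspection-only metadata loading
5. System.Reflection.Metadata: high-performance metadata reading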

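Metadata Table Structure

Metadata in PE/COFF

// The metadata tables of a .NET assembly live in the #~ stream of the PE file.
// Main metadata tables
// (ECMA-335 names some of them without the Def suffix: Module, Field, Param, Event, Property):
// ModuleDef       : module definition
// TypeDef         : type definitions
// FieldDef        : field definitions
// MethodDef       : method definitions
// ParamDef        : parameter definitions
// EventDef        : event definitions
// PropertyDef     : property definitions
// TypeRef         : references to types defined in other modules or assemblies
// MemberRef       : references to members (methods and fields)
// AssemblyRef     : references to other assemblies
// CustomAttribute : custom attributes
// MethodImpl      : explicit method overrides (for example, explicit interface implementations)

// Inspecting assembly metadata
// With ildasm: ildasm MyAssembly.dll /text
// With dnSpy: open the DLL and browse the metadata tables

// Inspecting metadata through reflection
using System;
using System.Linq;
using System.Reflection;

var assembly = typeof(Program).Assembly;
Console.WriteLine($"Assembly: {assembly.FullName}");

foreach (var type in assembly.GetTypes())
{
    Console.WriteLine($"\nType: {type.FullName}");
    Console.WriteLine($"  Base type: {type.BaseType?.Name}");
    Console.WriteLine($"  Attributes: {type.Attributes}");
    Console.WriteLine($"  IsClass: {type.IsClass}");
    Console.WriteLine($"  IsInterface: {type.IsInterface}");
    Console.WriteLine($"  IsValueType: {type.IsValueType}");

    foreach (var method in type.GetMethods(BindingFlags.Public | BindingFlags.Instance))
    {
        Console.WriteLine($"  Method: {method.Name}({string.Join(", ", method.GetParameters().Select(p => p.ParameterType.Name))})");
    }
}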

A Closer Look at Type

// Type is the central entry point for metadata access.
// A Type for a class, struct, interface, enum, or delegate corresponds to one TypeDef row;
// arrays, pointers, and constructed generics are composed from such definitions at run time.

// Requires: using System; using System.Collections.Generic; using System.Reflection;

// Ways to obtain a Type
// 1. typeof operator (type known at compile time)
Type t1 = typeof(string);

// 2. object.GetType() (run-time type of an instance)
string s = "hello";
Type t2 = s.GetType();

// 3. Type.GetType (assembly-qualified name)
Type? t3 = Type.GetType("System.String, System.Runtime");

// 4. Assembly.GetType
Type? t4 = Assembly.GetExecutingAssembly().GetType("MyNamespace.MyClass");

// Type properties, shown for the open generic definition List<>
// (for typeof(List<int>), FullName also includes the type argument,
//  IsGenericTypeDefinition is False, and IsConstructedGenericType is True)
Type type = typeof(List<>);
Console.WriteLine($"Name: {type.Name}");              // List`1
Console.WriteLine($"FullName: {type.FullName}");      // System.Collections.Generic.List`1
Console.WriteLine($"Namespace: {type.Namespace}");    // System.Collections.Generic
Console.WriteLine($"Assembly: {type.Assembly.GetName().Name}"); // System.Private.CoreLib on modern .NET
Console.WriteLine($"IsGenericType: {type.IsGenericType}");       // True
Console.WriteLine($"IsGenericTypeDefinition: {type.IsGenericTypeDefinition}"); // True
Console.WriteLine($"IsConstructedGenericType: {type.IsConstructedGenericType}"); // False

// Generic types
Type openType = typeof(List<>);         // open generic
Type closedType = typeof(List<int>);    // closed generic
Type constructed = openType.MakeGenericType(typeof(int)); // construct the closed type
Console.WriteLine(constructed == closedType); // True

// Generic parameters
Type def = typeof(Dictionary<,>);
Type[] args = def.GetGenericArguments(); // [TKey, TValue]
Console.WriteLine(args[0].Name); // TKey
Console.WriteLine(args[0].IsGenericParameter); // True

Custom Attribute Mechanics

Storing and Retrieving Attributes

// Custom attributes are stored in the CustomAttribute metadata table.
// Each row holds: Parent (the TypeDef/MethodDef/... the attribute is attached to),
// Type (the attribute constructor), and Value (a blob with the serialized arguments).

// Requires: using System; using System.Linq; using System.Reflection;

// Define an attribute
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class, AllowMultiple = false)]
class ApiVersionAttribute : Attribute
{
    public string Version { get; }
    public bool Deprecated { get; set; }
    public string? Description { get; set; }

    public ApiVersionAttribute(string version) => Version = version;
}

// Apply the attribute
[ApiVersion("2.0", Deprecated = false, Description = "User API")]
class UserService
{
    [ApiVersion("2.1")]
    public void GetUser(int id) { }
}

// Retrieve the attribute
var typeAttr = typeof(UserService).GetCustomAttribute<ApiVersionAttribute>();
Console.WriteLine($"Version: {typeAttr?.Version}");         // 2.0
Console.WriteLine($"Description: {typeAttr?.Description}"); // User API

// Method-level attribute
var methodAttr = typeof(UserService)
    .GetMethod("GetUser")?
    .GetCustomAttribute<ApiVersionAttribute>();
Console.WriteLine($"Method version: {methodAttr?.Version}"); // 2.1

// Use reflection to find every method that carries the attribute
var methods = typeof(UserService)
    .GetMethods()
    .Where(m => m.GetCustomAttribute<ApiVersionAttribute>() != null);

// Optimization: CustomAttributeData exposes the arguments without instantiating the attribute
var attrs = CustomAttributeData.GetCustomAttributes(typeof(UserService));
foreach (var attr in attrs)
{
    Console.WriteLine($"Attribute type: {attr.AttributeType.Name}");
    foreach (var arg in attr.ConstructorArguments)
        Console.WriteLine($"  Argument: {arg.Value}");
    foreach (var named in attr.NamedArguments)
        Console.WriteLine($"  Named argument: {named.MemberName} = {named.TypedValue.Value}");
}
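When only the presence of an attribute matters, or only its constructor argument, the lookup can avoid creating attribute instances altogether. The following is a minimal sketch of that idea; `TaggedAttribute`, `SampleService`, and `AttributeProbe` are hypothetical names introduced for illustration and are not part of the original example.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute used only for this sketch.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class TaggedAttribute : Attribute
{
    public string Tag { get; }
    public TaggedAttribute(string tag) => Tag = tag;
}

[Tagged("service")]
public class SampleService
{
    [Tagged("read")]
    public void Get() { }

    public void Ping() { }
}

public static class AttributeProbe
{
    // Presence check only: IsDefined does not construct the attribute.
    public static bool HasTag(MemberInfo member) =>
        member.IsDefined(typeof(TaggedAttribute), inherit: false);

    // Names of the public instance methods declared on the type that carry the attribute.
    public static string[] TaggedMethods(Type type) =>
        type.GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
            .Where(m => HasTag(m))
            .Select(m => m.Name)
            .ToArray();

    // Reads the first constructor argument from the attribute data
    // without running the attribute's constructor.
    public static string? ReadTag(MemberInfo member) =>
        member.GetCustomAttributesData()
              .Where(d => d.AttributeType == typeof(TaggedAttribute))
              .Select(d => d.ConstructorArguments[0].Value as string)
              .FirstOrDefault();
}
```

`AttributeProbe.TaggedMethods(typeof(SampleService))` returns only `Get`, and `AttributeProbe.ReadTag(typeof(SampleService))` returns the string passed to the constructor. Whether this is measurably faster than `GetCustomAttribute<T>()` depends on the workload, so it is worth measuring before adopting it.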

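MetadataLoadContext

Lightweight Metadata Loading

// MetadataLoadContext reads metadata only and never executes code from the loaded assembly.
// Well suited to code analysis, documentation generation, and similar tooling.
// It ships in the System.Reflection.MetadataLoadContext NuGet package.

using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.InteropServices;

// Create the MetadataLoadContext.
// The resolver must be able to find the core library and every dependency,
// so pass the assemblies of the running runtime plus the target assembly
// rather than hard-coding versioned install paths.
string targetPath = Path.GetFullPath("MyLibrary.dll");
var paths = Directory
    .GetFiles(RuntimeEnvironment.GetRuntimeDirectory(), "*.dll")
    .Append(targetPath);

var resolver = new PathAssemblyResolver(paths);
using var context = new MetadataLoadContext(resolver);

// Load the assembly (no code is executed)
Assembly assembly = context.LoadFromAssemblyPath(targetPath);

foreach (Type type in assembly.GetTypes())
{
    Console.WriteLine($"Type: {type.FullName}");
    foreach (var method in type.GetMethods())
    {
        Console.WriteLine($"  Method: {method.Name}");
        foreach (var param in method.GetParameters())
        {
            Console.WriteLine($"    Param: {param.ParameterType.Name} {param.Name}");
        }
    }
}

// Good fits:
// 1. Code analysis tools
// 2. API documentation generation
// 3. Build-time code generation tools
// 4. Dependency analysis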

System.Reflection.Metadata

High-Performance Metadata Reading

using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

// High-performance metadata reading (no Type or MethodInfo objects are created)
using var stream = File.OpenRead("MyLibrary.dll");
using var peReader = new PEReader(stream);
var reader = peReader.GetMetadataReader();

// Walk all type definitions
foreach (var typeDefHandle in reader.TypeDefinitions)
{
    TypeDefinition typeDef = reader.GetTypeDefinition(typeDefHandle);
    string name = reader.GetString(typeDef.Name);
    string @namespace = reader.GetString(typeDef.Namespace);

    // Skip types in the global namespace (including the <Module> pseudo-type)
    if (string.IsNullOrEmpty(@namespace)) continue;
    Console.WriteLine($"{@namespace}.{name}");

    // Walk the methods
    foreach (var methodHandle in typeDef.GetMethods())
    {
        MethodDefinition method = reader.GetMethodDefinition(methodHandle);
        string methodName = reader.GetString(method.Name);
        Console.WriteLine($"  {methodName}()");
    }

    // Walk the fields
    foreach (var fieldHandle in typeDef.GetFields())
    {
        FieldDefinition field = reader.GetFieldDefinition(fieldHandle);
        string fieldName = reader.GetString(field.Name);
        Console.WriteLine($"  {fieldName}");
    }
}

// Read custom attributes
foreach (var attrHandle in reader.CustomAttributes)
{
    CustomAttribute attr = reader.GetCustomAttribute(attrHandle);

    // The constructor's own name is always ".ctor"; the attribute name comes from
    // the type that declares it. Only constructors referenced from another assembly
    // (a MemberRef whose parent is a TypeRef) are handled here.
    if (attr.Constructor.Kind == HandleKind.MemberReference)
    {
        var ctor = reader.GetMemberReference((MemberReferenceHandle)attr.Constructor);
        if (ctor.Parent.Kind == HandleKind.TypeReference)
        {
            var attrType = reader.GetTypeReference((TypeReferenceHandle)ctor.Parent);
            Console.WriteLine($"Attribute: {reader.GetString(attrType.Name)}");
        }
    }

    // Decode the attribute arguments
    var decoded = attr.DecodeValue(new CustomAttributeTypeProvider());
    foreach (var arg in decoded.FixedArguments)
    {
        Console.WriteLine($"  Argument: {arg.Value}");
    }
}

// A deliberately simple CustomAttributeTypeProvider.
// It treats every enum as Int32 and does not recognize System.Type arguments,
// so attributes that use other enum sizes or Type arguments may decode incorrectly or fail.
class CustomAttributeTypeProvider : ICustomAttributeTypeProvider<object?>
{
    public object? GetPrimitiveType(PrimitiveTypeCode typeCode) => null;
    public object? GetSystemType() => null;
    public object? GetSZArrayType(object? elementType) => null;
    public object? GetTypeFromDefinition(MetadataReader reader, TypeDefinitionHandle handle, byte rawTypeKind) => null;
    public object? GetTypeFromReference(MetadataReader reader, TypeReferenceHandle handle, byte rawTypeKind) => null;
    public object? GetTypeFromSerializedName(string name) => null;
    public PrimitiveTypeCode GetUnderlyingEnumType(object? type) => PrimitiveTypeCode.Int32;
    public bool IsEnumType(object? type) => false;
    public bool IsSystemType(object? type) => false;
}
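The same reader also covers the dependency-analysis scenario, because the AssemblyRef table is exposed through `MetadataReader.AssemblyReferences`. The sketch below lists the referenced assemblies of a file without loading it into the runtime; the class name `AssemblyRefReader` is a hypothetical helper introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public static class AssemblyRefReader
{
    // Returns "Name, Version" for each row of the AssemblyRef table in the given file.
    public static List<string> ReadReferences(string assemblyPath)
    {
        using var stream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(stream);
        MetadataReader reader = peReader.GetMetadataReader();

        var result = new List<string>();
        foreach (AssemblyReferenceHandle handle in reader.AssemblyReferences)
        {
            AssemblyReference reference = reader.GetAssemblyReference(handle);
            result.Add($"{reader.GetString(reference.Name)}, {reference.Version}");
        }
        return result;
    }
}
```

A call such as `AssemblyRefReader.ReadReferences(typeof(System.Linq.Enumerable).Assembly.Location)` prints the assemblies that System.Linq depends on. This assumes the assembly exists as a file on disk; in single-file deployments `Assembly.Location` is empty.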

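Optimizing Reflection Performance

/// <summary>
/// Reflection performance problems and optimization strategies
/// </summary>

// Why reflection is slow:
// 1. Type and member lookups have to search the metadata tables
// 2. Security checks (access permission validation)
// 3. Boxing and unboxing of arguments and return values
// 4. Allocation of wrapper objects (MethodInfo, ParameterInfo, and so on)

// Requires: using System; using System.Collections.Concurrent; using System.Reflection;

// Optimization 1: cache reflection results
public class ReflectionCache
{
    // Keyed by (Type, name): Type.FullName can be null and is not unique across assemblies
    private readonly ConcurrentDictionary<(Type, string), PropertyInfo?> _propertyCache = new();

    public PropertyInfo? GetProperty(Type type, string name)
    {
        return _propertyCache.GetOrAdd((type, name), key => key.Item1.GetProperty(key.Item2));
    }

    public object? GetValue(object obj, string propertyName)
    {
        var prop = GetProperty(obj.GetType(), propertyName);
        return prop?.GetValue(obj);
    }
}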

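// Optimization 2: use Delegate.CreateDelegate instead of MethodInfo.Invoke
public static class FastInvoker
{
    private static readonly ConcurrentDictionary<MethodInfo, Func<object, object?>> _delegateCache = new();

    public static Func<object, object?> CreateGetter(PropertyInfo property)
    {
        var method = property.GetMethod!;
        return _delegateCache.GetOrAdd(method, m =>
        {
            // An open instance delegate must match the getter's real signature
            // (declaring type in, property type out), so bind a strongly typed
            // delegate first and wrap it in an object-based one.
            var wrap = typeof(FastInvoker)
                .GetMethod(nameof(Wrap), BindingFlags.NonPublic | BindingFlags.Static)!
                .MakeGenericMethod(m.DeclaringType!, m.ReturnType);
            return (Func<object, object?>)wrap.Invoke(null, new object[] { m })!;
        });
    }

    // Works for instance properties declared on reference types
    private static Func<object, object?> Wrap<TTarget, TValue>(MethodInfo getter)
        where TTarget : class
    {
        var typed = (Func<TTarget, TValue>)Delegate.CreateDelegate(
            typeof(Func<TTarget, TValue>), getter);
        return obj => typed((TTarget)obj);
    }

    // Usage
    public static object? FastGetValue(object obj, PropertyInfo property)
    {
        var getter = CreateGetter(property);
        return getter(obj);
        // Usually much faster than property.GetValue(obj); the original estimate is 10-50x,
        // but the factor depends on the runtime version, so measure on your target
    }
}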

// Optimization 3: compile a delegate from an expression tree
// Requires: using System.Linq.Expressions;
public static class ExpressionInvoker
{
    public static Func<T, object?> CompileGetter<T>(PropertyInfo property)
    {
        var param = Expression.Parameter(typeof(T), "obj");
        var propertyAccess = Expression.Property(param, property);
        var body = Expression.Convert(propertyAccess, typeof(object));
        return Expression.Lambda<Func<T, object?>>(body, param).Compile();
    }
}

// Usage, assuming a User type with a Name property and an instance named user.
// Once compiled, the delegate performs close to a direct call.
var getter = ExpressionInvoker.CompileGetter<User>(typeof(User).GetProperty("Name")!);
object? name = getter(user);

// Optimization 4: replace reflection with a source generator
// Access code is generated at compile time, so no reflection runs at run time
// Best suited to scenarios where the set of types is known in advance
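The speed-ups quoted above vary between runtime versions, so it helps to measure them on the target machine. The following is a rough sketch using `Stopwatch`; `Person` and `GetterBenchmark` are hypothetical names introduced for illustration, and a dedicated benchmarking library would give more reliable numbers than this loop.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical type used only for this sketch.
public sealed class Person
{
    public string Name { get; set; } = "Ada";
}

public static class GetterBenchmark
{
    // Builds p => (object)p.<property> and compiles it to a delegate.
    public static Func<Person, object?> Compile(PropertyInfo property)
    {
        var target = Expression.Parameter(typeof(Person), "p");
        var body = Expression.Convert(Expression.Property(target, property), typeof(object));
        return Expression.Lambda<Func<Person, object?>>(body, target).Compile();
    }

    // Times `iterations` calls of `read` after one warm-up call.
    public static TimeSpan Time(Func<object?> read, int iterations)
    {
        read();
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) read();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Run(int iterations = 1_000_000)
    {
        var person = new Person();
        PropertyInfo property = typeof(Person).GetProperty(nameof(Person.Name))!;
        Func<Person, object?> compiled = Compile(property);

        Console.WriteLine($"PropertyInfo.GetValue: {Time(() => property.GetValue(person), iterations).TotalMilliseconds:F1} ms");
        Console.WriteLine($"Compiled delegate:     {Time(() => compiled(person), iterations).TotalMilliseconds:F1} ms");
        Console.WriteLine($"Direct access:         {Time(() => person.Name, iterations).TotalMilliseconds:F1} ms");
    }
}
```

Calling `GetterBenchmark.Run()` prints three timings. The absolute values are noisy, but the relative order is usually enough to decide whether caching a delegate is worth the extra code.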

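Advantages

1. Self-describing: an assembly carries complete type information
2. Reflection: types can be inspected and invoked dynamically at run time
3. Fast reading: MetadataReader reads the metadata tables directly
4. Rich ecosystem: AOP, serialization, and ORM frameworks all build on metadata

Disadvantages

1. Reflection cost: reflective calls are roughly 10 to 100 times slower than direct calls, depending on the runtime version and the call pattern
2. Complexity: the metadata APIs are large and hard to learn
3. Trimming-unfriendly: members reached only through reflection may be removed by the trimmer or Native AOT
4. Security risk: reflection can bypass access control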

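How AOT and Trimming Affect Metadata

Metadata Trimming in Native AOT

/// <summary>
/// How .NET Native AOT compilation affects metadata
/// </summary>

// Native AOT compilation trims unused code together with its metadata.
// Reflection at run time can then fail.

// Requires: using System; using System.Diagnostics.CodeAnalysis;

// ❌ Looking a type up by name: AOT may have trimmed the type.
//    The trimmer can usually follow a string literal like this one,
//    but not a name that is computed or read from configuration at run time.
Type? byName = Type.GetType("MyNamespace.MyClass");
// byName may be null (the type was trimmed)

// ✅ Referencing the type with typeof: the compiler knows it must be kept
Type type = typeof(MyNamespace.MyClass);

// ❌ Invoking a method by name: AOT may have trimmed the method
var method = type.GetMethod("ProcessData");
method?.Invoke(instance, null);

// ✅ Annotate with DynamicallyAccessedMembers so the members are preserved.
//    On a class, the annotation covers Type objects obtained by calling GetType() on its instances.
[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.All)]
public class MyClass
{
    public void ProcessData() { }
}

// Alternatively, use a TrimmerRootDescriptor XML file
// to list the types and members that must be preserved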
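`DynamicallyAccessedMembers` is most often placed on a `Type` parameter, so that the trimmer can follow the type from the call site into the reflection call. Here is a minimal sketch of that pattern; `ReportJob` and `TrimSafeInvoker` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;

// Hypothetical type used only for this sketch.
public sealed class ReportJob
{
    public string Run() => "done";
}

public static class TrimSafeInvoker
{
    // The annotation states that the public methods and the public parameterless
    // constructor of whatever Type is passed here are used through reflection,
    // so the trimmer keeps them for every type that flows into this parameter.
    public static object? InvokeParameterless(
        [DynamicallyAccessedMembers(
            DynamicallyAccessedMemberTypes.PublicMethods |
            DynamicallyAccessedMemberTypes.PublicParameterlessConstructor)]
        Type type,
        string methodName)
    {
        object instance = Activator.CreateInstance(type)!;
        MethodInfo? method = type.GetMethod(methodName, Type.EmptyTypes);
        return method?.Invoke(instance, null);
    }
}
```

`TrimSafeInvoker.InvokeParameterless(typeof(ReportJob), "Run")` returns the value produced by `Run`. Because the call site uses `typeof`, the trimmer can see which type needs its public methods preserved; passing a `Type` obtained from an unannotated source would produce a trim warning instead.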

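Strategies for AOT-Compatible Reflection

/// <summary>
/// Writing AOT-compatible code
/// </summary>

// Strategy 1: replace reflection with a source generator
// Code is generated at compile time, so no run-time reflection is needed
// The System.Text.Json source generation mode is a typical example

// Strategy 2: use function pointers instead of delegates created through reflection
// Strategy 3: use [LibraryImport] (source-generated marshalling) instead of [DllImport] with run-time marshalling
// Strategy 4: limit the scope of reflection and annotate what must be preserved

// Practical example: AOT-friendly JSON serialization
// Requires: using System.Text.Json; using System.Text.Json.Serialization;
[JsonSerializable(typeof(User))]
[JsonSerializable(typeof(Order))]
public partial class AppJsonContext : JsonSerializerContext
{
    // The source generator emits the serialization code for User and Order
    // No run-time reflection is needed
}

// Usage
string json = JsonSerializer.Serialize(user, AppJsonContext.Default.User);
User? deserialized = JsonSerializer.Deserialize<User>(json, AppJsonContext.Default.User);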

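Metadata Table Internals in Detail

Metadata Layout in the PE File

/// <summary>
/// PE file structure of a .NET assembly
/// </summary>

// PE file structure:
// ┌─────────────────────────┐
// │ DOS Header              │
// │ PE Signature            │
// │ COFF Header             │
// │ Optional Header         │
// │ Section Headers         │
// ├─────────────────────────┤
// │ .text Section           │ ← IL code, CLI header, and metadata
// │ .rsrc Section           │ ← resources
// │ .reloc Section          │ ← relocations
// ├─────────────────────────┤
// │ CLI Header (in .text)   │ ← CLR entry information
// │ Metadata Root           │ ← metadata root
// │   Stream #~             │ ← metadata tables
// │   Stream #Strings       │ ← string heap (identifier names)
// │   Stream #Blob          │ ← blob heap (signatures, attribute values)
// │   Stream #GUID          │ ← GUID heap
// │   Stream #US            │ ← user string heap (string literals)
// └─────────────────────────┘
// The lower box is a zoomed-in view of content that is typically stored inside .text,
// not a separate region of the file.

// Relationships between metadata tables:
// Module → TypeDef → FieldDef / MethodDef / EventDef / PropertyDef
// TypeDef → TypeRef (references to external types)
// MethodDef → ParamDef
// TypeRef / MemberRef → AssemblyRef (references to external assemblies)
// CustomAttribute → any metadata table

// dnSpy or ILSpy can display these tables visually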
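The layout can also be inspected programmatically. The sketch below uses `PEReader` together with the `System.Reflection.Metadata.Ecma335` extension methods to report where the metadata sits and how large some tables and heaps are; `MetadataLayout` is a hypothetical helper name introduced for illustration.

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

public static class MetadataLayout
{
    // Number of rows in one metadata table of the given assembly file.
    public static int RowCount(string assemblyPath, TableIndex table)
    {
        using var stream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(stream);
        return peReader.GetMetadataReader().GetTableRowCount(table);
    }

    public static void Print(string assemblyPath)
    {
        using var stream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(stream);

        // The CLI header points at the metadata root through an RVA and a size.
        CorHeader cor = peReader.PEHeaders.CorHeader
            ?? throw new BadImageFormatException("Not a managed assembly.");
        int rva = cor.MetadataDirectory.RelativeVirtualAddress;
        int sectionIndex = peReader.PEHeaders.GetContainingSectionIndex(rva);
        string section = sectionIndex >= 0
            ? peReader.PEHeaders.SectionHeaders[sectionIndex].Name
            : "(unknown)";
        Console.WriteLine($"Metadata at RVA 0x{rva:X}, {cor.MetadataDirectory.Size} bytes, section {section}");

        MetadataReader reader = peReader.GetMetadataReader();

        // Row counts of a few tables stored in the #~ stream.
        foreach (var table in new[] { TableIndex.TypeDef, TableIndex.MethodDef, TableIndex.Field, TableIndex.CustomAttribute })
            Console.WriteLine($"  {table,-16} rows:  {reader.GetTableRowCount(table)}");

        // Sizes of the four heaps (#Strings, #Blob, #GUID, #US).
        foreach (var heap in new[] { HeapIndex.String, HeapIndex.Blob, HeapIndex.Guid, HeapIndex.UserString })
            Console.WriteLine($"  {heap,-16} bytes: {reader.GetHeapSize(heap)}");
    }
}
```

`MetadataLayout.Print("MyLibrary.dll")` prints one line for the metadata location followed by the table and heap sizes. The concrete numbers depend entirely on the assembly being inspected.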

Metadata Tokens

/// <summary>
/// Structure of a metadata token
/// </summary>

// A metadata token is a 4-byte value:
// [table index (1 byte)] [row number, or RID (3 bytes)]

// For example:
// 0x02000001 : row 1 of the TypeDef table
// 0x06000003 : row 3 of the MethodDef table
// 0x04000002 : row 2 of the FieldDef table
// 0x0A000001 : row 1 of the MemberRef table
// 0x23000001 : row 1 of the AssemblyRef table

// Common table indexes:
// 0x01: TypeRef        0x02: TypeDef
// 0x04: FieldDef       0x06: MethodDef
// 0x08: ParamDef       0x09: InterfaceImpl
// 0x0A: MemberRef      0x0B: Constant
// 0x0C: CustomAttribute 0x0E: DeclSecurity
// 0x14: Event          0x17: Property
// 0x23: AssemblyRef

// Using MetadataToken
// Length is a property, so fetch its getter method (get_Length)
var method = typeof(string).GetProperty("Length")!.GetMethod!;
int token = method.MetadataToken;
Console.WriteLine($"0x{token:X8}");      // 0x06xxxxxx
Console.WriteLine(token >> 24);          // table index: 6 (MethodDef)
Console.WriteLine(token & 0x00FFFFFF);   // row number (RID)
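A token can also be turned back into a reflection object through the module that defines it. The following sketch splits a token into its two parts and round-trips a method through `Module.ResolveMethod`; `TokenDemo` is a hypothetical helper name introduced for illustration.

```csharp
using System;
using System.Reflection;

public static class TokenDemo
{
    // Splits a token into its table index (high byte) and row number (low three bytes).
    public static (int Table, int Rid) Split(int token) =>
        ((int)((uint)token >> 24), token & 0x00FFFFFF);

    // Resolves the method's own token against the module that defines it.
    public static MethodBase? RoundTrip(MethodInfo method) =>
        method.Module.ResolveMethod(method.MetadataToken);
}
```

For a method defined in the current assembly, `TokenDemo.Split(method.MetadataToken).Table` is 6 (MethodDef), and `TokenDemo.RoundTrip(method)` yields a method with the same name and token. Tokens are only meaningful within the module that issued them, so a token must always be resolved against its own module.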

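.NET metadata is stored in the metadata tables of the PE file (TypeDef, MethodDef, FieldDef, and so on). Type is the central entry point of reflection, and each type definition corresponds to one TypeDef row. Custom attributes are stored in the CustomAttribute table and retrieved with GetCustomAttribute<T>(). MetadataLoadContext loads metadata for inspection without executing code. MetadataReader from System.Reflection.Metadata reads the tables directly and is the fastest of these options. Reflection is slow because it has to search the metadata tables, perform security checks, and allocate wrapper objects.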