数据结构

SunnyFan大约 14 分钟约 4064 字

数据结构

简介

C# 提供了丰富的内置数据结构，涵盖数组、列表、链表、队列、栈、字典、集合等。不同的数据结构在增删查改操作上有不同的时间复杂度，选择合适的数据结构是编写高效代码的基础。

数据结构的选择直接影响程序的性能特征。一个 O(1) 查找的字典和 O(n) 查找的列表，在百万级数据量下性能差距可能达到数百倍。理解各数据结构的内部实现原理，是做出正确选择的前提。

特点

1.泛型集合 — List<T>、Dictionary<K,V> 等保证类型安全
2.丰富的 BCL — 内置队列、栈、字典、有序集合等
3.线程安全版本 — ConcurrentDictionary、ConcurrentQueue 等
4.Span/Memory — 高性能零拷贝访问连续内存
5.ReadOnly 集合 — 防止外部修改内部数据
6.不可变集合 — ImmutableArray/ImmutableDictionary

数据结构对比

数据结构	查询	插入	删除	有序	线程安全	适用场景
Array	O(1)	O(n)	O(n)	是	否	固定大小、高频索引
`List<T>`	O(n)	O(1)末尾	O(n)	是	否	动态数组，最常用
`LinkedList<T>`	O(n)	O(1)	O(1)	否	否	频繁头尾增删
`Queue<T>`	O(n)	O(1)	O(1)	FIFO	否	任务队列、BFS
`Stack<T>`	O(n)	O(1)	O(1)	LIFO	否	撤销操作、DFS
`HashSet<T>`	O(1)	O(1)	O(1)	否	否	去重、集合运算
`SortedSet<T>`	O(log n)	O(log n)	O(log n)	有序	否	排序+去重
`Dictionary<K,V>`	O(1)	O(1)	O(1)	否	否	键值查找
`SortedDictionary<K,V>`	O(log n)	O(log n)	O(log n)	有序	否	有序键值对
`PriorityQueue<T,E>`	O(1) Peek	O(log n)	O(log n)	按优先级	否	优先级队列

Array 数组

数组存储在连续内存上，元素类型相同，通过下标直接访问。

1.内存连续 — 索引访问 O(1)，CPU 缓存友好
2.类型固定 — 声明时确定元素类型
3.长度固定 — 创建后不可改变大小

// 声明和初始化
int[] numbers = new int[5];
string[] names = new[] { "张三", "李四", "王五" };

// 多维数组
int[,] matrix = new int[3, 4];
int[][] jagged = new int[3][];

// ==========================================
// 数组的高级操作
// ==========================================

// Array.Copy — 高效复制
var source = new[] { 1, 2, 3, 4, 5 };
var destination = new int[5];
Array.Copy(source, destination, source.Length);

// Span<T> — 零拷贝切片
var array = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
Span<int> slice = array.AsSpan(2, 4); // [3,4,5,6]
// 修改 slice 会修改原数组
slice[0] = 100;
Console.WriteLine(array[2]); // 100

// Array.Fill (.NET Core 2.0+)
var buffer = new int[1000];
Array.Fill(buffer, -1); // 全部填充为 -1

// 数组排序
var unsorted = new[] { 5, 3, 8, 1, 9 };
Array.Sort(unsorted);
Array.Sort(unsorted, (a, b) => b.CompareTo(a)); // 降序

// Array.BinarySearch — 二分查找（数组必须已排序）
var sorted = new[] { 1, 3, 5, 7, 9 };
int index = Array.BinarySearch(sorted, 5); // 2
if (index < 0) Console.WriteLine("未找到");

// 多维数组遍历
var grid = new int[3, 4];
for (int row = 0; row < grid.GetLength(0); row++)
    for (int col = 0; col < grid.GetLength(1); col++)
        grid[row, col] = row * 4 + col;

// 交错数组（每行长度可以不同）
var jaggedArray = new int[3][];
jaggedArray[0] = new int[2] { 1, 2 };
jaggedArray[1] = new int[3] { 3, 4, 5 };
jaggedArray[2] = new int[1] { 6 };

提示

数组是最基础的数据结构，索引速度快。但长度固定，插入和删除元素需要移动数据，效率低。当元素数量不确定时，推荐使用 List<T>。

ArrayList（不推荐）

非泛型集合，所有元素当作 Object 处理。

1.长度可变 — 动态扩容
2.类型不安全 — 存在装箱拆箱开销

// 不推荐使用，请用 List<T> 替代
ArrayList list = new ArrayList();
list.Add("string");
list.Add(123);  // 装箱操作

注意

ArrayList 已过时，存在装箱拆箱性能问题和类型不安全风险。请始终使用 List<T> 替代。

`List<T>` 泛型列表

最常用的数据结构，内部使用数组实现，支持动态扩容。

1.类型安全 — 编译时检查类型
2.动态扩容 — 容量不足时自动翻倍
3.索引访问 — O(1) 按下标读取

// 基本操作
var list = new List<string> { "苹果", "香蕉" };
list.Add("橘子");           // 添加
list.Insert(1, "葡萄");     // 插入
list.Remove("香蕉");        // 按值删除
list.RemoveAt(0);           // 按索引删除
list.Sort();                // 排序
list.Contains("橘子");      // 查找

// 高效操作
var filtered = list.Where(x => x.StartsWith("橘")).ToList();
var names = list.Select(x => x.ToUpper()).ToList();

// ==========================================
// List 的高级用法
// ==========================================

// 预分配容量 — 避免扩容
var bigList = new List<int>(100000); // 已知大小

// RemoveAll — 批量删除
list.RemoveAll(x => x.Length < 2); // 删除所有长度小于2的

// Find/FindAll/FindIndex — 查找
var found = list.Find(x => x == "苹果");
var foundAll = list.FindAll(x => x.Contains("果"));
var foundIndex = list.FindIndex(x => x == "橘子");

// TrueForAll — 检查所有元素
bool allLong = list.TrueForAll(x => x.Length > 1);

// ConvertAll — 类型转换
var lengths = list.ConvertAll(x => x.Length);

// ForEach — 遍历操作（注意：不能修改列表）
list.ForEach(x => Console.WriteLine(x));

// BinarySearch — 二分查找（列表必须已排序）
list.Sort();
int idx = list.BinarySearch("橘子");

// AddRange — 批量添加
list.AddRange(new[] { "西瓜", "芒果", "菠萝" });

// InsertRange — 批量插入
list.InsertRange(0, new[] { "第一", "第二" });

// GetRange — 获取子列表
var subList = list.GetRange(1, 3); // 从索引1开始取3个

// Capacity vs Count — 内存优化
list.TrimExcess(); // 将容量缩小到实际元素数

提示

List<T> 是日常开发的首选集合。注意：频繁在中间插入/删除时性能较差（O(n)），此时考虑 LinkedList<T>。可以用 Capacity 构造参数预分配大小避免频繁扩容。

`LinkedList<T>` 双向链表

元素在内存中不连续存储，每个节点保存前后节点的引用。

1.头尾增删快 — O(1) 插入和删除
2.不支持索引 — 查找需遍历 O(n)

var linked = new LinkedList<string>();
linked.AddLast("A");
linked.AddLast("B");
linked.AddFirst("Z");  // 头部插入 O(1)

var node = linked.Find("A");
linked.AddBefore(node!, "C");  // 节点前插入 O(1)
linked.Remove(node);           // 删除节点 O(1)

// ==========================================
// LinkedList 的实际应用 — LRU 缓存
// ==========================================
public class LRUCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<LRUItem>> _map = new();
    private readonly LinkedList<LRUItem> _list = new();

    public LRUCache(int capacity) => _capacity = capacity;

    public TValue? Get(TKey key)
    {
        if (!_map.TryGetValue(key, out var node))
            return default;

        // 移到链表头部（最近使用）
        _list.Remove(node);
        _list.AddFirst(node);
        return node.Value.Value;
    }

    public void Set(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existingNode))
        {
            existingNode.Value = new LRUItem(key, value);
            _list.Remove(existingNode);
            _list.AddFirst(existingNode);
            return;
        }

        if (_map.Count >= _capacity)
        {
            // 移除最久未使用的
            var last = _list.Last!;
            _map.Remove(last.Value.Key);
            _list.RemoveLast();
        }

        var newNode = _list.AddFirst(new LRUItem(key, value));
        _map[key] = newNode;
    }

    private record LRUItem(TKey Key, TValue Value);
}

`Queue<T>` 队列

先进先出（FIFO）的数据结构。

1.先进先出 — Enqueue 入队、Dequeue 出队
2.动态扩容 — 默认初始容量 32，增长因子 2.0

var queue = new Queue<string>();
queue.Enqueue("任务1");
queue.Enqueue("任务2");
queue.Enqueue("任务3");

var first = queue.Dequeue();  // "任务1"
var peek = queue.Peek();      // "任务2"（不移除）

// ==========================================
// Queue 的实际应用 — 广度优先搜索
// ==========================================
public class TreeNode
{
    public int Value { get; set; }
    public List<TreeNode> Children { get; } = new();
}

public static List<int> Bfs(TreeNode root)
{
    var result = new List<int>();
    var queue = new Queue<TreeNode>();
    queue.Enqueue(root);

    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        result.Add(current.Value);
        foreach (var child in current.Children)
            queue.Enqueue(child);
    }
    return result;
}

// TryDequeue — 安全出队
if (queue.TryDequeue(out var item))
    Process(item);

// 队列转数组
var items = queue.ToArray();

`Stack<T>` 栈

后进先出（LIFO）的数据结构。

1.后进先出 — Push 入栈、Pop 出栈
2.默认容量 — 10，支持动态扩容

var stack = new Stack<string>();
stack.Push("步骤1");
stack.Push("步骤2");
stack.Push("步骤3");

var top = stack.Pop();   // "步骤3"
var peek = stack.Peek(); // "步骤2"（不移除）

// ==========================================
// Stack 的实际应用 — 表达式求值
// ==========================================
public static int EvaluateExpression(string expression)
{
    var stack = new Stack<int>();
    var tokens = expression.Split(' ');

    foreach (var token in tokens)
    {
        if (int.TryParse(token, out var number))
        {
            stack.Push(number);
        }
        else
        {
            var b = stack.Pop();
            var a = stack.Pop();
            var result = token switch
            {
                "+" => a + b,
                "-" => a - b,
                "*" => a * b,
                "/" => a / b,
                _ => throw new InvalidOperationException($"未知运算符: {token}")
            };
            stack.Push(result);
        }
    }

    return stack.Pop();
}

// 使用: "3 4 + 5 *" → (3+4)*5 = 35
Console.WriteLine(EvaluateExpression("3 4 + 5 *"));

// TryPop — 安全出栈
if (stack.TryPop(out var item))
    Process(item);

`HashSet<T>` 哈希集合

基于哈希表的无序集合，元素唯一。

1.去重 — 自动去除重复元素
2.集合运算 — 交集、并集、差集

var set = new HashSet<int> { 1, 2, 3, 4, 5 };
set.Add(3);   // 已存在，不重复添加
set.Add(6);   // 添加成功

var other = new HashSet<int> { 3, 4, 5, 6, 7 };
set.IntersectWith(other);   // 交集：{ 3, 4, 5, 6 }
set.UnionWith(other);       // 并集
set.ExceptWith(other);      // 差集

// ==========================================
// HashSet 的高级用法
// ==========================================

// 对称差集（两集合各自独有的元素）
set.SymmetricExceptWith(other);

// 超集/子集判断
set.IsSubsetOf(other);      // 是否是 other 的子集
set.IsSupersetOf(other);    // 是否是 other 的超集
set.Overlaps(other);        // 是否有交集
set.SetEquals(other);       // 两个集合是否相同

// 自定义比较器
var caseInsensitiveSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
caseInsensitiveSet.Add("Hello");
caseInsensitiveSet.Add("hello"); // 不会重复添加

// 按属性去重
public class PersonComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? x, Person? y) => x?.Id == y?.Id;
    public int GetHashCode(Person obj) => obj.Id.GetHashCode();
}
var personSet = new HashSet<Person>(new PersonComparer());

`Dictionary<K,V>` 字典

键值对集合，查找速度极快（O(1)），内部使用哈希表实现。

1.快速查找 — O(1) 按键查找
2.空间换时间 — 更多内存开销换取速度

var dict = new Dictionary<string, int>
{
    ["苹果"] = 5,
    ["香蕉"] = 3,
    ["橘子"] = 8
};

dict["葡萄"] = 2;             // 添加或更新
dict.TryGetValue("苹果", out int count);  // 安全查找

// 遍历
foreach (var kv in dict)
{
    Console.WriteLine($"{kv.Key}: {kv.Value}");
}

// ==========================================
// Dictionary 的高级用法
// ==========================================

// TryAdd — 不存在才添加
dict.TryAdd("西瓜", 10);      // 添加成功
dict.TryAdd("苹果", 20);      // 已存在，不覆盖

// Remove — 删除并获取值
dict.Remove("香蕉", out int removedValue);

// GetValueOrDefault — 不存在返回默认值
var mango = dict.GetValueOrDefault("芒果", 0);

// 自定义比较器
var caseInsensitive = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
caseInsensitive["Hello"] = 1;
Console.WriteLine(caseInsensitive["hello"]); // 1

// 初始化器 — 带初始数据
var config = new Dictionary<string, string>
{
    ["host"] = "localhost",
    ["port"] = "8080",
    ["timeout"] = "30"
};

// 遍历 Keys 和 Values
foreach (var key in dict.Keys)
    Console.WriteLine(key);
foreach (var value in dict.Values)
    Console.WriteLine(value);

// ContainsValue — 按值查找（O(n)）
bool hasValue = dict.ContainsValue(5);

// 清空
dict.Clear();

// Count 和 ContainsKey
int count = dict.Count;
bool exists = dict.ContainsKey("苹果");

提示

字典是"以空间换时间"的典型实现。创建时可以传入初始容量减少扩容开销。注意：空字典也会分配 2 个长度为 3 的数组，数据量少时慎用。

线程安全集合

System.Collections.Concurrent 命名空间下的并发集合，适合多线程场景。

1.ConcurrentDictionary — 线程安全字典
2.ConcurrentQueue — 线程安全队列
3.ConcurrentStack — 线程安全栈
4.ConcurrentBag — 线程安全无序集合

// 线程安全字典
var concurrent = new ConcurrentDictionary<string, int>();
concurrent.TryAdd("key1", 1);
concurrent.AddOrUpdate("key1", 1, (k, v) => v + 1);

// 线程安全队列（生产者-消费者模式）
var queue = new ConcurrentQueue<string>();
// 生产者
queue.Enqueue("任务");
// 消费者
if (queue.TryDequeue(out var item))
{
    Console.WriteLine(item);
}

// ==========================================
// ConcurrentDictionary 陷阱
// ==========================================

// GetOrAdd 的工厂方法可能被多次调用
// 如果多个线程同时 GetOrAdd 同一个不存在的 key
concurrent.GetOrAdd("expensive", key => ComputeExpensiveValue(key));
// ComputeExpensiveValue 可能被并行执行多次

// 安全方式 — 使用 Lazy<T>
concurrent.GetOrAdd("safe", key => new Lazy<int>(() => ComputeExpensiveValue(key))).Value;

// ==========================================
// BlockingCollection — 阻塞式集合
// ==========================================
var blocking = new BlockingCollection<string>(boundedCapacity: 10);

// 生产者
Task.Run(() =>
{
    for (int i = 0; i < 100; i++)
        blocking.Add($"item_{i}");
    blocking.CompleteAdding();
});

// 消费者
Task.Run(() =>
{
    foreach (var item in blocking.GetConsumingEnumerable())
        Process(item);
});

ReadOnly 集合

防止外部修改内部数据，暴露只读视图。

var list = new List<int> { 1, 2, 3, 4, 5 };
IReadOnlyList<int> readOnly = list.AsReadOnly();

// 或直接使用接口返回
public IReadOnlyDictionary<string, int> GetConfig()
{
    return _config;
}

// ==========================================
// ReadOnly 集合的选择
// ==========================================

// IReadOnlyList<T> — 只读列表（索引访问）
// IReadOnlyCollection<T> — 只读集合（只有 Count）
// IEnumerable<T> — 只读可遍历（最小接口）
// ReadOnlyCollection<T> — 包装器，可以强制转换回 List
// ImmutableList<T> — 真正不可变，修改返回新实例
// ImmutableArray<T> — 不可变数组，性能更好

// 返回值类型建议：
// 方法参数 → IReadOnlyList<T> 或 IEnumerable<T>
// 属性 → IReadOnlyList<T> 或 IReadOnlyDictionary<K,V>
// 需要防御性复制 → .AsReadOnly() 或 .ToImmutableList()

优点

1.类型安全 — 泛型集合编译时检查类型
2.丰富选择 — 数组、列表、链表、字典、集合覆盖所有场景
3.线程安全 — Concurrent 系列支持多线程访问
4.LINQ 友好 — 所有集合支持 LINQ 查询
5.不可变集合 — 线程安全的共享数据
6.Span 集成 — 零拷贝高性能访问

缺点

1.选择困难 — 多种数据结构需理解时间复杂度才能正确选择
2.装箱开销 — 非泛型集合（ArrayList、Hashtable）存在性能问题
3.内存开销 — 字典和 HashSet 比数组占用更多内存
4.扩容抖动 — List、Dictionary 扩容时有瞬时性能下降
5.并发代价 — 线程安全集合有同步开销
6.不可变成本 — 不可变集合修改时需要复制

总结

C# 数据结构的核心选择：不确定数量用 List<T>，键值查找用 Dictionary<K,V>，去重用 HashSet<T>，队列用 Queue<T>，栈用 Stack<T>。多线程场景用 ConcurrentDictionary<TKey, TValue> 和 ConcurrentQueue<T>。理解各数据结构的时间复杂度是写出高性能代码的基础。

选择决策树：