Python Memory Management in Depth

SunnyFan

Introduction

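Memory management is key to understanding how Python programs perform. Unlike C/C++, where memory is allocated and freed by hand, Python manages memory automatically through reference counting and a generational garbage collector (generational GC). Most of the time you can ignore it, but once you hit a memory leak, an unexplained slowdown, or an unexpectedly large footprint, a solid grasp of Python's memory model becomes essential.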

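Python's memory management has three layers. At the bottom is the virtual memory provided by the operating system. In the middle is Python's own allocator (pymalloc), which serves small allocations. At the top are reference counting and the garbage collector, which decide when an object's memory can be released. Each layer offers different optimization opportunities, so it pays to understand all three.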
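
To make the middle layer more concrete, the sketch below guesses whether an object's memory would be served by pymalloc. It assumes a standard CPython build, where pymalloc handles requests of 512 bytes or less and passes larger ones to the system allocator. The helper name `likely_from_pymalloc` is made up for this illustration, and `sys.getsizeof` only approximates the real request size.

```python
import sys

# Assumption: CPython's pymalloc serves requests of up to 512 bytes;
# other implementations and build options may behave differently.
SMALL_REQUEST_THRESHOLD = 512


def likely_from_pymalloc(obj) -> bool:
    """Rough guess based on the shallow size reported by sys.getsizeof."""
    return sys.getsizeof(obj) <= SMALL_REQUEST_THRESHOLD


small = (1, 2, 3)          # a short tuple is well under the threshold
large = bytes(10_000)      # a 10 KB bytes object is far above it

for label, obj in [("small tuple", small), ("large bytes", large)]:
    size = sys.getsizeof(obj)
    print(f"{label}: {size} bytes, small-object allocator: {likely_from_pymalloc(obj)}")
```

Most everyday objects (numbers, short strings, small containers) fall under the threshold, which is why the small-object allocator matters so much for typical Python code.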

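The central weakness of reference counting is that it cannot reclaim reference cycles. When two objects refer to each other, their counts never drop to zero, even after the program has stopped using both. Python therefore adds a generational garbage collector that detects and reclaims such cycles. The GC has a cost of its own, and in some high-performance scenarios it needs to be tuned or even disabled.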
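
The following sketch illustrates the point on CPython: two objects that reference each other survive the loss of all outside references, and only a cyclic collection reclaims them. The `Node` class and `make_cycle` helper are illustrative names; automatic collection is switched off during the experiment so that it cannot interfere with the observation.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    """Build a two-node cycle and return only a weak reference to it."""
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a
    return weakref.ref(a)   # a weak reference does not keep the node alive


gc.disable()                # keep the automatic collector out of the way
try:
    probe = make_cycle()
    # All strong outside references are gone, yet the cycle keeps both nodes alive.
    alive_before = probe() is not None
    gc.collect()            # the cyclic collector finds and frees the cycle
    alive_after = probe() is not None
finally:
    gc.enable()

print(f"alive before collect: {alive_before}, after collect: {alive_after}")
```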

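Features

1. Reference counting first: an object is freed as soon as its last reference goes away, so reclamation latency is low
2. Generational GC as a backstop: reclaims reference cycles using three generations
3. Memory pools: pymalloc reduces system calls and speeds up small allocations
4. Good observability: tools such as tracemalloc and objgraph pinpoint memory problems
5. Tunable: GC thresholds and strategy can be adjusted to the workload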

Reference Counting

import sys

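# ============================================
# Reference counting: the core mechanism of CPython memory management
# Every object carries a reference counter, ob_refcnt
# When the count drops to 0 the object is freed immediately
# ============================================

def reference_counting_demo():
    """Basic reference counting example (exact counts are CPython-specific)."""

    # 1. Create an object: one reference
    a = [1, 2, 3]
    # getrefcount's own argument usually adds one temporary reference
    print(f"After creation: {sys.getrefcount(a) - 1}")
    # Typical output: 1

    # 2. Add references
    b = a       # +1 (assignment binds another name)
    c = a       # +1
    print(f"After assignment: {sys.getrefcount(a) - 1}")
    # Typical output: 3

    # 3. Reference from a container
    container = [a]  # +1
    print(f"After adding to a container: {sys.getrefcount(a) - 1}")
    # Typical output: 4

    # 4. Drop a reference
    del b       # -1
    print(f"After del b: {sys.getrefcount(a) - 1}")
    # Typical output: 3

    # 5. Rebind a name (locals also drop their references when the function returns)
    container = None  # the old list is freed, so a loses one reference

    # 6. Special cases
    # Small integers are cached and shared, so their count is high
    x = 42
    print(f"Refcount of small integer 42: {sys.getrefcount(x)}")
    # Usually very large; on Python 3.12+ such objects are immortal
    # and report a fixed, very large value

    # String interning
    s1 = "hello_world_test"
    s2 = "hello_world_test"
    print(f"Interned strings are the same object: {s1 is s2}")  # True on CPython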

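def ref_counting_pitfalls():
    """Reference counting pitfalls"""

    # Pitfall 1: temporary references created by a call
    def check_ref(obj):
        # The parameter obj is one more reference to the same object
        return sys.getrefcount(obj)

    a = [1, 2, 3]
    print(f"Outside the function: {sys.getrefcount(a)}")
    print(f"Inside the function: {check_ref(a)}")
    # The count inside is higher because the call itself holds references;
    # by how much depends on the CPython version

    # Pitfall 2: references held through an exception
    try:
        raise ValueError("test")
    except ValueError as e:
        # e.__traceback__ references the frame, and the frame's locals reference e,
        # so a reference cycle exists for as long as e stays bound
        print(f"Exception refcount: {sys.getrefcount(e)}")
        # Python 3 unbinds e at the end of the except block, which breaks the cycle

    # Pitfall 3: class attributes versus instance attributes
    class MyClass:
        shared_data = {"key": "value"}  # class attribute, shared by all instances

    obj1 = MyClass()
    obj2 = MyClass()
    # obj1.shared_data and obj2.shared_data are the same object
    print(f"Shared attribute: {obj1.shared_data is obj2.shared_data}")  # True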
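
As a complement to the counts printed above, the sketch below observes the moment of deallocation directly. It relies on CPython's immediate, reference-counted reclamation; `Payload` and `events` are names invented for this illustration, and `weakref.finalize` registers a callback without keeping the object alive.

```python
import weakref


class Payload:
    """Placeholder object whose lifetime we want to observe."""


events = []

obj = Payload()
# finalize() holds no strong reference to obj; the callback runs when obj is freed.
weakref.finalize(obj, events.append, "freed")

alias = obj                          # second reference to the same object
del obj                              # count drops from 2 to 1: still alive
events_after_first_del = list(events)

del alias                            # count drops to 0: freed immediately on CPython
events_after_second_del = list(events)

print(events_after_first_del, events_after_second_del)
```

The first `del` only removes a name; the object goes away when the last reference does.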

Generational Garbage Collection

import gc

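# ============================================
# Generational Garbage Collection
#
# CPython's cyclic GC uses three generations:
# - Generation 0: newly created container objects
# - Generation 1: objects that survived one collection
# - Generation 2: objects that survived two or more collections
#
# Collection strategy:
# - When allocations minus deallocations of tracked objects exceed
#   threshold0, generation 0 is collected
# - Survivors of a generation 0 collection are promoted to generation 1
# - After threshold1 generation 0 collections, generation 1 is collected too
#   and its survivors are promoted to generation 2
# - After threshold2 generation 1 collections, a full collection runs
#
# Default thresholds: (700, 10, 10) in CPython through at least 3.12;
# check gc.get_threshold() on your version
# - threshold0: 700 net new objects
# - threshold1: one generation 1 collection per 10 generation 0 collections
# - threshold2: one generation 2 collection per 10 generation 1 collections
# ============================================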

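def gc_basics():
    """Basic GC operations"""

    # Inspect the current thresholds
    print(f"GC thresholds: {gc.get_threshold()}")
    # Classic default: (700, 10, 10)

    # Inspect the collection counters
    print(f"GC counters: {gc.get_count()}")
    # (count0, count1, count2): count0 is allocations minus deallocations since
    # the last collection; count1 and count2 count collections of the next
    # younger generation, not objects

    # Trigger a full collection manually
    collected = gc.collect()
    print(f"Found {collected} unreachable objects")

    # Collect up to a given generation
    collected = gc.collect(2)  # generations 0 through 2, i.e. a full collection

    # Adjust the thresholds
    gc.set_threshold(1000, 15, 8)  # collect generation 0 less often

    # Disable the GC (some high-performance scenarios)
    gc.disable()
    # ... run code that does not need cycle collection
    gc.enable()

    # Objects the collector found unreachable but could not free
    # (normally empty since Python 3.4 unless gc.DEBUG_SAVEALL is set)
    uncollectable = gc.garbage
    print(f"Uncollectable objects: {len(uncollectable)}")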

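def generational_gc_demo():
    """Generational GC demo"""

    # Create batches of temporary objects
    for _ in range(5):
        count_before = gc.get_count()
        print(f"Before: {count_before}")

        # Create 500 temporary dicts
        temp = [{"data": i} for i in range(500)]

        count_after = gc.get_count()
        print(f"After creating 500 objects: {count_after}")

        # Trigger a collection; temp is still referenced, so little or nothing is found
        collected = gc.collect()
        count_collected = gc.get_count()
        print(f"After gc.collect() ({collected} unreachable): {count_collected}\n")

        # Reference counting frees the dicts here, without the collector's help
        del temp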

def circular_reference():
    """Reference cycles: the case reference counting cannot handle"""

    # Reference cycle example
    class Node:
        def __init__(self, name):
            self.name = name
            self.parent = None
            self.children = []

        def add_child(self, child):
            child.parent = self  # child -> parent reference
            self.children.append(child)  # parent -> child reference

        def __repr__(self):
            return f"Node({self.name})"

    # Build the cycle
    root = Node("root")
    child1 = Node("child1")
    child2 = Node("child2")
    root.add_child(child1)
    root.add_child(child2)

    # Now root -> children -> [child1, child2]
    # and child1.parent -> root (a reference cycle)

    # Even after the outside references are removed
    del root
    del child1
    del child2

    # the objects are still alive: they reference each other, so no count reaches 0
    # The cyclic GC detects and reclaims them
    collected = gc.collect()
    print(f"GC found {collected} unreachable objects in the cycle")
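
The counters and thresholds described in this section can also be read back at runtime. The sketch below is a small inspection helper under the assumption of a CPython interpreter: `gc.get_stats()` returns one dictionary per generation with the keys `collections`, `collected`, and `uncollectable`. The helper name `total_collections` is hypothetical.

```python
import gc


def total_collections() -> int:
    """Sum the number of collections recorded across all generations."""
    return sum(gen["collections"] for gen in gc.get_stats())


before = total_collections()
gc.collect()                       # force one full collection
after = total_collections()

print(f"thresholds: {gc.get_threshold()}")
for index, gen in enumerate(gc.get_stats()):
    print(
        f"generation {index}: collections={gen['collections']}, "
        f"collected={gen['collected']}, uncollectable={gen['uncollectable']}"
    )
print(f"collections recorded by this run: {after - before}")
```

Logging these numbers periodically in a long-running service gives a cheap first signal of whether collections are unusually frequent.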

`__del__` and weakref

import weakref

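# ============================================
# __del__ pitfalls and uses of weakref
# ============================================

def del_method_pitfall():
    """Pitfalls of the __del__ method"""

    class Resource:
        def __init__(self, name):
            self.name = name
            print(f"Resource created: {self.name}")

        def __del__(self):
            print(f"Resource released: {self.name}")

    # Normal case: __del__ runs as soon as the reference count reaches zero
    r = Resource("test")
    r = None  # Output: Resource released: test

    # Pitfall: reference cycle + __del__
    class BadNode:
        def __init__(self, name):
            self.name = name
            self.ref = None

        def __del__(self):
            print(f"Node released: {self.name}")
            # Finalizers in a cycle run in no guaranteed order, so other objects
            # reached from here may already have been finalized

    a = BadNode("A")
    b = BadNode("B")
    a.ref = b
    b.ref = a

    del a
    del b
    # Nothing is printed yet: the cycle waits for the cyclic GC (e.g. gc.collect())
    # Python 3.4+ (PEP 442) can collect such cycles, but __del__ is still a poor fit for them
    # Prefer a context manager (with) over __del__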


def weakref_demo():
    """Weak references: references that do not increase the reference count"""

    # Built-in dict and list instances cannot be weakly referenced,
    # so the examples wrap the data in a small class
    class Data:
        def __init__(self, value):
            self.value = value

    # 1. Basic weak reference
    data = Data({"key": "value"})
    weak_data = weakref.ref(data)

    print(f"Referent: {weak_data().value}")  # call the weakref to get the object
    print(f"Alive: {weak_data() is not None}")

    data = None  # the original object is freed
    print(f"After release: {weak_data()}")  # Output: None

    # 2. WeakKeyDictionary: caching
    cache = weakref.WeakKeyDictionary()

    class ExpensiveObject:
        def __init__(self, name):
            self.name = name
            self.computed_data = f"expensive_{name}"

    obj = ExpensiveObject("obj1")
    cache[obj] = obj.computed_data

    print(f"Cached value: {cache.get(obj)}")
    obj = None  # once the object is freed, its cache entry disappears
    print(f"Cache size after release: {len(cache)}")  # 0

    # 3. WeakValueDictionary: instance registry
    class ObjectRegistry:
        """Object registry that uses weak references to avoid leaks"""
        def __init__(self):
            self._objects = weakref.WeakValueDictionary()

        def register(self, name: str, obj):
            self._objects[name] = obj

        def get(self, name: str):
            return self._objects.get(name)

        def list_all(self):
            return list(self._objects.items())

    registry = ObjectRegistry()

    class Service:
        def __init__(self, name):
            self.name = name

    svc1 = Service("auth")
    svc2 = Service("order")
    registry.register("auth", svc1)
    registry.register("order", svc2)

    print(f"Registered services: {registry.list_all()}")

    svc1 = None  # the auth service is freed and drops out of the registry
    print(f"After release: {registry.list_all()}")  # only order remains

    # 4. Weak reference callback
    def on_finalize(ref):
        print("Object was reclaimed!")

    data = Data([1, 2, 3])
    ref = weakref.ref(data, on_finalize)
    data = None  # Output: Object was reclaimed!
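
The recommendation to prefer a context manager over `__del__` can be sketched as follows. `ManagedResource` is a hypothetical class, not part of any library; the point is only that `__exit__` runs at a known place, even when the body raises, whereas the timing of `__del__` depends on the collector.

```python
class ManagedResource:
    """Illustrative resource with explicit, idempotent cleanup."""

    def __init__(self, name: str):
        self.name = name
        self.closed = False

    def close(self) -> None:
        # Safe to call more than once.
        if not self.closed:
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False          # do not swallow exceptions from the with-body


try:
    with ManagedResource("db") as res:
        raise ValueError("simulated failure inside the with-block")
except ValueError:
    pass

print(f"{res.name} closed: {res.closed}")
```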

Memory Leak Detection

# ============================================
# Memory leak detection tools
# ============================================

import tracemalloc
import gc

def tracemalloc_demo():
    """tracemalloc: Python's built-in allocation tracer"""

    # Start tracing allocations
    tracemalloc.start()

    # Do some work
    data = [i * 2 for i in range(100000)]
    more_data = {i: f"value_{i}" for i in range(50000)}

    # Take a snapshot of the current allocations
    snapshot = tracemalloc.take_snapshot()

    # Show the top 10 allocation sites
    top_stats = snapshot.statistics('lineno')
    print("Top 10 allocation sites:")
    for stat in top_stats[:10]:
        print(stat)

    # Compare two snapshots
    snapshot1 = tracemalloc.take_snapshot()

    # Create more objects
    leaky_data = []
    for i in range(10000):
        leaky_data.append({"id": i, "data": list(range(100))})

    snapshot2 = tracemalloc.take_snapshot()

    # Diff the snapshots
    diff = snapshot2.compare_to(snapshot1, 'lineno')
    print("\nTop 10 memory growth:")
    for stat in diff[:10]:
        print(stat)

    tracemalloc.stop()


def detect_leaks_with_objgraph():
    """objgraph: visualize object reference graphs"""
    # pip install objgraph

    import objgraph

    # Create some objects
    class Node:
        def __init__(self, name):
            self.name = name
            self.next = None

    # Build a circular linked list
    nodes = [Node(f"n{i}") for i in range(10)]
    for i in range(len(nodes) - 1):
        nodes[i].next = nodes[i + 1]
    nodes[-1].next = nodes[0]  # close the cycle

    # Count live objects of a type
    print(f"Node objects: {objgraph.count('Node')}")

    # Inspect a reference chain
    # objgraph.show_chain(
    #     objgraph.find_backref_chain(
    #         objgraph.by_type('Node')[0],
    #         objgraph.is_proper_module
    #     ),
    #     filename='refs.png'
    # )

    # Spotting common leak patterns
    # List the most common types; objgraph.show_growth() reports the types
    # whose counts grew since the previous call
    objgraph.show_most_common_types(limit=20)

    del nodes


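def common_leak_patterns():
    """Common memory leak patterns"""

    # Pattern 1: a long-lived list/dict that only ever grows
    global_cache = []

    def leaky_function():
        # No global/nonlocal needed: append mutates the list without rebinding the name
        data = list(range(1000))
        global_cache.append(data)  # never released while global_cache is alive

    # Fix: use an LRU cache or cap the size
    from functools import lru_cache

    @lru_cache(maxsize=100)
    def cached_computation(n):
        return sum(range(n))

    # Pattern 2: a closure that keeps a large object alive
    def create_closure():
        # About 8 MB of pointers plus roughly 28 bytes per int on 64-bit CPython
        big_data = list(range(1000000))

        def process():
            # A closure captures only the variables it actually references;
            # using big_data here keeps the whole list alive
            return len(big_data)

        return process

    # The returned process holds big_data for as long as process itself is alive

    # Fix: extract what is needed and drop the large object before returning
    def create_closure_fixed():
        big_data = list(range(1000000))
        result = len(big_data)  # keep only the information that is needed
        del big_data  # release the large object

        def process():
            return result

        return process

    # Pattern 3: __del__ on objects that take part in reference cycles
    class LeakyWithDel:
        def __init__(self):
            self.ref = None

        def __del__(self):
            # Before Python 3.4, cycles containing __del__ were never collected;
            # newer versions collect them, but finalizer order is still undefined
            pass

    # Fix: use a context manager
    class ResourceFixed:
        def __enter__(self):
            return self
        def __exit__(self, *args):
            self.cleanup()
        def cleanup(self):
            pass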
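
The snapshot-comparison workflow shown earlier in this section can be wrapped in a small helper. This is a minimal sketch, not a full profiler: `measure_growth` and `workload` are hypothetical names, and the helper assumes tracing is not already running elsewhere in the process, because it calls `tracemalloc.stop()` when it finishes.

```python
import tracemalloc


def measure_growth(workload, limit: int = 3):
    """Run workload() between two snapshots and return (result, top differences)."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = workload()            # keep the result alive until the second snapshot
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Differences are sorted with the largest changes first.
    return result, after.compare_to(before, "lineno")[:limit]


def workload():
    # Roughly 1 MB of buffers that stay referenced by the returned list.
    return [bytearray(1000) for _ in range(1000)]


kept, stats = measure_growth(workload)
for stat in stats:
    print(stat)
```

Running the same helper around a suspect request handler or batch step narrows a leak down to the lines whose retained size keeps growing.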

`__slots__` Optimization

import sys

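# ============================================
# __slots__: reduce the memory footprint of Python objects
#
# Regular objects store their attributes in a per-instance __dict__
# __slots__ replaces __dict__ with a fixed set of attribute slots
# Pros: less memory, slightly faster attribute access
# Cons: attributes cannot be added dynamically
# ============================================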

class RegularClass:
    """Regular class: attributes live in __dict__"""
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlotsClass:
    """Class that uses __slots__"""
    __slots__ = ['x', 'y', 'z']

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

def compare_slots():
    """Compare memory usage"""

    regular = RegularClass(1, 2, 3)
    slotted = SlotsClass(1, 2, 3)

    # getsizeof is shallow: the instance size does not include its __dict__
    print(f"Regular instance size: {sys.getsizeof(regular)} bytes")
    print(f"Regular instance __dict__: {sys.getsizeof(regular.__dict__)} bytes")
    print(f"__slots__ instance size: {sys.getsizeof(slotted)} bytes")

    # Compare a large number of objects
    import tracemalloc

    # Create 100,000 regular objects
    tracemalloc.start()
    regular_objects = [RegularClass(i, i+1, i+2) for i in range(100000)]
    _, regular_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del regular_objects

    # Create 100,000 __slots__ objects
    tracemalloc.start()
    slot_objects = [SlotsClass(i, i+1, i+2) for i in range(100000)]
    _, slots_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"\n100,000 objects:")
    print(f"Regular class memory: {regular_peak / 1024 / 1024:.2f} MB")
    print(f"__slots__ memory: {slots_peak / 1024 / 1024:.2f} MB")
    print(f"Saved: {(regular_peak - slots_peak) / regular_peak:.1%}")


# dataclass with slots (Python 3.10+)
from dataclasses import dataclass

@dataclass(slots=True)
class Point:
    x: float
    y: float
    z: float

@dataclass
class PointDict:
    x: float
    y: float
    z: float
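
The trade-off named in the comments above (less memory in exchange for no dynamic attributes) is easy to observe. The sketch below uses a hypothetical `PointSlots` class with a plain `__slots__` declaration, so it also runs on Python versions older than 3.10.

```python
class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y


p = PointSlots(1.0, 2.0)

# Instances have no per-instance __dict__ ...
has_dict = hasattr(p, "__dict__")

# ... so assigning an attribute that is not listed in __slots__ fails.
try:
    p.z = 3.0
    rejected = False
except AttributeError:
    rejected = True

print(f"has __dict__: {has_dict}, new attribute rejected: {rejected}")
```

Code that relies on attaching ad hoc attributes to instances needs to be adjusted before a class is switched to `__slots__`.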

Processing Large Data

# ============================================
# Memory optimization for large data sets
# ============================================

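import mmap
from pathlib import Path

def process_large_file(file_path: str):
    """Memory-friendly file processing"""

    # Option 1: read line by line (never loads the whole file)
    def read_line_by_line():
        with open(file_path, 'r', encoding='utf-8') as f:
            for line in f:
                yield line.strip()

    # Option 2: read in fixed-size chunks
    def read_in_chunks(chunk_size: int = 8192):
        with open(file_path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    # Option 3: memory-mapped file (mmap) for random access to large files
    def memory_mapped_access():
        file_size = Path(file_path).stat().st_size
        if file_size == 0:
            return  # an empty file cannot be mapped

        with open(file_path, 'rb') as f:
            # Map the file into the address space without reading it up front
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

            # The mapping can be used like in-memory bytes;
            # the OS pages in only the parts that are actually touched
            first_line = mm.readline()
            print(f"First line: {first_line[:100]}")

            # Random access
            mm.seek(0)
            content = mm.read(1024)  # read just 1 KB

            mm.close()

    # Only the line reader is returned; the other two are shown for comparison
    return read_line_by_line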


def generator_pipeline():
    """Generator pipeline: stream large data sets"""

    def read_data(filename):
        """Generator: read line by line"""
        with open(filename, 'r', encoding='utf-8') as f:
            for line in f:
                yield line.strip()

    def parse(records):
        """Generator: parse"""
        for record in records:
            parts = record.split(',')
            if len(parts) >= 3:
                yield {
                    'id': int(parts[0]),
                    'name': parts[1],
                    'value': float(parts[2]),
                }

    def filter_invalid(records):
        """Generator: filter"""
        for record in records:
            if record['value'] > 0:
                yield record

    def transform(records):
        """Generator: transform"""
        for record in records:
            record['value_sq'] = record['value'] ** 2
            yield record

    def aggregate(records):
        """Aggregate: the final consumer"""
        total = 0
        count = 0
        for record in records:
            total += record['value_sq']
            count += 1
        # Guard against an empty input to avoid ZeroDivisionError
        return {'total': total, 'count': count, 'avg': total / count if count else 0.0}

    # Chain the stages: only one record is in flight at a time,
    # and aggregate() drives the whole pipeline
    result = aggregate(
        transform(
            filter_invalid(
                parse(
                    read_data('large_file.csv')
                )
            )
        )
    )

    return result


class MemoryEfficientAggregation:
    """Memory-efficient aggregation"""

    @staticmethod
    def external_sort(input_file: str, output_file: str, chunk_size: int = 100000):
        """External sort: sort data that does not fit in memory"""
        import tempfile
        import heapq
        from itertools import islice

        # Step 1: sort fixed-size chunks and spill each one to a temp file
        chunks = []
        with open(input_file, 'r') as f:
            while True:
                lines = list(islice(f, chunk_size))
                if not lines:
                    break
                # Make sure every line ends with a newline so merged lines stay separate
                lines = [l if l.endswith('\n') else l + '\n' for l in lines]
                lines.sort()
                temp = tempfile.NamedTemporaryFile(
                    mode='w+', delete=False, suffix='.tmp'
                )
                temp.writelines(lines)
                temp.flush()
                temp.seek(0)
                chunks.append(temp)

        # Step 2: k-way merge; heapq.merge reads the sorted chunk files lazily
        # (merge the open files, not their names)
        with open(output_file, 'w') as out:
            for line in heapq.merge(*chunks):
                out.write(line)

        # Clean up the temp files
        for temp in chunks:
            temp.close()
            Path(temp.name).unlink()
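
The two steps of the external sort (sorted runs, then a k-way merge) can be tried in memory before touching the file system. The sketch below is a simplified illustration under that assumption: `sorted_runs` and `merge_sorted` are hypothetical helpers, and the runs are kept in lists rather than spilled to temporary files.

```python
import heapq
from itertools import islice


def sorted_runs(items, run_size: int):
    """Yield consecutive sorted runs of at most run_size items."""
    iterator = iter(items)
    while True:
        run = sorted(islice(iterator, run_size))
        if not run:
            return
        yield run


def merge_sorted(items, run_size: int = 3):
    """Sort by building small sorted runs and merging them lazily."""
    runs = list(sorted_runs(items, run_size))
    # heapq.merge pulls one item at a time from each already-sorted run.
    return heapq.merge(*runs)


data = [5, 1, 4, 2, 3, 9, 0]
merged = list(merge_sorted(data))
print(merged)
```

In the file-based version each run lives in its own temporary file, so peak memory is bounded by the run size plus one pending line per run.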

Object Reuse Patterns

# ============================================
# Object reuse: fewer allocations and less GC pressure
# ============================================

class ObjectPool:
    """Object pool: reuse objects that are expensive to create"""

    def __init__(self, factory, reset_func, max_size: int = 100):
        self._factory = factory
        self._reset_func = reset_func
        self._pool = []
        self._max_size = max_size

    def acquire(self):
        if self._pool:
            obj = self._pool.pop()
            self._reset_func(obj)
            return obj
        return self._factory()

    def release(self, obj):
        if len(self._pool) < self._max_size:
            self._reset_func(obj)
            self._pool.append(obj)

# Usage example: HTTP connection reuse
class ConnectionPool:
    """Connection pool"""

    def __init__(self, max_connections: int = 20):
        self._pool = []
        self._in_use = set()
        self._max = max_connections

    def get_connection(self):
        if self._pool:
            conn = self._pool.pop()
        elif len(self._in_use) < self._max:
            conn = self._create_connection()
        else:
            raise RuntimeError("Connection pool exhausted")
        self._in_use.add(conn)
        return conn

    def return_connection(self, conn):
        self._in_use.discard(conn)
        self._pool.append(conn)

    def _create_connection(self):
        # Placeholder connection. It must be hashable to be stored in the
        # _in_use set, so a plain dict would raise TypeError here
        return object()


# Reusing immutable objects (Flyweight pattern)
class FlyweightFactory:
    """Flyweight factory: share identical immutable state"""

    def __init__(self):
        self._flyweights = {}

    def get_flyweight(self, key: str):
        if key not in self._flyweights:
            self._flyweights[key] = self._create(key)
        return self._flyweights[key]

    def _create(self, key: str):
        return {"shared_state": key, "metadata": f"data_for_{key}"}

    @property
    def count(self):
        return len(self._flyweights)


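# String and integer optimizations
def string_interning_demo():
    """String interning and the small-integer cache"""

    # CPython caches the integers from -5 to 256
    a = 256
    b = 256
    print(f"256 is the same object: {a is b}")  # True

    c = 257
    d = 257
    # Inside one function both names share a single constant, so this is True;
    # entered as separate statements in the REPL it is typically False
    print(f"257 is the same object: {c is d}")

    # String interning
    s1 = "hello"
    s2 = "hello"
    print(f"'hello' is the same object: {s1 is s2}")  # True

    # Strings that do not look like identifiers (e.g. containing a space)
    # are not interned automatically
    s3 = "hello world"
    s4 = "hello world"
    print(f"'hello world' is the same object: {s3 is s4}")  # implementation-dependent

    # Manual interning
    import sys
    s5 = sys.intern("hello world " * 100)
    s6 = sys.intern("hello world " * 100)
    print(f"Manually interned: {s5 is s6}")  # True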
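
One way to make a pool harder to misuse is to hand objects out through a context manager, so that every acquire is paired with a release. The sketch below shows this for reusable byte buffers; `BufferPool`, its `borrow` method, and the `created` counter are hypothetical names for illustration, not an existing API.

```python
from contextlib import contextmanager


class BufferPool:
    """Pool of fixed-size bytearray buffers."""

    def __init__(self, buffer_size: int, max_idle: int = 4):
        self._buffer_size = buffer_size
        self._max_idle = max_idle
        self._idle = []
        self.created = 0               # how many buffers were actually allocated

    def acquire(self) -> bytearray:
        if self._idle:
            return self._idle.pop()
        self.created += 1
        return bytearray(self._buffer_size)

    def release(self, buf: bytearray) -> None:
        buf[:] = bytes(self._buffer_size)      # reset contents before reuse
        if len(self._idle) < self._max_idle:   # cap the pool so it cannot grow without bound
            self._idle.append(buf)

    @contextmanager
    def borrow(self):
        buf = self.acquire()
        try:
            yield buf
        finally:
            self.release(buf)          # returned even if the body raises


pool = BufferPool(buffer_size=1024)

with pool.borrow() as first:
    first[0] = 255

with pool.borrow() as second:
    reused = second is first           # the same buffer object comes back
    was_reset = second[0] == 0         # and its contents were cleared

print(f"allocated: {pool.created}, reused: {reused}, reset: {was_reset}")
```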

GC Tuning

import gc
import time

def gc_tuning():
    """GC tuning guide"""

    # Scenario 1: web server, fewer GC pauses
    # Raise the generation 0 threshold so collections run less often
    gc.set_threshold(5000, 15, 15)

    # Scenario 2: batch job, GC disabled
    gc.disable()
    # ... create and destroy many temporary objects
    gc.enable()
    gc.collect()  # collect manually afterwards

    # Scenario 3: long-running service, monitor the GC
    def monitor_gc():
        gc.callbacks.append(gc_callback)

    def gc_callback(phase, info):
        if phase == "start":
            print(f"GC start: generation={info['generation']}")
        elif phase == "stop":
            print(f"GC stop: collected={info.get('collected', 0)}, "
                  f"uncollectable={info.get('uncollectable', 0)}")

    # Scenario 4: fewer cycles, broken with weakref
    import weakref

    class Parent:
        def __init__(self):
            self.children = []

    class Child:
        def __init__(self, parent):
            self._parent_ref = weakref.ref(parent)  # weak reference to the parent

        @property
        def parent(self):
            return self._parent_ref()

    parent = Parent()
    child = Child(parent)
    parent.children.append(child)
    # No reference cycle, so reference counting alone can reclaim both objects

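def memory_profiling_checklist():
    """Memory profiling checklist"""
    return {
        "step1": "Use tracemalloc to find the lines where memory grows fastest",
        "step2": "Use objgraph to inspect the object reference graph",
        "step3": "Check global variables for unbounded growth",
        "step4": "Check closures for accidental references to large objects",
        "step5": "Check that caches have an eviction policy (LRU maxsize)",
        "step6": "Use __slots__ to shrink small objects",
        "step7": "Use generators instead of lists for large data",
        "step8": "Break reference cycles with weakref",
    }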
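
The disable/enable pattern from the batch scenario above is easy to get wrong when an exception skips the `gc.enable()` call. A minimal sketch of a safer wrapper follows; `gc_paused` is a hypothetical helper built only on `gc.isenabled()`, `gc.disable()`, `gc.enable()`, and `gc.collect()`.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused(collect_after: bool = True):
    """Temporarily disable automatic collection, restoring the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        if collect_after:
            gc.collect()      # clean up any cycles created while paused


gc.enable()
with gc_paused():
    enabled_inside = gc.isenabled()
    batch = [{"id": i} for i in range(10_000)]   # allocation-heavy work
enabled_after = gc.isenabled()

print(f"inside: {enabled_inside}, after: {enabled_after}")
```

Reference counting keeps working while the collector is paused, so only objects caught in cycles wait for the final `gc.collect()`.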

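Advantages

1. Automatic management: developers do not allocate or free memory by hand
2. Rich tooling: tracemalloc, objgraph, and memory_profiler pinpoint problems
3. Tunable: GC thresholds and strategy can be adjusted to the workload
4. __slots__ optimization: can cut per-object memory by roughly 40-60%, depending on the Python version and the number of attributes
5. Weak references: a clean way to avoid leaks in caches and observer patterns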

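Disadvantages

1. GIL + GC pauses: a cyclic collection runs while holding the GIL, so no other thread can execute Python code until it finishes
2. High memory overhead: Python objects take more memory than equivalent C structs
3. Cycle detection cost: the generational GC has to traverse the object graph
4. Unpredictable collection: the timing of cyclic collections is not deterministic and can cause latency jitter
5. Tuning is non-trivial: different workloads call for different GC strategies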

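Performance Notes

Object creation cost: creating and destroying objects at a high rate increases GC pressure; consider an object pool
When __slots__ helps: the gain is significant for large numbers of small objects of the same class and negligible for a handful of objects
GC pauses: a full collection can pause the program for tens of milliseconds, which matters for real-time systems
Memory fragmentation: frequent allocation and release can fragment memory, so the process may hold more memory than its live objects account for
Large lists: list.append is amortized O(1), but a resize may copy the whole underlying array
Dict resizing: a dict is resized once it is more than about 2/3 full; CPython exposes no public API to preallocate a dict, so bulk inserts simply pay for the intermediate resizes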
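
The note on list growth can be checked empirically. The sketch below records the lengths at which a list's allocated size changes while elements are appended; `resize_points` is a hypothetical helper, and the exact growth steps are a CPython implementation detail that varies between versions.

```python
import sys


def resize_points(n: int):
    """Return (length, size_in_bytes) pairs at which the list was resized."""
    items = []
    last_size = sys.getsizeof(items)
    points = []
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:          # the underlying array was reallocated
            points.append((len(items), size))
            last_size = size
    return points


points = resize_points(1000)
print(f"{len(points)} resizes for 1000 appends")
print("first few:", points[:5])
```

Because the list over-allocates on each resize, far fewer resizes than appends occur, which is what makes `append` amortized O(1).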

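Summary

Python's memory management is built on reference counting, with a generational GC to reclaim reference cycles. Understanding this mechanism helps you write memory-efficient code: use __slots__ to shrink objects, generators to process large data, weakref to break reference cycles, and tracemalloc to locate leaks. The guiding principles are to avoid creating objects unnecessarily, to drop references as soon as they are no longer needed, and to monitor how memory usage grows over time.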

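Key Points

Reference counting: every object has an ob_refcnt and is freed as soon as it reaches zero
Generational GC: three generations with classic default thresholds (700, 10, 10); reclaims reference cycles
__del__ pitfalls: not recommended for objects that take part in reference cycles
weakref: weak references do not increase the reference count; a good fit for caches and observers
tracemalloc: built-in allocation tracing, accurate to the source line
objgraph: visualizes object reference graphs
__slots__: can cut per-object memory by roughly 40-60%, depending on version and attribute count
Generator pipelines: streaming processing whose memory use does not grow with input size
mmap: memory-mapped files for random access to large files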

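Common Misconceptions

"Python manages memory automatically, so I can ignore it": automatic management does not rule out leaks
"del frees the object immediately": del only removes one reference; the object is freed when its count reaches zero
"The GC solves every memory problem": the GC only reclaims unreachable cycles; it cannot free objects that are still referenced by mistake
"__slots__ is always a win": it only pays off with large numbers of objects of the same class
"Disabling the GC improves performance": it can, but reference cycles are then never reclaimed
"getrefcount is exact": the call itself adds a temporary reference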

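Learning Path

Beginner: understand reference counting and what del does
Intermediate: locate leaks with tracemalloc, optimize with __slots__
Advanced: break cycles with weakref, tune the GC, use mmap for large files
Expert: pymalloc internals, memory management in C extensions, custom allocators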

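Use Cases

Long-running web services (leak detection)
Large-scale data processing (generators, mmap)
High-performance computing (object pools, __slots__)
Caching systems (weakref, LRU)
Games and real-time systems (GC tuning)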

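Adoption Recommendations

Step 1: integrate memory_profiler into CI to catch abnormal memory growth
Step 2: add __slots__ to classes that are instantiated at a high rate
Step 3: move global caches to WeakValueDictionary or an LRU cache
Step 4: switch large-data processing to generator pipelines
Step 5: establish a memory baseline and alert on deviations
Ongoing: check memory growth with tracemalloc at regular intervals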

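Troubleshooting Checklist

Is memory growing steadily? Compare tracemalloc snapshots
Are there reference cycles? Visualize them with objgraph.show_refs or objgraph.show_backrefs
Do global caches have an eviction policy? Check the LRU maxsize
Do closures reference large objects by accident? Check which variables the inner functions use
Is the GC running too often? Check gc.get_count() and gc.get_threshold()
Is __slots__ used correctly? Confirm that no code needs dynamic attributes
Are generators used for large data? Look for places that load everything at once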
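
The closure check in the list above can be done by inspecting a function object directly. The sketch below relies on standard function attributes, `__closure__` and `__code__.co_freevars`; the factory names `make_unused` and `make_used` are made up for the comparison. A closure captures only the variables its body actually references.

```python
def make_unused():
    big = list(range(1000))

    def process():
        return "done"             # does not mention big

    return process


def make_used():
    big = list(range(1000))

    def process():
        return len(big)           # references big, so it is captured

    return process


unused = make_unused()
used = make_used()

# No captured variables at all: big was released when make_unused returned.
unused_captures = unused.__closure__

# One captured variable, and the cell still holds the full list.
used_names = used.__code__.co_freevars
captured_length = len(used.__closure__[0].cell_contents)

print(unused_captures, used_names, captured_length)
```

If `co_freevars` lists a name that refers to something large, either copy out the small piece the inner function needs or pass the data in as an argument.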

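Review Questions

What is the memory usage trend of the current service? Is there slow growth?
What was the root cause of the last memory problem? Was it fixed for good?
What are the object creation and destruction rates? How much pressure is on the GC?
Which objects consume the most memory? Can they be optimized?
Did pause times improve after GC tuning?
What is the cache hit rate? Is memory being wasted?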