NLP/CV Interview Questions

SunnyFan


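Introduction

Natural language processing (NLP) and computer vision (CV) are two core application areas of artificial intelligence. This article covers interview topics including text processing, word embeddings, sequence models, CNN architectures, and object detection (YOLO), to help AI developers prepare systematically for NLP and CV technical interviews.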

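Highlights

Covers the core technology stacks of NLP and CV
Includes key topics such as word segmentation, word embeddings, CNNs, and object detection
Provides PyTorch code examples
Connects to recent trends (Transformers, YOLOv8)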

Questions

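1. What methods exist for Chinese word segmentation?

Answer: Chinese text has no spaces between words, so word segmentation is a foundational NLP task. The main approaches are dictionary-based (e.g., maximum matching), statistical (e.g., HMM or CRF sequence labeling), and deep-learning-based (e.g., BiLSTM-CRF or BERT-CRF). Practical tools often combine them: jieba, for example, matches against a dictionary and falls back to an HMM for words that are not in the dictionary.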

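# Method 1: dictionary-based segmentation (jieba)
# jieba builds a DAG of candidate words from its prefix dictionary, picks the
# maximum-probability path with dynamic programming, and uses an HMM (Viterbi)
# for words that are not in the dictionary.
import jieba
import jieba.posseg as pseg

# Basic segmentation (accurate mode)
text = "自然语言处理是人工智能的重要方向"
words = jieba.lcut(text)
print("Accurate mode:", "/".join(words))
# The exact split depends on the version of jieba's built-in dictionary

# Search-engine mode (finer-grained: long words are split again to improve recall)
words_search = jieba.lcut_for_search(text)
print("Search mode:", "/".join(words_search))

# Segmentation with part-of-speech tags
words_pos = pseg.lcut(text)
for word, flag in words_pos:
    print(f"  {word} ({flag})")

# Add a custom word (affects subsequent calls)
jieba.add_word("自然语言处理", freq=1000, tag="n")
# Load a user dictionary file; each line is "word [freq] [pos_tag]"
# jieba.load_userdict("custom_dict.txt")

# Method 2: deep-learning segmentation (e.g., BERT + CRF)
# Segmentation is framed as character-level sequence labeling (B/M/E/S tags):
# BERT encodes each character and a CRF layer decodes the tag sequence.
# The tokenizer alone does NOT segment words: bert-base-chinese splits Chinese
# text into single characters, which become the inputs to the tagger.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
tokens = tokenizer.tokenize(text)
print("BERT tokens:", tokens)
# ['自', '然', '语', '言', '处', '理', '是', '人', '工', '智', '能', '的', '重', '要', '方', '向']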
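The dictionary-based approach can be illustrated with forward maximum matching (FMM), the simplest greedy variant: at each position, take the longest word that appears in the dictionary. The sketch below is plain Python with a small, hypothetical vocabulary. It is not how jieba works internally (jieba uses a DAG, dynamic programming, and an HMM); it only shows the core idea of dictionary matching.

```python
def forward_max_match(text, vocab, max_len=None):
    """Forward maximum matching: greedily take the longest dictionary word at each position."""
    if max_len is None:
        max_len = max((len(w) for w in vocab), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # fall back to a single character when nothing matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy vocabulary for illustration only
vocab = {"自然", "语言", "处理", "自然语言处理", "是", "人工", "智能",
         "人工智能", "的", "重要", "方向"}
print(forward_max_match("自然语言处理是人工智能的重要方向", vocab))
```

Greedy matching is fast, but it picks the wrong split whenever the longest match is not the intended word; resolving such ambiguity is what statistical and neural segmenters are for.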

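2. What are the ways to represent words as vectors?

Answer: Word embeddings turn text into numeric vectors. They evolved from sparse one-hot encoding, through static dense embeddings (one fixed vector per word), to modern contextual representations (a word's vector depends on the sentence it appears in).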

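| Method | Principle | Dimension | Characteristics |
|--------|-----------|-----------|-----------------|
| One-Hot | Sparse vector the size of the vocabulary | Vocabulary size | Cannot express semantic relations (all vectors are orthogonal) |
| Word2Vec | Shallow neural network (Skip-gram / CBOW) | Typically 300 | Captures semantic similarity; one static vector per word |
| GloVe | Factorization of the global co-occurrence matrix | Typically 300 | Incorporates global corpus statistics |
| FastText | Subword character n-grams | Typically 300 | Can build vectors for OOV words |
| BERT | Transformer encoder | 768 (base) / 1024 (large) | Contextual representations |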
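The "cannot express semantic relations" limitation of one-hot vectors can be checked directly: any two distinct one-hot vectors have cosine similarity 0, while dense vectors can place related words close together. A minimal plain-Python sketch; the dense vectors below are hand-picked, hypothetical values, not trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


vocab = ["猫", "狗", "汽车"]
# One-hot: a 1 at the word's own index, 0 everywhere else
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}
print(cosine(one_hot["猫"], one_hot["狗"]))  # 0.0: every pair of distinct words looks equally unrelated

# Hypothetical dense vectors chosen by hand so that 猫 and 狗 are close
dense = {"猫": [0.9, 0.8, 0.1], "狗": [0.85, 0.75, 0.2], "汽车": [0.1, 0.2, 0.95]}
print(cosine(dense["猫"], dense["狗"]), cosine(dense["猫"], dense["汽车"]))
```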

import torch
import torch.nn as nn
import torch.nn.functional as F

# Skip-gram with negative sampling (SGNS), the usual Word2Vec training objective
class SkipGramModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim=128):
        super().__init__()
        self.in_embeddings = nn.Embedding(vocab_size, embedding_dim)   # center-word vectors (kept as the word vectors)
        self.out_embeddings = nn.Embedding(vocab_size, embedding_dim)  # context-word vectors

    def forward(self, center, context, neg_context):
        # center: (batch,), context: (batch,), neg_context: (batch, k)
        # Positive pairs: raise the score of (center, context)
        center_emb = self.in_embeddings(center)        # (batch, emb_dim)
        context_emb = self.out_embeddings(context)     # (batch, emb_dim)
        pos_score = torch.sum(center_emb * context_emb, dim=1)
        pos_loss = -F.logsigmoid(pos_score)             # numerically stable log(sigmoid(x))

        # Negative samples: lower the scores of k sampled noise words
        neg_emb = self.out_embeddings(neg_context)     # (batch, k, emb_dim)
        # squeeze(2) only; a bare squeeze() would also drop batch or k when either equals 1
        neg_score = torch.bmm(neg_emb, center_emb.unsqueeze(2)).squeeze(2)  # (batch, k)
        neg_loss = -F.logsigmoid(-neg_score).sum(dim=1)

        return (pos_loss + neg_loss).mean()

# Using pretrained embeddings
def use_pretrained_embeddings():
    # Static Word2Vec vectors with Gensim
    # from gensim.models import KeyedVectors
    # model = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)
    # vector = model["电脑"]  # look up a word vector
    # similar = model.most_similar("电脑", topn=5)

    # Contextual vectors with Hugging Face Transformers
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese")
    model.eval()

    text = "自然语言处理"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Context-dependent token vectors
    last_hidden_states = outputs.last_hidden_state  # (1, seq_len, 768)
    print(f"BERT output shape: {last_hidden_states.shape}")

    # Sentence-level representation from the [CLS] token
    # (without fine-tuning, mean pooling over tokens often works better as a sentence vector)
    sentence_embedding = last_hidden_states[:, 0, :]
    print(f"Sentence vector shape: {sentence_embedding.shape}")
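`SkipGramModel` expects (center, context) index pairs plus negative samples. A minimal plain-Python sketch of how that training data could be prepared, assuming a fixed context window and the count^0.75 noise distribution used by the original Word2Vec; the helper names are hypothetical.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every context word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def noise_distribution(tokens, power=0.75):
    """Negative-sampling probabilities proportional to count ** power."""
    counts = Counter(tokens)
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}


print(skipgram_pairs(["我", "爱", "自然", "语言"], window=1))
print(noise_distribution(["的"] * 16 + ["猫"]))
```

Raising counts to the 0.75 power flattens the distribution, so rare words are drawn as negatives somewhat more often than their raw frequency would suggest.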

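3. What is the attention mechanism, and what is the core idea of the Transformer?

Answer: Attention lets a model dynamically weight the most relevant parts of the input: each position forms a query, compares it with every position's key, and takes a softmax-weighted sum of their values. The Transformer is built entirely on attention and drops the sequential computation of RNNs, so all positions are processed in parallel and any two positions are connected in a single step.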

import math
import torch
import torch.nn as nn

# Multi-head self-attention
class SelfAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads

        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, embed_dim)
        # mask: broadcastable to (batch, num_heads, seq_len, seq_len); 0 marks positions to ignore
        batch_size, seq_len, _ = x.shape

        # Linear projections, split into heads: (batch, num_heads, seq_len, head_dim)
        Q = self.q_proj(x).view(batch_size, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        K = self.k_proj(x).view(batch_size, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        V = self.v_proj(x).view(batch_size, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        # Attention scores, scaled by sqrt(head_dim) to keep the softmax from saturating
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.head_dim)

        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))

        # Softmax over the key dimension
        attention_weights = torch.softmax(scores, dim=-1)

        # Weighted sum of values, then merge heads back to (batch, seq_len, embed_dim)
        context = torch.matmul(attention_weights, V)
        context = context.transpose(1, 2).contiguous().view(batch_size, seq_len, self.embed_dim)

        return self.out_proj(context)

# Transformer encoder block (Post-LN as in the original Transformer; many recent models use Pre-LN)
class TransformerBlock(nn.Module):
    def __init__(self, embed_dim, num_heads, ff_dim, dropout=0.1):
        super().__init__()
        self.attention = SelfAttention(embed_dim, num_heads)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(ff_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x, mask=None):
        # Multi-head self-attention + residual connection + LayerNorm
        x = self.norm1(x + self.attention(x, mask))
        # Feed-forward network + residual connection + LayerNorm
        x = self.norm2(x + self.ff(x))
        return x

# Usage example
def transformer_example():
    batch_size, seq_len, embed_dim = 2, 10, 256
    num_heads, ff_dim = 8, 1024

    x = torch.randn(batch_size, seq_len, embed_dim)
    block = TransformerBlock(embed_dim, num_heads, ff_dim)
    output = block(x)
    print(f"Input shape: {x.shape}")
    print(f"Output shape: {output.shape}")
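The core formula inside `SelfAttention` is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal single-head, unmasked sketch in plain Python lists, with no batching; it is meant only to make the arithmetic visible, not as a replacement for the PyTorch version.

```python
import math


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v (lists of rows); returns n_q x d_v."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # the weights for this query sum to 1
        # Weighted sum of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Two identical keys get equal weight, so the output is the average of the two values
print(scaled_dot_product_attention([[0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

When one key matches the query much better than the others, its weight approaches 1 and the output approaches that key's value row; this is the "dynamic focus" described in the answer.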

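4. How do CNNs work for image classification?

Answer: Convolutional layers extract image features using local receptive fields and shared weights; pooling layers (or strided convolutions) reduce spatial resolution; a final fully connected layer performs the classification. Stacking layers lets later features cover larger image regions and represent more abstract patterns.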

import torch
import torch.nn as nn

# Classic CNN architecture: a simplified ResNet
class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # 3x3 conv (downsamples when stride > 1) -> BN -> ReLU -> 3x3 conv -> BN
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut: identity by default; a 1x1 conv matches the shape when stride or channels change
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        residual = self.shortcut(x)
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + residual  # residual connection: the block learns F(x) and outputs F(x) + x
        return torch.relu(out)

class SimpleResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            BasicBlock(64, 64),
            BasicBlock(64, 128, stride=2),
            BasicBlock(128, 256, stride=2),
            BasicBlock(256, 512, stride=2),
            nn.AdaptiveAvgPool2d((1, 1)),  # global average pooling -> (batch, 512, 1, 1)
        )
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (batch, 512)
        return self.classifier(x)

# Quick shape check
model = SimpleResNet(num_classes=10)
dummy_input = torch.randn(1, 3, 224, 224)
output = model(dummy_input)
print(f"Input: {dummy_input.shape} -> Output: {output.shape}")

# Data augmentation
from torchvision import transforms

# Training: random crop, flip, and color jitter; normalize with ImageNet mean and std
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Validation: deterministic resize + center crop (no randomness)
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
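The spatial sizes in `SimpleResNet` follow from the standard output-size formula for convolution and pooling, out = floor((in + 2 * padding - kernel) / stride) + 1. A small plain-Python sketch that traces a 224x224 input through the layers above; the helper names are illustrative, and the parameter formula assumes a standard (non-grouped) 2D convolution.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_channels, out_channels, kernel, bias=False):
    """Weights of a standard 2D conv: one kernel x kernel filter per (in, out) channel pair."""
    return in_channels * out_channels * kernel * kernel + (out_channels if bias else 0)


# Trace a 224x224 input through SimpleResNet's spatial layers
size = 224
size = conv_output_size(size, kernel=3, stride=1, padding=1)   # stem conv keeps 224
size = conv_output_size(size, kernel=2, stride=2)              # MaxPool2d(2, 2) -> 112
for stride in (1, 2, 2, 2):                                    # conv1 of each BasicBlock
    size = conv_output_size(size, kernel=3, stride=stride, padding=1)
print(size)  # 14: spatial size entering AdaptiveAvgPool2d
print(conv_params(3, 64, 3))  # stem conv weights (bias=False)
```

Because `AdaptiveAvgPool2d((1, 1))` collapses any remaining spatial size to 1x1, the 512-dimensional classifier input does not depend on the input resolution.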

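5. How does YOLO object detection work?

Answer: YOLO (You Only Look Once) treats object detection as a regression problem: the image is divided into a grid, and a single forward pass predicts bounding boxes, objectness confidence, and class probabilities for every cell at once. Duplicate boxes are then removed with non-maximum suppression (NMS). Earlier versions such as YOLOv3/v5 predict offsets relative to preset anchor boxes, while YOLOv8 is anchor-free.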

# Core YOLO concepts (sketch)
import numpy as np
import torch
import torch.nn as nn

class YOLODetectionHead(nn.Module):
    """Simplified anchor-based YOLO detection head (YOLOv3/v5 style; YOLOv8 is anchor-free)"""
    def __init__(self, in_channels, num_classes=80, num_anchors=3):
        super().__init__()
        self.num_classes = num_classes
        self.num_anchors = num_anchors
        # Per anchor: (x, y, w, h, objectness, class_scores...)
        self.num_outputs = 5 + num_classes  # 85 for COCO
        # 1x1 conv maps backbone features to num_anchors * num_outputs channels
        self.pred = nn.Conv2d(in_channels, num_anchors * self.num_outputs, kernel_size=1)

    def forward(self, feature_map):
        """
        feature_map: (batch, in_channels, grid_h, grid_w)
        returns tensors shaped (batch, grid_h, grid_w, num_anchors, ...)
        """
        batch, _, grid_h, grid_w = feature_map.shape
        raw = self.pred(feature_map)

        # Reshape to (batch, grid_h, grid_w, num_anchors, num_outputs)
        predictions = raw.view(
            batch, self.num_anchors, self.num_outputs, grid_h, grid_w
        ).permute(0, 3, 4, 1, 2)

        # Decode raw outputs
        xy = torch.sigmoid(predictions[..., 0:2])          # center offset inside the cell, in (0, 1)
        wh = predictions[..., 2:4]                         # raw log-scale w/h; box size = exp(wh) * anchor
        confidence = torch.sigmoid(predictions[..., 4:5])  # objectness
        class_scores = torch.sigmoid(predictions[..., 5:]) # per-class probabilities (independent sigmoids)

        return xy, wh, confidence, class_scores

# NMS (non-maximum suppression); in practice torchvision.ops.nms does this efficiently
def nms(boxes, scores, iou_threshold=0.5):
    """Remove duplicate detections. boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,)"""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if len(boxes) == 0:
        return []

    # Sort by confidence, highest first
    order = scores.argsort()[::-1]
    keep = []

    while len(order) > 0:
        i = order[0]
        keep.append(int(i))

        if len(order) == 1:
            break

        # IoU between the current best box and all remaining boxes
        remaining = order[1:]
        ious = compute_iou(boxes[i], boxes[remaining])

        # Keep only boxes whose IoU with the best box is below the threshold
        order = remaining[ious < iou_threshold]

    return keep

def compute_iou(box, boxes):
    """Intersection over Union between one box and an array of boxes, all in (x1, y1, x2, y2)"""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])

    intersection = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area_box + area_boxes - intersection

    return intersection / (union + 1e-10)

# YOLOv8 with the ultralytics library (pip install ultralytics)
def yolov8_example(image_path="image.jpg"):
    from ultralytics import YOLO
    model = YOLO("yolov8n.pt")  # load pretrained weights (downloaded on first use)
    results = model(image_path)
    for result in results:
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            conf = box.conf[0].item()
            cls = int(box.cls[0].item())
            print(f"Detection: class={result.names[cls]}, confidence={conf:.2f}, "
                  f"box=[{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
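The detection head above returns raw offsets; turning one prediction into pixel coordinates follows the YOLOv3-style decoding bx = (sigmoid(tx) + cx) * stride and bw = anchor_w * exp(tw). A plain-Python sketch under that assumption; the stride of 32 and the anchor (116, 90) are example values taken from YOLOv3's coarsest scale, and the function name is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one anchor-based prediction into (x1, y1, x2, y2) pixel coordinates."""
    cx = (sigmoid(tx) + cell_x) * stride  # sigmoid keeps the center inside its grid cell
    cy = (sigmoid(ty) + cell_y) * stride
    w = anchor_w * math.exp(tw)           # exp keeps width/height positive, scaled from the anchor
    h = anchor_h * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Zero offsets: a box exactly the anchor's size, centered in cell (3, 5) of a stride-32 grid
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=5, anchor_w=116, anchor_h=90, stride=32))
```

YOLOv8 drops anchors and instead predicts distances from each grid point to the four box edges, so this decoding only applies to anchor-based versions.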

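6. What are the differences between RNN, LSTM, and GRU, and when is each appropriate?

Answer: An RNN carries sequence information in a hidden state, but its gradients vanish (or explode) over long sequences. LSTM adds gates (input, forget, output) and a separate cell state, which mitigates this and lets it capture long-range dependencies. GRU is a simplified LSTM with two gates (reset, update) and no separate cell state, so it has fewer parameters.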

import torch
import torch.nn as nn

# LSTM cell, step by step
class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # All four gates computed by one linear layer over [x, h]
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        combined = torch.cat([x, h], dim=1)
        gates = self.gates(combined)

        # Split into the four gates
        i_raw, f_raw, g_raw, o_raw = gates.chunk(4, dim=1)
        i = torch.sigmoid(i_raw)   # input gate: how much new information to write
        f = torch.sigmoid(f_raw)   # forget gate: how much of the old cell state to keep
        g = torch.tanh(g_raw)      # candidate values
        o = torch.sigmoid(o_raw)   # output gate: how much of the cell state to expose

        c_new = f * c + i * g          # additive cell update eases gradient flow across time steps
        h_new = o * torch.tanh(c_new)  # new hidden state
        return h_new, c_new

# GRU for comparison
class GRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Input and hidden projections are kept separate because the reset gate
        # must scale the hidden contribution before the candidate's tanh
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x, h):
        x_r, x_z, x_n = self.x2h(x).chunk(3, dim=1)
        h_r, h_z, h_n = self.h2h(h).chunk(3, dim=1)

        r = torch.sigmoid(x_r + h_r)    # reset gate
        z = torch.sigmoid(x_z + h_z)    # update gate
        n = torch.tanh(x_n + r * h_n)   # candidate state; the reset gate controls how much history is used
        h_new = (1 - z) * n + z * h     # no separate cell state, unlike LSTM
        return h_new

# Selection guide:
# LSTM: long-range dependencies (machine translation, text generation)
# GRU:  shorter sequences or limited compute (short-text classification)
# RNN:  rarely recommended; consider only under extreme resource constraints
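"Fewer parameters" can be made concrete by counting. A plain-Python sketch assuming PyTorch's convention for `nn.RNN` / `nn.GRU` / `nn.LSTM` (per gate: an input weight, a hidden weight, and two bias vectors); the single-`Linear` `LSTMCell` above has one bias per gate, so its count is slightly lower. The function name is hypothetical.

```python
def rnn_param_count(input_size, hidden_size, cell="lstm"):
    """Parameters of one unidirectional layer, PyTorch convention (two bias vectors per gate)."""
    gates = {"rnn": 1, "gru": 3, "lstm": 4}[cell]
    per_gate = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return gates * per_gate


for cell in ("rnn", "gru", "lstm"):
    print(cell, rnn_param_count(256, 256, cell))
```

With equal sizes, a GRU layer has exactly 3/4 of the parameters of an LSTM layer (3 gate blocks instead of 4).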

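7. What are BERT's pretraining tasks and fine-tuning strategies?

Answer: BERT is pretrained with two tasks, MLM (masked language modeling) and NSP (next sentence prediction); later work such as RoBERTa found NSP unnecessary and dropped it. Fine-tuning strategies include full fine-tuning, freezing layers, differential learning rates, and parameter-efficient methods such as Adapters.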

from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup
import torch

# Option 1: full fine-tuning
def full_finetune():
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-chinese", num_labels=2
    )
    # All parameters are trainable
    return model

# Option 2: freeze the lower layers, fine-tune only the upper layers
def freeze_layers():
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-chinese", num_labels=2
    )
    # Freeze the embeddings and the first 8 of the 12 encoder layers
    for param in model.bert.embeddings.parameters():
        param.requires_grad = False
    for layer in model.bert.encoder.layer[:8]:
        for param in layer.parameters():
            param.requires_grad = False
    return model

# Option 3: differential learning rates (small for pretrained layers, larger for the new head)
def differential_lr(model, lr_bert=2e-5, lr_classifier=1e-3):
    optimizer = torch.optim.AdamW([
        # Pass only trainable parameters, so this also works after freezing
        {"params": [p for p in model.bert.parameters() if p.requires_grad], "lr": lr_bert},
        {"params": model.classifier.parameters(), "lr": lr_classifier},
    ], weight_decay=0.01)
    return optimizer

# Training loop example
def train_bert(model, train_loader, epochs=3, warmup_ratio=0.1):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    optimizer = differential_lr(model)
    total_steps = epochs * len(train_loader)
    # Linear warmup over the first warmup_ratio of steps, then linear decay to 0
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(warmup_ratio * total_steps),
        num_training_steps=total_steps,
    )

    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for batch in train_loader:
            optimizer.zero_grad()
            outputs = model(
                input_ids=batch["input_ids"].to(device),
                attention_mask=batch["attention_mask"].to(device),
                labels=batch["labels"].to(device),
            )
            outputs.loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
            optimizer.step()
            scheduler.step()  # update the learning rate once per batch
            total_loss += outputs.loss.item()
        print(f"Epoch {epoch+1}: loss={total_loss/len(train_loader):.4f}")
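The MLM task itself is a simple corruption rule: select about 15% of tokens as prediction targets, then replace 80% of those with [MASK], 10% with a random token, and keep 10% unchanged. A plain-Python sketch of that rule at the token level; real implementations work on token IDs and mark unselected positions with an ignore index (e.g., -100), which is represented here by `None`. The function name is hypothetical.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None, mask_token="[MASK]"):
    """BERT-style MLM corruption. Returns (inputs, labels); labels is None where no loss is computed."""
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)         # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            labels.append(None)                   # not selected: excluded from the loss
            inputs.append(tok)
    return inputs, labels


tokens = list("自然语言处理是人工智能的重要方向")
print(mask_tokens(tokens, vocab=tokens, rng=random.Random(0)))
```

The 10% random and 10% unchanged cases prevent the model from learning that only [MASK] positions matter, since [MASK] never appears during fine-tuning.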

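8. What is the difference between one-stage and two-stage object detectors?

Answer: One-stage detectors (YOLO, SSD) predict boxes and classes directly from dense feature maps in a single pass, so they are fast. Two-stage detectors (Faster R-CNN) first generate region proposals and then classify and refine each one, which traditionally gives higher accuracy at lower speed.

Comparison:
| Property | One-Stage (YOLO) | Two-Stage (Faster R-CNN) |
|----------|------------------|--------------------------|
| Speed | Fast (typically 30+ FPS on a GPU) | Slower (roughly 5-7 FPS, depending on backbone and GPU) |
| Accuracy | Historically moderate; recent versions are competitive | High |
| Small objects | Weaker | Stronger |
| Region proposals | None (dense prediction) | Generated by an RPN |
| Typical use | Real-time detection, embedded devices | High-accuracy requirements |

Selection guide:
- Autonomous driving, real-time surveillance -> YOLOv8/YOLOv10
- Medical imaging, precision inspection -> Faster R-CNN / Cascade R-CNN
- Mobile / edge devices -> YOLO-Nano / MobileNet-SSD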

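6-15. More NLP/CV Interview Questions (Short Answers)

6. What is the difference between RNN and LSTM? RNNs suffer from vanishing gradients and struggle to capture long-range dependencies. LSTMs use gates (forget, input, output) to control the flow of information, which greatly mitigates the long-range dependency problem.

7. What is BERT, and what are its pretraining tasks? BERT is a bidirectional Transformer encoder. Its pretraining tasks are MLM (masked language modeling: 15% of tokens are selected for prediction; of these, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged) and NSP (next sentence prediction: deciding whether two sentences are consecutive).

8. What is GPT, and how does it differ from BERT? GPT uses the Transformer decoder and is a unidirectional autoregressive model (it generates left to right, each token attending only to earlier tokens). BERT uses the encoder and is bidirectional. GPT suits generation tasks; BERT suits understanding tasks.

9. What is transfer learning? Transfer learning carries knowledge from a model pretrained on large-scale data over to a specific task. In CV the common pattern is ImageNet pretraining + fine-tuning; in NLP it is BERT/GPT pretraining + fine-tuning on downstream tasks.

10. What types of image segmentation are there? Semantic segmentation (classify every pixel, e.g., FCN, U-Net), instance segmentation (also separate different instances of the same class, e.g., Mask R-CNN), and panoptic segmentation (a combination of semantic and instance segmentation).

11. What is data augmentation? Data augmentation transforms existing data to create new samples, which reduces overfitting. In CV: flipping, rotation, cropping, color jitter, Mixup, CutMix. In NLP: synonym replacement, back-translation, random insertion and deletion.

12. What is a loss function, and which ones are common? A loss function measures the gap between predictions and ground truth. Classification: cross-entropy. Regression: MSE, MAE, Huber loss. Object detection: Focal Loss, IoU loss. Generative models: adversarial loss, KL divergence.

13. What is Batch Normalization? BN standardizes each feature over a mini-batch (subtract the batch mean, divide by the batch standard deviation) and then applies a learnable scale and shift; at inference time it uses running statistics collected during training. It speeds up training and convergence and has a mild regularizing effect. Transformers usually use Layer Normalization instead, which normalizes each sample across its own features and therefore does not depend on batch size.

14. How do you deploy deep learning models? Options include ONNX Runtime (cross-platform inference), TensorRT (NVIDIA GPU optimization), TorchServe (PyTorch-native serving), FastAPI + Docker (wrapping the model in a REST API), and MLflow (model management and deployment).

15. What are the core technologies behind large language models (LLMs)? Most LLMs use a decoder-only Transformer and learn language patterns through large-scale pretraining. Key techniques include rotary position embeddings (RoPE), efficient attention (Flash Attention), instruction tuning, RLHF (reinforcement learning from human feedback), and quantized deployment (INT4/INT8).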
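The BN vs. LayerNorm distinction in question 13 comes down to which axis is normalized. A plain-Python sketch on a 2x2 batch (2 samples, 2 features), leaving out the learnable scale and shift (equivalent to gamma = 1, beta = 0) and the running statistics; the function names are illustrative.

```python
import math


def normalize(values, eps=1e-5):
    """Subtract the mean and divide by the (biased) standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(x):
    """x: list of samples (rows). Normalize each feature column across the batch."""
    columns = [normalize(list(col)) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]


def layer_norm(x):
    """Normalize each sample across its own features; independent of batch size."""
    return [normalize(row) for row in x]


x = [[1.0, 10.0],
     [3.0, 30.0]]
print(batch_norm(x))  # approximately [[-1, -1], [1, 1]]: each column now has mean 0
print(layer_norm(x))  # approximately [[-1, 1], [-1, 1]]: each row now has mean 0
```

Because `layer_norm` never looks at other samples, it behaves the same for batch size 1 and for variable-length sequences, which is one reason Transformers prefer it.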

Advanced Interview Topics

Training Optimization Techniques

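import math
import torch
import torch.optim.lr_scheduler as lr_scheduler

# Mixed-precision training: faster training and lower GPU memory use
# (torch.amp.GradScaler("cuda") needs PyTorch >= 2.3; older versions use torch.cuda.amp.GradScaler)
def train_with_mixed_precision(model, dataloader, optimizer):
    scaler = torch.amp.GradScaler("cuda")

    for batch in dataloader:
        optimizer.zero_grad()
        # Assumes a Hugging Face style model whose batch dict includes labels, so outputs.loss is set
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            outputs = model(**batch)
            loss = outputs.loss

        scaler.scale(loss).backward()   # scale the loss to avoid float16 gradient underflow
        scaler.unscale_(optimizer)      # unscale first so clipping sees the true gradient norm
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)          # skips the update if gradients contain inf/NaN
        scaler.update()

# Learning-rate schedules
# Cosine annealing: decays the LR from its initial value to eta_min over T_max steps
# (CosineAnnealingWarmRestarts adds periodic restarts)
def build_cosine_scheduler(optimizer, T_max=100):
    return lr_scheduler.CosineAnnealingLR(optimizer, T_max=T_max)

# Warmup + cosine decay (commonly used for Transformers)
class WarmupCosineScheduler:
    def __init__(self, optimizer, warmup_steps, total_steps):
        self.optimizer = optimizer
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps
        self.current_step = 0
        # Each param group's initial LR is used as its peak LR
        self.base_lrs = [pg["lr"] for pg in optimizer.param_groups]

    def step(self):
        self.current_step += 1
        if self.current_step <= self.warmup_steps:
            factor = self.current_step / self.warmup_steps  # linear warmup
        else:
            progress = (self.current_step - self.warmup_steps) / \
                       max(1, self.total_steps - self.warmup_steps)
            factor = 0.5 * (1 + math.cos(math.pi * min(progress, 1.0)))  # cosine decay to 0
        for pg, base_lr in zip(self.optimizer.param_groups, self.base_lrs):
            pg["lr"] = factor * base_lr

# Gradient accumulation: simulate a large batch with several small ones
def train_with_gradient_accumulation(model, dataloader, optimizer, accum_steps=4):
    optimizer.zero_grad()
    for i, batch in enumerate(dataloader):
        outputs = model(**batch)
        loss = outputs.loss / accum_steps  # scale so the summed gradients equal the large-batch average
        loss.backward()
        if (i + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad()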
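To see what the warmup + cosine schedule and gradient accumulation actually compute, here is a framework-free sketch of the same formulas as plain functions; the step counts in the usage line are hypothetical, and `effective_batch_size` assumes plain data parallelism across devices.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """LR at a given step: linear warmup to base_lr, then cosine decay to 0."""
    if warmup_steps > 0 and step <= warmup_steps:
        return base_lr * step / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))


def effective_batch_size(per_device_batch, accum_steps, num_devices=1):
    """Samples contributing to each optimizer step under gradient accumulation."""
    return per_device_batch * accum_steps * num_devices


# Hypothetical run: 10 warmup steps, 110 steps in total, peak LR 1e-3
print([round(warmup_cosine_lr(s, 1e-3, 10, 110), 6) for s in (0, 5, 10, 60, 110)])
print(effective_batch_size(per_device_batch=8, accum_steps=4, num_devices=2))
```

Halfway through the decay phase the LR is exactly half the peak, because cos(pi / 2) = 0.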

Model Evaluation and Metrics

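import time
import numpy as np
import torch
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support,
    roc_auc_score, confusion_matrix
)

# Full evaluation for a classification task
def evaluate_classification(y_true, y_pred, y_prob=None):
    acc = accuracy_score(y_true, y_pred)
    # "weighted" averages per-class scores by class support; "macro" weights all classes equally
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted"
    )
    print(f"Accuracy: {acc:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 score: {f1:.4f}")

    if y_prob is not None:
        # Binary: y_prob is the positive-class probability, shape (n,)
        # Multiclass: y_prob has shape (n, n_classes); "ovr" = one-vs-rest
        auc = roc_auc_score(y_true, y_prob, multi_class="ovr")
        print(f"AUC: {auc:.4f}")

    cm = confusion_matrix(y_true, y_pred)
    print(f"Confusion matrix:\n{cm}")

# Object-detection metric (outline only)
def compute_map(predictions, ground_truths, iou_threshold=0.5):
    """Compute mAP (mean Average Precision). Outline only, not implemented."""
    # 1. Per class, sort detections by confidence and mark each as TP or FP
    #    by matching it to an unmatched ground-truth box with IoU >= iou_threshold
    # 2. Trace the precision-recall (PR) curve; AP = area under the PR curve
    # 3. mAP = mean of AP over all classes
    # COCO reports mAP@[0.5:0.95], averaging over IoU thresholds 0.5, 0.55, ..., 0.95
    raise NotImplementedError

# Inference performance benchmark
def benchmark_inference(model, input_shape=(1, 3, 224, 224), n_runs=100):
    device = next(model.parameters()).device
    dummy = torch.randn(*input_shape, device=device)
    model.eval()

    def sync():
        # CUDA kernels run asynchronously; wait for them before reading the clock
        if device.type == "cuda":
            torch.cuda.synchronize()

    # Warmup (first calls include one-off costs such as memory allocation)
    with torch.no_grad():
        for _ in range(10):
            model(dummy)
    sync()

    # Benchmark
    times = []
    with torch.no_grad():
        for _ in range(n_runs):
            start = time.perf_counter()
            model(dummy)
            sync()
            times.append(time.perf_counter() - start)

    avg_ms = np.mean(times) * 1000
    throughput = input_shape[0] * 1000 / avg_ms  # images per second
    print(f"Latency: {avg_ms:.2f} ms/batch, throughput: {throughput:.1f} images/s")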
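Step 2 of the `compute_map` outline (turning per-class TP/FP decisions into AP) can be sketched in plain Python. This assumes detections are already sorted by confidence and matched against ground truth, and it uses all-point interpolation as in PASCAL VOC 2010+; COCO instead samples 101 recall points. The function name is hypothetical.

```python
def average_precision(tp_flags, num_gt):
    """AP for one class. tp_flags: True/False per detection, sorted by descending confidence."""
    tp = fp = 0
    precisions, recalls = [], []
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Pad the curve, then make precision non-increasing from right to left (interpolation)
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Area under the step-shaped PR curve: sum of (recall increase) x (precision)
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# 4 ground-truth objects; detections ranked TP, TP, FP, TP
print(average_precision([True, True, False, True], num_gt=4))
```

mAP is then the mean of this value over classes, and mAP@[0.5:0.95] repeats the TP/FP matching at each IoU threshold before averaging.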

Model Deployment and Inference Optimization

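import torch
import torch.nn as nn
import torch.ao.quantization as quant

# ONNX export
def export_to_onnx(model, output_path="model.onnx"):
    model.eval()  # export in inference mode (fixes Dropout/BatchNorm behavior)
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(
        model, dummy_input, output_path,
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # allow variable batch size
        opset_version=17,
    )
    print(f"Model exported to {output_path}")

# TorchScript export (can be loaded without Python, e.g. from C++ via LibTorch)
def export_to_torchscript(model, output_path="model.pt"):
    model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    # trace records the ops run for this input; use torch.jit.script for data-dependent control flow
    scripted = torch.jit.trace(model, dummy_input)
    scripted.save(output_path)

# Quantization: smaller model, faster inference
def quantize_dynamic_int8(model):
    # Dynamic quantization: INT8 weights, activations quantized at runtime, no calibration needed
    return quant.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def quantize_static_int8(model, calibration_loader):
    # Eager-mode static quantization: the model must wrap its quantized region in QuantStub/DeQuantStub
    model.eval()
    model.qconfig = quant.get_default_qconfig("fbgemm")  # x86 backend; "qnnpack" for ARM
    prepared = quant.prepare(model)  # insert observers
    # Calibration: run a small amount of representative data to record activation ranges
    with torch.no_grad():
        for batch in calibration_loader:
            prepared(batch)
    return quant.convert(prepared)  # swap in quantized modules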

# Model serving (FastAPI + Docker)
def create_model_server():
    """
    from fastapi import FastAPI
    from pydantic import BaseModel
    import torch

    app = FastAPI()
    model = torch.jit.load("model.pt")
    model.eval()

    class PredictRequest(BaseModel):
        input_data: list

    @app.post("/predict")
    async def predict(request: PredictRequest):
        tensor = torch.tensor(request.input_data).float()
        with torch.no_grad():
            output = model(tensor)
        return {"prediction": output.tolist()}

    # Dockerfile:
    # FROM python:3.11-slim
    # RUN pip install fastapi uvicorn torch
    # COPY model.pt app.py ./
    # CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
    """
    pass
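The arithmetic behind INT8 quantization is a scale factor plus rounding. A plain-Python sketch of symmetric per-tensor quantization, one common scheme among several; real backends often use per-channel scales and an affine zero point, and the function names here are hypothetical.

```python
def quantize_int8_symmetric(values):
    """Symmetric per-tensor INT8: scale = max|x| / 127, q = clamp(round(x / scale), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return [x * scale for x in q]


weights = [0.0, 0.25, -1.0, 1.0]
q, scale = quantize_int8_symmetric(weights)
print(q, scale)
print(dequantize(q, scale))  # close to the originals; error is at most scale / 2 per value
```

Storing 8-bit codes plus one float scale cuts weight memory to roughly a quarter of float32, at the cost of a rounding error bounded by half the scale.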

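16-20. Additional High-Frequency Questions

16. What does positional encoding do in a Transformer? Self-attention by itself is order-agnostic (permuting the input tokens just permutes the outputs), so positional encoding gives each position a distinct signal that lets the model tell token positions apart. Common schemes are sinusoidal encoding, learned position embeddings, and RoPE (rotary position embedding), which encodes relative position and, combined with scaling techniques such as position interpolation, is widely used to extend context length.

17. What is LoRA (Low-Rank Adaptation)? LoRA is a parameter-efficient fine-tuning method. It freezes the pretrained weight W and adds a trainable low-rank update BA alongside selected weight matrices (typically the attention projections), so the layer computes Wx + BAx, where B is d×r, A is r×k, and r is much smaller than d and k. Training only these few parameters gets close to full fine-tuning quality while greatly reducing GPU memory and training cost.

18. How do you prevent overfitting in deep learning? Data level: data augmentation, more data. Model level: Dropout, weight decay (L2 regularization), early stopping, Batch Normalization. Training level: ensembling, label smoothing, and cross-validation to detect overfitting and choose hyperparameters.

19. What is an FPN (Feature Pyramid Network)? An FPN fuses feature maps of different resolutions through a top-down pathway and lateral connections, so the detector can use multi-scale features that are both semantically strong and spatially precise. FPN-style necks are widely used, for example in YOLOv8 and Mask R-CNN, and significantly improve small-object detection.

20. What is the core idea of multimodal models? Multimodal models align different modalities (text, images, audio) in a shared representation space. Representative work includes CLIP (contrastive image-text learning), BLIP (image-text understanding and generation), and GPT-4V (a multimodal large model). The key techniques are cross-modal attention and contrastive learning objectives.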
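The LoRA update in question 17 can be sketched with plain Python lists. This is a toy illustration, not the Hugging Face PEFT API: the class name, the alpha / r scaling, and the initialization (A small and random, B zero, so training starts exactly from the pretrained W) follow the common LoRA setup but are assumptions here.

```python
import random


def matvec(M, x):
    """Matrix (list of rows) times vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable (hypothetical toy class)."""

    def __init__(self, W, r=2, alpha=4, seed=0):
        rng = random.Random(seed)
        self.W = W                                  # frozen pretrained weight, d_out x d_in
        d_out, d_in = len(W), len(W[0])
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init: no change at step 0
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=2)
print(layer([1.0, 1.0]))  # equals W x because B starts at zero
print(lora_param_count(4096, 4096, r=8), 4096 * 4096)
```

For a 4096x4096 projection with r = 8, LoRA trains 65,536 parameters instead of about 16.8 million, which is where the memory savings come from.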

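Strengths

Systematically covers core NLP and CV technologies
Provides PyTorch implementations of key algorithms
Traces the evolution from traditional methods to the latest techniques
Ties answers to real application scenarios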

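Weaknesses

NLP/CV techniques evolve quickly, so continuous learning is required
Deep learning experiments require substantial compute
Some concepts involve fairly deep mathematical theory
Interviews may probe model details and derivations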

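Summary

NLP and CV interviews test both your understanding of core algorithms and your practical skills. For NLP, focus on the Transformer architecture, pretrained models (BERT/GPT), and the text-processing pipeline; for CV, focus on CNN architectures, object detection (the YOLO series), and image segmentation. Build experience through real projects, and in the interview show well-rounded ability by discussing model selection, training techniques, and deployment.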

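What These Questions Really Test

AI interview questions usually test whether you can connect terminology to engineering practice.
For many questions the key is not the model name but data, evaluation, and cost trade-offs.
Mentioning failure cases and safety boundaries makes an answer deeper.

60-Second Answer Template

Start by explaining the concept itself.
Then say what problem it solves and what its limitations are.
Finally, add the metrics, costs, or risks that matter when deploying it.

Common Ways to Lose Points

Knowing only the English acronym without being able to explain it in plain words.
Talking only about results, not about evaluation or cost.
Conflating different models or different stages.

Study Tips

Review model principles, training methods, evaluation, and deployment as separate layers.
When answering AI questions, try to cover data, model, results, and cost together.
For popular terms, prepare three levels: what it is, what problem it solves, and what its limitations are.

Frequent Follow-Up Questions

What are this method's trade-offs in accuracy, latency, and cost?
If the model's output is unstable, where would you start troubleshooting?
What is the most common pitfall when adopting this concept in a company?

Review Priorities

Organize the key terms of each question into your own knowledge tree instead of memorizing sentences.
Compare easily confused concepts side by side, for example their mechanisms, applicable boundaries, and performance costs.
When reviewing, focus first on "why", then on "how to use it" and "which terms to remember".

Answering Reminders

Avoid reciting only English acronyms; explain the core mechanism clearly in one plain sentence.
When describing results, include the evaluation method or concrete metrics.
If you are unsure about a model detail, state your assumptions explicitly.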