- Accelerate is a library that lets you run the same PyTorch code on any distributed configuration by adding just four lines of code:
  ```diff
  + from accelerate import Accelerator
  + accelerator = Accelerator()

  + model, optimizer, training_dataloader, scheduler = accelerator.prepare(
  +     model, optimizer, training_dataloader, scheduler
  + )

    for batch in training_dataloader:
        optimizer.zero_grad()
        inputs, targets = batch
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = model(inputs)
        loss = loss_function(outputs, targets)
  +     accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
  ```

  The code above can then be launched on any system through Accelerate's CLI interface:

  ```shell
  accelerate launch {my_script.py}
  ```
- Installation:

  ```shell
  pip install accelerate
  conda install -c conda-forge accelerate
  pip install git+https://github.com/huggingface/accelerate
  ```

- Configuration:

  ```shell
  accelerate config
  ```

  `accelerate` will then ask you a few questions and generate a configuration from your answers.

  Check the configuration:
  ```shell
  accelerate env
  ```

- To use Accelerate, you only need to change four things:

  - First, import `Accelerator` and create an `accelerator` object:

    ```python
    from accelerate import Accelerator
    accelerator = Accelerator()
    ```

  - Then, remove all the `.to(device)` or `.cuda()` calls for your model and input data. The `accelerator` will handle this correctly for you and place all of these objects on the proper device.

    If you know what you are doing, you can keep those `.to(device)` calls, but you should use the device provided by the `accelerator` object: `accelerator.device`.

    To fully deactivate the automatic device placement, pass `device_placement=False` when initializing the `Accelerator`.
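    For illustration, a minimal sketch of staying device-agnostic while keeping manual placement (the tiny model and random batch are placeholders, not part of the original example):

    ```python
    import torch
    from accelerate import Accelerator

    # device_placement=False opts out of automatic placement entirely;
    # with the default (True), prepare() moves the objects for you.
    accelerator = Accelerator(device_placement=False)

    model = torch.nn.Linear(10, 2).to(accelerator.device)  # use the device Accelerate picked
    batch = torch.randn(8, 10).to(accelerator.device)      # instead of hard-coding "cuda" / "cpu"
    outputs = model(batch)
    ```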
  - Next, pass all of the objects relevant to training (optimizer, model, training dataloader, learning rate scheduler) to the `accelerator.prepare()` method. This makes sure everything is ready for training.

    ```python
    model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, lr_scheduler
    )
    ```

    In particular, the training dataloader will be sharded across all available GPU/TPU cores so that each device sees a different portion of the training dataset. Also, the random states of all processes will be synchronized through the dataloader at the beginning of each iteration, to make sure the data is shuffled in the same way (if you decided to use `shuffle=True` or any kind of random sampler).

    The actual batch size of your training will be the number of devices used multiplied by the batch size you set in your script. Alternatively, you can pass the `split_batches=True` argument when creating the `accelerator` object, in which case the actual batch size stays the same no matter how many GPUs you run your script on (see the short sketch after this list).

    You need to execute `accelerator.prepare()` before starting the actual training loop.

    You only need to pass the learning rate scheduler to `prepare()` if the scheduler should be stepped at each optimizer step.

    Any method that needs the training dataloader length (for instance, when you need to log the total number of training steps) should be called after `accelerator.prepare()`.

    You may or may not want to send your validation dataloader to `prepare()`, depending on whether you want to run distributed evaluation.

  - Finally, replace `loss.backward()` with `accelerator.backward(loss)`.

  Your script will now run on your local machine as well as on multiple GPUs or TPUs. You can either use your favorite tool to launch the distributed training, or use the Accelerate launcher.
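  A small sketch of the batch-size arithmetic, assuming a per-device batch size of 16 set in the script (the number is illustrative):

  ```python
  from accelerate import Accelerator

  accelerator = Accelerator()  # pass split_batches=True here to keep the batch size fixed instead
  per_device_batch_size = 16   # what your script / dataloader uses

  # With the default behaviour, every process draws its own batch,
  # so the effective batch size scales with the number of processes.
  effective_batch_size = per_device_batch_size * accelerator.num_processes
  accelerator.print(f"{accelerator.num_processes} processes -> effective batch size {effective_batch_size}")
  ```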
- Distributed evaluation:

  - You can perform regular evaluation, in which case you keep the validation dataloader outside of `accelerator.prepare()`. You then need to put the input data on `accelerator.device` manually.

  - You can also perform distributed evaluation, in which case you pass the validation dataloader to `accelerator.prepare()`:

    ```python
    validation_dataloader = accelerator.prepare(validation_dataloader)
    ```
    Just like the training dataloader, this means that during distributed evaluation each device will only see part of the evaluation data, so you need to group the predictions together. This can be done with the `accelerator.gather_for_metrics()` method:

    ```python
    for inputs, targets in validation_dataloader:
        predictions = model(inputs)
        # Gather all predictions and targets
        all_predictions, all_targets = accelerator.gather_for_metrics((predictions, targets))
        # Example of use with a Datasets.Metric
        metric.add_batch(predictions=all_predictions, references=all_targets)
    ```
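    If you are not using a `Datasets.Metric`, a rough sketch of computing a simple accuracy directly from the gathered tensors could look like this (reusing `model`, `validation_dataloader`, and `accelerator` from the example above; classification outputs are assumed):

    ```python
    import torch

    correct, total = 0, 0
    for inputs, targets in validation_dataloader:
        with torch.no_grad():
            logits = model(inputs)
        predictions = logits.argmax(dim=-1)
        all_predictions, all_targets = accelerator.gather_for_metrics((predictions, targets))
        correct += (all_predictions == all_targets).sum().item()
        total += all_targets.numel()

    accelerator.print(f"accuracy: {correct / total:.4f}")
    ```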
    Similar to the training dataloader, passing the validation dataloader to `prepare()` may change it: if you run on $n$ GPUs, its length will be divided by $n$ (since your actual batch size will be multiplied by $n$), unless you set `split_batches=True`.

    Any method that needs the validation dataloader length should be called after `accelerator.prepare()`.

    Some of the data at the end of the dataset may be duplicated so that the batch can be divided equally among all workers. For this reason, metrics should be computed through the `gather_for_metrics()` method, which automatically removes the duplicated data while gathering. If for some reason you do not want this to be done automatically, you can use `accelerator.gather()` to gather the data across all processes and then do it manually.

    `gather()` and `gather_for_metrics()` require the tensors to be of the same size on every process. If you have tensors of different sizes on each process (for instance when dynamically padding to the maximum length within a batch), you should use the `accelerator.pad_across_processes()` method to pad the tensors to the same size across processes, as sketched below.
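    A rough sketch of that padding step (the dummy tensor stands in for real, variable-length predictions):

    ```python
    import torch
    from accelerate import Accelerator

    accelerator = Accelerator()

    # Each process may end up with a different sequence length after dynamic padding.
    predictions = torch.randint(0, 10, (4, 3 + accelerator.process_index))

    # Pad to the longest length across processes before gathering.
    predictions = accelerator.pad_across_processes(predictions, dim=1, pad_index=0)
    all_predictions = accelerator.gather_for_metrics(predictions)
    ```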
- Launching a distributed script: you can use the regular commands to launch your distributed training (such as PyTorch's `torch.distributed.launch`); they are fully compatible with Accelerate. The only caveat is that Accelerate uses the environment to determine all useful information, so `torch.distributed.launch` should be used with the `--use_env` flag.

  Accelerate also provides a CLI tool that unifies all launchers, so you only have to remember one command:

  ```shell
  accelerate config
  ```

  You need to answer the questions asked, and Accelerate will then create a `default_config.yaml` file in your cache folder. This cache folder is (in decreasing order of priority):

  - The content of the environment variable `HF_HOME`, suffixed with `accelerate`.
  - If it does not exist, the content of the environment variable `XDG_CACHE_HOME`, suffixed with `huggingface/accelerate`.
  - If this does not exist either, `~/.cache/huggingface/accelerate`.

  You can also specify the location of the file you want to save with the `--config_file` flag.

  You can then test that everything in your setup works by running:

  ```shell
  accelerate test
  ```

  This will launch a short script that tests the distributed environment. You can also specify the location of the config file during the test:
  ```shell
  accelerate test --config_file path_to_config.yaml
  ```

  If the test passes, you can run your script with the following command:

  ```shell
  accelerate launch path_to_script.py --args_for_the_script
  ```

  You can also specify the location of the config file:
  ```shell
  accelerate launch --config_file path_to_config.yaml path_to_script.py --args_for_the_script
  ```

- Launching from a notebook: Accelerate 0.3.0 introduced `notebook_launcher()` to help you launch training from a notebook.

  Just define a function responsible for your whole training and/or evaluation in a cell of the notebook, then execute a cell with the following code:
  ```python
  from accelerate import notebook_launcher

  notebook_launcher(training_function)
  ```

  Note: your `Accelerator` object should only be defined inside `training_function`, because the initialization should only be done inside the launcher.
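  For example, a minimal skeleton of such a function (the tiny model, data, and hyperparameters are placeholders):

  ```python
  import torch
  from accelerate import Accelerator, notebook_launcher

  def training_function():
      # The Accelerator is created inside the launched function, not in a notebook cell.
      accelerator = Accelerator()

      model = torch.nn.Linear(10, 2)
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
      dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randn(64, 2))
      dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

      model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
      for inputs, targets in dataloader:
          optimizer.zero_grad()
          loss = torch.nn.functional.mse_loss(model(inputs), targets)
          accelerator.backward(loss)
          optimizer.step()

  notebook_launcher(training_function)  # e.g. notebook_launcher(training_function, num_processes=8) on a TPU
  ```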
- Training on TPU: if you want to launch your script on a TPU, there are a few caveats you should be aware of. Behind the scenes, the TPU will create a graph of all the operations happening in your training step (forward pass, backward pass, and optimizer step). That is why your first training step is always very long: building and compiling this graph takes some time.

  The good news is that this compilation is cached, so the second step and all subsequent steps will be much faster. The bad news is that this only applies if all of your steps do exactly the same operations, which implies:

  - all batches must have tensors of the same size;
  - the code must be static (i.e., if a single step contains a loop, the number of iterations must be the same in every step).

  If anything above changes between two steps, a new compilation is triggered, which will again take a lot of time. In practice, this means you must take special care to have all tensors in your inputs be of the same shape (so no dynamic padding), and you should not use layers with `for` loops whose number of iterations depends on the input (such as an LSTM). Otherwise training will be excruciatingly slow.

  You can execute some code specifically for the TPU:

  ```python
  from accelerate import DistributedType

  if accelerator.distributed_type == DistributedType.TPU:
      # do something of static shape
  else:
      # go crazy and be dynamic
  ```

  One last thing to note: if your model has tied weights (such as a language model that ties the weights of the embedding matrix with the weights of the decoder), moving this model to the TPU (either yourself or after passing it to `prepare()`) will break the tying. You will need to retie the weights afterwards.
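  A rough sketch of what retying can look like for a hand-written model (`TinyLM` is a made-up placeholder; a transformers model exposes `tie_weights()` for the same purpose):

  ```python
  import torch

  class TinyLM(torch.nn.Module):
      """Toy language model whose output projection shares its weight with the embedding."""
      def __init__(self, vocab_size=100, dim=16):
          super().__init__()
          self.embed = torch.nn.Embedding(vocab_size, dim)
          self.decoder = torch.nn.Linear(dim, vocab_size, bias=False)
          self.decoder.weight = self.embed.weight  # tie the weights

  model = TinyLM()
  # After the model has been moved to the TPU (by you or by prepare()),
  # the tie may be broken: re-tie by reassigning the shared parameter.
  model.decoder.weight = model.embed.weight
  ```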
- Statements executed on a single process: some statements only need to run on a given process rather than on all of them, such as downloading data, writing logs, or displaying a progress bar. In that case you can do:

  ```python
  if accelerator.is_local_main_process:
      # Is executed once per server

  from tqdm.auto import tqdm
  progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
  ```

  `local` means per machine: if you are running the training on two servers with several GPUs each, the code will be executed once on each of those servers.

  If you want something executed only once across all processes (e.g., uploading the model to the model hub), you can do:

  ```python
  if accelerator.is_main_process:
      # Is executed once only
  ```

  For `print` statements that you only want executed once per machine, you can replace the `print` function with `accelerator.print`.
- Deferring execution: when you run your usual script, instructions are executed in order. Using Accelerate to deploy your script on several GPUs at the same time introduces a complication: while each process executes all instructions in order, some may be faster than others.

  You might need to wait for all processes to have reached a certain point before executing a given instruction. For instance, you should not save a model before being sure every process is done with training. To do this, you can execute:

  ```python
  accelerator.wait_for_everyone()
  ```

  This instruction will block all the processes that arrive first until all the other processes have reached that point (if you run your script on just one GPU or CPU, this does nothing).

- Saving/loading the model: saving the trained model may need a few adjustments:

  - First, you should wait for all processes to reach the point in the script described under "Deferring execution" above.

  - Then, you should unwrap your model before saving it, because going through the `prepare()` method may have wrapped your model for distributed training. For example:
    ```python
    accelerator.wait_for_everyone()
    unwrapped_model = accelerator.unwrap_model(model)
    accelerator.save(unwrapped_model.state_dict(), filename)
    ```

    If your script contains logic to load a checkpoint, we also recommend loading your weights into the unwrapped model (this only matters if you use the loading function after `prepare()`). For example:

    ```python
    unwrapped_model = accelerator.unwrap_model(model)
    unwrapped_model.load_state_dict(torch.load(filename))
    ```
class="cm-variable">torch</span>.<span class="cm-property">load</span>(<span class="cm-variable">filename</span>))</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre></li></ul></li><li><p><span>保存/加载整个状态:当训练你的模型时,你可能想保存模型、优化器、随机数生成器、以及潜在的 </span><code>LR scheduler</code><span> 的当前状态,以便在同一个脚本中恢复训练。你可以分别使用 </span><code>save_state()</code><span> 和 </span><code>load_state()</code><span> 来做到这一点,只需简单地传入一个保存位置。</span></p><p><span>如果你通过 </span><code>register_for_checkpointing()</code><span> 注册了任何其他需要存储的 </span><code>stateful item</code><span> ,它们也会被保存和/或加载。</span></p><p><span>示例:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">accelerate</span> <span class="cm-keyword">import</span> <span class="cm-variable">Accelerator</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">torch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span> <span class="cm-operator">=</span> <span class="cm-variable">Accelerator</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">my_scheduler</span> <span class="cm-operator">=</span> <span class="cm-variable">torch</span>.<span class="cm-property">optim</span>.<span 
class="cm-property">lr_scheduler</span>.<span class="cm-property">StepLR</span>(<span class="cm-variable">my_optimizer</span>, <span class="cm-variable">step_size</span><span class="cm-operator">=</span><span class="cm-number">1</span>, <span class="cm-variable">gamma</span><span class="cm-operator">=</span><span class="cm-number">0.99</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">my_model</span>, <span class="cm-variable">my_optimizer</span>, <span class="cm-variable">my_training_dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerate</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">my_model</span>, <span class="cm-variable">my_optimizer</span>, <span class="cm-variable">my_training_dataloader</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Register the LR scheduler</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerate</span>.<span class="cm-property">register_for_checkpointing</span>(<span class="cm-variable">my_scheduler</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Save the starting state</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerate</span>.<span class="cm-property">save_state</span>(<span class="cm-string">"my/save/path"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">device</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">device</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">my_model</span>.<span class="cm-property">to</span>(<span class="cm-variable">device</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Perform training</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">for</span> <span class="cm-variable">epoch</span> <span class="cm-keyword">in</span> <span class="cm-builtin">range</span>(<span class="cm-variable">num_epochs</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">for</span> <span 
class="cm-variable">batch</span> <span class="cm-keyword">in</span> <span class="cm-variable">my_training_dataloader</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">my_optimizer</span>.<span class="cm-property">zero_grad</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">inputs</span>, <span class="cm-variable">targets</span> <span class="cm-operator">=</span> <span class="cm-variable">batch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">inputs</span> <span class="cm-operator">=</span> <span class="cm-variable">inputs</span>.<span class="cm-property">to</span>(<span class="cm-variable">device</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">targets</span> <span class="cm-operator">=</span> <span class="cm-variable">targets</span>.<span class="cm-property">to</span>(<span class="cm-variable">device</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">my_model</span>(<span class="cm-variable">inputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">my_loss_function</span>(<span class="cm-variable">outputs</span>, <span class="cm-variable">targets</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span class="cm-variable">loss</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">my_optimizer</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">my_scheduler</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Restore previous state</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerate</span>.<span class="cm-property">load_state</span>(<span class="cm-string">"my/save/path"</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 726px;"/><div class="CodeMirror-gutters" style="display: none; height: 726px;"/></div></div></pre></li><li><p><span>梯度裁剪:如果你在脚本中使用梯度剪裁,你应该把对 </span><code>torch.nn.utils.clip_grad_norm_</code><span> 或 </span><code>torch.nn.utils.clip_grad_value_</code><span> 的调用分别替换为 
- Mixed precision training: if you train in mixed precision with `Accelerate`, the computations inside your model are done in mixed precision, while every computation outside the model is executed in `full precision`. For example, the `loss` computation is usually outside the model and typically involves a `softmax`. However, you may want to put your `loss` computation inside the `accelerator.autocast()` context manager:

  ```python
  with accelerator.autocast():
      loss = complex_loss_function(outputs, target)
  ```

  Another caveat of mixed precision training is that the gradients will skip a few updates at the beginning and sometimes during training. Because of the `dynamic loss scaling` strategy, there are points during training where the gradients have overflowed, and the `loss scaling factor` is reduced to avoid this happening again at the next step.

  This means you may update your `learning rate scheduler` even when there was no gradient update. That is fine in general, but it may have an effect when you have very little training data, or if the first learning rate values of your `scheduler` are very important. In that case, you can skip the `learning rate scheduler` update:
  ```python
  if not accelerator.optimizer_step_was_skipped:
      lr_scheduler.step()
  ```

- Gradient accumulation: to perform gradient accumulation, use `accumulate()` and specify `gradient_accumulation_steps`. When training on multiple devices this also automatically ensures the gradients are synced or unsynced as appropriate, checks whether the `step` should actually be performed, and scales the loss for you:
class="cm-variable">gradient_accumulation_steps</span><span class="cm-operator">=</span><span class="cm-number">2</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span>, <span class="cm-variable">optimizer</span>, <span class="cm-variable">training_dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">model</span>, <span class="cm-variable">optimizer</span>, <span class="cm-variable">training_dataloader</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">for</span> <span class="cm-builtin">input</span>, <span class="cm-variable">label</span> <span class="cm-keyword">in</span> <span class="cm-variable">training_dataloader</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">accumulate</span>(<span class="cm-variable">model</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">predictions</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-builtin">input</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_function</span>(<span class="cm-variable">predictions</span>, <span class="cm-variable">label</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span class="cm-variable">loss</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">scheduler</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">zero_grad</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 242px;"/><div class="CodeMirror-gutters" style="display: none; height: 242px;"/></div></div></pre><p><span>相比之下,传统的梯度累加方法会用更冗长的代码:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="diff" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="diff"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" 
tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ from accelerate import Accelerator</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ accelerator = Accelerator()</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ model, optimizer, training_dataloader, scheduler = accelerator.prepare(</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ model, optimizer, training_dataloader, scheduler</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ )</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> for index, batch in enumerate(training_dataloader):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> inputs, targets = batch</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-negative">- inputs = inputs.to(device)</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-negative">- targets = targets.to(device)</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> outputs = model(inputs)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> loss = loss_function(outputs, targets)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> loss = loss / gradient_accumulation_steps</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-positive">+ accelerator.backward(loss)</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> if (index+1) % gradient_accumulation_steps == 0:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> optimizer.step()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> scheduler.step()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> optimizer.zero_grad()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 418px;"/><div class="CodeMirror-gutters" style="display: none; height: 418px;"/></div></div></pre></li><li><p><code>DeepSpeed</code><span>:</span><code>DeepSpeed</code><span> 支持是实验性的,所以底层 </span><code>API</code><span> 将在不久的将来发展,可能会有一些轻微的破坏性变化。具体而言, </span><code>Accelerate</code><span> 还不支持你自己编写的 </span><code>DeepSpeed</code><span> 配置,这将在下一个版本中添加。</span></p></li><li><p><span>使用 </span><code>accelerate launch</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate launch {script_name.py} <span class="cm-attribute">--arg1</span> <span class="cm-attribute">--arg2</span> ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>指定单个 </span><code>GPU</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" 
tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-def">CUDA_VISIBLE_DEVICES</span><span class="cm-operator">=</span><span class="cm-string">"0"</span> accelerate launch {script_name.py} <span class="cm-attribute">--arg1</span> <span class="cm-attribute">--arg2</span> ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>在两个 </span><code>GPU</code><span> 上混合精度训练:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate launch <span class="cm-attribute">--multi_gpu</span> <span class="cm-attribute">--mixed_precision</span><span class="cm-operator">=</span>fp16 <span class="cm-attribute">--num_processes</span><span class="cm-operator">=</span><span class="cm-number">2</span> {script_name.py} {--arg1} {--arg2} ...</span></pre></div></div></div></div></div></div><div 
style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>建议总是在 </span><code>accelerate launch</code><span> 之前执行 </span><code>accelerate config</code><span> ,这样就无需再 </span><code>accelerate launch</code><span> 中指定各种配置。</span></p></li><li><p><span>在 </span><code>notebook</code><span> 中 </span><code>launch</code><span>:</span></p><ul><li><span>确保任何使用 </span><code>CUDA</code><span> 的代码在一个函数中,该函数被传递给 </span><code>notebook_launcher()</code><span> 。</span></li><li><span>设置 </span><code>num_processes</code><span> 为训练的设备数量(如,</span><code>GPU, CPU, TPU</code><span> 数量)。</span></li><li><span>如果使用 </span><code>TPU</code><span> ,在 </span><code>training loop</code><span> 函数之外声明你的模型。</span></li></ul><p><span>如:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">accelerate</span> <span class="cm-keyword">import</span> <span class="cm-variable">notebook_launcher</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">args</span> <span class="cm-operator">=</span> (<span class="cm-string">"fp16"</span>, <span class="cm-number">42</span>, <span class="cm-number">64</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">notebook_launcher</span>(<span class="cm-variable">training_loop</span>, <span class="cm-variable">args</span>, <span class="cm-variable">num_processes</span><span class="cm-operator">=</span><span class="cm-number">2</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>对于 </span><code>TPU</code><span>:</span></p><pre 
class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">create_model</span>(<span class="cm-string">"resnet50d"</span>, <span class="cm-variable">pretrained</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>, <span class="cm-variable">num_classes</span><span class="cm-operator">=</span><span class="cm-builtin">len</span>(<span class="cm-variable">label_to_id</span>))</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">args</span> <span class="cm-operator">=</span> (<span class="cm-variable">model</span>, <span class="cm-string">"fp16"</span>, <span class="cm-number">42</span>, <span class="cm-number">64</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">notebook_launcher</span>(<span class="cm-variable">training_loop</span>, <span class="cm-variable">args</span>, <span class="cm-variable">num_processes</span><span class="cm-operator">=</span><span class="cm-number">8</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre></li><li><p><span>启用 </span><code>FSDP</code><span>:</span></p><ul><li><p><span>首先进行配置:</span><code>accelerate config</code><span>。</span><code>FSDP</code><span> 配置的一个例子:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="json" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="json"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" 
autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">compute_environment</span>: <span class="cm-variable">LOCAL_MACHINE</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">deepspeed_config</span>: {}</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">distributed_type</span>: <span class="cm-variable">FSDP</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">downcast_bf16</span>: <span class="cm-string">'no'</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">fsdp_config</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_auto_wrap_policy</span>: <span class="cm-variable">TRANSFORMER_BASED_WRAP</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_backward_prefetch_policy</span>: <span class="cm-variable">BACKWARD_PRE</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_offload_params</span>: <span class="cm-atom">false</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_sharding_strategy</span>: <span class="cm-number">1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_state_dict_type</span>: <span class="cm-variable">FULL_STATE_DICT</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_transformer_layer_cls_to_wrap</span>: <span class="cm-variable">GPT2Block</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">machine_rank</span>: <span 
class="cm-number">0</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">main_process_ip</span>: <span class="cm-atom">null</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">main_process_port</span>: <span class="cm-atom">null</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">main_training_function</span>: <span class="cm-variable">main</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">mixed_precision</span>: <span class="cm-string">'no'</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">num_machines</span>: <span class="cm-number">1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">num_processes</span>: <span class="cm-number">2</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">use_cpu</span>: <span class="cm-atom">false</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 418px;"/><div class="CodeMirror-gutters" style="display: none; height: 418px;"/></div></div></pre><p><span>然后开始训练:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate launch examples/nlp_example.py</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 
22px;"/></div></div></pre></li><li><p><span>这些配置参数的含义为:</span></p><ul><li><p><code>Sharding Strategy</code><span>:</span></p><ul><li><code>FULL_SHARD</code><span>:对 </span><code>optimizer states, gradients, parameters</code><span> 都进行分片。</span></li><li><code>SHARD_GRAD_OP</code><span>:仅对 </span><code>optimizer states, gradients</code><span> 进行分片。</span></li><li><code>NO_SHARD</code><span>:不进行分片。</span></li></ul></li><li><p><code>Offload Params</code><span>:一个布尔值,指定是否将 </span><code>parameters</code><span> 和 </span><code>gradients</code><span> 卸载到 </span><code>CPU</code><span> 。</span></p></li><li><p><code>Auto Wrap Policy</code><span>:可以为 </span><code>TRANSFORMER_BASED_WRAP, SIZE_BASED_WRAP, NO_WRAP</code><span> 。</span></p></li><li><p><code>Transformer Layer Class to Wrap</code><span>:当使用 </span><code>TRANSFORMER_BASED_WRAP</code><span> 时,指定特定的 </span><code>transformer layer class name</code><span> (大小写敏感)从而执行 </span><code>wrap</code><span> 。如 </span><code>BertLayer, GPTJBlock, T5Block,...</code><span> 。</span></p></li><li><p><code>Min Num Params</code><span>:使用 </span><code>SIZE_BASED_WRAP</code><span> 的最小参数数量。</span></p></li><li><p><code>Backward Prefetch</code><span>:可以为 </span><code>BACKWARD_PRE, BACKWARD_POST, NO_PREFETCH</code><span>。</span></p></li><li><p><code>State Dict Type</code><span>:可以为 </span><code>FULL_STATE_DICT, LOCAL_STATE_DICT, SHARDED_STATE_DICT</code><span> 。</span></p></li></ul></li><li><p><span>有几个需要注意的地方:</span></p><ul><li><p><code>PyTorch FSDP</code><span> 会自动 </span><code>wrap</code><span> 子模块,对参数进行扁平化处理,并将参数分片。由于这个原因,任何在 </span><code>model wrapping</code><span> 之前创建的 </span><code>optimizer</code><span> 都会被破坏,并占用更多的内存。因此,强烈建议在创建 </span><code>optimizer</code><span> 之前准备好模型,这也是很有效的。</span><code>Accelerate</code><span> 将自动 </span><code>wrap</code><span> 模型,并在单个模型的情况下为你创建一个优化器,并发出警告信息:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang=""><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang=""><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">FSDP Warning: When using FSDP, it is efficient and recommended to call prepare for the model before creating the optimizer.</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; 
      Below is the recommended way to prepare the model and optimizer while using `FSDP`:

      ```diff
        model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", return_dict=True)
      + model = accelerator.prepare(model)

        optimizer = torch.optim.AdamW(params=model.parameters(), lr=lr)

      - model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
      -     model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
      - )

      + optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
      +     optimizer, train_dataloader, eval_dataloader, lr_scheduler
      + )
      ```
role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ optimizer, train_dataloader, eval_dataloader, lr_scheduler</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ )</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 264px;"/><div class="CodeMirror-gutters" style="display: none; height: 264px;"/></div></div></pre></li><li><p><span>在单个模型的情况下,如果你用多个 </span><code>parameter groups</code><span>创建了优化器,并且用它们一起调用 </span><code>prepare</code><span> ,那么 </span><code>parameter groups</code><span> 将被丢失,并显示以下警告:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang=""><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang=""><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">FSDP Warning: When using FSDP, several parameter groups will be conflated into a single one due to nested module wrapping and parameter flattening.</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>这是因为,由于嵌套的 </span><code>FSDP</code><span> 模块的参数扁平化为一维数组,在 </span><code>wrapping</code><span> 前创建的 </span><code>parameter groups</code><span> 在 </span><code>wrapping</code><span> 后将没有意义。</span></p></li><li><p><span>在有多个模型的情况下,有必要在创建优化器之前准备好模型,否则会抛出一个错误。然后将优化器以与相应模型相同的顺序传递给 </span><code>prepare()</code><span> 方法,否则 </span><code>accelerator.save_state()</code><span> 和 </span><code>accelerator.load_state()</code><span> 将导致错误/意外的行为。</span></p></li><li><p><span>这个功能与 </span><code>Transformers library</code><span> 的 </span><code>run_translation.py</code><span> 脚本中的 </span><code>--predict_with_generate</code><span> 不兼容。</span></p></li></ul></li><li><p><span>对于更多的控制,用户可以利用 </span><code>FullyShardedDataParallelPlugin</code><span> 。在创建这个类的实例后,用户可以把它传递给 </span><code>Accelerator</code><span> 类的实例。</span></p></li></ul></li><li><p><span>启用 
- Enabling `DeepSpeed`: `DeepSpeed` implements everything described in the `ZeRO` paper. Currently it provides full support for: `Optimizer state partitioning` (`ZeRO stage 1`), `Gradient partitioning` (`ZeRO stage 2`), `Parameter partitioning` (`ZeRO stage 3`), `Custom mixed precision training handling`, a range of fast `CUDA`-extension-based optimizers, and `ZeRO-Offload` to CPU and Disk/NVMe.

  `DeepSpeed ZeRO-2` is primarily used only for training, as its features are of no use for inference. `DeepSpeed ZeRO-3` can also be used for inference, since it allows huge models to be loaded on multiple GPUs.

  - `Accelerate` integrates `DeepSpeed` in two ways:

    - Integration via a `deepspeed` config file. This supports all of `DeepSpeed`'s core features and gives the user a lot of flexibility; the user may have to change a few lines of code depending on the config.
    - Integration via `deepspeed_plugin`. This supports a subset of `DeepSpeed` features, using default options for the rest, and the user does not need to change any code (a minimal sketch follows below).
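    A rough sketch of the `deepspeed_plugin` route (it assumes `deepspeed` is installed; the stage and step counts below are arbitrary example values):

    ```python
    # Sketch: enabling DeepSpeed from code via DeepSpeedPlugin instead of a config file.
    from accelerate import Accelerator, DeepSpeedPlugin

    deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=2)
    accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)

    # model, optimizer, dataloader and scheduler are then prepared exactly as before:
    # model, optimizer, dataloader, scheduler = accelerator.prepare(model, optimizer, dataloader, scheduler)
    ```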
class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">compute_environment</span>: <span class="cm-variable">LOCAL_MACHINE</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">deepspeed_config</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_accumulation_steps</span>: <span class="cm-number">1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_clipping</span>: <span class="cm-number">1.0</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">offload_optimizer_device</span>: <span class="cm-variable">none</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">offload_param_device</span>: <span class="cm-variable">none</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">zero3_init_flag</span>: <span class="cm-atom">true</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">zero_stage</span>: <span class="cm-number">2</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">distributed_type</span>: <span class="cm-variable">DEEPSPEED</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">fsdp_config</span>: {}</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">machine_rank</span>: <span class="cm-number">0</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">main_process_ip</span>: <span class="cm-atom">null</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">main_process_port</span>: <span class="cm-atom">null</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span 
class="cm-variable">main_training_function</span>: <span class="cm-variable">main</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">mixed_precision</span>: <span class="cm-variable">fp16</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">num_machines</span>: <span class="cm-number">1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">num_processes</span>: <span class="cm-number">2</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">use_cpu</span>: <span class="cm-atom">false</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 396px;"/><div class="CodeMirror-gutters" style="display: none; height: 396px;"/></div></div></pre></li><li><p><span>最后执行训练:</span><code>accelerate launch examples/nlp_example.py</code><span> 。</span></p></li></ul></li><li><p><span>配置参数的含义:</span></p><ul><li><code>zero_stage</code><span>:</span><code>0</code><span> 表示禁用,</span><code>1</code><span> 表示 </span><code>optimizer state partitioning</code><span>,</span><code>2</code><span> 表示 </span><code>optimizer+gradient state partitioning</code><span> ,</span><code>3</code><span> 表示 </span><code>optimizer+gradient+parameter partitioning</code><span> 。</span></li><li><code>gradient_accumulation_steps</code><span>:一个整数,表示在 </span><code>averaging</code><span> 和 </span><code>applying</code><span> 这些梯度之前,积累梯度的 </span><code>training steps</code><span> 数量。</span></li><li><code>gradient_clipping</code><span>:一个浮点数,指定启用梯度剪裁的值。</span></li><li><code>offload_optimizer_device</code><span>:</span><code>none</code><span> 表示禁用 </span><code>optimizer offloading</code><span>,</span><code>cpu</code><span> 表示 </span><code>offload optimizer</code><span> 到 </span><code>CPU</code><span>,</span><code>nvme</code><span> 表示</span><code>offload optimizer</code><span> 到 </span><code>NVMe SSD</code><span> 。仅适用于 </span><code>ZeRO >= Stage-2</code><span> 。</span></li><li><code>offload_param_device</code><span>:</span><code>none</code><span> 表示禁用 </span><code>parameter offloading</code><span>,</span><code>cpu</code><span> 表示 </span><code>offload parameter</code><span> 到 </span><code>CPU</code><span>,</span><code>nvme</code><span> 表示</span><code>offload parameter</code><span> 到 </span><code>NVMe SSD</code><span> 。仅适用于 </span><code>ZeRO Stage-3</code><span> 。</span></li><li><code>zero3_init_flag</code><span>:决定是否启用 </span><code>deepspeed.zero.Init</code><span> 来构建大规模模型。只适用于 </span><code>ZeRO Stage-3</code><span> 。</span></li><li><code>zero3_save_16bit_model</code><span>:决定是否在使用 </span><code>ZeRO Stage-3</code><span> 时保存 </span><code>16</code><span> 位模型权重。</span></li><li><code>mixed_precision</code><span>:</span><code>no</code><span> 用于 </span><code>FP32</code><span> 训练,</span><code>fp16</code><span> 用于 </span><code>FP16</code><span> 混合精度训练,</span><code>bf16</code><span>用于 </span><code>BF16</code><span> 混合精度训练。</span></li></ul></li><li><p><span>当使用配置文件时,需要修改一些代码:</span></p><ul><li><p><code>DeepSpeed Optimizers and Schedulers</code><span>:</span></p><ul><li><p><span>如果是 </span><code>DeepSpeed Optim + DeepSpeed 
      - `DeepSpeed Optim + DeepSpeed Scheduler`: the user must replace the `PyTorch/Custom` optimizer and scheduler in their code with `accelerate.utils.DummyOptim` and `accelerate.utils.DummyScheduler`; the resulting objects are then passed to `prepare()` as usual (see the sketch after this list):

        ```python
        import torch
        from accelerate.utils import DummyOptim, DummyScheduler
        from transformers import get_scheduler  # scheduler factory used by the Transformers examples

        # Creates Dummy Optimizer if `optimizer` was specified in the config file, else creates Adam Optimizer
        optimizer_cls = (
            torch.optim.AdamW
            if accelerator.state.deepspeed_plugin is None
            or "optimizer" not in accelerator.state.deepspeed_plugin.deepspeed_config
            else DummyOptim
        )
<span class="cm-variable cm-error">optimizer</span> <span class="cm-operator">=</span> <span class="cm-variable">optimizer_cls</span>(<span class="cm-variable">optimizer_grouped_parameters</span>, <span class="cm-variable">lr</span><span class="cm-operator">=</span><span class="cm-variable">args</span>.<span class="cm-property">learning_rate</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># Creates Dummy Scheduler if `scheduler` was spcified in the config file else creates `args.lr_scheduler_type` Scheduler</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword cm-error">if</span> (</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">state</span>.<span class="cm-property">deepspeed_plugin</span> <span class="cm-keyword">is</span> <span class="cm-keyword">None</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">or</span> <span class="cm-string">"scheduler"</span> <span class="cm-keyword">not</span> <span class="cm-keyword">in</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">state</span>.<span class="cm-property">deepspeed_plugin</span>.<span class="cm-property">deepspeed_config</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> ):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr_scheduler</span> <span class="cm-operator">=</span> <span class="cm-variable">get_scheduler</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">name</span><span class="cm-operator">=</span><span class="cm-variable">args</span>.<span class="cm-property">lr_scheduler_type</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span><span class="cm-operator">=</span><span class="cm-variable">optimizer</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_warmup_steps</span><span class="cm-operator">=</span><span class="cm-variable">args</span>.<span class="cm-property">num_warmup_steps</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">num_training_steps</span><span class="cm-operator">=</span><span class="cm-variable">args</span>.<span class="cm-property">max_train_steps</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> )</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword cm-error">else</span>:</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">lr_scheduler</span> <span class="cm-operator">=</span> <span class="cm-variable">DummyScheduler</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>, <span class="cm-variable">total_num_steps</span><span class="cm-operator">=</span><span class="cm-variable">args</span>.<span class="cm-property">max_train_steps</span>, <span class="cm-variable">warmup_num_steps</span><span class="cm-operator">=</span><span class="cm-variable">args</span>.<span class="cm-property">num_warmup_steps</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> )</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 572px;"/><div class="CodeMirror-gutters" style="display: none; height: 572px;"/></div></div></pre></li><li><p><code>Custom Optim + Custom Scheduler</code><span>:当 </span><code>DeepSpeed</code><span> 配置文件中没有 </span><code>optimizer key</code><span> 和 </span><code>scheduler key</code><span> 的情况。在这种情况下,不需要用户修改代码,通过 </span><code>DeepSpeed Plugin</code><span> 使用集成时就是这种情况。</span></p></li><li><p><code>Custom Optim + DeepSpeed Scheduler</code><span> :这种情况下,用户必须使用</span><code>accelerate.utils.DummyScheduler</code><span> 来替换代码中的 </span><code>PyTorch/Custom scheduler</code><span> 。</span></p></li><li><p><code>DeepSpeed Optim + Custom Scheduler</code><span>:这将导致一个错误,因为当使用 </span><code>DeepSpeed Optim</code><span> 时必须使用 </span><code>DeepSpeed Scheduler</code><span> 。</span></p></li></ul></li><li><p><code>DeepSpeed</code><span> 配置文件中存在一些 </span><code>"auto"</code><span> 值,这些值是由 </span><code>prepare</code><span> 方法根据所提供的模型、</span><code>dataloaders</code><span> 、</span><code>dummy optimizer</code><span> 和 </span><code>dummy schedulers</code><span> 自动处理的。那些不是 </span><code>"auto"</code><span> 的字段必须由用户明确指定。如 </span><code>zero_stage2_config.json</code><span> 文件:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="json" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="json"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" 
style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">{</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"fp16"</span>: {</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"enabled"</span>: <span class="cm-atom">true</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"loss_scale"</span>: <span class="cm-number">0</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"loss_scale_window"</span>: <span class="cm-number">1000</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"initial_scale_power"</span>: <span class="cm-number">16</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"hysteresis"</span>: <span class="cm-number">2</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"min_loss_scale"</span>: <span class="cm-number">1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> },</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"optimizer"</span>: {</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"type"</span>: <span class="cm-string">"AdamW"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"params"</span>: {</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"lr"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"weight_decay"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"torch_adam"</span>: <span class="cm-atom">true</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"adam_w_mode"</span>: <span class="cm-atom">true</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> }</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> },</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string 
cm-property">"scheduler"</span>: {</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"type"</span>: <span class="cm-string">"WarmupDecayLR"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"params"</span>: {</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"warmup_min_lr"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"warmup_max_lr"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"warmup_num_steps"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"total_num_steps"</span>: <span class="cm-string">"auto"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> }</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> },</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"zero_optimization"</span>: {</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"stage"</span>: <span class="cm-number">2</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"allgather_partitions"</span>: <span class="cm-atom">true</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"allgather_bucket_size"</span>: <span class="cm-number">2e8</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"overlap_comm"</span>: <span class="cm-atom">true</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"reduce_scatter"</span>: <span class="cm-atom">true</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"reduce_bucket_size"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"contiguous_gradients"</span>: <span class="cm-atom">true</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> },</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string 
cm-property">"gradient_accumulation_steps"</span>: <span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"gradient_clipping"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"steps_per_print"</span>: <span class="cm-number">2000</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"train_batch_size"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"train_micro_batch_size_per_gpu"</span>: <span class="cm-string">"auto"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"wall_clock_breakdown"</span>: <span class="cm-atom">false</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">}</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 946px;"/><div class="CodeMirror-gutters" style="display: none; height: 946px;"/></div></div></pre></li></ul></li><li><p><span>保存和加载:</span></p><ul><li><p><span>对于 </span><code>ZeRO Stage-1</code><span> 和 </span><code>ZeRO Stage-2</code><span>,模型的保存和加载不需要改动。</span></p></li><li><p><span>对于 </span><code>ZeRO Stage-3</code><span> ,</span><code>state_dict</code><span> 仅只包含占位符,因为模型的权重被分片到多个 </span><code>GPU</code><span> 。</span><code>ZeRO Stage-3</code><span> 有两个选项:</span></p><ul><li><p><span>保存整个 </span><code>16</code><span> 位的模型权重,然后使用 </span><code>model.load_state_dict(torch.load(pytorch_model.bin))</code><span> 来直接加载。为此,要么在 </span><code>DeepSpeed</code><span> 配置文件中把 </span><code>zero_optimization.stage3_gather_16bit_weights_on_model_save</code><span> 设为</span><code>True</code><span> ,要么在 </span><code>DeepSpeed Plugin</code><span> 中把 </span><code>zero3_save_16bit_model</code><span> 设为 </span><code>True</code><span> 。</span></p><p><span>请注意,这个选项需要在一个 </span><code>GPU</code><span> 上整合权重,这可能会很慢,而且对内存要求很高,所以只有在需要时才使用这个功能。</span></p><p><span>示例:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">unwrapped_model</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">unwrap_model</span>(<span class="cm-variable">model</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># New Code #</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Saves the whole/unpartitioned fp16 model when in ZeRO Stage-3 to the output directory if</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># `stage3_gather_16bit_weights_on_model_save` is True in DeepSpeed Config file or</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># `zero3_save_16bit_model` is True in DeepSpeed Plugin.</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># For Zero Stages 1 and 2, models are saved as usual in the output directory.</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># The model name saved is `pytorch_model.bin`</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">unwrapped_model</span>.<span class="cm-property">save_pretrained</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">args</span>.<span class="cm-property">output_dir</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">is_main_process</span><span class="cm-operator">=</span><span class="cm-variable">accelerator</span>.<span class="cm-property">is_main_process</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">save_function</span><span class="cm-operator">=</span><span class="cm-variable">accelerator</span>.<span class="cm-property">save</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">state_dict</span><span class="cm-operator">=</span><span class="cm-variable">accelerator</span>.<span class="cm-property">get_state_dict</span>(<span class="cm-variable">model</span>),</span></pre><pre 
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 308px;"/><div class="CodeMirror-gutters" style="display: none; height: 308px;"/></div></div></pre></li><li><p><span>为了获得 </span><code>32</code><span> 位的权重,首先使用 </span><code>model.save_checkpoint()</code><span> 保存模型:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">success</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>.<span class="cm-property">save_checkpoint</span>(<span class="cm-variable">PATH</span>, <span class="cm-variable">ckpt_id</span>, <span class="cm-variable">checkpoint_state_dict</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">status_msg</span> <span class="cm-operator">=</span> <span class="cm-string">"checkpointing: PATH={}, ckpt_id={}"</span>.<span class="cm-property">format</span>(<span class="cm-variable">PATH</span>, <span class="cm-variable">ckpt_id</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">if</span> <span class="cm-variable">success</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging</span>.<span class="cm-property">info</span>(<span class="cm-string">f"Success </span>{<span class="cm-variable">status_msg</span>}<span class="cm-string">"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">else</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging</span>.<span 
class="cm-property">warning</span>(<span class="cm-string">f"Failure </span>{<span class="cm-variable">status_msg</span>}<span class="cm-string">"</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 132px;"/><div class="CodeMirror-gutters" style="display: none; height: 132px;"/></div></div></pre><p><span>这将在 </span><code>checkpoint</code><span> 目录下创建 </span><code>ZeRO model</code><span> 和 </span><code>optimizer</code><span> 的 </span><code>partitions</code><span> 以及 </span><code>zero_to_fp32.py</code><span> 脚本。你可以使用这个脚本来做离线整合,这不需要配置文件或 </span><code>GPU</code><span> 。如:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-builtin">cd</span> /path/to/checkpoint_dir</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">./zero_to_fp32.py . 
        # Processing zero checkpoint at global_step1
        # Detected checkpoint of type zero stage 3, world_size: 2
        # Saving fp32 state dict to pytorch_model.bin (total_numel=60506624)
        ```

        To load the 32-bit model, do the following:

        ```python
        from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

        unwrapped_model = accelerator.unwrap_model(model)
        fp32_model = load_state_dict_from_zero_checkpoint(unwrapped_model, checkpoint_dir)
class="cm-variable">checkpoint_dir</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>如果你仅仅想得到 </span><code>state_dict</code><span>,做法如下:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">deepspeed</span>.<span class="cm-property">utils</span>.<span class="cm-property">zero_to_fp32</span> <span class="cm-keyword">import</span> <span class="cm-variable">get_fp32_state_dict_from_zero_checkpoint</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">state_dict</span> <span class="cm-operator">=</span> <span class="cm-variable">get_fp32_state_dict_from_zero_checkpoint</span>(<span class="cm-variable">checkpoint_dir</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>注意,加载时需要大约 </span><code>2</code><span> 倍于 </span><code>final checkpoint</code><span> 大小的内存。</span></p></li></ul></li></ul></li><li><p><code>ZeRO Inference</code><span>:</span><code>DeepSpeed ZeRO Inference</code><span> 支持 </span><code>ZeRO stage 3</code><span> 。通过 </span><code>accelerate</code><span> 的集成,你只需要 </span><code>prepare</code><span> 模型和 </span><code>dataloader</code><span> ,如下:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 
9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span>, <span class="cm-variable">eval_dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">model</span>, <span class="cm-variable">eval_dataloader</span>)</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre></li><li><p><span>注意事项:</span></p><ul><li><span>目前的集成不支持 </span><code>DeepSpeed</code><span> 的 </span><code>Pipeline Parallelism</code><span> 。</span></li><li><span>当前的集成不支持 </span><code>mpu</code><span> ,限制了 </span><code>Megatron-LM</code><span> 中支持的张量并行。</span></li><li><span>目前的集成不支持多个模型。</span></li></ul></li></ul></li><li><p><span>目前 </span><code>Accelerate</code><span> 支持如下的 </span><code>tracker</code><span>:</span><code>TensorBoard, WandB, CometML, MLFlow</code><span> ,如:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background 
  ```python
  from accelerate import Accelerator
  from accelerate.utils import LoggerType

  accelerator = Accelerator(log_with="all")  # for all available trackers in the environment
  accelerator = Accelerator(log_with="wandb")
  accelerator = Accelerator(log_with=["wandb", LoggerType.TENSORBOARD])
  ```

  Then the `tracker` needs to be initialized:
class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">hps</span> <span class="cm-operator">=</span> {<span class="cm-string">"num_iterations"</span>: <span class="cm-number">5</span>, <span class="cm-string">"learning_rate"</span>: <span class="cm-number">1e-2</span>}</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span>.<span class="cm-property">init_trackers</span>(<span class="cm-string">"my_project"</span>, <span class="cm-variable">config</span><span class="cm-operator">=</span><span class="cm-variable">hps</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>然后记录日志:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span>.<span class="cm-property">log</span>({<span class="cm-string">"train_loss"</span>: <span class="cm-number">1.12</span>, <span class="cm-string">"valid_loss"</span>: <span class="cm-number">0.8</span>}, <span class="cm-variable">step</span><span class="cm-operator">=</span><span class="cm-number">1</span>)</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>最后在训练结束时调用:</span><code>accelerator.end_training()</code><span> 。</span></p><p><span>你也可以通过 
  You can also obtain the underlying `tracker` object via `accelerator.get_tracker`:

  ```python
  wandb_tracker = accelerator.get_tracker("wandb")
  if accelerator.is_main_process:
      wandb_tracker.log_artifact(some_artifact_to_log)
  ```

- Handling big models: the usual way of loading a pretrained model is:
none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">import</span> <span class="cm-variable">torch</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">my_model</span> <span class="cm-operator">=</span> <span class="cm-variable">ModelClass</span>(<span class="cm-operator">...</span>) <span class="cm-comment"># step 1</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">state_dict</span> <span class="cm-operator">=</span> <span class="cm-variable">torch</span>.<span class="cm-property">load</span>(<span class="cm-variable">checkpoint_file</span>) <span class="cm-comment"># step 2</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">my_model</span>.<span class="cm-property">load_state_dict</span>(<span class="cm-variable">state_dict</span>) <span class="cm-comment"># step 3</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 110px;"/><div class="CodeMirror-gutters" style="display: none; height: 110px;"/></div></div></pre><p><span>这对于常规大小的模型而言很有效,但是无法处理大型模型:在 </span><code>step 1</code><span> 我们在 </span><code>RAM</code><span> 中加载一个完整版本的模型,并花一些时间随机初始化权重(这将在 </span><code>step 3</code><span> 被丢弃);在 </span><code>step 2</code><span> ,我们在 </span><code>RAM</code><span> 中加载另一个完整版本的模型,并使用预训练的权重。</span></p><p><code>Accelerate</code><span> 提供一些工具来帮助处理大模型(这些 </span><code>API</code><span> 是实验性质的,未来可能会发生改变):</span></p><ul><li><p><code>init_empty_weights</code><span> 上下文管理器:初始化一个模型而无需使用任何内存。这依赖于 </span><code>PyTorch 1.9</code><span> 中引入的 </span><code>meta device</code><span> 。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">accelerate</span> <span class="cm-keyword">import</span> <span class="cm-variable">init_empty_weights</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">with</span> <span class="cm-variable">init_empty_weights</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">my_model</span> <span class="cm-operator">=</span> <span class="cm-variable">ModelClass</span>(<span class="cm-operator">...</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>在该上下文管理器中,每当有一个 </span><code>parameter</code><span> 被创建时,它就被立即移动到 </span><code>meta device</code><span> 。</span></p></li><li><p><code>sharded checkpoints</code><span>:有可能你的模型太大从而无法装入内存,这并不意味着它不能被加载:如果你有一个或几个 </span><code>GPU</code><span> ,这就有更多的内存可用于存储你的模型。此时需要你的 </span><code>checkpoint</code><span> 被拆分为几个小文件,即 </span><code>checkpoint shards</code><span> 。</span></p><p><code>Accelerate</code><span> 将处理 </span><code>checkpoint shards</code><span> ,但是要满足如下格式:你的 </span><code>checkpoint shards</code><span> 应该放在一个文件夹中,并且有几个包含部分 </span><code>state dict</code><span> 的文件、以及一个 </span><code>index.json</code><span> 文件(将 </span><code>parameter name</code><span> 映射到包含该 </span><code>parameter weights</code><span> 的文件)。如:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: 
relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">first_state_dict.bin</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">index.json</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">second_state_dict.bin</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>其中 </span><code>index.json</code><span> 内容为:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="json"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="json"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">{</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"linear1.weight"</span>: <span class="cm-string">"first_state_dict.bin"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"linear1.bias"</span>: <span class="cm-string">"first_state_dict.bin"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"linear2.weight"</span>: <span class="cm-string">"second_state_dict.bin"</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string cm-property">"linear2.bias"</span>: <span class="cm-string">"second_state_dict.bin"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">}</span></pre></div></div></div></div></div><div style="position: absolute; ; 
  - `load_checkpoint_and_dispatch`: loads a checkpoint into an empty model. It supports full checkpoints (a single file containing the whole state dict) as well as sharded checkpoints. It also dispatches the weights automatically across the devices you have available (GPU, CPU), so when loading a sharded checkpoint the peak RAM usage is the size of the largest shard.

    For example:

    ```shell
    git clone https://huggingface.co/sgugger/sharded-gpt-j-6B
    cd sharded-gpt-j-6B
    git-lfs install
    git pull
    ```

    Initialize the model:

cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">from accelerate import init_empty_weights</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">from transformers import AutoConfig, AutoModelForCausalLM</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">checkpoint <span class="cm-operator">=</span> <span class="cm-string">"EleutherAI/gpt-j-6B"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">config <span class="cm-operator">=</span> AutoConfig.from_pretrained(checkpoint)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">with init_empty_weights():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> model <span class="cm-operator">=</span> AutoModelForCausalLM.from_config(config)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 176px;"/><div class="CodeMirror-gutters" style="display: none; height: 176px;"/></div></div></pre><p><span>加载权重:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div 
class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">from accelerate import load_checkpoint_and_dispatch</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">model <span class="cm-operator">=</span> load_checkpoint_and_dispatch(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> model, <span class="cm-string">"sharded-gpt-j-6B"</span>, <span class="cm-def">device_map</span><span class="cm-operator">=</span><span class="cm-string">"auto"</span>, <span class="cm-def">no_split_module_classes</span><span class="cm-operator">=</span>[<span class="cm-string">"GPTJBlock"</span>]</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 110px;"/><div class="CodeMirror-gutters" style="display: none; height: 110px;"/></div></div></pre><p><span>通过 </span><code>device_map="auto"</code><span>,</span><code>Accelerate</code><span> 根据可用资源自动决定将模型的每一层放在哪里:</span></p><ul><li><span>首先,我们使用 </span><code>GPU</code><span> 上的最大可用空间。</span></li><li><span>如果我们仍然需要空间,我们将剩余的权重存储在 </span><code>CPU</code><span> 上。</span></li><li><span>如果没有足够的 </span><code>RAM</code><span> ,我们将剩余的权重作为内存映射的张量存储在硬盘上。</span></li></ul><p><code>no_split_module_classes=["GPTJBlock"]</code><span> 表示属于 </span><code>GPTJBlock</code><span> 的模块不应该在不同的设备上分割。你应该在这里设置所有包括某种残差连接的 </span><code>block</code><span> 。</span></p><p><span>可以通过 </span><code>model.hf_device_map</code><span> 查看模型的权重的设备。</span></p></li></ul></li><li><p><span>分布式训练的复现:</span></p><ul><li><p><span>设置随机数种子:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 
1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">accelerate</span> <span class="cm-keyword">import</span> <span class="cm-variable">set_seed</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">set_seed</span>(<span class="cm-number">42</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>它在内部设置了五种随机数种子:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">random</span>.<span class="cm-property">seed</span>(<span class="cm-variable">seed</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">np</span>.<span class="cm-property">random</span>.<span class="cm-property">seed</span>(<span class="cm-variable">seed</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">torch</span>.<span class="cm-property">manual_seed</span>(<span class="cm-variable">seed</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">torch</span>.<span class="cm-property">cuda</span>.<span class="cm-property">manual_seed_all</span>(<span class="cm-variable">seed</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span 
role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># ^^ safe to call this function even if cuda is not available</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">if</span> <span class="cm-variable">is_tpu_available</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">xm</span>.<span class="cm-property">set_rng_state</span>(<span class="cm-variable">seed</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 154px;"/><div class="CodeMirror-gutters" style="display: none; height: 154px;"/></div></div></pre></li><li><p><span>设置 </span><code>batch size</code><span>:当使用 </span><code>Accelerate</code><span> 训练时,传递给 </span><code>dataloader</code><span> 的 </span><code>batch size</code><span> 是 </span><code>batch size/GPU</code><span> ,因此 </span><code>final batch size</code><span> 是 </span><code>batch size * device num</code><span> 。</span></p></li><li><p><span>设置学习率:学习率应该和 </span><code>device num</code><span> 成正比,如:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">learning_rate</span> <span class="cm-operator">=</span> <span class="cm-number">1e-3</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span> <span class="cm-operator">=</span> <span class="cm-variable">Accelerator</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">learning_rate</span> <span class="cm-operator">*=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">num_processes</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 
0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">optimizer</span> <span class="cm-operator">=</span> <span class="cm-variable">AdamW</span>(<span class="cm-variable">params</span><span class="cm-operator">=</span><span class="cm-variable">model</span>.<span class="cm-property">parameters</span>(), <span class="cm-variable">lr</span><span class="cm-operator">=</span><span class="cm-variable">learning_rate</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 110px;"/><div class="CodeMirror-gutters" style="display: none; height: 110px;"/></div></div></pre></li></ul></li><li><p><span>梯度同步:在 </span><code>DDP</code><span> 中,</span><code>PyTorch</code><span> 在一些特定的点上进行进程间通信。然而在梯度累积时,你会累积 </span><code>n</code><span> 个 </span><code>loss</code><span> 并跳过 </span><code>.backward()</code><span> 。这可能会导致明显的减速,因为所有的进程都需要与它们进行更多次的通信。</span></p><p><span>可以通过 </span><code>no_sync</code><span> 上下文管理器来避免:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">ddp_model</span>, <span class="cm-variable">dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">model</span>, <span class="cm-variable">dataloader</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">for</span> <span class="cm-variable">index</span>, <span class="cm-variable">batch</span> <span class="cm-keyword">in</span> <span class="cm-builtin">enumerate</span>(<span class="cm-variable">dataloader</span>):</span></pre><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">inputs</span>, <span class="cm-variable">targets</span> <span class="cm-operator">=</span> <span class="cm-variable">batch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># Trigger gradient synchronization on the last batch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">if</span> <span class="cm-variable">index</span> <span class="cm-operator">!=</span> (<span class="cm-builtin">len</span>(<span class="cm-variable">dataloader</span>)<span class="cm-operator">-</span><span class="cm-number">1</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-operator">-</span> <span class="cm-keyword">with</span> <span class="cm-variable">ddp_model</span>.<span class="cm-property">no_sync</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-operator">+</span> <span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">no_sync</span>(<span class="cm-variable">model</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># Gradients only accumulate</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">ddp_model</span>(<span class="cm-variable">inputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_func</span>(<span class="cm-variable">outputs</span>, <span class="cm-variable">targets</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span class="cm-variable">loss</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">else</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># Gradients finally sync</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">ddp_model</span>(<span class="cm-variable">inputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_func</span>(<span class="cm-variable">outputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span 
class="cm-variable">loss</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 374px;"/><div class="CodeMirror-gutters" style="display: none; height: 374px;"/></div></div></pre><p><span>或者直接使用 </span><code>accelerator.accumulate</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">ddp_model</span>, <span class="cm-variable">dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">model</span>, <span class="cm-variable">dataloader</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">for</span> <span class="cm-variable">batch</span> <span class="cm-keyword">in</span> <span class="cm-variable">dataloader</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">accumulate</span>(<span class="cm-variable">model</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">zero_grad</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">inputs</span>, <span class="cm-variable">targets</span> <span class="cm-operator">=</span> <span class="cm-variable">batch</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span 
class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-variable">inputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_function</span>(<span class="cm-variable">outputs</span>, <span class="cm-variable">targets</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span class="cm-variable">loss</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 198px;"/><div class="CodeMirror-gutters" style="display: none; height: 198px;"/></div></div></pre></li><li><p><span>进程间同步:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span>.<span class="cm-property">wait_for_everyone</span>()</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>这将阻塞所有进程直到所有进程都达到该点。</span></p><p><span>用途:</span></p><ul><li><p><span>加载数据集:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" 
tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">main_process_first</span>():</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">load_dataset</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>这等价于:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># First do something on the main process</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">if</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">is_main_process</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> 
<span class="cm-variable">datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">load_dataset</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">else</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">wait_for_everyone</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># And then send it to the rest of them</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">if</span> <span class="cm-keyword">not</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">is_main_process</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">load_dataset</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">else</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">wait_for_everyone</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 242px;"/><div class="CodeMirror-gutters" style="display: none; height: 242px;"/></div></div></pre></li><li><p><span>存取 </span><code>state_dict</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div 
class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">if</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">is_main_process</span>:</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">unwrap_model</span>(<span class="cm-variable">model</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">torch</span>.<span class="cm-property">save</span>(<span class="cm-variable">model</span>.<span class="cm-property">state_dict</span>(), <span class="cm-string">"weights.pth"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">main_process_first</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">state</span> <span class="cm-operator">=</span> <span class="cm-variable">torch</span>.<span class="cm-property">load</span>(<span class="cm-string">"weights.pth"</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">model</span>.<span class="cm-property">load_state_dict</span>(<span class="cm-variable">state</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 154px;"/><div class="CodeMirror-gutters" style="display: none; height: 154px;"/></div></div></pre></li><li><p><span>在 </span><code>global main</code><span> 进程上 </span><code>tokenizing</code><span>,然后传播到每个 </span><code>worker</code><span> :</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: 
relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">load_dataset</span>(<span class="cm-string">"glue"</span>, <span class="cm-string">"mrpc"</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">main_process_first</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenized_datasets</span> <span class="cm-operator">=</span> <span class="cm-variable">datasets</span>.<span class="cm-property">map</span>(</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">tokenize_function</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">batched</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">remove_columns</span><span class="cm-operator">=</span>[<span class="cm-string">"idx"</span>, <span class="cm-string">"sentence1"</span>, <span class="cm-string">"sentence2"</span>],</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> )</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 176px;"/><div class="CodeMirror-gutters" style="display: none; height: 176px;"/></div></div></pre></li></ul></li></ol><p> </p><h2 id="二api"><span>二、API</span></h2><h3 id="21-accelerator"><span>2.1 Accelerator</span></h3><ol start=""><li><p><code>Accelerator</code><span> 最佳实践:</span></p><ul><li><p><code>print</code><span> 语句应该由</span><code>accelerator.print()</code><span> 代替,每个进程打印一次。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="diff"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="diff"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: 
none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-negative">- print("My thing I want to print!")</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ accelerator.print("My thing I want to print!")</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre></li><li><p><span>每个 </span><code>server</code><span> 执行一次的语句,应该使用 </span><code>accelerator.is_local_main_process</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">if</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">is_local_main_process</span>:</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_thing_once_per_server</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>或者使用 </span><code>accelerator.on_local_main_process()</code><span> 装饰器:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll 
CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-meta">@accelerator</span>.<span class="cm-property">on_local_main_process</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">do_my_thing</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string">"Something done once per server"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_thing_once_per_server</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre></li><li><p><span>所有 </span><code>server</code><span> 中仅执行一次的语句,应该使用 </span><code>accelerator.is_main_process</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div 
class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">if</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">is_main_process</span>:</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_thing_once</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>或者使用 </span><code>accelerator.on_main_process()</code><span> 装饰器:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-meta">@accelerator</span>.<span class="cm-property">on_main_process</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">do_my_thing</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string">"Something done once per server"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_thing_once</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre></li><li><p><span>在指定的进程(局部编号或全局编号)上执行的语句,也可以使用如下的装饰器:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" 
lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-meta">@accelerator</span>.<span class="cm-property">on_local_process</span>(<span class="cm-variable">local_process_idx</span><span class="cm-operator">=</span><span class="cm-number">0</span>)</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">do_my_thing</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string">"Something done on process index 0 on each server"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_thing_on_index_zero_on_each_server</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-meta">@accelerator</span>.<span class="cm-property">on_process</span>(<span class="cm-variable">process_index</span><span class="cm-operator">=</span><span class="cm-number">0</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">do_my_thing</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-string">"Something done on process index 0"</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">do_thing_on_index_zero</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 198px;"/><div class="CodeMirror-gutters" style="display: none; height: 198px;"/></div></div></pre></li><li><p><span>同步控制:使用 
</span><code>accelerator.wait_for_everyone()</code><span> 来确保所有进程在继续之前,先到达该点。例如,在模型保存之前很有用。</span></p></li><li><p><span>保存和加载:使用 </span><code>accelerator.unwrap_model()</code><span> 来删除所有在分布式过程中添加的特殊的 </span><code>model wrapper</code><span> 。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">MyModel</span>()</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">model</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Unwrap</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">model</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">unwrap_model</span>(<span class="cm-variable">model</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 88px;"/><div class="CodeMirror-gutters" style="display: none; height: 88px;"/></div></div></pre><p><span>使用 </span><code>accelerator.save()</code><span> 而不是 </span><code>torch.save()</code><span>:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="diff"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="diff"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div 
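     Putting these calls together, a minimal model-saving sketch might look like the following (the file name and the `model` variable are illustrative, and `model` is assumed to have been prepared with `accelerator.prepare()`):

     ```python
     # Make sure every process has finished its work before saving.
     accelerator.wait_for_everyone()

     # Strip the distributed wrappers so the checkpoint can be reloaded anywhere.
     unwrapped_model = accelerator.unwrap_model(model)

     # accelerator.save() only writes the file once per machine.
     accelerator.save(unwrapped_model.state_dict(), "my_model.pt")
     ```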
class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> state_dict = model.state_dict()</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-negative">- torch.save(state_dict, "my_state.pkl")</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ accelerator.save(state_dict, "my_state.pkl")</span></span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre></li><li><p><span>使用 </span><code>accelerator.clipgrad_norm()</code><span> 而不是 </span><code>torch.nn.utils.clip_grad_norm_()</code><span> ;使用 </span><code>accelerator.clipgrad_value()</code><span> 而不是</span><code>torch.nn.utils.clip_grad_value()</code><span> 。</span></p></li><li><p><span>梯度累积:要执行梯度累积,请使用 </span><code>accelerator.accumulate()</code><span> 并指定 </span><code>gradient_accumulation_steps</code><span> 。即使在多设备训练时,它也会自动处理。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="diff"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="diff"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" 
style="padding-right: 0.1px;"><span class="cm-negative">- accelerator = Accelerator()</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ accelerator = Accelerator(gradient_accumulation_steps=2)</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> for (input, label) in training_dataloader:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-positive">+ with accelerator.accumulate(model):</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> predictions = model(input)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> loss = loss_function(predictions, labels)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> accelerator.backward(loss)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> optimizer.step()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> scheduler.step()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> optimizer.zero_grad()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 242px;"/><div class="CodeMirror-gutters" style="display: none; height: 242px;"/></div></div></pre></li></ul></li><li><p><code>class accelerate.Accelerator</code><span>:</span><code>Accelerator</code><span> 类,用于分布式训练或混合精度训练。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" style="break-inside: unset;"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " 
role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">class</span> <span class="cm-def">accelerate</span>.<span class="cm-property">Accelerator</span>(</span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">device_placement</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-variable">Trues</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">plit_batches</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fp16</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">mixed_precision</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">accelerate</span>.<span class="cm-property">utils</span>.<span class="cm-property">dataclasses</span>.<span class="cm-property">PrecisionType</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">gradient_accumulation_steps</span>: <span class="cm-builtin">int</span> <span class="cm-operator">=</span> <span class="cm-number">1</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">cpu</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">deepspeed_plugin</span>: <span class="cm-variable">DeepSpeedPlugin</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">fsdp_plugin</span>: <span class="cm-variable">FullyShardedDataParallelPlugin</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">megatron_lm_plugin</span>: <span class="cm-variable">MegatronLMPlugin</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">rng_types</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">accelerate</span>.<span class="cm-property">utils</span>.<span 
class="cm-property">dataclasses</span>.<span class="cm-property">RNGType</span>]], <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">log_with</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">accelerate</span>.<span class="cm-property">utils</span>.<span class="cm-property">dataclasses</span>.<span class="cm-property">LoggerType</span>, <span class="cm-variable">accelerate</span>.<span class="cm-property">tracking</span>.<span class="cm-property">GeneralTracker</span>]], <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">logging_dir</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-builtin">str</span>, <span class="cm-variable">os</span>.<span class="cm-property">PathLike</span>, <span class="cm-variable">NoneType</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dispatch_batches</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-builtin">bool</span>] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">even_batches</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">step_scheduler_with_optimizer</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">True</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">kwargs_handlers</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Optional</span>[<span class="cm-variable">typing</span>.<span class="cm-property">List</span>[<span class="cm-variable">accelerate</span>.<span class="cm-property">utils</span>.<span class="cm-property">dataclasses</span>.<span class="cm-property">KwargsHandler</span>]] <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">dynamo_backend</span>: <span class="cm-variable">typing</span>.<span class="cm-property">Union</span>[<span class="cm-variable">accelerate</span>.<span class="cm-property">utils</span>.<span class="cm-property">dataclasses</span>.<span class="cm-property">DynamoBackend</span>, <span class="cm-builtin">str</span>] <span class="cm-operator">=</span> <span 
class="cm-keyword">None</span> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 462px;"/><div class="CodeMirror-gutters" style="display: none; height: 462px;"/></div></div></pre><p><span>参数:</span></p><ul><li><p><code>device_placement</code><span>:一个布尔值,默认为 </span><code>True</code><span>,指定 </span><code>accelerator</code><span> 是否应该将对象放在 </span><code>device</code><span> 上(由 </span><code>dataloader, model</code><span> 等等产生的张量)。</span></p></li><li><p><code>split_batches</code><span>:一个布尔值,默认为 </span><code>False</code><span>,指定 </span><code>accelerator</code><span> 是否应该将 </span><code>dataloaders</code><span> 产生的 </span><code>batches</code><span> 在设备上进行分割。</span></p><ul><li><span>如果是 </span><code>True</code><span> ,实际使用的 </span><code>batch size</code><span> 在任何类型的分布式进程中都是一样的,但它必须是你使用的 </span><code>num_processes</code><span> (即,进程数量)的整数倍。</span></li><li><span>如果是 </span><code>False</code><span> ,实际使用的 </span><code>batch size</code><span> 将是你脚本中设置的 </span><code>batch size</code><span> 乘以进程数。</span></li></ul></li><li><p><code>mixed_precision</code><span>:一个字符串,指定是否使用混合精度训练(</span><code>fp16</code><span> 或 </span><code>bfloat16</code><span>)。可以为 </span><code>'no', 'fp16', 'bf16'</code><span> 。默认为环境变量 </span><code>ACCELERATE_MIXED_PRECISION</code><span> 中的值,或者通过 </span><code>accelerate.launch</code><span> 传入的选项。</span></p><p><code>'fp16'</code><span> 要求 </span><code>pytorch 1.6</code><span> 及其以上版本,</span><code>'bf16'</code><span> 要求 </span><code>pytorch 1.10</code><span> 及其以上版本。</span></p></li><li><p><code>gradient_accumulation_steps</code><span>:一个整数,指定在累积梯度之前应该经过多少个 </span><code>step</code><span>。默认为 </span><code>1</code><span> ,表示没有梯度累积。一个大于 </span><code>1</code><span> 的数字应该与 </span><code>Accelerator.accumulate</code><span> 相结合。</span></p></li><li><p><code>cpu</code><span>:一个布尔值,指定是否强制脚本在 </span><code>CPU</code><span> 上执行。如果设置为 </span><code>True</code><span> ,将忽略 </span><code>GPU</code><span> 的可用性,并且仅强制在一个进程上执行。默认为 </span><code>False</code><span> 。</span></p></li><li><p><code>deepspeed_plugin</code><span>:一个 </span><code>DeepSpeedPlugin</code><span> ,用于调整 </span><code>DeepSpeed</code><span> 的相关的参数。也可以通过 </span><code>accelerate config</code><span> 来直接调整 </span><code>DeepSpeed</code><span> 。</span></p></li><li><p><code>fsdp_plugin</code><span>:一个 </span><code>FullyShardedDataParallelPlugin</code><span> ,用于调整 </span><code>FSDP</code><span> 的相关的参数。也可以通过 </span><code>accelerate config</code><span> 来直接调整 </span><code>FSDP</code><span> 。</span></p></li><li><p><code>megatron_lm_plugin</code><span>:一个 </span><code>MegatronLMPlugin</code><span> ,用于调整 </span><code>FSDPMegatronLM</code><span> 的相关的参数。也可以通过 </span><code>accelerate config</code><span> 来直接调整 </span><code>MegatronLM</code><span> 。</span></p></li><li><p><code>rng_types</code><span>:一个关于字符串或 </span><code>RNGType</code><span> 的列表,它指定了一个关于随机数生成器的列表,用于在 </span><code>dataloaders</code><span> 的每个 </span><code>iteration</code><span> 开始时进行同步。应该是如下的一个或几个:</span></p><ul><li><code>"torch"</code><span>:基本的 </span><code>torch</code><span> 的随机数生成器。</span></li><li><code>"cuda"</code><span>:</span><code>CUDA</code><span> 随机数生成器(仅限于 </span><code>GPU</code><span>)。</span></li><li><code>"xla"</code><span>:</span><code>XLA</code><span>随机数生成器(仅咸鱼 </span><code>TPU</code><span> 
)。</span></li><li><code>"generator"</code><span>:</span><code>sampler</code><span> (或 </span><code>batch sampler</code><span>)的 </span><code>torch.Generator</code><span> 、或 </span><code>iterable dataset</code><span> 的 </span><code>torch.Generator</code><span> 。</span></li></ul><p><span>如果 </span><code>PyTorch</code><span> 版本 </span><code><=1.5.1</code><span> ,将默认为 </span><code>["torch"]</code><span> ;如果 </span><code>PyTorch</code><span> 版本 </span><code>>=1.6</code><span> ,则默认为 </span><code>["generator"]</code><span> 。</span></p></li><li><p><code>log_with</code><span>:一个关于字符串、</span><code>LoggerType</code><span>、</span><code>GeneralTracker</code><span> 的列表,指定 </span><code>loggers</code><span> 。可以为如下的一个或几个:</span><code>"all", "tensorboard", "wandb", "comet_ml"</code><span>。</span></p><p><span>如果选择了 </span><code>"all"</code><span> ,就会接收环境中所有可用的 </span><code>trackers</code><span> 并初始化它们。也可以接受用于自定义 </span><code>tracker</code><span> 的 </span><code>GeneralTracker</code><span> 的实现,并且可以与 </span><code>"all"</code><span> 结合使用。</span></p></li><li><p><code>logging_dir</code><span>:一个字符串或 </span><code>os.PathLike</code><span>,指定用于日志的目录的路径。</span></p></li><li><p><code>dispatch_batches</code><span>:一个布尔值,如果为 </span><code>"True"</code><span> ,</span><code>Accelerator</code><span> 准备的 </span><code>dataloader</code><span> 只在 </span><code>global main</code><span> 进程上进行迭代,然后将 </span><code>batch</code><span> 分割并广播给每个 </span><code>worker</code><span> 进程。对于底层数据集是 </span><code>IterableDataset</code><span> 的 </span><code>DataLoader</code><span> ,默认为 </span><code>True</code><span> ,否则为</span><code>False</code><span> 。</span></p></li><li><p><code>even_batches</code><span>:一个布尔值,如果设置为 </span><code>True</code><span> ,在所有进程的 </span><code>total batch size</code><span> 不能完全分割数据集的情况下,数据集开头的样本将被重复,这样 </span><code>batch</code><span> 就可以在所有 </span><code>worker</code><span> 之间平均分配。默认为 </span><code>True</code><span> 。</span></p></li><li><p><code>step_scheduler_with_optimizer</code><span>:一个布尔值,如果学习率 </span><code>scheduler</code><span> 与优化器同时 </span><code>step</code><span> ,则设置为 </span><code>True</code><span> ;否则设置为 </span><code>False</code><span>。默认为 </span><code>True</code><span> 。</span></p></li><li><p><code>kwargs_handlers</code><span>:一个关于 </span><code>KwargHandler</code><span> 的列表,用于自定义如何创建与分布式训练或混合精度相关的对象。</span></p></li><li><p><code>dynamo_backend</code><span>:一个字符串或 </span><code>DynamoBackend</code><span>,设置一个 </span><code>dynamo</code><span> 后端从而利用 </span><code>Torch dynamo</code><span> 优化你的训练。默认为 </span><code>'no'</code><span> 。</span></p></li></ul><p><span>属性:</span></p><ul><li><code>device</code><span>:一个 </span><code>Torch.device</code><span> 对象,表示要使用的设备。</span></li><li><code>distributed_type</code><span>:一个 </span><code>DistributedType</code><span> 对象,表示分布式训练配置。</span></li><li><code>local_process_index</code><span>:一个整数,表示当前机器上的进程编号。</span></li><li><code>mixed_precision</code><span>:一个字符串,表示配置好的混合精度模式。</span></li><li><code>num_processes</code><span> :一个整数,表示用于训练的进程总数。</span></li><li><code>optimizer_step_was_skipped</code><span>:一个布尔值,表示当学习率不应该被改变的情况下,优化器的更新是否被跳过(因为混合精度中的梯度溢出)。</span></li><li><code>process_index</code><span>:一个整数,表示当前进程在所有进程中的总编号。</span></li><li><code>state</code><span>:一个 </span><code>AcceleratorState</code><span>,表示分布式的 </span><code>setup state</code><span> 
   - `sync_gradients`: a boolean, whether the gradients are currently being synchronized across all processes.
   - `use_distributed`: a boolean, whether the current configuration is for distributed training.

   Methods:

   - `accumulate(model)`: a context manager that `wrap`s the model and performs gradient accumulation automatically.

     Parameters: `model`: a `torch.nn.Module` object, the model prepared by `Accelerator.prepare`.

   - `autocast()`: applies automatic mixed precision to the code inside this context manager, if it is enabled. Otherwise nothing changes.

   - `backward(loss, **kwargs)`: scales the gradients according to `Accelerator.gradient_accumulation_steps` and calls the correct `backward()` based on the configuration. Should be used in place of `loss.backward()`.

   - `clear()`: an alias of `Accelerate.free_memory`; releases all references to internal objects and calls the garbage collector. You should call this method between trainings of two different `models/optimizers`.

   - `clip_grad_norm_(parameters, max_norm, norm_type = 2) -> torch.Tensor`: clips the total norm of the parameters' gradients (viewing all parameters as a single vector). Should be used in place of `torch.nn.utils.clip_grad_norm_`.

     Parameters:

     - `parameters`: the list of parameters whose gradients are clipped.
     - `max_norm`: the gradient threshold (maximum norm).
     - `norm_type`: the type of norm.

   - `clip_grad_value_(parameters, clip_value)`: clips the parameters' gradients by value (absolute value). Should be used in place of `torch.nn.utils.clip_grad_value_`.

     Parameters:

     - `parameters`: the list of parameters whose gradients are clipped.
     - `clip_value`: the threshold (absolute value).

   - `end_training()`: runs any special `end training behavior`, such as stopping the `tracker`s only on the `global main` process. If experiment tracking is used, `end_training()` should always be called at the end of the script.

   - `free_memory()`: see `clear()`.

   - `gather(tensor) -> torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor`: gathers the values of `tensor` across all processes and concatenates them along the first dimension. Useful to `regroup` the predictions from all processes when doing evaluation.

     Note: this gather happens in all processes.

     Parameters: `tensor`: a tensor or a collection of tensors whose values should be gathered across all processes.

     Returns: the gathered result, of the same type as `tensor`.

   - `gather_for_metrics(tensor)`: same as `gather()`, but `gather_for_metrics` may drop the duplicated samples in the `last batch`. It is often used to gather `inputs` and `targets` for metric computation.

     Parameters and return value: see `gather()`.
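     For instance, a minimal evaluation-loop sketch (the `eval_dataloader`, `metric`, and `model` names are illustrative and assumed to have been prepared beforehand):

     ```python
     model.eval()
     for batch in eval_dataloader:
         inputs, targets = batch
         predictions = model(inputs).argmax(dim=-1)
         # Gather predictions/targets from all processes, dropping the
         # duplicated samples that pad out the last batch.
         all_predictions, all_targets = accelerator.gather_for_metrics((predictions, targets))
         metric.add_batch(predictions=all_predictions, references=all_targets)
     ```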
   - `get_state_dict(model, unwrap = True)`: returns the `state_dict` of a model that has been processed by `accelerator.prepare()`, in `full precision`.

     Parameters:

     - `model`: a `PyTorch` model that has been processed by `accelerator.prepare()`.
     - `unwrap`: a boolean, whether to return the original underlying `state_dict`. If `False`, the `wrapped state_dict` is returned. Defaults to `True`.

   - `get_tracker(name: str)`: returns a `tracker` from `self.trackers` based on `name`; only effective on the `global main` process.

     Parameters: `name`: a string, the name of the `tracker`.

   - `init_trackers(project_name: str, config: Optional[dict] = None, init_kwargs: Optional[dict] = {})`: initializes all the `trackers` stored in `self.log_with`.

     Parameters:

     - `project_name`: a string, the name of the `project`.
     - `config`: a dictionary, the `starting configuration`.
     - `init_kwargs`: a dictionary that is passed to the `tracker`'s initialization method.

   - `join_uneven_inputs(joinables, even_batches = None)`: a context manager for distributed training or distributed evaluation on `uneven` inputs. It acts as a `wrapper` around `torch.distributed.algorithms.join` and is useful when the `total batch size` does not evenly divide the `dataset length`.

     It is only supported for `Distributed Data Parallel training` on multiple `GPU`s. For other configurations it has no effect.

     Parameters:

     - `joinables`: a list of `torch.distributed.algorithms.Joinable`, i.e., models or optimizers that subclass `torch.distributed.algorithms.Joinable`, such as a `PyTorch Module` prepared by `Accelerator.prepare`.
     - `even_batches`: a boolean that overrides the `even_batches` value set in the `Accelerator`. If not provided, the `Accelerator`'s `even_batches` value is used.

       For an `iterable-style` `dataloader`, this parameter has no effect.

     Example:

     ```python
     from accelerate import Accelerator

     accelerator = Accelerator(even_batches=True)
     ddp_model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

     with accelerator.join_uneven_inputs([ddp_model], even_batches=False):
         for input, output in dataloader:
             outputs = model(input)
             loss = loss_func(outputs)
             loss.backward()
             optimizer.step()
             optimizer.zero_grad()
     ```
class="cm-variable">accelerate</span> <span class="cm-keyword">import</span> <span class="cm-variable">Accelerator</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span> <span class="cm-operator">=</span> <span class="cm-variable">Accelerator</span>(<span class="cm-variable">even_batches</span><span class="cm-operator">=</span><span class="cm-keyword">True</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">ddp_model</span>, <span class="cm-variable">optimizer</span>, <span class="cm-variable">dataloader</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">model</span>, <span class="cm-variable">optimizer</span>, <span class="cm-variable">dataloader</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">join_uneven_inputs</span>([<span class="cm-variable">ddp_model</span>], <span class="cm-variable">even_batches</span><span class="cm-operator">=</span><span class="cm-keyword">False</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-keyword">for</span> <span class="cm-builtin">input</span>, <span class="cm-variable">output</span> <span class="cm-keyword">in</span> <span class="cm-variable">dataloader</span>:</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-builtin">input</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_func</span>(<span class="cm-variable">outputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span>.<span class="cm-property">backward</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">optimizer</span>.<span class="cm-property">zero_grad</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 264px;"/><div class="CodeMirror-gutters" style="display: none; height: 264px;"/></div></div></pre></li><li><p><code>load_state(input_dir: str)</code><span>:加载 
   - `local_main_process_first()`: lets the `local main` process enter a `with block` first; the other processes enter the `with block` after the `local main` process has exited it.

   - `log(values: dict, step: Optional[int] = None, log_kwargs: Optional[dict] = {})`: logs `values` to all the `trackers` in `self.trackers`; only effective on the `global main` process.

     Parameters:

     - `values`: a dictionary containing only `int/float/str` data types, the values to log.
     - `step`: an integer, the `run step`.
     - `log_kwargs`: a dictionary that is passed to the `tracker`'s `log` function.
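     For example, a minimal experiment-tracking sketch (it assumes the `Accelerator` was created with `log_with=...`; the project name, the logged values, and the `training_step` helper are illustrative):

     ```python
     accelerator.init_trackers("my_project", config={"lr": 3e-5})

     for step, batch in enumerate(training_dataloader):
         loss = training_step(batch)          # illustrative helper
         accelerator.log({"train_loss": loss.item()}, step=step)

     # Stop all trackers on the global main process.
     accelerator.end_training()
     ```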
class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span> <span class="cm-operator">=</span> <span class="cm-variable">Accelerator</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">dataloader</span>, <span class="cm-variable">model</span>, <span class="cm-variable">optimizer</span> <span class="cm-operator">=</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">prepare</span>(<span class="cm-variable">dataloader</span>, <span class="cm-variable">model</span>, <span class="cm-variable">optimizer</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">input_a</span> <span class="cm-operator">=</span> <span class="cm-builtin">next</span>(<span class="cm-builtin">iter</span>(<span class="cm-variable">dataloader</span>))</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">input_b</span> <span class="cm-operator">=</span> <span class="cm-builtin">next</span>(<span class="cm-builtin">iter</span>(<span class="cm-variable">dataloader</span>))</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">with</span> <span class="cm-variable">accelerator</span>.<span class="cm-property">no_sync</span>():</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-variable">input_a</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span class="cm-variable">loss_func</span>(<span class="cm-variable">outputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span class="cm-variable">loss</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># No synchronization across processes, only accumulate gradients</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> </span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">outputs</span> <span class="cm-operator">=</span> <span class="cm-variable">model</span>(<span class="cm-variable">input_b</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">loss</span> <span class="cm-operator">=</span> <span 
class="cm-variable">loss_func</span>(<span class="cm-variable">outputs</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">accelerator</span>.<span class="cm-property">backward</span>(<span class="cm-variable">loss</span>)</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Synchronization across all processes</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">optimizer</span>.<span class="cm-property">step</span>()</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">optimizer</span>.<span class="cm-property">zero_grad</span>()</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 418px;"/><div class="CodeMirror-gutters" style="display: none; height: 418px;"/></div></div></pre></li><li><p><code>on_last_process(func)</code><span>:一个装饰器,将仅仅在最后一个进程上运行被装饰的函数。</span></p></li><li><p><code>on_local_main_process(func)</code><span>:一个装饰器,将仅仅在 </span><code>local main</code><span> 进程上运行被装饰的函数。</span></p></li><li><p><code>on_local_process(local_process_idx)</code><span>:一个装饰器,将仅仅在 </span><code>local process index</code><span> 进程上运行被装饰的函数。</span></p></li><li><p><code>on_main_process(func)</code><span>:一个装饰器,将仅仅在 </span><code>global main</code><span> 进程上运行被装饰的函数。</span></p></li><li><p><code>on_process(process_idx)</code><span>:一个装饰器,将仅仅在 </span><code>global process index</code><span> 进程上运行被装饰的函数。</span></p></li><li><p><code>pad_across_processes(tensor, dim = 0, pad_index = 0, pad_first = False )</code><span>:递归地将所有设备的张量(位于嵌套的 </span><code>list/tuple/dict</code><span> 中)填充到相同的 </span><code>size</code><span>,以便它们可以安全地被收集起来。</span></p><p><span>参数:</span></p><ul><li><code>tensor</code><span>:</span><code>torch.Tensor</code><span> 的 </span><code>list/tuple/dict</code><span> ,指定被收集的数据。</span></li><li><code>dim</code><span>:一个整数,指定填充哪一维。默认为 </span><code>0</code><span> 。</span></li><li><code>pad_index</code><span>:一个整数,指定用什么值来填充。</span></li><li><code>pad_first</code><span>:一个布尔值,指定是否从开始填充。默认为 </span><code>False</code><span>,表示从尾部填充。</span></li></ul></li><li><p><code>prepare(*args, device_placement = None )</code><span>:为分布式训练和混合精度准备 </span><code>args</code><span> 中传递的所有对象,然后以相同的顺序返回。</span></p><p><span>如果你使用一个 </span><code>model</code><span> 来用于推断,并且没有任何形式的混合精度,那么你不需要 </span><code>prepare</code><span> 该 </span><code>model</code><span> 。</span></p><p><span>参数:</span></p><ul><li><code>args</code><span>:一个列表,可以包含如下类型的对象:</span><code>torch.utils.data.DataLoader</code><span>、</span><code>torch.nn.Module</code><span>、</span><code>torch.optim.Optimizer</code><span>、</span><code>torch.optim.lr_scheduler.LRScheduler</code><span>。</span></li><li><code>device_placement</code><span>:一个关于布尔值的列表,要求长度与 </span><code>args</code><span> 相同,分别指定每个被 </span><code>prepare</code><span> 的对象是否 </span><code>automatic device placement</code><span> 。</span></li></ul></li><li><p><code>prepare_data_loader(data_loader: DataLoaderde, dvice_placement = None )</code><span>:准备一个 </span><code>PyTorch DataLoader</code><span> 用于分布式训练。推荐使用 </span><code>prepare()</code><span> 函数。</span></p><p><span>参数:参考 </span><code>prepare()</code><span> 
   - `prepare(*args, device_placement = None)`: prepares all the objects passed in `args` for distributed training and mixed precision, then returns them in the same order.

     If you use a `model` only for inference, without any kind of mixed precision, you do not need to `prepare` that `model`.

     Parameters:

     - `args`: a list that may contain objects of the following types: `torch.utils.data.DataLoader`, `torch.nn.Module`, `torch.optim.Optimizer`, `torch.optim.lr_scheduler.LRScheduler`.
     - `device_placement`: a list of booleans, of the same length as `args`, specifying for each prepared object whether to apply `automatic device placement`.

   - `prepare_data_loader(data_loader: DataLoader, device_placement = None)`: prepares a `PyTorch DataLoader` for distributed training. Using the `prepare()` function is recommended instead.

     Parameters: see the `prepare()` function.

   - `prepare_model(model: Module, device_placement = None)`: prepares a `PyTorch model` for distributed training. Using the `prepare()` function is recommended instead.

     Parameters: see the `prepare()` function.

   - `prepare_optimizer(optimizer: Optimizer, device_placement = None)`: prepares a `PyTorch Optimizer` for distributed training. Using the `prepare()` function is recommended instead.

     Parameters: see the `prepare()` function.

   - `prepare_scheduler(scheduler: _LRScheduler, device_placement = None)`: prepares a `PyTorch Scheduler` for distributed training. Using the `prepare()` function is recommended instead.

     Parameters: see the `prepare()` function.

   - `print(*args, **kwargs)`: a drop-in replacement for Python's built-in `print()` function that prints only once per `server`.

   - `reduce(tensor, reduction = 'sum') -> torch.Tensor, or a nested tuple/list/dictionary of torch.Tensor`: `reduce`s the given tensor across all processes. Note that every process receives the reduced value.

     Parameters:

     - `tensor`: a `Tensor` or a collection of `Tensor`s, the tensors to `reduce`.
     - `reduction`: a string specifying the `reduce` operation, one of `'sum', 'mean', 'none'`. If `'none'`, nothing is done. Defaults to `'sum'`.

     Returns: the reduced value, of the same type as `tensor`.
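     For instance, a sketch of averaging a per-process loss (the `loss` tensor is illustrative):

     ```python
     # Every process ends up with the same averaged value.
     mean_loss = accelerator.reduce(loss.detach(), reduction="mean")
     accelerator.print(f"mean loss: {mean_loss.item():.4f}")
     ```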
- `unscale_gradients(optimizer = None)`: unscales the gradients in mixed precision training with `AMP`. In every other `setting` this method does nothing.

  Arguments: `optimizer`: an `Optimizer` or a list of `Optimizer`s specifying which `optimizer`s need their gradients unscaled. If not set, defaults to all `optimizers` that were passed to `prepare()`.

- `unwrap_model(model, keep_fp32_wrapper: bool = False)`: unwraps `model`, where `model` has been processed by `prepare()`. This is typically used before saving the `model` (see the sketch below).

  Arguments:

  - `model`: a `torch.nn.Module` specifying the model to `unwrap`.
  - `keep_fp32_wrapper`: a boolean specifying whether to keep the `mixed precision hook` (if there is one) instead of removing it. Defaults to `False`.

- `wait_for_everyone()`: blocks the current process until all other processes have reached this point (so it does nothing when the script runs in a single process). It is useful to call this right before saving a model.
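  A minimal sketch (not from the original text) of the usual save pattern combining `wait_for_everyone()`, `unwrap_model()` and `save()`:

  ```python
  import torch
  from accelerate import Accelerator

  accelerator = Accelerator()
  model = accelerator.prepare(torch.nn.Linear(4, 2))

  # Make sure every process has finished its work before saving.
  accelerator.wait_for_everyone()
  unwrapped = accelerator.unwrap_model(model)           # strip the DDP/mixed-precision wrappers
  accelerator.save(unwrapped.state_dict(), "model.pt")  # saved once per machine
  ```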
role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">megatron_lm_plugin</span> <span class="cm-operator">=</span> <span class="cm-keyword">None</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-variable">_from_accelerator</span>: <span class="cm-builtin">bool</span> <span class="cm-operator">=</span> <span class="cm-keyword">False</span>,</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-operator">**</span><span class="cm-variable">kwargs</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 198px;"/><div class="CodeMirror-gutters" style="display: none; height: 198px;"/></div></div></pre><p><span>参数参考 </span><code>Accelerator</code><span> 。</span></p><p><span>属性:</span></p><ul><li><code>device</code><span>:一个 </span><code>torch.device</code><span>,表示要使用的设备。</span></li><li><code>distributed_type</code><span>:一个 </span><code>DistributedType</code><span>,表示当前使用的分布式环境的类型。</span></li><li><code>initialized</code><span>:一个布尔值,表示 </span><code>AcceleratorState</code><span> 是否已经从 </span><code>Accelerator</code><span> 得到初始化。</span></li><li><code>local_process_index</code><span>:一个整数,表示当前进程在当前 </span><code>server</code><span> 上的索引。</span></li><li><code>mixed_precision</code><span>:一个字符串,表示当前脚本是否会使用混合精度,如果是的话,正在执行的混合精度的类型。</span></li><li><code>num_processes</code><span>:一个整数,表示当前并行启动的进程的数量。</span></li><li><code>process_index</code><span> :一个整数,表示当前进程的 </span><code>global index</code><span> 。</span></li></ul></li><li><p><code>class accelerate.state.GradientState()</code><span>:单例类,它存储梯度同步相关的信息从而用于梯度累积。该类是不可变的,在第一次初始化之后就保存不变。</span></p><p><span>属性:</span></p><ul><li><code>end_of_dataloader</code><span>:一个布尔值,表示我们是否已经到达了当前 </span><code>dataloader</code><span> 的结束。</span></li><li><code>remainder</code><span>:一个整数,表示填充 </span><code>dataloader</code><span> 所需要增加的额外样本的数量。</span></li><li><code>sync_gradients</code><span>:一个布尔值,表示梯度是否应该在所有设备上同步。</span></li></ul></li></ol><h3 id="22-命令行"><span>2.2 命令行</span></h3><ol start=""><li><p><code>accelerate config</code><span> 命令:启动一系列提示,为你的训练系统创建并保存 </span><code>default_config.yml</code><span> 配置文件。该命令应该总是最先执行。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" 
role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate config [arguments] <span class="cm-comment"># 或者 accelerate-config [arguments]</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>命令参数:</span></p><ul><li><code>--config_file CONFIG_FILE (str)</code><span>:配置文件的存储路径。默认为 </span><code>default_config.yaml</code><span> 文件名,存放在 </span><code>cache location</code><span> 。</span></li><li><code>-h, --help (bool)</code><span>:展示帮助信息。</span></li></ul></li><li><p><code>accelerate config default</code><span> 命令:启动一系列提示,为你的训练系统创建并保存 </span><code>default_config.yml</code><span> 配置文件,但是会在命令行中配置一些参数,如 </span><code>--mixed_precision</code><span> 等等。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate config default [arguments] <span class="cm-comment"># 或者 accelerate-config default [arguments]</span></span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>用法参考 </span><code>accelerate config</code><span> 。</span></p></li><li><p><code>accelerate config update</code><span>:命令:用一组新的参数更新已有的配置文件中的对应项。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" 
3. The `accelerate config update` command: updates an existing configuration file with a new set of arguments.

   ```shell
   accelerate config update [arguments]  # or: accelerate-config update [arguments]
   ```

   Usage: see `accelerate config`.

4. `accelerate env`: lists the contents of the configuration file.

   ```shell
   accelerate env [arguments]  # or: accelerate-env [arguments]
   ```

   Usage: see `accelerate config`.
22px;"/></div></div></pre><p><span>用法参考 </span><code>accelerate config</code><span> 。</span></p></li><li><p><code>accelerate launch</code><span>:</span><code>launch</code><span> 一个脚本。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate launch [arguments] {training_script} <span class="cm-attribute">--</span>{training_script-argument-1} <span class="cm-attribute">--</span>{training_script-argument-2} ...</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 44px;"/><div class="CodeMirror-gutters" style="display: none; height: 44px;"/></div></div></pre><p><span>位置参数:</span></p><ul><li><code>{training_script}</code><span>:脚本的完整路径。</span></li><li><code>--{training_script-argument-1}</code><span>:脚本的参数。</span></li></ul><p><span>可选参数:</span></p><ul><li><code>-h, --help (bool)</code><span>:展示帮助信息。</span></li><li><code>--config_file CONFIG_FILE (str)</code><span>:替换默认的配置文件。</span></li><li><code>-m, --module (bool)</code><span>:将 </span><code>launch script</code><span> 解释为一个 </span><code>Python</code><span> 模块,即通过 </span><code>python -m</code><span> 执行。</span></li><li><code>--no_python (bool)</code><span>:跳过在脚本前加上 </span><code>"python"</code><span> ,直接执行它。当脚本不是 </span><code>Python</code><span> 脚本时很有用。</span></li><li><code>--debug (bool)</code><span>:当发生故障时,是否打印出 </span><code>torch.distributed stack trace</code><span> 。</span></li><li><code>-q, --quiet (bool)</code><span>:将子进程的错误信息从 </span><code>launch stack trace</code><span> 切换到仅展示相关的信息。仅用于 </span><code>DeepSpeed</code><span> 和单进程。</span></li></ul><p><span>下面的参数可以通过 </span><code>accelerate config</code><span> 来配置。也可以在 </span><code>launch</code><span> 时配置(或更新):</span></p><ul><li><p><span>硬件选择参数:</span></p><ul><li><code>--cpu (bool)</code><span>:是否强制在 </span><code>CPU</code><span> 上进行训练。</span></li><li><code>--multi_gpu (bool)</code><span>:是否应该启动分布式 </span><code>GPU</code><span> 训练。</span></li><li><code>--mps (bool)</code><span>:否应该在 </span><code>MacOS</code><span> 机器上使用支持 </span><code>MPS</code><span> 的 
     - `--tpu (bool)`: whether `TPU` training should be launched.

   - Resource selection arguments:

     - `--mixed_precision {no,fp16,bf16} (str)`: whether to use mixed precision training. `BF16` training is only supported on `Nvidia Ampere GPU`s with `PyTorch 1.10` or later.
     - `--num_processes NUM_PROCESSES (int)`: the total number of processes to launch in parallel.
     - `--num_machines NUM_MACHINES (int)`: the total number of machines used in this training run.
     - `--num_cpu_threads_per_process NUM_CPU_THREADS_PER_PROCESS (int)`: the number of `CPU` threads per process. Can be tuned for optimal performance.

   - Training-paradigm selection arguments:

     - `--use_deepspeed (bool)`: whether to use `DeepSpeed` for training.
     - `--use_fsdp (bool)`: whether to use `FullyShardedDataParallel` for training.
     - `--use_megatron_lm (bool)`: whether to use `Megatron-LM` for training.

   - Distributed `GPU` arguments: the following arguments are only useful when `multi_gpu` is passed or `multi-gpu training` is configured through `accelerate config`.

     - `--gpu_ids (str)`: which `GPU`s (by `id`) should be used for training on this machine, as a comma-separated list.
     - `--same_network (bool)`: whether all machines used for multi-node training are on the same `local network`.
     - `--machine_rank MACHINE_RANK (int)`: the `rank` (i.e. the number) of the machine on which this script is launched.
     - `--main_process_ip MAIN_PROCESS_IP (str)`: the `IP` address of the machine of `rank 0`.
     - `--main_process_port MAIN_PROCESS_PORT (int)`: the port to use to communicate with the machine of `rank 0`.
     - `--rdzv_conf (str)`: additional `rendezvous` configuration (`<key1>=<value1>,<key2>=<value2>,…`).
     - `--max_restarts (int)`: the maximum number of `worker group` restarts before failing.
     - `--monitor_interval (float)`: the interval, in seconds, at which the `worker` state is monitored.

   - `TPU` arguments: the following arguments are only useful when `tpu` is passed or `tpu training` is configured through `accelerate config`.

     - `--main_training_function MAIN_TRAINING_FUNCTION (str)`: the name of the main function to execute in the script.
     - `--downcast_bf16 (bool)`: when using `bf16` precision on `TPU`s, whether both `float` and `double` tensors should be cast to `bfloat16`, or whether `double` tensors should remain `float32`.
   - `DeepSpeed` arguments: the following arguments are only useful when `use_deepspeed` is passed or `deepspeed` is configured through `accelerate config`.

     - `--deepspeed_config_file (str)`: the `DeepSpeed` configuration file.
     - `--zero_stage (int)`: the `ZeRO` optimization stage of `DeepSpeed`.
     - `--offload_optimizer_device (str)`: decides where (`none|cpu|nvme`) to offload the optimizer states.
     - `--offload_param_device (str)`: decides where (`none|cpu|nvme`) to offload the `parameters`.
     - `--gradient_accumulation_steps (int)`: the number of `gradient_accumulation_steps` used in the training script.
     - `--gradient_clipping (float)`: the gradient clipping value used in the training script.
     - `--zero3_init_flag (str)`: decides whether (`true|false`) to enable `deepspeed.zero.Init` for constructing massive models. Only applicable to `DeepSpeed ZeRO Stage-3`.
     - `--zero3_save_16bit_model (str)`: decides whether (`true|false`) to save `16`-bit model weights when using `ZeRO Stage-3`. Only applicable to `DeepSpeed ZeRO Stage-3`.
     - `--deepspeed_hostfile (str)`: the `DeepSpeed hostfile` used to configure multi-node compute resources.
     - `--deepspeed_exclusion_filter (str)`: the `DeepSpeed exclusion filter` string when using a multi-node setup.
     - `--deepspeed_inclusion_filter (str)`: the `DeepSpeed inclusion filter` string when using a multi-node setup.
     - `--deepspeed_multinode_launcher (str)`: the `DeepSpeed` multi-node `launcher` to use.

   - `Fully Sharded Data Parallelism` arguments: the following arguments are only useful when `use_fsdp` is passed or `Fully Sharded Data Parallelism` is configured through `accelerate config`.

     - `--fsdp_offload_params (str)`: decides whether (`true|false`) to offload `parameters` and gradients to the `CPU`.
     - `--fsdp_min_num_params (int)`: the minimum number of `parameters` for `FSDP`'s `Default Auto Wrapping`.
     - `--fsdp_sharding_strategy (int)`: `FSDP`'s sharding strategy.
     - `--fsdp_auto_wrap_policy (str)`: `FSDP`'s `auto wrap policy`.
     - `--fsdp_transformer_layer_cls_to_wrap (str)`: the `Transformer layer class name` (case-sensitive) to `wrap`, e.g. `BertLayer, GPTJBlock, T5Block ...`.
     - `--fsdp_backward_prefetch_policy (str)`: `FSDP`'s `backward prefetch policy`.
     - `--fsdp_state_dict_type (str)`: `FSDP`'s `state dict` type.
batch</code><span> 数量。</span></li><li><code>--megatron_lm_sequence_parallelism ('')</code><span>:当张量并行度大于 </span><code>1</code><span> 时,决定是否(</span><code>true|false</code><span> )启用序列并行 </span><code>Sequence Parallelism</code><span> 。</span></li><li><code>--megatron_lm_recompute_activations ('')</code><span>:决定是否(</span><code>true|false</code><span> )启用 </span><code>Selective Activation Recomputation</code><span> 。</span></li><li><code>--megatron_lm_use_distributed_optimizer ('')</code><span> :决定是否(</span><code>true|false</code><span> )使用分布式优化器,将优化器状态和梯度分片到 </span><code>Data Pralellel: DP</code><span> 的 </span><code>ranks</code><span> 。</span></li><li><code>--megatron_lm_gradient_clipping ('')</code><span>:</span><code>Megatron-LM</code><span> 基于全局 </span><code>L2</code><span> 范数的梯度裁剪值( </span><code>0</code><span> 表示禁用)。</span></li></ul></li><li><p><code>AWS SageMaker</code><span> 参数:以下参数仅当在 </span><code>SageMake</code><span> 中训练时才有用。</span></p><ul><li><code>--aws_access_key_id AWS_ACCESS_KEY_ID (str)</code><span>:用于启动 </span><code>Amazon SageMaker</code><span> 训练工作的 </span><code>AWS_ACCESS_KEY_ID</code><span> 。</span></li><li><code>--aws_secret_access_key AWS_SECRET_ACCESS_KEY (str)</code><span>:用于启动 </span><code>Amazon SageMaker</code><span> 训练工作的</span><code>AWS_SECRET_ACCESS_KEY</code><span> 。</span></li></ul></li></ul></li><li><p><code>accelerate tpu-config</code><span>:配置 </span><code>tpu</code><span> 训练。</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="shell"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="shell"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation"><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;">accelerate tpu-config [arguments]</span></pre></div></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 22px;"/><div class="CodeMirror-gutters" style="display: none; height: 22px;"/></div></div></pre><p><span>可选参数:</span></p><ul><li><code>-h, --help (bool)</code><span>:展示帮助信息。</span></li></ul><p><span>配置参数:下面参数也可以通过 </span><code>accelerate config</code><span> 来配置:</span></p><ul><li><code>--config_file (str)</code><span>:</span><code>accelerate</code><span> 配置文件的路径。</span></li><li><code>--tpu_name (str)</code><span>:要使用的 
6. `accelerate tpu-config`: configures `tpu` training.

   ```shell
   accelerate tpu-config [arguments]
   ```

   Optional arguments:

   - `-h, --help (bool)`: show the help message.

   Configuration arguments: the following arguments can also be configured through `accelerate config`:

   - `--config_file (str)`: the path to the `accelerate` configuration file.
   - `--tpu_name (str)`: the name of the `TPU` to use. If not specified, the `TPU` specified in the configuration file is used.
   - `--tpu_zone (str)`: the `zone` of the `TPU` to use. If not specified, the `zone` specified in the configuration file is used.

   `TPU` arguments: the following arguments configure the `TPU`:

   - `--command_file (str)`: the path to a file containing the commands to run on the `pod` at startup.
   - `--command (str)`: a command to run on the `pod`. Can be passed multiple times.
   - `--install_accelerate (bool)`: whether to install `accelerate` on the `pod`. Defaults to `False`.
   - `--accelerate_version (str)`: the version of `accelerate` to install on the `pod`. If not specified, the latest `pypi` version is used. Pass `'dev'` to install from `GitHub`.
   - `--debug (bool)`: if set, prints the command that would be run instead of running it.

7. `accelerate test`: runs `accelerate/test_utils/test_script.py` to verify that `Accelerate` has been configured correctly.

   ```shell
   accelerate test [arguments]  # or: accelerate-test [arguments]
   ```

   Optional arguments:

   - `--config_file CONFIG_FILE (str)`: where the configuration file is stored. Defaults to a file named `default_config.yaml` in the `cache location`.
   - `-h, --help (bool)`: show the help message.
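Putting the commands above together, a typical first-run workflow looks like this (a sketch, not from the original text):

```shell
accelerate config             # answer the prompts once to create default_config.yaml
accelerate env                # inspect the resulting configuration
accelerate test               # verify the setup with the bundled test script
accelerate launch train.py    # run your own training script with that configuration
```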
### 2.3 Tracker

1. `class accelerate.tracking.GeneralTracker()`: the base class of all `Tracker`s.

   Methods (every method should accept `**kwargs`):

   - `finish()`: should run any `finalizing function` of the `tracking API`. If the `API` has no such `finalizing function`, do not override `finish()`.

   - `log(values: dict, step: typing.Optional[int], **kwargs)`: logs the current `run`.

     Arguments:

     - `values`: a dictionary of `key-value` pairs to be logged. Note that `key`s must be strings and `value`s must be strings, floats, or integers.
     - `step`: an integer specifying the current `run step`.

   - `store_init_configuration(values: dict)`: logs `values` as hyperparameters.

     Arguments: see `log()`.
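   A minimal sketch (not from the original text) of a custom tracker implementing the interface above; a real integration may need additional attributes required by `Accelerate` (such as a tracker name), so check the library source before relying on it:

   ```python
   from typing import Optional

   from accelerate.tracking import GeneralTracker


   class PrintTracker(GeneralTracker):
       """Toy tracker that simply prints everything it is asked to record."""

       def store_init_configuration(self, values: dict, **kwargs):
           print("hyperparameters:", values)

       def log(self, values: dict, step: Optional[int] = None, **kwargs):
           print(f"step {step}:", values)

       def finish(self, **kwargs):
           print("run finished")
   ```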
" role="presentation"><span role="presentation" style="padding-right: 0.1px;">)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 66px;"/><div class="CodeMirror-gutters" style="display: none; height: 66px;"/></div></div></pre><p><span>参数:</span></p><ul><li><code>run_name</code><span>:一个字符串,指定当前 </span><code>experiment run</code><span> 的名字。</span></li><li><code>logging_dir</code><span>:一个字符串或 </span><code>os.PathLike</code><span>,指定 </span><code>TensorBoard logs</code><span> 存储的位置。</span></li><li><code>kwargs</code><span>:关键字参数,传递给 </span><code>tensorboard.SummaryWriter.__init__</code><span> 方法。</span></li></ul></li><li><p><code>class accelerate.tracking.WandBTracker(run_name: str, **kwargs )</code><span>:</span><code>WandB Tracker</code><span>。应该在你的脚本开始的地方就被初始化。</span></p><p><span>参数:</span></p><ul><li><code>run_name</code><span>:一个字符串,指定当前 </span><code>experiment run</code><span> 的名字。</span></li><li><code>kwargs</code><span>:关键字参数,传递给 </span><code>wandb.init</code><span> 方法。</span></li></ul></li><li><p><code>class accelerate.tracking.CometMLTracker(run_name: str, **kwargs )</code><span>:</span><code>comet_ml Tracker</code><span> 。应该在你的脚本开始的地方就被初始化。</span><code>API key</code><span> 必须存储在 </span><code>Comet</code><span> 配置文件中。</span></p><p><span>参数:</span></p><ul><li><code>run_name</code><span>:一个字符串,指定当前 </span><code>experiment run</code><span> 的名字。</span></li><li><code>kwargs</code><span>:关键字参数,传递给 </span><code>Experiment.__init__</code><span> 方法。</span></li></ul></li></ol><h3 id="24-分布式-lancher"><span>2.4 分布式 Lancher</span></h3><ol start=""><li><p><code>accelerate.notebook_launcher( function, args = (), num_processes = None, mixed_precision = 'no', use_port = '29500')</code><span>:启动一个训练函数。如果当前环境中允许的话(如,具有多核的 </span><code>TPU</code><span> ),使用几个进程。</span></p><p><span>要使用这个 </span><code>notebook_launcher</code><span> ,在调用之前,</span><code>notebook session</code><span> 中必须对 </span><code>CUDA</code><span> 设备没有任何调用。如果有任何调用,你将需要重启 </span><code>notebook</code><span> ,并确保没有 </span><code>cell</code><span> 使用任何 </span><code>CUDA</code><span> 设备。</span></p><p><span>参数:</span></p><ul><li><code>function</code><span>:一个可调用对象,指定要执行的训练函数。如果它接受参数,第一个参数应该是运行进程的 </span><code>index</code><span>。</span></li><li><code>args</code><span>:一个元组,指定传递给函数的参数的元组(函数将接收到 </span><code>*args</code><span>)。</span></li><li><code>num_processes</code><span>:一个整数,指定训练时使用的进程的数量。如果有 </span><code>TPU</code><span>,则在 </span><code>Colab/Kaggle</code><span> 中默认为 </span><code>8</code><span>,否则为可用的</span><code>GPU</code><span> 数量。</span></li><li><code>mixed_precision</code><span>:一个字符串,指定混合精度训练。默认为 </span><code>'no'</code><span> 。</span></li><li><code>use_port</code><span>:一个字符串,指定启动多 </span><code>GPU</code><span> 训练时用于进程间通信的端口。默认为 </span><code>'29500'</code><span> 。</span></li></ul><p><span>示例:</span></p><pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python"><div class="CodeMirror cm-s-inner cm-s-null-scroll CodeMirror-wrap" lang="python"><div style="overflow: hidden; position: relative; width: 3px; ; top: 9px; left: 8px;"><textarea autocorrect="off" autocapitalize="off" spellcheck="false" tabindex="0" style="position: absolute; bottom: -1em; padding: 0px; width: 1000px; height: 1em; outline: none;"/></div><div class="CodeMirror-scrollbar-filler" cm-not-content="true"/><div class="CodeMirror-gutter-filler" cm-not-content="true"/><div class="CodeMirror-scroll" 
tabindex="-1"><div class="CodeMirror-sizer" style="margin-left: 0px; margin-bottom: 0px; border-right-width: 0px; padding-right: 0px; padding-bottom: 0px;"><div style="position: relative; top: 0px;"><div class="CodeMirror-lines" role="presentation"><div role="presentation" style="position: relative; outline: none;"><div class="CodeMirror-measure"><pre><span>xxxxxxxxxx</span></pre></div><div class="CodeMirror-measure"/><div style="position: relative; z-index: 1;"/><div class="CodeMirror-code" role="presentation" style=""><div class="CodeMirror-activeline" style="position: relative;"><div class="CodeMirror-activeline-background CodeMirror-linebackground"/><div class="CodeMirror-gutter-background CodeMirror-activeline-gutter" style="left: 0px; width: 0px;"/><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-comment"># Assume this is defined in a Jupyter Notebook on an instance with two GPUs</span></span></pre></div><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">from</span> <span class="cm-variable">accelerate</span> <span class="cm-keyword">import</span> <span class="cm-variable">notebook_launcher</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-keyword">def</span> <span class="cm-def">train</span>(<span class="cm-operator">*</span><span class="cm-variable">args</span>):</span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-comment"># Your training function here</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"> <span class="cm-operator">...</span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span cm-text="" cm-zwsp=""> </span></span></pre><pre class=" CodeMirror-line " role="presentation"><span role="presentation" style="padding-right: 0.1px;"><span class="cm-variable">notebook_launcher</span>(<span class="cm-variable">train</span>, <span class="cm-variable">args</span><span class="cm-operator">=</span>(<span class="cm-variable">arg1</span>, <span class="cm-variable">arg2</span>), <span class="cm-variable">num_processes</span><span class="cm-operator">=</span><span class="cm-number">2</span>, <span class="cm-variable">mixed_precision</span><span class="cm-operator">=</span><span class="cm-string">"fp16"</span>)</span></pre></div></div></div></div></div><div style="position: absolute; ; width: 1px; border-bottom-width: 0px; border-bottom-style: solid; border-bottom-color: transparent; top: 176px;"/><div class="CodeMirror-gutters" style="display: none; height: 176px;"/></div></div></pre></li><li><p><code>accelerate.debug_launcher(function, args = (), num_processes = 2)</code><span>:在 </span><code>CPU</code><span> 上使用几个进程启动一个训练函数从而用于调试。</span></p><p><code>debug_launcher</code><span> 仅用于调试,不应该用于真实的训练。它将仅使用 </span><code>CPU</code><span> 。</span></p><p><span>参数:参考 </span><code>notebook_launcher</code><span> 。</span></p></li></ol><h3 id="25-logging"><span>2.5 Logging</span></h3><ol start=""><li><p><code>accelerate</code><span> 有自己的 </span><code>logging</code><span> 
### 2.5 Logging

1. `accelerate` has its own `logging` utility for use in distributed systems. To use it, replace `Python logging` with `accelerate.logging`:

   ```diff
   - import logging
   + from accelerate.logging import get_logger
   - logger = logging.getLogger(__name__)
   + logger = get_logger(__name__)
   ```

2. `accelerate.logging.get_logger(name: str, log_level: str = None)`: returns a `logging.Logger` that can be used in a multi-process environment.

   Arguments:

   - `name`: a string specifying the `logger` name.
   - `log_level`: a string specifying the `log level`. Defaults to the value of the `LOG_LEVEL` environment variable; if `LOG_LEVEL` is not set, defaults to `INFO`.
3. If a `log` should be recorded on every process, pass `main_process_only=False`; otherwise it is only recorded on the global main process.

   ```python
   from accelerate.logging import get_logger

   logger = get_logger(__name__)

   logger.info("My log", main_process_only=False)  # logged on every process
   logger.debug("My log", main_process_only=True)  # logged only on the global main process

   logger = get_logger(__name__, log_level="DEBUG")
   logger.info("My log")          # logged: INFO is above the DEBUG threshold
   logger.debug("My second log")  # also logged, since the level is set to DEBUG
   ```
### 2.6 Working with Large Models

#### 2.6.1 Dispatching and Offloading Models

1. `accelerate.init_empty_weights(include_buffers: bool = False)`: a context manager under which the model is initialized with all its `parameters` on the `meta device`, therefore creating an empty model. This is useful when merely initializing the model would already exhaust the available memory.

   Arguments: `include_buffers`: a boolean specifying whether all the `buffers` should also be placed on the `meta device` while the model is initialized.

   Example:

   ```python
   import torch.nn as nn
   from accelerate import init_empty_weights
   ```