Search before asking

  • I have searched the YOLOv8 issues and discussions and found no similar questions.
  • Question

    RuntimeError: CUDA error: device-side assert triggered
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

    I couldn't solve this error in Colab. I am running with a GPU.

    Additional

    No response

    👋 Hello @usertttwm, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

    If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

    If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

    Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

    Install

    Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

    pip install ultralytics

    Environments

    YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

  • Notebooks with free GPU: Gradient, Colab, Kaggle
  • Google Cloud Deep Learning VM. See GCP Quickstart Guide
  • Amazon Deep Learning AMI. See AWS Quickstart Guide
  • Docker Image. See Docker Quickstart Guide

    @usertttwm hello, thank you for reaching out with your issue.

    The error message you are seeing, RuntimeError: CUDA error: device-side assert triggered, typically points to a problem with the indices used in operations such as scatter or gather, most commonly an index that is out of range.

    Since you're seeing this error in Google Colab, which uses shared GPUs, the error could be caused by a faulty CUDA state. We suggest trying the following steps:

    Reset the GPU: In Google Colab, you can reset the GPU by restarting the runtime. You can do this from "Runtime" > "Restart runtime".

    Check your data: Make sure the labels for your dataset (for instance, bounding box coordinates) are correct and not out of bounds.

    Update PyTorch and CUDA: Make sure you are using up-to-date versions of both PyTorch and CUDA.

    If the problem persists after these steps, we would need more detailed information about when you are encountering this error. Is it during the training phase, or when running inference? Information about your dataset format, batch size, and other relevant details will also help us address this issue in a more informed manner.
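As a rough illustration of the "Check your data" step, here is a minimal sketch that validates a single YOLO-format detection label line (class id followed by normalized x_center, y_center, width, height). The function name `check_label_line` and the `num_classes` parameter are made up for this example and are not part of the ultralytics package.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one YOLO detection label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return ["field is not a valid number"]

    problems = []
    # Class ids must index into the model's class list: 0 .. num_classes - 1.
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    # Box values are normalized, so each should lie in [0, 1].
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


print(check_label_line("0 0.492188 0.456543 0.021484 0.024414", num_classes=9))
print(check_label_line("3 0.5 1.2 0.1 0.1", num_classes=9))
```

An empty list means the line passed these two checks; it does not guarantee the annotation itself is correct.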

    Hi @glenn-jocher. Thank you very much for your reply. I tried your advice. The error occurs during training.
    I get the same error when I run this command:
    !yolo train model=yolov8n.pt data=/content/custom_data.yaml epochs=1 imgsz=640

    Traceback (most recent call last):
    File "/usr/local/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/cfg/__init__.py", line 446, in entrypoint
    getattr(model, mode)(**overrides) # default args from model
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/model.py", line 341, in train
    self.trainer.train()
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/trainer.py", line 195, in train
    self._do_train(world_size)
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/trainer.py", line 348, in _do_train
    self.loss, self.loss_items = self.model(batch)
    File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py", line 44, in forward
    return self.loss(x, *args, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py", line 215, in loss
    return self.criterion(preds, batch)
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/utils/loss.py", line 181, in __call__
    _, target_bboxes, target_scores, fg_mask, _ = self.assigner(
    File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/utils/tal.py", line 115, in forward
    mask_pos, align_metric, overlaps = self.get_pos_mask(pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points,
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/utils/tal.py", line 136, in get_pos_mask
    align_metric, overlaps = self.get_box_metrics(pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_in_gts * mask_gt)
    File "/usr/local/lib/python3.10/dist-packages/ultralytics/utils/tal.py", line 155, in get_box_metrics
    bbox_scores[mask_gt] = pd_scores[ind[0], :, ind[1]][mask_gt] # b, max_num_obj, h*w
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
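The traceback itself suggests `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of setting it from a notebook cell follows, assuming the cell runs before the training command is launched; shell commands started from the notebook afterwards (such as the `!yolo train ...` line above) should then inherit the variable.

```python
import os

# Make CUDA kernel launches synchronous so the reported stack trace points
# at the operation that actually failed. This has to be set before CUDA is
# initialized in the process that runs the training.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

This only makes the error location more reliable; it slows training down and does not fix the underlying cause.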

    Hello everyone.
    I got the same error, and the first thing that came to mind was to check whether the labels contained a class id that is not defined for the project
    (prompted by point 2 in @glenn-jocher's explanation).

    For example, in a project with 9 classes, the class ids in the dataset are expected to be 0 through 8. I searched the label files for lines starting with '9' and found one in a .txt file:

    0 0.492188 0.456543 0.021484 0.024414
    9 0.604492 0.171875 0.089844 0.197266  
    

    When I examined it, I observed that class '9' was used instead of class '5' and corrected it.

    0 0.492188 0.456543 0.021484 0.024414
    5 0.604492 0.171875 0.089844 0.197266
    

    I trained the model again with this correction and it worked :)
    Out of 10k images, a single incorrect label cost me an hour.
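The manual search described above can be sketched as a small script that reports every label line whose class id falls outside the expected range. This is only an illustration: the function name `find_bad_class_ids`, the file name `img_0001.txt`, and the temporary directory are hypothetical, and the two label lines are the ones quoted in this comment.

```python
import tempfile
from pathlib import Path


def find_bad_class_ids(label_dir, num_classes):
    """Yield (file name, line number, class id) for ids outside 0..num_classes-1."""
    for path in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                class_id = int(fields[0])
            except ValueError:
                continue  # malformed line; a fuller checker would report it
            if not 0 <= class_id < num_classes:
                yield path.name, line_no, class_id


# Demo on a throwaway directory containing one label file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "img_0001.txt").write_text(
        "0 0.492188 0.456543 0.021484 0.024414\n"
        "9 0.604492 0.171875 0.089844 0.197266\n"
    )
    bad = list(find_bad_class_ids(tmp, num_classes=9))

for name, line_no, class_id in bad:
    print(f"{name}:{line_no}: class id {class_id}")
```

For a real dataset, `label_dir` would be the folder holding the label .txt files and `num_classes` the number of classes in the data YAML.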

    @mchtey it's great to hear that you identified the root of the issue and successfully resolved it. The presence of an incorrect class id can certainly cause the kind of CUDA error you were experiencing.

    This illustrates an important point in model training with YOLOv8 or any other machine learning framework: data integrity and proper annotation can have a significant impact on the training process. It's crucial that the labels in your dataset are accurate and correspond to the classes your model expects.

    For those who encounter similar issues, always make sure to verify the class ids in your dataset. Especially for large datasets, consider using automated scripts to detect any anomalies in annotations to save time and ensure consistency.
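One simple form such an automated check could take is a tally of class ids across all label lines, compared with the ids the model expects. The sketch below assumes the label lines have already been read into a list; `class_id_counts` is a made-up helper name, and the sample lines are the two quoted earlier in this thread.

```python
from collections import Counter


def class_id_counts(label_lines):
    """Count how often each class id token appears in an iterable of label lines."""
    counts = Counter()
    for line in label_lines:
        fields = line.split()
        if fields:
            # Keep the id as text so malformed tokens also show up in the tally.
            counts[fields[0]] += 1
    return counts


lines = [
    "0 0.492188 0.456543 0.021484 0.024414",
    "9 0.604492 0.171875 0.089844 0.197266",
]
counts = class_id_counts(lines)

# With 9 classes, the expected ids are "0" through "8".
expected = {str(i) for i in range(9)}
unexpected = sorted(set(counts) - expected)
print(counts)
print("unexpected ids:", unexpected)
```

A tally like this also makes very rare classes visible, which can hint at typos even when the id is technically in range.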

    Thank you for sharing your solution with the community. Your experience provides valuable insight for others facing similar problems. Keep up the great work, and feel free to engage with the community if you encounter any more challenges or have any insights to share. 🌟🚀

    @ianc1964 I'm glad to hear you've resolved the issue by identifying the extraneous class id in your dataset. A data checking script can indeed be a valuable tool to prevent such errors. While we don't provide custom scripts, you might find community-contributed utilities in the discussions section or by reaching out to fellow developers in the community. Happy coding! 😊👍

    @Vamsi1223 great to hear you resolved the issue by checking the input data for out-of-bounds values! It's often these small details that can trip us up. Your experience might guide others facing similar hurdles. If you encounter any more queries or have insights to share, feel free to reach out. Happy coding! 😊🚀

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
    

    0%| | 0/122 [00:00<?, ?it/s]
    C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [15,0,0], thread: [96,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
    [The same assertion is repeated for block [15,0,0], threads 64 to 127, and block [17,0,0], threads 96 to 127.]
    0%| | 0/122 [00:02<?, ?it/s]
    Traceback (most recent call last):
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\Scripts\yolo.exe\__main__.py", line 7, in <module>
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\cfg\__init__.py", line 582, in entrypoint
    getattr(model, mode)(**overrides) # default args from model
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\engine\model.py", line 667, in train
    self.trainer.train()
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\engine\trainer.py", line 198, in train
    self._do_train(world_size)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\engine\trainer.py", line 366, in _do_train
    self.loss, self.loss_items = self.model(batch)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\tasks.py", line 88, in forward
    return self.loss(x, *args, **kwargs)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\nn\tasks.py", line 267, in loss
    return self.criterion(preds, batch)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\utils\loss.py", line 221, in __call__
    _, target_bboxes, target_scores, fg_mask, _ = self.assigner(
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\utils\tal.py", line 72, in forward
    mask_pos, align_metric, overlaps = self.get_pos_mask(
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\utils\tal.py", line 94, in get_pos_mask
    align_metric, overlaps = self.get_box_metrics(pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_in_gts * mask_gt)
    File "C:\Users\Suhas\anaconda3\envs\yolov8\lib\site-packages\ultralytics\utils\tal.py", line 113, in get_box_metrics
    bbox_scores[mask_gt] = pd_scores[ind[0], :, ind[1]][mask_gt] # b, max_num_obj, h*w
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions

    I am getting this error in the Anaconda3 command prompt while training YOLOv8 object detection. How can I resolve it?

    @suhasgogi17 hello! It looks like you're encountering a device-side assert triggered error during training, which often indicates an issue with your data, such as out-of-bounds class indices. Here are a couple of steps you can try to resolve this:

    Verify Class Indices: Ensure that all class indices in your labels are within the valid range [0, num_classes - 1]. If your dataset has N classes, the class indices should be from 0 to N-1.

    Check Labels for Anomalies: Sometimes, incorrect formatting or stray characters in the label files can lead to errors. Make sure your labels are correctly formatted.

    Run with CUDA_LAUNCH_BLOCKING=1: This suggestion helps in getting a precise location of the error. You can set this environment variable in your Anaconda prompt before running your training command:

    set CUDA_LAUNCH_BLOCKING=1
    yolo train model=yolov8n.pt data=/content/custom_data.yaml epochs=1 imgsz=640

    This could slow down your training but will give a clearer error message, pointing you directly to the cause of the assertion.
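    The first two checks above can be automated. The following is a minimal sketch, assuming the labels are YOLO-format .txt files (one object per line: class index followed by normalized coordinates). The function name find_bad_labels and the directory layout are illustrative assumptions, not part of the Ultralytics API.

```python
import math
from pathlib import Path


def find_bad_labels(label_dir, num_classes):
    """Return (file, line_no, reason) tuples for suspicious YOLO label lines."""
    problems = []
    for path in sorted(Path(label_dir).rglob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            parts = line.split()
            if not parts:
                continue  # blank lines are harmless
            if len(parts) < 5:
                problems.append((str(path), line_no, "expected at least 5 fields"))
                continue
            try:
                values = [float(v) for v in parts]
            except ValueError:
                problems.append((str(path), line_no, "non-numeric value"))
                continue
            if not all(math.isfinite(v) for v in values):
                problems.append((str(path), line_no, "non-finite value"))
                continue
            cls, coords = values[0], values[1:]
            # Class index must be a whole number in [0, num_classes - 1].
            if cls != int(cls) or not 0 <= cls < num_classes:
                reason = f"class index {parts[0]} outside [0, {num_classes - 1}]"
                problems.append((str(path), line_no, reason))
            elif any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append((str(path), line_no, "coordinate outside [0, 1]"))
    return problems
```

    Calling find_bad_labels("path/to/labels", num_classes=N), with N taken from your data YAML, and printing the result lists each offending file and line. An empty list does not prove the dataset is clean, but a non-empty one is a strong hint about where the index assertion comes from.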

    If you've checked the above and are still facing issues, you might want to share a snippet of your custom dataset's label file. Sometimes the devil is in the details, and a second pair of eyes can help spot what might be missed. 😊

    Hello,
    I tried the steps in the solution above, but I am still getting a CUDA error. As Glenn suggested, I am posting my error output so it can be debugged in detail.

    Traceback (most recent call last):
    File "/home/kwo1kor/llama_sample_code/llama/example_text_completion.py", line 69, in <module>
    fire.Fire(main)
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
    File "/home/kwo1kor/llama_sample_code/llama/example_text_completion.py", line 32, in main
    generator = Llama.build(
    File "/home/kwo1kor/llama_sample_code/llama/llama/generation.py", line 120, in build
    model = Transformer(model_args)
    File "/home/kwo1kor/llama_sample_code/llama/llama/model.py", line 464, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
    File "/home/kwo1kor/llama_sample_code/llama/llama/model.py", line 397, in __init__
    self.feed_forward = FeedForward(
    File "/home/kwo1kor/llama_sample_code/llama/llama/model.py", line 358, in __init__
    self.w1 = ColumnParallelLinear(
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/fairscale/nn/model_parallel/layers.py", line 262, in __init__
    self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features))
    RuntimeError: CUDA error: unknown error
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

    [2024-04-05 18:45:13,720] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1725) of binary: /usr/bin/python3
    Traceback (most recent call last):
    File "/home/kwo1kor/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
    File "/home/kwo1kor/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    example_text_completion.py FAILED

    Failures:
    <NO_OTHER_FAILURES>

    Root Cause (first observed failure):
    time : 2024-04-05_18:45:13
    host : BANI-C-003XC.bmh.apac.bosch.com
    rank : 0 (local_rank: 0)
    exitcode : 1 (pid: 1725)
    error_file: <N/A>
    traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

    Hello @mchtey
    I am using CUDA 11.8 with PyTorch 2.2.0.
    I installed the CUDA toolkit from the NVIDIA site, but I am not sure about cuDNN.
    Does cuDNN need to be installed separately?

    Hello @kwo1kor! 😊 cuDNN is a separate library from the CUDA toolkit, so installing the toolkit alone does not install it. Note, though, that the official PyTorch pip and conda binaries bundle their own CUDA runtime and cuDNN, so a system-wide cuDNN install is usually not required for PyTorch itself. Given you're facing issues and observing fluctuating GPU memory usage, I'd recommend ensuring that your driver, CUDA, and cuDNN versions are compatible with PyTorch 2.2.0. Sometimes, specific combinations work better together.

    You can verify your installation and compatibility with the following short code snippet in your Python environment:

    import torch
    print(torch.__version__)
    print(torch.cuda.is_available())

    This should confirm whether PyTorch can successfully use CUDA on your setup. If torch.cuda.is_available() returns False, there might still be some mismatch or configuration issues with CUDA and CuDNN. In such cases, revisiting the installation guidelines for CUDA, CuDNN, and checking their compatibility versions with PyTorch 2.2.0 would be the path forward. 🚀
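    A slightly fuller check can also report which CUDA and cuDNN versions the PyTorch build itself sees, and run one small kernel to confirm the GPU actually executes work. This is a minimal sketch; the helper name describe_torch_cuda is hypothetical, and it returns early if torch is not installed.

```python
def describe_torch_cuda():
    """Collect version and device facts as a dict; does not raise if torch is missing."""
    info = {}
    try:
        import torch
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["built_with_cuda"] = torch.version.cuda   # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # Run one real kernel and wait for it, so a broken driver or runtime fails here.
        x = torch.ones(8, device="cuda")
        torch.cuda.synchronize()
        info["sum_check"] = float(x.sum().item())
    return info


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

    If cuda_available is True but the small kernel raises, the problem is more likely in the driver or runtime than in the model code.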

    Let's keep the dialogue open if you need further assistance!

    Hello @glenn-jocher ,
    I ran the command that you mentioned, please find the result as below :
    kwo1kor@BANI-C-003XC:~/llama_sample_code/llama$ nvidia-smi
    Mon Apr 8 13:06:59 2024
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 520.61.03    Driver Version: 522.06       CUDA Version: 11.8     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Quadro T2000        On   | 00000000:01:00.0  On |                  N/A |
    | N/A   55C    P8     3W /  N/A |    269MiB /  4096MiB |      1%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

    Python 3.9.5 (default, Nov 23 2021, 15:27:38)

    import torch
    print(torch.__version__)
    2.2.0+cu118
    print(torch.cuda.is_available())
    import torch; print(torch.backends.cudnn.version())

    I also checked the CUDA and cuDNN versions against https://docs.nvidia.com/deeplearning/cudnn/reference/support-matrix.html. Both are compatible with torch 2.2.0.

    I'm not sure what else can be checked here.
    fyi @mchtey

    Hello @kwo1kor! 🌟

    It looks like you've done a solid job verifying CUDA, CuDNN, and PyTorch installation. Since everything seems set up correctly and compatibility isn't the issue, we might need to dig a bit deeper.

    One small suggestion - could you check the environment variable settings related to CUDA and CuDNN to ensure they're pointing to the correct paths? Sometimes, discrepancies there can cause unexpected behaviors.

    echo $CUDA_HOME
    echo $CUDNN_HOME

    Also, trying to isolate the error by running a smaller or different PyTorch model (if you haven't done so already) could help rule out any model-specific issues.

    Keep us posted on your findings! 🛠️

    thanks for response!

    Please find the output of above command :
    kwo1kor@BANI-C-003XC:~$ echo $CUDA_HOME

    kwo1kor@BANI-C-003XC:~$ echo $CUDNN_HOME

    kwo1kor@BANI-C-003XC:~$

    I do not see any environment variables set for CUDA or cuDNN.
    Do you think I should export them manually?

    Hello @kwo1kor! 😊

    It looks like your CUDA and CuDNN environment variables aren't set, which might be fine as some installations don't necessarily require them to be explicitly set. However, setting them could help in ensuring that all parts of your system are aware of where to find the CUDA and CuDNN libraries.

    You could try setting these environment variables like so (please adjust the paths according to your actual installation directories):

    export CUDA_HOME=/usr/local/cuda
    export CUDNN_HOME=/usr/local/cuda

    After setting these, you might want to add CUDA to your PATH and LD_LIBRARY_PATH as well:

    export PATH=$CUDA_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

    You can add these lines to your ~/.bashrc or ~/.bash_profile to make them persistent across sessions. After adding them, don't forget to source your profile or log out and log back in.
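    To confirm the exports took effect in the environment your Python process actually sees, a small standard-library check can help. This is a sketch only; the function name cuda_env_report is hypothetical, and the variables it reads are the ones discussed above.

```python
import os
import shutil


def cuda_env_report(environ=os.environ):
    """Summarize CUDA-related environment settings visible to this process."""
    names = ("CUDA_HOME", "CUDNN_HOME", "LD_LIBRARY_PATH")
    report = {name: environ.get(name) for name in names}
    # Location of the nvcc compiler on PATH, or None if it cannot be found.
    report["nvcc_on_path"] = shutil.which("nvcc", path=environ.get("PATH"))
    cuda_home = report["CUDA_HOME"]
    # True only when CUDA_HOME is set and points at an existing directory.
    report["cuda_home_exists"] = bool(cuda_home) and os.path.isdir(cuda_home)
    return report


for key, value in cuda_env_report().items():
    print(f"{key}: {value}")
```

    If CUDA_HOME prints as None after editing ~/.bashrc, the shell that launched Python probably has not sourced the updated profile yet.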

    Let's hope this helps! Keep us updated on your progress. 🚀

    Hello @glenn-jocher ,

    I exported the CUDA home path, but I am still getting the same error.

    In my case, the program can access the NVIDIA GPU at startup, but for some reason it then loses access.

    Here is the log from the failed run:

    initializing model parallel with size 1
    initializing ddp with size 1
    initializing pipeline with size 1
    /home/kwo1kor/.local/lib/python3.8/site-packages/torch/__init__.py:696: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:451.)
    _C._set_default_tensor_type(t)
    Traceback (most recent call last):
    File "example_text_completion.py", line 69, in <module>
    fire.Fire(main)
    File "/home/kwo1kor/.local/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
    File "/home/kwo1kor/.local/lib/python3.8/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
    File "/home/kwo1kor/.local/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
    File "example_text_completion.py", line 32, in main
    generator = Llama.build(
    File "/home/kwo1kor/AI_poc/llama/llama/generation.py", line 119, in build
    model = Transformer(model_args)
    File "/home/kwo1kor/AI_poc/llama/llama/model.py", line 443, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
    File "/home/kwo1kor/AI_poc/llama/llama/model.py", line 375, in __init__
    self.attention = Attention(args)
    File "/home/kwo1kor/AI_poc/llama/llama/model.py", line 228, in __init__
    self.wo = RowParallelLinear(
    File "/home/kwo1kor/.local/lib/python3.8/site-packages/fairscale/nn/model_parallel/layers.py", line 349, in __init__
    self.weight = Parameter(torch.Tensor(self.out_features, self.input_size_per_partition))
    RuntimeError: CUDA error: unknown error
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

    It seems to be getting stuck in one of the fairscale modules in the Python library. I tried to debug into it but had no luck.

    Any clue how we should proceed with this issue?

    Hey @kwo1kor! 👋

    Thanks for sharing more details. It seems like the issue might be related to a transient problem with GPU access due to the specifics of how CUDA memory allocation works or a potential compatibility issue with the fairscale library.

    A few immediate thoughts:

  • Ensure your fairscale library is up-to-date. Sometimes, compatibility issues get resolved in newer versions.
  • Try isolating a minimal example that triggers the error, if possible. This can sometimes provide clearer insight into when exactly the GPU access is lost.
  • Monitor the GPU memory usage closely using nvidia-smi before and during the execution to see if there are any abrupt changes.
  • If none of these provide a clear path forward, consider raising an issue on the fairscale GitHub page with a detailed explanation and any logs you have. The community or maintainers there might have seen similar issues or could offer more fairscale-specific advice. 🚀
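    As one possible starting point for such a minimal example, the sketch below allocates a few tensors on the GPU outside of fairscale and prints free memory before each step. The helper name probe_gpu_allocation and the tensor sizes are arbitrary assumptions. The nvidia-smi output earlier in this thread shows a 4096 MiB card, so it may be worth seeing whether allocation fails as memory runs low; that is a guess, not a diagnosis.

```python
def probe_gpu_allocation(rows=4096, cols=4096, steps=4):
    """Allocate a few CUDA tensors, printing free memory before each allocation."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    tensors = []  # keep references so memory is not released between steps
    for step in range(steps):
        free_b, total_b = torch.cuda.mem_get_info()
        print(f"step {step}: {free_b / 2**20:.0f} MiB free of {total_b / 2**20:.0f} MiB")
        tensors.append(torch.empty(rows, cols, device="cuda"))
        # Wait for the device so any failure surfaces at this step.
        torch.cuda.synchronize()
    return "ok"


print(probe_gpu_allocation())
```

    If this plain-PyTorch loop also fails, the fairscale layers are probably not the root cause, and the log from this script is a compact thing to attach to a bug report.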

    Hope this nudges you in the right direction! Keep us posted.

    Hello again,

    I am facing a similar issue with YOLOv8 training using a custom dataset. I have 11195 training images (80%) and 4298 validation images (20%). I used YOLOv8n.pt file to train my model as follows:

    import os

    # Set before CUDA is initialized so kernel launches run synchronously.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch
    import ultralytics
    from ultralytics import YOLO

    if __name__ == "__main__":
        ultralytics.checks()  # prints an environment summary

        if torch.cuda.is_available():
            print("GPU is available.")
        else:
            print("GPU is NOT available.")

        x = torch.rand(5, 3)
        print(x)

        model = YOLO("yolov8n.pt")
        result = model.train(data="data.yaml", epochs=10, imgsz=640)
    

    The output of the model is:

    GPU is available.
    tensor([[0.8128, 0.2631, 0.8411],
    [0.8880, 0.2694, 0.7383],
    [0.7594, 0.3813, 0.3632],
    [0.9111, 0.4940, 0.1640],
    [0.8946, 0.5231, 0.4115]])
    Ultralytics YOLOv8.2.2 🚀 Python-3.8.19 torch-2.3.0 CUDA:0 (NVIDIA GeForce RTX 3090, 24576MiB)
    parameters (no error)
    optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
    optimizer: AdamW(lr=0.001667, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
    Image sizes 640 train, 640 val
    Using 8 dataloader workers
    Logging results to runs\detect\train64
    Starting training for 10 epochs...
    Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/10      2.19G      1.861      3.133      1.723          9        640: 100%|██████████| 1075/1075 [03:49<00:00,
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 135/135 [00:
                   all       4298       5176      0.397      0.356      0.306      0.138
    

    ... (no errors)

    Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
    5/10 2.11G 1.722 1.735 1.61 20 640: 30%|██▉ | 319/1075 [00:56<02:08, C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [96,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [97,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [98,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [99,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [100,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [101,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [102,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [103,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [104,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [105,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [106,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [107,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [108,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [109,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [110,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [111,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [112,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [113,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [114,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [115,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [116,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [117,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [118,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [119,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [120,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [121,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [122,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [123,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [124,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [125,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [126,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [38,0,0], thread: [127,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [96,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [97,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [98,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [99,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [100,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [101,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [102,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [103,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [104,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [105,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [106,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [107,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [108,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [109,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [110,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [111,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [112,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [113,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [114,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [115,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [116,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [117,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [118,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [119,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [120,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [121,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [122,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [123,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [124,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [125,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [126,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [34,0,0], thread: [127,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [32,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [33,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [34,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [35,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [36,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [37,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [38,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [39,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [40,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [41,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [42,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [43,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [44,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [45,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [46,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [47,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [48,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [49,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [50,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [51,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [52,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [53,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [54,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [55,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [56,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [57,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [58,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [59,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [60,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [61,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [62,0,0] Assertion false failed.
    C:/cb/pytorch_1000000000000/work\c10/core/DynamicCast.h:78: block: [60,0,0], thread: [63,0,0] Assertion false failed.
    5/10 2.11G 1.722 1.735 1.61 20 640: 30%|██▉ | 319/1075 [00:56<02:14,
    Traceback (most recent call last):
    File "fire_train.py", line 30, in <module>
    result = model.train(data='data.yaml', epochs=10, imgsz=640)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\engine\model.py", line 673, in train
    self.trainer.train()
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\engine\trainer.py", line 199, in train
    self._do_train(world_size)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\engine\trainer.py", line 371, in _do_train
    self.loss, self.loss_items = self.model(batch)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\tasks.py", line 88, in forward
    return self.loss(x, *args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\tasks.py", line 266, in loss
    preds = self.forward(batch["img"]) if preds is None else preds
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\tasks.py", line 89, in forward
    return self.predict(x, *args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\tasks.py", line 107, in predict
    return self._predict_once(x, profile, visualize, embed)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\tasks.py", line 128, in _predict_once
    x = m(x) # run
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\modules\head.py", line 47, in forward
    x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\ultralytics\nn\modules\conv.py", line 50, in forward
    return self.act(self.bn(self.conv(x)))
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
    File "C:\Users\USER\anaconda3\envs\yolotest\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: CUDA error: device-side assert triggered
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

    SYSTEM SPECS
    NVIDIA GeForce RTX 3090, 64GB RAM.

    QUESTIONS

  • What might be causing this issue?
  • How to troubleshoot this error, like where should I look for it?
  • Is there any link to a requirements.txt file which I can run to install YOLOv8 environment?
  • What is a good ratio of object(s) to be detected and background (BG) images?
  • In case of COCO dataset that YOLOv8 is already trained for, there are 80 classes. What is the percentage of object-BG for each of the 80 classes?
  • Looking forward to your support as always.
    Thanks.

    @mariam-162 hey there! 👋

    It looks like you're encountering a device-side assert triggered error during training with a custom dataset. Let's tackle this step by step.

    Cause of the issue: The error might stem from an inconsistency between your dataset labels and the model's expected label range. It's common when class indices in your dataset exceed the model's range or if there's an incorrect label format.

    Troubleshooting:

  • First, run your script with CUDA_LAUNCH_BLOCKING=1. This makes CUDA kernel launches synchronous, so the traceback points at the operation that actually failed instead of a later, unrelated one.
  • Verify your dataset labels are correctly indexed from 0 to n-1 where n is the number of classes.
  • Review your data.yaml file to ensure it matches the structure of your custom dataset and that class indices are correctly defined.

    YOLOv8 environment setup: While I don't have a specific requirements.txt file for YOLOv8 at hand, you can get all the necessary packages by installing Ultralytics YOLO via pip (pip install ultralytics) and then verifying that PyTorch is installed with CUDA support matching your system's specs.
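    The label check from the troubleshooting steps above can be automated. The following is a minimal sketch, assuming one YOLO-format .txt file per image inside a single labels folder; the function name and the folder path in the usage note are hypothetical and not part of Ultralytics.

```python
from pathlib import Path


def find_bad_class_ids(label_dir, num_classes):
    """Return (file name, line number, class id) for every label outside 0..num_classes-1."""
    bad = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(path.read_text().splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            class_id = int(float(fields[0]))  # tolerate ids written as "1.0"
            if not 0 <= class_id < num_classes:
                bad.append((path.name, line_no, class_id))
    return bad
```

    For example, find_bad_class_ids("path/to/labels/train", num_classes=3) would list every line whose class id is not 0, 1 or 2. An empty result only rules out this one cause; it does not prove the labels are otherwise valid.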

    Object-to-BG ratio: A balanced dataset is key. While there's no one-size-fits-all ratio, ensuring a varied representation of objects across your images is crucial, and too many BG images might skew the learning process. As a reference point, the Ultralytics Tips for Best Training Results guide suggests roughly 0-10% background images to help reduce false positives, so keep images with objects clearly in the majority and adjust based on your model's performance.
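    To see where a dataset currently stands, the two groups can be counted directly. This is a rough sketch under two assumptions: each label file shares its image's file stem, and an image with a missing or empty label file counts as background. The function name and the suffix set are illustrative choices.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_object_and_background(image_dir, label_dir):
    """Count images that have at least one label line vs. images that have none."""
    with_objects = 0
    background = 0
    for image in Path(image_dir).iterdir():
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        label = Path(label_dir) / (image.stem + ".txt")
        has_objects = label.is_file() and label.read_text().strip() != ""
        if has_objects:
            with_objects += 1
        else:
            background += 1
    return with_objects, background
```

    The returned pair (images with objects, background images) gives the ratio to compare before and after adding or removing background images.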

    COCO dataset statistics: The COCO dataset does not have a fixed object-to-background ratio. It's a diverse dataset with varying numbers of objects per image. For your dataset, focus on balanced representation and diversity of examples for each class to ensure robust learning.

    Remember, this error often masks the real issue, which might not be directly related to CUDA but rather to how your data is prepared and fed to the model. If after checking the dataset the problem persists, consider reducing the complexity of your debugging scenario: try with fewer classes or even synthetic data known to work well to narrow down the problem.
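    In a Colab or Kaggle notebook, CUDA_LAUNCH_BLOCKING can be set from Python rather than from a shell. The sketch below assumes it runs in the first cell, before torch or ultralytics is imported, because the variable has to be in place before CUDA is initialised; the helper name is made up for illustration.

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Request synchronous CUDA kernel launches so errors surface at the failing call."""
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Run this before importing torch or ultralytics.
enable_sync_cuda_errors()
```

    Commands started from the notebook afterwards inherit the variable. Training will run slower while it is set, so it is worth removing once the failing operation has been identified.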

    Stay curious and keep experimenting! 🚀

    Thanks for your detailed response.
    For starters, I will check all files in my labels folders (both train and val) for consistency. I'll verify them for the format:

    class_id center_x center_y width height

    where,
    class_id ranges from 0 to n-1 (n being the total number of classes defined in config.yaml)
    xywh are normalized pixel values (ranging from 0 to 1) obtained by dividing x (and width) by the image's width and dividing y (and height) by the image's height.
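    As a worked example of that normalization, here is a small sketch that turns a pixel-space box into one label line. It assumes the box is given as corner coordinates (x_min, y_min, x_max, y_max); the function name is hypothetical.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to 'class_id center_x center_y width height'."""
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


# A 200x200 px box with corners (100, 200) and (300, 400) in a 640x480 image
print(to_yolo_line(1, 100, 200, 300, 400, 640, 480))
# prints: 1 0.312500 0.625000 0.312500 0.416667
```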

    Will update you if this works.

    Hi there! 👋

    That sounds like a great plan! Checking your label files for consistency and ensuring they adhere to the expected format is definitely a solid first step. Your understanding of the label format and normalization seems spot on. 🌟

    Just a quick tip: remember to ensure that your config.yaml also correctly lists all class names to match your class IDs.

    Looking forward to hearing about your progress. Don't hesitate to reach out if you have more questions or need further assistance. Happy training!

    I have encountered the same problem. Thank you for your suggestion!

    print(torch.cuda.current_device())
    print(torch.cuda.get_device_name(torch.cuda.current_device()))
    print(torch.cuda.get_device_properties(torch.cuda.current_device()))

    nvcc: NVIDIA (R) Cuda compiler driver
    Built on Mon_Apr__3_17:16:06_PDT_2023
    Cuda compilation tools, release 12.1, V12.1.105
    Build cuda_12.1.r12.1/compiler.32688072_0
    Tue May 14 04:10:36 2024
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  Tesla P100-PCIE-16GB           Off | 00000000:00:04.0 Off |                    0 |
    | N/A   39C    P0              37W / 250W |      2MiB / 16384MiB |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+
    Tesla P100-PCIE-16GB
    _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16276MB, multi_processor_count=56)

    labels content:

    1 0.3909375 0.36833333333333335 0.065625 0.030833333333333334 0.36 0.35333333333333333 0.421875 0.35333333333333333 0.42375 0.3825 0.358125 0.38416666666666666
    

    data yaml:

    data = '''train: /kaggle/input/crpd-data/crpd_data/train
    val: /kaggle/input/crpd-data/crpd_data/valid
    kpt_shape: [4, 2]
    flip_idx: [0,1,2,3]
    nc: 3
    names:
      0: single
      1: double
      2: others
    '''
    # Write to file
    with open('custom_data.yaml', 'w') as file:
        file.write(data)
    # Print the file contents
    with open('custom_data.yaml', 'r') as file:
        print(file.read())
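    With kpt_shape: [4, 2] and nc: 3, each pose label line should hold 1 class id, 4 box values and 4 x 2 keypoint values, 13 numbers in total. The sketch below checks a single line against those expectations; the function name is hypothetical, and it assumes that when a keypoint has a third value it is a visibility flag that is exempt from the 0-1 range check.

```python
def check_pose_label(line, num_classes, kpt_shape):
    """Return a list of problems found in one YOLO pose label line."""
    fields = line.split()
    num_kpts, dims = kpt_shape
    expected = 5 + num_kpts * dims  # class id + 4 box values + keypoint values
    if len(fields) != expected:
        return [f"expected {expected} values, found {len(fields)}"]
    problems = []
    class_id = int(float(fields[0]))
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    values = [float(v) for v in fields[1:]]
    box, kpts = values[:4], values[4:]
    # Only x and y of each keypoint are coordinates; skip any visibility flag.
    coords = box + [v for i, v in enumerate(kpts) if i % dims < 2]
    if any(not 0.0 <= v <= 1.0 for v in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


# The sample label line posted above
line = (
    "1 0.3909375 0.36833333333333335 0.065625 0.030833333333333334 "
    "0.36 0.35333333333333333 0.421875 0.35333333333333333 "
    "0.42375 0.3825 0.358125 0.38416666666666666"
)
print(check_pose_label(line, num_classes=3, kpt_shape=(4, 2)))  # prints []
```

    The posted sample line passes these checks, so on its own it does not explain the assertion. Running the same check over every file in the train and valid label folders would show whether any other line breaks the expected layout.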

    train code :

    !yolo train model=yolov8m-pose.pt data=/kaggle/working/custom_data.yaml  epochs=25 batch=12 patience=2 save_period=10
    Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-pose.pt to 'yolov8m-pose.pt'...
    100%|██████████████████████████████████████| 50.8M/50.8M [00:00<00:00, 71.8MB/s]
    Ultralytics YOLOv8.2.15 🚀 Python-3.10.13 torch-2.1.2 CUDA:0 (Tesla P100-PCIE-16GB, 16276MiB)
    engine/trainer: task=pose, mode=train, model=yolov8m-pose.pt, data=/kaggle/working/custom_data.yaml, epochs=25, time=None, patience=2, batch=12, imgsz=640, save=True, save_period=10, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/pose/train
    Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
    100%|████████████████████████████████████████| 755k/755k [00:00<00:00, 4.25MB/s]
    2024-05-14 04:28:50.612722: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
    2024-05-14 04:28:50.612876: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
    2024-05-14 04:28:50.751087: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
    Overriding model.yaml kpt_shape=[17, 3] with kpt_shape=[4, 2]
    Overriding model.yaml nc=1 with nc=3
                       from  n    params  module                                       arguments                     
      0                  -1  1      1392  ultralytics.nn.modules.conv.Conv             [3, 48, 3, 2]                 
      1                  -1  1     41664  ultralytics.nn.modules.conv.Conv             [48, 96, 3, 2]                
      2                  -1  2    111360  ultralytics.nn.modules.block.C2f             [96, 96, 2, True]             
      3                  -1  1    166272  ultralytics.nn.modules.conv.Conv             [96, 192, 3, 2]               
      4                  -1  4    813312  ultralytics.nn.modules.block.C2f             [192, 192, 4, True]           
      5                  -1  1    664320  ultralytics.nn.modules.conv.Conv             [192, 384, 3, 2]              
      6                  -1  4   3248640  ultralytics.nn.modules.block.C2f             [384, 384, 4, True]           
      7                  -1  1   1991808  ultralytics.nn.modules.conv.Conv             [384, 576, 3, 2]              
      8                  -1  2   3985920  ultralytics.nn.modules.block.C2f             [576, 576, 2, True]           
      9                  -1  1    831168  ultralytics.nn.modules.block.SPPF            [576, 576, 5]                 
     10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
     11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
     12                  -1  2   1993728  ultralytics.nn.modules.block.C2f             [960, 384, 2]                 
     13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
     14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
     15                  -1  2    517632  ultralytics.nn.modules.block.C2f             [576, 192, 2]                 
     16                  -1  1    332160  ultralytics.nn.modules.conv.Conv             [192, 192, 3, 2]              
     17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
     18                  -1  2   1846272  ultralytics.nn.modules.block.C2f             [576, 384, 2]                 
     19                  -1  1   1327872  ultralytics.nn.modules.conv.Conv             [384, 384, 3, 2]              
     20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
     21                  -1  2   4207104  ultralytics.nn.modules.block.C2f             [960, 576, 2]                 
     22        [15, 18, 21]  1   4339057  ultralytics.nn.modules.head.Pose             [3, [4, 2], [192, 384, 576]]  
    YOLOv8m-pose summary: 320 layers, 26419681 parameters, 26419665 gradients, 81.2 GFLOPs
    Transferred 475/517 items from pretrained weights
    TensorBoard: Start with 'tensorboard --logdir runs/pose/train', view at http://localhost:6006/
    wandb: Currently logged in as: s2667199938 (shusen2000). Use `wandb login --relogin` to force relogin
    wandb: wandb version 0.17.0 is available!  To upgrade, please run:
    wandb:  $ pip install wandb --upgrade
    wandb: Tracking run with wandb version 0.16.6
    wandb: Run data is saved locally in /kaggle/working/wandb/run-20240514_042901-gd9o1lgc
    wandb: Run `wandb offline` to turn off syncing.
    wandb: Syncing run train
    wandb: ⭐️ View project at https://wandb.ai/shusen2000/YOLOv8
    wandb: 🚀 View run at https://wandb.ai/shusen2000/YOLOv8/runs/gd9o1lgc
    Freezing layer 'model.22.dfl.conv.weight'
    AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
    Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'...
    100%|██████████████████████████████████████| 6.23M/6.23M [00:00<00:00, 22.5MB/s]
    AMP: checks passed ✅
    train: Scanning /kaggle/input/crpd-data/crpd_data/train/labels... 25000 images, 
    train: WARNING ⚠️ Cache directory /kaggle/input/crpd-data/crpd_data/train is not writeable, cache not saved.
    albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
    val: Scanning /kaggle/input/crpd-data/crpd_data/valid/labels... 6250 images, 0 b
    val: WARNING ⚠️ Cache directory /kaggle/input/crpd-data/crpd_data/valid is not writeable, cache not saved.
    Plotting labels to runs/pose/train/labels.jpg... 
    optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
    optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 83 weight(decay=0.0), 93 weight(decay=0.00046875), 92 bias(decay=0.0)
    TensorBoard: model graph visualization added ✅
    Image sizes 640 train, 640 val
    Using 4 dataloader workers
    Logging results to runs/pose/train
    Starting training for 25 epochs...
          Epoch    GPU_mem   box_loss  pose_loss  kobj_loss   cls_loss   dfl_loss  Instances       Size
           1/25      7.15G      5.591      8.707          0      19.16      2.682
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [146,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [145,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [722,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [134,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [737,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [736,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [733,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [734,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [34,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [35,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [36,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [37,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [38,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [39,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [40,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [41,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [42,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [43,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [44,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [45,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [46,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [47,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [48,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [49,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [50,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [51,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [52,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [53,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [54,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [55,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [56,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [58,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [59,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [60,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [61,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [62,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [97,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [98,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [99,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [100,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [101,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [102,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [103,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [104,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [105,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [106,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [107,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [108,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [109,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [110,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [111,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [112,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [113,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [114,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [115,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [116,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [117,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [118,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [119,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [120,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [121,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [122,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [123,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [124,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [125,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [126,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    /usr/local/src/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [728,0,0], thread: [127,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
           1/25      7.15G      5.591      8.707          0      19.16      2.682   
    Traceback (most recent call last):
      File "/opt/conda/bin/yolo", line 8, in <module>
        sys.exit(entrypoint())
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/cfg/__init__.py", line 583, in entrypoint
        getattr(model, mode)(**overrides)  # default args from model
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/engine/model.py", line 674, in train
        self.trainer.train()
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 199, in train
        self._do_train(world_size)
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 371, in _do_train
        self.loss, self.loss_items = self.model(batch)
      File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 88, in forward
        return self.loss(x, *args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 267, in loss
        return self.criterion(preds, batch)
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/utils/loss.py", line 476, in __call__
        _, target_bboxes, target_scores, fg_mask, target_gt_idx = self.assigner(
      File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/utils/tal.py", line 72, in forward
        mask_pos, align_metric, overlaps = self.get_pos_mask(
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/utils/tal.py", line 94, in get_pos_mask
        align_metric, overlaps = self.get_box_metrics(pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_in_gts * mask_gt)
      File "/opt/conda/lib/python3.10/site-packages/ultralytics/utils/tal.py", line 113, in get_box_metrics
        bbox_scores[mask_gt] = pd_scores[ind[0], :, ind[1]][mask_gt]  # b, max_num_obj, h*w
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
              

    @YangShusen2001 hi there! It looks like you're encountering a device-side assertion error during your training on Kaggle with the Tesla P100 GPU. This error typically suggests there may be an indexing issue within your dataset, especially within the keypoints or labels you are using for training.

    Here are a few steps to help you troubleshoot:

    Check the Label Files: Verify your label file's format matches the expected input for the YOLOv8 model. Ensure all class ids, bounding box coordinates, and keypoints are within the appropriate ranges (class ids: 0 to nc-1; coordinates and dimensions: 0.0 to 1.0).

    Review Data Loader Configuration: Ensure that your custom_data.yaml correctly aligns with the data specifics, especially kpt_shape and flip_idx. Make sure these are correctly configured according to your dataset particulars.

    Debugging Tip: Run your script with the environment variable CUDA_LAUNCH_BLOCKING=1 so that CUDA kernels run synchronously and the stack trace points at the operation that actually triggers the assertion. The traceback will not name the offending label file, but it confirms which indexing operation fails.

    Examine the Dataset: Sometimes, errors occur due to hidden problems in the dataset, such as corrupted images, incorrect annotations, or mismatched data types in keypoints. Ensure all data entries are clean and valid.

    By closely inspecting and verifying these elements, you should be able to identify and resolve the assertion error. If issues persist after checking, consider simplifying your dataset to a few samples to test the integrity of both the dataset and the model's response to a controlled input.
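
    The range checks from the first step can be sketched in plain Python. This is a minimal sketch, assuming detection-style labels whose first five fields are `class x_center y_center width height`; `check_label_line` and its `nc` argument are hypothetical names, not part of Ultralytics, and any extra fields (keypoints, polygon points) are ignored by this version.

```python
def check_label_line(line: str, nc: int) -> list[str]:
    """Return a list of problems found in one YOLO-format label line."""
    parts = line.split()
    if not parts:
        return []  # blank lines carry no annotation
    if len(parts) < 5:
        return [f"expected at least 5 fields, got {len(parts)}"]

    problems = []
    # Class id must be an integer in [0, nc - 1].
    try:
        cls = int(parts[0])
        if not 0 <= cls < nc:
            problems.append(f"class id {cls} outside 0..{nc - 1}")
    except ValueError:
        problems.append(f"class id {parts[0]!r} is not an integer")

    # Box centre and size must be normalized to [0.0, 1.0].
    for name, raw in zip(("x_center", "y_center", "width", "height"), parts[1:5]):
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{name} {raw!r} is not a number")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} {value} outside 0.0..1.0")
    return problems


# Example: a dataset with 9 classes only allows class ids 0..8.
print(check_label_line("9 0.604492 0.171875 0.089844 0.197266", nc=9))
print(check_label_line("5 0.604492 0.171875 0.089844 0.197266", nc=9))
```

    An empty list means the line passed both checks; running the function over every line of every label file narrows the search to the exact annotation that is out of range.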

    My problem has been solved. There were many strange class indices in my labels, such as 89 and 56. Thank you very much!

    Hey @YangShusen2001! 🎉

    I'm thrilled to hear that the issue was resolved and that the guidance helped you pinpoint the problem with the class indices in your labels. It's great to see your training back on track! If you have any more questions or run into further issues down the road, feel free to reach out. Happy training with YOLOv8! 🚀

    Hello Everyone,

    I encountered the same issue. First, I verified the torch version as suggested by @glenn-jocher, but there was no problem with torch:

    import torch
    print(torch.__version__)
    print(torch.cuda.is_available())

    Next, I examined my dataset, which consists of 75 classes labeled from 0 to 74. To ensure the integrity of the annotations, I employed a script to detect any potential errors in the annotation files. Using this script, I successfully identified the problematic label file. The issue was caused by an object labeled as '75', even though the valid label range is only up to 74. After correcting this labeling error, the issue was resolved, and everything is functioning correctly now.

    Below is the script I used to identify annotation files with invalid labels:

    import os
    import glob
    # Define the path to the directory containing the YOLO format annotation files
    annotation_dir = "annotation dir path/labels/"
    # Define the range of valid class labels
    valid_labels = range(75)
    # Find all .txt annotation files in the directory
    annotation_files = glob.glob(os.path.join(annotation_dir, '*.txt'))
    # Initialize a list to keep track of files with errors
    files_with_errors = []
    # Process each annotation file
    for annotation_file in annotation_files:
        with open(annotation_file, 'r') as file:
            lines = file.readlines()
            for line in lines:
                parts = line.strip().split()
                if len(parts) > 0:
                    try:
                        label = int(parts[0])
                        if label not in valid_labels:
                            files_with_errors.append(annotation_file)
                            break
                    except ValueError:
                        files_with_errors.append(annotation_file)
                        break
    # Print the names of files with errors
    if files_with_errors:
        print("Files with invalid annotations:")
        for file_with_error in files_with_errors:
            print(file_with_error)
    else:
        print("All annotation files are valid.")

    This script reads each annotation file and ensures that all labels are within the valid range of 0 to 74. If it finds any labels outside this range, it records the file name. This process allowed me to quickly identify and correct the labeling error.

    @MuhammadShifa hello,

    Fantastic work on troubleshooting the issue with your dataset labels! It's great to hear that the script you used effectively identified and helped you correct the erroneous label. Your proactive approach to ensuring data integrity is commendable.

    Thank you for sharing the script as well. It's a valuable tool that others in the community might find useful for similar issues. Keep up the great work, and don't hesitate to reach out if you encounter any more challenges or have insights to share!

    Happy coding! 🚀

    Hello everyone.

    I got the same error, and the first thing that came to mind was to check whether the labels contain a class id that is not defined for the dataset (following point 2 in the explanation written by @glenn-jocher).

    For example, in a dataset with 9 classes, the class ids are expected to be [0,1,2,3,4,5,6,7,8]. I searched the label files for class id '9' and found it in one txt file.

    0 0.492188 0.456543 0.021484 0.024414
    9 0.604492 0.171875 0.089844 0.197266  
    

    When I examined it, I observed that class '9' was used instead of class '5' and corrected it.

    0 0.492188 0.456543 0.021484 0.024414
    5 0.604492 0.171875 0.089844 0.197266
    

    I trained the model again with this correction and it worked. :) Out of 10k images, a single incorrect label cost me 1 hour.
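
    A per-class count makes a stray id like this easy to spot, because it shows up as an unexpected key with a very small count. The following is a minimal sketch; `count_class_ids` and `unexpected_ids` are hypothetical helpers, and the directory passed in is assumed to hold YOLO-format `.txt` label files.

```python
from collections import Counter
from pathlib import Path


def count_class_ids(label_dir: str) -> Counter:
    """Count how often each class id (first field) occurs across all .txt label files."""
    counts = Counter()
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:
                counts[parts[0]] += 1  # keep the raw token so malformed ids stay visible
    return counts


def unexpected_ids(counts: Counter, nc: int) -> dict:
    """Return the counted tokens that are not a non-negative integer below nc."""
    bad = {}
    for token, n in counts.items():
        if not (token.isdecimal() and int(token) < nc):
            bad[token] = n
    return bad
```

    Calling `unexpected_ids(count_class_ids(label_dir), nc=9)` on a labels folder returns an empty dict when every id is in 0..8, and otherwise maps each offending id to the number of annotations that use it.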

    Exactly, That's the reason I made the mistake.

    Hello there,

    Great detective work on identifying and correcting the class ID error in your dataset! It's impressive how a single mislabeled instance can impact the training process. Your methodical approach to verifying and adjusting the class IDs is a fantastic example for others facing similar issues. Thanks for sharing your solution, and I'm glad to hear that your model is now training correctly. Keep up the excellent work! 🌟

    Hello, please help. I have encountered the same issue. I have checked all the class labels and they all seem to be in order. I only have one class, and I have tried @ENNURSILA's solution, but the problem still persists.
    ---------------------------------------------------------------------------
    RuntimeError Traceback (most recent call last)
    Cell In[4], line 1
    ----> 1 results = model.train(data="data.yaml", epochs=50, imgsz=256)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\model.py:674, in Model.train(self, trainer, **kwargs)
    671 pass
    673 self.trainer.hub_session = self.session # attach optional HUB session
    --> 674 self.trainer.train()
    675 # Update model and cfg after training
    676 if RANK in {-1, 0}:

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\trainer.py:199, in BaseTrainer.train(self)
    196 ddp_cleanup(self, str(file))
    198 else:
    --> 199 self._do_train(world_size)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\engine\trainer.py:376, in BaseTrainer._do_train(self, world_size)
    374 with torch.cuda.amp.autocast(self.amp):
    375 batch = self.preprocess_batch(batch)
    --> 376 self.loss, self.loss_items = self.model(batch)
    377 if RANK != -1:
    378 self.loss *= world_size

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
    1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
    1510 else:
    -> 1511 return self._call_impl(*args, **kwargs)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1520, in Module._call_impl(self, *args, **kwargs)
    1515 # If we don't have any hooks, we want to skip the rest of the logic in
    1516 # this function, and just call forward.
    1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
    1518 or _global_backward_pre_hooks or _global_backward_hooks
    1519 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1520 return forward_call(*args, **kwargs)
    1522 try:
    1523 result = None

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\nn\tasks.py:88, in BaseModel.forward(self, x, *args, **kwargs)
    78 """
    79 Forward pass of the model on a single scale. Wrapper for _forward_once method.
    (...)
    85 (torch.Tensor): The output of the network.
    86 """
    87 if isinstance(x, dict): # for cases of training and validating while training.
    ---> 88 return self.loss(x, *args, **kwargs)
    89 return self.predict(x, *args, **kwargs)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\nn\tasks.py:267, in BaseModel.loss(self, batch, preds)
    264 self.criterion = self.init_criterion()
    266 preds = self.forward(batch["img"]) if preds is None else preds
    --> 267 return self.criterion(preds, batch)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\utils\loss.py:296, in v8SegmentationLoss.__call__(self, preds, batch)
    293 # Pboxes
    294 pred_bboxes = self.bbox_decode(anchor_points, pred_distri) # xyxy, (b, h*w, 4)
    --> 296 _, target_bboxes, target_scores, fg_mask, target_gt_idx = self.assigner(
    297 pred_scores.detach().sigmoid(),
    298 (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
    299 anchor_points * stride_tensor,
    300 gt_labels,
    301 gt_bboxes,
    302 mask_gt,
    303 )
    305 target_scores_sum = max(target_scores.sum(), 1)
    307 # Cls loss
    308 # loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum # VFL way

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
    1509 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
    1510 else:
    -> 1511 return self._call_impl(*args, **kwargs)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1520, in Module._call_impl(self, *args, **kwargs)
    1515 # If we don't have any hooks, we want to skip the rest of the logic in
    1516 # this function, and just call forward.
    1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
    1518 or _global_backward_pre_hooks or _global_backward_hooks
    1519 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1520 return forward_call(*args, **kwargs)
    1522 try:
    1523 result = None

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114 with ctx_factory():
    --> 115 return func(*args, **kwargs)

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\utils\tal.py:72, in TaskAlignedAssigner.forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes, mask_gt)
    63 device = gt_bboxes.device
    64 return (
    65 torch.full_like(pd_scores[..., 0], self.bg_idx).to(device),
    66 torch.zeros_like(pd_bboxes).to(device),
    (...)
    69 torch.zeros_like(pd_scores[..., 0]).to(device),
    70 )
    ---> 72 mask_pos, align_metric, overlaps = self.get_pos_mask(
    73 pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points, mask_gt
    74 )
    76 target_gt_idx, fg_mask, mask_pos = self.select_highest_overlaps(mask_pos, overlaps, self.n_max_boxes)
    78 # Assigned target

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\utils\tal.py:94, in TaskAlignedAssigner.get_pos_mask(self, pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points, mask_gt)
    92 mask_in_gts = self.select_candidates_in_gts(anc_points, gt_bboxes)
    93 # Get anchor_align metric, (b, max_num_obj, h*w)
    ---> 94 align_metric, overlaps = self.get_box_metrics(pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_in_gts * mask_gt)
    95 # Get topk_metric mask, (b, max_num_obj, h*w)
    96 mask_topk = self.select_topk_candidates(align_metric, topk_mask=mask_gt.expand(-1, -1, self.topk).bool())

    File c:\Users\mhani\AppData\Local\Programs\Python\Python310\lib\site-packages\ultralytics\utils\tal.py:113, in TaskAlignedAssigner.get_box_metrics(self, pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_gt)
    111 ind[1] = gt_labels.squeeze(-1) # b, max_num_obj
    112 # Get the scores of each grid for each gt cls
    --> 113 bbox_scores[mask_gt] = pd_scores[ind[0], :, ind[1]][mask_gt] # b, max_num_obj, h*w
    115 # (b, max_num_obj, 1, 4), (b, 1, h*w, 4)
    116 pd_boxes = pd_bboxes.unsqueeze(1).expand(-1, self.n_max_boxes, -1, -1)[mask_gt]

    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
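
    The hint in the last lines of the error can be followed from a notebook by setting the variable before CUDA is initialized. A minimal sketch, assuming a freshly restarted kernel (once a device-side assert has fired, the CUDA context in that process is unusable, so the variable has to be set before torch touches the GPU again):

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# operation that actually failed. This slows training, so use it for debugging only.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch / ultralytics only after the variable is set, then rerun training.
```

    On Linux or macOS shells the same effect is usually obtained by prefixing the training command with `CUDA_LAUNCH_BLOCKING=1`.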

    It turns out that data.yaml was resolving to the wrong file path this whole time. It did not use my current working directory; I just had to specify the path explicitly in data.yaml, and that fixed it.
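
    A quick existence check on the locations named in `data.yaml` can catch this kind of path problem before training starts. The following is a minimal sketch, not Ultralytics' own resolution logic: `missing_dataset_dirs` is a hypothetical helper, the config is passed as an already-parsed dict, each of `train` and `val` is assumed to be a single path string, and relative entries are assumed to be relative to the `path` entry.

```python
from pathlib import Path


def missing_dataset_dirs(cfg: dict, splits=("train", "val")) -> list[str]:
    """Return the split locations from a data.yaml-style dict that do not exist on disk."""
    root = Path(cfg.get("path", "."))
    missing = []
    for split in splits:
        entry = cfg.get(split)
        if entry is None:
            missing.append(f"{split}: not set")
            continue
        split_dir = Path(entry)
        if not split_dir.is_absolute():
            split_dir = root / split_dir  # assumption: relative to `path`
        if not split_dir.resolve().exists():
            missing.append(f"{split}: {split_dir.resolve()}")
    return missing
```

    An empty result means every listed split was found. Writing an absolute directory into the `path` entry, as in the fix described above, removes the dependence on the current working directory.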