Issue
The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click here for more info.
View Jupyter log for further details.
Code that had been running fine suddenly stopped working, which threw me off. In the end, changing the import order so that torch is imported before tensorflow resolved it.
Solution
import torch
import tensorflow as tf
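To keep the import order from silently regressing (for example when cells are re-run out of order), a small guard can check which package was loaded first. This is only a minimal sketch and not part of the original fix: `imported_before` is a hypothetical helper name, and it assumes that the key order of `sys.modules` (an ordinary insertion-ordered dict) reflects the order in which the top-level imports started.

```python
import sys


def imported_before(first, second, modules=None):
    """Report whether `first` was imported before `second`.

    Returns True or False when both are loaded, and None when either
    one has not been imported yet.
    """
    # sys.modules is a plain dict, so its key order is insertion order.
    modules = sys.modules if modules is None else modules
    if first not in modules or second not in modules:
        return None
    order = list(modules)
    return order.index(first) < order.index(second)


# Hypothetical guard for the top of the notebook, placed after the imports:
# a False result means tensorflow was loaded before torch.
if imported_before("torch", "tensorflow") is False:
    print("warning: tensorflow was imported before torch; restart the kernel")
```

The `modules` parameter exists only so the helper can be tried on a plain dict without importing either framework.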
Error Log
jupyter notebook log
01:24:13.939 [info] Restart requested ~/Anomaly_Detect/aer_anomaly_detection.ipynb
01:24:13.942 [warn] Cancel all remaining cells due to dead kernel
01:24:13.957 [info] Process Execution: ~/.pyenv/versions/AD_project/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
01:24:13.969 [info] Process Execution: ~/.pyenv/versions/AD_project/bin/python -m ipykernel_launcher --f=/run/user/1003/jupyter/runtime/kernel-v34f295111ec1d85b27ddb1d4e27a6d186e408e137.json
> cwd: ~/Anomaly_Detect
01:24:14.387 [info] Restarted d32f2382-514c-448c-ac73-77112e40a0f9
01:24:51.087 [error] Disposing session as kernel process died ExitCode: undefined, Reason: 2025-09-01 01:24:15.578340: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-09-01 01:24:15.610623: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-01 01:24:15.610649: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-01 01:24:15.610678: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-01 01:24:15.617593: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-01 01:24:16.324605: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-09-01 01:24:23.481987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 22062 MB memory: -> device: 0, name: NVIDIA A30, pci bus id: 0000:0d:00.0, compute capability: 8.0
2025-09-01 01:24:23.483746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:1 with 22396 MB memory: -> device: 1, name: NVIDIA A30, pci bus id: 0000:b5:00.0, compute capability: 8.0
2025-09-01 01:24:37.173575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22062 MB memory: -> device: 0, name: NVIDIA A30, pci bus id: 0000:0d:00.0, compute capability: 8.0
2025-09-01 01:24:37.174985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22396 MB memory: -> device: 1, name: NVIDIA A30, pci bus id: 0000:b5:00.0, compute capability: 8.0
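The TensorFlow lines in this log carry a one-letter severity (I, W, E) right after the timestamp, so the E entries (the cuDNN, cuFFT, and cuBLAS factory registrations) can be pulled out mechanically. A minimal sketch, assuming the line layout seen in the log above; `parse_tf_log_line` is a hypothetical helper, not a TensorFlow API.

```python
import re

# Layout assumed from the log above:
# "<date> <time>: <severity> <source file>:<line>] <message>"
TF_LOG_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)"
)


def parse_tf_log_line(text):
    """Return the fields of the first TensorFlow log entry in `text`, or None."""
    match = TF_LOG_RE.search(text)
    return match.groupdict() if match else None


# One of the E lines from the log above, split only to keep the source narrow.
sample = (
    "2025-09-01 01:24:15.610623: E tensorflow/compiler/xla/stream_executor/cuda/"
    "cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register "
    "factory for plugin cuDNN when one has already been registered"
)
entry = parse_tf_log_line(sample)
print(entry["level"], entry["source"].rsplit("/", 1)[-1], entry["line"])
```

Because the pattern is applied with `search`, it also finds the TensorFlow entry embedded after `Reason:` in the kernel-died line, while the Jupyter-side `[info]`/`[warn]` lines are skipped.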
nvidia-smi
seungjong.yoo@mlsvr:~$ nvidia-smi
Mon Sep 1 01:24:32 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A30                     Off |   00000000:0D:00.0 Off |                    0 |
| N/A   30C    P0             33W /  165W |     567MiB /  24576MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A30                     Off |   00000000:B5:00.0 Off |                    0 |
| N/A   32C    P0             33W /  165W |     233MiB /  24576MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3701      C   /usr/local/bin/python3                  328MiB |
|    0   N/A  N/A         3543064      C   ...ersions/AD_project/bin/python        224MiB |
|    1   N/A  N/A         3543064      C   ...ersions/AD_project/bin/python        224MiB |
+-----------------------------------------------------------------------------------------+
Version check
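# Import torch before tensorflow, matching the fix described above
import sys

import torch
import tensorflow as tf

# TensorFlow version and the CUDA/cuDNN versions it was built against
print(tf.__version__)
print(tf.sysconfig.get_build_info().get("cuda_version"))
print(tf.sysconfig.get_build_info().get("cudnn_version"))
print("")
# Python version, then the PyTorch version and the CUDA version of its build
print(sys.version)
print("torch : ", torch.__version__, " cuda : ", torch.version.cuda)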
2.14.1
11.8
8
3.10.12 (main, May 21 2025, 07:40:53) [GCC 11.4.0]
torch : 2.5.1+cu124 cuda : 12.4
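The output shows TensorFlow 2.14.1 built against CUDA 11.8, while the torch wheel is a cu124 build (CUDA 12.4). Whether that difference is what makes the import order matter is not confirmed here; it is only an assumption worth checking. A minimal sketch of the comparison, with the version strings copied from the output above and `cuda_major` as a hypothetical helper:

```python
def cuda_major(version):
    """Return the major part of a CUDA version string such as '11.8'."""
    return int(str(version).split(".")[0])


# Values copied from the output above; in the notebook they would come from
# tf.sysconfig.get_build_info().get("cuda_version") and torch.version.cuda.
tf_cuda = "11.8"
torch_cuda = "12.4"

same_major = cuda_major(tf_cuda) == cuda_major(torch_cuda)
print("same CUDA major version:", same_major)
```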