
[Error Shooting] Jupyter notebook kernel keeps dying (The Kernel crashed while executing code in the current cell or a previous cell.)

by 하람 Haram 2025. 9. 1.

Issue

 

The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click here for more info.
View Jupyter log for further details.

 

Code that had been running fine suddenly stopped working, which threw me off.

In the end, fixing the import order so that torch is imported before tensorflow resolved it.

 

Solution

import torch
import tensorflow as tf
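
Since the fix depends on the order of imports within one kernel process, it can help to check that tensorflow has not already been loaded before applying it. The following is a minimal sketch, not part of the original fix; `tensorflow_loaded_without_torch` is a hypothetical helper name, and it only catches the case where tensorflow is loaded and torch is not.

```python
import sys


def tensorflow_loaded_without_torch(loaded=None):
    """Return True if tensorflow is already imported but torch is not."""
    # sys.modules maps the names of already-imported modules to module objects.
    loaded = sys.modules if loaded is None else loaded
    return "tensorflow" in loaded and "torch" not in loaded


if tensorflow_loaded_without_torch():
    # The torch-first order can no longer be applied in this process.
    print("tensorflow was imported before torch; restart the kernel first")
```

Running this in the first cell, followed by `import torch` and then `import tensorflow as tf`, keeps the order explicit after a kernel restart.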

 

 

 

Error Log

jupyter notebook log

01:24:13.939 [info] Restart requested ~/Anomaly_Detect/aer_anomaly_detection.ipynb
01:24:13.942 [warn] Cancel all remaining cells due to dead kernel
01:24:13.957 [info] Process Execution: ~/.pyenv/versions/AD_project/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
01:24:13.969 [info] Process Execution: ~/.pyenv/versions/AD_project/bin/python -m ipykernel_launcher --f=/run/user/1003/jupyter/runtime/kernel-v34f295111ec1d85b27ddb1d4e27a6d186e408e137.json
    > cwd: ~/Anomaly_Detect
01:24:14.387 [info] Restarted d32f2382-514c-448c-ac73-77112e40a0f9
01:24:51.087 [error] Disposing session as kernel process died ExitCode: undefined, Reason: 2025-09-01 01:24:15.578340: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-09-01 01:24:15.610623: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-01 01:24:15.610649: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-01 01:24:15.610678: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-01 01:24:15.617593: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-01 01:24:16.324605: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-09-01 01:24:23.481987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 22062 MB memory:  -> device: 0, name: NVIDIA A30, pci bus id: 0000:0d:00.0, compute capability: 8.0
2025-09-01 01:24:23.483746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:1 with 22396 MB memory:  -> device: 1, name: NVIDIA A30, pci bus id: 0000:b5:00.0, compute capability: 8.0
2025-09-01 01:24:37.173575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22062 MB memory:  -> device: 0, name: NVIDIA A30, pci bus id: 0000:0d:00.0, compute capability: 8.0
2025-09-01 01:24:37.174985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22396 MB memory:  -> device: 1, name: NVIDIA A30, pci bus id: 0000:b5:00.0, compute capability: 8.0

 

nvidia-smi

seungjong.yoo@mlsvr:~$ nvidia-smi
Mon Sep  1 01:24:32 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A30                     Off |   00000000:0D:00.0 Off |                    0 |
| N/A   30C    P0             33W /  165W |     567MiB /  24576MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A30                     Off |   00000000:B5:00.0 Off |                    0 |
| N/A   32C    P0             33W /  165W |     233MiB /  24576MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3701      C   /usr/local/bin/python3                  328MiB |
|    0   N/A  N/A         3543064      C   ...ersions/AD_project/bin/python        224MiB |
|    1   N/A  N/A         3543064      C   ...ersions/AD_project/bin/python        224MiB |
+-----------------------------------------------------------------------------------------+

 

Version check

 

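# Import torch first; importing tensorflow first crashed the kernel in this environment.
import sys

import torch
import tensorflow as tf

# TensorFlow version and the CUDA/cuDNN versions it was built against
print(tf.__version__)
print(tf.sysconfig.get_build_info().get("cuda_version"))
print(tf.sysconfig.get_build_info().get("cudnn_version"))

print("")

# Python version, then the PyTorch version and the CUDA version of its build
print(sys.version)
print("torch : ", torch.__version__, "    cuda : ", torch.version.cuda)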

 


2.14.1
11.8
8

3.10.12 (main, May 21 2025, 07:40:53) [GCC 11.4.0]
torch : 2.5.1+cu124 cuda : 12.4
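
The output above shows the two frameworks built against different CUDA major versions (11.8 for TensorFlow, 12.4 for torch). The log does not confirm that this mismatch is why the import order matters; it is only a plausible suspect, but it is easy to flag. A minimal sketch, where `cuda_major` and `same_cuda_major` are hypothetical helper names and not part of either library:

```python
def cuda_major(version):
    """Return the major CUDA version from a string such as "11.8", or None."""
    if not version:
        # CPU-only builds may report no CUDA version at all.
        return None
    return int(str(version).split(".")[0])


def same_cuda_major(tf_cuda, torch_cuda):
    """True only when both builds report the same major CUDA version."""
    a, b = cuda_major(tf_cuda), cuda_major(torch_cuda)
    return a is not None and a == b


# Values reported above: TensorFlow built for CUDA 11.8, torch built for CUDA 12.4.
print(same_cuda_major("11.8", "12.4"))  # prints False
```

In a live environment the two arguments would come from `tf.sysconfig.get_build_info().get("cuda_version")` and `torch.version.cuda`, the same calls used in the version check.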