User-Level Checkpointing

• Definition of checkpointing
  – A checkpoint is a copy of the computer's memory that is periodically saved to disk along with the current register settings (last instruction executed, etc.), which allows the program to be restarted from that point.
• User-level checkpointing is contained within the program itself. Like OS-level checkpointing, user-level checkpointing saves a program's state for a later restart. Below are some reasons to incorporate checkpointing at the user level in your code.
  – Even with massively parallel systems, runtime for large models is often measured in days.
  – As the number of processors increases, so does the probability that one of the nodes your job is running on will suffer a hardware failure.
  – Not all operating systems support OS-level checkpointing.
  – Larger parallel models require more I/O to save the state of a program.
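The idea of saving state from inside the program and resuming from it can be sketched as follows. This is a minimal illustration, not a method taken from the slides: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the `stop_after` parameter used to simulate a failure, and the toy "model" (a running sum) are all hypothetical. It assumes the whole state fits in one small Python dictionary; a real parallel code would typically write only the arrays it needs, per process or through a parallel I/O library.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over path.

    The rename is atomic, so a crash during the write leaves the
    previous checkpoint intact instead of a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Toy computation: add up the step indices 0..total_steps-1.

    stop_after (hypothetical) simulates a failure by returning early
    without saving, so work since the last checkpoint is lost.
    """
    state = load_checkpoint(path)
    if state is None:
        state = {"step": 0, "total": 0}  # fresh start
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state  # simulated crash: nothing is saved here
        state["total"] += state["step"]  # one unit of "work"
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state
    return state
```

Calling `run("ckpt.pkl", 10, 3, stop_after=7)` and then `run("ckpt.pkl", 10, 3)` resumes from the checkpoint written at step 6 rather than from step 0. The checkpoint interval is a trade-off: a shorter interval loses less work after a failure but spends more time on I/O.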