arcserve-KB : UDP Host-Based Agentless backup jobs for one of the nodes fail with the error "Job crashed".

Last Update: 2018-04-19 14:57:23 UTC


Description: 

UDP Host-Based Agentless backup jobs for one of the nodes fail with the error "Job crashed" after running for 90 hours (over 3 days).
The backup job spends 16 hours waiting for the snapshot to be deleted.

From Activity Log: 

Information 6/6/2017 14:34 6672 Backup Deleting software snapshot ...
Warning 6/6/2017 23:34 6672 Backup The application with process Id 18052 failed to respond within 10800 second(s). Please stop the process and retry running the job.
Warning 6/6/2017 23:34 6672 Backup The application with process Id 18052 is timeout within 32400 second(s), the process will be terminated.
Error 6/6/2017 23:34 6672 Backup Job crashed.

Cause: 

The DeleteSnapshot operation starts deleting the snapshot and then appears to hang while it waits for the VMware snapshot removal to finish. Hang detection treats the waiting backup process as unresponsive and terminates it before the removal completes. The VM is larger than 10 TB, so the snapshot removal takes longer than the hang detection timeout allows, and the job eventually fails.

Environment:

UDP: 6.5 GA
vCenter/ESX: 5.5

HBBU-ESX-20170603-010014-653-job6672-pid18052-Exchange 2013 MBX 01 - Server 2012

[2017/06/06 14:34:19:300 00 18052 03436 ] CD2DOffhostESXMachine::DeleteSnapshot::VMDKIo_Cleanup[[MBX-01] Exchange 2013 MBX 01 - Server 2012/Exchange 2013 MBX 01 - Server 2012.vmx, 192.168.156.115, aum\arcserve_svc, The operation was successful, 0] {AFBackend.exe::AFBackupVirtual.dll(4175.0)}
[2017/06/06 14:34:19:300 00 18052 03436 ] CD2DOffhostESXMachine::DeleteSnapshot::Begin deleting snapshot, VMUUID=503f9a26-0ecf-89ce-eda8-255b30af35d0, snapshot=snapshot-41537 {AFBackend.exe::AFBackupVirtual.dll(4175.0)}
[2017/06/06 17:34:08:554 00 18052 16672 0X80070102] WAIT_TIMEOUT-Hang decteted, end process {AFBackend.exe::VhdxParserLib.dll(4175.0)}
[2017/06/06 17:34:08:601 00 18052 16672 0X00000512] Application appears to be unresponsive {AFBackend.exe::AFBackupVirtual.dll(4175.0)}
[2017/06/06 20:34:09:072 00 18052 16672 0X80070102] WAIT_TIMEOUT-Hang decteted, end process {AFBackend.exe::VhdxParserLib.dll(4175.0)}
[2017/06/06 20:34:09:119 00 18052 16672 0X00000512] Application appears to be unresponsive {AFBackend.exe::AFBackupVirtual.dll(4175.0)}
[2017/06/06 23:34:09:333 00 18052 16672 0X80070102] WAIT_TIMEOUT-Hang decteted, end process {AFBackend.exe::VhdxParserLib.dll(4175.0)}
[2017/06/06 23:34:09:349 00 18052 16672 0X00000512] Application appears to be unresponsive {AFBackend.exe::AFBackupVirtual.dll(4175.0)}
[2017/06/06 23:34:09:364 02 18052 16672 ] [INF][DllMain] MergeMgrDll is detached from process. (Path=[C:\Program Files\Arcserve\Unified Data Protection\Engine\bin\], Name=[AFBackend.exe], ID=[18052]) {AFBackend.exe::MergeMgrDll.dll(4175.0)}
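The timing in the log excerpt above can be checked with a short script. This is only a sketch: it assumes every log line starts with a `[YYYY/MM/DD HH:MM:SS:mmm` stamp as in the excerpt, the messages are shortened, and the names `LOG_LINES` and `parse_timestamp` are illustrative and not part of the product. It measures how long after "Begin deleting snapshot" each hang detection message was logged.

```python
from datetime import datetime

# Timestamps copied from the debug log excerpt above (messages shortened).
LOG_LINES = [
    "[2017/06/06 14:34:19:300 00 18052 03436 ] DeleteSnapshot::Begin deleting snapshot",
    "[2017/06/06 17:34:08:554 00 18052 16672 0X80070102] WAIT_TIMEOUT-Hang decteted, end process",
    "[2017/06/06 20:34:09:072 00 18052 16672 0X80070102] WAIT_TIMEOUT-Hang decteted, end process",
    "[2017/06/06 23:34:09:333 00 18052 16672 0X80070102] WAIT_TIMEOUT-Hang decteted, end process",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS:mmm' stamp of a log line."""
    # Characters 1..23 hold the stamp; %f accepts the 3-digit millisecond field.
    return datetime.strptime(line[1:24], "%Y/%m/%d %H:%M:%S:%f")


start = parse_timestamp(LOG_LINES[0])

# Hours elapsed between the start of the snapshot deletion and each hang message.
elapsed_hours = [
    (parse_timestamp(line) - start).total_seconds() / 3600
    for line in LOG_LINES[1:]
]

for hours in elapsed_hours:
    print(f"hang detection message after {hours:.2f} h")
```

The three messages fall roughly 3, 6 and 9 hours after the deletion started, which matches the 10800-second and 32400-second figures in the Activity Log warnings.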



Solution:

Make the following registry change on the Proxy Server:

[HKEY_LOCAL_MACHINE\SOFTWARE\Arcserve\Unified Data Protection\Engine\AFBackupDll] 

'HangDetectionTimeout'=dword:<number of seconds>

In this case, deleting the snapshot took 16 hours, so the timeout was set to 18 hours.
For 18 hours, the value should be 64800 (18 x 3600 seconds).
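The conversion can be sketched in Python. This assumes that 64800 is a decimal number of seconds; the helper names `hang_detection_timeout` and `reg_fragment` are hypothetical. One detail worth noting is that a `.reg` file writes `dword` data in hexadecimal, so 64800 appears as `0000fd20` when the change is imported from a file rather than typed as a decimal value in Registry Editor.

```python
# Registry key taken from the article.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Arcserve\Unified Data Protection\Engine\AFBackupDll"


def hang_detection_timeout(hours):
    """Convert a timeout in hours to seconds (assumed unit of the value)."""
    return int(hours * 3600)


def reg_fragment(seconds):
    """Build .reg file text for the value; dword data in .reg files is hex."""
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"HangDetectionTimeout"=dword:{seconds:08x}\n'
    )


seconds = hang_detection_timeout(18)
print(seconds)
print(reg_fragment(seconds))
```

When choosing the number of hours, leave some margin above the longest snapshot deletion observed, as was done here with 18 hours for a 16-hour deletion.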

After the above change, the backups completed successfully and no job crash was observed.

 
