Red Hat GLOBAL FILE SYSTEM 4.7 User Manual, Page 103

obdidx    objid    objid    group
     0     1860    0x744        0
     1     1856    0x740        0
     2     1887    0x75f        0
     3     1887    0x75f        0
2. Rename the new file to the original name using the mv command, as shown in the following example:
# mv scratch.new scratch
mv: overwrite 'scratch'? y
# lfs getstripe scratch
OBDS:
0: ost1_UUID
1: ost2_UUID
2: ost3_UUID
3: ost4_UUID
./scratch
obdidx    objid    objid    group
     0     1860    0x744        0
     1     1856    0x740        0
     2     1887    0x75f        0
     3     1887    0x75f        0
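The object listing printed after the rename can be compared with the listing taken before it to confirm that the file still refers to the same objects. The following is a minimal Python sketch, assuming the lfs getstripe output format shown in this example. The function name parse_stripe_objects is hypothetical and is not part of the Lustre tools.

```python
def parse_stripe_objects(getstripe_output):
    """Return (obdidx, objid, group) tuples from `lfs getstripe` text.

    Assumes the four-column object table shown in this section.
    """
    rows = []
    in_table = False
    for line in getstripe_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "obdidx":
            in_table = True  # header row: object rows follow
            continue
        if not in_table or len(fields) != 4:
            continue
        try:
            # The third column repeats the object ID in hexadecimal in the
            # sample output, so only the decimal form is kept here.
            row = (int(fields[0]), int(fields[1]), int(fields[3]))
        except ValueError:
            continue  # not an object row
        rows.append(row)
    return rows


# Listing captured before the rename (object table only).
before = """obdidx objid objid group
0 1860 0x744 0
1 1856 0x740 0
2 1887 0x75f 0
3 1887 0x75f 0
"""

# Full listing captured after the rename.
after = """OBDS:
0: ost1_UUID
1: ost2_UUID
2: ost3_UUID
3: ost4_UUID
./scratch
obdidx objid objid group
0 1860 0x744 0
1 1856 0x740 0
2 1887 0x75f 0
3 1887 0x75f 0
"""

same_layout = parse_stripe_objects(before) == parse_stripe_objects(after)
print("layout unchanged:", same_layout)
```

If the two parsed lists are equal, the renamed file refers to the same objects on the same OSTs as the new file did before the rename.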
7.3.3 Reset client nodes after an LBUG error
When an LBUG error occurs on a client node, the client node must be restarted. HP recommends that you
reset the client node rather than perform a controlled shutdown and reboot. An LBUG error leaves a client
thread permanently unresponsive while it continues to hold whatever resources and locks it had acquired.
Because those resources can never be released, a controlled shutdown procedure will not complete
successfully.
When an LBUG error occurs, messages similar to the following are displayed on the console or in the
/var/log/messages file:
delta51:May 26 17:02:19 src_s@delta51 logger: lustre: upcall: LBUG: A critical
error has been detected by the Lustre server in ldlm_lock.c ldlm_lock_cancel
1042. Please reboot delta51.
delta52:May 26 17:02:20 src_s@delta52 logger: lustre: upcall: LBUG: A critical
error has been detected by the Lustre server in lib-move.c lib_copy_buf2iov
341. Please reboot delta52.
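Console or /var/log/messages output can be scanned for these messages to list the client nodes that need a reset. The following is a minimal Python sketch, assuming the message wording shown above, in particular the closing "Please reboot <node>." phrase. The function name nodes_to_reset is hypothetical.

```python
import re

# Matches from "LBUG:" up to the node name in "Please reboot <node>.".
# DOTALL lets one message span several wrapped lines, as in the sample.
LBUG_RE = re.compile(
    r"LBUG:.*?Please\s+reboot\s+([\w.-]+?)\.(?=\s|$)",
    re.DOTALL,
)


def nodes_to_reset(log_text):
    """Return the node names named in LBUG messages, in first-seen order."""
    nodes = []
    for node in LBUG_RE.findall(log_text):
        if node not in nodes:
            nodes.append(node)
    return nodes


sample = """delta51:May 26 17:02:19 src_s@delta51 logger: lustre: upcall: LBUG: A critical
error has been detected by the Lustre server in ldlm_lock.c ldlm_lock_cancel
1042. Please reboot delta51.
delta52:May 26 17:02:20 src_s@delta52 logger: lustre: upcall: LBUG: A critical
error has been detected by the Lustre server in lib-move.c lib_copy_buf2iov
341. Please reboot delta52.
"""

print(nodes_to_reset(sample))
```

The sketch only reports node names. Resetting the nodes is still done with whatever mechanism the site normally uses.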
7.3.4 Access to a file system hangs
The Lustre software coordinates activity using a lock manager (LDLM). At any given moment, a client node
holds locks for data that it has read or is writing. For another client node to access the file, the lock must be
revoked.
The Lustre software frees locks as follows:
• When a lock that is held by a client node is requested by another client node, the Lustre software
  asks the client node that holds the lock to give it back. If the client node in question has just
  crashed, the Lustre software must wait for 6 to 20 seconds before concluding that the client is not
  responding. At this point, the Lustre software evicts the crashed client node and takes back the lock.
• If a client node has not been in contact for at least 2.25 times the period specified by the Lustre
  timeout file system attribute, the Lustre software proactively evicts the client node, but does not
  revoke any lock held by the client node until the lock is needed by another client node.
In the second case, a lock may not be revoked until several hours after a client node actually crashed,
depending on file access patterns. This explains why a client node may successfully mount a file system
and then find that access to the file system immediately hangs.
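The proactive eviction threshold in the second case is simple arithmetic on the timeout attribute. The following is a minimal Python sketch, assuming the timeout is expressed in seconds. The value of 100 is purely illustrative and is not a documented default, and the function names are hypothetical.

```python
EVICTION_FACTOR = 2.25  # multiplier stated in this section


def proactive_eviction_threshold(timeout_seconds):
    """Seconds without contact after which a client is proactively evicted."""
    return EVICTION_FACTOR * timeout_seconds


def is_proactively_evicted(seconds_since_contact, timeout_seconds):
    """True if the client has been silent for at least the threshold."""
    return seconds_since_contact >= proactive_eviction_threshold(timeout_seconds)


timeout = 100  # hypothetical timeout value, in seconds
print(proactive_eviction_threshold(timeout))   # 225.0
print(is_proactively_evicted(200, timeout))    # False
print(is_proactively_evicted(300, timeout))    # True
```

Eviction alone does not revoke the locks held by the client. As described above, each lock is taken back only when another client node requests it.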