Red Hat GLOBAL FILE SYSTEM 4.7 User Manual (page 103)


obdidx    objid    objid    group
     0     1860    0x744        0
     1     1856    0x740        0
     2     1887    0x75f        0
     3     1887    0x75f        0

2. Rename the new file to the original name using the mv command, as shown in the following example:

# mv scratch.new scratch
mv: overwrite 'scratch'? y
# lfs getstripe scratch
OBDS:
0: ost1_UUID
1: ost2_UUID
2: ost3_UUID
3: ost4_UUID
./scratch
obdidx    objid    objid    group
     0     1860    0x744        0
     1     1856    0x740        0
     2     1887    0x75f        0
     3     1887    0x75f        0
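
A rename does not move any data, so the object rows reported for scratch should match the rows reported for the file before it was renamed. The following is a minimal sketch of checking that mechanically. The helper name stripe_rows and the file names before.txt and after.txt are hypothetical, and the sketch assumes that lfs getstripe prints object rows in the four-column layout shown above.

```shell
#!/bin/sh
# stripe_rows: hypothetical helper, not part of the Lustre tools.
# Reads `lfs getstripe` output on standard input and prints only the
# object rows (obdidx, decimal objid, hexadecimal objid, group),
# normalized to single spaces so that two listings can be compared.
stripe_rows() {
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ &&
         $3 ~ /^0x[0-9a-fA-F]+$/ && $4 ~ /^[0-9]+$/ {
             print $1, $2, $3, $4
         }'
}

# Possible use around the rename (file names are illustrative):
#   lfs getstripe scratch.new | stripe_rows > before.txt
#   mv scratch.new scratch
#   lfs getstripe scratch | stripe_rows > after.txt
#   cmp before.txt after.txt && echo "stripe objects unchanged"
```

The header line and the OBDS list are skipped because they do not consist of four numeric fields, so only the object rows take part in the comparison.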

7.3.3 Reset client nodes after an LBUG error

When an LBUG error occurs on a client node, the client node must be restarted. HP recommends that you
reset the client node rather than perform a controlled shutdown and reboot. An LBUG error leaves a client
thread permanently unresponsive while it continues to hold whatever resources or locks it had acquired;
because those resources can never be released, a controlled shutdown procedure will not complete
successfully.

When an LBUG error occurs, messages similar to the following are displayed on the console or in the /var/log/messages file:

delta51:May 26 17:02:19 src_s@delta51 logger: lustre: upcall: LBUG: A critical error has been detected by the Lustre server in ldlm_lock.c ldlm_lock_cancel 1042. Please reboot delta51.
delta52:May 26 17:02:20 src_s@delta52 logger: lustre: upcall: LBUG: A critical error has been detected by the Lustre server in lib-move.c lib_copy_buf2iov 341. Please reboot delta52.
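
To find out which nodes need a reset, the log can be filtered for these messages. The following is a minimal sketch, not an HP-supplied tool. The function name lbug_nodes is hypothetical, and the sketch assumes that each message occupies a single line in the log and ends with "Please reboot <node>." as in the samples above.

```shell
#!/bin/sh
# lbug_nodes: hypothetical helper.
# Reads log text on standard input and prints, once each, the node names
# that LBUG upcall messages ask to be rebooted.
# Assumption: one message per line, ending in "Please reboot <node>."
lbug_nodes() {
    sed -n '/LBUG/ s/.*Please reboot \([^ ]*\)\.[[:space:]]*$/\1/p' | sort -u
}

# Possible use:
#   lbug_nodes < /var/log/messages
```

Lines that do not mention LBUG are ignored, and repeated messages for the same node produce a single line of output.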

7.3.4 Access to a file system hangs

The Lustre software coordinates activity using a lock manager (LDLM). At any given moment, a client node
holds locks for data that it has read or is writing. For another client node to access the file, the lock must be
revoked.

The Lustre software frees locks as follows:

• When a lock that is held by a client node is requested by another client node, the Lustre software
asks the client node that owns the lock to give it back. If that client node has just crashed, the
Lustre software must wait for 6 to 20 seconds before concluding that the client is not responding.
At this point, the Lustre software evicts the crashed client node and takes back the lock.
• If a client node has not been in contact for at least 2.25 times the period specified by the Lustre
timeout file system attribute, the Lustre software proactively evicts the client node, but does not
revoke any lock held by the client node until the lock is needed by another client node.

In the second case, depending on file access patterns, a lock may not be revoked until several hours after
the client node actually crashed. This explains why a client node may mount a file system successfully and
then find that access to the file system hangs immediately.
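
The threshold in the second rule is a simple product of the timeout attribute. The following is a minimal sketch of the arithmetic. The function name eviction_threshold is hypothetical, and the timeout is passed in by hand because how the attribute is read depends on the installation; the value 100 in the usage comment is only illustrative.

```shell
#!/bin/sh
# eviction_threshold: hypothetical helper.
# Prints the number of seconds without contact after which a client node
# is proactively evicted, using the 2.25 x timeout rule described above.
# $1: value of the Lustre timeout attribute, in seconds.
eviction_threshold() {
    awk -v t="$1" 'BEGIN { printf "%.2f\n", 2.25 * t }'
}

# Possible use, with an illustrative timeout of 100 seconds:
#   eviction_threshold 100
```

With a timeout of 100 seconds, for instance, a silent client node is evicted after 225 seconds, although any locks it holds are taken back only when another client node requests them.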
