Red Hat GLOBAL FILE SYSTEM 4.7 User Manual (Page 93)

Using Lustre file systems — performance hints 6–9

6.3.3 Variation of file stripe count with shared file access

When multiple client processes access a shared file, it is beneficial to align the file layout (file stripe size and file stripe count) with the access pattern of the application. For example, consider a file system with the following configuration:

• Four Object Storage Servers

• Four SFS20 arrays attached to each Object Storage Server

• Each array populated with eleven 250GB disks, configured as one 2TB LUN with RAID5 redundancy (that is, a total of 16 OST LUNs, one LUN on each of the SFS20 arrays)

• File system stripe size of 4MB

An application using the file system has the following client access pattern:

• 16 client processes

• Each process accesses 4MB chunks with a stride of 16 chunks (that is, the file is logically divided into a number of 4MB chunks, the number being a multiple of 16, and process i accesses chunks i, i+16, i+32, and so on).

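In such a configuration, if the file is striped across all 16 OSTs (a file stripe count of 16), every chunk that a given client process accesses lies on the same OST, so each client process accesses a single OST service for all of its data. This configuration optimizes both the internal Lustre LDLM traffic and the traffic patterns to an OST service.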
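The mapping described above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of HP SFS or Lustre: it assumes round-robin striping that starts at OST index 0 and a file stripe count of 16, and the function name `osts_touched` is hypothetical.

```python
MIB = 1024 * 1024


def osts_touched(rank, n_procs=16, chunk_size=4 * MIB, n_rounds=8,
                 stripe_size=4 * MIB, stripe_count=16):
    """Return the set of OST indices (relative to the file layout) that
    process `rank` touches when it accesses chunks rank, rank + n_procs, ...

    Assumes round-robin striping starting at OST index 0.
    """
    touched = set()
    for round_no in range(n_rounds):
        start = (rank + round_no * n_procs) * chunk_size
        first_stripe = start // stripe_size
        last_stripe = (start + chunk_size - 1) // stripe_size
        for stripe in range(first_stripe, last_stripe + 1):
            touched.add(stripe % stripe_count)
    return touched


if __name__ == "__main__":
    # Aligned layout: 4MB stripes, 16 OSTs, 16 processes.
    for rank in range(16):
        print(rank, sorted(osts_touched(rank)))
    # Misaligned layout for comparison: 1MB stripes with the same 4MB chunks.
    print(sorted(osts_touched(0, stripe_size=MIB)))
```

Under these assumptions, each of the 16 processes maps to exactly one OST index when the stripe size matches the 4MB chunk size. With a 1MB stripe size, the same 4MB chunk spans four OSTs, which is the kind of misalignment the text advises against.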

6.3.4 Timeouts and timeout tuning

When a file system is created, a Lustre timeout attribute is associated with the file system. The Lustre timeout attribute, which can be configured, is set to 200 seconds by default. This attribute is used to calculate the time to be allocated to various Lustre activities. In highly congested or slow networks, and in cases where client nodes are extremely busy, it may be necessary to increase the default value of the Lustre timeout attribute for the file system.

This section provides information on how the Lustre timeout attribute is used in various Lustre activities.

Client-side timeouts

When a client node sends an RPC to a server in an HP SFS system, the client node expects to get a response within the period defined by the Lustre timeout attribute. In normal operation, RPCs are initiated and completed rapidly, and do not exceed the time allocated for them. If the server does not respond to the client node within the defined time period, the client node reconnects to the server and resends the RPC.

If a client node times out and reconnects to the server in this way, some time later you may see a message similar to the following in the server logs:

Sep 13 11:23:57 s8 kernel: Lustre: sfsalias-ost18: haven't heard from
172.32.0.6@vib in 461 seconds. Last request was at 1158142576. I think it's
dead, and I am evicting it.

This message means that the server has detected a non-responsive client connection (that is, there has been no activity for at least 2.25 times the period specified by the Lustre timeout attribute, or 450 seconds with the default value of 200 seconds) and the server is now proactively terminating the connection.
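When reviewing server logs, it can help to extract the idle time from such a message and compare it with the eviction threshold. The following Python sketch is a hedged illustration, not an HP SFS tool: the regular expression is written against the sample message above only (other Lustre versions may word the message differently), and the names `parse_eviction` and `exceeds_eviction_threshold` are hypothetical.

```python
import re

# From the text: eviction follows at least 2.25 x the Lustre timeout of silence.
EVICTION_FACTOR = 2.25

SAMPLE = (
    "Sep 13 11:23:57 s8 kernel: Lustre: sfsalias-ost18: haven't heard from "
    "172.32.0.6@vib in 461 seconds. Last request was at 1158142576. "
    "I think it's dead, and I am evicting it."
)

# Pattern derived from the sample message only; treat it as an assumption.
LOG_PATTERN = re.compile(
    r"Lustre: (?P<target>\S+): haven't heard from\s+"
    r"(?P<client>\S+) in (?P<silent>\d+) seconds"
)


def parse_eviction(message):
    """Return target, client, and idle seconds, or None if no match."""
    match = LOG_PATTERN.search(" ".join(message.split()))
    if match is None:
        return None
    return {
        "target": match["target"],
        "client": match["client"],
        "silent_seconds": int(match["silent"]),
    }


def exceeds_eviction_threshold(silent_seconds, lustre_timeout=200):
    """True if the idle time is at least 2.25 x the Lustre timeout."""
    return silent_seconds >= EVICTION_FACTOR * lustre_timeout


if __name__ == "__main__":
    info = parse_eviction(SAMPLE)
    print(info, exceeds_eviction_threshold(info["silent_seconds"]))
```

For the sample message, the reported 461 seconds of silence exceeds the 450-second threshold that applies with the default timeout of 200 seconds.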

Note that this is the normal means of evicting client nodes that are no longer present. Client nodes ping their server connections at intervals of one quarter of the period specified by the Lustre timeout attribute (every 50 seconds with the default value of 200 seconds) so that no live client connection will be evicted in this way.
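The relationship between the timeout attribute, the ping interval, and the eviction threshold can be worked through numerically. This Python sketch uses only the factors stated in this section (one quarter and 2.25). The function names are hypothetical, and the sketch does not query a real file system.

```python
def derived_intervals(lustre_timeout=200.0):
    """Derive the intervals described in this section from the timeout (seconds)."""
    return {
        "rpc_timeout": lustre_timeout,           # client waits this long for an RPC reply
        "ping_interval": lustre_timeout / 4,     # client pings each server connection
        "eviction_after": 2.25 * lustre_timeout, # server evicts a silent client
    }


def pings_before_eviction(lustre_timeout=200.0):
    """Number of ping intervals that fit inside the eviction window."""
    d = derived_intervals(lustre_timeout)
    return int(d["eviction_after"] // d["ping_interval"])


if __name__ == "__main__":
    for timeout in (200.0, 300.0):
        print(timeout, derived_intervals(timeout), pings_before_eviction(timeout))
```

Because both intervals scale with the timeout attribute, nine ping intervals fit inside the eviction window whatever value is configured. A live client would therefore have to miss many consecutive pings before the server evicts it.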
