oracleops-support: A Huge (Pages) Performance Problem

A recent late-night incident at a client site involved a database with extreme performance problems following a storage system failure and subsequent database restart. The operating system was RHEL 5.4. The client observed excessive swapping and paging, with the run queue length breaking triple digits and the server and database becoming virtually unresponsive. The server had not been restarted, and the database instance parameters were confirmed to be identical to the settings prior to the database bounce. The SGA was on the order of 50 GB, with the buffer cache consuming most of that. After a period of speculation about what might be causing the problem, we noticed that the huge pages usage didn’t look quite right.

$ cat /proc/meminfo | grep Huge
HugePages_Total: 25600
HugePages_Free:  25431
HugePages_Rsvd:    169
Hugepagesize:     2048 kB

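With a running database featuring a 50 GB SGA, far more huge pages should be accounted for. At 2 MB per page, an SGA of that size needs on the order of 25,000 pages, yet only 169 were in use (HugePages_Total minus HugePages_Free) and another 169 were reserved. Not every page is touched at startup, so some sit in the reserved count for a while, but in-use plus reserved should come close to the full SGA. So what’s the deal with huge pages, and could they be behind this performance issue?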
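The same check can be scripted. The following is a minimal Python sketch, not a supported tool: the function names and the sga_gb argument are made up for illustration, and it assumes the usual meaning of the counters, namely that HugePages_Rsvd pages are still counted as free.

```python
import re

def parse_hugepage_counters(meminfo_text):
    """Return the Huge* counters from /proc/meminfo text as a dict of ints."""
    counters = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"^(Huge\w+):\s+(\d+)", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters

def hugepage_summary(meminfo_text, sga_gb):
    """Compare huge pages in use or reserved against what the SGA would need."""
    c = parse_hugepage_counters(meminfo_text)
    page_kb = c["Hugepagesize"]
    # Pages already handed out to some process.
    in_use = c["HugePages_Total"] - c["HugePages_Free"]
    # Reserved pages are promised but still counted as free (assumption above).
    claimed = in_use + c["HugePages_Rsvd"]
    # Pages a fully huge-page-backed SGA of sga_gb would need.
    needed = sga_gb * 1024 * 1024 // page_kb
    return {
        "in_use": in_use,
        "claimed": claimed,
        "claimed_mb": claimed * page_kb // 1024,
        "needed": needed,
    }

# The counters from the incident, copied from the output above.
SAMPLE = """\
HugePages_Total: 25600
HugePages_Free:  25431
HugePages_Rsvd:    169
Hugepagesize:     2048 kB
"""

if __name__ == "__main__":
    print(hugepage_summary(SAMPLE, sga_gb=50))
```

On a live server you would pass in the contents of /proc/meminfo instead of SAMPLE. For the incident numbers this works out to 338 pages (676 MB) claimed against roughly 25,600 needed.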

Many DBAs swear by huge pages on 64-bit Linux when it comes to managing databases with large SGAs, say on the order of 16 GB and larger. An in-depth discussion of memory architecture is beyond the scope of this blog and over my head anyway. Suffice it to say that as SGAs grow, performance can suffer without the built-in efficiency of huge pages. But once you become reliant on their use, how do you prevent the trap described above?

As of 11.2.0.2 there is a new instance parameter, USE_LARGE_PAGES, which can be set to one of the following three values:

TRUE – Use huge pages if they’re available (default). Some combination of large and small pages for the SGA is okay.
FALSE – Don’t use huge pages even if they’re available
ONLY – Allocation of huge pages for the entire SGA is required for the database to start
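To make the difference between the three settings concrete, here is a toy Python model of the behavior as described in the list above. It is only an illustration under stated assumptions: plan_sga and its arguments are hypothetical names, and this is not how Oracle actually allocates memory.

```python
def plan_sga(mode, sga_mb, free_large_pages, page_kb=2048):
    """Toy model of USE_LARGE_PAGES as described above; not Oracle's allocator."""
    mode = mode.lower()
    # Huge pages needed to back the whole SGA, rounded up.
    needed = (sga_mb * 1024 + page_kb - 1) // page_kb

    if mode == "false":
        # Never touch huge pages, even if they are available.
        return {"starts": True, "large_pages": 0, "small_page_mb": sga_mb}

    if mode == "only":
        # All or nothing: refuse to start if the SGA cannot fit entirely.
        if free_large_pages < needed:
            return {"starts": False, "large_pages": 0, "small_page_mb": 0}
        return {"starts": True, "large_pages": needed, "small_page_mb": 0}

    if mode == "true":
        # Use what is available and fall back to small pages for the rest.
        used = min(needed, free_large_pages)
        small_mb = max(0, sga_mb - used * page_kb // 1024)
        return {"starts": True, "large_pages": used, "small_page_mb": small_mb}

    raise ValueError("mode must be TRUE, FALSE or ONLY")

if __name__ == "__main__":
    for mode in ("true", "false", "only"):
        print(mode, plan_sga(mode, sga_mb=51200, free_large_pages=25431))
```

The point of the model is the middle case: with TRUE the instance starts no matter how few huge pages it gets, so a shortfall is silent, while ONLY turns the same shortfall into a startup failure.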

The client database above was 11.2.0.3, so if we had set USE_LARGE_PAGES=only, the database would have refused to start when we tried to bounce it, and we would have seen something like this in the alert log:

Fri Apr 20 15:16:11 2012
Starting ORACLE instance (normal)
****************** Large Pages Information *****************
Parameter use_large_pages = only

Large Pages unused system wide = 25431 (26041344 KB) (alloc incr 4096 KB)
Large Pages configured system wide = 25600 (51200 MB)
Large Page size = 2048 KB

ERROR:
  Failed to allocate shared global region with large pages, unix errno = 12.
  Aborting Instance startup.
  ORA-27137: unable to allocate Large Pages to create a shared memory segment

ACTION:
  Total Shared Global Region size is 51200 MB. Increase the number of
  unused large pages to atleast 25600 (51200 MB) to allocate 100% Shared Global
  Region with Large Pages.
***********************************************************

This would have flagged the problem immediately. We would have noticed that there were shared memory segments stuck in memory that couldn’t be reallocated, prompting a server reboot, which incidentally did solve the problem.
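One way to look for leftover segments like these is to list shared memory with ipcs -m and flag any large segment that has no attached processes. The Python sketch below parses that listing. Note the assumptions: the sample text is invented for illustration (it is not output from the client’s server), the function names and the 1 GB threshold are hypothetical, and a segment with zero attachments is only a candidate to investigate, not proof of a problem.

```python
def parse_ipcs_shm(ipcs_text):
    """Parse the segment rows of Linux `ipcs -m` output into dicts."""
    segments = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; skip headers and blank lines.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        segments.append({
            "key": fields[0],
            "shmid": int(fields[1]),
            "owner": fields[2],
            "perms": fields[3],
            "bytes": int(fields[4]),
            "nattch": int(fields[5]),
        })
    return segments

def unattached_segments(segments, owner="oracle", min_bytes=1 << 30):
    """Return large segments owned by `owner` with no attached processes."""
    return [
        s for s in segments
        if s["owner"] == owner and s["nattch"] == 0 and s["bytes"] >= min_bytes
    ]

# Invented sample in the layout printed by `ipcs -m` on Linux.
SAMPLE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 98304      oracle     640        53687091200 0
0x5e3c1a2c 131073     oracle     640        2097152    45
"""

if __name__ == "__main__":
    for seg in unattached_segments(parse_ipcs_shm(SAMPLE)):
        print(seg["shmid"], seg["bytes"] // (1 << 20), "MB, no attached processes")
```

On a real server you would feed in the output of ipcs -m rather than SAMPLE and compare anything it flags against the instances that are actually running.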

For questions about huge pages in general and how to configure them, see Metalink doc ID 361468.1 or give us a call. It is important to note that huge pages are not compatible with Automatic Memory Management (AMM), which is configured through the instance parameters MEMORY_TARGET and MEMORY_MAX_TARGET. You can, however, still use the legacy Automatic Shared Memory Management (ASMM), configured through the instance parameters SGA_TARGET, SGA_MAX_SIZE, and PGA_AGGREGATE_TARGET.
