Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

Hello,

I have Oracle 11.2.0.2 RAC installed on Solaris 10.
Every few days I have stability problems - one node reboots the other (or the other way round), and I can't pin down why it happens...
I have two nodes, named Mars and Venus. Today, for example, node Mars restarted the second node, Venus (via IPMI) - but why... The database alert log only says there were timeouts, so presumably a communication path failed, but nothing specific...

the restart happened around 02:30…
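A quick way to isolate the incident window is to grep the alert log for the timeout and eviction messages. A minimal sketch below runs against a mock excerpt (the file name is illustrative; the message text matches what is pasted further down; the real file on this system would live under diagnostic_dest, /md1/app/oracle):

```shell
# Illustrative sketch only: a mock alert-log excerpt stands in for the real file.
cat > /tmp/alert_excerpt.log <<'EOF'
Thu Jul 07 02:31:43 2011
IPC Send timeout detected. Sender: ospid 1940 [oracle@Mars (MMON)]
Thu Jul 07 02:33:18 2011
Evicting instance 1 from cluster
EOF
# Count the messages that bracket the incident window:
grep -Ec 'IPC Send timeout detected|Evicting instance' /tmp/alert_excerpt.log
```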


DB alert log on Mars:



Thread 2 advanced to log sequence 5461 (LGWR switch)

Current log# 10 seq# 5461 mem# 0: +DATA/lnred/onlinelog/group_10.308.743814917

Thu Jul 07 01:02:23 2011

Archived Log entry 3047 added for thread 2 sequence 5460 ID 0xa9ba4d33 dest 1:

Thu Jul 07 02:31:28 2011

minact-scn master exiting with err:12751

Thu Jul 07 02:31:43 2011

IPC Send timeout detected. Sender: ospid 1940 [oracle@Mars (MMON)]

Receiver: inst 1 binc 36502 ospid 1949

IPC Send timeout to 1.0 inc 32 for msg type 8 from opid 26

IPC Send timeout: Terminating pid 26 osid 1940

Thu Jul 07 02:31:46 2011

IPC Send timeout detected. Sender: ospid 18305 [oracle@Mars (M000)]

Receiver: inst 1 binc 36502 ospid 1949

IPC Send timeout to 1.0 inc 32 for msg type 8 from opid 75

IPC Send timeout: Terminating pid 75 osid 18305

Thu Jul 07 02:32:07 2011

IPC Send timeout detected. Sender: ospid 1908 [oracle@Mars (LMD0)]

Receiver: inst 1 binc 36502 ospid 1949

Thu Jul 07 02:32:10 2011

Restarting dead background process MMON

Thu Jul 07 02:32:11 2011

MMON started with pid=26, OS id=19687

Thu Jul 07 02:32:15 2011

IPC Send timeout detected. Sender: ospid 1928 [oracle@Mars (LGWR)]

Receiver: inst 1 binc 36528 ospid 1951

IPC Send timeout to 1.1 inc 32 for msg type 73 from opid 20

Thu Jul 07 02:32:19 2011

Communications reconfiguration: instance_number 1

Thu Jul 07 02:32:24 2011

IPC Send timeout to 1.0 inc 32 for msg type 65521 from opid 12

Thu Jul 07 02:32:29 2011

IPC Send timeout detected. Sender: ospid 1928 [oracle@Mars (LGWR)]

Receiver: inst 1 binc 36528 ospid 1955

IPC Send timeout to 1.2 inc 32 for msg type 73 from opid 20

Thu Jul 07 02:33:18 2011

Evicting instance 1 from cluster

Waiting for instances to leave: 1

Thu Jul 07 02:33:38 2011

Remote instance kill is issued with system inc 34

Remote instance kill map (size 1) : 1

LMON received an instance eviction notification from instance 2

The instance eviction reason is 0x20000000

The instance eviction map is 1

Thu Jul 07 02:33:48 2011

Waiting for instances to leave: 1

Thu Jul 07 02:34:18 2011

Waiting for instances to leave: 1

Thu Jul 07 02:34:40 2011

Reconfiguration started (old inc 32, new inc 36)

List of instances:

2 (myinst: 2)

Global Resource Directory frozen

* dead instance detected - domain 0 invalid = TRUE

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Thu Jul 07 02:34:40 2011

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Thu Jul 07 02:34:40 2011

LMS 1: 1 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Thu Jul 07 02:34:40 2011

Instance recovery: looking for dead threads

Beginning instance recovery of 1 threads

Started redo scan

Completed redo scan

read 3662 KB redo, 135 data blocks need recovery

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Reconfiguration complete

Started redo application at

Thread 1: logseq 10409, block 19647

Recovery of Online Redo Log: Thread 1 Group 5 Seq 10409 Reading mem 0

Mem# 0: +DATA/lnred/onlinelog/group_5.303.743814897

Completed redo application of 0.02MB

Completed instance recovery at

Thread 1: logseq 10409, block 26971, scn 2612073761

62 data blocks read, 121 data blocks written, 3662 redo k-bytes read

Thread 1 advanced to log sequence 10410 (thread recovery)

Redo thread 1 internally disabled at seq 10410 (SMON)

Thu Jul 07 02:34:41 2011

Archived Log entry 3048 added for thread 1 sequence 10409 ID 0xa9ba4d33 dest 1:

Thu Jul 07 02:34:42 2011

ALTER SYSTEM SET service_names='lnwyd_service','lnred_service','backup' SCOPE=MEMORY SID='lnred2';

Thu Jul 07 02:34:42 2011

ARC3: Archiving disabled thread 1 sequence 10410

Archived Log entry 3049 added for thread 1 sequence 10410 ID 0xa9ba4d33 dest 1:

Thu Jul 07 02:34:53 2011

minact-scn: Inst 2 is now the master inc#:36 mmon proc-id:19687 status:0x7

minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.9bb067c4 gcalc-scn:0x0000.9bb06ebf

Thu Jul 07 02:35:51 2011

SKGXP: ospid 1908: network interface with IP address 169.254.220.65 no longer running (check cable)

Thu Jul 07 02:36:52 2011

SKGXP: ospid 1908: network interface with IP address is now running

Thu Jul 07 02:37:28 2011

IPC Send timeout detected. Sender: ospid 1886 [oracle@Mars (PMON)]

Receiver: inst 1 binc 36502 ospid 1949

IPC Send timeout to 1.0 inc 32 for msg type 8 from opid 2

Thu Jul 07 02:41:17 2011

db_recovery_file_dest_size of 512000 MB is 80.48% used. This is a

user-specified limit on the amount of space that will be used by this

database for recovery-related files, and does not reflect the amount of

space available in the underlying filesystem or ASM diskgroup.

Thu Jul 07 02:42:28 2011

Reconfiguration started (old inc 36, new inc 38)

List of instances:

1 2 (myinst: 2)

Global Resource Directory frozen

Communication channels reestablished

Thu Jul 07 02:42:28 2011

* domain 0 valid = 1 according to instance 1

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Thu Jul 07 02:42:28 2011

LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Thu Jul 07 02:42:28 2011

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Submitted all GCS remote-cache requests

Fix write in gcs resources

Thu Jul 07 02:42:29 2011

minact-scn: Master returning as live inst:1 has inc# mismatch instinc:0 cur:38 errcnt:0

Reconfiguration complete

Thu Jul 07 02:56:18 2011

db_recovery_file_dest_size of 512000 MB is 80.48% used. This is a

user-specified limit on the amount of space that will be used by this

database for recovery-related files, and does not reflect the amount of

space available in the underlying filesystem or ASM diskgroup.

Thu Jul 07 04:09:37 2011

Thread 2 advanced to log sequence 5462 (LGWR switch)

Current log# 8 seq# 5462 mem# 0: +DATA/lnred/onlinelog/group_8.306.743814913

Thu Jul 07 04:09:39 2011





DB alert log on Venus:



Wed Jul 06 22:09:06 2011

Deleted Oracle managed file +FRA/lnred/archivelog/2011_07_06/thread_1_seq_10394.415.755808529

Deleted Oracle managed file +FRA/lnred/archivelog/2011_07_06/thread_2_seq_5418.421.755808553

Archived Log entry 3044 added for thread 1 sequence 10408 ID 0xa9ba4d33 dest 1:

Thu Jul 07 02:42:16 2011

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Private Interface 'aggr2:1' configured from GPnP for use as a private interconnect.

[name='aggr2:1', type=1, ip=169.254.16.251, mac=00-1b-21-49-f4-34, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]

Public Interface 'aggr1' configured from GPnP for use as a public interface.

[name='aggr1', type=1, ip=10.122.10.150, mac=00-1b-21-49-f4-30, net=10.122.10.0/23, mask=255.255.254.0, use=public/1]

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST

Autotune of undo retention is turned on.

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Starting up:

Oracle Database 11g Release 11.2.0.2.0 - 64bit Production

With the Real Application Clusters option.

Using parameter settings in server-side pfile /md1/app/oracle/product/11.2.0/db1/dbs/initlnred1.ora

System parameters with non-default values:

processes = 300

sessions = 480

timed_statistics = TRUE

spfile = "+DATA/lnred/spfilelnred.ora"

nls_language = "POLISH"

nls_territory = "POLAND"

nls_length_semantics = "CHAR"

resource_manager_plan = ""

sga_target = 8G

control_files = "+DATA/lnred/controlfile/current.256.741561271"

db_block_size = 8192

compatible = "11.2.0.0.0"

cluster_database = TRUE

db_create_file_dest = "+DATA"

db_recovery_file_dest = "+FRA"

db_recovery_file_dest_size= 500G

thread = 1

undo_tablespace = "UNDOTBS1"

instance_number = 1

remote_login_passwordfile= "EXCLUSIVE"

db_domain = ""

dispatchers = "(PROTOCOL=TCP) (SERVICE=lnredXDB)"

remote_listener = "lnscan.adln.local:1521"

audit_file_dest = "/md1/app/oracle/admin/lnred/adump"

optimizer_features_enable= "11.1.0.7"

audit_trail = "DB"

db_name = "lnred"

open_cursors = 300

pga_aggregate_target = 8G

aq_tm_processes = 1

diagnostic_dest = "/md1/app/oracle"

Cluster communication is configured to use the following interface(s) for this instance

169.254.16.251

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

Thu Jul 07 02:42:25 2011

PMON started with pid=2, OS id=1914





OS messages on Venus:



Jul 7 02:34:38 Venus unix: [ID 836849 kern.notice]

Jul 7 02:34:38 Venus ^Mpanic[cpu6]/thread=ffffffff9bec7e60:

Jul 7 02:34:38 Venus genunix: [ID 156897 kern.notice] forced crash dump initiated at user request

Jul 7 02:34:38 Venus unix: [ID 100000 kern.notice]

Jul 7 02:34:38 Venus genunix: [ID 655072 kern.notice] fffffe80017ceeb0 genunix:kadmin+517 ()

Jul 7 02:34:38 Venus genunix: [ID 655072 kern.notice] fffffe80017cef00 genunix:uadmin+c7 ()

Jul 7 02:34:38 Venus genunix: [ID 655072 kern.notice] fffffe80017cef10 unix:brand_sys_syscall+21d ()

Jul 7 02:34:38 Venus unix: [ID 100000 kern.notice]

Jul 7 02:34:38 Venus genunix: [ID 672855 kern.notice] syncing file systems...

Jul 7 02:34:39 Venus genunix: [ID 433738 kern.notice] [6]

Jul 7 02:34:39 Venus genunix: [ID 733762 kern.notice] 220

Jul 7 02:34:40 Venus genunix: [ID 433738 kern.notice] [6]

Jul 7 02:34:40 Venus genunix: [ID 733762 kern.notice] 205

Jul 7 02:34:42 Venus genunix: [ID 433738 kern.notice] [6]



not much info there, but on Venus, in the grid log grid\log\venus\crsd\crsd.log, I found:



2011-07-07 02:02:10.052: [ CRSPE][47] {1:3804:25345} Processing PE command id=25249. Description: [Stat Resource : 3703910]

2011-07-07 02:02:10.054: [UiServer][48] {1:3804:25345} Done for ctx=32ca350

Right here there is some garbage like this (maybe it's the same communication problem): [NULL][NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL] [NUL]

[ clsdmt][3]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=venusDBG_CRSD))

2011-07-07 02:41:48.808: [ clsdmt][3]PID for the Process [1619], connkey 1



and in the Mars grid log grid\log\mars\crsd\crsd.log:



2011-07-07 02:19:42.393: [ AGFW][42] {0:8:6} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:58

2011-07-07 02:19:42.393: [ AGFW][42] {0:8:6} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:58

2011-07-07 02:20:48.482: [ CRSCOMM][39][FFAIL] Ipc: Couldnt clscreceive message, no message: 11

2011-07-07 02:20:48.482: [ CRSCOMM][39] Ipc: Client disconnected.

2011-07-07 02:20:48.482: [ CRSCOMM][39][FFAIL] IpcL: Listener got clsc error 11 for memNum. 8

2011-07-07 02:20:48.482: [ CRSCOMM][39] IpcL: connection to member 8 has been removed

2011-07-07 02:20:48.483: [CLSFRAME][39] Removing IPC Member:{Relative|Node:0|Process:8|Type:3}

2011-07-07 02:20:48.483: [CLSFRAME][39] Disconnected from AGENT process: {Relative|Node:0|Process:8|Type:3}

2011-07-07 02:20:48.483: [ AGFW][42] {2:44971:2690} Agfw Proxy Server received process disconnected notification, count=1

2011-07-07 02:20:48.483: [ AGFW][42] {2:44971:2690} /md1/app/11.2.0/grid/bin/oraagent_oracle disconnected.

2011-07-07 02:20:48.483: [ AGFW][42] {2:44971:2690} Agent /md1/app/11.2.0/grid/bin/oraagent_oracle[18546] stopped!

2011-07-07 02:20:48.483: [ CRSCOMM][42] {2:44971:2690} IpcL: removeConnection: Member 8 does not exist.

2011-07-07 02:20:48.483: [ AGFW][42] {2:44971:2690} Created alert : (:CRSAGF00129:) : The agent: /md1/app/11.2.0/grid/bin/oraagent has been disabled because it exceeded the maximum failures (5) over a period of 600 seconds.



around this time the Venus OS was restarted (via IPMI from Mars)



2011-07-07 02:34:39.963: [ CRSSE][32] Forwarding Node Leave to PE for: venus

2011-07-07 02:34:39.965: [ CRSCCL][34]Reconfig event received by clssgsgrpstat



2011-07-07 02:34:39.965: [ CRSCCL][34]cclGetMemberData called

2011-07-07 02:34:39.979: [ CRSCCL][34]Member (2, 307107, mars:11.2.0.2.0) @ found.



2011-07-07 02:34:39.999: [CLSFRAME][34] CCL MEMBER LEFT:1:1:CRSD:venus

2011-07-07 02:34:39.999: [CLSFRAME][34] Disconnected from CRSD:venus process:



and where to go from here... do you have any ideas?

regards,
Robert
Tomasz Serwański (Microsoft Dynamics CRM business consultant)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

Do you have ntpd running and the machines' clocks synchronized?
Maciek Łukaszewicz (Specialist, PPL)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

Do you have a log from ASM?
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

- the time is synchronized,

[oracle@Venus#~]
>ps -ef | grep ntpd
root 600 1 0 lip 07 ? 0:04 /usr/lib/inet/xntpd
oracle 27886 14589 0 15:22:45 pts/1 0:00 grep ntpd

[oracle@Mars#~]
>ps -ef | grep ntpd
root 599 1 0 lip 05 ? 0:10 /usr/lib/inet/xntpd
oracle 8569 20075 0 15:30:58 pts/1 0:00 grep ntpd
[oracle@Mars#~]

- the ntpd daemon is running on both servers...

- the ASM log from that period:

1. On Venus, the log from that day starts only after the restart, so it's probably just the 'classic' startup trace (below); on Mars there is nothing from that day at all... (\\venus\root\md1\app\11.2.0\grid\rdbms\log\...)

Dump file /md1/app/11.2.0/grid/rdbms/log/+asm1_ora_1549.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /md1/app/11.2.0/grid
System name: SunOS
Node name: Venus
Release: 5.10
Version: Generic_144489-10
Machine: i86pc
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
Unix process pid: 1549, image: oracle@Venus

*** 2011-07-07 02:41:29.975
2011-07-07 02:41:29.936: [ default]failed to initialize skgp context
2011-07-07 02:41:29.979: [ default]slos op : sslssreghdlr
2011-07-07 02:41:29.979: [ default]slos dep : Error 0 (0)
2011-07-07 02:41:29.979: [ default]slos loc : sskgpinit1
2011-07-07 02:41:29.979: [ default]slos info:
[ CLWAL]clsw_Initialize: OLR initlevel [30000]
2011-07-07 02:41:30.012: [ default]a_init: Unable to get log name. Retval:[-4]
2011-07-07 02:41:30.231: [ GPNP]clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:313] Result: (0) CLSGPNP_OK. got ASM-Profile.SPFile='+FRA/lncluster/asmparameterfile/registry.253.741483007'
kfspFileNameGet name=+FRA/lncluster/asmparameterfile/registry.253.741483007 ok=1
kgfspb_open dscstr=""
Initialized kgfd context:fffffd7ffcb0c188
2011-07-07 02:41:30.447: [ default]failed to initialize skgp context
2011-07-07 02:41:30.447: [ default]slos op : sslssreghdlr
2011-07-07 02:41:30.447: [ default]slos dep : Error 0 (0)
2011-07-07 02:41:30.447: [ default]slos loc : sskgpinit1
2011-07-07 02:41:30.447: [ default]slos info:
[ CLWAL]clsw_Initialize: OLR initlevel [30000]
2011-07-07 02:41:30.448: [ default]a_init: Unable to get log name. Retval:[-4]
2011-07-07 02:41:30.500: [ GPNP]clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:313] Result: (0) CLSGPNP_OK. got ASM-Profile.DiscoveryString='/oracle_dev/*'
kgfspb_shallow_discover dscstr="/oracle_dev/*"
kgfdp_query grp FRA
kgfdp_create
kgfdp_initCall ausize=1048576 blksize=4096 grptype=1 dskcnt=1
kgfdp_doQuery sync=2
kgfdp_read
kgfdp_readMeta totPst=2 blksz=4096
kgfdp_readMeta valid blks=2
kgfdp_read copyCnt = 1
kgfdp_read okDskSet size = 1
kgfdpMetaSet_readDta
kgfdp_readDta
kgfdp_read read data blks; checking quorum: okDskSet size = 1
kgfdp_syncDevices dn=0 found=1 valid=1 [/oracle_dev/fra]
POST res = 1
kgfdp_dump
=============== PST ====================
state: 1
--------------- HDR --------------------
next: 1
last: 1
pst count: 1
pst locations: 0
dta size: 1
version: 1
ASM version: 186646528 = 11.2.0.0.0
--------------- LOC MAP ----------------
0: dirty 0 cur_loc: 0 stable_loc: 0
1: dirty 0 cur_loc: 1 stable_loc: 1
--------------- DTA --------------------
0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 1819637623 parts:
kgfdp_query res 1
kgfspb_process_pst returning 1
kgfspb_read from mirror 0 src fffffd7ffc700000
kgfspb_read from mirror 0 src fffffd7ffc700000
Terminating kgfd context fffffd7ffcb0c188

*** 2011-07-07 02:41:31.507
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1

I've only been administering RAC for 2 months :/ so I have essentially no experience... Before that, several years of 8i without RAC... Can anything be concluded from the earlier logs about which polling path failed? I.e., if I understand correctly, Clusterware checks node availability by e.g. sending a message over the network and receiving it over disk, thereby checking both paths... and vice versa... so what failed here - no response over the network, or from disk?
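On the question of which path failed: as far as I understand it, CSS runs two independent heartbeats - a network heartbeat over the private interconnect and a disk heartbeat against the voting disk - and missed network heartbeats show up in ocssd.log as clssnmPollingThread countdown messages. A minimal sketch against a mock log line (the path and the exact message wording are assumptions based on the usual 11.2 CSSD format):

```shell
# Mock excerpt; the real file would be $GRID_HOME/log/<node>/cssd/ocssd.log
# (message wording is an assumption based on typical 11.2 CSSD output).
cat > /tmp/ocssd_excerpt.log <<'EOF'
2011-07-07 02:33:10.123: [    CSSD]clssnmPollingThread: node venus (1) at 50% heartbeat fatal, removal in 14.520 seconds
EOF
# Missed *network* heartbeats surface as "heartbeat fatal" countdowns:
grep -c 'heartbeat fatal' /tmp/ocssd_excerpt.log
```

Disk-heartbeat trouble, by contrast, shows up as voting-file I/O messages from the disk ping threads in the same log, so grepping for both usually tells you which path went first.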

regards,
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

eh, I think I checked it wrong earlier, but now I did it like this and here it is:

[oracle@Venus#~]
# /usr/sbin/ntpq -p
remote refid st t when poll reach delay offset disp
==============================================================================
*10.122.10.30 goblin.nask.net 3 m 36 64 376 0.32 0.078 1.11

[oracle@Mars#~]
>/usr/sbin/ntpq -p
remote refid st t when poll reach delay offset disp
==============================================================================
*10.122.10.30 goblin.nask.net 3 m 33 64 376 0.32 0.078 1.11
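The offset column (milliseconds) is the number worth watching here; a minimal sketch that checks it against a threshold, using a line from the output above as sample input (the 128 ms limit is an arbitrary illustration, not an Oracle requirement):

```shell
# Sample line copied from the ntpq -p output above:
line='*10.122.10.30 goblin.nask.net 3 m 33 64 376 0.32 0.078 1.11'
# Field 9 is the offset in milliseconds:
offset=$(echo "$line" | awk '{print $9}')
awk -v o="$offset" 'BEGIN { print ((o < 128 && o > -128) ? "in sync" : "drifting") }'
```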
Tomasz Serwański (Microsoft Dynamics CRM business consultant)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

is the daemon started with -x?
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

and how do I check that?

(account deleted)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

Robert Grzesiak:
and how do I check that?

ps -ef | grep ntp
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

[oracle@Venus#~]
>ps -ef | grep ntp
root 587 1 0 lip 09 ? 0:04 /usr/lib/inet/xntpd
oracle 6169 3258 0 09:30:15 pts/1 0:00 grep ntp

>ps -ef | grep ntp
root 599 1 0 lip 05 ? 0:18 /usr/lib/inet/xntpd
oracle 17001 20075 0 09:31:23 pts/1 0:00 grep ntp
[oracle@Mars#~]
Tomasz Serwański (Microsoft Dynamics CRM business consultant)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

not quite.. as far as I remember there's an ntpd configuration file somewhere; maybe do:

- find /etc | grep -i ntp | grep -i conf
- read the file it finds; you'll see the ntpd startup parameters in it

I'll make sure I'm not talking nonsense when I get a free moment, but I don't think I am :)
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

\\venus\root\etc\inet\ntp.conf

# ident "@(#)ntp.client 1.3 00/07/17 SMI"
#
# /etc/inet/ntp.client
#
# An example file that could be copied over to /etc/inet/ntp.conf; it
# provides a configuration for a host that passively waits for a server
# to provide NTP packets on the ntp multicast net.
#

multicastclient 224.0.1.1

and in ... ntp.server

...
# Either a peer or server. Replace "XType" with a value from the
# table above.
server 127.127.XType.0
fudge 127.127.XType.0 stratum 0

broadcast 224.0.1.1 ttl 4

enable auth monitor
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable

keys /etc/inet/ntp.keys
trustedkey 0
requestkey 0
controlkey 0
Tomasz Serwański (Microsoft Dynamics CRM business consultant)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

NTP Daemon must be re-configured to support Oracle Grid Infrastructure. The following changes should be made:

- Modify /etc/sysconfig/ntpd. Change the following line:
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid"
TO
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"

- Modify /etc/sysconfig/ntpclient. Change the following for all OPTIONS lines:
OPTIONS="-u ntp:ntp -p /var/opt/ntpclient/ntpclient.pid -c /etc/opt/ntpclient/ntp.conf"
TO
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -c /etc/opt/ntpclien
t/ntp.conf"

- Restart NTPD by executing '/etc/init.d/ntpclient restart' or 'service ntpclient restart'.
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

hmmm, I don't have /etc/sysconfig, but I think you're describing Linux... on Solaris it's xntpd and it doesn't have the options you mention; my xntpd starts without parameters and has the config I posted earlier; xntpd itself has options you can pass it, but not -x or -u
Tomasz Serwański (Microsoft Dynamics CRM business consultant)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

take a look at the Oracle documents and search for the equivalents; the -x option makes ntpd, when it detects a time discrepancy on a node, apply the correction gradually as a series of small adjustments instead of one large jump; that doesn't have to be the cause of the reboots you describe, but it can be one of them - and I'd bet Solaris has a similar parameter for ntpd
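On Solaris 10 the slewing behaviour is not a command-line switch for xntpd; it goes into /etc/inet/ntp.conf. Assuming the Oracle Grid Infrastructure notes for Solaris apply here (worth verifying against My Oracle Support before relying on it), the usual lines are:

```
slewalways yes
disable pll
```

then restart the daemon, e.g. with `svcadm restart ntp` on Solaris 10.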

also take a look at the system/kernel parameters; oracle suggests a number of changes for linux, and probably for solaris too; I found a similar thread on the net where the solution was:

===

After setting the Oracle-suggested parameters below, we are not seeing the LMON issue.

DRM Parameters :
_gc_undo_affinity= FALSE
_gc_affinity_time=0

Stack trace parameter:

_ksb_disable_diagpid = TRUE

===

Tomasz Serwański edited this post on 11.07.11 at 11:45
Robert Grzesiak (oracle dba, LexisNexis Sp. z o.o.)

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

ok, thanks...
I'm also suspicious of the storage array, because it didn't have the original driver for Solaris and the admin had to improvise something... :) too bad it's hard to tell from these logs what failed to respond - whether the problem was with the polling path over the network or over disk...

Subject: stability problems with Oracle 11.2.0.2 RAC on Solaris 10

Where do the ASM volumes come from? FC, iSCSI? Check whether everything is OK there.
