AIX 6.1 won't boot; I need help writing an incident report

The issue is now solved, but I don't understand what happened or how I solved it! Now I have to report to my boss exactly what happened.

Yesterday, after a WebSphere performance-tuning task, I had to reboot my P5. After the reboot, the system was unreachable at its IP address on en3.

I had to access the system from the HMC console. The system was at runlevel 5, and I got the login prompt only after an hour and a half of waiting.

The en3 adapter was down, and when I tried to bring it up, the command never finished and just hung... I also tried with smit, with the same result...

I tried to configure en0 with the values of en3 (en0 was unplugged), but I hit the same problem as with en3: when I pressed Done to confirm the new values, smit hung.

I tried rebooting again, and at boot time the system stayed at loading daemons for at least two hours (runlevel 5), with no errors on the HMC live console and no login prompt (the system is located at a remote site).

After those two hours, I decided to call my colleague to boot the system from its installation DVD. I then mounted the rootvg volume, replaced the original inittab, and restored the network configuration to the original values on en3. At the next boot, the system came online in about 10 minutes, in good condition.

I have now replaced the DVD's inittab with "my original" one, and the errlog shows many errors like this:

---------------------------------------------------------------------------
LABEL:          GOENT_LINK_DOWN
IDENTIFIER:     EC0BCCD4

Date/Time:       Mon Jun 16 19:13:43 CEST 2014
Sequence Number: 11466
Machine Id:      00031A1FD600
Node Id:         gde1mo
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   ent0
Resource Class:  adapter
Resource Type:   14106902
Location:        U787B.001.DNWFS3S-P1-C5-T1

VPD:
      10/100/1000 Base-TX PCI-X Adapter:
        Part Number.................03N6525
        FRU Number..................03N6525
        EC Level....................H14007$
        Brand.......................H0
        Manufacture ID..............YL1021
        Network Address.............00145EB72A12
        ROM Level.(alterable).......GOL021

Description
ETHERNET DOWN

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
FILE NAME
line: 346 file: goent_limbo.c
PCI ETHERNET STATISTICS
0000 0007 0061 0853 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 BB80 00F0 0068 0C00 0000 0000 01A0 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000
DEVICE DRIVER INTERNAL STATE
5555 5555 0000 0000 0000 0000
SOURCE ADDRESS
0014 5EB7 2A12

Diagnostic Analysis
Diagnostic Log sequence number: 4124
Resource tested:        ent0
Menu Number:            25C1902
Description:

No trouble was found with this adapter.  However
Error Log Analysis indicates that there recently may
have been a network problem.

If your Ethernet adapter is connected to a network,
and if you are experiencing problems with network
communications, check for a loose or defective
cable or connection.

If a switch or another system is directly attached
to the Ethernet adapter, verify it is powered up,
configured, and functioning correctly.
---------------------------------------------------------------------------
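For an incident report, the header fields of an entry like the one above (label, date, class, type, resource) can be pulled out with a short script. This is only a minimal sketch in Python: the function name `parse_errpt_header` is hypothetical, and the sample text is copied from the header of the entry above. As far as I know, the errpt documentation describes class `H` as hardware and type `TEMP` as a temporary condition (as opposed to `PERM`), but that reading should be checked against the documentation for your AIX level.

```python
def parse_errpt_header(entry_text):
    """Collect 'Key: value' pairs from the header of a detailed errpt entry.

    Hypothetical helper for illustration; it only splits each line on the
    first colon and ignores lines without a value.
    """
    fields = {}
    for line in entry_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Header lines copied from the entry pasted above.
sample_header = """LABEL:          GOENT_LINK_DOWN
IDENTIFIER:     EC0BCCD4

Date/Time:       Mon Jun 16 19:13:43 CEST 2014
Sequence Number: 11466
Class:           H
Type:            TEMP
Resource Name:   ent0
"""

fields = parse_errpt_header(sample_header)
for name in ("LABEL", "Date/Time", "Class", "Type", "Resource Name"):
    print(f"{name}: {fields[name]}")
```

Note that this entry names ent0, the adapter behind the unplugged en0, not en3.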

Is it possible that interface en0 (unplugged), which I configured with the same values as en3, prevented the system from booting?

Is the error I pasted a "warning" or a "fatal" error?

Can an AIX boot be stopped, or slowed down this much, by an interface configuration mismatch?!

Any other ideas?

ilRobby

Posted 2014-06-17T13:55:47.753

Reputation: 11

Answers

Whether it's unplugged or not, AIX might get confused if you have the same IP for en3 as en0. I honestly can't tell you since I've never done it. When you changed your en0 values did you change your en3 values to something different?

Sounds like with the dvd inittab you had no problems. What I'd do first is set all your en3 and en0 values back to what they originally were, and then do your inittab switch. See if booting off the pristine DVD inittab has no issues, and then see if booting off your original inittab then causes you trouble. If that happens then post the two inittabs so we can see the difference.
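Comparing the two inittabs can be done with any diff tool. Below is a minimal sketch in Python using the standard `difflib` module. The function name `diff_inittabs`, the file labels, and the sample entries are all hypothetical and are not taken from your system; in practice you would read the two real files instead.

```python
import difflib


def diff_inittabs(dvd_text, original_text):
    """Return the unified-diff lines between two inittab contents."""
    return list(difflib.unified_diff(
        dvd_text.splitlines(),
        original_text.splitlines(),
        fromfile="inittab.dvd",        # label only, hypothetical name
        tofile="inittab.original",     # label only, hypothetical name
        lineterm="",
    ))


# Hypothetical sample contents for illustration only.
dvd_inittab = """init:2:initdefault:
rc:23456789:wait:/etc/rc 2>&1
"""
original_inittab = """init:2:initdefault:
rc:23456789:wait:/etc/rc 2>&1
myapp:2:wait:/usr/local/bin/start_myapp
"""

for line in diff_inittabs(dvd_inittab, original_inittab):
    print(line)
```

Lines starting with `+` exist only in your original inittab. Entries with the `wait` action are worth a close look, since init does not move on to the next entry until such a command finishes.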

Also, just curious but have you confirmed this p-series is not part of a cluster? And that there are no etherchannels configured?

ben

Posted 2014-06-17T13:55:47.753

Reputation: 166

I'm a bit confused. You mentioned the inittab config was changed from the DVD, not the eth conf from the DVD. You mentioned that when you replaced the inittab with the one from the DVD it booted, but changing the inittab back to what you had was able to recreate the issue. Is this correct? If so, put all the network configs back to how they were and then test the two inittabs. If the one from the DVD works and the one your system has been using doesn't, then we can check the differences in the inittab to see if there's an issue. – ben – 2014-06-20T18:03:45.063

The reason I asked about clustering is that if the two servers are in a cluster, you could have had a failover event which affected what you're seeing. If you have an etherchannel, then changing the configs as you did might also have messed something up, depending on the etherchannel setup. Since you have no cluster and likely no etherchannel, it's back to testing the inittab. Hope that makes sense! – ben – 2014-06-20T18:04:36.647

Hi @ben, I'm not an AIX expert. I tried that workaround (en3 > en0) to understand whether the problem was hw/driver/kernel module... errpt showed nothing about the system interfaces after the first reboot... No, I didn't make any change to en3. I only set the same ip-mask-gw on en0 (no domain and no ns). Before setting up en0, when I tried to bring up en3 with smit, the command hung at least three times... Right, I should first have tried replacing only the eth conf from the DVD, but I didn't think the network config could compromise the boot sequence... – ilRobby – 2014-06-17T18:50:35.553

This is a small p5 system with only one LPAR, which provides an old, custom, rather useless Lotus Domino application that cannot be ported to another platform, or maybe only to Windows, but that is another question! A clustered scenario would be useless and expensive for this application. I gave myself two or three hours for troubleshooting precisely because it isn't a critical service! – ilRobby – 2014-06-17T18:51:08.077