0

The Problem:

When I try to run gather stats over OpenVPN on my Oracle 19c database, SQL Developer doesn't return the typical "PL/SQL procedure successfully completed" message if it runs for more than a certain amount of time.

Apparently, the connection hangs after a while, and I either need to disconnect from OpenVPN or kill SQL Developer in Windows Task Manager to close it.

My Oracle 19c database and OpenVPN server are on different cloud providers.

Running gather stats on this database typically takes about half an hour.

Running the gather stats command on SQL Developer

What I checked:

  1. Nothing unusual on the Iptables and OpenVPN logs on the OpenVPN Server or on the listener and alert logs on the Oracle 19c Server.

  2. net.ipv4.tcp_keepalive_time and net.netfilter.nf_conntrack_tcp_timeout_established are set to their default values of 7200 (2 hours) and 432000 (5 days) on both machines.

  3. If I connect to the database as system and run:

    select x.sid, x.serial#, x.username, x.status, x.osuser, x.machine, x.program, x.event, x.state, sql.sql_text from v$sqlarea sql, v$session x where x.sql_hash_value = sql.hash_value and x.sql_address = sql.address and x.username = 'myuser';

After about half an hour, I noticed that the session for gather stats is inactive. So I assume that gather stats does indeed run and finish successfully, but just doesn’t return the aforementioned output message.

Gather stats running on the database

Gather stats session inactive after about half an hour

What I tried:

  1. On a smaller database in the same instance, running gather stats over OpenVPN returns the aforementioned success message. This one takes around 10 minutes.

  2. Connecting directly (without OpenVPN) to the database by adding my IP address to the firewall of the cloud provider and running gather stats also returns the aforementioned success message.

  3. Generating a SSH public/private key pair on the Oracle 19c server and using SSH Hosts on SQL Developer, but the connection is very unstable/always resetting.

  4. Setting up a Dante proxy server. Apparently, SQL Developer can only use some kind of special proxy server.

  5. Setting up a IPSEC VPN with StrongSwan. My Windows 10 couldn't establish a connection with it for some reason.

js1018
  • 3
  • 4
  • Not wanting to rain on your parade, but doesn't '19 do this all by itself, automatically? – Phill W. Feb 01 '22 at 13:46
  • If that's the case, I wasn't aware of that. Even so, the problem remains. I haven't tried, but supposedly this will also happen with any other PL/SQL script that runs between 10 and 30 minutes over OpenVPN. – js1018 Feb 04 '22 at 12:59

1 Answers1

0

First off, I was able to confirm that gather stats does indeed finish successfully by running:

select * from v$session_longops where opname like '%Schema%' order by start_time desc;

After that I ran the following tcpdump commands on both servers:

tcpdump -i eth0 -A "(src <myipaddress> or src <myoracle19caddress> or dst <myipaddress> or dst <myoracle19caddress>) and not port ssh and not port openvpn and not icmp" (OpenVPN Server)

tcpdump -i eth0 -A "(src <myipaddress> or src <myopenvpnserveraddress> or dst <myipaddress> or dst <myopenvpnserveraddress>) and not port ssh and not icmp" (Oracle 19c Server)

And discovered that the Oracle 19c server did send the success message, but the OpenVPN server never received it.

After some digging around on various sites, I found out that I misunderstood what net.ipv4.tcp_keepalive_time actually does.

This means that [by default] the keepalive routines wait for two hours (7200 secs) before sending the first keepalive probe, and then resend it every 75 seconds.

After that I found out about a network configuration of the cloud provider where I have my OpenVPN server hosted.

Idle connections

[...] implement 10-minute connection tracking for IP protocols that have a concept of a connection. This means that inbound packets associated with an established connection are permitted as long as at least one packet is sent or received for the connection within the last 10 minutes. If no packets for the connection have been sent or received for 10 minutes or longer, the idle connection's tracking entries are removed. After the connection's tracking entries have been removed, [...] does not permit additional inbound packets until at least one new outbound packet has been sent. This connection tracking applies to all sources and destinations – both internal and external IP addresses.

With this new information I got around this limitation by setting the keepalive time to a value inferior than 10 minutes by running the following command on the Oracle 19c server:

sysctl -w net.ipv4.tcp_keepalive_time=300 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5

And made these changes permanent by saving them to /etc/sysctl.conf.

Finally, SQL Developer receives the success message and closes the connection.

js1018
  • 3
  • 4