34

Infrastructure: Servers in Datacenter, OS - Debian Squeeze, Webserver - Apache 2.2.16


Situation:

The live server is in use by our customers every day, which makes it impossible to test adjustments and improvements on it. Therefore we would like to duplicate the inbound HTTP traffic on the live server to one or multiple remote servers in real time. The traffic has to be passed to the local webserver (in this case Apache) AND to the remote server(s). This lets us adjust configurations and run different/updated code on the remote server(s) for benchmarking and comparison with the current live server. Because of the client structure, the webserver currently listens on approx. 60 additional ports besides 80 and 443.


Question: How can this duplication to one or multiple remote servers be implemented?

We have already tried:

  • agnoster duplicator - this would require one open session per port, which is not practical for us. (https://github.com/agnoster/duplicator)
  • kklis proxy - only forwards traffic to the remote server, but does not pass it to the local webserver. (https://github.com/kklis/proxy)
  • iptables - DNAT only forwards the traffic, but does not pass it to the local webserver.
  • iptables - TEE only duplicates to servers in the local network -> our servers are not located in the same network due to the structure of the datacenter.
  • The alternatives suggested for the question "duplicate tcp traffic with a proxy" on Stack Overflow (https://stackoverflow.com/questions/7247668/duplicate-tcp-traffic-with-a-proxy) were unsuccessful. As mentioned, TEE does not work with remote servers outside the local network. teeproxy is no longer available (https://github.com/chrislusf/tee-proxy) and we could not find it anywhere else.
  • We added a second IP address (in the same network) and assigned it to eth0:0 (the primary IP address is assigned to eth0). No success combining this new IP or the virtual interface eth0:0 with the iptables TEE function or routes.
  • The alternatives suggested for the question "duplicate incoming tcp traffic on debian squeeze" (Duplicate incoming TCP traffic on Debian Squeeze) were unsuccessful. The cat|nc sessions (cat /tmp/prodpipe | nc 127.0.0.1 12345 and cat /tmp/testpipe | nc 127.0.0.1 23456) are interrupted after every request/connect by a client, without any notice or log. Keepalive did not change this. TCP packets were not transported to the remote system.
  • Additional attempts with different options of socat (HowTo: http://www.cyberciti.biz/faq/linux-unix-tcp-port-forwarding/ , https://stackoverflow.com/questions/9024227/duplicate-input-unix-stream-to-multiple-tcp-clients-using-socat) and similar tools were unsuccessful, because the provided TEE function only writes to the filesystem.
  • Of course, googling and searching for this "problem" or setup was unsuccessful as well.

We are running out of options here.

Is there a method to disable the enforcement of "server in local network" of the TEE function when using IPTABLES?

Can our goal be achieved by different usage of IPTABLES or Routes?

Do you know a different tool for this purpose which has been tested and works for these specific circumstances?

Is there a different source for tee-proxy (which would fit our requirements perfectly, AFAIK)?


Thanks in advance for your replies.

----------

edit: 05.02.2014

here is the python script, which would function the way we need it:

import socket
import sys
import threading
import time

def main(config, errorlog):
    # Append everything written to stderr to the error log.
    sys.stderr = open(errorlog, 'a')

    # One listener thread per config line.
    for settings in parse(config):
        worker = threading.Thread(target=server, args=settings)
        worker.daemon = True
        worker.start()

    # Keep the main thread alive while the listeners run.
    while True:
        time.sleep(60)

def parse(configfile):
    # Each line: <listen port> <local port> <remote ip> <remote port>
    settings = list()
    for line in open(configfile):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        settings.append((int(parts[0]), int(parts[1]), parts[2], int(parts[3])))
    return settings

def server(*settings):
    try:
        dock_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        dock_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        dock_socket.bind(('', settings[0]))
        dock_socket.listen(5)

        while True:
            client_socket = dock_socket.accept()[0]

            # Note: a single recv() returns at most 1024 bytes, so larger
            # requests and responses are truncated.
            client_data = client_socket.recv(1024)
            sys.stderr.write("[OK] Data received:\n %s \n" % client_data)

            print("Forward data to local port: %s" % (settings[1]))
            local_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            local_socket.connect(('127.0.0.1', settings[1]))
            local_socket.sendall(client_data)

            print("Get response from local socket")
            client_response = local_socket.recv(1024)
            local_socket.close()

            print("Send response to client")
            client_socket.sendall(client_response)
            print("Close client socket")
            client_socket.close()

            # Best effort: a failing remote server must not stop the listener.
            try:
                print("Forward data to remote server: %s:%s" % (settings[2], settings[3]))
                remote_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                remote_socket.connect((settings[2], settings[3]))
                remote_socket.sendall(client_data)
                print("Close remote sockets")
                remote_socket.close()
            except socket.error:
                sys.stderr.write("[ERROR] remote forward failed: %s\n" % (sys.exc_info()[1],))
    except:
        print("[ERROR]: %s" % (sys.exc_info(),))
        raise

if __name__ == '__main__':
    main('multiforwarder.config', 'error.log')

Notes on using this script:
This script forwards each configured local port to another local port and to a remote socket server.

Configuration:
Add one line per forwarded port to the config file (the script above reads 'multiforwarder.config'; these notes originally named it port-forward.config). The script splits each config line on spaces into four fields and returns them as settings:
0: local port to listen on
1: local port to forward to
2: remote IP address of the destination server
3: remote port of the destination server

Error messages are stored in the file 'error.log'.
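As an illustration of the config format described above, here is a minimal sketch. The port numbers and the address 192.0.2.10 are made-up placeholders, not values from our setup, and `parse_config_text` is a hypothetical helper that mirrors the script's `parse` function but reads from a string instead of a file.

```python
# Hypothetical sample config: one forwarding rule per line.
# Fields: <listen port> <local port> <remote ip> <remote port>
SAMPLE_CONFIG = """\
8080 80 192.0.2.10 80
8443 443 192.0.2.10 443
"""


def parse_config_text(text):
    """Split each non-empty line on whitespace into the four documented fields."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        rules.append((int(parts[0]), int(parts[1]), parts[2], int(parts[3])))
    return rules


if __name__ == "__main__":
    for rule in parse_config_text(SAMPLE_CONFIG):
        print("listen %d -> local %d and %s:%d" % rule)
```

With roughly 60 ports, the config file would simply contain one such line per port.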

Sise

9 Answers

20

From what you describe, GOR seems to fit your needs. https://github.com/buger/gor/ "HTTP traffic replay in real-time. Replay traffic from production to staging and dev environments."

Arthur Lutz
11

It is impossible. TCP is a stateful protocol. The end user's computer is involved in every step of the connection, and it will never answer two separate servers trying to communicate with it. All you can do is collect all HTTP requests on the webserver or some proxy and replay them. But that will not give you the exact concurrency or traffic conditions of a live server.

Kazimieras Aliulis
  • Duplicating the TCP is impossible-- I'll agree with that. Duplicating the layer 7 traffic isn't. You can capture the requests from the client and play them back to the other servers. Simple 1 request per TCP session playback should be pretty easy. Persistent connections are going to require some thought insofar as how you time the client's additional requests. – Evan Anderson Jan 29 '14 at 15:35
  • @Kazimieras Aliulis: it is not required to communicate with two separate servers. The client communicates with the primary server = the live server. The live server processes the client requests and answers the client. Besides processing and answering the client, the primary server duplicates the requests to the second server = testing server. The responses from the second server to the primary server are discarded/ignored at the primary server and are not forwarded to the client. – Sise Jan 30 '14 at 08:01
  • @Evan Anderson: duplication on HTTP level was our first idea as well, but e.g. apache proxy or similar tools or modules do not allow processing the requests locally and duplicating them to a remote host at the same time. If you have any other idea, please advise! :) We prefer duplication over recording and replaying, to get instant comparison results. – Sise Jan 30 '14 at 08:04
  • @Sise: you could try writing your own HTTP proxy that passes traffic to two servers. It should be pretty easy to do with the Python Twisted framework https://twistedmatrix.com/. – Kazimieras Aliulis Jan 30 '14 at 10:29
  • @Kazimieras Aliulis: that's definitely an alternative! I had never heard of it, but checking it out shows that it would fit our purpose perfectly. We didn't consider Python before, but currently we are looking at the Twisted framework and the possibilities with plain Python as well. I will report back if we succeed! – Sise Jan 31 '14 at 07:45
  • We tried to get the Twisted one going, but switched to regular Python because this was (in our opinion) more straightforward. I have attached the Python script above. The only downside: it is not transparent. It receives data and forwards it to 2 destinations (localhost:different port and remotehost:someport), so if the script malfunctions, the traffic is lost. There need to be some measures to switch ports if the script fails. That leaves the question: is there a way to make it transparent, or do we have to go with a script which switches ports in case of malfunction? – Sise Feb 05 '14 at 08:21
  • We coded our own logger and sender, but we could not send the items as fast as they were received. Therefore we could not compare the results. Afterwards we tried with two applications: one for logging, and a second one, separated from the logging, to send the requests and benchmark the server. But this alternative did not deliver the desired results. We are now working with a combination of the logger and JMeter to resend the requests afterwards. (One part of the JMeter requests is based on the recorded log files; additional scripts cover more complex requests and actions.) – Sise Mar 20 '14 at 14:26
  • !!! Cloning TCP is impossible. BUT we are talking about the TCP traffic; of course we can clone the traffic to one or multiple TCP connections. – Jiang YD Sep 07 '15 at 01:44
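The record-and-replay approach discussed in these comments could look roughly like the sketch below. It assumes the requests were recorded in Apache's common or combined access-log format, from which only the method and path can be recovered; request headers and bodies are not in that log. The function names and the target host are hypothetical, and this is not the logger or sender mentioned above.

```python
import http.client
import re

# Matches the quoted request line in an Apache common/combined log entry,
# e.g. "GET /index.html HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def parse_request(log_line):
    """Return (method, path) from one access-log line, or None if absent."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    return match.group("method"), match.group("path")


def replay(log_lines, host, port=80, timeout=5):
    """Send each logged request to host:port and return the status codes.

    Only the method and path are replayed; bodies and headers are not in
    the access log, so POST requests go out without a body.
    """
    statuses = []
    for line in log_lines:
        parsed = parse_request(line)
        if parsed is None:
            continue  # not a parsable request line
        method, path = parsed
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request(method, path)
            response = conn.getresponse()
            response.read()  # drain the body; the content is discarded
            statuses.append(response.status)
        finally:
            conn.close()
    return statuses
```

A sequential replay like this does not reproduce the concurrency or timing of the live traffic, which is the limitation pointed out in the answer above.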
7

Teeproxy could be used to replicate traffic. The usage is really simple:

./teeproxy -l :80 -a localhost:9000 -b localhost:9001
  • a: production server
  • b: testing server

When you put a HAproxy (with roundrobin) before your webserver you can easily redirect 50% of your traffic to testing site:

         /------------------> production
HAproxy /                 ^
        \                /
         \---- teeproxy -.....> test (responses ignored)
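To illustrate the a/b split at the HTTP level, here is a minimal Python sketch. It is not teeproxy's code, only an illustration under assumptions: the names `send`, `mirror` and `make_handler` are made up, the listening port 8080 is a placeholder, and 9000/9001 are taken from the command above. The client only ever receives the answer of server a; the copy sent to server b is fire-and-forget.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hop-by-hop headers that a proxy should not copy through verbatim.
HOP_BY_HOP = {"connection", "keep-alive", "transfer-encoding", "te", "trailer",
              "upgrade", "proxy-authorization", "proxy-authenticate"}
# Response headers the proxy regenerates itself.
SKIP_RESPONSE = HOP_BY_HOP | {"content-length", "server", "date"}
NETWORK_ERRORS = (OSError, http.client.HTTPException)


def send(target, method, path, headers, body, timeout=5):
    """Issue one request to target=(host, port); return (status, headers, body)."""
    conn = http.client.HTTPConnection(target[0], target[1], timeout=timeout)
    try:
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.getheaders(), resp.read()
    finally:
        conn.close()


def mirror(target, method, path, headers, body):
    """Send a copy to the test server; ignore its response and any error."""
    try:
        send(target, method, path, headers, body)
    except NETWORK_ERRORS:
        pass  # a broken test server must not affect production traffic


def make_handler(primary, shadow):
    """Build a handler that answers from primary and copies to shadow."""
    class TeeHandler(BaseHTTPRequestHandler):
        def handle_any(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length) if length else None
            headers = {k: v for k, v in self.headers.items()
                       if k.lower() not in HOP_BY_HOP}
            # Fire-and-forget copy to the test server.
            threading.Thread(
                target=mirror, daemon=True,
                args=(shadow, self.command, self.path, headers, body)).start()
            try:
                status, resp_headers, resp_body = send(
                    primary, self.command, self.path, headers, body)
            except NETWORK_ERRORS:
                self.send_error(502)  # production server unreachable
                return
            self.send_response(status)
            for name, value in resp_headers:
                if name.lower() not in SKIP_RESPONSE:
                    self.send_header(name, value)
            self.send_header("Content-Length", str(len(resp_body)))
            self.end_headers()
            self.wfile.write(resp_body)

        do_GET = do_POST = do_PUT = do_DELETE = handle_any

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return TeeHandler


if __name__ == "__main__":
    handler = make_handler(("localhost", 9000), ("localhost", 9001))
    ThreadingHTTPServer(("", 8080), handler).serve_forever()
```

The sketch buffers each response fully and handles one listening port only, so it shows the idea rather than something ready for production load.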
Tombart
4

TCP, being a stateful protocol, isn't amenable to simply blasting copies of the packets at another host, as @KazimierasAliulis points out.

Picking up the packets at the layer of TCP termination and relaying them as a new TCP stream is reasonable. The duplicator tool you linked to looks like your best bet. It operates as a TCP proxy, allowing the TCP state machine to operate properly. The responses from your test machines will just be discarded. That sounds like it fits the bill for what you want exactly.

It's unclear to me why you've written off the duplicator tool as unacceptable. You will have to run multiple instances of the tool since it only listens on a single port but, presumably, you want to relay each of those different listening ports to different ports on the back-end system. If not, you could use iptables DNAT to direct all the listening ports to a single listening copy of the duplicator tool.

Unless the applications you're testing are dirt simple I expect that you're going to have problems with this testing methodology relating to timing and internal application state. What you want to do sounds deceptively simple-- I expect you're going to find a lot of edge cases.

Evan Anderson
  • Yes, you are completely right, the agnoster duplicator tool would fit our requirements except for the multi-port situation. Discarding the responses of the test machine is fulfilled as well. To achieve our goal of simulating the real/live situation as accurately as possible, we can't bundle all ports on the live server into one single port on the test machine. Different ports are used to divide client devices into different customers. Therefore we would have to open 60-70 sessions of this duplicator tool. This is not very practical, as you can imagine. – Sise Jan 30 '14 at 08:34
  • @Sise - Computers are good at doing tedious things. I'd think you could write a script to parse your Apache configurations and spit out the necessary command lines to run 60 - 70 instances of the duplicator tool. I can't imagine the duplicator tool is very resource intensive but, even if it were, you could run those 60 - 70 instances on another machine and do some network trickery to get the traffic over there. To me, at least, that seems completely practical and a pretty straightforward way to handle this. – Evan Anderson Jan 30 '14 at 12:32
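The scripting idea from the comment above could be sketched like this: collect the ports from the Apache `Listen` directives and emit one command line per port. The command template is a deliberate placeholder (`./start-duplicator.sh` is a hypothetical wrapper script, not the duplicator tool's real command-line syntax), and `listen_ports` and `render_commands` are made-up names.

```python
import re

# Matches Apache "Listen" directives such as:
#   Listen 80
#   Listen 192.0.2.1:8080
#   Listen [2001:db8::1]:8443 https
# Commented-out lines and other directives (e.g. ListenBacklog) do not match.
LISTEN_RE = re.compile(r"^\s*Listen\s+(?:\S+:)?(\d+)\b",
                       re.IGNORECASE | re.MULTILINE)


def listen_ports(config_text):
    """Return the sorted, de-duplicated ports of all Listen directives."""
    return sorted({int(port) for port in LISTEN_RE.findall(config_text)})


def render_commands(ports, template):
    """Fill the template once per port; {port} is the only placeholder."""
    return [template.format(port=port) for port in ports]


if __name__ == "__main__":
    # Made-up config fragment for demonstration.
    sample = "Listen 80\nListen 443\nListen 192.0.2.1:8080\n"
    for command in render_commands(listen_ports(sample),
                                   "./start-duplicator.sh {port}"):
        print(command)
```

Include files and per-vhost configuration are not followed here; the function only sees the text it is given.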
1

I'm trying to do something similar, however, if you are simply trying to simulate the load on a server I would look at something like a load-testing framework. I've used locust.io in the past and it worked really well for simulating a load on a server. That should allow you to simulate a large number of clients and let you play with the configuration of the server without having to go through the painful process of forwarding traffic to another server.

0

As far as "we would like to duplicate the inbound HTTP traffic on the live server to one or multiple remote servers in realtime", there's one way not mentioned above, which is configuring a mirror port on the switch it's connected to.

In the case of Cisco Catalyst switches, this is called SPAN (more info here). In a Cisco environment you can even have the mirrored port on a different switch.

But the purpose of this is for traffic-analysis so it will be uni-directional - keyword in quoted text in first paragraph above: inbound. I don't think that port will allow any return traffic, and if it did, how would you deal with duplicate return traffic? That will probably just wreak havoc with your network.

So... just wanted to add one possibility to your list, but with the caveat that it will indeed be for one-way traffic. Maybe you can put a hub on that mirror port and have the duplicate server's replies handled by some local client simulator that would pick up initiated sessions and respond to them, but then you would be duplicating incoming traffic to your duplicate server... probably not what you want.

James
  • we have thought about that, I have read about the alternative of using SPAN. But, because the servers are located in a data center of a third party provider, we have limited possibilities when it comes to hardware changes. I have already requested to connect 2 servers on a second nic directly. This action combined with a local network for just these 2 servers would allow me to use IPTABLES with TEE. But to go for this alternative we would need to change the external IPs of the servers, which is a NoGo because client devices are configured to connect to the set IP. – Sise Jan 30 '14 at 08:07
0

I have also written a reverse proxy / load balancer for a similar purpose with Node.js (it is just for fun, not production ready at the moment).

https://github.com/losnir/ampel

It is very opinionated, and currently supports:

  • GET Using round-robin selection (1:1)
  • POST Using request splitting. There is no concept of "master" and "shadow" -- the first backend that responds is the one that will serve the client request, and then all of the other responses will be discarded.

If someone finds it useful then I can improve it to be more flexible.
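The "first backend that responds wins" behaviour described for POST can be illustrated with a short Python sketch. This is not ampel's implementation (which is written in Node.js); `first_response` and the `call` parameter are hypothetical names, and the sketch assumes at least one backend is passed in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_response(backends, call):
    """Run call(backend) for every backend in parallel.

    Returns the result of the first call that finishes without raising;
    the results of the slower backends are discarded.
    """
    pool = ThreadPoolExecutor(max_workers=len(backends))
    futures = [pool.submit(call, backend) for backend in backends]
    try:
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("all backends failed")
    finally:
        # Do not wait for the slower backends; their work still runs to
        # completion in the background, but nobody reads the result.
        pool.shutdown(wait=False)
```

Note that every backend still executes the request, so with non-idempotent POSTs each backend changes its own state even though only one response reaches the client.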

losnir
  • Node.js is a very strange choice of language for an application such as this which is going to require very high performance. I'm not sure this will ever be production ready. – Michael Hampton Jul 23 '18 at 13:30
  • You are absolutely right. This was not meant to be highly performant -- just easy to write (for me). I think it depends on the required load. I was able to achieve a little bit over 1,000rps on a low end machine though (2 cores). – losnir Jul 23 '18 at 13:40
0

My company had a similar requirement: to clone a packet and send it to another host. (We run market data simulators and needed a temporary solution that would listen to a market data TCP feed, ingest each packet, but also send a clone of each packet to another simulator server.)

This binary runs very well. It's a version of TCP Duplicator written in Go instead of JavaScript, so it's way faster, and it works as advertised:

https://github.com/mkevac/goduplicator

perfecto25
-1

there is a tool created by a guy from a Chinese company, and maybe it is what you need: https://github.com/session-replay-tools/tcpcopy

Musikoder
  • Hi and welcome to serverfault. Could you please provide a more detailed answer? What does the program do exactly? Is it written in C...? – bgtvfr Oct 16 '18 at 09:38