Impala is a piece of Cloudera software whose ODBC driver runs on a desktop and connects directly to a Hadoop cluster. In our implementation, the Hadoop cluster sits behind a DMZ, through which we are required to proxy connections with authentication.
Is there a proxy solution that would enable this?
The driver speaks a raw TCP protocol, not HTTP, so I cannot use a basic auth challenge as implemented by most proxies (Apache as a reverse proxy, etc.).
I've looked at HAProxy, and it appears I might be able to come up with a workable config. I would configure it with two listeners, one for HTTP and one for TCP. The HTTP proxy would authenticate the user and could then "push" the connection into a stickiness table keyed on client IP. I'd need an HTTP service on each cluster server (which is there as part of Hadoop anyway). The TCP proxy would be configured with a "dummy" first server with a weight of 100, followed by the cluster nodes with a weight of zero each. Thus, I hope, users who just try to connect will be directed to the dummy server, while users who first hit the HTTP proxy and authenticate themselves will be directed to a real server with load balancing.
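To make the intended behaviour concrete, here is a minimal sketch of the routing decision in Python. It is not HAProxy config and does no real networking; it only models the idea of "authenticate over HTTP first, then the TCP listener lets that IP through". The class names, the backend names (`node1`, `node2`, `dummy`) and the 60-second lifetime are all made up for illustration.

```python
import itertools
import time


class StickTable:
    """Toy stand-in for an IP-keyed table shared by the two listeners."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self.expiry = {}    # client IP -> time at which the entry lapses

    def mark_authenticated(self, ip):
        # Would be called by the HTTP listener after a successful login.
        self.expiry[ip] = self.clock() + self.ttl

    def is_authenticated(self, ip):
        deadline = self.expiry.get(ip)
        if deadline is None:
            return False
        if self.clock() >= deadline:
            del self.expiry[ip]  # entry has lapsed, forget the client
            return False
        return True


class TcpRouter:
    """Chooses a backend for a new TCP connection based on the table."""

    def __init__(self, table, real_servers, dummy_server):
        self.table = table
        self.dummy = dummy_server
        self._next_real = itertools.cycle(real_servers)  # simple round robin

    def pick_backend(self, ip):
        if self.table.is_authenticated(ip):
            return next(self._next_real)
        return self.dummy  # unauthenticated clients never reach the cluster


if __name__ == "__main__":
    table = StickTable(ttl_seconds=60)
    router = TcpRouter(table, ["node1", "node2"], "dummy")
    print(router.pick_backend("10.0.0.5"))  # not logged in yet
    table.mark_authenticated("10.0.0.5")
    print(router.pick_backend("10.0.0.5"))  # now goes to a real node
```

One thing the sketch makes obvious: the decision is keyed purely on source IP, so any clients sharing an address (for example behind the same NAT) would be treated as one user.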
{Diagram would go here if I had enough reputation}
The problem with this, apart from the fact that I don't know whether it will work, is that the login and password are hard-coded (albeit with an encrypted password) into the HAProxy config file. Worse still, HAProxy has to be bounced for a password change to take effect. I have over 1,000 users who have forced password resets every few weeks, so I can expect a password reset request from the forgetful probably daily.
So, any ideas? I know of many HTTP proxies with LDAP integration (Apache, Squid, etc.), but naturally they are all HTTP only. So far the only software proxy I have seen that offers any authentication alongside TCP proxying is HAProxy, but it isn't good enough for us.
Given that Apache and HAProxy are both open source and both run on Linux, how hard would it be to write an HAProxy wrapper for the mod_ldap library?