
From Proxy server on Wikipedia:

A proxy server may reside on the user's local computer, or at any point between the user's computer and destination servers on the Internet. A proxy server that passes unmodified requests and responses is usually called a gateway or sometimes a tunneling proxy. A forward proxy is an Internet-facing proxy used to retrieve data from a wide range of sources (in most cases anywhere on the Internet). A reverse proxy is usually an internal-facing proxy used as a front-end to control and protect access to a server on a private network. A reverse proxy commonly also performs tasks such as load-balancing, authentication, decryption and caching.

From HTTP: The Definitive Guide by David Gourley and Brian Totty:

Strictly speaking, proxies connect two or more applications that speak the same protocol, while gateways hook up two or more parties that speak different protocols. A gateway acts as a “protocol converter,” allowing a client to complete a transaction with a server, even when the client and server speak different protocols.

Figure 6-2 illustrates the difference between proxies and gateways:

  • The intermediary device in Figure 6-2a is an HTTP proxy, because the proxy speaks HTTP to both the client and server.

  • The intermediary device in Figure 6-2b is an HTTP/POP gateway, because it ties an HTTP frontend to a POP email backend. The gateway converts web transactions into the appropriate POP transactions, to allow the user to read email through HTTP. Web-based email programs such as Yahoo! Mail and MSN Hotmail are HTTP email gateways.

In practice, the difference between proxies and gateways is blurry. Because browsers and servers implement different versions of HTTP, proxies often do some amount of protocol conversion. And commercial proxy servers implement gateway functionality to support SSL security protocols, SOCKS firewalls, FTP access, and web-based applications.

Difference between a proxy server and a gateway server

According to the first source, a gateway connects two parties that speak the same protocol and passes their messages unmodified. According to the second source, a gateway connects two parties that speak different protocols and converts between them. Don't the two definitions of gateway contradict each other?

According to its official documentation, Nginx can be used as a reverse proxy server when the proxied server is a CGI server. By the first source, a reverse proxy server is a proxy server, and by the second source, a proxy connects two parties that speak the same protocol. Nginx speaks HTTP to the client, so shouldn't Nginx also speak HTTP to the CGI server?

Maggyero
Tim

2 Answers

Communication with CGI scripts is done using the Common Gateway Interface. It is a sort of calling convention, where HTTP headers and the requested URL are passed through environment variables, whereas the request data are passed on stdin. This is a local communication as the web server must run the CGI script as a process on the same machine.

FastCGI is a slight variation where everything is passed to the CGI script using a stream socket and a binary protocol. The socket is usually a UNIX socket (so local), but it can also be a regular TCP/IP connection.
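
To make the plain CGI calling convention concrete, here is a minimal Python sketch of what a web server does before launching a script. The meta-variable names come from RFC 3875; the helper names `build_cgi_environ` and `run_cgi` and the toy script are hypothetical, and a real server sets more variables than shown here.

```python
import os
import subprocess
import sys


def build_cgi_environ(method, path, query, headers, body):
    """Map an HTTP request onto CGI meta-variables (names from RFC 3875)."""
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": path,
        "QUERY_STRING": query,
    }
    for name, value in headers.items():
        key = name.upper().replace("-", "_")
        if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            env[key] = value            # these two carry no HTTP_ prefix
        else:
            env["HTTP_" + key] = value  # every other header becomes HTTP_*
    if body:
        env["CONTENT_LENGTH"] = str(len(body))
    return env


def run_cgi(script_source, env, body):
    """Run the script as a local child process; the request body goes to stdin."""
    result = subprocess.run(
        [sys.executable, "-c", script_source],
        input=body,
        env={**os.environ, **env},
        capture_output=True,
        check=True,
    )
    return result.stdout  # CGI response: header lines, blank line, body


# Hypothetical toy script: reads the method from the environment
# and the request body from stdin, then echoes both back.
TOY_SCRIPT = (
    "import os, sys\n"
    "body = sys.stdin.read()\n"
    "sys.stdout.write('Content-Type: text/plain\\n\\n')\n"
    "sys.stdout.write(os.environ['REQUEST_METHOD'] + ' ' + body)\n"
)

if __name__ == "__main__":
    request_headers = {"Content-Type": "text/plain", "User-Agent": "demo"}
    environ = build_cgi_environ("POST", "/cgi-bin/echo", "x=1", request_headers, b"a=1")
    print(run_cgi(TOY_SCRIPT, environ, b"a=1").decode())
```

Nothing in this exchange is HTTP: the server translates the HTTP request into environment variables and a byte stream, which is why the second source's term "gateway" fits the server-to-CGI hop.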

Piotr P. Karwasz

Contrary to what your sources state, the difference between a proxy and a gateway is not about whether incoming messages are transformed. In practice, both types of intermediary (forwarding agent) can transform incoming messages.

The key difference is explained in § 5.2.3 Components of Roy Fielding’s doctoral dissertation Architectural Styles and the Design of Network-Based Software Architectures (emphasis mine):

Intermediary components act as both a client and a server in order to forward, with possible translation, requests and responses. A proxy component is an intermediary selected by a client to provide interface encapsulation of other services, data translation, performance enhancement, or security protection. A gateway (a.k.a., reverse proxy) component is an intermediary imposed by the network or origin server to provide an interface encapsulation of other services, for data translation, performance enhancement, or security enforcement. Note that the difference between a proxy and a gateway is that a client determines when it will use a proxy.

It is also explained in § 2.3. Intermediaries of Roy Fielding and Julian Reschke’s RFC 7230 Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing (emphasis mine):

HTTP enables the use of intermediaries to satisfy requests through a chain of connections. There are three common forms of HTTP intermediary: proxy, gateway, and tunnel. In some cases, a single intermediary might act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.

[…]

A "proxy" is a message-forwarding agent that is selected by the client, usually via local configuration rules, to receive requests for some type(s) of absolute URI and attempt to satisfy those requests via translation through the HTTP interface. Some translations are minimal, such as for proxy requests for "http" URIs, whereas other requests might require translation to and from entirely different application-level protocols. Proxies are often used to group an organization's HTTP requests through a common intermediary for the sake of security, annotation services, or shared caching. Some proxies are designed to apply transformations to selected messages or payloads while they are being forwarded, as described in Section 5.7.2.

A "gateway" (a.k.a. "reverse proxy") is an intermediary that acts as an origin server for the outbound connection but translates received requests and forwards them inbound to another server or servers. Gateways are often used to encapsulate legacy or untrusted information services, to improve server performance through "accelerator" caching, and to enable partitioning or load balancing of HTTP services across multiple machines. All HTTP requirements applicable to an origin server also apply to the outbound communication of a gateway. A gateway communicates with inbound servers using any protocol that it desires, including private extensions to HTTP that are outside the scope of this specification. However, an HTTP-to-HTTP gateway that wishes to interoperate with third-party HTTP servers ought to conform to user agent requirements on the gateway's inbound connection.

A "tunnel" acts as a blind relay between two connections without changing the messages. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel might have been initiated by an HTTP request. A tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are used to extend a virtual connection through an intermediary, such as when Transport Layer Security (TLS, [RFC5246]) is used to establish confidential communication through a shared firewall proxy.

In other words:

  • a proxy is an intermediary whose intermediary nature is known to the client;
  • a gateway (also known as reverse proxy) is an intermediary whose intermediary nature is not known to the client.
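
One place where this distinction is visible on the wire is the request-target defined in RFC 7230 § 5.3: a client talking to an origin server (or to a gateway, which looks like one) sends the origin-form, a client talking to a proxy it has selected sends the absolute-form, and a CONNECT request for a tunnel uses the authority-form. The following Python sketch only builds the request line. The function name `request_line` and its `via_proxy` flag are hypothetical, and it assumes the URL carries no fragment.

```python
from urllib.parse import urlsplit


def request_line(method, url, via_proxy=False):
    """Build an HTTP/1.1 request line using the request-target forms of RFC 7230, section 5.3."""
    parts = urlsplit(url)
    if method == "CONNECT":
        # authority-form: host and port only, used to ask a proxy for a tunnel
        port = parts.port or (443 if parts.scheme == "https" else 80)
        target = f"{parts.hostname}:{port}"
    elif via_proxy:
        # absolute-form: the client knows it is talking to a proxy
        target = url
    else:
        # origin-form: the client believes it is talking to the origin server,
        # which may in fact be a gateway (reverse proxy)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


if __name__ == "__main__":
    print(request_line("GET", "http://example.com/a?b=1"))
    print(request_line("GET", "http://example.com/a?b=1", via_proxy=True))
    print(request_line("CONNECT", "https://example.com/"))
```

The `via_proxy` flag is set by the client, not by the network, which mirrors the point of the quoted definitions: the client decides when it uses a proxy, whereas it cannot tell a gateway from an origin server.
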
Maggyero
  • IMHO the terms have no clearly defined meaning and are often used with overlapping meanings. A secure web gateway as a product is technically an application-level proxy. But there are also transparent proxies, which are typically not called gateways. And then there are protocol-transforming things (IPv4 to IPv6) which are commonly called gateways and not proxies, etc. Sometimes "proxy" is used for describing the technical implementation (i.e. proxy vs. DPI) while "gateway" describes the role in the network. – Steffen Ullrich Mar 12 '21 at 17:55
  • @SteffenUllrich Thanks for your insight. But how do you interpret this RFC clause: “an intermediary that acts as an origin server”? – Maggyero Mar 12 '21 at 18:30
  • In the case of an HTTP proxy: a client makes an explicit proxy request to the explicitly configured proxy, which the proxy translates to a normal HTTP request to the final server. In the case of an HTTP reverse proxy: the reverse proxy is the final server from the perspective of the client, so the client will make a direct HTTP request to it. The reverse proxy will then forward this request to some different internal server, unbeknownst to the original client. In that sense the reverse proxy (intermediary) acts as the origin server from the perspective of the client. – Steffen Ullrich Mar 12 '21 at 18:47
  • @SteffenUllrich Does the client know the origin server and put both addresses (that of the proxy and that of the origin server) in the HTTP request in the case of an HTTP proxy? – Maggyero Mar 12 '21 at 20:08
  • If not, then, what is the difference between a request to an HTTP *proxy* and a request to an HTTP *origin server*? Is it just the *format* of the `request-target` in the request line? [RFC 7230 § 5.3](https://tools.ietf.org/html/rfc7230#section-5.3) mentions four formats: `request-target = origin-form / absolute-form / authority-form / asterisk-form`, and requires that requests to an HTTP origin server use the `origin-form`, requests to an HTTP proxy use the `absolute-form`, CONNECT requests use the `authority-form`, and OPTIONS requests use the `asterisk-form`. – Maggyero Mar 12 '21 at 20:14
  • HTTP proxy requests use `http://domain/path` vs. just `/path` for plain HTTP and use `CONNECT` requests to create a tunnel for HTTPS. This way it specifies the origin server the proxy should connect to. – Steffen Ullrich Mar 12 '21 at 20:24
  • @SteffenUllrich I see. Since the client puts only the address of the origin-server in the HTTP request, I was wondering where the address of the proxy was put, and thanks to [this post](https://stackoverflow.com/q/49977508/2326961) I realized that it is actually put in the TCP headers, not in the HTTP request. So the address in an HTTP request always identifies the *origin* server, not the *inbound* server (which might be an *intermediary* server). – Maggyero Mar 12 '21 at 23:19
  • @SteffenUllrich Also, for connecting directly to an origin server, the request-target uses the `origin-form`; for connecting to a proxy, the request-target uses the `absolute-form`; and for connecting to a tunnel, the request-target uses the `authority-form` and the request-method is CONNECT, and after connection establishment the request-target uses the `origin-form`. – Maggyero Mar 12 '21 at 23:25
  • @SteffenUllrich Now to come back to the initial question: the key difference between a *proxy* and a *gateway* (a.k.a. *reverse proxy*), is it correct to describe it the following way? **With a *proxy*, a client knows it is not directly communicating with the origin server. With a *gateway*, a client does not know it is not directly communicating with the origin server?** – Maggyero Mar 12 '21 at 23:39
  • Like I said initially, these terms are used differently and overlapping in various contexts. In the specific and narrow context of the HTTP standard these terms have the meaning you describe. In other contexts not. – Steffen Ullrich Mar 13 '21 at 06:28
  • @SteffenUllrich Alright. Yes I was specifically interested in the meaning of these terms in the context of the HTTP standard. In the same context, is it correct to say that a proxy replaces the client address by its own address at the TCP level, thereby acting like a client from the origin server’s perspective? – Maggyero Mar 13 '21 at 11:32
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/120821/discussion-between-steffen-ullrich-and-maggyero). – Steffen Ullrich Mar 13 '21 at 14:05
  • @SteffenUllrich FYI, I have found and added to my answer a new authoritative source (Roy Fielding), which is unambiguous this time. So the described difference is actually not restricted to HTTP. – Maggyero May 20 '21 at 15:30