HTTP

Oct 6, 2022

HYPERTEXT TRANSFER PROTOCOL

IETF

  • The Internet Engineering Task Force (IETF) is the standards group responsible for developing standards for the Internet protocol suite
  • The group is non-profit/non-governmental and membership is open
  • The IETF publishes standards documents called RFCs (Request For Comments) that describe methods, behavior, and research applicable to the operation of the Internet

HTTP

HyperText Transfer Protocol – application layer protocol that provides a request-response method for interacting with web servers

  • HTTP/1.1 was originally standardized by the IETF in RFC 2616; that document has since been superseded, most recently by RFC 9110 and RFC 9112 (2022)
  • The client submits a request to the server and the server sends back a response
  • The client can be a web browser, web crawler, mobile app, or other application

HTTP Request

– a way for a client to request a resource from a server

HTTP defines several methods (also called verbs) that a client uses to tell the server how it wishes to interact with the resource

  • GET – requests a resource; the most commonly used method
  • HEAD – acts like GET, but asks the server to only return the response headers
  • POST – requests that the server process the enclosed data
  • There are other methods:

  • PUT
  • DELETE
  • CONNECT
  • OPTIONS
  • TRACE
  • PATCH

In addition to the method, the request needs to include the path/name of the resource and the HTTP version being used.

  • An initial request line would look something like:

GET /path/to/file/index.html HTTP/1.0

  • You can also provide headers as part of the request (in HTTP/1.0, all are optional, but in HTTP/1.1, the Host: header is required)
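The request line and headers described above can be assembled into a complete request message. The following is a minimal sketch, not a full client: the helper name `build_request` and the host `www.example.com` are assumptions made for illustration, and the request is assumed to carry no body.

```python
CRLF = "\r\n"  # HTTP/1.x lines end with carriage return + line feed


def build_request(method, path, host, extra_headers=None, version="HTTP/1.1"):
    """Return the text of a body-less HTTP request (hypothetical helper)."""
    # Request line: method, path of the resource, HTTP version.
    lines = [f"{method} {path} {version}"]
    # HTTP/1.1 requires the Host: header, so it is always included here.
    headers = {"Host": host}
    headers.update(extra_headers or {})
    lines.extend(f"{name}: {value}" for name, value in headers.items())
    # A blank line marks the end of the headers.
    return CRLF.join(lines) + CRLF + CRLF


request_text = build_request(
    "GET",
    "/path/to/file/index.html",
    "www.example.com",  # placeholder host, not from the slides
    {"Connection": "keep-alive"},
)
print(request_text)
```

Sending this text over a TCP connection to the server is what a browser or other client does on your behalf.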

Request Headers

Host: - the domain name of the server; this is important for the concept of virtual hosting, where multiple different domains are served from one server

Connection: - specifies the connection type the client prefers

Commonly “Connection: Keep-Alive”, which keeps the connection open to handle multiple requests; HTTP/1.1 keeps connections open by default, so the header matters mainly for HTTP/1.0 clients

Headers Cont’

Accept: - content types that are acceptable in the response

text/html, text/css, image/webp, etc.

User-Agent: - the user agent string of the accessing application; you can use this to determine what web browser/OS your user is using

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36

More Headers

Accept-Encoding: - list of acceptable encodings, including compression

gzip, deflate, br

Accept-Language: – list of acceptable human languages for response – you can use this to display different versions of a web page

en-US,en
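As a rough illustration of using Accept-Language: to pick a version of a page, the sketch below sorts the listed languages by their optional `q` weights (as in `en-US,en;q=0.5`) and chooses the first one the site offers. The function names and the set of available languages are assumptions for this example, and wildcard (`*`) handling is deliberately left out.

```python
def parse_accept_language(header_value):
    """Return language tags ordered from most to least preferred."""
    weighted = []
    for position, item in enumerate(header_value.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value counts as q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            # Negative quality sorts highest first; position keeps ties stable.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header_value, available, default="en"):
    """Pick the best available language (hypothetical helper)."""
    for tag in parse_accept_language(header_value):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from "fr-CA" to "fr"
        if primary in available:
            return primary
    return default


print(choose_language("fr-CA,fr;q=0.8,en;q=0.5", {"en", "fr"}))
```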

Cookie: - a cookie that was previously set by the server (we’ll talk about cookies later)

Even More

Content-Type: - the type of content included in the body of a request, such as a POST request

Referer: - the address of the previous web page (the misspelling of the word “referrer” is in the standard and in web browser implementations)

GET /home.html HTTP/1.1
Host: developer.mozilla.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://developer.mozilla.org/testpage.html
Connection: keep-alive
Upgrade-Insecure-Requests: 1
If-Modified-Since: Mon, 18 Jul 2016 02:36:04 GMT
If-None-Match: "c561c68d0ba92bbeb8b0fff2a9199f722e3a621a"
Cache-Control: max-age=0
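To see how a server might take a request like the one above apart, here is a simplified parsing sketch. The function name `parse_request` is an assumption, only a few of the headers are reproduced, and real servers handle many edge cases (repeated headers, size limits, malformed input) that are ignored here.

```python
def parse_request(raw):
    """Split a raw HTTP request into its parts (simplified sketch)."""
    # Headers and body are separated by a blank line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: method, resource path, HTTP version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    if version == "HTTP/1.1" and "host" not in headers:
        raise ValueError("HTTP/1.1 requests must include a Host: header")
    return method, target, version, headers, body


raw_request = (
    "GET /home.html HTTP/1.1\r\n"
    "Host: developer.mozilla.org\r\n"
    "Accept-Language: en-US,en;q=0.5\r\n"
    "Referer: https://developer.mozilla.org/testpage.html\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_request(raw_request)
print(method, target, version)
print(headers["host"], headers["referer"])
```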

HTTP Response

the headers and/or content sent back from the server after an HTTP Request

The first line of the response is called the status line and includes the HTTP version, the status code, and a short reason phrase

Example: HTTP/1.1 200 OK

The HTTP status code is a three-digit numeric code, accompanied by a textual reason that describes the status of the response
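A status line such as the example above can be split into its three parts with a few lines of code. This is a small sketch; the function name `parse_status_line` is an assumption, and no validation of the version string is attempted.

```python
def parse_status_line(line):
    """Return (version, code, reason) from a status line (simplified sketch)."""
    # Split at most twice so a reason such as "Not Found" stays in one piece.
    version, code, *rest = line.strip().split(" ", 2)
    reason = rest[0] if rest else ""  # the textual reason may be absent
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```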

Status Codes

1xx – Informational

100 Continue – the server has received the request headers and the client should continue to send the request body

2xx – Success

200 OK – the standard response for HTTP requests which means that the request was received, proper, and the response body contains the content

3xx – Redirection

301 Moved Permanently – this and all future requests should be directed to the given address (in the Location: header)

302 Found – commonly used as a redirect (not considered permanent, often used dynamically)

4xx – Client Error

400 Bad Request – the request cannot be fulfilled because of bad syntax

401 Unauthorized (some servers display it as “Authorization Required”) – the request was valid, but you need to provide an authentication header

403 Forbidden – the request was valid, but the server is refusing access

404 Not Found – requested resource could not be found

5xx – Server Error

500 Internal Server Error – generic error message; often the server-side code encountered a problem and was unable to send a more specific message to the browser, so check the server logs
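Because the first digit of a status code identifies its class, a client can decide how to react without knowing every individual code. The sketch below uses Python's standard `http.HTTPStatus` enum for the reason text; the `describe` function and the `STATUS_CLASSES` table are names invented for this example.

```python
from http import HTTPStatus

# First digit of the code -> class name, as listed above.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return a readable description of a status code (hypothetical helper)."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason text, if registered
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class})"


for code in (100, 200, 301, 404, 500):
    print(describe(code))
```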

Response Headers

In addition to the status line, an HTTP response can contain other headers.

Some common examples:

  • Server: - a name for the server (usually software package/version)
  • Content-Encoding: - the encoding the server applied to the content (chosen from the encodings the client offered in the request's Accept-Encoding: header)
  • Content-Length: - the size in bytes of the body

Response Headers Pt2

  • Content-Type: - the type of content (text/html, text/css, etc.)
  • Set-Cookie: - set a cookie to be sent back to the server on future requests
  • Location: - used in redirection; specifies the URL to go to
  • Cache-Control: - used to tell the browser how to cache (no-store to prevent caching; no-cache to require revalidation with the server before a cached copy is reused)
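The response headers listed above can be read with the same approach used for requests. The following sketch parses a hand-written response and checks the body against Content-Length:. The response bytes, the server name `ExampleServer/1.0`, and the function name `parse_response` are all made up for illustration. Storing headers in a plain dictionary is a simplification, since a header such as Set-Cookie: may appear more than once.

```python
def parse_response(raw):
    """Split raw response bytes into (code, headers, body) (simplified sketch)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP version, status code, reason text.
    _version, code, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header overwrites the earlier value.
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: ExampleServer/1.0\r\n"  # invented server name
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"Cache-Control: no-cache\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
code, headers, body = parse_response(raw_response)
print(code, headers["content-type"])
# Content-Length is the size of the body in bytes.
print(int(headers["content-length"]) == len(body))
```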

Cookies

HTTP is a stateless protocol: each transaction is considered independent of the others. To support login sessions, shopping carts, Google spreadsheets, or other persistent data scenarios, there needs to be a way to connect requests to each other.

The HTTP Cookie is a small piece of data that the server sends to your web browser and that the browser sends back with subsequent requests to that server.

A common usage of a cookie is to tie your web browser to a specific session on the server. This way, the server can look at the session id provided in the cookie and link it up to server side variables which it can use to send back a page specific to you.
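The session pattern just described can be sketched with Python's standard `http.cookies` module. Everything here is an assumption made for illustration: the cookie name `session_id`, the in-memory `sessions` dictionary, and the two helper functions. A real application would also expire sessions and serve the cookie only over HTTPS.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> server-side variables.
sessions = {}


def start_session(username):
    """Create a session and return the Set-Cookie: header line for it."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    sessions[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True  # hide the cookie from page scripts
    return cookie.output(header="Set-Cookie:")


def lookup_session(cookie_header):
    """Return the session data for a Cookie: header value, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return sessions.get(morsel.value)


# Server response: tell the browser to store the cookie.
set_cookie_line = start_session("alice")
print(set_cookie_line)

# Later request: the browser sends the name=value pair back in Cookie:.
cookie_value = set_cookie_line.split(": ", 1)[1].split(";", 1)[0]
print(lookup_session(cookie_value))
```

Only the identifier travels in the cookie; the data tied to it stays on the server.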

Cookies can also be used to track what you do on the web; third-party cookies are cookies set for a domain other than the one in the address bar (say, that of an advertising company).
