URL and URLConnection
1. URL (Uniform Resource Locator)
A URL is the standard address format that identifies where a resource (a web page, image, file, etc.) is located on a server on the internet. A typical URL has the following structure:
https://www.google.com:443/search?q=java&hl=en
- Protocol (https): The communication protocol used to access the resource.
- Host (www.google.com): The domain name or IP address of the server that provides the resource.
- Port (443): The port the server listens on. 443 is the default port for HTTPS, so it is usually omitted from the URL.
- Path (/search): The location of the resource on the server. It often looks like a directory or file path, but the server decides how to interpret it.
- Query Parameters (?q=java&hl=en): Additional key=value pairs, separated by &, that pass extra data to the server.
Java's java.net.URL class lets you parse these URL strings and read their individual components programmatically.
URL url = new URL("https://www.example.com:443/search?q=java");
System.out.println("Host: " + url.getHost()); // www.example.com
System.out.println("Path: " + url.getPath()); // /search
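The component list above can be checked against the getters of java.net.URL. The following is a minimal sketch, not a definitive implementation: the class name UrlParts and the helper splitQuery are hypothetical names introduced here for illustration, and splitQuery deliberately ignores percent-decoding and repeated keys. The URL is built with URI.create(...).toURL() because the URL constructors are deprecated as of Java 20; on older JDKs new URL(...) behaves the same for this input.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlParts {

    // Hypothetical helper: splits "a=1&b=2" into an ordered map.
    // Assumption: values are not percent-encoded and keys do not repeat.
    static Map<String, String> splitQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(key, value);
        }
        return params;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("https://www.google.com:443/search?q=java&hl=en").toURL();

        System.out.println("Protocol: " + url.getProtocol());        // https
        System.out.println("Host: " + url.getHost());                // www.google.com
        System.out.println("Port: " + url.getPort());                // 443
        System.out.println("Default port: " + url.getDefaultPort()); // 443
        System.out.println("Path: " + url.getPath());                // /search
        System.out.println("Query: " + url.getQuery());              // q=java&hl=en
        System.out.println("Params: " + splitQuery(url.getQuery())); // {q=java, hl=en}
    }
}
```

Note that getPort() returns -1 when the URL string contains no explicit port, which is why getDefaultPort() is printed alongside it.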
2. URLConnection
URLConnection is the abstract superclass that represents a communication link between the application and a URL (typically a web server). With it you can read the response headers, or read the response body (such as HTML content) as a raw I/O stream. For HTTP communication, its subclass HttpURLConnection is normally used.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class URLExample {
    public static void main(String[] args) {
        try {
            // Specify the target URL
            URL url = new URL("https://www.google.com");
            // Create a connection object for that URL
            URLConnection conn = url.openConnection();
            // Buffer the response stream and decode it as text (assuming UTF-8).
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                // Read line by line until readLine() returns null (end of stream)
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
If you compile and run this program, the Java process connects to the given website and prints its raw HTML source to the console. This basic technique is the core mechanism behind simple web crawlers.
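Section 2 names HttpURLConnection as the subclass normally used for HTTP, so a short sketch of it may be useful. The example below is illustrative only: the class name HttpHeadersExample, the helper prepare, the 5-second timeouts, and the target https://www.example.com are all assumptions made for this sketch. It reads the status code and one header before printing the first few lines of the body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HttpHeadersExample {

    // Hypothetical helper: configures a GET request with timeouts.
    // openConnection() only creates the object; nothing is sent yet.
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // milliseconds, assumed value
        conn.setReadTimeout(5000);    // milliseconds, assumed value
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("https://www.example.com");
        try {
            // getResponseCode() sends the request and waits for the status line
            System.out.println("Status: " + conn.getResponseCode());
            System.out.println("Content-Type: " + conn.getContentType());

            // getInputStream() throws IOException for error statuses such as 404
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                int shown = 0;
                // Print at most the first five lines of the body
                while (shown < 5 && (line = br.readLine()) != null) {
                    System.out.println(line);
                    shown++;
                }
            }
        } finally {
            conn.disconnect(); // release the underlying connection
        }
    }
}
```

Setting explicit timeouts is a design choice worth keeping in any crawler-style code, since without them a stalled server can block the reading thread indefinitely.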