Wednesday, 12 October 2016

get request java | get html page java | htmlpage java | set header in get request

Using HtmlUnit, we fire a GET request. But in some cases the site does not return the proper response: the String we get back does not contain the same text that our browser shows for the same page.
This happens because our request does not look like a browser request. The server cannot tell that it comes from a browser, so it may respond differently.
So, we set the headers a browser would send (User-Agent, Accept, Referer, etc.) in our code.
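If you want to see the same idea without HtmlUnit, here is a minimal sketch that uses only the JDK's HttpURLConnection. This is a swapped-in technique, not the HtmlUnit approach of this post: the class and method names (PlainJdkGet, browserHeaders, prepareGet, fetch) are hypothetical, and unlike HtmlUnit it does not build a DOM or run JavaScript. It reuses the browser-like header values from the example in this post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a browser-like GET request using only the JDK (no HtmlUnit).
class PlainJdkGet {

    // Browser-like headers taken from the HtmlUnit example in this post.
    // Accept-Encoding is left out on purpose: HttpURLConnection does not
    // decompress gzip bodies for you, while HtmlUnit does.
    // Host and Connection are left out too: HttpURLConnection manages them itself
    // (by default it ignores a Host value set through setRequestProperty).
    static Map<String, String> browserHeaders(String referer) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36");
        headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "en-US,en;q=0.5");
        headers.put("Referer", referer);
        headers.put("Upgrade-Insecure-Requests", "1");
        return headers;
    }

    // Creates a GET connection with the headers applied; nothing is sent yet.
    static HttpURLConnection prepareGet(String url, String referer) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            for (Map.Entry<String, String> h : browserHeaders(referer).entrySet()) {
                conn.setRequestProperty(h.getKey(), h.getValue());
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the request and returns the body as a string (needs network access).
    static String fetch(String url, String referer) throws IOException {
        HttpURLConnection conn = prepareGet(url, referer);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage: `String html = PlainJdkGet.fetch("https://scrapemania.blogspot.in/", "https://scrapemania.blogspot.in/");`. The sketch assumes a UTF-8 page; a real page may declare a different charset in its Content-Type header, which HtmlUnit would honor for you.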

Here, I used https://scrapemania.blogspot.in for request processing.
Replace it with the URL of the site you need.
Also update the Host and Referer headers on the WebRequest (in the startLinkCollectionPageSource() method).
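Instead of editing Host and Referer by hand each time, you could derive them from the URL so they always match the site you request. This is a minimal sketch with a hypothetical helper class (SiteHeaders, hostFor, refererFor are not part of HtmlUnit); it assumes, as the example in this post does, that the site's home page is an acceptable Referer.

```java
import java.net.URI;

// Hypothetical helper (not part of HtmlUnit): derives Host and Referer values
// from the target URL so they change together with it.
class SiteHeaders {

    // Host header value: the host name, plus ":port" when the URL spells out a port.
    static String hostFor(String url) {
        URI uri = URI.create(url);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
    }

    // Referer header value: the site's home page, e.g. https://scrapemania.blogspot.in/
    static String refererFor(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "://" + hostFor(url) + "/";
    }

    public static void main(String[] args) {
        String url = "https://scrapemania.blogspot.in/";
        System.out.println("Host: " + hostFor(url));       // Host: scrapemania.blogspot.in
        System.out.println("Referer: " + refererFor(url)); // Referer: https://scrapemania.blogspot.in/
    }
}
```

With this helper, the two hard-coded lines in startLinkCollectionPageSource() could become `webRequest.setAdditionalHeader("Host", SiteHeaders.hostFor(url));` and `webRequest.setAdditionalHeader("Referer", SiteHeaders.refererFor(url));`.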

Required library: HtmlUnit

Example:
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.net.URL;

/**
 *
 * @author vishal.khokhar
 */
public class RequestDemo {

    private WebClient webClient = new WebClient();

    public static void main(String[] args) {
        new RequestDemo().getPageDemo();
    }

    public void getPageDemo() {
        try {
            String pageSource = startLinkCollectionPageSource("https://scrapemania.blogspot.in/");
            System.out.println(pageSource);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Sends a GET request with browser-like headers and returns the page source
    // (or null if the request failed).
    private String startLinkCollectionPageSource(String url) {
        setWebContents();
        String pageSource = null;
        try {
            WebRequest webRequest = new WebRequest(new URL(url), HttpMethod.GET);

            // Headers a real browser sends; change Host and Referer to match your site.
            webRequest.setAdditionalHeader("Host", "scrapemania.blogspot.in");
            webRequest.setAdditionalHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36");
            webRequest.setAdditionalHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
            webRequest.setAdditionalHeader("Accept-Language", "en-US,en;q=0.5");
            webRequest.setAdditionalHeader("Accept-Encoding", "gzip, deflate");
            webRequest.setAdditionalHeader("Referer", "https://scrapemania.blogspot.in/");
            webRequest.setAdditionalHeader("Connection", "keep-alive");
            webRequest.setAdditionalHeader("Upgrade-Insecure-Requests", "1");

            HtmlPage htmlPage = webClient.getPage(webRequest);

            // HTML source as returned by the server (JavaScript is disabled, so no scripts ran).
            pageSource = htmlPage.getWebResponse().getContentAsString();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return pageSource;
    }

    // Client options: skip CSS and JavaScript, and do not throw on script errors
    // or on 4xx/5xx status codes (the error page is returned instead).
    private void setWebContents() {
        webClient.getOptions().setCssEnabled(false);
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    }
}
