THtmlDocument()

Creates a new THtmlDocument object.

Syntax

THtmlDocument():new( [<cHtmlDocument>] ) --> oTHtmlDocument

Arguments

<cHtmlDocument>: This is an optional HTML formatted character string as returned from TIpClientHTTP():readAll(). If <cHtmlDocument> does not contain the HTML tags <html>, <head> or <body>, the missing tag is inserted into the HTML document.

Return

The function THtmlDocument() creates the object, and the method :new() initializes it.

Description

The THtmlDocument() class provides objects for reading and creating HTML files and streams. HTML stands for Hyper Text Markup Language, the standard file format for documents published on the internet. Free online tutorials on HTML itself are widely available; the website www.w3schools.com is a good place to quickly learn the basics of HTML.

A THtmlDocument object maintains an entire HTML document and builds from it a tree of THtmlNode() objects which contain the actual HTML data. The first HTML node is stored in the :root instance variable, which is the root node of the HTML tree. Beginning with the root node, an HTML document can be traversed or searched for particular data. The classes THtmlIteratorScan() and THtmlIteratorRegEx() are available to find a particular HTML node, based on its tag name, attribute or textual content.

Besides the root node, a THtmlDocument object has two standard nodes :head and :body.
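The following is a minimal sketch of how the standard nodes might be used as entry points into the tree. It assumes that ":p" retrieves the first <p> tag of a node in the same way that ":a" retrieves the first <a> tag in the second example of this page; the input string is purely illustrative.

```xharbour
PROCEDURE Main
   LOCAL oHtmlDoc := THtmlDocument():new( "<p>Hello <p>world" )
   LOCAL oNode

   // :body holds the <body> node, which is available even though
   // the input string did not contain a <body> tag
   // ":p" is assumed to return the first <p> tag below <body>
   oNode := oHtmlDoc:body:p

   // display the text of the <p> node
   ? oNode:getText( "" )

   THtmlCleanup()
RETURN
```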

Instance variables

:body: <body> node of the HTML document.
:changed: Changed flag.
:head: <head> node of the HTML document.
:root: Root node of the HTML document.

Methods for files and streams

:readFile( <cFileName> ) --> lSuccess: Reads an HTML file.
:toString() --> cHtmlDocument: Creates an HTML formatted string.
:writeFile( <cFileName> ) --> lSuccess: Creates an HTML file.
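The three methods above can be combined into a simple read and write round trip. The sketch below relies only on the signatures listed in this section; the file names "input.html" and "output.html" are hypothetical.

```xharbour
PROCEDURE Main
   LOCAL oHtmlDoc := THtmlDocument():new()

   // "input.html" is a hypothetical file name
   IF .NOT. oHtmlDoc:readFile( "input.html" )
      ? "Cannot read input.html"
      QUIT
   ENDIF

   // :toString() returns the document as an HTML formatted string
   ? Len( oHtmlDoc:toString() ), "characters in the formatted document"

   // "output.html" is a hypothetical file name
   IF .NOT. oHtmlDoc:writeFile( "output.html" )
      ? "Cannot write output.html"
   ENDIF

   THtmlCleanup()
RETURN
```

Checking the logical return values of :readFile() and :writeFile() keeps a missing or locked file from going unnoticed.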

Methods for searching and navigating

:collect() --> aTHtmlNodes: Returns all nodes of the HTML document.
:findFirst( ... ) --> oTHtmlNode|NIL: Locates the first HTML node containing particular data.
:findFirstRegex( ... ) --> oTHtmlNode|NIL: Locates the first HTML node containing particular data using regular expressions.
:findNext() --> oTHtmlNode|NIL: Finds the next HTML node matching the search criteria.
:getNode( <cTagName> ) --> oTHtmlNode|NIL: Returns the first node matching a tag name.
:getNodes( <cTagName> ) --> aTHtmlNodes: Returns all nodes matching a tag name.
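As a rough illustration of the tag name lookups, the sketch below calls :getNode(), :getNodes() and :collect() with the signatures listed above. It assumes that :getNodes() returns an array of all nodes with the given tag name, and it handles the NIL result that :getNode() is documented to return when nothing matches. The input string is illustrative only.

```xharbour
PROCEDURE Main
   LOCAL cString  := "<p>Hello <p>world"
   LOCAL oHtmlDoc := THtmlDocument():new( cString )
   LOCAL oNode, aNodes

   // first node with the tag name "p", or NIL if there is none
   oNode := oHtmlDoc:getNode( "p" )
   IF oNode != NIL
      ? oNode:getText( "" )
   ENDIF

   // all nodes with the tag name "p" (assumed to be an array)
   aNodes := oHtmlDoc:getNodes( "p" )
   ? Len( aNodes ), "<p> nodes found"

   FOR EACH oNode IN aNodes
      ? oNode:getText( "" )
   NEXT

   // every node of the document
   ? Len( oHtmlDoc:collect() ), "nodes in total"

   THtmlCleanup()
RETURN
```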

Info

See also: THtmlCleanup(), THtmlInit(), THtmlNode(), TIpClientHttp()

Category: HTML functions, Object functions, xHarbour extensions

Source: tip\thtml.prg

LIB: xhb.lib

DLL: xhbdll.dll

Examples

Creating a simple HTML page

// The example creates an HTML document from a simple HTML string

   PROCEDURE Main
      LOCAL cString  := "<p>Hello <p>world"
      LOCAL oHtmlDoc := THtmlDocument():new( cString )

      ? oHtmlDoc:toString()

      ** output
      // <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
      // <html>
      //  <head>
      //  </head>
      //  <body>
      //   <p>Hello
      //   <p>world
      //  </body>
      // </html>

      THtmlCleanUp()
   RETURN

Loading an HTML page from the internet

// The example loads an HTML page from Google and lists
// the links contained in the HTML response.

   PROCEDURE Main
      LOCAL oHttp, cHtml, hQuery, oHtmlDoc, oNode, aLink

      oHttp:= TIpClientHttp():new( "http://www.google.de/search" )

      // build the Google query
      hQuery := Hash()
      hSetCaseMatch( hQuery, .F. )

      hQuery["q"]    := "xHarbour"
      hQuery["hl"]   := "en"
      hQuery["btnG"] := "Google+Search"

      // add query data to the TUrl object
      oHttp:oUrl:addGetForm( hQuery )

      // Connect to the HTTP server
      IF .NOT. oHttp:open()
         ? "Connection error:", oHttp:lastErrorMessage()
         QUIT
      ENDIF

      // download the Google response
      cHtml   := oHttp:readAll()
      oHttp:close()
      ? Len(cHtml), "bytes received "

      oHtmlDoc := THtmlDocument():new( cHtml )

      oHtmlDoc:writeFile( "Google.html" )

      // ":a" retrieves the first <a href="url"> text </a> tag
      oNode := oHtmlDoc:body:a
      ? oNode:getText(""), oNode:href

      // ":divs(5)" returns the 5th <div> tag
      oNode := oHtmlDoc:body:divs(5)

      // "aS" is the plural of "a" and returns all <a href="url"> tags
      aLink := oNode:aS

      FOR EACH oNode IN aLink
         ? HtmlToOem( oNode:getText("") ), oNode:href
      NEXT
   RETURN
