Strip HTML



Hi there.

I am currently working on a database/program that will take a certain html
page and store the strings on the page into a table, which will then be used
in reports/queries.

I have started by stripping the HTML tags off the page. The function
works very well.
But my problem is that I am not sure how to stop the function
from removing "<br>" tags.

Here is the function:

Function stripHTML(strHTML)
    'Ensure that strHTML contains something
    If Len(strHTML) = 0 Then
        stripHTML = strHTML
        Exit Function
    End If

    Dim arysplit, i, j, strOutput

    arysplit = Split(strHTML, "<")

    'Assuming strHTML is nonempty, we want to start iterating
    'from the 2nd array position
    If Len(arysplit(0)) > 0 Then j = 1 Else j = 0

    'Loop through each element of the array
    For i = j To UBound(arysplit)
        'Do we find a matching > sign?
        If InStr(arysplit(i), ">") Then
            'If so, snip out all the text between the start of the string
            'and the > sign
            'IF statement to NOT remove <br> tags.

            arysplit(i) = Mid(arysplit(i), InStr(arysplit(i), ">") + 1)
        Else
            'Ah, the < was nonmatching
            arysplit(i) = "<" & arysplit(i)
        End If
    Next

    'Rejoin the array into a single string
    strOutput = Join(arysplit, "")

    'Snip out the first <
    strOutput = Mid(strOutput, 2 - j)

    'Convert < and > to &lt; and &gt;
    strOutput = Replace(strOutput, ">", "&gt;")
    strOutput = Replace(strOutput, "<", "&lt;")
    strOutput = Replace(strOutput, "–", "<")
    stripHTML = strOutput
End Function
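
For what it's worth, one possible way to keep the "<br>" tags is sketched below. It is only a rough sketch, not a tested drop-in: the name stripHTMLKeepBr is made up for illustration, and it assumes the page never contains the character Chr(1), which is used as a temporary placeholder so the kept tags survive the final escaping of stray < and > characters. Every variant it recognises ("<BR>", "<br/>", "<br />", "<br clear=all>") is written back as a plain "<br>".

```vba
Function stripHTMLKeepBr(strHTML)
    'Hypothetical variant of stripHTML that keeps <br> tags.
    Dim arysplit, i, lngClose, strTag, strRest
    Dim strMark, strOutput, blnIsBr

    'Ensure that strHTML contains something
    If Len(strHTML) = 0 Then
        stripHTMLKeepBr = strHTML
        Exit Function
    End If

    'Assumption: Chr(1) never occurs in the input
    strMark = Chr(1)

    arysplit = Split(strHTML, "<")

    'Element 0 is the text before the first "<", so start at 1
    For i = 1 To UBound(arysplit)
        lngClose = InStr(arysplit(i), ">")
        If lngClose > 0 Then
            'Text between "<" and ">" is the tag, the remainder is content
            strTag = LCase(Trim(Left(arysplit(i), lngClose - 1)))
            strRest = Mid(arysplit(i), lngClose + 1)

            'Matches br, br/, br / and br followed by attributes
            blnIsBr = (strTag = "br") Or (Left(strTag, 3) = "br ") _
                      Or (Left(strTag, 3) = "br/")

            If blnIsBr Then
                arysplit(i) = strMark & strRest
            Else
                arysplit(i) = strRest
            End If
        Else
            'The "<" had no matching ">", so put it back
            arysplit(i) = "<" & arysplit(i)
        End If
    Next

    strOutput = Join(arysplit, "")

    'Escape stray < and >, then restore the kept <br> tags
    strOutput = Replace(strOutput, ">", "&gt;")
    strOutput = Replace(strOutput, "<", "&lt;")
    strOutput = Replace(strOutput, strMark, "<br>")

    stripHTMLKeepBr = strOutput
End Function
```

Called from the Immediate window, Debug.Print stripHTMLKeepBr("<p>Line one<br>Line <b>two</b></p>") should print Line one<br>Line two. A tag such as "<br" followed directly by a tab or line break is not recognised by this check and would be stripped like any other tag.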


Thanks.
-State