XML validation in Notes
Ian Tree 10 May 2006 14:55:41
Validating XML in the Notes Client
The increased use of XML in all kinds of application contexts dictates that the Notes Developer needs to be able to handle XML in the Notes Client. This article deals with some of the common problems (and their solutions) that can be encountered when trying to validate XML in the Notes Client.
The code presented in this article may be freely used at no charge. Use of the code is covered by the terms of the GNU Lesser General Public License.
- Background - what is validation and why do it?
- Capturing the Input
- Formatting the Input
- Parsing the XML
- Processing the Error Log
- Putting It All Together
- Summing Up
Background - what is validation and why do it?
There are two different validation states for an XML document: it can be "well-formed" or it can be "valid". All "valid" XML documents are by definition also "well-formed". A "well-formed" XML document is one that conforms to all of the syntactic and structural rules of the XML language specification. A "valid" XML document is one that is "well-formed" and also conforms to all of the lexical and content rules of its Document Type Definition (DTD). All XML parsers, both SAX and DOM, generate error conditions when they are asked to parse an XML document that is not "well-formed", but only "validating" parsers generate error conditions when presented with an XML document that is not "valid", and then only when asked to do so. In this article we are looking at working with XML in the Notes Client, so we will focus only on the behaviour of the LotusScript XML parsers. Both LotusScript XML parsers are "validating" parsers, i.e. they can both validate the conformance of an XML document against its DTD. The DOM parser is provided by an instance of the NotesDOMParser class and the SAX parser by an instance of the NotesSAXParser class. The validation behaviour of both parsers is controlled by a property called "InputValidationOption"; the property is an integer and can be set to one of the following three values.
VALIDATE_NEVER (0) - Do NOT perform any validation against the DTD.
VALIDATE_ALWAYS (1) - Always validate the document against the DTD.
VALIDATE_AUTO (2) - This is the default setting and performs validation only when a DTD is explicitly specified in the XML Document.
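As a minimal sketch of how the options above are applied, the following routine forces validation on a DOM parser. The routine name ParseWithValidation is hypothetical, and the routine assumes that the caller has already loaded the XML into the stream it is given; it is an illustration rather than part of the article's code.

```lotusscript
' Hypothetical helper: parse a stream with DTD validation forced on
Sub ParseWithValidation(nstrXMLIn As NotesStream)
	Dim sessCurrent As New NotesSession
	Dim domParser As NotesDOMParser

	' Assumption: the stream already holds the XML to be checked
	nstrXMLIn.Position = 0
	Set domParser = sessCurrent.CreateDOMParser(nstrXMLIn)

	' Override the VALIDATE_AUTO default and always validate against the DTD
	domParser.InputValidationOption = VALIDATE_ALWAYS

	On Error Goto ParseFailed
	Call domParser.Process()
	Print "The document parsed without errors."

Done:
	Exit Sub

ParseFailed:
	' Show the LotusScript error, then the XML log kept by the parser
	Print "Parse failed: " & Error$()
	Print domParser.Log
	Resume Done
End Sub
```

Setting the property to VALIDATE_NEVER instead would restrict the parser to well-formedness checks only.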
The DTD specifies the relationships between the different elements in an XML document and the attributes that can be applied to each element, including default values and required items. For a good tutorial on the validation capabilities of XML DOM parsers visit W3Schools. The DTD can provide an extremely comprehensive specification of the "validation envelope" for an XML document. If a detailed DTD is provided for an XML document, and the document is validated when it is created, stored or captured, then the quantity of validation and defensive code needed in any application component that consumes the XML can be greatly reduced. It is also a good design strategy (the Fail Early strategy) to validate any input as thoroughly as possible at the time it is input, rather than waiting until it is used (possibly in the background) and rejecting it at that point.
Capturing the Input
A Rich Text field is, of course, the natural input vehicle for an XML document; the formatting capabilities, especially hanging indents, are a natural requirement for the efficient entry of XML. However, Rich Text and the way that Rich Text fields interact with the UI present some unique problems in terms of validation. As it is possible to specify a NotesRichTextItem as an input source to a NotesDOMParser, it would at first sight appear to be a trivial exercise to construct a QuerySave event on a form that validates the XML in a Rich Text field and inhibits the save if the XML is invalid. However, the content of a Rich Text field is not transferred to the Rich Text item in the back-end document until the save processing is done. The simple validation method identified above would therefore perform the not very useful task of validating the XML that was already saved in the document while ignoring the input that was provided in the UI.
The text content of the Rich Text Field that is being edited can be captured by using the NotesUIDocument.FieldGetText() method. Using this method allows us to capture the content as a single String. There is a limitation on the size of the String that is returned by the method but for most applications this is not a real limitation.
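The difference between the two sources can be seen with a small sketch along the following lines. The routine name and the field name "XMLBody" are hypothetical; the routine is assumed to be called from a form event that has the NotesUIDocument to hand.

```lotusscript
' Hypothetical helper: compare the text being edited with the text already saved
Sub CompareUIAndSavedText(Source As NotesUIDocument)
	Dim strUIText As String
	Dim strSavedText As String
	Dim itemSaved As NotesItem

	' Text currently in the field in the UI, including unsaved edits
	strUIText = Source.FieldGetText("XMLBody")

	' Text held by the back-end item, i.e. the state as of the last save
	Set itemSaved = Source.Document.GetFirstItem("XMLBody")
	If Not itemSaved Is Nothing Then
		strSavedText = itemSaved.Text
	End If

	Print "UI text length: " & Cstr(Len(strUIText)) & _
	", saved item text length: " & Cstr(Len(strSavedText))
End Sub
```

On a new document that has never been saved the back-end item may not exist at all, which is why the sketch checks for Nothing before reading it.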
Formatting the Input
Having got the XML input in the form of a string, it would be possible to pass it straight to an XML parser for validation. However, if there were any problems with the XML, the parser would treat the input as being on a single line and report the location of the error as line "1", column "x", where x could be a pretty big number. This would not be much use to anyone trying to correct the problem in the nicely formatted Rich Text. It is therefore necessary to do some pre-processing on the XML before submitting it to the parser. People tend to format XML in a conventional manner, and we can use the rules of convention to re-format the XML string into a stream that will be close to the form in which the user entered it. The rules are quite simple.
Each start of an element begins on a new line
An end element occurs on the same line as the previous start element
There can only be one end element on a line
The sample of XML below follows the rules of convention as stated above.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Default Statistics Definitions for the Domino/SIMON set -->
<!DOCTYPE simonstats SYSTEM "simonstats.dtd">
<simonstats>
	<platform name="Domino">
		<statsgroup name="Core">
			<statistic name="NRPCSessions" nativename="NET.Port.Sessions.Established.Incoming" type="quantity" cumulative="yes"></statistic>
			<statistic name="HTTPSessions" nativename="Http.Accept.ConnectionsAcceted" type="quantity" cumulative="yes"></statistic>
			<statistic name="NRPCTransactions" nativename="Server.Trans.Total" type="quantity" cumulative="yes"></statistic>
			<statistic name="HTTPRequests" nativename="Http.Worker.Total.RequestsProcessed" type="quantity" cumulative="yes"></statistic>
		</statsgroup>
	</platform>
</simonstats>
The following code provides a LotusScript function that implements the re-formatting based on the rules of convention. The function takes the string as a parameter and returns a stream containing the formatted XML.
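Function FormatStringAsStream(strXMLIn As String) As NotesStream
	Dim sessCurrent As New NotesSession
	Dim nstrXMLOut As NotesStream
	' Long rather than Integer so that input longer than 32767 characters does not overflow
	Dim iIndex As Long
	Dim iStart As Long
	Dim iNewIndex As Long
	Dim bTerminal As Boolean

	On Error Goto FmtError
	Set nstrXMLOut = sessCurrent.CreateStream()
	If Trim(strXMLIn) = "" Then
		Set FormatStringAsStream = Nothing
		Exit Function
	End If

	iStart = 1
	iIndex = Instr(iStart, strXMLIn, "<")
	While iIndex <> 0
		If Len(strXMLIn) - iIndex > 1 Then
			bTerminal = False
			' Determine if the current element is a terminal
			If Mid$(strXMLIn, iIndex + 1, 1) = "/" Then
				bTerminal = True
			End If
			iNewIndex = Instr(iIndex, strXMLIn, ">")
			If iNewIndex = 0 Then
				' Malformed XML (no closing ">"): emit the remainder as the last line
				' and leave it to the parser to report the problem
				iIndex = Len(strXMLIn)
			Else
				iIndex = iNewIndex
				If Not bTerminal Then
					' Skip the look-ahead for self-closing elements
					If Mid$(strXMLIn, iIndex - 1, 1) <> "/" Then
						' If the next tag is an end element, keep it on this line
						iNewIndex = Instr(iIndex, strXMLIn, "<")
						If (iNewIndex <> 0) Then
							If Len(strXMLIn) > iNewIndex Then
								If Mid$(strXMLIn, iNewIndex + 1, 1) = "/" Then
									iNewIndex = Instr(iNewIndex, strXMLIn, ">")
									If iNewIndex <> 0 Then
										iIndex = iNewIndex
									End If
								End If
							End If
						End If
					End If
				End If
			End If
		End If
		' The write and advance steps were lost from the published listing;
		' they are reconstructed here from the rules of convention stated above
		Call nstrXMLOut.WriteText(Mid$(strXMLIn, iStart, iIndex - iStart + 1), EOL_CRLF)
		iStart = iIndex + 1
		If iStart > Len(strXMLIn) Then
			iIndex = 0
		Else
			iIndex = Instr(iStart, strXMLIn, "<")
		End If
	Wend
	Set FormatStringAsStream = nstrXMLOut

NormalExit:
	Exit Function

FmtError:
	Set FormatStringAsStream = Nothing
	Resume NormalExit
End Function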
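A quick way to see the rules of convention in action is to feed the function a small unformatted string and read the stream back line by line. The routine below is a hypothetical test harness, not part of the original code; it assumes FormatStringAsStream is available in the same module. For the input shown, the rules would be expected to give four lines: the a start element, the b element together with its end element, the self-closing c element, and the a end element.

```lotusscript
' Hypothetical harness: print each line produced by FormatStringAsStream
Sub ShowFormattedLines()
	Dim nstrOut As NotesStream
	Dim strLine As String
	Dim iLine As Integer

	Set nstrOut = FormatStringAsStream("<a><b>x</b><c/></a>")
	If nstrOut Is Nothing Then
		Print "Formatting failed."
		Exit Sub
	End If

	' Rewind the stream and read it back one line at a time
	nstrOut.Position = 0
	While Not nstrOut.IsEOS
		iLine = iLine + 1
		strLine = nstrOut.ReadText(STMREAD_LINE, EOL_CRLF)
		Print "Line " & Cstr(iLine) & ": " & strLine
	Wend
	Call nstrOut.Close()
End Sub
```

The line numbers printed here are the ones the parser will later report in its error messages, which is the whole point of the re-formatting step.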
Parsing the XML
The natural choice of parser to use for validating the XML would be the DOM parser. With the DOM parser there is no need to provide any additional coding, whereas with the SAX parser a minimal subset of the SAX events would need to have event handlers coded. The DOM parser also provides a nice XML log containing all of the errors once it has completed parsing. Unfortunately the natural choice turns out to be the wrong one, for a rather strange reason that has to do with the implementation of parsers in LotusScript. In the Java implementation of parsers in Notes the URI reference for the System ID of the Document Type is interpreted as being a Page in the current database, whereas in the LotusScript implementation it is treated as being a file in a directory relative to the Notes executable directory (what? ed.). When a Document Type specification is added to an XML document, one of the attributes tells XML processors where they can find the Document Type Definition (DTD). In the example below, the Notes/Domino Java implementation of the parsers expects the "simonstats.dtd" SYSTEM attribute to be the name of a page in the current database (what it actually does is change the reference to a fully formed Notes URL of the form notes:///__dbRepId/simonstats.dtd).
<!DOCTYPE simonstats SYSTEM "simonstats.dtd">
The LotusScript implementation treats the same definition in a different way: it treats the specification as an operating system file name in a directory relative to the current Notes executable directory. Any DXL that is generated uses a Document Type specification of the following form.
<!DOCTYPE document SYSTEM 'xmlschemas/domino_6_5_5.dtd'>
And indeed, if you look in the Notes executable directory you will find a sub-directory called "xmlschemas" and a file called "domino_6_5_5.dtd". The LotusScript implementation allows you to specify a fully qualified file name but does not support any other forms, such as a Notes URL. This is not ideal; the last thing that we want to do in a Notes application is to have to worry about deploying DTD files to the file system (ahhh now I see, thanks. ed.).
This is where the SAX parser comes to the rescue, not because it handles the Document Type attributes any differently from the DOM parser (hardly surprising since the DOM parser is built on top of the SAX parser), but because the SAX parser supports the SAX_ResolveEntity event, which is specifically supplied to identify, locate and return resources that are needed by the XML processing. The following code shows the implementation of a class that validates an XML stream using the SAX parser. It follows the same rules as the Java implementation for locating the DTD and provides a convenient log, as is done with the DOM parser.
'
' CLASS: XMLValidationHelper
'
' This class contains functions for preparing XML streams for validation
' Author Ian Tree - HMNL
' Version 1.2.1/01
'
Class XMLValidationHelper

	Private strClassVersion As String
	Private strLog As String
	Private iErrorCount As Integer

	'
	' Constructor
	'
	Sub New()
		strClassVersion = "1.2.1/01"
	End Sub

	'
	' ValidateXML
	'
	' This method will validate the XML Stream passed using the DTD (if it can be located)
	' it will create a DOMParserLog and throw an error if any errors are detected.
	'
	Function ValidateXML(nstrXMLIn As NotesStream)
		Dim sessCurrent As New NotesSession
		Dim nspValidate As NotesSAXParser

		Me.strLog = "<?xml version=""1.0"" encoding=""UTF-8""?><DOMParserLog>"
		iErrorCount = 0
		On Error Goto InternalError

		If nstrXMLIn Is Nothing Then
			Me.iErrorCount = Me.iErrorCount + 1
			Me.strLog = Me.strLog + "<fatalerror>No XML Stream was passed to the Validator.</fatalerror>"
		Else
			If nstrXMLIn.Bytes = 0 Then
				Me.iErrorCount = Me.iErrorCount + 1
				Me.strLog = Me.strLog + "<fatalerror>The XML Stream passed to the Validator was empty.</fatalerror>"
			Else
				' Construct the SAX Parser to do the validation
				Set nspValidate = sessCurrent.CreateSAXParser(nstrXMLIn)
				' Set the Parser options
				nspValidate.ExitOnFirstFatalError = True
				nspValidate.InputValidationOption = VALIDATE_ALWAYS
				' Setup handlers for the events that we want to process
				On Event SAX_Error From nspValidate Call HandleSAX_Error
				On Event SAX_FatalError From nspValidate Call HandleSAX_FatalError
				On Event SAX_Warning From nspValidate Call HandleSAX_Warning
				On Event SAX_ResolveEntity From nspValidate Call HandleSAX_ResolveEntity
				Call nspValidate.Parse()
			End If
		End If

NormalExit:
		Me.strLog = Me.strLog + "</DOMParserLog>"
		On Error Goto 0
		If Me.iErrorCount > 0 Then
			Error 4602	' Simulate DOM Parser Error
		End If
		Exit Function

InternalError:
		' Errors already recorded by the SAX event handlers need no further logging
		If Err() = 4603 And Me.iErrorCount > 0 Then
			Resume NormalExit
		End If
		Me.iErrorCount = Me.iErrorCount + 1
		Me.strLog = Me.strLog + "<fatalerror line=""0"">LotusScript Error in ValidateXML (" + Str(Err()) + ") " + Error$() + " at " + Str(Erl()) + " .</fatalerror>"
		Resume NormalExit
	End Function

	' Handler routines for SAX events raised by the ValidateXML function

	' Handle SAX_Error
	Sub HandleSAX_Error(Source As NotesSAXParser, nsxCurrent As NotesSAXException)
		Me.iErrorCount = Me.iErrorCount + 1
		Me.strLog = Me.strLog + "<error line=""" + Str(nsxCurrent.Row) + """>" + nsxCurrent.Message + "</error>"
	End Sub

	' Handle SAX_FatalError
	Sub HandleSAX_FatalError(Source As NotesSAXParser, nsxCurrent As NotesSAXException)
		Me.iErrorCount = Me.iErrorCount + 1
		Me.strLog = Me.strLog + "<fatalerror line=""" + Str(nsxCurrent.Row) + """>" + nsxCurrent.Message + "</fatalerror>"
	End Sub

	' Handle SAX_Warning
	Sub HandleSAX_Warning(Source As NotesSAXParser, nsxCurrent As NotesSAXException)
		Me.iErrorCount = Me.iErrorCount + 1
		Me.strLog = Me.strLog + "<warning line=""" + Str(nsxCurrent.Row) + """>" + nsxCurrent.Message + "</warning>"
	End Sub

	' Handle SAX_ResolveEntity
	Function HandleSAX_ResolveEntity(Source As NotesSAXParser, Byval strPubID As String, Byval strSysID As String) As Variant
		Dim sessCurrent As New NotesSession
		Dim dbCurrent As NotesDatabase
		Dim ncDesign As NotesNoteCollection
		Dim strNoteID As String
		Dim docPage As NotesDocument
		Dim vName As Variant
		Dim rtiContent As NotesRichTextItem
		Dim strEntityValue As String

		' Try and locate a page in the current database with the same name as the System ID
		If Trim$(strSysID) = "" Then
			' Let the default SAX Entity resolver try
			HandleSAX_ResolveEntity = 0
			Exit Function
		End If

		Set dbCurrent = sessCurrent.CurrentDatabase
		Set ncDesign = dbCurrent.CreateNoteCollection(False)
		ncDesign.SelectPages = True
		Call ncDesign.BuildCollection()
		If ncDesign.Count = 0 Then
			' Let the default SAX Entity resolver try
			HandleSAX_ResolveEntity = 0
			Exit Function
		End If

		' Loop through the pages trying to locate one with the requested name
		strNoteID = ncDesign.GetFirstNoteId
		While strNoteID <> ""
			Set docPage = dbCurrent.GetDocumentByID(strNoteID)
			vName = docPage.GetItemValue("$TITLE")
			If Trim(Lcase(vName(0))) = Trim(Lcase(strSysID)) Then
				' Found it: end the loop with docPage still set
				strNoteID = ""
			Else
				Set docPage = Nothing
				strNoteID = ncDesign.GetNextNoteId(strNoteID)
			End If
		Wend

		' If we managed to find the page then use the content as the entity
		If Not docPage Is Nothing Then
			Set rtiContent = docPage.GetFirstItem("$Body")
			If Not rtiContent Is Nothing Then
				strEntityValue = rtiContent.Text
				HandleSAX_ResolveEntity = strEntityValue
			Else
				HandleSAX_ResolveEntity = 0
			End If
		Else
			' Let the default SAX Entity resolver try
			HandleSAX_ResolveEntity = 0
		End If
	End Function

	' Getters:
	Property Get ClassVersion As String
		ClassVersion = strClassVersion
	End Property

	Property Get PLog As String
		PLog = strLog
	End Property

End Class
Processing the Error Log
The following "utility" functions are provided to process the content of a DOM Parser Error Log. The functions provide a "style sheet" implementation that formats the complete Log into a string suitable for display in a Messagebox.
'
' FormatParserLogAsString
'
' This method will format the contents of a Parser Log as a string
'
Function FormatParserLogAsString(strPLog As String) As String
	Dim sessCurrent As New NotesSession
	Dim strResult As String
	Dim strLine As String
	Dim nstrXMLLog As NotesStream
	Dim nstrXSLT As NotesStream
	Dim nstrOut As NotesStream
	Dim xsltTX As NotesXSLTransformer

	strResult = "Error converting Parser Log."
	On Error Goto FmtError

	' Create a Stream and load the Parser Log Content to it
	Set nstrXMLLog = FormatStringAsStream(strPLog)
	If Not nstrXMLLog Is Nothing Then
		nstrXMLLog.Position = 0
		' Get a stream containing the XSLT for transforming a Parser Log
		Set nstrXSLT = GetParserLogXSLT()
		' Create the stream to contain the output
		Set nstrOut = sessCurrent.CreateStream()
		' Create the XSLTransformer and perform the transformation
		Set xsltTX = sessCurrent.CreateXSLTransformer(nstrXMLLog, nstrXSLT, nstrOut)
		Call xsltTX.Process()
		' Convert the output stream to a string
		If nstrOut.Bytes = 0 Then
			strResult = "Empty conversion for this log."
		Else
			nstrOut.Position = 0
			strResult = ""
			While Not nstrOut.IsEOS
				strLine = nstrOut.ReadText(STMREAD_LINE, EOL_ANY)
				' Strip any trailing CR/LF so that each line ends in a single line feed
				While Right$(strLine, 1) = Chr(13) Or Right$(strLine, 1) = Chr(10)
					strLine = Left$(strLine, Len(strLine) - 1)
				Wend
				strResult = strResult + strLine + Chr(10)
			Wend
		End If
		Call nstrXMLLog.Close()
		Call nstrXSLT.Close()
		Call nstrOut.Close()
	End If
	FormatParserLogAsString = strResult

NormalExit:
	Exit Function

FmtError:
	If xsltTX Is Nothing Then
		' The failure happened before the transformer was created
		FormatParserLogAsString = "Error: (" + Str(Err()) + ")" + Error$() + " at " + Str(Erl()) + " while converting Parser Log."
	Else
		xsltTX.LogComment = "Error: (" + Str(Err()) + ")" + Error$() + " at " + Str(Erl()) + " while converting Parser Log."
		FormatParserLogAsString = xsltTX.Log
	End If
	Resume NormalExit
End Function

'
' GetParserLogXSLT
'
' This method will return a stream containing the XSLT for formatting a Parser Log
'
Function GetParserLogXSLT() As NotesStream
	Dim sessCurrent As New NotesSession
	Dim nstrXSLT As NotesStream

	Set nstrXSLT = sessCurrent.CreateStream()
	' Assumption: the line-break entities (&#10;) inside the xsl:text elements below were
	' lost from the published listing and are restored here so that each message
	' appears on its own line
	Call nstrXSLT.WriteText("<?xml version=""1.0"" encoding=""UTF-8""?>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:transform version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:output method=""text"" />", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:template match=""/"">", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>Error(s) have been detected while validating your XML.&#10;</xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:for-each select=""DOMParserLog/fatalerror"">", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>Fatal Error: at Line </xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:value-of select=""@line""/>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>: </xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:value-of select=""."" />", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>&#10;</xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("</xsl:for-each>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:for-each select=""DOMParserLog/error"">", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>Error: at Line </xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:value-of select=""@line""/>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>: </xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:value-of select=""."" />", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>&#10;</xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("</xsl:for-each>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:for-each select=""DOMParserLog/warning"">", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>Warning: at Line </xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:value-of select=""@line""/>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>: </xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:value-of select=""."" />", EOL_CRLF)
	Call nstrXSLT.WriteText("<xsl:text>&#10;</xsl:text>", EOL_CRLF)
	Call nstrXSLT.WriteText("</xsl:for-each>", EOL_CRLF)
	Call nstrXSLT.WriteText("</xsl:template>", EOL_CRLF)
	Call nstrXSLT.WriteText("</xsl:transform>", EOL_CRLF)
	nstrXSLT.Position = 0
	Set GetParserLogXSLT = nstrXSLT
End Function
Putting It All Together
The following QuerySave event handler from a form shows all of the techniques described above integrated into a complete implementation. In this implementation, the functions described above that relate to the formatting of XML documents are all implemented in a single XMLFormatHelper class.
Sub Querysave(Source As Notesuidocument, Continue As Variant)
	' QuerySave event - validate the XML for the Statistics Definitions
	Dim sessCurrent As New NotesSession
	Dim nstrXML As NotesStream
	Dim nstrVHOut As NotesStream
	Dim strUIVal As String
	Dim xfhCurrent As XMLFormatHelper
	Dim xvhCurrent As XMLValidationHelper

	On Error Goto ParseError

	' Capture the UI field text and convert it to a formatted stream
	strUIVal = Source.FieldGetText("StatsDefs")
	Set xfhCurrent = New XMLFormatHelper()
	Set nstrXML = xfhCurrent.FormatStringAsStream(strUIVal)
	If nstrXML Is Nothing Then
		Messagebox "Unable to format the XML."
		Continue = False
		Exit Sub
	End If

	' Create a XMLValidationHelper and validate the XML
	Set xvhCurrent = New XMLValidationHelper
	nstrXML.Position = 0
	Call xvhCurrent.ValidateXML(nstrXML)
	Continue = True

NormalExit:
	Exit Sub

ParseError:
	If Not xvhCurrent Is Nothing Then
		Messagebox xfhCurrent.FormatParserLogAsString(xvhCurrent.PLog), MB_OK, "XML Parser Error"
	Else
		Messagebox "Unknown error in QuerySave processing."
	End If
	Continue = False
	Resume NormalExit
End Sub
Summing Up
By adding a few well-crafted support routines to your Notes client based applications it is possible to get great additional development value from the use of XML without a lot of engineering cost. Any LotusScript developer could maintain the QuerySave code presented above without needing detailed knowledge of XML and its implementation in LotusScript.