KXML

From Makers Local 256
Revision as of 08:10, 26 May 2012 by Opticron (Talk | contribs)

Creator: Opticron
Status: Late Implementation
Born On: 00:49, 22 June 2008 (CDT)
Last Updated: 08:10, 26 May 2012 (CDT)

Overview

This project tracks the development of KXML, a DOM XML parsing library written in D. It was born of the need for an XML parser that relied on Phobos, supported unparsed CDATA nodes, and worked properly with D1.0, since no existing parser met all three requirements (KXML also supports D2.0 from the exact same codebase). The parser operates on strings and attempts to allocate as little memory as possible by using slice references into the original string provided. It is loosely based on the Yage XML parser, also written in D: KXML has a completely new parsing engine, but attempts to remain mostly API-compatible with the Yage parser. The parser can currently deal with XML processing instructions, comments, unparsed character data, standard nodes, and self-closing nodes. The code is on my SVN server under kxml. There is also a D2/Phobos2 range-oriented version available in libdxml2.
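The slice-reference idea can be illustrated without KXML itself. The following is a minimal D2 sketch, not KXML's actual parsing code: firstTagName is a hypothetical helper invented for this example, and it only shows that a D slice of the input shares memory with the original string instead of copying it.

```d
import std.stdio : writeln;

// Hypothetical helper (NOT part of KXML): returns the name of the first
// tag in the input as a slice of that input, without copying characters.
string firstTagName(string xml)
{
    size_t start = 0;
    // Find the first '<'.
    while (start < xml.length && xml[start] != '<') start++;
    if (start == xml.length) return null; // no tag found
    start++; // skip the '<'

    // The name ends at whitespace, '/' or '>'.
    size_t end = start;
    while (end < xml.length && xml[end] != '>' && xml[end] != ' ' && xml[end] != '/')
        end++;

    // Slicing creates a new (pointer, length) pair over the same memory.
    return xml[start .. end];
}

void main()
{
    string doc = `<foo bar="foobar">text</foo>`;
    string name = firstTagName(doc);
    writeln(name);

    // The slice points one character into the original buffer,
    // so no new string was allocated for the name.
    assert(name.ptr == doc.ptr + 1);
}
```

Because the returned name is only a view into doc, the original string has to stay alive for as long as any slice of it is in use. In D the garbage collector takes care of that for GC-allocated strings.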

Usage Examples

Create an XML element with the name "foo".

XmlNode foo = new XmlNode("foo");

Add an attribute with the name "bar" and value "foobar"

foo.setAttribute("bar","foobar");

Get the first child node of foo

XmlNode firstchild = foo.getChildren()[0];

Get the attribute named bar

string barval = foo.getAttribute("bar");

Parse a string for XML

XmlNode foo = readDocument(xmlstring);

Search a node's children for a match to the XPath string

XmlNode[] xpathresults = foo.parseXPath("bar/toast");

Do an XPath search with attribute matching

XmlNode[] xpathresults = foo.parseXPath(`bar[@type="left" and @lol="cats"]/toast`);

Do a search for a toast element arbitrarily deep in the tree whose parent is a bar element

XmlNode[] xpathresults = foo.parseXPath("//bar/toast");

Match on inner XML

XmlNode[] xpathresults = xml.parseXPath(`//td[.="Text 2.3"]`);

Subnode text matching and inequalities

XmlNode[] xpathresults = xml.parseXPath(`//tr[@ab>=9 and th="Head"]/td`);

Quirks

  • The input string may not always be the same as the output string, even if nothing is modified
    • Always outputs XML with double quoted attributes
    • <![CDATA[]]> nodes will be escaped and left as regular, parsed character data

To Do

  • Refactor inheritance as a set of shared interfaces
    • THIS WILL PROBABLY BREAK BACKWARD COMPATIBILITY
    • in Xml(everything in kxml): toString, reset, more?
    • in XmlChild(XmlDoc,XmlNode): parseXPath, addCData, addChild, getChildren, removeChild, getCData, getInnerXML, addChildren
    • in XmlAttribute(XmlNode,XmlPI): removeAttribute, setAttribute, getAttributes, getAttribute, hasAttribute, getName, setName
    • can use casting tests to check types, need to deprecate isXXXXX functions
  • XPath improvements
    • Implement [2] type constraints
  • Improve parsing of malformed XML (<? // blah ?>)