Tuesday, November 11, 2008

XML "Beautifier" using SAX

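Lately I have run into a lot of XML pages that are almost impossible to read because the text has no indentation. Below is a solution in Java that uses the SAX parser: it prints every start tag, text run and end tag on its own line, indented according to its nesting depth. The path to the offending file is given on the command line.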


import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;


// Pretty printing with a SAX parser
class PPSax extends DefaultHandler
{
    String sp = ""; // spaces used to indent each line

    public static void main(String[] argv) throws Exception {
        if (argv.length != 1) {
            System.err.println("Usage: java PPSax <file.xml>");
            return;
        }
        DefaultHandler handler = new PPSax(); // this class receives the parser callbacks
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        saxParser.parse(new File(argv[0]), handler);
    }

    // DefaultHandler methods overridden here (callback methods)
    // a start-of-element tag
    @Override
    public void startElement(String namespaceURI, String sName, String qName,
                             Attributes attrs) throws SAXException {
        StringBuilder sb = new StringBuilder(sp + "<" + qName);
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                // qualified name, exactly as written in the document (prefix included)
                sb.append(" " + attrs.getQName(i) + "=\"" + attrs.getValue(i) + "\"");
            }
        }
        System.out.println(sb.append(">"));
        sp += "  "; // indent one level (two spaces) deeper after a start tag
    }

    // an end-of-element tag
    @Override
    public void endElement(String namespaceURI, String sName, String qName)
            throws SAXException {
        sp = sp.substring(2); // decrease the indentation by one level
        System.out.println(sp + "</" + qName + ">");
    }

    // a run of characters between tags
    @Override
    public void characters(char[] buf, int offset, int len) throws SAXException {
        String s = new String(buf, offset, len).trim(); // strip surrounding whitespace
        if (s.length() == 0)
            return; // the run contained only whitespace
        System.out.println(sp + s);
    }
}
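Two SAX details affect the callbacks above: the parser may split the text of a single element across several `characters` calls, and it passes the callbacks already-decoded text, so `&amp;` in the file arrives as a bare `&` and is printed that way. Below is a minimal sketch, not the original program, of how the handler could be adapted: it accumulates character chunks in a buffer, flushes them when the next tag starts or ends, and re-escapes `&`, `<`, `>` and `"` through a helper. The class name `BufferedPrettySax`, the `escapeXml` helper and the `pretty(String)` method (which parses an in-memory string via `InputSource`/`StringReader` so the result can be inspected without a file) are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch (hypothetical class): pretty-printing SAX handler that buffers
// character chunks and re-escapes special characters before output.
public class BufferedPrettySax extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();  // indented result
    private final StringBuilder text = new StringBuilder(); // pending character data
    private String indent = "";                             // current indentation

    // Hypothetical helper: escape the characters the parser has already decoded.
    // '&' is replaced first so the entities produced afterwards are not escaped twice.
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Write the buffered text (trimmed) on its own line, then clear the buffer.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            out.append(indent).append(escapeXml(s)).append('\n');
        }
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        out.append(indent).append('<').append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            out.append(' ').append(attrs.getQName(i))
               .append("=\"").append(escapeXml(attrs.getValue(i))).append('"');
        }
        out.append(">\n");
        indent += "  "; // one level deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        indent = indent.substring(2); // one level back
        out.append(indent).append("</").append(qName).append(">\n");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so only accumulate here.
        text.append(ch, start, length);
    }

    // Parse an XML string (instead of a file) and return the indented text.
    public static String pretty(String xml) {
        BufferedPrettySax handler = new BufferedPrettySax();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse the XML input", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(pretty("<a><b x=\"1 &amp; 2\">Tom &amp; Jerry</b><c/></a>"));
    }
}
```

Running `java BufferedPrettySax` prints the sample from `main` with two spaces of indentation per level, and `Tom &amp; Jerry` stays escaped instead of turning into a bare `&`. Buffering until the next tag event, rather than printing inside `characters`, is what keeps a text run that the parser delivered in pieces on a single line.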
