Joe Williams
I have been doing some work with Erlang lately and wanted to figure out how to parse XML. After a bit of looking I found Erlsom, an XML parsing library for Erlang. It offers a few modes, including a SAX parser and a "simple sort of DOM parser". I have experience with Java, Xerces and JDOM, so this sounded good to me. My plan was to create an XML file containing music data (artist, album, song title, etc.) and experiment with some of the examples in the Erlsom documentation.

First, I had to install Erlsom. I downloaded the tarball, extracted it and ran the usual configure, make, make install. One issue I ran into was an error caused by carriage-return characters ("^M", left over from DOS line endings) in the config* files. Running dos2unix on them removed the stray characters, and the configure script then worked fine.

Next, I threw together my XML file using some old ID3 tag parsing code from a project of years ago. Judging by the parser output below, the file nests SongTitle elements (with SongDate and SongGenre attributes) inside AlbumTitle, inside ArtistName, inside a root Library element. Then I started the Erlang shell and loaded and parsed the file.
```
[zeusfaber@der-dieb ~]$ erl
Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.6.3 (abort with ^G)
1> {ok, Xml} = file:read_file("music-library.xml").
{ok,<<"\r\n Arms and SleepersLimited Edition EP"...>>}
2> erlsom:parse_sax(Xml, [], fun(Event, Acc) -> io:format("~p~n", [Event]), Acc end).
startDocument
{processingInstruction,"xml"," version=\"1.0\" encoding=\"UTF-8\""}
{startElement,[],"Library",[],[]}
{startElement,[],"ArtistName",[],[]}
{characters,"Arms and Sleepers"}
{startElement,[],"AlbumTitle",[],[]}
{characters,"Limited Edition EP"}
{startElement,[],"SongTitle",[],
              [{attribute,"SongDate",[],[],"Unknown"},
               {attribute,"SongGenre",[],[],"Unknown"}]}
{characters,"We're all in Paris Now (pt. 1)"}
--SNIP--
{startElement,[],"ArtistName",[],[]}
{characters,"Wolf Parade"}
{startElement,[],"AlbumTitle",[],[]}
{characters,"At Mount Zoomer"}
{startElement,[],"SongTitle",[],
              [{attribute,"SongDate",[],[],"2008"},
               {attribute,"SongGenre",[],[],"Unknown"}]}
{characters,"Kissing the Beehive"}
{endElement,[],"SongTitle",[]}
{endElement,[],"AlbumTitle",[]}
{endElement,[],"ArtistName",[]}
{endElement,[],"Library",[]}
endDocument
{ok,[],"\r\n"}
```
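Judging by the session above, erlsom:parse_sax/3 behaves like a fold over the SAX event stream: it calls the callback with each event and the current accumulator, and the value the callback returns becomes the accumulator for the next event. The printing callback returns Acc unchanged, which is why the final result is {ok,[],"\r\n"}; the third element appears to be the unparsed input left after the root element. The sketch below imitates that contract with lists:foldl/3, which passes arguments in the same (Event, Acc) order, so it runs without Erlsom installed. The module name sax_fold and its functions are hypothetical, and the event list is a hand-trimmed example modelled on the output above, not real parser output.

```erlang
-module(sax_fold).
-export([events/0, run/3, collect_events/1]).

%% Hand-written events modelled on the session above (one artist,
%% one album, one song). Not real parser output.
events() ->
    [startDocument,
     {startElement, [], "Library", [], []},
     {startElement, [], "ArtistName", [], []},
     {characters, "Wolf Parade"},
     {startElement, [], "AlbumTitle", [], []},
     {characters, "At Mount Zoomer"},
     {startElement, [], "SongTitle", [],
      [{attribute, "SongDate", [], [], "2008"},
       {attribute, "SongGenre", [], [], "Unknown"}]},
     {characters, "Kissing the Beehive"},
     {endElement, [], "SongTitle", []},
     {endElement, [], "AlbumTitle", []},
     {endElement, [], "ArtistName", []},
     {endElement, [], "Library", []},
     endDocument].

%% Calls Fun(Event, Acc) for each event in order and threads the
%% returned accumulator into the next call, like parse_sax/3.
%% Erlsom's third element is the unparsed trailing input; this
%% simulation has none, so it is always [].
run(Events, Acc0, Fun) when is_function(Fun, 2) ->
    {ok, lists:foldl(Fun, Acc0, Events), []}.

%% A callback that keeps every event instead of printing it:
%% prepend each event (cheap), then reverse once at the end.
collect_events(Events) ->
    {ok, Reversed, []} = run(Events, [], fun(Event, Acc) -> [Event | Acc] end),
    lists:reverse(Reversed).
```

With the real parser, the same collecting callback would be erlsom:parse_sax(Xml, [], fun(Event, Acc) -> [Event | Acc] end), followed by lists:reverse/1 on the second element of the result.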
Once the file is loaded into a binary, you can parse it again with different callbacks to compute things from the data. I did a few counts of artists and songs. The first counts how many times "Wolf Parade" appears as character data (i.e. as a {characters,"Wolf Parade"} event).
```
3> CountWolfParade = fun(Event, Acc) -> case Event of {characters, "Wolf Parade"} -> Acc + 1; _ -> Acc end end.
#Fun
4> erlsom:parse_sax(Xml, 0, CountWolfParade).
{ok,9,"\r\n"}
```
So there are nine entries for "Wolf Parade". Next, I ran the same kind of count for the other artist in the XML, Arms and Sleepers.
```
5> CountArmsAndSleepers = fun(Event, Acc) -> case Event of {characters, "Arms and Sleepers"} -> Acc + 1; _ -> Acc end end.
#Fun
6> erlsom:parse_sax(Xml, 0, CountArmsAndSleepers).
{ok,6,"\r\n"}
```
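The two counting funs differ only in the artist string, so one option is a small function that builds the callback from that string. The sketch below is a hypothetical helper module (sax_counters and count_characters/1 are names made up for illustration, not part of Erlsom). It uses a guard rather than putting the outer variable in the fun head, because variables in a fun head shadow outer ones: writing {characters, Text} there would match any characters event.

```erlang
-module(sax_counters).
-export([count_characters/1]).

%% Returns a two-argument callback for erlsom:parse_sax/3 (or
%% lists:foldl/3) that adds 1 for every {characters, Chars} event
%% whose text equals Text exactly; every other event is ignored.
count_characters(Text) ->
    %% The guard compares against the captured Text; a pattern
    %% {characters, Text} in the head would bind a new, shadowing Text.
    fun({characters, Chars}, Acc) when Chars =:= Text -> Acc + 1;
       (_Event, Acc) -> Acc
    end.
```

In the shell, erlsom:parse_sax(Xml, 0, sax_counters:count_characters("Arms and Sleepers")) should then give the same result as CountArmsAndSleepers, since the matching logic is identical.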
This time the count is six. In both cases the counts matched what was in the XML file. Next, instead of matching a characters event, I counted start events for one element name, "SongTitle", which should give the total number of songs.
```
7> CountTotalSongs = fun(Event, Acc) -> case Event of {startElement, _, "SongTitle", _, _} -> Acc + 1; _ -> Acc end end.
#Fun
8> erlsom:parse_sax(Xml, 0, CountTotalSongs).
{ok,15,"\r\n"}
```
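Don't forget that the pattern has to match the whole startElement event, which is a five-element tuple: {startElement, Uri, LocalName, Prefix, Attributes}. Here the namespace URI and prefix are empty lists ('[]'), but for SongTitle the attribute list is not empty (it holds SongDate and SongGenre), so a literal pattern such as {startElement,[],"SongTitle",[],[]} would never match. Writing the pattern as {startElement, _, "SongTitle", _, _} ignores those fields and matches every SongTitle element. The pattern matching in Erlang and Erlsom makes parsing pretty easy, although I have heard that parsing XML with Erlang alone is troublesome.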
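Binding the fifth element instead of ignoring it gives the callback access to the attributes. As a sketch, the hypothetical module below tallies SongTitle elements by their SongDate attribute, assuming the {attribute, LocalName, Prefix, Uri, Value} tuple shape seen in the output above. It uses maps, which need OTP 17 or later, so it would not run on the older emulator (version 5.6.3) shown in the session. The "missing" key for songs without a date is an illustrative choice.

```erlang
-module(sax_song_dates).
-export([songs_by_date/2]).

%% parse_sax/3-style callback that tallies SongTitle elements by
%% their SongDate attribute in a map (Date => Count).
%% The start event is {startElement, Uri, LocalName, Prefix, Attrs};
%% binding Attrs (rather than matching []) exposes the attribute tuples.
songs_by_date({startElement, _Uri, "SongTitle", _Prefix, Attrs}, Acc) ->
    %% Attribute tuples carry LocalName in position 2.
    Date = case lists:keyfind("SongDate", 2, Attrs) of
               {attribute, "SongDate", _, _, Value} -> Value;
               false -> "missing"   % hypothetical label for absent dates
           end,
    maps:put(Date, maps:get(Date, Acc, 0) + 1, Acc);
%% Every other event leaves the tally unchanged.
songs_by_date(_Event, Acc) ->
    Acc.
```

Called as erlsom:parse_sax(Xml, #{}, fun sax_song_dates:songs_by_date/2), it should return {ok, Map, Rest} with keys such as "2008" and "Unknown"; because each SongTitle start event adds exactly one, the values should sum to the same total that CountTotalSongs reports.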