
Here are posterous posts filed under xml...

hdknr says...

Apache XML Libraries Configuration Instructions for FIREFOX 3:

WINDOWS (x86):
- Download APACHE XERCES-C++ v2.8.0:
http://www.apache.org/dist/xerces/c/2/binaries/xerces-c_2_8_0-x86-windows-vc_8_0.zip
- Unzip the downloaded file

- Add the XERCES bin directory to your PATH environment variable
set PATH=XERCES-INSTALL-DIR\bin;%PATH%

Xerces is required.

Filed under: XML

Over the last few weeks I noticed that Google doesn’t index all my Posterous pages, even though I point my blog Prime Blogger to a .com domain name. My first idea was that a Google Sitemap was missing. I had tried before to add my Posterous site to my Google Webmaster Tools account, but this requires either a custom meta tag on your homepage or a unique file uploaded to your site to prove ownership. I suggested a Google Sitemap feature to Brett from Posterous, and he gave me the hint that I can change the HTML using a custom theme. And yes, this is the key to getting your site accepted in your Google Webmaster Tools account.

Submit the XML of your last 20 Posterous blog posts as a sitemap to your Google Webmaster Tools account. The article explains how to prove ownership of your Posterous site.

Filed under: xml

Alex says...

I started working with XStream as a tool to quickly generate XML and/or JSON from Scala objects. It looks very powerful, but unfortunately it needs some tweaking to produce simple and clean XML.

This code:



println(new XStream().toXML(List(1,2,3)))


produces this XML:

<scala.coloncolon serialization="custom">
  <unserializable-parents/>
  <scala.coloncolon>
    <int>1</int>
    <int>2</int>
    <int>3</int>
    <scala.ListSerializeEnd/>
  </scala.coloncolon>
</scala.coloncolon>

Instead I wanted something like this:



<list>
  <int>1</int>
  <int>2</int>
  <int>3</int>
</list>

Turns out this is possible, by writing your own custom converter:

Here's the code:



import com.thoughtworks.xstream.converters._
import com.thoughtworks.xstream.converters.collections._
import com.thoughtworks.xstream._
import com.thoughtworks.xstream.mapper._
import com.thoughtworks.xstream.io._

class ListConverter( _mapper : Mapper )  extends AbstractCollectionConverter(_mapper) {
  def canConvert( clazz: Class[_]) = {       
    // "::" is the name of the list class, also handle nil
    classOf[::[_]] == clazz || classOf[scala.Nil$] == clazz
  }
  
  def marshal( value: Any, writer: HierarchicalStreamWriter, context: MarshallingContext) = {
    val list = value.asInstanceOf[List[_]]
    for ( item <- list ) {      
      writeItem(item, context, writer)
    }
  }
  
  def unmarshal( reader: HierarchicalStreamReader, context: UnmarshallingContext ) = {
    var list : List[_] = Nil 
    while (reader.hasMoreChildren()) {
      reader.moveDown();
      val item = readItem(reader, context, list);
      list = list ::: List(item) // be sure to build the list in the same order
      reader.moveUp();
    }
    list
  }
}

object ListConverter {
  def configureXStream( stream: XStream ) = {
    stream.alias("list", classOf[::[_]])
    stream.alias("list", classOf[scala.Nil$])
    stream.registerConverter( new ListConverter(stream.getMapper) )        
  }
}
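
For readers who don't use Scala, the round trip the converter aims for can be sketched with Python's standard library. This is only an analogy under stated assumptions, not XStream: the `marshal` and `unmarshal` helpers are hypothetical names, and the sketch handles lists of integers only.

```python
import xml.etree.ElementTree as ET


def marshal(values):
    """Write each item as an <int> child of a single <list> element."""
    root = ET.Element("list")
    for value in values:
        ET.SubElement(root, "int").text = str(value)
    return ET.tostring(root, encoding="unicode")


def unmarshal(xml_text):
    """Read the children back in document order, so the list order survives."""
    return [int(child.text) for child in ET.fromstring(xml_text)]


xml_text = marshal([1, 2, 3])
round_trip = unmarshal(xml_text)
```

As in the converter above, the empty list needs no special element name; it simply becomes a `<list>` with no children.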


Filed under: xml

Joubert says...

When I first rolled out elev.at, the idea was to do real-time conversion of XLS and delimited files into XML to make them more readily accessible to code (explained here). However, there are also other kinds of information on the web that are easily digested by the human eye (and screen readers) but not easy to use in computation without additional work. This information may be generated from databases or other data sets, but what gets published is, more often than not, the human-readable form, which is not easy for software to use in computations.

A new extension to elev.at makes it possible to also transform HTML tables into XML. For example, let's say your app needs the Standard & Poor's Home Price Index for major US metropolitan areas, and these figures are published on a web site (e.g. on the UCLA Statistics Online Computation Resource - http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_091609_SnP_HomePriceIndex). You can grab this information using elev.at and have it returned in computable form, right from the web page on which it is published.

The query to do that is: http://elev.at/lift?table=http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_091609_SnP_HomePriceIndex&header=t

Elev.at's new /lift?table command is designed for tabular information, displayed on web pages in the TABLE element. 

See the API for documentation and examples.
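
As a rough sketch, a lift query like the one above could be assembled in Python as follows. The `build_lift_url` helper is a hypothetical name, and the sketch only reproduces the `table` and `header` parameters shown in the example query; it assumes the service accepts the table URL with its colons and slashes left unescaped, as in that query.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example query.
LIFT_ENDPOINT = "http://elev.at/lift"


def build_lift_url(table_url, header="t"):
    """Build a /lift?table=... query for a page that contains an HTML table."""
    params = {"table": table_url, "header": header}
    # safe=":/" keeps the nested URL readable, matching the example query.
    return LIFT_ENDPOINT + "?" + urlencode(params, safe=":/")


query = build_lift_url(
    "http://wiki.stat.ucla.edu/socr/index.php/"
    "SOCR_Data_Dinov_091609_SnP_HomePriceIndex"
)
```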

Filed under: XML

fotis says...

The PHP 4 DOMXML extension has undergone some serious transformation since PHP5 and is a lot easier to use. Unlike SimpleXML, DOM can, at times, be cumbersome and unwieldy. However, it is often a better choice than SimpleXML. Please join me and find out why.

Since SimpleXML and DOM objects are interoperable you can use the former for simplicity and the latter for power. How you can exchange data between the two extensions is explained at the bottom of the article.
The DOM extension is especially useful when you want to modify XML documents, as SimpleXML, for example, does not allow you to remove nodes from an XML document. For this article's code examples we will use the same foundation that we used in the Parsing XML with SimpleXML post.
We will use this very site's Google sitemap file, which can be downloaded here. The sitemap.xml file features an XML list of pages of php-coding-practices.com for easy indexing in Google.

Loading and Saving XML Documents

The DOM extension, just like SimpleXML, provides two ways to load xml documents - either by string or by filename:

$source = 'sitemap.xml';

$dom = new DomDocument();
$dom->load($source);

// load as string
$dom2 = new DomDocument();
$dom2->loadXML(file_get_contents($source));

In addition to that, the DomDocument object provides two functions to load html files. The advantage is that html files do not have to be well-formed to load. Here is an example:

$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();

The cool news is that malformed HTML is automatically converted into well-formed HTML. Look at this script:

$doc = new DOMDocument();
$doc->loadHTML("<html><body><p>Test<br></body></html>");
echo $doc->saveHTML();

The DomDocument::loadHTML() method will automatically add a DTD (Document Type Definition) and add the missing end-tag for the opened p-tag. Cool, isn't it?

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Test<br></p></body></html>

Saving XML data with the DOM library is just as easy. Just use DomDocument::saveHTML() and DomDocument::saveXML() with no parameters. They will automatically create HTML or XML documents from your XML contents and return them. DomDocument::saveHTMLFile() and DomDocument::save() save to HTML and XML files, respectively. They require a file path parameter as a string.

XPath Queries

One of the most powerful features of the DOM extension is the way in which it integrates with XPath queries. In fact, DomXpath is much more powerful than its SimpleXML equivalent:

$source = 'sitemap.xml';
$dom = new DomDocument();
$dom->load($source);

$xpath = new DomXPath($dom);
$xpath->registerNamespace('c', 'http://www.google.com/schemas/sitemap/0.84');
$result = $xpath->query("//c:loc/text()");
echo $result->length . "\n";
//echo $result->item(3)->data;
foreach ($result as $b) {
  echo $b->data . "\n";
}

Notice that the sitemap xml file contains a namespace already, which we register using DomXPath::registerNamespace():

<?xml version="1.0" encoding="UTF-8"?>

We really have to register that namespace with the DomXPath object or else it will not know where to search. ;) You can also register multiple namespaces, but more on that later. Notice that we use text() within the xpath query to get the actual text contents of the nodes.
If you want to learn the ins and outs of the xpath language, I recommend reading the W3C XPath Reference.
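
To see why the namespace registration matters, here is a small sketch of the same idea in Python's standard library rather than PHP. The two-entry sitemap and its example.com URLs are made up for illustration; only the namespace URI is the one registered above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap; only the namespace URI comes from the article.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url><loc>http://example.com/a/</loc></url>
  <url><loc>http://example.com/b/</loc></url>
</urlset>"""

# Prefix-to-URI mapping, the counterpart of registerNamespace('c', ...).
NS = {"c": "http://www.google.com/schemas/sitemap/0.84"}

root = ET.fromstring(SITEMAP)

# With the prefix, every <loc> in the sitemap namespace is found.
locs = [element.text for element in root.findall(".//c:loc", NS)]

# Without it, the query looks for <loc> in no namespace and finds nothing.
unprefixed = root.findall(".//loc")
```

The prefix itself is arbitrary; what has to match is the namespace URI declared on the root element.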

Modifying XML Documents

Adding New Nodes

To add new data to a loaded DOM document, we need to create new DomElement and DomText objects by using the DomDocument::createElement(), DomDocument::createElementNS() and DomDocument::createTextNode() methods.
In the following we will add a new url to our urlset:

$source = 'sitemap.xml';
$dom = new DomDocument();
$dom->load($source);

// url element
$url = $dom->createElement('url');


// location
$loc = $dom->createElement('loc');
$text = $dom->createTextNode('http://php-coding-practices.com/article/');
$loc->appendChild($text);

// last modification
$lastmod= $dom->createElement('lastmod');
$text = $dom->createTextNode('2007-04-20T10:24:32+00:00');
$lastmod->appendChild($text);

// change frequency
$changefreq= $dom->createElement('changefreq');
$text = $dom->createTextNode('weekly');
$changefreq->appendChild($text);

// priority
$priority= $dom->createElement('priority');
$text = $dom->createTextNode('0.3');
$priority->appendChild($text);

// add the elements to the url
$url->appendChild($loc);
$url->appendChild($lastmod);
$url->appendChild($changefreq);
$url->appendChild($priority);

// add the new url to the root element (urlset)
$dom->documentElement->appendChild($url);

echo $dom->saveHtml();

The code is pretty self-explanatory. First we create a new url element as well as some sub-elements. Then we append those sub-elements to the url element, which we in turn append to the document's root element. Note that the root element can be accessed via the $dom->documentElement property. The output:
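
The create-then-append pattern is not specific to PHP. As a hedged comparison, the sketch below does the same with Python's ElementTree; the `add_url` helper is a hypothetical name and the empty urlset stands in for the real sitemap.xml.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the article's XPath example.
NS_URI = "http://www.google.com/schemas/sitemap/0.84"


def add_url(urlset, loc, lastmod, changefreq, priority):
    """Append one <url> entry with its four child elements to the root."""
    url = ET.SubElement(urlset, f"{{{NS_URI}}}url")
    fields = (("loc", loc), ("lastmod", lastmod),
              ("changefreq", changefreq), ("priority", priority))
    for tag, text in fields:
        ET.SubElement(url, f"{{{NS_URI}}}{tag}").text = text
    return url


# Hypothetical empty sitemap standing in for sitemap.xml.
root = ET.fromstring(f'<urlset xmlns="{NS_URI}"/>')
new_url = add_url(root, "http://php-coding-practices.com/article/",
                  "2007-04-20T10:24:32+00:00", "weekly", "0.3")
child_tags = [child.tag.split("}")[1] for child in new_url]
```

One design difference: the sketch puts the new elements into the sitemap namespace explicitly, because ElementTree only matches namespaced queries against elements that carry the namespace in their tag.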

....
  <url>
    <loc>http://php-coding-practices.com/2007/04/</loc>
    <lastmod>2007-04-30T16:54:58+00:00</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://php-coding-practices.com/2007/03/</loc>
    <lastmod>2007-03-29T20:04:51+00:00</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.5</priority>
  </url>

Now, that was certainly not as easy as it would have been had we used SimpleXML. In exchange, the DOM extension provides many more methods and more power. For example, you can associate a namespace with an element on creation using DomDocument::createElementNS(). I will provide some example code on that later in the article.

Adding Attributes To Nodes

Via DomElement::setAttribute() we can easily add an attribute to an element node. Example:

$url = $dom->createElement('url');
...
$url->setAttribute('meta:level','3');

Here we set a fictitious meta:level attribute with the value 3 on our url element from above.

Moving Data

Moving data is not as obvious as you might expect, as the DOM extension does not provide a real method that takes care of that, explicitly. Instead we will have to use
a combination of

Great post explaining how we can manipulate XML documents

Filed under: XML

Joubert says...

Elev.at now supports JSON/P callbacks so you no longer need to proxy XML results from Elev.at via a server to your web app - you can now load the XML directly in the browser.

The JSON/P callback is registered by adding the callback parameter to your lift request.

Here's an example output that lists studios in New York based on the Excel spreadsheet published by the NYC Data Mine.

The above output is dynamically generated by calling Elev.at to convert the studio dataset into XML and returning it to the JSON/P callback. We use jQuery to parse the XML and place it into our HTML.

Notice that Elev.at returns the JSON data as a single key/value pair of { elevated_xml: "blah blah blah" }
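
To make the shape of that response concrete, here is a small sketch of unwrapping such a payload, written in Python rather than the jQuery used on the page. The callback name `handleStudios` and the XML inside the payload are invented for illustration; only the `elevated_xml` key comes from the description above.

```python
import json
import re
import xml.etree.ElementTree as ET


def unwrap_jsonp(body):
    """Strip the callback wrapper and return the decoded JSON object."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", body, re.DOTALL)
    if match is None:
        raise ValueError("not a JSON/P response")
    return json.loads(match.group(1))


# Hypothetical response body; the real XML layout is not shown in the post.
sample = ('handleStudios({"elevated_xml": '
          '"<rows><row><name>Studio A</name></row></rows>"});')

payload = unwrap_jsonp(sample)
# The XML arrives as one string value, so it still has to be parsed.
root = ET.fromstring(payload["elevated_xml"])
names = [element.text for element in root.iter("name")]
```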

Filed under: XML

Wilhelm says...

Before I moved to Posterous from Symphony, I had to think seriously about how I was going to transfer my old content into my new account. I didn't want to manually transfer my articles over, and Posterous doesn't offer any kind of import feature for my system. Besides, either way, I was going to lose all of my visitors' comments. I actually considered trashing all my old content and starting fresh just so I wouldn't have to deal with the issue.

So, obviously, I decided to go ahead and find an easy way to move my stuff to a new home. By using a combination of PHP, cURL and RSS I was able to easily accomplish this within minutes. Since I'm such a nice guy, I'll go ahead and show you how I did it.

After digging around this service, I found that Posterous provides a very simple REST-based API which, among other things, allows you to post new content to one of your sites.

I then realized I could use my current site's RSS feed as the source of my content. I went ahead and changed the settings to this feed to display every article I ever wrote on the site.

Next, I cracked my knuckles and got to work writing the following script:

<?php

set_time_limit(0);

define('IMPORT_RSS', 'http://www.mysite.com/'); // A direct link to the RSS feed you wish to import posts from ...
define('IMPORT_SITE_ID', 69); // The id of one of your Posterous sites ...
define('IMPORT_SITE_EMAIL', 'my@email.com'); // The email address assigned to your Posterous account ...
define('IMPORT_SITE_PASSWORD', 'password'); // The password assigned to your Posterous account ...

$RSS = new SimpleXMLElement(file_get_contents(IMPORT_RSS));

foreach ($RSS->channel->item as $Entry)
{
    // Cast to string: SimpleXML returns element objects, not plain strings
    $values = array
    (
        'site_id' => IMPORT_SITE_ID,
        'title' => (string) $Entry->title,
        'body' => (string) $Entry->description,
        'date' => (string) $Entry->pubDate,
    );

    // No 'http://www.' prefix on the endpoint (see the note below)
    $curl = curl_init('posterous.com/api/newpost');

    curl_setopt($curl, CURLOPT_USERPWD, IMPORT_SITE_EMAIL . ':' . IMPORT_SITE_PASSWORD);
    curl_setopt($curl, CURLOPT_POST, true);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $values);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

    curl_exec($curl);

    curl_close($curl);
}

Before you run this script, be sure you have PHP5 compiled with cURL support. Simply change the above constants to their appropriate values, make sure your RSS feed is valid, upload the script and execute it in your browser. If your values are correct, you will start noticing your articles appearing in the proper order in your Posterous account.
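
If you want to check what the feed will produce before posting anything, the mapping from RSS items to post fields can be tried offline. The sketch below does that in Python instead of PHP; the sample feed and the `rss_to_posts` helper are made up for illustration, and nothing is sent to Posterous.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed standing in for your site's real RSS.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>My site</title>
<item><title>First post</title><description>Hello</description>
<pubDate>Mon, 01 Jun 2009 10:00:00 +0000</pubDate></item>
<item><title>Second post</title><description>World</description>
<pubDate>Tue, 02 Jun 2009 10:00:00 +0000</pubDate></item>
</channel></rss>"""


def rss_to_posts(rss_text, site_id):
    """Turn each <item> into the field set the import script posts."""
    channel = ET.fromstring(rss_text).find("channel")
    posts = []
    for item in channel.findall("item"):
        posts.append({
            "site_id": site_id,
            "title": item.findtext("title", default=""),
            "body": item.findtext("description", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return posts


posts = rss_to_posts(SAMPLE_RSS, site_id=69)
```

Printing `posts` for your own feed is a cheap way to confirm the titles, bodies and dates look right before running the real import.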

Sweet!

Clearly, there are fancier ways to do this, but there's no need to create overly elaborate code I'm only going to be running once.

NOTE:

One thing you should realize when using the API is that the REST endpoints should NOT include any part of the 'http://www.' prefix. Doing so will cause authentication errors even if you're using the correct credentials. This stalled me for a good hour or so and with the limited documentation, I was left to resolve the issue with no help. Hopefully, if you're having the same problem, this will help you out. If not, contact the Posterous support peeps.

Till next time!

Filed under: xml

sigizmund says...

Today I finally hit the task I had been dreading for so long: processing large XML files on Hadoop. I won’t tell you how long I crawled the Internet trying to find a working solution… not that anyone wants to know. Eventually, I came up with a solution of my own. Even though I hate re-inventing the wheel, in this particular case all the wheels I found were either square or utterly incompatible with my model of car.

To keep things simple, I won’t include the full source code. I won’t even include the whole InputFormat class. So, to make yourself comfortable, please do the following:

  1. Open LineRecordReader from org.apache.hadoop.mapreduce.lib.input so you can see it
  2. Open TextInputFormat from the same package.
  3. Create the input format and record reader of your own, just by copying and pasting the code from aforementioned classes.
  4. Change the createRecordReader() method of your input format class so it returns your newly defined record reader.

Now, we’re almost there. Now I’ll include the piece of code for nextKeyValue() which turned out to be the most critical method here. Hold on tight:

public boolean nextKeyValue() throws IOException
{
    // Accumulates the lines of the current <document ...>...</document> record
    StringBuilder sb = new StringBuilder();
    if (key == null)
    {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null)
    {
        value = new Text();
    }
    int newSize = 0;

    boolean xmlRecordStarted = false;
    Text tmpLine = new Text();

    while (pos < end)
    {
        newSize = in.readLine(tmpLine,
                              maxLineLength,
                              Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
                                       maxLineLength));

        if (newSize == 0)
        {
            break; // end of input
        }

        // Advance before any early exit below, so pos stays in sync with
        // the stream even when we break on the closing tag
        pos += newSize;

        if (tmpLine.toString().contains("<document "))
        {
            xmlRecordStarted = true;
        }

        if (xmlRecordStarted)
        {
            sb.append(tmpLine.toString().replaceAll("\n", " "));
        }

        if (tmpLine.toString().contains("</document>"))
        {
            // Closing tag reached: emit everything collected as one record
            xmlRecordStarted = false;
            this.value.set(sb.toString());
            break;
        }
    }

    if (newSize == 0)
    {
        key = null;
        value = null;
        return false;
    }
    else
    {
        return true;
    }
}

WTF — you will say? It’s the same code? Well — yes, and no. It’s almost the same. Take a look at this line:

if (tmpLine.toString().contains("<document "))

and this line:

if (tmpLine.toString().contains("</document>")) 

This is where we actually split the document into chunks. The code is pretty much self-explanatory, so I won’t add anything else.

Now, it’s not the cleanest or most streamlined solution, and I will probably spend a while tomorrow making it more production-ready and good-looking, but compared to other solutions, it has a few major benefits:

  1. It uses very little custom code (you remember, we copied and pasted all the classes?). Unfortunately you cannot just inherit the class — some fields are private, and we clearly want to modify them.
  2. It’s configurable — you can easily change the <document and </document> strings to anything else (and again, I will do it tomorrow, but now I feel too lazy).
  3. It works.

There are a few limitations to this approach. One of them is that if the document contains something like </document><document> on a single line, it obviously won’t work. Another is that you still need to parse elements in your mapper (although you can easily change that by parsing records in your record reader into a Writable-compatible class).

Have fun!

Update: As you can see, I have added a space in the "<document " string constant. Today I realised that "<documenttype" elements had been successfully used for splits, hence producing inconsistent results.
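
The splitting rule, and the reason the trailing space matters, can be tried outside Hadoop. The sketch below is a simplified Python model of the line-based logic, not the record reader itself: `split_records` is a hypothetical helper, it works on an in-memory list of lines, and it joins lines with a space.

```python
def split_records(lines, start_tag="<document ", end_tag="</document>"):
    """Group lines into records running from a start tag to an end tag."""
    records, buffer, started = [], [], False
    for line in lines:
        if start_tag in line:
            started = True
        if started:
            buffer.append(line)
        if end_tag in line and started:
            records.append(" ".join(buffer))
            buffer, started = [], False
    return records


# Made-up input with a <documenttype> element ahead of the real records.
lines = [
    "<corpus>",
    '<documenttype name="report"/>',
    '<document id="1">',
    "<title>A</title>",
    "</document>",
    '<document id="2">',
    "<title>B</title>",
    "</document>",
    "</corpus>",
]

records = split_records(lines)
# Without the trailing space, <documenttype> also opens a record.
loose = split_records(lines, start_tag="<document")
```

With the default start tag, both records begin at their own `<document ...>` line; with the looser `"<document"` the first record swallows the `<documenttype>` line as well.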

Filed under: xml

Designer's Portfolio
Designer’s portfolio features a unique fullscreen xml website template with great motion, a clean design and deeplinking support.

Filed under: xml

SOLO - Music Website Template - with deeplinking
Featuring deep-linking, url error management, playlist maker and integrated twitter reader. Designed with a musician in mind but equally suitable for any artist.

Filed under: xml