Friday, April 20, 2012

Why does Nokogiri's to_xhtml create new `id` attributes from `name`?

Consider the following code:



require 'nokogiri' # v1.5.2
doc = Nokogiri.XML('<body><a name="foo">ick</a></body>')

puts doc.to_html
#=> <body><a name="foo">ick</a></body>

puts doc.to_xml
#=> <?xml version="1.0"?>
#=> <body>
#=> <a name="foo">ick</a>
#=> </body>

puts doc.to_xhtml
#=> <body>
#=> <a name="foo" id="foo">ick</a>
#=> </body>


Notice that to_xhtml has added an id attribute, copied from the name attribute, that is not present in the source.




  1. Who is responsible for this, Nokogiri or libxml2?

  2. Why does this occur? (Is this enforcing a standard?)

    The closest I can find is this spec, which describes how an element may carry both an id and a name attribute with the same value.

  3. Is there any way to avoid this while still using the to_xhtml method on input that may contain <a name="foo">?



This matters to me because the input I am parsing has an id attribute on one element and, on a separate element, a name attribute that happens to have the same value. The generated id therefore duplicates an existing one.




