13 July 2007

Progress: Authoring SVG websites with Inkscape

About a month ago I wrote about this experiment: create a website in SVG using Inkscape exclusively, put it on the web, link to it, and wait and see if and how search engines (and Google primarily) will index it.
I could have presented some results earlier, but one stupid mistake of mine caused the delay.

My stupid mistake

To serve a site completely as SVG you have to set an SVG file (index.svg in my case) as the DirectoryIndex in the Apache config. Without access to httpd.conf, I used .htaccess for the job, which is just fine.
Not so fine is my stupidity: a few days after the site went online I needed a .htaccess file in another subdomain of my site, so I used the one from the SVG subdomain as a template. But while managing files by drag and drop in Nautilus over SSH, I moved the file instead of copying it, and the directory was exposed without an index file for a few days. That was just long enough for Googlebot, which was already all over the site thanks to tons of links, to index the directory contents.

Conclusion

I reached the conclusion after just one week, but waited a full month while trying to repair the mistake described above. The conclusion is: no major search engine will index a website made entirely with SVG. They will not follow links inside SVG and will not index the text.

My logs show a very large number of visits from spiders: Googlebot, Yahoo Slurp, MSN Bot, even the Baidu bot. But all they do is request the website root ("/") and maybe robots.txt, so the links are not followed. (My robots.txt is empty on purpose: the goal of the experiment was to see what search engines do on their own.)
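
A tally like this can be sketched in Python. This is only an illustration under assumptions: it expects Apache's combined log format, the User-Agent substrings in BOTS are assumed spellings, and the sample lines (addresses, timestamps, sizes and the /about.svg path) are made up, not taken from the real logs.

```python
import re
from collections import Counter

# Substrings that identify the crawlers named above in a User-Agent header
# (assumed spellings; check them against your own logs).
BOTS = {
    "Googlebot": "Googlebot",
    "Yahoo Slurp": "Slurp",
    "MSN Bot": "msnbot",
    "Baidu": "Baiduspider",
}

# Apache "combined" format:
# ... "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_bot_requests(lines):
    """Count requests per (bot, path) for the known crawlers."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a combined-format request line
        agent = match.group("agent").lower()
        for bot, marker in BOTS.items():
            if marker.lower() in agent:
                counts[(bot, match.group("path"))] += 1
                break
    return counts


# Made-up sample lines, only to show the shape of the input.
SAMPLE_LOG = [
    '192.0.2.1 - - [19/Jun/2007:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [19/Jun/2007:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [20/Jun/2007:11:00:00 +0000] "GET / HTTP/1.0" 200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [20/Jun/2007:12:00:00 +0000] "GET /about.svg HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux) Firefox/2.0"',
]

if __name__ == "__main__":
    for (bot, path), hits in sorted(tally_bot_requests(SAMPLE_LOG).items()):
        print(f"{bot:12} {path:15} {hits}")
```

If the only paths that ever show up for a crawler are "/" and "/robots.txt", the crawler is not following the links inside the SVG pages.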

I put some unique strings inside the SVG pages so I could query the search engines for them later. Of course the queries return nothing: my pages are not indexed, so full-text search can't be performed.
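
Before blaming the search engine, it is worth confirming that a marker string is present as real text in the SVG, because text that was converted to paths has no characters left to index. The following is a minimal sketch using Python's standard library; the sample document and the marker "marker-zq7x" are invented for illustration and are not the strings used on the site.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Tiny stand-in for a page saved from Inkscape; the marker string is invented.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <text x="10" y="20"><tspan>Welcome to my SVG site</tspan></text>
  <text x="10" y="40"><tspan>marker-zq7x</tspan><tspan>-inkscape</tspan></text>
  <path d="M0 0 L10 10"/>
</svg>"""


def svg_text_strings(svg_source):
    """Return the text of every <text> element, with its tspans joined."""
    root = ET.fromstring(svg_source)
    return ["".join(el.itertext()).strip() for el in root.iter(SVG_NS + "text")]


def contains_marker(svg_source, marker):
    """True if the marker appears in the text content of any <text> element."""
    return any(marker in s for s in svg_text_strings(svg_source))


if __name__ == "__main__":
    for line in svg_text_strings(SAMPLE_SVG):
        print(line)
    print(contains_marker(SAMPLE_SVG, "marker-zq7x"))
```

A marker found this way is plain character data that any XML-aware crawler could read, so a missing search result points at the indexer rather than at the file.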

Google Webmaster Tools says "Googlebot last successfully accessed your home page on Jun 19, 2007", a day when I had no .htaccess file and index.svg was not served as the DirectoryIndex.

Thanks

My little project drew a lot of interest from my readers: I got a lot of links to my experiment and, as a consequence, a lot of visits from various bots. Thank you all!
But no thanks to the search engines, which are not able to index pages made with SVG, a W3C standard. Shame on you!

13 comments:

  1. It's also unfortunate that Firefox doesn't let you select the text for copying.

  2. Yes, there are a lot of imperfections when viewing an SVG website with Firefox, but for me the lack of indexing is the biggest killer: if your site is not in Google, it doesn't exist...

  3. Reading your blog and articles was absolutely interesting for me.

    You've done very good work.

    Giving all of us samples of your great knowledge about Inkscape.
    Thx thx thx for your helpful tutorials and SVGs...

    For me, as an SVG newbie trying to get HTML pages done well with Inkscape, this is top information.

    greetings from austria...

    zyko - Helmut

  4. zyko, you can read more Inkscape tutorials here: inkscapetutorials.wordpress.com. It is an aggregation of tutorials written by a lot of people.

  5. Planning another attempt in the future? I wonder if anyone at Google noticed your experiment...

  6. Nice experiment. Leave your site up and I'll put a link on my blog.

    Did you contact anyone at Google?

  7. rmgraham, indeed, that was one of my intentions: maybe someone at Google will see this.

    Jeff Schiller, this is what I plan to do: leave it up. It costs me no resources, and I will see in the logs if it ever gets indexed.
    And no, I have not contacted anyone at Google. Do they have a proper way to submit feature requests? I won't go and rant on random discussion groups.

  8. It's worth noting that not a whole lot of people are doing web sites entirely in SVG and most of the content of an SVG is not worth indexing as searchable text.

    So why should Google or any search engine dedicate resources to indexing the contents of an SVG file?

    Nice experiment and all. I did one on whether Google indexed text content in JavaScript that got me Slashdotted. But considering all the AJAX created content, that was valuable SEO knowledge.

    This experiment is basically a lark. Until more browsers support SVG and render it in a relatively consistent way across the platforms, an Inkscape generated site in SVG is an oddity and your "shame on you" to Google is basically laughable.

  9. Doesn't Work At McDonalds wrote:
    > It's worth noting that not a whole lot
    > of people are doing web sites entirely
    > in SVG and most of the content of an
    > SVG is not worth indexing as searchable
    > text.

    Exactly the same can be said about a lot of websites made with HTML, Flash and other technologies. But SVG is a W3C standard.

    > So why should Google or any search
    > engine dedicate resources to indexing
    > the contents of an SVG file?

    Google will index files in PDF, DOC, PPT and other formats and is able to do full-text searches in them. Similar support for SVG, which is a W3C standard, is (I think) a reasonable expectation.

    > Until more browsers support SVG and
    > render it in a relatively consistent
    > way across the platforms, an Inkscape
    > generated site in SVG is an oddity
    > and your "shame on you" to Google
    > is basically laughable.

    This is a chicken-and-egg problem: Google does not index the sites because there are so few using the technology, and new sites are not made because Google will not index them.

    About browser support: I live in Europe, and a recent study shows Firefox usage near 30% here. Combined with Opera and WebKit (Safari), that is close to one third of the browser market. Is this not enough?

    Laugh as you want at my "shame on you Google"; I reserve the right to laugh at the expense of the Google fanboys (I am myself a happy Google user).

  10. Doesn't your non-use of the robots.txt file mean that your site wasn't following standards either?

  11. Nope, the use of robots.txt is completely optional. The default is to not use this feature, and it is not part of any standard.
    You start using robots.txt when you want the search engines to work in unusual ways (like not indexing certain pages); not having it means "index everything".

  12. That's a really interesting experiment. The result is quite surprising tho, since Google indexes other files (like JNLP) as if they were HTML files.

  13. Google does *not* index JNLP files.
