SIOC profile for 'sioc-project.org'
A SIOC profile describes the structure and contents of a community site (e.g., weblog) in a machine-processable form. For more information, refer to the SIOC project page: http://rdfs.org/sioc

SIOC - Incremental crawling

In a recent blog post [1] I described a SIMILE Timeline based on SIOC data.

[1] http://captsolo.net/info/blog_a.php/2006/07/14/sioc_sparql_and_timeline

The post contains more information about the timeline (e.g., the scripts used) and about the problems encountered. One of the problems: once crawled, SIOC data get old quickly. An obvious solution is incremental crawling, i.e. downloading only the new data.

Incremental crawling is now available in our SIOC / RDF crawler [2].

Other features:
- can limit crawling to the same domain (default: on)
- can exclude comments / replies (default: off)

How it works:
- run the crawler (./run); its results are saved to 'result.rdf'
- for incremental crawling, copy the result file 'result.rdf' to 'input.rdf'
- run the crawler again; only new posts should be crawled

(Incremental crawling is on by default, but only takes effect if 'input.rdf' is present.)

Please try it out. :)

If you want to know more about how it works and what its limitations are, please write to me or look at the code. Bugs can be recorded at: http://esw.w3.org/topic/SIOC/ToDoList#crawler

[2] http://sw.deri.org/svn/sw/2005/08/sioc/crawler/releases/crawler_v0.7.tar.gz (requires Python and Redland)

Uldis
[ http://captsolo.net/info/ ]

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
To unsubscribe from this group, send email to sioc-dev-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sioc-dev
-~----------~----~----~----~------~----~------~--~---

2006-07-25T19:02:31+01:00
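The incremental crawling idea from this post (reuse the results of a previous run, fetch only new posts, optionally stay on one domain) can be illustrated with a minimal Python sketch. This is not the code from crawler_v0.7. The names `crawl`, `fetch` and `previous_posts` are hypothetical. The RDF side (reading 'input.rdf' with Redland, following links found in the SIOC data) is abstracted into an assumed `fetch(url)` callable that returns whether the page is a post and which URLs it links to. In this sketch, posts already known from the previous run are skipped, while index pages are always re-fetched so that new posts can still be discovered.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, fetch, previous_posts=frozenset(), same_domain=True):
    """Breadth-first crawl that skips posts already crawled in an earlier run.

    Hypothetical sketch: fetch(url) is assumed to return (is_post, linked_urls).
    previous_posts stands in for the post URIs found in 'input.rdf'.
    Returns the list of newly crawled post URLs, in crawl order.
    """
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = {start_url}
    new_posts = []
    while queue:
        url = queue.popleft()
        if url in previous_posts:
            continue  # already crawled in the previous run: do not fetch again
        is_post, links = fetch(url)
        if is_post:
            new_posts.append(url)
        for link in links:
            if same_domain and urlparse(link).netloc != start_host:
                continue  # limit crawling to the starting domain (default: on)
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return new_posts


if __name__ == "__main__":
    # Hypothetical site: an index page linking to posts, plus one off-site link.
    site = {
        "http://example.org/blog": (False, ["http://example.org/p1",
                                            "http://example.org/p2",
                                            "http://other.org/x"]),
        "http://example.org/p1": (True, []),
        "http://example.org/p2": (True, []),
    }
    first = crawl("http://example.org/blog", site.__getitem__)
    print(first)  # ['http://example.org/p1', 'http://example.org/p2']

    # A new post appears; the second run reuses the first run's results.
    site["http://example.org/blog"][1].append("http://example.org/p3")
    site["http://example.org/p3"] = (True, [])
    second = crawl("http://example.org/blog", site.__getitem__,
                   previous_posts=set(first))
    print(second)  # ['http://example.org/p3']
```

The sketch skips only known posts rather than every URL seen before. That choice matters because the index pages that list posts change between runs, and skipping them would hide the new posts.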