Aperture may be interesting to us.
"Aperture is a Java framework for extracting and querying full-text
content and metadata from various information systems (e.g. file
systems, web sites, mail boxes) and the file formats (e.g. documents,
images) occurring in these systems."
http://aperture.sourceforge.net/
Mentioned on:
- http://www.semikolon.co.uk/blog/index.php?entry=entry060404-233241
Uldis
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
Hi All,
Updated SIOC RDF Crawler.
Added:
* reports User-Agent: when crawling
* can limit depth of crawling by the number of "jumps" from host to host
Found it more useful (and interesting) to limit the number of host
"jumps" rather than just the number of page "jumps". Will add
limiting of page "jumps" later if necessary.
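To illustrate the idea of limiting host "jumps", here is a minimal sketch in Python. It is not the crawler's actual code: the function `crawl`, the `get_links` callback and the `max_host_jumps` parameter are hypothetical names, and link extraction is left to the caller so the example needs no network access.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_host_jumps):
    """Follow links from start_url, counting a "jump" each time the host changes.

    get_links is a caller-supplied function (an assumption of this sketch):
    it takes a URL and returns the URLs linked from it.
    Returns a dict mapping each reached URL to the fewest host jumps needed.
    """
    best = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            # Moving to a different host costs one jump; same host costs none.
            changed_host = urlparse(link).netloc != urlparse(url).netloc
            jumps = best[url] + (1 if changed_host else 0)
            if jumps > max_host_jumps:
                continue  # too far from the starting host, do not follow
            # Visit (or revisit) only if this path uses fewer jumps than before.
            # A real crawler would cache fetched pages instead of re-reading them.
            if link not in best or jumps < best[link]:
                best[link] = jumps
                queue.append(link)
    return best
```

With `max_host_jumps=0` the crawl stays on the starting host however many pages it visits there, which is the difference from limiting page "jumps".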
SIOC RDF Crawler 0.8:
http://sw.deri.org/svn/sw/2005/08/sioc/crawler/releases/crawler_v0.8.tar.gz
Uldis
[ http://captsolo.net/info/ ]
Hi,
Can anyone here help with an RDF validator problem I am encountering?
See: http://tinyurl.com/prxyo
"An attempt to load the RDF from URI
'http://sparql.captsolo.net/b2evo-0.9.2/xmlsrv/sioc.php?sioc_type=user&sioc_id=2&blog=1'
failed. (Undecodable data when reading URI at byte 0 using encoding
'UTF-8'. Please check encoding and encoding declaration of your
document.)"
Note that "<?xml" is located at byte 0 and it can't be undecodable UTF-8.
Also, I am not so sure now, but when I had similar errors before, the same
page sometimes validated and sometimes did not. And while RDF Validator
In a recent blog post [1] I described a SIMILE Timeline based on SIOC
data.
[1]
http://captsolo.net/info/blog_a.php/2006/07/14/sioc_sparql_and_timeline
The post contains more information about the timeline (e.g., scripts
used) and on the problems encountered. One of the problems: once crawled,
SIOC data gets old quickly. An obvious solution is incremental crawling:
download only the new data.
Now incremental crawling is available in our SIOC / RDF crawler [2].
Other features:
- can limit to the same domain (default:on)
- can exclude comments / replies (default:off)
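As a rough illustration of "download only the new data", here is a minimal Python sketch. It is an assumption about one simple way to do it, not a description of the crawler itself: the function `incremental_crawl`, the `fetch` callback and the `seen` set are hypothetical names.

```python
def incremental_crawl(urls, fetch, seen):
    """Fetch only the URLs that were not downloaded in an earlier run.

    urls  -- URLs discovered in this run
    fetch -- caller-supplied function (hypothetical): url -> downloaded data
    seen  -- set of URLs already downloaded; updated in place
    Returns a dict of the newly downloaded data, keyed by URL.
    """
    new_data = {}
    for url in urls:
        if url in seen:
            continue  # already have this one, skip the download
        new_data[url] = fetch(url)
        seen.add(url)
    return new_data
```

In practice the `seen` set would have to be stored between runs, for example in a file.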
How it works:
Switched the mode of our mailing list so that only members can post.
This should decrease the amount of spam we are getting.
Uldis
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
To unsubscribe from this group, send email to sioc-dev-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sioc-dev
-~----------~----~----~----~------~----~------~--~---
FYI
---------- Forwarded message ----------
From: Allen Gunn
Date: Jul 20, 2006 7:28 PM
Subject: Re: [doap-interest] DOAP and lists of [favorite] applications?
To: Uldis Bojars <captsolo@gmail.com>, doap-interest@lists.gnomehack.com
There's an article on SlashDot about the Semantic Web.
Link: http://slashdot.org/article.pl?sid=06/07/19/038237
On 7/19/06, John Breslin <john.breslin@deri.org> wrote:
> > 2) sioc:Comment
>
> This would need to be moved to a types module, to keep the core SIOC
> ontology simple. We can then add other subtypes of Post later like
> Sticky, Announcement, etc.
Attaching here a nice drawing by [GNU] showing proposed relations between SIOC, SKOS and IBIS.
The IBIS part formalizes what was mentioned earlier about SIOC and argumentative discussions.
Not sure if [GNU] has a particular vocabulary in mind, but there is one made by Danny Ayers [1].
http://anjo.blogs.com/metis/2005/11/a_model_for_web.html
--
Dr. John Breslin
DERI, NUI Galway
http://sw.deri.org/~jbreslin/
john.breslin@deri.org
To summarise the proposed ontology additions / changes:
= Decided:
1) Add sioc:Community
+ properties sioc:has_part and sioc:part_of pointing to and from it.
We are quite sure that a Community concept is needed.
There can still be some discussion about the property names and alternatives.
= Almost there:
2) sioc:Comment
Initially we thought "let's make all posts generic - comments are the same
as posts", but there's a need to distinguish between comments and
posts.
Proposal: create a [ sioc:Comment rdfs:subClassOf sioc:Post . ]
A comment is also a post, but a specific type of post. That should solve the problem.
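To show what the subclass proposal would give us, here is a small sketch in plain Python (no RDF library). The resources `ex:post1` and `ex:comment1` and the helper functions are made up for illustration; only the `sioc:Comment rdfs:subClassOf sioc:Post` triple comes from the proposal.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Triples as (subject, predicate, object); ex:* resources are hypothetical.
triples = {
    ("sioc:Comment", SUBCLASS_OF, "sioc:Post"),
    ("ex:post1", RDF_TYPE, "sioc:Post"),
    ("ex:comment1", RDF_TYPE, "sioc:Comment"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples):
    """Return resources typed as cls directly or through a subclass."""
    return {
        s
        for s, p, o in triples
        if p == RDF_TYPE and cls in superclasses(o, triples)
    }
```

A query for `sioc:Post` then returns both resources, while a query for `sioc:Comment` returns only `ex:comment1`, so generic tools still see comments as posts and specific tools can tell them apart.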