Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hi guys,

Recently I talked with Alex to ask him if he could be interested in making his
SIOC Browser interacting with Talk Digger. In fact, the next version of Talk
Digger, currently in alpha testing, will generate a SIOC document for each
conversation page.

Currently, Alex's SIOC explorer crawls and indexes some blogs that have the
WordPress (b2evolution too?) SIOC exporter plugin.

What I suggested to him was to let his browser crawl Talk Digger's SIOC data
as well.

A couple of days ago, Uldis had his talk accepted at the BlogTalk Reloaded
conference. Their job (because he is not alone in the boat) is to merge all
the current SIOC browsers into a single one for the presentation (at least,
that is what I understood).

As people know, the idea behind the SIOC ontology is to link online communities
together. So the idea is to present information, in the same environment (a
SIOC browser in the present case), from many different sources/types of
communities.

So what I would suggest is to start thinking about a way to let Talk Digger
ping the SIOC browser when new SIOC data is available from Talk Digger for the
browser.

For example: a new conversation is tracked by Talk Digger. Once it is tracked,
Talk Digger would ping the SIOC browser to let it know that new SIOC data is
ready to be crawled at URL "X". Then the SIOC crawler would crawl the URL,
index the results, and present them in the browser, along with the other
sources of information (blogs, forums, IRC chat logs, etc.).

At first, I could create a list of links where data could be crawled; later,
we could think about a more automatic/permanent way to let them interact.
My first thought was a simple pinging system using XML-RPC (like the pinging
systems of all the new search engines), but any other method could work as
well.
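As a sketch of what such an XML-RPC ping could look like (the method name follows the weblogs.com "weblogUpdates.ping" convention; the endpoint and conversation URL are my own illustrations, not an agreed API), the payload is easy to build with a standard XML-RPC library:

```python
import xmlrpc.client

# Illustrative only: "weblogUpdates.ping" is the weblogs.com-style method
# name, and the conversation URL below is a made-up example.
payload = xmlrpc.client.dumps(
    ("Talk Digger conversation X", "http://talkdigger.com/conversation/X"),
    methodname="weblogUpdates.ping",
)

# A real ping would POST this payload to the browser's XML-RPC endpoint,
# e.g. via xmlrpc.client.ServerProxy(endpoint).weblogUpdates.ping(...).
print(payload)
```
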

So let me know whether you would be interested, whether it is a good idea, and
how/when it could be done.

Salutations,

Fred

http://fgiasson.com
http://talkdigger.com

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
To unsubscribe from this group, send email to sioc-dev-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sioc-dev
-~----------~----~----~----~------~----~------~--~---

Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hi,

On 8/3/06, Frederick Giasson wrote:
>
> Hi guys,
>
> Recently I talked with Alex to ask him if he could be interested in making his
> SIOC Browser interacting with Talk Digger. In fact, the next version of Talk
> Digger, currently in alpha testing, will generate a SIOC document for each
> conversation page.
>
> Currently, Alex's SIOC explorer crawls and indexes some blogs that have the
> WordPress (b2evolution too?) SIOC exporter plugin.

Actually, the browser is just a front-end to any SPARQL store.
What I've done with it is:
- Use Uldis's crawler to get SIOC content from different sources;
- Put all the data in an RDF store;
- Use the browser as a front-end to the store.

> What I suggested to him was to let his browser crawl Talk Digger's SIOC data
> as well.
>
> A couple of days ago, Uldis had his talk accepted at the BlogTalk Reloaded
> conference. Their job (because he is not alone in the boat) is to merge all
> the current SIOC browsers into a single one for the presentation (at least,
> that is what I understood).
>
> As people know, the idea behind the SIOC ontology is to link online communities
> together. So the idea is to present information, in the same environment (a
> SIOC browser in the present case), from many different sources/types of
> communities.
>
> So what I would suggest is to start thinking about a way to let Talk Digger
> ping the SIOC browser when new SIOC data is available from Talk Digger for the
> browser.

Indeed, we should find a way to link the platform(s) that crawl/store SIOC
data with the ones that create it, so that the stores stay up to date.

> At first, I could create a list of links where data could be crawled, and later
> we could think about a more automatic/permanent way to let them interact
> together. At first I thought about a simple pinging system using XML-RPC (like
> all the new search engines pinging systems). But any other methods could work
> as well.

An RDF file containing recently created links would be a good start: a cron
job could fetch it and retrieve only the new URLs using the incremental
crawler, and then we'd put the data in the store.
A ping service would be nicer, but longer to implement I think, so we can
start with this list.

> So let me know if you could be interested, if it is a good idea, and how/when
> it could be done.

Let's start with it :)
What you could do first is to create this URLs page, so we can try the
crawler and its incremental functions on it and see whether we fetch the
data as expected.

In the meantime, Uldis and I will work on the browser, so we can think
about specific queries for TalkDigger.

Best,

Alex.


Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hi Alex,

> Let's start with it :)
> What you could do first is to create this URLs page, so we can try the
> crawler and its incremental functions on it and see whether we fetch the
> data as expected.

Great, perfect, that is what I thought.

What file format would you like to use? A single list of URLs in a text
file, separated by carriage returns? :)

Or XML, or RDF, or... tell me, tell me! :)

> In the meantime, Uldis and I will work on the browser, so we can think
> about specific queries for TalkDigger.

That is perfect for me. I can give you the list tomorrow without any problem,
so you can start crawling it as soon as possible.

But first, I have to know: which version of the ontology does the crawler
understand? Right now I am using the last "stable" one (the one without the
recent modifications).

Keep me posted.

Salutations,

Fred


Interaction between SIOC browsers and Talk Digger's SIOC data exporter

> > Let's start with it :)
> > What you could do first is to create this URLs page, so we can try the
> > crawler and its incremental functions on it and see whether we fetch the
> > data as expected.
>
> Great, perfect, that is what I thought.
>
> What file format would you like to use? A single list of URLs in a text
> file, separated by carriage returns? :)
>
> Or XML, or RDF, or... tell me, tell me! :)
I think that an RDF file that lists the new URLs with the seeAlso property
is the best way to do it. The script would fetch this file, go through the
seeAlso links, and get the files only if they're new.
One problem is that the file will grow bigger every day, so we must find a
way to keep too many already-fetched URLs out of it. This issue will be
solved by a ping system, but for the moment, maybe your file can list only
the URLs created during the last 5 days. I'll try to make the cron job run
daily.
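A sketch of what such a file could look like (the resource URLs are purely illustrative; only the rdfs:seeAlso pattern is the point):

```xml
<!-- Hypothetical export file: the document lists each newly created (or
     recently changed) conversation's SIOC data as an rdfs:seeAlso link. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://talkdigger.com/sioc/recent">
    <rdfs:seeAlso rdf:resource="http://talkdigger.com/conv/123/sioc.rdf"/>
    <rdfs:seeAlso rdf:resource="http://talkdigger.com/conv/124/sioc.rdf"/>
  </rdf:Description>
</rdf:RDF>
```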

>
> > In the meanwhile, we'll work on the browser with Uldis, so we can
> > think on specific queries for TalkDigger.
>
> It is perfect for me. I could give you the list tomorrow without any problem,
> so you could start to crawl it as soon as possible.
>
> But first, I have to know: which version of the ontology does the crawler
> understand? Right now I am using the last "stable" one (the one without the
> recent modifications).
Hmm... I think it works with the stable one at the moment, i.e. the one
used in the PHP API.
Anyway, I'll have a look at the TalkDigger SIOC data to see if it fits
with the current browser.

BTW, TD data will be in the same store as the other blogs; maybe it can
lead to interesting queries :)

Alex.


Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hi Alex,

> I think that an RDF file that lists the new URLs with the seeAlso property
> is the best way to do it. The script would fetch this file, go through the
> seeAlso links, and get the files only if they're new.
> One problem is that the file will grow bigger every day, so we must find a
> way to keep too many already-fetched URLs out of it. This issue will be
> solved by a ping system, but for the moment, maybe your file can list only
> the URLs created during the last 5 days. I'll try to make the cron job run
> daily.

I have no problem with that. However, what I am thinking of doing is creating
a test file with a couple of hundred URLs to test with your crawler. If it
works fine, I will generate a list of the thousands of conversations I am
currently tracking and send the file via email. And if everything works fine,
we will then be able to create an online list of the conversations that
changed in the last 5 days, for example.

What do you think of this plan?

Also, do you know when the new ontology will be supported by the crawler?
I was planning to switch to it in the coming days/weeks.

> BTW, TD data will be in the same store than other blogs, maybe it can
> lead to interesting queries :)

It is exactly what I hope :)

Take care,

Salutations,

Fred


Interaction between SIOC browsers and Talk Digger's SIOC data exporter

On 8/3/06, Frederick Giasson wrote:
>
> I have no problem with that. However, what I am thinking of doing is creating
> a test file with a couple of hundred URLs to test with your crawler. If it
> works fine, I will generate a list of the thousands of conversations I am
> currently tracking and send the file via email. And if everything works fine,
> we will then be able to create an online list of the conversations that
> changed in the last 5 days, for example.
>
> What do you think of this plan?
That's fine. It would be a good, progressive way to see how it works.

> Also, do you know when the new ontology will be supported by the crawler?
> Because I was planning to change it in the next days/weeks.
Actually, the data will be in the store anyway; we'll just have to write
queries that take the new, rather than the older, ontology into account.
But I think we'll also update the API and the exporters to the latest
version.

Are you using Community in your SIOC data? At the moment, the data in the
store mainly consists of Posts, so we'll be able to write new queries if
you use 'Community' (most active, ...).
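A query along these lines could feed a "most active" view; the property names (has_container, has_host) follow the SIOC spec of the time but should be treated as assumptions until checked against the actual TD export, and the per-community counting may have to happen client-side, since aggregates were not yet standard SPARQL:

```sparql
PREFIX sioc: <http://rdfs.org/sioc/ns#>

# Hypothetical "activity" query: fetch every post together with the
# community hosting its container; the client then counts posts per
# community to rank the most active ones.
SELECT ?post ?community
WHERE {
  ?post a sioc:Post ;
        sioc:has_container ?forum .
  ?forum sioc:has_host ?community .
}
```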


Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hi,

> Actually, the data will be in the store anyway; we'll just have to write
> queries that take the new, rather than the older, ontology into account.
> But I think we'll also update the API and the exporters to the latest
> version.
>
> Are you using Community in your SIOC data? At the moment, the data in the
> store mainly consists of Posts, so we'll be able to write new queries if
> you use 'Community' (most active, ...).

Nah, not at the moment. I'll have to take a day next week to re-implement
it with the new specs.

So tell me when you would like to have the links: before or after my
update?

Thanks!

Salutations,

Fred


Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Good night :)

Maybe a look at the Scutter vocabulary at
http://rdfweb.org/topic/ScutterVocab could be helpful. ldodd's Slug uses it
to keep track of what the scutter/crawler fetched, what the return code
was, etc. This gives a map of what may be valuable to fetch again or not
(maybe many Not-Modified responses from one domain).
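A toy sketch of that idea (the data structures are purely illustrative; the Scutter vocabulary itself records this fetch history as RDF): given past (URL, status) pairs, skip domains that mostly answered 304 Not Modified.

```python
from collections import Counter
from urllib.parse import urlparse

def domains_worth_refetching(fetch_log, max_not_modified=3):
    """Given (url, http_status) pairs from past crawls, drop domains that
    mostly answered 304 Not Modified -- they are unlikely to have news."""
    not_modified = Counter()
    for url, status in fetch_log:
        if status == 304:
            not_modified[urlparse(url).netloc] += 1
    all_domains = {urlparse(url).netloc for url, _ in fetch_log}
    return {d for d in all_domains if not_modified[d] < max_not_modified}

# Hypothetical log: example.org answered 304 three times, so it is skipped.
log = [
    ("http://example.org/a.rdf", 304),
    ("http://example.org/b.rdf", 304),
    ("http://example.org/c.rdf", 304),
    ("http://b4mad.net/feed.rdf", 200),
]
print(domains_worth_refetching(log))  # → {'b4mad.net'}
```
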

Christoph

On 03.08.2006 at 22:37, Alexandre Passant wrote:

>>> What you could do first is to create this URLs page, so we can try the
>>> crawler and its incremental functions on it and see whether we fetch the
>>> data as expected.
>>
>> Great, perfect, that is what I thought.
>>
>> What file format would you like to use? A single list of URLs in a text
>> file, separated by carriage returns? :)
>>
>> Or XML, or RDF, or... tell me, tell me! :)
> I think that an RDF file that lists the new URLs with the seeAlso property
> is the best way to do it.

--
blogging at: http://B4mad.Net/datenbrei
info at: http://B4mad.Net/FOAF/goern.rdf#goern
gpg key: http://pgpkeys.pca.dfn.de:11371/pks/lookup?
op=get&search=0xB10DFF8D88FD746C
x509 root ca certificate: http://b4mad.net/CA/

Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hi,

On 8/3/06, Christoph Görn wrote:
> Good night :)
Indeed :)

> Maybe a look at the Scutter vocabulary at
> http://rdfweb.org/topic/ScutterVocab could be helpful. ldodd's Slug uses it
> to keep track of what the scutter/crawler fetched, what the return code
> was, etc. This gives a map of what may be valuable to fetch again or not
> (maybe many Not-Modified responses from one domain).

While using the latest version of the SIOC crawler, I noticed it also has
an RDF vocab to store fetched-file properties (errors, ...), which is
indeed useful; e.g., I parsed it to put into the store only the data that
had been fetched correctly.

Maybe these two vocabs could be merged?



Interaction between SIOC browsers and Talk Digger's SIOC data exporter

Hmm. I mentioned the Scutter vocab to Uldis earlier... he thought it is
overblown... I think: go with the flow/majority :)
Using the same Scutter vocab as others may enable switching from one
crawler to another... I use Slug for my crawling...

:)

On 03.08.2006 at 22:51, Alexandre Passant wrote:
>> Maybe a look at the Scutter vocabulary at
>> http://rdfweb.org/topic/ScutterVocab could be helpful.
>
> While using the latest version of the SIOC crawler, I noticed it also has
> an RDF vocab to store fetched-file properties (errors, ...), which is
> indeed useful; e.g., I parsed it to put into the store only the data that
> had been fetched correctly.
> Maybe these two vocabs could be merged?

--
Christoph Görn
http://B4mad.Net/FOAF/goern.rdf#goern

Usability schmusability... where's the part where we talk about how
this helps users kick ass?