Word-count in SIOC data

Someone on Twitter asked for a way to get a total word count of all
posts on a blog. It sounds like the word count of a post is a type of
metadata people may want to know (there may be other metadata that
isn't exported yet and would be cool to export).

Posting the idea here so we can find (and do) it later. Note that SIOC
already exports the text of a post, so this information can be derived
by filters further along the food chain; there is no need to implement
a specific count in each exporter. Just take "sioc:content" or
"content:encoded" and count the words, for any SIOC data.
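A minimal sketch of such a downstream filter, assuming the post body has already been pulled out of "sioc:content" or "content:encoded" as a string. Since content:encoded usually carries HTML, tags are stripped first; a "word" is taken to be any run of non-whitespace characters, which is only one possible definition.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the character data from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def word_count(content: str) -> int:
    """Count words in a post body from sioc:content or content:encoded.

    Plain text passes through unchanged. Text chunks between tags are
    joined with a space so that adjacent block elements such as
    "<p>foo</p><p>bar</p>" are not merged into one word.
    """
    extractor = _TextExtractor()
    extractor.feed(content)
    extractor.close()
    text = " ".join(extractor.parts)
    return len(text.split())
```

A blog total is then just `sum(word_count(body) for body in bodies)` over the bodies of all posts.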

Still, even if the word count were exported, we would need to add up
the word counts of all the posts to get a blog total. Does SPARQL
support data aggregation (SUM, AVERAGE, ...)?

Uldis

[ http://captsolo.net/info/ ]

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
To unsubscribe from this group, send email to sioc-dev-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sioc-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Word-count in SIOC data

On 26/04/07, Uldis Bojars wrote:
>
> Someone on Twitter asked for a way to get a total word count of all
> posts on a blog. It sounds like the word count of a post is a type of
> metadata people may want to know (there may be other metadata that
> isn't exported yet and would be cool to export).

Right, this is something I want to set up for my blog data before too
long. The way I'd been thinking about it was to have a little agent
run over the posts and programmatically insert a triple such as

  _:post x:wordCount "123" .

or whatever.
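Such an agent could be sketched as below, assuming it is handed a mapping from post URI to plain-text body and writes N-Triples. The property URI http://example.org/x#wordCount is a hypothetical stand-in for "x:wordCount"; typing the literal as xsd:integer (rather than a plain "123") is a design choice that makes later summing easier.

```python
from typing import Dict, Iterator

# Hypothetical property standing in for "x:wordCount"; not a SIOC term.
WORD_COUNT = "http://example.org/x#wordCount"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def word_count_triples(posts: Dict[str, str]) -> Iterator[str]:
    """Yield one N-Triples line per post: <post> x:wordCount "n" ."""
    for post_uri, text in posts.items():
        count = len(text.split())  # whitespace-delimited word count
        yield f'<{post_uri}> <{WORD_COUNT}> "{count}"^^<{XSD_INTEGER}> .'
```

The output lines can be appended to the blog's RDF dump or loaded into whatever store holds the exported SIOC data.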

> Posting the idea here so we can find (and do) it later. Note that SIOC
> already exports the text of a post, so this information can be derived
> by filters further along the food chain; there is no need to implement
> a specific count in each exporter. Just take "sioc:content" or
> "content:encoded" and count the words, for any SIOC data.
>
> Still, even if the word count were exported, we would need to add up
> the word counts of all the posts to get a blog total. Does SPARQL
> support data aggregation (SUM, AVERAGE, ...)?

It's not in the spec, but I believe Jena/ARQ and Redland/Rasqal both
support extensions that can do aggregation. An alternative would be to
calculate it from the SPARQL results: with JSON output that should be
pretty trivial, and an XML/XSLT word count should be on the web
somewhere.
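The client-side route over JSON results might look like this sketch. It assumes a query along the lines of SELECT ?post ?wc WHERE { ?post x:wordCount ?wc }, where x:wordCount is the hypothetical property above, and results returned in the W3C SPARQL JSON results format (rows under "results" / "bindings", each value under "value").

```python
import json
from typing import Tuple


def aggregate_word_counts(results_json: str, var: str = "wc") -> Tuple[int, float]:
    """Return (SUM, AVERAGE) of an integer variable in SPARQL JSON results."""
    data = json.loads(results_json)
    counts = [
        int(binding[var]["value"])
        for binding in data["results"]["bindings"]
        if var in binding  # skip rows where an OPTIONAL variable is unbound
    ]
    total = sum(counts)
    average = total / len(counts) if counts else 0.0
    return total, average
```

Doing the arithmetic in the client keeps the query itself within the standard spec, at the cost of shipping every row over the wire.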

Cheers,
Danny.

--

http://dannyayers.com

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
To unsubscribe from this group, send email to sioc-dev-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sioc-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

Word-count in SIOC data

If I can go off on a bit of a tangent, I've been sort of wondering for
a while about calculated property values and RDF. I've been thinking
about this in the context of RDF-in-HTML, but it's also something that
could apply to SIOC exporters.

An example: in my blog's sidebar, I have a list of categories. How do
I represent that my blog has 123 posts in the category "semweb"? If
we restrict this to RDF-in-HTML for a moment, do I use an ontology for
describing aggregates as regular triples, in the same way as for
non-calculated properties like dc:title?

Or should they be represented in a special way, perhaps with an
ancillary semantic HTML format that connects with the RDFa/eRDF but
wouldn't be picked up by a standard eRDF/RDFa parser?
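For the first option, treating aggregates as regular triples, the sidebar counts could be generated as in this sketch. It assumes the input is (post URI, category URI) pairs, e.g. taken from sioc:topic links, and uses a hypothetical property http://example.org/x#postCount; nothing in the output marks the value as calculated, which is exactly the limitation the second option tries to address.

```python
from collections import Counter
from typing import Iterable, List, Tuple

POST_COUNT = "http://example.org/x#postCount"  # hypothetical property
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def category_count_triples(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Emit '<category> x:postCount "n" .' for each category.

    Duplicate (post, category) pairs are collapsed first so a post is
    counted at most once per category.
    """
    counts = Counter(category for _post, category in set(pairs))
    return [
        f'<{category}> <{POST_COUNT}> "{n}"^^<{XSD_INTEGER}> .'
        for category, n in sorted(counts.items())
    ]
```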

I'm interested in coming up with an answer to this because of how it
ties into my roundtripleting/eRDF-T explorations
(http://semwebdev.keithalexander.co.uk/blog/posts/roundtripleting
and http://semwebdev.keithalexander.co.uk/blog/posts/erdft).

Basically I'm wondering if it's possible to use an RDF-in-HTML syntax that:
* Describes a variable in a web page template so that the template can
be processed to populate that variable with the calculated value (e.g.
the number of posts with category "semweb").
* Describes it in a way that makes sense both to the template
processor and to any RDF-in-HTML parser given the resulting HTML
document. The process should be reversible: both the template
processor and the RDF-in-HTML parser should see that it is a
calculated property and, ideally, how it is calculated.
* Describes it in a simple, readily understandable way.
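To make the round trip concrete, here is a rough sketch under loudly stated assumptions: the calculated value sits in an RDFa-style span carrying a hypothetical data-calc attribute naming the calculation, and "x:postCount" and the expression string "count-posts:semweb" are made up for illustration. The template processor fills the span; reading the filled page back recovers both the value and how it was calculated. A regular expression over one fixed markup pattern is used only to keep the sketch short; a real implementation would use an HTML parser.

```python
import re
from typing import Callable, Dict

# Matches <span ... data-calc="EXPR" ...>VALUE</span>. The data-calc
# attribute is a hypothetical marker, not part of RDFa or eRDF.
CALC_SPAN = re.compile(
    r'(<span\b[^>]*\bdata-calc="(?P<expr>[^"]*)"[^>]*>)(?P<value>[^<]*)(</span>)'
)


def fill_template(html: str, calculators: Dict[str, Callable[[], object]]) -> str:
    """Template processor: replace each calculated span's content with its value."""
    def _fill(m):
        value = calculators[m.group("expr")]()
        # group(1) is the opening tag, group(4) the closing </span>
        return f"{m.group(1)}{value}{m.group(4)}"
    return CALC_SPAN.sub(_fill, html)


def calculated_properties(html: str) -> Dict[str, str]:
    """Reverse direction: map each calculation expression to its current value."""
    return {m.group("expr"): m.group("value") for m in CALC_SPAN.finditer(html)}


template = ('<p about="#blog"><span property="x:postCount" '
            'data-calc="count-posts:semweb">?</span> posts</p>')
page = fill_template(template, {"count-posts:semweb": lambda: 123})
```

Because the property attribute is left untouched, an ordinary RDFa-style parser would still read the filled page as a plain x:postCount triple, while a parser that knows about the marker can tell the value is calculated.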

Thoughts?

Keith

(Hope that made some kind of sense; if not, let me know.)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "SIOC-Dev" group.
To post to this group, send email to sioc-dev@googlegroups.com
To unsubscribe from this group, send email to sioc-dev-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sioc-dev?hl=en
-~----------~----~----~----~------~----~------~--~---